At my last job, I wrote a lot of backend code and had plenty of time to think about storage APIs or “data layers.”
Commonly we think of the data layer in an application as a resource or object store, along with CRUD methods (Create, Read, Update, Delete) to interact with it. Unfortunately, this is one of those things that feels like a neat pattern, but isn’t. When you dig into it, a lot of unspecified details come to the surface:
- What are we creating? In some cases it is a whole object, but often there are parts that are added by the creation process, such as a DB-generated ID or creation time.
- What does update really mean? Replacing an object wholesale is possible, but again, frequently we want to only update parts of an object (so the DB can reconcile non-conflicting changes), or provide conditions like “only update if it hasn’t been changed since I read it.”
- Should it be possible to update an object without starting with a retrieved copy? E.g. “set field F to value X” doesn’t require a read first.
- For reading, what is the input? We need our stored objects to have a unique derivable “key.” Frequently this is a DB-generated row number or random ID, but we can have unique keys be part of the data itself.
- What are the error conditions? What happens if we update an object that’s been deleted; should it be recreated? What happens if we create an object with a key that already exists? Or delete a key which doesn’t exist? Shall we have allowExists modifiers? Shall we represent clashes in the same error channel as DB errors, or a different one?
- Maybe we shouldn’t have create or update, and instead have a “set”/“ensure”?
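To make a couple of these questions concrete, here is a minimal Python sketch of one possible set of answers: updates succeed only if the object hasn’t changed since it was read, and clashes get their own exception types rather than sharing a channel with backend errors. Every name in it (VersionedStore, Versioned, AlreadyExists, Conflict, allow_exists, expected_version) is hypothetical and made up for illustration, and the dict stands in for a real database.

```python
from dataclasses import dataclass


class AlreadyExists(Exception):
    """Clash: create was called with a key that is already present."""


class Conflict(Exception):
    """Clash: the object changed (or was deleted) since the caller read it."""


@dataclass(frozen=True)
class Versioned:
    value: dict
    version: int  # bumped by the store on every successful write


class VersionedStore:
    """Hypothetical in-memory store; a stand-in for a real database."""

    def __init__(self) -> None:
        self._rows: dict[str, Versioned] = {}

    def create(self, key: str, value: dict, allow_exists: bool = False) -> Versioned:
        if key in self._rows:
            if allow_exists:
                return self._rows[key]  # "ensure" semantics: keep what is there
            raise AlreadyExists(key)
        row = Versioned(dict(value), version=1)
        self._rows[key] = row
        return row

    def read(self, key: str) -> Versioned:
        return self._rows[key]  # raises KeyError if the key is missing

    def update(self, key: str, value: dict, expected_version: int) -> Versioned:
        current = self._rows.get(key)
        # One possible answer to "update after delete": treat it as a conflict,
        # never silently recreate the object.
        if current is None or current.version != expected_version:
            raise Conflict(key)
        row = Versioned(dict(value), version=current.version + 1)
        self._rows[key] = row
        return row
```

Even this tiny version had to pick an answer for each question above, and a different application could reasonably pick the opposite one.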
The idea of a clean separation between app layer and data layer starts to dissolve. For example, which layer should handle assigning creation times? Is there even a correct answer to that question? Different storage backends or databases might work so differently that trying to cover them in a single abstraction just doesn’t make much sense.
At my last job, I started to move towards having no layer separation at all, and just wrote the DB code directly in the app logic. I think for smaller applications this saves headaches compared to maintaining an arbitrary separation.
What if we tried to capture some of these details in the data API? What does a general algebra of CRUD actually look like? We might have something like this:
- Two types, MyObject and StoredMyObject, with StoredMyObject as a subtype of MyObject.
- A function getKey which returns a key type Key. Depending on the kind of data, getKey is defined either on MyObject or only on StoredMyObject; i.e. in general, stored objects always have keys, but only some kinds of non-stored objects have keys.
- create: (MyObject) -> StoredMyObject.
- read: (Key) -> StoredMyObject.
- Updating has various approaches:
  - replace: (Key, MyObject) -> StoredMyObject.
    - Sometimes this will be equivalent to delete-then-create, but not generally.
    - (Key, MyObject) could (should?) be just MyObject if getKey is defined on MyObject.
  - If we want patching, we need a type MyObjectPatch so we can define patch: (Key, MyObjectPatch) -> StoredMyObject. We might also have some function derive: (StoredMyObject, ...) -> MyObjectPatch if we need an existing object to create patches. (... is the description for some kind of change operation.) The familiar case of retrieving an object, changing a field, then saving it is then actually something like this:

        x = read(k)
        xp = derive(x, ...)
        x = patch(k, xp)
  - For update conditions, patch might need to accept an extra argument expressing the condition.
- We might also have a function core: (StoredMyObject) -> MyObject which returns the blueprint for recreating a stored object.
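The signatures above can be sketched in Python roughly as follows. This is only one possible reading, under loud assumptions: the fields name and email, the integer Key, the created_seq counter (a stand-in for a creation time), the Store class, and the convention that None in a patch means “leave this field alone” are all invented for illustration. Here getKey is defined only on StoredMyObject.

```python
import itertools
from dataclasses import dataclass, replace as dc_replace
from typing import Optional

Key = int  # assumption: keys are store-generated integer IDs


@dataclass(frozen=True)
class MyObject:
    """What the application supplies: no key, no creation metadata."""
    name: str
    email: str


@dataclass(frozen=True)
class StoredMyObject(MyObject):
    """A MyObject plus the parts added by the creation process."""
    id: Key
    created_seq: int  # stand-in for a creation time


@dataclass(frozen=True)
class MyObjectPatch:
    """A partial update; None means 'leave this field alone'."""
    name: Optional[str] = None
    email: Optional[str] = None


def get_key(obj: StoredMyObject) -> Key:
    return obj.id


def core(obj: StoredMyObject) -> MyObject:
    """The blueprint for recreating a stored object."""
    return MyObject(name=obj.name, email=obj.email)


def derive(obj: StoredMyObject, **changes: str) -> MyObjectPatch:
    """Build a patch holding only the fields that actually differ from obj."""
    return MyObjectPatch(**{f: v for f, v in changes.items() if getattr(obj, f) != v})


class Store:
    """Hypothetical in-memory store standing in for a database."""

    def __init__(self) -> None:
        self._rows: dict[Key, StoredMyObject] = {}
        self._ids = itertools.count(1)

    def create(self, obj: MyObject) -> StoredMyObject:
        key = next(self._ids)
        stored = StoredMyObject(name=obj.name, email=obj.email, id=key, created_seq=key)
        self._rows[key] = stored
        return stored

    def read(self, key: Key) -> StoredMyObject:
        return self._rows[key]

    def replace(self, key: Key, obj: MyObject) -> StoredMyObject:
        old = self._rows[key]
        # Unlike delete-then-create, the ID and creation data survive.
        stored = StoredMyObject(name=obj.name, email=obj.email,
                                id=old.id, created_seq=old.created_seq)
        self._rows[key] = stored
        return stored

    def patch(self, key: Key, p: MyObjectPatch) -> StoredMyObject:
        changes = {f: v for f, v in vars(p).items() if v is not None}
        stored = dc_replace(self._rows[key], **changes)
        self._rows[key] = stored
        return stored


# The familiar read / change a field / save cycle:
store = Store()
k = get_key(store.create(MyObject("Ada", "ada@example.com")))
x = store.read(k)
xp = derive(x, email="ada@new.example")
x = store.patch(k, xp)
```

Making create the only way to obtain a StoredMyObject means the type itself records which parts the storage layer filled in, which at least turns the “who assigns the ID and creation time?” question into something visible in the signatures.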
Note that a key-value store doesn’t fit the above schema. Key-value stores have a read which does (Key) -> Value; we would need to augment it so it looks like read: (Key) -> (Key, Value).
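As a rough illustration of that augmentation, here is a small Python sketch that wraps a plain key-value store so that what comes back from read carries its own key. KeyValueStore, KeyedStore and get_key are hypothetical names for this example, not any particular library’s API.

```python
from typing import Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class KeyValueStore(Generic[K, V]):
    """Plain key-value store: read is (Key) -> Value."""

    def __init__(self) -> None:
        self._data: dict[K, V] = {}

    def set(self, key: K, value: V) -> None:
        self._data[key] = value

    def read(self, key: K) -> V:
        return self._data[key]


class KeyedStore(Generic[K, V]):
    """Adapter whose read is (Key) -> (Key, Value)."""

    def __init__(self, inner: KeyValueStore[K, V]) -> None:
        self._inner = inner

    def read(self, key: K) -> tuple[K, V]:
        # Pair the value with its key so the "stored object" has a derivable key.
        return key, self._inner.read(key)


def get_key(stored: tuple[K, V]) -> K:
    """getKey for the augmented store: the key is the first element of the pair."""
    return stored[0]
```

With the pair as the stored type, get_key works on anything read back from the store, which is the property the schema above relies on.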
Overall, given how much I read about software, I was surprised to feel like I was figuring this out myself. Surely saving and loading state with databases is something the vast majority of software does? Yet I feel like I’ve heard more about how to avoid state with e.g. functional programming than I have heard about how to deal with state well. There is the great talk “Are We There Yet?” by Rich Hickey, which is about modelling change and time within a program. And the Ruby Object Mapper seems like a comprehensive set of data mapper building blocks. Is there any other material out there on this kind of thing?