How to ditch your ORM

In my previous post on how to simplify code by using composition, I talked about how we can reduce complexity by removing an AOP framework (or annotation-based programming). In this post I want to continue along the same line and talk about how we can reduce complexity by removing an ORM and replacing it with a simpler pattern. Before I show how to get rid of the ORM, I want to explain why I think ORMs introduce complexity.

ORMs are not evil; they have certain advantages and disadvantages. These are some of their characteristics and how they may influence a project:

  • Simplicity: At the start of a project, an ORM is a real productivity booster, because you can load and save objects with very little code. There’s probably little complexity in your domain model yet, so it closely mirrors your database structure and the mapping is easy. As your model becomes more complex, so does the mapping. When that happens, you have a problem: either your domain model becomes restricted by how your database is designed (to keep the mapping simple), or your mapping becomes very complex. (Here’s one example of how an ORM can limit your ability to model your application: http://stackoverflow.com/questions/17275030/how-to-map-a-value-type-which-has-a-reference-to-an-entity )
  • Abstraction: An ORM provides an abstraction, and that abstraction is leaky. If you look at the documentation of any ORM, you will find many references to SQL concepts. I have never been able to treat an ORM as just an object store; every time, I needed to know how the ORM does things in order to get the correct data. Think about it like this: if you didn’t know anything about SQL, would you be able to use an ORM?
  • Learning curve: Every ORM has a different API, so every ORM comes with a new learning curve. Since it’s a leaky abstraction, it doesn’t free you from learning SQL either, so now you need to learn not only SQL, but also (N)Hibernate, and later Entity Framework, and later …
  • Efficiency: ORM authors generally acknowledge that you give up a bit of efficiency. For small projects, that’s not an issue. When it does become one, you’ll need to bypass the ORM and access the database with plain SQL, which again shows that an ORM is a leaky abstraction. (See also the part on ORMs in my post about simplicity in software.)

So, ORMs have certain disadvantages, and in my opinion they are not a good fit for complex applications, because they tend to increase complexity. But they’re not useless either: they can increase productivity at the start of a project (and can be removed or replaced when necessary), and if your application is small and very CRUD-oriented, they provide great value as well.

ORMs can only provide a leaky abstraction because object-relational mapping is genuinely very hard (it has been called the Vietnam of Computer Science).

The problem

Object hierarchies are inherently very different from relational structures. Relational models center on data, whereas objects gravitate towards behavior (or at least they should). OO modeling is a lot more powerful than relational modeling, and because software development is in fact very difficult, we want the most powerful tool at our disposal. The trouble starts when we try to create a mapping between a database and an object model. We then end up with either a limited object model that merely mirrors the relational model (the lowest common denominator), or very complex mappings (which can break down as the model keeps evolving).

Changing the problem

I’m not suggesting that you should write yet another ORM, and I’m not creating a revolutionary new ORM either. As I said, object-relational mapping is hard, so instead of trying to solve this problem, we want to change the problem so we don’t have to deal with it.

The first step towards changing the problem is realizing that reading and writing are two very different operations. Typically, when you write, you want to ensure consistency, and to ensure consistency you need a strong model (DDD is one approach towards a stronger model). When you read, you’re trying to display the saved data in a certain form, and two read operations on the same data may want different representations. Ideally, when you’re reading you want a simple, flat model. So the requirements imposed on the model differ between reading and writing, and a single model that caters to both will be more complex than either.

To make mapping easier, we therefore split our read model from our write model. This lets us have simple models on the read side while still having a strong model that ensures consistency on the write side.

Tackling the read side

On the read side, models are relatively simple, so we don’t need any complex mapping. In fact, reading should just be about projecting data into our models. Because it’s just a projection, no ORM is needed; you can simply write plain SQL queries. You could also use one of the available micro-ORMs (PetaPoco is my personal favorite on the .NET platform). Although technically these are ORMs too, they don’t carry the same weight as their full-blown counterparts. The biggest difference is that they don’t try to abstract the database away, so I prefer to think of them as SQL libraries rather than ORMs.
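
As a minimal sketch of such a projection, assuming plain ADO.NET rather than any particular micro-ORM, the code below takes the rows of a hand-written query and copies each one into a flat read model. The `UserListItem` class, the `UserListQuery.Project` method and the `users`/`user_friends` schema are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical flat read model, shaped for a single screen.
public class UserListItem
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int FriendCount { get; set; }
}

public static class UserListQuery
{
    // Plain SQL; the table and column names are assumptions.
    public const string Sql =
        "select u.id, u.name, count(f.friendid) as friend_count " +
        "from users u left join user_friends f on f.userid = u.id " +
        "group by u.id, u.name";

    // Copies each row into the read model: no change tracking, no identity map.
    public static List<UserListItem> Project(IDataReader reader)
    {
        var items = new List<UserListItem>();
        while (reader.Read())
        {
            items.Add(new UserListItem
            {
                Id = reader.GetInt32(reader.GetOrdinal("id")),
                Name = reader.GetString(reader.GetOrdinal("name")),
                // Many databases return count(...) as a 64-bit integer, so convert.
                FriendCount = Convert.ToInt32(reader.GetValue(reader.GetOrdinal("friend_count")))
            });
        }
        return items;
    }
}
```

Because the read model is only the target of a projection, another screen that needs a different shape can get its own query and its own class without touching the write side.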

Tackling the write side

On the write side, we’ll usually have a complex model that enforces constraints. If we want to persist an entire entity (or aggregate root) at once, we need either complex mapping or complex queries. With a repository pattern, when you call the save method, the repository somehow has to find out what has changed and how that relates to what is in the database. This is hard, and it is the root of most complex mapping. As we’ll see, if the model instead emits fine-grained notifications of what changed, the code that listens to them can be very simple. First, let’s look at how we would persist a user with a repository:

Example 1: Persisting a user using a repository

using System.Collections.Generic;

public class User
{
    public int Id {get; set;}
    public string Name {get; set;}
    public List<User> Friends {get; set;}
}

public class UserRepository
{
    public void Save(User user)
    {
         // what has changed?
         // added, removed friends?
         // updated name of a friend
         // changed name?
    }
}

In the example above, when we save a user, so many things could have changed that the save method has a hard time figuring out what to persist. An ORM takes this work out of your hands, but then your mapping can become complex (Is it a many-to-many? What happens if I change a friend’s name? If a friend does not have an ID, does it do an insert? …). This is a trivial example and most ORMs handle it fairly easily, but in more complex scenarios mapping can become really difficult and obscure. Handling this manually is very difficult as well, since you need to track all these changes yourself (and essentially you’d be writing your own ORM).
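
To make that bookkeeping concrete, here is a sketch of what a hand-rolled save would need for the `Friends` collection alone, assuming friends are identified by id: compare the ids loaded from the database with the ids now in memory to decide which rows to insert and which to delete. `FriendDiff` is a hypothetical helper, not part of any library.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the change detection a hand-written repository needs
// for one collection, with friends identified by id.
public static class FriendDiff
{
    public static (List<int> Added, List<int> Removed) Compute(
        IEnumerable<int> loadedIds, IEnumerable<int> currentIds)
    {
        var loaded = new HashSet<int>(loadedIds);
        var current = new HashSet<int>(currentIds);

        // In memory but not in the database: these need an insert.
        var added = current.Where(id => !loaded.Contains(id)).OrderBy(id => id).ToList();

        // In the database but no longer in memory: these need a delete.
        var removed = loaded.Where(id => !current.Contains(id)).OrderBy(id => id).ToList();

        return (added, removed);
    }
}
```

Even this only covers membership; a renamed friend or a friend without an id would need yet more tracking, which is exactly the work an ORM’s change tracking does for you.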

To avoid this complexity, we need a different strategy: we let the model announce what has happened, and have a dedicated listener react to those changes:

public class User
{   
    int id;
    public void AddFriend(User friend)   
    {
        EventBus.Raise(new FriendAddedToUser(id, friend));
    }

    public void RemoveFriend(User friend)   
    {
        EventBus.Raise(new FriendRemovedFromUser(id, friend));
    }

    public void ChangeName(string name)   
    {
        EventBus.Raise(new UserNameChanged(id, name));
    }
}

public class UserEventHandlers
{
    public void Handle(FriendAddedToUser @event)
    {
         // pseudo code
         // insert into user_friends (userid, friendid) values (@event.Id, @event.Friend.Id);
    }
    public void Handle(FriendRemovedFromUser @event)
    {
         // pseudo code
         // delete from user_friends where userid = @event.Id and friendid = @event.Friend.Id;
    }
    public void Handle(UserNameChanged @event)
    {
         // pseudo code
         // Update users set name = @event.Name where id = @event.Id;
    }
}

The User class does not have any public properties, only methods. We don’t need properties, since we’re not using this class for reading and we’re not using an ORM. Whenever a method is called, the User class pushes an event onto a bus. The bus then looks up one or more handlers for that event; in this case there’s just one for each event. Because these events are very fine-grained, the resulting SQL is very easy to write (again, you could use a micro-ORM to make life simpler: PetaPoco has an excellent SQL builder that makes this trivial. Did I mention I like PetaPoco?).
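
Here is one way the pseudo-code handlers could become real code, sketched under a few assumptions: the events carry the friend’s id rather than a `User` (since the write-side `User` exposes no public `Id`), and SQL runs through a hypothetical `ISqlExecutor` interface that stands in for plain ADO.NET or a micro-ORM. The table and column names follow the pseudo code; `RecordingSqlExecutor` is a test double invented for this sketch.

```csharp
using System.Collections.Generic;

// Hypothetical event types: they carry ids only, which is all the SQL needs.
public sealed class FriendAddedToUser
{
    public FriendAddedToUser(int userId, int friendId) { UserId = userId; FriendId = friendId; }
    public int UserId { get; }
    public int FriendId { get; }
}

public sealed class FriendRemovedFromUser
{
    public FriendRemovedFromUser(int userId, int friendId) { UserId = userId; FriendId = friendId; }
    public int UserId { get; }
    public int FriendId { get; }
}

public sealed class UserNameChanged
{
    public UserNameChanged(int userId, string name) { UserId = userId; Name = name; }
    public int UserId { get; }
    public string Name { get; }
}

// Hypothetical seam over whatever actually executes SQL.
public interface ISqlExecutor
{
    void Execute(string sql, IDictionary<string, object> parameters);
}

// Each handler maps one fine-grained event to one parameterized statement.
public class UserEventHandlers
{
    private readonly ISqlExecutor db;

    public UserEventHandlers(ISqlExecutor db) { this.db = db; }

    public void Handle(FriendAddedToUser e) =>
        db.Execute("insert into user_friends (userid, friendid) values (@userId, @friendId)",
            new Dictionary<string, object> { ["userId"] = e.UserId, ["friendId"] = e.FriendId });

    public void Handle(FriendRemovedFromUser e) =>
        db.Execute("delete from user_friends where userid = @userId and friendid = @friendId",
            new Dictionary<string, object> { ["userId"] = e.UserId, ["friendId"] = e.FriendId });

    public void Handle(UserNameChanged e) =>
        db.Execute("update users set name = @name where id = @userId",
            new Dictionary<string, object> { ["name"] = e.Name, ["userId"] = e.UserId });
}

// Test double: records each statement instead of running it.
public class RecordingSqlExecutor : ISqlExecutor
{
    public List<(string Sql, IDictionary<string, object> Parameters)> Statements { get; } =
        new List<(string Sql, IDictionary<string, object> Parameters)>();

    public void Execute(string sql, IDictionary<string, object> parameters) =>
        Statements.Add((sql, parameters));
}
```

Since each handler is a single parameterized statement, it can be unit-tested by inspecting the recorded SQL, without a database.
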
An added advantage is that this makes it easier to enforce constraints on your model. Here the methods just raise an event, but they could do whatever is needed to enforce invariants. In the first sample, public properties are exposed, so there’s no way to control what happens. Of course you could add some checks, but if you’re going to use an ORM, you will typically need public properties. You do need a bit of infrastructure code to set up the event bus, but it’s fairly trivial and a one-off investment.
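
As a rough idea of how small that infrastructure can be, here is a minimal synchronous, in-process event bus, assuming a static `EventBus` like the one the examples call. The `Register` and `Reset` methods and the dispatch-by-static-type behaviour are this sketch’s own choices; it is not thread-safe and does no error handling.

```csharp
using System;
using System.Collections.Generic;

// Minimal synchronous, in-process event bus (a sketch; not thread-safe).
// Handlers are looked up by the static type of the raised event.
public static class EventBus
{
    private static readonly Dictionary<Type, List<Action<object>>> handlers =
        new Dictionary<Type, List<Action<object>>>();

    // Subscribes a handler for events of type TEvent.
    public static void Register<TEvent>(Action<TEvent> handler)
    {
        if (!handlers.TryGetValue(typeof(TEvent), out var list))
        {
            list = new List<Action<object>>();
            handlers[typeof(TEvent)] = list;
        }
        list.Add(e => handler((TEvent)e));
    }

    // Delivers the event to every handler registered for TEvent, in registration order.
    public static void Raise<TEvent>(TEvent @event)
    {
        if (handlers.TryGetValue(typeof(TEvent), out var list))
        {
            foreach (var handle in list)
            {
                handle(@event);
            }
        }
    }

    // Removes all subscriptions (handy between tests).
    public static void Reset() => handlers.Clear();
}
```

Wiring could then be one line per event type, for example `EventBus.Register<FriendAddedToUser>(handlers.Handle);`, where `handlers` is an instance of the handler class; C# picks the matching `Handle` overload from the delegate type.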

We’re replacing a repository pattern with a publish-subscribe pattern in order to decouple domain logic from data access logic.

One part I didn’t include here is loading a user. This is a different use case from reading a user for display: to enforce invariants, you need the whole aggregate to be loaded. Say a user can have at most 10 friends. To enforce this, the AddFriend method needs to know how many friends the user currently has. For that you can use the memento pattern:

Memento pattern: “Without violating encapsulation, capture and externalize an object’s internal state so that the object can be restored to this state later.” (Design Patterns: Elements of Reusable Object-Oriented Software)

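using System;
using System.Collections.Generic;

public class User
{
    const int MaxFriends = 10;

    int id;
    List<User> friends = new List<User>();

    public void AddFriend(User friend)
    {
        // Enforce the invariant before anything is raised (and thus persisted).
        if (friends.Count >= MaxFriends)
        {
            throw new InvalidOperationException("A user can have at most 10 friends.");
        }
        friends.Add(friend);
        EventBus.Raise(new FriendAddedToUser(id, friend));
    }

    public void RemoveFriend(User friend)
    {
        // Keep the in-memory state in sync; only raise if something was removed.
        if (friends.Remove(friend))
        {
            EventBus.Raise(new FriendRemovedFromUser(id, friend));
        }
    }

    public void ChangeName(string name)
    {
        EventBus.Raise(new UserNameChanged(id, name));
    }

    // Restores a user from the state captured in its memento.
    public static User FromMemento(UserMemento memento)
    {
        var user = new User();
        user.id = memento.id;
        user.friends = memento.friends;
        return user;
    }
}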

In this case, we’re using only half of the memento pattern (the restoring part), since the state is already captured in the database. Because the memento object is a simple bag of properties, it can be read from the database in the same way we project data into our read model.
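
The restoring half can be sketched end to end as follows, under some assumptions of our own: the memento stores friend ids rather than `User` objects, and the user receives an `Action<object>` publish callback in place of the static `EventBus` so the example stays self-contained. `UserMemento`, `FriendAdded` and `MaxFriends` are hypothetical names. The point is that `AddFriend` can check the invariant only because `FromMemento` restored the full friend list.

```csharp
using System;
using System.Collections.Generic;

// Plain bag of state, filled from the database the same way a read model is.
public class UserMemento
{
    public int Id { get; set; }
    public List<int> FriendIds { get; set; } = new List<int>();
}

// Hypothetical event raised when a friend is added.
public sealed class FriendAdded
{
    public FriendAdded(int userId, int friendId) { UserId = userId; FriendId = friendId; }
    public int UserId { get; }
    public int FriendId { get; }
}

public sealed class User
{
    public const int MaxFriends = 10;

    private readonly int id;
    private readonly List<int> friendIds;
    private readonly Action<object> publish; // stands in for EventBus.Raise

    private User(int id, List<int> friendIds, Action<object> publish)
    {
        this.id = id;
        this.friendIds = friendIds;
        this.publish = publish;
    }

    // The "restore" half of the memento pattern. The list is copied so the
    // aggregate does not share mutable state with the memento.
    public static User FromMemento(UserMemento memento, Action<object> publish) =>
        new User(memento.Id, new List<int>(memento.FriendIds), publish);

    public void AddFriend(int friendId)
    {
        // Checkable only because the whole friend list was restored.
        if (friendIds.Count >= MaxFriends)
        {
            throw new InvalidOperationException($"A user can have at most {MaxFriends} friends.");
        }
        friendIds.Add(friendId);
        publish(new FriendAdded(id, friendId));
    }
}
```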

Conclusion

Using these patterns comes at the cost of more infrastructure code, although this code is relatively easy to write. In my experience, on complex domains it certainly pays off; in simpler domains, it’s often not worth the overhead. Object-relational mapping is hard. ORMs help solve this, but at the cost of complexity. To reduce complexity we can eliminate the ORM, but only if we can also eliminate the object-relational mapping itself. By leveraging some well-known patterns, we can simplify our database access. This allows for a rich write model and a light read model, while still keeping domain logic free of data persistence concerns.
