Simplicity in software: what you see is not what you get

As software developers we put a lot of focus on simplicity, and rightly so: making code readable and understandable keeps it from becoming a maintenance nightmare. Simplicity, though, is often confused with the number of lines of code.

When you use lines of code to measure complexity, you’re likely just hiding complexity behind dependencies or complex components. I recently came across a video in which Greg Young describes this as “magic”: he talks about how we use dynamic proxies, aspect-oriented concepts and DI constructions to make our code look simple, while in fact it’s all magic behind the scenes.

I strongly encourage you to watch the video, as it’s really good material: http://www.infoq.com/presentations/8-lines-code-refactoring

More and more, we use components and third-party libraries to get to a solution quickly. While I don’t think this is necessarily a bad thing, the decision to incorporate a third-party library is often taken too lightly.

It seems that whenever a pain point is discovered in a codebase, instead of actually looking at the problem at hand, a library is thrown at it that makes the problem go away. Magically. This is a short-term solution though. What do you think will happen when there’s a problem in that particular part of the code? (Notice I said when, not if.) Because everything is hidden behind a magic library, it’ll be much harder to figure out what’s going on.

When you break your leg, you could take a few painkillers and make the pain go away. The right thing to do, though, is to go to a hospital and have it fixed, instead of hiding it behind an apparent solution.

To explain what I’m talking about, I want to show some examples of how a different (and sometimes more verbose) approach can actually be simpler.

Example: DI auto configuration

To configure a DI container, a generally accepted best practice is automatic dependency resolution. The (perceived) problem goes as follows: With manual dependency configuration, every time you need a new dependency you need to configure it in the composition root. Suppose we have the following DI-container configuration (using Ninject):

kernel.Bind<IFoo>().To<Foo>();
kernel.Bind<IBar>().To<Bar>();
kernel.Bind<IBaz>().To<Baz>();

The “problem” here is that when you create a new interface (say IQux) and a new default implementation for that interface (Qux), you need to add another line:

kernel.Bind<IQux>().To<Qux>();

Now you can imagine this list becoming quite long after a while. Since we are developers, our first instinct tells us to automate this. The usual solution is to have the resolution depend on conventions. In Ninject you could do it like this (other containers have similar ways of doing this):

kernel.Bind(x => x.FromAssembliesMatching("*")
                  .SelectAllClasses()
                  .BindDefaultInterface());

Well, this is definitely simpler. Or is it? What will happen when I create a second class that implements IQux? Can you predict how the dependency will be resolved? How would you troubleshoot a dependency not resolving to the correct class? Does this cause any side effects? How does it affect performance? These are all questions that can only be answered by learning about the internals of the particular container.

You might call out “yes, but convention over configuration!”. Well, convention over configuration only works when you have a decent, well-established convention that is very unlikely to change.

The first approach is definitely longer, but I’d argue that it’s simpler and more explicit. When I look at the explicit bindings I can immediately and unambiguously see that IQux will be resolved by Qux. If I want to change that, it’s a very simple modification. Without knowing how Ninject works, do you know how to modify the auto-configuration?

In one of the projects I’m working on, I decided against the use of auto-configuration. Yes, the composition root is large (about 200 lines of code), but it’s dead easy to maintain. When a dependency doesn’t resolve as expected, the only thing I need to do is find the line that declares the binding and modify it (and write a unit test for it). I’m writing more lines of code, but I’m writing simpler code. Writing code is not the bottleneck; reading code and understanding how it works is. And as you write this code, you’re also writing documentation for your dependency graph: it’s very easy to see how the application is composed, because the composition root contains all the documentation you need.

Another advantage of this approach is that it increases the visibility of your problems. If you see your composition root growing to hundreds or even thousands of lines of code, then maybe your dependencies are too granular. You could throw auto-configuration at it and your immediate pain would be gone, but the underlying problem would not be solved. By being explicit, you make the pain more immediate, which forces you to think of a better way to solve it.

Another possibility is a more functional approach to dependency inversion, which lets you get rid of the container altogether. That topic deserves a post of its own though, so I’m not going to dig into it for the moment.
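To give a flavour of that functional style (a minimal sketch with invented types, not taken from any particular framework): instead of registering interfaces in a container, you compose the object graph with plain code and pass functions as dependencies.

```csharp
using System;

// Invented types, purely for illustration
public interface IMailSender { void Send(string to, string body); }

public class SmtpMailSender : IMailSender
{
    public void Send(string to, string body) { /* send mail here */ }
}

public class RegistrationService
{
    // The dependency is just a function; no container needed
    private readonly Action<string, string> _sendMail;

    public RegistrationService(Action<string, string> sendMail)
    {
        _sendMail = sendMail;
    }

    public void Register(string email) => _sendMail(email, "Welcome!");
}

public static class CompositionRoot
{
    // The whole "container" is ordinary, debuggable code
    public static RegistrationService Compose()
    {
        var mailSender = new SmtpMailSender();
        return new RegistrationService(mailSender.Send);
    }
}
```

A side benefit of the function-shaped dependency is that tests can pass in a plain lambda instead of a mock.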

Example: AutoMapper

Another tool that is often overused is AutoMapper. AutoMapper is used to alleviate the pain of transforming one object into another, as happens when transforming domain objects into ViewModels, DTOs, …
Suppose we have two classes:

public class Customer
{
    public int Id { get; set; }
    public List<Order> Orders { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class CustomerViewModel
{
    public int Id { get; set; }
    public decimal TotalOrdered { get; set; }
    public string DisplayName { get; set; }
}

Now every time we want to transform a Customer into a CustomerViewModel we need to write something like this:

var customer = getCustomerById(1);
var vm = new CustomerViewModel
{
    Id = customer.Id,
    TotalOrdered = customer.Orders.Sum(o => o.Total),
    DisplayName = customer.FirstName + " " + customer.LastName
};

AutoMapper fixes this by letting you write a MappingProfile like this:

public class CustomerMap : Profile
{
    protected override void Configure()
    {
        CreateMap<Customer, CustomerViewModel>()
            .ForMember(vm => vm.DisplayName, o => o.MapFrom(c => c.FirstName + " " + c.LastName))
            .ForMember(vm => vm.TotalOrdered, o => o.MapFrom(c => c.Orders.Sum(order => order.Total)));
    }
}

There are two things to note:

  • We only have to write this mapping code once.
  • The Id was mapped automatically. You get this for free through the conventions AutoMapper uses.

The first advantage is fairly easy to match with regular OO code: a factory method.

public static CustomerViewModel FromCustomer(Customer customer)
{
    return new CustomerViewModel
    {
        Id = customer.Id,
        TotalOrdered = customer.Orders.Sum(o => o.Total),
        DisplayName = customer.FirstName + " " + customer.LastName
    };
}

So if we compare using AutoMapper with doing it manually, we end up with these two calls:

var vm = Mapper.Map<CustomerViewModel>(customer);    // using automapper
var vm = CustomerViewModel.FromCustomer(customer);    // manually

Suppose you’re debugging this code and you notice that, after this call, the view model is not populated correctly. With which method would you find the issue faster? If we use AutoMapper, the profile probably lives somewhere else in the solution. We’d have to go and look for that profile, and determine which properties are mapped by convention and which are mapped manually. More importantly, though, we’d need to know what the conventions are.

If we use a manual approach, we simply step into the FromCustomer method and can easily see how one is converted to the other.

The only disadvantage of the manual approach is that you need to write mapping code even for properties with the same name. If those cases are rare, that’s not really a problem. If, on the other hand, you find yourself writing a lot of that code, the question you should be asking is not “How can I make this automatic?” but “Am I doing too much mapping? Could I restructure the code so I don’t need this mapping?”. Maybe you need to compose your objects differently, or use different queries on the database. Answering these questions will make your code even easier to understand, and it will eliminate complexity instead of adding to it.
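As one illustration of using different queries (a sketch reusing the Customer and CustomerViewModel classes from above, plus a minimal Order class), you can often project directly into the view model inside the query, so there is no separate mapping step left to automate:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order { public decimal Total { get; set; } }

public class Customer
{
    public int Id { get; set; }
    public List<Order> Orders { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class CustomerViewModel
{
    public int Id { get; set; }
    public decimal TotalOrdered { get; set; }
    public string DisplayName { get; set; }
}

public static class CustomerQueries
{
    // Project straight into the view model. With an ORM-backed IQueryable,
    // the same Select shape would also narrow the generated SQL to just
    // the columns the view model needs.
    public static List<CustomerViewModel> GetCustomerViewModels(IEnumerable<Customer> customers)
    {
        return customers
            .Select(c => new CustomerViewModel
            {
                Id = c.Id,
                TotalOrdered = c.Orders.Sum(o => o.Total),
                DisplayName = c.FirstName + " " + c.LastName
            })
            .ToList();
    }
}
```
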

Example: ORMs

ORMs are yet another example of introducing complexity to solve an immediate pain. The problem with a big ORM is that it starts out really simple (“look how easy it is to write some LINQ to fetch a bunch of objects”), but it soon turns into a big beast that no one understands.

There comes a point in every project where a developer says “Hmm, I wish I could just write a SQL query; that would be a lot easier than figuring out this NHibernate mapping”. Isn’t that a bit ridiculous? We jump through a lot of hoops for some syntactic sugar that generates SQL we are perfectly capable of writing ourselves (again, typing is not the bottleneck). Apart from the fact that you still have to know SQL, you now also need to know how the ORM works. And on the next project there’s a new ORM, so you have to start over again.

This doesn’t apply to mapping alone. I recently got bitten by how Entity Framework translates queries. Let’s take a look at two different ways of querying the same data:

var searchIds = new List<int> { 1, 2, 3, 4, 5 }; // in reality, there were about 30 Id's

// Version 1
var result = persons.Where(p =>
    p.Locations.Any(l =>
        searchIds.Any(id => l.Id == id)));

// Version 2
var result = persons.Where(p =>
    p.Locations.Any(l =>
        searchIds.Contains(l.Id)));

Can you spot the difference? It’s the difference between “searchIds.Any(…)” and “searchIds.Contains(…)”. At first sight, what do you think the difference in performance would be? Initially I thought it would be quite similar. That is, until I actually ran the query (note that the real query was quite a bit bigger). The first version produced a 2,000+ line SQL query with all kinds of weird nesting, which threw an exception because of too many levels of nesting. The second produced about 35 lines.

I spotted this one because it actually threw an exception, but how many queries are out there that are 100 lines when they could have been 5? I know query length is not the determining factor here, but let me tell you: the 35-line query was a lot simpler and a lot faster.

The irony of all this is that we already have quite a good DSL for querying data: it’s called SQL, and it’s been around for quite a while. There is an impedance mismatch between OO and relational data, but throwing a big ORM at it is like using a sledgehammer to put square pegs into round holes.

There are viable alternatives though. Take a look at micro-ORMs such as Massive, Dapper and PetaPoco. They provide the bare minimum to deal with connections, mapping readers into objects, … but they still let you use a DSL that was actually designed for data access. The best thing is that they’re tiny, so when something goes wrong you can actually look at the source code and figure out what’s going on. Massive and PetaPoco aren’t even libraries; they’re single .cs files. You add them to your project and you have full visibility over what’s going on.
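As a taste of the micro-ORM style, here is a sketch using Dapper’s Query<T> extension method (the connection string, table and class are hypothetical): you write the SQL yourself, and the library only takes care of materializing the data reader into objects.

```csharp
using System.Data.SqlClient;
using System.Linq;
using Dapper; // micro-ORM: a handful of extension methods on IDbConnection

public class Customer
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class CustomerRepository
{
    // Hypothetical connection string; adjust for your environment
    private const string ConnectionString =
        "Server=.;Database=Shop;Trusted_Connection=True;";

    public static Customer[] GetByIds(int[] ids)
    {
        using (var connection = new SqlConnection(ConnectionString))
        {
            // Plain SQL in, mapped objects out: no mapping configuration,
            // and no generated query to reverse-engineer when it misbehaves
            return connection.Query<Customer>(
                "select Id, FirstName, LastName from Customers where Id in @Ids",
                new { Ids = ids }).ToArray();
        }
    }
}
```

The query you see is the query that runs, which is exactly the visibility argument from above.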

A great presentation by Rob Conery, the creator of Massive, about ORMs vs micro-ORMs can be seen here: http://ndc2011.macsimum.no/mp4/Day2%20Thursday/Track1%201140-1240.mp4

Measuring complexity

When measuring complexity, the amount of code is not the only factor. Complexity is often thought of this way:

complexity = lines of code

A better way of looking at complexity is this:

complexity = lines of code * complexity of said lines

When I refer to measuring complexity, I’m not talking about cyclomatic complexity or any other hard number, but about a general sense of complexity.

When you look at a line of code, you should ask yourself whether you understand all the consequences of that line. Let’s revisit the AutoMapper example:

var vm = Mapper.Map<CustomerViewModel>(customer);    // using automapper
var vm = CustomerViewModel.FromCustomer(customer);   // manually

At first sight, they look equally complex (or simple). But let’s look at what actually happens:

Line 1 will do a number of things:

  • Find out the type that is passed in
  • Look up (through reflection) whether there’s a mapping for Customer and CustomerViewModel
  • Construct a new CustomerViewModel
  • Look up all the properties on both classes and match them by name (and a bunch of other conventions)
  • Call the lambdas we defined in the mapping profile

There are a lot of things here that are not obvious:

  • Assumptions about CustomerViewModel:
    • It must have a public constructor
    • That constructor must be parameterless
    • The properties need to have public setters
  • The conventions that are used when mapping
  • Whether or not there is a custom mapping profile for this conversion

All these non-obvious things you need to remember add complexity.

Now let’s look at line 2: it calls a method.

Do you still think they are equally complex?

Libraries

I want to make clear that I have nothing against tools or libraries. I do not want to discredit AutoMapper, Ninject, NHibernate or EF in any way. They are excellent libraries and can be useful in the right situation. The fact that I’m using these in my examples is just a coincidence.

The point I want to make is that a library does not buy you simplicity. Rather the opposite, actually: by introducing a library you will almost certainly introduce more complexity. Remember that you will be the one who has to support that complexity.

Conclusion

I have seen countless examples of apparent simplicity with a lot of complexity hidden in the surrounding context. By looking at a problem from a different point of view, we can often solve it with a truly simple solution.

I know this post is highly opinionated, so I’m not expecting everyone to agree with me. Whether or not you agree with me, I would still like to encourage you once again to have a look at Greg Young’s video mentioned in the beginning. It’s worth every minute of your time.

  • David DV

    Another good presentation to link here is “Simple Made Easy” by Rich Hickey:
    http://www.infoq.com/presentations/Simple-Made-Easy

    On ORMs, “There is another” called Linq2Db @ https://github.com/linq2db/linq2db
    It’s more like a micro-ORM, but with a very good LINQ provider on top.
    So you don’t get the leaky stuff and complexity of the big ORMs, but you also don’t need to write your SQL in strings :)

  • http://www.borismod.net Boris Modylevsky

    Great blog! I’d like to comment on your useful and detailed post about the Builder pattern. I agree that it’s a very useful pattern, especially for testing. I guess you are familiar with the NBuilder project, which implements this pattern in a generic way with fluent syntax and lambda expressions.

    • http://www.kenneth-truyers.net Kenneth Truyers

      Yes, I’m aware of that project. However, I don’t think it’s really a necessary dependency. AutoFixture is another project much like that one; but if you look at the blog post you’re commenting on, I’m advocating fewer dependencies. And although my post on the builder was a generic example, I think its benefit lies in the fact that you can construct semantic methods such as CustomerBuilder.AsGoldUser().WithoutCreditLimit()… That is where its real value lies, and you cannot achieve that with a generic library.

  • Md.Ibrahim

    Really liked this article. I might even consider shifting from ORM to a Micro-ORM. Thanks.