In the wake of the issue with NPM, I wanted to share my view and experience with dependency management.
First of all, what happened? I’m not going to go too deep on what actually happened (there’s plenty of information about that), but essentially, because of a trademark dispute over a package name, an author unpublished all of his packages from the registry. Because of this, a whole slew of popular packages that depended on one of them broke. All projects depending on those consequently broke as well, resulting in broken builds all over the world.
To add insult to injury, the package that caused the most issues was left-pad, a seemingly trivial module that should be easy for any developer to implement themselves.
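To give a sense of how small the padding function in question is, here is a minimal sketch in TypeScript. It is my own illustration, not the source of the published package; the function name, parameter names and the default padding character are assumptions.

```typescript
// Hypothetical re-implementation for illustration only;
// this is not the source of the published package.
function leftPad(value: string | number, length: number, padChar: string = " "): string {
  let text = String(value);
  // Use a single character so the result never overshoots the target width.
  const fill = padChar.length > 0 ? padChar.charAt(0) : " ";
  // Prepend the fill character until the text reaches the requested width.
  while (text.length < length) {
    text = fill + text;
  }
  return text;
}

const invoiceNumber = leftPad(7, 3, "0"); // "007"
```

Even a function this small has edge cases to decide on (empty padding string, input longer than the target width), which is part of why people reach for a shared module.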
The incident has caused a lot of reflection on the best way to do dependency management. The discussion basically broke down into two camps:
In one camp you have people saying that developers should not take dependencies on trivial modules, such as those that consist of a one-liner. They argue that:
- Functions or one-liners are too small to include as dependencies
- There’s no guarantee that what someone has written is correct
- Developers who are not capable of writing a trivial method like the one mentioned should not be writing code at all
- Modules this trivial and this widely used should be part of the base framework (Node.js in this case)
The other camp sees the above problem as an issue with the registry (NPM) and has no problem with small packages. They argue that:
- We shouldn’t reinvent the wheel for common functionality
- Small modules allow for more modular design and composability of applications
All good points, and I certainly agree with every argument made from both camps.
Granular or coarse composition?
I have experienced a similar problem in one of my previous projects. We were using internally distributed modules, so we didn’t have any of the aforementioned issues with registries and authorship.
The project I’m talking about was built in .NET and we were using NuGet packages. Regardless, the problem space is still the same: package and dependency management.
We were torn between two different styles of dependency management:
- Either create a single package with multiple utilities, the typical <companyname>.Utils package so to speak.
- Or create a single package for every single utility: <companyname>.LeftPad (for example)
Our problem was related to the authoring and the management of these packages:
In the first option, it was easy to publish new versions of the package. On the other hand, if you wanted only one part of it, you had to import everything and the kitchen sink.
The second option was easier from a client point of view as you could just import what you needed, but it made authoring and maintaining all these different packages quite difficult.
Dependency management and visibility
Another problem, which is true for both solutions but magnified by the second approach, was debugging these modules. If you encountered a problem in one of the packages, it was impossible to step into it to see what it was doing. This shouldn’t be a problem with stable packages, but packages are not always stable. If you have a single package, you can publish the symbol files (in .NET) and use them for debugging. For multiple modules this is also possible, but it creates more overhead (at authoring time, but also in the startup time of your application, when the symbols are loaded).
Side note: We experienced firsthand that SymbolSource, the repository for .NET symbols, is also not the most stable solution for storing symbols.
While this is a problem specific to .NET, I do think that hiding away dependencies in a node_modules folder is much the same: you lose visibility.
The main issue I have with traditional package management solutions (NPM, NuGet, …) is that you lose visibility on what’s happening with your code base. I don’t like the idea of having to depend on external developers, even less so when there’s no guarantee that these packages will always be in the same hands (a change of ownership breaks the trust you have in a package).
Not only is visibility a problem, but when there’s an actual bug in a dependency, you have to hope that the author will fix it or accept your pull request if you decide to fix it yourself. If you can’t do this, you might get stuck.
We came up with an intermediate solution (based on a post by Nik Molnar) for package management: Instead of depending on a compiled package that is stored in the packages folder (equivalent to a package that lives in node_modules), we decided to publish source-only packages. What does this mean?
Say we have a package LeftPad. Instead of creating a package that distributes a DLL with that function, we would distribute a package that creates a class LeftPad.cs in your solution. This solves a few problems:
- You don’t need to reinvent the wheel, you can reuse existing modules that are used by many others (and thus potentially vetted)
- The code is available inside your project, so it’s visible and modifiable at all times
- If an update comes along, it will overwrite the class. Source control will show you exactly what was modified and you have great visibility over what could potentially be a breaking change
- You can make modifications to it very easily. Again, any update will highlight where your local modifications would be overwritten, so they are quite easy to manage
Another key point is that these packages are marked as development dependencies, which means they won’t be installed in subsequent levels of the dependency chain. This makes the dependency chain a lot flatter and more manageable.
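The flattening effect can be illustrated with a small sketch. This is a simplified model, not how NuGet or NPM actually resolve packages: the `Manifest` shape, the `resolveInstallSet` function and the package names are all made up for the example. The only rule it encodes is the one described above: development dependencies are installed for the project itself, but not followed for the packages it depends on.

```typescript
// Hypothetical manifest shape, loosely modelled on a package manifest.
interface Manifest {
  dependencies: string[];
  devDependencies: string[];
}

// Hypothetical in-memory registry: package name -> manifest.
type Registry = { [name: string]: Manifest };

// Collect everything that gets installed for `root`.
// The root's own development dependencies are included, but the
// development dependencies of its dependencies are not followed.
function resolveInstallSet(root: string, registry: Registry): string[] {
  const installed: string[] = [];

  function visit(name: string, includeDev: boolean): void {
    const manifest = registry[name];
    if (manifest === undefined) {
      throw new Error("Unknown package: " + name);
    }
    const next = includeDev
      ? manifest.dependencies.concat(manifest.devDependencies)
      : manifest.dependencies;
    for (const dep of next) {
      if (installed.indexOf(dep) === -1) {
        installed.push(dep);
        visit(dep, false); // one level down, dev dependencies are skipped
      }
    }
  }

  visit(root, true);
  return installed.sort();
}

// Made-up packages: both "app" and "lib-a" use a source-only package.
const registry: Registry = {
  "app": { dependencies: ["lib-a"], devDependencies: ["string-utils-source"] },
  "lib-a": { dependencies: [], devDependencies: ["left-pad-source"] },
  "left-pad-source": { dependencies: [], devDependencies: [] },
  "string-utils-source": { dependencies: [], devDependencies: [] },
};

const installedForApp = resolveInstallSet("app", registry);
// installedForApp is ["lib-a", "string-utils-source"]
```

In this model "left-pad-source" never reaches the application: its code was already copied into "lib-a" when that library was authored, so the application only sees its direct packages.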
A potential disadvantage is that you have to take ownership of the external code. I don’t see that as a problem, though, but rather as an advantage. It improves visibility, and in the end you’re the one responsible for your application.
This approach is not new and has been used in previous projects:
- NETFx: https://netfx.codeplex.com/
- Quarks: https://github.com/shaynevanasperen/Quarks (this is the project that grew out of our project)
In my opinion this approach could work for NPM as well. Package authors could have the option of outputting certain files to the application directory instead of the node_modules folder, so that they increase the visibility of that code.
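As a rough sketch of what that could look like, the function below decides where each file of a package would end up. To be clear, nothing like this exists in NPM as far as this post is concerned: the `sourceFiles` field, the `planInstall` function and the `src/vendor` directory are all hypothetical names invented for the illustration.

```typescript
// Hypothetical package description. `sourceFiles` is NOT a real NPM
// manifest field; it is invented for this sketch.
interface SourcePackage {
  name: string;
  files: string[];       // every file shipped in the package
  sourceFiles: string[]; // files the author wants to be visible in the application
}

interface InstallPlan {
  nodeModules: string[]; // destinations that stay hidden in node_modules
  application: string[]; // destinations inside the application's own source tree
}

// Work out the destination of every file without touching the file system.
function planInstall(pkg: SourcePackage, appSourceDir: string): InstallPlan {
  const plan: InstallPlan = { nodeModules: [], application: [] };
  for (const file of pkg.files) {
    if (pkg.sourceFiles.indexOf(file) !== -1) {
      // Visible: lands next to your own code and ends up in source control.
      plan.application.push(appSourceDir + "/" + pkg.name + "/" + file);
    } else {
      // Everything else is installed the traditional way.
      plan.nodeModules.push("node_modules/" + pkg.name + "/" + file);
    }
  }
  return plan;
}

const examplePlan = planInstall(
  { name: "left-pad", files: ["index.js", "README.md"], sourceFiles: ["index.js"] },
  "src/vendor"
);
// examplePlan.application is ["src/vendor/left-pad/index.js"]
// examplePlan.nodeModules is ["node_modules/left-pad/README.md"]
```

Because the visible files live in the application directory, an update to the package shows up as an ordinary diff in source control, just like the source-only NuGet packages described above.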
What do you think about this approach? We haven’t experienced any downsides with it (apart from some naming collisions that were easily solved). Sound off in the comments!