Over the last few years I've had many discussions about what constitutes a good PR with fellow developers. I want to summarize here what I've learned and touch base with the rest of developers in the wild (you).
Note: For the context this post, I'm talking about PR's inside an organisation. The requirements for PR's on open source projects may be very different and very often depend on what type of project it is.
Why do we use pull requests?
To understand what constitutes a good pull request, we must first define the reason why we use the pull request process:
- PR's are a great way of sharing information about the code base. They limit code ownership (check my post on avoiding code ownership to see why that's a bad thing)
- It creates an extra gate before our code goes to production
- It improves code quality. This is not only because there's another person to catch mistakes. If you know someone will review your code, you're probably, at least subconsciously, making a bigger effort to write better code.
Creating good pull requests
Whenever I review a PR, I look for the following qualities:
- Logical commits. Each commit should be a single, coherent change.
- All commit messages are descriptive
- The list of commits are related and implement a complete story
- The work in the PR adds quantifiable value.
- Code complies with the defined standards
- The pull request is as small as practically possible
When reviewing a PR, it's really difficult to judge code quality when you only have half the information. Therefore, it's very import that each commit contains a single coherent change, that in theory should be reviewable on its own.
I also prefer that each commit would be deployable individually, although sometimes this can be a bit hard to achieve. However hard it may be in some cases, I always strive towards that end.
This is a pet peeve of me, but far too often I see commit messages that just describe what is done. While that is important information, it's only part of the story. Usually, what is done is easily verifiable from the code. What interests me, as a reviewer, and later as a reader of the code, is why a particular change was implemented. The key questions a commit message should answer are:
- What has changed
- Why was the change necessary, possibly referencing a formal requirement
- What are direct and indirect consequences of the change
What does not need to be in the commit message?
- Who made the change (that is already part of the commit)
- When the change was made (also part of the commit)
- How this code can be improved (this should go in the backlog)
- How you feel about the code
When you submit a PR, it should relate to a single theme. Reviewing code can be a difficult task since very often the context is not entirely clear. If a PR tries to fix or implement multiple things, this only gets harder. To avoid this, we should strive to divide a user story we're working on into multiple related commits and submit them as a single PR.
The pull requests adds quantifiable value
This could be in fulfilling a customer requirement, but could also be a refactor, performance improvements, new tests etc.
In the case of implementing requirements, it's fairly easy to see that this adds value. However, in the case of a refactor or performance improvements this may be harder. In these you usually want to discuss this beforehand. Nevertheless, it should be fairly obvious once the PR has been submitted that there is value in the addition.
Code complies with defined standards
While I don't believe style checking should be performed by a human reviewer (this task is better suited to automated tools), there are other standards that do need a review. These could be things like the data access method, a recurring pattern that's been used, usage of internal libraries vs rolling your own.
Before submitting the PR, these are things that should be checked by the submitter of the PR.
Smaller is better
Reviewing code can be a daunting task. As a consequence, the larger the PR, the harder it will be to review. The harder it is to review, the less deep the review will be and it will be more of a code "scan", rather than a code "review".
A good review
Conversely, creating good PR's doesn't mean that you have a good review process. The flip side of it, reviewing the code, is also very important. This is a list of questions and attitudes towards the code under review that could help:
Does this code solve a problem?
Ideally, this should have been a pre-condition before even starting work on the code, however, there are cases where it may not be clear-cut and this is a good first question to ask. If it doesn't solve a problem, then all the rest is pointless.
Does the code do what the submitter intended?
Once established that the purpose is sound, we need to verify whether the implementation is actually correct. This could be trying to find potential bugs, but also unwanted side effects that the submitter did not intend to cause.
How would I have solved this?
This is a tricky one. Sometimes developer styles vary wildly. The purpose of this question is not to force the developer to do it your way, but rather contrast different coding styles and evaluate if one is preferable over the other.
Are there any useful abstractions?
Sometimes the problem may already have been solved elsewhere, but the author is unaware. It's good to point this out and suggest ideas on how to make use of this. In some cases, it will simply be reusing existing code, in others, it might need introducing a new abstract concept to rely on.
Play advocate of the devil
Try to catch any mistakes, errors, violations against conventions, ...
That said, be nice about it. The goal is to improve the code, not to prove your superiority over the other developer. All review comments should strictly be about the code. It's the code that is under review, not the developer.
Is this code covered by tests (and does it need to be)?
Tests are important, not only for regression issues, but also for documentation. If the code is highly algorithmic, then you probably want automated unit tests in place (hint: these are a nice place to start the review, as it will give you a nice list of requirements that are being implemented).
Conversely, if it's mainly coordinating code, question why unit tests are being added and whether they are really necessary. (Check out my post on the test pyramid for more information on what to test and what not)
Does this code follow styles and patterns?
As I mentioned before, pure code style issues should probably be caught by automated tools. Regardless, some things won't be possible through static code analysis, so you want to have a look at those more closely.
Setting standards and guidelines
Apart from submitting good PR's and executing good reviews, another part of the puzzle is having a clear understanding of when a PR will be rejected and when it can be merged. Here are some of the standards that I believe should be in place. The specifics of these may vary under circumstances, but regardless of the parameters, it's good to have a clear understanding between all developers:
Reasons for rejection
Certain things may be a clear reason for rejection. It's important to list these and agree with everybody. The reason is that an early rejection can save a lot of time, both on the reviewer as well as on the submitter side. It avoids discussions over trivial problems. Some of the things that can lead to a straight rejection:
- Failing tests (even better is to enforce this by the CI server)
- Not following styling guidelines (even better to enforce through an automated tool)
- Infractions against the rules of a good PR. This could mean non-descriptive commit messages, very large PR's, multiple stories in one PR, ...
Eventually you'll run into the situation where two developers will disagree. That is perfectly fine and actually a sign of good team dynamics. What is not fine however is having endless discussions about it or one developer overriding the other because of reasons like seniority, having the admin rights, etc.
It's better to prevent these situations from happening and have a conflict resolution strategy. One particular method that works well is to always require at least two code reviewers. In those cases, it's simply the majority vote that counts.
Review lead time
One of the biggest reasons I have seen people skip PR's is because it blocks them. When the product team is pushing for a deployment, it doesn't help when your PR is being left in the review code for days on end.
Ideally, a PR should be reviewed within an hour. If PR's are following the practices mentioned in this post, this should definitely be doable. Where it breaks down is on large PR's.
It's therefor very important to keep PR's as small as possible and ensure that reviews are executed promptly.