Skip to content
Commits on Source (6)
---
inherits: base.yml
url: /solving-n-plus-1-problem-in-orms
title: "Solving the N+1 problem in ORMs"
lang: en
theme: main
#menu: Foo/Bar
context:
post_date: 2019-09-06
introduction: In this article, I present a new way to deal with the N+1 issue that we are facing when using most ORMs.
related:
- /tdbm5.1-symfony-tutorial
- /tdbm5.1-rich-model
- /tdbm5-coming-out
# update_date: 2017-11-21
---
I'm an ORM developer. If you are not aware, I'm involved in the development of [TDBM, a DB-first PHP ORM](https://thecodingmachine.github.io/tdbm/).
As anyone developing an ORM, I came to face the dreaded N+1 performance problem. I believe we came with an elegant solution
to this issue and I wanted to share that with the wider community (other ORM developers might be interested!)
## The N+1 problem
ORMs are tools that write SQL requests for you. Because you use a nice object model instead of writing SQL, it is easy not
to realize that a big number of requests can be emitted.
Have a look at this simple example:
![data-model](images/orm-n-plus-one/simple_schema.png)
In this example, each user belongs to a country.
Assuming our ORM gives us a "user repository" that enables us to fetch users, our code could look like this:
```php
$users = $userRepository->findAll();
foreach ($users as $user) {
echo $user->getName().' lives in '.$user->getCountry()->getLabel()."\n";
}
```
<small>This is PHP but this could really be any other programming language.</small>
What will happen here?
On the first line, the ORM will probably fetch all the users. Something like:
```sql
SELECT * FROM users;
```
Let's say it fetches 2000 users.
For each user, we want to fetch the country. So we will call `$user->getCountry()`.
But we don't have the country data loaded in memory! So the ORM will probably want to fetch the data from the country,
by running this query:
```sql
SELECT * FROM countries WHERE country_id = :country_id_for_current_user;
```
And it will do this for **each user**. So it will do this **2000 times**!
Oops... This very simple code is actually performing 2000 + 1 queries! (hence the *"N+1" problem*).
Of course, if we were to write pure SQL, we could solve this in exactly one query:
```php
SELECT * FROM users JOIN countries ON users.country_id = countries.id
```
Needless to say the performance of raw SQL is way better than the performance of this hypothetical ORM.
## Eager loading to the rescue
Hopefully, ORM developers have known the issue for quite some time and they already have working solutions.
The idea is always the same: the developer should tell in advance to the ORM that it will need additional data.
The ORM can then fetch that data in advance (we call that "eager loading").
With eager loading, the related data is fetched along with the parent object. This is more efficient in loading data
but it will load data irrespective of the data being used or not.
### Eager loading in Eloquent
For instance, [Eloquent (the Laravel ORM) comes with a `with` method](https://laravel.com/docs/master/eloquent-relationships#eager-loading):
```php
$users = User::with('country')->get();
foreach ($users as $user) {
echo $user->country->name;
}
```
### Eager loading in Ruby on Rails
Ruby on Rails is very similar to Eloquent:
```ruby
@users = User.all.includes(:country)
```
### Eager loading in Doctrine
Another example: Doctrine ORM comes with several strategies for dealing with eager loading.
You can decide that a relationship should **always** be eager-loaded:
```php
/**
* @ORM\ManyToOne(targetEntity="Country", fetch="EAGER")
*/
private $country;
```
Of course, this will be always triggered so you will fetch the country related to a user even when you don't need it.
Not ideal.
You can also write a DQL query to fetch both tables at the same time:
```php
$dql = "SELECT u, c FROM User u JOIN Country c";
$query = $entityManager->createQuery($dql);
$users = $query->getResult();
```
There are actually more ways to work with JOINs in Doctrine and if you want to learn more, you can read this
[excellent article by Benjamin Eberlei](https://tideways.com/profiler/blog/5-doctrine-orm-performance-traps-you-should-avoid).
## The problem with eager loading
Current available solutions are sharing a common issue: **they rely on the developer** to tell the ORM when to perform
eager-loading or not.
But it is far too easy to forget setting up eager loading and it requires *knowledge* to perform eager loading only
when needed.
It becomes even harder with several developers! Look at this Twig template:
```twig
{% for user in users %}
<li>{{ user.name }} lives in {{ user.country.label }}</li>
{% endfor %}
```
What are the chances that the designer working on Twig understands the implication of the "for" loop in terms of performance?
Literally none.
Even worse, sub-optimal code can go undetected for quite some time. If N is small enough, you might not notice the issue,
but when your database starts growing, N will become large and you will face a problem after a few months/years
in production!
Also, the advent of **GraphQL** makes things worse. In GraphQL, it is the client that decides what fields it wants to fetch.
Depending on the client query, we should decide dynamically which fields must be eager loaded.
This is a tremendously complex problem to solve!
## Smart eager loading to the rescue!
In an ideal world, a developer should not have to care about eager-loading or not. The ORM should be able to take the
decision on behalf of the developer.
The good news is... I think we found a way to do it!
We dubbed this feature *smart eager loading* and we tested it in [TDBM (our very own ORM)](https://thecodingmachine.github.io/tdbm/).
I believe it is not implemented in any ORM so far. [The TDBM pending pull request is here](https://github.com/thecodingmachine/tdbm/pull/172).
The idea is simple. For each entity fetched, we try to remember where the entity comes from.
- Does it come from a direct access by ID? (like: `$userRepository->getById(42)`)
- or does it come from a query that returned many users? (like: `$userRepository->findAll()`)
```php
$users = $userRepository->findAll();
// $users is an instance of a "ResultIterator"
foreach ($users as $user) {
// At this point, if you give TDBM a $user instance
// TDBM can go back to the ResultIterator that generated it
}
```
![From any entity, we can go back to the original query](images/orm-n-plus-one/smart-eager-load.svg)
Now, here is sneak peek of what is going on in TDBM mind when we call `$user->getCountry()`.
On the **first iteration of the loop**, TDBM will ask:
<style>
.chatwrapper {
width: 100%;
display: flex;
flex-direction: column;
align-items: center;
}
.chat {
width: 80%;
border: solid 1px #EEE;
display: flex;
flex-direction: column;
padding: 10px;
}
.messages {
margin-top: 30px;
display: flex;
flex-direction: column;
}
.message {
border-radius: 20px;
padding: 8px 15px;
margin-top: 5px;
margin-bottom: 5px;
display: inline-block;
}
.yours {
align-items: flex-start;
}
.yours .message {
margin-right: 25%;
background-color: #EEE;
position: relative;
}
.yours .message.last:before {
content: "";
position: absolute;
z-index: 0;
bottom: 0;
left: -7px;
height: 20px;
width: 20px;
background: #EEE;
border-bottom-right-radius: 15px;
}
.yours .message.last:after {
content: "";
position: absolute;
z-index: 1;
bottom: 0;
left: -10px;
width: 10px;
height: 20px;
background: white;
border-bottom-right-radius: 10px;
}
.mine {
align-items: flex-end;
}
.mine .message {
color: white;
margin-left: 25%;
background: rgb(0, 120, 254);
position: relative;
}
.mine .message.last:before {
content: "";
position: absolute;
z-index: 0;
bottom: 0;
right: -8px;
height: 20px;
width: 20px;
background: rgb(0, 120, 254);
border-bottom-left-radius: 15px;
}
.mine .message.last:after {
content: "";
position: absolute;
z-index: 1;
bottom: 0;
right: -10px;
width: 10px;
height: 20px;
background: white;
border-bottom-left-radius: 10px;
}
</style>
<div class="chatwrapper">
<div class="chat">
<div class="yours messages">
<div class="message last">
Where does <code>$user</code> comes from?
</div>
</div>
<div class="mine messages">
<div class="message last">
It comes from a ResultIterator
</div>
</div>
<div class="yours messages">
<div class="message last">
So it is very very likely that we are currently in a foreach loop and that the <code>getCountry</code> method will be called in a loop for every user of the ResultIterator, right?
</div>
</div>
<div class="mine messages">
<div class="message last">
Yup...
</div>
</div>
<div class="yours messages">
<div class="message last">
What was the query that generated the ResultIterator?
</div>
</div>
<div class="mine messages">
<div class="message last">
<code>SELECT * FROM users WHERE status="ON"</code>
</div>
</div>
<div class="yours messages">
<div class="message last">
And we want the list of countries attached to this list of users, right?
</div>
</div>
<div class="mine messages">
<div class="message last">
Yup...
</div>
</div>
<div class="yours messages">
<div class="message last">
So what about this?<br/>
<code>SELECT DISTINCT * FROM countries WHERE country_id IN (SELECT country_id FROM users WHERE status="ON")</code>
</div>
</div>
<div class="mine messages">
<div class="message last">
Excellent!
</div>
</div>
<div class="yours messages">
<div class="message last">
So let's cache all this data for the next loop iteration, shall we?
</div>
</div>
</div>
</div>
On the **subsequent iterations of the loop**, here is what will happen:
<div class="chatwrapper">
<div class="chat">
<div class="yours messages">
<div class="message last">
Where does <code>$user</code> comes from?
</div>
</div>
<div class="mine messages">
<div class="message last">
It comes from a ResultIterator
</div>
</div>
<div class="yours messages">
<div class="message last">
Did we already fetched some data related to countries for this result iterator?
</div>
</div>
<div class="mine messages">
<div class="message last">
Yup...
</div>
</div>
<div class="yours messages">
<div class="message last">
Excellent! Give me the data! No queries required.
</div>
</div>
</div>
</div>
In the end, we managed to fetch all the required data in only **2 queries** (instead of N+1 queries!).
And more importantly, **we did not touch the code**! TDBM will eager load the countries data "on the fly".
<div class="alert alert-info">We only make one assumption: if an entity was fetched through a result iterator, it is likely that it will be used
in a loop and that similar entities will be used in a similar fashion.</div>
According to our experience, this is a very reasonable assumption. So by default, TDBM comes with smart eager loading
enabled. And you can of course disable it in the very special cases where it is over-fetching data.
## Benchmarking smart eager loading
Theory is nice, but how much time are we gaining in practice?
Well, obviously, it depends on the value of "N".
We did a test with N=2000 (2000 users in 2000 distinct countries).
Test | Time
----------------------------|---------
Without smart eager loading | 544 ms
With smart eager loading | 257 ms
That's a **x2 speedup**!
Notice that MySQL is really quite fast because it can answer 2001 requests in less than 500ms.
Yet, using *smart eager loading* doubles the speed. And of course, as N grows, the gain will increase!
## The details are more complex
In the example above, we are playing with the happy path: we have a one-level "many-to-one" relationship.
But of course, if we want this optimization to make sense, we should also be able to handle more complex cases:
- recursive calls, like:
```php
$user->getCountry()->getContinent()->getLabel()
```
- other kinds of relationships (one-to-many and many-to-many)
```php
foreach ($user->getRoles() as $role) {
foreach ($role->getRights() as $right) {
// We should be able to fetch all rights of all users
// in one query.
echo $right->getLabel();
}
}
```
This actually requires to add an abstraction on top of the ORM that is capable of finding its way back to the original
query and to build the new query at the same time. We worked on that in TDBM, and the future v5.2 will have support
for all these features.
Currently, TDBM 5.2-dev supports many-to-one queries and recursion and we are actively working on one-to-many and
many-to-many support. You can give it a try!
## Stay tuned!
If you made it this far, I hope this article will make you want to try TDBM.
If I wrote this article, it is also in the spirit to share this idea with my fellows.
So if you are an ORM developer, maybe you will consider adding a similar feature to your library?
Want to stay tuned on the latest TDBM releases or PHP related news? [Follow me on Twitter!](https://twitter.com/david_negrier/)
Also, now is a good time to thank my wonderful teammates [Arthmael](https://github.com/Kharhamel) and [Guillaume](https://github.com/homersimpsons)
that made these innovations possible!
This diff is collapsed.