The things you need to know about PostgreSQL indexes to keep your Rails applications snappy.
The purpose of indexes is to make access to data faster. Most of the time an index will make your queries faster, but the trade-off is that each index makes data insertion slower. That’s because every insert must write to two places: the table itself and the index.
PostgreSQL offers several index types. The one we’re going to focus on in this article is the B-tree index, which is the most commonly used type and a good fit for most use cases.
Primary key indexes
Ok, let’s start with the basics. In general it’s good practice to have an index on the primary key of your tables. If your table has a large number of rows, it makes good use of an index: the lookup will take place in the index instead of sequentially scanning your table for the matching rows. Luckily, PostgreSQL automatically creates an index for primary keys to enforce uniqueness, so it is not necessary to create an index explicitly for primary key columns:
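As a sketch, a plain Rails migration is enough to get this behavior (the products table and its columns here are assumptions for illustration):

```ruby
# db/migrate/xxxxxxxxxxxxxx_create_products.rb
class CreateProducts < ActiveRecord::Migration[7.0]
  def change
    # create_table adds an auto-incrementing `id` primary key by default;
    # PostgreSQL indexes it automatically to enforce uniqueness.
    create_table :products do |t|
      t.string :name
      t.timestamps
    end
  end
end
```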
After creating a table, your table description will look like this in psql:
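For a hypothetical products table, the output of \d is roughly the following (abridged; table and column names are assumptions):

```
\d products
                Table "public.products"
 Column |       Type        | Nullable |               Default
--------+-------------------+----------+--------------------------------------
 id     | bigint            | not null | nextval('products_id_seq'::regclass)
 name   | character varying |          |
Indexes:
    "products_pkey" PRIMARY KEY, btree (id)
```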
As you can see, you now have a primary key index using the btree type on the id column.
Foreign keys and other commonly used columns
Unlike primary keys, foreign keys and other columns in your table are not indexed automatically by Rails. So it’s always a good idea to add indexes for foreign keys, columns that need to be sorted, lookup fields, and columns that are used with the group method (GROUP BY) in the Active Record Query Interface.
One of the most common performance problems in Rails applications is missing indexes on foreign keys. Luckily, it’s very easy to avoid this pitfall:
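A minimal migration sketch, assuming a products table that belongs to a category:

```ruby
class AddCategoryToProducts < ActiveRecord::Migration[7.0]
  def change
    add_column :products, :category_id, :bigint
    # The extra line that creates the foreign key index,
    # named index_products_on_category_id by convention.
    add_index :products, :category_id
  end
end
```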
And after adding the migration above you should see this in psql:
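Assuming the products table from before, the index list now includes the new entry (output abridged):

```
\d products
...
Indexes:
    "products_pkey" PRIMARY KEY, btree (id)
    "index_products_on_category_id" btree (category_id)
```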
Notice that we now have an index called index_products_on_category_id for the category_id column. That extra add_index line in the migration will make your application perform a lot better.
Unique indexes
If you create a unique index for a column, you’re guaranteed the table won’t have more than one row with the same value for that column. Using only a validates_uniqueness_of validation in your model isn’t enough to enforce uniqueness, because there can be concurrent users trying to create the same data.
Imagine that two users try to register an account with the same username, where you have added
validates_uniqueness_of :username in your user model. If they hit the “Sign up” button at the same time, Rails will look in the users table for that username, find no match in either request, and respond that it’s ok to save the record. Rails will then save two records to the users table with the same username, and now you have a really nasty problem to deal with.
To avoid this you need to create a unique constraint at the database level as well:
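A sketch of the migration, assuming a users table with a username column:

```ruby
class AddUniqueIndexToUsersUsername < ActiveRecord::Migration[7.0]
  def change
    # unique: true creates a UNIQUE index, so PostgreSQL itself rejects
    # a second row with the same username, even under concurrent requests.
    add_index :users, :username, unique: true
  end
end
```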
So by creating the index_users_on_username unique index you get two very nice benefits: data integrity, as described above, and good performance, because unique indexes tend to be very fast.
Ordered indexes
By default, the entries in a B-tree index are sorted in ascending order. In some cases, however, it can be a good idea to use descending order for the index instead.
One of the most obvious cases is a paginated list where the most recently released items are shown first. For example, a blog post model with a released_at column, where unreleased blog posts have a NULL released_at value.
This is how you create this kind of index:
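A sketch, assuming a posts table with a released_at column (the order option with a raw "DESC NULLS LAST" string is PostgreSQL-specific):

```ruby
class AddReleasedAtIndexToPosts < ActiveRecord::Migration[7.0]
  def change
    # DESC NULLS LAST keeps unreleased posts (released_at IS NULL)
    # at the end of the index, matching the pagination order.
    add_index :posts, :released_at, order: { released_at: "DESC NULLS LAST" }
  end
end
```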
As we’re going to query the table sorted by released_at and limit the result, we can benefit from creating the index in that order. PostgreSQL will find the rows it needs from the index in the correct order, and then go to the data blocks to retrieve the data. If the index weren’t sorted, there’s a good chance that PostgreSQL would read the data blocks sequentially and then sort the results.
This technique is mostly relevant for single-column indexes when you need NULL values to come last. Otherwise the order is already available, because an index can be scanned in either direction.
Partial indexes
If you frequently filter your queries by a particular column value, and that value is present in only a minority of your rows, a partial index may increase your performance significantly. A partial index is basically an index with a
WHERE clause. It increases the efficiency of the index by reducing its size: a smaller index takes less storage, is easier to maintain, and is faster to scan.
Let’s say that you have a table for orders. That table contains both billed and unbilled orders, where the unbilled orders make up a minority of the total rows. It’s very likely that the unbilled orders are also the most frequently accessed rows in your application. In that case, your application performance will very likely improve if you use a partial index.
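A sketch of such a migration, assuming an orders table with a boolean billed column:

```ruby
class AddBilledIndexToOrders < ActiveRecord::Migration[7.0]
  def change
    # The where: option creates a partial index: only unbilled rows
    # are included, which keeps the index small and fast to scan.
    add_index :orders, :billed, where: "billed = false"
  end
end
```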
This is how it looks in psql:
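Assuming the orders table above, the index list would show the predicate alongside the index definition (output abridged):

```
\d orders
...
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)
    "index_orders_on_billed" btree (billed) WHERE (billed = false)
```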