In the initial lesson of our SQL Indexes Explained series, we established that SELECT queries perform better when the underlying data is already sorted by the values of specific columns.
Our second lesson delved into the fundamental structure of B-tree indexes, exploring their role in minimizing the amount of data accessed during query execution. We also examined query implementation for joining multiple tables and the positive impact of indexes on the speed of such queries.
Furthermore, we emphasized two situations where using indexes in SQL proves advantageous. Firstly, when indexes encompass all the columns referenced in the WHERE conditions, JOIN conditions, and the SELECT list—we call these covering indexes—we can bypass reading the entire corresponding table. Secondly, indexes can be beneficial when they substantially reduce the number of data blocks accessed, limiting it to a small portion of the overall table size.
Conversely, if these conditions aren’t met, it becomes more efficient to scan the entire table directly rather than relying on an index and having to jump back and forth between the index and the corresponding table rows.
SQL Range Queries
Queries that benefit from indexes usually include conditions that notably limit the range of possible values one or more columns can have. We call these range queries, and they use conditions like “Column A’s value must fall between X and Y” to filter data.
A prime example is the query from Exercise 4 in the second lesson:
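It looked something like this, with r aliasing Reservations and c aliasing Clients (a sketch; the exact select list and the DateFrom column name are assumptions carried over from later examples):

```sql
SELECT c.FirstName, c.LastName, r.DateFrom
FROM Reservations r
JOIN Clients c ON c.ClientID = r.ClientID
WHERE r.DateFrom BETWEEN DATE '2020-08-13' AND DATE '2020-08-14'
  AND r.HotelID = 3;
```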
This example features two ranges. The first range encompasses the dates between August 13, 2020, and August 14, 2020. The second one is a very small numeric range, as the condition essentially translates to r.HotelID BETWEEN 3 AND 3.
Exercise 1: Periods (Date and Time Range Queries)
Imagine we add a CheckInTime column to the Reservations table. This spreadsheet provides sample data, and you’ll notice a single index covering both CheckInTime and ClientId.
Your task is to write a query that would retrieve the names of clients who checked in on August 15, 2020.
SQL developers who are new to this might write the following query:
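In Oracle syntax, such an attempt might look like this (a sketch; TO_DATE on the column relies on an implicit conversion that, under default NLS settings, strips the time of day):

```sql
SELECT c.FirstName, c.LastName
FROM Reservations r
JOIN Clients c ON c.ClientID = r.ClientID
-- Wrapping the indexed column in a function drops the time portion,
-- but it also hides CheckInTime from the index.
WHERE TO_DATE(r.CheckInTime) = TO_DATE('2020-08-15', 'YYYY-MM-DD');
```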
They envision the query execution unfolding like this:
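Roughly (assuming the index is named IX_CheckInTime_ClientId, following the naming used elsewhere in this lesson):

1. Perform a B-tree lookup in IX_CheckInTime_ClientId to find the first entry whose CheckInTime falls on August 15, 2020.
2. Read consecutive index entries until the date no longer matches.
3. For each match, use ClientId to retrieve the client’s first and last names.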
However, the issue is that, as of today, no Relational Database Management System (RDBMS) can generate such an execution plan. RDBMSs interpret functions like TO_DATE (using Oracle syntax as an example) as transforming the CheckInTime column into a non-indexed value. Consequently, they tend to produce execution plans resembling this:
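1. Scan the entire IX_CheckInTime_ClientId index, block by block.
2. For every entry, evaluate the function over CheckInTime and compare the result to August 15, 2020.
3. For each match, use ClientId to retrieve the client’s name.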
While this approach would still be quicker than reading every single row from the Reservations table because an index row is smaller than a table row—meaning fewer blocks need to be retrieved from the disk—we know the first execution plan would be far more efficient.
To guide our RDBMS towards that optimal approach, we need to rephrase the query:
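One way to phrase it (a sketch; any equivalent range bounds on the raw column work):

```sql
SELECT c.FirstName, c.LastName
FROM Reservations r
JOIN Clients c ON c.ClientID = r.ClientID
-- The untouched column is compared against constant bounds,
-- so the index on CheckInTime applies.
WHERE r.CheckInTime >= TO_DATE('2020-08-15', 'YYYY-MM-DD')
  AND r.CheckInTime <  TO_DATE('2020-08-16', 'YYYY-MM-DD');
```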
This revised query is a proper range query, easily understood by any well-designed RDBMS. It allows the RDBMS to recognize that we need data from Reservations where the actual value of CheckInTime—not a derived value—falls within a well-defined range. As a result, the execution plan it generates would resemble this:
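1. Perform a B-tree lookup in IX_CheckInTime_ClientId for the first entry with CheckInTime >= August 15, 2020.
2. Read consecutive leaf entries while CheckInTime < August 16, 2020.
3. For each entry, use ClientId to retrieve the client’s name.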
This is our desired outcome: Leveraging not only the index itself but also capitalizing on its inherent sorted nature.
Exercise 2: LIKE With a Wildcard at the Start
Let’s say our detective visits the hotel with limited information about a suspect, knowing only that the suspect’s last name ends with “-son.” They need the first and last names of all guests matching this criterion:
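In SQL, the trailing “-son” becomes a LIKE pattern with a wildcard at the start:

```sql
SELECT FirstName, LastName
FROM Clients
WHERE LastName LIKE '%son';
```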
We’ll work with the Clients table and an index on LastName, referring to this spreadsheet for data. Your task is to determine the results this query would return, considering different approaches.
Table Scan Approach
The most straightforward method involves sequentially reading all data from the table and noting down the names of guests whose last names end with “-son”:
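1. Read the blocks of the Clients table sequentially, row by row.
2. For each row, check whether LastName ends in “son”.
3. Output FirstName and LastName for every match.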
This approach necessitates reading the entire table.
Using an Index
Can we leverage the index on LastName to our advantage? If we navigate to the IX_LastName sheet and try to use it to find all clients matching the criterion, we realize that we have to read the entire index to locate all the Andersons, Robinsons, and Thompsons.
Is this truly an improvement over a table scan? Keep in mind that in addition to reading the entire index, we also need to find the corresponding row in the table for each match using the rowAddress value and then retrieve the FirstName.
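1. Read the entire IX_LastName index, entry by entry.
2. For each entry whose LastName ends in “son”, follow its rowAddress to the corresponding table block.
3. Read the table row and retrieve FirstName.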
In this scenario, it was simpler and faster to read the table sequentially. For an RDBMS, the decision depends on the percentage of rows meeting the criteria: If, for instance, Andersons, Robinsons, and Thompsons make up a tiny fraction of a massive table, the RDBMS might opt to scan the much smaller index, reading fewer data blocks overall even with a few extra table blocks fetched for each match. Otherwise, a table scan is quicker.
The key takeaway is that the index’s data order doesn’t help with this type of query. While the smaller size of the index row can offer some benefit, it’s not always guaranteed.
Exercise 3: LIKE With a Wildcard at the End
This time, our detective needs to find all clients whose last names begin with “Rob-.”
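This condition translates to a LIKE pattern with the wildcard at the end:

```sql
SELECT FirstName, LastName
FROM Clients
WHERE LastName LIKE 'Rob%';
```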
Let’s try extracting data matching this query from the same spreadsheet.
If you opted for the table scan approach, you missed the opportunity to fully utilize the IX_LastName index. A more efficient method involves locating the first index entry starting with “Rob-” (Roberts), reading subsequent rows (including both Robertses and Robinsons), and stopping once the LastName no longer matches:
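1. Perform a B-tree lookup for the first IX_LastName entry with LastName >= 'Rob'.
2. Read consecutive entries while LastName still starts with “Rob”.
3. For each match, follow rowAddress to the table and retrieve FirstName.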
In this case, after the initial B-tree lookup for the first entry, we only read entries that satisfy the criterion, stopping as soon as we encounter a non-matching name.
Addressing B-tree Scaling Woes
When a new database is deployed, it usually starts with a few populated lookup tables and empty transactional tables. The system initially runs smoothly, particularly if we’ve followed good database design principles such as table normalization, defining primary, foreign, and unique keys, and supporting foreign keys with corresponding indexes.
However, as time passes and data volume grows, increasing system and database complexity, we often observe performance degradation. This naturally leads to discussions about the root cause of the slowdown and potential solutions.
A common assumption is that the database size is the primary culprit, and the solution often involves removing historical data that isn’t needed daily and storing it in a separate database for reporting and analytics.
Let’s examine this assumption.
SQL Range Queries: Does Execution Time Depend on Table Size?
Consider a typical range query on a single table:
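In schematic form (SomeTable, Column, X, and Y are placeholders):

```sql
SELECT *
FROM SomeTable
WHERE Column BETWEEN X AND Y;
```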
Assuming an index exists on Column, the ideal execution plan would be:
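1. Get first row: Perform a B-tree lookup to find the first index entry with Column >= X.
2. Read consecutive leaf-level entries while Column <= Y.
3. For each entry, fetch the corresponding table row via its row address.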
Let’s analyze the number of blocks an RDBMS would need to read to return this data.
The “Get first row” part is achieved through the B-tree lookup we discussed in the second lesson. The number of blocks read here is equal to the B-tree’s depth. Subsequently, we read consecutive items from the leaf level of the index.
With Online Transaction Processing (OLTP) queries, all results typically reside within a single index block (occasionally two, rarely more). Additionally, each index entry grants us access to a block in the table to locate the corresponding row using its address. While some table rows might be in the same already loaded table block, for simplicity, let’s assume a new block is loaded each time.
This leads us to the formula:
B = D + 1 + R
Where B represents the total blocks read, D is the B-tree depth, and R is the number of rows returned by the query.
The only variable influenced by the number of rows in the table is D, the B-tree depth.
To illustrate this concept, let’s assume 1,000 index entries fit into one block. This means D = 1 as long as the table contains fewer than 1,000 rows. For tables handling business transactions, this might hold true for the first business day after deployment.
However, the B-tree depth will inevitably increase. As long as the table holds fewer than 1 million rows, the index will consist of two levels.
If we’re concerned about sluggish database response times and attribute the slowdown to data volume, it’s crucial to remember that transaction tables typically hold mere millions of rows. Since a two-level B-tree index can accommodate at most 1 million rows, the depth must be at least three; it won’t reach four unless the table surpasses 1 billion rows. This gives us a more accurate estimate:
B = 4 + R
If R is small, reducing the B-tree depth back to two would noticeably speed up the query. When searching by a primary or unique key value, the system would read four blocks instead of five, representing a 20% improvement. However, if the query returns a larger number of rows, the improvement might not be as significant.
The challenge is that many applications can’t function with fewer than 1 million transactions in the database.
Therefore, it might seem that table size is irrelevant, and moving historical data is a futile effort.
However, let’s delve deeper into the structure of a B-tree index and how data modifications affect it.
B-tree Index Implementation Details
We learned in our previous lesson that all levels of a balanced B-tree are physically ordered by key column values. However, inserting, updating, or deleting an item often requires moving a substantial amount of data to maintain this order.
For instance, imagine inserting data into the middle of an already full block. This necessitates splitting the block, rearranging data, and potentially updating data on other B-tree levels that point to the current one.
To optimize such situations, each index item includes pointers to the preceding and subsequent rows, forming a doubly linked list. This means insertions generally involve writing new items as close as possible to the previous one and adjusting the pointers.
When a block split is necessary, we must also write a new item to the previous B-tree level. This is primarily a matter of modifying a few more pointers—we don’t need to rewrite large chunks of the tree. After a split, both resulting data blocks are approximately half-full. Depending on the disk’s free space distribution, “neighboring” blocks might be physically located far apart.
Over time, this can lead to increased index fragmentation, causing noticeable query execution slowdowns. With an RDBMS executing queries as described, the assumption of item order and proximity becomes less accurate, resulting in more reads. In the worst-case scenario, where all data blocks are half-empty, the system might have to read twice as many blocks.
B-tree Index Maintenance
The remedy for this issue is index defragmentation, also known as reindexing. RDBMSs provide a feature to recreate an entire index, restoring physical order.
Reindexing is generally a fast process despite the large volume of data read and written. Modern RDBMSs usually offer two modes: a faster one requiring table locks during processing, and a slower, online mode. Regardless of the mode, it’s recommended to reindex during off-peak hours to minimize performance impact.
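For illustration, here’s roughly what rebuilding looks like in a few major systems (index and table names reused from our exercises; online rebuilds aren’t available in every edition or version):

```sql
-- Oracle: rebuild in place; ONLINE keeps the table available.
ALTER INDEX IX_LastName REBUILD ONLINE;

-- PostgreSQL: CONCURRENTLY (v12+) permits reads and writes during the rebuild.
REINDEX INDEX CONCURRENTLY IX_LastName;

-- SQL Server: ONLINE = ON keeps the table available while rebuilding.
ALTER INDEX IX_LastName ON Clients REBUILD WITH (ONLINE = ON);
```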
Deleting Historical Data
When dealing with tables containing billions or even hundreds of millions of rows, completing reindexing within a limited off-peak window might be impossible.
This is where moving historical data out of the OLTP database can be beneficial. However, simply deleting rows older than a certain threshold further fragments the indexes, requiring more frequent reindexing.
SQL Partitioning to the Rescue?
There’s a way to prevent fragmentation caused by historical data removal while keeping only active transactions in the production database. Most major RDBMSs implement a solution where a table is divided into smaller units called partitions. These systems then offer functionalities to add, remove, or even switch these partitions between tables (for example, from an active to a historical table with the same structure).
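In Oracle syntax, for example, a monthly range-partitioned Reservations table might be declared like this (a sketch; the column list is a simplified assumption):

```sql
CREATE TABLE Reservations (
    ReservationID NUMBER PRIMARY KEY,
    ClientID      NUMBER NOT NULL,
    HotelID       NUMBER NOT NULL,
    DateFrom      DATE   NOT NULL,
    DateTo        DATE
)
PARTITION BY RANGE (DateFrom) (
    PARTITION P202102 VALUES LESS THAN (DATE '2021-03-01'),
    PARTITION P202103 VALUES LESS THAN (DATE '2021-04-01'),
    PARTITION P202104 VALUES LESS THAN (DATE '2021-05-01')
);
```

Removing a month is then a metadata operation (ALTER TABLE Reservations DROP PARTITION P202102), as is exchanging a partition with a standalone table of the same structure.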
Let’s look at a partitioned Reservations table, as illustrated in this spreadsheet. The table is partitioned by month; a partition map sheet links each partition name to its date period and to the sheet holding that partition’s data. To understand query execution on a partitioned table, we’ll go through a few exercises.
Exercise 4: Partition Query in SQL
Using the spreadsheet linked above, try to extract data requested by the following query without relying on indexes:
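For instance (the exact dates are illustrative):

```sql
SELECT *
FROM Reservations
WHERE DateFrom BETWEEN DATE '2021-03-05' AND DATE '2021-03-15';
```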
You likely realized that you first need to consult the partition-mapping sheet to identify the partition containing reservations from March 2021. Then, you’d open the relevant partition, sequentially read its data, and filter out rows not meeting the condition.
This process is simple, but reading so many rows only to keep a small subset isn’t ideal. Scanning just the March partition beats scanning the entire Reservations table, yet it’s still not optimal. This brings us to the question of indexes.
Global Indexes
RDBMSs allow creating global indexes that span all partitions of a partitioned table. However, there’s no underlying difference between how global and regular indexes operate: Global indexes aren’t partition-aware. This means CRUD queries using a global index don’t utilize the table’s partition map.
The only time partitioning comes into play for a global index is when an entire partition is dropped: All index rows pointing to the removed partition must then be deleted, which essentially means rebuilding the entire global index.
An outage window is still necessary because the index can’t be used until obsolete items are removed. However, if dropping partitions regularly and limiting the number of active ones is possible, the reindexing operation might fit within the outage window. This demonstrates how partitioning can address the initial issue by minimizing maintenance task durations, including global index maintenance.
But what if even this downtime is unacceptable?
Globally Partitioned Indexes
This is where globally partitioned indexes come in. This approach involves partitioning the index similarly to the table. In our spreadsheet example, each partition contains its portion of the Reservations table and an IX_DateFrom index sheet, both partitioned by DateFrom.
To execute the query from Exercise 4, an RDBMS would first consult an index partitions map to determine which partitions contain dates within the specified range (in our case, only one index partition). Then, it would perform B-tree lookups, traverse to the leaf level, and finally access the table using the corresponding row address.
When dropping a table partition, we only need to drop the corresponding index partition, eliminating downtime.
Local Indexes
The main drawback of globally partitioned indexes is the need to manage dropping both the table and index partitions. While the overhead of reading from and maintaining the index partitions map is minimal, local indexes offer a slightly different approach.
Instead of partitioning a single global index, we create a local index within each table partition. This way, local indexes retain the key advantage of globally partitioned indexes—zero downtime—while avoiding their drawbacks.
This seems like the perfect solution. However, before we celebrate, let’s analyze the potential execution plan of a few queries.
Exercise 5: Locally Partitioned Index
Let’s revisit our query, this time utilizing the locally partitioned index on DateFrom.
You probably used this execution plan:
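1. Consult the partition map: Only the March 2021 partition can contain the requested range.
2. Within that partition, perform a B-tree lookup in the local IX_DateFrom index for the first entry inside the range.
3. Read consecutive entries while DateFrom stays within the range, fetching each table row via its row address.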
We’re fortunate that all dates fall within a single partition, requiring traversal of only one local index. If the period spanned six months, we’d have to read six local indexes.
Exercise 6: In Contrast
This time, use the Reservations partition map to list the periods when Client 124 visited Hotel 1:
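In SQL terms (a sketch; DateTo is an assumed column marking the end of each stay):

```sql
SELECT DateFrom, DateTo
FROM Reservations
WHERE HotelID = 1
  AND ClientID = 124;
```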
This exercise highlights the primary disadvantage of local indexes. We had to read the IX_HotelID_ClientID local index sheet from every single partition of the Reservations table:
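For each partition of the Reservations table, in turn:

1. Perform a B-tree lookup in the partition’s local IX_HotelID_ClientID index for HotelID = 1 and ClientID = 124.
2. Read consecutive matching entries.
3. Fetch the periods from the table rows via their row addresses.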
This execution would undoubtedly read more blocks and take longer than if the table weren’t partitioned.
Therefore, while we found a way to maintain index health during off-peak hours, it came at the cost of potentially slower queries.
If our business model permits maintaining a small number of partitions or if the most frequent queries include criteria allowing the RDBMS to access only one or two partitions, this solution might be suitable. Otherwise, it’s better to focus on optimizing the data model, indexes, and queries themselves—and potentially enhancing the database server—rather than partitioning.
Indexes in SQL: What to Learn Next
This concludes our exploration of SQL indexes. We focused on the index implementation common to all modern RDBMSs, prioritizing topics relevant to application developers over those typically concerning database administrators. While the latter might benefit from researching the impact of fill factor on index fragmentation, both roles can benefit from further exploring these areas:
- Data and index caching
- Non-B-tree index structures (e.g., hash, GiST, bitmap, and columnstore indexes)
- Clustered indexes (also known as index-organized tables in Oracle)
- Functional indexes
- Partial indexes
We discussed range partitioning, the most common type. However, other partitioning schemes exist, like hash and list partitioning. Additionally, some RDBMSs offer multi-level partitioning.
Finally, SQL developers should delve into other important RDBMS query execution aspects, including query parsing, cost-based execution plan compilation, caching, and reuse.
Here are some recommended resources categorized by RDBMS:
Oracle
- Overview of the Optimizer
- Indexes and Index-Organized Tables
- Managing Indexes
- Overview of Partitions
- Partitioning Guide
- Ask Tom
PostgreSQL
- Query Processing
- Indexes in PostgreSQL
- Indexes in PostgreSQL (Official Documentation)
- Buffer Management
- Table Partitioning
- Partitioning Guide