Query Performance Tuning: Why We Must Consider Partitions


How many partitions can a table have in MySQL? As a DBA, you may face this question sooner or later, either as a hypothetical or as an interesting case in a real-world scenario. What if a client running a huge MySQL data warehouse has a table whose data is merged through INSERT ON DUPLICATE KEY UPDATE queries?

As one might expect, the result would be extremely slow performance, thanks to the creation of numerous daily partitions for the table. Each kind of statement is affected to a different degree, but the database would require query performance tuning regardless.

Optimization in SQL - Keep a Check on the Number of Partitions

Suppose you create the following test table - 

 


CREATE TABLE `p10` (
    `id` int(10) unsigned NOT NULL,
    `c`  int(10) unsigned NOT NULL,
    PRIMARY KEY (`id`),
    KEY (`c`)
) ENGINE=InnoDB
PARTITION BY RANGE (id) (
    PARTITION p100000  VALUES LESS THAN (100001),
    PARTITION p200000  VALUES LESS THAN (200001),
    PARTITION p300000  VALUES LESS THAN (300001),
    PARTITION p400000  VALUES LESS THAN (400001),
    PARTITION p500000  VALUES LESS THAN (500001),
    PARTITION p600000  VALUES LESS THAN (600001),
    PARTITION p700000  VALUES LESS THAN (700001),
    PARTITION p800000  VALUES LESS THAN (800001),
    PARTITION p900000  VALUES LESS THAN (900001),
    PARTITION p1000000 VALUES LESS THAN (1000001)
);

 

If you vary the number of partitions from one to a thousand, and then load the table with a million sequential values (from one to one million) using bulk insert statements of a thousand rows each (sketched after the timings below), here's how long it takes to load that data - 

 

  • 9.5 seconds for a single partition

  • 10.5 seconds for ten partitions

  • 16.8 seconds for a hundred partitions

  • 24 seconds for a thousand partitions
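
For clarity, here is a minimal sketch of what one such bulk insert statement might look like; the exact values and batching used in the test may have differed -

INSERT INTO p10 (id, c) VALUES
    (1, 1),
    (2, 2),
    (3, 3),
    -- ...continuing sequentially, a thousand rows per statement...
    (1000, 1000);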

 

In other words, loading becomes roughly 2.5 times slower as the number of partitions grows from one to a thousand. When you attempt optimization in SQL, this seems surprising, because each insert statement only adds data to one partition - or at most two, when a batch of sequential values straddles a partition boundary.
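
You can confirm that partition pruning restricts a statement to the relevant partitions with EXPLAIN. On MySQL 5.6 and earlier this takes the EXPLAIN PARTITIONS form, while 5.7 and later include a partitions column in plain EXPLAIN output -

EXPLAIN PARTITIONS
SELECT * FROM p10 WHERE id BETWEEN 1 AND 1000;
-- The partitions column should list only p100000, confirming that a
-- sequential thousand-value range falls within a single partition.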

The regression only grows if you rerun the test with the bulk inserts changed to INSERT ... ON DUPLICATE KEY UPDATE c = c + 1, so that the statements exercise the UPDATE path (a sketch appears below). In fact, it was observed that - 

 


  • A single partition took 51 seconds

  • Ten partitions took over 53 seconds

  • A hundred partitions took 73 seconds

  • A thousand partitions took over 293 seconds

The pattern of the performance loss has shifted slightly here - the query gets almost 1.5 times slower going from one to a hundred partitions, but then nearly four times slower again on the jump from a hundred to a thousand partitions.
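
To make this second test concrete, here is a rough sketch of such a statement; as before, treat the values as illustrative rather than as the exact benchmark query -

INSERT INTO p10 (id, c) VALUES
    (1, 1),
    (2, 2),
    -- ...a thousand rows per statement, as in the plain bulk inserts...
    (1000, 1000)
ON DUPLICATE KEY UPDATE c = c + 1;

Since every id already exists after the initial load, each row takes the UPDATE branch and simply increments c.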

The difference increases again once you eliminate the index on column c. The UPDATE path of the INSERT ON DUPLICATE KEY UPDATE query then takes 23 seconds for a single partition and over 250 seconds for a thousand partitions - more than a tenfold difference - calling for much-needed performance tuning in SQL.

This problem shows up in both MyISAM and InnoDB. In MyISAM, the same index-free UPDATE path took 11 seconds for one partition and over 53 seconds for a thousand partitions.
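
If you would like to reproduce these two variations, the steps might look like the sketch below. Note that MySQL named the index c automatically because the KEY clause in the table definition did not name it, and that partitioned MyISAM tables are only supported natively before MySQL 8.0 -

ALTER TABLE p10 DROP INDEX c;     -- remove the secondary index on column c

ALTER TABLE p10 ENGINE=MyISAM;    -- rebuild the same table on the MyISAM engine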

 

What Led to Such Drastic Results, Requiring Performance Tuning in SQL?

There are a few suspects here. One is the “setup” cost the query pays to prepare all the partitions for execution; the other could be the time taken to do this work for each row - the “per row” cost.

To confirm which it was, we ran the queries with different numbers of rows per batch. Using a hundred rows per batch, or ten thousand, made little difference compared with a thousand rows per batch. If the setup cost dominated, smaller batches - and therefore many more statements - would have been noticeably slower, so we can safely conclude that it is a per-row expense.

Interestingly, this shows that updates in InnoDB can be over five times slower than their insertion counterparts, at least for CPU-bound workloads. No wonder we end up doing query performance tuning for databases!

In Conclusion

The most important point to note here is that you need to be careful with partitions. The more you include, the slower your database can get, so avoid accumulating unused partitions - especially those created in advance on the assumption that they will be useful in the future. A quick way to audit this is shown below.
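
As a minimal sketch, you can count the partitions on each table through INFORMATION_SCHEMA (replace your_database with the schema you want to inspect) -

SELECT TABLE_NAME, COUNT(*) AS partition_count
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'your_database'
  AND PARTITION_NAME IS NOT NULL
GROUP BY TABLE_NAME
ORDER BY partition_count DESC;

-- Unused trailing partitions can then be removed, for example:
-- ALTER TABLE p10 DROP PARTITION p1000000;
-- Careful: DROP PARTITION also deletes any rows stored in that partition.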

Going by the example above, database performance degrades relatively gently up to about a hundred partitions. However, these numbers may vary significantly based on the database workload.
