I had a little research task today at work where I needed to evaluate which MySQL storage engine and technique would be the fastest to retrieve lots of (like millions) log data. I stumbled upon this post which explained the new horizontal partitioning features of MySQL 5.1 and what I read there made me curious to test it out myself, also because the original author forgot to include a test with a (non-)partitioned, but indexed table.
This is my test setup: Linux 2.6.34, MySQL community server 5.1.46, Intel Pentium D CPU with 3.2GHz, 2GB RAM
Test MyISAM tables
The table definitions are copied and adapted from the aforementioned article:
CREATE TABLE myi_no_part (
c1 int default NULL,
c2 varchar(30) default NULL,
c3 date default NULL
) engine=MyISAM;
CREATE TABLE myi_no_part_index (
c1 int default NULL,
c2 varchar(30) default NULL,
c3 date default NULL,
index(c3)
) engine=MyISAM;
CREATE TABLE myi_part (
c1 int default NULL,
c2 varchar(30) default NULL,
c3 date default NULL
) PARTITION BY RANGE (year(c3))
(PARTITION p0 VALUES LESS THAN (1995),
PARTITION p1 VALUES LESS THAN (1996),
PARTITION p2 VALUES LESS THAN (1997),
PARTITION p3 VALUES LESS THAN (1998),
PARTITION p4 VALUES LESS THAN (1999),
PARTITION p5 VALUES LESS THAN (2000),
PARTITION p6 VALUES LESS THAN (2001),
PARTITION p7 VALUES LESS THAN (2002),
PARTITION p8 VALUES LESS THAN (2003),
PARTITION p9 VALUES LESS THAN (2004),
PARTITION p10 VALUES LESS THAN (2010),
PARTITION p11 VALUES LESS THAN MAXVALUE)
engine=MyISAM;
CREATE TABLE myi_part_index (
c1 int default NULL,
c2 varchar(30) default NULL,
c3 date default NULL,
index(c3)
) PARTITION BY RANGE (year(c3))
(PARTITION p0 VALUES LESS THAN (1995),
PARTITION p1 VALUES LESS THAN (1996),
PARTITION p2 VALUES LESS THAN (1997),
PARTITION p3 VALUES LESS THAN (1998),
PARTITION p4 VALUES LESS THAN (1999),
PARTITION p5 VALUES LESS THAN (2000),
PARTITION p6 VALUES LESS THAN (2001),
PARTITION p7 VALUES LESS THAN (2002),
PARTITION p8 VALUES LESS THAN (2003),
PARTITION p9 VALUES LESS THAN (2004),
PARTITION p10 VALUES LESS THAN (2010),
PARTITION p11 VALUES LESS THAN MAXVALUE)
engine=MyISAM;
Test Archive tables
Since MySQL’s Archive engine does only support one index which is primarily used for identifying the primary id, I left out the indexed versions for that:
CREATE TABLE ar_no_part (
c1 int default NULL,
c2 varchar(30) default NULL,
c3 date default NULL
) engine=Archive;
CREATE TABLE ar_part (
c1 int default NULL,
c2 varchar(30) default NULL,
c3 date default NULL,
index(c3)
) PARTITION BY RANGE (year(c3))
(PARTITION p0 VALUES LESS THAN (1995),
PARTITION p1 VALUES LESS THAN (1996),
PARTITION p2 VALUES LESS THAN (1997),
PARTITION p3 VALUES LESS THAN (1998),
PARTITION p4 VALUES LESS THAN (1999),
PARTITION p5 VALUES LESS THAN (2000),
PARTITION p6 VALUES LESS THAN (2001),
PARTITION p7 VALUES LESS THAN (2002),
PARTITION p8 VALUES LESS THAN (2003),
PARTITION p9 VALUES LESS THAN (2004),
PARTITION p10 VALUES LESS THAN (2010),
PARTITION p11 VALUES LESS THAN MAXVALUE)
engine=Archive;
Test data
I re-used the procedure to create about 8 million test data records spread randomly over the complete partitioned area and subsequently copied the generated data to the other tables:
delimiter //
CREATE PROCEDURE load_part_tab()
begin
declare v int default 0;
while v < 8000000
do
insert into myi_no_part
values (v,'testing partitions',adddate('1995-01-01',(rand(v)*36520) mod 3652));
set v = v + 1;
end while;
end
//
delimiter ;
call load_part_tab;
insert into myi_no_part_index select * from myi_no_part;
...
Test query and the results
I used the same query to retrieve data from all of the tables:
select count(*) from TABLE_NAME
where c3 > date '1995-01-01' and c3 < date '1995-12-31';
and these were the results (mean values of several executions):
table |
exec time |
`myi_no_part` |
~ 6.4s |
`myi_no_part_index` |
~ 1.2s |
`myi_part` |
~ 0.7s |
`myi_part_index` |
~ 1.3s |
`ar_no_part` |
~ 10.2s |
`ar_part` |
~ 1.1s |
These results were actually pretty suprising to me, for various reasons:
- I would not have thought that intelligent partitioning would beat an index on the particular column by saving the hard disk space for the index at the same time (roughly 1/3 of the total data size in this test case).
- The values for `myi_no_part` were actually better than expected - I would have thought that these should be much worse, also if you compare them with the values from the author of the original article.
- The archive engine adds actually nothing to the mix, but disadvantages. Maybe my test case is flawed because I "only" tested with 8 million rows, but one can clearly see that a partitionated MyISAM table beats a partitionated Archive table by more than 40%, so the usage of the Archive engine gives you no advantages, but only disadvantages, like being not able to delete records or add additional indexes.
- Apparently partitioning and indexing the column in question is slightly slower instead of faster, however if one tries to use a subset of a partitioned table (like restricting to
where c3 > date '1995-06-01' and c3 < date '1995-08-31'
) it is faster - ~0.3s with index vs ~0.7s without index.
Conclusion
MySQL's partitioning is a great new feature in 5.1 and should be used complementary to subtle and wise indexing.