読者です 読者をやめる 読者になる 読者になる

MySQLの範囲検索でレコードが多すぎるとフルテーブルスキャンになってしまう現象

mysql innodb

created_atが何月何日から何月何日のレコードをSELECTする場合に、ある量を超えるとEXPLAIN上ではフルテーブルスキャンの判定になってしまったのでどうしてなのか実験してみました。

実験は次のような環境で行いました。

OS: OS X El Capitan 10.11.3
CPU: 1.2 GHz Intel Core M
Memory: 8GB 1600 MHz DDR3

$ mysql --version
mysql  Ver 14.14 Distrib 5.6.28, for osx10.11 (x86_64) using  EditLine wrapper

準備

まず、range_testという名前のテーブルを作ります。

int型、bigint型、datetime型のカラムを作成して、それぞれ比較してみました。

カラム名 取りうる値の範囲
id bigint unsigned (PK) 1 〜 5,000,000
int_col int 0 〜 10,000
bigint_col bigint 1 〜 9,999,999
datetime_col datetime 2000-01-01 00:00:00 〜3000-01-08 00:00:00

インデックスが効くかどうかの実験なので、それぞれのカラムにインデックスを貼っておきます。

mysql> create database db_range_test;  
Query OK, 1 row affected (0.03 sec)

mysql> use db_range_test;
Database changed

mysql> create table range_test (
    -> id bigint unsigned not null auto_increment,
    -> int_col int,
    -> bigint_col bigint,
    -> datetime_col datetime,
    -> primary key (id),
    -> key idx_int (int_col),
    -> key idx_bigint (bigint_col),
    -> key idx_datetime (datetime_col)
    -> ) engine=InnoDB default charset utf8;
Query OK, 0 rows affected (0.11 sec)

こんな感じのスクリプトを書いて、MySQLに5,000,000件のダミーデータをINSERTします。

require 'sequel'
require 'mysql2'
require 'active_support'
require 'active_support/core_ext'

DB = Sequel.connect('mysql2://root@localhost/db_range_test')

(1..5_000_000).each do |i|
  int_r = rand(0..10_000)
  big_r = rand(0..10_000_000)
  dat_r = rand(0..1_000)
  DB[:range_test].insert(
    int_col: int_r,
    bigint_col: big_r,
    datetime_col: DateTime.new(2000, 1, 1).since(dat_r.years)
  )
end

5,000,000件入りました。 SELECT COUNT(*)に5秒弱かかるとは。。。

mysql> select count(*) from range_test;
+----------+
| count(*) |
+----------+
|  5000000 |
+----------+
1 row in set (4.96 sec)

mysql> select * from range_test limit 10;
+----+---------+------------+---------------------+
| id | int_col | bigint_col | datetime_col        |
+----+---------+------------+---------------------+
|  1 |    7521 |    8840230 | 2988-01-08 00:00:00 |
|  2 |    8951 |     617956 | 2454-01-03 12:00:00 |
|  3 |    4677 |    4115021 | 2111-01-01 18:00:00 |
|  4 |    8235 |    7919478 | 2906-01-07 12:00:00 |
|  5 |    5710 |    2487144 | 2958-01-07 12:00:00 |
|  6 |    9313 |       3383 | 2805-01-06 06:00:00 |
|  7 |    3602 |    9241458 | 2336-01-04 00:00:00 |
|  8 |    8950 |    8957220 | 2970-01-07 12:00:00 |
|  9 |    9052 |     731523 | 2865-01-06 06:00:00 |
| 10 |    5988 |    5204737 | 2121-01-01 06:00:00 |
+----+---------+------------+---------------------+
10 rows in set (0.00 sec)

実験

int

mysql> explain select * from range_test where 0 <= int_col and int_col <= 7;
+----+-------------+------------+-------+---------------+---------+---------+------+------+-----------------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows | Extra                 |
+----+-------------+------------+-------+---------------+---------+---------+------+------+-----------------------+
|  1 | SIMPLE      | range_test | range | idx_int       | idx_int | 5       | NULL | 3849 | Using index condition |
+----+-------------+------------+-------+---------------+---------+---------+------+------+-----------------------+
1 row in set (0.01 sec)

mysql> select count(*) from range_test where 0 <= int_col and int_col <= 7;
+----------+
| count(*) |
+----------+
|     3850 |
+----------+
1 row in set (0.00 sec)


mysql> explain select * from range_test where 0 <= int_col and int_col <= 8;
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows | Extra                            |
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------+
|  1 | SIMPLE      | range_test | range | idx_int       | idx_int | 5       | NULL | 4355 | Using index condition; Using MRR |
+----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from range_test where 0 <= int_col and int_col <= 8;
+----------+
| count(*) |
+----------+
|     4356 |
+----------+
1 row in set (0.01 sec)


mysql> explain select * from range_test where 0 <= int_col and int_col <= 1009;
+----+-------------+------------+-------+---------------+---------+---------+------+--------+----------------------------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows   | Extra                            |
+----+-------------+------------+-------+---------------+---------+---------+------+--------+----------------------------------+
|  1 | SIMPLE      | range_test | range | idx_int       | idx_int | 5       | NULL | 929646 | Using index condition; Using MRR |
+----+-------------+------------+-------+---------------+---------+---------+------+--------+----------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from range_test where 0 <= int_col and int_col <= 1009;
+----------+
| count(*) |
+----------+
|   505016 |
+----------+
1 row in set (0.20 sec)


mysql> explain select * from range_test where 0 <= int_col and int_col <= 1010;
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | range_test | ALL  | idx_int       | NULL | NULL    | NULL | 4986946 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

mysql> select count(*) from range_test where 0 <= int_col and int_col <= 1010;
+----------+
| count(*) |
+----------+
|   505547 |
+----------+
1 row in set (0.15 sec)

途中でExtraにUsing MRRというものが出るようになったのですが、どうやらMySQL 5.6から入ったMulti-Range Readというものがあるらしい。

MySQL :: MySQL 5.6 リファレンスマニュアル :: 8.2.1.13 Multi-Range Read の最適化

検索条件に合致するレコード数 全レコード数から占める割合 type Extra
〜 3,850 〜 0.0770% range Using index condition
4,356 〜 505,016 0.0871%〜10.1% range Using index condition; Using MRR
505,547 〜 10.1% 〜 ALL Using where

bitint

mysql> explain select * from range_test where 0 <= bigint_col and bigint_col <= 7843;
+----+-------------+------------+-------+---------------+------------+---------+------+------+-----------------------+
| id | select_type | table      | type  | possible_keys | key        | key_len | ref  | rows | Extra                 |
+----+-------------+------------+-------+---------------+------------+---------+------+------+-----------------------+
|  1 | SIMPLE      | range_test | range | idx_bigint    | idx_bigint | 9       | NULL | 3989 | Using index condition |
+----+-------------+------------+-------+---------------+------------+---------+------+------+-----------------------+
1 row in set (0.00 sec)

mysql> select count(*) from range_test where 0 <= bigint_col and bigint_col <= 7843;
+----------+
| count(*) |
+----------+
|     3990 |
+----------+
1 row in set (0.00 sec)


mysql> explain select * from range_test where 0 <= bigint_col and bigint_col <= 7844;
+----+-------------+------------+-------+---------------+------------+---------+------+------+----------------------------------+
| id | select_type | table      | type  | possible_keys | key        | key_len | ref  | rows | Extra                            |
+----+-------------+------------+-------+---------------+------------+---------+------+------+----------------------------------+
|  1 | SIMPLE      | range_test | range | idx_bigint    | idx_bigint | 9       | NULL | 3990 | Using index condition; Using MRR |
+----+-------------+------------+-------+---------------+------------+---------+------+------+----------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from range_test where 0 <= bigint_col and bigint_col <= 7844;
+----------+
| count(*) |
+----------+
|     3991 |
+----------+
1 row in set (0.00 sec)


mysql> explain select * from range_test where 0 <= bigint_col and bigint_col <= 898823;
+----+-------------+------------+-------+---------------+------------+---------+------+--------+----------------------------------+
| id | select_type | table      | type  | possible_keys | key        | key_len | ref  | rows   | Extra                            |
+----+-------------+------------+-------+---------------+------------+---------+------+--------+----------------------------------+
|  1 | SIMPLE      | range_test | range | idx_bigint    | idx_bigint | 9       | NULL | 982422 | Using index condition; Using MRR |
+----+-------------+------------+-------+---------------+------------+---------+------+--------+----------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from range_test where 0 <= bigint_col and bigint_col <= 898823;
+----------+
| count(*) |
+----------+
|   450211 |
+----------+
1 row in set (0.18 sec)

mysql> explain select * from range_test where 0 <= bigint_col and bigint_col <= 898824;
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | range_test | ALL  | idx_bigint    | NULL | NULL    | NULL | 4986946 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

mysql> select count(*) from range_test where 0 <= bigint_col and bigint_col <= 898824;
+----------+
| count(*) |
+----------+
|   450212 |
+----------+
1 row in set (0.13 sec)
検索条件に合致するレコード数 全レコード数から占める割合 type Extra
〜 3,990 〜 0.0798% range Using index condition
3,991 〜 450,211 0.0798% 〜 9.00% range Using index condition; Using MRR
450,212 〜 9.00% 〜 ALL Using where

datetime

mysql> explain select * from range_test where '2000-01-01 00:00:00' <= datetime_col and datetime_col <= '2079-01-01 00:00:00';
+----+-------------+------------+-------+---------------+--------------+---------+------+--------+----------------------------------+
| id | select_type | table      | type  | possible_keys | key          | key_len | ref  | rows   | Extra                            |
+----+-------------+------------+-------+---------------+--------------+---------+------+--------+----------------------------------+
|  1 | SIMPLE      | range_test | range | idx_datetime  | idx_datetime | 6       | NULL | 905046 | Using index condition; Using MRR |
+----+-------------+------------+-------+---------------+--------------+---------+------+--------+----------------------------------+
1 row in set (0.01 sec)

mysql> select count(*) from range_test where '2000-01-01 00:00:00' <= datetime_col and datetime_col <= '2079-01-01 00:00:00';
+----------+
| count(*) |
+----------+
|   398998 |
+----------+
1 row in set (0.18 sec)


mysql> explain select * from range_test where '2000-01-01 00:00:00' <= datetime_col and datetime_col <= '2080-01-01 00:00:00';
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | range_test | ALL  | idx_datetime  | NULL | NULL    | NULL | 4986946 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

mysql> select count(*) from range_test where '2000-01-01 00:00:00' <= datetime_col and datetime_col <= '2080-01-01 00:00:00';
+----------+
| count(*) |
+----------+
|   404030 |
+----------+
1 row in set (0.16 sec)
検索条件に合致するレコード数 全レコード数から占める割合 type Extra
〜 398,998 〜 7.98% range Using index condition; Using MRR
404,030 〜 8.08% 〜 ALL Using where

単純な実験ですが、全レコードの約9%より多いレコードを取得しようとすると、EXPLAIN上ではフルテーブルスキャンの戦略が選ばれてしまうようでした。

フルテーブルスキャンの戦略が選ばれたクエリを打ってみても、あまり実行時間に変化がない(0.02 sec前後)ので、本当にフルテーブルスキャンしているのかが気になりました。

途中まで実験してから思ったのですが、MySQLのパラメータをいじったら変化が出るのではないかとかも考えたのですが、追いきれなかったのでまた今度。。。


追記

http://spring-mt.hatenablog.com/entry/2016/02/18/010911

optimizer_traceなんてものがあるのですね…!