MySQLの範囲検索でレコードが多すぎるとフルテーブルスキャンになってしまう現象
created_atが何月何日から何月何日
のレコードをSELECTする場合に、ある量を超えるとEXPLAIN上ではフルテーブルスキャンの判定になってしまったのでどうしてなのか実験してみました。
実験は次のような環境で行いました。
OS: OS X El Capitan 10.11.3 CPU: 1.2 GHz Intel Core M Memory: 8GB 1600 MHz DDR3 $ mysql --version mysql Ver 14.14 Distrib 5.6.28, for osx10.11 (x86_64) using EditLine wrapper
準備
まず、range_test
という名前のテーブルを作ります。
int型、bigint型、datetime型のカラムを作成して、それぞれ比較してみました。
カラム名 | 型 | 取りうる値の範囲 |
---|---|---|
id | bigint unsigned (PK) | 1 〜 5,000,000 |
int_col | int | 0 〜 10,000 |
bigint_col | bigint | 1 〜 9,999,999 |
datetime_col | datetime | 2000-01-01 00:00:00 〜3000-01-08 00:00:00 |
インデックスが効くかどうかの実験なので、それぞれのカラムにインデックスを貼っておきます。
mysql> create database db_range_test; Query OK, 1 row affected (0.03 sec) mysql> use db_range_test; Database changed mysql> create table range_test ( -> id bigint unsigned not null auto_increment, -> int_col int, -> bigint_col bigint, -> datetime_col datetime, -> primary key (id), -> key idx_int (int_col), -> key idx_bigint (bigint_col), -> key idx_datetime (datetime_col) -> ) engine=InnoDB default charset utf8; Query OK, 0 rows affected (0.11 sec)
こんな感じのスクリプトを書いて、MySQLに5,000,000件のダミーデータをINSERTします。
require 'sequel' require 'mysql2' require 'active_support' require 'active_support/core_ext' DB = Sequel.connect('mysql2://root@localhost/db_range_test') (1..5_000_000).each do |i| int_r = rand(0..10_000) big_r = rand(0..10_000_000) dat_r = rand(0..1_000) DB[:range_test].insert( int_col: int_r, bigint_col: big_r, datetime_col: DateTime.new(2000, 1, 1).since(dat_r.years) ) end
5,000,000件入りました。 SELECT COUNT(*)に5秒弱かかるとは。。。
mysql> select count(*) from range_test; +----------+ | count(*) | +----------+ | 5000000 | +----------+ 1 row in set (4.96 sec) mysql> select * from range_test limit 10; +----+---------+------------+---------------------+ | id | int_col | bigint_col | datetime_col | +----+---------+------------+---------------------+ | 1 | 7521 | 8840230 | 2988-01-08 00:00:00 | | 2 | 8951 | 617956 | 2454-01-03 12:00:00 | | 3 | 4677 | 4115021 | 2111-01-01 18:00:00 | | 4 | 8235 | 7919478 | 2906-01-07 12:00:00 | | 5 | 5710 | 2487144 | 2958-01-07 12:00:00 | | 6 | 9313 | 3383 | 2805-01-06 06:00:00 | | 7 | 3602 | 9241458 | 2336-01-04 00:00:00 | | 8 | 8950 | 8957220 | 2970-01-07 12:00:00 | | 9 | 9052 | 731523 | 2865-01-06 06:00:00 | | 10 | 5988 | 5204737 | 2121-01-01 06:00:00 | +----+---------+------------+---------------------+ 10 rows in set (0.00 sec)
実験
int
mysql> explain select * from range_test where 0 <= int_col and int_col <= 7; +----+-------------+------------+-------+---------------+---------+---------+------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+---------------+---------+---------+------+------+-----------------------+ | 1 | SIMPLE | range_test | range | idx_int | idx_int | 5 | NULL | 3849 | Using index condition | +----+-------------+------------+-------+---------------+---------+---------+------+------+-----------------------+ 1 row in set (0.01 sec) mysql> select count(*) from range_test where 0 <= int_col and int_col <= 7; +----------+ | count(*) | +----------+ | 3850 | +----------+ 1 row in set (0.00 sec) mysql> explain select * from range_test where 0 <= int_col and int_col <= 8; +----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------+ | 1 | SIMPLE | range_test | range | idx_int | idx_int | 5 | NULL | 4355 | Using index condition; Using MRR | +----+-------------+------------+-------+---------------+---------+---------+------+------+----------------------------------+ 1 row in set (0.00 sec) mysql> select count(*) from range_test where 0 <= int_col and int_col <= 8; +----------+ | count(*) | +----------+ | 4356 | +----------+ 1 row in set (0.01 sec) mysql> explain select * from range_test where 0 <= int_col and int_col <= 1009; +----+-------------+------------+-------+---------------+---------+---------+------+--------+----------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+---------------+---------+---------+------+--------+----------------------------------+ | 1 | SIMPLE | range_test | range | idx_int | idx_int | 5 | NULL | 929646 | Using index condition; Using MRR | +----+-------------+------------+-------+---------------+---------+---------+------+--------+----------------------------------+ 1 row in set (0.00 sec) mysql> select count(*) from range_test where 0 <= int_col and int_col <= 1009; +----------+ | count(*) | +----------+ | 505016 | +----------+ 1 row in set (0.20 sec) mysql> explain select * from range_test where 0 <= int_col and int_col <= 1010; +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | range_test | ALL | idx_int | NULL | NULL | NULL | 4986946 | Using where | +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+ 1 row in set (0.00 sec) mysql> select count(*) from range_test where 0 <= int_col and int_col <= 1010; +----------+ | count(*) | +----------+ | 505547 | +----------+ 1 row in set (0.15 sec)
途中でExtraにUsing MRR
というものが出るようになったのですが、どうやらMySQL 5.6から入ったMulti-Range Readというものがあるらしい。
MySQL :: MySQL 5.6 リファレンスマニュアル :: 8.2.1.13 Multi-Range Read の最適化
検索条件に合致するレコード数 | 全レコード数から占める割合 | type | Extra |
---|---|---|---|
〜 3,850 | 〜 0.0770% | range | Using index condition |
4,356 〜 505,016 | 0.0871%〜10.1% | range | Using index condition; Using MRR |
505,547 〜 | 10.1% 〜 | ALL | Using where |
bitint
mysql> explain select * from range_test where 0 <= bigint_col and bigint_col <= 7843; +----+-------------+------------+-------+---------------+------------+---------+------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+---------------+------------+---------+------+------+-----------------------+ | 1 | SIMPLE | range_test | range | idx_bigint | idx_bigint | 9 | NULL | 3989 | Using index condition | +----+-------------+------------+-------+---------------+------------+---------+------+------+-----------------------+ 1 row in set (0.00 sec) mysql> select count(*) from range_test where 0 <= bigint_col and bigint_col <= 7843; +----------+ | count(*) | +----------+ | 3990 | +----------+ 1 row in set (0.00 sec) mysql> explain select * from range_test where 0 <= bigint_col and bigint_col <= 7844; +----+-------------+------------+-------+---------------+------------+---------+------+------+----------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+---------------+------------+---------+------+------+----------------------------------+ | 1 | SIMPLE | range_test | range | idx_bigint | idx_bigint | 9 | NULL | 3990 | Using index condition; Using MRR | +----+-------------+------------+-------+---------------+------------+---------+------+------+----------------------------------+ 1 row in set (0.00 sec) mysql> select count(*) from range_test where 0 <= bigint_col and bigint_col <= 7844; +----------+ | count(*) | +----------+ | 3991 | +----------+ 1 row in set (0.00 sec) mysql> explain select * from range_test where 0 <= bigint_col and bigint_col <= 898823; +----+-------------+------------+-------+---------------+------------+---------+------+--------+----------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+---------------+------------+---------+------+--------+----------------------------------+ | 1 | SIMPLE | range_test | range | idx_bigint | idx_bigint | 9 | NULL | 982422 | Using index condition; Using MRR | +----+-------------+------------+-------+---------------+------------+---------+------+--------+----------------------------------+ 1 row in set (0.00 sec) mysql> select count(*) from range_test where 0 <= bigint_col and bigint_col <= 898823; +----------+ | count(*) | +----------+ | 450211 | +----------+ 1 row in set (0.18 sec) mysql> explain select * from range_test where 0 <= bigint_col and bigint_col <= 898824; +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | range_test | ALL | idx_bigint | NULL | NULL | NULL | 4986946 | Using where | +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+ 1 row in set (0.00 sec) mysql> select count(*) from range_test where 0 <= bigint_col and bigint_col <= 898824; +----------+ | count(*) | +----------+ | 450212 | +----------+ 1 row in set (0.13 sec)
検索条件に合致するレコード数 | 全レコード数から占める割合 | type | Extra |
---|---|---|---|
〜 3,990 | 〜 0.0798% | range | Using index condition |
3,991 〜 450,211 | 0.0798% 〜 9.00% | range | Using index condition; Using MRR |
450,212 〜 | 9.00% 〜 | ALL | Using where |
datetime
mysql> explain select * from range_test where '2000-01-01 00:00:00' <= datetime_col and datetime_col <= '2079-01-01 00:00:00'; +----+-------------+------------+-------+---------------+--------------+---------+------+--------+----------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+---------------+--------------+---------+------+--------+----------------------------------+ | 1 | SIMPLE | range_test | range | idx_datetime | idx_datetime | 6 | NULL | 905046 | Using index condition; Using MRR | +----+-------------+------------+-------+---------------+--------------+---------+------+--------+----------------------------------+ 1 row in set (0.01 sec) mysql> select count(*) from range_test where '2000-01-01 00:00:00' <= datetime_col and datetime_col <= '2079-01-01 00:00:00'; +----------+ | count(*) | +----------+ | 398998 | +----------+ 1 row in set (0.18 sec) mysql> explain select * from range_test where '2000-01-01 00:00:00' <= datetime_col and datetime_col <= '2080-01-01 00:00:00'; +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | range_test | ALL | idx_datetime | NULL | NULL | NULL | 4986946 | Using where | +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+ 1 row in set (0.00 sec) mysql> select count(*) from range_test where '2000-01-01 00:00:00' <= datetime_col and datetime_col <= '2080-01-01 00:00:00'; +----------+ | count(*) | +----------+ | 404030 | +----------+ 1 row in set (0.16 sec)
検索条件に合致するレコード数 | 全レコード数から占める割合 | type | Extra |
---|---|---|---|
〜 398,998 | 〜 7.98% | range | Using index condition; Using MRR |
404,030 〜 | 8.08% 〜 | ALL | Using where |
単純な実験ですが、全レコードの約9%より多いレコードを取得しようとすると、EXPLAIN上ではフルテーブルスキャンの戦略が選ばれてしまうようでした。
フルテーブルスキャンの戦略が選ばれたクエリを打ってみても、あまり実行時間に変化がない(0.02 sec前後)ので、本当にフルテーブルスキャンしているのかが気になりました。
途中まで実験してから思ったのですが、MySQLのパラメータをいじったら変化が出るのではないかとかも考えたのですが、追いきれなかったのでまた今度。。。
追記
http://spring-mt.hatenablog.com/entry/2016/02/18/010911
optimizer_traceなんてものがあるのですね…!