结论先行

如果对索引字段函数操作,可能会破坏索引值的有序性,因此优化器会决定放弃树搜索功能

条件字段函数操作

交易日志表

1
2
3
4
5
6
7
8
9
CREATE TABLE `tradelog` (
`id` INT(11) NOT NULL,
`tradeid` VARCHAR(32) DEFAULT NULL,
`operator` INT(11) DEFAULT NULL,
`t_modified` DATETIME DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `tradeid` (`tradeid`),
KEY `t_modified` (`t_modified`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- 94608000 = 3 * 365 * 24 * 3600
-- t_modified : 2016-01-01 00:00:00 ~ 2019-01-01 00:00:00
DELIMITER ;;
CREATE PROCEDURE tdata()
BEGIN
DECLARE i INT;
SET i=0;
WHILE i<1000000 DO
INSERT INTO tradelog VALUES (i,i,i,FROM_UNIXTIME(UNIX_TIMESTAMP('2016-01-01 00:00:00')+FLOOR(0+(RAND()*94608000))));
SET i=i+1;
END WHILE;
END;;
DELIMITER ;

CALL tdata();

month函数

1
SELECT COUNT(*) FROM tradelog WHERE MONTH(t_modified)=7;

explain

1
2
3
4
5
6
7
8
9
10
11
12
13
14
mysql> EXPLAIN SELECT COUNT(*) FROM tradelog WHERE MONTH(t_modified)=7\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tradelog
partitions: NULL
type: index
possible_keys: NULL
key: t_modified
key_len: 6
ref: NULL
rows: 998838
filtered: 100.00
Extra: Using where; Using index
  1. key=t_modified:优化器选择了遍历二级索引t_modified
  2. type=index:表示全索引扫描(二级索引)
  3. rows=998,838≈1,000,000:说明这条语句基本扫描了整个二级索引t_modified
  4. Using index:表示使用了覆盖索引无需回表
  5. 在索引字段t_modified上加上MONTH函数,导致了全索引扫描,无法使用树搜索功能

slowlog

Rows_examined=1,000,000,佐证了全索引扫描

1
2
3
4
5
# Time: 2019-02-12T14:25:07.158350+08:00
# User@Host: root[root] @ localhost [] Id: 13
# Query_time: 0.208787 Lock_time: 0.000162 Rows_sent: 1 Rows_examined: 1000000
SET timestamp=1549952707;
SELECT COUNT(*) FROM tradelog WHERE MONTH(t_modified)=7;

分析

  1. WHERE t_modified='2018-07-01',InnoDB会按照绿色箭头的路线找到结果(树搜索)
    • 这源于B+树的特性:同一层兄弟节点的有序性
  2. WHERE MONTH(t_modified)=7,在树的第一层就不知道如何操作,因此优化器放弃了树搜索功能
    • 优化器可以选择遍历聚簇索引,或者遍历二级索引t_modified
    • 优化器在对比索引大小后发现,二级索引t_modified更小,最终选择了遍历二级索引t_modified

优化方案

1
2
3
4
mysql> SELECT COUNT(*) FROM tradelog WHERE
-> (t_modified >= '2016-7-1' AND t_modified<'2016-8-1') OR
-> (t_modified >= '2017-7-1' AND t_modified<'2017-8-1') OR
-> (t_modified >= '2018-7-1' AND t_modified<'2018-8-1');

explain

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
mysql> EXPLAIN SELECT COUNT(*) FROM tradelog WHERE
-> (t_modified >= '2016-7-1' AND t_modified<'2016-8-1') OR
-> (t_modified >= '2017-7-1' AND t_modified<'2017-8-1') OR
-> (t_modified >= '2018-7-1' AND t_modified<'2018-8-1')\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tradelog
partitions: NULL
type: range
possible_keys: t_modified
key: t_modified
key_len: 6
ref: NULL
rows: 180940
filtered: 100.00
Extra: Using where; Using index
  1. type=range:表示索引范围扫描(二级索引)
  2. rows=180,940 < 998,838,扫描行数远小于上面使用MONTH函数的情况

slowlog

Rows_examined=84,704 < 1,000,000Query_time也仅为使用MONTH函数情况的25%

1
2
3
4
5
# Time: 2019-02-12T14:56:51.727672+08:00
# User@Host: root[root] @ localhost [] Id: 13
# Query_time: 0.051701 Lock_time: 0.000239 Rows_sent: 1 Rows_examined: 84704
SET timestamp=1549954611;
SELECT COUNT(*) FROM tradelog WHERE (t_modified >= '2016-7-1' AND t_modified<'2016-8-1') OR (t_modified >= '2017-7-1' AND t_modified<'2017-8-1') OR (t_modified >= '2018-7-1' AND t_modified<'2018-8-1');

id+1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
mysql> explain select * from tradelog where id+1 = 1000000\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tradelog
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 998838
filtered: 100.00
Extra: Using where

mysql> EXPLAIN SELECT * FROM tradelog WHERE id = 999999\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tradelog
partitions: NULL
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: const
rows: 1
filtered: 100.00
Extra: NULL
  1. 优化器会偷懒,依然认为id+1=1,000,000是应用在索引字段上的函数,因此采用的是全表扫描
  2. id=999,999会走聚簇索引树搜索const表示这是常量操作(最多只会有一行记录匹配)

隐式类型转换

字符串 -> 数字

在MySQL中,如果字符串和数字做比较,会先将字符串转换为数字

1
2
3
4
5
6
mysql> SELECT '10' > 9;
+----------+
| '10' > 9 |
+----------+
| 1 |
+----------+

tradeid

explain

1
2
3
4
5
6
7
8
9
10
11
12
13
14
mysql> EXPLAIN SELECT * FROM tradelog WHERE tradeid=625912\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tradelog
partitions: NULL
type: ALL
possible_keys: tradeid
key: NULL
key_len: NULL
ref: NULL
rows: 998838
filtered: 10.00
Extra: Using where
  1. type=ALL:表示全表扫描
  2. rows=998,838≈1,000,000
  3. 等价于SELECT * FROM tradelog WHERE CAST(tradid AS SIGNED INT)=625912;
    • 隐式的类型转换,导致会在索引字段上做函数操作,优化器会放弃走树搜索的功能

slowlog

Rows_examined依然为1,000,000

1
2
3
4
5
# Time: 2019-02-12T15:30:09.033772+08:00
# User@Host: root[root] @ localhost [] Id: 13
# Query_time: 0.312170 Lock_time: 0.000114 Rows_sent: 1 Rows_examined: 1000000
SET timestamp=1549956609;
SELECT * FROM tradelog WHERE tradeid=625912;

id

explain

1
2
3
4
5
6
7
8
9
10
11
12
13
14
mysql> EXPLAIN SELECT * FROM tradelog WHERE id='625912'\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tradelog
partitions: NULL
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: const
rows: 1
filtered: 100.00
Extra: NULL
  1. type=const:表示常量操作
  2. key=PRIMARY:走聚簇索引的树搜索功能
  3. rows=1:只需要扫描一行
  4. 等价于SELECT * FROM tradelog WHERE id=CAST('625912' AS SIGNED INT);
    • 只是在输入参数上做隐式类型转换,在索引字段上并没有做函数操作,依然可以走聚簇索引的树搜索功能

slowlog

Rows_examined=1,只需要扫描一行

1
2
3
4
5
# Time: 2019-02-12T15:45:38.222760+08:00
# User@Host: root[root] @ localhost [] Id: 13
# Query_time: 0.000476 Lock_time: 0.000210 Rows_sent: 1 Rows_examined: 1
SET timestamp=1549957538;
SELECT * FROM tradelog WHERE id='625912';

隐式字符编码转换

交易详情表

1
2
3
4
5
6
7
8
9
-- tradelog的编码为utf8mb4,trade_detail的编码为utf8
CREATE TABLE `trade_detail` (
`id` INT(11) NOT NULL,
`tradeid` VARCHAR(32) DEFAULT NULL,
`trade_step` INT(11) DEFAULT NULL,
`step_info` VARCHAR(32) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tradeid` (`tradeid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
INSERT INTO tradelog VALUES (1, 'aaaaaaaa', 1000, NOW());
INSERT INTO tradelog VALUES (2, 'aaaaaaab', 1000, NOW());
insert into tradelog VALUES (3, 'aaaaaaac', 1000, NOW());

INSERT INTO trade_detail VALUES (1, 'aaaaaaaa', 1, 'add');
INSERT INTO trade_detail VALUES (2, 'aaaaaaaa', 2, 'update');
INSERT INTO trade_detail VALUES (3, 'aaaaaaaa', 3, 'commit');
INSERT INTO trade_detail VALUES (4, 'aaaaaaab', 1, 'add');
INSERT INTO trade_detail VALUES (5, 'aaaaaaab', 2, 'update');
INSERT INTO trade_detail VALUES (6, 'aaaaaaab', 3, 'update again');
INSERT INTO trade_detail VALUES (7, 'aaaaaaab', 4, 'commit');
INSERT INTO trade_detail VALUES (8, 'aaaaaaac', 1, 'add');
INSERT INTO trade_detail VALUES (9, 'aaaaaaac', 2, 'update');
INSERT INTO trade_detail VALUES (10, 'aaaaaaac', 3, 'update again');
INSERT INTO trade_detail VALUES (11, 'aaaaaaac', 4, 'commit');

函数作用于二级索引

explain

1
2
3
4
5
6
7
mysql> EXPLAIN SELECT d.* FROM tradelog l, trade_detail d WHERE d.tradeid=l.tradeid AND l.id=2;
+----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | l | NULL | const | PRIMARY,tradeid | PRIMARY | 4 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | d | NULL | ALL | NULL | NULL | NULL | NULL | 11 | 100.00 | Using where |
+----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+
  1. tradelog称为驱动表trade_detail称为被驱动表tradeid关联字段。驱动原则:_小表驱动大表_
  2. 优化器会先在tradelog表上查找id=2的行,使用了tradelog的聚簇索引,只扫描了一行,取出tradeid='aaaaaaab'
  3. 然后到trade_detail表上查找tradeid='aaaaaaab'的行,但没有选择二级索引tradeid,而选择了全表扫描
    • type=ALL,不符合预期,本希望走二级索引tradeid的树搜索功能
    • 原因:两个表的字符集不相同
      • tradelog的编码为utf8mb4trade_detail的编码为utf8
      • utf8mb4utf8的超集,详见mysql中utf8和utf8mb4区别
      • d.tradeid=l.tradeid时,需要先utf8字符串转换成utf8mb4字符串
      • 因此,被驱动表trade_detail里面的tradeid字段需要先转换成utf8mb4类型,再跟L2进行比较
    • 等价于SELECT * FROM trade_detail WHERE CONVERT(traideid USING utf8mb4)=$L2.tradeid.value;
      • 隐式的字符编码转换,导致会在二级索引tradeid上做函数操作,优化器会放弃走树搜索的功能

slowlog

1
2
3
4
5
# Time: 2019-02-12T16:45:14.841502+08:00
# User@Host: root[root] @ localhost [] Id: 13
# Query_time: 0.000470 Lock_time: 0.000202 Rows_sent: 4 Rows_examined: 11
SET timestamp=1549961114;
SELECT d.* FROM tradelog l, trade_detail d WHERE d.tradeid=l.tradeid AND l.id=2;

函数作用于输入参数

explain

1
2
3
4
5
6
7
mysql> EXPLAIN SELECT l.* FROM tradelog l, trade_detail d WHERE d.tradeid=l.tradeid AND d.id=4;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | d | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | l | NULL | ref | tradeid | tradeid | 131 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
  1. trade_detail称为驱动表tradelog称为被驱动表tradeid关联字段
  2. 被驱动表tradelog的编码为utf8mb4,驱动表trade_detail的编码为utf8
    • 等价于SELECT * FROM tradelog WHERE traideid = CONVERT($R4.tradeid.value USING utf8mb4);
    • 函数是用在输入参数上的,并非二级索引tradeid上,因此可以用树搜索功能(key=tradeidrows=1
    • type=refJoin语句中被驱动表索引引用的查询

slowlog

1
2
3
4
5
# Time: 2019-02-12T17:31:50.553151+08:00
# User@Host: root[root] @ localhost [] Id: 13
# Query_time: 0.004090 Lock_time: 0.001874 Rows_sent: 1 Rows_examined: 1
SET timestamp=1549963910;
SELECT l.* FROM tradelog l, trade_detail d WHERE d.tradeid=l.tradeid AND d.id=4;

优化方案

  1. 常用:将trade_detail.tradeid的字符串编码修改为utf8mb4
    • ALTER TABLE trade_detail MODIFY tradeid VARCHAR(32) CHARACTER SET utf8mb4 DEFAULT NULL;
  2. 修改SQL(场景:数据量较大或暂不支持该DDL)
    • 主动把l.tradeid转换为utf8,避免了被驱动表上的隐式字符编码转换
    • SELECT d.* FROM tradelog l, trade_detail d WHERE d.tradeid=CONVERT(l.tradeid USING utf8) AND l.id=2;
1
2
3
4
5
6
7
mysql> EXPLAIN SELECT d.* FROM tradelog l, trade_detail d WHERE d.tradeid=CONVERT(l.tradeid USING utf8) AND l.id=2;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | l | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | d | NULL | ref | tradeid | tradeid | 99 | const | 4 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
1
2
3
4
5
# Time: 2019-02-12T17:50:29.844772+08:00
# User@Host: root[root] @ localhost [] Id: 13
# Query_time: 0.000504 Lock_time: 0.000206 Rows_sent: 4 Rows_examined: 4
SET timestamp=1549965029;
SELECT d.* FROM tradelog l, trade_detail d WHERE d.tradeid=CONVERT(l.tradeid USING utf8) AND l.id=2;

参考资料

《MySQL实战45讲》