索引 - explain详解

  • 作者:KK

  • 发表日期:2017.9.13


多次查询就有多条分析记录

以下LEFT JOIN的查询,实际上有2次查询

mysql> EXPLAIN SELECT * FROM user AS a LEFT JOIN article AS b on b.user_id = a.id WHERE a.id = 1;
+----+-------------+-------+--------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | a     | const  | PRIMARY       | PRIMARY | 4       | const |    1 |       |
|  1 | SIMPLE      | b     | system | user_id       | NULL    | NULL    | NULL  |    1 |       |
+----+-------------+-------+--------+---------------+---------+---------+-------+------+-------+

从table列可以看出表名分别是a和b


select_type列(查询类型)

PRIMARY(主查询)和DEPENDENT SUBQUERY(子查询)

通常会在分析结果中发现select_type的值是SIMPLE(简单的查询),当然它是有其它值的,看看这个带子查询的:

mysql> EXPLAIN SELECT * FROM article WHERE id IN (SELECT id FROM user WHERE id = 1);
+----+--------------------+---------+--------+---------------+---------+---------+------+------+-------------+
| id | select_type        | table   | type   | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+--------------------+---------+--------+---------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | article | system | NULL          | NULL    | NULL    | NULL |    1 |             |
|  2 | DEPENDENT SUBQUERY | user    | const  | PRIMARY       | PRIMARY | 4       |      |    1 | Using index |
+----+--------------------+---------+--------+---------------+---------+---------+------+------+-------------+

子查询语句对user表进行了查询,因此以上结果中可以看到user表的分析记录里select_type是DEPENDENT SUBQUERY,这时候外面的会是PRIMARY,可以解读为主查询的意思


UNION(联合查询)

mysql> EXPLAIN SELECT * FROM article WHERE id = 1 union SELECT * FROM user WHERE id = 1;
+----+--------------+------------+--------+---------------+---------+---------+-------+------+-------+
| id | select_type  | table      | type   | possible_keys | key     | key_len | ref   | rows | Extra |
+----+--------------+------------+--------+---------------+---------+---------+-------+------+-------+
|  1 | PRIMARY      | article    | system | PRIMARY       | NULL    | NULL    | NULL  |    1 |       |
|  2 | UNION        | user       | const  | PRIMARY       | PRIMARY | 4       | const |    1 |       |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL          | NULL    | NULL    | NULL  | NULL |       |
+----+--------------+------------+--------+---------------+---------+---------+-------+------+-------+

左边是主查询(PRIMARY),右边是被联合的查询(UNION),两边查出来后合并成一个临时表(UNION RESULT)

当有子查询或者union的时候,最外面的就是PRIMARY


少见的情况1

mysql> EXPLAIN
    -> SELECT * FROM(
    -> 		(SELECT * FROM article1 WHERE id < 100)
    -> UNION ALL
    -> 		(SELECT * FROM article2 WHERE id >= 1000 <= 2000)
    -> UNION ALL
    -> 		(SELECT * FROM article3 ))
    -> AS TMP_TABLE;
+----+--------------+--------------+--------+---------------+------+---------+------+------+-------------+
| id | select_type  | table        | type   | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+--------------+--------------+--------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY      | <derived2>   | ALL    | NULL          | NULL | NULL    | NULL |    5 |             |
|  2 | DERIVED      | article1     | system | PRIMARY       | NULL | NULL    | NULL |    1 |             |
|  3 | UNION        | article2     | ALL    | NULL          | NULL | NULL    | NULL |    2 | Using where |
|  4 | UNION        | article3     | ALL    | NULL          | NULL | NULL    | NULL |    2 |             |
| NULL | UNION RESULT | <union2,3,4> | ALL    | NULL          | NULL | NULL    | NULL | NULL |             |
+----+--------------+--------------+--------+---------------+------+---------+------+------+-------------+

以上语句是将三个查询语句的结果联合成一个最终的结果(TMP_TABLE这个临时表)

最外面的依然是PRIMARY主查询,而第2、第3、第4其实都是UNION,此时第一个UNION会显示为DERIVED

还有DEPENDENT UNIONDEPENDENT SUBQUERY这两种情况我就不举例子了,其实select_type只是让我们知道“这是一个怎样的查询”,在索引使用情况的分析过程中起的作用并不是很大,大家碰多了就惯了


type列(连接类型)

这一列非常重要,代表了连接类别,有没有使用索引,以下将它的值全部列出来,并且从左到右分别是最好到最坏的值,看type就看是否接近最左边吧:

systemconsteq_refreffulltextref_or_nullindex_mergeunique_subqueryindex_subqueryrangeindexall(全表扫描,这个最讨厌了),在解析它们的含义之前,我先贴上官方文档的英文原文(看不懂或没兴趣的请直接跳过往后面看中文):

The type column of EXPLAIN output describes how tables are joined. In JSON-formatted output, these are found as values of the access_type property. The following list describes the join types, ordered from the best type to the worst:


system

The table has only one row (= system table). This is a special case of the const join type.


const

The table has at most one matching row, which is read at the start of the query. Because there is only one row, values from the column in this row can be regarded as constants by the rest of the optimizer. const tables are very fast because they are read only once.

const is used when you compare all parts of a PRIMARY KEY or UNIQUE index to constant values. In the following queries, tbl_name can be used as a const table:

SELECT * FROM tbl_name WHERE primary_key=1;

SELECT * FROM tbl_name WHERE primary_key_part1=1 AND primary_key_part2=2;


eq_ref

One row is read from this table for each combination of rows from the previous tables. Other than the system and const types, this is the best possible join type. It is used when all parts of an index are used by the join and the index is a PRIMARY KEY or UNIQUE NOT NULL index.

eq_ref can be used for indexed columns that are compared using the = operator. The comparison value can be a constant or an expression that uses columns from tables that are read before this table. In the following examples, MySQL can use an eq_ref join to process ref_table:

SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1;


ref

All rows with matching index values are read from this table for each combination of rows from the previous tables. ref is used if the join uses only a leftmost prefix of the key or if the key is not a PRIMARY KEY or UNIQUE index (in other words, if the join cannot select a single row based on the key value). If the key that is used matches only a few rows, this is a good join type.

ref can be used for indexed columns that are compared using the = or <=> operator. In the following examples, MySQL can use a ref join to process ref_table:

SELECT * FROM ref_table WHERE key_column=expr;

SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1;


fulltext

The join is performed using a FULLTEXT index.


ref_or_null

This join type is like ref, but with the addition that MySQL does an extra search for rows that contain NULL values. This join type optimization is used most often in resolving subqueries. In the following examples, MySQL can use a ref_or_null join to process ref_table:

SELECT * FROM ref_table WHERE key_column=expr OR key_column IS NULL;

See Section 8.2.1.12, “IS NULL Optimization”.


index_merge

This join type indicates that the Index Merge optimization is used. In this case, the key column in the output row contains a list of indexes used, and key_len contains a list of the longest key parts for the indexes used. For more information, see Section 8.2.1.3, “Index Merge Optimization”.


unique_subquery

This type replaces eq_ref for some IN subqueries of the following form:

value IN (SELECT primary_key FROM single_table WHERE some_expr) unique_subquery is just an index lookup function that replaces the subquery completely for better efficiency.


index_subquery

This join type is similar to unique_subquery. It replaces IN subqueries, but it works for nonunique indexes in subqueries of the following form:

value IN (SELECT key_column FROM single_table WHERE some_expr)


range

Only rows that are in a given range are retrieved, using an index to select the rows. The key column in the output row indicates which index is used. The key_len contains the longest key part that was used. The ref column is NULL for this type.

range can be used when a key column is compared to a constant using any of the =, <>, >, >=, <, <=, IS NULL, <=>, BETWEEN, or IN() operators:

SELECT * FROM tbl_name WHERE key_column = 10;

SELECT * FROM tbl_name WHERE key_column BETWEEN 10 and 20;

SELECT * FROM tbl_name WHERE key_column IN (10,20,30);

SELECT * FROM tbl_name WHERE key_part1 = 10 AND key_part2 IN (10,20,30);


index

The index join type is the same as ALL, except that the index tree is scanned. This occurs two ways:

If the index is a covering index for the queries and can be used to satisfy all data required from the table, only the index tree is scanned. In this case, the Extra column says Using index. An index-only scan usually is faster than ALL because the size of the index usually is smaller than the table data.

A full table scan is performed using reads from the index to look up data rows in index order. Uses index does not appear in the Extra column.

MySQL can use this join type when the query uses only columns that are part of a single index.


ALL

A full table scan is done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. Normally, you can avoid ALL by adding indexes that enable row retrieval from the table based on constant values or column values from earlier tables.


我没有能力完整地翻译这些含义,结合着翻译工具和网上的一些文章,当然也有我的实战和额外测试,整理出了以下笔记:


type:system解析

表里面只有一行数据的时候就会这样,而且不管有没有索引都一样,这是一个特殊的const类型(就是指下面马上要讲的const)


type:const解析

当只有一行数据会被匹配的时候就会这样,有的人可能会搞混了,它跟system不同,system是表里只有一条数据,但const是表里可以有很多数据,然而只有1条匹配,比如id是主键,tel是唯一索引,那么id = 333tel = 13800138000那就只能查出一条数据,这就符合const的情况,于是经常就出现在主键(Primary)和唯一(Unique)索引的场合

然而也不是完全这样,比如where id in(1, 2, 3)那匹配的行就不是一行了,于是会变成后面的range类型


type:eq_ref解析

这个建议先看查询情况(下面test和test2的表结构是不同的,但记录条数一样):

mysql> explain select * from test,test2 where test.id = test2.id;
+----+-------------+-------+--------+---------------+---------+---------+--------------+------+-------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref          | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+--------------+------+-------+
|  1 | SIMPLE      | test2 | ALL    | PRIMARY       | NULL    | NULL    | NULL         |    2 |       |
|  1 | SIMPLE      | test  | eq_ref | PRIMARY       | PRIMARY | 4       | abc.test2.id |    1 |       |
+----+-------------+-------+--------+---------------+---------+---------+--------------+------+-------+

第2条分析记录显示test表是eq_ref连接类型,要获取test的哪些数据?那是靠前面的test2表的查询结果中的id来比较咯,如果test哪些的id跟test2的id相等那就取test的行

不过你会发现,怎么explain的结果好像将test2和test的顺序反过来了呀?这个真实原因我也没搞清楚,网上还没找到答案,我自己做了一些测试,在某些情况下它又会按照查询声明的顺序来(声明调换一下它也跟着换),这个留着以后再研究吧,其实对我们常用的查询分析不大影响,解决我们常用的需求再说

而官方对eq_ref的解析其实大概是这样的:

对于每个来自于前面的表的行组合,从该表中读取一行。这可能是最好的联接类型,除了system和const类型。它用在一个索引的所有部分被联接使用并且索引是UNIQUEPRIMARY KEY

没错,并且索引是UNIQUE或PRIMARY KEY这句也很关键,如果order_id不是唯一索引或主键,那么WHERE test.order_id = test2.order_id就不是eq_ref,而是下面的ref


type:ref 解析

先读懂上面的eq_ref,然后这个就好理解:如果比较得到的字段值不是唯一的话(会有多个相同的),那就是ref

英文原文翻译:

对于每个来自于前面的表的行组合,所有有匹配索引值的行将从这张表中读取。如果联接只使用键的最左边的前缀,或如果键不是UNIQUE或PRIMARY KEY(换句话说,如果联接不能基于关键字选择单个行的话),则使用ref。如果使用的键仅仅匹配少量行,该联接类型是不错的


type:fulltext 解析

当使用了全文索引的时候就会这样,没啥好说的


type:ref_or_null 解析

和ref其实是一样的,只是当被使用的索引的字段可以为null的时候,一旦有WHERE phone = '13800138000' OR phone IS NULL这样的条件导致额外搜索了null值的记录,那么连接类型就会是ref_or_null,99%的情况我们都不会允许字段为null,所以这个情况了解下就好,不用太纠结它。但据说它经常用于子查询,而很少写子查询的我就不好体会了,以后再回来说吧


type:index_merge 解析

index merge是使用索引进行优化的重要基础之一。理解了 index merge 技术,我们才知道应该如何在表上建立索引

这就尴尬了,你看这篇文章篇幅已经很长了……而index_merge其实又是一个大课题,不方便在这里塞这么多内容啦,请到《转载收藏 - MySQL 优化之 index merge(索引合并)》中详细参考吧


type:unique_subquery 解析

下面是查询未成年用户的所有发表文章例子:

mysql> EXPLAIN SELECT * FROM article WHERE user_id IN( SELECT id FROM user WHERE age < 18 );
+----+--------------------+-------+----------------+---------------+---------+---------+------+------+--------------------------+
| id | select_type        | table | type           | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+--------------------+-------+----------------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY            | article | index          | NULL        | PRIMARY | 4       | NULL |    2 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | user    | index_subquery | PRIMARY,age | age     | 2       | func |    2 | Using where              |
+----+--------------------+-------+----------------+---------------+---------+---------+------+------+--------------------------+

你应该要注意到子查询返回了一列ID,它的特性是:所有值都是唯一的,这里不会有重复的ID,然后交给外面去WHERE user_id IN $子查询返回的ID结果列

好了这下就好说了,unique_subquery顾名思义,就是返回列的值都是唯一的子查询


type:index_subquery 解析

跟unique_subquery的解析逻辑一模一样,只是子查询返回的值非唯一而已,例如:WHERE age in( SELECT age FROM user WHERE gender = 'girl' )


type:range 解析

当条件中给出了一些指定范围时就会这样,比如:

  1. IN(value1, value2, value3)

  2. age > 10

  3. age > 10 AND age < 18

  4. age BETWEEN 10 AND 18

还有>=、<=什么的就不一一列举了


type:index 解析

这和下面的all一样是扫全表的,不过它是先扫了整个索引,然后再读取记录,由于索引是事先排好顺序的,于是就避免了排序,也比直接扫全表更快速


type:all 解析

全表扫描


possible_keys和key列

就是可能会用到的索引和实际用到的索引,最后肯定还是盯准key了


key_len列

所使用的索引的最大长度,比如一个int字段len就是4字节长度。

有时候会被人理解为数据类型的字节长度,其实不是,既然叫key_len那自然就是索引的长度了,这里还要看它存了什么,比如varchar(20),key_len为63,计算公式:20 * 3 + 1(是否为null的标识) +2(变长字符的额外开销字节)

这其实在我们设计表的时候就注定了索引的时候key_len是多少,除非你要修改表设计,不然是这样就这样了,数字越大说明你的索引扫描开销会更大

如果想深究各种字段加索引后key_len是多少可以上网查“mysql key_len


ref列

表达了是什么东西参与了索引(key列显示的索引)的比较,比如WHERE id = 7,那你自己实现查询的话肯定会写大概这样的代码来查找数据并返回吧:

searchValue = 7
if(row.id == searchVaule){
	return row
}

ref所表达的就是if判断那里,底层肯定存在比较逻辑,比较出符合的值了才会筛选需要的数据,那参与比较的到底是谁呢?一方面是索引,ref指的就是参与索引比较的东西,那另一方面就是我们输入的值了,我们where条件中指出的是id 7,7是一个常量,因此你可以发现这种比较输出的ref列的值是一个const,就是一个常量参与的比较

如果是一个子查询那就会是func,还有其它情况也可能是func的

这一列对分析过程没什么影响,而且我也没能从网上找到什么深度解读的文章进行参考


rows列

系统评估需要扫描的行数,但不是最终真实扫描的行数,只是通常都是准确的


extra列

这个显示其他的一些信息,但对优化sql也非常的重要

常见值一览(网上摘抄):

  1. distinct:一旦mysql找到了与行相联合匹配的行,就不再搜索了

  2. not exists:mysql优化了left join,一旦它找到了匹配left join标准的行,就不再搜索了

  3. range checked for each record(index map:#):没有找到理想的索引,因此对于从前面表中来的每一个行组合,mysql检查使用哪个索引,并用它来从表中返回行。这是使用索引的最慢的连接之一

  4. using filesort:看到这个的时候,查询就需要优化了。mysql需要进行额外的步骤来发现如何对返回的行排序。它根据连接类型以及存储排序键值和匹配条件的全部行的行指针来排序全部行

  5. using index:列数据是从仅仅使用了索引中的信息而没有读取实际的行动的表返回的,这发生在对表的全部的请求列都是同一个索引的部分的时候

  6. using temporary:看到这个的时候,查询需要优化了。这里,mysql需要创建一个临时表来存储结果,这通常发生在对不同的列集进行order by上,而不是group by上

  7. where used:使用了where从句来限制哪些行将与下一张表匹配或者是返回给用户。如果不想返回表中的全部行,并且连接类型all或index,这就会发生,或者是查询有问题不同连接类型的解释(按照效率高低的顺序排序)