【GreatSQL优化器-05】条件过滤condition_fanout_filter

【GreatSQL优化器-05】条件过滤condition_fanout_filter

一、condition_fanout_filter概述

GreatSQL优化器在处理连接(join)操作时,需要基于行数和成本(cost)来决定表的执行顺序。这一过程中,预估满足特定条件的数据行数至关重要。condition_fanout_filter功能便是基于一系列算法估算出一个数据过滤比例,即filtered系数,其取值范围在[0,1]之间,数值越小表示过滤效果越佳。将此系数乘以表的总行数,即可得到实际需要扫描的行数,从而显著降低开销和执行时长。

该特性由OPTIMIZER_SWITCH_COND_FANOUT_FILTER优化器开关控制,默认状态下为启用。在大多数情况下,无需手动关闭此功能,但若遇到执行效率低下的情况,可以考虑关闭它。

以下是一个简单的示例,用以阐释condition_fanout_filter的工作原理:

CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT, date1 DATETIME);
INSERT INTO t1 VALUES (1,10,'2021-03-25 16:44:00.123456'),(2,1,'2022-03-26 16:44:00.123456'),(3,4,'2023-03-27 16:44:00.123456'),(5,5,'2024-03-25 16:44:00.123456');
CREATE TABLE t2 (cc1 INT PRIMARY KEY, cc2 INT);
INSERT INTO t2 VALUES (1,3),(2,1),(3,2),(4,3),(5,15);
# 创建一个无主键的表用于过滤估计。
CREATE TABLE t3 (ccc1 INT, ccc2 varchar(100));
INSERT INTO t3 VALUES (1,'aa1'),(2,'bb1'),(3,'cc1'),(4,'dd1'),(null,'ee');
CREATE INDEX idx1 ON t1(c2);
CREATE INDEX idx2 ON t1(c2,date1);
CREATE INDEX idx2_1 ON t2(cc2);
CREATE INDEX idx3_1 ON t3(ccc1);

观察到t3的过滤百分比为46.66%,这意味着在两张表的连接操作中,原本需要处理20行数据,由于t3的过滤百分比为46.66%,最终仅需读取9行数据(20*46.66%)。

注意:此处未构建直方图,因此结果中未包含直方图过滤因素。直方图的相关内容将在后续章节中详细讲解。

greatsql> EXPLAIN SELECT * FROM t1 JOIN t3 ON t1.c1=t3.ccc1 OR t3.ccc1 <3;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | index | PRIMARY       | idx2 | 11      | NULL |    4 |   100.00 | Using index                                |
|  1 | SIMPLE      | t3    | NULL       | ALL   | idx3_1        | NULL | NULL    | NULL |    5 |    46.66 | Using where; Using join buffer (hash join) |
# 显示t3的过滤数据百分比
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+

greatsql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t3 ON t1.c1=t3.ccc1 OR t3.ccc1<3;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                       |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: ((t3.ccc1 = t1.c1) or (t3.ccc1 < 3))  (cost=3.65 rows=9)
        -> Inner hash join (no condition)  (cost=3.65 rows=9) # 结果显示读取9行数据,与上文计算一致
            -> Table scan on t3  (cost=0.12 rows=5)
            -> Hash
                -> Index scan on t1 using idx2  (cost=1.40 rows=4)
     |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

二、best_access_path代码解析

condition_fanout_filter的计算逻辑封装在

相关文章

暂无评论

暂无评论...