本文共 16651 字,大约阅读时间需要 55 分钟。
主要环境如下:
SQL> select * from v$version;
BANNER
-------------------------------------------------------------------------------Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
4节点 HPUX RAC OLAP 环境
SQL> show parameter db_block_size
NAME TYPE VALUE
------------------------------------ --------------------------------- ------ db_block_size integer 16384 SQL> show parameter db_fileNAME TYPE VALUE
------------------------------------ --------------------------------- ------ db_file_multiblock_read_count integer 64 ETL 开发人员发来邮件说有个long running job,跑了2小时左右 还未完成 叫我check一下SQL> select * from table(dbms_xplan.display_cursor('gh1hw18uz6dcm',0));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------SQL_ID gh1hw18uz6dcm, child number 0
------------------------------------- create table OPT_REF_BASE_UOM_TEMP_SDIM parallel 2 TABLESPACE OPTIMA01M nologging as SELECT PROD_SKID, RELTV_CURR_QTY, STAT_CURR_VAL, BAR_CURR_CODE FROM OPT_REF_BASE_UOM_DIM_VWPlan hash value: 2933813170
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------ | 0 | CREATE TABLE STATEMENT | | | | 8883 (100)| | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 54 | 2916 | 8882 (1)| 00:01:02 | Q1,01 | P->S | QC (RAND) | | 3 | LOAD AS SELECT | | | | | | Q1,01 | PCWP | | | 4 | HASH GROUP BY | | 54 | 2916 | 8882 (1)| 00:01:02 | Q1,01 | PCWP | | | 5 | PX RECEIVE | | 54 | 2916 | 8882 (1)| 00:01:02 | Q1,01 | PCWP | | | 6 | PX SEND HASH | :TQ10000 | 54 | 2916 | 8882 (1)| 00:01:02 | Q1,00 | P->P | HASH | | 7 | HASH GROUP BY | | 54 | 2916 | 8882 (1)| 00:01:02 | Q1,00 | PCWP | | | 8 | NESTED LOOPS | | | | | | Q1,00 | PCWP | | | 9 | NESTED LOOPS | | 3134 | 165K| 8881 (1)| 00:01:02 | Q1,00 | PCWP | | | 10 | PX BLOCK ITERATOR | | | | | | Q1,00 | PCWC | | |* 11 | TABLE ACCESS FULL | OPT_REF_UOM_TEMP_SDIM | 3065 | 104K| 355 (13)| 00:00:03 | Q1,00 | PCWP | | |* 12 | INDEX RANGE SCAN | PROD_DIM_PK | 3 | | 1 (0)| 00:00:01 | Q1,00 | PCWP | | |* 13 | TABLE ACCESS BY INDEX ROWID| PROD_DIM | 1 | 19 | 3 (0)| 00:00:01 | Q1,00 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------11 - access(:Z>=:Z AND :Z<=:Z)
filter("UOM"."RELTV_CURR_QTY"=1) 12 - access("PROD"."PROD_SKID"="UOM"."PROD_SKID") 13 - filter(("PROD"."BUOM_CURR_SKID" IS NOT NULL AND "PROD"."PROD_END_DATE"=TO_DATE(' 9999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "PROD"."CURR_IND"='Y' AND "PROD"."BUOM_CURR_SKID"="UOM"."UOM_SKID")) 36 rows selected.那么这个JOB 很简单,就是一个 create table xxx as select
上面的执行计划是有问题的,执行计划里面显示有PX 操作,其实我没发现有并行,不过上面的执行计划访问路径是对的
关于执行计划显示错误,这里不予讨论,11G上面有很多BUG 我们已经遇到5个了,另外也不要问我为什么升级到11gR1那么实际上要执行的SQL就是
create table OPT_REF_BASE_UOM_TEMP_SDIM parallel 2 TABLESPACE OPTIMA01M nologging as SELECT PROD_SKID, RELTV_CURR_QTY, STAT_CURR_VAL, BAR_CURR_CODE FROM OPT_REF_BASE_UOM_DIM_VWOPT_REF_BASE_UOM_DIM_VW是一个视图,该视图定义如下
SELECT UOM.PROD_SKID,
MAX (UOM.RELTV_CURR_QTY) RELTV_CURR_QTY, MAX (UOM.STAT_CURR_VAL) STAT_CURR_VAL, MAX (UOM.BAR_CURR_CODE) BAR_CURR_CODE --modified by Daniel on 10/26/2010 for CR 326285 of QMR10.1 hypercare retrofit begin FROM OPT_REF_UOM_TEMP_SDIM UOM, REF_PROD_DIM PROD WHERE UOM.RELTV_CURR_QTY = 1 AND PROD.CURR_IND = 'Y' AND PROD.PROD_END_DATE = TO_DATE ('31-12-9999', 'dd-mm-yyyy') AND PROD.PROD_SKID = UOM.PROD_SKID AND PROD.BUOM_CURR_SKID = UOM.UOM_SKID --modified by Daniel on 10/26/2010 for CR 326285 of QMR10.1 hypercare retrofit end GROUP BY UOM.PROD_SKID那么现在 这个视图的查询效率就直接决定了该JOB的效率,我们 现在来看这个视图的执行计划
SQL> explain plan for SELECT UOM.PROD_SKID,
2 MAX (UOM.RELTV_CURR_QTY) RELTV_CURR_QTY, 3 MAX (UOM.STAT_CURR_VAL) STAT_CURR_VAL, 4 MAX (UOM.BAR_CURR_CODE) BAR_CURR_CODE 5 --modified by Daniel on 10/26/2010 for CR 326285 of QMR10.1 hypercare retrofit begin 6 FROM OPT_REF_UOM_TEMP_SDIM UOM, 7 REF_PROD_DIM PROD 8 WHERE UOM.RELTV_CURR_QTY = 1 9 AND PROD.CURR_IND = 'Y' 10 AND PROD.PROD_END_DATE = TO_DATE ('31-12-9999', 'dd-mm-yyyy') 11 AND PROD.PROD_SKID = UOM.PROD_SKID 12 AND PROD.BUOM_CURR_SKID = UOM.UOM_SKID 13 --modified by Daniel on 10/26/2010 for CR 326285 of QMR10.1 hypercare retrofit end 14 GROUP BY UOM.PROD_SKID;Explained.
Elapsed: 00:00:01.29
SQL> select * from table(dbms_xplan.display);PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------Plan hash value: 3215660883
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 78 | 4212 | 15507 (1)| 00:01:47 | | 1 | HASH GROUP BY | | 78 | 4212 | 15507 (1)| 00:01:47 | | 2 | NESTED LOOPS | | | | | | | 3 | NESTED LOOPS | | 3034 | 159K| 15506 (1)| 00:01:47 | |* 4 | TABLE ACCESS FULL | OPT_REF_UOM_TEMP_SDIM | 2967 | 101K| 650 (14)| 00:00:05 | |* 5 | INDEX RANGE SCAN | PROD_DIM_PK | 3 | | 2 (0)| 00:00:01 | |* 6 | TABLE ACCESS BY INDEX ROWID| PROD_DIM | 1 | 19 | 5 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------4 - filter("UOM"."RELTV_CURR_QTY"=1)
5 - access("PROD"."PROD_SKID"="UOM"."PROD_SKID") 6 - filter("PROD"."BUOM_CURR_SKID" IS NOT NULL AND "PROD"."PROD_END_DATE"=TO_DATE(' 9999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "PROD"."CURR_IND"='Y' AND "PROD"."BUOM_CURR_SKID"="UOM"."UOM_SKID")22 rows selected.
Elapsed: 00:00:02.06
大家请看执行计划 中ID=3的步骤,cost从 650一下子飙升到15506,根据我的SQL调优经验,这是非常不能理解的。
我怀疑CBO选出了执行计划。那么这里OPT_REF_UOM_TEMP_SDIM 作为驱动表,Oracle先对它做一次全表扫描,然后应用过滤条件
WHERE UOM.RELTV_CURR_QTY = 1 所得到的数据作为驱动行源(你可以看 filter("UOM"."RELTV_CURR_QTY"=1)) 那么这里 CBO计算出 根据 filter("UOM"."RELTV_CURR_QTY"=1) 过滤后 会返回2967条数据 SQL> select count(*) from OPT_REF_UOM_TEMP_SDIM;COUNT(*)
---------- 2137706Elapsed: 00:00:08.87
SQL> select count(*) from OPT_REF_UOM_TEMP_SDIM where "RELTV_CURR_QTY"=1;COUNT(*)
---------- 946432Elapsed: 00:00:01.54
SQL> select 946432/2137706 from dual;946432/2137706
-------------- .442732537Elapsed: 00:00:02.04
根据我下面的查询,那么很明显CBO计算错误,filter("UOM"."RELTV_CURR_QTY"=1) 过滤之后实际上会返回946432条数据
然而 CBO认为只返回 2967条数据,所以这里不应该走嵌套循环,应该走 hash。 为什么不应该走嵌套循环?原因在于SQL> select 946432/2137706 from dual;
946432/2137706
-------------- .442732537 全表扫描然后过滤之后 会返回44%的数据, 什么时候应该选择嵌套循环 ,通常情况下驱动表(行源) 应该返回 少量数据,而且被驱动表上面有选择性很高的索引才选择 嵌套循环。也就是说通常情况下驱动表应该返回不超过 源表10%的数据,而这里居然返回44%的数据,明显打错特错了。我自己测试了一下,如果走嵌套循环要花 2小时40分钟
867176 rows selected.
Elapsed: 02:39:58.33
Execution Plan
---------------------------------------------------------- Plan hash value: 3215660883 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 108 | 15996 (1)| 00:01:50 | | 1 | HASH GROUP BY | | 2 | 108 | 15996 (1)| 00:01:50 | | 2 | NESTED LOOPS | | | | | | | 3 | NESTED LOOPS | | 3134 | 165K| 15995 (1)| 00:01:50 | |* 4 | TABLE ACCESS FULL | OPT_REF_UOM_TEMP_SDIM | 3065 | 104K| 649 (14)| 00:00:05 | |* 5 | INDEX RANGE SCAN | PROD_DIM_PK | 3 | | 2 (0)| 00:00:01 | |* 6 | TABLE ACCESS BY INDEX ROWID| PROD_DIM | 1 | 19 | 5 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter("UOM"."RELTV_CURR_QTY"=1) 5 - access("PROD"."PROD_SKID"="UOM"."PROD_SKID") 6 - filter("PROD"."BUOM_CURR_SKID" IS NOT NULL AND "PROD"."PROD_END_DATE"=TO_DATE(' 9999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "PROD"."CURR_IND"='Y' AND "PROD"."BUOM_CURR_SKID"="UOM"."UOM_SKID") 注意,这个嵌套循环的数据和上面的嵌套循环的数据有点出入,原因是数据有变动,统计信息有变动,不过不影响案例如果走 HASH 连接 ,只要 8 分钟左右,怎么让它走hash? 这里我给大家列举3中方法
1 使用hint full 强制PROD_DIM 走全表扫描
2 使用hint use_hash 强制2表做hash连接 3 使用hint leading 强制 PROD_DIM 作为驱动表867176 rows selected.
Elapsed: 00:07:52.33
Execution Plan
---------------------------------------------------------- Plan hash value: 612020119 ----------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 60 | 3240 | | 44136 (5)| 00:05:04 | | 1 | HASH GROUP BY | | 60 | 3240 | | 44136 (5)| 00:05:04 | |* 2 | HASH JOIN | | 3065 | 161K| 29M| 44135 (5)| 00:05:04 | |* 3 | TABLE ACCESS FULL| PROD_DIM | 998K| 18M| | 43022 (5)| 00:04:56 | |* 4 | TABLE ACCESS FULL| OPT_REF_UOM_TEMP_SDIM | 3065 | 104K| | 649 (14)| 00:00:05 | ----------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("PROD"."PROD_SKID"="UOM"."PROD_SKID" AND "PROD"."BUOM_CURR_SKID"="UOM"."UOM_SKID") 3 - filter("PROD"."BUOM_CURR_SKID" IS NOT NULL AND "PROD"."PROD_END_DATE"=TO_DATE(' 9999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "PROD"."CURR_IND"='Y') 4 - filter("UOM"."RELTV_CURR_QTY"=1)那么到这里 还没完,我们不仅仅要对SQL做优化,还要搞清楚为什么CBO选择错了执行计划,根据上面的分析,
CBO选择错误的执行计划的原因在于 它认为 filter("UOM"."RELTV_CURR_QTY"=1) 只返回2967 条数据,所以 为题出现在统计信息 上面,而且是列统计信息上面 那么我们来查询一些 列的统计信息SQL> select a.owner ||'.'||a.table_name name ,a.column_name,b.num_rows,a.num_distinct Cardinality,a.num_distinct/b.num_rows selectivity,
2 num_nulls,density,a.histogram,a.num_buckets from dba_tab_col_statistics a,dba_tables b where a.owner=b.owner and a.table_name=b.table_name 3 and a.owner=upper('adwu_optima_we11') and a.table_name=upper('OPT_REF_UOM_TEMP_SDIM') and a.column_name=upper('RELTV_CURR_QTY');NAME COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY NUM_NULLS DENSITY HISTOGRAM
---------------------------------------- -------------------- ---------- ----------- ----------- ---------- ---------- -------------------------- ADWU_OPTIMA_WE11.OPT_REF_UOM_TEMP_SDIM RELTV_CURR_QTY 2160000 728 .000337037 0 .001373626 NONE大家请看,列上面一共只有728个基数(唯一值),然而表有200多万的数据,并且没有对列收集过直方图统计
那么我现在对该列收集直方图SQL> BEGIN
2 DBMS_STATS.GATHER_TABLE_STATS(ownname => 'adwu_optima_we11', 3 tabname => 'OPT_REF_UOM_TEMP_SDIM', 4 estimate_percent => 100, 5 method_opt => 'for columns RELTV_CURR_QTY size 200', 6 degree => DBMS_STATS.AUTO_DEGREE, 7 cascade=>TRUE 8 ); 9 END; 10 /PL/SQL procedure successfully completed.
我再来查看一下该列的统计信息,这里基数上升到2110,并且直方图也收集了
SQL> select a.owner ||'.'||a.table_name name ,a.column_name,b.num_rows,a.num_distinct Cardinality,a.num_distinct/b.num_rows selectivity,
2 num_nulls,density,a.histogram,a.num_buckets from dba_tab_col_statistics a,dba_tables b where a.owner=b.owner and a.table_name=b.table_name 3 and a.owner=upper('adwu_optima_we11') and a.table_name=upper('OPT_REF_UOM_TEMP_SDIM') and a.column_name=upper('RELTV_CURR_QTY');NAME COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY NUM_NULLS DENSITY HISTOGRAM
---------------------------------------- -------------------- ---------- ----------- ----------- ---------- ---------- ---------------------------- ADWU_OPTIMA_WE11.OPT_REF_UOM_TEMP_SDIM RELTV_CURR_QTY 2137706 2110 .000987039 0 .00217122 HEIGHT BALANCED我们再来查看一下执行计划
SQL> explain plan for SELECT UOM.PROD_SKID,
2 MAX (UOM.RELTV_CURR_QTY) RELTV_CURR_QTY, 3 MAX (UOM.STAT_CURR_VAL) STAT_CURR_VAL, 4 MAX (UOM.BAR_CURR_CODE) BAR_CURR_CODE 5 --modified by Daniel on 10/26/2010 for CR 326285 of QMR10.1 hypercare retrofit begin 6 FROM OPT_REF_UOM_TEMP_SDIM UOM, 7 REF_PROD_DIM PROD 8 WHERE UOM.RELTV_CURR_QTY = 1 9 AND PROD.CURR_IND = 'Y' 10 AND PROD.PROD_END_DATE = TO_DATE ('31-12-9999', 'dd-mm-yyyy') 11 AND PROD.PROD_SKID = UOM.PROD_SKID 12 AND PROD.BUOM_CURR_SKID = UOM.UOM_SKID 13 --modified by Daniel on 10/26/2010 for CR 326285 of QMR10.1 hypercare retrofit end 14 GROUP BY UOM.PROD_SKID;Explained.
Elapsed: 00:00:00.82
这回执行计划就走对了
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------Plan hash value: 612020119
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 12097 | 637K| | 44911 (5)| 00:05:09 | | 1 | HASH GROUP BY | | 12097 | 637K| | 44911 (5)| 00:05:09 | |* 2 | HASH JOIN | | 951K| 48M| 29M| 44799 (5)| 00:05:08 | |* 3 | TABLE ACCESS FULL| PROD_DIM | 998K| 18M| | 43022 (5)| 00:04:56 | |* 4 | TABLE ACCESS FULL| OPT_REF_UOM_TEMP_SDIM | 951K| 31M| | 654 (15)| 00:00:05 | -----------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------2 - access("PROD"."PROD_SKID"="UOM"."PROD_SKID" AND
"PROD"."BUOM_CURR_SKID"="UOM"."UOM_SKID") 3 - filter("PROD"."BUOM_CURR_SKID" IS NOT NULL AND "PROD"."PROD_END_DATE"=TO_DATE(' 9999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "PROD"."CURR_IND"='Y') 4 - filter("UOM"."RELTV_CURR_QTY"=1)20 rows selected.
也许你们说 是由于统计信息过期导致的,那么我现在把直方图删除
SQL> BEGIN
2 DBMS_STATS.GATHER_TABLE_STATS(ownname => 'adwu_optima_we11', 3 tabname => 'OPT_REF_UOM_TEMP_SDIM', 4 estimate_percent => 100, 5 method_opt => 'for columns RELTV_CURR_QTY size 1', 6 degree => DBMS_STATS.AUTO_DEGREE, 7 cascade=>TRUE 8 ); 9 END; 10 /PL/SQL procedure successfully completed.
Elapsed: 00:00:12.65
SQL> explain plan for SELECT UOM.PROD_SKID, 2 MAX (UOM.RELTV_CURR_QTY) RELTV_CURR_QTY, 3 MAX (UOM.STAT_CURR_VAL) STAT_CURR_VAL, 4 MAX (UOM.BAR_CURR_CODE) BAR_CURR_CODE 5 --modified by Daniel on 10/26/2010 for CR 326285 of QMR10.1 hypercare retrofit begin 6 FROM OPT_REF_UOM_TEMP_SDIM UOM, 7 REF_PROD_DIM PROD 8 WHERE UOM.RELTV_CURR_QTY = 1 9 AND PROD.CURR_IND = 'Y' 10 AND PROD.PROD_END_DATE = TO_DATE ('31-12-9999', 'dd-mm-yyyy') 11 AND PROD.PROD_SKID = UOM.PROD_SKID 12 AND PROD.BUOM_CURR_SKID = UOM.UOM_SKID 13 --modified by Daniel on 10/26/2010 for CR 326285 of QMR10.1 hypercare retrofit end 14 GROUP BY UOM.PROD_SKID;Explained.
Elapsed: 00:00:00.82
SQL> select * from table(dbms_xplan.display);PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------Plan hash value: 3215660883
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9 | 486 | 5722 (2)| 00:00:40 | | 1 | HASH GROUP BY | | 9 | 486 | 5722 (2)| 00:00:40 | | 2 | NESTED LOOPS | | | | | | | 3 | NESTED LOOPS | | 1036 | 55944 | 5721 (2)| 00:00:40 | |* 4 | TABLE ACCESS FULL | OPT_REF_UOM_TEMP_SDIM | 1013 | 35455 | 650 (14)| 00:00:05 | |* 5 | INDEX RANGE SCAN | PROD_DIM_PK | 3 | | 2 (0)| 00:00:01 | |* 6 | TABLE ACCESS BY INDEX ROWID| PROD_DIM | 1 | 19 | 5 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------4 - filter("UOM"."RELTV_CURR_QTY"=1)
5 - access("PROD"."PROD_SKID"="UOM"."PROD_SKID") 6 - filter("PROD"."BUOM_CURR_SKID" IS NOT NULL AND "PROD"."PROD_END_DATE"=TO_DATE(' 9999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "PROD"."CURR_IND"='Y' AND "PROD"."BUOM_CURR_SKID"="UOM"."UOM_SKID")22 rows selected.
Elapsed: 00:00:01.86
所以,最终导致CBO选错执行计划的罪魁祸首就是列没有收集直方图统计
通过这一次的案例分析,相信大家对嵌套循环的优化应该不成问题了,大家一定要深刻理解嵌套循环的原理
另外一个就是,大家要深入研究统计信息,CBO能不能选对执行计划,很关键的就在于统计信息的准确与否。