现在的位置: 首页 > 综合 > 正文

直方图(Histogram)!

2013年06月15日 ⁄ 综合 ⁄ 共 6048字 ⁄ 字号 评论关闭

dbms_stats 包对段表的分析有三个层次:

表自身的分析:包括表中的行数、数据块数、行长等信息。

列的分析:包括列值的重复数、列上的空值、数据在列上的分布情况。

索引的分析:包括索引叶块的数量、索引的深度、索引的聚合因子等。

直方图有时对于CBO非常重要,特别是对于有字段数据非常倾斜的表,做直方图分析尤为重要,比如下面的例子:

SQL> show user
USER 为 "U1"
SQL> create table t3 as select 1 id,object_name name from test;

表已创建。

SQL> update t3 set id = 99 where rownum = 1;

已更新 1 行。

SQL> commit;

提交完成。

SQL> select id,count(*) from t3 group by id;

        ID   COUNT(*)
---------- ----------
         1     402340
        99          1

SQL> create index idx_t3_id on t3(id);

索引已创建。

SQL> exec dbms_stats.gather_table_stats(ownname=>'u1',tabname=>'t3',cascade=>true);

PL/SQL 过程已成功完成。

我们创建了一张表,他的id字段数值严重倾斜,除了有一条记录为99外,其他的记录全部是1,然后对表做分析,默认情况下,dbms_stats包会对所有的列做直方图分析,可以从这个视图上查到相关的信息:

SQL> select table_name,column_name,endpoint_number,endpoint_value from user_histograms where table_name = 'T3';

TABLE_NAME                     COLUMN_NAME     ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ --------------- --------------- --------------
T3                             ID                            0              1
T3                             NAME                          0     2.4504E+35
T3                             ID                            1              1
T3                             NAME                          1     6.2511E+35

SQL> select table_name,num_rows,avg_row_len,to_char(last_analyzed,'yyyy-mm-dd hh24:mi:ss') from user_tables;

TABLE_NAME                       NUM_ROWS AVG_ROW_LEN TO_CHAR(LAST_ANALYZ
------------------------------ ---------- ----------- -------------------
T3                                 400883          27 2011-12-10 03:41:55

现在来看让id值分布等于1,99的时候,CBO产生的执行计划的情况:

SQL> set autotrace traceonly;
SQL> select * from t3 where id = 99;


执行计划
----------------------------------------------------------
Plan hash value: 3977267011

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 27 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_T3_ID | 1 | | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=99)


统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
1 physical reads
0 redo size
466 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

从这里我们可以看到,CBO根据直方图的信息知道id=99的记录只有一行,所以它选择了索引,这个选择非常正确。

SQL> set autotrace traceonly;
SQL> select * from t3 where id = 1;

已选择402340行。


执行计划
----------------------------------------------------------
Plan hash value: 4161002650

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   400K|    10M|   418   (4)| 00:00:06 |
|*  1 |  TABLE ACCESS FULL| T3   |   400K|    10M|   418   (4)| 00:00:06 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"=1)


统计信息
----------------------------------------------------------
          2  recursive calls
          0  db block gets
      28495  consistent gets
       1728  physical reads
          0  redo size
   12140483  bytes sent via SQL*Net to client
     295442  bytes received via SQL*Net from client
      26824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     402340  rows processed

同样,CBO从直方图的信息中得到id=1的值几乎与表的记录数一样,所以全表扫描被使用。现在让我们将直方图信息删除,保留表和索引的分析信息:

SQL> exec dbms_stats.delete_column_stats(user,'t3','id');


PL/SQL 过程已成功完成。

SQL> select table_name,column_name,endpoint_number,endpoint_value from user_histograms where table_name = 'T3';

TABLE_NAME                     COLUMN_NAME     ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ --------------- --------------- --------------
T3                             NAME                          0     2.4504E+35
T3                             NAME                          1     6.2963E+35

SQL> select table_name,num_rows,avg_row_len,to_char(last_analyzed,'yyyy-mm-dd hh24:mi:ss') from user_tables where table_name = 'T3';

TABLE_NAME                       NUM_ROWS AVG_ROW_LEN TO_CHAR(LAST_ANALYZ
------------------------------ ---------- ----------- -------------------
T3                                 403448          27 2011-12-10 04:12:42

SQL> select blevel,leaf_blocks,distinct_keys,last_analyzed from user_indexes where table_name = 'T3';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS LAST_ANALYZED
---------- ----------- ------------- --------------
         2         786             2 10-12月-11

可以看见id字段的信息已经被删除,索引和表的信息仍然存在,并且索引中甚至可以找到重复的数值只有2个,但是CBO却无法得到这两个重复的数值的分布情况,所以依然没有办法选出一个正确的执行计划。

SQL> set autotrace traceonly;

SQL> select * from t3 where id = 1;

已选择402340行。


执行计划
----------------------------------------------------------
Plan hash value: 3977267011

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |  4034 |   106K|   403   (1)| 00:00:05 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |  4034 |   106K|   403   (1)| 00:00:05 |
|*  2 |   INDEX RANGE SCAN          | IDX_T3_ID |  1614 |       |   395   (0)| 00:00:05 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)


统计信息
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      28495  consistent gets
       1309  physical reads
          0  redo size
   12140483  bytes sent via SQL*Net to client
     295442  bytes received via SQL*Net from client
      26824  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     402340  rows processed

SQL> select * from t3 where id = 99;


执行计划
----------------------------------------------------------
Plan hash value: 3977267011

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |  4034 |   106K|   403   (1)| 00:00:05 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |  4034 |   106K|   403   (1)| 00:00:05 |
|*  2 |   INDEX RANGE SCAN          | IDX_T3_ID |  1614 |       |   395   (0)| 00:00:05 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=99)


统计信息
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        466  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

可以看见上面的执行计划已经是不正确的了。

抱歉!评论已关闭.