乱码乱a∨中文字幕,在线免费激情视频,亚洲欧美久久夜夜潮,国产在线网址

  1. <sub id="hjl7n"></sub>

    1. <sub id="hjl7n"></sub>

      <legend id="hjl7n"></legend>

      當前位置:首頁 >  站長 >  數(shù)據(jù)庫 >  正文

      PostgreSQL利用遞歸優(yōu)化求稀疏列唯一值的方法

       2021-06-01 16:38  來源: 腳本之家   我來投稿 撤稿糾錯

        阿里云優(yōu)惠券 先領(lǐng)券再下單

      在數(shù)據(jù)庫中經(jīng)常會碰到一些表的列是稀疏列,只有很少的值,例如性別字段,一般就只有2種不同的值。

      但是當我們求這些稀疏列的唯一值時,如果表的數(shù)據(jù)量很大,速度還是會很慢。

      例如:

      創(chuàng)建測試表

      bill=# create table t_sex (sex char(1), otherinfo text);
      CREATE TABLE
      bill=# insert into t_sex select 'm', generate_series(1,10000000)||'this is test';
      INSERT 0 10000000
      bill=# insert into t_sex select 'w', generate_series(1,10000000)||'this is test';
      INSERT 0 10000000

       

      查詢:

      可以看到下面的查詢速度很慢。

      bill=# select count(distinct sex) from t_sex;
       count
      -------
         2
      (1 row)

      Time: 8803.505 ms (00:08.804)
      bill=# select sex from t_sex t group by sex;
       sex
      -----
       m
       w
      (2 rows)

      Time: 1026.464 ms (00:01.026)

       

      那么我們對該字段加上索引又是什么情況呢?

      速度依然沒有明顯

      bill=# create index idx_sex_1 on t_sex(sex);
      CREATE INDEX
      bill=# select count(distinct sex) from t_sex;
       count
      -------
         2
      (1 row)

      Time: 8502.460 ms (00:08.502)
      bill=# select sex from t_sex t group by sex;
       sex
      -----
       m
       w
      (2 rows)

      Time: 572.353 ms

       

      的變化,可以看到執(zhí)行計劃已經(jīng)使用Index Only Scan了。

      bill=# explain select count(distinct sex) from t_sex;
                           QUERY PLAN
      ----------------------------------------------------------------------------------------------
       Aggregate (cost=371996.44..371996.45 rows=1 width=8)
        -> Index Only Scan using idx_sex_1 on t_sex (cost=0.44..321996.44 rows=20000000 width=2)
      (2 rows)

       

      同樣的SQL我們看看在Oracle中性能如何?

      創(chuàng)建測試表:

      SQL> create table t_sex (sex char(1), otherinfo varchar2(100));

      Table created.

      SQL> insert into t_sex select 'm', rownum||'this is test' from dual connect by level <=10000000;

      10000000 rows created.

      SQL> commit;

      Commit complete.

      SQL> insert into t_sex select 'w', rownum||'this is test' from dual connect by level <=10000000;

      10000000 rows created.

      SQL> commit;

      Commit complete.

       

      性能測試:

      SQL> set lines 1000 pages 2000
      SQL> set autotrace on
      SQL> set timing on

      SQL> select count(distinct sex) from t_sex;

      COUNT(DISTINCTSEX)
      ------------------
               2

      Elapsed: 00:00:01.58

      Execution Plan
      ----------------------------------------------------------
      Plan hash value: 3915432945

      ----------------------------------------------------------------------------
      | Id | Operation     | Name | Rows | Bytes | Cost (%CPU)| Time   |
      ----------------------------------------------------------------------------
      |  0 | SELECT STATEMENT  |    |   1 |   3 | 20132  (1)| 00:00:01 |
      |  1 | SORT GROUP BY   |    |   1 |   3 |      |     |
      |  2 |  TABLE ACCESS FULL| T_SEX |  14M|  42M| 20132  (1)| 00:00:01 |
      ----------------------------------------------------------------------------

      Note
      -----
        - dynamic statistics used: dynamic sampling (level=2)


      Statistics
      ----------------------------------------------------------
           0 recursive calls
           0 db block gets
         74074 consistent gets
           0 physical reads
           0 redo size
          552 bytes sent via SQL*Net to client
          608 bytes received via SQL*Net from client
           2 SQL*Net roundtrips to/from client
           1 sorts (memory)
           0 sorts (disk)
           1 rows processed

      SQL> select sex from t_sex t group by sex;

      SE
      --
      m
      w

      Elapsed: 00:00:01.08

      Execution Plan
      ----------------------------------------------------------
      Plan hash value: 3915432945

      ----------------------------------------------------------------------------
      | Id | Operation     | Name | Rows | Bytes | Cost (%CPU)| Time   |
      ----------------------------------------------------------------------------
      |  0 | SELECT STATEMENT  |    |  14M|  42M| 20558  (3)| 00:00:01 |
      |  1 | SORT GROUP BY   |    |  14M|  42M| 20558  (3)| 00:00:01 |
      |  2 |  TABLE ACCESS FULL| T_SEX |  14M|  42M| 20132  (1)| 00:00:01 |
      ----------------------------------------------------------------------------

      Note
      -----
        - dynamic statistics used: dynamic sampling (level=2)


      Statistics
      ----------------------------------------------------------
           0 recursive calls
           0 db block gets
         74074 consistent gets
           0 physical reads
           0 redo size
          589 bytes sent via SQL*Net to client
          608 bytes received via SQL*Net from client
           2 SQL*Net roundtrips to/from client
           1 sorts (memory)
           0 sorts (disk)
           2 rows processed

       

      可以看到Oracle的性能即使不加索引也明顯比PostgreSQL中要好。

      那么我們在PostgreSQL中是不是沒辦法繼續(xù)優(yōu)化了呢?這種情況我們利用pg中的遞歸語句結(jié)合索引可以大幅提升性能。

      SQL改寫:

      bill=# with recursive tmp as (
      bill(#  (
      bill(#   select min(t.sex) as sex from t_sex t where t.sex is not null
      bill(#  )
      bill(#  union all
      bill(#  (
      bill(#   select (select min(t.sex) from t_sex t where t.sex > s.sex and t.sex is not null)
      bill(#    from tmp s where s.sex is not null
      bill(#  )
      bill(# )
      bill-# select count(distinct sex) from tmp;
       count
      -------
         2
      (1 row)

      Time: 2.711 ms

       

      查看執(zhí)行計劃:

      bill=# explain with recursive tmp as (
      bill(#  (
      bill(#   select min(t.sex) as sex from t_sex t where t.sex is not null
      bill(#  )
      bill(#  union all
      bill(#  (
      bill(#   select (select min(t.sex) from t_sex t where t.sex > s.sex and t.sex is not null)
      bill(#    from tmp s where s.sex is not null
      bill(#  )
      bill(# )
      bill-# select count(distinct sex) from tmp;
                                 QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------
       Aggregate (cost=53.62..53.63 rows=1 width=8)
        CTE tmp
         -> Recursive Union (cost=0.46..51.35 rows=101 width=32)
            -> Result (cost=0.46..0.47 rows=1 width=32)
               InitPlan 3 (returns $1)
                -> Limit (cost=0.44..0.46 rows=1 width=2)
                   -> Index Only Scan using idx_sex_1 on t_sex t (cost=0.44..371996.44 rows=20000000 width=2)
                      Index Cond: (sex IS NOT NULL)
            -> WorkTable Scan on tmp s (cost=0.00..4.89 rows=10 width=32)
               Filter: (sex IS NOT NULL)
        -> CTE Scan on tmp (cost=0.00..2.02 rows=101 width=32)
      (11 rows)

      Time: 1.371 ms

      可以看到執(zhí)行時間從原先的8000ms降低到了2ms,提升了幾千倍!

      甚至對比Oracle,性能也是提升了很多。

      但是需要注意的是:這種寫法僅僅是針對稀疏列,換成數(shù)據(jù)分布廣泛的字段,顯然性能是下降的, 所以使用遞歸SQL不適合數(shù)據(jù)分布廣泛的字段的group by或者count(distinct)操作。

      文章來源:腳本之家

      來源地址:https://www.jb51.net/article/204437.htm

      申請創(chuàng)業(yè)報道,分享創(chuàng)業(yè)好點子。點擊此處,共同探討創(chuàng)業(yè)新機遇!

      相關(guān)文章

      熱門排行

      信息推薦