V2EX = way to explore
V2EX 是一个关于分享和探索的地方
现在注册
已注册用户请  登录
lolizeppelin
V2EX  ›  数据库

DBA 脑子要咋长才行啊....

  •  
  •   lolizeppelin · 2019-07-10 23:15:39 +08:00 · 2724 次点击
    这是一个创建于 2021 天前的主题,其中的信息可能已经有所发展或是发生改变。

    PG 里两条一样的语句,时间跨度不同,结果优化成完全不同的执行方式....

    语句语句用于计算:当天注册的人的登陆次数,一张注册表日志,一张登陆日志表 group by 用于合并渠道和游戏 id. 为了避免无法去重没用 distinct

    select one_day, count() login_count, ROUND((count() / sum(count(*)) over(PARTITION BY one_day, app_id order by app_id))*100, 2) app_percent, sub_channel, app_id from (select reg.reg_day as one_day, login.app_id as app_id, login.sub_channel as sub_channel, reg.user_id from (select count_time, date_trunc('day', count_time) as reg_day, user_id from t_l_register_log where count_time >= '2019-03-03 00:00:00' and count_time < '2019-04-04 00:00:00' order by reg_day) as reg inner join (select date_trunc('day', count_time) as login_day ,sub_channel, app_id, user_id from t_l_platform_login_log where count_time >= '2019-03-03 00:00:00' and count_time < '2019-04-04 00:00:00' group by login_day, app_id, sub_channel, user_id order by login_day) as login on date_trunc('day', reg.count_time) = login_day and reg.user_id = login.user_id group by one_day, app_id, sub_channel, reg.user_id) as tb group by one_day, app_id, sub_channel order by one_day, app_id, login_count desc, app_percent desc;

    这个是能这却执行的 sql,时间跨度 1 年

    ------------------------------------------------------------------------------------------
     Sort  (cost=97561067.23..97581067.23 rows=8000000 width=56) (actual time=34585.894..34586.519 rows=18517 loops=1)
       Output: reg.reg_day, (count(*)), (round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2)), login.sub_channel, login.app_id
       Sort Key: reg.reg_day, login.app_id, (count(*)) DESC, (round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2)) DESC
       Sort Method: quicksort  Memory: 2215kB
       ->  WindowAgg  (cost=83444220.45..96096929.48 rows=8000000 width=56) (actual time=33708.198..34578.031 rows=18517 loops=1)
             Output: reg.reg_day, (count(*)), round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2), login.sub_channel, login.app_id
             ->  GroupAggregate  (cost=83444220.45..95856929.48 rows=8000000 width=24) (actual time=33708.188..34562.551 rows=18517 loops=1)
                   Output: reg.reg_day, login.sub_channel, login.app_id, count(*)
                   Group Key: reg.reg_day, login.app_id, login.sub_channel
                   ->  Group  (cost=83444220.45..87555123.46 rows=328872241 width=28) (actual time=33708.179..34341.267 rows=3198877 loops=1)
                         Output: reg.reg_day, login.app_id, login.sub_channel, reg.user_id, login.app_id, login.sub_channel
                         Group Key: reg.reg_day, login.app_id, login.sub_channel, reg.user_id
                         ->  Sort  (cost=83444220.45..84266401.05 rows=328872241 width=20) (actual time=33708.175..33958.476 rows=3198877 loops=1)
                               Output: reg.reg_day, reg.user_id, login.app_id, login.sub_channel
                               Sort Key: reg.reg_day, login.app_id, login.sub_channel, reg.user_id
                               Sort Method: external merge  Disk: 93944kB
                               ->  Merge Join  (cost=10070881.07..16687034.80 rows=328872241 width=20) (actual time=28498.899..31904.531 rows=3198877 loops=1)
                                     Output: reg.reg_day, reg.user_id, login.app_id, login.sub_channel
                                     Merge Cond: (((date_trunc('day'::text, reg.count_time)) = login.login_day) AND (reg.user_id = login.user_id))
                                     ->  Sort  (cost=876809.36..883338.44 rows=2611632 width=20) (actual time=2665.030..2929.604 rows=3204477 loops=1)
                                           Output: reg.reg_day, reg.user_id, reg.count_time, (date_trunc('day'::text, reg.count_time))
                                           Sort Key: (date_trunc('day'::text, reg.count_time)), reg.user_id
                                           Sort Method: external merge  Disk: 131744kB
                                           ->  Subquery Scan on reg  (cost=142257.54..491333.83 rows=2611632 width=20) (actual time=376.086..1283.740 rows=3204477 loops=1)
                                                 Output: reg.reg_day, reg.user_id, reg.count_time, date_trunc('day'::text, reg.count_time)
                                                 ->  Gather Merge  (cost=142257.54..465217.51 rows=2611632 width=20) (actual time=376.081..815.735 rows=3204477 loops=1)
                                                       Output: _hyper_8_974_chunk.count_time, (date_trunc('day'::text, _hyper_8_974_chunk.count_time)), _hyper_8_974_chunk.user_id
                                                       Workers Planned: 8
                                                       Workers Launched: 8
                                                       ->  Sort  (cost=141257.40..142073.54 rows=326454 width=20) (actual time=346.654..379.105 rows=356053 loops=9)
                                                             Output: _hyper_8_974_chunk.count_time, (date_trunc('day'::text, _hyper_8_974_chunk.count_time)), _hyper_8_974_chunk.user_id
                                                             Sort Key: (date_trunc('day'::text, _hyper_8_974_chunk.count_time))
                                                             Sort Method: external merge  Disk: 10176kB
    

    这个是没有正确执行的 sql,时间跨度一个月,不知道要跑多少时间

    ---------------------------------------------------------------------------
     Sort  (cost=1378360.26..1378362.88 rows=1046 width=56)
       Output: reg.reg_day, (count(*)), (round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2)), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id
       Sort Key: reg.reg_day, _hyper_4_1229_chunk.app_id, (count(*)) DESC, (round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2)) DESC
       ->  WindowAgg  (cost=1377873.71..1378307.80 rows=1046 width=56)
             Output: reg.reg_day, (count(*)), round(((((count(*)))::numeric / sum((count(*))) OVER (?)) * '100'::numeric), 2), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id
             ->  GroupAggregate  (cost=1377873.71..1378276.42 rows=1046 width=24)
                   Output: reg.reg_day, _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, count(*)
                   Group Key: reg.reg_day, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel
                   ->  Group  (cost=1377873.71..1378004.46 rows=10460 width=28)
                         Output: reg.reg_day, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, reg.user_id, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel
                         Group Key: reg.reg_day, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, reg.user_id
                         ->  Sort  (cost=1377873.71..1377899.86 rows=10460 width=20)
                               Output: reg.reg_day, reg.user_id, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel
                               Sort Key: reg.reg_day, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, reg.user_id
                               ->  Merge Join  (cost=652242.46..1377175.37 rows=10460 width=20)
                                     Output: reg.reg_day, reg.user_id, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel
                                     Merge Cond: ((date_trunc('day'::text, _hyper_4_1229_chunk.count_time)) = (date_trunc('day'::text, reg.count_time)))
                                     Join Filter: (reg.user_id = _hyper_4_1229_chunk.user_id)
                                     ->  Group  (cost=650558.24..1318952.53 rows=979872 width=20)
                                           Output: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.user_id
                                           Group Key: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.user_id
                                           ->  Gather Merge  (cost=650558.24..1267509.25 rows=4899360 width=20)
                                                 Output: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.user_id
                                                 Workers Planned: 5
                                                 ->  Group  (cost=649558.16..676504.67 rows=979872 width=20)
                                                       Output: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.user_id
                                                       Group Key: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.user_id
                                                       ->  Sort  (cost=649558.16..654457.53 rows=1959746 width=20)
                                                             Output: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.user_id
                                                             Sort Key: (date_trunc('day'::text, _hyper_4_1229_chunk.count_time)), _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.user_id
                                                             ->  Result  (cost=0.00..404552.30 rows=1959746 width=20)
                                                                   Output: date_trunc('day'::text, _hyper_4_1229_chunk.count_time), _hyper_4_1229_chunk.sub_channel, _hyper_4_1229_chunk.app_id, _hyper_4_1229_chunk.user_id
                                                                   ->  Parallel Append  (cost=0.00..380055.48 rows=1959746 width=20)
    
    

    正确语句用了日期和 userid 排序, 然后正确的双排序 join 了 不正确的语句没用日期排序,最后导致了 M*N?

    有人能帮我解释下是不是这个理啊 orz

    DBA 脑子要要怎么长才能了了解数据库的各种坑啊......

    这 DBA 要怎么干啊 orz,我光测一条语句就要疯了...

    1 条回复    2019-07-11 10:36:37 +08:00
    CallMeReznov
        1
    CallMeReznov  
       2019-07-11 10:36:37 +08:00
    DBA 是另外一个物种
    关于   ·   帮助文档   ·   博客   ·   API   ·   FAQ   ·   实用小工具   ·   1087 人在线   最高记录 6679   ·     Select Language
    创意工作者们的社区
    World is powered by solitude
    VERSION: 3.9.8.5 · 23ms · UTC 19:00 · PVG 03:00 · LAX 11:00 · JFK 14:00
    Developed with CodeLauncher
    ♥ Do have faith in what you're doing.