一段 sql 比较

V2EX = way to explore

V2EX 是一个关于分享和探索的地方

现在注册

已注册用户请登录

› MySQL 5.5 Community Server

› MySQL 5.6 Community Server

› Percona Configuration Wizard

› XtraBackup 搭建主从复制

Great Sites on MySQL

› Percona

› MySQL Performance Blog

› Severalnines

推荐管理工具

› Sequel Pro

› phpMyAdmin

推荐书目

› MySQL Cookbook

MySQL 相关项目

› MariaDB

› Drizzle

参考文档

› http://mysql-python.sourceforge.net/MySQLdb.html

这是一个创建于 1011 天前的主题，其中的信息可能已经有所发展或是发生改变。

A: and (data_his_id = '0' or data_his_id in (select id from xx where business_type = 1 and states = 2 and audit_status = 2))

B: and (data_his_id = '0' or (select count(1) from xx where business_type = 1 and states = 2 and audit_status = 2 and id=data_his_id)>=1)

本人菜鸡大佬说 a 写法好因为 b 是小表求解但是这连主键都用不上了本人不敢多说. 大伙看看

20 条回复 • 2022-04-28 22:58:26 +08:00

ration

2022-04-28 11:12:41 +08:00 via Android

凭感觉是 A 好，B 用了 count 还关联了表。具体你可以看看实际的执行时间和执行计划

Hurriance

2022-04-28 11:19:00 +08:00

感觉 A B 最终结果集是不一样的

bthulu

2022-04-28 11:29:28 +08:00

navicat 连上去, 生成测试数据, 大表 1 一个亿, 小表 100 万, 再把你这两条 sql 分别执行一下看看就知道了

brader

2022-04-28 11:31:56 +08:00

理论争论不下的时候，建议实操，试下各自的想法建立索引，不同表数据量下的实际查询时间较量。

另外，长点心吧，where 条件 id 用字符串'0'来做条件，虽然 mysql 听智能的，会自动转化

season8

2022-04-28 11:40:18 +08:00

都不好，换成 exists
and (data_his_id = '0' or exists (select 1 from xx where xx.id = data_his_id and business_type = 1 and states = 2 and audit_status = 2))

season8

2022-04-28 11:51:57 +08:00

@season8 抱歉，没看清，exist 大表小表都合适（万金油），如果 xx 是小表的话，A 方式也是可以的。

lipaa

2022-04-28 13:50:55 +08:00

@Hurriance 一样的

lipaa

2022-04-28 13:51:13 +08:00

@bthulu 1

lipaa

2022-04-28 13:51:25 +08:00

@brader 1

lipaa

2022-04-28 13:51:35 +08:00

@season8 1

lipaa

2022-04-28 13:51:47 +08:00

麻了

xuanbg

2022-04-28 13:52:23 +08:00

A 写法中规中矩，B 写法实在是脑洞有点大啊。

lipaa

2022-04-28 13:52:59 +08:00

@xuanbg 我也觉得

lipaa

2022-04-28 13:54:45 +08:00

我的想法是 A 查全表了 B 可以用上主键索引性能应该更好尤其 A 表后期变大了的情况下但是貌似我的想法错了我多测测吧复习吧

wolfie

2022-04-28 15:05:35 +08:00

A：关联表数据量大就慢。
B：跟 exists 差不多，看起来难受，性能不会拉跨。

DonaldY

2022-04-28 16:48:50 +08:00

都不好诶。

OR 优化掉吧。

LeegoYih

2022-04-28 17:03:59 +08:00

如果条件允许的话，我建议是拆成 2 个 SQL：
1：select id from xx where business_type = 1 and states = 2 and audit_status = 2;
2：select * from t where any = ? and data_his_id in (ids);

第 1 个 SQL 执行完后，通过代码往结果里塞一个'0'，然后再执行第二个 SQL 。

不确定这个是最高效的，建议在仿真环境看看每个 SQL 的执行计划：

```sql
set optimizer_trace="enabled=on";
select * from xxx where xx = ?;
select * from information_schema.optimizer_trace;
set optimizer_trace="enabled=off";
```

LeegoYih

2022-04-28 17:13:26 +08:00

如果一定要写成一个 SQL 的话，可以用 union all ，不会影响性能。

select * from t
where any = ?
and data_his_id in (
select 0
union all
select id from xx where business_type = 1 and states = 2 and audit_status = 2);

zlowly

2022-04-28 17:36:46 +08:00

两者有可能在优化器作用下差别不大，你真要比较，应该将 B 重写成 EXISTS 方式的查询语句来比较，因为 exists 可以在子查询只返回第一条记录而不是所有记录，应该对执行计划有较大影响。

但在真实世界里，如果 A 查询对 xx 表的子查询的结果集比较大（例如上万条）那么优化器可能会将原表和子查询结果集做 Hash Join ，而 B 查询也许是 Nested Loop ，这都是视乎你表的数据量和列数据的选择性等情况而定。

所以要做性能比较，这与其上网隔空问，还远不如直接看执行计划或者就是直接执行来得实在。

pengtdyd

2022-04-28 22:58:26 +08:00

现在的 sql 优化器你怎么写基本上已经没啥区别了。。。。。