Partial log:
2019-08-01T23:59:02.301+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:02.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:03.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:03.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:04.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:04.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:05.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:05.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:06.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:06.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:07.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:07.302+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:08.302+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:08.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:09.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:09.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:10.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:10.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:11.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:11.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:12.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:12.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:13.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:13.303+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:14.303+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:14.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:15.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:15.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:16.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:16.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:17.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:17.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:18.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:18.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:19.295+0800 W NETWORK [HostnameCanonicalizationWorker] Failed to obtain address information for hostname iZuf61zao4uxbprumx45dlZ: System error
2019-08-01T23:59:19.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:19.304+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:20.304+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:20.305+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:21.305+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:21.305+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:22.305+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:22.305+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:23.305+0800 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:23.305+0800 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:23.631+0800 E STORAGE [thread2] WiredTiger (24) [1564675163:631372][9783:0x7f4e30730700], file:WiredTiger.wt, WT_SESSION.checkpoint: /var/lib/mongodb/WiredTiger.turtle: handle-open: open: Too many open files
2019-08-01T23:59:23.632+0800 E STORAGE [thread2] WiredTiger (24) [1564675163:632761][9783:0x7f4e30730700], checkpoint-server: checkpoint server error: Too many open files
2019-08-01T23:59:23.632+0800 E STORAGE [thread2] WiredTiger (-31804) [1564675163:632802][9783:0x7f4e30730700], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
2019-08-01T23:59:23.632+0800 I - [thread2] Fatal Assertion 28558
2019-08-01T23:59:23.632+0800 I - [thread2]
***aborting after fassert() failure
2019-08-01T23:59:23.638+0800 F - [thread2] Got signal: 6 (Aborted).
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31862
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31862
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I also set fs.file-max = 2097152 in sysctl.conf.
It crashes every day, and I really can't figure out the root cause.
mongod --version
db version v3.2.11
git version: 009580ad490190ba33d1c6253ebd8d91808923e4
OpenSSL version: OpenSSL 1.0.2s 28 May 2019
allocator: tcmalloc
modules: none
build environment:
distarch: x86_64
target_arch: x86_64
#1 KYLINZZ 2019-08-02 11:13:25 +08:00
#2 auser 2019-08-02 11:16:37 +08:00
I suggest checking the /proc/PID/limits file to see how many FDs the process can actually open.
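Here is a minimal Python sketch of that check. It assumes a Linux host with procfs. It reads `/proc/<pid>/limits` and pulls out the soft and hard values from the `Max open files` row. `parse_open_files_limit` and `open_files_limit` are hypothetical helper names. Finding the mongod PID (for example with `pidof mongod`) is left to the caller.

```python
def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    'unlimited' becomes None so callers don't have to special-case the string.
    """
    def to_int(value):
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files   <soft>   <hard>   files"
            soft, hard = line[len("Max open files"):].split()[:2]
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' row found")


def open_files_limit(pid):
    """Read the live limits of a running process (Linux procfs only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_open_files_limit(f.read())
```

The OP posts a limits table in the next reply. Fed that table in its normal multi-line form, the parser returns `(64000, 64000)`. That is the limit mongod actually runs with, even though `ulimit -a` in the OP's shell printed 65535. Calling `open_files_limit("self")` inspects the current Python process instead.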
#3 comwrg (OP)
@auser
```
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             64000                64000                processes
Max open files            64000                64000                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31862                31862                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
```
I took a look, and this seems fine.
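#4 auser 2019-08-02 11:54:04 +08:00
@comwrg Check the number of TCP connections with ss or netstat, and see whether the mongodb process has too many. If it does, look at which states those connections are in. That tells you what is actually exhausting the file descriptors, for example a denial-of-service attack that leaves lots of idle TCP connections open.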
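The ss/netstat check can also be scripted. The sketch below assumes Linux. It parses `/proc/net/tcp`, the kernel table that `ss` and `netstat` summarize (IPv6 sockets are in `/proc/net/tcp6`), and tallies connections by state on local port 27017. That is MongoDB's default port; change `PORT` if your mongod listens elsewhere.

The function names are hypothetical. As a rough reading guide:
- A large `ESTABLISHED` count points at clients holding many connections open.
- A pile of `CLOSE_WAIT` suggests sockets the peer already closed are not being closed locally.

```python
from collections import Counter

# Hex state codes used in /proc/net/tcp (they mirror the kernel's TCP states).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

PORT = 27017  # MongoDB's default port; an assumption about this deployment


def count_states(proc_net_tcp_text, port=PORT):
    """Tally TCP states for sockets whose *local* port equals `port`."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_addr, state = fields[1], fields[3]
        # local_addr looks like "0100007F:6989"; the port is hex after the colon
        local_port = int(local_addr.rsplit(":", 1)[1], 16)
        if local_port == port:
            counts[TCP_STATES.get(state.upper(), state)] += 1
    return counts


def mongod_connection_states(port=PORT):
    """Combine IPv4 and IPv6 tables; missing tables are skipped."""
    counts = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                counts += count_states(f.read(), port)
        except FileNotFoundError:
            pass  # e.g. IPv6 disabled, or not running on Linux
    return counts
```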
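Each network connection occupies one file descriptor (fd), and so does each file opened for reading or writing. Judging from the error log, the first error is fd exhaustion. New network connections cannot get an fd, so accept (the system call that accepts new connections) fails. That part is still tolerable. For a database, though, being unable to write files to disk means data cannot be persisted, so crashing deliberately is the right behavior.
As for your problem, I think it's likely that some frequently called code path doesn't close files after using them. The fds are never released, and the limit is eventually reached. You should now investigate from two angles: the network side (as described in the first paragraph) and the /proc/PID/fd/ directory.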
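To act on the /proc/PID/fd/ suggestion, the Python sketch below resolves every fd link of a process and buckets it. It assumes Linux, and it must run as root or as the mongodb user so the links are readable. If sockets dominate, look at connections; if data files dominate, look at storage.

The `.wt`/`WiredTiger` rule is a heuristic based on the file names visible in the log above (`WiredTiger.wt`, `WiredTiger.turtle`). It is not an official classification, and the helper names are hypothetical.

```python
import os
from collections import Counter


def classify_fd_target(target):
    """Bucket a /proc/<pid>/fd symlink target into a coarse category."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    if target.endswith(".wt") or "/WiredTiger" in target:
        return "wiredtiger_file"  # heuristic: WiredTiger data/metadata/journal files
    if target.startswith("/"):
        return "other_file"
    return "other"


def fd_breakdown(pid):
    """Count a process's open fds by category (Linux procfs only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            continue  # the fd was closed between listdir() and readlink()
        counts[classify_fd_target(target)] += 1
    return counts
```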
#5 est 2019-08-02 11:55:10 +08:00
You've run out of inodes.
#8 aaa5838769 2019-08-02 12:00:13 +08:00
This is usually the disk running out of space, or else running out of inodes.
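Ruling these two out takes only a few lines. The sketch below uses Python's `os.statvfs` on the dbPath seen in the log (`/var/lib/mongodb`; substitute your own). It reports free space and free inodes, roughly the figures `df` and `df -i` show. `disk_headroom` is a hypothetical helper.

For comparison, a full disk normally surfaces as `ENOSPC` (errno 28, "No space left on device"). The log above instead reports errno 24, `EMFILE`, which is the per-process open-file limit.

```python
import os


def disk_headroom(path="/var/lib/mongodb"):
    """Return free bytes and inode usage for the filesystem holding `path`.

    Uses the counters available to unprivileged users (f_bavail / f_favail).
    """
    st = os.statvfs(path)
    free_bytes = st.f_bavail * st.f_frsize
    inode_pct_used = None
    if st.f_files:  # some filesystems (e.g. btrfs) report 0 total inodes
        inode_pct_used = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    return {
        "free_bytes": free_bytes,
        "free_inodes": st.f_favail,
        "inode_pct_used": inode_pct_used,
    }
```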
#9 julyclyde 2019-08-02 12:03:53 +08:00
Using ulimit or /etc/security/limits.conf to check and change this is a classic mistake.
A background service's rlimit has to be set in whatever starts the service.
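The shell's `ulimit` doesn't matter for a daemon because RLIMIT_NOFILE is inherited from whichever process starts the service. So it has to be raised there. For a systemd-managed mongod that means `LimitNOFILE=` in the unit; for an init script, a `ulimit -n` inside the script.

The hypothetical Python launcher below demonstrates the inheritance. It raises its own soft limit toward a target, capped at the hard limit (an unprivileged process cannot raise the hard limit). Every child it spawns afterwards starts with the new value.

```python
import resource
import subprocess
import sys


def raise_nofile_soft_limit(target):
    """Raise this process's soft RLIMIT_NOFILE toward `target`.

    The soft limit is capped at the hard limit; without privileges the hard
    limit itself cannot be raised. Returns the soft limit now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough; never lower it
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


def child_soft_limit():
    """Spawn a child interpreter and report the soft limit it inherited."""
    out = subprocess.run(
        [sys.executable, "-c",
         "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out)
```

A real launcher would call `raise_nofile_soft_limit(64000)` and then `os.execvp` into `mongod`, so the database inherits the raised limit. That is roughly what a `LimitNOFILE=` setting does for a systemd unit.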
#10 bigpigB 2019-08-02 12:49:23 +08:00 via Android
Raise the ulimit a bit.
#11 neverfall 2019-08-02 13:03:15 +08:00
Are you only opening things and never closing them?
Remember to call close.
#12 comwrg (OP)
@est @aaa5838769 Neither of those helped.
#13 comwrg (OP)
@est @aaa5838769 Neither of those is the case.
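#14 comwrg (OP)
@auser Thanks a lot 🙏, I've started investigating as you suggested.
I found that mongodb is holding a lot of fds (24135 of 38839), well over half. ![image](https://user-images.githubusercontent.com/19854253/62348661-efa26b00-b52f-11e9-80be-b1eef07c061b.png)
Could the project really be failing to close its connections? The project has been running for several months, though, and only in the last few days has mongo started crashing frequently from running out of fds.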
#15 comwrg (OP)
#16 auser 2019-08-02 14:49:08 +08:00 via iPhone
#18 ilucio 2019-08-02 17:48:07 +08:00 via Android
Set the ulimit to 64000; the official documentation covers this.
#19 auser 2019-08-02 23:23:13 +08:00 via iPhone
#21 qq1340691923 2020-06-05 13:06:23 +08:00
@comwrg Come on, share how it turned out in the end...
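#22 comwrg (OP)
@qq1340691923
As a workaround, I set the ulimit very high. After raising it to 200000, the crash stopped happening. I still don't know the actual cause, but I suspect it's because there are too many collections (more than a hundred thousand).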
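Here is a rough way to test the "too many collections" theory. With WiredTiger, MongoDB keeps each collection and each index in its own `.wt` file, by default named `collection-*.wt` and `index-*.wt` under the dbPath, and an open data file holds a descriptor. So 100,000 collections that each have only the default `_id` index already mean about 200,000 data files. That is more than three times the 64,000 open-files limit the OP's mongod process showed earlier in the thread.

WiredTiger can close idle handles, so the file count is an upper bound on what it holds open, not an exact fd prediction. The sketch below counts those files. `/var/lib/mongodb` is taken from the log and may differ on your host; `count_wt_files` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path


def count_wt_files(dbpath="/var/lib/mongodb"):
    """Count WiredTiger data files under `dbpath`, grouped by name prefix.

    rglob also covers --directoryperdb layouts, where the files sit in
    per-database subdirectories.
    """
    counts = Counter()
    for p in Path(dbpath).rglob("*.wt"):
        if p.name.startswith("collection-"):
            counts["collection"] += 1
        elif p.name.startswith("index-"):
            counts["index"] += 1
        else:
            counts["other"] += 1  # WiredTiger.wt, _mdb_catalog.wt, sizeStorer.wt, ...
    return counts
```

Comparing `sum(count_wt_files().values())` with the soft "Max open files" value in `/proc/<pid>/limits` gives a quick sense of whether the data files alone could exhaust the limit.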