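Hi everyone! On a Linux host, running iptables -nL prints the filter table rules normally. But running iptables -t nat -nL just hangs and prints nothing. I looked at the process in top, and it showed the following:
What exactly is preventing this command from finishing? Or is there a way to see which step the command is stuck at?
OS: CentOS Linux release 7.5.1804 (Core)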
1
wd 2018-08-12 22:55:07 +08:00 via iPhone
Try strace.
2
tempdban 2018-08-13 02:21:35 +08:00 via Android
cat /proc/16523/status
3
ryd994 2018-08-13 07:40:47 +08:00 via Android
Check dmesg. Hopefully it isn't a kernel oops.
4
clearbug OP @tempdban #2 After running cat /proc/16523/status, the output is:
```bash
Name: iptables
State: D (disk sleep)
Tgid: 17357
Ngid: 0
Pid: 17357
PPid: 8171
TracerPid: 6117
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0
VmPeak: 16196 kB
VmSize: 16112 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 548 kB
VmRSS: 508 kB
VmData: 468 kB
VmStk: 136 kB
VmExe: 80 kB
VmLib: 3072 kB
VmPTE: 48 kB
VmSwap: 0 kB
Threads: 1
SigQ: 2/63168
SigPnd: 0000000000040000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000001fffffffff
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: ffffff
Cpus_allowed_list: 0-23
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 1
nonvoluntary_ctxt_switches: 1
```

I read online that State: D (disk sleep) means the process is waiting for disk I/O (though here it could presumably be network I/O as well). So I ran ll /proc/16523/fd/ to see which file descriptors the process has open:

```bash
lrwx------ 1 root root 64 Aug 13 10:10 0 -> /dev/pts/0
lrwx------ 1 root root 64 Aug 13 10:10 1 -> socket:[836197772]
lrwx------ 1 root root 64 Aug 13 10:10 2 -> socket:[836197773]
```

Then I read this blog post that explains socket:[number]: [https://blog.csdn.net/lkkey80/article/details/16856063](https://blog.csdn.net/lkkey80/article/details/16856063). Running cat /proc/net/unix | grep 836197772 found 836197772:

```bash
Num RefCount Protocol Flags Type St Inode Path
ffff880471dd9680: 00000002 00000000 00000000 0001 01 836197772 @xtables
```

Running cat /proc/net/raw | grep 836197773 found 836197773:

```bash
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
255: 00000000:00FF 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 836197773 2 ffff880471ddf440 0
```

(I also checked /proc/net/tcp and /proc/net/udp; neither lists the inodes 836197772 or 836197773.) That is as far as I have got. I don't really understand the entries in /proc/net/unix and /proc/net/raw. How should I continue troubleshooting?
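The manual inode lookup above can be scripted. Below is a minimal bash sketch, assuming a standard Linux procfs; the function names socket_inodes and lookup_inode and the PROC_NET override are hypothetical, added only for illustration and testing. For what it is worth, the @xtables entry appears to be the abstract Unix socket iptables binds as a lock (it matches the bind(3, {sa_family=AF_LOCAL, sun_path=@"xtables"}, 10) call in the strace output on floor 5), and in /proc/net/raw the local "port" field 00FF holds the protocol number 255, i.e. IPPROTO_RAW, matching the socket(PF_INET, SOCK_RAW, IPPROTO_RAW) call.

```bash
#!/usr/bin/env bash
# Sketch: map a process's socket:[inode] fds to the /proc/net table that lists them.
# PROC_NET is a hypothetical override (default /proc/net) so the lookup can be tested.

# Print the inode number of every socket fd held by PID (needs permission to read its fd dir).
socket_inodes() {
  local pid=$1 fd target
  for fd in /proc/"$pid"/fd/*; do
    target=$(readlink "$fd") || continue
    case $target in
      socket:\[*\])
        target=${target#socket:\[}
        echo "${target%\]}"
        ;;
    esac
  done
}

# Print "<table>: <line>" for each /proc/net row that has a field exactly equal to INODE.
lookup_inode() {
  local inode=$1 net=${PROC_NET:-/proc/net} table
  for table in unix raw raw6 tcp tcp6 udp udp6 packet netlink; do
    [ -r "$net/$table" ] || continue
    # Exact string match on each field, so short numbers cannot match by accident.
    awk -v ino="$inode" -v t="$table" '
      { for (i = 1; i <= NF; i++) if (($i "") == (ino "")) { print t ": " $0; next } }
    ' "$net/$table"
  done
}

# Usage (as root): for ino in $(socket_inodes 17357); do lookup_inode "$ino"; done
```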
5
clearbug OP @wd #1 I ran strace -p 16523. It printed a single line, Process 16523 attached, and then nothing more. The strace process itself has PID 22098, so I checked its state with cat /proc/22098/status, which shows:
State: S (sleeping)
Then I killed the earlier iptables process and ran strace iptables -t nat -nL. The output is:
execve("/sbin/iptables", ["iptables", "-t", "nat", "-nL"], [/* 24 vars */]) = 0
brk(0) = 0x2387000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f66473f7000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=82284, ...}) = 0
mmap(NULL, 82284, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f66473e2000
close(3) = 0
open("/lib64/libip4tc.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\31\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=32816, ...}) = 0
mmap(NULL, 2126600, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6646fcf000
mprotect(0x7f6646fd5000, 2097152, PROT_NONE) = 0
mmap(0x7f66471d5000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f66471d5000
close(3) = 0
open("/lib64/libip6tc.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\32\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=32864, ...}) = 0
mmap(NULL, 2126632, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6646dc7000
mprotect(0x7f6646dce000, 2093056, PROT_NONE) = 0
mmap(0x7f6646fcd000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f6646fcd000
close(3) = 0
open("/lib64/libxtables.so.10", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\2403\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=53520, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f66473e1000
mmap(NULL, 2149016, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6646bba000
mprotect(0x7f6646bc5000, 2097152, PROT_NONE) = 0
mmap(0x7f6646dc5000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb000) = 0x7f6646dc5000
close(3) = 0
open("/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0pS\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1139680, ...}) = 0
mmap(NULL, 3150136, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f66468b8000
mprotect(0x7f66469b9000, 2093056, PROT_NONE) = 0
mmap(0x7f6646bb8000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x100000) = 0x7f6646bb8000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P%\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2173512, ...}) = 0
mmap(NULL, 3981792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f66464eb000
mprotect(0x7f66466ae000, 2093056, PROT_NONE) = 0
mmap(0x7f66468ad000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c2000) = 0x7f66468ad000
mmap(0x7f66468b3000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f66468b3000
close(3) = 0
open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=19776, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f66473e0000
mmap(NULL, 2109744, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f66462e7000
mprotect(0x7f66462e9000, 2097152, PROT_NONE) = 0
mmap(0x7f66464e9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f66464e9000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f66473df000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f66473dd000
arch_prctl(ARCH_SET_FS, 0x7f66473dd740) = 0
mprotect(0x7f66468ad000, 16384, PROT_READ) = 0
mprotect(0x7f66464e9000, 4096, PROT_READ) = 0
mprotect(0x7f6646bb8000, 4096, PROT_READ) = 0
mprotect(0x7f6646dc5000, 4096, PROT_READ) = 0
mprotect(0x7f6646fcd000, 4096, PROT_READ) = 0
mprotect(0x7f66471d5000, 4096, PROT_READ) = 0
mprotect(0x613000, 4096, PROT_READ) = 0
mprotect(0x7f66473f8000, 4096, PROT_READ) = 0
munmap(0x7f66473e2000, 82284) = 0
socket(PF_LOCAL, SOCK_STREAM, 0) = 3
bind(3, {sa_family=AF_LOCAL, sun_path=@"xtables"}, 10) = 0
socket(PF_INET, SOCK_RAW, IPPROTO_RAW) = 4
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
getsockopt(4, SOL_IP, 0x40 /* IP_??? */,
This looks like Linux system call output, I think. I have no idea why it gets stuck at the getsockopt call...
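The last line is unfinished because the call never returned. The 0x40 that strace could not name is 64, which in the kernel header linux/netfilter_ipv4/ip_tables.h is IPT_SO_GET_INFO (defined as IPT_BASE_CTL): iptables asking the kernel for information about the nat table. With a long trace it can help to log to a file and look only at the pending call. A minimal bash sketch; the function name pending_syscall and the trace path are hypothetical, and it assumes strace's usual line format with optional PID (-f) and timestamp (-tt) prefixes.

```bash
#!/usr/bin/env bash
# Sketch: print the syscall name on the last line of an strace log, i.e. the call
# still in progress if the traced program is hung.
# Assumes lines look like "[PID ][HH:MM:SS.usec ]name(args...".
pending_syscall() {
  tail -n 1 "$1" |
    sed -E 's/^([0-9]+[[:space:]]+)?([0-9:.]+[[:space:]]+)?([A-Za-z_0-9]+)\(.*/\3/'
}

# Usage (as root, hypothetical log path):
#   strace -f -tt -o /tmp/iptables.trace iptables -t nat -nL &
#   sleep 5
#   pending_syscall /tmp/iptables.trace
```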
6
clearbug OP @tempdban #2 Sorry, in the cat /proc/16523/status output I posted above, the PID is not 16523: I had re-run iptables -t nat -nL, and the new process got PID 17357...
8
tempdban 2018-08-13 13:39:43 +08:00 via Android
cat /proc/16523/stack
strace iptables -t nat -nL
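The /proc checks suggested in this thread can be bundled into one snapshot of a stuck process. A minimal bash sketch; proc_snapshot is a hypothetical helper name. Reading /proc/PID/stack normally requires root, and /proc/PID/wchan names the kernel function the task is sleeping in, though on some kernels or configurations it may just read 0.

```bash
#!/usr/bin/env bash
# Sketch: print the scheduler state, wait channel and kernel stack of one PID.
# Reading /proc/PID/stack normally needs root; failures are reported, not fatal.
proc_snapshot() {
  local pid=$1
  if [ ! -d /proc/"$pid" ]; then
    echo "no such process: $pid" >&2
    return 1
  fi
  # Name, state (e.g. "D (disk sleep)"), parent and tracer from the status file.
  grep -E '^(Name|State|PPid|TracerPid):' /proc/"$pid"/status
  printf 'wchan: %s\n' "$(cat /proc/"$pid"/wchan 2>/dev/null)"
  echo 'kernel stack:'
  cat /proc/"$pid"/stack 2>/dev/null || echo '  (unreadable; try as root)'
}

# Usage: proc_snapshot 17357
```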
9
tempdban 2018-08-13 13:40:40 +08:00 via Android
You may need to take a core dump to look into this one.
10
clearbug OP @tempdban #8 I already ran strace iptables -t nat -nL in my reply on floor 5, so I know it is stuck in the getsockopt system call, but I still don't know why. The output of cat /proc/16523/stack is:
[<ffffffff8108b6d1>] call_usermodehelper_exec+0x111/0x1a0
[<ffffffff8108bb5b>] __request_module+0x18b/0x2b0
[<ffffffffa03c096f>] get_info+0x1ef/0x250 [ip_tables]
[<ffffffffa03c199f>] do_ipt_get_ctl+0x6f/0x3a0 [ip_tables]
[<ffffffff8152ebe8>] nf_getsockopt+0x68/0x90
[<ffffffff8153e5a0>] ip_getsockopt+0xa0/0xd0
[<ffffffff815617c5>] raw_getsockopt+0x25/0x50
[<ffffffff814e5ba4>] sock_common_getsockopt+0x14/0x20
[<ffffffff814e4f17>] SyS_getsockopt+0x77/0xf0
[<ffffffff81614a29>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
I'm a Linux newbie and can't make sense of this output ( Ĭ ^ Ĭ )
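Read from the bottom up, this stack shows the getsockopt call passing through nf_getsockopt into ip_tables (do_ipt_get_ctl, then get_info), which calls __request_module and waits in call_usermodehelper_exec. In other words, the kernel has launched a user-space modprobe to load the missing table module (on 3.10-era kernels get_info requests "iptable_<table>", so here iptable_nat) and is waiting for that helper to finish. The next thing to look for is therefore a blocked modprobe process. A minimal bash sketch, assuming procps ps; blocked_procs is a hypothetical name, and it only filters ps output for D-state tasks and processes named modprobe.

```bash
#!/usr/bin/env bash
# Sketch: from "pid stat comm args" lines on stdin, keep tasks in uninterruptible
# sleep (STAT starting with D) and any modprobe processes (e.g. kernel-spawned helpers).
blocked_procs() {
  awk '$2 ~ /^D/ || $3 == "modprobe"'
}

# Usage (procps ps; the trailing '=' suppresses each column header):
#   ps -eo pid=,stat=,comm=,args= | blocked_procs
```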
12
ryd994 2018-08-13 16:01:57 +08:00 via Android
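Below the syscall you are inside the kernel.
Looking at the process won't tell you much more. The question now is why it hangs inside the kernel; normally, unless there is a bug, the kernel shouldn't block at a spot like this. Is your kernel the official CentOS kernel? Did you use a third-party repo or build it yourself? Keep dmesg -w running and try again. If no message shows up, this may be hard to solve: debugging inside the kernel mostly comes down to printk... kgdb isn't reliable.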
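While watching dmesg -w, one message worth looking for is the kernel's hung-task warning. On kernels built with hung-task detection (stock CentOS 7 kernels normally are), a task stuck in D state longer than kernel.hung_task_timeout_secs (120 seconds by default) is reported as "INFO: task NAME:PID blocked for more than N seconds." followed by its stack. A minimal bash filter for that line; hung_tasks is a hypothetical name and the message format is an assumption based on that wording.

```bash
#!/usr/bin/env bash
# Sketch: pull hung-task reports out of kernel log text on stdin and print "NAME PID".
# Assumes the message format "INFO: task NAME:PID blocked for more than N seconds."
hung_tasks() {
  sed -nE 's/.*INFO: task (.+):([0-9]+) blocked for more than [0-9]+ seconds\..*/\1 \2/p'
}

# Usage (as root):
#   dmesg | hung_tasks
#   sysctl kernel.hung_task_timeout_secs    # current threshold; 0 disables the check
```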
13
tempdban 2018-08-13 16:14:47 +08:00 via Android
/proc/PID/stack prints the kernel stack. If you're interested, look up the corresponding kernel code.
14
didida 2018-08-13 17:00:21 +08:00
It could also be that /etc/hosts has no 127.0.0.1 localhost entry.
15
clearbug OP @didida #14
I checked, and the entries are there:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
16
clearbug OP @wd #1
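@tempdban @ryd994 #3 @didida #14 @tempdban #13 The problem is solved now. Let me describe the whole thing again in case it helps someone else.

Initially I was trying to run Docker in bridge mode on a CentOS 7 machine, but it just hung at startup. Since I'm not very familiar with Linux or Docker, I couldn't work out why, and the startup printed no hints either. I figured Docker's bridge mode might involve NAT, since IPs and ports have to be mapped between the container and the host. Then I found that iptables -t nat -nL also hung, so I guessed the cause was the same.

Later I happened to notice a system process running modprobe iptable_nat. I killed it and ran modprobe iptable_nat by hand to load the module, and it also hung without any output; even modprobe -v iptable_nat printed nothing. I then tried the tools you suggested above, such as strace, but I still couldn't make sense of the system call output. It wasn't until I read the modprobe man page carefully that I realized it supports a blacklist setting. I then found a blacklist.conf file in /etc/modprobe.d, and it contained the line blacklist iptable_nat. I deleted that file, and all the problems above went away. Hopefully nothing else breaks!!! (This is a bit rambling; read it whenever you're bored.)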
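For anyone hitting the same symptom, the check that finally found the culprit can be done without opening files one by one. A minimal bash sketch; modprobe_rules is a hypothetical helper, the default directory list is an assumption about where kmod's modprobe commonly reads configuration (adjust for your distribution), and it only looks at blacklist and install lines in *.conf files. As a cross-check, modprobe -c (--showconfig) prints the effective configuration, and modprobe -n -v (dry run, verbose) shows what would be done without loading anything.

```bash
#!/usr/bin/env bash
# Sketch: list "blacklist MODULE" and "install MODULE ..." rules that mention MODULE.
# Usage: modprobe_rules MODULE [DIR...]   (DIRs default to common modprobe.d paths)
modprobe_rules() {
  local mod=$1 d
  shift
  local -a candidates=("$@")
  local -a dirs=()
  local -A seen=()
  if [ ${#candidates[@]} -eq 0 ]; then
    candidates=(/etc/modprobe.d /run/modprobe.d /usr/lib/modprobe.d /lib/modprobe.d)
  fi
  for d in "${candidates[@]}"; do
    [ -d "$d" ] || continue
    d=$(readlink -f "$d")          # /lib is often a symlink to /usr/lib; skip duplicates
    [ -n "${seen[$d]:-}" ] && continue
    seen[$d]=1
    dirs+=("$d")
  done
  # Without this guard grep would fall back to reading stdin.
  [ ${#dirs[@]} -gt 0 ] || return 1
  grep -rHnE --include='*.conf' \
    "^[[:space:]]*(blacklist|install)[[:space:]]+${mod}([[:space:]]|\$)" "${dirs[@]}"
}

# Cross-checks with modprobe itself:
#   modprobe -c | grep -w iptable_nat     # effective config (-c = --showconfig)
#   modprobe -n -v iptable_nat            # dry run: show what would be loaded
```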