Environment
OS: SUSE Linux Enterprise Server 12 SP2
DOCKER: 1.12.6
KERNEL: 4.4.59-92.20-default
RANCHER: v1.6.2
Problem
One day in January 2018, a server in the test environment started printing the following kernel message:
kernel:[1854773.108055] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
Temporary workaround: reboot
Investigation
The error message quickly led to this bug, which was opened on 6 May 2014:
https://github.com/moby/moby/issues/5618
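(The issue now appears to be resolved, but at that point it was still January.) In the days that followed, the error message was also accompanied by side effects such as higher CPU load and the docker ps command hanging.
Someone has published a reproduction specifically for this problem: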
https://github.com/fho/docker-samba-loop
It reproduces on the kernel version listed above:
kernel:[1598.704278] unregister_netdevice: waiting for lo to become free. Usage count = 1
If the Dockerfile is modified to append the command
sleep 10
then the kernel error no longer appears, possibly because the network connections are closed normally during the wait.
This bug was fixed in kernel 4.4.114:
https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.114
commit edaafa805e0f9d09560a4892790b8e19cab8bf09
Author: Dan Streetman <[email protected]>
Date: Thu Jan 18 16:14:26 2018 -0500

net: tcp: close sock if net namespace is exiting

[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]

When a tcp socket is closed, if it detects that its net namespace is exiting, close immediately and do not wait for FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will never exit while the socket is open. However, kernel sockets do not take a reference to their net namespace, so it may begin exiting while the kernel socket is still open. In this case if the kernel socket is a tcp socket, it will stay open trying to complete its close sequence. The sock's dst(s) hold a reference to their interface, which are all transferred to the namespace's loopback interface when the real interfaces are taken down. When the namespace tries to take down its loopback interface, it hangs waiting for all references to the loopback interface to release, which results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes. Since the net namespace cleanup holds the net_mutex while calling its registered pernet callbacks, any new net namespace initialization is blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and closes immediately, releasing its dst(s) and their reference to the loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
After the upgrade, I retried step 3 and the error no longer appeared.
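To check whether a given host is already on a 4.4.y kernel at or past 4.4.114, the numeric prefix of the release string can be compared. The following is only a rough sketch: the function names `kernel_tuple` and `has_fix_in_4_4_series` are made up for illustration, and distribution kernels may backport or omit patches independently of the version number, so the result is a hint rather than proof.

```python
import re

def kernel_tuple(release):
    """Extract (major, minor, patch) from a string like '4.4.59-92.20-default'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(g) for g in m.groups())

def has_fix_in_4_4_series(release):
    """Return True/False for 4.4.y kernels, None for any other series.

    Assumption: only the 4.4.y stable series is assessed, because the
    changelog cited in the article is the one for 4.4.114.
    """
    major, minor, patch = kernel_tuple(release)
    if (major, minor) != (4, 4):
        return None  # other series are not covered by this check
    return patch >= 114
```

On a Linux host, `platform.release()` from the standard library returns the running kernel's release string, which can be passed to `has_fix_in_4_4_series`.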
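Verification
In production I upgraded the kernel on one host to 4.4.114, but the problem persisted.
The open question is which interface is involved: lo or eth0? (The original message named eth0, while the reproduction and the fix commit mention lo.)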
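One way to narrow down the lo versus eth0 question is to tally which device names show up in the kernel log. The sketch below assumes the log lines have the same layout as the two messages quoted earlier; `count_waiting_devices` is a hypothetical helper, and how the lines are collected (for example from saved dmesg output) is left to the reader.

```python
import re
from collections import Counter

# Matches the warning format seen in the messages quoted in this article.
WAITING_RE = re.compile(
    r"unregister_netdevice: waiting for (\S+) to become free"
)

def count_waiting_devices(lines):
    """Count how often each device name appears in the warning."""
    counts = Counter()
    for line in lines:
        m = WAITING_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "kernel:[1854773.108055] unregister_netdevice: waiting for eth0 to become free. Usage count = 1",
    "kernel:[1598.704278] unregister_netdevice: waiting for lo to become free. Usage count = 1",
    "kernel:[1600.000000] some unrelated message",
]
device_counts = count_waiting_devices(sample)
```

If only eth0 ever appears on the upgraded host, that would suggest a reference leak different from the loopback case the 4.4.114 commit describes.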
Follow-up
To be continued
Author: 老吕子
Link: https://www.jianshu.com/p/4ce0412b50c3