oom_kill_process Causes a Database Hang and "found dead shared server" Messages

Published: 2016-08-06 12:00:21
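This post is a follow-up to the previous one, "Oracle shutdown immediate hits ORA-24324, ORA-24323, ORA-01089" (the restart described there was attempted only because the database had hung). Both posts come from the same incident at one of our overseas factories; I split the material in two so that each post has a clearer focus. On to the main topic: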

 

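The server runs Oracle Database 10g Release 10.2.0.4.0 on Red Hat Enterprise Linux Server release 5.7, as a virtual machine. At the time of the incident, the alert log was filled with "found dead shared server" messages like the ones below, and clients could not connect to the database.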

 

found dead shared server 'S016', pid = (35, 23)
found dead shared server 'S023', pid = (42, 1)
Fri Aug  5 10:28:48 2016
found dead shared server 'S013', pid = (32, 110)
found dead shared server 'S021', pid = (40, 1)
Fri Aug  5 10:33:53 2016
found dead shared server 'S012', pid = (31, 132)
found dead shared server 'S023', pid = (38, 3)
Fri Aug  5 10:38:55 2016
found dead shared server 'S013', pid = (32, 111)
found dead shared server 'S022', pid = (42, 3)
Fri Aug  5 10:40:53 2016
found dead shared server 'S020', pid = (39, 4)
found dead shared server 'S021', pid = (40, 3)
failed to start shared server, oer=0
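
To get a quick picture of how many shared servers died and which ones, alert-log lines like those above can be tallied with a few lines of Python. This is only a minimal sketch: the function name `count_dead_shared_servers` and the regular expression are my own, written against the message format shown in this excerpt, and other Oracle versions may word the message differently.

```python
import re
from collections import Counter

# Matches alert-log lines in the format seen above, for example:
#   found dead shared server 'S016', pid = (35, 23)
# The pattern is an assumption based on this excerpt only.
DEAD_SERVER = re.compile(
    r"found dead shared server '(S\d+)', pid = \((\d+), (\d+)\)"
)


def count_dead_shared_servers(lines):
    """Return a Counter mapping shared server name to times reported dead."""
    counts = Counter()
    for line in lines:
        match = DEAD_SERVER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the alert log above.
sample = [
    "found dead shared server 'S016', pid = (35, 23)",
    "found dead shared server 'S023', pid = (42, 1)",
    "Fri Aug  5 10:28:48 2016",
    "found dead shared server 'S013', pid = (32, 110)",
    "failed to start shared server, oer=0",
]
dead_counts = count_dead_shared_servers(sample)
for name, count in sorted(dead_counts.items()):
    print(name, count)
```

In practice the `lines` argument would be the opened alert log file rather than a hard-coded list.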


 

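Investigation showed that the system had run out of memory and the OOM killer had been invoked. The OOM killer (out-of-memory killer) is the Linux kernel's mechanism for dealing with memory exhaustion. When memory runs critically low, it walks the entire process list, rates each process by its memory usage and its oom score, picks a high-scoring process, and sends it a kill signal (SIGKILL).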
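
To see which processes the kernel currently considers the most likely victims, the per-process score can be read from `/proc/<pid>/oom_score`. The following is a rough sketch rather than a monitoring tool: the function name `top_oom_candidates` and its `proc_root` parameter are hypothetical (the parameter exists only so the function can be pointed at a different directory), and how the score is computed differs between kernel versions.

```python
import os


def top_oom_candidates(proc_root="/proc", limit=5):
    """Return up to `limit` (oom_score, pid, command) tuples, highest score first."""
    candidates = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "cmdline"), "rb") as f:
                # cmdline is NUL-separated; keep only the first argument
                command = f.read().split(b"\0")[0].decode(errors="replace")
        except (OSError, ValueError):
            continue  # the process exited or the file is unreadable
        candidates.append((score, int(entry), command))
    candidates.sort(reverse=True)
    return candidates[:limit]


if __name__ == "__main__" and os.path.isdir("/proc"):
    for score, pid, command in top_oom_candidates():
        print(f"{score:>10}  {pid:>7}  {command}")
```

On a database server like the one in this case, one would expect the large `oracle` processes to appear at the top of this list, which matches the victims recorded in the kernel log.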

 


# grep -i kill /var/log/messages | more
Aug  5 10:12:10 xxxxx kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug  5 10:12:10 xxxxx kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Aug  5 10:12:11 xxxxx kernel: Out of memory: kill process 21687 (oracle) score 2296119 or a child
Aug  5 10:12:11 xxxxx kernel: Killed process 21687 (oracle)
Aug  5 10:12:11 xxxxx kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug  5 10:12:11 xxxxx kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Aug  5 10:12:11 xxxxx kernel: Out of memory: kill process 21668 (oracle) score 2144517 or a child
Aug  5 10:12:11 xxxxx kernel: Killed process 21668 (oracle)
Aug  5 10:23:09 xxxxx kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug  5 10:23:09 xxxxx kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Aug  5 10:23:09 xxxxx kernel: Out of memory: kill process 21756 (oracle) score 2144517 or a child
Aug  5 10:23:09 xxxxx kernel: Killed process 21756 (oracle)
Aug  5 10:23:09 xxxxx kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug  5 10:23:09 xxxxx kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Aug  5 10:23:09 xxxxx kernel: Out of memory: kill process 21732 (oracle) score 2138384 or a child
Aug  5 10:23:09 xxxxx kernel: Killed process 21732 (oracle)
Aug  5 10:28:08 xxxxx kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug  5 10:28:08 xxxxx kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Aug  5 10:28:09 xxxxx kernel: Out of memory: kill process 21752 (oracle) score 2144521 or a child
Aug  5 10:28:09 xxxxx kernel: Killed process 21752 (oracle)
Aug  5 10:28:09 xxxxx kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug  5 10:28:09 xxxxx kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Aug  5 10:28:09 xxxxx kernel: Out of memory: kill process 21722 (oracle) score 2138377 or a child
Aug  5 10:28:09 xxxxx kernel: Killed process 21722 (oracle)
Aug  5 10:32:24 xxxxx kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug  5 10:32:24 xxxxx kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Aug  5 10:32:24 xxxxx kernel: Out of memory: kill process 21718 (oracle) score 2135307 or a child
Aug  5 10:32:24 xxxxx kernel: Killed process 21718 (oracle)
Aug  5 10:32:24 xxxxx kernel: gdm-rh-security invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug  5 10:32:24 xxxxx kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Aug  5 10:32:24 xxxxx kernel: Out of memory: kill process 22053 (oracle) score 2135300 or a child
Aug  5 10:32:24 xxxxx kernel: Killed process 22053 (oracle)
Aug  5 10:37:54 xxxxx kernel: beremote invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug  5 10:37:54 xxxxx kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Aug  5 10:37:54 xxxxx kernel: Out of memory: kill process 22238 (oracle) score 2134274 or a child
Aug  5 10:37:54 xxxxx kernel: Killed process 22238 (oracle)
Aug  5 10:37:54 xxxxx kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug  5 10:37:54 xxxxx kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Aug  5 10:37:54 xxxxx kernel: Out of memory: kill process 22128 (oracle) score 2133001 or a child
--More--
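
The kill times in `/var/log/messages` can be lined up against the timestamps in the alert log shown earlier. The sketch below extracts the "Killed process" lines and lists the kills that happened shortly before a given alert-log timestamp. The helper names, the two-minute window, and the hard-coded year are my own assumptions (syslog lines carry no year), and month-name parsing with `%b` assumes an English locale.

```python
import re
from datetime import datetime, timedelta

# Matches syslog lines such as:
#   Aug  5 10:12:11 xxxxx kernel: Killed process 21687 (oracle)
KILLED = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: Killed process (\d+) \(([^)]+)\)"
)


def parse_oom_kills(lines, year=2016):
    """Return (timestamp, pid, command) for every 'Killed process' line."""
    kills = []
    for line in lines:
        match = KILLED.match(line)
        if match:
            # Collapse the double space syslog uses for single-digit days.
            stamp_text = " ".join(match.group(1).split())
            stamp = datetime.strptime(f"{year} {stamp_text}", "%Y %b %d %H:%M:%S")
            kills.append((stamp, int(match.group(2)), match.group(3)))
    return kills


def kills_before(alert_time, kills, window=timedelta(minutes=2)):
    """Return the kills that happened within `window` before `alert_time`."""
    return [k for k in kills if timedelta(0) <= alert_time - k[0] <= window]


# Lines copied from the kernel log above.
sample = [
    "Aug  5 10:28:09 xxxxx kernel: Out of memory: kill process 21752 (oracle) score 2144521 or a child",
    "Aug  5 10:28:09 xxxxx kernel: Killed process 21752 (oracle)",
    "Aug  5 10:28:09 xxxxx kernel: Killed process 21722 (oracle)",
    "Aug  5 10:32:24 xxxxx kernel: Killed process 21718 (oracle)",
]
kills = parse_oom_kills(sample)

# Alert-log timestamp "Fri Aug  5 10:28:48 2016" from the excerpt earlier.
alert_time = datetime(2016, 8, 5, 10, 28, 48)
recent = kills_before(alert_time, kills)
for stamp, pid, command in recent:
    print(stamp, pid, command)
```

For the 10:28:48 alert-log entry this selects the two `oracle` processes killed at 10:28:09, which fits the pattern of two "found dead shared server" messages following each pair of kills.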


 

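The output above shows a large number of Oracle processes being killed, which is what led Oracle to report errors such as "found dead shared server 'S016', pid = (35, 23)". The official document Found Dead Shared Server Messages Reported In Alert.Log (Doc ID 760872.1) describes this as follows (the document is fairly old, but the principle still applies to this environment):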

 

SYMPTOMS

 

The official document Oracle VM Server hangs after Invoking the OOM Killer and having hundreds of kpartx processes spawned and "state S non-preferred supports toluSnA" reported on the FC LUNs (Doc ID 2123877.1) describes one such case:

 

APPLIES TO:

Oracle VM - Version 3.3.2 and later
