How to handle an Out of Memory (OOM) issue accompanied by many hs_err_pidxxx.log files in the /root directory
Issue
You may encounter crashes caused by an Out of Memory (OOM) condition, along with a large number of hs_err_pidxxx.log files in the /root directory.
/var/log/messages:
Jun 1 08:31:22 hostname kernel: Out of memory: Kill process 4895 (java) score 81 or sacrifice child
Jun 1 08:31:22 hostname kernel: Killed process 4895 (java), UID 2000, total-vm:14805816kB, anon-rss:5702568kB, file-rss:0kB, shmem-rss:0kB
Jun 3 07:39:19 hostname kernel: Out of memory: Kill process 13446 (java) score 106 or sacrifice child
Jun 3 07:39:19 hostname kernel: Killed process 13446 (java), UID 0, total-vm:26058292kB, anon-rss:7665912kB, file-rss:0kB, shmem-rss:0kB
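To judge how often the system is actually going OOM, the kernel "Killed process" lines can be counted per day. The following is a minimal Python sketch, assuming the log lines have the same layout as the /var/log/messages excerpt above; the function name summarize_oom_kills and the sample constant are illustrative and not part of the product.

```python
import re
from collections import Counter

# Matches kernel "Killed process" lines in the layout shown in the excerpt above.
KILLED_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+[\d:]+\s+\S+\s+kernel:\s+"
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?anon-rss:(?P<rss>\d+)kB"
)

def summarize_oom_kills(lines):
    """Return (kills per day, list of (day, pid, process name, anon-rss in GiB))."""
    per_day = Counter()
    kills = []
    for line in lines:
        m = KILLED_RE.search(line)
        if m is None:
            continue  # not an OOM kill line
        day = f"{m.group('month')} {m.group('day')}"
        per_day[day] += 1
        rss_gib = round(int(m.group("rss")) / 1024 ** 2, 2)  # kB -> GiB
        kills.append((day, int(m.group("pid")), m.group("name"), rss_gib))
    return per_day, kills

# Sample input copied from the excerpt above.
SAMPLE_LINES = [
    "Jun 1 08:31:22 hostname kernel: Out of memory: Kill process 4895 (java) score 81 or sacrifice child",
    "Jun 1 08:31:22 hostname kernel: Killed process 4895 (java), UID 2000, total-vm:14805816kB, anon-rss:5702568kB, file-rss:0kB, shmem-rss:0kB",
    "Jun 3 07:39:19 hostname kernel: Killed process 13446 (java), UID 0, total-vm:26058292kB, anon-rss:7665912kB, file-rss:0kB, shmem-rss:0kB",
]

per_day, kills = summarize_oom_kills(SAMPLE_LINES)
for day, count in per_day.items():
    print(day, count)
```

On a live system, the same function could be fed the lines of /var/log/messages instead of the sample list.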
At the same time, the creation times of the hs_err_pidxxx.log files closely match the execution times of nw_topology.py recorded in /var/log/cron.
/var/log/cron:
Jun 19 18:58:01 hostname CROND[7863]: (root) CMD (python3 /opt/rsa/saTools/platform/nwtopology/nw_topology.py)
Jun 19 18:59:01 hostname CROND[8348]: (root) CMD (python3 /opt/rsa/saTools/platform/nwtopology/nw_topology.py
Jun 19 21:00:01 hostname CROND[28035]: (root) CMD (python3 /opt/rsa/saTools/platform/nwtopology/nw_topology.py)
Jun 19 21:01:01 hostname CROND[28461]: (root) CMD (python3 /opt/rsa/saTools/platform/nwtopology/nw_topology.py)
ls -l /root:
-rw-r--r--. 1 root root 23528 Jun 19 18:58 hs_err_pid7877.log
-rw-r--r--. 1 root root 23626 Jun 19 18:59 hs_err_pid8369.log
-rw-r--r--. 1 root root 23535 Jun 19 21:00 hs_err_pid28051.log
-rw-r--r--. 1 root root 23731 Jun 19 21:03 hs_err_pid29310.log
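The comparison between cron runs and crash logs can also be done programmatically. The following is a rough Python sketch, assuming the /var/log/cron and ls -l layouts shown above and an English locale for month names. The function name match_crashes_to_cron, the 5-minute window, and the year argument are assumptions for illustration; the logs themselves do not carry a year.

```python
import re
from datetime import datetime, timedelta

# Cron lines that launched nw_topology.py; seconds are dropped to match ls -l precision.
CRON_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+CROND\[\d+\]:.*nw_topology\.py"
)
# ls -l lines for hs_err files; files old enough to show a year instead of a time are skipped.
LS_RE = re.compile(r"(\w{3}\s+\d+\s+\d{2}:\d{2})\s+(hs_err_pid\d+\.log)$")

def _parse(stamp, year):
    # Normalise whitespace so "Jun  1 08:31" and "Jun 1 08:31" parse the same way.
    return datetime.strptime(f"{year} {' '.join(stamp.split())}", "%Y %b %d %H:%M")

def match_crashes_to_cron(cron_lines, ls_lines, year, window_minutes=5):
    """Pair each hs_err file with the latest nw_topology.py run that
    started at most window_minutes before the file was written."""
    runs = []
    for line in cron_lines:
        m = CRON_RE.search(line.strip())
        if m:
            runs.append(_parse(m.group(1), year))
    window = timedelta(minutes=window_minutes)
    results = []
    for line in ls_lines:
        m = LS_RE.search(line.strip())
        if not m:
            continue
        written = _parse(m.group(1), year)
        candidates = [r for r in runs if timedelta(0) <= written - r <= window]
        results.append((m.group(2), max(candidates) if candidates else None))
    return results
```

A file that pairs with None was written with no nw_topology.py run in the preceding window, which would point to a different cause for that crash.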
Resolution
As part of nwtopology, the orchestration list hosts command is executed, and this command appears in all of the hs_err_pidxxx.log files. This can cause the OOM issue. If the system does not go OOM frequently, you do not need to apply a workaround. Otherwise, you can either disable the nwtopology cron job or schedule it to run only once a year, using one of the following two workarounds.
Option 1: Stop the nwtopology cron job from being added to the crontab whenever the nw-base cookbook is run. This disables nwtopology permanently.
Edit the nwtopology.rb file and comment out the cron expression. Refer to the screenshot.
vi /var/netwitness/config-management/cookbooks/platform/nw-base/recipes/nwtopology.rb
Option 2: Schedule the nwtopology cron job on a specific day or once a year. Example: schedule the cron job at hour 12 in month 12 (this can be configured according to the customer's needs).
Edit the nwtopology.rb file and change the cron expression to the desired schedule. Refer to the screenshot.
vi /var/netwitness/config-management/cookbooks/platform/nw-base/recipes/nwtopology.rb
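When choosing a once-a-year schedule, every cron field matters: if only the hour and month are fixed and the minute and day are left as wildcards, the job runs every minute of that hour on every day of that month. The Python sketch below illustrates how the five standard cron fields combine. The helper cron_expression is hypothetical, and the choice of day 1 and minute 0 is an assumption, since the example above specifies only the hour and the month. It does not reproduce the contents of nwtopology.rb.

```python
def cron_expression(minute="*", hour="*", day="*", month="*", weekday="*"):
    """Join the five standard cron fields (minute, hour, day of month,
    month, day of week) into a single crontab schedule string."""
    return " ".join(str(field) for field in (minute, hour, day, month, weekday))

# Once a year: 12:00 on 1 December (day and minute are assumed values).
yearly = cron_expression(minute=0, hour=12, day=1, month=12)
print(yearly)  # 0 12 1 12 *

# Only hour and month fixed: runs every minute from 12:00 to 12:59,
# every day in December, which is probably not what is intended.
too_broad = cron_expression(hour=12, month=12)
print(too_broad)  # * 12 * 12 *
```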
Notes
After an upgrade, /var/netwitness/config-management/cookbooks/platform/nw-base/recipes/nwtopology.rb is overwritten and the cron expression reverts to its default value of every 3 hours. To avoid facing this issue again, reapply one of the options above after each upgrade.
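A quick post-upgrade check is to look for an active nw_topology.py entry in the crontab. The following is a minimal Python sketch, assuming the job is installed in root's crontab (the /var/log/cron excerpt shows it running as root) and uses a plain five-field schedule. The function names are illustrative.

```python
import subprocess

def find_nwtopology_entries(crontab_text):
    """Return (schedule, command) pairs for active crontab lines that run
    nw_topology.py. Commented-out and blank lines are ignored."""
    entries = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 5)  # five schedule fields, then the command
        if len(fields) == 6 and "nw_topology.py" in fields[5]:
            entries.append((" ".join(fields[:5]), fields[5]))
    return entries

def read_root_crontab():
    """Read root's crontab via `crontab -l -u root` (requires root privileges).
    Returns an empty string if there is no crontab."""
    result = subprocess.run(
        ["crontab", "-l", "-u", "root"], capture_output=True, text=True
    )
    return result.stdout if result.returncode == 0 else ""
```

Calling find_nwtopology_entries(read_root_crontab()) after an upgrade returns an empty list if the job is disabled; otherwise it returns the schedule currently in effect, which can be compared against the intended one.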
Product Details
RSA Product Set: NetWitness Log and Network
RSA Product/Service Type: NetWitness Server
RSA Version/Condition: 11.x