Fix for RabbitMQ log queues with no consumers after upgrading to 11.5.3, causing log collection issues
Issue
The issue is related to a corrupt version of the nw_admin.ez plug-in, which supports RabbitMQ communication between the queues on the remote and local sides. After the upgrade, one or more event types are no longer passed from a Remote Log Collector to a Local Log Collector, resulting in:
- Missing logs in Investigate
- H&W Alerts for "LogCollector Event Processor Queue with No Consumer" or similar
- H&W Alerts for "Critical RabbitMQ Queue Message Count"
- H&W Alerts for disk space due to a buildup of RDQ files on the Collectors
Cause
This behavior has been observed on upgrades to 11.5.3 and potentially 11.5.2 and may be attributable to a corrupt nw_admin.ez plug-in.
Workaround
Confirm the problem:
- RabbitMQ shovels show green on either the RLC (push) or the Local LC (pull). Green indicates only that the shovel is up and running; it does not reveal missing consumers.
- Run the following command on both the Remote Log Collector and the Local Log Collector and look for event queues with 0 in the consumers column. Healthy output, with one consumer per queue, looks like this:
[root@NW11-LOG-HYBRID ~]# rabbitmqctl list_queues -p logcollection name messages consumers
Timeout: 60.0 seconds ...
Listing queues for vhost logcollection ...
name messages consumers
LogDecoder.logdecoder.windows 0 1
LogDecoder.logdecoder.checkpoint 0 1
LogDecoder.logdecoder.syslog 0 1
LogDecoder.logdecoder.file 0 1
LogDecoder.logdecoder.netflow 0 1
LogDecoder.logdecoder.sdee 0 1
LogDecoder.logdecoder.snmptrap 0 1
LogDecoder.logdecoder.vmware 0 1
rabbitmq.log 0 1
LogDecoder.logdecoder.cmdscript 0 1
LogDecoder.logdecoder.windowslegacy 0 1
LogDecoder.logdecoder.odbc 0 1
Example of missing consumers (note that the windows and syslog queues show 0 consumers):
[root@NW11-LOG-HYBRID ~]# rabbitmqctl list_queues -p logcollection name messages consumers
Timeout: 60.0 seconds ...
Listing queues for vhost logcollection ...
name messages consumers
LogDecoder.logdecoder.windows 0 0
LogDecoder.logdecoder.checkpoint 0 1
LogDecoder.logdecoder.syslog 0 0
LogDecoder.logdecoder.file 0 1
LogDecoder.logdecoder.netflow 0 1
LogDecoder.logdecoder.sdee 0 1
LogDecoder.logdecoder.snmptrap 0 1
LogDecoder.logdecoder.vmware 0 1
rabbitmq.log 0 1
LogDecoder.logdecoder.cmdscript 0 1
LogDecoder.logdecoder.windowslegacy 0 1
LogDecoder.logdecoder.odbc 0 1
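Reading the consumers column by eye is error-prone when a collector has many queues. The following is a minimal sketch, assuming the three-column name/messages/consumers layout shown above; queues_without_consumers is a hypothetical helper, not part of NetWitness or RabbitMQ. It prints every queue whose consumer count is 0.

```shell
#!/bin/sh
# queues_without_consumers (hypothetical helper): reads the output of
#   rabbitmqctl list_queues -p logcollection name messages consumers
# on stdin and prints the name of every queue whose consumer count is 0.
# Assumes the three-column layout shown in this KB. The "Timeout ...",
# "Listing queues ..." and header lines are skipped because they either
# do not have exactly three fields or their third field is not a number.
queues_without_consumers() {
  awk 'NF == 3 && $3 ~ /^[0-9]+$/ && $3 == 0 { print $1 }'
}
```

Usage: rabbitmqctl list_queues -p logcollection name messages consumers | queues_without_consumers. Against the example of missing consumers above, it would print the windows and syslog queue names; empty output means every queue has at least one consumer.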
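To apply the workaround on the CentOS Log Collectors:
- Download the nw_admin-11.5.3.0.ez plug-in attached to this KB.
- Copy it (for example, with scp) to /root/ on all Log Collectors, both remote and local.
- Run sha256sum on the file to confirm that it is the correct version and was not corrupted in transit. The checksum must match the following: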
[root@NW11-LOG-HYBRID ~]# sha256sum nw_admin-11.5.3.0.ez
b3efc16dee21d2df97f859fdb6eecb3995597671fccd30cc38396d4c4c1712b3 nw_admin-11.5.3.0.ez
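Comparing a 64-character digest by eye is easy to get wrong. A minimal sketch of a scripted comparison, assuming GNU coreutils sha256sum (its --check and --status options); verify_sha256 and EXPECTED_NW_ADMIN_SHA256 are hypothetical names, and the digest value is the one published in this KB.

```shell
#!/bin/sh
# Expected digest of nw_admin-11.5.3.0.ez, as published in this KB.
EXPECTED_NW_ADMIN_SHA256="b3efc16dee21d2df97f859fdb6eecb3995597671fccd30cc38396d4c4c1712b3"

# verify_sha256 (hypothetical helper): succeeds only if file $1 has SHA-256 $2.
# GNU sha256sum --check reads "<digest>  <file>" lines (two spaces) and,
# with --status, reports the result only through its exit code.
verify_sha256() {
  printf '%s  %s\n' "$2" "$1" | sha256sum --check --status -
}
```

Usage: verify_sha256 /root/nw_admin-11.5.3.0.ez "$EXPECTED_NW_ADMIN_SHA256" && echo OK || echo MISMATCH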
- Stop rabbitmq-server: systemctl stop rabbitmq-server
- Back up the nw_admin.ez plug-in currently on the filesystem:
- mv /usr/lib/rabbitmq/lib/rabbitmq_server-3.8.3/plugins/nw_admin.ez /root/nw_admin.ez.BAK
- Copy the new version into the plugins directory, making sure to rename it to "nw_admin.ez":
- cp /root/nw_admin-11.5.3.0.ez /usr/lib/rabbitmq/lib/rabbitmq_server-3.8.3/plugins/nw_admin.ez
- Set the permissions to 644:
[root@NW11-LOG-HYBRID ~]# cd /usr/lib/rabbitmq/lib/rabbitmq_server-3.8.3/plugins/
[root@NW11-LOG-HYBRID plugins]# chmod 644 nw_admin.ez
- Verify the sha256sum and permissions once more:
[root@NW11-LOG-HYBRID plugins]# sha256sum nw_admin.ez
b3efc16dee21d2df97f859fdb6eecb3995597671fccd30cc38396d4c4c1712b3 nw_admin.ez
[root@NW11-LOG-HYBRID plugins]# ls -lrth nw_admin.ez
-rw-r--r--. 1 root root 49K Apr 20 19:56 nw_admin.ez
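The backup, copy, chmod and verification steps above can be scripted so that nothing is touched unless the downloaded file is correct. A minimal sketch, assuming rabbitmq-server has already been stopped and GNU coreutils is available; replace_nw_admin_plugin is a hypothetical helper, and its argument order is an illustrative choice.

```shell
#!/bin/sh
# replace_nw_admin_plugin (hypothetical helper) scripts the manual steps above.
# Arguments: $1 = downloaded plug-in, $2 = RabbitMQ plugins directory,
#            $3 = backup path for the current plug-in, $4 = expected SHA-256.
# Assumes rabbitmq-server is already stopped (systemctl stop rabbitmq-server).
replace_nw_admin_plugin() {
  local new_file="$1" plugin_dir="$2" backup="$3" expected="$4" target
  target="$plugin_dir/nw_admin.ez"

  # Do nothing unless the downloaded file matches the published checksum.
  if ! printf '%s  %s\n' "$expected" "$new_file" | sha256sum --check --status -; then
    echo "checksum mismatch: $new_file" >&2
    return 1
  fi

  # Move the current plug-in aside, then install the new one under the
  # exact name nw_admin.ez with mode 644 (rw-r--r--).
  mv "$target" "$backup" || return 1
  cp "$new_file" "$target" || return 1
  chmod 644 "$target" || return 1

  # Confirm that the installed copy still matches the expected digest.
  printf '%s  %s\n' "$expected" "$target" | sha256sum --check --status -
}
```

Usage, with rabbitmq-server stopped: replace_nw_admin_plugin /root/nw_admin-11.5.3.0.ez /usr/lib/rabbitmq/lib/rabbitmq_server-3.8.3/plugins /root/nw_admin.ez.BAK b3efc16dee21d2df97f859fdb6eecb3995597671fccd30cc38396d4c4c1712b3. Checking the checksum first means a bad download leaves the existing plug-in in place.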
- Back up the original plug-in in the Chef reference directory (/opt/netwitness), then copy the new plug-in there for potential future use:
[root@NW11-LOG-HYBRID ~]# cp /opt/netwitness/nw_admin-11.5.3.0.ez /opt/netwitness/nw_admin-11.5.3.0.ez-ORIG
[root@NW11-LOG-HYBRID ~]# cp /root/nw_admin-11.5.3.0.ez /opt/netwitness/nw_admin-11.5.3.0.ez
cp: overwrite ‘/opt/netwitness/nw_admin-11.5.3.0.ez’? y
- Verify the sha256sum of both the backup and the patched version:
[root@NW11-LOG-HYBRID ~]# sha256sum /opt/netwitness/nw_admin-11.5.3.0.ez*
b3efc16dee21d2df97f859fdb6eecb3995597671fccd30cc38396d4c4c1712b3 /opt/netwitness/nw_admin-11.5.3.0.ez
ef7569e292be011ef130c3e7b838026f87792afe90ab6cf7a738deac924ec65d /opt/netwitness/nw_admin-11.5.3.0.ez-ORIG
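If the Chef reference step is repeated (for example, on a retry), running the first cp again would overwrite the -ORIG backup with an already-patched file. A minimal sketch that keeps the first backup, assuming the /opt/netwitness paths shown above; update_chef_reference is a hypothetical helper name.

```shell
#!/bin/sh
# update_chef_reference (hypothetical helper): $1 = patched plug-in,
# $2 = reference copy, e.g. /opt/netwitness/nw_admin-11.5.3.0.ez.
update_chef_reference() {
  local patched="$1" ref="$2"
  # Back up the existing reference copy only once, so that re-running this
  # function never replaces the original -ORIG file with a patched copy.
  if [ -e "$ref" ] && [ ! -e "$ref-ORIG" ]; then
    cp "$ref" "$ref-ORIG" || return 1
  fi
  cp -f "$patched" "$ref" || return 1
  # The reference copy must now match the patched file byte for byte.
  [ "$(sha256sum < "$patched" | cut -d ' ' -f 1)" = "$(sha256sum < "$ref" | cut -d ' ' -f 1)" ]
}
```

Usage: update_chef_reference /root/nw_admin-11.5.3.0.ez /opt/netwitness/nw_admin-11.5.3.0.ez, followed by the sha256sum check above, which should still show the ef7569e2... digest for the -ORIG file.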
- Start rabbitmq-server: systemctl start rabbitmq-server
- Restart the nwlogcollector service: systemctl restart nwlogcollector
- Perform all of the above steps on both the local and remote LCs.
- Verify that rabbitmq-server now recognizes the patched version; "rabbitmq-plugins list" should show it as "11.5.1.0":
[root@NW11-LOG-HYBRID ~]# rabbitmq-plugins list | grep nw_admin
[E*] nw_admin 11.5.1.0
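When checking several collectors, the version can be extracted rather than read from the grep output. A minimal sketch, assuming the `rabbitmq-plugins list` row layout shown above (status marker, plug-in name, version); nw_admin_version is a hypothetical helper, and the expected value 11.5.1.0 comes from this KB.

```shell
#!/bin/sh
# nw_admin_version (hypothetical helper): reads `rabbitmq-plugins list` output
# on stdin and prints the version column of the nw_admin row, e.g.
#   [E*] nw_admin                          11.5.1.0
# The plug-in name is matched as a whole field, so the helper does not depend
# on how the [E*] status marker is spaced.
nw_admin_version() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "nw_admin") { print $(i + 1); exit } }'
}
```

Usage: [ "$(rabbitmq-plugins list | nw_admin_version)" = "11.5.1.0" ] && echo "patched nw_admin loaded". The helper reads only the version; the [E*] marker should still show the plug-in as enabled and running.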
- Then verify that the consumers have returned:
[root@NW11-LOG-HYBRID plugins]# rabbitmqctl list_queues -p logcollection name messages consumers
Timeout: 60.0 seconds ...
Listing queues for vhost logcollection ...
name messages consumers
LogDecoder.logdecoder.windows 0 1
LogDecoder.logdecoder.checkpoint 0 1
LogDecoder.logdecoder.syslog 0 1
LogDecoder.logdecoder.file 0 1
LogDecoder.logdecoder.netflow 0 1
LogDecoder.logdecoder.sdee 0 1
LogDecoder.logdecoder.snmptrap 0 1
LogDecoder.logdecoder.vmware 0 1
rabbitmq.log 0 1
LogDecoder.logdecoder.cmdscript 0 1
LogDecoder.logdecoder.windowslegacy 0 1
LogDecoder.logdecoder.odbc 0 1
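Consumers may take a little while to reattach after the services restart, so a single check can be misleading. A minimal polling sketch, assuming the same three-column list_queues layout; wait_for_consumers is a hypothetical helper, and the attempt count and interval are arbitrary choices, not values from this KB.

```shell
#!/bin/sh
# wait_for_consumers (hypothetical helper): re-runs a queue-listing command
# until no queue reports 0 consumers, or gives up after a number of attempts.
# Arguments: $1 = maximum attempts, $2 = seconds between attempts,
#            remaining arguments = the command to run, for example
#            rabbitmqctl list_queues -p logcollection name messages consumers
wait_for_consumers() {
  local attempts="$1" interval="$2" i=1 output missing
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # A failed listing command is never treated as "all consumers present".
    if ! output="$("$@")"; then
      echo "attempt $i/$attempts: listing command failed" >&2
    else
      # Keep only rows with three fields and a numeric consumer count of 0.
      missing="$(printf '%s\n' "$output" | awk 'NF == 3 && $3 ~ /^[0-9]+$/ && $3 == 0 { print $1 }')"
      if [ -z "$missing" ]; then
        echo "all queues have at least one consumer"
        return 0
      fi
      echo "attempt $i/$attempts, no consumers on: $(printf '%s\n' "$missing" | tr '\n' ' ')" >&2
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage: wait_for_consumers 15 60 rabbitmqctl list_queues -p logcollection name messages consumers polls once a minute for roughly 15 minutes and returns non-zero if any queue still has no consumer.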
- After 15 minutes, check in the UI whether normal log flow has resumed for each lc.cid.
On Windows-based Log Collectors, use the following steps:
- Stop the rabbitmq and nwlogcollector services from services.msc.
- Back up and then remove the existing nw_admin plug-in from the following two locations. Previous versions may be present in addition to 11.5.3.0; remove all nw_admin plug-ins from these directories:
- C:\Program Files\RabbitMQ Server\rabbitmq_server-3.8.3\plugins
- C:\Program Files\NwLogCollector
- Copy the nw_admin.ez plug-in attached to this KB (the same file used on the CentOS appliances) to the following directories:
- C:\Program Files\RabbitMQ Server\rabbitmq_server-3.8.3\plugins
- C:\Program Files\NwLogCollector
- Start the rabbitmq and nwlogcollector services from services.msc.
Resolution
Install a patched version of the nw_admin.ez plug-in and restart the appropriate services.
Internal Comments
See the following JIRAs for more information:
- https://bedfordjira.na.rsa.net/browse/SACE-15832
- https://bedfordjira.na.rsa.net/browse/ASOC-109851
Product Details
- RSA Product Set: NetWitness Platform
- RSA Product/Service Type: Log Collector, Log Decoder
- RSA Version/Condition: 11.5.2, 11.5.3
- Platform: CentOS
- O/S Version: 7
Summary
After upgrading to 11.5.3, logs flowing from remote to local collectors may unexpectedly stop. Running rabbitmqctl list_queues shows that some (but not necessarily all) queues have no consumers. This occurs in both push and pull environments.