RabbitMQ file descriptor limit reached in NetWitness 11.4.x
Issue
To see the article in a demo format, view the RSA EduTube video on RabbitMQ file descriptor limit reached in NetWitness 11.4.x.
The RabbitMQ service on the RSA NetWitness appliance appears to stop processing even though the service is still running. Running netstat on the server shows a large number of connections, possibly in the thousands, associated with the RabbitMQ (beam.smp) process.
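As a rough illustration of the netstat check described above, the following Python sketch counts socket lines per program in `netstat -anp`-style output. It assumes the last column has the form PID/Program, as Linux netstat prints it with the -p option. The function name and the sample addresses are made up for the example and are not part of any RSA tooling.

```python
from collections import Counter

# Made-up sample in the layout printed by `netstat -anp` on Linux.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address     Foreign Address    State       PID/Program name
tcp        0      0 10.0.0.5:5671     10.0.0.9:47578     ESTABLISHED 2211/beam.smp
tcp        0      0 10.0.0.5:5671     10.0.0.9:36501     ESTABLISHED 2211/beam.smp
tcp        0      0 0.0.0.0:22        0.0.0.0:*          LISTEN      900/sshd
"""


def count_connections_by_program(netstat_output):
    """Return a Counter mapping program name to number of socket lines."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp/udp socket line.
        if not fields or not fields[0].startswith(("tcp", "udp")):
            continue
        last = fields[-1]
        # Assumption: the last column is "PID/Program"; "-" means unknown.
        if "/" in last:
            counts[last.split("/", 1)[1]] += 1
    return counts


if __name__ == "__main__":
    for program, total in count_connections_by_program(SAMPLE).most_common():
        print(program, total)
```

A beam.smp count in the thousands would match the symptom described in this article. Program names that contain spaces are not handled by this simple split.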
The following messages may be found in the /var/log/rabbitmq/rabbit_
2020-04-15 14:10:08.056 [error] <0.19260.1138> CRASH REPORT Process <0.19260.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395
2020-04-15 14:10:08.056 [error] <0.19626.1138> CRASH REPORT Process <0.19626.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.sms.collectd">>}) in rabbit_misc:dirty_read/1 line 395
<<"carlos-upstream-f51f708a-d04e-437f-8e3c-2b46672bf1cb">>,false} started with rabbit_federation_exchange_link:start_link({{upstream,
[<<"amqps://10.41.82.34:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carlos....">>,...},...}) at {restarting,<0.6913.1050>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error
3600000,none,false,'on-confirm',none,
<<"carlos-upstream-18e5b1f6-1698-4a55-848b-cbda1d3d8380">>,false} started with rabbit_federation_exchange_link:start_link({{upstream
[<<"amqps://10.41.82.32:5671?auth_mechanism=external">>],<<"carlos.sms.collectd">>,<<"...">>,...},...}) at {restarting,<0.7949.1050>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.sms.collectd">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error
2020-04-15 14:10:08.064 [warning] <0.579.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.066 [warning] <0.600.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.066 [error] <0.19116.1138> CRASH REPORT Process <0.19116.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395
'on-confirm',none, <<"carlos-upstream-b3ad4751-6cc5-4f67-8d50-ca20c2b25fed">>,false} started with rabbit_federation_exchange_link:start_link({{upstream,[<<"amqps://10.203.128.181:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carl...">>,...},...}) at {restarting,<0.6090.1050>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error
2020-04-15 14:10:08.071 [warning] <0.583.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.073 [error] <0.19158.1138> CRASH REPORT Process <0.19158.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.sms.collectd">>}) in rabbit_misc:dirty_read/1 line 395
2020-04-15 14:10:08.073 [error] <0.2635.0> Supervisor {<0.2635.0>,rabbit_federation_link_sup} had child {upstream,[<<"amqps://153.7.72.225:5671?auth_mechanism=external">>], <<"carlos.sms.collectd">>,<<"carlos.sms.collectd">>,1000,1,5, 3600000,none,false,'on-confirm',none, <<"carlos-upstream-698a3d8d-ba3e-4a93-a25c-b1185a966e86">>,false} started with rabbit_federation_exchange_link:start_link({{upstream,[<<"amqps://153.7.72.225:5671?auth_mechanism=external">>],
2020-03-02 17:19:46.106 [error] <0.19709.3856> CRASH REPORT Process <0.19709.3856> with 0 neighbours exited with reason: {aborted,{no_exists,[rabbit_runtime_parameters,cluster_name]}} in mnesia:abort/1 line 355
2020-03-02 17:19:46.106 [error] <0.15120.3869> Supervisor {<0.15120.3869>,rabbit_connection_sup} had child reader started with rabbit_reader:start_link(<0.17481.3872>, {acceptor,{0,0,0,0,0,0,0,0},5672}) at <0.19709.3856> exit with reason {aborted,{no_exists,[rabbit_runtime_parameters,cluster_name]}} in context child_terminated
2020-03-02 17:19:46.106 [error] <0.15120.3869> Supervisor {<0.15120.3869>,rabbit_connection_sup} had child reader started with rabbit_reader:start_link(<0.17481.3872>, {acceptor,{0,0,0,0,0,0,0,0},5672}) at <0.19709.3856> exit with reason reached_max_restart_intensity in context shutdown
2020-03-02 17:19:46.156 [error] <0.4268.3859> CRASH REPORT Process <0.4268.3859> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}) in rabbit_misc:dirty_read/1 line 395
2020-03-02 17:19:46.157 [error] <0.455.0> Supervisor {<0.455.0>,rabbit_federation_link_sup} had child {upstream,[<<"amqps://172.19.108.192:5671?auth_mechanism=external">>],
'on-confirm',none,
<<"carlos-upstream-d40020aa-9396-4412-bde2-58f863530e9d">>,false} started with rabbit_federation_exchange_link:start_link({{upstream,[<<"amqps://172.19.108.192:5671?auth_mechanism=external">>],<<"carlos.alerts">>,<<"car...">>,...},...}) at {restarting,<0.10709.1780>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error
The following messages may be found in /var/log/rabbitmq/log/crash.log:
pid: <0.22077.1048>
registered_name: []
exception error: {function_clause,[{amqp_gen_connection,terminate,[{shutdown,{gen_server2,call,[file_handle_cache,{obtain,1,socket,<0.22077.1048>},infinity]}},{<0.23240.1048>,{amqp_params_network,<<"guest">>,<<"guest">>,<<"/rsa/system">>,"10.224.254.214",5671,2047,0,10,60000,[],[#Fun
ancestors: [<0.23914.1048>,amqp_sup,<0.259.0>]
message_queue_len: 0
messages: []
links: [<0.23914.1048>]
dictionary: []
trap_exit: true
status: running
heap_size: 1598
stack_size: 27
reductions: 1097
Context: shutdown_error
Reason: shutdown
Offender: [{nb_children,1},{name,channel_sup},{mfargs,{amqp_channel_sup,start_link,[network,<0.20634.45>,<<"client 153.7.72.222:47578 -> 10.95.222.3:5671">>]}},{restart_type,temporary},{shutdown,infinity},{child_type,supervisor}]
2020-04-15 20:06:47 =SUPERVISOR REPORT====
Context: shutdown_error
Reason: shutdown
Offender: [{nb_children,1},{name,channel_sup},{mfargs,{amqp_channel_sup,start_link,[network,<0.12775.48>,<<"client 153.7.72.222:36501 -> 10.95.222.6:5671">>]}},{restart_type,temporary},{shutdown,infinity},{child_type,supervisor}]
pid: <0.28993.2084>
registered_name: []
exception exit: {{badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},[{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,597}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [<0.678.0>,<0.413.0>,rabbit_federation_exchange_link_sup_sup,rabbit_federation_sup,rabbit_sup,<0.287.0>]
message_queue_len: 0
messages: []
links: [<0.678.0>]
trap_exit: false
status: running
heap_size: 610
stack_size: 27
reductions: 241
Reason: {badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
Offender: [{pid,{restarting,<0.13111.682>}},{name,{upstream,[<<"amqps://10.100.6.20:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carlos.audit">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-14a5aeef-6a2b-4918-a048-97abea48151a">>,false}},{mfargs,{rabbit_federation_exchange_link,start_link,[{{upstream,[<<"amqps://10.100.6.20:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carlos.audit">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-14a5aeef-6a2b-4918-a048-97abea48151a">>,false},{resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}}]}},{restart_type,{permanent,5}},{shutdown,30000},{child_type,worker}]
2020-03-31 14:14:56 =CRASH REPORT====
crasher:
pid: <0.25092.2090>
registered_name: []
exception exit: {{badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},[{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,597}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [<0.498.0>,<0.413.0>,rabbit_federation_exchange_link_sup_sup,rabbit_federation_sup,rabbit_sup,<0.287.0>]
message_queue_len: 0
links: [<0.498.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 610
stack_size: 27
reductions: 241
Context: start_error
Reason: {badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
Offender: [{pid,{restarting,<0.12183.682>}},{name,{upstream,[<<"amqps://10.100.217.26:5671?auth_mechanism=external">>],<<"carlos.alerts">>,<<"carlos.alerts">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-5d4a0f18-4c24-4a17-8f3d-759d96cf4e50">>,false}},{mfargs,{rabbit_federation_exchange_link,start_link,[{{upstream,[<<"amqps://10.100.217.26:5671?auth_mechanism=external">>],<<"carlos.alerts">>,<<"carlos.alerts">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-5d4a0f18-4c24-4a17-8f3d-759d96cf4e50">>,false},{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}}]}},{restart_type,{permanent,5}},{shutdown,30000},{child_type,worker}]
2020-03-02 17:19:46 =SUPERVISOR REPORT====
Supervisor: {<0.31133.3865>,rabbit_connection_sup}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.23708.3863>},{name,reader},{mfargs,{rabbit_reader,start_link,[<0.7447.3872>,{acceptor,{0,0,0,0,0,0,0,0},5672}]}},{restart_type,intrinsic},{shutdown,30000},{child_type,worker}]
2020-03-02 17:19:46 =CRASH REPORT====
pid: <0.23253.3872>
registered_name: []
exception exit: {{badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},[{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,597}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [<0.455.0>,<0.385.0>,rabbit_federation_exchange_link_sup_sup,rabbit_federation_sup,rabbit_sup,<0.274.0>]
message_queue_len: 0
messages: []
links: [<0.455.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 610
stack_size: 27
reductions: 241
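Among the messages above, the "Ranch acceptor reducing accept rate: out of file descriptors" warnings are the most direct indicator of this issue. The following Python sketch shows one way to pull the timestamps of those warnings out of log text. The function name is made up for the example, and because the log file name is truncated in this article, the sketch takes the log text as input and does not open a specific path.

```python
import re

# Leading timestamp as it appears in the log lines quoted in this article.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
MARKER = "out of file descriptors"


def fd_exhaustion_timestamps(log_text):
    """Return the timestamp (or None) of every line carrying the marker."""
    hits = []
    for line in log_text.splitlines():
        if MARKER in line:
            match = TIMESTAMP.match(line)
            hits.append(match.group(1) if match else None)
    return hits


if __name__ == "__main__":
    sample = (
        "2020-04-15 14:10:08.064 [warning] <0.579.0> Ranch acceptor "
        "reducing accept rate: out of file descriptors\n"
        "2020-04-15 14:10:08.066 [error] <0.19116.1138> CRASH REPORT\n"
    )
    print(fd_exhaustion_timestamps(sample))
```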
Cause
The RabbitMQ service runs out of file descriptors, which brings down the node even though the rabbitmq-server service itself may remain running. In this state, RabbitMQ stops processing new messages and may not produce a crash dump.
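To confirm that a process is close to its file descriptor limit, its open descriptors can be compared against the "Max open files" row of its limits file. The sketch below is a minimal Python illustration that assumes a Linux /proc filesystem. The function names and the sample limits text are made up for the example, and reading another user's /proc entries usually requires root.

```python
import os

# Made-up excerpt in the layout of /proc/<pid>/limits on Linux.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
"""


def _to_limit(value):
    """Convert a limits column to int, or None when it is 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return _to_limit(soft), _to_limit(hard)
    raise ValueError("no 'Max open files' row found")


def count_open_fds(pid):
    """Count entries under /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    print(parse_max_open_files(SAMPLE_LIMITS))
```

On a live system, the pid of the beam.smp process would be passed to count_open_fds and the result compared with the soft limit.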
Workaround
Until the fixed versions listed under Resolution are released, a workaround for this issue is available as a download attached to this article (rabbitmq-performance-master.zip).
Note: This script attempts to access the following servers using the REST interface ports: Archiver, Broker, Concentrator, Network/Log Decoder, Endpoint Hybrid, Network/Log Hybrid, VLC, and Malware. The REST interface ports must therefore be accessible from the NW Admin server for the script to function correctly. See the Deployment Guide: Network Architecture and Ports for more information about the REST ports. If the REST interface ports are not open between the NW Admin server and the other RSA NetWitness appliances, use the Manual Change Adjustment method described later in this document.
Automated REST Adjustment
- Download the rabbitmq-performance-master.zip from this knowledge base article.
- SCP this script to the NW Admin server and extract it.
- On the NW Admin server, run the script as shown below.
- The script prompts for the admin user account's password when connecting to the REST interface. If there are issues with the password, the deploy_admin account can be used instead. See the -h/--help option of the script for details.
Note: A log file for the script is created in the directory from which the script is run. To enable debugging, edit line 39 of the script and change logging.INFO to logging.DEBUG.
- Once the script is complete, restart the rabbitmq-server service on the NW Admin server.
If there are issues using the automated script, see the Manual Change Adjustment section below.
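For readers unfamiliar with Python logging, the sketch below illustrates what changing logging.INFO to logging.DEBUG does in general terms. It does not reproduce the script's own logging setup, which is not shown in this article. The logger name and function are made up for the example.

```python
import logging


def build_logger(level=logging.INFO):
    """Return a logger that emits messages at `level` and above."""
    logger = logging.getLogger("rabbitmq_performance_demo")  # hypothetical name
    logger.setLevel(level)
    return logger


if __name__ == "__main__":
    logging.basicConfig(format="%(levelname)s %(message)s")
    log = build_logger(logging.DEBUG)  # with logging.INFO the debug line is dropped
    log.debug("verbose detail, visible only at DEBUG")
    log.info("normal progress message")
```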
Manual Change Adjustment
If the fix for this issue cannot be performed using the automated script or there are special circumstances that prohibit the script's usage, it is possible to manually perform the changes on the services.
- Log in to the RSA NetWitness UI.
- Go to Admin > Services > <Service Name> > Actions > View > Explore.
- Open the node /services/
:5671:amqp/config.
- Change the following options to the values shown below:
- auto.open = false
- reconnect.interval = 0
- Restart the rabbitmq-server service on the NW Admin server.
If there are issues with the process above, contact RSA NetWitness Support for further assistance.
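When the manual change has to be repeated across many services, it can help to record each service's current values and compare them against the two target values listed above. The following Python sketch does only that comparison. It does not talk to NetWitness, and the function name and input dictionary are made up for the example.

```python
# Target values taken from the Manual Change Adjustment steps.
DESIRED = {"auto.open": "false", "reconnect.interval": "0"}


def settings_to_change(current):
    """Return {option: (current_value, desired_value)} for mismatches.

    `current` is a plain dict of option name to value, for example values
    copied by hand from the Explore view. Missing options are reported
    with a current value of None.
    """
    pending = {}
    for option, wanted in DESIRED.items():
        found = current.get(option)
        if str(found).lower() != wanted:
            pending[option] = (found, wanted)
    return pending


if __name__ == "__main__":
    print(settings_to_change({"auto.open": "true", "reconnect.interval": "0"}))
```

An empty result means both options already hold the values given in this article.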
Resolution
To fix this issue, upgrade to either RSA NetWitness Platform 11.4.1.2 or 11.5.0 once those versions are available.
Product Details
RSA Product Set: NetWitness Platform
RSA Product/Service Type: All Servers
RSA Version/Condition: 11.4.x
Platform: CentOS
O/S Version: 7