
How to troubleshoot aggregation performance issues in RSA Security Analytics

Issue

Concentrator, Archiver, and Warehouse Connector can fall behind when aggregating from Decoders, sometimes by several million sessions.
 


Resolution

First, check that the time is in sync across all appliances. If you need to configure an NTP server, refer to KB article a64986.

If the time is in sync and the device is still aggregating slowly, here are some tips to improve performance:
  • If you are aggregating over SSL, the extra work to encrypt the data could be a factor.  Starting with 10.4, you have a choice over which port to aggregate from.  Ports starting in the 560xx range are SSL ports.  The original native ports, 500xx, are non-SSL.  Whenever possible, choose the non-SSL port.  This is especially true when aggregating both meta and logs, like Archiver does.
  • For 10.3 and earlier, run /decoder reconfig to make sure aggregate.buffer.size is optimally configured.  This setting was removed in 10.4 because it is no longer needed.
  • On the target service (the service sending the data), run the /decoder whoAgg command (or /concentrator, depending on the service).  This command returns a list of all services that are currently aggregating data.  Typically, the more services that are aggregating, the slower the overall throughput.  Make sure there are no surprises in the list.  Generally, more than three aggregating services is not recommended, although the limit depends on capture rates.
  • Measure the network bandwidth between the two appliances using NwConsole.  You might want to temporarily stop aggregation before running this.
    1. SSH into the appliance that has the service doing the aggregating (e.g., concentrator or archiver)
    2. Run NwConsole using the following script:
      1. login : [:ssl]
        1. This should be the service you are aggregating from: login mydecoder:50004 admin netwitness
      2. netspeed count=20000
      3. OR for 10.4, you can specify the amount of data to transfer: netspeed transfer=2g
    3. A 1 Gbps link should be around 100 MB/s or more when it isn't congested
    4. Anything dramatically lower than 100 MB/s should be investigated
    5. Don't forget to turn aggregation back on
  • Upgrade to 10.4.  Database performance was greatly enhanced in 10.4, which provides a significant boost to aggregation, especially when multiple services are requesting data.
  • Starting with 10.4.1, an automatic bandwidth test is performed per device whenever aggregation starts.  The results of the bandwidth test are written to the logs as informational messages.  If the test returns a poor result (e.g., a slow link), a warning is logged to aid in troubleshooting.
    1. The bandwidth test will only be performed (at most) once an hour and only when aggregation is starting.  It can take up to 10 seconds (but no more) for it to finish.
    2. If for some reason the test is causing issues, it can be skipped by adding "skipBandwidth=true" to the device's options.  For existing devices, take them offline and change the config node under /concentrator/devices//config/options.  When adding a device, you can specify this with the options parameter to the device add command: /concentrator add ip= port= username= password= options="skipBandwidth=true"
    3. Example logs:

(i) 2014-Oct-06 08:22:09 [Bandwidth] Performing bandwidth test to device 192.168.1.70:50004...
(i) 2014-Oct-06 08:22:14 [Bandwidth] Received 100 MB at a transfer rate of 103.7 MB/sec or 869.9 Mbps from device '192.168.1.70:50004'
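
When reviewing many devices, the bandwidth result lines can be pulled out of the logs with a small script. The following is a minimal Python sketch, assuming the log line always has the wording shown in the example above (other releases may differ). The function names and the 50% cut-off are hypothetical choices for illustration; the 100 MB/s baseline is the rule of thumb given earlier for an uncongested 1 Gbps link.

```python
import re

# Pattern derived from the example log line above; the exact wording in other
# releases is an assumption and may need adjusting.
BANDWIDTH_RE = re.compile(
    r"\[Bandwidth\] Received (?P<size>[\d.]+) MB at a transfer rate of "
    r"(?P<mb_per_sec>[\d.]+) MB/sec or (?P<mbps>[\d.]+) Mbps "
    r"from device '(?P<device>[^']+)'"
)

# Rule of thumb from this article: an uncongested 1 Gbps link gives about 100 MB/s.
EXPECTED_MB_PER_SEC = 100.0


def parse_bandwidth_line(line):
    """Return (device, MB/s, Mbps) for a bandwidth result line, else None."""
    match = BANDWIDTH_RE.search(line)
    if match is None:
        return None
    return (match["device"], float(match["mb_per_sec"]), float(match["mbps"]))


def is_slow(mb_per_sec, fraction=0.5):
    """Hypothetical cut-off: flag rates below `fraction` of the expected rate."""
    return mb_per_sec < EXPECTED_MB_PER_SEC * fraction


sample = (
    "(i) 2014-Oct-06 08:22:14 [Bandwidth] Received 100 MB at a transfer rate of "
    "103.7 MB/sec or 869.9 Mbps from device '192.168.1.70:50004'"
)
device, mb_per_sec, mbps = parse_bandwidth_line(sample)
print(device, mb_per_sec, mbps, "SLOW" if is_slow(mb_per_sec) else "ok")
```

Lines that do not match, such as the "Performing bandwidth test" line, return None and can be skipped.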

  • Another potential speed increase is to compress the stream.  Compressing from the client (e.g., Archiver, Concentrator, WC, etc.) is recommended, by changing the connection info under /services.  For instance, on an Archiver, open explorer view, navigate to /services/ /config, and change compression and compression.level (10.4+ only).  Set compression to something like 32767 and compression.level to 3 or less.  A compression value of 32767 means don't compress unless the block of data exceeds 32k, and compression.level indicates how much time to spend compressing.  A value of 1 is the fastest but also provides the worst compression.  For our purposes, faster is better: any compression that can be done quickly means less data that needs to be encrypted.  This can be an overall win, but you should measure the overall performance before and after to be sure.
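
The port convention from the first tip (560xx is SSL, 500xx is non-SSL) can be checked in a script when auditing which port each aggregation client is using. This is a minimal Python sketch. It assumes that each SSL port is the native port plus 6000 (for example 50004 and 56004), which the article implies but does not state, so verify it on your deployment. The function names are hypothetical.

```python
# Assumption: each SSL port is its native port plus 6000 (e.g. 50004 -> 56004).
# The article only says 560xx ports are SSL and 500xx ports are non-SSL.
SSL_OFFSET = 6000


def is_ssl_port(port):
    """True when the port falls in the 560xx (SSL) range."""
    return 56000 <= port <= 56099


def non_ssl_equivalent(port):
    """Map a 560xx port to its assumed 500xx counterpart; leave others unchanged."""
    return port - SSL_OFFSET if is_ssl_port(port) else port


for port in (50004, 56004):
    kind = "SSL" if is_ssl_port(port) else "non-SSL"
    print(port, kind, "-> suggested aggregation port:", non_ssl_equivalent(port))
```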

Notes

  1. Why is aggregation noticeably slower over SSL ports?
  • First off, SSL encryption is always going to be slower than not encrypting data.  It's the nature of the beast.  Encryption is hard work.  It's CPU intensive.  It also changes the dynamic of the aggregation stream - it adds overhead and additional protocol handshakes that are not present in the non-encrypted stream.  This adds latency, which affects performance.
  • SA Core (NextGen) uses the Boost.Asio network library with OpenSSL to encrypt its network communications.  AES encryption here uses cipher block chaining, a symmetric mode in which you can't encrypt the current block of data without first encrypting the previous block.  This makes encryption a serial process that cannot be sped up with concurrency.
  • The OpenSSL library is already leveraging the AES-NI instruction set, which means it's about as fast as it's going to get using the current generation of hardware.
  • In one hour, a single Archiver can receive over 200 GB of sessions, meta and logs (not joking, I just measured this).  That's a lot of data that needs to be encrypted on the sending side and then decrypted on the receiving side.  Now multiply that by a Warehouse Connector and a Concentrator client and you start to add up some seriously large data transfers in a short amount of time.  WC actually pulls more meta than Archiver.  Try taking 600 GB of files and measuring the time to encrypt and then decrypt them.  That's exactly the pain aggregation endures when it runs over the SSL port.  What's worse, WC typically runs on the same machine, so you are encrypting and decrypting on the SAME MACHINE for no good reason.  That's a lot of wasted CPU power.
  • 200 GB an hour is about 57 MB/s, or roughly 450 Mbps.  That's nearly half the total bandwidth of the 1 Gbps link typically used for the administration port.  Throw in multiple aggregation streams and you've saturated the link.  There are typically four 1 Gbps links available, so it's possible to use another port for aggregation to prevent saturation.
  • Starting with 10.4, we added an additional native port.  One port is for SSL (/sys/config/ssl.port, trusted connections must use this port) and the other is unencrypted and much faster (/sys/config/port).  It is highly recommended that all aggregation clients, which require large data transfers, use the faster unencrypted port.  The only possible exception to this rule is Broker.  Broker was designed for aggregation over a WAN and aggregation data transfers are small for a Broker.  However, large data transfers can still happen (downloading pcaps for instance), so Broker can still see a performance gain by using the non-SSL port.
  2. But web browsers receive encrypted HTTPS all the time and it doesn't slow them down?
  • Well, it does slow them down, but you don't notice it because a little web page and a few images are meaningless at the speed you are clicking URLs.
  • See the point above about a single Archiver receiving over 200 GB in one hour.
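
The bandwidth arithmetic in the notes above (200 GB an hour is about 57 MB/s, or roughly 450 Mbps) can be reproduced with a short calculation. This is a minimal Python sketch. The function name is hypothetical, and treating 1 GB as 1024 MB is an assumption chosen because it matches the article's 57 MB/s figure.

```python
def hourly_volume_to_rates(gb_per_hour, binary_units=True):
    """Convert a per-hour data volume into (MB/s, Mbps).

    binary_units=True treats 1 GB as 1024 MB (an assumption that reproduces
    the article's 57 MB/s); False uses 1000 MB per GB.
    """
    mb_per_gb = 1024 if binary_units else 1000
    mb_per_sec = gb_per_hour * mb_per_gb / 3600  # 3600 seconds in an hour
    mbps = mb_per_sec * 8                        # 8 bits per byte
    return mb_per_sec, mbps


LINK_MBPS = 1000  # nominal 1 Gbps link, ignoring protocol overhead

for gb in (200, 600):  # one aggregation stream, then three of the same size
    mb_per_sec, mbps = hourly_volume_to_rates(gb)
    print(f"{gb} GB/hour = {mb_per_sec:.1f} MB/s = {mbps:.0f} Mbps "
          f"({mbps / LINK_MBPS:.0%} of a 1 Gbps link)")
```

With three streams of this size the required rate exceeds the nominal capacity of a single 1 Gbps link, which is the saturation case described above.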

Internal Comments

This information is very useful but difficult to find unless a KB article is created. Agreed with Scot Moore (the author) that I can write some KB articles so that it is easier to find. (https://wiki.na.rsa.net/display/CNEX/SA+Core+FAQ)

Product Details

RSA Product Set: Security Analytics
RSA Product/Service Type: Decoder, Log Decoder, Concentrator, Archiver
Platform: CentOS

Summary

How to troubleshoot when aggregation is slow on a service.

