Queries

QueriesQueries

This topic covers the database query syntax. There are three main mechanisms for performing queries in the database, the query , values , and msearch calls on the /sdk folder on each Core service.

The query call returns meta items from the meta database, possibly using the index for fast retrieval.

The values call returns groups of unique meta values sorted by some criteria. It is optimized to return a subset of the unique values sorted by an aggregate function such as count.

The msearch call takes text search terms as it's input, and returns matching sessions that match the search terms. It can search within indexes, meta, raw packets, or raw logs.

query Syntax `query` Syntax

The query message has the following syntax:

   
query-params = size-param, space, query-param, {space, start-meta-param}, {space, end-meta-param}, {space, search-param} ;
size-param = "size=", ? integer between 0 and 1,677,721 ? ;
query-param = "query=", query-string ;
start-meta-param = "id1=", metaid ;
end-meta-param = "id2=", metaid ;
search-param = "search=", search-string ;
metaid = ? any meta ID from the meta database ? ; 
  

The id1 , id2 , and size parameters form a paging mechanism for returning a large number of results from the database. Their usage mostly benefits developers who are writing applications directly against the NetWitness Core database. Normally, results are returned in the order of oldest to newest data (higher meta IDs are always more recent). In order to return results from most recent to oldest, reverse the IDs such that id1 is larger than id2 . This has a slight performance penalty, because the where clause must be completely evaluated before processing in reverse order can begin.

When size is left off or set to zero, the system streams back all results without paging. For the RESTful interface, this results in the full response to be returned with chunked-encoding. The native protocol returns the results over multiple messages.

The query parameter is a query command string with its own NetWitness-specific syntax:

   
query-string = select-clause {, where-clause} {, group-by-clause {, order-by-clause } } ;
select-clause = "select ", ( "*" | meta-or-aggregate {, meta-or-aggregate} ) ;
where-clause = " where ", { where-criteria } ;
meta-or-entity = (meta_key | entity) ;
meta-or-aggregate = meta-or-entity | aggregate_func, "(", meta-or-entity, ")" ;
aggregate_func = "sum" | "count" | "min" | "max" | "avg" | "distinct" | "first" | "last" | "len" | "avglen" | "countdistinct" ;
group-by-clause = " group by ", meta-key-list
meta-key-list = meta-or-entity {, meta-key-list}
order-by-clause = " order by ", order-by-column
order-by-column = meta-or-aggregate { "asc" | "desc" } {, order-by-column} 
  

The select clause allows you to specify either * to return all the meta in all the sessions that match the where clause, or a set of meta field names and aggregate functions to select a subset of the meta with each session.

The select clause may contain entity names in the place of meta key names. If an entity name is in the select clause, meta items returned by the query will have their key name set to the entity name, rather than their actual meta key name stored in the session. Thus, the names of the meta items returned in the query will match the names of the metas in the select clause. For example, if there is an entity ip that consists of ip.dst and ip.src , then a query containing select ip will only return ip fields, with nothing to distiguish ip.dst meta items from ip.src meta items in the result set.

The select clause may contain renamed meta key names. Any fields appearing in the result set as a result of a renamed key in the select clause will be returned with the meta key name matching the name used in the select clause. For example, if the key port_src is used to rename tcp.srcport , then a query containing select port_src will only return port_src fields, even if the underlying meta had type tcp.srcport .

Note: Usage of renamed meta key pairs in the select clause cannot be combined with fixed-size result paging for a query. Doing so causes discrepancies in the results returned to Brokers. The reason for the discrepancies is that Concentrators cannot return only one of the key values of a renamed meta key pair and still preserve the correctness of the result set for the requested size. Hence, the Concentrator omits renamed meta key pair results to preserve the correctness of the result set, which causes the Broker to pull the result from the next Concentrator and advance the IDs that are returned.

Example: select ip.proto,ipv6.proto cannot be combined with size=10 (a paging query) size=10 flags=0 threshold=0 query="select time,ip.src,ip.dst, ip.proto,ipv6.proto,eth.type,size,payload,lifetime,client,did

The aggregate functions have the following effect on the query result set.

where Clauses `where` Clauses

The where clause is a filter specification that allows you to select sessions out of the collection by using the index.

Syntax:

   
where-criteria = criteria-or-group, { space, logical-op, space, criteria-or-group } ;
criteria-or-group = criteria | group ;
criteria = (meta-key | entity), ( unary-op | binary-op meta-value-ranges ) ;
group = ["~"], "(" where-clause ")" ;
logical-op = "&&" | "||" ;
unary-op = "exists" | "!exists" ;
binary-op = "=" | "!=" | "<" | ">" | ">=" | "<=" | "begins" | "contains" | "ends" | "regex" ;
meta-value-ranges = meta-value-range, { ",", meta-value-range } ;
meta-value-range = (meta-value | "l" ), [ "-", ( meta-value | "u" ) ] ;
meta-value = number | quoted-value | ip-address | mac-address | relative-time ;
number = ? any numeric value ? | ( '"' text '"' )
quoted-value = ( '"' text '"' ) | ( '"' date-time '"' ) ;
relative-time = "rtp(" , time-boundary , "," , positive-integer , time-unit, ")" ;
time-boundary = "earliest" | "latest" | "now" ;
positive-integer = ? any non-negative integral number ?
time-unit = "s" | "m" | "h" ; 
  

When specifying rule criteria, the meta-value part of the clause is expected to match the type of the meta specified by the meta-key . For example, if the key is ip.src the meta-value should be an IPv4 address. Entity names are allowed in any location where a meta-key name is required.

Queries using a meta-key name will match meta items corresponding both to the meta-key name as well as to the names of any "renames" specified for the key. See "Key Renaming" under the Index Customization topic for details on key renaming.

Query OperatorsQuery Operators

The following table describes the function of each operator.

Text Values

The system expects quoted text values. Unless it can be parsed as a time (see below), a quoted value is interpreted as text.

It is also important to quote any text value that may contain - so that it is not interpreted as a range.

For text values, the backslash character \ is used as an escape value. This character is used when you need to search for a value containing quote characters. If you need to search for a backslash character, then the backslash itself must be escaped, as \\ . Note that if you are wrapping the query parameters within another language, such as the parameter fields of the REST interface, you may need to add additional escape levels as required by whatever API or interface you are using to interact with the core service.

IP Addresses

IP addresses can be expressed using standard text representations for IPv4 and IPv6 addresses. In addition, the query can use CIDR notation to express a range of addresses. If CIDR notation is used, it is expanded to the equivalent value range.

MAC Addresses

A MAC address can be specified using standard MAC address notation: aa:bb:cc:dd:ee:ff

Numeric Values

In a where clause, you can specify numeric search values. Numbers should not be surrounded by quotes.

Bucketed Numeric Indexes

Meta keys indexed with bucketing can be used like any other numeric search value. Under most situations such searches will return sessions that have a meta value that exactly matches the requested search criteria.

Special behavior is invoked for queries that select only sessionid , for example a query of the form select sessionid where size = 2048 . Selecting sessionid explicitly bypasses all meta database read operations, and only returns index information. If selecting sessionid only, and if the numeric value specified is exactly equal to one of the bucket values, then the system will return all sessions that match somewhere in the bucket, rather than an exact match. For example, the search term size = 2048 will match all sessions in the 2 KB bucket, which is the range from 2048 to 3171 bytes. However, if the search values does not match a bucket values, then the system will return only matches for the exact byte value. For example, the search term size = 2049 will only match sessions with a size meta value exactly 2049. In this mode of operation, specifying a non-bucket value in a where clause is slower than searching within a bucket value. The 'where' clause parameter to the values API also invokes this optimization.

Using bucketed values in other forms of query does not invoke special behavior. The same is true for the msearch API. For those APIs, the use of a bucketed index in the where clause is evaluated accurately, without special meaning applied to bucket values. To search within an entire bucket using these APIs, specify the bucket range explicitly. For example size=2048-3171 .

More information on how to tell if an index is bucketing is in the topic Index Customization .

Numeric Value Aliases

For numeric values, aliases specified in the index can be used in a query as a quoted string in place of where a literal numeric value would be used; e.g.,

   
select * where service = "NFS" 
  

Numeric value aliases can be used anywhere a numeric literal might be used: as a single value, as the beginning or end of a range, or in a comma-delimited list of values (and/or ranges).

Refer to the topic Index Customization for details of how value aliases can be specified in the index.

Date and Time Expressions

In NetWitness Platform, dates are represented using Unix epoch time, which is the number of seconds since Jan 1, 1970 UTC. In queries, you can express the time as this number of seconds, or you can use the string representation. The string representation for the date and time is "YYYY-mmm-DD HH:MM:SS" . A three-letter abbreviation represents the month. You can also express the Month as a two-digit number, 01-12.

Time values must be quoted.

All times specified in queries are expected to be in UTC.

Relative Time Points

Relative time points allow a where clause to reference a value at some fixed offset relative to the earliest or latest time metas seen in the collection. It can also be used to reference a point in time relative the the current time.

A relative time point expression has the syntax rtp(boundary, duration) .

The boundary is either earliest , latest , or now .

The duration is an expression of hours, minutes, or seconds. For example, 24h , 60m , or 60s . When the boundary is earliest , the duration represents the amount of time after the earliest time present in the collection. If the boundary is latest , the duration represents the amount of time before the earliest time present in the collection. If the boundary is now , the duration represents the amount before the current time.

When the boundary is now , the system clock of the Core service host is used to determine what time it is.

Boundary can be represented as 0 seconds if you wish to specify the relative time point with no duration offset. This is most useful in the case of the now boundary, since it is possible that the highest, latest, time observed in the collection may be much earlier than the current time.

Relative time points can only be used in SDK operations, where there is a collection from which to get the boundaries for earliest and latest time metas.

Relative time points only work on indexed meta types. The default indexed meta types are time and event.time .

Examples:

   
Last 90m of collection time:
time = rtp(latest, 90m) - u

First 2 days of event time:
event.time = l - rtp(earliest, 48h)

Events added in the last hour:
time = rtp(now, 60m) - rtp(now,0s) 
  

Special Range Values

Ranges are normally expressed with the syntax * smallest * - * largest *, but there are some special placeholder values you can use in range expressions. You can use the letter l to represent the lower-bound of the all meta values as the start of the range, and u to represent the upper bound. The bounds are determined by looking at the smallest or largest meta value found in the index out of all the meta values that have already entered the index.

If you use the l or u tag, it should be unquoted.

For example, the expression time = "2014-may-20 11:57:00" - u would match all time from that 2014-may-20 11:57:00 to the most recent time found in the collection.

Notice that it is easy to confuse a range expression with a text string. Make sure that text values that contain - are quoted, and that hyphens within range expressions are not within quoted text.

`group by` Clause

The query API has the ability to generate aggregate groups from the results of a query call. This is done using a group by clause on the query. When group by is specified, the result set for the query is subdivided into groups. Each group of results is uniquely identified by the meta values indicated in the group by clause.

For example, consider the query select count(ip.dst) . This query returns a count of all ip.dst metas in the database. However, if you add a group by clause, like this: select count(ip.dst) group by ip.src , the query returns a count of the ip.dst metas found for each unique ip.src.

As of current version, you can utilize up to 6 meta fields in a group by clause.

The group by clause shares some of the same functionality as the values call, but it offers significantly more advanced groups at the expense of longer query times. Producing the results of a grouped query involves reading the meta from the meta database for all sessions that match the where clause, while a values call can produce its aggregates by reading the index only.

The contents of each group returned by the query are defined by the select clause. The select clause can contain any of the aggregate functions or meta fields selected. If multiple aggregates are selected, the result of the aggregate function is defined for each group. If nonaggregate fields are selected, the meta fields are returned in batches for each group.

The result set of a group by query is encoded with the following rules:

All meta items associated with a group are delivered with the same group number.
The first meta items returned to the group identify the group key. For example, if the group by clause specifies group by ip.src , then the first meta item of each group will be an ip.src .
The normal, nonaggregate meta items are returned after the group key , but they all will have the same group number as the group key metas.
The aggregate result meta fields for each group are returned next.
All fields within a group are returned together. Different group results will not be interleaved.

If one of the group by meta items is missing from one of the sessions matched by the where clause, that meta field is treated as a NULL for the purposes of that group. When the results for that group are returned, the NULL-valued parts of the group key will be omitted from the group's results, since the database has no concept of NULL.

The semantics of a group by query differ from a SQL-like database in terms of what meta fields are returned. SQL databases require you to select the group by columns explicitly in the select clause if you want them to be returned in the result set. The NetWitness Core database always implicitly returns the group columns first.

A query with a group by clause honors the result set size parameter if one is provided. However, due to the nature of the grouping, it puts an additional burden on the caller to page and reform groups if a fixed-size result set is requested. For this reason, you should not specify an explicit result size when making a group by call. By not specifying an explicit size, the entire result set will be delivered as partial results.

group by clauses allow results to be grouped by an entity definition.

The following table describes the database honors configuration parameters that limit I/O or memory impact of a group by query.

`order by` Clause

An order by clause can be added to a query that contains a group by clause. The order by clause causes the set of group
see Where Clauses .sizeThe size of the set of unique values to return. This function is optimized to return a small subset of the possible unique values in the database.id1
sort-valueThese flags control how results are sorted. If the flag is sort-total
flags-param};
the packet data will be read from the Decoder