Index Customization
This topic describes how to use the custom index file to customize the index. Each NetWitness NextGen service is installed with a default index configuration that is intended to cover the index needs of most users of the product. However, you can index new meta keys in order to use the index with custom content that generates custom meta.
Index Configuration File Locations
The index customization is accomplished by making changes to the custom index file. The location of this file is /etc/netwitness/ng/index-
Concentrator products also include a file that describes the default index configuration: /etc/netwitness/ng/index-concentrator.xml. This file is useful as a template to show how the custom index file is formatted.
If you make customizations to the index in the custom index file, those customizations override any conflicting settings in the default index configuration.
You can make changes to the custom index file while the service is running. When the service receives an index save command, the changes to the custom index file are read and applied to the index.
Changes to the index can only be applied to new incoming data. Data cannot be retroactively reindexed with a new custom index configuration, except by Rebuilding the Index.
Index configuration entries
The custom index file is an XML document. The root element of this document is the language element, and inside it there is a key element for each meta key that describes its custom index.
Each key element accepts the following attributes:
- Attribute: name
- Description: The name of the meta key that will be indexed
- Attribute: description
- Description: A human-readable description for the meta type
- Attribute: level
- Description: The type of index that will be created for this meta key
- Attribute: valueMax
- Description: The maximum unique values that will be stored for this key per slice
- Attribute: format
- Description: The format of the data held by all meta items with this meta key name
- Attribute: bucket
- Description: Enable size bucketing
- Attribute: ngrams
- Description: Enable ngram generation
- Attribute: threshold
- Description: Threshold for approximate value merging on ngram indexes
- Attribute: defaultAction
- Description: Default Navigate view action for this key: Open, Closed, Auto, Hidden
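As a rough illustration of the structure described above, the following Python sketch parses a small custom index document and lists its key entries. The key name example.key, its description, and every attribute value are placeholders invented for this example, and a real file may carry additional attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom index document. The key name, description, and
# attribute values are placeholders, not shipped defaults.
CUSTOM_INDEX = """\
<language>
  <key name="example.key" description="Example custom key"
       level="IndexValues" format="Text" valueMax="10000"
       defaultAction="Closed"/>
</language>
"""


def list_keys(xml_text):
    """Return (name, level, format) for each key element under language."""
    root = ET.fromstring(xml_text)
    if root.tag != "language":
        raise ValueError("root element must be 'language'")
    return [(k.get("name"), k.get("level"), k.get("format"))
            for k in root.findall("key")]


print(list_keys(CUSTOM_INDEX))
```

A check like this can be run against a custom index file before it is saved, to catch malformed XML early.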
The next few sections examine these parameters in greater detail.
Meta names
The meta name used by the index refers to the meta key name present within every meta item in the meta database. These meta names are generated by the Decoders when parsing. Parsers can choose to generate meta with any meta key name. Therefore, the custom index allows you to choose which of the meta items generated by the Decoder are indexed.
Meta key names can be up to 16 characters long, and contain only letters or the '.' character.
Data Types
When the Decoder generates meta items, it assigns a data type. Each parser can choose the data type of the meta it generates. However, there are recommended and standard data types for each of the default meta keys. For example, ip.src and ip.dst are stored as the IPv4 meta type, and alias.host is stored as the Text meta type. Each parser must agree on the data format for each meta key generated by the Decoder.
When adding a custom index to the Concentrator, the data type of the custom index must match the format of the data generated by the Decoder. If the types do not match, the Concentrator attempts to convert the meta generated into the type specified for the custom index. However, these conversions sometimes fail, and the resulting index can produce undefined results.
Likewise, when many Decoders and Concentrators work together as part of a NetWitness installation, they must all agree on the types for each meta key. Conflicts of meta types between NetWitness NextGen services can lead to undefined behavior.
The following table shows the metadata types supported by the NetWitness NextGen services.
- Type: Int8
- Size in bytes: 1
- Description: Signed 8-bit integer
- Type: UInt8
- Size in bytes: 1
- Description: Unsigned 8-bit integer
- Type: Int16
- Size in bytes: 2
- Description: Signed 16-bit integer
- Type: UInt16
- Size in bytes: 2
- Description: Unsigned 16-bit integer
- Type: Int32
- Size in bytes: 4
- Description: Signed 32-bit integer
- Type: UInt32
- Size in bytes: 4
- Description: Unsigned 32-bit integer
- Type: Int64
- Size in bytes: 8
- Description: Signed 64-bit integer
- Type: UInt64
- Size in bytes: 8
- Description: Unsigned 64-bit integer
- Type: UInt128
- Size in bytes: 16
- Description: Unsigned 128-bit integer
- Type: Float32
- Size in bytes: 4
- Description: 32-bit floating point number, single precision
- Type: Float64
- Size in bytes: 8
- Description: 64-bit floating point number, double precision
- Type: TimeT
- Size in bytes: 8
- Description: Unix epoch timestamp
- Type: Binary
- Size in bytes: 1-255
- Description: Arbitrary binary data
- Type: Text
- Size in bytes: 1-255
- Description: UTF-8 encoded text data
- Type: IPv4
- Size in bytes: 4
- Description: IPv4 address bytes
- Type: IPv6
- Size in bytes: 16
- Description: IPv6 address bytes
- Type: MAC
- Size in bytes: 6
- Description: MAC Address bytes
When defining a custom index, it is important to use the best data type for the meta. For example, never store IP addresses as Text, since the Text representation takes more bytes than the IPv4 representation.
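The size difference can be seen with a short Python sketch. It uses only the standard library, and the address is an arbitrary example value.

```python
import ipaddress

addr = "192.168.100.200"                       # arbitrary example address

as_text = addr.encode("utf-8")                 # Text representation
as_ipv4 = ipaddress.IPv4Address(addr).packed   # 4 raw address bytes

print(len(as_text), len(as_ipv4))              # prints: 15 4
```

Here the Text form needs 15 bytes where the IPv4 form needs 4, and that overhead repeats for every meta item stored.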
Index Levels
There are three levels, or types, of indexing: IndexNone, IndexKeys, and IndexValues.
IndexNone
This type of custom index is not really an index at all. Custom index entries with the IndexNone level exist only to define and document the meta key. IndexNone entries can be used in custom Decoder indices to enforce a specific data type for a meta key across all the parsers on a Decoder.
IndexKeys
This type of custom index indicates that the index only keeps track of sessions that contain meta items with this meta key name. However, it does not index any unique values in the meta database for the meta key.
Key-level indices take much less storage space, memory, and CPU time to manage, but they require a lot more work from the query engine when you perform query or values operations using them.
If used in a where clause, a meta key indexed at the key level can only be used to resolve operations such as exists or !exists.
IndexValues
This type of custom index keeps sessions that contain each individual unique value for the meta key. This type of index is also known as a "full index".
This type of index is needed for efficient processing of most where clauses, and for use of this meta key as the fieldName parameter of a values call.
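The difference between the three levels can be pictured with a toy in-memory model. This is only a sketch of the idea, not how the service stores its index; the class and method names are invented for the example.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model of one meta key indexed at a given level."""

    LEVELS = ("IndexNone", "IndexKeys", "IndexValues")

    def __init__(self, level):
        if level not in self.LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.level = level
        self.sessions_with_key = set()             # IndexKeys and IndexValues
        self.sessions_by_value = defaultdict(set)  # IndexValues only

    def add(self, session_id, value):
        if self.level == "IndexNone":
            return                                 # nothing is indexed
        self.sessions_with_key.add(session_id)
        if self.level == "IndexValues":
            self.sessions_by_value[value].add(session_id)

    def exists(self):
        """Sessions that contain the key at all."""
        return set(self.sessions_with_key)

    def equals(self, value):
        """Sessions that contain the key with this exact value."""
        if self.level != "IndexValues":
            # In this toy model the index cannot answer; a query engine
            # would have to read the meta items themselves instead.
            raise LookupError("value lookup needs an IndexValues index")
        return set(self.sessions_by_value.get(value, ()))


values = ToyIndex("IndexValues")
values.add(1, "a")
values.add(2, "b")
values.add(3, "a")
print(values.equals("a"))
```

The key-level variant keeps only one set of sessions, which is why it is cheap to maintain but cannot resolve value comparisons on its own.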
Value Max
Value max is a parameter that can have a very significant impact on the accuracy and performance of a Value-level index.
As a Decoder parses packets or logs, it is allowed to create meta of any type with any value. Usually, these meta items are created from data copied directly out of the packet or log. Therefore, anyone can create unique meta values in response to nearly any event.
The performance of the index is directly dependent on the number of unique values it has found for each meta key. As the number of unique values increases, the rate at which new meta is indexed can decrease, and the speed with which queries are completed decreases. Since any person can influence the creation of unique meta values, it is possible for any person to affect the performance of the index.
The value max parameter limits the number of unique values that can enter the index. Therefore, a malicious user cannot flood the system with a large number of unique values in an attempt to disrupt the NetWitness system.
It is important to set a value max on any meta key that may have its value influenced directly by incoming packets or logs.
The value max applies only to values added since the last index save operation.
The limit for how high value max can be set varies from version to version and depends on the amount of RAM available to the NetWitness NextGen service. The recommended ceiling for value max is 5,000,000 for any meta key. If there are a lot of custom indices, then the value max may have to be lower.
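The capping behavior described in this section can be sketched as a toy model in Python. The class below is hypothetical and greatly simplified: it only counts unique values, treats 0 as unlimited, and starts a fresh count after each save. What the service does internally with values beyond the limit is not modeled; the sketch just skips them.

```python
class ToyValueIndex:
    """Toy model of valueMax for a single meta key."""

    def __init__(self, value_max):
        self.value_max = value_max   # 0 is treated as unlimited
        self.current = set()         # unique values since the last save
        self.saved = set()           # unique values in earlier saves
        self.skipped = 0             # values refused because of the limit

    def add(self, value):
        """Try to index a value; return True if it is in the index."""
        if value in self.current:
            return True
        if self.value_max and len(self.current) >= self.value_max:
            self.skipped += 1
            return False
        self.current.add(value)
        return True

    def save(self):
        """Model an index save: the unique-value count starts over."""
        self.saved |= self.current
        self.current = set()


idx = ToyValueIndex(value_max=2)
results = [idx.add(v) for v in ("a", "b", "c", "a")]
print(results, idx.skipped)
```

In this model a flood of unique values stops growing the index once the limit is reached, while values that are already indexed keep working.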
maxLength
The max length parameter is used exclusively on the word meta type. The meaning of the maxLength parameter depends on whether the index is storing N-grams, as indicated by the ngrams parameter. The default and recommended value for maxLength is 5.
Max Length without N-Grams
If N-Gram support is turned off, then the maxLength parameter indicates that search terms need to be truncated so that they will match truncated values in the index and meta database. If this is the case, the maxLength must be less than or equal to the corresponding setting for /decoder/parsers/config/token.max.length on the Log Decoder service that is generating word token metas. The index will use the maxLength to properly interpret search terms fed into the msearch SDK function.
Max Length with N-Grams
If N-Gram support is turned on, by setting ngrams="Edge" or ngrams="All", then the maxLength parameter controls the maximum length of N-Grams extracted from the meta item. In this scenario, the maxLength does not have to match the length of word meta items generated on the Log Decoder.
minLength
The minimum length parameter is used exclusively on the word meta type. It only has an effect when N-grams are generated. It indicates the smallest length N-gram that will be extracted from the word meta items. The default and recommended minimum N-Gram length is 3, which means that searches against the word index must have at least 3 characters.
ngrams
The ngrams parameter is used exclusively on the word meta type. N-gram indexes extract information that allows fast lookup for searches that match only part of a word. For example, they allow finding 'ball' inside the word 'basketball'. If set to the value all, the index creates entries for all N-grams within the word meta values. The minimum value of N is specified by minLength, and the maximum value of N is specified by maxLength.
The ngrams parameter also supports the value edge, which indicates the index will only store N-grams that appear at the beginning of a word. Edge N-grams are useful for type-ahead search matching, and take less space than storing all N-grams. However, they are not useful for locating matches inside the word or at the end of the word.
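The difference between the two modes can be sketched in Python. The function names are invented for this example, and the defaults of 3 and 5 follow the recommended minLength and maxLength given earlier; how the service actually tokenizes and stores N-grams is not covered here.

```python
def all_ngrams(word, min_len=3, max_len=5):
    """Every substring of word whose length is between min_len and max_len."""
    grams = set()
    for n in range(min_len, max_len + 1):
        for start in range(len(word) - n + 1):
            grams.add(word[start:start + n])
    return grams


def edge_ngrams(word, min_len=3, max_len=5):
    """Only the substrings that start at the beginning of word."""
    longest = min(max_len, len(word))
    return {word[:n] for n in range(min_len, longest + 1)}


word = "basketball"
print("ball" in all_ngrams(word))   # True: found inside the word
print(sorted(edge_ngrams(word)))    # ['bas', 'bask', 'baske']
```

The edge set is much smaller, which is where the space saving comes from, but 'ball' is not in it, so a search for it would not match.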
The ngrams parameter supports the `allvalue` value for the text format meta keys. It means that the index for a meta key will store `all` N-grams within the meta values and also `IndexValues` limited by ValueMax.
This index type enhances the search capability on overflowed index values due to Value Max limits. The N-grams index provides the ability to search any meta value and the Values index provides the ability to retrieve top N available values.
The `minLength` parameter specifies the minimum value of N, the `maxLength` parameter specifies the maximum value of N, and the `ValueMax` parameter specifies maximum unique values. The following are some guidelines to follow while using these parameters:
- It is recommended to set minLength=3 and maxLength=3 for compact index storage of N-grams, and to use ValueMax to limit value index storage. Compared to Text format keys indexed by IndexValues with ValueMax=0 (unlimited), this N-gram index configuration provides better search functionality with compact index storage and memory usage.
- The `contains` operation in queries runs faster for meta keys indexed with this N-gram index type than for meta keys indexed with IndexValues.
- Because this index type uses both N-grams and IndexValues for the same meta key, it increases index memory and index storage usage for the meta key and eventually reduces index retention. Hence, it is recommended to choose this index type only for the meta keys that need it, with storage and index retention in mind.
- When you switch to this N-gram type and the new behavior is required on the whole index, you must perform a re-index.
N-gram indexing has a major impact on the functionality of the text indexes. With ngrams set to 'all' or 'allvalue' and maxLength 3, a meta key index will consume approximately 2 times more space than if N-grams were not enabled.
Note: When you switch from ngrams 'allvalue' or 'all' to IndexValues, consider performing a re-index, because index slices created before the configuration change remain N-gram indexes and the values call would return N-grams for them.
In the default index configuration, only the word meta key has N-gram indexing enabled. This meta key is used to index text tokens extracted from unparsed logs on the Log Decoder.
The N-gram index mode supports a 'threshold' tunable parameter that controls the precision of the index. The threshold is used to merge similar index values together depending on how closely the set of indexed sessions matches. Values greater than 0 and less than or equal to 1.0 are accepted. A value of 1.0 means that the index will only merge values if they were found in the same set of sessions. Higher values mean that the index will merge fewer values, at the expense of requiring more time to create the index during aggregation. Lower values mean the index will merge more values together, at the expense of longer search execution time due to more database access. The threshold parameter does not affect search accuracy.
Numeric Bucketing
Indexes on meta formats that are unsigned integers, specifically UInt32 and UInt64, can make use of size bucketing to improve performance.
Size bucketing rounds down the size values in the index to their nearest traditional byte unit of information. Enabling this option on a numeric index reduces the number of unique values to track in the index, which improves aggregation and query performance.
The bucketing option is enabled by the boolean parameter bucket on the key element. bucket may have the value 0 for off, or 1 for on. The default is 0.
Examples of bucket number values:
- Raw Value: 0 - 1,023
- Value Stored in Index: 0 - 1,023
- Explanation: Values 0-1023 are stored unmodified
- Raw Value: 1,024 - 1,048,575
- Value Stored in Index: 1 KB, 2 KB, 3 KB ... 1,023 KB
- Explanation: Values under 1 MB are stored in 1 KB buckets
- Raw Value: 1,048,576 - 1,073,741,823
- Value Stored in Index: 1 MB, 2 MB, 3 MB ... 1,023 MB
- Explanation: Values under 1 GB are stored in 1 MB buckets
- Raw Value: 1,073,741,824 - 1,099,511,627,775
- Value Stored in Index: 1 GB, 2 GB, 3 GB ... 1,023 GB
- Explanation: Values under 1 TB are stored in 1 GB buckets
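The rounding in the table above can be sketched as a small Python function. The function name is invented for this example. The table stops below 1 TB, so keeping 1 GB buckets for larger values is an assumption of this sketch, not documented behavior.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def bucket(value):
    """Round a size down to the bucket it would fall into.

    Values below 1 KB are returned unmodified. Values of 1 TB and above
    are kept in 1 GB buckets, which is an assumption of this sketch.
    """
    if value < 0:
        raise ValueError("sizes are unsigned")
    for unit in (GB, MB, KB):
        if value >= unit:
            return value // unit * unit
    return value


print(bucket(1500))        # 1024, the 1 KB bucket
print(bucket(5_000_000))   # 4194304, the 4 MB bucket
```

Every raw size between 4 MB and just under 5 MB collapses to the same stored value, which is how bucketing reduces the number of unique values in the index.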
Key Value Aliases
Value aliases can be specified for keys. Value aliases are text representations that correspond to specific values for a key. These text representations may be easier to remember and more convenient to display. Aliases can be used in the rule/query language (see Queries) and are accessible via the SDK.
Value aliases are specified using the aliases and alias elements. Example alias text values include OTHER, FTPD, FTP, SSH, TELNET, and SMTP.
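The following Python sketch shows one way such aliases could be read from a key definition. The XML here is a hypothetical reconstruction: the key name service, the value attribute on each alias element, and the numeric values (taken from well-known port numbers) are all assumptions made for illustration. Only the aliases and alias element names and the alias texts come from this topic.

```python
import xml.etree.ElementTree as ET

# Hypothetical key definition. The key name, the "value" attribute, and
# the numbers are assumptions; only the element names and alias texts
# are taken from the documentation.
KEY_XML = """\
<key name="service" format="UInt32" level="IndexValues">
  <aliases>
    <alias value="0">OTHER</alias>
    <alias value="20">FTPD</alias>
    <alias value="21">FTP</alias>
    <alias value="22">SSH</alias>
    <alias value="23">TELNET</alias>
    <alias value="25">SMTP</alias>
  </aliases>
</key>
"""


def alias_map(key_xml):
    """Map each numeric value to its alias text."""
    key = ET.fromstring(key_xml)
    return {int(a.get("value")): a.text
            for a in key.findall("aliases/alias")}


aliases = alias_map(KEY_XML)
print(aliases[22])
```

With a mapping like this, a display layer can show the alias text in place of the raw number.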
Key Renaming
The index language supports the concept of key renaming. This feature provides backwards compatibility when new key names deprecate and replace old key names. A renaming is achieved by adding rename elements to the key, which indicates that the parent key renames the key named in the rename element. For example, a key definition for a new key named port_src can rename the key tcp.srcport.
The rename element indicates to the database that uses of the parent key, in this case port_src, will include both meta items with type port_src and meta items with type tcp.srcport. Thus, new meta items can be added to the database and queried using port_src, and such queries will return information that was previously stored in tcp.srcport as well.
The rename element accepts a single attribute, name, that refers to a previously defined key.
Keys referred to by rename elements must have the same type as the parent key.
Keys referred to by rename elements must have the same index level as the parent key.
If a key is redefined in a custom index file, and the redefined key contains rename elements, then those rename elements replace any previously defined rename elements.
Note: Usage of renamed meta key pairs in the select clause cannot be combined with fixed-size result paging for a query. For more information, see the Queries topic.
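The effect of a rename on queries can be pictured with a toy Python model. The function and the in-memory list of meta items are invented for this sketch and say nothing about how the database resolves renames internally.

```python
def query(meta_items, key, renames):
    """Return values stored under key or under any key that it renames.

    meta_items is a list of (key name, value) pairs and renames maps a
    parent key to the old key names it renames (toy structures).
    """
    wanted = {key, *renames.get(key, ())}
    return [value for name, value in meta_items if name in wanted]


renames = {"port_src": ["tcp.srcport"]}
meta_items = [
    ("tcp.srcport", 443),      # written under the old key name
    ("port_src", 8080),        # written under the new key name
    ("ip.src", "10.0.0.1"),    # unrelated key
]

print(query(meta_items, "port_src", renames))   # [443, 8080]
```

A query on the parent key sees both old and new meta items, which is the backwards compatibility the rename element is meant to provide.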
Entities
The index configuration is used to define entities. Entities provide a convenient way to work with several meta keys at the same time. An entity definition is an alias that groups together the results from other meta keys. You can use an entity definition anywhere you would use a normal meta name. The primary use for entities is to organize similar meta types into a single, easier-to-use meta type. For example, the default NextGen database language includes distinct meta types for IP source and IP destination. You could define an entity that represents the combined set of all IP sources and destinations using an entity element.
NetWitness meta key formats supported by GENEVE parser are as follows: Int8, UInt8, Int16, UInt16, Int32, UInt32, Int64, UInt64 ...
... but value attribute is not defined, then meta key referred by keyref will have value of id.
Each geneve class element can have 0 or more type elements defined. The class type element accepts the following attributes:
... milliseconds
- Attribute: direction
- Description: (Optional) Packet level option type. Applicable for types that provide information about the packet stream direction: client, server
- Attribute: override
- Description: (Optional) Applicable for Packet level option type - direction
- Attribute: disable
- Description: (Optional) Disables the Option Class Type
... it will create a meta key referenced by the keyref attribute. The meta key format will correspond to the meta key referenced by the keyref attribute. A few GENEVE option types provide more contextual information about the packet, for example, the timestamp when the packet was captured and the direction of the packet, originating from either the client or the server. For timestamp option types, the units attribute provides the timestamp unit, in seconds or milliseconds.
... define it in the index-decoder-custom.xml file similar to the meta keys defined. If the GENEVE option exists in index.xml file, then all GENEVE Options configuration for that class will be overridden.
Example of GENEVE configuration for Netskope: ...
... or an error occurs. The format of the meta key referenced by class node should be Text.
... a capture restart is required after performing index save and parser reload for the changes to take effect.
... do the following:
... select index and right-click to select properties.


