Working with multivalue fields
When working with data in the Splunk platform, each event field typically has a single value. However, for events such as email logs, you can find multiple values in the “To” and “Cc” fields. Multivalue fields can also result from data augmentation using lookups.
If you ignore multivalue fields in your data, you may end up with missing and inaccurate data, sometimes reporting only the first value of the multivalue fields in your results. To properly evaluate and modify multivalue fields to get the results you need, the Splunk platform has some multivalue search commands and functions. Multivalue functions can be used with eval, where, or fieldformat search commands. This article shows you how to use them.
In the examples used in this article, the makeresults command (available in Splunk Enterprise and Splunk Cloud Platform) is used to generate hypothetical data for searches so that anyone can recreate them without the need to onboard data. The default field _time, which makeresults generates automatically, has been deliberately excluded.
Within one purchase transaction, Mary bought eggs, milk and bread. She paid for the eggs with cash and covered the remaining items using her credit card. This purchase transaction is equivalent to a log event. The values for each multivalue field are separated by the comma delimiter.
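The scenario above can be generated with a search along the following lines. This is a minimal sketch rather than the article's original search: the field name "name" is an assumption added for illustration, while "groceries" and "payment" follow the field names used in the text.

```spl
| makeresults
| eval name="Mary", groceries="eggs,milk,bread", payment="cash,credit card,credit card"
| fields - _time
```

At this point both fields are still ordinary single-value strings that happen to contain commas.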
The makemv command splits a field whose values appear as a single delimited string into multiple values within the same event. A delimiter is the character or string that marks the boundary between values.
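A minimal sketch of makemv applied to the sample data, assuming the same hypothetical fields ("name", "groceries", "payment") as the scenario above. Only the "groceries" field is converted here:

```spl
| makeresults
| eval name="Mary", groceries="eggs,milk,bread", payment="cash,credit card,credit card"
| fields - _time
| makemv delim="," groceries
```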
The values in the “groceries” field have been split within the same event based on the comma delimiter. The values in the “payment” field remain the same. The report shows the method of payment for all three grocery items but it does not specify the actual payment method used for each item. To expand the event into three separate events, one for each item and show the exact payment for each grocery item, we need a combination of commands and functions.
Learn more about using the makemv command in Splunk Enterprise or Splunk Cloud Platform documentation.
The mvzip function is used to tie corresponding values in the different fields of an event together. This helps to keep the association among the field values. This function takes two multivalue fields, X and Y, and combines them by stitching together the first value of X with the first value of field Y, then the second X with the second Y, and so on.
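The pairing described above might look like the following sketch. It assumes the same hypothetical sample fields, converts both fields to multivalue with makemv first, and relies on the default mvzip delimiter, which is a comma. A different delimiter can be passed as an optional third argument.

```spl
| makeresults
| eval name="Mary", groceries="eggs,milk,bread", payment="cash,credit card,credit card"
| fields - _time
| makemv delim="," groceries
| makemv delim="," payment
| eval zipped=mvzip(groceries, payment)
```

With this data, "zipped" should hold three values in a single event: "eggs,cash", "milk,credit card", and "bread,credit card".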
The new field, “zipped”, is the result of the mvzip function. The values of the “groceries” and “payment” fields are zipped together, pair by pair, before the event is expanded into separate events. At this point, the results are still within one event.
Learn more about using the mvzip function in Splunk Enterprise or Splunk Cloud Platform documentation.
The mvexpand command expands the values of a multivalue field into separate events, one event for each value in the multivalue field. All other single field values and unexpanded multivalue field values will remain the same in each new event.
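As a sketch of this step, again assuming the hypothetical sample fields, the search below expands the zipped field. The name "zipped" is taken from the text; the rest of the search is illustrative.

```spl
| makeresults
| eval name="Mary", groceries="eggs,milk,bread", payment="cash,credit card,credit card"
| fields - _time
| makemv delim="," groceries
| makemv delim="," payment
| eval zipped=mvzip(groceries, payment)
| mvexpand zipped
```

This should return three events, one per zipped pair. The "name" field and the unexpanded multivalue fields "groceries" and "payment" are repeated unchanged in each event.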
The mvexpand command works well at splitting the values of a multivalue field into multiple events while keeping other field values in the event as is, but it only works on one multivalue field at a time. For instance, in the above example, mvexpand cannot expand both the “groceries” and “payment” fields at the same time while keeping their values paired. Zipping the fields first and then separating each pair with the mvindex function accomplishes this.
Learn more about using the mvexpand command in Splunk Enterprise or Splunk Cloud Platform documentation.
Having zipped the values and created one field, “zipped”, you can now expand the “zipped” field into multiple events and then separate each pair back into individual fields. The mvindex function, which is a little more intricate, handles the separation: it returns the value, or range of values, found at a given index position in a multivalue field. Indexes can be positive or negative. Using the values a,e,i,o,u as an example:
- Indexes start at 0 when counting from the first value. For example, a=0 e=1 i=2 o=3 u=4.
- Indexes start at -1 when counting from the last value. For example, a=-5 e=-4 i=-3 o=-2 u=-1.
- You can combine both index patterns. For example, a=0 e=1 i=2 o=-2 u=-1.
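The index patterns listed above can be tried out with a small sketch. The field names ("vowels", "first", "last", "second_to_last", "middle") are hypothetical and chosen only for illustration. The three-argument form of mvindex returns an inclusive range of values.

```spl
| makeresults
| eval vowels=split("a,e,i,o,u", ",")
| eval first=mvindex(vowels, 0), last=mvindex(vowels, -1), second_to_last=mvindex(vowels, -2), middle=mvindex(vowels, 1, 3)
| fields - _time
```

Here "first" should be a, "last" should be u, "second_to_last" should be o, and "middle" should be the multivalue field e, i, o.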
In the example, mvindex takes index 0, the first value of each zipped pair (the grocery item), and index 1, the second value (the payment method), so that when the fields are split the values do not get mixed up. The split function (a function, not a command) separates each pair on the comma delimiter. Using the mvindex and split functions, the values are now separated into one value per event, and the values correspond correctly.
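Putting the steps together, a complete search might look like the following sketch. It assumes the same hypothetical sample fields used throughout and overwrites "groceries" and "payment" with the single values taken from each zipped pair.

```spl
| makeresults
| eval name="Mary", groceries="eggs,milk,bread", payment="cash,credit card,credit card"
| makemv delim="," groceries
| makemv delim="," payment
| eval zipped=mvzip(groceries, payment)
| mvexpand zipped
| eval groceries=mvindex(split(zipped, ","), 0), payment=mvindex(split(zipped, ","), 1)
| table name groceries payment
```

The result should be three rows for Mary: eggs paid with cash, milk paid with credit card, and bread paid with credit card. Note that this approach assumes the values themselves never contain the delimiter; if they might, a less common delimiter can be passed to both mvzip and split.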
The stats command can also be used in place of mvexpand to split the field into separate rows, because grouping by a multivalue field produces one result row for each distinct value.
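A sketch of the stats variant, under the same assumptions about the sample fields, replaces mvexpand with a stats command grouped by the zipped field:

```spl
| makeresults
| eval name="Mary", groceries="eggs,milk,bread", payment="cash,credit card,credit card"
| makemv delim="," groceries
| makemv delim="," payment
| eval zipped=mvzip(groceries, payment)
| stats count by name zipped
| eval groceries=mvindex(split(zipped, ","), 0), payment=mvindex(split(zipped, ","), 1)
| table name groceries payment
```

One difference to keep in mind: stats groups identical values, so if the same item and payment pair appeared twice in an event, it would be collapsed into a single row with a count of 2, whereas mvexpand would keep both. The rows also come back sorted by the grouping fields rather than in their original order.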
Learn more about using the mvindex function in Splunk Enterprise or Splunk Cloud Platform documentation.
The mvcount function can be used to quickly determine the number of values in a multivalue field. If the field contains a single value, the function returns 1, and if the field has no values, the function returns NULL.
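For instance, counting the grocery items in the sample event could be sketched as follows. The field name "item_count" is hypothetical.

```spl
| makeresults
| eval name="Mary", groceries="eggs,milk,bread"
| fields - _time
| makemv delim="," groceries
| eval item_count=mvcount(groceries)
```

With this data, "item_count" should be 3. Because mvcount returns NULL for a field with no values, wrapping it in coalesce, for example coalesce(mvcount(groceries), 0), is one way to report 0 instead.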
As with single value fields, keep in mind that you may need a combination of multivalue commands/functions to get your report in the required format that will meet your specific use case.
Learn more about using the mvcount function in Splunk Enterprise or Splunk Cloud Platform documentation.
If there are situations in your data where a field is sometimes multivalue and other times null, see mvexpand multiple multi-value fields that may be null.