Using Splunk DataSense Navigator

 

It can be difficult for some customers to achieve quick value with Splunk due to the complexities and time involved in converting raw data into actionable insights. In many Splunk deployments, the tasks of bringing in data and normalization often happen separately from building use cases, sometimes managed by different teams or in different project stages. This disconnected method can cause implementation teams to overlook important use case considerations, leading to time-consuming adjustments later.

Solution

Splunk's DataSense Navigator (DSNav) addresses this problem with a data-centric methodology that lets you explore potential use cases in your environment using your existing data. DSNav integrates the analysis of use case and data preparation requirements right from the start. This organized approach reduces delays and eliminates the need for time-consuming rework.

  • DSNav is designed for easy access and speed. For example, it automates the identification of use cases and data models, making setup and management processes more efficient. It also adapts to existing environments, broadening the range of applicable use cases to fit your needs.
  • DSNav introduces a new standard with event-specific subsampling techniques. Unlike other tools that analyze large datasets in aggregate, the DataSense Navigator looks at each event individually. By using key fields and statistical methods to manage data size, this approach ensures an accurate understanding of whether specific information is present or absent in the data, doing so more efficiently and without the typical computational challenges associated with such precision.
  • Finally, DSNav can help with all the following scenarios:
    • Use case development and data onboarding. Ensures efficient validation of new data sources against all or selected use cases, confirming their compatibility and meeting necessary conditions for effective use case implementation.
    • Efficient use case discovery. Accelerates and refines use case discovery, providing recommendations for use case activations based on existing data quality. It also aids in future use case planning, especially when partial prerequisites are met, potentially reducing the time to value compared to use cases with no prerequisites fulfilled.
    • Data quality validation. Facilitates a thorough examination of event presence within the data models that contribute to Splunk Enterprise Security (ES) use cases, providing a clear picture of data quality.
    • Priority setting for data quality improvement. Identifies gaps in field extractions or improper configurations from the source, helping to prioritize data quality improvement tasks. For example, if certain use cases are prioritized, any data or tag attribution gaps will be highlighted, aiding in precise effort planning. It also allows tracking modifications at the field/tag level rather than just the data source level, providing a clearer picture of effort needed for different data sources.
    • Post-configuration push impact analysis. Assesses the impact of a configuration push on the use case coverage potential of a particular data source. Variations in coverage post-deployment, assuming consistent data sampling intervals, can indicate success or the need for additional adjustments.
    • Facilitation of a structured workflow for users of different skill levels. Provides a structured yet interactive workflow that accommodates users of different skill levels, guiding them to similar conclusions on the gaps between onboarded data and target use cases.

Getting started

Prerequisites and version compatibility

  • DSNav is crafted in simple XML to maintain compatibility with older versions of Splunk Enterprise Security.
  • DSNav is compatible with Splunk Cloud Platform, Splunk Enterprise (on-premises), and bring-your-own-license (BYOL) cloud deployments. Splunk Cloud Platform users need to perform an additional step at stage 3 during data subsampling, where you'll need to specify that the deployment is on Splunk Cloud Platform, as this dictates the hosts for the REST call. This distinction is needed because REST calls against indexers aren't possible on Splunk Cloud Platform. Incorrect specification can cause issues in stage 3 where target indexes are being populated.
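For reference, here's a minimal sketch of the two index-discovery patterns involved, using standard SPL rather than DSNav's actual internal searches. A REST-based lookup queries the indexes endpoint (which, in distributed on-premises deployments, reaches out to indexer peers):

  | rest /services/data/indexes
  | stats count by title

Where REST calls to indexers aren't available, as on Splunk Cloud Platform, an eventcount-based search is a common alternative:

  | eventcount summarize=false index=*
  | dedup index
  | table index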

App dependencies

  • Although there are no specific app requirements, having Splunk Enterprise Security (ES), Splunk Security Essentials (SSE), and Enterprise Security Content Updates (ESCU) installed is recommended for a broader base of saved searches for tool analysis.
  • The main technical necessities for the tool include the presence of searches formatted in the correlation search format and the Common Information Model (CIM). Even without ES, the tool should function if the target environment has CIM add-ons and other applications with use case references similar to SSE.
  • Access to Lookup Editor can be beneficial for allow listing specific correlation searches tailored to your analysis scope.

Installation and configuration

Installation guidelines

  • Installation on the search head layer is recommended.
  • In on-premises setups, installation can be executed via the UI on the search head or distributed to the Search Head Cluster (SHC) through the deployer.
  • In Splunk Cloud Platform, installation is currently managed through the Admin Config Service (ACS) API, under the self-service app installation (SSAI) protocol.

App components

  • The solution primarily features an XML dashboard housing three KV stores, which store contexts for three computational stages. All necessary searches powering these computations are enclosed within this dashboard, facilitating easier hotfixes without CI/CD and deployment procedures.
  • DSNav operates through four primary distinct stages:
    • Stage 1: Identification of data model-driven correlation searches
    • Stage 2: Pinpointing data models
    • Stage 3: Data subsampling
    • Stage 4: Subsample search mapping and event origin exploration
  • The computations carried out in stages 1 through 3 above are preserved in the KV store, which then informs the results in stage 4. Typically, stages 1 and 2 do not require frequent re-generation of their KV store contexts, as alterations to the correlation searches and data models are not common occurrences.
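If you need to troubleshoot, the contents written to each stage's context can be inspected directly from the search bar with inputlookup. The lookup name below is a hypothetical placeholder; confirm the actual KV store collection and lookup names against the app's lookup definitions.

  | inputlookup dsnav_stage1_context ``` hypothetical lookup name - confirm against the app's lookup definitions ```
  | stats count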

Application walkthrough

The app contains four configuration stages, and the instructions below are divided based on these four stages within the app. Each stage has mandatory and optional steps. The optional steps aren't required, but they can enhance your understanding of the app's functionality and aid in preliminary troubleshooting.

Open the app to reach stage 1. You'll simply scroll to get to the rest of the stages. You might need to navigate back and forth between stages, especially between stages 3 and 4, if you choose to analyze different data sources.

Stage 1

Mandatory steps

Stage 1 - Analysis of Saved Searches for Data Model-Driven Use Case Identification shows you the total assessable correlation searches identified in your environment.

On initiating the tool, especially for first-time users, only the first panel of Stage 1 should display a number. This number represents the count of identified correlation searches conforming to the SPL syntax used for querying both accelerated and non-accelerated CIM data models, like "tstats" or "datamodel".
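As a rough illustration of what this identification involves (a simplified sketch, not the app's actual logic), saved searches whose SPL references data models can be listed from the search bar:

  | rest /services/saved/searches
  | search search="*tstats*" OR search="*datamodel*"
  | table title, eai:acl.app, search

In ES environments, the result set can typically be narrowed further to correlation searches with action.correlationsearch.enabled=1.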

The screenshot below shows what the page might look like upon initialization. Note that the “Count of the Stage 1 Context“ reads zero. This means that no results are currently written into the associated KV store context.

[Screenshot: Stage 1 panels upon initialization]


  1. Toggle the Context Updater (to the right of the count of Total Assessable Correlation Searches Identified) to Update to save this result to the KV store. Although not required, it's recommended to switch the Update setting back to View after seeing the confirmation message. This prevents the count in the Context Updater from being overwritten if the page accidentally refreshes.

    The screenshot below shows the Update toggle and the associated confirmation message.

    172b0adb-b3e4-4594-8115-2437dbb33e2b.png

Optional steps

To access these optional steps, look at the panel on the right side of each stage. For example, in Stage 1, you'll find a "Stage 1 Analysis Breakdown" panel which contains these optional steps. By default, the optional steps are set to "Hide," since they are not essential for the main computation process. You can change the setting from Hide to Show in the Stage 1 Analysis Breakdown to show the corresponding charts and analyses.

  • Review visualizations. The first chart, shown in the screenshot below, provides an overview of the apps from which the analyzed searches originate. In an environment with SSE, ESCU, and ES, most use cases are likely to come from these sources. More mature environments might also have existing correlation searches or searches from other applications. This section also includes two count overview charts: one that indicates the count of searches referencing specific CIM data models (note that a search can reference several data models), and another that sorts searches by use case category in descending order. Remember, the use case category is the string prefix in the Splunk Enterprise Security naming convention.
    [Screenshot: Stage 1 Analysis Breakdown charts]
  • Review Search Context Explorer. The Search Context Explorer, shown in the screenshot below, provides a summary breakdown of the searches identified, not what's currently stored in the KV store. Given the volume of searches, several filters are available for narrowing down searches: name, category, data model, residing app, MITRE category code (defined in the annotations of the correlation search), disabled state (where a value of "1" signifies a disabled search), and scheduled status (a value of 1 indicates an enabled, running search as per the defined cron schedule). The Search Context Explorer helps you understand vital parameters about specific searches, typically focusing on data model references, field references, and the disabled/scheduled state of the searches. A preliminary analysis at this stage can prepare you for assessing match outcomes in Stage 4.
    [Screenshot: Search Context Explorer]
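To reproduce a similar view outside the dashboard, a sketch along the following lines pulls the disabled/scheduled state and the MITRE annotations for correlation searches. Treat it as an approximation and adjust the filter to what exists in your environment:

  | rest /services/saved/searches
  | search action.correlationsearch.enabled=1
  | rename action.correlationsearch.annotations as annotations
  | table title, eai:acl.app, disabled, is_scheduled, cron_schedule, annotations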

Stage 2

Mandatory steps

Stage 2, Identification of Compatible Data Models and Corresponding Tag Attributions, shows you the different data models present in your environment that are also referenced by the existing use cases. Initially, the data model count in Stage 2 shows 0 because the base search that powers this step runs as the dashboard loads, before the context from Stage 1 has been generated.

  1. To prompt a refresh, navigate to the Refresh Search dropdown, as shown in the screenshot below. Select the alternate option to the one currently selected, typically Search Retry. Both options serve the same function - to re-trigger the base search. (Note: If you want to refresh this step in future, you'll need to toggle between the options.) After refreshing, the data models populate along with tags that correspond to these models. These tags will be crucial in Stage 3 for identifying data model attributions.
  2. Click Update on the adjacent panel to the right, similar to the process in Stage 1. Toggle it back to View after you see the confirmation message.
  3. Next, you'll update an allow list token. The purpose of this field is to identify existing fields within the data models, filter out fields belonging to unused data models, and apply an additional filter to exclude fields not referenced by any of the correlation searches analyzed during Stage 1. These values are subsequently used for event-level analysis in Stage 3. Under the dropdown Fields Identification Status, "Successful" should appear, which indicates completion of the process, accompanied by a value in brackets representing the number of identified fields. Click Successful. This selection triggers Stage 3.

[Screenshot: Stage 2 panels]
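To sanity-check tag attribution for a data source before committing the Stage 2 context, a quick search like the following (with placeholder index and source type values) shows which CIM tags the events currently carry:

  index=<your_index> sourcetype=<your_sourcetype>
  | head 1000
  | mvexpand tag
  | stats count by sourcetype, tag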

Optional steps

  • Review the table showing the identified data models. Unlike the content in the KV store, this table offers a means of validating the outputs of Stage 2 before committing the results to the KV store. Alongside each data model, the table displays a count of identified fields pertaining to that data model, as well as the associated tags. It's important to note that these values also reflect the data model's dataset hierarchy. For example, when examining the context entry for the Change data model, the table will also show tags and fields from hierarchy subsets like Audit, Endpoint, Network, Account Management, and Instance.
    [Screenshot: Identified data models table]

Stage 3

Mandatory steps

Stage 3 - Event Specific Subsampling, as shown in the screenshot below, subsamples a range of data to pinpoint distinct event structures, minimizing computational load while ensuring accuracy when calculating matching findings. Each event in this sample is examined to identify data model attributions based on the associated tags. Following this, the information from data model mappings is used to extract specific data model fields into the context for analysis in Stage 4.

  1. Adjust the Time Picker input, selecting a time range for analysis. Starting with a shorter range like 1-4 hours is recommended, adjusting it based on search completion speed or the number of matches observed.
  2. Adjust the Sampling Precision input, which incorporates additional event variation parameters. The moderate setting usually offers a good balance between performance and accuracy. The base deduplication settings are source, source type, index, tag, and line count. (A sketch illustrating these precision levels follows this list.)
    • Moderate introduces "punctuation" as a filtering parameter.
    • Advanced adds a calculated field similar to punctuation but captures all the occurrences within an event, and is more optimized for XML and JSON events.
    • Comprehensive incorporates character length.
    • You should increment to a higher precision only for complex data sources like XML or JSON.
  3. Adjust the On-Prem or Splunk Cloud input (for Splunk Enterprise or Splunk Cloud Platform). This input tailors how indexes are dynamically discovered based on your Splunk configuration. An additional option within this dropdown caters to environments where REST calls to indexers are restricted, fetching an index validation list through a standard Splunk search instead.
  4. Adjust the Target Index. Though multiple indexes can be selected, sticking to one index initially helps with understanding how the tool manages different data loads.
  5. Adjust the Target Sourcetype. This section auto-populates based on the indexes selected previously. The strategy for selecting source type mirrors that of indexes.
  6. Adjust the Sample Ratio. This is used to reduce the volume of data to a manageable size, which is particularly useful when working with large and/or complex datasets. This runs before the Event Specific Subsampling approach is applied. It is generally recommended to adjust this to find the appropriate ratio for your dataset and the nature of your analysis.
  7. When configuration is complete, two values display: one representing the non-sampled data set count, and the other representing the count post-subsampling. There's no standard expected count, as data sources vary extensively in the number of distinct event variations. It's beneficial to experiment with settings based on dataset volume and search performance, gauging by the search completion time. Review and adjust the previous configuration options as necessary.
  8. After the values are generated, click Update on the adjacent panel to the right. Toggle back to View after the confirmation message appears to complete this stage's configuration.

[Screenshot: Stage 3 configuration panels]
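To make the precision levels concrete, the subsampling idea is comparable to deduplicating on a set of event variation keys. The sketch below approximates the base level described in step 2 (Moderate adds punct to the dedup field list); it illustrates the concept rather than DSNav's actual search, and the placeholder index and source type values need to be replaced:

  index=<target_index> sourcetype=<target_sourcetype>
  | dedup index sourcetype source tag linecount
  | stats count

For the Advanced level, described above as a calculated field similar to punct that covers the whole event, one possible approximation (an assumption, not the app's implementation) is an eval that strips everything except punctuation before deduplicating:

  index=<target_index> sourcetype=<target_sourcetype>
  | eval full_punct=replace(_raw, "[a-zA-Z0-9 ]", "")
  | dedup index sourcetype source tag linecount full_punct
  | stats count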

Optional steps

  • Review Sample Event Breakdown - Tag Counts by Sourcetype, and Sample Event Breakdown - Data Model by Sourcetype visualizations. These visualizations provide a high-level overview of the data distribution across the Common Information Model (CIM) mappings, indicating the use case categories each data source might support.
    • Sample Event Breakdown - Tag Counts by Sourcetype shows the tag distributions across the sample events, presenting a view of the variety of data model attributions.
    • Sample Event Breakdown - Data Model by Sourcetype aligns the distribution of these events to data models based on the tags identified. When multiple source types are specified, the results are further segregated by source type.

[Screenshot: Sample Event Breakdown visualizations]

Stage 4

Stage 4 - Correlation Search Success Mapping to Subsampled Datasource analyzes the cumulative results generated from the previous three stages. It merges the enriched data subsample obtained in Stage 3 with the insights from Stage 1 around the requirements for each search, and provides the match results.

Upon the initial run, this step might display a "0", or it could refer to a previous version of the context. This happens because this stage initializes when the dashboard is loaded, at which point the context it references may not be the latest. To address this, use the same approach as in Stage 2:

  1. Navigate to the Refresh Search dropdown menu.
  2. Select the option opposite to the currently selected one to re-trigger the base search and update the context to the most current state. After this refresh, the Stage 4 analysis reflects the updated context, ensuring the match results are based on the latest data and requirements derived from the earlier stages. Three values appear, each signifying a different level of use case match with the sampled data:
    1. Complete Use Case Matches. Shows the count of use cases where the sampled data met all the stipulated requirements of the use cases.
    2. Partial Use Case Matches. Shows instances where the sample events correctly had the tag attributions and satisfied some, but not all, of the field requirements.
    3. Failed Use Case Matches. Shows the count of use cases where the data either did not meet the data model attribution requirements and/or none of the field requirements were satisfied.

[Screenshot: Stage 4 match counts]
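Conceptually, the relationship between the three buckets can be expressed as in the following pseudologic, where tag_matched, field_matches, and fields_required are hypothetical field names used only for illustration (the app's internal searches differ):

  | eval match_type=case(
      tag_matched==1 AND field_matches==fields_required, "Complete",
      tag_matched==1 AND field_matches>0, "Partial",
      true(), "Failed")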

It's possible for a search to register as a full match while also having partial matches. Because of this, the cumulative counts across the three match categories might exceed the total count of searches, which is important to keep in mind during your analysis of these results.

Optional steps

  • Review the Matched Searches by Model and Sourcetype, and Matched Searches by Category and Sourcetype visualizations. These depict the actual coverage of the defined use cases in your environment, segregated by data model and use case category. You can use the Match Result dropdown to switch the visualizations based on match conditions, which allows for viewing distribution across different match scenarios. Different bar colors represent different source types.
    [Screenshot: Matched Searches visualizations]
  • Review Interactive Match Results. This table breaks down the match results. It shows 15 fields in total, with some hidden initially to maintain a clean visual interface. To make hidden fields visible, click on the column you want to unhide and then select the 'X' on that column. The complete list of fields, including those hidden by default, is shown in the left-hand box in this screenshot:
    [Screenshot: Interactive Match Results field list]
    • To manage visual clutter, the dropdown option Filter Drilldown Results by default narrows down the display to top results by the number of field matches per source type. This is especially beneficial when analyzing multiple indexes/source types with numerous distinct event variations.
    • A key field that is hidden by default is Manual Flagged for Review, visible in the screenshot above under the "Hide columns" area. This allows users to tag particular searches for review using apps like "Lookup Editor", for instance flagging certain use cases with a custom tag (like 1) for focused review. By default it's set to 0, but you can modify it to make filtering easier.
  • The filter options from Stage 1 are retained, with additional filters introduced, as shown in the screenshot below.
    [Screenshot: Stage 4 filter options and Interactive Match Results]

    Key fields and columns are:

    • Matched Fields Reqs from Data Sample. Lists all the fields that matched from the various data samples, which aids in comparison with the Field Reqs from Search to spot gaps. This field aggregates field matches across all sampled event variations, so when analyzing partial matches, refer to the drilldown column to evaluate the matches on a per-event variation basis.
    • # of Samples Matched on DM/Field Reqs. Shows the total count of event variations sampled in Stage 3 that matched or partially matched the use case conditions.
    • Drilldown. Hidden by default. It serves multiple purposes:
      • The Events Matched parameter breaks down the event variations that make up the number of matches noted in # of Samples Matched on DM/Field Reqs.
      • The Fields Matched parameter shows the number of fields matched out of the total required for the targeted use case, within each event variation.
      • You can click the results in the Drilldown column to explore the contributing events and expand on specific match fields identified, as determined by the Fields Matched parameter.

In the screenshot shown above, the default columns are visible along with the enabled Drilldown column, showing a partial match scenario. Filter Drilldown Results is set to Show All, displaying three event variations in the Drilldown column. If it were set to Show Top Results, only the event variation with the highest matches for each source type would be displayed. You can see the two source types that partially meet the field requirements for the given use case. Also note that the highlighted section in the Drilldown column specifies the existence of fields like object_category, action, status, and user. Additionally, you can see that tags linked to the Change data model are specified in the drilldown search.

  • Review Contributing Events Drilldown Analysis. Clicking Drilldown populates the Drilldown Token, which generates a search in this section. Although the events shown here may not precisely match those sampled in Stage 3 due to less filtering, this is deliberate, as it permits a broader variety of events to be available for exploration.
    • Some degree of deduplication is still applied to maintain variation in event structure, with the level of deduplication controlled by the "Sampling Precision" setting from Stage 3.
    • Events can be explored in detail, aiding in confirming the accuracy of identified contributing events, and potentially uncovering fields that could fulfill the target use case requirements.
    • The time range for the search is aligned with what's set in Stage 3. If the sampled match variations are sparse or rare for the selected Drilldown option, the contributing events in the drilldown might not appear if much time has elapsed since the context in Stage 3 was generated. This can be remedied by re-running the context generation in Stage 3 or widening the search range.
    • The screenshot below shows what you'll see when the Drilldown option hovered over in the previous image is clicked, initiating the corresponding drilldown search. The boxed sections in the screenshot are the events returned by this search.
      [Screenshot: Contributing Events Drilldown Analysis]
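Based on the example above, the generated drilldown search resembles something like the following sketch. The index and source type are placeholders, and the actual search produced by the Drilldown Token carries additional parameters from Stage 3:

  index=<target_index> sourcetype=<target_sourcetype> tag=change
  | dedup punct
  | table _time, sourcetype, object_category, action, status, user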

Best practices

  • To enhance subsampling speed, it's recommended to narrow down the search time range before adjusting the sampling precision. A practical initial time range is about 1-4 hours, and it's only recommended to extend this if the dataset volume is relatively low, such as less than 100k events in the non-sampled reference.
  • When the option to select multiple indexes and source types is available, it's usually better to review only a few source types at a time, ideally those managed by a single add-on. This approach means you can more easily draw inferences from the tool regarding any necessary modifications to the specific TA in question.
  • The result calculations in Stage 4, derived from KV store data generated in Stages 1 to 3, allow concurrent analysis in Stage 4 while the context in the previous stages is being updated with new information. For instance, if Linux logs were initially analyzed at a 1-hour interval at the minimal configuration setting, this would expedite the data sampling results for further analysis in Stage 4. At the same time, Stage 3 could be regenerated, either with the same data at a longer interval, such as 4 hours, with a more precise sampling setting such as moderate or advanced, or with a completely different data source. This parallel processing minimizes waiting time for outputs as data generation and analysis steps for each phase can occur simultaneously.
  • It's recommended to switch the update KV store option from Update to View after the confirmation message appears. Although neglecting this step usually doesn't affect normal operations, a tool refresh could trigger the regeneration of some KV store contexts which might hinder analysis. This action also helps in reducing impacts on search concurrency.

Known issues

  • The subsampling stage tends to take a longer time for large event sizes. This delay is more noticeable with JSON events (like those from CrowdStrike) and XML events (like those from Windows).
    • When working with JSON or XML files, the results returned might be on the lower side under moderate sampling precision settings. This is due to the utilization of "punct" as an indicator of event variation. Given how the punct field is defined at index time, it might not capture all punctuation fields present in a particular event, especially if the event is large as seen with JSON and XML events. Even though the number of subsampled events is modest, tests with live customer examples show that it rarely affects the accuracy of comparative matches. However, for those seeking higher precision, advanced or comprehensive subsampling techniques could alleviate this entirely.
  • If the app is set to public and is accessed by multiple users at the same time, the KV stores might get overwritten. This issue stems from the current design, which does not support concurrent multi-user interactions on the application. A temporary solution is to use the app in a private mode for additional users who want their own instance. This way, the data in the KV store remains intact, as interactions from one user instance do not affect the data in another user's KV store. Preliminary tests show that switching to a private version doesn't hamper the app's functionality, although more comprehensive testing of concurrent usage is still taking place.
  • You might also notice that some events show a full or partial match, but not all required tags for a specific data model appear in the drilldown search. This is by design. The tool identifies a data model association if at least one of the tag requirements of that data model is met. Here are some underlying reasons for why a partial approach for tags was ultimately considered:
    • Some tags, like "Endpoint", are associated with multiple data models (for example, Change and Endpoint) as well as hierarchical data models within a main data model.
    • While tags referenced by the CIM data models are also used by other vendors when TAs are being created, there's generally overlap in how tag definitions get assigned. However, inconsistencies can arise between vendors, especially if tag attributions are defined by a custom user configuration.
    • Certain data models use two to three tags, all of which must be qualified. This is usually where perceived gaps in DSNav's identification may be observed.
    • Data models like Endpoint and Performance don’t have a base tag, but rather have different tags for different variations.

    The main aim of DSNav is to ensure accurate field qualifications. The tag qualifiers are intentionally broadened so as not to be overly strict, as tags are often defined based on the existence of certain fields. If the tool identifies events with matches or partial matches but some tags are missing, rectifying this is relatively straightforward given the flexibility in how tags are defined.

    Opting not to enforce the existence of all tags also serves to enhance performance. While filtering out base tags in Stage 2 is straightforward, requiring all tags would increase computation times in Stages 3 and 4, especially while preserving the functionality of showing both partial and full matches. Ultimately, reducing the qualifying criteria for data model attributions from full to partial was found to add value by surfacing more possible matches. The aim is to provide a broader view, allowing for more flexible and efficient identification and correction where needed.

FAQ

What is the impact on performance when this tool is actively used in an environment?

  • The tool primarily relies on four base searches for most of its functions when active. All searches are performant with the exception of the subsampling stage, which tends to use more resources, especially when handling a large number of standard events (around 10 million) or when the individual event sizes are large (as in the case with XML/JSON events and usually noticeable around 1 million events). Importantly, the tool does not run any recurring search processes when it's not in use, ensuring no performance impact when not in active usage.

Are there certain situations where the tool might provide inaccurate results?

  • The tool identifies search conditions using a set of regex conditions executed in Stage 1. In some cases, a field referenced by a correlation search might pertain to a calculated field that isn't available when analyzing the indexed event. This scenario may result in a match being labeled as partial when it could have been a full match. While such instances seem to be rare based on preliminary customer testing and are often identified quickly, challenges might arise if there's an attempt to modify the data onboarding of the event to include the calculated field. This could be problematic without understanding that the calculated field is only computed after the index is added to the data model.

Why does this tool only look at indexed events instead of the data model directly?

  • The tool is designed to be deployed after data has been ingested into the Splunk environment but before the index housing that data is allow listed as part of the Data Model Set Up process. This sequence is deliberate for several reasons:
    • Data validation. By positioning the tool at this stage, you can validate that the ingested data conforms to the requirements of your specific use cases.
    • Data fidelity. The tool helps to ensure that only high-fidelity data, which is well onboarded, is promoted to the accelerated data models. This avoids the risk of diluting the quality of the data models with low-fidelity or poorly onboarded data.
  • By aligning the tool in this manner, you preserve the efficiency and integrity of the data models.
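After a data source passes validation and its index is allow listed in the Data Model Set Up, a quick tstats check (shown here against the Change data model as an example) can confirm which indexes and source types are actually feeding the accelerated model:

  | tstats summariesonly=true count from datamodel=Change by index, sourcetype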

Is it possible to use this tool in environments other than ES (Enterprise Security)?

  • The tool's operability isn't confined to ES environments. As long as the environment has CIM data models and some searches formatted in the correlation search format, the tool should function as intended. While ES isn't a mandatory requirement, it naturally meets these prerequisites, making it a conducive environment for the tool.

  • Regarding the tool's applicability in other environments like Splunk Enterprise, Splunk ITSI, etc., there are plans to extend compatibility in future iterations of the tool (likely around Q2 of 2024). Currently, the scope is more focused since non-DMA driven searches usually don't reference a large subset of data sources, minimizing the inefficiencies often associated with larger data sets. This narrower scope simplifies the validation of prerequisites for a particular use case, as opposed to use cases referencing a data model, which might entail a broader set of data sources.

  • ES was chosen for initial testing of this solution framework due to its breadth, complexity, and the intensive nature of conforming data quality standards to both the CIM model and the use case prerequisites. On the other hand, non-ES environments typically deal with a smaller subset of datasets and only the use case prerequisites, which often results in fewer inaccuracies and challenges. This distinction became clearer through customer environment evaluations, revealing that even customers with lower security maturity levels usually face no issues creating searches that merely reference indexes and source types.

Why haven't more search performance monitoring aspects been incorporated into the tool to track elements like search concurrency, or additional panels to provide more insight on ES health?

  • Currently, there are other tools available that cater to these monitoring needs, along with several initiatives in development slated for future release. While the effectiveness of existing tools and the advantages of an interactive approach (similar to what DSNav employs) could be useful to other suggested applications, the core design philosophy of this tool centers around simplification. Incorporating functions that don't directly align with the core objective of identifying use cases from a sample dataset could potentially distract end users, and might affect adoption especially among users with less proficiency in ES related processes.

  • To elaborate further, the perceived gap between an ideal tool monitoring aspects like search concurrency and the current popular tools addressing this need, might be seen as relatively small. On the other hand, the gap between a tool proficient in identifying use case prerequisites with high accuracy and what's available prior to the advent of the DataSense Navigator is considerably larger, hence justifying the development efforts.

Why is there a discrepancy between the number of events displayed in the drilldown step of Stage 4 and the number of matches identified?

  • The discrepancy stems from the different processing stages. The sample matches in Stage 4 aren't subjected to the same subsampling reduction and filtering as in Stage 3. Stage 4 aims to offer a wider perspective on the potential matches, thus enabling a quicker search process.

How can I ascertain the tool's proper functionality?

  • The event drilldown feature of the tool is designed to fulfill two main purposes. It enables viewing of possible events that either meet or partially meet the requirements of a potential use case, and in this scenario, it helps validate the accuracy of match identification. If clicking the drilldown search returns no results, it could suggest issues encountered during the sampling generation phase. Conducting a visual examination of the drilldown events to check for the presence of field matches can further confirm whether the match conditions have been correctly identified.

  • It's important to note that there might be instances where no contributing events are identified during this phase, especially if such events are rare. The match findings in Stage 4 are obtained from the time interval during which the subsample was created. While the data drilldown utilizes this time interval, there could be a time lapse from when the subsample was taken to when the drilldown was conducted. There might be certain time frames at the peripheries outside the overlap between Stage 3 and the Stage 4 drilldown. If the contributing events fall outside this overlap, they may not appear. This issue can be resolved by either taking a new sample and conducting the drilldown immediately afterward or by opening the drilldown in a separate window and extending the search range to encompass a larger data subset.

Next steps

These resources might help you understand and implement this guidance:

Splunk OnDemand Services: Use these credit-based services for direct access to Splunk technical consultants with a variety of technical services from a pre-defined catalog. Most customers have OnDemand Services per their license support plan. Engage the ODS team at ondemand@splunk.com if you require assistance.