Responding to security incidents using SOAR
The complexity and evolving nature of threats far outpace the defenses that teams can implement. What can the modern Security Operations Center (SOC) do to organize, modernize, and increase the efficiency of its operations? Splunk Security Orchestration, Automation, and Response (SOAR) is a solution that helps advance security operations and your overall security program maturity.
This article is part of Splunk's Use Case Explorer for Security, which is designed to help you identify and implement prescriptive use cases that drive incremental business value. In the Security maturity journey described in the Use Case Explorer, this article is part of Orchestrate response workflows.
Challenges in the security team's environment
Increasing attacks
New types of threats and more sophisticated threats emerge constantly. The overall threat level continues to grow, and threats continue to develop in complexity. As a defense footprint is developed in an attempt to keep up, the number of security alerts increases as well, which can overwhelm incident response capacity.
Some organizations attempt to counter this by assigning criticality levels to individual alerts and then prioritizing the most serious. This method is flawed because attacks often begin with several lower-criticality indicators and then build up in stages to a more serious event or breach. Triage based only on the criticality of individual alerts risks missing serious threats, exposing the business to further risk.
Changing attack surfaces
Evolving IT platforms such as cloud services (IaaS, PaaS, SaaS) and the shift to a more remote workforce that works through off-network mobile devices are introducing and increasing risk to businesses.
Many organizations are increasing the migration of data and applications to cloud infrastructure, expanding the security perimeter beyond traditional networks. At the same time, modern workforces and changing technology are combining on-premises and cloud infrastructures to expand business functionality and consumer reach. As a result, infrastructure is more diverse, and the security team now has more systems, processes, and types of threats to monitor. The expanding use of cloud solutions places business-critical systems and data outside of the traditional perimeter and your direct control. However, data security, monitoring, and incident response are still your responsibilities.
While most cloud providers have excellent security track records, it is still the primary responsibility of your security team to stay on top of security events such as suspicious account use or attempts to access data outside of permission. Even in the cloud, users and their actions are generally the organization's weakest link.
The shift to new technologies, such as the Internet of Things (IoT), presents a similar challenge and greatly increases the attack surface area. IoT means different things to different organizations, but the risk of compromise is the same regardless of how companies are connecting devices to their infrastructure. Sensors, industrial controls, and smart devices designed to monitor remote machines or facilities are becoming internet-enabled, exposing organizations to the risks of malware and sensitive data breaches.
Inefficient incident response processes and workflows
Even among SOC teams running the most practiced and mature programs, high volumes of alerts and security incidents send security teams scrambling. Several factors contribute to this reality:
- Inconsistent response to critical threats: Threats tend to manifest in ways that defy the structure of existing incident response workflows. Threats, as well as the systems they target, undergo constant evolution and grow in sophistication. The amount of threat intelligence also grows and changes constantly. Incident response protocols must adapt as rapidly as the evolving threat landscape.
- Failure to integrate people, process, and technology: A business defense works best when security tools and people are aligned around common frameworks and best practices. Many security solutions and even teams operate in silos, often resulting in analysts missing important alerts because they only see the incident from the perspective of a single tool. Even with security information and event management (SIEM) systems such as Splunk Enterprise Security, which correlate security event data across multiple systems, there is still a need for manual investigation and for the ability to pivot through adjacent event data. These types of activities slow incident response and can overwhelm teams, creating a backlog of unchecked alerts where hidden threats may lie.
- Staff churn resulting in loss of institutional knowledge: When expertly trained security analysts leave teams that operate on complex systems and informally defined workflows, incident response is impacted. New analyst onboarding and training also requires an investment of time and money. When people leave, that investment is lost.
- Compliance and regulations affect security policy: New policies and guidelines, driven by changing regulations, mean new obstacles to operational teams and more potentially missed procedures and processes.
Overwhelmed security teams
To address every alert would require incident response team staffing to be significantly scaled up. When an analyst has to assess thousands of alerts daily, that workload becomes time-consuming and overwhelming. SOCs can be impacted by low morale, burnout, fatigue, and resulting churn within their teams. Serious challenges emerge when this happens. Many SOC managers find hiring, training, and retaining good analysts to be a challenge. It can be difficult to recruit qualified individuals, even if the budget is available to do so. The high-turnover SOC may experience a lack of coordination between systems and people, often resulting in an inconsistent response to critical threats. Junior analysts tasked to monitor tools or security solutions they don’t yet fully understand can miss important alerts and threat artifacts. This creates stress on the team and fuels the cycle of staff leaving.
How can Splunk SOAR help?
By using Splunk SOAR, your business can take inputs, apply workflows, and automate repetitive manual tasks, freeing up analysts to perform the most crucial parts of the investigation and remediation of security events. SOAR centralizes capabilities such as case and incident management features, threat intelligence management, essential state of operations dashboards, reporting, and analytics that can be applied across various functions throughout the incident lifecycle. SOAR empowers the SOC by using technology and processes to help human analysts address alerts, which in turn reduces risk.
Centralize operations
Automated incident response and Splunk SOAR offer analysts a centralized security operations platform and interface. Splunk SOAR provides a tool for handling investigative tasks that require the use of secondary systems and processes.
For example, through a unified console, a security analyst can monitor and interpret the most critical notables from Splunk Enterprise Security or the SIEM of their choice, a phishing mailbox, cloud-based security events, and endpoint systems.
Automate operations
Through the utilization of playbooks, the security team can model its manual repetitive alert response processes and codify them to run on triggers or at certain stages of the IR process.
For example, suppose an incident response process calls for a suspicious file from the phishing inbox to be manually uploaded to VirusTotal for evaluation, and for IP reputation information to be gathered and checked against threat intelligence. A playbook can handle the VirusTotal submission and data collection steps on its own so that this information is available to the assigned analyst. Depending on the results, it can also automatically increase the severity of the issue to get an analyst involved quickly.
Automating tasks shaves minutes of work off every alert response so analysts can focus on higher-priority tasks. By automating and orchestrating the sub-tasks associated with incident response, Splunk SOAR can speed up the resolution process significantly, saving time and improving the organization’s ability to respond to and resolve incidents.
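The phishing example above can be sketched in plain Python. This is only an illustration of the flow (gather reputation data, record it, escalate on bad results) and does not use the Splunk SOAR playbook API. The `PhishingEvent` class, the `run_phishing_playbook` function, the lookup callables, the report field names, and the detection threshold are all hypothetical names and values chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PhishingEvent:
    """Hypothetical container for one item pulled from the phishing inbox."""
    file_hash: str
    source_ip: str
    severity: str = "low"
    notes: List[str] = field(default_factory=list)


def run_phishing_playbook(
    event: PhishingEvent,
    file_lookup: Callable[[str], Dict],
    ip_lookup: Callable[[str], Dict],
    malicious_threshold: int = 5,  # assumed cutoff, not a vendor default
) -> PhishingEvent:
    """Collect reputation data, attach it to the event, and escalate if needed."""
    file_report = file_lookup(event.file_hash)
    ip_report = ip_lookup(event.source_ip)

    detections = file_report.get("detections", 0)
    ip_listed = ip_report.get("listed", False)

    # Record the findings so the assigned analyst starts with context.
    event.notes.append(f"file {event.file_hash}: {detections} detections")
    event.notes.append(f"ip {event.source_ip}: listed={ip_listed}")

    # Raise the severity when either check looks bad.
    if detections >= malicious_threshold or ip_listed:
        event.severity = "high"
    return event


# Stub lookups stand in for real file and IP reputation services.
def fake_file_lookup(file_hash: str) -> Dict:
    return {"detections": 12}


def fake_ip_lookup(ip: str) -> Dict:
    return {"listed": False}


event = run_phishing_playbook(
    PhishingEvent(file_hash="abc123", source_ip="203.0.113.7"),
    fake_file_lookup,
    fake_ip_lookup,
)
print(event.severity)
print(event.notes)
```

Passing the lookups in as callables keeps the decision logic separate from any particular reputation service, so the same flow can be exercised with stubs before it is connected to real integrations.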
Tasks that can be automated and orchestrated using Splunk SOAR include:
- Investigate notables, including log gathering and analysis
- Review and analyze threat intelligence indicators
- Update cases, generate reports, and email alerts (for example, automatically logging into multiple systems and entering incident information)
- Understand context and take defensive actions (for example, implementing security controls, updating a blacklist, or disabling a user account)
Splunk SOAR functionality
Aggregation
While many businesses use a SIEM to aggregate and correlate data, Splunk SOAR reaches further, across a more diverse set of tools.
While a SIEM can collect data from logs or events coming from the usual components within your infrastructure, Splunk SOAR can absorb that data as well as act on information from external sources and endpoint security software. Because it gathers information from many more sources, Splunk SOAR is a more comprehensive aggregation solution and helps to unify your security response across the network.
SOAR also integrates with hundreds of apps through Splunkbase to expand analyst reach and make your incident response processes more efficient and cohesive.
Enrichment
When collecting and processing data, analysts can get to resolutions much sooner when they have context and additional insights. Splunk SOAR integrates external threat intelligence, which helps analysts perform internal contextual lookups or run additional processes to gather further data.
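As a rough illustration of enrichment, the sketch below attaches external threat intelligence and internal asset context to a raw alert. It is not Splunk SOAR code. The `enrich_alert` function, the alert field names, and the two in-memory lookup tables are assumptions made for this example; in practice the context would come from the integrations you have configured.

```python
from typing import Any, Dict


def enrich_alert(
    alert: Dict[str, Any],
    threat_intel: Dict[str, Dict[str, Any]],
    asset_inventory: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of the alert with external and internal context attached."""
    enriched = dict(alert)  # leave the original alert untouched
    # External context: what is known about the destination IP?
    enriched["intel"] = threat_intel.get(alert["dest_ip"], {"known_bad": False})
    # Internal context: who owns the source host and how critical is it?
    enriched["asset"] = asset_inventory.get(
        alert["src_host"], {"owner": "unknown", "critical": False}
    )
    return enriched


# Hypothetical lookup tables standing in for real data sources.
threat_intel = {"203.0.113.7": {"known_bad": True, "category": "phishing"}}
asset_inventory = {"hr-laptop-12": {"owner": "hr", "critical": False}}

alert = {"src_host": "hr-laptop-12", "dest_ip": "203.0.113.7"}
enriched = enrich_alert(alert, threat_intel, asset_inventory)
print(enriched["intel"]["known_bad"], enriched["asset"]["owner"])
```

With both kinds of context in one record, an analyst can judge the alert without first pivoting to separate tools.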
Orchestration
Cybersecurity and IT teams can use Splunk SOAR to combine efforts as they address the overall environment in a more collaborative and unified manner. The tools that Splunk SOAR uses can combine internal data and external information about threats. Teams can then use this information to get to the root of each security incident. This allows them to build structured workflows, through playbooks, that gather information from various sources and consolidate it in a central case management system.
Automation
The automation features of Splunk SOAR help eliminate the need for manual steps and repetition, which can be time-consuming and tedious for any analyst. Security automation can accomplish a wide range of investigation tasks, including managing user access and querying events. This enables Splunk SOAR to complete a defined set of tasks or functions without human involvement.
Automation is not a substitute for human intervention in investigations, but it does reduce the analyst time spent on simple, repetitive tasks that are performed in all similar investigations. Instead of spending time on tedious manual tasks and investigating false positives, members of the SOC can use their expertise to respond to events quickly and effectively.
Response
Splunk SOAR's orchestration and automation functions combine to provide the response feature of the SOAR platform. With SOAR, an organization can manage, plan, and coordinate how it reacts to a security threat. The automation capabilities of Splunk SOAR help reduce the risk of human error. This makes responses more accurate and cuts down mean time to remediation (MTTR).
Collaboration and information sharing
Response to a security alert is a shared responsibility of many individuals or teams within an organization. Using case management as a central repository that supports collaboration and information sharing among team members in a controlled manner is essential to effective communication and workflow.
Return on investment in Splunk SOAR
SOC managers can show return on investment in Splunk SOAR in at least three ways:
- Your basic cost-per-alert metric will decrease with the implementation of automated incident response. Through automation and prioritization, the same staff can assess more alerts each day. The cost per alert decreases as alerts are handled more efficiently and in greater volume.
- When your team can investigate all the alerts in the incident queue, there is less need to expand your team. Investing in automated incident response can avoid staff increases and yield an ROI in the first year.
- The additional analytical capabilities of SOAR facilitate better security capacity planning and staffing budgets. SOAR's overview dashboards and executive metrics deliver extensive visibility into the performance, capacity, and value of a security operations investment. Avoiding over-provisioning of security systems going forward can significantly contribute to ROI for automated incident response.
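The cost-per-alert metric in the first item above comes down to simple arithmetic: staffing cost divided by the number of alerts handled. The short sketch below works through it with made-up figures. The `cost_per_alert` function and the dollar and alert numbers are illustrative assumptions, not Splunk benchmarks.

```python
def cost_per_alert(daily_staff_cost: float, alerts_handled_per_day: int) -> float:
    """Divide the daily staffing cost by the alerts the team handles per day."""
    if alerts_handled_per_day <= 0:
        raise ValueError("alerts_handled_per_day must be positive")
    return daily_staff_cost / alerts_handled_per_day


# Hypothetical figures: the same team and cost, before and after automation.
before = cost_per_alert(daily_staff_cost=4000.0, alerts_handled_per_day=200)
after = cost_per_alert(daily_staff_cost=4000.0, alerts_handled_per_day=800)

print(before)  # 20.0
print(after)   # 5.0
```

With staffing cost held constant, handling four times as many alerts cuts the cost per alert to a quarter, so the metric falls in proportion to the increase in alert volume.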
Managing thousands of security events every day can be an exhausting task for any security team. It is nearly impossible to assess every incident manually. However, missing even one incident can contribute to risk exposure, and that exposure can lead to thousands of dollars or more in damages. Automated incident response and SOAR allow security teams to respond to every alert without increasing the size of their staff or increasing turnover and burnout. They also formalize and document workflows and response processes to improve institutional knowledge. Splunk SOAR delivers a faster mean time to resolution along with greater risk reduction and threat protection. It accomplishes these goals while reducing costs and extending the capabilities of existing resources.
Next steps
For a comprehensive Splunk SOAR demo or to engage Professional Services for setting up Splunk SOAR in your environment or on Splunk Cloud Platform, reach out to your Splunk account team or representative. In addition, these Splunk resources might help you understand and implement this use case: