This guide focuses on incident management from a cybersecurity perspective, exploring the role of Splunk Security products in this process.
Aim and strategy
Incident management helps cybersecurity, DevOps, and IT professionals prepare for and effectively respond to both known and unknown cyber threats or infrastructure events. Incident types can range from simple operational issues to data breaches or even severe cyberattacks.
While there are many ways to respond to different types of incidents, and policies, tools, and service-level agreements (SLAs) vary across organizations, incident management processes provide a framework that allows any organization to identify threats and their root causes, then respond to them and recover from their effects.
Splunk customers who advance and mature their incident management capabilities can reduce risks associated with IT complexity and the increasing frequency and sophistication of threats. Good incident management practices are essential to prevent service interruptions that can be extremely costly, leading to regulatory fines, brand damage, and customer attrition. Prompt detection and response can minimize the financial, reputational, and operational damage an incident causes. It's also important to learn from the aftermath of incidents so your organization can prepare for future incidents.
The full benefits of a properly implemented incident management process can include:
- Reduced downtime. Rapid assessment, identification, and resolution of incidents minimizes impact on business operations.
- Consistent approach. Incident management processes provide you with a clear and methodical plan of action to follow when incidents occur, helping you to utilize your manpower, software tools, and resources in a consistent and effective way.
- Improved communication. Good communication prevents duplication of effort and ensures everyone understands who should contribute, what they should contribute, and how, when responding to incidents.
- Stronger overall security posture. Incident management exposes and addresses vulnerabilities, helping you to increase your organization's cybersecurity resilience and protect against future threats.
- Established performance and service-level agreements. Performance metrics with predictable turnaround times provide reliability, and teams can work to improve those metrics over time.
- Prevention of future incidents. Identification of root causes and resolution helps prevent the same incidents from reoccurring.
- Compliance. Incident management processes can help to ensure compliance with regulations such as GDPR, PCI DSS, and HIPAA, which can be especially important for critical sectors such as financial services and healthcare.
What are the differences between the aims and strategy of incident management versus cybersecurity incident response?
Although related, incident management and cybersecurity incident response are two distinct concepts. Incident management is a broad term that encompasses the management of all types of incidents, whether related to cybersecurity or not. On the other hand, cybersecurity incident response refers specifically to the process of addressing incidents that involve information security threats or breaches.
There are some key differences between incident management and cybersecurity incident response:
- Scope. Incident management covers many types of incidents faced by organizations, such as natural disasters, power outages, or physical security incidents. Cybersecurity incident response focuses on responding to incidents that impact an organization's information security.
- Team. Incident management teams are cross-functional with representatives from different parts of the organization, such as IT, operations, and business units. Cybersecurity incident response teams are more specialized, containing security analysts, incident responders, and forensic investigators. These teams often transition into larger incident management teams as the cybersecurity incident impact expands.
- Processes. Incident management and cybersecurity incident response both involve established processes for identifying, triaging, and responding to incidents. However, the specific processes and procedures used in each case are tailored to the specific type of incident being managed.
- Tools. Incident management typically involves a wide range of tools and technologies, such as incident tracking software, communication tools, and resource management tools. Cybersecurity incident response typically involves specialized tools and technologies, such as malware analysis tools, intrusion detection systems, and forensic analysis software.
Common use cases
- Incident and event investigation
- Forensic investigations of threat artifacts
- Automated and orchestrated response processes
- Mean time to respond (MTTR) reporting
- Business continuity and resilience
An effective incident management team has several key roles and responsibilities:
- Identifying incidents. Incident management teams must identify any issues that could impact business operations as quickly as possible.
- Resolving incidents. Once an incident is identified, incident management teams must gather the necessary resources to resolve it quickly, often working with other departments to return operations to normal and remediate the security vulnerabilities that led to the incident.
- Reporting incidents. Regular reporting on incidents helps prevent future incidents and keeps the organization informed.
- Training employees. Incident management teams are responsible for training staff on how to respond to various types of events and incidents. This includes training on established procedures, and educating other teams about the potential impact of incidents on business operations.
Incident management teams often contain a computer security incident response team (CSIRT) whose responsibilities include analyzing, categorizing, and responding to cybersecurity-based incidents and events.
This type of incident response team can include functional roles such as:
- Incident response manager. Duties of this position include overseeing and prioritizing actions during detection, containment, and recovery of a cybersecurity incident. Managers might also be required to convey high-severity incidents to the rest of the organization, customers, law enforcement, regulators, and the public where applicable.
- Security analysts. Analysts support and work directly with affected resources, as well as implementing and maintaining technical and operational controls.
- Threat researchers or analysts. These team members provide threat intelligence and additional context around cybersecurity incidents. They might use third-party tools and internet resources to understand current and future threats.
Effective incident management programs involve a cross-functional team from senior leadership, legal, human resources, IT, cybersecurity, and public relations. Senior leadership support is critical to gather resources, funding, and staff from different teams. Senior leadership roles that might provide this support include the Chief Information Security Officer (CISO) or Chief Information Officer (CIO), or in some cases, the CEO. Legal counsel teams advise on compliance activities and liability for vendor or partner data breaches. Human resources teams provide guidance on personnel removal and access credentials where an incident involves an insider threat. Public relations teams ensure accurate and transparent communication with regulators, media, customers, and shareholders.
Some of the key roles involved in incident management processes are:
| Role | Responsibilities |
|---|---|
| | Records and classifies received incidents and undertakes an immediate investigative effort in order to restore a failed or impacted IT service as quickly as possible. |
| | Performs the administrative tasks necessary to support activities within a process. |
| | Manages the process to restore normal service operation as quickly as possible to minimize the impact to business operations. |
| Splunk SOAR Admin | Applies configuration changes; installs and maintains apps; manages user and permission changes; develops and implements playbooks to automate and orchestrate incident management workflows and processes. |
| Information Security Management | Accountable for the incident management process; maintains, designs, and improves the process as necessary to achieve the objectives of the business. |
Before establishing or formally kicking off an incident management program, there are some prerequisites that an organization should consider to avoid pitfalls.
Conduct a risk assessment. Identifying potential security threats and vulnerabilities is a critical first step in establishing an incident management program. A risk assessment helps an organization understand its specific risk landscape and determine the types of incidents that it is most likely to encounter or is already encountering routinely.
Assess your monitoring and analysis capabilities. An organization should have tools and processes in place to monitor its applications, systems, users, and assets, and to detect incidents. This includes using security monitoring tools and conducting regular vulnerability assessments and penetration testing. Tools and industry standards can help incident management teams with this work. Tools can be split into three categories:
- Prevention. An organization might deploy vulnerability scanners and data leak prevention (DLP) tools to prevent leaked credentials and other sensitive organizational data from being exposed through poor security controls or a lack of configuration management.
- Detection. Detection could be covered through the use of anti-malware software, network intrusion detection systems, security information and event management (SIEM) software such as Splunk Enterprise Security, or vulnerability scanners that check endpoint and server security posture.
- Response. Response tools support remediation workflows in which incident management teams can investigate, collaborate, request remediation, track and close third-party attack vectors, and store indicators. A security orchestration, automation, and response (SOAR) platform such as Splunk SOAR is an example of a response tool.
Many tools can play a critical role in the incident management process. Some of the most common include:
- Intrusion detection systems (IDS). These systems detect and react to anomalous security events and incidents. They often have features such as real-time alerts, blocking, and reporting capabilities.
- Network and cloud traffic flow analyzers. These tools help incident managers understand the traffic flowing in and out of their network and cloud infrastructure. This type of information can identify anomalous and malicious activity and help quickly respond to incidents.
- Vulnerability scanners. These scanners help identify vulnerabilities in an organization’s systems and networks. This information can be used to identify and fix the vulnerabilities and prevent future incidents.
- Platform observability and availability monitoring. These types of monitoring tools help incident managers track the availability of critical systems and applications. This information can be used to quickly identify and resolve incidents affecting business and critical systems operations.
- Web proxies. A web proxy is a system positioned between the endpoint client and the target website. It intercepts web requests from the endpoint and forwards them to the target server. It can be used to monitor, cache, and redirect traffic, and to block access to specific websites.
- Security information and event management (SIEM). SIEM tools ingest and analyze network, system, and security data across an organization. Through data correlation, a SIEM can identify and alert on anomalous or malicious activity. This can help incident managers quickly identify and mitigate potential threats.
- Threat intelligence management. Threat intelligence is information about current or emerging threats that can impact an organization. It can be leveraged to help incident managers stay ahead of any potential attacks and protect their organizations.
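The data correlation a SIEM performs can be illustrated with a small sketch. This is not how Splunk Enterprise Security implements correlation; it is a minimal Python example, assuming hypothetical event fields (`source`, `user`, `action`, `time`) and an arbitrary threshold of three failed logins within five minutes, that flags a user whose failures are spread across several data sources.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate_failed_logins(events, threshold=3, window=timedelta(minutes=5)):
    """Return users with at least `threshold` failed logins inside `window`.

    The event fields and the default threshold are illustrative assumptions.
    """
    by_user = defaultdict(list)
    for event in events:
        if event["action"] == "login_failed":
            by_user[event["user"]].append(event["time"])

    flagged = []
    for user, times in by_user.items():
        times.sort()
        # Compare each timestamp with the one threshold-1 positions later.
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                flagged.append(user)
                break
    return sorted(flagged)


base = datetime(2024, 1, 1, 9, 0)
events = [
    {"source": "vpn", "user": "alice", "action": "login_failed", "time": base},
    {"source": "web", "user": "alice", "action": "login_failed", "time": base + timedelta(minutes=1)},
    {"source": "ssh", "user": "alice", "action": "login_failed", "time": base + timedelta(minutes=2)},
    {"source": "web", "user": "bob", "action": "login_failed", "time": base},
    {"source": "web", "user": "bob", "action": "login_ok", "time": base + timedelta(minutes=1)},
]
print(correlate_failed_logins(events))
```

No single source sees more than one failure for the first user, which is why correlating across sources surfaces activity that per-tool alerting could miss.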
2. Recommended Training
Establishing an effective incident management program is critical for any organization that wants to be prepared to respond to security incidents and minimize their impact. Here are some considerations that an organization should take into account when establishing an incident management program:
- Define incident management policies and procedures. An incident management program should have clearly defined policies and procedures that outline how incidents are identified, triaged, investigated, and resolved. The policies and procedures should be documented and communicated to all relevant stakeholders.
- Determine roles and responsibilities. An incident management program should clearly define the roles and responsibilities of all individuals involved in the incident management process. This includes identifying who will be responsible for leading the incident response, who will be responsible for communication, who will be responsible for technical investigations, and who will be responsible for documenting and reporting on incidents.
- Identify incident response team. An incident management program should identify and train a team of incident responders who have the necessary skills and expertise to effectively respond to incidents. The team should be composed of individuals from different departments, including IT, security, legal, and communications, and should be available to respond to incidents at any time.
- Develop incident response plans. Incident response plans should be developed for different types of incidents, such as malware infections, data breaches, or denial-of-service attacks. The plans should outline the specific steps that need to be taken to respond to each type of incident, and should include checklists and templates to ensure consistency and thoroughness in the response.
- Establish communication protocols. An incident management program should establish communication protocols that ensure that all stakeholders are informed of incidents in a timely and effective manner. This includes establishing lines of communication between the incident response team, senior management, legal, communications, and other relevant parties. For security operations centers and incident management teams, communications will play a significant role in reaching stakeholders, sharing information, building relationships, and fostering trust. Communications transcend all business and security processes, including those that occur under normal operations and during a crisis.
- Review and test incident management plan. An incident management program should periodically review and test incident management plans to ensure that they remain up-to-date and effective. This includes conducting tabletop exercises, simulations, and drills to test the response team's ability to effectively respond to incidents.
- Monitor and improve incident management process. An incident management program should have a process in place to monitor and continuously improve incident management processes. This includes analyzing incident data, identifying trends and patterns, and making adjustments to incident response plans and procedures as needed.
1. Step-by-step guidance
The need for incident management teams is growing. An incident management team can help an organization promptly respond to any event or incident and protect the business from potential attacks and business-interrupting events, for example by creating an organization-wide incident response policy.
Here are some key activities to help set your organization up for successful incident management:
- Make an inventory of assets. Categorization is important to determine what systems and data are most critical for your business activity and prioritize the order in which they need to be addressed and recovered after a security incident.
- Assemble a security incident response team. Identify and assign team members roles and responsibilities, and be sure to include representatives from departments outside of IT, such as finance, operations, and legal. Establish communication with the appropriate individuals during a security incident.
- Look for security clues. Start by defining what constitutes a security incident for your organization, so you know what to look for. Then develop processes, procedures, and policies for how incidents are detected and reported.
- Create a security incident action plan. This should include a list of all relevant tasks based on the threat, including key performance indicators (KPIs) and who is responsible for handling each one. Then test the plan to determine its effectiveness and streamline as needed, including testing and consolidating your incident management tools.
- Evaluate your team’s response. Analyzing response time, successes, and failures during an incident allows you to build your knowledge base to improve the plan for future incidents.
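The asset inventory step above can be sketched in a few lines. This is a minimal example, assuming a hypothetical 1 to 5 criticality scale and made-up asset names; it orders assets so the most critical, and those holding sensitive data, are addressed and recovered first.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    criticality: int            # hypothetical scale: 1 (low) to 5 (business critical)
    holds_sensitive_data: bool


def recovery_order(assets):
    """Sort most critical first; sensitive data, then name, break ties."""
    return sorted(
        assets,
        key=lambda a: (-a.criticality, not a.holds_sensitive_data, a.name),
    )


# Illustrative inventory; these asset names are invented for the example.
inventory = [
    Asset("wiki", 2, False),
    Asset("payments-db", 5, True),
    Asset("build-server", 3, False),
    Asset("hr-portal", 3, True),
]
ordered = [asset.name for asset in recovery_order(inventory)]
print(ordered)
```

In practice the criticality values would come from the risk assessment rather than being assigned by hand.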
For more information, see Overview of incident review in Splunk Enterprise Security and Creating an incident workflow in Splunk Enterprise Security.
2. Incident management with Splunk SOAR and Splunk Mission Control
SOAR platforms are designed to help security teams manage and respond to security incidents more efficiently and effectively. They provide a centralized location for managing security alerts, automate incident response tasks through workflows, and orchestrate responses across multiple security tools and systems.
Start by identifying which security processes need to be operationalized and improved. Some guiding principles for choosing a security process to standardize are:
- Limit to processes that are performed today (that is, already established work) that produce quality feedback from the team.
- Look for processes that are done relatively frequently and that are familiar to all members of the team.
- Limit the options to processes where the penalty for inconsistency can be potentially high.
- Look for processes where automation could be introduced in the future to reduce the number of manual tasks required by the security analysts.
- Look for processes that may have some compliance or regulatory requirement where consistency represents an audit improvement.
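One way to apply these principles is to score each candidate process against them. The sketch below is only an illustration: the criterion names, the equal weighting, and the example processes are assumptions, not part of any Splunk product.

```python
# Criteria paraphrased from the guiding principles above; equal weights are an assumption.
CRITERIA = (
    "established",
    "frequent",
    "high_inconsistency_penalty",
    "automatable",
    "compliance_driven",
)


def score(process):
    """Count how many criteria a candidate process satisfies."""
    return sum(1 for criterion in CRITERIA if process.get(criterion))


# Hypothetical candidates for standardization.
candidates = [
    {"name": "Annual recovery drill", "established": True, "frequent": False,
     "high_inconsistency_penalty": True, "automatable": False, "compliance_driven": True},
    {"name": "Phishing triage", "established": True, "frequent": True,
     "high_inconsistency_penalty": True, "automatable": True, "compliance_driven": False},
]
ranked = sorted(candidates, key=score, reverse=True)
print([(p["name"], score(p)) for p in ranked])
```

A team could weight the criteria differently, for example by counting compliance requirements more heavily in a regulated sector.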
2.1 Setting up a Splunk SOAR workbook
To set up a workbook in Splunk SOAR, follow these steps:
- Access Splunk SOAR's built-in workbooks under Administration > Product Settings > Workbooks.
- Click the button to create a new workbook. You'll be taken to the workbook creation screen.
- Enter the workbook name and workbook description.
- When creating or editing a workbook, two configuration options require a decision. Select each option that applies.
- Enter the phases and tasks from your paper workbook into Splunk SOAR. You can optionally assign a different owner to each task. If your analysts work in an ordered structure that determines who performs each task (for example, tier one/tier two or junior/senior), capture it in the configuration. Splunk SOAR allows assignment by user or role, which is useful if the team is not in a flat structure. Complete all the sections, then save the workbook.
- After you save the workbook, run through some test incidents and have your team do the same. This gives you the time to familiarize yourselves with the Splunk SOAR workbook user interface.
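Before entering a paper workbook into the product, it can help to draft its phases and tasks as plain data and check for gaps. The sketch below is a hypothetical Python model, not the Splunk SOAR API or its storage format; the class names, role labels, and example tasks are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    owner_role: str = ""   # for example "tier1"; empty means not yet assigned


@dataclass
class Phase:
    name: str
    tasks: list = field(default_factory=list)


@dataclass
class Workbook:
    name: str
    description: str
    phases: list = field(default_factory=list)

    def unassigned_tasks(self):
        """List (phase, task) pairs that still need an owner."""
        return [
            (phase.name, task.name)
            for phase in self.phases
            for task in phase.tasks
            if not task.owner_role
        ]


# Illustrative draft; the phases and tasks are invented for the example.
workbook = Workbook(
    name="Phishing triage",
    description="Draft transcribed from the paper workbook",
    phases=[
        Phase("Detection", [Task("Confirm report", "tier1"), Task("Collect headers", "tier1")]),
        Phase("Containment", [Task("Block sender", "tier2"), Task("Notify user")]),
    ],
)
print(workbook.unassigned_tasks())
```

Drafting the structure this way makes it easier to spot tasks without an owner before the team starts running test incidents.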
For more information on this process relating to Splunk SOAR, see how to Define a workflow in a case using workbooks in Splunk SOAR. For more information on this process relating to Splunk Mission Control, see how to Triage incidents using incident review, Investigate an incident, Create response templates to establish guidelines for incident response, or Automate incident response with playbooks and actions.
As you implement the guidance in this guide, you can measure the following and should see improvements over time:
- Your security rating and risk posture assessment status
- Vendor risks remediated
- Number of incidents detected
- Number of incidents missed
- Number of incidents requiring actions
- Number of repeat incidents
- Number of known attack vectors
- Average detection time
- Average remediation time
- Number of data breach events
- Average vendor security posture
- Number of stakeholders present in incident response plan review meetings
- Number of stakeholders present in incident response plan tabletop exercises
- Other security initiatives, for example, cybersecurity awareness training, website risks, email security, network security, malware, and brand protection
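Several of these measures, such as average detection time, average remediation time, and the number of repeat incidents, can be computed from basic incident records. The following is a minimal sketch, assuming hypothetical record fields (`occurred`, `detected`, `remediated`, `root_cause`) and sample data; real records would come from your incident tracking tool.

```python
from collections import Counter
from datetime import datetime, timedelta


def mean_delta(incidents, start_key, end_key):
    """Average elapsed time between two timestamps across incidents."""
    deltas = [incident[end_key] - incident[start_key] for incident in incidents]
    return sum(deltas, timedelta()) / len(deltas)


def repeat_incident_count(incidents):
    """Count incidents whose root cause was already seen in another incident."""
    counts = Counter(incident["root_cause"] for incident in incidents)
    return sum(n - 1 for n in counts.values())


# Sample records; field names and values are assumptions for illustration.
t0 = datetime(2024, 3, 1, 8, 0)
incidents = [
    {"occurred": t0, "detected": t0 + timedelta(minutes=30),
     "remediated": t0 + timedelta(hours=2), "root_cause": "phishing"},
    {"occurred": t0, "detected": t0 + timedelta(minutes=10),
     "remediated": t0 + timedelta(hours=1), "root_cause": "phishing"},
    {"occurred": t0, "detected": t0 + timedelta(minutes=20),
     "remediated": t0 + timedelta(hours=3), "root_cause": "misconfiguration"},
]
avg_detection = mean_delta(incidents, "occurred", "detected")
avg_remediation = mean_delta(incidents, "detected", "remediated")
print(avg_detection, avg_remediation, repeat_incident_count(incidents))
```

Here remediation time is measured from detection to remediation; a team might instead measure from occurrence, so the definition should be agreed before the metric is reported.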