Adopting workload management with cgroups v1
Workload management plays a critical role in ensuring the efficient allocation of system resources among various workloads.
Currently, Splunk's workload management relies on cgroups version 1 (v1) within user operating systems. The shift toward newer systems defaulting to cgroups version 2 (v2) poses a potential challenge: users who update their operating systems without adjusting the cgroups version might find that workload management does not function as intended or, worse, breaks entirely.
Cgroups v2 offers various improvements over cgroups v1, such as:
- Namespace Isolation. cgroups v2 integrates better with namespaces, allowing more secure isolation and better compartmentalization of system resources.
- Improved Resource Management. cgroups v2 offers more precise control and isolation of resources. This more effective resource management can prevent resource abuse or denial-of-service attacks.
- Enhanced Control. cgroups v2 provides finer-grained control over resources, reducing the risk of resource contention and improving overall system stability and security.
Splunk Enterprise versions 9.2 and lower do not support workload management on Linux systems that use cgroups v2, so if you use one of these versions, you'll need to follow the advice in this article. Splunk Enterprise versions 9.3 and higher support workload management on Linux systems that use either cgroups v1 or cgroups v2, so if you use one of these versions, see this article for guidance.
- If the operating system version you plan to upgrade to uses cgroups v2 by default, we strongly advise against upgrading without adjusting the cgroups version settings. Doing so might inadvertently disrupt the functionality of workload management within the Splunk platform.
- The processes listed in this article provide general guidelines, and the actual steps required might be different based on your operating system's distribution. Each operating system has different procedures to verify cgroups versions. You should consult the specific documentation or official guidelines for your unique operating system to verify cgroups versions or stay on cgroups v1.
Check your cgroups version
For users considering an operating system upgrade, it's important to understand how to maintain compatibility with workload management. You can check your cgroups version using the following steps:
- If you have /sys/fs/cgroup/cpu and /sys/fs/cgroup/memory, then you have configured cgroups v1 and workload management (WLM) should operate as intended.
- Otherwise (for example, if you have /sys/fs/cgroup/cgroup.controllers), you have cgroups v2 configured on your operating system, or you have a misconfigured cgroups v1 on your OS for Splunk.
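The two checks above can be sketched as a small shell function. This is a minimal illustration, not a Splunk-provided tool: the helper name detect_cgroup_layout is hypothetical, and the optional path argument exists only so the logic can be tried against a directory other than /sys/fs/cgroup.

```shell
#!/bin/sh
# Sketch: classify the cgroups layout found under a mount point.
# detect_cgroup_layout is a hypothetical helper name, not a Splunk command.
detect_cgroup_layout() {
    root="${1:-/sys/fs/cgroup}"
    if [ -d "$root/cpu" ] && [ -d "$root/memory" ]; then
        # Per-controller directories are the cgroups v1 layout.
        echo "v1"
    elif [ -f "$root/cgroup.controllers" ]; then
        # A top-level cgroup.controllers file indicates the v2 unified hierarchy.
        echo "v2"
    else
        # Neither pattern matched: possibly a misconfigured v1 setup.
        echo "unknown"
    fi
}

# Report the layout of the live system.
detect_cgroup_layout
```

A result of unknown corresponds to the "misconfigured cgroups v1" case described above and is worth checking against your distribution's documentation.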
If you're using Red Hat OpenShift, you can also check the filesystem type of /sys/fs/cgroup:
$ stat -c %T -f /sys/fs/cgroup
If the output is tmpfs, then you have cgroups v1 on your node. If the output is cgroup2fs, you have cgroups v2 on your system.
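As a rough sketch, the stat check can be wrapped so that the filesystem type is translated into a cgroups version. The function name classify_cgroup_fstype is an assumption made for illustration, and the script assumes GNU stat, which provides the -f and -c options used in the command above.

```shell
#!/bin/sh
# Sketch: map the filesystem type of /sys/fs/cgroup to a cgroups version.
# classify_cgroup_fstype is a hypothetical helper name.
classify_cgroup_fstype() {
    case "$1" in
        tmpfs)     echo "cgroups v1" ;;
        cgroup2fs) echo "cgroups v2" ;;
        *)         echo "unrecognized: $1" ;;
    esac
}

# Fall back to a placeholder if stat fails (for example, on a non-GNU stat).
fstype=$(stat -c %T -f /sys/fs/cgroup 2>/dev/null || echo "unavailable")
classify_cgroup_fstype "$fstype"
```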
How to stay on cgroups v1
Many Linux distributions configure systemd with cgroups v2 as the default. If you have already updated to cgroups v2 and encountered issues with workload management, you can still revert to cgroups v1. The following general steps describe how to change the cgroups version from v2 to v1.
- Back up your data. Before making any changes, ensure you have backups of critical data.
- Check current configuration. Determine if your system is using cgroups v2. Check the mounted filesystems to confirm if the v2 hierarchy is active by using mount | grep cgroup.
- Configure to use cgroups v1. Modify the bootloader configuration or kernel parameters to switch back to cgroups v1.
- Modify configuration files. On some systems, such as those using systemd, you might need to modify configuration files related to cgroups to ensure the system uses v1. Review and adjust these configuration files accordingly.
- Reboot. After making all the changes, reboot your system to apply the modifications, then check whether cgroups v1 is now being used.
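To make the kernel-parameter step more concrete, here is a hedged sketch for systemd-based distributions. It assumes the systemd.unified_cgroup_hierarchy=0 kernel argument and the grubby tool, as used in Red Hat's guidance for Red Hat Enterprise Linux 9. Other distributions might need a different bootloader procedure, and some newer releases might no longer support the v1 hierarchy at all. The helper name cgroup_v1_requested is hypothetical. The script only inspects the running kernel command line and prints a suggested command; it changes nothing.

```shell
#!/bin/sh
# Sketch: report whether a kernel command line asks systemd for cgroups v1.
# cgroup_v1_requested is a hypothetical helper name.
cgroup_v1_requested() {
    # Pad with spaces so the argument only matches as a whole word.
    case " $1 " in
        *" systemd.unified_cgroup_hierarchy=0 "*) echo "yes" ;;
        *) echo "no" ;;
    esac
}

# Read the command line of the running kernel (empty if unavailable).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

if [ "$(cgroup_v1_requested "$cmdline")" = "yes" ]; then
    echo "The kernel command line already requests cgroups v1."
else
    # Dry run only: print the command instead of executing it.
    echo "On a grubby-based system, run a command like this as root, then reboot:"
    echo 'grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"'
fi
```

After rebooting, repeating the checks from the "Check your cgroups version" section confirms whether the change took effect.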
Next steps
These resources might help you understand and implement this guidance:
- Red Hat: How to enable cgroup-v1 in Red Hat Enterprise Linux 9
- Red Hat: Configuring the Linux cgroup version on your nodes
- Kubernetes: About cgroup v2
- Kernel: PSI - Pressure Stall Information