Safeguarding Workload Management operation during the transition to cgroups v2
Workload Management plays a critical role in ensuring the efficient allocation of system resources among various workloads.
Currently, Splunk's Workload Management relies on cgroups v1 in the host operating system. Newer operating system releases increasingly default to cgroups v2, and this poses a challenge: if you upgrade your operating system without adjusting the cgroups version, Workload Management might not function as intended or, worse, might stop working entirely.
Cgroups v2 offers various improvements over cgroups v1, such as:
- Namespace Isolation. cgroups v2 integrates better with namespaces, which allows more secure isolation and better compartmentalization of system resources.
- Improved Resource Management. cgroups v2 organizes all resource controllers in a single unified hierarchy, which gives more consistent control and isolation of resources. More effective resource management can help prevent resource abuse and limit the impact of denial-of-service attacks.
- Enhanced Control. cgroups v2 provides finer-grained control over resources and reports per-cgroup Pressure Stall Information (PSI). This reduces the risk of resource contention and improves overall system stability and security.
Splunk is actively working on enhancing Workload Management to support cgroups v2 environments. Until that support is available, take the following precautions to keep Workload Management operating smoothly:
- If the new version of your operating system uses cgroups v2 by default, we strongly advise against upgrading without adjusting the cgroups version settings. Upgrading without doing so might disrupt Workload Management in the Splunk platform.
- The procedures in this article are general guidelines, and the exact steps might differ depending on your operating system distribution. Consult the official documentation for your operating system to verify its cgroups version or to stay on cgroups v1.
Check your cgroups version
If you are considering an operating system upgrade, first confirm which cgroups version your system uses so that you can keep Workload Management compatible. Check which entries exist under /sys/fs/cgroup:
- If /sys/fs/cgroup/cpu and /sys/fs/cgroup/memory exist, you have cgroups v1 configured, and Workload Management should operate as intended.
- Otherwise (for example, if /sys/fs/cgroup/cgroup.controllers exists instead), either your operating system is configured for cgroups v2, or cgroups v1 is misconfigured on your operating system for Splunk.
If you're using Red Hat OpenShift, or any other Linux host, you can also check the type of the filesystem mounted at /sys/fs/cgroup:
$ stat -c %T -f /sys/fs/cgroup
If the output is tmpfs, you have cgroups v1 on your node. If the output is cgroup2fs, you have cgroups v2 on your system.
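A minimal sketch that runs the same stat check and interprets the result, assuming GNU coreutils stat (as on Red Hat Enterprise Linux and most Linux distributions). The function name cgroup_version_from_fstype is hypothetical. Hybrid systems, which mount the v1 controllers alongside a v2 hierarchy under /sys/fs/cgroup/unified, also report tmpfs.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a
# cgroups version. tmpfs means v1 (or hybrid); cgroup2fs means v2.
cgroup_version_from_fstype() {
    case "$1" in
        tmpfs)     echo "v1" ;;
        cgroup2fs) echo "v2" ;;
        *)         echo "unknown" ;;
    esac
}

# GNU stat: -f reports on the filesystem, -c %T prints its type by name.
fstype=$(stat -f -c %T /sys/fs/cgroup 2>/dev/null)
echo "/sys/fs/cgroup is ${fstype:-unavailable}: cgroups $(cgroup_version_from_fstype "$fstype")"
```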
How to stay on cgroups v1
Many Linux distributions now configure systemd to use cgroups v2 by default. If you have already updated to cgroups v2 and encountered issues with Workload Management, you can still revert to cgroups v1. The following general steps describe how to change the cgroups version from v2 to v1.
- Back up your data. Before making any changes, make sure you have backups of critical data.
- Check the current configuration. Confirm whether your system is using cgroups v2 by listing the mounted cgroup filesystems with mount | grep cgroup. A cgroup2 filesystem mounted at /sys/fs/cgroup indicates that the v2 hierarchy is active.
- Configure the system to use cgroups v1. Modify the bootloader configuration or kernel parameters to switch back to cgroups v1. For example, on systemd-based distributions, the kernel parameter systemd.unified_cgroup_hierarchy=0 tells systemd to mount the v1 hierarchy.
- Modify configuration files. On some systems, such as those using systemd, you might also need to modify cgroups-related configuration files to ensure the system uses v1. Review and adjust these files accordingly.
- Reboot. After making all the changes, reboot your system to apply them, and then check whether cgroups v1 is now in use.
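The check, configure, and verify steps above can be sketched as a dry-run shell script. It assumes a GRUB-based distribution where grubby is available (as on Red Hat Enterprise Linux) and a systemd version that still honors systemd.unified_cgroup_hierarchy=0; confirm both against your distribution's documentation before applying anything. The script only prints the grubby command rather than running it, and the names hierarchy_from_mounts, v1_requested, and V1_KERNEL_ARGS are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: detect the active cgroup hierarchy and, if it is v2,
# print (not run) a grubby command that requests cgroups v1 on next boot.
# Assumes GRUB with grubby, as on RHEL.

V1_KERNEL_ARGS="systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"

# Classify `mount` output read from stdin as v2, v1, or unknown.
hierarchy_from_mounts() {
    mounts=$(cat)
    if printf '%s\n' "$mounts" | grep -qF ' on /sys/fs/cgroup type cgroup2 '; then
        echo "v2"        # unified hierarchy mounted at the top level
    elif printf '%s\n' "$mounts" | grep -qF ' type cgroup '; then
        echo "v1"        # per-controller v1 hierarchies are mounted
    else
        echo "unknown"
    fi
}

# Succeed if the kernel command line file already requests cgroups v1.
v1_requested() {
    grep -qF 'systemd.unified_cgroup_hierarchy=0' "${1:-/proc/cmdline}" 2>/dev/null
}

current=$(mount 2>/dev/null | hierarchy_from_mounts)
echo "Active cgroup hierarchy: $current"
if [ "$current" = "v2" ]; then
    if v1_requested /proc/cmdline; then
        echo "v1 kernel arguments are set but v2 is active; check systemd and bootloader settings."
    else
        echo "Dry run. To request cgroups v1 on the next boot, run as root:"
        echo "  grubby --update-kernel=/boot/vmlinuz-\$(uname -r) --args=\"$V1_KERNEL_ARGS\""
        echo "Then reboot and run this script again to confirm that v1 is active."
    fi
fi
```

On hybrid setups, where cgroup2 is mounted at /sys/fs/cgroup/unified alongside the v1 controllers, hierarchy_from_mounts reports v1, because only a cgroup2 mount at the top level counts as v2.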
Splunk is committed to ensuring that Workload Management remains robust and adaptable to evolving system environments, and is working on a version that integrates seamlessly with cgroups v2 to provide enhanced functionality without disruption.
Until then, we encourage everyone to stay informed and take the necessary precautions outlined above. This article will be updated as cgroups v2 compatibility for Workload Management is rolled out in the second half of 2024.
Next steps
These resources might help you understand and implement this guidance:
- Red Hat: How to enable cgroup-v1 in Red Hat Enterprise Linux 9
- Red Hat: Configuring the Linux cgroup version on your nodes
- Kubernetes: About cgroup v2
- Kernel: PSI - Pressure Stall Information