Reducing memory interference latency of safety-critical applications via memory request throttling and Linux cgroup

One of the major challenges that system developers must address when designing a mixed-criticality system is to effectively reduce the performance interference that occurs when applications with different levels of criticality compete for shared computing resources. This problem is particularly pronounced for memory resources when data-intensive applications such as deep learning run on a CPU-GPU heterogeneous architecture, where the CPU and GPU work together on a single chip with shared memory. This is because memory access contention tends to intensify between CPU cores and the GPU when both perform data-intensive workloads.

Memory access contention results from applications' simultaneous requests for memory hardware resources such as the system bus, the memory controller, and memory banks. When memory access contention becomes excessive and no appropriate countermeasures are taken, applications with a higher level of criticality can be delayed, causing their end-to-end latencies to increase unpredictably. We refer to such an increase in end-to-end latency as memory interference latency.

Kernel architecture of the application-aware dynamic memory request throttling

We propose an operating system mechanism that dynamically reduces the memory interference latency of critical applications in a dual-criticality system such as the infoADAS. The key idea behind our approach is to reduce memory interference latency by lowering the memory request rate of normal tasks while critical tasks are executing. We call this approach "application-aware dynamic memory request throttling." It consists of four actively involved components that work alongside other kernel components: (1) the critical task chain manager, (2) the excessive memory contention predictor, (3) the memory request rate controller, and (4) the CPUFreq governor. The critical task chain manager keeps track of a critical task chain and issues classification requests to the cgroup core for the tasks on that chain. The excessive memory contention predictor periodically predicts the occurrence of excessive memory contention during critical task execution by identifying tasks running on both the CPU and the GPU and by counting the number of outstanding memory requests in the memory controller. The memory request rate controller then invokes the CPUFreq governor to scale the frequency of each core up or down according to its decision.

Overview of the Progress Balancing

