Increasing memory heterogeneity mandates careful data placement to hide the non-uniform memory access (NUMA) effects on applications. While NUMA optimizations have focused on application data for decades, they have ignored the placement of kernel data structures due to their small memory footprint; this is evident in typical OSes that pin kernel data structures in memory. In this paper, we show that careful placement of kernel data structures is gaining importance in the context of page-tables: their sub-optimal placement causes severe slowdown (up to 3.1×) on virtualized NUMA servers.
In response, we present vMitosis – a system for explicit management of two-level page-tables, i.e., the guest and extended page-tables, on virtualized NUMA servers. vMitosis enables faster address translation by migrating and replicating page-tables. It supports two prevalent virtualization configurations: first, where the hypervisor exposes the NUMA architecture to the guest OS, and second, where such information is hidden from the guest OS. vMitosis is implemented in Linux/KVM, and our evaluation on a recent 1.5TiB 4-socket server shows that it effectively eliminates NUMA effects on 2D page-table walks, resulting in a speedup of 1.8–3.1× for Thin (single-socket) and 1.06–1.6× for Wide (multi-socket) workloads.
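To give a sense of why page-table placement matters under virtualization, the following sketch counts the memory references in a worst-case 2D page-table walk. It assumes four-level guest and extended page-tables (as on x86-64) and no hits in TLBs or page-walk caches. The function names and the latency figures are illustrative assumptions, not measurements or code from the paper.

```python
def nested_walk_refs(guest_levels: int = 4, ept_levels: int = 4) -> int:
    """Worst-case memory references for one 2D (nested) page-table walk."""
    # Each guest page-table level holds a guest-physical pointer, so reaching
    # it costs a full extended page-table walk plus the read of the guest entry.
    per_guest_level = ept_levels + 1
    # The final guest-physical address of the data needs one more ePT walk.
    return guest_levels * per_guest_level + ept_levels


def walk_cost_ns(remote_fraction: float,
                 local_ns: float = 100.0,    # hypothetical local DRAM latency
                 remote_ns: float = 200.0,   # hypothetical remote DRAM latency
                 refs: int = nested_walk_refs()) -> float:
    """Estimated walk cost when a fraction of references hit remote memory."""
    avg_ns = remote_fraction * remote_ns + (1.0 - remote_fraction) * local_ns
    return refs * avg_ns


if __name__ == "__main__":
    print(nested_walk_refs())      # 24 references in the worst case
    print(walk_cost_ns(0.0))       # all page-table pages local
    print(walk_cost_ns(1.0))       # all page-table pages remote
```

Under these assumed latencies, moving every page-table page from remote to local memory halves the modeled walk cost. Real gains depend on how often translations miss in the TLBs and page-walk caches, and on the workload.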
Ashish Panwar, Reto Achermann, Arkaprava Basu, Abhishek Bhattacharjee, K. Gopinath, Jayneel Gandhi, “Fast Local Page-Tables for Virtualized NUMA Servers with vMitosis,” ASPLOS 2021.
Ashish Panwar, Sorav Bansal, K. Gopinath, “HawkEye: Efficient Fine-grained OS Support for Huge Pages,” ASPLOS 2019.