Speaker: Sumit Kumar Mandal
Processing-in-memory (PIM) is a promising technique for accelerating deep learning (DL) workloads. Emerging DL workloads (e.g., ResNet with 152 layers) consist of millions of parameters, which increase the area and fabrication cost of monolithic PIM accelerators. The fabrication cost challenge can be addressed by 2.5-D systems that integrate multiple PIM chiplets connected through a network-on-package (NoP). However, server-scale scenarios execute multiple compute-heavy DL workloads simultaneously, generating a large volume of inter-chiplet traffic. State-of-the-art NoP architectures proposed in the literature do not consider the nature of DL workloads. In this talk, we will discuss a novel server-scale 2.5-D manycore architecture that accounts for the traffic characteristics of DL applications. Comprehensive experimental evaluations with different system sizes and diverse emerging DL workloads demonstrate that the proposed architecture achieves significant improvements in performance and energy consumption at a much lower fabrication cost than state-of-the-art NoP topologies.
Sumit Kumar Mandal is currently an Assistant Professor at the Indian Institute of Science, Bangalore. He received his PhD from the University of Wisconsin-Madison. He received Best Paper Awards from ACM TODAES in 2020 and ESWEEK in 2022. His research interests include energy-efficient communication architectures for machine learning applications using emerging technologies.