Towards a Data-Centric Foundation for AI: Teaching AI to Solve Its Own Data Problem

Speaker: Suparna Bhattacharya


Abstract

AI models and pipelines are growing in sophistication, with phenomenal advances that eclipse previous solutions. However, the real-world impact and trustworthiness of AI depend heavily on the data behind these models. Data-centric AI is an emerging discipline that accounts for this reality. It prompts us to rethink how we build AI systems by shifting attention to techniques that systematically improve the data, instead of iterating on models while keeping the data fixed.
Selecting and tuning the most valuable data for improving model performance, efficiency and trust metrics can be very laborious and challenging. It is a complex optimization problem in a high-dimensional space of possible data characteristics and hyperparameters across the many pipeline stages, including data collection, selection, labelling, augmentation, feature selection, domain adaptation, model training, fine-tuning, testing and refinement. This space is even more intricate in "AI for Science" pipelines, which often include multiple models applied in sequence, or models built incrementally as new data is collected from simulations or experimental facilities.
This talk describes a Self-Learning Data Foundation that captures and learns from AI pipeline metadata and data characteristics, enabling a new layer of data-centric intelligence for addressing these challenges.

Bio

Suparna is an HPE Fellow in the AI Research Lab at Hewlett Packard Labs, where she currently focuses on data-centric and trustworthy AI, and has a passion for realizing innovations that blend insights from diverse computing domains. She has deep experience in several areas of systems software development and research, spanning many layers of data-processing systems and the storage stack. This includes several enjoyable years of open-source contributions to the Linux kernel, 29 granted patents, 30 publications, and a book on Resource Proportional Software Design for Emerging Systems.
Suparna holds a B.Tech in Electronics and Electrical Communication Engineering from IIT Kharagpur (1993) and a late-in-life PhD in Computer Science with a best-thesis award from the Indian Institute of Science (2013). She was elevated to IEEE Fellow in 2022 for her contributions to the Linux kernel for the enterprise and to advanced data processing systems. She is also a Fellow of the Indian National Academy of Engineering. She received the HPE Women's Excellence Award in 2017 and 2022, the IEEE India Council Woman Technologist of the Year award in 2020, the IISc Prof. S. K. Chatterjee Award for Outstanding Woman Researcher in 2019, the Zinnov Next Generation Women Leaders Award in 2019, and the Economic Times and Femina Inspiring Women-in-Tech award in 2023.