Machine Learning as a Service: The Challenges of Serving a Million Client Distributions

Speaker: Sunita Sarawagi


Abstract

The increasing concentration of big data and computing resources has resulted in widespread adoption of machine learning as a service (MLaaS). The best-performing NLP, speech, image, and video recognition tools are now provided as network services. In such cases, the labeled data used for training may be proprietary, and different clients may be interested in different data distributions, often violating the core ML generalizability assumption that the training and test distributions match. This talk will discuss techniques for reducing such mismatch. We discuss ways in which the server could exploit multi-client training data to train ML models that generalize better to client distributions when explicit parameter adaptation is not an option. Next, we call for a more detailed specification of a server’s accuracy, where accuracy is not a single number but a surface over interpretable client data properties. Such an interpretable surface would allow a client to make a more informed choice of model from the burgeoning marketplace of cloud services. Finally, we discuss methods for lightweight and heavyweight client adaptation of a black-box service in the context of NLP models for topic adaptation and speech models for accent adaptation.

Bio

Sunita Sarawagi conducts research in the fields of databases and machine learning. She is Institute Chair Professor at IIT Bombay. She received her PhD in databases from the University of California at Berkeley and a bachelor's degree from IIT Kharagpur. She has also worked at Google Research (2014-2016), CMU (2004), and IBM Almaden Research Center (1996-1999). She was awarded the Infosys Prize in 2019 for Engineering and Computer Science, and the Distinguished Alumnus Award from IIT Kharagpur. She has several publications, including best paper award winners at the ACM SIGMOD, VLDB, ICDM, NIPS, and ICML conferences. She has served on the boards of directors of ACM SIGKDD and the VLDB Foundation. She was program chair for the ACM SIGKDD 2008 conference and research track co-chair for the VLDB 2011 conference, and has served as a program committee member for the SIGMOD, VLDB, SIGKDD, ICDE, and ICML conferences and on the editorial boards of the ACM TODS and ACM TKDD journals.