Aarogya Setu – A case study of a large-scale data-driven system for pandemic control

The Centre for Networked Intelligence, under the Division of Electrical, Electronics, and Computer Sciences (EECS), and the Robert Bosch Centre for Cyber-Physical Systems (RBCCPS) host the Networks Seminar Series, a technical discussion forum on themes such as computer networks, machine learning, signal processing, and information theory.

As part of this seminar series, Professor Kamakoti Veezhinathan delivered a talk titled ‘Aarogya Setu – A case study of a large-scale data-driven system for pandemic control’. Professor Veezhinathan was a member of the team that built the Aarogya Setu (AS) App. He is the current Director of the Indian Institute of Technology Madras (IITM), a faculty member in the Department of Computer Science and Engineering at IITM, and a former Associate Dean for Industrial Consultancy and Sponsored Research (ICSR) at IITM. His awards include the DRDO Academy Excellence Award (2013), the IBM Faculty Award (2016), and the Abdul Kalam Technology Innovation National Fellowship (2020).

Aarogya Setu (‘bridge to health’ in Sanskrit) was built in 2020 in a record 21 days, in response to the urgent need for data on COVID-positive persons and for contact tracing to curb the spread of infection. It was an initiative of the Government of India, in collaboration with academia and industry. On 26 May 2020, its source code was released to the public.

The App can be downloaded at https://www.mygov.in/aarogya-setu-app/. The Aarogya Setu website lists 21,55,00,000 total App downloads (Android, iOS and KaiOS), 79,41,18,951 samples tested up to 10 April 2022, and 2,71,211 samples tested on 10 April 2022. As of 5 am on 12 April 2022, India had recorded 4,30,36,132 total COVID-19 cases, of which 11,058 were active, 4,25,03,383 had been discharged, and 5,21,691 had resulted in death.
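The case figures above use Indian digit grouping (lakh and crore) and are internally consistent: the active, discharged and death counts add up to the total. A small Python check, with illustrative helper names that are not part of Aarogya Setu, converts between Indian-grouped and plain integers and confirms the sum.

```python
def parse_indian(s: str) -> int:
    """Parse a number written with Indian digit grouping, e.g. '4,30,36,132'."""
    return int(s.replace(",", ""))

def format_indian(n: int) -> str:
    """Format a non-negative int with Indian grouping: last 3 digits, then pairs."""
    s = str(n)
    if len(s) <= 3:
        return s
    head, tail = s[:-3], s[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)  # the remaining 1 or 2 leading digits
    return ",".join(groups + [tail])

# Figures as of 5 am on 12 April 2022, from the text above.
total = parse_indian("4,30,36,132")
active = parse_indian("11,058")
discharged = parse_indian("4,25,03,383")
deaths = parse_indian("5,21,691")

# Every case is active, discharged, or a death, so the parts sum to the total.
assert active + discharged + deaths == total
print(f"{total:,}")          # 43,036,132 (international grouping)
print(format_indian(total))  # 4,30,36,132
```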

In his lecture, Professor Veezhinathan spoke about the development of the AS App. Some keywords that stood out in the talk were interdisciplinarity, intuition, user acceptance, trust, privacy, and security. A summary of his talk is provided below.

The development of any large-scale data analytics system involves the design of three components: an edge sensor, intermediate data aggregation, and cloud analytics. Many facets must be addressed, including the social aspects of deploying the system.

In Aarogya Setu, users register and submit their symptoms. Latitude and longitude information is essential so that the nearest COVID care centre can reach an infected individual. Contacts are traced using Bluetooth. Each phone is assigned a unique device ID, and when the phones of person A and person B come near each other, each stores the other's ID. If person A tests positive, the cloud retrieves the data from A's phone; this data contains the device IDs, contact durations and proximities of everyone who came into contact with A. Based on this data and rules developed by virologists, alert messages are sent to all the device IDs that came into contact with A.
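The contact-tracing flow just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Aarogya Setu implementation: device IDs are random hex strings, each phone keeps a plain in-memory encounter list, and the virologists' alerting rules are replaced by a hypothetical placeholder (at least 15 minutes within 2 metres).

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class Phone:
    # Random ID standing in for the unique device ID (assumed scheme).
    device_id: str = field(default_factory=lambda: secrets.token_hex(8))
    # Encounter log: (other device ID, minutes in contact, distance in metres).
    encounters: list = field(default_factory=list)

def record_contact(a: Phone, b: Phone, minutes: float, distance_m: float) -> None:
    """Each phone stores the other's ID with the duration and proximity."""
    a.encounters.append((b.device_id, minutes, distance_m))
    b.encounters.append((a.device_id, minutes, distance_m))

def ids_to_alert(positive: Phone, rule) -> set:
    """Cloud side: pull the positive user's log and select the IDs to alert.

    The alert carries no information about `positive`, so recipients
    cannot tell whose result triggered it.
    """
    return {other for other, minutes, dist in positive.encounters
            if rule(minutes, dist)}

# Hypothetical stand-in for the virologists' rules.
def placeholder_rule(minutes: float, distance_m: float) -> bool:
    return minutes >= 15 and distance_m <= 2.0

a, b, c = Phone(), Phone(), Phone()
record_contact(a, b, minutes=30, distance_m=1.0)  # long, close contact
record_contact(a, c, minutes=2, distance_m=5.0)   # brief, distant contact
print(ids_to_alert(a, placeholder_rule) == {b.device_id})  # True
```

Keeping the rule as a parameter mirrors the text's split of responsibilities: engineers move the data, while the alerting criteria come from domain experts.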

Aarogya Setu is used by over 190 million people, and the privacy of all of them is preserved: alerted users are not told whose positive result triggered the message.

The nature of the project is interdisciplinary, involving epidemiologists, statisticians, computer scientists, and lawyers.

The development cycle, presented in the talk as a case study, consisted of software requirements specification, functional testing, performance testing, security testing, deployment, end-user acceptance, and usage and inference.

The system requirement specifications comprised three parts:

Edge sensor
Intermediate data aggregation
Cloud analytics

The key premise is that, despite considerable uncertainty about COVID, the disease is acquired through contact, and the risk of a contact can be measured as a function of proximity and duration.

Edge sensor

Bluetooth was used as the sensing mechanism. Both users A and B must have installed AS and kept Bluetooth switched on, which is where user acceptance comes into the picture. A and B are anonymised by assigning a unique ID to each mobile number, unknown to others; the unique ID is the common link. If A comes into proximity with B, the distance, time of contact and location are recorded on both phones, along with the other's unique ID.
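The talk does not say how the App estimates distance over Bluetooth. A common approach, assumed here purely for illustration, is the log-distance path-loss model, which converts received signal strength (RSSI) into an approximate distance: d = 10 ** ((P1 - RSSI) / (10 * n)), where P1 is the RSSI measured at 1 m and n is the path-loss exponent. The sketch pairs that estimate with a hypothetical encounter record holding the fields the text lists (the other phone's unique ID, time, distance and location). The calibration values of -59 dBm and n = 2 are assumed defaults, not Aarogya Setu figures.

```python
import time
from dataclasses import dataclass

def estimate_distance_m(rssi_dbm: float, rssi_at_1m: float = -59.0,
                        path_loss_exponent: float = 2.0) -> float:
    """Log-distance path-loss model: d = 10 ** ((P1 - RSSI) / (10 * n)).

    rssi_at_1m (P1) and path_loss_exponent (n) are assumed calibration values.
    """
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exponent))

@dataclass
class Encounter:
    other_id: str      # the other phone's unique ID, never its mobile number
    timestamp: float   # Unix time of the sighting
    distance_m: float  # estimated proximity in metres
    lat: float
    lon: float

def on_sighting(other_id: str, rssi_dbm: float, lat: float, lon: float) -> Encounter:
    """Hypothetical callback run when a nearby phone's ID is received."""
    return Encounter(other_id, time.time(), estimate_distance_m(rssi_dbm), lat, lon)

print(round(estimate_distance_m(-59.0), 2))  # 1.0
print(round(estimate_distance_m(-79.0), 2))  # 10.0
```

RSSI-based distance is noisy in practice (bodies, walls and pockets attenuate the signal), which is one reason duration is combined with proximity rather than relying on distance alone.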

There has been considerable criticism of the collection of location details. However, those monitoring the spread of the disease cannot call everyone individually, and without location details no action can be taken. How can the response be scaled up? When developing such systems, it is important that end users trust the system, because success is measured by how many people use it. User perception is therefore very important when the software requirements specification (SRS) is developed, and it is necessary to explain clearly why each piece of information is required.

The two most important factors are privacy and security. The identity of the person from whom an individual contracted COVID should not be revealed, and the App ensured this.

Intermediate data aggregation

Through the App, a phone stores the sensed data in encrypted form and interfaces with the cloud. It discards data more than 15 days old, since a COVID-positive person cannot transmit the infection after 15 days. The App also lets the user upload self-assessment data on symptoms such as cough, fever, loss of taste or smell, and difficulty in breathing. Data from the phone is used only if the person tests positive.
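The 15-day retention rule can be sketched as a filter over stored records. This minimal Python version assumes each record carries a hypothetical `seen_at` timestamp; it omits the encryption the text mentions and models "used only if the person tests positive" as a hypothetical upload gate. Function and field names are illustrative.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=15)  # retention window stated in the talk

def prune(records, now):
    """Drop records older than the 15-day window (encryption omitted here)."""
    cutoff = now - RETENTION
    return [r for r in records if r["seen_at"] >= cutoff]

def upload_payload(records, tested_positive, now):
    """Hypothetical upload gate: share the log only after a positive test."""
    return prune(records, now) if tested_positive else []

now = datetime(2020, 5, 1, 12, 0)
records = [
    {"other_id": "id-1", "seen_at": now - timedelta(days=3)},
    {"other_id": "id-2", "seen_at": now - timedelta(days=20)},
]
print([r["other_id"] for r in prune(records, now)])              # ['id-1']
print(upload_payload(records, tested_positive=False, now=now))  # []
```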

Cloud analytics

Users upload self-assessment data, and the contacts of every person who has tested positive are retrieved. An epidemiological model analyses the contact-trace information to classify each contact as high, moderate or low risk, and this classification is sent back to the phones of the traced device IDs. In addition, users are given information such as the spread of infection in their vicinity and the locations of test centres.
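To make the high/moderate/low classification concrete, here is a minimal Python sketch. The thresholds are hypothetical placeholders chosen only to show risk growing with proximity and duration; the actual epidemiological model is not described in the talk. When the same device appears more than once in a trace, the sketch keeps its most severe class.

```python
from enum import IntEnum

class Risk(IntEnum):
    LOW = 0
    MODERATE = 1
    HIGH = 2

def classify_contact(minutes: float, distance_m: float) -> Risk:
    # Hypothetical thresholds, not the model used by Aarogya Setu.
    if distance_m <= 2.0 and minutes >= 15:
        return Risk.HIGH
    if distance_m <= 4.0 and minutes >= 5:
        return Risk.MODERATE
    return Risk.LOW

def classify_trace(trace):
    """trace: iterable of (device_id, minutes, distance_m) for one positive user."""
    result = {}
    for device_id, minutes, dist in trace:
        risk = classify_contact(minutes, dist)
        # IntEnum members compare as ints, so max keeps the most severe class.
        result[device_id] = max(risk, result.get(device_id, Risk.LOW))
    return result

trace = [("x", 20, 1.5), ("y", 6, 3.0), ("z", 1, 8.0), ("y", 30, 1.0)]
print({k: v.name for k, v in classify_trace(trace).items()})
# {'x': 'HIGH', 'y': 'HIGH', 'z': 'LOW'}
```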

ITIHAS (IT-enabled Telco Information based Hotspot Analysis System) is the data-driven backend for hotspot prediction. When data is properly sensed and analysed, it yields a great many insights. Self-assessments and traces number 2,00,000 per day and are increasing. The questions to ask are ‘Can a single entity follow up?’ and ‘How do you physically reach infected persons?’ Without follow-up, the system is useless. The geographic hierarchy considered was states, districts, pin codes and sub-post offices. Every sub-post office has a latitude (lat) and longitude (long), so any lat/long query can be mapped onto a sub-post office.

The working principle of ITIHAS is as follows. Information on COVID-positive persons in a given area over the last 15 days is collected, and self-assessments are recorded along with their lat/long. Depending on the data, areas are classified as immediate security area (pink), scrutiny area (amber), watchlist area (light blue), or immediate watchlist area (dark blue). The objective is to identify, alert and classify probable areas of infection at the pin code and sub-post office level, and to identify specific sub-areas within those areas. Action is then taken on the mobile numbers within an area that have reported symptoms through AS. State-wise downloadable reports are generated, listing district by district the hidden COVID-19 spots that were not covered by declared hotspots.
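The aggregation step behind these hidden-spot reports can be sketched as follows. This is a hypothetical illustration: each report is assumed to be a tuple of sub-post office, date, kind (`positive` or `symptomatic`) and an anonymised user ID, and the threshold `min_reports` is an invented placeholder. The rules that assign the four colour categories are not described in the talk, so the sketch stops at flagging active areas outside declared hotspots and listing the symptomatic users to follow up.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW = timedelta(days=15)  # look-back window from the text

def hidden_spots(reports, declared_hotspots, today, min_reports=3):
    """Flag sub-post offices with recent activity that are not declared hotspots.

    reports: iterable of (sub_post_office, report_date, kind, user_id), where
    kind is 'positive' or 'symptomatic'. min_reports is a hypothetical threshold.
    Returns (counts per flagged office, symptomatic user IDs to follow up).
    """
    counts = defaultdict(lambda: {"positive": 0, "symptomatic": 0})
    follow_up = defaultdict(set)
    for office, day, kind, user_id in reports:
        if today - day > WINDOW:
            continue  # older than 15 days: ignored
        counts[office][kind] += 1
        if kind == "symptomatic":
            follow_up[office].add(user_id)
    flagged = {office: c for office, c in counts.items()
               if office not in declared_hotspots
               and c["positive"] + c["symptomatic"] >= min_reports}
    return flagged, {office: follow_up[office] for office in flagged}

today = date(2020, 4, 13)
reports = [
    ("SPO-A", date(2020, 4, 10), "positive", "u1"),
    ("SPO-A", date(2020, 4, 11), "symptomatic", "u2"),
    ("SPO-A", date(2020, 4, 12), "symptomatic", "u3"),
    ("SPO-A", date(2020, 3, 1), "symptomatic", "u4"),  # outside the window
    ("SPO-B", date(2020, 4, 12), "positive", "u5"),
    ("SPO-B", date(2020, 4, 12), "positive", "u6"),
    ("SPO-B", date(2020, 4, 12), "positive", "u7"),
    ("SPO-C", date(2020, 4, 12), "symptomatic", "u8"),
]
flagged, to_follow_up = hidden_spots(reports, declared_hotspots={"SPO-B"}, today=today)
print(flagged)  # {'SPO-A': {'positive': 1, 'symptomatic': 2}}
print({k: sorted(v) for k, v in to_follow_up.items()})  # {'SPO-A': ['u2', 'u3']}
```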

On 13 April 2020, 9 cases were reported in a sub-post office area of Anand district, Gujarat. ITIHAS flagged 2 hidden hotspots there, 17 days in advance. On 30 April, the Government declared the district ‘red’, and by 2 May there were 123 cases. Such case studies gave the AS team confidence that the effort was worthwhile. ITIHAS also predicted hotspots early in Thanjavur district, Tamil Nadu, where cases rose from 17 on 13 April 2020 to 66 by 2 May, and in Kancheepuram district, Tamil Nadu, where cases rose from 19 on 13 April to 97 on 2 May. In both cases, AS gave an early warning.
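The dates and counts above can be checked with simple date and ratio arithmetic. Assuming the hidden hotspots were flagged on 13 April 2020, 17 days later is 30 April, the day Anand was declared ‘red’. The sketch also computes how many times the case count grew in each district between 13 April and 2 May.

```python
from datetime import date, timedelta

FLAGGED = date(2020, 4, 13)   # assumed date the hidden hotspots were flagged
RED_ZONE = date(2020, 4, 30)  # Anand declared 'red'
LATEST = date(2020, 5, 2)

# district: (cases on 13 April 2020, cases on 2 May 2020)
cases = {
    "Anand": (9, 123),
    "Thanjavur": (17, 66),
    "Kancheepuram": (19, 97),
}

assert FLAGGED + timedelta(days=17) == RED_ZONE  # the 17-day lead time
span_days = (LATEST - FLAGGED).days
growth = {d: round(end / start, 1) for d, (start, end) in cases.items()}
print(span_days, growth)
# 19 {'Anand': 13.7, 'Thanjavur': 3.9, 'Kancheepuram': 5.1}
```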

Across India, a 100 × 100 grid is used. The lat/long of every post office is mapped onto a box in the grid. Given a query point, the box ‘b’ containing it is found, and a search is made for the box closest to ‘b’ that maps to a pin code. Trace and caller lists are provided for every sub-post office, so that collective action can be taken. This is a very simple mapping system.
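The grid lookup can be sketched in Python. The bounding box used for India (latitudes 6 to 38 degrees N, longitudes 68 to 98 degrees E) and the post office entries are assumptions for illustration. The search follows the text: find the query's box ‘b’, then expand outward ring by ring until a box containing a post office is found, and return the nearest post office among the boxes in that ring. Like the system described, this is an approximate mapping rather than an exact nearest-neighbour search.

```python
from collections import defaultdict

N = 100                       # 100 x 100 grid, as in the talk
LAT_MIN, LAT_MAX = 6.0, 38.0  # assumed bounding box for India
LON_MIN, LON_MAX = 68.0, 98.0

def cell(lat, lon):
    """Return the (row, col) grid box containing a point, clamped to the grid."""
    r = int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * N)
    c = int((lon - LON_MIN) / (LON_MAX - LON_MIN) * N)
    return min(max(r, 0), N - 1), min(max(c, 0), N - 1)

def build_index(post_offices):
    """post_offices: iterable of (pin_code, lat, lon)."""
    grid = defaultdict(list)
    for pin, lat, lon in post_offices:
        grid[cell(lat, lon)].append((pin, lat, lon))
    return grid

def lookup(grid, lat, lon):
    """Find box b for the query, then search outward ring by ring."""
    br, bc = cell(lat, lon)
    for k in range(N):
        # Boxes at Chebyshev distance exactly k from b that hold a post office.
        ring = [(r, c)
                for r in range(br - k, br + k + 1)
                for c in range(bc - k, bc + k + 1)
                if max(abs(r - br), abs(c - bc)) == k and (r, c) in grid]
        if ring:
            candidates = [po for rc in ring for po in grid[rc]]
            return min(candidates,
                       key=lambda po: (po[1] - lat) ** 2 + (po[2] - lon) ** 2)[0]
    return None

grid = build_index([("PO-1", 13.0, 80.2), ("PO-2", 28.6, 77.2)])  # hypothetical
print(lookup(grid, 12.95, 80.25))  # PO-1
print(lookup(grid, 28.0, 77.0))    # PO-2
```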

To derive insights from the data, statisticians performed syndromic mapping, which predicts the presence and spread of disease in a given area; these predictions can in turn be mapped onto sub-post office areas. Areas are classified as very high, high, moderate, or low risk, and maximum-exposure areas are also identified. This enabled the ‘Catch early and Contain early’ protocol devised by the Ahmedabad Municipal Corporation (AMC).

Ahmedabad adopted Aarogya Setu on 27 May 2020. From 17 June to 17 July 2020, cases came down from 330 to 187, and deaths from 22 to 5. Two factors lay behind Ahmedabad's success: (i) the dedicated team at AMC worked round the clock, and (ii) people submitted self-assessments and carried their phones with Bluetooth switched on. There was end-user acceptance, trust and belief.

The human aspect is very important. Users must trust that the App is serving them and the community. The Government also opened up the code base so that anyone could verify it, and addressed concerns about privacy and data security by making public how the App works, how privacy is protected, and how data is retained.

The development method involved application development, performance testing, security testing and data analytics. Pre-production performance assurance used stress tests, service level agreement (SLA)-based tests, reliability tests, capacity planning tests, performance tuning and regression tests, and key transaction tests.
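An SLA-based test of the kind listed above typically checks a latency percentile against an agreed budget. The sketch below is a hypothetical harness, not the team's actual test suite: `handler` stands in for any service call, and the 95th-percentile budget is a parameter the caller chooses.

```python
import math
import time

def percentile(samples, p):
    """Nearest-rank percentile, for 0 < p <= 100."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

def sla_test(handler, requests, p95_budget_ms):
    """Time each call; pass if the 95th-percentile latency is within budget."""
    latencies_ms = []
    for req in requests:
        start = time.perf_counter()
        handler(req)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    p95 = percentile(latencies_ms, 95)
    return p95 <= p95_budget_ms, p95

# Hypothetical handler and budget, for illustration only.
passed, p95 = sla_test(lambda req: sum(range(1000)), range(200), p95_budget_ms=50.0)
print(passed, round(p95, 3))
```

A stress or capacity planning test would reuse the same measurement loop while increasing the request rate or concurrency until the budget is breached.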

Aarogya Setu was developed in 21 days by a 28-person team in 600–700 person-hours. This success story, said Professor Kamakoti Veezhinathan, proves that we have really good engineers who can build large-scale, correctly working software in a very short period of time.