Digital biomarkers for infectious diseases: the low-cost tool that enables mass screening

Now more than ever, our healthcare system is under tremendous pressure. We urgently need short-term measures (social distancing, face masks, ventilators), but we should also strive to collect data that will let us act with more foresight and take corrective action sooner in the future.

Digital biomarkers are characteristics that can be objectively measured and evaluated as indicators of normal biological processes or pathological processes. Because they are collected through digital devices, they can be rolled out quickly and scaled at very low cost once validated (almost every person or household has one or more smartphones). They have already been developed for neurological conditions and are probably best known in diabetes, e.g. measuring patients’ blood glucose levels. Now is the time to develop digital biomarkers for infectious diseases, so that we can react faster, in a more targeted way, and with less collateral damage.


Digital biomarkers in the COVID-19 context

COVID-19 has an incubation period of typically around 5 days (range: 1 to 14 days), during which the virus can already be passed on to others (source: WHO). If we can shed light on this incubation period, we could place people carrying the virus in quarantine sooner. And by (potentially) mapping the interactions infected people have had, we could even apply targeted quarantine to smaller groups, and perhaps even shorten the period during which people without symptoms have to stay in quarantine. Continuous monitoring could thus enable strategies that are sustainable over a longer period of time and that cause less (economic) collateral damage.

The main symptoms of COVID-19 are fever, fatigue, dry cough and shortness of breath (source: WHO). Less frequent symptoms include muscle ache, runny nose, nasal congestion, sneezing, sore throat, headache and diarrhea. Most of the main symptoms can be measured objectively and accurately at home, either with a dedicated device (e.g. a thermometer) or with a smartphone or smartwatch. While I believe that a final solution will have to take all these symptoms into account, in this article I will focus on the main differentiator from the common flu: shortness of breath.

Respiration volume is the volume of air you breathe in or out, and it is measured with a spirometer. The microphone on your smartphone is in essence a pressure sensor (sound is a pressure wave). Bernoulli’s principle links the speed of a moving stream of air to its pressure, so in theory we could estimate the airflow from the pressure it exerts on the microphone of a common smartphone. A regression model can then be applied to derive flow-volume curves. This has already been developed, tested and published.
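To make the Bernoulli argument concrete, here is a minimal Python sketch. It is not the published method (which relies on a trained regression model); it assumes the microphone reading has already been calibrated to the dynamic pressure of the exhaled air, p = ½ρv², and that the air passes through a known cross-sectional area A. The function name, the opening size and the air density are illustrative assumptions. Flow is then Q = A·√(2p/ρ), and integrating flow over time gives volume, which yields the points of a flow-volume curve.

```python
import math

AIR_DENSITY_KG_M3 = 1.2  # assumption: approximate density of air at room temperature


def flow_volume_curve(pressures_pa, dt_s, area_m2):
    """Hypothetical helper: turn dynamic-pressure samples into a flow-volume curve.

    Assumes each sample is the dynamic pressure p = 0.5 * rho * v**2 of the
    exhaled air (Bernoulli) and that the air passes through a known area.
    Returns (volumes in litres, flows in litres per second).
    """
    flows_l_s = []
    for p in pressures_pa:
        # Negative readings (noise) are clamped to zero flow.
        velocity_m_s = math.sqrt(2.0 * max(p, 0.0) / AIR_DENSITY_KG_M3)
        flows_l_s.append(velocity_m_s * area_m2 * 1000.0)  # m^3/s -> L/s

    # Integrate flow over time (trapezoidal rule) to get the exhaled volume.
    volumes_l = [0.0]
    for previous, current in zip(flows_l_s, flows_l_s[1:]):
        volumes_l.append(volumes_l[-1] + 0.5 * (previous + current) * dt_s)
    return volumes_l, flows_l_s


# Usage with made-up samples (Pa) taken every 0.1 s through a 2 cm^2 opening.
volumes, flows = flow_volume_curve([0.0, 3.0, 2.0, 1.0, 0.2], dt_s=0.1, area_m2=2e-4)
for v, q in zip(volumes, flows):
    print(f"volume {v:.3f} L, flow {q:.3f} L/s")
```

In practice the mapping from the raw microphone signal to pressure differs per phone, which is one reason a learned regression model is preferred over a purely physical formula.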

Respiration rate (how fast you breathe, in breaths per minute) can also be measured by several sensors on your smartphone: the accelerometer, the camera, the magnetometer and the microphone. By analyzing the sound wave, you can identify when someone is breathing in and when they are breathing out. Doing this over a period of time gives you the respiration rate. This has been developed, tested and published.
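A minimal sketch of the counting step, assuming the raw microphone (or accelerometer) data has already been turned into a smooth breathing envelope that rises on inhalation and falls on exhalation. The function name and the 10% hysteresis band are illustrative assumptions, not the published method. One breath is counted each time the envelope climbs above an upper threshold after having dropped below a lower one.

```python
import math


def respiration_rate_bpm(envelope, fs_hz, hysteresis=0.1):
    """Hypothetical helper: estimate breaths per minute from a breathing envelope.

    A breath is counted each time the signal rises above an upper threshold
    after having fallen below a lower threshold (hysteresis), so small
    noise wiggles around the mean are not counted twice.
    """
    if len(envelope) < 2:
        raise ValueError("need at least two samples")
    mean = sum(envelope) / len(envelope)
    spread = max(envelope) - min(envelope)
    upper = mean + hysteresis * spread
    lower = mean - hysteresis * spread

    breaths = 0
    armed = envelope[0] < lower  # only count a rise that follows an exhalation
    for value in envelope[1:]:
        if armed and value > upper:
            breaths += 1
            armed = False
        elif value < lower:
            armed = True

    duration_min = len(envelope) / fs_hz / 60.0
    return breaths / duration_min


# Synthetic check: one minute sampled at 10 Hz of a signal that breathes every 4 s.
fs = 10
signal = [-math.cos(2 * math.pi * 0.25 * i / fs) for i in range(60 * fs)]
print(respiration_rate_bpm(signal, fs))  # prints 15.0
```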

We care about respiration because it could tell us something about the oxygen saturation of the blood (SpO2). Luckily, we can also measure SpO2 with a common smartphone. Hemoglobin is a protein in red blood cells that carries oxygen, and SpO2 indicates the percentage of hemoglobin molecules in the blood that are saturated with oxygen. Photoplethysmography shines light of various wavelengths through the tissue and its blood vessels and measures how much of it is reflected and absorbed. This technique allows you to measure your heart rate and heart rhythm by holding your fingertip on the camera (e.g. FibriCheck, and most smartwatches with heart-rate capability), but it can also be used to measure SpO2. This has been developed, tested and published, and is implemented in both Apple’s HealthKit and Samsung Health, as well as in several independent apps. NB: this measurement is more hardware dependent than the previous ones, so different setups require different models.
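The classic way to turn photoplethysmography into an SpO2 estimate is the "ratio of ratios": compare the pulsatile (AC) part of the signal with its steady (DC) part at two wavelengths, then map that ratio R to SpO2 with a calibration curve. The sketch below assumes two already-extracted light-intensity traces (e.g. two camera color channels) and uses the commonly quoted linear approximation SpO2 ≈ 110 - 25·R, which was derived for red/infrared pulse oximeters. The function names are hypothetical and the constants are placeholders: a smartphone camera would need its own device-specific calibration, which is exactly the hardware dependence noted above.

```python
def ac_dc(samples):
    """Split a light-intensity trace into its pulsatile (AC) and steady (DC) parts."""
    dc = sum(samples) / len(samples)
    ac = max(samples) - min(samples)  # peak-to-peak amplitude of the pulse
    return ac, dc


def spo2_ratio_of_ratios(trace_a, trace_b, a=110.0, b=25.0):
    """Hypothetical helper: estimate SpO2 (%) from two wavelengths.

    trace_a: wavelength absorbed more by deoxygenated hemoglobin (e.g. red).
    trace_b: reference wavelength (infrared in a classic pulse oximeter).
    a, b: placeholder calibration constants; they must be fitted per device.
    Returns (SpO2 clamped to 0..100, ratio of ratios R).
    """
    ac_a, dc_a = ac_dc(trace_a)
    ac_b, dc_b = ac_dc(trace_b)
    r = (ac_a / dc_a) / (ac_b / dc_b)
    spo2 = a - b * r
    return min(100.0, max(0.0, spo2)), r


# Usage with made-up intensity samples.
spo2, r = spo2_ratio_of_ratios([99, 101, 99, 101], [197, 203, 197, 203])
print(f"R = {r:.3f}, SpO2 ~ {spo2:.1f}%")
```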

When the body is coping with an infection, such as a virus, it fights back. This fight costs energy and resources, which is why an increase in heart rate is often seen. Recent studies and developments look in even more detail at heart rate variability (HRV), which could potentially be an (early) indicator of infection, perhaps even before symptoms appear.
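Heart rate variability can be summarized in several ways; one widely used time-domain metric is RMSSD, the root mean square of successive differences between heartbeat (RR) intervals. A minimal sketch, assuming the RR intervals in milliseconds have already been extracted (e.g. from the peaks of a photoplethysmography signal); the function names and the sample intervals are illustrative:

```python
import math


def mean_heart_rate_bpm(rr_ms):
    """Average heart rate from RR intervals given in milliseconds."""
    return 60_000.0 / (sum(rr_ms) / len(rr_ms))


def rmssd_ms(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


rr = [800, 810, 790, 800]  # made-up RR intervals
print(mean_heart_rate_bpm(rr), round(rmssd_ms(rr), 1))  # prints 75.0 14.1
```

On its own, a single value says little; it is a change relative to the same person’s healthy values that would be worth flagging.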


Mass testing based on digital measurements as a first screening / monitoring

Many of these measurements differ from person to person. In a real-world scenario it is thus likely better that everyone acts as their own control. In practice, that means you start recording these measurements while healthy and use them as a baseline to track potential changes. The aforementioned measurements are obviously imperfect, but combined as features in a machine learning or deep learning model they could provide reliable feedback and relieve some of the pressure on the healthcare system and its workers. When the model detects changes or is in doubt, you could then be referred to the traditional pathways of the healthcare system (e.g. testing for the virus, analysis of your lungs, …). This is a hybrid approach: mass testing based on digital measurements serves as a first screening/monitoring step, and people are referred to the more accurate traditional measurements based on need or assessed risk. This allows us to reserve traditional resources for the cases that really need them and to make the best use of the equipment and staff we have, while at the same time collecting important data about the spread and prevalence of the virus.
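The "own control" idea can be sketched with a simple per-person baseline: store the mean and standard deviation of each measurement while healthy, express new readings as z-scores against that baseline, and refer the person for traditional testing when any feature deviates strongly. This is a deliberately simple rule-based stand-in for the machine learning model described above, not a replacement for it; the feature names and the threshold of 3 standard deviations are hypothetical.

```python
from statistics import mean, stdev


def build_baseline(healthy_history):
    """Map each feature to (mean, standard deviation) of its healthy readings.

    stdev() needs at least two healthy readings per feature.
    """
    return {name: (mean(values), stdev(values)) for name, values in healthy_history.items()}


def triage(baseline, today, threshold=3.0):
    """Hypothetical first-screening rule: flag features far from the personal baseline."""
    flagged = []
    for name, (mu, sigma) in baseline.items():
        if name not in today or sigma == 0:
            continue  # missing reading, or no variation to compare against
        z = (today[name] - mu) / sigma
        if abs(z) >= threshold:
            flagged.append((name, round(z, 1)))
    decision = "refer for testing" if flagged else "keep monitoring"
    return decision, flagged


baseline = build_baseline({
    "resting_heart_rate_bpm": [60, 62, 64],
    "spo2_percent": [97, 98, 99],
})
print(triage(baseline, {"resting_heart_rate_bpm": 70, "spo2_percent": 98}))
```

A learned model would replace the fixed threshold with weights estimated from labelled data (healthy versus confirmed infections), which is precisely the kind of dataset that is still missing.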


Network mapping to roll out isolation strategies much faster

Network mapping, for example based on location data, could enable novel strategies that can either be applied much faster (in case we face a similar problem again) or be sustained much longer (in case this lasts much longer than anticipated). The current measures are necessary and we should all abide by them. But they are tough calls to make, are therefore only applied (relatively) late, and have a tremendous impact on social, economic and healthcare systems. I am aware that sharing location data is provocative and raises red flags. Yet most of us already do it daily, e.g. when using a routing or mapping app, and most people are unaware of how many of their apps have access to their location data. Novel isolation strategies tailored to those at higher risk could then be rolled out much faster than the current “emergency brake”.
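How location data could feed targeted isolation can be sketched as a contact graph: two people are linked when they were at the same place at overlapping times within a look-back window, and a breadth-first search from a confirmed case lists their direct (1 hop) and indirect (2 or more hops) contacts. The visit record format, the 14-day window (the upper end of the incubation range given earlier) and the function names are illustrative assumptions; a real system would need a finer notion of proximity than "same place".

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def build_contact_graph(visits, since):
    """visits: iterable of (person, place, start, end) tuples (hypothetical schema)."""
    stays_by_place = defaultdict(list)
    for person, place, start, end in visits:
        if end >= since:  # ignore visits that ended before the look-back window
            stays_by_place[place].append((person, start, end))

    graph = defaultdict(set)
    for stays in stays_by_place.values():
        for i, (p1, s1, e1) in enumerate(stays):
            for p2, s2, e2 in stays[i + 1:]:
                if p1 != p2 and s1 < e2 and s2 < e1:  # time intervals overlap
                    graph[p1].add(p2)
                    graph[p2].add(p1)
    return graph


def quarantine_candidates(graph, index_case, max_hops=1):
    """Breadth-first search: return {person: hops} within max_hops of the index case."""
    hops = {index_case: 0}
    queue = deque([index_case])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the requested depth
        for contact in graph.get(person, ()):
            if contact not in hops:
                hops[contact] = hops[person] + 1
                queue.append(contact)
    del hops[index_case]
    return hops


now = datetime(2020, 3, 21)
day = datetime(2020, 3, 20)
visits = [
    ("alice", "cafe", day.replace(hour=10), day.replace(hour=11)),
    ("bob", "cafe", day.replace(hour=10, minute=30), day.replace(hour=11, minute=30)),
    ("bob", "gym", day.replace(hour=12, minute=30), day.replace(hour=13, minute=30)),
    ("carol", "gym", day.replace(hour=12), day.replace(hour=13)),
    ("dave", "cafe", day.replace(hour=12), day.replace(hour=13)),
]
graph = build_contact_graph(visits, since=now - timedelta(days=14))
print(quarantine_candidates(graph, "alice", max_hops=2))
```

Limiting `max_hops` is what makes the quarantine "directed": only the confirmed case’s neighbourhood is isolated instead of everyone.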

An alternative is to better predict who is at greatest risk and find ways to protect those people (worst-case scenario: quarantine) while exposing the rest of the population (the herd) to the virus, thus creating herd immunity. In the current scenario, 80% of infected people recover completely without medical intervention (source: WHO). Assuming that there is no reinfection, creating herd immunity could keep the healthcare system from being overloaded and control the risk of exposure now and in the future. But until we can clearly identify all groups at elevated risk of requiring extensive medical assistance (hospitalization, ICU, …), this alternative might prove dangerous.
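To see why identifying the at-risk groups matters, the standard first approximation for the herd immunity threshold (HIT), assuming a homogeneously mixing population, is 1 - 1/R0, where R0 is the basic reproduction number. Taking R0 = 2.5 purely as an illustrative assumption, and combining it with the 80% figure above for a hypothetical population of one million:

```latex
\begin{aligned}
\text{HIT} &= 1 - \frac{1}{R_0} = 1 - \frac{1}{2.5} = 0.6,\\
\text{infections} &\approx 0.6 \times 1\,000\,000 = 600\,000,\\
\text{not recovering without medical intervention} &\approx (1 - 0.8) \times 600\,000 = 120\,000.
\end{aligned}
```

Even under these simplifying assumptions, the absolute number of people needing care is large, which is why the at-risk groups would have to be identified and protected first.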


A plea for open datasets

In times of need like today, open datasets would greatly help the data science community pitch in. Many are eager to help but lack the data to gain traction.

I am currently unaware of existing digital biomarkers that enable mass-scale testing or monitoring of infectious diseases. If you know of any, please contact me.

If you know of an institution or company that is already collecting this data at large scale (before, during and after infection) and/or sharing it, again, please contact me. If you would like to help develop such biomarkers, please let me know: this is an area I am passionate about and I would love to contribute.

A summary of other initiatives being taken currently in Belgium can be found here:



Thanks for reading
