The so-called Big Data revolution has opened new paths for scientific research in many fields, including health-related data science. As measurement techniques, data storage capabilities, and the ability to cross-link diverse datasets improve, increasingly large volumes of information become available for public health research and decision making. Many published scientific works describe, and make predictions about, the role of Big Data in healthcare, epidemiology, and health surveillance.
Traditional Public Health Research
Public health research has historically focused on a small set of indicators suspected to be associated with a specific health outcome. Researchers formulate hypotheses about the correlation between certain environmental or clinical variables and a particular outcome that can be measured in a defined cohort. Traditional epidemiological studies start from a specific hypothesis, and the entire process, including data collection and analysis, is structured around it.
Data-based Health Research
However, large population-level health data are increasingly available. Examples include electronic medical records and sensor data from mobile and wearable devices. This has led public health researchers to explore novel approaches to data aggregation and analysis. Many factors, such as environmental exposures, demographics, socioeconomic conditions, and consumer purchasing behaviour, are objects of study in their own right, but they also generate large amounts of health-related data.
There is growing acknowledgement that analysing such datasets, which may contain hundreds of variables, can lead to important and unexpected public health discoveries. For instance, large environmental datasets could produce evidence on the specific issue under investigation. They could also reveal correlations and dependencies that had not even been suspected. A hypothesis-driven approach can of course still be applied to Big Data records. However, such datasets also lend themselves to a new, open-ended approach: integrating a large number of indicators covering different aspects of human health and searching for relevant patterns without pursuing a specific answer. A new paradigm is being developed to carry out this kind of data-driven study.
Challenges in Data-driven Health Research
From a public health perspective, one of the biggest challenges when working with extensive datasets is determining which indicators are the most important determinants of a particular health outcome. In traditional public health studies, this is usually done with standard techniques, such as fitting linear regression models and examining the percentage of the variance explained by the model. When the number of indicators grows significantly, this approach does not scale well, and other techniques have to be used or developed. In many studies, as in the traditional approach, clinical validation is still needed to confirm which indicators are actually relevant.
Digital Epidemiology – An Example of Health Research with ICARUS
As an example close to the ICARUS landscape, human mobility data are an extremely diverse kind of information. They originate from different sources and have many potential applications, including epidemiological modelling. Closely related is the emerging field of digital epidemiology, which rests on the idea that the health of a population can be assessed through the digital traces left by individuals. Researchers have already begun developing methods and strategies that use digital epidemiology to support infectious disease monitoring and surveillance. Other work uses it to feed realistic models that show how diseases spread and how their spread can best be counteracted.
Much work remains before computational and digital epidemiology are fully integrated with existing practices. Several issues, privacy concerns among them, must be addressed along the way. Nevertheless, the increasing availability of valuable new datasets will almost certainly drive the process in that direction.
The ISI demonstrator in ICARUS focuses on using aviation data to improve the modelling of infectious disease spread.
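One common way mobility data enter epidemic models is a metapopulation model, in which each city runs its own epidemic and passenger flows couple the cities. The following minimal sketch is not the ISI demonstrator's model. It assumes a deterministic SIR model with homogeneous mixing in each city, identical rates everywhere, and travellers drawn in proportion to each compartment. The function `sir_step`, the city names, the rates, and the passenger numbers are all hypothetical.

```python
def sir_step(pops, flows, beta, gamma):
    """Advance a deterministic metapopulation SIR model by one day.

    pops:  dict city -> [S, I, R] compartment sizes
    flows: dict (origin, destination) -> passengers per day; in practice these
           would be estimated from aviation data (hypothetical numbers below)
    beta, gamma: daily transmission and recovery rates (assumed equal everywhere)
    """
    # 1. Local epidemic dynamics inside each city (one Euler step, dt = 1 day).
    local = {}
    for city, (s, i, r) in pops.items():
        n = s + i + r
        infections = beta * s * i / n
        recoveries = gamma * i
        local[city] = [s - infections, i + infections - recoveries, r + recoveries]

    # 2. Air travel: each flow carries S, I and R in proportion to the origin's mix.
    delta = {city: [0.0, 0.0, 0.0] for city in pops}
    for (origin, dest), passengers in flows.items():
        n = sum(local[origin])
        for k in range(3):
            moved = passengers * local[origin][k] / n
            delta[origin][k] -= moved
            delta[dest][k] += moved

    return {city: [local[city][k] + delta[city][k] for k in range(3)] for city in pops}


# Hypothetical example: an outbreak seeded in city A reaches city B only by air.
state = {"A": [999_990.0, 10.0, 0.0], "B": [500_000.0, 0.0, 0.0]}
flows = {("A", "B"): 1_000, ("B", "A"): 1_000}
for _ in range(60):
    state = sir_step(state, flows, beta=0.3, gamma=0.1)
print({city: round(i) for city, (s, i, r) in state.items()})
```

In this sketch, symmetric flows keep each city's population constant. Replacing the toy `flows` with passenger volumes estimated from real air traffic would change when and how strongly each city is seeded. This is where aviation data would enter such a model.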