I am not an epidemiologist, nor a virologist. The files linked here represent my attempt to follow the news on the COVID-19 epidemic, "digest" the information contained in the available data, and become more aware of its possible limitations. As a data scientist, I find it easier than the general public does to access publicly available data and go through this process. I share these attempts here in the hope that others might find them useful. I would be very grateful for any feedback: I am very interested in the effectiveness of different strategies for communicating the results of data science investigations to the general public.
I am originally from Italy; I was born and raised in Brescia, one of the areas hit hardest by the COVID-19 epidemic in Italy. To follow more closely what is happening to my home town and to the members of my family who still live there, I have been looking at the Italian data. There are two sources with daily updates:
- The "Protezione civile" (a branch of the government devoted to the civil, non-military protection of citizens) releases daily data on a GitHub site, reporting total cases, hospitalizations, deaths, and similar figures.
- The epidemiology department of the "Istituto superiore della sanita'" (something like the NIH) releases daily and bi-weekly summaries of the characteristics (age, sex, comorbidities) of cases and deaths. English versions are also available.
In addition, the "Istituto nazionale di Statistica" (similar to the US Census Bureau) publishes population figures for the various regions, as well as mortality data compiled to facilitate the evaluation of the actual impact of the COVID-19 epidemic. These are the data sources I am working with.
Here are some of my initial analyses; more comprehensive ones are coming up.
- To follow the daily changes in the epidemic, I have created this visualization of the case and death data released at https://github.com/pcm-dpc/COVID-19, with a focus on the Lombardy region. The goal was to spot changes in the spread of the epidemic as early as possible.
- To look at the overall trends and understand how and when the epidemic concludes (e.g., when do we reach the "peak"?), I use this visualization of active cases and recoveries.
- Display of partial comparative mortality data from Italy. This refers to the first data release, which included information up to March 21.
- Display of partial all-cause mortality data from Italy, comparing data for January 1 to April 15, 2020 with the median values for the five prior years. An analysis of the breakdown by sex and age is forthcoming.
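The visualizations listed above start from cumulative counts, and the all-cause mortality comparison sets 2020 against the median of the five prior years. What follows is a minimal Python sketch of these derived quantities, assuming the series have already been loaded as lists in date order. The function names are hypothetical, and the numbers are made-up toy values, not real data. In the "Protezione civile" files the cumulative series are assumed to correspond to columns such as "totale_casi", "dimessi_guariti" and "deceduti"; check the repository's documentation before relying on this. Daily new cases are first differences of the cumulative total, active cases are total cases minus recovered minus deaths, and excess deaths are observed deaths minus the median for the same period in the prior years.

```python
"""Derived quantities from cumulative counts and mortality tables (sketch, toy data)."""
from statistics import median


def daily_increments(cumulative):
    """First differences of a cumulative series; the first day is kept as-is."""
    if not cumulative:
        return []
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def active_cases(total, recovered, deaths):
    """Active cases = cumulative cases - cumulative recovered - cumulative deaths."""
    return [t - r - d for t, r, d in zip(total, recovered, deaths)]


def excess_deaths(observed, prior_years):
    """Observed deaths minus the median of the same period in prior years.

    observed: deaths per period (e.g. per week) in the year of interest.
    prior_years: one list per earlier year, aligned period by period.
    """
    baseline = [median(values) for values in zip(*prior_years)]
    return [obs - base for obs, base in zip(observed, baseline)]


if __name__ == "__main__":
    # Toy cumulative series (NOT real data), one value per day.
    total = [100, 150, 230, 320, 400, 450]
    recovered = [0, 5, 15, 40, 80, 130]
    deaths = [2, 6, 12, 20, 30, 40]

    new_cases = daily_increments(total)
    active = active_cases(total, recovered, deaths)
    # Day on which active cases are highest: one simple reading of the "peak".
    peak_day = max(range(len(active)), key=active.__getitem__)
    print("new cases:", new_cases)
    print("active:", active, "peak at day index", peak_day)

    # Toy death counts per period (NOT real data): 2020 vs five prior years.
    deaths_2020 = [210, 260, 380]
    prior = [[200, 205, 198], [190, 210, 202], [215, 200, 207],
             [205, 198, 195], [199, 204, 210]]
    print("excess deaths:", excess_deaths(deaths_2020, prior))
```

The same differencing applies to the cumulative county-level counts in the NYT repository. If a cumulative series is revised downward, some differences can be negative, so it is worth checking for this before plotting.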
Unfortunately, for the US epidemic there is no central, curated data repository updated as frequently as the one for Italy.
A number of different groups have made serious aggregation efforts. I have looked at only a couple of sources, with a focus on California.
- Visualization of the case and death counts provided by the NYT, with a focus on California and Santa Clara County.
- Visualization of the data provided by the California Department of Public Health on hospitalization counts and on the demographic characteristics of cases and deaths in California.
Individual level data and survival curves
The following link leads to work that is not mine, but I include it here because it presents information on COVID-19 that is often hard to find, yet fundamental for planning interventions and for understanding the disease:
Zhimei Ren, Xiaoman Luo, Lihua Lei and Ruohan Zhan have put together a display of the results of statistical analyses, with the ultimate goal of building a risk assessment tool for COVID-19.
Epidemiological models 101
To understand the public discourse on "flattening the curve", "herd immunity", and similar ideas, I put together a gentle introduction to deterministic models for epidemics.
Epidemiological model for cases
This is quite rudimentary, but so that you can play with parameter values and explore options, I make the R Markdown file available here, together with a file for colors. The data come from https://github.com/nytimes/covid-19-data. To run the program, you should use RStudio.
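The gentle introduction to deterministic models for epidemics mentioned above is the natural place to see where "flattening the curve" and "herd immunity" come from. As a rough illustration, and not the R Markdown code distributed here, the following Python sketch integrates the classic SIR (susceptible, infected, recovered) model with a fixed-step Euler scheme. The function names, the population size, the contact rate beta, and the recovery rate gamma are all hypothetical values chosen only for illustration. In this basic model R0 = beta / gamma, and the herd immunity threshold is 1 - 1/R0.

```python
"""Minimal deterministic SIR model (illustrative sketch, hypothetical parameters)."""


def simulate_sir(beta, gamma, n, i0, days, dt=0.1):
    """Forward-Euler integration of
        dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I.
    Returns a list of (day, S, I, R), sampled once per day (day 0 included)."""
    s, i, r = n - i0, float(i0), 0.0
    steps_per_day = round(1 / dt)
    out = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt  # new infections in this step
            new_rec = gamma * i * dt         # new recoveries in this step
            s -= new_inf
            i += new_inf - new_rec
            r += new_rec
        out.append((day, s, i, r))
    return out


def peak(trajectory):
    """Return (day, infected) at the maximum of the infected curve."""
    day, _, infected, _ = max(trajectory, key=lambda row: row[2])
    return day, infected


def herd_immunity_threshold(r0):
    """Fraction immune needed so that each case infects fewer than one other."""
    return 1 - 1 / r0


if __name__ == "__main__":
    N = 1_000_000   # hypothetical population size
    gamma = 1 / 10  # hypothetical: mean infectious period of 10 days
    for beta in (0.3, 0.2):  # hypothetical contact rates
        r0 = beta / gamma
        day, infected = peak(simulate_sir(beta, gamma, N, i0=10, days=365))
        print(f"R0={r0:.1f}: peak on day {day} with {infected:,.0f} infected; "
              f"herd immunity threshold {herd_immunity_threshold(r0):.0%}")
```

With these made-up numbers, lowering beta from 0.3 to 0.2 both delays and lowers the peak of the infected curve, which is the intuition behind "flattening the curve". For real analyses, an adaptive ODE solver (for example scipy.integrate.solve_ivp) would be preferable to fixed-step Euler.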