To provide convenient access to epidemiological data on the coronavirus outbreak, we developed an R package, nCov2019 (https://github.com/GuangchuangYu/nCov2019). Besides detailed real-time statistics, it also includes historical data for China, down to the city level. We also developed a website (http://www.bcloud.org/e/) with interactive plots and simple time-series forecasts. These analytics tools could be useful for informing the public and for studying how this and similar viruses spread in populous countries.
Installation
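To start off, users can install the package directly from GitHub with the 'remotes' package by running the following in R:
# install.packages("remotes")  # if 'remotes' is not installed yet
remotes::install_github("GuangchuangYu/nCov2019")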
Query the latest data
To query the latest data, load it with get_nCov2019(). By default, the language is set to Chinese or English automatically, based on the user's system environment. Users can also set it explicitly with lang = 'zh' or lang = 'en':
library(nCov2019)
x <- get_nCov2019(lang = 'en')
Since a large share of confirmed cases were reported in China, researchers may be most interested in the details for China. Printing the object x gives the total number of confirmed cases in China:
China (total confirmed cases): 81385
last update: 2020-03-20 17:24:49
Then use summary(x) to get the daily cumulative statistics for China (the first six rows are shown):
  confirm suspect dead heal nowConfirm nowSevere importedCase deadRate healRate  date
1      41       0    1    0          0         0            0      2.4      0.0 01.13
2      41       0    1    0          0         0            0      2.4      0.0 01.14
3      41       0    2    5          0         0            0      4.9     12.2 01.15
4      45       0    2    8          0         0            0      4.4     17.8 01.16
5      62       0    2   12          0         0            0      3.2     19.4 01.17
6     198       0    3   17          0         0            0      1.5      8.6 01.18
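The deadRate and healRate columns in the table above look like percentages of confirmed cases. The short sketch below recomputes them from the printed counts to make that relationship explicit. The formula (100 * count / confirm, rounded to one decimal place) is an inference from these six rows, not a documented definition, and the input values are simply copied from the table.

```r
# First six rows of summary(x), copied from the table above
d <- data.frame(
  confirm = c(41, 41, 41, 45, 62, 198),
  dead    = c(1, 1, 2, 2, 2, 3),
  heal    = c(0, 0, 5, 8, 12, 17),
  date    = c("01.13", "01.14", "01.15", "01.16", "01.17", "01.18")
)

# Assumed definition: rate = 100 * count / confirmed, rounded to 1 decimal
d$deadRate <- round(100 * d$dead / d$confirm, 1)
d$healRate <- round(100 * d$heal / d$confirm, 1)
print(d)
```

The recomputed columns match the printed deadRate and healRate values, which supports (but does not prove) this reading of the two columns.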
When no region is specified, x[] returns the provincial-level outbreak statistics for China.
name confirm suspect dead deadRate showRate heal healRate showHeal
1 Hubei 67800 0 3132 4.62 FALSE 58381 86.11 TRUE
2 Guangdong 1395 0 8 0.57 FALSE 1322 94.77 TRUE
3 Henan 1273 0 22 1.73 FALSE 1250 98.19 TRUE
4 Zhejiang 1234 0 1 0.08 FALSE 1219 98.78 TRUE
5 Hunan 1018 0 4 0.39 FALSE 1014 99.61 TRUE
6 Anhui 990 0 6 0.61 FALSE 984 99.39 TRUE
To obtain data at a finer scale, you only need to specify the province name. For example, x['Hubei'] returns the city-level data for Hubei Province:
name confirm suspect dead deadRate showRate heal healRate showHeal
1 Wuhan 50005 0 2498 5.00 FALSE 41389 82.77 TRUE
2 Xiaogan 3518 0 128 3.64 FALSE 3349 95.20 TRUE
3 Huanggang 2907 0 125 4.30 FALSE 2782 95.70 TRUE
4 Jingzhou 1580 0 50 3.16 FALSE 1517 96.01 TRUE
5 Ezhou 1394 0 58 4.16 FALSE 1303 93.47 TRUE
6 Suizhou 1307 0 45 3.44 FALSE 1236 94.57 TRUE
In addition, adding the argument by = 'today' (e.g. x['Hubei', by = 'today']) returns the number of newly added cases:
name confirm confirmCuts isUpdated
1 Wuhan 0 0 TRUE
2 Xiaogan 0 0 TRUE
3 Huanggang 0 0 TRUE
4 Jingzhou 0 0 TRUE
5 Ezhou 0 0 FALSE
6 Suizhou 0 0 TRUE
Getting global data is also easy: x['global'] returns a data frame with a global overview of each country.
           name confirm suspect dead deadRate showRate  heal healRate showHeal
1         China   81385     104 3253     4.00    FALSE 71290    87.60     TRUE
2         Italy   41035       0 3405      8.3    FALSE  4440    10.82    FALSE
3         Spain   19980       0 1002     5.02    FALSE  1081     5.41    FALSE
4          Iran   19644       0 1433     7.29    FALSE  6745    34.34    FALSE
5       Germany   15320       0   44     0.29    FALSE   115     0.75    FALSE
6 United States   14365       0  212     1.48    FALSE   125     0.87    FALSE
If you want to visualize the cumulative summary data, an example plot could be the following:
library(ggplot2)
d <- summary(x)  # daily cumulative statistics for China
# 'date' is stored as "mm.dd"; as.Date() fills in the current year
ggplot(d,
aes(as.Date(date, "%m.%d"), as.numeric(confirm))) +
geom_col(fill = 'firebrick') +
theme_minimal(base_size = 14) +
xlab(NULL) + ylab(NULL) +
scale_x_date(date_labels = "%Y/%m/%d") +
labs(caption = paste("accessed date:", time(x)))
The bar plot of the latest confirmed cases in Anhui Province can be drawn as follows:
library(ggplot2)
d = x['Anhui', ] # you can replace Anhui with any province
d = d[order(d$confirm), ]
ggplot(d, aes(name, as.numeric(confirm))) +
geom_col(fill = 'firebrick') +
theme_minimal(base_size = 14) +
xlab(NULL) + ylab(NULL) +
labs(caption = paste("accessed date:", time(x))) +
scale_x_discrete(limits = d$name) + coord_flip()
Access detailed historical data
Historical data are accessed in basically the same way as the latest data, but the entry function is load_nCov2019().
library('nCov2019')
y <- load_nCov2019(lang = 'en')
y # this will return update time of historical data
nCov2019 historical data
last update: 2020-03-19
For historical data, we currently maintain three sources. The first is collected and organized from a GitHub repository; users access it by default, or explicitly with load_nCov2019(source = 'github'). The second is obtained from the Chinese website Dingxiangyuan (DXY) and can be accessed with load_nCov2019(source = 'dxy'). The last is obtained from the National Health Commission of China and can be accessed with the argument source = 'cnnhc'. The three datasets have essentially the same format, but the default source has more comprehensive global historical information and also covers an earlier period. Users can compare the sources and switch between them:
# compare the total confirmed cases in china between data sources
library(nCov2019)
library(ggplot2)
nCov2019_set_country('China')
y = load_nCov2019(lang = 'en', source = 'github')
dxy = load_nCov2019(lang = 'en', source = 'dxy')
nhc = load_nCov2019(lang = 'en', source = 'cnnhc')
# sum the provincial counts into national totals for each day
dxy_china <- aggregate(cum_confirm ~ time, summary(dxy), sum)
y_china <- aggregate(cum_confirm ~ time, summary(y), sum)
nhc_china <- aggregate(cum_confirm ~ time, summary(nhc), sum)
dxy_china$source = 'DXY data'
y_china$source = 'GitHub data'
nhc_china$source = 'NHC data'
df = rbind(dxy_china, y_china, nhc_china)
ggplot(subset(df, time >= '2020-01-11'),
aes(time,cum_confirm, color = source)) +
geom_line() + scale_x_date(date_labels = "%Y-%m-%d") +
ylab('Confirmed Cases in China') + xlab('Time') + theme_bw() +
theme(axis.text.x = element_text(hjust = 1)) +
theme(legend.position = 'bottom')
Then you can use summary(y) to get historical data at the provincial level in China:
time country province cum_confirm cum_heal cum_dead suspected
1 2019-12-01 China Hubei 1 0 0 0
2 2019-12-02 China Hubei 1 0 0 0
3 2019-12-03 China Hubei 1 0 0 0
4 2019-12-04 China Hubei 1 0 0 0
5 2019-12-05 China Hubei 1 0 0 0
6 2019-12-06 China Hubei 1 0 0 0
To get historical data for all cities in China, use y[]:
        time country province  city cum_confirm cum_heal cum_dead suspected
1 2019-12-01   China    Hubei Wuhan           1        0        0         0
2 2019-12-02   China    Hubei Wuhan           1        0        0         0
3 2019-12-03   China    Hubei Wuhan           1        0        0         0
4 2019-12-04   China    Hubei Wuhan           1        0        0         0
5 2019-12-05   China    Hubei Wuhan           1        0        0         0
6 2019-12-06   China    Hubei Wuhan           1        0        0         0
You can also specify a province name to get the corresponding historical data. For example, y['Anhui'] extracts the historical data for Anhui Province:
          time country province   city cum_confirm cum_heal cum_dead suspected
63  2020-01-21   China    Anhui  Hefei           0        0        0         1
96  2020-01-22   China    Anhui  Hefei           1        0        0         3
102 2020-01-22   China    Anhui  Lu'an           0        0        0         1
198 2020-01-23   China    Anhui  Hefei           6        0        0         0
199 2020-01-23   China    Anhui Bengbu           1        0        0         0
200 2020-01-23   China    Anhui Anqing           1        0        0         0
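The historical tables hold cumulative counts (cum_confirm, cum_heal, cum_dead). When daily new cases are needed, one option is to difference the cumulative series within each city. The base-R sketch below does this for the Anhui rows printed above. It assumes that each city's rows are consecutive reports; if dates are missing, a difference is the change since the previous report rather than a single day's count.

```r
# Anhui city rows copied from the historical table above
d <- data.frame(
  time = as.Date(c("2020-01-21", "2020-01-22", "2020-01-23",
                   "2020-01-22", "2020-01-23", "2020-01-23")),
  city = c("Hefei", "Hefei", "Hefei", "Lu'an", "Bengbu", "Anqing"),
  cum_confirm = c(0, 1, 6, 0, 1, 1)
)

# Sort by city, then by date, so diff() compares consecutive reports
d <- d[order(d$city, d$time), ]

# New cases per report: keep the first cumulative value, then differences
d$new_confirm <- ave(d$cum_confirm, d$city,
                     FUN = function(v) c(v[1], diff(v)))
print(d)
```

Here Hefei goes from 0 to 1 to 6 cumulative cases, giving 0, 1 and 5 new cases; cities with a single row simply keep their cumulative value.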
Similarly, you can get global historical data by specifying 'global', i.e. y['global']:
time country cum_confirm cum_heal cum_dead
3137 2020-03-19 Virgin Islands (U.S.) 2 0 0
3138 2020-03-19 Vietnam 76 16 0
3139 2020-03-19 Mayotte 3 0 0
3140 2020-03-19 South Africa 150 1 0
3141 2020-03-19 Zambia 2 0 0
3142 2020-03-19 Namibia 2 0 0
NOTE: Global historical data are not available from source 'dxy'.
Here are some visualization examples with the historical data.
- Draw curves of the cumulative numbers of deaths, confirmed cases, and recoveries in China.
library('tidyr')
library('ggrepel')
library('ggplot2')
y <- load_nCov2019(lang = 'en')
d <- subset(y['global'], country == 'China')
d <- gather(d, curve, count, -time, -country)
ggplot(d, aes(time, count, color = curve)) +
geom_point() + geom_line() + xlab(NULL) + ylab(NULL) +
theme_bw() + theme(legend.position = "none") +
geom_text_repel(aes(label = curve),
data = d[d$time == time(y), ], hjust = 1) +
theme(axis.text.x = element_text(angle = 15, hjust = 1)) +
scale_x_date(date_labels = "%Y-%m-%d",
limits = c(as.Date("2020-01-15"), as.Date("2020-03-20"))) +
labs(title="Number of deaths, confirms, and cures in China")
- Outbreak trend curves of the top ten countries around the world (excluding China).
library('ggrepel')
library('ggplot2')
y <- load_nCov2019(lang = 'en')
df <- y['global']
d <- subset(df,country != 'China' & time == time(y))
t10 <- d[order(d$cum_confirm, decreasing = TRUE),]$country[1:10]
df <- df[which(df$country %in% t10),]
ggplot(df, aes(time, as.numeric(cum_confirm),
group = country, color = country)) +
geom_point() + geom_line() +
geom_label_repel(aes(label = country),
data = df[df$time == time(y), ], hjust = 1) +
theme_bw() + theme(legend.position = 'none') +
xlab(NULL) + ylab(NULL) +
scale_x_date(date_labels = "%Y-%m-%d",
limits = c(as.Date("2020-02-01"), as.Date("2020-03-19"))) +
theme(axis.text.x = element_text(angle = 15, hjust = 1)) +
labs(title = "Outbreak Trend Curves of Top 10 Countries Around the World \n (except China)")
- Growth curves of confirmed cases in Anhui Province, China.
library('ggrepel')
library('ggplot2')
y <- load_nCov2019(lang = 'en')
d <- y['Anhui']
ggplot(d, aes(time, as.numeric(cum_confirm),
group = city, color = city)) +
geom_point() + geom_line() +
geom_label_repel(aes(label = city),
data = d[d$time == time(y), ], hjust = 1) +
theme_minimal(base_size = 14) + theme(legend.position = 'none') +
scale_x_date(date_labels = "%Y-%m-%d") + xlab(NULL) + ylab(NULL) +
theme(axis.text.x = element_text(hjust = 1)) +
labs(title = "Growth curve of confirms in Anhui Province, China")
- A heatmap of the epidemic situation around the world over the last 7 days.
library(ggplot2)
y <- load_nCov2019(lang = 'en')
d <- y['global']
max_time <- max(d$time)
min_time <- max_time - 6  # 7 dates in total: max_time - 6 through max_time
d <- na.omit(d[d$time >= min_time & d$time <= max_time,])
dd <- d[d$time == max(d$time, na.rm = TRUE),]
d$country <- factor(d$country,
levels=unique(dd$country[order(dd$cum_confirm)]))
breaks = c(10, 100, 1000, 10000)
ggplot(d, aes(time, country)) +
geom_tile(aes(fill = cum_confirm), color = 'black') +
scale_fill_viridis_c(trans = 'log', breaks = breaks,
labels = breaks) +
xlab(NULL) + ylab(NULL) +
scale_x_date(date_labels = "%Y-%m-%d") + theme_minimal()
Province-level data are also available for countries other than China: we currently collect province-level information for China, South Korea, the United States, Japan, Iran, Italy, Germany and the United Kingdom. To get the details for any of these countries, set the country environment as follows:
nCov2019_set_country('Italy')
y <- load_nCov2019(lang = 'en', source = 'github')
head(y['province']) # This will return province data of Italy
           time country         province cum_confirm cum_heal cum_dead suspected
2823 2020-03-15   Italy         Lombardy       13272     2011     1218        NA
2824 2020-03-15   Italy Emilia - Romagna        3093       68      284        NA
2825 2020-03-15   Italy           Veneto        2246      120       63        NA
2826 2020-03-15   Italy           Marche        1133        0       46        NA
2827 2020-03-15   Italy         Piedmont        1111        0       81        NA
2828 2020-03-15   Italy          Tuscany         781       10        8        NA
- Windrose plot of global confirmed cases
require(nCov2019)
y <- load_nCov2019(lang = 'en', source='github')
d = y['global']
require(dplyr)
dd <- filter(d, time == time(y)) %>%
arrange(desc(cum_confirm))
dd = dd[1:40, ]
dd$country = factor(dd$country, levels=dd$country)
dd$angle = 1:40 * 360/40
require(ggplot2)
p <- ggplot(dd, aes(country, cum_confirm, fill=cum_confirm)) +
geom_col(width=1, color='grey90') +
geom_col(aes(y=I(5)), width=1, fill='grey90', alpha = .2) +
geom_col(aes(y=I(3)), width=1, fill='grey90', alpha = .2) +
geom_col(aes(y=I(2)), width=1, fill = "white") +
scale_y_log10() +
scale_fill_gradientn(colors=c("darkgreen", "green", "orange", "firebrick","red"), trans="log") +
geom_text(aes(label=paste(country, cum_confirm, sep="\n"),
y = cum_confirm *.8, angle=angle),
data=function(d) d[d$cum_confirm > 700,],
size=3, color = "white", fontface="bold", vjust=1) +
geom_text(aes(label=paste0(cum_confirm, " cases ", country),
y = max(cum_confirm) * 2, angle=angle+90),
data=function(d) d[d$cum_confirm < 700,],
size=3, vjust=0) +
coord_polar(direction=-1) +
theme_void() +
theme(legend.position="none") +
ggtitle("COVID19 global trend", time(y))
print(p)
- Outbreak trend since the 100th case
require(dplyr)
require(ggplot2)
require(shadowtext)
require(nCov2019)
d <- load_nCov2019()
dd <- d['global'] %>%
as_tibble %>%
rename(confirm=cum_confirm) %>%
filter(confirm > 100 & country != "China") %>%
group_by(country) %>%
mutate(days_since_100 = as.numeric(time - min(time))) %>%
ungroup
breaks=c(100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000)
p <- ggplot(dd, aes(days_since_100, confirm, color = country)) +
geom_smooth(method='lm', aes(group=1),
data = . %>% filter(!country %in% c("Japan", "Singapore")),
color='grey10', linetype='dashed') +
geom_line(size = 0.8) +
geom_point(pch = 21, size = 1) +
scale_y_log10(expand = expansion(add = c(0,0.1)),
breaks = breaks, labels = breaks) +
scale_x_continuous(expand = expansion(add = c(0,1))) +
theme_minimal(base_size = 14) +
theme(
panel.grid.minor = element_blank(),
legend.position = "none",
plot.margin = margin(3,15,3,3,"mm")
) +
coord_cartesian(clip = "off") +
geom_shadowtext(aes(label = paste0(" ",country)), hjust=0, vjust = 0,
data = . %>% group_by(country) %>% top_n(1, days_since_100),
bg.color = "white") +
labs(x = "Number of days since 100th case", y = "",
subtitle = "Total number of cases")
print(p)
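The days_since_100 column in the pipeline above counts days from each country's first date with more than 100 cumulative cases. For readers who prefer base R to dplyr, the same per-country computation can be sketched with ave(). The two countries ("A", "B") and their counts are made-up illustrative values, not real data.

```r
# Made-up illustrative input: two hypothetical countries, four days each
d <- data.frame(
  country = rep(c("A", "B"), each = 4),
  time    = rep(as.Date("2020-03-01") + 0:3, times = 2),
  confirm = c(50, 120, 300, 800,   90, 95, 101, 400)
)

# Keep only rows above 100 cases, mirroring filter(confirm > 100)
dd <- d[d$confirm > 100, ]

# Earliest qualifying date per country, repeated for every row of that country
first_day <- ave(as.numeric(dd$time), dd$country, FUN = min)

# Days elapsed since that first date (0 on the first qualifying day)
dd$days_since_100 <- as.numeric(dd$time) - first_day
print(dd)
```

Country A passes 100 cases on its second day and country B on its third, so both series start at day 0, which is what aligns the curves on a common x axis.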
Geographic map visualization
We provide a built-in, convenient geographic map visualization function in the nCov2019 package. Plotting the world map takes just a few lines:
library(nCov2019)
x <- get_nCov2019(lang = 'en')
plot(x)
Combined with the chinamap R package, you can draw more detailed maps of China. For example, we can slightly modify the code above to better display the China region:
library(chinamap)
x = get_nCov2019(lang = 'en')
cn = get_map_china()
cn$province <- trans_province(cn$province)
plot(x, chinamap = cn, palette = "Reds")
Note: when working in an English language environment, cn$province should be translated with trans_province().
Several parameters adjust the final appearance of the map: font.size sets the label size, continuous_scale = FALSE switches to a discrete color scale (a continuous color scale in log space is used by default), and palette sets the color palette (e.g. palette = 'Blues' for a scale of blue shades).
With the argument region, users can plot a map focused on a specific country. For example, plot(x, region = 'South Korea') plots the map with the number of confirmed cases in South Korea, and plot(x, region = 'Japan') plots the map of Japan (excluding cases on the Diamond Princess).
To take a closer look at the situation in China, add the argument region = 'china' together with chinamap, as follows:
x <- get_nCov2019(lang = 'en')
cn = get_map_china()
cn$province <- trans_province(cn$province)
plot(x, region = 'china', chinamap = cn,
continuous_scale = FALSE,
palette = 'Blues', font.size = 2)
More color palettes can be found in the RColorBrewer package (RColorBrewer::display.brewer.all() shows them all).
Plotting data for a selected geographical region is also supported if a GIS file is available. We provide city-level maps of China, which users can download with the code below. In addition, when dashboard(remote = FALSE) is run for the first time, the map file cn_city_map.rds, which contains the city maps of the whole of China, is downloaded to the current working directory.
# To use your own GIS file instead,
# it can be loaded with:
# m = sf::st_read("PATH/TO_GIS_file.shp")
# Otherwise, download the city-level map of China provided by the authors:
rds <- tempfile(fileext = ".rds")
url <- 'https://storage.live.com/items/FB3FF08624DEB0EA!50056'
downloader::download(url, destfile = rds, quiet = FALSE)
m <- readRDS(rds)
You can also plot a map for a specific date:
nCov2019_set_country(country = 'China')
y <- load_nCov2019(lang = 'en')
cn = get_map_china()
cn$province <- trans_province(cn$province)
plot(y, region = 'china', continuous_scale = FALSE,
chinamap = cn, date = '2020-03-01', font.size = 2)
A more informative application is to draw dynamic geographic maps at multiple time points and save them as a GIF file. This only requires specifying the date range with the arguments from and to. Other useful parameters include width and height to set the animation size, and filename to set the name of the saved file.
The complete code for plotting historical maps of the world, China, and provinces of China is as follows:
library(nCov2019)
from = "2020-02-18"
to = "2020-03-02"
y <- load_nCov2019(lang = 'en')
# To generate a historical world map;
# with default figure size and save with default filename
# the gif file will be saved in current working directory
plot(y, from = from, to = to)
# To generate a historical map of China
# and save as "china.gif":
library(chinamap)
cn = get_map_china()
cn$province <- trans_province(cn$province)
plot(y, region="china", chinamap=cn,
from=from, to=to, filename='china.gif')
# Figure width and height can also be specified;
# the file “cn_city_map.rds” contains map data of cities in China
shijie = readRDS("cn_city_map.rds")
shijie$NAME = trans_city(shijie$NAME)
plot(y, region="Hubei", chinamap=shijie, width = 600, height = 600, from=from, to=to)
Shiny Dashboard
Sometimes users want to check the situation directly without writing code. We provide a Shiny dashboard in both an online and a local version, so users can choose whichever suits their needs. dashboard(lang = 'en', remote = TRUE) opens the English online dashboard, while remote = FALSE runs it on the local machine. The online version is usually more convenient, because the first run of the local version needs to download the map file, which may take some time depending on your Internet speed.
Session Info
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin18.7.0 (64-bit)
Running under: macOS Mojave 10.14.5
Matrix products: default
BLAS/LAPACK: /usr/local/Cellar/openblas/0.3.7/lib/libopenblasp-r0.3.7.dylib
locale:
[1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-2 sp_1.3-2 chinamap_0.2.0
[4] maps_3.3.0 shadowtext_0.0.7 dplyr_0.8.3
[7] ggrepel_0.8.1 tidyr_1.0.0 ggplot2_3.3.0
[10] nCov2019_0.3.4 knitr_1.24
loaded via a namespace (and not attached):
[1] prettydoc_0.3.1 tidyselect_0.2.5 xfun_0.9
[4] sf_0.8-1 purrr_0.3.3 splines_3.6.2
[7] lattice_0.20-38 colorspace_1.4-1 vctrs_0.2.1
[10] htmltools_0.4.0 viridisLite_0.3.0 yaml_2.2.0
[13] mgcv_1.8-31 gridGraphics_0.4-1 rlang_0.4.4
[16] e1071_1.7-3 pillar_1.4.2 DBI_1.1.0
[19] glue_1.3.1 withr_2.1.2 rvcheck_0.1.7
[22] lifecycle_0.1.0 stringr_1.4.0 munsell_0.5.0
[25] gtable_0.3.0 mapproj_1.2.7 evaluate_0.14
[28] labeling_0.3 class_7.3-15 Rcpp_1.0.2
[31] KernSmooth_2.23-16 classInt_0.4-2 backports_1.1.4
[34] scales_1.0.0 BiocManager_1.30.4 jsonlite_1.6
[37] digest_0.6.20 stringi_1.4.3 grid_3.6.2
[40] tools_3.6.2 magrittr_1.5 tibble_2.1.3
[43] crayon_1.3.4 pkgconfig_2.0.2 zeallot_0.1.0
[46] ellipsis_0.3.0 Matrix_1.2-18 downloader_0.4
[49] ggplotify_0.0.4 assertthat_0.2.1 rmarkdown_2.1
[52] R6_2.4.0 units_0.6-5 nlme_3.1-142
[55] compiler_3.6.2