Covid-19, Analysis, Visualization, Prediction


Author: Zhuofei, Zhou



Introduction

Covid-19

On 12 January 2020, the WHO officially named the novel coronavirus 2019-nCoV. Coronaviruses are a large family of viruses known to cause colds and more serious illnesses such as Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome (SARS). The novel coronavirus is a strain that had never been found in humans before. More information is available on Baidu Baike (百度百科).

Data

The data comes from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.

Download here:
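
A minimal sketch of fetching the three global time-series files. The base URL is an assumption (the raw GitHub path of the CSSE repository) and is not taken from this notebook; `series_file`, `series_url`, and `fetch_all` are hypothetical helper names.

```julia
using Downloads  # standard library since Julia 1.6

# Assumed base URL of the CSSE time-series folder (not given in the notebook).
const BASE_URL = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/" *
                 "csse_covid_19_data/csse_covid_19_time_series/"

# The three global series used in this notebook.
const KINDS = ["confirmed", "deaths", "recovered"]

# Hypothetical helpers that build the file name and URL for one series.
series_file(kind) = "time_series_covid19_$(kind)_global.csv"
series_url(kind) = BASE_URL * series_file(kind)

# Download every series into `dir` and return the local paths.
function fetch_all(dir = pwd())
    return [Downloads.download(series_url(k), joinpath(dir, series_file(k))) for k in KINDS]
end
```

Calling `fetch_all()` needs network access; the helpers alone can be checked offline.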


Import some packages in Julia:

comfirmed

266 rows × 259 columns (omitted printing of 253 columns)

Row  Province/State                Country/Region       Lat       Long      1/22/20  1/23/20
     String                        String               Float64   Float64   Int64    Int64
1                                  Afghanistan          33.9391   67.71     0        0
2                                  Albania              41.1533   20.1683   0        0
3                                  Algeria              28.0339   1.6596    0        0
4                                  Andorra              42.5063   1.5218    0        0
5                                  Angola               -11.2027  17.8739   0        0
6                                  Antigua and Barbuda  17.0608   -61.7964  0        0
7                                  Argentina            -38.4161  -63.6167  0        0
8                                  Armenia              40.0691   45.0382   0        0
9    Australian Capital Territory  Australia            -35.4735  149.012   0        0
10   New South Wales               Australia            -33.8688  151.209   0        0
11   Northern Territory            Australia            -12.4634  130.846   0        0
12   Queensland                    Australia            -27.4698  153.025   0        0
13   South Australia               Australia            -34.9285  138.601   0        0
14   Tasmania                      Australia            -42.8821  147.327   0        0
15   Victoria                      Australia            -37.8136  144.963   0        0
16   Western Australia             Australia            -31.9505  115.861   0        0
17                                 Austria              47.5162   14.5501   0        0
18                                 Azerbaijan           40.1431   47.5769   0        0

Analysis

Analysis of Global Data

Next, we use this data to build a new DataFrame that stores the global totals. The new DataFrame, named Global_num, holds the cumulative numbers of confirmed cases, deaths, and recoveries for each day from 2020-01-22 to 2020-10-02.

Data Processing

The processing steps are as follows:
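
A minimal sketch of the aggregation step on a toy stand-in for the wide CSV, assuming (as in the table above) four metadata columns followed by one column per day. The toy rows and the helper `parse_day` are illustrative, not the notebook's own code.

```julia
using Dates

# Toy stand-in for the wide CSV: four metadata columns, then one column per day.
header = ["Province/State", "Country/Region", "Lat", "Long", "1/22/20", "1/23/20", "1/24/20"]
rows = Any[
    ""        "A" 10.0 20.0 1 2 4
    "B-North" "B" 30.0 40.0 0 3 3
    "B-South" "B" 31.0 41.0 5 5 9
]

# Headers such as "1/22/20" use a two-digit year, so shift the parsed year by 2000.
parse_day(s) = Date(s, dateformat"m/d/y") + Year(2000)

first_day_col = 5
dates = parse_day.(header[first_day_col:end])

# Global total per day = sum of each date column over all rows.
totals = [sum(rows[:, j]) for j in first_day_col:size(rows, 2)]
```

Here `dates` and `totals` line up one-to-one, which is the shape Global_num needs.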

Global_num

255 rows × 4 columns

Row  Date        confirmed_num  deaths_num  recovered_num
     Date        Any            Any         Any
1    2020-01-22  555            17          28
2    2020-01-23  654            18          30
3    2020-01-24  941            26          36
4    2020-01-25  1434           42          39
5    2020-01-26  2118           56          52
6    2020-01-27  2927           82          61
7    2020-01-28  5578           131         107
8    2020-01-29  6167           133         126
9    2020-01-30  8235           171         143
10   2020-01-31  9927           213         222
11   2020-02-01  12038          259         284
12   2020-02-02  16787          362         472
13   2020-02-03  19887          426         623
14   2020-02-04  23898          492         852
15   2020-02-05  27643          564         1124
16   2020-02-06  30803          634         1487
17   2020-02-07  34396          719         2011
18   2020-02-08  37130          806         2616

Plots

Next, I plot the cumulative numbers of confirmed cases and deaths over time.

As the plots show, the number of confirmed cases appears to keep growing roughly exponentially, whereas the number of deaths appears to grow roughly linearly.

[Plots: p1, p2, p_re]

Next, calculate the number of new cases per day and the death rate.
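
A minimal sketch of both calculations, using the first five days of Global_num shown above. The rates are taken as fractions of the cumulative confirmed count on the same day, which matches the death_rate and recover_rate columns in the output below; the variable names are illustrative.

```julia
# First five days of Global_num (2020-01-22 to 2020-01-26).
confirmed = [555, 654, 941, 1434, 2118]
deaths    = [17, 18, 26, 42, 56]
recovered = [28, 30, 36, 39, 52]

# Daily new cases: difference between consecutive cumulative totals.
new_cases = diff(confirmed)

# Rates as fractions of the cumulative confirmed count on the same day.
death_rate   = deaths ./ confirmed
recover_rate = recovered ./ confirmed
```

Note that `diff` returns one element fewer than its input, so the first day has no "new cases" value unless one is supplied separately.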

[Plots: p4, p5, p6]

1 rows × 6 columns

Row  Date        confirmed_num  deaths_num  recovered_num  death_rate  recover_rate
     Date        Any            Any         Any            Float64     Float64
1    2020-04-29  3190735        230657      948318         0.0722896   0.29721

The death rate reached its maximum, about 7.23%, on 2020-04-29.
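
A minimal sketch of locating such a peak with `argmax`. The rate values here are made up for illustration and are not the notebook's data.

```julia
using Dates

dates      = Date(2020, 4, 27):Day(1):Date(2020, 5, 1)
death_rate = [0.0710, 0.0718, 0.0723, 0.0721, 0.0719]  # made-up values

# Index of the largest rate, then the matching day and value.
i = argmax(death_rate)
peak_day, peak_rate = dates[i], death_rate[i]
```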

So, although the number of new confirmed cases continues to rise, the death rate is already falling.


Prediction

Global Trend:

It is useful to understand the global trend in the number of cases over time. Most data follows some pattern; the question is how closely it does so. Here, the cumulative number of COVID-19 cases appears to grow roughly exponentially.

I considered several ways to estimate the curve:

  • Numerical analysis: ❎

  • Linear regression after transformation: ✅

  • Neural network: ✅

We first focus on confirmed cases:

Transform y before fitting; the transformed data may then follow a linear function:
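
A minimal sketch of linear regression after a transformation. The log transform is an assumption (the text does not say which transform was used), and the data is synthetic, following y = 100·exp(0.1·t), not the notebook's series.

```julia
using LinearAlgebra

# Synthetic data following y = 100 * exp(0.1 t); illustrative only.
t = collect(0.0:1.0:29.0)
y = 100 .* exp.(0.1 .* t)

# Assumed transform: log(y) = log(a) + b * t is linear in t.
X = [ones(length(t)) t]
intercept, slope = X \ log.(y)     # least-squares fit of the transformed data

# Back-transform the fitted line to the original scale.
a, b = exp(intercept), slope
predict(day) = a * exp(b * day)
```

If the transformed points do not fall close to a straight line, a different transform (or a different model) would be the more sensible choice.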


Another approach: Neural Network

Import Flux to do the machine learning.

The full code is here:

using Flux, Statistics
using Flux.Data: DataLoader
using Flux: throttle
using Parameters: @with_kw
using DelimitedFiles
using IterTools: ncycle
using Dates
using DataFrames
using CSV
using Plots


@with_kw mutable struct Args
    η::Float64 = 0.001
    batchsize::Int = 1
    epochs::Int = 1000
end
cd(@__DIR__)
pwd()
# read data:
dt, Header = readdlm("time_series_covid19_confirmed_global.csv", ',', header=true)
dt_deaths, Header_deaths = readdlm("time_series_covid19_deaths_global.csv", ',', header=true)
dt_recover, Header_recov = readdlm("time_series_covid19_recovered_global.csv", ',', header=true)


# sum each date column (columns 5 to 259) and store the global totals in a new DataFrame
comfirmed_num = []
deaths_num = []
recovered_num = []
for i in 5:259
    push!(comfirmed_num, sum(dt[:, i]))
    push!(deaths_num, sum(dt_deaths[:, i]))
    push!(recovered_num, sum(dt_recover[:,i]))
end
dates = Date(2020, 1, 22):Day(1):Date(2020, 10, 2)
Global_num = DataFrame(Date=dates, 
    comfirmed_num=comfirmed_num, 
    deaths_num=deaths_num, 
    recovered_num=recovered_num)

#read train_data
n, = size(comfirmed_num)
x = 0:n-1
y = Global_num.comfirmed_num
args = Args()
train_data = DataLoader((Array(x), Float64.(y)), batchsize=args.batchsize)

# define leaky ReLU: identity for x ≥ 0, slope 1/α for x < 0
Lelu(x, α=100) = (x ≥ 0 ? x : x/α)

#define Model
m = Chain(
    Dense(1, 40, Lelu),
    Dense(40, 40, Lelu),
    Dense(40, 40, Lelu),
    Dense(40, 1, Lelu)
)

#define loss function
loss(x, y) = Flux.mse(m(x), y)

#define parameters
ps = Flux.params(m)

#define Opt
opt = ADAM(args.η)

#train model
Flux.train!(loss, ps, ncycle(train_data, args.epochs), opt)

#visualize
flux_y = []
for i in Array(x)
    push!(flux_y, Array(m([i]))[1])
end
flux_y = Float64.(flux_y)
plot(dates, [flux_y Float64.(y)], 
    label=["predict" "real"], xlabel="date", ylabel="Numbers", size=(900, 600))
predict_curve (generic function with 4 methods)
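
A minimal sketch, in plain Julia without Flux, of the two building blocks used in the listing above: the `Lelu` activation and a mean-squared-error loss. The hand-written `mse` is an illustrative stand-in for `Flux.mse` on plain vectors, not Flux's own implementation.

```julia
using Statistics

# Same activation as in the listing: identity for x ≥ 0, slope 1/α otherwise.
Lelu(x, α=100) = (x ≥ 0 ? x : x/α)

# Mean squared error, written out by hand for illustration.
mse(ŷ, y) = mean(abs2, ŷ .- y)

pos  = Lelu(5.0)                     # positive inputs pass through unchanged
neg  = Lelu(-100.0)                  # negative inputs are scaled by 1/100
err  = mse([1.0, 2.0], [1.0, 4.0])   # (0² + 2²) / 2
```

With the default α = 100, negative inputs keep a small nonzero slope, which is what distinguishes this activation from a plain ReLU.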

Analysis of Country Data

Next, consider the situation in different countries.

dt_country

188 rows × 14 columns (omitted printing of 7 columns)

Row  Country_Region       Last_Update          Lat       Long_     Confirmed  Deaths   Recovered
     String               String               Float64?  Float64?  Float64    Float64  Int64?
1    Afghanistan          2020-10-04 04:23:44  33.9391   67.71     39297.0    1462.0   32842
2    Albania              2020-10-04 04:23:44  41.1533   20.1683   14117.0    392.0    8536
3    Algeria              2020-10-04 04:23:44  28.0339   1.6596    51995.0    1756.0   36482
4    Andorra              2020-10-04 04:23:44  42.5063   1.5218    2110.0     53.0     1540
5    Angola               2020-10-04 04:23:44  -11.2027  17.8739   5370.0     193.0    2436
6    Antigua and Barbuda  2020-10-04 04:23:44  17.0608   -61.7964  107.0      3.0      96
7    Argentina            2020-10-04 04:23:44  -38.4161  -63.6167  790818.0   20795.0  626114
8    Armenia              2020-10-04 04:23:44  40.0691   45.0382   51925.0    972.0    44583
9    Australia            2020-10-04 04:23:44  -25.0     133.0     27135.0    894.0    24864
10   Austria              2020-10-04 04:23:44  47.5162   14.5501   47432.0    809.0    38045
11   Azerbaijan           2020-10-04 04:23:44  40.1431   47.5769   40561.0    595.0    38354
12   Bahamas              2020-10-04 04:23:44  25.0259   -78.0359  4332.0     96.0     2375
13   Bahrain              2020-10-04 04:23:44  26.0275   50.55     72310.0    258.0    66813
14   Bangladesh           2020-10-04 04:23:44  23.685    90.3563   367565.0   5325.0   280069
15   Barbados             2020-10-04 04:23:44  13.1939   -59.5432  196.0      7.0      182
16   Belarus              2020-10-04 04:23:44  53.7098   27.9534   79852.0    851.0    75148
17   Belgium              2020-10-04 04:23:44  50.8333   4.46994   127623.0   10044.0  19645
18   Belize               2020-10-04 04:23:44  17.1899   -88.4976  2080.0     28.0     1290
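
A minimal sketch of ranking countries by confirmed cases, using four rows from dt_country above as plain named tuples. With DataFrames.jl, `sort(dt_country, :Confirmed, rev=true)` should give the equivalent ordering; the names `countries`, `ranked`, and `top2` are illustrative.

```julia
# A few rows from dt_country (Country_Region, Confirmed).
countries = [
    (Country_Region = "Afghanistan", Confirmed = 39297.0),
    (Country_Region = "Albania",     Confirmed = 14117.0),
    (Country_Region = "Algeria",     Confirmed = 51995.0),
    (Country_Region = "Argentina",   Confirmed = 790818.0),
]

# Sort by confirmed cases, largest first.
ranked = sort(countries; by = r -> r.Confirmed, rev = true)

# Keep only the two countries with the most confirmed cases.
top2 = first(ranked, 2)
```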

188 rows × 14 columns (omitted printing of 9 columns)

Row  Country_Region  Last_Update          Lat       Long_     Confirmed
     String          String               Float64?  Float64?  Float64
1    US              2020-10-04 04:23:44  40.0      -100.0    7.38219e6
2    India           2020-10-04 04:23:44  20.5937   78.9629   6.47354e6
3    Brazil          2020-10-04 04:23:44  -14.235   -51.9253  4.90683e6
4    Russia          2020-10-04 04:23:44  61.524    105.319   1.19866e6
5    Colombia        2020-10-04 04:23:44  4.5709    -74.2973  848147.0
6    Peru            2020-10-04 04:23:44  -9.19     -75.0152  821564.0
7    Argentina       2020-10-04 04:23:44  -38.4161  -63.6167  790818.0
8    Spain           2020-10-04 04:23:44  40.4637   -3.74922  789932.0
9    Mexico          2020-10-04 04:23:44  23.6345   -102.553  757953.0
10   South Africa    2020-10-04 04:23:44  -30.5595  22.9375   679716.0
11   France          2020-10-04 04:23:44  46.2276   2.2137    629509.0
12   United Kingdom  2020-10-04 04:23:44  55.0      -3.0      482654.0
13   Chile           2020-10-04 04:23:44  -35.6751  -71.543   468471.0
14   Iran            2020-10-04 04:23:44  32.4279   53.688    468119.0
15   Iraq            2020-10-04 04:23:44  33.2232   43.6793   375931.0
16   Bangladesh      2020-10-04 04:23:44  23.685    90.3563   367565.0
17   Saudi Arabia    2020-10-04 04:23:44  23.8859   45.0792   335997.0
18   Turkey          2020-10-04 04:23:44  38.9637   35.2433   323014.0
