The CSV file of the PACS data set is here. Go here if you want to copy and paste this CSV file to your computer. It looks like:
# A tibble: 8,288 x 17
id nationality sex age dob province district village travel_laos travel_abroad onset hospitalization consultation
<int> <chr> <fct> <dbl> <date> <chr> <chr> <chr> <chr> <chr> <date> <date> <date>
1 1 laos fema… 10 2012-02-10 Vientia… Sisatta… thongk… <NA> <NA> 2012-03-24 2012-03-25 2012-03-25
2 2 laos male 6 NA Vientia… Hadxaif… nongheo <NA> <NA> 2012-03-25 2012-03-28 2012-03-28
3 3 africa fema… 25 1987-05-27 <NA> <NA> aaa <NA> south africa 2012-03-26 NA 2012-03-30
4 4 foreigner male 0 NA Vientia… Xaysetha phones… Louangphab… <NA> 2012-09-04 NA 2012-10-04
5 5 japan male 5 NA <NA> <NA> <NA> <NA> <NA> NA NA NA
6 6 laos fema… 4 2008-06-06 <NA> <NA> <NA> <NA> <NA> 2012-04-28 2012-06-05 2012-06-05
7 7 japan male 64 NA <NA> <NA> <NA> <NA> <NA> 2012-03-05 NA 2012-05-08
8 8 laos male 13 1998-06-17 Vientia… Chantha… Phonsa… bokeo <NA> 2012-05-11 2012-05-14 2012-05-14
9 9 laos male 5 2006-12-12 Vientia… Xaysetha phonex… <NA> <NA> 2012-05-10 2012-05-15 2012-05-15
10 10 laos male 14 NA Vientia… Xaysetha chomma… <NA> <NA> 2012-05-13 2012-05-16 2012-05-16
# … with 8,278 more rows, and 4 more variables: sample_collection <date>, pcr <fct>, ns1 <fct>, serotype <fct>
As of 2019-06-18, the PACS data set contains 8288 cases reported from 2012-01-02 to 2018-12-31. There are 4170 cases for which at least one pair of dates among onset, hospitalization, consultation and sample_collection is more than 15 days apart. A CSV file of these cases is here. Go here if you want to copy and paste this CSV file to your computer. Furthermore, there are 565 cases with no date at all. A CSV file of these cases is here. Go here if you want to copy and paste this CSV file to your computer. After removing these 4170 cases with date problems as well as the 565 cases with no dates at all, and inferring the missing onset dates from hospitalization, consultation or sample_collection, the time series of the number of suspected cases per week looks like:
Here the presence of a confirmation test is based on the information in the variables pcr, ns1 and serotype. Indeed, some cases without any information in pcr or ns1 may still have an identified serotype. For example:
# A tibble: 1 x 4
id pcr ns1 serotype
<int> <fct> <fct> <fct>
1 955 <NA> <NA> dengue_3
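The two date checks used earlier (more than 15 days between at least one pair of dates, or no date at all) can be sketched in Python as follows. This is only an illustration under assumptions: each case is a plain dict keyed by the column names of the data set, missing dates are None, and the helper names (available_dates, has_no_date, has_date_problem) are hypothetical.

```python
from datetime import date

# Column names as used in the PACS data set; the 15-day threshold comes from the text.
DATE_FIELDS = ("onset", "hospitalization", "consultation", "sample_collection")
MAX_GAP_DAYS = 15


def available_dates(case: dict) -> list:
    """Return the non-missing dates among the four date columns of one case."""
    return [case[f] for f in DATE_FIELDS if case.get(f) is not None]


def has_no_date(case: dict) -> bool:
    """True when none of the four date columns is filled in."""
    return not available_dates(case)


def has_date_problem(case: dict) -> bool:
    """True when at least one pair of available dates is more than 15 days apart.

    Checking the spread (latest minus earliest) is equivalent to checking every
    pair, because the largest pairwise gap is always max - min.
    """
    dates = available_dates(case)
    if len(dates) < 2:
        return False
    return (max(dates) - min(dates)).days > MAX_GAP_DAYS


# Dates of case 1 as they appear in this report: sample_collection is a year later.
case_1 = {"onset": date(2012, 3, 24), "hospitalization": date(2012, 3, 25),
          "consultation": date(2012, 3, 25), "sample_collection": date(2013, 3, 26)}
# A case with no date at all, like case 5 in the first printout.
case_5 = {"onset": None, "hospitalization": None, "consultation": None,
          "sample_collection": None}
print(has_date_problem(case_1), has_no_date(case_5))  # True True
```

Using the spread rather than looping over all pairs keeps the check linear in the number of dates, and it gives the same answer.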
The split of data according to the availability of time information and confirmation test is:
time_info
tested FALSE TRUE Sum
FALSE 391 2640 3031
TRUE 174 5083 5257
Sum 565 7723 8288
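The table above is an ordinary two-way contingency table with row and column totals. A minimal Python sketch, assuming each case already carries the two boolean flags (the names tested and time_info are taken from the table headers). The exact rule deriving tested from pcr, ns1 and serotype is not spelled out here, so it is left out. The function crosstab_with_margins is hypothetical and only similar in spirit to R's addmargins(table(...)).

```python
from collections import Counter


def crosstab_with_margins(rows, row_key, col_key):
    """Two-way contingency table with row and column totals stored under "Sum"."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_levels = sorted({k[0] for k in counts})
    col_levels = sorted({k[1] for k in counts})
    table = {}
    for rl in row_levels:
        # Counter returns 0 for combinations that never occur.
        cells = {cl: counts[(rl, cl)] for cl in col_levels}
        cells["Sum"] = sum(cells.values())
        table[rl] = cells
    # Column totals; the grand total is the sum of the row totals.
    sums = {cl: sum(table[rl][cl] for rl in row_levels) for cl in col_levels}
    sums["Sum"] = sum(table[rl]["Sum"] for rl in row_levels)
    table["Sum"] = sums
    return table


# Hypothetical cases with precomputed boolean flags.
cases = [
    {"tested": True, "time_info": True},
    {"tested": True, "time_info": False},
    {"tested": False, "time_info": True},
    {"tested": False, "time_info": True},
]
tab = crosstab_with_margins(cases, "tested", "time_info")
print(tab[False])   # {False: 0, True: 2, 'Sum': 2}
print(tab["Sum"])   # {False: 1, True: 3, 'Sum': 4}
```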
Among the cases for which a confirmation test is available, the split of data according to positivity and time information is:
time_info
confirmed FALSE TRUE Sum
FALSE 40 1458 1498
TRUE 134 3625 3759
Sum 174 5083 5257
Out of the 8288 reported cases, 5257 (63 %) have a conclusive confirmation test, of which 3759 (72 %) are positive:
NS1
PCR positive equivocal negative not finished not tested NA Sum
positive 267 1 106 2 1784 1445 3605
equivocal 2 0 3 0 11 0 16
negative 145 0 1498 2 628 1339 3612
not finished 0 0 0 0 1 0 1
not tested 0 0 3 0 16 0 19
NA 6 0 15 0 1 1013 1035
Sum 420 1 1625 4 2441 3797 8288
Stratifying the reported cases according to whether or not they have date problems (i.e. the 4170 cases with date problems as well as the 565 cases with no dates at all), we get:
NS1
PCR positive equivocal negative not finished not tested NA Sum
positive 123 0 40 1 885 639 1688
equivocal 1 0 2 0 4 0 7
negative 47 0 691 2 349 637 1726
not finished 0 0 0 0 0 0 0
not tested 0 0 2 0 11 0 13
NA 4 0 9 0 0 723 736
Sum 175 0 744 3 1249 1999 4170
and
NS1
PCR positive equivocal negative not finished not tested NA Sum
positive 144 1 66 1 899 806 1917
equivocal 1 0 1 0 7 0 9
negative 98 0 807 0 279 702 1886
not finished 0 0 0 0 1 0 1
not tested 0 0 1 0 5 0 6
NA 2 0 6 0 1 290 299
Sum 245 1 881 1 1192 1798 4118
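Since the two strata partition the reported cases, their PCR × NS1 tables should add up cell by cell to the overall table given earlier. A Python sketch of that consistency check, with the numbers copied from the three tables; the constant and function names are hypothetical.

```python
# Rows: PCR result; columns: NS1 result. Both use the order
# positive, equivocal, negative, not finished, not tested, NA.
ALL_CASES = [
    [267, 1, 106, 2, 1784, 1445],
    [2, 0, 3, 0, 11, 0],
    [145, 0, 1498, 2, 628, 1339],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 3, 0, 16, 0],
    [6, 0, 15, 0, 1, 1013],
]
WITH_DATE_PROBLEMS = [
    [123, 0, 40, 1, 885, 639],
    [1, 0, 2, 0, 4, 0],
    [47, 0, 691, 2, 349, 637],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 11, 0],
    [4, 0, 9, 0, 0, 723],
]
WITHOUT_DATE_PROBLEMS = [
    [144, 1, 66, 1, 899, 806],
    [1, 0, 1, 0, 7, 0],
    [98, 0, 807, 0, 279, 702],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 5, 0],
    [2, 0, 6, 0, 1, 290],
]


def add_tables(a, b):
    """Cell-by-cell sum of two tables with the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def grand_total(table):
    """Sum of all cells of a table."""
    return sum(sum(row) for row in table)


combined = add_tables(WITH_DATE_PROBLEMS, WITHOUT_DATE_PROBLEMS)
print(combined == ALL_CASES)  # True
print(grand_total(WITH_DATE_PROBLEMS), grand_total(WITHOUT_DATE_PROBLEMS))  # 4170 4118
```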
The results of the serotype tests are as follows:
# A tibble: 8 x 2
serotype n
<fct> <int>
1 dengue_1 120
2 dengue_2 56
3 dengue_3 579
4 dengue_4 20
5 not_identified 203
6 not_finished 38
7 not_tested 2702
8 NA 4570
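Counting serotype values while keeping missing values as their own level, and deriving the number of cases with an identified serotype (dengue_1 to dengue_4), can be sketched in Python like this. The function names are hypothetical, and missing values are assumed to be stored as None.

```python
from collections import Counter


def tally_serotypes(values):
    """Count serotype values, keeping missing values (None) as an explicit "NA" level."""
    return Counter("NA" if v is None else v for v in values)


def n_identified(counts):
    """Number of cases with an identified serotype (dengue_1 to dengue_4)."""
    return sum(n for level, n in counts.items() if level.startswith("dengue_"))


# Counts as printed in the table above.
counts = Counter({"dengue_1": 120, "dengue_2": 56, "dengue_3": 579, "dengue_4": 20,
                  "not_identified": 203, "not_finished": 38, "not_tested": 2702,
                  "NA": 4570})
print(n_identified(counts), sum(counts.values()))  # 775 8288
```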
The numbers of available and missing values for province, district and village are:
$province
FALSE TRUE
293 7995
$district
FALSE TRUE
372 7916
$village
FALSE TRUE
415 7873
where TRUE means available information and FALSE means missing information. The combinations of available and missing values for these 3 variables are:
# A tibble: 6 x 4
province district village n
<lgl> <lgl> <lgl> <int>
1 FALSE FALSE FALSE 290
2 FALSE FALSE TRUE 3
3 TRUE FALSE FALSE 49
4 TRUE FALSE TRUE 30
5 TRUE TRUE FALSE 76
6 TRUE TRUE TRUE 7840
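The per-variable counts and the subsets listed below (village available but province or district missing) can all be read off this combination table. A Python sketch, with the pattern counts copied from the table; the names PATTERNS, margins and count_where are hypothetical.

```python
# Counts per (province, district, village) availability pattern, copied from the
# table above (True = available, False = missing).
PATTERNS = {
    (False, False, False): 290,
    (False, False, True): 3,
    (True, False, False): 49,
    (True, False, True): 30,
    (True, True, False): 76,
    (True, True, True): 7840,
}
VARIABLES = ("province", "district", "village")


def margins(patterns):
    """Number of missing (False) and available (True) values for each variable."""
    result = {}
    for i, name in enumerate(VARIABLES):
        available = sum(n for key, n in patterns.items() if key[i])
        missing = sum(n for key, n in patterns.items() if not key[i])
        result[name] = {False: missing, True: available}
    return result


def count_where(patterns, **wanted):
    """Number of cases whose availability flags match all given keyword values."""
    idx = {name: i for i, name in enumerate(VARIABLES)}
    return sum(n for key, n in patterns.items()
               if all(key[idx[name]] == flag for name, flag in wanted.items()))


print(margins(PATTERNS)["village"])                          # {False: 415, True: 7873}
print(count_where(PATTERNS, village=True, district=False))   # 33
```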
The reported cases with village information but missing province information:
# A tibble: 3 x 4
id province district village
<int> <chr> <chr> <chr>
1 3 <NA> <NA> aaa
2 827 <NA> <NA> 14 km
3 906 <NA> <NA> saphathongkang
The reported cases with village information but missing district information:
# A tibble: 33 x 4
id province district village
<int> <chr> <chr> <chr>
1 3 <NA> <NA> aaa
2 355 Vientiane [prefecture] <NA> pakkayoung
3 615 Vientiane <NA> nahone
4 650 Vientiane <NA> vang
5 733 Vientiane <NA> napamai
6 773 Vientiane [prefecture] <NA> nonehai
7 806 Vientiane <NA> namthern
8 818 Vientiane <NA> nonehai
9 827 <NA> <NA> 14 km
10 906 <NA> <NA> saphathongkang
# … with 23 more rows
A CSV file of all these cases is here. Go here if you want to copy and paste this CSV file to your computer.
We should be able to find these districts from the village (and province) information.
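One possible way to fill in these districts is a lookup from (province, village) to district, built from the records where all three are available. The Python sketch below is only an assumption of how this could be done: the function names and the place names in the example are hypothetical, villages are matched after a crude normalisation (case and surrounding spaces only), and ambiguous matches are left unresolved. Cases with a missing province, such as "aaa" or "14 km" above, cannot be resolved this way, and spelling variants of village names would need fuzzier matching.

```python
from collections import Counter, defaultdict


def normalise(name):
    """Crude normalisation of free-text names (case and surrounding spaces)."""
    return name.strip().lower()


def build_lookup(cases):
    """Map (province, village) to the districts seen for that pair in complete records."""
    lookup = defaultdict(Counter)
    for c in cases:
        if c.get("province") and c.get("district") and c.get("village"):
            lookup[(c["province"], normalise(c["village"]))][c["district"]] += 1
    return lookup


def infer_district(case, lookup):
    """Return the recorded district, or the unique district matching the case's
    province and village; None when unknown or ambiguous."""
    if case.get("district"):
        return case["district"]
    if not case.get("province") or not case.get("village"):
        return None
    candidates = lookup.get((case["province"], normalise(case["village"])))
    if not candidates or len(candidates) != 1:
        return None
    return next(iter(candidates))


# Hypothetical complete records.
complete = [
    {"province": "Province A", "district": "District X", "village": "Village 1"},
    {"province": "Province A", "district": "District Y", "village": "Village 2"},
    {"province": "Province A", "district": "District Z", "village": "Village 2"},
]
lookup = build_lookup(complete)
print(infer_district({"province": "Province A", "district": None,
                      "village": " village 1"}, lookup))  # District X
```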
Province names that are not official Lao province names:
[1] NA "Hanoi" "EntomoGeo" "Jiangsu"
About 77 % of reported cases (6405 out of 8288) are from Vientiane prefecture:
# A tibble: 22 x 2
province n
<chr> <int>
1 Vientiane [prefecture] 6405
2 Saravan 588
3 Attapu 411
4 <NA> 293
5 Vientiane 285
6 Bolikhamxai 78
7 Louangphrabang 53
8 Savannakhét 43
9 Khammouan 26
10 Champasak 25
11 Xiangkhoang 19
12 Xaisômboun 13
13 Louang Namtha 12
14 Houaphan 10
15 Xaignabouri 8
16 Oudômxai 5
17 Xékong 5
18 Bokeo 4
19 Phôngsali 2
20 EntomoGeo 1
21 Hanoi 1
22 Jiangsu 1
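A frequency table like the one above, with each province's share of all reported cases, can be sketched in Python with collections.Counter. The function name shares is hypothetical, and missing provinces are assumed to be stored as None.

```python
from collections import Counter


def shares(values, digits=1):
    """Counts and percentages of each level, most frequent first; None is kept as "NA"."""
    counts = Counter("NA" if v is None else v for v in values)
    total = sum(counts.values())
    return [(level, n, round(100 * n / total, digits)) for level, n in counts.most_common()]


# Share of Vientiane prefecture among all reported cases, from the table above.
print(round(100 * 6405 / 8288, 1))  # 77.3

# Toy example with hypothetical values.
print(shares(["Vientiane [prefecture]", "Vientiane [prefecture]", "Saravan", None]))
```

Ties in most_common keep the order in which the levels were first seen, so the ordering among equally frequent provinces is not alphabetical.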
The distribution of cases among the different districts of Vientiane prefecture looks like:
# A tibble: 11 x 2
district n
<chr> <int>
1 Xaythany 1846
2 Xaysetha 1511
3 Sisattanak 878
4 Chanthabuly 795
5 Hadxaifong 634
6 Sikhottabong 464
7 Naxaithong 147
8 Pakngeum district 67
9 <NA> 40
10 Sangthong 21
11 Bolikhamxay 2
There is village information for 7873 cases (95 % of all reported cases). In Vientiane prefecture, the split of cases by availability of village, test and time information is:
# A tibble: 8 x 4
village_info test_info time_info n
<lgl> <lgl> <lgl> <int>
1 FALSE FALSE FALSE 4
2 FALSE FALSE TRUE 8
3 FALSE TRUE FALSE 3
4 FALSE TRUE TRUE 41
5 TRUE FALSE FALSE 266
6 TRUE FALSE TRUE 2145
7 TRUE TRUE FALSE 71
8 TRUE TRUE TRUE 3867
which means that there are 3867 (60 %) cases in Vientiane prefecture for which we have village, time and confirmation test information. If we consider all cases (tested or not), the split then becomes:
time_info
village_info FALSE TRUE Sum
FALSE 7 49 56
TRUE 337 6012 6349
Sum 344 6061 6405
There are 6012 cases (94 %) for which we have both time and village information.
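The village × time table is obtained from the three-way table by summing over test_info. A Python sketch of that collapse, with the counts copied from the three-way table; the names THREE_WAY and collapse_test_info are hypothetical.

```python
from collections import defaultdict

# (village_info, test_info, time_info) -> number of cases in Vientiane prefecture,
# copied from the three-way table above.
THREE_WAY = {
    (False, False, False): 4,
    (False, False, True): 8,
    (False, True, False): 3,
    (False, True, True): 41,
    (True, False, False): 266,
    (True, False, True): 2145,
    (True, True, False): 71,
    (True, True, True): 3867,
}


def collapse_test_info(three_way):
    """Sum over test_info to get the (village_info, time_info) table."""
    two_way = defaultdict(int)
    for (village, _test, time), n in three_way.items():
        two_way[(village, time)] += n
    return dict(two_way)


two_way = collapse_test_info(THREE_WAY)
total = sum(THREE_WAY.values())
print(two_way[(True, True)], total)                         # 6012 6405
print(round(100 * two_way[(True, True)] / total))           # 94
print(round(100 * THREE_WAY[(True, True, True)] / total))   # 60
```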
For the reported cases below, the date of birth is not compatible with the age. It seems that the year of birth was mistakenly entered as the year of onset:
# A tibble: 14 x 4
id dob age onset2
<int> <date> <dbl> <date>
1 86 2012-11-06 5 2012-07-08
2 120 2012-08-05 29 2012-07-27
3 181 2012-12-08 8 2012-08-12
4 266 2012-11-08 10 2012-09-13
5 769 2013-11-01 0 2013-05-10
6 1169 2013-06-19 13 2013-06-13
7 2841 2013-12-04 32 2013-12-02
8 3876 2016-06-05 24 2016-05-06
9 3970 2016-08-04 1.8 2016-07-04
10 5051 2016-12-01 58 2016-11-25
11 5246 2017-06-13 4 2017-02-09
12 6441 2017-12-30 5 2017-08-11
13 6980 2017-10-08 11 2017-10-06
14 7954 2018-09-23 6 2018-07-07
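All 14 cases listed above share the same pattern: the date of birth falls after onset2 but within the same calendar year. A hypothetical Python rule that flags this pattern is sketched below. It is inferred from these rows only; the report does not say which rule was used to find them.

```python
from datetime import date


def dob_year_suspect(dob, onset):
    """Flag a date of birth that falls after the onset date in the same calendar year."""
    if dob is None or onset is None:
        return False
    return dob > onset and dob.year == onset.year


print(dob_year_suspect(date(2012, 11, 6), date(2012, 7, 8)))    # case 86: True
print(dob_year_suspect(date(2006, 12, 12), date(2012, 5, 10)))  # case 9: False
```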
There are 99 cases for which the reported age, onset, and date of birth are not quite compatible:
# A tibble: 99 x 9
id dob onset hospitalization consultation sample_collection onset2 age age2
<int> <date> <date> <date> <date> <date> <date> <dbl> <dbl>
1 1 2012-02-10 2012-03-24 2012-03-25 2012-03-25 2013-03-26 2012-03-24 10 0.118
2 25 1992-05-03 2012-05-20 2012-05-23 2012-05-24 2012-05-25 2012-05-20 10 20.1
3 82 1982-02-06 2012-07-08 2012-07-13 2012-07-13 2012-07-13 2012-07-08 27 30.4
4 107 2012-04-07 2012-07-18 2012-07-23 2012-07-23 2012-07-23 2012-07-18 22 0.279
5 144 1992-06-07 2012-08-04 NA 2012-08-08 2012-08-08 2012-08-04 18 20.2
6 171 1968-12-05 NA NA 2012-08-15 2012-08-15 2012-08-08 48 43.7
7 207 2008-10-11 2012-08-17 2012-08-21 2012-08-21 2012-08-21 2012-08-17 13 3.85
8 300 2006-03-01 2012-09-23 2012-09-25 2012-09-25 2012-09-25 2012-09-23 10 6.57
9 429 1983-04-02 2012-11-12 2012-11-16 2012-11-16 2012-11-16 2012-11-12 8 29.6
10 470 1998-04-15 NA 2012-12-05 2012-12-05 2012-12-06 2012-11-28 25 14.6
# … with 89 more rows
where onset2 is calculated from onset, hospitalization, consultation and sample_collection, and age2 is the difference onset2 - dob expressed in years. A CSV file of all these cases is here. Go here if you want to copy and paste this CSV file to your computer.
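The exact rule used to compute onset2 is not given. In the two rows above where onset is missing (cases 171 and 470), onset2 is 7 days before the earliest available date, so the Python sketch below assumes a fixed 7-day lag. It also assumes 365 days per year for age2 and a hypothetical tolerance of 1 year for flagging an inconsistent age. All three constants and the function names are assumptions, not the method used in this report.

```python
from datetime import date, timedelta

OTHER_DATES = ("hospitalization", "consultation", "sample_collection")
ASSUMED_LAG = timedelta(days=7)   # assumption: fits cases 171 and 470 above
DAYS_PER_YEAR = 365               # assumption
AGE_TOLERANCE = 1.0               # hypothetical tolerance, in years


def infer_onset(case):
    """onset2: the recorded onset, else the earliest other date minus the assumed lag."""
    if case.get("onset") is not None:
        return case["onset"]
    others = [case[f] for f in OTHER_DATES if case.get(f) is not None]
    return min(others) - ASSUMED_LAG if others else None


def age_at_onset(case):
    """age2: (onset2 - dob) in years, or None when either date is missing."""
    onset2 = infer_onset(case)
    if onset2 is None or case.get("dob") is None:
        return None
    return (onset2 - case["dob"]).days / DAYS_PER_YEAR


def age_inconsistent(case):
    """True when the reported age and age2 differ by more than the tolerance."""
    age2 = age_at_onset(case)
    if age2 is None or case.get("age") is None:
        return False
    return abs(case["age"] - age2) > AGE_TOLERANCE


# Case 171 from the table above: onset and hospitalization are missing.
case_171 = {"dob": date(1968, 12, 5), "onset": None, "hospitalization": None,
            "consultation": date(2012, 8, 15), "sample_collection": date(2012, 8, 15),
            "age": 48}
print(infer_onset(case_171))                # 2012-08-08
print(round(age_at_onset(case_171), 1))     # 43.7
print(age_inconsistent(case_171))           # True
```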