The CSV file of the PACS data set is here. Go here if you want to copy and paste this CSV file to your computer. The first rows look like:

# A tibble: 8,288 x 17
      id nationality sex     age dob        province district village travel_laos travel_abroad onset      hospitalization consultation
   <int> <chr>       <fct> <dbl> <date>     <chr>    <chr>    <chr>   <chr>       <chr>         <date>     <date>          <date>      
 1     1 laos        fema…    10 2012-02-10 Vientia… Sisatta… thongk… <NA>        <NA>          2012-03-24 2012-03-25      2012-03-25  
 2     2 laos        male      6 NA         Vientia… Hadxaif… nongheo <NA>        <NA>          2012-03-25 2012-03-28      2012-03-28  
 3     3 africa      fema…    25 1987-05-27 <NA>     <NA>     aaa     <NA>        south africa  2012-03-26 NA              2012-03-30  
 4     4 foreigner   male      0 NA         Vientia… Xaysetha phones… Louangphab… <NA>          2012-09-04 NA              2012-10-04  
 5     5 japan       male      5 NA         <NA>     <NA>     <NA>    <NA>        <NA>          NA         NA              NA          
 6     6 laos        fema…     4 2008-06-06 <NA>     <NA>     <NA>    <NA>        <NA>          2012-04-28 2012-06-05      2012-06-05  
 7     7 japan       male     64 NA         <NA>     <NA>     <NA>    <NA>        <NA>          2012-03-05 NA              2012-05-08  
 8     8 laos        male     13 1998-06-17 Vientia… Chantha… Phonsa… bokeo       <NA>          2012-05-11 2012-05-14      2012-05-14  
 9     9 laos        male      5 2006-12-12 Vientia… Xaysetha phonex… <NA>        <NA>          2012-05-10 2012-05-15      2012-05-15  
10    10 laos        male     14 NA         Vientia… Xaysetha chomma… <NA>        <NA>          2012-05-13 2012-05-16      2012-05-16  
# … with 8,278 more rows, and 4 more variables: sample_collection <date>, pcr <fct>, ns1 <fct>, serotype <fct>

Dates of reported cases

As of 2019-06-18, the PACS data set contains 8288 cases reported from 2012-01-02 to 2018-12-31. There are 4170 cases for which more than 15 days separate at least one pair of dates among onset, hospitalization, consultation and sample_collection. A CSV file of these cases is here. Go here if you want to copy and paste this CSV file to your computer. Furthermore, there are 565 cases with no date at all. A CSV file of these cases is here. Go here if you want to copy and paste this CSV file to your computer.

After removing the 4170 cases with date problems as well as the 565 cases with no date at all, and inferring the missing onset dates from hospitalization, consultation or sample_collection, we obtain the time series of the number of suspected cases per week:

[Figure: number of suspected cases per week]
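The date checks described above can be sketched in base R as follows. This is only an illustration on made-up rows: the helper names (`as_days`, `row_range`) are hypothetical, and the rule used to infer a missing onset is not stated in this report, so the sketch assumes that the earliest of the other three dates is used.

```r
# Toy data: column names follow the data set, values are made up.
cases <- data.frame(
  id                = 1:4,
  onset             = as.Date(c("2012-03-24", NA, "2012-09-04", NA)),
  hospitalization   = as.Date(c("2012-03-25", "2012-12-05", NA, NA)),
  consultation      = as.Date(c("2012-03-25", "2012-12-05", "2012-10-04", NA)),
  sample_collection = as.Date(c("2012-03-26", "2012-12-06", "2012-10-04", NA))
)
date_cols <- c("onset", "hospitalization", "consultation", "sample_collection")

# Dates as numbers of days: one row per case, one column per date variable.
as_days <- function(df) do.call(cbind, lapply(df, as.numeric))

# Row-wise min or max that returns NA (not Inf) when a row has no date at all.
row_range <- function(m, f) {
  apply(m, 1, function(x) if (all(is.na(x))) NA_real_ else f(x, na.rm = TRUE))
}

days <- as_days(cases[date_cols])
cases$no_date <- rowSums(!is.na(days)) == 0

# Largest gap (in days) between any pair of available dates.
cases$max_gap <- row_range(days, max) - row_range(days, min)
max_gap_days <- 15  # threshold quoted in the text
cases$date_problem <- !cases$no_date & cases$max_gap > max_gap_days

keep <- cases[!cases$no_date & !cases$date_problem, ]

# Assumption: a missing onset is replaced by the earliest other date.
other_days  <- as_days(keep[setdiff(date_cols, "onset")])
first_other <- as.Date(row_range(other_days, min), origin = "1970-01-01")
keep$onset2 <- keep$onset
missing_onset <- is.na(keep$onset2)
keep$onset2[missing_onset] <- first_other[missing_onset]

# Weekly counts; cut() labels each week by its first day (Monday by default).
weekly <- table(cut(keep$onset2, breaks = "week"))
```

Weeks without any case appear in `weekly` with a count of 0, which is convenient for plotting a continuous time series.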

Confirmation tests

Here the presence of a confirmation test is based on the information in the variables pcr, ns1 and serotype. Indeed, some cases without any information in pcr or ns1 may still have an identified serotype. For example:

# A tibble: 1 x 4
     id pcr   ns1   serotype
  <int> <fct> <fct> <fct>   
1   955 <NA>  <NA>  dengue_3

The split of data according to the availability of time information and confirmation test is:

       time_info
tested  FALSE TRUE  Sum
  FALSE   391 2640 3031
  TRUE    174 5083 5257
  Sum     565 7723 8288

Among the cases for which a confirmation test is available, the split of data according to positivity and time information is:

         time_info
confirmed FALSE TRUE  Sum
    FALSE    40 1458 1498
    TRUE    134 3625 3759
    Sum     174 5083 5257
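The two tables above can be produced with `table()` and `addmargins()`. The sketch below runs on made-up rows, and the rule used to derive `tested` and `confirmed` is an assumption (a case is taken as confirmed if PCR or NS1 is positive or a serotype is identified, and as tested if it is confirmed or if PCR or NS1 is negative); the exact rule behind the counts of this report may differ.

```r
# Toy data; the first row mirrors case 955 (serotype only), the rest is made up.
cases <- data.frame(
  id        = c(955, 1, 2, 3, 4),
  pcr       = c(NA, "positive", "negative", "not tested", NA),
  ns1       = c(NA, "not tested", "negative", NA, NA),
  serotype  = c("dengue_3", "not_tested", NA, NA, NA),
  time_info = c(FALSE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

identified <- c("dengue_1", "dengue_2", "dengue_3", "dengue_4")

# %in% treats NA as "no match", so missing values never count as positive.
positive <- cases$pcr %in% "positive" |
  cases$ns1 %in% "positive" |
  cases$serotype %in% identified
negative <- !positive &
  (cases$pcr %in% "negative" | cases$ns1 %in% "negative")

cases$tested    <- positive | negative            # assumed definition
cases$confirmed <- ifelse(cases$tested, positive, NA)

# Availability of a test versus availability of time information.
tab_tested <- addmargins(table(tested = cases$tested,
                               time_info = cases$time_info))

# Same split, restricted to tested cases, by positivity.
tab_confirmed <- with(cases[cases$tested, ],
                      addmargins(table(confirmed, time_info)))
```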

Out of the 8288 reported cases, 5257 (63 %) have a conclusive confirmation test, of which 3759 (72 %) are positive. The detail of the PCR and NS1 results is:

              NS1
PCR            positive equivocal negative not finished not tested   NA  Sum
  positive          267         1      106            2       1784 1445 3605
  equivocal           2         0        3            0         11    0   16
  negative          145         0     1498            2        628 1339 3612
  not finished        0         0        0            0          1    0    1
  not tested          0         0        3            0         16    0   19
  NA                  6         0       15            0          1 1013 1035
  Sum               420         1     1625            4       2441 3797 8288

Stratifying the reported cases by whether or not they have a date problem (i.e. the 4170 cases with date problems as well as the 565 cases with no dates at all) gives, for the cases with a problem:

              NS1
PCR            positive equivocal negative not finished not tested   NA  Sum
  positive          123         0       40            1        885  639 1688
  equivocal           1         0        2            0          4    0    7
  negative           47         0      691            2        349  637 1726
  not finished        0         0        0            0          0    0    0
  not tested          0         0        2            0         11    0   13
  NA                  4         0        9            0          0  723  736
  Sum               175         0      744            3       1249 1999 4170

and, for the remaining cases:

              NS1
PCR            positive equivocal negative not finished not tested   NA  Sum
  positive          144         1       66            1        899  806 1917
  equivocal           1         0        1            0          7    0    9
  negative           98         0      807            0        279  702 1886
  not finished        0         0        0            0          1    0    1
  not tested          0         0        1            0          5    0    6
  NA                  2         0        6            0          1  290  299
  Sum               245         1      881            1       1192 1798 4118
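PCR by NS1 tables of this kind, overall and per stratum, can be sketched as below. The rows and the `date_problem` flag are made up, and `cross_tab` is a hypothetical helper; `useNA = "always"` keeps the missing values as their own row and column, as in the tables above.

```r
lab <- c("positive", "equivocal", "negative", "not finished", "not tested")

# Toy data: test results and a made-up date_problem flag.
tests <- data.frame(
  pcr = factor(c("positive", "positive", "negative", NA, "not tested", "negative"),
               levels = lab),
  ns1 = factor(c("positive", NA, "negative", "negative", "not tested", "positive"),
               levels = lab),
  date_problem = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Cross-tabulation with an NA row and column and with marginal sums.
cross_tab <- function(d) {
  addmargins(table(PCR = d$pcr, NS1 = d$ns1, useNA = "always"))
}

overall <- cross_tab(tests)

# One table per stratum; the list elements are named "FALSE" and "TRUE".
by_stratum <- lapply(split(tests, tests$date_problem), cross_tab)
```

Because the strata partition the data, the grand totals of the stratum tables add up to the grand total of the overall table.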

The status of the serotype tests is as follows:

# A tibble: 8 x 2
  serotype           n
  <fct>          <int>
1 dengue_1         120
2 dengue_2          56
3 dengue_3         579
4 dengue_4          20
5 not_identified   203
6 not_finished      38
7 not_tested      2702
8 NA              4570

Geography

The numbers of available and missing values for province, district and village are:

$province

FALSE  TRUE 
  293  7995 

$district

FALSE  TRUE 
  372  7916 

$village

FALSE  TRUE 
  415  7873 

where TRUE means available information and FALSE means missing information. The combinations of available and missing values for these 3 variables are:

# A tibble: 6 x 4
  province district village     n
  <lgl>    <lgl>    <lgl>   <int>
1 FALSE    FALSE    FALSE     290
2 FALSE    FALSE    TRUE        3
3 TRUE     FALSE    FALSE      49
4 TRUE     FALSE    TRUE       30
5 TRUE     TRUE     FALSE      76
6 TRUE     TRUE     TRUE     7840
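The availability counts and their combinations can be sketched in base R as follows. The rows are a small made-up sample (the place names are borrowed from this report, but their combination is only illustrative).

```r
# Toy data with the three geographic variables.
geo <- data.frame(
  province = c("Vientiane [prefecture]", NA, "Vientiane", NA, "Vientiane"),
  district = c("Hadxaifong", NA, NA, NA, NA),
  village  = c("nongheo", "aaa", "nahone", NA, "vang"),
  stringsAsFactors = FALSE
)

# TRUE means available information, FALSE means missing information.
available <- as.data.frame(!is.na(geo))

# Counts of FALSE and TRUE for each variable separately.
per_variable <- lapply(available, table)

# Number of cases for each observed combination of the three indicators.
available$n <- 1L
combos <- aggregate(n ~ province + district + village,
                    data = available, FUN = sum)
```

Only the combinations that actually occur are returned, which is why the table above has 6 rows rather than 8.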

The reported cases with village information but missing province information:

# A tibble: 3 x 4
     id province district village       
  <int> <chr>    <chr>    <chr>         
1     3 <NA>     <NA>     aaa           
2   827 <NA>     <NA>     14 km         
3   906 <NA>     <NA>     saphathongkang

The reported cases with village information but missing district information:

# A tibble: 33 x 4
      id province               district village       
   <int> <chr>                  <chr>    <chr>         
 1     3 <NA>                   <NA>     aaa           
 2   355 Vientiane [prefecture] <NA>     pakkayoung    
 3   615 Vientiane              <NA>     nahone        
 4   650 Vientiane              <NA>     vang          
 5   733 Vientiane              <NA>     napamai       
 6   773 Vientiane [prefecture] <NA>     nonehai       
 7   806 Vientiane              <NA>     namthern      
 8   818 Vientiane              <NA>     nonehai       
 9   827 <NA>                   <NA>     14 km         
10   906 <NA>                   <NA>     saphathongkang
# … with 23 more rows

A CSV file of all these cases is here. Go here if you want to copy and paste this CSV file to your computer.

WE SHOULD BE ABLE TO FIND THESE DISTRICTS BASED ON VILLAGE (AND PROVINCE) INFORMATION.

The province names that are not official Lao province names:

[1] NA          "Hanoi"     "EntomoGeo" "Jiangsu"  
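A check of this kind can be sketched with `setdiff()`. The reference vector `official` below is an assumption: it lists the 18 names as they are spelled in the province table of this report, not a list taken from an external gazetteer, and the `province` vector is a made-up sample.

```r
# Assumed reference list, using the spellings found in this report.
official <- c(
  "Vientiane [prefecture]", "Saravan", "Attapu", "Vientiane", "Bolikhamxai",
  "Louangphrabang", "Savannakhét", "Khammouan", "Champasak", "Xiangkhoang",
  "Xaisômboun", "Louang Namtha", "Houaphan", "Xaignabouri", "Oudômxai",
  "Xékong", "Bokeo", "Phôngsali"
)

# Toy sample of reported province values, including a missing one.
province <- c("Vientiane [prefecture]", NA, "Hanoi", "Saravan",
              "EntomoGeo", "Jiangsu", "Hanoi")

# Distinct reported values that are not in the reference list (NA included).
not_official <- setdiff(province, official)
```

`setdiff()` drops duplicates and keeps `NA`, so missing provinces show up in the result alongside the unexpected names.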

80 % of the 7995 cases with province information (6405 cases) are from Vientiane prefecture, which is 77 % of all 8288 reported cases:

# A tibble: 22 x 2
   province                   n
   <chr>                  <int>
 1 Vientiane [prefecture]  6405
 2 Saravan                  588
 3 Attapu                   411
 4 <NA>                     293
 5 Vientiane                285
 6 Bolikhamxai               78
 7 Louangphrabang            53
 8 Savannakhét               43
 9 Khammouan                 26
10 Champasak                 25
11 Xiangkhoang               19
12 Xaisômboun                13
13 Louang Namtha             12
14 Houaphan                  10
15 Xaignabouri                8
16 Oudômxai                   5
17 Xékong                     5
18 Bokeo                      4
19 Phôngsali                  2
20 EntomoGeo                  1
21 Hanoi                      1
22 Jiangsu                    1
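Counts per province and the share of each province can be sketched as follows, on a made-up vector. The share depends on the denominator, so the sketch computes it both over all cases and over the cases with a known province.

```r
# Toy sample: 7 cases from Vientiane prefecture, 2 from Saravan, 1 unknown.
province <- c(rep("Vientiane [prefecture]", 7), rep("Saravan", 2), NA)

# Counts per province, missing values included, largest first.
counts <- sort(table(province, useNA = "ifany"), decreasing = TRUE)

# Percentages over all cases and over cases with a known province.
share_all   <- 100 * counts / length(province)
share_known <- 100 * counts / sum(!is.na(province))
```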

The distribution of cases among the different districts of Vientiane prefecture looks like:

# A tibble: 11 x 2
   district              n
   <chr>             <int>
 1 Xaythany           1846
 2 Xaysetha           1511
 3 Sisattanak          878
 4 Chanthabuly         795
 5 Hadxaifong          634
 6 Sikhottabong        464
 7 Naxaithong          147
 8 Pakngeum district    67
 9 <NA>                 40
10 Sangthong            21
11 Bolikhamxay           2

There is village information for 7873 cases (95 % of the total number of cases). In Vientiane prefecture, the split of cases according to the availability of village, test and time information is:

# A tibble: 8 x 4
  village_info test_info time_info     n
  <lgl>        <lgl>     <lgl>     <int>
1 FALSE        FALSE     FALSE         4
2 FALSE        FALSE     TRUE          8
3 FALSE        TRUE      FALSE         3
4 FALSE        TRUE      TRUE         41
5 TRUE         FALSE     FALSE       266
6 TRUE         FALSE     TRUE       2145
7 TRUE         TRUE      FALSE        71
8 TRUE         TRUE      TRUE       3867

which means that there are 3867 cases (60 %) in Vientiane prefecture for which we have village, time and confirmation test information. If we consider all cases of Vientiane prefecture (tested or not), the split then becomes:

            time_info
village_info FALSE TRUE  Sum
       FALSE     7   49   56
       TRUE    337 6012 6349
       Sum     344 6061 6405

In Vientiane prefecture, there are thus 6012 cases (94 %) for which we have both time and village information.

Ages

For the reported cases below, the dates of birth are not compatible with the age. It seems that the year of the date of birth has incorrectly been taken as the same as the year of the onset:

# A tibble: 14 x 4
      id dob          age onset2    
   <int> <date>     <dbl> <date>    
 1    86 2012-11-06   5   2012-07-08
 2   120 2012-08-05  29   2012-07-27
 3   181 2012-12-08   8   2012-08-12
 4   266 2012-11-08  10   2012-09-13
 5   769 2013-11-01   0   2013-05-10
 6  1169 2013-06-19  13   2013-06-13
 7  2841 2013-12-04  32   2013-12-02
 8  3876 2016-06-05  24   2016-05-06
 9  3970 2016-08-04   1.8 2016-07-04
10  5051 2016-12-01  58   2016-11-25
11  5246 2017-06-13   4   2017-02-09
12  6441 2017-12-30   5   2017-08-11
13  6980 2017-10-08  11   2017-10-06
14  7954 2018-09-23   6   2018-07-07

There are 99 cases for which the reported age, onset, and date of birth are not quite compatible:

# A tibble: 99 x 9
      id dob        onset      hospitalization consultation sample_collection onset2       age   age2
   <int> <date>     <date>     <date>          <date>       <date>            <date>     <dbl>  <dbl>
 1     1 2012-02-10 2012-03-24 2012-03-25      2012-03-25   2013-03-26        2012-03-24    10  0.118
 2    25 1992-05-03 2012-05-20 2012-05-23      2012-05-24   2012-05-25        2012-05-20    10 20.1  
 3    82 1982-02-06 2012-07-08 2012-07-13      2012-07-13   2012-07-13        2012-07-08    27 30.4  
 4   107 2012-04-07 2012-07-18 2012-07-23      2012-07-23   2012-07-23        2012-07-18    22  0.279
 5   144 1992-06-07 2012-08-04 NA              2012-08-08   2012-08-08        2012-08-04    18 20.2  
 6   171 1968-12-05 NA         NA              2012-08-15   2012-08-15        2012-08-08    48 43.7  
 7   207 2008-10-11 2012-08-17 2012-08-21      2012-08-21   2012-08-21        2012-08-17    13  3.85 
 8   300 2006-03-01 2012-09-23 2012-09-25      2012-09-25   2012-09-25        2012-09-23    10  6.57 
 9   429 1983-04-02 2012-11-12 2012-11-16      2012-11-16   2012-11-16        2012-11-12     8 29.6  
10   470 1998-04-15 NA         2012-12-05      2012-12-05   2012-12-06        2012-11-28    25 14.6  
# … with 89 more rows

where onset2 is calculated from onset, hospitalization, consultation and sample_collection, and age2 = onset2 - dob, expressed in years. A CSV file of all these cases is here. Go here if you want to copy and paste this CSV file to your computer.
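The age checks can be sketched as follows. The first three rows reuse values from the table above and the fourth row is made up. Two things are assumptions: the conversion from days to years (365 days per year) and the `tolerance_years` threshold, which is hypothetical since the report does not state how "not quite compatible" was defined.

```r
# Rows 1 to 3 come from the table above; row 4 is a made-up compatible case.
ages <- data.frame(
  id     = c(1, 25, 82, 9999),
  dob    = as.Date(c("2012-02-10", "1992-05-03", "1982-02-06", "2000-01-01")),
  onset2 = as.Date(c("2012-03-24", "2012-05-20", "2012-07-08", "2012-01-01")),
  age    = c(10, 10, 27, 12)
)

# Age at onset in years; the 365-day year is an assumption.
days_per_year <- 365
ages$age2 <- as.numeric(difftime(ages$onset2, ages$dob, units = "days")) /
  days_per_year

# Hypothetical tolerance on the gap between reported and computed ages.
tolerance_years <- 1
ages$incompatible <- abs(ages$age - ages$age2) > tolerance_years

# Date of birth recorded with the same year as the onset.
ages$same_year <- format(ages$dob, "%Y") == format(ages$onset2, "%Y")
```

Combining `same_year` with `incompatible` gives one possible way to isolate the cases where the year of the date of birth seems to have been copied from the onset.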