Dates of reported cases

As of 2019-06-18, the PACS data set contains 8288 cases reported from 2012-01-02 to 2018-12-31. For 4170 of these cases, at least one pair of dates among onset, hospitalization, consultation and sample_collection is more than 15 days apart. A CSV file of these cases is here. Go here if you want to copy and paste this CSV file to your computer. Furthermore, there are 565 cases with no date at all. A CSV file of these cases is here. Go here if you want to copy and paste this CSV file to your computer. After removing the 4170 cases with date problems and the 565 cases with no dates at all, and inferring the missing onset dates from hospitalization, consultation or sample_collection, the time series of the number of suspected cases per week looks like:
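The date checks described above can be sketched as follows. This is a minimal illustration, not the code used to produce the report: the dictionary layout of a case, the fallback order used to infer a missing onset (hospitalization, then consultation, then sample_collection) and the use of ISO weeks are all assumptions, and the example cases are made up.

```python
from collections import Counter
from datetime import date
from itertools import combinations

# Date variables named in the report, in the (assumed) fallback order.
DATE_FIELDS = ("onset", "hospitalization", "consultation", "sample_collection")
MAX_GAP_DAYS = 15


def available_dates(case):
    """Return the non-missing dates of a case, in DATE_FIELDS order."""
    return [case[f] for f in DATE_FIELDS if case.get(f) is not None]


def has_date_problem(case):
    """True if at least one pair of dates is more than MAX_GAP_DAYS apart."""
    return any(abs((a - b).days) > MAX_GAP_DAYS
               for a, b in combinations(available_dates(case), 2))


def inferred_onset(case):
    """Reported onset if present, else the first other available date."""
    dates = available_dates(case)
    return dates[0] if dates else None


def weekly_counts(cases):
    """Count usable cases per ISO (year, week) of their inferred onset."""
    counts = Counter()
    for case in cases:
        onset = inferred_onset(case)
        if onset is None or has_date_problem(case):
            continue  # no date at all, or inconsistent dates
        iso = onset.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return counts


# Made-up cases, for illustration only.
example_cases = [
    {"onset": date(2018, 7, 2), "hospitalization": date(2018, 7, 5)},
    {"onset": None, "consultation": date(2018, 7, 4)},          # onset inferred
    {"onset": date(2018, 1, 1), "sample_collection": date(2018, 7, 6)},  # gap > 15 days
    {},                                                          # no date at all
]
print(weekly_counts(example_cases))
```

Only the first two example cases are counted: the third is dropped because its dates are too far apart, the fourth because it has no date.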

Confirmation tests

Here the presence of a confirmation test is based on the information in the variables pcr, ns1 and serotype. Indeed, some cases without any information in pcr or ns1 may still have an identified serotype. For example:

The split of the data according to the availability of a confirmation test (rows, tested) and of time information (columns) is:

tested  FALSE TRUE  Sum
  FALSE   391 2640 3031
  TRUE    174 5083 5257
  Sum     565 7723 8288

Among the cases for which a confirmation test is available, the split of the data according to positivity (rows, confirmed) and availability of time information (columns) is:

confirmed FALSE TRUE  Sum
    FALSE    40 1458 1498
    TRUE    134 3625 3759
    Sum     174 5083 5257

Out of the 8288 reported cases, 5257 (63 %) have a conclusive confirmation test, of which 3759 (72 %) are positive:

PCR            positive equivocal negative not finished not tested   NA  Sum
  positive          267         1      106            2       1784 1445 3605
  equivocal           2         0        3            0         11    0   16
  negative          145         0     1498            2        628 1339 3612
  not finished        0         0        0            0          1    0    1
  not tested          0         0        3            0         16    0   19
  NA                  6         0       15            0          1 1013 1035
  Sum               420         1     1625            4       2441 3797 8288
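The totals quoted above can be tied back to this cross-tabulation with a little arithmetic. The sketch below copies the counts from the table; the column variable is not named in the table and is assumed here to be the ns1 result. The classification rule it tests (positive if either test is positive, negative only if both are negative) is one reading that is consistent with the totals, not a rule stated in the report.

```python
LEVELS = ["positive", "equivocal", "negative", "not finished", "not tested", "NA"]

# Counts copied from the table above: rows are the PCR result, columns are
# the second test result (assumed to be ns1), both in LEVELS order.
FULL = [
    [267, 1, 106, 2, 1784, 1445],
    [2, 0, 3, 0, 11, 0],
    [145, 0, 1498, 2, 628, 1339],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 3, 0, 16, 0],
    [6, 0, 15, 0, 1, 1013],
]

POS = LEVELS.index("positive")
NEG = LEVELS.index("negative")


def percent(part, whole):
    """Percentage rounded to the nearest integer, as in the text."""
    return round(100 * part / whole)


total = sum(map(sum, FULL))
# Positive on at least one test: positive row + positive column - overlap.
any_positive = sum(FULL[POS]) + sum(row[POS] for row in FULL) - FULL[POS][POS]
# Negative on both tests.
both_negative = FULL[NEG][NEG]

print("total cases:", total)
print("positive on at least one test:", any_positive)
print("negative on both tests:", both_negative)
print("tested share:", percent(5257, total), "%")
print("positive share among tested:", percent(3759, 5257), "%")
```

Under this reading, the both-negative count equals the 1498 unconfirmed cases, and the any-positive count is one short of the 3759 confirmed cases. That would be consistent with a single case confirmed through its serotype alone, although the report does not say so explicitly.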

Stratifying the reported cases by whether or not they have a date problem (the 4170 cases with date problems, as well as the 565 cases with no dates at all) gives the following two tables, whose totals are 4170 and 4118 cases respectively:

PCR            positive equivocal negative not finished not tested   NA  Sum
  positive          123         0       40            1        885  639 1688
  equivocal           1         0        2            0          4    0    7
  negative           47         0      691            2        349  637 1726
  not finished        0         0        0            0          0    0    0
  not tested          0         0        2            0         11    0   13
  NA                  4         0        9            0          0  723  736
  Sum               175         0      744            3       1249 1999 4170


PCR            positive equivocal negative not finished not tested   NA  Sum
  positive          144         1       66            1        899  806 1917
  equivocal           1         0        1            0          7    0    9
  negative           98         0      807            0        279  702 1886
  not finished        0         0        0            0          1    0    1
  not tested          0         0        1            0          5    0    6
  NA                  2         0        6            0          1  290  299
  Sum               245         1      881            1       1192 1798 4118
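As a consistency check, the two stratified tables should add up, cell by cell, to the full cross-tabulation of the 8288 cases. The sketch below copies the counts from the three tables (margins excluded). Labelling the first stratum as the one with date problems is an inference from its total of 4170, not something the tables state.

```python
# Counts copied from the three tables, without the Sum margins.
# Rows and columns follow the order: positive, equivocal, negative,
# not finished, not tested, NA.
FULL = [
    [267, 1, 106, 2, 1784, 1445],
    [2, 0, 3, 0, 11, 0],
    [145, 0, 1498, 2, 628, 1339],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 3, 0, 16, 0],
    [6, 0, 15, 0, 1, 1013],
]
FIRST_STRATUM = [   # total 4170, assumed to be the cases with date problems
    [123, 0, 40, 1, 885, 639],
    [1, 0, 2, 0, 4, 0],
    [47, 0, 691, 2, 349, 637],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 11, 0],
    [4, 0, 9, 0, 0, 723],
]
SECOND_STRATUM = [  # total 4118, the remaining cases
    [144, 1, 66, 1, 899, 806],
    [1, 0, 1, 0, 7, 0],
    [98, 0, 807, 0, 279, 702],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 5, 0],
    [2, 0, 6, 0, 1, 290],
]


def grand_total(table):
    """Sum of all cells of a table given as a list of rows."""
    return sum(map(sum, table))


def add_tables(a, b):
    """Cell-by-cell sum of two tables of identical shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


combined = add_tables(FIRST_STRATUM, SECOND_STRATUM)
# Cells where the strata do not add up to the full table (expected: none).
mismatches = [(i, j)
              for i, row in enumerate(FULL)
              for j, value in enumerate(row)
              if combined[i][j] != value]

print(grand_total(FIRST_STRATUM), grand_total(SECOND_STRATUM), grand_total(FULL))
print("mismatching cells:", mismatches)
```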

The status of the serotype tests is as follows:

The number of missing values for province, district and village:

          FALSE TRUE
province    293 7995
district    372 7916
village     415 7873

where TRUE means available information and FALSE means missing information. The combinations of missing values for these 3 variables are:

The reported cases with village information but missing district information:

A CSV file of all these cases is here. Go here if you want to copy and paste this CSV file to your computer.
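The two summaries above (combinations of missing location variables, and cases with a village but no district) can be sketched as follows. The record layout, the treatment of empty strings as missing and the example records are assumptions made for illustration; the place names are placeholders.

```python
from collections import Counter

LOCATION_FIELDS = ("province", "district", "village")


def is_available(case, field):
    """True if the field is present and not empty (assumed missing-value coding)."""
    return case.get(field) not in (None, "")


def availability_pattern(case):
    """Tuple of TRUE/FALSE flags for province, district and village."""
    return tuple(is_available(case, f) for f in LOCATION_FIELDS)


def pattern_counts(cases):
    """Number of cases for each combination of available/missing variables."""
    return Counter(availability_pattern(c) for c in cases)


def village_without_district(cases):
    """Cases that report a village but no district."""
    return [c for c in cases
            if is_available(c, "village") and not is_available(c, "district")]


# Made-up records with placeholder names, for illustration only.
example_cases = [
    {"province": "P1", "district": "D1", "village": "V1"},
    {"province": "P1", "district": None, "village": "V2"},
    {"province": "P1", "district": "D2", "village": None},
    {"province": None, "district": None, "village": None},
]
for pattern, n in pattern_counts(example_cases).items():
    print(dict(zip(LOCATION_FIELDS, pattern)), n)
print(village_without_district(example_cases))
```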


The province names that are not official Lao province names:

[1] NA          "Hanoi"     "EntomoGeo" "Jiangsu"  
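A check of this kind amounts to comparing the distinct reported names against a reference list. In the minimal sketch below, the reference set is a deliberately truncated, hypothetical subset whose spellings are assumed; a real check would need the full official list, spelled as in the data set. The reported values are made up so as to reproduce the output above.

```python
def unofficial_names(reported, official):
    """Distinct reported names (None for missing) absent from the official set,
    in order of first appearance."""
    seen = []
    for name in reported:
        if name not in official and name not in seen:
            seen.append(name)
    return seen


# Hypothetical, truncated reference set; spellings are assumptions.
OFFICIAL_SUBSET = {"Vientiane prefecture", "Champasak", "Savannakhet"}

# Made-up column of reported provinces, for illustration only.
reported_provinces = [
    "Vientiane prefecture", None, "Hanoi", "Champasak",
    "EntomoGeo", "Jiangsu", "Hanoi",
]
print(unofficial_names(reported_provinces, OFFICIAL_SUBSET))
```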

6405 reported cases are from Vientiane prefecture, i.e. 77 % of all 8288 reported cases, or 80 % of the 7995 cases with province information:

The distribution of cases among the different districts of Vientiane prefecture looks like:

There is village information for 7873 cases (95 % of the total number of cases). In Vientiane prefecture, the split of cases according to the available information on village, test and time is:

which means that there are 3867 (60 %) cases in Vientiane prefecture for which we have village, time and confirmation test information. If we consider all cases (tested or not), the split then becomes:

village_info FALSE TRUE  Sum
       FALSE     7   49   56
       TRUE    337 6012 6349
       Sum     344 6061 6405

In Vientiane prefecture, there are 6012 cases (94 %) for which we have both time and village information.
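The percentages quoted for Vientiane prefecture can be recomputed from the counts given in this report. This is only an arithmetic check: the figure of 7995 cases with province information assumes that the first pair of counts in the missing-value listing refers to province, and the column variable of the table above is assumed to be the availability of time information.

```python
# Counts copied from the text and tables of this report.
TOTAL_CASES = 8288
CASES_WITH_PROVINCE = 7995        # assumption: first pair in the missing-value listing
CASES_WITH_VILLAGE = 7873
VILLAGE_TIME_AND_TEST = 3867      # Vientiane cases with village, time and test

# Vientiane prefecture: village_info (outer key) by time information (inner key).
VIENTIANE = {
    False: {False: 7, True: 49},
    True: {False: 337, True: 6012},
}


def percent(part, whole):
    """Percentage rounded to the nearest integer, as in the text."""
    return round(100 * part / whole)


vientiane_total = sum(sum(row.values()) for row in VIENTIANE.values())
village_and_time = VIENTIANE[True][True]

shares = {
    "village information, all cases": percent(CASES_WITH_VILLAGE, TOTAL_CASES),
    "Vientiane, among all cases": percent(vientiane_total, TOTAL_CASES),
    "Vientiane, among cases with province": percent(vientiane_total, CASES_WITH_PROVINCE),
    "Vientiane with village, time and test": percent(VILLAGE_TIME_AND_TEST, vientiane_total),
    "Vientiane with village and time": percent(village_and_time, vientiane_total),
}
for label, value in shares.items():
    print(f"{label}: {value} %")
```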


For the reported cases below, the dates of birth are not compatible with the age. It seems that the year of the date of birth has incorrectly been taken as the same as the year of the onset:

There are 99 cases for which the reported age, onset, and date of birth are not quite compatible:

where onset2 is calculated from onset, hospitalization, consultation and sample_collection, and age2 = onset2 - dob. A CSV file of all these cases is here. Go here if you want to copy and paste this CSV file to your computer.
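The comparison between the reported age and age2 can be sketched as below. The report does not say in which unit age2 is expressed or how large a discrepancy counts as incompatible, so the use of completed years and a tolerance of one year are assumptions, as are the function names and the example values.

```python
from datetime import date


def age_in_years(dob, onset2):
    """Completed years between date of birth and onset (age2 = onset2 - dob)."""
    years = onset2.year - dob.year
    # Subtract one year if the birthday has not yet occurred in the onset year.
    if (onset2.month, onset2.day) < (dob.month, dob.day):
        years -= 1
    return years


def age_mismatch(age, dob, onset2, tolerance=1):
    """True if the reported age differs from age2 by more than `tolerance` years."""
    return abs(age - age_in_years(dob, onset2)) > tolerance


def dob_year_looks_copied(age, dob, onset2):
    """True if dob falls in the onset year although the reported age is at least 1,
    the pattern suspected above for some cases."""
    return dob.year == onset2.year and age >= 1


# Made-up example: reported age 30, but dob recorded in the year of onset.
example = {"age": 30, "dob": date(2018, 3, 1), "onset2": date(2018, 8, 15)}
print(age_in_years(example["dob"], example["onset2"]))
print(age_mismatch(**example))
print(dob_year_looks_copied(**example))
```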