The CSV file of the GPS data set is here. Go here if you want to copy and paste this CSV file to your computer.
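
A minimal sketch of loading the file and counting coordinate rows versus distinct cases. It assumes the CSV has a header row with an id column, like the id, source, longitude and latitude columns printed in the duplicates table; the real file layout may differ, and the file name gps.csv is hypothetical.

```python
import csv
import io


def summarize_gps_csv(lines):
    """Return (number of coordinate rows, number of distinct case ids).

    `lines` is any iterable of CSV text lines, such as an open file.
    Assumes a header row with an "id" column (hypothetical layout).
    """
    rows = list(csv.DictReader(lines))
    ids = {row["id"] for row in rows}
    return len(rows), len(ids)


# Tiny inline stand-in for the real file; coordinates are made up.
example = io.StringIO(
    "id,source,longitude,latitude\n"
    "7681,server,102.60,18.00\n"
    "7681,server,102.61,18.10\n"
    "1,new_gps,102.70,17.90\n"
)
n_rows, n_cases = summarize_gps_csv(example)
print(n_rows, n_cases)  # 3 2

# With the real file (hypothetical name):
# with open("gps.csv", newline="") as f:
#     print(summarize_gps_csv(f))
```

If the layout assumption holds, the real file should give 1841 rows and 1839 distinct ids.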

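There are 1841 coordinates in this file, from 1839 unique cases (two cases have two coordinates each), i.e. 22% of the 8288 cases reported in PACS. By year, the geocoded cases split as follows (perc is the percentage of that year's cases that are geocoded; NA means no date is available):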

# A tibble: 8 x 3
   year   gps  perc
  <dbl> <int> <dbl>
1  2012    95 18.5 
2  2013   593 26.8 
3  2014     5  3.85
4  2015   122 21.3 
5  2016   280 21.8 
6  2017   603 27.7 
7  2018    60  7.21
8    NA    81 14.3 
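
For reference, a sketch of how such a year-by-year summary can be computed, assuming one (year, has_gps) pair per case. This input format is hypothetical and not necessarily how the table above was produced; the percentage is the number of geocoded cases divided by all cases of that year.

```python
from collections import Counter


def gps_by_year(cases):
    """Per year: (number of geocoded cases, percentage of that year's cases).

    `cases` is a list of (year, has_gps) pairs, one per case; year is None
    for cases without any date (shown as NA in the table).
    """
    totals = Counter(year for year, _ in cases)
    geocoded = Counter(year for year, has_gps in cases if has_gps)
    return {year: (geocoded[year], 100 * geocoded[year] / totals[year])
            for year in totals}


# Made-up input: four cases in 2012 (two geocoded), two undated (one geocoded).
cases = [(2012, True), (2012, False), (2012, True), (2012, False),
         (None, True), (None, False)]
print(gps_by_year(cases))  # {2012: (2, 50.0), None: (1, 50.0)}
```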

Considering only confirmed cases, the split is (perc is now the percentage of that year's confirmed cases that are geocoded):

# A tibble: 8 x 3
   year   gps  perc
  <dbl> <int> <dbl>
1  2012    95  46.1
2  2013   572  45.9
3  2014     5  31.2
4  2015   122  45.7
5  2016   137  54.8
6  2017   586  44.3
7  2018    60  18.9
8    NA    24  17.9
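
Comparing the two tables year by year shows what fraction of the geocoded cases are confirmed. A short calculation with the gps columns copied by hand from both tables:

```python
# Geocoded counts per year, copied from the two tables above (None = NA, no date).
all_cases = {2012: 95, 2013: 593, 2014: 5, 2015: 122,
             2016: 280, 2017: 603, 2018: 60, None: 81}
confirmed = {2012: 95, 2013: 572, 2014: 5, 2015: 122,
             2016: 137, 2017: 586, 2018: 60, None: 24}

# Share (%) of geocoded cases that are confirmed, per year.
share = {year: round(100 * confirmed[year] / all_cases[year], 1)
         for year in all_cases}

total_all = sum(all_cases.values())                # 1839
total_conf = sum(confirmed.values())               # 1601
overall = round(100 * total_conf / total_all, 1)   # 87.1
print(share)
```

By this measure, almost all geocoded cases are confirmed in every dated year except 2016 (48.9%), while fewer than a third (29.6%) of the undated geocoded cases are confirmed.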

The two cases with duplicated coordinates are:

# A tibble: 4 x 4
     id source   longitude latitude
  <int> <chr>        <dbl>    <dbl>
1  7681 server        103.     18.0
2  7681 server        103.     18.1
3  6060 whatsapp      103.     18.1
4  6060 whatsapp      103.     18.0
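
Cases like these can be found by counting coordinate rows per id. A minimal sketch, assuming each row is a dict with an "id" key (a hypothetical layout); the last row below is made up.

```python
from collections import Counter


def duplicated_ids(rows):
    """Return the sorted ids that appear on more than one coordinate row."""
    counts = Counter(row["id"] for row in rows)
    return sorted(i for i, n in counts.items() if n > 1)


rows = [
    {"id": 7681, "source": "server", "latitude": 18.0},
    {"id": 7681, "source": "server", "latitude": 18.1},
    {"id": 6060, "source": "whatsapp", "latitude": 18.1},
    {"id": 6060, "source": "whatsapp", "latitude": 18.0},
    {"id": 1, "source": "new_gps", "latitude": 17.9},  # made-up single row
]
print(duplicated_ids(rows))  # [6060, 7681]
```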

There are 81 geocoded cases with no date at all (onset, hospitalization, consultation or sample collection):

# A tibble: 81 x 5
      id onset      hospitalization consultation sample_collection
   <int> <date>     <date>          <date>       <date>           
 1   937 NA         NA              NA           NA               
 2   941 NA         NA              NA           NA               
 3  1112 NA         NA              NA           NA               
 4  1211 NA         NA              NA           NA               
 5  1281 NA         NA              NA           NA               
 6  1282 NA         NA              NA           NA               
 7  1286 NA         NA              NA           NA               
 8  1489 NA         NA              NA           NA               
 9  1572 NA         NA              NA           NA               
10  1869 NA         NA              NA           NA               
# … with 71 more rows
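
A sketch of the selection behind this table: keep the cases for which every date field is missing. It assumes each case is a dict where a missing date is either absent or None (a hypothetical representation); the field names are the column names above, and the two dated cases are made up.

```python
from datetime import date

DATE_FIELDS = ("onset", "hospitalization", "consultation", "sample_collection")


def undated_ids(cases):
    """Ids of cases where every date field is missing (absent or None)."""
    return [c["id"] for c in cases
            if all(c.get(f) is None for f in DATE_FIELDS)]


cases = [
    {"id": 937},                                        # no date at all
    {"id": 2, "onset": date(2013, 7, 1)},               # made-up dated case
    {"id": 3, "sample_collection": date(2017, 8, 15)},  # made-up dated case
]
print(undated_ids(cases))  # [937]
```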

The CSV file of these undated cases is here. Go here if you want to copy and paste this CSV file to your computer. The split of all 1841 coordinates by source is:

# A tibble: 4 x 2
  source       n
  <chr>    <int>
1 new_gps    821
2 old_gps    736
3 server      31
4 whatsapp   253
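
These counts are per coordinate row, not per case, which is why they add up to 1841 rather than 1839. A quick check with the counts copied from the table:

```python
from collections import Counter

# Coordinate rows per source, copied from the table above. With per-row
# records this would be Counter(row["source"] for row in rows).
by_source = Counter({"new_gps": 821, "old_gps": 736,
                     "server": 31, "whatsapp": 253})

n_rows = sum(by_source.values())  # 1841 coordinate rows
n_cases = 1839                    # distinct geocoded cases
extra_rows = n_rows - n_cases     # 2: one extra row each for ids 7681 and 6060
print(n_rows, extra_rows)         # 1841 2
```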

The split of cases according to whether they were tested and whether GPS coordinates are available is:

       gps
tested  FALSE TRUE  Sum
  FALSE  2800  231 3031
  TRUE   3649 1608 5257
  Sum    6449 1839 8288
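
The layout is a two-way contingency table with row and column totals added. A dependency-free sketch of the same computation, assuming one (tested, has_gps) pair of booleans per case; here the input is simply rebuilt from the published cell counts, so the margins should come out as printed.

```python
from collections import Counter


def crosstab_with_margins(pairs):
    """Counts of (row_flag, col_flag) pairs, plus row, column and grand totals.

    Keys are (row, col) with row/col in False, True, "Sum", mirroring the
    layout of the printed table.
    """
    cells = Counter(pairs)
    table = {}
    for r in (False, True):
        for c in (False, True):
            table[(r, c)] = cells[(r, c)]
        table[(r, "Sum")] = table[(r, False)] + table[(r, True)]
    for c in (False, True, "Sum"):
        table[("Sum", c)] = table[(False, c)] + table[(True, c)]
    return table


# One (tested, has_gps) pair per case, rebuilt from the published cell counts.
pairs = ([(False, False)] * 2800 + [(False, True)] * 231
         + [(True, False)] * 3649 + [(True, True)] * 1608)
t = crosstab_with_margins(pairs)
print(t[(False, "Sum")], t[(True, "Sum")], t[("Sum", "Sum")])  # 3031 5257 8288
```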

The geocoded cases that tested negative are:

# A tibble: 7 x 3
     id gps   confirmed
  <int> <lgl> <lgl>    
1  2218 TRUE  FALSE    
2  2303 TRUE  FALSE    
3  2375 TRUE  FALSE    
4  2423 TRUE  FALSE    
5  4925 TRUE  FALSE    
6  5052 TRUE  FALSE    
7  7097 TRUE  FALSE    
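
The numbers in the different tables are consistent: the 1608 tested, geocoded cases are the 1601 geocoded cases of the confirmed-only table plus these 7 negatives, and the remaining 231 geocoded cases were not tested. A quick arithmetic check with the counts copied by hand:

```python
# Counts copied from the tables in this post.
geocoded = 1839                # gps = TRUE column total
tested_and_geocoded = 1608     # tested = TRUE, gps = TRUE cell
negative_and_geocoded = 7      # rows of the table above
confirmed_by_year = [95, 572, 5, 122, 137, 586, 60, 24]  # confirmed-only table

confirmed_and_geocoded = tested_and_geocoded - negative_and_geocoded
untested_and_geocoded = geocoded - tested_and_geocoded

# The tested, geocoded cases split exactly into confirmed and negative ones.
assert confirmed_and_geocoded == sum(confirmed_by_year) == 1601
print(confirmed_and_geocoded, untested_and_geocoded)  # 1601 231
```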

The split of cases according to the availability of village information and of GPS coordinates is:

            gps
village_info FALSE TRUE  Sum
       FALSE   400   15  415
       TRUE   6049 1824 7873
       Sum    6449 1839 8288
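
A small helper to check that the Sum row and column of a printed 2x2 table match its cells, applied to the table above. The dict layout (row label mapped to the FALSE, TRUE and Sum counts) is just an assumption for this sketch.

```python
def margins_consistent(table):
    """True if the Sum row/column of a 2x2 table agree with its cells.

    `table` maps a row label (False, True, "Sum") to
    [count for FALSE, count for TRUE, Sum].
    """
    rows = [table[False], table[True]]
    row_sums_ok = all(a + b == s for a, b, s in rows)
    col_sums_ok = all(rows[0][j] + rows[1][j] == table["Sum"][j]
                      for j in range(3))
    return row_sums_ok and col_sums_ok


village_by_gps = {
    False: [400, 15, 415],
    True: [6049, 1824, 7873],
    "Sum": [6449, 1839, 8288],
}
print(margins_consistent(village_by_gps))  # True
```

The table passes, and its gps = TRUE total (1839) matches the test-by-GPS table; 15 geocoded cases have no village information.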

Here is a map of the geolocated cases:

The blue crosses mark Wattay International Airport, the Vayakorn Inn and the Institut Pasteur du Laos. The same map with a satellite image background: