Chronogram assembly

Introduction

A chronogram is a flexible structure for real-world datasets; it is not limited to a particular pathogen, vaccine schedule, or experimental assay. Its core function is to expand the dates in your dataset into a complete chronological list, with one row per participant per day. This structure is particularly useful when an analysis depends on pinpointing precise timepoints. Here, we illustrate how to incorporate your data into a chronogram. Once adapted for your study, the same code can be reused and extended as subsequent data or assays become available.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(chronogram)

TLDR

data(smallstudy)

cg <- cg_assemble(
  start_date = "01012020",
  end_date = "10102021",
  ## the provided metadata ##
  metadata = smallstudy$small_study_metadata,
  ## the column name in the metadata that contains participant IDs ##
  metadata_ids_col = elig_study_id,
  ## column name for dates ##
  calendar_date_col = calendar_date,
  ## the provided experiment data (we have 1 assay, so a list of 1) ##
  experiment_data_list = list(smallstudy$small_study_Ab)
)
#> Checking input parameters...
#> -- checking start date 01012020
#> -- checking end date 10102021
#> -- checking end date later than start date
#> -- checking metadata
#> -- checking experiment data list
#> --- checking experiment data list slot 1
#> Input checks completed
#> Chronogram assembling...
#> -- chrongram_skeleton built
#> -- chrongram built with metadata
#> -- adding experiment data
#> --- adding experiment data slot 1 cols... elig_study_id calendar_date serum_Ab_S ...

## Use cg_add_experiment() for extra assays ##

## cg_save() and cg_load() offer a route to save to disk ##
## - metadata is de-duplicated on disk, and cg_load() checks the object ##
## - save() and load() will work, but are not encouraged ##

Requirements

There are four inputs:

a start date
an end date
some metadata, including a column that contains participant or study IDs
OPTIONAL: experimental data, stored against the participant or study ID used in the metadata and the date of the sample.

NOTE

The start date should fall on or before the earliest data available, and the end date on or after the latest. For example, if your study collected information on prior symptoms at enrolment, and you want to include those in your analysis, use a suitably early start date.

There is no limit to the number of experimental data sources.
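One way to choose these dates is to take the range of every date you hold and pad it on both sides. The sketch below uses base R only; the pooled dates and the 30-day margin are hypothetical choices for illustration, not package defaults.

```r
## Hypothetical dates pooled from all sources (symptoms, doses, samples)
all_dates <- as.Date(c("2020-03-14", "2021-01-05", "2021-02-15", "2021-09-30"))

## Pad the observed range; 30 days is an arbitrary illustrative margin
padding_days <- 30

## Format as ddmmyyyy strings, the format used for start_date and end_date
start <- format(min(all_dates) - padding_days, "%d%m%Y")
end <- format(max(all_dates) + padding_days, "%d%m%Y")

start
end
```

The resulting strings can then be passed as start_date and end_date.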

In brief

cg_assemble() wraps the functions shown below, and in most situations it is the advised route. In this vignette, we unpack its steps one at a time.

## Fictional example data ##
data(smallstudy)

## 5 requirements ##
ids <- smallstudy$small_study_ids

start <- "01012020"
end <- "10102021"

meta <- smallstudy$small_study_metadata
ab <- smallstudy$small_study_Ab # here, we just have antibody data


## Make a chronogram ##
small_study <- chronogram_skeleton(
  ids = ids,
  start_date = start,
  end_date = end,
  ## change this to your ID column name ##
  col_ids = elig_study_id,
  ## change this to your date column name ##
  col_calendar_date = calendar_date
)

small_study <- chronogram(
  small_study,
  meta
)

small_study <- cg_add_experiment(
  small_study,
  ab
)

small_study
#> # A tibble:     1,947 × 10
#> # A chronogram: try summary()
#>    calendar_date elig_study_id   age sex   dose_1 date_dose_1 dose_2 date_dose_2
#>  * <date>        <fct>         <dbl> <fct> <fct>  <date>      <fct>  <date>     
#>  1 2020-01-01    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  2 2020-01-02    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  3 2020-01-03    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  4 2020-01-04    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  5 2020-01-05    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  6 2020-01-06    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  7 2020-01-07    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  8 2020-01-08    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  9 2020-01-09    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#> 10 2020-01-10    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#> # ℹ 1,937 more rows
#> # ℹ 2 more variables: serum_Ab_S <dbl>, serum_Ab_N <dbl>
#> # ★ Dates: calendar_date      ★ IDs: elig_study_id
#> # ★ metadata: age, sex, dose_1, date_dose_1, dose_2, date_dose_2

## Use cg_add_experiment() for extra assays ##

Step-by-step

Generate a chronogram_skeleton object

chronogram_skeleton() returns an object which:

contains two columns: calendar_date (name is set with col_calendar_date) and elig_study_id (name is set with col_ids).
contains a row for each participant for each day.

## Fictional example data ##
data(smallstudy)

## 5 requirements ##
ids <- smallstudy$small_study_ids

start <- "01012020"
end <- "10102021"

meta <- smallstudy$small_study_metadata
ab <- smallstudy$small_study_Ab # here, we just have antibody data


## Make a chronogram_skeleton ##
small_study <- chronogram_skeleton(
  ids = ids,
  start_date = start,
  end_date = end,
  ## change this to your ID column name ##
  col_ids = elig_study_id,
  ## change this to your date column name ##
  col_calendar_date = calendar_date
)

Now print, to check that you have generated the expected results.

small_study
#> # A tibble: 1,947 × 2
#>    calendar_date elig_study_id
#>  * <date>        <fct>        
#>  1 2020-01-01    1            
#>  2 2020-01-02    1            
#>  3 2020-01-03    1            
#>  4 2020-01-04    1            
#>  5 2020-01-05    1            
#>  6 2020-01-06    1            
#>  7 2020-01-07    1            
#>  8 2020-01-08    1            
#>  9 2020-01-09    1            
#> 10 2020-01-10    1            
#> # ℹ 1,937 more rows

Or you may want to try View(small_study) in RStudio.

This chronogram_skeleton is the framework onto which we will add metadata, and experimental data.

The provided col_ids and col_calendar_date are stored as attributes of small_study, so the user does not have to enter them again. Chronogram assumes you are adding on data indexed by those column names.

NOTE

Your study may have data stored under ‘StudyID’ or ‘PID’ etc. This is fine: adjust col_ids = use_whatever_your_StudyID_is.

Similarly, your study may use ‘date’, or ‘date_on_mars’. Adjust: col_calendar_date = use_whatever_your_Study_uses_for_dates.

Dates must be supplied in ddmmyyyy format (anything that lubridate::dmy() can interpret will work). Dates in other formats are likely to fail with an error. The exception is a US-formatted (mmddyyyy) date in which both the day and the month are 12 or less: it parses without error, but as the wrong date:

chronogram_skeleton(
  ids = ids,
  start_date = "01012020",
  ## 1st Dec 2020, provided in mmddyyyy ##
  end_date = "12012020",
  col_ids = elig_study_id,
  col_calendar_date = calendar_date
)
#> # A tibble: 36 × 2
#>    calendar_date elig_study_id
#>  * <date>        <fct>        
#>  1 2020-01-01    1            
#>  2 2020-01-02    1            
#>  3 2020-01-03    1            
#>  4 2020-01-04    1            
#>  5 2020-01-05    1            
#>  6 2020-01-06    1            
#>  7 2020-01-07    1            
#>  8 2020-01-08    1            
#>  9 2020-01-09    1            
#> 10 2020-01-10    1            
#> # ℹ 26 more rows

The above end date is read as 12 January 2020, giving a 12-day interval. Swap to ddmmyyyy to get the expected 11 months:

chronogram_skeleton(
  ids = ids,
  start_date = "01012020",
  ## 1st Dec 2020, provided in the correct ddmmyyyy ##
  end_date = "01122020",
  col_ids = elig_study_id,
  col_calendar_date = calendar_date
)
#> # A tibble: 1,008 × 2
#>    calendar_date elig_study_id
#>  * <date>        <fct>        
#>  1 2020-01-01    1            
#>  2 2020-01-02    1            
#>  3 2020-01-03    1            
#>  4 2020-01-04    1            
#>  5 2020-01-05    1            
#>  6 2020-01-06    1            
#>  7 2020-01-07    1            
#>  8 2020-01-08    1            
#>  9 2020-01-09    1            
#> 10 2020-01-10    1            
#> # ℹ 998 more rows
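The two row counts above can be checked by hand. This sketch uses base R's as.Date() with an explicit day-month-year format as a stand-in for the package's own date parsing, and assumes the three participants of the example study.

```r
## Read every string as ddmmyyyy
start <- as.Date("01012020", format = "%d%m%Y")
us_style <- as.Date("12012020", format = "%d%m%Y") # read as 12 Jan 2020
intended <- as.Date("01122020", format = "%d%m%Y") # read as 1 Dec 2020

## Inclusive number of days between two dates
n_days <- function(from, to) as.integer(to - from) + 1L

## Three participants in the example study
n_participants <- 3L

rows_us_style <- n_days(start, us_style) * n_participants
rows_intended <- n_days(start, intended) * n_participants

rows_us_style # 36, matching the first skeleton
rows_intended # 1008, matching the second skeleton
```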

Generate a chronogram object

Here, we combine the outline data structure (a chronogram_skeleton object) with the metadata. Each participant's line of metadata is repeated on every row for that individual. This repetition is a convenient shortcut for selecting samples that meet particular characteristics (e.g. antibody testing 14-21 d after dose 2) and for plotting based on these characteristics. The extra memory occupied by this repetition is ~25-50 kB for this example.
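To illustrate why the repeated metadata is convenient, here is a minimal sketch of selecting samples taken 14-21 days after dose 2. It runs on a hypothetical toy data frame (not the smallstudy data), with column names borrowed from this vignette and invented antibody values.

```r
library(dplyr)

## Hypothetical mini-chronogram: one participant, one row per day
toy <- data.frame(
  calendar_date = seq(as.Date("2021-02-01"), as.Date("2021-03-05"), by = "day"),
  elig_study_id = factor(1),
  date_dose_2 = as.Date("2021-02-05"), # metadata repeated on every row
  serum_Ab_S = NA_real_
)

## Two hypothetical samples: 10 and 17 days after dose 2
toy$serum_Ab_S[toy$calendar_date == as.Date("2021-02-15")] <- 10000
toy$serum_Ab_S[toy$calendar_date == as.Date("2021-02-22")] <- 9000

## Because date_dose_2 sits on every row, the window is a simple row filter
window <- toy %>%
  mutate(days_since_dose_2 = as.integer(calendar_date - date_dose_2)) %>%
  filter(
    days_since_dose_2 >= 14,
    days_since_dose_2 <= 21,
    !is.na(serum_Ab_S)
  )

window
```

Only the sample taken 17 days after dose 2 is retained; no separate lookup of each participant's dose date is needed.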

The metadata contains age, sex, dates and formulations of doses 1 and 2 (values are plausible, but fictitious). The provided metadata is a tibble, and care has been taken to provide columns of relevant classes (factors, dates etc).

head(meta)
#> # A tibble: 3 × 7
#>   elig_study_id   age sex   dose_1   date_dose_1 dose_2   date_dose_2
#>           <dbl> <dbl> <fct> <fct>    <date>      <fct>    <date>     
#> 1             1    40 F     AZD1222  2021-01-05  AZD1222  2021-02-05 
#> 2             2    45 F     BNT162b2 2021-01-05  BNT162b2 2021-02-05 
#> 3             3    35 M     BNT162b2 2021-01-10  BNT162b2 2021-03-10

The assembly of a chronogram is a join between the chronogram_skeleton and the metadata.

small_study <- chronogram(
  small_study,
  meta
)
small_study
#> # A tibble:     1,947 × 8
#> # A chronogram: try summary()
#>    calendar_date elig_study_id   age sex   dose_1 date_dose_1 dose_2 date_dose_2
#>  * <date>        <fct>         <dbl> <fct> <fct>  <date>      <fct>  <date>     
#>  1 2020-01-01    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  2 2020-01-02    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  3 2020-01-03    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  4 2020-01-04    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  5 2020-01-05    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  6 2020-01-06    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  7 2020-01-07    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  8 2020-01-08    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  9 2020-01-09    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#> 10 2020-01-10    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#> # ℹ 1,937 more rows
#> # ★ Dates: calendar_date      ★ IDs: elig_study_id
#> # ★ metadata: age, sex, dose_1, date_dose_1, dose_2, date_dose_2
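Conceptually, the join described above behaves like a left join of one-row-per-participant metadata onto a one-row-per-participant-per-day skeleton. The sketch below shows that idea with dplyr::left_join() on hypothetical toy data; it is not a claim about how chronogram() is implemented internally.

```r
library(dplyr)

ids <- c(1, 2, 3)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-01-05"), by = "day")

## Toy skeleton: one row per participant per day
toy_skeleton <- data.frame(
  calendar_date = rep(dates, times = length(ids)),
  elig_study_id = rep(ids, each = length(dates))
)

## Toy metadata: one row per participant
toy_meta <- data.frame(
  elig_study_id = ids,
  age = c(40, 45, 35),
  sex = factor(c("F", "F", "M"))
)

## Each metadata row is repeated on every day for that participant
toy_joined <- left_join(toy_skeleton, toy_meta, by = "elig_study_id")

dim(toy_joined) # 15 rows (3 participants x 5 days), 4 columns
```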

Add experimental data to chronogram object

The experimental data provided is for a serum anti-S IgG and a serum anti-N IgG (again: plausible, but fictitious). The experimental data is provided with both a study ID and a calendar date. The data will be joined by these variables, so their column names must exactly match those in the main chronogram. The date to use is the date of the sample, rather than the assay date (which may be months or years later if the sample was frozen). Be wary of duplicates or shared columns when adding experimental data: a warning is provided for any shared columns.

head(ab)
#> # A tibble: 6 × 4
#>   elig_study_id calendar_date serum_Ab_S serum_Ab_N
#>           <dbl> <date>             <dbl>      <dbl>
#> 1             1 2021-01-05           500        100
#> 2             1 2021-01-15          4000        100
#> 3             1 2021-02-03          3750        100
#> 4             1 2021-02-15         10000        100
#> 5             2 2021-01-05             0          0
#> 6             2 2021-01-15          2000          0

We are going to add just this one set of experimental data, but there is no limit to the number of experiments you could add. The process assumes that all runs of the same assay are added at once. For example, if you had 10 runs of anti-S and anti-N (i.e. 10 objects that looked like ab), you should combine these by bind_rows() to make one long object and then proceed to cg_add_experiment().

small_study <- cg_add_experiment(
  small_study,
  ab
)

small_study
#> # A tibble:     1,947 × 10
#> # A chronogram: try summary()
#>    calendar_date elig_study_id   age sex   dose_1 date_dose_1 dose_2 date_dose_2
#>  * <date>        <fct>         <dbl> <fct> <fct>  <date>      <fct>  <date>     
#>  1 2020-01-01    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  2 2020-01-02    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  3 2020-01-03    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  4 2020-01-04    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  5 2020-01-05    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  6 2020-01-06    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  7 2020-01-07    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  8 2020-01-08    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#>  9 2020-01-09    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#> 10 2020-01-10    1                40 F     AZD12… 2021-01-05  AZD12… 2021-02-05 
#> # ℹ 1,937 more rows
#> # ℹ 2 more variables: serum_Ab_S <dbl>, serum_Ab_N <dbl>
#> # ★ Dates: calendar_date      ★ IDs: elig_study_id
#> # ★ metadata: age, sex, dose_1, date_dose_1, dose_2, date_dose_2

## Use cg_add_experiment() for extra assays ##
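The advice above to combine repeated runs of one assay before calling cg_add_experiment() can be sketched as follows. The two run objects are hypothetical (their values echo the first rows of ab), and the duplicate check is a suggested precaution rather than a step the package requires.

```r
library(dplyr)

## Two hypothetical runs of the same assay, with identical column names
run_1 <- data.frame(
  elig_study_id = c(1, 1),
  calendar_date = as.Date(c("2021-01-05", "2021-01-15")),
  serum_Ab_S = c(500, 4000),
  serum_Ab_N = c(100, 100)
)
run_2 <- data.frame(
  elig_study_id = c(2, 2),
  calendar_date = as.Date(c("2021-01-05", "2021-01-15")),
  serum_Ab_S = c(0, 2000),
  serum_Ab_N = c(0, 0)
)

## Stack the runs into one long object
ab_all <- bind_rows(run_1, run_2)

## Count participant-date pairs that appear more than once
n_dup <- ab_all %>%
  count(elig_study_id, calendar_date) %>%
  filter(n > 1) %>%
  nrow()

stopifnot(n_dup == 0)
```

The combined object (here ab_all) would then be the single input to cg_add_experiment() for that assay.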

You might have run the same assay on several different materials, such as anti-S and anti-N IgG testing on serum and mucosal sampling. Labelling your data columns with a scheme like source_test_type (e.g. serum_Ab_S vs nasal_Ab_S vs BAL_Ab_S) is advantageous, as it makes analysis easier. If you wanted to grab all the S IgG data, perhaps to pass to ggplot2, you can use small_study %>% select(contains("Ab_S")). This is not mandatory, and you can still build with whatever column names you like (aside from the col_ids and col_calendar_date that are required).
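As a small illustration of the naming scheme, the sketch below selects every anti-S column from a hypothetical data frame. The nasal_Ab_S column and all values are invented for the example.

```r
library(dplyr)

## Hypothetical assay columns named source_test_type
toy <- data.frame(
  elig_study_id = c(1, 2),
  serum_Ab_S = c(500, 0),
  serum_Ab_N = c(100, 0),
  nasal_Ab_S = c(50, 0)
)

## Keep the ID column plus every column whose name contains "Ab_S"
s_only <- toy %>%
  select(elig_study_id, contains("Ab_S"))

names(s_only)
```

The anti-N column is dropped, while both the serum and nasal anti-S columns are kept.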

If you have complex data that cannot be reduced to a single entry (e.g. scRNAseq data), then adding a column such as PBMC_scRNAseq, with values run or not_run, is useful: you can subset your final chronogram down to an object that looks like colData or AnnData for Bioconductor’s 10x analysis, Seurat, or scanpy: small_study %>% filter(PBMC_scRNAseq == "run").

If your data are already in an SQL (or similar) database, see the assembly from SQL vignette.

Summary

We have constructed a chronogram from a small study of n=3 individuals with a simple set of metadata and a single experimental assay.

SessionInfo

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] chronogram_1.0.0 lubridate_1.9.3  dplyr_1.1.4     
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.5       cli_3.6.3         knitr_1.48        rlang_1.1.4      
#>  [5] xfun_0.49         purrr_1.0.2       generics_0.1.3    textshaping_0.4.0
#>  [9] jsonlite_1.8.9    glue_1.8.0        htmltools_0.5.8.1 ragg_1.3.3       
#> [13] sass_0.4.9        fansi_1.0.6       rmarkdown_2.29    tibble_3.2.1     
#> [17] evaluate_1.0.1    jquerylib_0.1.4   fastmap_1.2.0     yaml_2.3.10      
#> [21] lifecycle_1.0.4   compiler_4.4.2    fs_1.6.5          timechange_0.3.0 
#> [25] pkgconfig_2.0.3   tidyr_1.3.1       systemfonts_1.1.0 digest_0.6.37    
#> [29] R6_2.5.1          tidyselect_1.2.1  utf8_1.2.4        pillar_1.9.0     
#> [33] magrittr_2.0.3    bslib_0.8.0       withr_3.0.2       tools_4.4.2      
#> [37] pkgdown_2.1.1     cachem_1.1.0      desc_1.4.3