Annotate exposures
annotate_exposures.Rmd
Introduction
The chronogram package provides a family of functions to annotate a
chronogram. These all start with cg_annotate_. This vignette explains
how to use these annotation functions. Before using this vignette,
consult vignette("assembly").
This vignette demonstrates annotation of exposures. Exposures can come
from either vaccines or infection episodes, and both of these require
annotation first. See vignette("annotate vaccines") and
vignette("annotate episodes").
Setup
library(chronogram)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
library(patchwork)
We will use the example pre-built chronogram, introduced in
vignette("assembly"), with vaccines and episodes annotated as in
vignette("annotate vaccines") and vignette("annotate episodes"). The
code chunk below is discussed in those vignettes.
data(built_smallstudy)
cg <- built_smallstudy$chronogram
infections_to_add <- built_smallstudy$infections_to_add
## add to chronogram
cg <- cg_add_experiment(
cg,
infections_to_add
)
## annotate vaccines
cg <- cg_annotate_vaccines_count(
cg,
## the prefix to the dose columns: ##
dose = dose,
## the output column name: ##
dose_counter = dose_number,
## the prefix to the date columns: ##
vaccine_date_stem = date_dose,
## use 14d to 'star' after a dose ##
intermediate_days = 14
)
#> Using stem: date_dose
#> Found vaccine dates
#> date_dose_1
#>
#> date_dose_2
## annotate episodes
cg <- cg %>%
cg_annotate_episodes_find(
infection_cols = c("LFT", "PCR", "symptoms"),
infection_present = c("pos", "Post", "^severe")
) %>%
mutate(
episode_variant =
case_when(
# "is an episode" & "PCR positive" -> Delta #
(!is.na(episode_number)) & PCR == "Pos" ~ "Delta",
# "is an episode" & "PCR unavailable" -> Anc/Alpha #
(!is.na(episode_number)) & PCR == "not tested" ~ "Anc/Alpha"
)
) %>%
cg_annotate_episodes_fill(
col_to_fill = episode_variant,
col_to_return = episode_variant_filled,
.direction = "updown"
)
#> Parsed: infection_cols and infection_present
#>
#> Searching in the [[column]], for the "text":
#> stringr::str_detect(.data[["LFT"]], "pos") ~ "yes"
#>
#> stringr::str_detect(.data[["PCR"]], "Post") ~ "yes"
#>
#> stringr::str_detect(.data[["symptoms"]], "^severe") ~ "yes"
#>
#>
#> ...detecting will be exact.
#> Capitals, spelling etc must be precise
#> Joining with `by = join_by(calendar_date, elig_study_id)`
#> Joining with `by = join_by(elig_study_id, episode_number)`
Outline
Annotation is required to allow the selection of sub-cohorts of individuals (and corresponding dates) that are relevant to testing your biological hypothesis.
Exposures can be either from vaccine doses or from infection
episodes. cg_annotate_exposures_count() provides a cumulative counter
for each individual’s personal history.
cg_annotate_antigenic_history() returns a text string summarising the
sequence of encounters.
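The idea of a cumulative exposure counter can be illustrated on a toy table. This is a minimal sketch of the concept only, not the implementation of cg_annotate_exposures_count(): the tibble `toy` and its columns `id`, `day`, `new_dose` and `new_episode` are hypothetical, and each flag is assumed to mark the day on which a new exposure starts.

```r
library(dplyr)

## Hypothetical toy data (not built_smallstudy): one row per
## participant per day, with a 1 on days where a dose or an
## infection episode begins.
toy <- tibble(
  id          = c(1, 1, 1, 1, 2, 2, 2, 2),
  day         = c(1, 2, 3, 4, 1, 2, 3, 4),
  new_dose    = c(1, 0, 0, 1, 1, 0, 1, 0),
  new_episode = c(0, 0, 1, 0, 0, 0, 0, 0)
)

## Running count of exposures, per participant, in day order
toy_counted <- toy %>%
  group_by(id) %>%
  arrange(day, .by_group = TRUE) %>%
  mutate(exposure_count = cumsum(new_dose + new_episode)) %>%
  ungroup()

toy_counted
```

The counter only ever increases within a participant, and it steps up on the day an exposure starts, whichever type of exposure it is.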
Worked example
cg_annotate_exposures_count() takes the column names for episode
number, dose number and seroconversion episode number, and calculates
a per-day, per-individual running count of exposures.
cg_annotate_antigenic_history() provides a character vector of length
1 to summarise each person’s course over the study. Both are best
explored with an example and a plot.
cg_exposures <- cg %>% cg_annotate_exposures_count(
episode_number = episode_number,
dose_number = dose_number,
## we have not considered episodes of seroconversion
N_seroconversion_episode_number = NULL
)
cg_exposures <- cg_exposures %>%
mutate(
episode_variant_summarised =
episode_variant_filled
) %>%
cg_annotate_antigenic_history(
episode_number = episode_number,
dose_number = dose_number,
episode_variant_summarised = episode_variant_summarised,
ag_col = antigenic_history
)
## Plot ##
top_panel <- cg_exposures %>%
select(calendar_date,
exposure_number,
elig_study_id,
antigenic_history) %>%
ggplot(aes(
x = calendar_date, y = exposure_number,
col = elig_study_id
)) +
geom_line() +
facet_grid(antigenic_history ~ .)
swimmers_panel <- cg_plot_meta(cg_exposures,
visit = serum_Ab_S
) +
## set the axes to match top_panel ##
xlim(
min(cg_exposures$calendar_date),
max(cg_exposures$calendar_date)
) +
scale_y_discrete(limits = factor(c(3, 2, 1)))
#> Function provided to illustrate chronogram ->
#> ggplot2 interface.
#> Function assumes the
#> presence of {dose_1, date_dose_1, dose_2, date_dose_2}
#> columns.
#> Users are likely to want to write their own,
#> study-specific applications
top_panel / swimmers_panel & theme_bw() &
theme(
legend.position = "bottom",
strip.text.y = element_text(angle = 0),
strip.background = element_blank(),
panel.grid.minor = element_blank()
)
The plot above shows how each infection or vaccination increases the exposure number. Participant 3's only exposures are vaccinations, whereas participants 1 and 2 have additional exposures from infection. By the end of this example, participant 2 has had 4 encounters with Spike, participant 1 has had 3, and participant 3 has had just 2, from vaccination alone.
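To make the two summaries concrete, the sketch below builds a per-participant exposure total and a one-string history from a toy event log. Everything in it is hypothetical: the `events` tibble, the participant labels "A" and "B", the dates, and the " > " separator are illustrative choices, and the string format returned by cg_annotate_antigenic_history() may differ.

```r
library(dplyr)

## Hypothetical event log (not the built_smallstudy data): one row
## per exposure, deliberately out of date order.
events <- tibble(
  id = c("A", "B", "A", "A", "B"),
  date = as.Date(c(
    "2021-04-01", "2021-01-12", "2021-01-10",
    "2021-08-15", "2021-04-03"
  )),
  encounter = c("dose_2", "dose_1", "dose_1", "Delta", "dose_2")
)

history <- events %>%
  group_by(id) %>%
  ## order encounters chronologically within each participant
  arrange(date, .by_group = TRUE) %>%
  summarise(
    n_exposures = n(),
    ## " > " is an arbitrary separator chosen for this sketch
    history = paste(encounter, collapse = " > ")
  )

history
```

Sorting within each participant before collapsing matters here: the history string is only meaningful if it reflects the order in which the encounters occurred.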
Summary
This vignette has provided examples of the cg_annotate family in
action. If you are conducting a multi-pathogen study (RSV, flu, covid),
run a set of cg_annotate family functions for each pathogen. You may
wish to prefix the output columns, e.g. RSV_, flu_ and covid_. Because
pathogens differ in their considerations (e.g. variants), chronogram
provides no overall wrapper for the cg_annotate family, so users can
easily omit annotations they do not need.
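One way to apply a pathogen prefix is sketched below with dplyr::rename_with() on a plain tibble. The tibble `toy` is a hypothetical stand-in for an annotated chronogram, and it has not been checked here whether a chronogram's attributes tolerate renaming columns after annotation. Where an annotation function lets you name its output column (as ag_col does above), choosing the prefixed name at that point may be the safer route.

```r
library(dplyr)

## Hypothetical stand-in for an annotated chronogram
toy <- tibble(
  elig_study_id   = c(1, 1, 2),
  calendar_date   = as.Date(c("2021-01-01", "2021-01-02", "2021-01-01")),
  episode_number  = c(NA, 1, NA),
  exposure_number = c(0, 1, 0)
)

## Annotation columns that belong to one pathogen
covid_cols <- c("episode_number", "exposure_number")

## Prefix only those columns, leaving the ID and date untouched
toy_prefixed <- toy %>%
  rename_with(~ paste0("covid_", .x), all_of(covid_cols))

names(toy_prefixed)
```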
SessionInfo
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] patchwork_1.3.0 ggplot2_3.5.1 dplyr_1.1.4 chronogram_1.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 jsonlite_1.8.9 highr_0.11 compiler_4.4.2
#> [5] tidyselect_1.2.1 stringr_1.5.1 tidyr_1.3.1 jquerylib_0.1.4
#> [9] systemfonts_1.1.0 scales_1.3.0 textshaping_0.4.0 yaml_2.3.10
#> [13] fastmap_1.2.0 R6_2.5.1 labeling_0.4.3 generics_0.1.3
#> [17] knitr_1.48 tibble_3.2.1 desc_1.4.3 munsell_0.5.1
#> [21] lubridate_1.9.3 bslib_0.8.0 pillar_1.9.0 rlang_1.1.4
#> [25] utf8_1.2.4 stringi_1.8.4 cachem_1.1.0 xfun_0.49
#> [29] fs_1.6.5 sass_0.4.9 timechange_0.3.0 cli_3.6.3
#> [33] pkgdown_2.1.1 withr_3.0.2 magrittr_2.0.3 digest_0.6.37
#> [37] grid_4.4.2 lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.1
#> [41] glue_1.8.0 farver_2.1.2 ragg_1.3.3 fansi_1.0.6
#> [45] colorspace_2.1-1 purrr_1.0.2 rmarkdown_2.29 tools_4.4.2
#> [49] pkgconfig_2.0.3 htmltools_0.5.8.1