Occasionally we may want to run just the differential part of the
analysis for a pre-processed dataset that already exists as an R object.
Here, we describe the easiest way to do this, assuming we have a spec-file
and either an rds or rda R object that contains the raw counts and
metadata as a DESeq2::DESeqDataSet object.
The objective is to create a folder containing the scripts and
resources usually provided by the differential folder of
the whole pipeline. Navigate to the directory in which you want to carry
out the analysis: ideally one that is initially empty, to avoid
overwriting any pre-existing content. Then, in that directory, download
just the differential part of the pipeline:
tar -xzf /nemo/stp/babs/working/bioinformatics/templates/rnaseq.tar.gz --strip=2 babs/differential
make check-isolated
The make command there will progressively step you
through the prerequisites for an isolated analysis. If
make check-isolated completes without any errors, then you
are ready to try make run, otherwise carefully examine and
act on what it reports - if it finds something to be missing, it will
generate an example for you, but it won’t fit the specifics of your
design. Once you have customised it to fit your requirements, run
make check-isolated again to see what the next missing
piece might be.
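That check-and-fix loop can be sketched as follows (simulated with a stub function, since the real target is make check-isolated and needs the pipeline directory):

```shell
# Simulated check-and-fix loop: the stub stands in for `make check-isolated`.
attempt=0
check_isolated() {          # stand-in for `make check-isolated`
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]      # pretend the third run finds nothing missing
}
until check_isolated; do
  echo "check $attempt failed: customise the generated example, then re-check"
done
echo "ready for 'make run' after $attempt checks"
```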
The one non-negotiable requirement is a .config file:
Config file
The first run of that command will create a config called
extdata/example.config if you haven’t already got a config
file. The example contains template versions of all possible entries:
# Name of bioconductor annotation package
example.org.db=org.Hs.eg.db
# Name of rda or rds file in extdata that will supply a preprocessed object.
# Will default to the name of the config file - so delete lines if not used
example.rda=example
example.rds=example
# Name of CSV file in extdata that will supply assay data. Delete line if not used.
# You can set example.missing to this value _instead_, and a binary 'missingness'
# analysis will be carried out instead.
example.csv=example
# Name of a column in a CSV that contains the gene symbols.
# Defaults to first column if you delete this line.
example.symbolColumn=Symbol
# What word to use in reports to refer to a feature (protein, peptide, metabolite, gene...)
# Defaults to 'gene'
example.rowNoun=gene
# Comma-separated list of names of spec files that will _not_ be combined with this config's settings.
example.nospec=
Base your actual config file on that - it is important to
prefix each line with the name of the config file: if
you rename the above to my.config but still have
example.org.db=org.Hs.eg.db, that entry won’t be recognised. If
you have a working config file in another location, copy that to the
extdata folder and remove the example before
continuing.
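For instance, a renamed config might look like this (the filename my.config and the entry values here are purely illustrative):

# my.config - note the 'my.' prefix on every entry, matching the filename
my.org.db=org.Hs.eg.db
my.rds=my
my.rowNoun=gene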
The above already gives some guidance on the ways to get data into
the pipeline: an rda file, an rds file or a csv file - plus
the fallback of the output of a Nextflow run if none of those is found
in extdata. Whichever you use, it needs to be directly named
after the config file, so I will always use
e.g. extdata/example.csv here, but change that to your own
preference.
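The naming convention can be checked with a few lines of shell (the file names here are stand-ins created for illustration):

```shell
# Hypothetical check that a data file shares its basename with the config
# file, as the pipeline expects ('example' is a stand-in name).
dir=$(mktemp -d)
mkdir "$dir/extdata"
touch "$dir/extdata/example.config" "$dir/extdata/example.csv"  # stand-ins
base=$(basename "$dir"/extdata/*.config .config)
found=no
for ext in rda rds csv; do
  [ -e "$dir/extdata/${base}.${ext}" ] && found=yes
done
echo "data named after config '${base}': ${found}"
```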
Assay Data
make check-isolated will not create
example assay data for you, but will report the lack of compatible
data.
RDS/RDA
For these binary files, the pipeline expects a fully-fledged
DESeqDataSet object with populated colData
etc. If it is instead a SummarizedExperiment object, the pipeline
will run in limma-mode.
CSV
This will also require an extdata/metadata.csv file to be
present, conforming to the usual requirements for describing the
experimental conditions. If metadata.csv and
example.csv are both present, they will be combined into a
SummarizedExperiment object, and analysed in
limma-mode. example.csv must have columns
named exactly the same as the entries in the first column of
metadata.csv - they will act as the assay data. Additional
columns in the example.csv will be imported as
rowData.
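The required correspondence can be illustrated in shell: the first column of metadata.csv lists the sample names, and example.csv must carry an assay column for each of them (the file contents below are made-up stand-ins):

```shell
# Sketch of the metadata/assay correspondence check, on stand-in files.
dir=$(mktemp -d)
printf 'sample,condition\nS1,ctrl\nS2,treat\n' > "$dir/metadata.csv"
printf 'Symbol,S1,S2\nTP53,10,20\nBRCA1,5,7\n' > "$dir/example.csv"
# Sample names come from metadata's first column (skipping the header):
samples=$(tail -n +2 "$dir/metadata.csv" | cut -d, -f1)
header=$(head -n 1 "$dir/example.csv")
ok=yes
for s in $samples; do
  case ",$header," in
    *",$s,"*) ;;                        # sample has a matching assay column
    *) ok=no; echo "missing column: $s" ;;
  esac
done
echo "assay columns match metadata samples: $ok"
```

Columns such as Symbol that are not sample names become rowData, as described above.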
Spec file(s)
Then into extdata/ (which gets created by the script),
you need to place a *.spec file. make check-isolated will
put an example spec file in place if you don’t have one.
Running the pipeline
Once you’ve done that (got x.config,
x.{rda,rds,csv} and y.spec all in ./extdata) a
make run should do everything required. It’s all a lot
easier than it comes across when written down long-hand. You can also do
make run sbatch_args=
make run sbatch_args=--time=01-00
make run sbatch_args="--time=01-00 --job-name='Hello World'"
if you want to run the job under Slurm. The sbatch_args=
can be empty (purely to flag to make that sbatch is
required, but with default parameters), or contain one or more sbatch command-line
options.