Configuration:
Default configurations can be set in the following configuration file: config/basic.yaml. Otherwise, configurations can be set manually on a run-by-run basis when running FilTar from the command line by using the --config option passed to snakemake.
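As an illustration of a per-run override, the sketch below assembles a snakemake command line in which configuration values are passed as KEY=VALUE pairs to Snakemake's `--config` option. The helper `build_snakemake_command` is hypothetical and not part of FilTar, the value `hsa` is an assumed species name, and the target file that a real invocation would need is left out.

```python
import shlex


def build_snakemake_command(overrides, extra_args=()):
    """Return a snakemake command line that overrides config values for one run.

    Hypothetical helper for illustration only; it is not part of FilTar.
    """
    # Snakemake's --config option takes one or more KEY=VALUE pairs.
    pairs = [f"{key}={value}" for key, value in overrides.items()]
    # Other arguments (e.g. targets) go first so --config does not absorb them.
    argv = ["snakemake", *extra_args]
    if pairs:
        argv += ["--config", *pairs]
    return shlex.join(argv)


# Assumed example values: 'hsa' is illustrative, 'reference' disables RNA-Seq data.
command = build_snakemake_command({"species": "hsa", "context": "reference"})
print(command)
```

Values given this way apply to that run only; config/basic.yaml itself is not modified.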
The primary options to be configured in the configuration files are as follows:
miRNAs
: A list of miRNAs (using canonical miRNA names found in miRBase with the three letter species prefix) to use for target prediction. If this configuration is left blank, target prediction is performed for all miRBase annotated miRNAs of that species.
transcripts
: A list of mRNAs to use for target prediction. If this configuration is left blank, target prediction is performed for all annotated transcripts of a given species.
species
: Specify which species to investigate. There must be a record containing this species name in the 'species' field of the metadata.tsv data table. A full list of available species that can be used with FilTar is found in the file named config/species.yaml.
context
: Specify which RNA-Seq experiment to use, e.g. liver or kidney. This should correspond to values specified for the 'biological_context' field in metadata.tsv. If the user does not want to use RNA-Seq data at all and wants to default to standard Ensembl 3'UTR models, the value of 'context' should be set to 'reference'.
reannotation
: Set to 'True' if the user wants to reannotate 3'UTRs; if not, set to 'False'.
TPM_expression_threshold
: Set the TPM expression threshold. Use a value of 0 if you do not want to filter by expression level.
conservation
: Set to 'True' to use multiple sequence alignments, otherwise set to 'False'. Genome-wide alignment files are many gigabytes in size and take longer to process than non-MSA sequence data, so in some cases setting this option to 'False' may be preferable.
prediction_algorithm
: Which miRNA target prediction algorithm to use. Possible values are 'Targetscan7' and 'miRanda'.
use_high_conf_mirs_only
: Set to True to use high confidence miRBase miRNA annotations only. Otherwise, all miRNAs for that species will be used.
These details can be manually configured by editing the config/basic.yaml file to enter configurations in serialised format.
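To make the expected shape of these options concrete, the sketch below represents one possible set of values as a Python mapping and checks it against the option descriptions. The function `check_basic_config` is hypothetical and not part of FilTar. The checks reflect one reading of the descriptions in this section rather than FilTar's own validation, and the species and miRNA values are assumed for illustration.

```python
# Algorithm names as listed in the option descriptions.
ALLOWED_ALGORITHMS = {"Targetscan7", "miRanda"}


def check_basic_config(config):
    """Return a list of problems found in a basic.yaml-style mapping.

    Hypothetical helper: the rules are an interpretation of the documentation.
    """
    problems = []
    if not config.get("species"):
        problems.append("species must be set")
    if not config.get("context"):
        problems.append("context must be set ('reference' skips RNA-Seq data)")
    # Assumption: these three options are plain booleans.
    for key in ("reannotation", "conservation", "use_high_conf_mirs_only"):
        if not isinstance(config.get(key), bool):
            problems.append(f"{key} must be True or False")
    threshold = config.get("TPM_expression_threshold")
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    if not is_number or threshold < 0:
        problems.append("TPM_expression_threshold must be a number >= 0")
    if config.get("prediction_algorithm") not in ALLOWED_ALGORITHMS:
        problems.append("prediction_algorithm must be 'Targetscan7' or 'miRanda'")
    # miRNAs and transcripts may be left blank, meaning "use all".
    for key in ("miRNAs", "transcripts"):
        value = config.get(key)
        if value is not None and not isinstance(value, list):
            problems.append(f"{key} must be a list or left blank")
    return problems


# Assumed example values, not taken from a real FilTar configuration.
example_config = {
    "miRNAs": ["hsa-miR-1-3p"],
    "transcripts": None,  # blank: all annotated transcripts
    "species": "hsa",
    "context": "reference",
    "reannotation": False,
    "TPM_expression_threshold": 0,  # 0: no expression filtering
    "conservation": False,
    "prediction_algorithm": "miRanda",
    "use_high_conf_mirs_only": True,
}
print(check_basic_config(example_config))
```

In the YAML file itself, the same keys would appear as top-level entries with the corresponding values.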
Dependency configuration
The input parameters for different applications utilised by FilTar, such as HISAT2, samtools and miRanda, can be configured using the config/dependencies.yaml configuration file. Keys in this YAML file contain the name of the application and the name of the parameter, so that users know which YAML keys refer to which application parameters.
User RNA-Seq data
Users can provide their own data for use with FilTar.
To do this, users must deposit their own compressed fastq data with the fastq.gz or fq.gz file extension in either the data/single_end or data/paired_end directory. When using FilTar in this way, users must also set the sequence_data_source configuration value to User.
File names must be in the form {accession}.fastq.gz for single-end data, and {accession}_1.fastq.gz and {accession}_2.fastq.gz, containing the first and second mate pair reads respectively, for paired-end data.
Users can specify any value they wish for the accession wildcard, provided that it corresponds to values specified in the main configuration file.
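The naming rules above can be checked before a run with a short script. The sketch below is a hypothetical pre-flight check, not part of FilTar: it extracts the accession from each file name and reports paired-end accessions that are missing a mate. It assumes that files with the fq.gz extension follow the same naming pattern as fastq.gz files, which the text above does not state explicitly.

```python
import re

# Assumption: fq.gz files follow the same naming pattern as fastq.gz files.
EXTENSION = r"\.(?:fastq|fq)\.gz"
SINGLE_END = re.compile(rf"^(?P<accession>.+){EXTENSION}$")
PAIRED_END = re.compile(rf"^(?P<accession>.+)_(?P<mate>[12]){EXTENSION}$")


def single_end_accessions(file_names):
    """Return the sorted accessions of correctly named single-end files."""
    matches = (SINGLE_END.match(name) for name in file_names)
    return sorted(m.group("accession") for m in matches if m)


def paired_end_accessions(file_names):
    """Return (complete, incomplete) accession lists for paired-end files.

    An accession is complete when both the _1 and _2 files are present.
    """
    mates = {}
    for name in file_names:
        match = PAIRED_END.match(name)
        if match:
            mates.setdefault(match.group("accession"), set()).add(match.group("mate"))
    complete = sorted(acc for acc, seen in mates.items() if seen == {"1", "2"})
    incomplete = sorted(acc for acc, seen in mates.items() if seen != {"1", "2"})
    return complete, incomplete


# Hypothetical directory listings for data/single_end and data/paired_end.
single = single_end_accessions(["sampleA.fastq.gz", "notes.txt"])
complete, incomplete = paired_end_accessions(
    ["s1_1.fastq.gz", "s1_2.fastq.gz", "s2_1.fq.gz"]
)
print(single, complete, incomplete)
```

The accessions returned here are the values that would need to match those specified in the main configuration file.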
Rule parameterisation
For the most part, external tools are executed using default parameters, which are documented in their respective snakefiles (found within subdirectories of the modules directory). Parameter values can be altered by directly editing the 'params' key-value pairs contained within the rules of these snakefiles.