FilTar

Using RNA-Seq data to improve microRNA target prediction accuracy in animals

Configuration:

Default configurations can be set in the configuration file config/basic.yaml. Alternatively, configuration values can be set on a run-by-run basis when running FilTar from the command line, by using the --config option passed to snakemake.
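As a rough illustration of a run-by-run override, the sketch below assembles a snakemake command line that passes KEY=VALUE pairs through snakemake's --config option. The helper name snakemake_config_command and the value 'hsa' for species are assumptions made for this example; targets and any other arguments a real FilTar run needs are omitted.

```python
import shlex


def snakemake_config_command(overrides):
    """Build a snakemake argument list that overrides config values via --config."""
    # Each override becomes a KEY=VALUE pair following the --config flag.
    pairs = [f"{key}={value}" for key, value in overrides.items()]
    return ["snakemake", "--config", *pairs]


# 'hsa' and 'liver' are placeholder values chosen for illustration only.
command = snakemake_config_command({"species": "hsa", "context": "liver"})
print(shlex.join(command))  # prints: snakemake --config species=hsa context=liver
```

Values passed this way apply only to that invocation; config/basic.yaml is left unchanged.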

The primary options to be configured in the configuration file are as follows:

  1. miRNAs: A list of miRNAs (using canonical miRNA names found in miRBase with the three letter species prefix) to use for target prediction. If this configuration is left blank, target prediction is performed for all miRBase annotated miRNAs of that species
  2. transcripts: A list of mRNAs to use for target prediction. If this configuration is left blank, target prediction is performed for all annotated transcripts of a given species
  3. species: Specify which species to investigate. There must be a record containing this species name in the 'species' field of the metadata.tsv data table. A full list of available species that can be used with FilTar is found in the file named config/species.yaml.
  4. context: Specify which RNA-Seq experiment to use, e.g. liver or kidney. This should correspond to values specified for the 'biological_context' field in metadata.tsv. If the user does not want to use RNA-Seq data at all, and wants to default to standard Ensembl 3'UTR models, the value of 'context' should be set to 'reference'
  5. reannotation: Set to 'True' if the user wants to reannotate 3'UTRs; if not, set to 'False'.
  6. TPM_expression_threshold: Set the TPM (transcripts per million) expression threshold. Use a value of 0 if you do not want to filter by expression level.
  7. conservation: Set to 'True' to use multiple sequence alignments (MSAs), otherwise set to 'False'. Genome-wide alignment files are many gigabytes in size and take longer to process than non-MSA sequence data, so in some cases setting this option to 'False' may be preferable
  8. prediction_algorithm: Which miRNA target prediction algorithm to use - possible values are 'Targetscan7' and 'miRanda'
  9. use_high_conf_mirs_only: Set to True to use high confidence miRBase miRNA annotations only. Otherwise, all miRNAs for that species will be used

These options can be configured manually by editing the config/basic.yaml file and entering the values in YAML format.
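The options above can be sanity-checked before a run. The following is a minimal sketch, not part of FilTar: validate_config is a hypothetical helper, the dictionary stands in for the parsed contents of config/basic.yaml, and the example values (including 'hsa' for species) are assumptions.

```python
# Hypothetical validator for the options listed above; not part of FilTar.
ALLOWED_ALGORITHMS = {"Targetscan7", "miRanda"}


def validate_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    # Options documented as True/False switches.
    for key in ("reannotation", "conservation", "use_high_conf_mirs_only"):
        if not isinstance(config.get(key), bool):
            problems.append(f"{key} must be True or False")
    if config.get("prediction_algorithm") not in ALLOWED_ALGORITHMS:
        problems.append("prediction_algorithm must be 'Targetscan7' or 'miRanda'")
    # A threshold of 0 disables expression filtering; negative values make no sense.
    tpm = config.get("TPM_expression_threshold")
    if isinstance(tpm, bool) or not isinstance(tpm, (int, float)) or tpm < 0:
        problems.append("TPM_expression_threshold must be a number >= 0")
    for key in ("species", "context"):
        if not config.get(key):
            problems.append(f"{key} must be set")
    # miRNAs and transcripts may be left blank to mean "use all".
    for key in ("miRNAs", "transcripts"):
        value = config.get(key)
        if value not in (None, "") and not isinstance(value, (list, tuple)):
            problems.append(f"{key} must be a list or left blank")
    return problems


# Example values are illustrative assumptions, not FilTar defaults.
example_config = {
    "miRNAs": ["hsa-miR-122-5p"],
    "transcripts": None,
    "species": "hsa",
    "context": "liver",
    "reannotation": True,
    "TPM_expression_threshold": 0.1,
    "conservation": False,
    "prediction_algorithm": "Targetscan7",
    "use_high_conf_mirs_only": False,
}
print(validate_config(example_config))
```

The check only covers what the option descriptions state. It does not verify that the species or context values actually appear in metadata.tsv.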

Dependency configuration

The input parameters for different applications utilised by FilTar such as HISAT2, samtools and miRanda can be configured using the config/dependencies.yaml configuration file. Keys in this yaml file contain the name of the application and the name of the parameter so that users know which YAML keys refer to which application parameters.

User RNA-Seq data

Users can provide their own data for use with FilTar.

To do this, users must deposit their own compressed fastq data, with a fastq.gz or fq.gz file extension, in either the data/single_end or data/paired_end directory. When using FilTar in this way, users must also set the sequence_data_source configuration value to User.

File names must be in the form {accession}.fastq.gz for single-end data and, for paired-end data, {accession}_1.fastq.gz and {accession}_2.fastq.gz, containing the first and second mate reads respectively.

Users can specify any value they wish for the accession wildcard, provided that it corresponds to the values specified in the main configuration file.
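To illustrate the naming scheme, the sketch below extracts accession values from a list of file names. The helper names are hypothetical and not part of FilTar, and only the .fastq.gz form given in the naming rule is matched. A paired-end accession is reported only when both mate files are present.

```python
import re

# Patterns follow the documented forms {accession}.fastq.gz and {accession}_1/_2.fastq.gz.
SINGLE_END = re.compile(r"^(?P<accession>.+)\.fastq\.gz$")
PAIRED_END = re.compile(r"^(?P<accession>.+)_(?P<mate>[12])\.fastq\.gz$")


def single_end_accessions(names):
    """Return sorted accessions for names matching {accession}.fastq.gz."""
    matches = (SINGLE_END.match(name) for name in names)
    return sorted(m.group("accession") for m in matches if m)


def paired_end_accessions(names):
    """Return sorted accessions that have both a _1 and a _2 mate file."""
    mates = {}
    for name in names:
        m = PAIRED_END.match(name)
        if m:
            mates.setdefault(m.group("accession"), set()).add(m.group("mate"))
    return sorted(acc for acc, found in mates.items() if found == {"1", "2"})


# File names below are made up for illustration.
print(single_end_accessions(["sampleA.fastq.gz", "notes.txt"]))
print(paired_end_accessions(["s1_1.fastq.gz", "s1_2.fastq.gz", "s2_1.fastq.gz"]))
```

The returned accessions can then be compared by hand against the values entered in the main configuration file.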

Rule parameterisation

For the most part, external tools are executed using default parameters, which are documented in their respective snakefiles (found within subdirectories of the modules directory). Parameter values can be altered by directly editing the 'params' key-value pairs contained within the rules of these snakefiles.