FilTar

Using RNA-Seq data to improve microRNA target prediction accuracy in animals

Installation & dependencies

  1. devtools (R package): devtools is needed to install R packages from GitHub repositories. It can be installed directly from CRAN within the R console as follows:

    install.packages("devtools")


  2. filtar (R package): The filtar R package performs a series of data manipulations within the main FilTar workflow. It can be installed with the devtools R package by running the following command within the R console:

    devtools::install_github('TBradley27/filtar_R')

    If you are using conda to manage your R installation, you may run into a few issues. See the following thread for possible resolutions.


  3. Conda (using python3): FilTar is a snakemake project, so the conda package manager must be installed on your system. The easiest way to obtain conda is through the relatively lightweight miniconda python distribution, which contains only conda and its dependencies (Installation Instructions). Ensure that the version of conda you install is compatible with python3.

  4. Create a new conda environment: Users are advised to create a new conda environment specifically for FilTar, as this minimises the possibility of dependency conflicts. If the user already has a stand-alone environment for snakemake, then this may also work.

    conda create --name filtar python=3.7


  5. Activate the environment:

    conda activate filtar


  6. Install a specific perl build using conda:

    conda install -c conda-forge perl=5.26.2=h470a237_0

  7. Install cpanm, a perl package manager: The recommended way to install perl modules for use with FilTar is to install them manually with cpanm, which can itself be installed using conda:

    conda install -c bioconda perl-app-cpanminus

  8. Install the Statistics::Lite perl module

    cpanm Statistics::Lite

  9. Install the Bio::TreeIO perl module

    This is quite a large installation as Bio::TreeIO uses BioPerl as a dependency. This installation also requires several steps:

    Attempt to install Bio::TreeIO using cpanm:

    cpanm Bio::TreeIO

    The attempted installation with cpanm will fail, but it will prime your conda environment for the correct installation of this module. To complete the installation, download the patch needed for the Bio::TreeIO installation. The downloaded file should be located at ~/.cpan/prefs/XML-DOM-XPath.yml. The URL of the patch is as follows: github.com/eserte/srezic-cpan-distroprefs/blob/master/XML-DOM-XPath.yml
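The patch placement described above could be scripted roughly as follows. This is a minimal sketch, not part of the FilTar instructions: the function names `prepare_prefs_dir` and `fetch_patch` are hypothetical, and the raw-file URL is an assumption derived from the GitHub page URL given above, so check it before relying on it.

```shell
# Assumption: raw-file form of the GitHub URL quoted in the text; verify before use.
PATCH_URL="https://raw.githubusercontent.com/eserte/srezic-cpan-distroprefs/master/XML-DOM-XPath.yml"

# Create the CPAN prefs directory if needed and print its path.
prepare_prefs_dir() {
    prefs_dir="${HOME}/.cpan/prefs"
    mkdir -p "$prefs_dir" && echo "$prefs_dir"
}

# Download the patch into the prefs directory (requires curl and network access).
fetch_patch() {
    prefs_dir=$(prepare_prefs_dir) || return 1
    curl -fsSL "$PATCH_URL" -o "${prefs_dir}/XML-DOM-XPath.yml"
}
```

Running `fetch_patch` from within the activated filtar environment would then perform the download. Nothing is fetched until that function is called.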

    Reattempt the installation of Bio::TreeIO using CPAN:

    perl -MCPAN -e shell

    install XML::DOM::XPath
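Once steps 8 and 9 are complete, one way to confirm that the perl modules can be loaded is to ask perl to import each one. The sketch below is an illustration only: the helper name `perl_module_ok` is hypothetical, and it assumes the perl on your PATH is the one installed in the filtar conda environment.

```shell
# Succeed only if the named perl module can be loaded by the perl found on PATH.
perl_module_ok() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report the status of each module mentioned in the steps above.
for module in Statistics::Lite Bio::TreeIO XML::DOM::XPath; do
    if perl_module_ok "$module"; then
        echo "ok: $module"
    else
        echo "not installed: $module"
    fi
done
```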

  10. Install mamba (a conda alternative)

    conda install -c conda-forge mamba

  11. Install snakemake using mamba

    mamba install -c conda-forge -c bioconda snakemake

  12. Other: For the remaining dependencies, the user has the choice of the following two options:

    1. The user can allow dependencies to be managed within the snakemake workflow. If the --use-conda flag (see later) is used when executing snakemake, then snakemake will use conda to download and/or activate environments containing the dependencies needed to execute a given rule (i.e. a job within the larger workflow). That environment will then be reactivated whenever that same rule is executed.

      For example, if the --use-conda flag is used, filtar will download and install HISAT2 and run it within its own self-contained conda environment. The dependency would be installed within conda, so root privileges would not be needed.

    2. The user can manage their own dependencies outside of the filtar workflow. If the --use-conda flag is not used, then filtar will try to find HISAT2 within the current environment using the PATH environment variable.

    Note 1: Option 1) may cause problems on systems, such as some HPC environments, in which data download and resource-expensive jobs cannot occur on the same node/machine. In that case, option 2) may be preferable. However, for more recent versions of snakemake, all data downloads occur before subsequent job execution, so it may be advisable to download data on one machine and then re-execute the workflow on a different machine to complete the jobs.

    Note 2: For more flexibility, users who would like FilTar to manage some dependencies internally (e.g. HISAT2) and some dependencies externally (e.g. trim galore) could edit the relevant Snakefile and comment out the 'conda' directive from the relevant rule. In this case, that rule's dependencies would not be managed internally by filtar, even when the --use-conda flag is used when executing the 'snakemake' command.
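For option 2), it can help to check beforehand that externally managed tools are visible on PATH. The following is a minimal sketch under stated assumptions: `check_tools` is a hypothetical helper, not part of FilTar, and the executable names you pass to it must match how the tools are installed on your system.

```shell
# Report whether each named executable can be found on PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools hisat2 trim_galore` would report on the two tools mentioned above, assuming those are the executable names in your installation.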