Commit 7d2c341d authored by Carsten Kemena

updated the manual

- changed to sphinx
- added description for the list-alignments option
- expanded the installation/setup description
parent 284e3820
.. _installation:
************
Installation
************
------------
Requirements
------------
We try to keep the number of dependencies as small as possible. The current dependencies are:
* BioSeqDataLib (https://ebbgit.uni-muenster.de/domainWorld/BioSeqDataLib) (can be added via git submodule)
* boost (http://www.boost.org)
* SQLite (https://www.sqlite.org)
* a compiler with C++11 and OpenMP support
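The download and build commands in this chapter also call a few command-line tools. A minimal sketch for checking that they are on the ``PATH`` before starting is shown here; the function name ``check_tools`` is hypothetical and not part of RADS, and the tool list in the example call is only what the commands in this chapter use.

```bash
# check_tools TOOL...: report every tool that is not found on the PATH.
# The function name is illustrative and not part of RADS.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Return 0 if everything was found, 1 otherwise.
    return "$missing"
}

# Example call (tools taken from the commands in this chapter):
# check_tools git cmake make
```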
--------
Download
--------
The easiest way to download the most current version of RADS is to use git:
.. code-block:: bash
git clone https://ebbgit.uni-muenster.de/domainWorld/RADS.git
cd RADS
git submodule init
git submodule update
If you are unable or do not want to use git, you can download the source code package manually from here: https://ebbgit.uni-muenster.de/domainWorld/RADS/-/archive/master/RADS-master.tar.gz. The commands below put everything in its correct place. You can replace the ``wget`` commands with manual downloads, as long as you copy the files to the same locations.
.. code-block:: bash
wget https://ebbgit.uni-muenster.de/domainWorld/RADS/-/archive/master/RADS-master.tar.gz
tar xfz RADS-master.tar.gz
# The BioSeqDataLib now needs to be added manually
cd RADS-master/libs
rmdir BioSeqDataLib
wget https://ebbgit.uni-muenster.de/domainWorld/BioSeqDataLib/-/archive/master/BioSeqDataLib-master.tar.gz
tar xfz BioSeqDataLib-master.tar.gz
mv BioSeqDataLib-master BioSeqDataLib
-----------
Compilation
-----------
Change into the RADS directory and run the following commands:
.. code-block:: bash
mkdir build
cd build
cmake ..
make
.. _makeradsdb:
****************
makeRadsDB Usage
****************
``makeRadsDB`` is a program that computes a database that can be used by RADS. A database consists of two files: an index file
(an SQLite database) and an arrangement file (a simple text file). For example, if the name of the database is MyDB, the
files needed are MyDB.db and MyDB.da.
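Because a database is addressed by its prefix, both files have to be present next to each other. A small sketch for checking this before starting a search follows; the function name ``rads_db_complete`` is hypothetical and not part of RADS.

```bash
# rads_db_complete PREFIX: succeed only if PREFIX.db and PREFIX.da both exist.
# The function name is illustrative and not part of RADS.
rads_db_complete() {
    prefix="$1"
    [ -f "${prefix}.db" ] && [ -f "${prefix}.da" ]
}

# Example call:
# rads_db_complete MyDB || echo "MyDB is incomplete" >&2
```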
===============
Program options
===============
---------------
General options
---------------
The basic options:
.. program:: makeRadsDB
.. option:: -h, --help
Produces this help message
.. option:: -i <FILE>, --input <FILE>
Domain arrangement file(s) that should be turned into a database.
.. option:: -I <FILE>, --InterPro <FILE>
Used to turn the InterPro annotation file (match\_complete.xml.gz) found on https://www.ebi.ac.uk/interpro/download.html into a RADS database. This option is used to compute the precomputed InterPro databases. Use the :option:`makeRadsDB --database` option to extract the domain arrangements of a single database.
.. option:: -s <FILE>, --seqs <FILE>
Sequence files, used in combination with the domain arrangement files. If none are given, all sequence lengths are set to 0.
.. option:: -o <FILE>, --out <FILE>
The output prefix used to produce two files named prefix.db and prefix.da. Be aware that we currently do not support adding data to an existing database.
---------------
Filter options
---------------
Options that influence the database construction.
.. program:: makeRadsDB
.. option:: -d, --database
This option is used together with :option:`makeRadsDB --InterPro`. It determines which of the supported databases to include in the RADS database.
===============
Examples
===============
.. code-block:: bash
# running makeRadsDB providing pfam annotations and sequences
makeRadsDB -i domains1.pfam domains2.pfam -s seqs1.fa seqs2.fa -o myDB
.. _rads:
**********
RADS Usage
**********
============
Simple Usage
============
This section assumes that you have installed RADS as described in :ref:`installation` and set up RADS as described in :ref:`setup`.
Three parameters are required: a query, the database to search in, and a scoring matrix. There are three ways to provide a query: as a simple list of domain IDs,
as a protein sequence that will be annotated automatically, or as an existing domain annotation file (e.g. the result of a run of ``pfam_scan.pl``).
.. code-block:: bash
# running RADS providing a manual list of domains as query
rads -D PF02758 PF05729 --db InterPro60-pfam -m pfam30.dsm
# running RADS providing a sequence as query
rads -Q seq.fasta --db InterPro60-pfam -m pfam30.dsm
# running RADS providing a domain annotation as query
rads -q seq.dom --db InterPro60-pfam -m pfam30.dsm
===============
Program Options
===============
---------------
General options
---------------
The general options influence the overall behaviour of RADS:
.. program:: rads
.. option:: -h, --help
Prints a simple help message with a small description of all the available options.
.. option:: -d <FILE>, --db <FILE>
Prefix to the database. Can be either one of the precomputed ones downloaded from the website or self-computed (see :ref:`setup`).
.. option:: -o <FILE>, --out <FILE>
The output file.
.. option:: -l, --list-alignments
Report the alignments computed for the different domain arrangements.
.. option:: -n <INT>, --threads <INT>
The number of threads to be used by the program. Currently, this option only allows several queries to be processed in parallel. If only one query is given, the program still uses a single core. *Default: 1*
--------------
Query options
--------------
The query options define the different ways a query can be provided.
.. program:: rads
.. option:: -q <FILE>, --query-dom <FILE>
The domain annotation file to be used as query. This is a simple domain annotation file in one of the supported formats (e.g. the output of ``pfam_scan.pl``).
.. option:: -Q <FILE>, --query-seq <FILE>
File containing sequences to be used as queries. The file has to be in FASTA format.
.. option:: --domain-db <FILE>
The domain database to use for automated annotation.
.. option:: -D <IDs>, --domains <IDs>
Provide a domain arrangement manually in the form of space-separated domain accession numbers (e.g. PF00001 PF00002).
---------------
Scoring options
---------------
These parameters influence the alignment scoring in the same way as the corresponding parameters of a standard sequence alignment.
.. program:: rads
.. option:: -m <FILE>, --matrix <FILE>
The domain similarity matrix. It needs to fit the data in the database (e.g. if you work with a database that contains Pfam domains, use the corresponding Pfam similarity matrix).
.. option:: --gop <INT>
Gap opening penalty. This cost is applied once for each consecutive set of gaps in a domain arrangement. It is not applied to gaps at the ends of the alignment. *Default: -50*
.. option:: --gep <INT>
Gap extension penalty. These costs are applied to each single gap character in the alignment. *Default: -10*
.. option:: -c, --collapse
Collapse consecutive identical domains. It is **recommended to use** this option. It is not applied automatically because it actually changes the domain arrangements. *Default: false*
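To illustrate what collapsing means, the following sketch merges runs of consecutive identical domain IDs in a list. It is not the RADS implementation; the function name ``collapse_domains`` is hypothetical.

```bash
# collapse_domains ID...: print the IDs with consecutive duplicates merged.
# Illustrative only, not the code used by RADS.
collapse_domains() {
    prev=""
    out=""
    for dom in "$@"; do
        # Keep a domain only if it differs from its direct predecessor.
        if [ "$dom" != "$prev" ]; then
            out="${out:+$out }$dom"
        fi
        prev="$dom"
    done
    printf '%s\n' "$out"
}
```

For example, ``collapse_domains PF00001 PF00002 PF00002 PF00003`` prints ``PF00001 PF00002 PF00003``. Identical domains that are not adjacent are left untouched.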
Gap opening costs are only taken into account when the gap occurs in the middle of a domain arrangement (DA). Gaps at either end of a DA are penalized using only the gap extension costs.
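One reading of the gap rules described above can be sketched as follows. This is an interpretation under stated assumptions, not the actual RADS scoring code: the representation of an aligned arrangement as space-separated tokens with ``-`` as the gap symbol and the function name ``gap_penalty`` are both hypothetical.

```bash
# gap_penalty ROW GOP GEP
# ROW is one aligned arrangement as space-separated tokens; "-" marks a gap.
# Every gap character costs GEP; every run of gaps that has a domain on both
# sides additionally costs GOP once. Illustrative only.
gap_penalty() {
    row="$1"; gop="$2"; gep="$3"
    printf '%s\n' "$row" | awk -v gop="$gop" -v gep="$gep" '{
        total = 0
        i = 1
        while (i <= NF) {
            if ($i != "-") { i++; continue }
            start = i
            while (i <= NF && $i == "-") i++
            len = i - start
            total += len * gep
            # Internal run: not at the start and not at the end of the row.
            if (start > 1 && i <= NF) total += gop
        }
        print total
    }'
}
```

With the default values, ``gap_penalty "PF00001 - - PF00003" -50 -10`` prints ``-70`` (one opening plus two extensions), while ``gap_penalty "- PF00002 PF00003" -50 -10`` prints ``-10`` because the gap is at the end of the arrangement.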
------------------------------
Result filtering options
------------------------------
These options can be used to filter the hits that are reported.
.. program:: rads
.. option:: -a, --all
All of the domain IDs in the query have to appear in the target sequences as well. *Default: false*
.. option:: -M <INT>, --min-score <INT>
Only alignments with a score greater than or equal to this value are reported. *Default: 0*
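The condition behind the ``--all`` option can be illustrated with a small sketch that tests whether every query domain ID occurs in a target arrangement. The function name ``contains_all`` is hypothetical and the code only illustrates the idea; it is not how RADS implements the filter.

```bash
# contains_all "QUERY IDS" "TARGET IDS"
# Succeeds only if every space-separated ID of the query is also in the target.
# Illustrative only, not the code used by RADS.
contains_all() {
    for q in $1; do
        case " $2 " in
            *" $q "*) ;;        # this query ID was found in the target
            *) return 1 ;;      # at least one query ID is missing
        esac
    done
    return 0
}

# Example call:
# contains_all "PF00001 PF00002" "PF00002 PF00003 PF00001" && echo "keep hit"
```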
===============
Output format
===============
The output is a simple text file with two parts. The first part is a summary of the run containing the date of execution, the version of RADS and the parameters used. The second part contains the results. The hits are listed in a table of five tab-separated columns. The first column contains the alignment score and the second its normalized version. The third column contains the target ID, followed by the sequence length in the fourth column. The fifth column contains the domain arrangement of the target.
The table is sorted according to the first column.
.. code-block:: text
# RADS version 2.2.0
# RADS Output v1
# run at Fri Apr 20 14:19:09 2018
#
# query file: -
# database: interPro-test
# gap open penalty -50
# gap extension penalty -10
# matrix: pfam-31.dsm
# all: false
# collapse: true
# ******************************************************************
# -------------------------------------------------------------------
Results for: manual entered query
Domain arrangement: PF00001 PF00002 PF00003
# score | normalized | SeqID | sequence length | domain arrangement
# -------------------------------------------------------------------
300 1.00 test-seq1 530 PF00001 10 63 PF00002 104 312 PF00003 362 524
300 1.00 test-seq2 530 PF00001 10 63 PF00002 104 312 PF00003 362 524
190 0.69 test-seq3 530 PF00002 104 312 PF00003 362 524
190 0.69 test-seq5 530 PF00001 10 63 PF00002 104 312 PF00002 362 524
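Since the result table is plain text, it can be post-processed with standard tools. The following sketch extracts the hits whose normalized score reaches a threshold. It assumes that comment lines start with ``#`` and that hit lines start with an integer score, as in the example above; the function name ``rads_filter_hits`` is hypothetical.

```bash
# rads_filter_hits FILE MIN_NORMALIZED
# Prints SeqID, score and normalized score (tab separated) for every hit
# whose normalized score is at least MIN_NORMALIZED. Illustrative only.
rads_filter_hits() {
    awk -v min="$2" '
        /^#/ { next }                       # skip the summary and table header
        $1 ~ /^-?[0-9]+$/ && ($2 + 0) >= (min + 0) {
            print $3 "\t" $1 "\t" $2        # SeqID, score, normalized score
        }
    ' "$1"
}

# Example call:
# rads_filter_hits results.txt 0.9
```

Lines such as ``Results for:`` and ``Domain arrangement:`` are skipped because their first field is not a number.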
.. _setup:
***************
Setting up RADS
***************
This chapter describes how to set up RADS so that it can access all the data it needs. Besides the query, RADS needs a database to search in and a domain similarity matrix.
=======================
Setting up the database
=======================
You need a database to search in. You can use one of the databases we precomputed based on InterPro annotations, available here: http://domainworld.uni-muenster.de/programs/rads/, or you can compute your own using the ``makeRadsDB`` program described in :ref:`makeRadsDB`.
=============================================
Setting up the domain similarity matrix (DSM)
=============================================
The domain similarity matrix has to match the database you use (e.g. a Pfam matrix for a database of Pfam domains). You can download precomputed DSMs from: http://domainworld.uni-muenster.de/data/dsm/