Commit 17d18002 authored by Carsten Kemena

updated the manual

- changed to sphinx
- added description for the list-alignments option
- expanded the installation/setup description
parent 7d2c341d
v. 2.3.0
- added option to display computed alignments
- updated manual and changed to sphinx
v. 2.2.0
- internal code improvements
......
RADS 2.1.2 (beta)
RADS 2.3.0
====
This program can perform a domain arrangement similarity search on databases.
......@@ -14,9 +14,21 @@ We try to keep the dependencies as little as possible. Current dependencies are:
- compiler with c++11 and OpenMP support
Download
--------
```bash
git clone https://ebbgit.uni-muenster.de/domainWorld/RADS.git
cd RADS
git submodule init
git submodule update
```
Installation
------------
Change into the RADS directory and run the following commands:
```bash
......@@ -30,7 +42,8 @@ make
Usage
-----
Please take a look at the wiki page (https://ebbgit.uni-muenster.de/domainWorld/RADS/wikis/home) for detailed a description
Please take a look at the file UserManual.pdf included in this program to get a detailed overview on how to install and run the program.
Problems, Bugs & Suggestions
----------------------------
......
File added
......@@ -86,6 +86,7 @@ html_theme = 'alabaster'
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_logo = '_static/logo.png'
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
......@@ -107,8 +108,8 @@ htmlhelp_basename = 'RADSdoc'
# -- Options for LaTeX output ------------------------------------------------
latex_elements = {
'sphinxsetup':'VerbatimColor={rgb}{0.87,0.87,0.87},verbatimwithframe=false'
'sphinxsetup':'VerbatimColor={rgb}{0.87,0.87,0.87},verbatimwithframe=false',
'classoptions': ',openany,oneside'
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
......@@ -134,6 +135,7 @@ latex_documents = [
'Carsten Kemena', 'manual'),
]
latex_logo = '_static/logo.png'
# -- Options for manual page output ------------------------------------------
......
......@@ -30,7 +30,7 @@ The easiest way to download the most current version of RADS is to use git:
git submodule update
If you are unable to use git you can download the source code package manually from here: If you don't want to use git, you can download the source code from here: https://ebbgit.uni-muenster.de/domainWorld/RADS/-/archive/master/RADS-master.tar.gz. Below you find the commands needed to put everything necessary in its correct place. You can replace the ``wget`` command with manual downloads and copying the file to the correct position.
If you don't want to use git, you can download the source code from here: https://ebbgit.uni-muenster.de/domainWorld/RADS/-/archive/master/RADS-master.tar.gz. Below you find the commands needed to put everything necessary in its correct place. You can replace the ``wget`` command with manual downloads and copying the file to the correct position.
.. code-block:: bash
......
......@@ -5,8 +5,8 @@ makeRadsDB Usage
****************
``makeRadsDB`` is a program to compute a database that can be used by RADS. A database consists of two files: an index file
(SQLite database) and an arrangement file (simple text file) (e.g. if the name of the data base is MyDB the
files needed are MyDB.db and MyDB.da).
(SQLite database) and a domain arrangement file (simple text file). For example, if the name of the database is MyDB, the
files needed are MyDB.db and MyDB.da.
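The naming rule above can be sketched as a small C++ helper. The function name `radsDatabaseFiles` is hypothetical and not part of makeRadsDB; it only illustrates how a prefix such as MyDB maps to the two files that RADS expects.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper (not part of makeRadsDB): derive the two files of a RADS
// database from its prefix: <prefix>.db (SQLite index) and <prefix>.da
// (domain arrangement text file).
std::pair<std::string, std::string> radsDatabaseFiles(const std::string& prefix)
{
    assert(!prefix.empty() && "a database prefix is required");
    // Plain concatenation on purpose: std::filesystem::path::replace_extension
    // would turn a prefix such as "interPro-60.1" into "interPro-60.db".
    return {prefix + ".db", prefix + ".da"};
}
```

For example, `radsDatabaseFiles("MyDB")` yields `MyDB.db` and `MyDB.da`, and a prefix that already contains a dot keeps it intact.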
===============
Program options
......@@ -32,7 +32,7 @@ The basic options
.. option:: -I <FILE>, --InterPro <FILE>
Used to turn the InterPro annotation file (match\_complete.xml.gz) found on https://www.ebi.ac.uk/interpro/download.html into a RADS database. This option is used to compute the precomputed InterPro databases. Use the :option:`makeRadsDB --database` option to extract the domain arrangements of a single database.
Used to turn the InterPro annotation file (match\_complete.xml.gz) found on https://www.ebi.ac.uk/interpro/download.html into a RADS database. This option is used to compute the precomputed InterPro databases. Use the :option:`--database` option to extract the domain arrangements of a single database.
.. option:: -s <FILE>, --seqs <FILE>
......@@ -54,7 +54,7 @@ Some options to influence the data base construction.
.. option:: -d, --database
This options is used together with the option: :option:`makeRadsDB --InterPro`. It determines which of the supported databases to include in the RADS database.
This option is used together with the :option:`--InterPro` option. It determines which of the supported databases to include in the RADS database.
......
......@@ -9,7 +9,7 @@ RADS Usage
Simple Usage
============
This section assumes that you have installed RADS as described in :ref:`installation` and setup the RADS as described in :ref:`setup`.
This section assumes that you have installed RADS as described in :ref:`installation` and set up RADS as described in :ref:`setup`.
Three parameters are required: a query, the database to search in, and a scoring matrix. There are three different ways to provide a query: as a simple list of domain IDs,
as a protein sequence that will be annotated automatically, or as an existing domain annotation file (e.g. the result of a run of ``pfam_scan.pl``).
......@@ -45,7 +45,7 @@ The general option influence the general behaviour of RADS:
.. option:: -d <FILE>, --db <FILE>
Prefix to the database. Can be either one of the precomputed ones downloaded from the website or self-computed (see :ref:`setup`).
Prefix of the database. Can be either one of the precomputed ones downloaded from the website (see :ref:`setup`) or self-computed (see :ref:`makeRadsDB`).
.. option:: -o <FILE>, --out <FILE>
......@@ -95,16 +95,13 @@ These parameters influence the alignment scoring similar to the same values in a
The domain similarity matrix. It needs to fit the data in the database (e.g. if you work with a database that contains Pfam domains, use the corresponding Pfam similarity matrix).
.. option:: --gop <INT>
Gap opening penalty, These costs are applied once for each consecutive set of gaps in a domain arrangement. They are not applied to gaps at the ends of the alignment. *Default: -50*
Gap opening penalty. These costs are applied once for each consecutive set of gaps in a domain arrangement. They are not applied to gaps at the ends of the alignment. *Default: -50*
.. option:: --gep <INT>
Gap extension penalty. These costs are applied to each single gap character in the alignment. *Default: -10*
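Read literally, the two gap options combine as in the following C++ sketch, which scores an alignment that has already been computed. It is not the RADS implementation: `scoreAlignment` and the `sim` callback (standing in for a lookup in the domain similarity matrix) are hypothetical, and the `*******` gap marker is borrowed from the `--list-alignments` output.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <string>
#include <vector>

// Gap marker, mirroring the "*******" placeholder of the alignment listing.
const std::string GAP = "*******";

// Hypothetical sketch: score one pairwise alignment of two domain arrangements.
//  * aligned domain pairs contribute sim(a, b) (a stand-in for a DSM lookup),
//  * every gap character costs gep,
//  * every run of consecutive gaps that touches neither end of the alignment
//    additionally costs gop once.
int scoreAlignment(const std::vector<std::string>& query,
                   const std::vector<std::string>& target,
                   const std::function<int(const std::string&, const std::string&)>& sim,
                   int gop = -50, int gep = -10)
{
    assert(query.size() == target.size() && "aligned rows must have equal length");
    const std::size_t n = query.size();
    int score = 0;

    // Columns without a gap are scored with the similarity function.
    for (std::size_t i = 0; i < n; ++i)
        if (query[i] != GAP && target[i] != GAP)
            score += sim(query[i], target[i]);

    // Gap penalties are computed separately for each row.
    for (const std::vector<std::string>* row : {&query, &target}) {
        std::size_t i = 0;
        while (i < n) {
            if ((*row)[i] != GAP) { ++i; continue; }
            const std::size_t start = i;
            while (i < n && (*row)[i] == GAP) ++i;       // i: one past the run
            score += static_cast<int>(i - start) * gep;  // each gap character
            const bool terminal = (start == 0) || (i == n);
            if (!terminal) score += gop;                 // internal runs only
        }
    }
    return score;
}
```

With a toy similarity of 100 for identical domains and 0 otherwise, aligning PF00001 PF00002 PF00003 against PF00001 PF00002 ******* gives 200 - 10 = 190 (terminal gap, extension cost only), which happens to match the 190 scores in the example output later in this section; a gap in the middle (PF00001 ******* PF00003) would give 200 - 50 - 10 = 140.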
.. option:: -c, --collapse
Collapse consecutive identical domains. It is **recommended to use** this option. The reason why this is not automatically done is, that if actually changes the domain arrangements. *Default: false*
Gap opening costs are only taken into account when the gap occurs in the middle of a domain arrangement. Gaps at either end of a DA are assumed only penalized using the 'gap extension' costs.
Collapse consecutive identical domains. It is **recommended to use** this option. The reason this is not done automatically is that it changes the domain arrangements. However, domains often duplicate, so several copies of the same domain in a row are not uncommon and usually do not affect the function of a protein. *Default: false*
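The effect of collapsing on a single arrangement can be sketched with `std::unique`. This is an illustration, not code taken from RADS; `collapseArrangement` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of --collapse for one domain arrangement: every run of
// identical neighbouring domains is reduced to a single copy; repeats that are
// not adjacent are kept.
std::vector<std::string> collapseArrangement(std::vector<std::string> arrangement)
{
    // std::unique keeps the first element of each run of equal neighbours and
    // returns the new logical end; erase() removes the leftover tail.
    arrangement.erase(std::unique(arrangement.begin(), arrangement.end()),
                      arrangement.end());
    // Postcondition: no two neighbours are identical any more.
    assert(std::adjacent_find(arrangement.begin(), arrangement.end()) == arrangement.end());
    return arrangement;
}
```

For example, `PF00001 PF00002 PF00002` becomes `PF00001 PF00002`, while `PF00001 PF00002 PF00001` is left unchanged because the repeated domains are not adjacent.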
------------------------------
Result filtering options
......@@ -133,28 +130,89 @@ The table is sorted according to the first column.
.. code-block:: text
# RADS version 2.2.0
# RADS Output v1
# run at Fri Apr 20 14:19:09 2018
#
# query file: -
# database: interPro-test
# gap open penalty -50
# gap extension penalty -10
# matrix: pfam-31.dsm
# all: false
# collapse: true
# ******************************************************************
# -------------------------------------------------------------------
Results for: manual entered query
Domain arrangement: PF00001 PF00002 PF00003
# score | normalized | SeqID | sequence length | domain arrangement
# -------------------------------------------------------------------
300 1.00 test-seq1 530 PF00001 10 63 PF00002 104 312 PF00003 362 524
300 1.00 test-seq2 530 PF00001 10 63 PF00002 104 312 PF00003 362 524
190 0.69 test-seq3 530 PF00002 104 312 PF00003 362 524
190 0.69 test-seq5 530 PF00001 10 63 PF00002 104 312 PF00002 362 524
If you use the :option:`--list-alignments` option, the output contains an additional column with the alignment ID, and the alignments themselves are listed after the table.
.. note::
Be aware that if you additionally use the :option:`--collapse` option, the table still shows the original domain arrangement, whereas the alignment uses the collapsed version. See the example below.
.. code-block:: text
# RADS version 2.3.0
# RADS Output v1
# run at Wed Jun 27 15:09:15 2018
#
# query file: -
# database: /local/home/ckeme_01/projects/domainWorld/RADS/tests/integrationTests/interPro-test
# gap open penalty -50
# gap extension penalty -10
# matrix: /local/home/ckeme_01/.domainWorld/dsm/pfam-31.dsm
# all: false
# collapse: true
# ******************************************************************
# -------------------------------------------------------------------
Results for: manual entered query
Domain arrangement: PF00001 PF00002 PF00003
# score | normalized | SeqID | sequence length | domain arrangement | aln
# -------------------------------------------------------------------
300 1.00 test-seq1 530 PF00001 10 63 PF00002 104 312 PF00003 362 524 1
300 1.00 test-seq2 530 PF00001 10 63 PF00002 104 312 PF00003 362 524 1
190 0.69 test-seq5 530 PF00001 10 63 PF00002 104 312 PF00002 362 524 2
190 0.69 test-seq3 530 PF00002 104 312 PF00003 362 524 3
# -------------------------------------------------------------------
List of alignments:
# -------------------------------------------------------------------
1)
Query DA: PF00001 PF00002 PF00003
Target DA: PF00001 PF00002 PF00003
2)
Query DA: PF00001 PF00002 PF00003
Target DA: PF00001 PF00002 *******
3)
Query DA: PF00001 PF00002 PF00003
Target DA: ******* PF00002 PF00003
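Because the result table is plain text, downstream tools can read it line by line. The following C++ sketch parses one result row into a struct. `ResultRow`, `DomainHit` and `parseResultRow` are hypothetical names, and the sketch assumes that the domain arrangement column is a flat list of accession/start/end triples, followed by the alignment ID as an optional last field when `--list-alignments` is used.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One domain of a target arrangement: accession plus two positions.
struct DomainHit {
    std::string accession;
    int start;
    int end;
};

// One row of the RADS result table (hypothetical representation).
struct ResultRow {
    int score = 0;
    double normalized = 0.0;
    std::string seqId;
    int seqLength = 0;
    std::vector<DomainHit> domains;
    int alignmentId = -1;  // -1: no alignment column (no --list-alignments)
};

// Parse one line of the result table. Returns false for comment lines and
// for any line that does not look like a result row.
bool parseResultRow(const std::string& line, ResultRow& row)
{
    if (line.empty() || line[0] == '#')
        return false;
    std::istringstream in(line);  // >> skips tabs and spaces alike
    if (!(in >> row.score >> row.normalized >> row.seqId >> row.seqLength))
        return false;

    std::vector<std::string> rest;
    for (std::string token; in >> token;)
        rest.push_back(token);

    row.alignmentId = -1;
    if (rest.size() % 3 == 1) {  // one extra field: the alignment ID
        row.alignmentId = std::stoi(rest.back());
        rest.pop_back();
    }
    if (rest.empty() || rest.size() % 3 != 0)
        return false;

    row.domains.clear();
    for (std::size_t i = 0; i < rest.size(); i += 3) {
        assert(i + 2 < rest.size());  // guaranteed by the size check above
        row.domains.push_back({rest[i], std::stoi(rest[i + 1]), std::stoi(rest[i + 2])});
    }
    return true;
}
```

Lines such as `Results for: ...`, `List of alignments:` or `1)` are rejected because their first fields are not numbers. `std::stoi` throws on a malformed position, which is acceptable for a sketch but would need handling in a robust tool.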
===============
Citation
===============
If you find RADS useful in your research, please cite it.
Terrapon, Nicolas; Weiner, January; Grath, Sonja; Moore, Andrew D.; Bornberg-Bauer, Erich: Rapid similarity search of proteins using alignments of domain arrangements. Bioinformatics (2014) 30 (2): 274-281. doi: 10.1093/bioinformatics/btt379
http://bioinformatics.oxfordjournals.org/content/30/2/274.long
......@@ -5,17 +5,17 @@
Setting up RADS
***************
This chapter describes how to setup RADS so it can access all the data it needs. Beside the query
This chapter describes how to set up RADS so that it can access all the data it needs. In addition to your query, you need a RADS database and a domain similarity matrix.
=======================
Setting up the database
=======================
You need a database to search in. You can use one of the databases we precomputed based on InterPro annotations available here: http://domainworld.uni-muenster.de/programs/rads/ or you can compute your own one using the the ``makeRadsDB`` program described in :ref:`makeRadsDB`.
You need a database to search in. You can use one of the databases we precomputed based on InterPro annotations, available at http://domainworld.uni-muenster.de/programs/rads/, or you can compute your own using the ``makeRadsDB`` program described in :ref:`makeRadsDB`.
=============================================
Setting up the domain similarity matrix (DSM)
=============================================
These precomputed similarity matrices should be fitting to the database you use. You can download a DSM from: http://domainworld.uni-muenster.de/data/dsm/
The precomputed domain similarity matrices need to match the domain database used: e.g. if your database contains Pfam domains, use the DSM containing the Pfam match scores. You can download DSMs for Pfam and SUPERFAMILY from http://domainworld.uni-muenster.de/data/dsm/.
......@@ -6,18 +6,23 @@
Welcome to RADS's documentation!
================================
.. toctree::
:maxdepth: 2
:caption: Contents:
usage/installation.rst
content/installation.rst
content/setup.rst
content/rads_usage.rst
content/makedb_usage.rst
Indices and tables
==================
.. only:: html
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
* :ref:`genindex`
* :ref:`search`
This program can perform a domain arrangement similarity search on databases. It provides tools to create your own database as well as the program to search in them.
************
Installation
************
------------
Requirements
------------
We try to keep the number of dependencies as small as possible. Current dependencies are:
* BioSeqDataLib (https://ebbgit.uni-muenster.de/domainWorld/BioSeqDataLib) (can be added via git submodule)
* boost (http://www.boost.org)
* SQLite (https://www.sqlite.org)
* a compiler with C++11 and OpenMP support
--------
Download
--------
You can use git to get the latest stable version of RADS:
.. code-block:: bash
git clone https://ebbgit.uni-muenster.de/domainWorld/RADS.git
cd RADS
git submodule init
git submodule update
If you don't want to use git, you can download the source code from here: https://ebbgit.uni-muenster.de/domainWorld/RADS/-/archive/master/RADS-master.tar.gz. In this case you will need to additionally copy the BioSeqDataLib folder into the libs folder of RADS.
-----------
Compilation
-----------
Change into the RADS directory and run the following commands:
.. code-block:: bash
mkdir build
cd build
cmake ..
make
*****
Usage
*****
\documentclass{scrartcl}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{listings}
\usepackage{float}
\begin{document}
\title{RADS manual}
\subtitle{2.2.0}
\author{Carsten Kemena}
\maketitle
\tableofcontents
\section{RADS}
\subsection{Introduction}
RADS is a program to search for domain arrangements in a given database.
\subsection{Program Options}
\subsubsection*{General options}
The general options influence the general behaviour of RADS:
\begin{tabular}{llp{9.5cm}}
\hline
parameter & default & description\\\hline
-h, --help & - & Produces this help message \\
-d, --db & - & Prefix to the database. Can be either one of the precomputed ones downloaded from the website or self-computed. \\
-o, --out & - & The output file.\\
-n, --threads & 1 & The number of threads to use\\
\hline
\end{tabular}
\subsubsection*{Query options}
The query options define the different ways a query can be provided.
\begin{tabular}{llp{9cm}}
\hline
parameter & default & description\\\hline
-q, --query-dom &- & The domain annotation file to be used as query. This is a simple domain annotation file in one of the supported formats.\\
-Q, --query-seq &- & File containing sequences to be used as queries. The file has to be in FASTA format.\\
--domaindb & - & The domain database to use for automated annotation. \\
-D, --domains & - & Provide a domain arrangement manually in the form of space-separated domain accession numbers (e.g. PF00001 PF00002)\\
\hline
\end{tabular}
\subsubsection*{Scoring options}
These parameters influence the alignment scoring similar to the same values in a standard alignment.
\begin{table}[H]
\begin{tabular}{llp{9cm}}
\hline
parameter & default & description\\\hline
-M, --matrix &- & The domain similarity matrix. It needs to fit the data in the database (e.g. if you work with a database that contains Pfam domains, use the corresponding Pfam similarity matrix).\\
--gop & -50 & Gap opening costs\\
--gep & -10 & Gap extension costs\\
-c, --collapse & false & Collapse consecutive identical domains\\
\hline
\end{tabular}
\end{table}
Gap opening costs are only taken into account when the gap occurs in the middle of a domain arrangement. Gaps at either end of a DA are only penalized using the gap extension costs.
\subsubsection*{Result filtering options}
\begin{table}[H]
\begin{tabular}{llp{9cm}}
\hline
parameter & default & description\\\hline
-a, --all & false & All domain types need to occur\\
-M, --min-score & 0 & The minimum alignment score to list\\
\hline
\end{tabular}
\end{table}
\subsection{Data bases}
We provide a range of precomputed databases on our website, currently based on the InterPro annotations. If you want to compute a database based on your own data, you can do that very easily using the included makeRadsDB program (see Section \ref{section:makeRadsDB}).
\subsection{Output format}
The output is a simple text file that contains two parts. The first part is a summary of the run containing the date of execution, the version of RADS and the parameters used. The second part contains the results. The hits are listed in a table of five \emph{tab}-separated columns. The first column contains the alignment score and the second the normalized score. The third column contains the target ID, followed by the sequence length in the fourth column. The fifth column contains the domain arrangement of the target, each domain accession followed by its start and end position.
The table is sorted according to the first column.
\begin{verbatim}
# RADS version 2.2.0
# RADS Output v1
# run at Fri Apr 20 14:19:09 2018
#
# query file: -
# database: interPro-test
# gap open penalty -50
# gap extension penalty -10
# matrix: pfam-31.dsm
# all: false
# collapse: true
# ******************************************************************
# -------------------------------------------------------------------
Results for: manual entered query
Domain arrangement: PF00001 PF00002 PF00003
# score | normalized | SeqID | sequence length | domain arrangement
# -------------------------------------------------------------------
300 1.00 test-seq1 530 PF00001 10 63 PF00002 104 312 PF00003 362 524
300 1.00 test-seq2 530 PF00001 10 63 PF00002 104 312 PF00003 362 524
190 0.69 test-seq3 530 PF00002 104 312 PF00003 362 524
190 0.69 test-seq5 530 PF00001 10 63 PF00002 104 312 PF00002 362 524
\end{verbatim}
\subsection{Examples}
\begin{lstlisting}[language=bash,backgroundcolor = \color{lightgray}]
# running RADS providing a manual list of domains as query
rads --db InterPro60-pfam -M pfam30.dsm -D PF02758 PF05729
# running RADS providing a sequence as query
rads --db InterPro60-pfam -M pfam30.dsm -Q seq.fasta
# running RADS providing a domain annotation as query
rads --db InterPro60-pfam -M pfam30.dsm -q seq.dom
\end{lstlisting}
\newpage
\section{makeRadsDB}\label{section:makeRadsDB}
A program to compute a database that can be used by RADS. A database consists of two files: an index file
(SQLite database) and an arrangement file (simple text file). For example, if the name of the database is MyDB, the
files needed are MyDB.db and MyDB.da.
\subsection{Program options}
\subsubsection*{General options}
The basic options
\begin{table}[H]
\begin{tabular}{llp{9cm}}
\hline
parameter & default & description\\\hline
-h, --help & - & Produces this help message\\
-i, --input & - & Domain arrangement file(s) that should be turned into a database. \\
-I, --InterPro & - & Used to turn the InterPro annotation file (match\_complete.xml.gz) found on
\url{https://www.ebi.ac.uk/interpro/download.html} into a RADS database. This option is
used to compute the precomputed InterPro databases.\\
-s, --seqs & - & Sequence files. These are used in combination with the domain arrangement files.
If none are given, all sequence lengths are set to 0.\\
-o, --out & - & The output prefix used to produce two files in the format prefix.db and prefix.da.
Be aware that we currently do not support adding data to an existing database.\\
\hline
\end{tabular}
\end{table}
The domain arrangement file as well as the sequence files can contain several sequences/arrangements.
\subsubsection*{Filter options}
Options that influence the database construction.
\begin{tabular}{llp{9cm}}
\hline
parameter & default & description\\\hline
-d, --database & - & The database to use\\
\hline
\end{tabular}
\subsection{Examples}
\begin{lstlisting}[language=bash,backgroundcolor = \color{lightgray}]
# running makeRadsDB providing pfam annotations and sequences
makeRadsDB -i domains1.pfam domains2.pfam -s seqs1.fa seqs2.fa \
-o myDB
\end{lstlisting}
\end{document}
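The `-s/--seqs` behaviour described in the manual above (sequence lengths come from the sequence files and default to 0 otherwise) could look roughly like the following C++ sketch. `readSequenceLengths` and `sequenceLength` are hypothetical helpers, and the assumption that the sequence ID is the first word of the FASTA header is ours, not taken from makeRadsDB.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: map each FASTA sequence ID to its length.
// Assumption: the ID is the first whitespace-delimited word after '>'.
std::map<std::string, std::size_t> readSequenceLengths(std::istream& fasta)
{
    std::map<std::string, std::size_t> lengths;
    std::string line, id;
    while (std::getline(fasta, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();  // tolerate Windows line endings
        if (line.empty())
            continue;
        if (line[0] == '>') {
            id.clear();
            std::istringstream header(line.substr(1));
            header >> id;
            if (!id.empty())
                lengths[id] = 0;
        } else if (!id.empty()) {
            // Sequence lines belong to the most recent header; long sequences
            // are usually wrapped over several lines.
            assert(lengths.count(id) == 1);
            lengths[id] += line.size();
        }
    }
    return lengths;
}

// Length lookup with the documented fallback: unknown sequences get length 0.
std::size_t sequenceLength(const std::map<std::string, std::size_t>& lengths,
                           const std::string& id)
{
    const auto it = lengths.find(id);
    return it == lengths.end() ? 0 : it->second;
}
```

If no sequence file is given, the map is simply empty and every lookup returns 0, which matches the documented default.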
......@@ -71,10 +71,10 @@ main(int argc, char *argv[])
po::options_description general("General options");
general.add_options()
("help,h", "Produces this help message")
("input,i", po::value<vector<fs::path> >(&daFiles)->multitoken(), "Domain arrangement files")
("InterPro,I", po::value<fs::path>(&interProFile), "InterPro match file")
("seqs,s", po::value<vector<fs::path> >(&seqFiles)->multitoken(), "Sequence files")
("out,o", po::value<string>(&prefix)->required(), "The output prefix")
("input,i", po::value<vector<fs::path> >(&daFiles)->multitoken()->value_name("FILE"), "Domain arrangement files")
("InterPro,I", po::value<fs::path>(&interProFile)->value_name("FILE"), "InterPro match file")
("seqs,s", po::value<vector<fs::path> >(&seqFiles)->multitoken()->value_name("FILE"), "Sequence files")
("out,o", po::value<string>(&prefix)->required()->value_name("FILE"), "The output prefix")
;
//std::vector<string> databases;
......
......@@ -120,18 +120,18 @@ main(int argc, char *argv[])
po::options_description general("General options");
general.add_options()
("help,h", "Produces this help message")
("db,d", po::value<fs::path>(&prefix)->required(), "The database prefix")
("db,d", po::value<fs::path>(&prefix)->required()->value_name("FILE"), "The database prefix")
("out,o", po::value<fs::path>(&outFile)->value_name("FILE"), "The output file")
("list-alignments,l", po::value<bool>(&listAlignments)->default_value(false)->zero_tokens(), "List alignments")
("threads,n", po::value<unsigned short>(&nThreads)->default_value(1), "The number of threads to use")
("threads,n", po::value<unsigned short>(&nThreads)->default_value(1)->value_name("INT"), "The number of threads to use")
;
po::options_description queryOpts("Query options");
queryOpts.add_options()
("query-dom,q", po::value<fs::path>(&queryDomainFile), "The domain annotation file to be used as query")
("query-seq,Q", po::value<fs::path>(&querySeqFile), "File containing sequences to be used as queries")
("domaindb", po::value<fs::path>(&domainDB), "The domain database to use for automated annotation")
("domains,D", po::value<vector<string> >(&domains)->multitoken(), "Domain arrangement")
("query-dom,q", po::value<fs::path>(&queryDomainFile)->value_name("FILE"), "The domain annotation file to be used as query")
("query-seq,Q", po::value<fs::path>(&querySeqFile)->value_name("FILE"), "File containing sequences to be used as queries")
("domain-db", po::value<fs::path>(&domainDB)->value_name("FILE"), "The domain database to use for automated annotation")
("domains,D", po::value<vector<string> >(&domains)->multitoken()->value_name("ID(s)"), "Domain arrangement")
;
int gop, gep;
......@@ -150,7 +150,7 @@ main(int argc, char *argv[])
po::options_description filterOpts("Result filtering options");
filterOpts.add_options()
("all,a", po::value<bool>(&all)->default_value(false)->zero_tokens(), "All domain types need to occur")
("min-score,M", po::value<int>(&minScore)->default_value(0), "The minimum alignment score to list")
("min-score,M", po::value<int>(&minScore)->default_value(0)->value_name("INT"), "The minimum alignment score to list")
;
bool print_filename_only;
......
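The two result filters declared above (`--all` and `--min-score`) and the score ordering of the result table can be sketched together in C++. `Hit` and `filterHits` are hypothetical; the sketch assumes that `--min-score` is inclusive and that `--all` requires every distinct query domain to occur somewhere in the target.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation of one search hit.
struct Hit {
    std::string seqId;
    int score;
    std::vector<std::string> domains;  // target domain arrangement
};

// Sketch of the result filters: --min-score (assumed inclusive) and --all
// (assumed: every distinct query domain must occur in the target). Surviving
// hits are ordered by score, best first.
std::vector<Hit> filterHits(std::vector<Hit> hits,
                            const std::vector<std::string>& query,
                            int minScore, bool all)
{
    assert(!query.empty() && "a query arrangement is required");
    const std::set<std::string> queryTypes(query.begin(), query.end());

    auto rejected = [&](const Hit& hit) {
        if (hit.score < minScore)
            return true;
        if (!all)
            return false;
        const std::set<std::string> targetTypes(hit.domains.begin(), hit.domains.end());
        // std::includes needs sorted ranges; std::set iterates in sorted order.
        return !std::includes(targetTypes.begin(), targetTypes.end(),
                              queryTypes.begin(), queryTypes.end());
    };
    hits.erase(std::remove_if(hits.begin(), hits.end(), rejected), hits.end());

    // stable_sort keeps hits with equal scores in their input order.
    std::stable_sort(hits.begin(), hits.end(),
                     [](const Hit& a, const Hit& b) { return a.score > b.score; });
    return hits;
}
```

Applied to the example hits with the query PF00001 PF00002 PF00003, the `all` filter keeps only test-seq1, since test-seq3 lacks PF00001 and test-seq5 lacks PF00003. The two example outputs in the manual list test-seq3 and test-seq5 (both scoring 190) in different orders, so the order of ties should not be relied on.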