Commit e99789e7 authored by Carsten Kemena

improving options

parent 1e45b525
...@@ -10,7 +10,7 @@ ...@@ -10,7 +10,7 @@
\begin{document} \begin{document}
\title{RADS manual} \title{RADS manual}
\subtitle{2.1.2 (beta)} \subtitle{2.2.0}
\author{Carsten Kemena} \author{Carsten Kemena}
\maketitle \maketitle
...@@ -33,9 +33,8 @@ The general option influence the general behaviour of RADS: ...@@ -33,9 +33,8 @@ The general option influence the general behaviour of RADS:
\begin{tabular}{llp{9.5cm}} \begin{tabular}{llp{9.5cm}}
\hline \hline
parameter & default & description\\\hline parameter & default & description\\\hline
-h, --help & - & Produces this help message \\ -h, --help & - & Produces this help message \\
-d, --db & - & Prefix to the database. Can be either one of the precomputed ones downloaded from the website or self-computed. \\ -d, --db & - & Prefix to the database. Can be either one of the precomputed ones downloaded from the website or self-computed. \\
-a, --all & - & All domain types need to occur\\
-o, --out & - & The output file.\\ -o, --out & - & The output file.\\
-n, --threads & 1 & The number of threads to use\\ -n, --threads & 1 & The number of threads to use\\
\hline \hline
...@@ -48,7 +47,7 @@ The query options define the different ways a query can be provided. ...@@ -48,7 +47,7 @@ The query options define the different ways a query can be provided.
\begin{tabular}{llp{9cm}} \begin{tabular}{llp{9cm}}
\hline \hline
parameter & default & description\\\hline parameter & default & description\\\hline
-q, --query-dom &- & The domain annotation file to be used as query. This is a simple domain annotion file in one of the supported formats.\\ -q, --query-dom &- & The domain annotation file to be used as query. This is a simple domain annotation file in one of the supported formats.\\
-Q, --query-seq &- & File containing sequences to be used as queries. The file has to be in FASTA format.\\ -Q, --query-seq &- & File containing sequences to be used as queries. The file has to be in FASTA format.\\
--domaindb & - & The domain database to use for automated annotation. \\ --domaindb & - & The domain database to use for automated annotation. \\
-D, --domains & - & Provide a domain arrangement manually in form of space separated domain accession numbers (e.g. PF00001 PF00002)\\ -D, --domains & - & Provide a domain arrangement manually in form of space separated domain accession numbers (e.g. PF00001 PF00002)\\
...@@ -67,12 +66,24 @@ parameter & default & description\\\hline ...@@ -67,12 +66,24 @@ parameter & default & description\\\hline
-M, --matrix &- & The domain similarity matrix. This one needs to fit the data in the database (e.g. If you work with a database that contain Pfam domains, use the corresponding Pfam similarity matrix.\\ -M, --matrix &- & The domain similarity matrix. This one needs to fit the data in the database (e.g. If you work with a database that contain Pfam domains, use the corresponding Pfam similarity matrix.\\
--gop & -50 & Gap opening costs\\ --gop & -50 & Gap opening costs\\
--gep & -10 & Gap extension costs\\ --gep & -10 & Gap extension costs\\
-c, --collapse & false & Collapse consecutive identical domains\\
\hline \hline
\end{tabular} \end{tabular}
\end{table} \end{table}
Gap opening costs are only taken into account when the gap occurs in the middle of a domain arrangement. Gaps at either end of a DA are assumed only penalized using the 'gap extension' costs. Gap opening costs are only taken into account when the gap occurs in the middle of a domain arrangement. Gaps at either end of a DA are assumed only penalized using the 'gap extension' costs.
\subsubsection{Result filtering options}
\begin{table}[H]
\begin{tabular}{llp{9cm}}
\hline
parameter & default & description\\\hline
-a, --all & false & All domain types need to occur\\
-M, --min-score & =0 & The minimum alignment score to list\\
\end{tabular}
\end{table}
\subsection{Data bases} \subsection{Data bases}
We provide a range of precomputed databases on our website. We currently provide databases based on the InterPro annotations. If you want to compute a database based on you own data you can do that very easily using the makeRadsDB program included (see Section \ref{section:makeRadsDB}). We provide a range of precomputed databases on our website. We currently provide databases based on the InterPro annotations. If you want to compute a database based on you own data you can do that very easily using the makeRadsDB program included (see Section \ref{section:makeRadsDB}).
...@@ -81,27 +92,34 @@ We provide a range of precomputed databases on our website. We currently provide ...@@ -81,27 +92,34 @@ We provide a range of precomputed databases on our website. We currently provide
\subsection{Output format} \subsection{Output format}
The output is in a very simple textfile format. The hits are listed in a table of five \emph{tab} separated columns. The first column contains the alignment score and the second the normalized version. The third column contains the the target id followed by the sequence length in the fourth column. The output is in a simple text file format and contains two parts. The first part is a summary of the process containing the date of execution, The version of RADS and the parameters used. The second part of the file contains the result. The hits are listed in a table of five \emph{tab} separated columns. The first column contains the alignment score and the second the normalized version. The third column contains the the target id followed by the sequence length in the fourth column.
The table is sorted according to the first column. The table is sorted according to the first column.
\begin{verbatim} \begin{verbatim}
# RADS Output v1 # RADS version 2.2.0
# RADS version 2.0.0 # RADS Output v1
# ******************************** # run at Fri Apr 20 14:19:09 2018
#
# query file: -
# database: interPro-test
# gap open penalty -50
# gap extension penalty -10
# matrix: pfam-31.dsm
# all: false
# collapse: true
# ******************************************************************
# -------------------------------------------------------------------
Results for: manual entered query Results for: manual entered query
Domain arrangement: PF00001 Domain arrangement: PF00001 PF00002 PF00003
# score | normalized | SeqID | sequence length | domain arrangement # score | normalized | SeqID | sequence length | domain arrangement
# ------------------------------------------------------------------- # -------------------------------------------------------------------
100 1.00 10020:000030 611 PF00001 44 293 300 1.00 test-seq1 530 PF00001 10 63 PF00002 104 312 PF00003 362 524
100 1.00 10020:000054 276 PF00001 2 215 300 1.00 test-seq2 530 PF00001 10 63 PF00002 104 312 PF00003 362 524
100 1.00 10020:0001c3 337 PF00001 42 293 190 0.69 test-seq3 530 PF00002 104 312 PF00003 362 524
100 1.00 10020:000327 402 PF00001 75 353 190 0.69 test-seq5 530 PF00001 10 63 PF00002 104 312 PF00002 362 524
100 1.00 10020:000359 410 PF00001 52 305
100 1.00 10020:000393 372 PF00001 67 321
\end{verbatim} \end{verbatim}
...@@ -122,7 +140,7 @@ rads --db InterPro60-pfam -M pfam30.dsm -q seq.dom ...@@ -122,7 +140,7 @@ rads --db InterPro60-pfam -M pfam30.dsm -q seq.dom
\section{makeRadsDB}\label{section:makeRadsDB} \section{makeRadsDB}\label{section:makeRadsDB}
A program to compute a data base that can be used by RADS. A database consists of two files an index file A program to compute a data base that can be used by RADS. A database consists of two files an index file
(SQLite database) and an arrangement file (simple textfile) (e.g. if the name of the data base is MyDB the (SQLite database) and an arrangement file (simple text file) (e.g. if the name of the data base is MyDB the
files needed are MyDB.db and MyDB.da). files needed are MyDB.db and MyDB.da).
\subsection{Program options} \subsection{Program options}
...@@ -157,9 +175,7 @@ Some options to influence the data base construction. ...@@ -157,9 +175,7 @@ Some options to influence the data base construction.
\begin{tabular}{llp{9cm}} \begin{tabular}{llp{9cm}}
\hline \hline
parameter & default & description\\\hline parameter & default & description\\\hline
-d, --databases & - & The database to use\\ -d, --database & - & The database to use\\
-f, --filter & - & Remove overlapping domains\\
-t, --threshold & 10 & Maximal number of allowed overlap\\
\hline \hline
\end{tabular} \end{tabular}
......
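The gap rule stated in the manual text above (gap opening costs apply only to gaps in the middle of a domain arrangement, while gaps at either end are penalized with the extension cost alone) can be sketched as a small helper. This is a minimal illustration, not code from RADS: `gapPenalty` is a hypothetical name, and the internal-gap formula (one `gop` plus one `gep` per gapped domain) is an assumption, since the manual does not spell it out. The terminal case appears consistent with the example output, where test-seq3 lacks the leading PF00001 and scores 190 instead of 300.

```cpp
#include <cassert>

// Hypothetical helper, not part of RADS: penalty for a single gap that
// spans `length` domains, using the manual's defaults gop = -50, gep = -10.
// terminal == true  -> gap at either end of the arrangement: extension cost only.
// terminal == false -> gap in the middle: ASSUMED to cost gop once plus
//                      gep for every gapped domain.
int gapPenalty(int length, bool terminal, int gop = -50, int gep = -10)
{
    assert(length >= 0);
    if (length == 0)
        return 0;               // no gap, no penalty
    int penalty = length * gep; // extension cost applies to every gap
    if (!terminal)
        penalty += gop;         // opening cost only for internal gaps
    return penalty;
}
```

Under these assumptions, `gapPenalty(1, true)` gives -10 and `gapPenalty(1, false)` gives -60.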
...@@ -80,20 +80,20 @@ main(int argc, char *argv[]) ...@@ -80,20 +80,20 @@ main(int argc, char *argv[])
int gop, gep; int gop, gep;
fs::path matrixName; fs::path matrixName;
bool collapse;
po::options_description scoreOpts("Scoring options"); po::options_description scoreOpts("Scoring options");
scoreOpts.add_options() scoreOpts.add_options()
("matrix,m", po::value<fs::path>(&matrixName)->value_name("FILE"), "The domain similarity matrix") ("matrix,m", po::value<fs::path>(&matrixName)->value_name("FILE"), "The domain similarity matrix")
("gop", po::value<int>(&gop)->default_value(-50)->value_name("INT"), "Gap opening costs") ("gop", po::value<int>(&gop)->default_value(-50)->value_name("INT"), "Gap opening costs")
("gep", po::value<int>(&gep)->default_value(-10)->value_name("INT"), "Gap extension costs") ("gep", po::value<int>(&gep)->default_value(-10)->value_name("INT"), "Gap extension costs")
("collapse,c", po::value<bool>(&collapse)->default_value(false)->zero_tokens(), "Collapse consecutive identical domains")
; ;
bool all; bool all;
bool collapse;
int minScore; int minScore;
po::options_description filterOpts("Result filtering options"); po::options_description filterOpts("Result filtering options");
filterOpts.add_options() filterOpts.add_options()
("all,a", po::value<bool>(&all)->default_value(false)->zero_tokens(), "All domain types need to occur") ("all,a", po::value<bool>(&all)->default_value(false)->zero_tokens(), "All domain types need to occur")
("collapse,c", po::value<bool>(&collapse)->default_value(false)->zero_tokens(), "Collapse consecutive identical domains")
("min-score,M", po::value<int>(&minScore)->default_value(0), "The minimum alignment score to list") ("min-score,M", po::value<int>(&minScore)->default_value(0), "The minimum alignment score to list")
; ;
...@@ -177,9 +177,7 @@ main(int argc, char *argv[]) ...@@ -177,9 +177,7 @@ main(int argc, char *argv[])
querySet.emplace("manual entered query", da); querySet.emplace("manual entered query", da);
} }
// calculate results
AP::Output outS(outFile);
fs::path daFile = prefix; fs::path daFile = prefix;
daFile.replace_extension(".da"); daFile.replace_extension(".da");
vector<BSDL::AlignmentMatrix<int, BSDL::DSM> > matrices; vector<BSDL::AlignmentMatrix<int, BSDL::DSM> > matrices;
...@@ -190,11 +188,12 @@ main(int argc, char *argv[]) ...@@ -190,11 +188,12 @@ main(int argc, char *argv[])
std::time_t tt; std::time_t tt;
tt = system_clock::to_time_t(today); tt = system_clock::to_time_t(today);
// Write results
string qPath = querySeqFile.empty() ? "-" : fs::canonical(querySeqFile).string(); string qPath = querySeqFile.empty() ? "-" : fs::canonical(querySeqFile).string();
string print_matrix_name = (print_filename_only) ? matrixName.stem().string() : fs::canonical(matrixName).string(); string print_matrix_name = (print_filename_only) ? matrixName.stem().string() : fs::canonical(matrixName).string();
string print_db_name = (print_filename_only) ? prefix.stem().string() : fs::canonical(daFile).replace_extension("").string(); string print_db_name = (print_filename_only) ? prefix.stem().string() : fs::canonical(daFile).replace_extension("").string();
// print output header // print output header
AP::Output outS(outFile);
outS << "# RADS version " + version + "\n" outS << "# RADS version " + version + "\n"
<< "# RADS Output v1\n" << "# RADS Output v1\n"
<< "# run at " << ctime (&tt) << "#\n" << "# run at " << ctime (&tt) << "#\n"
......
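One detail of the header code in this hunk is worth spelling out: `ctime` returns a string that already ends in a newline, which is why the stream chain continues directly with `"#\n"` after the timestamp. The standalone sketch below reproduces only those three header lines with the standard library; `headerLines` is a hypothetical name, and `std::ostringstream` stands in for the `AP::Output` class, whose interface is not shown in the diff.

```cpp
#include <ctime>
#include <sstream>
#include <string>

// Hypothetical stand-in for the header output in main(): builds the first
// three header lines into a string instead of writing to AP::Output.
std::string headerLines(const std::string &version, std::time_t tt)
{
    std::ostringstream out;
    out << "# RADS version " << version << "\n"
        << "# RADS Output v1\n"
        // std::ctime() yields e.g. "Fri Apr 20 14:19:09 2018\n"; the
        // trailing newline is part of its return value, so none is added.
        << "# run at " << std::ctime(&tt) << "#\n";
    return out.str();
}
```

Calling `headerLines("2.2.0", std::time(nullptr))` yields the version line, the format line, and a timestamp line followed by a lone `#`, matching the layout of the example output in the manual.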