Commit 396bb27e authored by Carsten Kemena

switched to AlignmentMatrix

parent 522d157d
Subproject commit b6c16e20c167a8c5c9e2ad7876feea959e1b66b1
Subproject commit cada6232b356db742926226965626bb7a567eb37
......@@ -4,24 +4,25 @@
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{listings}
\usepackage{float}
\begin{document}
\title{RADS manual}
\subtitle{2.0.0}
\subtitle{2.1.2 (beta)}
\author{Carsten Kemena}
\maketitle
\tableofcontents
\section{RADS}
\subsection{Introduction}
RADS is a program to search for domain arrangements in a given database.
RADS is a program to search for domain arrangements in a given database.
\subsection{Program Options}
......@@ -59,25 +60,28 @@ parameter & default & description\\\hline
These parameters influence the alignment scoring in the same way as the corresponding values in a standard sequence alignment.
\begin{table}[H]
\begin{tabular}{llp{9cm}}
\hline
parameter & default & description\\\hline
-M, --matrix & - & The domain similarity matrix. It must match the data in the database (e.g. if you work with a database that contains Pfam domains, use the corresponding Pfam similarity matrix).\\
--gop & -50 & Gap opening costs\\
--gop & -50 & Gap opening costs\\
--gep & -10 & Gap extension costs\\
\hline
\end{tabular}
\end{table}
Gap opening costs are only taken into account when the gap occurs in the middle of a domain arrangement (DA). Gaps at either end of a DA are penalized using only the gap extension costs.
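The gap rule described above can be sketched in a few lines of C++. This is not taken from the RADS sources: the helper name `gapCost` is hypothetical, and the convention that an internal gap of length n costs `gop + n * gep` is an assumption made for illustration (an implementation may instead let the opening cost replace the first extension). Only the terminal-gap behaviour and the defaults -50 and -10 come from the manual text.

```cpp
#include <cstddef>

// Hypothetical helper (not part of RADS): cost of one gap spanning `length`
// domains. Terminal gaps pay only the extension cost, as stated in the manual.
// Assumption: an internal gap pays the opening cost once, plus the extension
// cost for every gapped position.
int gapCost(std::size_t length, bool terminal, int gop = -50, int gep = -10)
{
    if (length == 0)
        return 0; // no gap, no penalty
    const int extension = static_cast<int>(length) * gep;
    return terminal ? extension : gop + extension;
}
```

Under these assumptions, a gap of two domains at the end of an arrangement costs -20, while the same gap in the middle costs -70.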
\subsection{Data bases}
We provide a range of precomputed databases on our website. We currently provide databases based on the InterPro annotations. If you want to compute a database based on your own data, you can do so with the included makeRadsDB program.
We provide a range of precomputed databases on our website. We currently provide databases based on the InterPro annotations. If you want to compute a database based on your own data, you can do so with the included makeRadsDB program (see Section \ref{section:makeRadsDB}).
\subsection{Output format}
The output is a simple text file. The hits are listed in a table of five \emph{tab}-separated columns. The first column contains the alignment score and the second the normalized score. The third column contains the target id, followed by the sequence length in the fourth column. The fifth column contains the domain arrangement.
The output is a simple text file. The hits are listed in a table of five \emph{tab}-separated columns. The first column contains the alignment score and the second the normalized score. The third column contains the target id, followed by the sequence length in the fourth column. The fifth column contains the domain arrangement.
The table is sorted according to the first column.
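A result row in this format can be read with a short C++ routine. The sketch below is illustrative only: the names `Hit` and `parseHit` are hypothetical, and it assumes that the columns are separated by single tab characters and that the fifth column is kept as an unparsed string. Header and comment lines (those starting with `#`) are expected to be skipped by the caller.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one result row (not part of RADS).
struct Hit
{
    int score;               // column 1: alignment score
    double normalized;       // column 2: normalized score
    std::string seqId;       // column 3: target id
    std::size_t seqLength;   // column 4: sequence length
    std::string arrangement; // column 5: domain arrangement, left unparsed
};

// Splits one row on tab characters and converts the numeric columns.
// Throws std::runtime_error if the row does not have exactly five columns.
Hit parseHit(const std::string &line)
{
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, '\t'))
        fields.push_back(field);
    if (fields.size() != 5)
        throw std::runtime_error("expected five tab-separated columns");
    return Hit{std::stoi(fields[0]), std::stod(fields[1]), fields[2],
               static_cast<std::size_t>(std::stoul(fields[3])), fields[4]};
}
```

Keeping the arrangement column as a raw string avoids guessing at its internal layout, which the manual does not specify.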
......@@ -90,7 +94,7 @@ The table is sorted according to the first column.
Results for: manual entered query
Domain arrangement: PF00001
# score | normalized | SeqID | sequence length | domain arrangement
# score | normalized | SeqID | sequence length | domain arrangement
# -------------------------------------------------------------------
100 1.00 10020:000030 611 PF00001 44 293
100 1.00 10020:000054 276 PF00001 2 215
......@@ -115,26 +119,35 @@ rads --db InterPro60-pfam -M pfam30.dsm -q seq.dom
\end{lstlisting}
\newpage
\section{makeRadsDB}
\section{makeRadsDB}\label{section:makeRadsDB}
A program to compute a database that can be used by RADS. A database consists of two files: an index file (SQLite database) and an arrangement file (simple text file). For example, if the name of the database is MyDB, the files needed are MyDB.db and MyDB.da.
A program to compute a database that can be used by RADS. A database consists of two files: an index file
(SQLite database) and an arrangement file (simple text file). For example, if the name of the database is MyDB,
the files needed are MyDB.db and MyDB.da.
\subsection{Program options}
\subsubsection*{General options}
The basic options
\begin{table}[H]
\begin{tabular}{llp{9cm}}
\hline
parameter & default & description\\\hline
-h, --help & - & Produces this help message\\
-i, --input & - & Domain arrangement file(s) that should be turned into a database. \\
-I, --InterPro & - & Used to turn the InterPro annotation file (match\_complete.xml.gz) found on \url{https://www.ebi.ac.uk/interpro/download.html} into a RADS database. This option is used to compute the precomputed InterPro databases.\\
-s, --seqs & - & Sequence files, used in combination with the domain arrangement files. If none are given, all sequence lengths are set to 0.\\
-o, --out & - & The output prefix used to produce two files named prefix.db and prefix.da. Be aware that we currently do not support adding data to an existing database.\\
-h, --help & - & Produces this help message\\
-i, --input & - & Domain arrangement file(s) that should be turned into a database. \\
-I, --InterPro & - & Used to turn the InterPro annotation file (match\_complete.xml.gz) found on
\url{https://www.ebi.ac.uk/interpro/download.html} into a RADS database. This option is
used to compute the precomputed InterPro databases.\\
-s, --seqs & - & Sequence files, used in combination with the domain arrangement files.
If none are given, all sequence lengths are set to 0.\\
-o, --out & - & The output prefix used to produce two files named prefix.db and prefix.da.
Be aware that we currently do not support adding data to an existing database.\\
\hline
\end{tabular}
\end{table}
The domain arrangement file as well as the sequence files can contain several sequences/arrangements.
\subsubsection*{Filter options}
......@@ -150,14 +163,15 @@ parameter & default & description\\\hline
\hline
\end{tabular}
\section{Examples}
\subsection{Examples}
\begin{lstlisting}[language=bash,backgroundcolor = \color{lightgray}]
# running makeRadsDB providing pfam annotations and sequences
makeRadsDB -i domains1.pfam domains2.pfam -s seqs1.fa seqs2.fa \
-o myDB
\end{lstlisting}
\end{document}
\ No newline at end of file
\end{document}
......@@ -29,7 +29,7 @@
// BSDL header
#include "../libs/BioSeqDataLib/src/DomainModule.hpp"
#include "../libs/BioSeqDataLib/src/align/nw_gotoh.hpp"
#include "../libs/BioSeqDataLib/src/align/AlignmentMatrix.hpp"
#include "../libs/BioSeqDataLib/src/utility/utility.hpp"
#include "../libs/BioSeqDataLib/src/external/Output.hpp"
#include "../libs/BioSeqDataLib/src/utility/Settings.hpp"
......@@ -86,7 +86,7 @@ struct Result
 * \param outS the output file
*/
void
runSearch(const std::pair<string, BSDL::DomainArrangement<BSDL::Domain> > queryDA, SQLiteDB &db, fs::path daFile, int gop, int gep, bool all, int scoreThres, const BSDL::DSM &simMat, BioSeqDataLib::MatrixStack<3, std::pair<int,char> > &matrix, AP::Output &outS)
runSearch(const std::pair<string, BSDL::DomainArrangement<BSDL::Domain> > queryDA, SQLiteDB &db, fs::path daFile, bool all, int scoreThres, BSDL::AlignmentMatrix<int, BSDL::DSM> &matrix, AP::Output &outS)
{
// get the positions of all domain arrangements that contain at least one of the domains in question
string query = "Select position from domain where accession in ('" + queryDA.second[0].accession() + "'";
......@@ -145,11 +145,11 @@ runSearch(const std::pair<string, BSDL::DomainArrangement<BSDL::Domain> > queryD
{
size_t targetLength = targetDA.size();
BioSeqDataLib::gotoh(queryDA.second, targetDA, matrix, simMat, gop, gep);
score = (matrix[0][queryLength][targetLength]).first;
matrix.gotoh(queryDA.second, targetDA);
score = matrix.score();// (matrix[0][queryLength][targetLength]).first;
if (score < scoreThres)
continue;
int minScore = (queryLength+targetLength) * gep;
int minScore = (queryLength+targetLength) * matrix.gep();
normScore = min((score-minScore)*1.0/(queryLength*100-minScore),(score-minScore)*1.0/(targetLength*100-minScore));
}
catch (std::exception &e)
......@@ -344,9 +344,10 @@ main(int argc, char *argv[])
AP::Output outS(outFile);
fs::path daFile(prefix + ".da");
vector< BSDL::MatrixStack<3, std::pair<int,char> > > matrices;
vector<BSDL::AlignmentMatrix<int, BSDL::DSM> > matrices;
//vector< BSDL::MatrixStack<3, std::pair<int,char> > > matrices;
for (unsigned short i=0; i<nThreads; ++i)
matrices.emplace_back(1,2);
matrices.emplace_back(gop, gep, simMat);
system_clock::time_point today = system_clock::now();
std::time_t tt;
......@@ -373,7 +374,7 @@ main(int argc, char *argv[])
auto it=querySet.begin();
for (size_t j=0; j<i; ++j)
++it;
runSearch(*it, db, daFile, gop, gep, all, minScore, simMat, matrices[omp_get_thread_num()], outS);
runSearch(*it, db, daFile, all, minScore, matrices[omp_get_thread_num()], outS);
}
outS.close();
......