Commit d421dafe authored by Carsten Kemena

added a manual

parent ff1c8655
......@@ -79,3 +79,6 @@ ehthumbs.db
\ No newline at end of file
Version 0.9.2-beta
- added manual
- changed RADIANT help message
Version 0.9.1-beta
- change of database construction and thereby much smaller databases
- simpler handling of databases
# Minimal makefile for Sphinx documentation
# You can set these variables from the command line.
SPHINXBUILD = sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
# -*- coding: utf-8 -*-
# Configuration file for the Sphinx documentation builder.
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
# -- Project information -----------------------------------------------------
project = 'RADIANT'
copyright = '2018, Carsten Kemena'
author = 'Carsten Kemena'
# The short X.Y version
version = '0.9.2-beta'
# The full version, including alpha/beta/rc tags
release = '0.9.2-beta'
# -- General configuration ---------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path .
exclude_patterns = []
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself. Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
# html_sidebars = {}
html_logo = '_static/logo.png'
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'RADIANTdoc'
# -- Options for LaTeX output ------------------------------------------------
latex_elements = {
#'preamble': '',
'sphinxsetup':'VerbatimColor={rgb}{0.99,0.96,0.9},verbatimwithframe=false,warningBgColor={rgb}{1, 0.86,0.86},warningborder=2pt,warningBorderColor={rgb}{0.86, 0.08, 0.24},',
'classoptions': ',openany',
# The paper size ('letterpaper' or 'a4paper').
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
# Latex figure (float) alignment
# 'figure_align': 'htbp',
latex_additional_files = ["mystyle.sty"]
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'RADIANT.tex', 'RADIANT Manual',
'Carsten Kemena', 'manual'),
latex_logo = '_static/logo.png'
# -- Options for manual page output ------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'radiant', 'RADIANT Documentation',
[author], 1)
# -- Options for Texinfo output ----------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'RADIANT', 'RADIANT Documentation',
author, 'RADIANT', 'One line description of project.',
.. _general:
RADIANT is still in beta and should therefore be used with caution. Our benchmark shows that in most cases the results are very close to the (repeat-collapsed) Pfam prediction, but in a few cases RADIANT can miss quite a lot of domains.
RADIANT is a program to annotate protein sequences with domains.
RADIANT is based on the idea of UProC (P Meinicke, Bioinformatics 2015). It stores the words of known domains in a database and uses these words when scanning an unknown sequence to assign domains. Due to the algorithm it is currently not possible to detect domain repeats or provide start and end points of a domain.
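The word-based idea described above can be illustrated with a small toy example. This is only a sketch under stated assumptions and not RADIANT's actual implementation: the word length, the handling of ambiguous words, the ``min_hits`` threshold and all function names are hypothetical and were chosen for illustration.

```python
from collections import Counter

WORD_LENGTH = 6  # hypothetical word size, chosen only for this toy example


def words(sequence, k=WORD_LENGTH):
    """Yield every overlapping word of length k in a sequence."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


def build_word_database(domain_sequences, k=WORD_LENGTH):
    """Map each word to the domain it was seen in.

    domain_sequences maps a domain ID to a list of example sequences.
    Words seen in more than one domain are dropped as ambiguous; this is
    a simplification for the sketch, not documented RADIANT behaviour.
    """
    database = {}
    ambiguous = set()
    for domain, sequences in domain_sequences.items():
        for sequence in sequences:
            for word in words(sequence, k):
                if word in ambiguous:
                    continue
                known = database.get(word)
                if known is None:
                    database[word] = domain
                elif known != domain:
                    del database[word]
                    ambiguous.add(word)
    return database


def annotate(sequence, database, min_hits=3, k=WORD_LENGTH):
    """Return the domains supported by at least min_hits matching words."""
    hits = Counter(database[w] for w in words(sequence, k) if w in database)
    return sorted(domain for domain, count in hits.items() if count >= min_hits)


# Toy data: the domain IDs and sequences are made up.
toy_db = build_word_database({
    "DOMAIN_A": ["MKTAYIAKQRQISFVK"],
    "DOMAIN_B": ["GSHMLEDPVDAFKLNG"],
})
print(annotate("XXMKTAYIAKQRQISFVKXX", toy_db))
```

In this toy version a domain is reported purely from the number of matching words, so the result carries no domain boundaries and cannot tell one copy of a domain from several.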
If you have any problems, questions or suggestions concerning this program please contact us:
.. _installation:
We try to keep the number of dependencies as small as possible. The current dependencies are:
* BioSeqDataLib (can be added via git submodule)
* boost
* a compiler with C++11 support
You can currently choose between two download options: ``git`` and a manual download. Using ``git`` is recommended, but if that is
not possible you can use the manual download instead. Both options are described in the next sections.
The easiest way to download the most current version of RADIANT is to use ``git``:
.. code-block:: bash
git clone
git submodule init
git submodule update
If you want to update to a newer version you can simply run the following command:
.. code-block:: bash
git pull
git submodule update
Do not forget to recompile the program after this step.
Manual download
If you don't want to use git, you can download the source code from here:
Below you find the commands needed to put everything necessary in its correct place. You can replace the ``wget`` command with manual downloads and copying the
file to the correct position if you do not have an internet connection.
.. code-block:: bash
tar xfz RADIANT-master.tar.gz
# The BioSeqDataLib now needs to be added manually
cd RADIANT-master/libs
rmdir BioSeqDataLib
tar xfz BioSeqDataLib-master.tar.gz
mv BioSeqDataLib-master BioSeqDataLib
Change into the RADIANT directory and run the following commands:
.. code-block:: bash
mkdir build
cd build
cmake ..
.. _rads:
Simple Usage
.. program:: radiant
This section assumes that you have installed RADIANT as described in :ref:`installation` and set up RADIANT as described in :ref:`setup`.
At a minimum, you have to provide an input sequence file (FASTA format) and the database folder. By default the annotation is printed to the console; with the :option:`--out` option it is written to the given file.
.. code-block:: bash
# simplest RADIANT usage
radiant -i <input file> -d <database directory>
# write RADIANT output to file
radiant -i <input file> -d <database directory> -o <output file>
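When RADIANT is called from a script, the same command lines can be assembled programmatically. The following is a minimal sketch that assumes the ``radiant`` executable is on the ``PATH``; the helper names ``build_radiant_command`` and ``run_radiant`` are hypothetical, and only the options documented in this manual are used.

```python
import subprocess


def build_radiant_command(input_file, database_dir, output_file=None,
                          translate=False, pfam_like=False, no_header=False,
                          executable="radiant"):
    """Assemble the argument list for a RADIANT run."""
    command = [executable, "-i", str(input_file), "-d", str(database_dir)]
    if output_file is not None:
        command += ["-o", str(output_file)]
    if translate:
        command.append("--translate")
    if pfam_like:
        command.append("--pfam-like")
    if no_header:
        command.append("--no-header")
    return command


def run_radiant(*args, **kwargs):
    """Run RADIANT and return whatever it printed to the console."""
    result = subprocess.run(build_radiant_command(*args, **kwargs),
                            check=True, capture_output=True, text=True)
    return result.stdout


# Only builds and shows the command; nothing is executed here.
print(build_radiant_command("example.fa", "pfamdb_32_new/", "annotation.txt"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.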
Program Options
General options
The general options influence the general behaviour of RADIANT:
.. program:: radiant
.. option:: -h, --help
Prints a simple help message with a small description of all the available options.
.. option:: -i <FILE>, --in <FILE>
The sequence file to annotate. If it is a DNA file, you have to use the :option:`--translate` option to properly annotate the sequences. If you don't, RADIANT will treat the DNA as protein sequence; the run will not fail, but the results will be wrong.
.. option:: -d <DIRECTORY>, --database <DIRECTORY>
Path to the database directory, for example one of the precomputed databases downloaded from the website (see :ref:`setup`).
Output options
The output options define where and in which format the annotation is written.
.. program:: radiant
.. option:: -o <FILE>, --out <FILE>
The output file. If none is provided, the annotation will be printed to the console.
.. option:: -p, --pfam-like
Produces a Pfam-like output format. However, since not all columns can be filled with useful data, some of them will contain default values without meaning. This option is only meant to increase compatibility with existing parsers.
.. option:: -n, --no-header
Prevents the output file header from being printed.
Translate options
These options control how DNA input is translated into protein sequences before annotation.
.. program:: radiant
.. option:: --translate
Translate the input file from DNA into proteins. It will search for the longest ORF in all six frames and will translate that one into protein.
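The six-frame search can be sketched as follows. This is an illustration and not RADIANT's implementation: it assumes the standard genetic code and treats an ORF as the longest stop-free stretch of codons (whether RADIANT requires a start codon is not stated here). All function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna):
    """Translate codon by codon; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def longest_orf_protein(dna):
    """Return the longest stop-free translated stretch over all six frames."""
    dna = dna.upper()
    best = ""
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            protein = translate_frame(strand[offset:])
            # Stop codons split each frame into candidate ORFs.
            for candidate in protein.split("*"):
                if len(candidate) > len(best):
                    best = candidate
    return best


print(longest_orf_protein("ATGAAATAG"))
```

For this short input the longest stretch is found on the reverse strand, which shows why all six frames have to be checked.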
Output format
The default output format contains a header with information about the parameters used. The actual annotation consists of 7 columns. The first column denotes the name of the sequence. The second and third columns contain the start and end of the domain; currently, however, these two values are identical and can only be used to sort the domains into the correct order. The fourth column is the Pfam ID and the fifth contains the domain name. The last two columns contain the type and the clan of the annotated domain.
Example output::
# RADIANT 0.9.2-beta
# RADIANT output format: 1.0
# run at Thu Nov 29 10:24:29 2018
# Options used:
# query file: example.fa
# database file: pfamdb_32_new/
# translate: false
# <seq id> <match start> <match end> <hmm acc> <hmm name> <type> <clan>
comp10000_c0_seq1|m.5390|g.5390 117 117 PF08209 Sgf11 Family CL0361.4
comp10003_c0_seq1|m.5397|g.5397 379 379 PF10367 Vps39_2 Domain No_clan
comp10010_c0_seq1|m.5409|g.5409 111 111 PF00117 GATase Domain CL0014.22
comp10011_c0_seq1|m.5412|g.5412 34 34 PF14632 SPT6_acidic Family No_clan
comp10011_c0_seq2|m.5414|g.5414 34 34 PF14632 SPT6_acidic Family No_clan
comp10012_c0_seq1|m.5416|g.5416 339 339 PF00096 zf-C2H2 Domain CL0361.4
comp10015_c0_seq1|m.5426|g.5426 20 20 PF13358 DDE_3 Domain CL0219.14
comp10017_c0_seq1|m.5429|g.5429 75 75 PF05915 DUF872 Family No_clan
comp10019_c0_seq1|m.5433|g.5433 183 183 PF05182 Fip1 Motif No_clan
comp10020_c0_seq1|m.5439|g.5439 27 27 PF00076 RRM_1 Domain CL0221.11
comp10021_c0_seq1|m.5440|g.5440 41 41 PF00076 RRM_1 Domain CL0221.11
comp10023_c0_seq1|m.5444|g.5444 70 70 PF14892 DUF4490 Family No_clan
comp10023_c0_seq2|m.5446|g.5446 91 91 PF14892 DUF4490 Family No_clan
comp10026_c0_seq1|m.5455|g.5455 98 98 PF12796 Ank_2 Repeat CL0465.3
comp10027_c0_seq1|m.5459|g.5459 65 65 PF01490 Aa_trans Family CL0062.13
comp10028_c0_seq1|m.5463|g.5463 17 17 PF01398 JAB Family CL0366.4
comp10028_c0_seq1|m.5463|g.5463 108 108 PF13012 MitMem_reg Family No_clan
comp10029_c0_seq1|m.5466|g.5466 21 21 PF00578 AhpC-TSA Domain CL0172.17
comp10029_c0_seq1|m.5466|g.5466 176 176 PF10417 1-cysPrx_C Domain No_clan
comp10030_c0_seq1|m.5467|g.5467 47 47 PF14954 LIX1 Family CL0196.12
comp10032_c0_seq1|m.5477|g.5477 82 82 PF00789 UBX Domain CL0072.20
comp10032_c0_seq2|m.5478|g.5478 18 18 PF00627 UBA Domain CL0214.13
comp10032_c0_seq2|m.5478|g.5478 293 293 PF00789 UBX Domain CL0072.20
comp10034_c0_seq1|m.5482|g.5482 86 86 PF00595 PDZ Domain CL0466.3
comp10036_c0_seq1|m.5489|g.5489 18 18 PF05021 NPL4 Family CL0366.4
comp10036_c0_seq2|m.5492|g.5492 58 58 PF05021 NPL4 Family CL0366.4
comp10037_c0_seq1|m.5494|g.5494 53 53 PF12436 USP7_ICP0_bdg Family CL0072.20
comp10037_c0_seq1|m.5494|g.5494 283 283 PF14533 USP7_C2 Family CL0072.20
comp10037_c0_seq2|m.5504|g.5504 53 53 PF12436 USP7_ICP0_bdg Family CL0072.20
comp10037_c0_seq2|m.5504|g.5504 283 283 PF14533 USP7_C2 Family CL0072.20
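Output in this format can be read with a few lines of Python. The sketch below assumes that the columns are separated by whitespace and that sequence IDs contain no whitespace; the names ``DomainHit``, ``parse_radiant_output`` and ``domain_order`` are hypothetical and not part of RADIANT.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    seq_id: str
    match_start: int
    match_end: int
    pfam_acc: str
    pfam_name: str
    domain_type: str
    clan: str


def parse_radiant_output(lines):
    """Yield one DomainHit per annotation line, skipping the '#' header."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
        seq_id, start, end, acc, name, dtype, clan = fields
        yield DomainHit(seq_id, int(start), int(end), acc, name, dtype, clan)


def domain_order(hits):
    """Group hits by sequence and list domain names sorted by position."""
    by_sequence = {}
    for hit in hits:
        by_sequence.setdefault(hit.seq_id, []).append(hit)
    return {seq_id: [h.pfam_name for h in sorted(group, key=lambda h: h.match_start)]
            for seq_id, group in by_sequence.items()}


example = [
    "# RADIANT 0.9.2-beta",
    "comp10028_c0_seq1|m.5463|g.5463 108 108 PF13012 MitMem_reg Family No_clan",
    "comp10028_c0_seq1|m.5463|g.5463 17 17 PF01398 JAB Family CL0366.4",
]
print(domain_order(parse_radiant_output(example)))
```

Sorting by the start column is the only positional use made of the second and third columns, since both currently hold the same value.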
.. _setup:
Setting up RADIANT
This chapter describes how to set up RADIANT so it can access all the data it needs.
Setting up the database
You can download the database from the `DomainWorld website <>`_. Simply download the database for the Pfam version you want to use and then extract the file using tar:
.. code-block:: bash
# replace <version> with the version you downloaded
tar xfj radiant_db_pfam<version>.tar.bz2
The command will create a new directory containing three files needed by RADIANT. You then only need to pass the path of this directory to RADIANT with the :option:`--database` option.
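If ``tar`` is not available, the archive can also be unpacked with Python's standard ``tarfile`` module. This is a minimal sketch; the function name ``extract_database`` is hypothetical, and the archive should come from a source you trust, because ``extractall`` writes whatever paths the archive contains.

```python
import tarfile
from pathlib import Path


def extract_database(archive, target_dir="."):
    """Extract a bzip2-compressed tar archive.

    Returns the sorted names of the top-level entries that were unpacked,
    which should include the database directory to pass to RADIANT.
    """
    target = Path(target_dir)
    with tarfile.open(archive, mode="r:bz2") as tar:
        names = tar.getnames()
        tar.extractall(path=target)
    top_level = {Path(name).parts[0] for name in names if Path(name).parts}
    return sorted(top_level)
```

A call such as ``extract_database("radiant_db_pfam<version>.tar.bz2")``, with ``<version>`` replaced by the downloaded version, unpacks into the current directory.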
.. RADIANT documentation master file, created by
sphinx-quickstart on Wed Nov 28 16:21:21 2018.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to RADIANT's documentation!
.. toctree::
:maxdepth: 2
:caption: Contents:
.. only:: html
Indices and tables
* :ref:`genindex`
* :ref:`search`
......@@ -8,13 +8,13 @@ namespace fs = boost::filesystem;
* \brief Write database to file
* @param database The database to write
* @param outFile The file to write the database into
* @param outDir The file to write the database into
write2file(std::unordered_map<PrefixType, std::map<CodedSuffix, unsigned short > > &database, const std::string &outFile)
write2file(std::unordered_map<PrefixType, std::map<CodedSuffix, unsigned short > > &database, const std::string &outDir)
std::ofstream fout(outFile, std::ios::out | std::ios::binary);
std::ofstream fout(outDir, std::ios::out | std::ios::binary);
size_t val= database.size();
fout.write((char*)&val, sizeof(size_t));
for (auto it=database.begin(); it!=database.end(); ++it)
......@@ -31,8 +31,8 @@ write2file(std::unordered_map<PrefixType, std::map<CodedSuffix, unsigned short >
std::ofstream fout(outFile, std::ios::out | std::ios::binary);
std::ofstream fout_index(outFile+".index", std::ios::out | std::ios::binary);
std::ofstream fout(outDir, std::ios::out | std::ios::binary);
std::ofstream fout_index(outDir+".index", std::ios::out | std::ios::binary);
size_t val= database.size();
fout.write((char*)&val, sizeof(size_t));
for (auto it=database.begin(); it!=database.end(); ++it)
......@@ -59,15 +59,15 @@ write2file(std::unordered_map<PrefixType, std::map<CodedSuffix, unsigned short >
main(int argc, char const *argv[])
fs::path inFile, outFile;
fs::path inFile, outDir;
size_t nThreads;
std::string radiantDBVersion(std::string(STR(MAJOR_VERSION)) + "." + std::string(STR(MINOR_VERSION)) + "." + std::string(STR(PATCH_VERSION)) );
po::options_description allOpts("makeRadiantDB " + radiantDBVersion + " (C) 2017 Carsten Kemena\nThis program comes with ABSOLUTELY NO WARRANTY;\n\nAllowed options are displayed below.");
po::options_description general("General options");
("help,h", "Produces this help message")
("in,i", po::value<fs::path>(&inFile)->required(), "The input file")
("out,o", po::value<fs::path>(&outFile)->required(), "The prefix for the output files")
("in,i", po::value<fs::path>(&inFile)->required()->value_name("FILE"), "The input file")
("out,o", po::value<fs::path>(&outDir)->required()->value_name("DIRECTORY"), "The prefix for the output files")
("nThreads,t", po::value<size_t>(&nThreads)->default_value(1), "The number of threads to use")
......@@ -95,12 +95,12 @@ main(int argc, char const *argv[])
std::unordered_map<PrefixType, std::map<CodedSuffix, unsigned short > > database;
turnFile2db(inFile, database, false);
write2file(database, outFile.string() + "_fwd.db");
write2file(database, outDir.string() + "/forward.db");
turnFile2db(inFile, database, true);
cout << "Write to file" << endl;
write2file(database, outFile.string() + "_rev.db");
write2file(database, outDir.string() + "/reverse.db");
return 0;
......@@ -265,8 +265,8 @@ main(int argc, char const *argv[])
po::options_description general("General options");
("help,h", "Produces this help message")
("in,i", po::value<fs::path>(&inFile)->required(), "The input file")
("database,d", po::value<fs::path>(&databaseFile)->required(), "The path to the database")
("in,i", po::value<fs::path>(&inFile)->required()->value_name("FILE"), "The input file")
("database,d", po::value<fs::path>(&databaseFile)->required()->value_name("DIRECTORY"), "The path to the database")
//("nThreads,t", po::value<int>(&nThreads)->default_value(1), "Number of threads to use")
......@@ -274,7 +274,7 @@ main(int argc, char const *argv[])
bool pfamLike, noHeader;
po::options_description outputO("Output options");
("out,o", po::value<fs::path>(&outFile), "The output file")
("out,o", po::value<fs::path>(&outFile)->value_name("FILE"), "The output file")
("pfam-like,p", po::value<bool>(&pfamLike)->default_value(false)->zero_tokens(), "Produces a fake Pfam format")
("no-header,n", po::value<bool>(&noHeader)->default_value(false)->zero_tokens(), "Do not print the header")
SET(unit_tests_src ./unit_tests.cpp)
SET(unit_tests_exe unit_tests)
ADD_EXECUTABLE(${unit_tests_exe} ${unit_tests_src})
#SET(unit_tests_src ./unit_tests.cpp)
#SET(unit_tests_exe unit_tests)
#ADD_EXECUTABLE(${unit_tests_exe} ${unit_tests_src})
# ${Boost_LIBRARIES}
SET(listVar "")
LIST(APPEND listVar "${prefix}${f}")
SET(${var} "${listVar}" PARENT_SCOPE)
SET(tests_src ./unitTests/unit_tests.cpp)
SET(tests_exe unit_tests)
ADD_EXECUTABLE(${tests_exe} ${tests_src})
#!/usr/bin/env bats
@test "simple_run"
@test "make DB Test" {
# database based on pfam annotation files
run ../../build/makeRadsDB -i ../data/db_pfam.dom -s ../data/db_seqs.fa -o annotation
[ $status == 0 ]
echo $output
[ "$output" == $'Number of sequences included: 8\nNumber of distinct arrangements 7' ]
rm annotation.db annotation.da
# database based on interpro file
run ../../build/makeRadsDB -I ../data/match_small.xml -o interPro -d PFAM
[ $status == 0 ]
[ "$output" == $'Number of sequences included: 10\nNumber of distinct arrangements 9' ]
@test "run RADS Test" {
run ../../build/rads -d interPro -Q ../data/query_seqs.fa -o testQuerySeq.txt -m pfam-31.dsm
[ $status == 0 ]
run diff <(grep -v '#' testQuerySeq.txt) <(grep -v '#' results/testQuerySeqRes.txt)
[ $status == 0 ]
# check if all parameter works
#run ../../build/rads -D PF00733 PF13537 -a -m pfam-31.dsm -d interPro
#[ $status == 0 ]
#[ "$output" == $'# RADS version 2.1.0\n# RADS Output v1\n# ********************************\n\n# -------------------------------------------------------------------\nResults for: manual entered query\nDomain arrangement: PF00733 PF13537\n\n# score | normalized | SeqID | sequence length | domain arrangement\n# -------------------------------------------------------------------\n200 1.00 A0A004 645 PF00733 240 626 PF13537 49 162' ]
run ../../build/rads -D PF00733 PF13537 -m pfam-31.dsm -d interPro -o test1Res.txt
[ $status == 0 ]
run diff <(grep -v '#' test1Res.txt) <(grep -v '#' results/test1Res.txt)
[ $status == 0 ]
run ../../build/rads -h
[ $status == 0 ]
rm test1Res.txt testQuerySeq.txt
rm interPro.db interPro.da
......@@ -7,8 +7,8 @@
#include <iostream>
#include <string>
#include "../src/common.hpp"
#include "../libs/BioSeqDataLib/src/sequence/Sequence.hpp"
#include "../../src/common.hpp"
#include "../../libs/BioSeqDataLib/src/sequence/Sequence.hpp"
......@@ -7,9 +7,9 @@
#include <iostream>
#include <string>
#include "../src/common.hpp"
#include "../src/makeRadiantDb.hpp"
#include "../libs/BioSeqDataLib/src/sequence/SequenceSet.hpp"
#include "../../src/common.hpp"
#include "../../src/makeRadiantDb.hpp"
#include "../../libs/BioSeqDataLib/src/sequence/SequenceSet.hpp"