Software
AsymmeTree
AsymmeTree is an open-source Python library for the simulation and analysis of phylogenetic scenarios. It includes a simulator for species and gene trees with heterogeneous evolution rates, nucleotide and amino acid sequences with or without indels, as well as whole genomes/proteomes.
Moreover, it includes tools for the inference and analysis of orthology and phylogenetic best matches (resp. best hits) from known gene trees or evolutionary distances, tools for the analysis of horizontal gene transfer (HGT) events, an algorithm to compute supertrees, and a method to estimate rooted species trees from an ensemble of orthology/paralogy relations.
The library is primarily designed to explore and validate mathematical concepts, and to test inference methods for various steps on the way to more realistically-available data, i.e., dated gene trees, additive distances of gene sets, noisy distances and finally sequences.
Reference for AsymmeTree:
AsymmeTree: A Flexible Python Package for the Simulation of
Complex Gene Family Histories
D. Schaller, M. Hellmuth, P.F. Stadler,
Software, 1(3), 276-298, 2022
[publisher's page]
tralda
tralda is an open-source Python library for tree algorithms and data structures. It provides methods for tree traversals and manipulation, output in Newick format, as well as the linear-time computation of last common ancestors by Bender & Farach-Colton. Tralda ontains an efficient algorithm for cograph recognition and heuristics for cograph editing. A number of algorithms for the computation of supertrees are implemented:
- BUILD (Aho et al. 1981), class Build or function BUILD_supertree
- BuildST (Deng & Fernández-Baca 2016), class BuildST or function build_st
- Loose_Cons_Tree (Jansson et al. 2016), class LooseConsensusTree or function loose_consensus_tree
- LinCR (Schaller et al. 2021), class LinCR or function linear_common_refinement
Reference for Tralda:
A simpler linear-time algorithm for the common refinement of rooted phylogenetic trees on a common leaf set
D. Schaller, M. Hellmuth, P.F. Stadler,
Algorithms for Molecular Biology volume 16, Article number: 23 (2021)
[publisher's page]
Simplify DAGs and networks via Least-Common-Ancestor (LCA) Constraints
In a DAG or network G with leaf set L(G), a least common ancestor (LCA) of a subset A of L(G) is a vertex v that is an ancestor of all x in A and has no descendant that also satisfies this property. A DAG or network G is LCA-relevant if each vertex is the LCA of some subset of leaves. Moreover, G is lca-relevant if each vertex is a unique LCA in G. DAGs and networks inferred from genomic sequence data can be highly complex and tangled, often containing redundant information. In particular, vertices that are not LCAs of any subset of leaves can be considered less significant and redundant in an evolutionary contex as they lack direct relevance to the observed ancestral relationships. To reduce unnecessary complexity and eliminate unsupported vertices, we aim to simplify a DAG to retain only LCA vertices while preserving essential evolutionary information. This python tool allows to simplify a DAG by ``removal'' of such vertices resulting in an LCA-relevant, resp., lca-relevant DAG while preserving key structural properties of the original DAG or network.
- Download Reference:
Simplifying and Characterizing DAGs and Phylogenetic Networks via Least Common Ancestor Constraints, Lindeberg, A. and Hellmuth, M., arXiv:2411.00708, 2024
ParaPhylo
ParaPhylo is a tool for reconstructing phylogenetic species trees based on information about paralogous genes. It relies on solving three intertwined NP-hard optimization problems: the cograph editing problem, the maximum consistent triple set problem, and the least resolved tree problem. Implemented as Integer Linear Program, paralogy-based phylogenies can be computed exactly for up to some twenty species and their complete protein complements.
Reference for ParaPhylo:
M. Hellmuth, N. Wieseke, M. Lechner, H-P Lenhof, M. Middendorf, and P.F. Stadler
Phylogenomics with Paralogs
Proceedings of the National Academy of Sciences (PNAS), 112(7):2058-2063, 2015
M. Hellmuth, N. Wieseke
From Sequence Data incl. Orthologs, Paralogs, and Xenologs to Gene and Species Trees
in Evolutionary Biology: Convergent Evolution, Evolution of Complex Traits, Concepts and Methods, Ed. Pierre Pontarotti, Springer, ISBN 978-3-319-41323-5, 2016
REvolutionH-tl: Reconstruction of Evolutionary Histories tool
Orthology detection from sequence similarity remains a difficult and computationally expensive problem for gene families with large numbers of gene duplications and losses. REvolutionH-tl implements a new graph-based approach to identify orthogroups, orthology, and paralogy relationships first, and it uses this information in a second step to infer event-labeled gene trees and their reconciliation with an inferred species tree. It avoids using gene trees and species trees upon input and settles for a maximal subtree reconciliation in cases where noise or horizontal gene transfer precludes a global reconciliation
Reference for REvolutionH-tl:
J.A. Ramirez-Rafael, A. Korchmaros, K. Avina-Padilla, A. Lopez Sanchez, A.A. Espana-Tinajero, M. Hellmuth, P.F. Stadler & M. Hernandez-Rosales
REvolutionH-tl: Reconstruction of Evolutionary Histories tool
In: Scornavacca, C., Hernandez-Rosales, M. (eds) Comparative Genomics. RECOMB-CG 2024. Lecture Notes in Computer Science, vol 14616. Springer, Cham.
GapEST
Antiparallel strong traces (ASTs) are a type of walks in graphs which use every edge exactly twice. They correspond to 1-face embeddings in orientable surfaces and can be used to design self- assembling protein or DNA strands. Based on a novel canonical form invariant for ASTs, gap vector, we provide a linear-time isomorphism test for ASTs and thus, also for orientable 1-face embeddings of graphs. Using the canonical form, we develop the algorithm GapEST for enumerating all pairwise non-isomorphic 1-face embeddings of given graphs.
Reference for GapEST:
M. Hellmuth, A.S. Knudsen, M. Kotrbcik, D. Merkle and N. Nojgaard
Linear Time Canonicalization and Enumeration of Non-Isomorphic 1-Face EmbeddingsSIAM: Algorithm Engineering and Experiments (ALENEX18) (to appear) 2018
(Cayley Atom Tracker)
A python package for tracking atoms through chemical networks using projected Cayley graphs.
References for CAT
M. Hellmuth, D. Merkle, N. Nøjgaard
Atom Tracking Using Cayley Graphs,
16th International Symposium on Bioinformatics Research and Applications (ISBRA 2020),
Lecture Notes in Computer Science, vol 12304, Springer, pp 406-415, 2020
M. Hellmuth, W. Fontana, D. Merkle, N. Nøjgaard
Cayley Graphs of Semigroups Applied to Atom Tracking in Chemistry
Journal of Computational Biology, Journal of Computational Biology, 28(7), 701-715., 2021
TimeCons-Reconc
In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species trees S, relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. TimeCons-Recons is a tool that allow to check in O(|V(T)|log(|V(S)|))-time whether there is a time-consistent reconciliation map between an event-labeled gene tree T and a species tree S.
Reference for TimeCons-Reconc
N. Nojgaard, M. Geiss, D. Merkle, P. F. Stadler, N. Wieske, M. Hellmuth
Forbidden Time Travel: Characterization of Time-Consistent Tree Reconciliation Maps17th International Workshop on Algorithms in Bioinformatics (WABI 2017),Leibniz International Proceedings in Informatics (LIPIcs), 88, 17:1--17:12, 2017
PartialHomology
A variety of methods based on sequence similarity, reconciliation, synteny or functional charac- teristics, can be used to infer homology relations, that is, orthology, paralogy and xenology relations between genes of a given gene family G. The (inferred) homology relations might not cover each pair of genes and thus, provide only partial knowledge on the full set of homology relations. Moreover, for particular pairs of genes it might be known with a high degree of certainty that they are not orthologs (resp. paralogs, xenologs) which yields forbidden pairs of genes. The question arises as whether such sets of (partial) homology relations with or without forbidden gene pairs are satisfiable, i.e., can they simultaneously co-exist in an evolutionary history for G. PartialHomology is an O(|G|^2+m|G|)-time algorithm to determine whether such homology relations are satisfiable and, in the positive case, to construct event-labeled gene trees containing speciation, duplication and horizontal gene transfer (HGT) events that can explain the relations.
Reference for PartialHomology:
N. Nojgaard, N. El-Mabrouk, D. Merkle, N. Wieske, M. Hellmuth
Partial Orthology, Paralogy and Xenology Relations - Satisfiability in terms of Di-Cographs
In Computing and Combinatorics: 24st International Conference (COCOON),
Lecture Notes in Computer Science, vol 10976, Springer, pp. 403-415, 2018
ComposAlign
Alignments are part of the most important data type in the field of comparative genomics. They can be abstracted to a character matrix derived from aligned sequences. Avariety of biological questions forces the researcher to inspect these alignments. Our tool, called ComposeAlign, was developed to sonify largescale genomic data.The resulting musical composition is based on COMMONMUSIC and allows the mapping of genes to motifs and species to instruments. It enables the researcher to listen to the musical representation of thegenome-wide alignment and contrasts a bioinformatician's sight-oriented work at the computer.
Webinterface, Sources and Examples can be found here.
Reference for ComposeAlign:
T. Ingalls, G. Martius, M. Hellmuth, M. Marz, S.J. Prohaska,
Converting DNA to Music: ComposAlign,
German Conference on Bioinformatics 2009:Lecture Notes in Informatics, 93-103, 2009
(Approximate) Products of Graphs
The following tools are designed to determine the (approximate) prime factors of a given graph.
Prime Factor Decomposition of Graphs w.r.t. Strong Product Using a Local Approach
Reference for Local PFD strong product graphs:
M. Hellmuth
A Local Prime Factor Decomposition Algorithm
Discrete Mathematics, 311, 12, 944-965, 2011
Prime Factor Decomposition of Graphs w.r.t the Strong Product
Prime Factor Decomposition of Graphs w.r.t. the Cartesian Product