Software

AsymmeTree

AsymmeTree is an open-source Python library for the simulation and analysis of phylogenetic scenarios. It includes a simulator for species and gene trees with heterogeneous evolution rates, nucleotide and amino acid sequences with or without indels, as well as whole genomes/proteomes.
Moreover, it includes tools for the inference and analysis of orthology and phylogenetic best matches (resp. best hits) from known gene trees or evolutionary distances, tools for the analysis of horizontal gene transfer (HGT) events, an algorithm to compute supertrees, and a method to estimate rooted species trees from an ensemble of orthology/paralogy relations.
The library is primarily designed to explore and validate mathematical concepts, and to test inference methods for various steps on the way to more realistically-available data, i.e., dated gene trees, additive distances of gene sets, noisy distances and finally sequences.

Reference for AsymmeTree:

AsymmeTree: A Flexible Python Package for the Simulation of Complex Gene Family Histories
D. Schaller, M. Hellmuth, P.F. Stadler, Software, 1(3), 276-298, 2022
[publisher's page]

tralda

tralda is an open-source Python library for tree algorithms and data structures. It provides methods for tree traversals and manipulation, output in Newick format, as well as the linear-time computation of last common ancestors by Bender & Farach-Colton. Tralda ontains an efficient algorithm for cograph recognition and heuristics for cograph editing. A number of algorithms for the computation of supertrees are implemented:

  • BUILD (Aho et al. 1981), class Build or function BUILD_supertree
  • BuildST (Deng & Fernández-Baca 2016), class BuildST or function build_st
  • Loose_Cons_Tree (Jansson et al. 2016), class LooseConsensusTree or function loose_consensus_tree
  • LinCR (Schaller et al. 2021), class LinCR or function linear_common_refinement

Reference for Tralda:

A simpler linear-time algorithm for the common refinement of rooted phylogenetic trees on a common leaf set
D. Schaller, M. Hellmuth, P.F. Stadler, Algorithms for Molecular Biology volume 16, Article number: 23 (2021)
[publisher's page]

Simplify DAGs and networks via Least-Common-Ancestor (LCA) Constraints

In a DAG or network G with leaf set L(G), a least common ancestor (LCA) of a subset A of L(G) is a vertex v that is an ancestor of all x in A and has no descendant that also satisfies this property. A DAG or network G is LCA-relevant if each vertex is the LCA of some subset of leaves. Moreover, G is lca-relevant if each vertex is a unique LCA in G. DAGs and networks inferred from genomic sequence data can be highly complex and tangled, often containing redundant information. In particular, vertices that are not LCAs of any subset of leaves can be considered less significant and redundant in an evolutionary contex as they lack direct relevance to the observed ancestral relationships. To reduce unnecessary complexity and eliminate unsupported vertices, we aim to simplify a DAG to retain only LCA vertices while preserving essential evolutionary information. This python tool allows to simplify a DAG by ``removal'' of such vertices resulting in an LCA-relevant, resp., lca-relevant DAG while preserving key structural properties of the original DAG or network.

  • Download
  • Reference:

    Simplifying and Characterizing DAGs and Phylogenetic Networks via Least Common Ancestor Constraints, Lindeberg, A. and Hellmuth, M., arXiv:2411.00708, 2024

    ParaPhylo

    ParaPhylo is a tool for reconstructing phylogenetic species trees based on information about paralogous genes. It relies on solving three intertwined NP-hard optimization problems: the cograph editing problem, the maximum consistent triple set problem, and the least resolved tree problem. Implemented as Integer Linear Program, paralogy-based phylogenies can be computed exactly for up to some twenty species and their complete protein complements.

    Reference for ParaPhylo:

    M. Hellmuth, N. Wieseke, M. Lechner, H-P Lenhof, M. Middendorf, and P.F. Stadler
    Phylogenomics with Paralogs
    Proceedings of the National Academy of Sciences (PNAS), 112(7):2058-2063, 2015

    M. Hellmuth, N. Wieseke
    From Sequence Data incl. Orthologs, Paralogs, and Xenologs to Gene and Species Trees
    in Evolutionary Biology: Convergent Evolution, Evolution of Complex Traits, Concepts and Methods, Ed. Pierre Pontarotti, Springer, ISBN 978-3-319-41323-5, 2016

    REvolutionH-tl: Reconstruction of Evolutionary Histories tool

    Orthology detection from sequence similarity remains a difficult and computationally expensive problem for gene families with large numbers of gene duplications and losses. REvolutionH-tl implements a new graph-based approach to identify orthogroups, orthology, and paralogy relationships first, and it uses this information in a second step to infer event-labeled gene trees and their reconciliation with an inferred species tree. It avoids using gene trees and species trees upon input and settles for a maximal subtree reconciliation in cases where noise or horizontal gene transfer precludes a global reconciliation

    Reference for REvolutionH-tl:

    J.A. Ramirez-Rafael, A. Korchmaros, K. Avina-Padilla, A. Lopez Sanchez, A.A. Espana-Tinajero, M. Hellmuth, P.F. Stadler & M. Hernandez-Rosales
    REvolutionH-tl: Reconstruction of Evolutionary Histories tool
    In: Scornavacca, C., Hernandez-Rosales, M. (eds) Comparative Genomics. RECOMB-CG 2024. Lecture Notes in Computer Science, vol 14616. Springer, Cham.

    GapEST

    Antiparallel strong traces (ASTs) are a type of walks in graphs which use every edge exactly twice. They correspond to 1-face embeddings in orientable surfaces and can be used to design self- assembling protein or DNA strands. Based on a novel canonical form invariant for ASTs, gap vector, we provide a linear-time isomorphism test for ASTs and thus, also for orientable 1-face embeddings of graphs. Using the canonical form, we develop the algorithm GapEST for enumerating all pairwise non-isomorphic 1-face embeddings of given graphs.

    Download
    Datasets

    Reference for GapEST:

    M. Hellmuth, A.S. Knudsen, M. Kotrbcik, D. Merkle and N. Nojgaard
    Linear Time Canonicalization and Enumeration of Non-Isomorphic 1-Face EmbeddingsSIAM: Algorithm Engineering and Experiments (ALENEX18)  (to appear) 2018

    (Cayley Atom Tracker)

    A python package for tracking atoms through chemical networks using projected Cayley graphs.

    Download

    References for CAT

    M. Hellmuth, D. Merkle, N. Nøjgaard
    Atom Tracking Using Cayley Graphs, 16th International Symposium on Bioinformatics Research and Applications (ISBRA 2020), Lecture Notes in Computer Science, vol 12304, Springer, pp 406-415, 2020

    M. Hellmuth, W. Fontana, D. Merkle, N. Nøjgaard
    Cayley Graphs of Semigroups Applied to Atom Tracking in Chemistry Journal of Computational Biology, Journal of Computational Biology, 28(7), 701-715., 2021

     

    TimeCons-Reconc

    In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species trees S, relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. TimeCons-Recons is a tool that allow to check in O(|V(T)|log(|V(S)|))-time whether there is a time-consistent reconciliation map between an event-labeled gene tree T and a species tree S.

    Download

    Reference for TimeCons-Reconc

    N. Nojgaard, M. Geiss, D. Merkle, P. F. Stadler, N. Wieske, M. Hellmuth
    Forbidden Time Travel: Characterization of Time-Consistent Tree Reconciliation Maps17th International Workshop on Algorithms in Bioinformatics (WABI 2017),Leibniz International Proceedings in Informatics (LIPIcs), 88, 17:1--17:12, 2017

     

    PartialHomology

    A variety of methods based on sequence similarity, reconciliation, synteny or functional charac- teristics, can be used to infer homology relations, that is, orthology, paralogy and xenology relations between genes of a given gene family G. The (inferred) homology relations might not cover each pair of genes and thus, provide only partial knowledge on the full set of homology relations. Moreover, for particular pairs of genes it might be known with a high degree of certainty that they are not orthologs (resp. paralogs, xenologs) which yields forbidden pairs of genes. The question arises as whether such sets of (partial) homology relations with or without forbidden gene pairs are satisfiable, i.e., can they simultaneously co-exist in an evolutionary history for G. PartialHomology is an O(|G|^2+m|G|)-time algorithm to determine whether such homology relations are satisfiable and, in the positive case, to construct event-labeled gene trees containing speciation, duplication and horizontal gene transfer (HGT) events that can explain the relations.

    Download

    Reference for PartialHomology:

    N. Nojgaard, N. El-Mabrouk, D. Merkle, N. Wieske, M. Hellmuth
    Partial Orthology, Paralogy and Xenology Relations - Satisfiability in terms of Di-Cographs
    In Computing and Combinatorics: 24st International Conference (COCOON), Lecture Notes in Computer Science, vol 10976, Springer, pp. 403-415, 2018

     

    ComposAlign

    Alignments are part of the most  important data type in the field of comparative genomics. They can be abstracted to a character matrix derived from aligned sequences. Avariety of biological questions forces  the researcher to inspect these alignments. Our tool,  called ComposeAlign, was developed to sonify largescale genomic data.The resulting musical composition is based  on COMMONMUSIC and allows the mapping of genes to motifs and  species  to instruments. It enables the researcher to listen to the  musical representation of thegenome-wide alignment and contrasts a bioinformatician's sight-oriented work at the computer.

    Webinterface, Sources and Examples can be found here

    Reference for ComposeAlign:

    T. Ingalls, G. Martius, M. Hellmuth, M. Marz, S.J. Prohaska,
    Converting DNA to Music: ComposAlign,
    German Conference on Bioinformatics 2009:Lecture Notes in Informatics, 93-103, 2009



    (Approximate) Products of Graphs

    The following tools are designed to determine the (approximate) prime factors of a given graph.

    Prime Factor Decomposition of Graphs w.r.t. Strong Product Using a Local Approach

    Reference for Local PFD strong product graphs:

    M. Hellmuth
    A Local Prime Factor Decomposition Algorithm
    Discrete Mathematics, 311, 12, 944-965, 2011

    Prime Factor Decomposition of Graphs w.r.t the Strong Product

    Prime Factor Decomposition of Graphs w.r.t. the Cartesian Product

    Boost Graph Library