# Software

## ParaPhylo

ParaPhylo is a tool for reconstructing phylogenetic species trees based on information about paralogous genes. It relies on solving three intertwined NP-hard optimization problems: the cograph editing problem, the maximum consistent triple set problem, and the least resolved tree problem. Implemented as Integer Linear Program, paralogy-based phylogenies can be computed exactly for up to some twenty species and their complete protein complements.

*Reference for ParaPhylo:*

M. Hellmuth, N. Wieseke, M. Lechner, H-P Lenhof, M. Middendorf, and P.F. Stadler

Phylogenomics with Paralogs

Proceedings of the National Academy of Sciences (PNAS), 112(7):2058–2063, 2015

M. Hellmuth, N. Wieseke

From Sequence Data incl. Orthologs, Paralogs, and Xenologs to Gene and Species Trees

in Evolutionary Biology: Convergent Evolution, Evolution of Complex Traits, Concepts and Methods, Ed. Pierre Pontarotti, Springer, ISBN 978-3-319-41323-5, 2016

## GapEST

Antiparallel strong traces (ASTs) are a type of walks in graphs which use every edge exactly twice. They correspond to 1-face embeddings in orientable surfaces and can be used to design self- assembling protein or DNA strands. Based on a novel canonical form invariant for ASTs, gap vector, we provide a linear-time isomorphism test for ASTs and thus, also for orientable 1-face embeddings of graphs. Using the canonical form, we develop the algorithm GapEST for enumerating all pairwise non-isomorphic 1-face embeddings of given graphs.

*Reference for GapEST:*

M. Hellmuth, A.S. Knudsen, M. Kotrbčík, D. Merkle and N. NøjgaardLinear Time Canonicalization and Enumeration of Non-Isomorphic 1-Face EmbeddingsSIAM: Algorithm Engineering and Experiments (ALENEX18) (to appear) 2018

## TimeCons-Reconc

In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species trees S, relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. TimeCons-Recons is a tool that allow to check in O(|V(T)|log(|V(S)|))-time whether there is a time-consistent reconciliation map between an event-labeled gene tree T and a species tree S.

*Reference for TimeCons-Reconc*

N. Nøjgaard, M. Geiß, D. Merkle, P. F. Stadler, N. Wieske, M. HellmuthForbidden Time Travel: Characterization of Time-Consistent Tree Reconciliation Maps17th International Workshop on Algorithms in Bioinformatics (WABI 2017),Leibniz International Proceedings in Informatics (LIPIcs), 88, 17:1--17:12, 2017

## PartialHomology

A variety of methods based on sequence similarity, reconciliation, synteny or functional charac- teristics, can be used to infer homology relations, that is, orthology, paralogy and xenology relations between genes of a given gene family G. The (inferred) homology relations might not cover each pair of genes and thus, provide only partial knowledge on the full set of homology relations. Moreover, for particular pairs of genes it might be known with a high degree of certainty that they are not orthologs (resp. paralogs, xenologs) which yields forbidden pairs of genes. The question arises as whether such sets of (partial) homology relations with or without forbidden gene pairs are satisfiable, i.e., can they simultaneously co-exist in an evolutionary history for G. PartialHomology is an O(|G|^2+m|G|)-time algorithm to determine whether such homology relations are satisfiable and, in the positive case, to construct event-labeled gene trees containing speciation, duplication and horizontal gene transfer (HGT) events that can explain the relations.

*Reference for PartialHomology:*

N. Nøjgaard, N. El-Mabrouk, D. Merkle, N. Wieske, M. Hellmuth

Partial Orthology, Paralogy and Xenology Relations - Satisfiability in terms of Di-Cographs

arXiv:1711.00504 (2017)

## ComposAlign

Alignments are part of the most important data type in the field of comparative genomics. They can be abstracted to a character matrix derived from aligned sequences. Avariety of biological questions forces the researcher to inspect these alignments. Our tool, called ComposeAlign, was developed to sonify largescale genomic data.The resulting musical composition is based on COMMONMUSIC and allows the mapping of genes to motifs and species to instruments. It enables the researcher to listen to the musical representation of thegenome-wide alignment and contrasts a bioinformatician's sight-oriented work at the computer.

Webinterface, Sources and Examples can be found here.

*Reference for ComposeAlign:*

T. Ingalls, G. Martius, M. Hellmuth, M. Marz, S.J. Prohaska,

Converting DNA to Music: ComposAlign,

German Conference on Bioinformatics 2009:Lecture Notes in Informatics, 93-103, 2009

## (Approximate) Products of Graphs

The following tools are designed to determine the (approximate) prime factors of a given graph.

**Prime Factor Decomposition of Graphs w.r.t. Strong Product Using a Local Approach**

*Reference for Local PFD strong product graphs:*

M. Hellmuth

A Local Prime Factor Decomposition Algorithm

Discrete Mathematics, 311, 12, 944-965, 2011

**Prime Factor Decomposition of Graphs ****w.r.t the Strong Product**

**Prime Factor Decomposition of Graphs ****w.r.t. the Cartesian Product**