Type Strain Genome Server

Q: Why does the TYGS not offer all-vs-all dDDH (dis-)similarity matrices or cross tables?

In recent years, heat maps and clusterings/dendrograms inferred from (dis-)similarity matrices have become popular features of many taxonomic papers and species descriptions of microbes. One wonders, however, whether there is an actual scientific need for such methods.

The TYGS reports all relevant dDDH values, i.e., all comparisons between user-defined genomes and the closest type-strain genomes, and all comparisons among user-defined genomes. If you run the TYGS in the mode "restricted to user genomes", it reports all pairwise dDDH values, and these can, in principle, easily be transformed into a cross table. However, the TYGS does not offer this as a built-in feature, for the reasons given below.
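For the "restricted to user genomes" mode, a minimal sketch of that transformation in Python might look as follows. It assumes, hypothetically, that the pairwise results have already been parsed into (genome A, genome B, dDDH) tuples holding only the point estimate. The tuple layout, the function names to_cross_table and format_tsv, and the genome names and values in the example are all illustrative, not actual TYGS output. The sketch also assumes that dDDH(A, B) equals dDDH(B, A), and it leaves the diagonal empty.

```python
from itertools import chain


def to_cross_table(pairs):
    """Turn (genome_a, genome_b, ddh) tuples into a symmetric cross table.

    Hypothetical input format; assumes dDDH(a, b) == dDDH(b, a).
    """
    # Collect all genome names that occur in any pair, in sorted order
    names = sorted(set(chain.from_iterable((a, b) for a, b, _ in pairs)))
    index = {name: i for i, name in enumerate(names)}
    # None marks cells without a reported value (including the diagonal)
    table = [[None] * len(names) for _ in names]
    for a, b, ddh in pairs:
        i, j = index[a], index[b]
        table[i][j] = ddh
        table[j][i] = ddh
    return names, table


def format_tsv(names, table):
    """Render the cross table as tab-separated text with row/column labels."""
    lines = ["\t".join([""] + names)]
    for name, row in zip(names, table):
        cells = ["" if v is None else f"{v:.1f}" for v in row]
        lines.append("\t".join([name] + cells))
    return "\n".join(lines)


# Illustrative values only, not real TYGS results
pairs = [("G1", "G2", 85.2), ("G1", "G3", 23.4), ("G2", "G3", 24.0)]
names, table = to_cross_table(pairs)
print(format_tsv(names, table))
```

Leaving the diagonal empty avoids asserting a self-comparison value that the pairwise report does not contain. As argued below, such a table is a convenience for inspection, not a basis for taxonomic conclusions.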

The three main applications of all-vs-all dDDH (dis-)similarity matrices are:

1) [Clusterings/Dendrograms]: One can apply some type of hierarchical clustering to assign strains to species clusters. But if this is not done properly, the results can be misleading, especially if the underlying data are not ultrametric (i.e., if the organisms have not evolved under a molecular clock). Dendrograms inferred via hierarchical clustering are not phylogenetic trees, and they do not normally show branch support. As a consequence, their branching patterns should not be interpreted as phylogenetic relationships. Unfortunately, such dendrograms are often wrongly presented in scientific papers as "trees", thereby blurring the difference between them and phylogenetic trees.

2) [Heat maps]: Heat maps only allow for a visual exploration of a data set; they replace neither a proper species clustering nor phylogenetic approaches. Decoding the quantitative information conveyed by a heat map may or may not be easy because it relies on a colour gradient. In particular, the human brain readily notices sharp contrasts between adjacent parts of an image but may be poor at comparing shades in non-adjacent regions of a visualization. Moreover, the depicted (dis-)similarity values are not of primary taxonomic interest. They are just crude estimates of the true phylogenetic (dis-)similarity, and they are properly interpreted by inferring a phylogenetic tree.

3) [Phylogenetic trees]: A phylogenetic tree is the appropriate way to use (dis-)similarity matrices for taxonomic purposes, particularly in conjunction with branch support. The TYGS therefore not only conducts and reports a distinct type-based species clustering (see TYGS paper), which differs from a standard hierarchical clustering, but also infers phylogenies with branch support on which these clusters are annotated. In this way, potential misinterpretations can be avoided or at least mitigated, and, depending on the data, a well-informed taxonomic decision can usually be made.
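Whether a matrix is ultrametric, the condition mentioned under point 1, can be checked directly. A distance matrix d is ultrametric if d(i,k) <= max(d(i,j), d(j,k)) holds for every triple of entries, which is the same as requiring that the two largest distances within each triple be equal. The Python sketch below assumes a distance matrix (smaller means more similar), not raw dDDH percentages. The function name is_ultrametric and its tol parameter are hypothetical. Because real genome-based distances are noisy, an exact check like this is only a rough diagnostic.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Check the strong triangle inequality on a symmetric distance matrix.

    For every triple (i, j, k), the two largest of d[i][j], d[j][k], d[i][k]
    must be equal (up to tol); this is equivalent to
    d[i][k] <= max(d[i][j], d[j][k]) for all orderings of the triple.
    """
    for i, j, k in combinations(range(len(d)), 3):
        _, second, largest = sorted((d[i][j], d[j][k], d[i][k]))
        if largest - second > tol:
            return False
    return True


# Clock-like data: A and B diverge at distance 2, C joins both at distance 6
clock = [[0, 2, 6],
         [2, 0, 6],
         [6, 6, 0]]
# Additive but not clock-like: a star tree with branch lengths 1, 2 and 4
no_clock = [[0, 3, 5],
            [3, 0, 6],
            [5, 6, 0]]
print(is_ultrametric(clock))     # True
print(is_ultrametric(no_clock))  # False
```

The second matrix fits a perfectly valid tree whose lineages evolved at different rates, yet it fails the check. Clustering methods such as UPGMA implicitly assume ultrametric data, so on input like this their dendrograms need not reflect the true branching order.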