Type Strain Genome Server

Q: Why does the TYGS not offer POCP or AAI values for the delineation of prokaryotic genera?

Short answer: Because POCP and AAI have known issues that have already been discussed in the literature (see below). However, genera can usually be delineated with the TYGS by properly interpreting its phylogenies (see below).

Long answer: The pragmatic prokaryotic species concept uses DDH and a 70% threshold for the comparison of novel strains to a set of type strains (digital DDH mimics DDH without the known pitfalls of traditional DDH) [Meier-Kolthoff et al. (2013a)]. This concept works relatively well because closely related organisms have usually evolved at a similar rate, which yields matrices of pairwise (dis-)similarities that are often nearly ultrametric [Meier-Kolthoff et al. (2014a)]. This matters because applying any kind of threshold to a distance or similarity matrix only works properly under this condition [Meier-Kolthoff et al. (2014a)]. However, the less closely related the organisms are, the less ultrametric the underlying (dis-)similarity matrix will be. This is frequently the case for datasets covering entire genera, families or higher taxa, and it is the reason why there are no generally accepted universal cutoffs for dDDH, ANI etc. for delineating genera (or higher taxa).
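
To make the role of ultrametricity more concrete, here is a minimal Python sketch. It is not part of the TYGS; the function names and the two toy distance matrices are assumptions made up purely for illustration. It checks the three-point condition (a distance matrix is ultrametric when, in every triple of organisms, the two largest pairwise distances are equal) and then tests whether grouping by a fixed cutoff is transitive, i.e. whether "A and B are within the cutoff" and "B and C are within the cutoff" always implies the same for A and C.

```python
from itertools import combinations, permutations


def max_ultrametric_violation(dist):
    """Largest gap between the two biggest distances in any triple.

    A value of 0 means the matrix satisfies the three-point condition
    for every triple, i.e. it is perfectly ultrametric.
    """
    worst = 0.0
    for i, j, k in combinations(range(len(dist)), 3):
        _, second, largest = sorted((dist[i][j], dist[i][k], dist[j][k]))
        worst = max(worst, largest - second)
    return worst


def threshold_groups_consistent(dist, cutoff):
    """True if 'distance <= cutoff' is a transitive relation on this matrix."""
    for i, j, k in permutations(range(len(dist)), 3):
        if dist[i][j] <= cutoff and dist[j][k] <= cutoff and dist[i][k] > cutoff:
            return False
    return True


# Hypothetical toy data: three organisms, symmetric distances.
clock_like = [
    [0.0, 0.2, 0.6],
    [0.2, 0.0, 0.6],
    [0.6, 0.6, 0.0],
]
skewed = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.5],
    [0.9, 0.5, 0.0],
]

for name, matrix in (("clock_like", clock_like), ("skewed", skewed)):
    print(name,
          max_ultrametric_violation(matrix),
          threshold_groups_consistent(matrix, cutoff=0.5))
```

In the first matrix the violation is zero and a cutoff of 0.5 yields a consistent grouping. In the second, the cutoff joins the first organism to the second and the second to the third, yet separates the first from the third, so no clean partition into taxa results from the threshold alone.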

Now, even though POCP and AAI were introduced for genus delineation, both approaches have been criticized in the literature on several grounds. A brief summary of these issues is given in [Barco et al. (2020)]:

[...] Methods to demarcate genera have been proposed that are based on either AAI (18) or the percentage of conserved proteins (POCP; 19). The former method provided a range of AAI values (65 to 72%) that were originally obtained by correlation to a now-outdated 16S rRNA gene identity threshold for genus. The POCP method directly relies on the 16S rRNA gene sequence, which is in some cases insensitive to evolutionary changes in the rest of the genome of a given organism, as revealed by different species sharing >99% identity over the length of this gene. This method also arbitrarily sets a genus boundary at a POCP value of 50%. Additionally, the generally used arbitrary genus threshold of 95% 16S rRNA gene identity has been recently revisited to a lower minimum value of 94.5%, with a median sequence identity of 96.4% and confidence interval of 94.55 to 95.05% [...]

Moreover, the original work on AAI suggested that it could provide insights into higher-level taxonomy. However, pairwise AAI values alone, even if visualized as a dendrogram, do not replace truly genome-scale phylogenies with branch support (e.g. GBDP-based phylogenies as provided by the TYGS).

Regarding POCP, several studies have concluded that it is not universally applicable:

[...] In this context, the 50% POCP boundary is not an appropriate metric to delineate genera within Methylococcaceae. The use of the POCP has, similarly, been shown to be ineffective in delineating genera within the families Bacillaceae (Aliyu et al., 2016), Burkholderiaceae (Lopes-Santos et al., 2017), Neisseriaceae (Li et al., 2017), and Rhodobacteraceae (Wirth and Whitman, 2018), among others. [...]

But what to do instead?

To the best of our knowledge, decisions on higher-level classification should be inferred from well-resolved phylogenies, for example by comparing relative subtree heights and by checking how uniform the proposed taxa (e.g. genera) are in terms of sequence divergence relative to one another. For example, when we conducted the large taxonomic studies on Actinobacteria or Bacteroidetes, we used the principles of phylogenetic systematics and taxonomic conservatism to repair obviously non-monophyletic taxa.

In general, when one is interested in delineating novel genera, or in the higher-level classification, existing taxa can serve as a guide. For example, by comparing how the different genera in a given family are nested in the phylogenetic tree (relative heights of their subtrees), one can usually find a conservative delineation into genera that makes the newly created genera uniform in terms of sequence divergence when compared to the other genera in the family.
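
The idea of comparing relative subtree heights can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TYGS procedure: the `Node` class, the `subtree_height` function, and all branch lengths and clade names below are hypothetical, and a real analysis would start from the phylogeny produced by the TYGS and take branch support into account.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    length: float = 0.0                      # branch length leading to this node
    children: list = field(default_factory=list)


def subtree_height(node):
    """Longest path (sum of branch lengths) from this node down to any tip."""
    if not node.children:
        return 0.0
    return max(child.length + subtree_height(child) for child in node.children)


# Hypothetical family: two established genera and one candidate clade.
genus_a = Node("Genus A", 0.10, [Node("a1", 0.05), Node("a2", 0.06)])
genus_b = Node("Genus B", 0.12, [Node("b1", 0.04), Node("b2", 0.05)])
candidate = Node("Candidate", 0.02, [
    Node("c1", 0.15, [Node("c1a", 0.03), Node("c1b", 0.02)]),
    Node("c2", 0.14),
])

for clade in (genus_a, genus_b, candidate):
    print(clade.name, round(subtree_height(clade), 3))
```

In this toy example the candidate clade is roughly three times as deep as either established genus. Under the assumptions above, that would hint that treating it as a single genus makes it less uniform than its neighbours, and that splitting it (for instance at the `c1`/`c2` bifurcation) might be the more conservative choice.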

Note: if the TYGS whole-genome-based GBDP analysis is not well resolved, one can also order an additional proteome-based GBDP analysis.