Use of this form is free for academic purposes. For all other uses, please contact the authors via the feedback form.
"Academic purposes" do not include running queries just to collect and potentially plagiarize the type-strain genomes and other associated metadata included in the TYGS database. If you are unsure whether and how to properly use the TYGS, do not hesitate to contact the TYGS team via the contact form.
Q: How is my privacy respected?
Browsing the TYGS web page is free and does not require registration.
For submitting a TYGS job, an e-mail address has to be provided along with some genome sequences and/or GenBank accession IDs, the subject of the subsequent analysis. The e-mail address is the only piece of person-related information stored by the TYGS. All data associated with a TYGS job, including the e-mail address, are deleted after the job has finished and an additional amount of time has passed. The exact time of deletion is indicated in the notification e-mail and in the information badges in the top-right corner of the TYGS result page.
Additional information is found in the general privacy statement of the Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, in accordance with the General Data Protection Regulation.
Q: How to conduct larger analyses?
The TYGS is currently limited to 20 user genomes. This is not a limitation of the method; rather, we have to keep an eye on compute cluster usage. For most use cases this limit is clearly more than enough.
However, on request we usually increase the upload cap for a given e-mail address beyond the default of 20 user genomes, provided the requested upload cap is not exorbitantly high.
Before asking for an increased upload cap, please consider the following questions and prepare your answers:
How large is your final dataset?
Is your dataset really final?
If you want to analyze a few of your own isolates together with a rather large list of 'reference strains' which you have obtained from some other web site:
Which insights do you expect to get from such an analysis?
Why can't you analyze your isolates through the TYGS without these 'reference strains' (please also read the FAQ item on type strains vs. reference strains)?
Have you removed redundant strains (e.g. clonal strains) from the dataset to reduce its size and, if not, why?
Is your dataset a rather diverse selection of strains or do the strains belong to a group of closely related strains?
Are your files properly labelled (e.g. labels will be shown in the trees)?
Does your dataset contain type-strain genomes and, if yes, why? The TYGS already provides type-strain genomes.
Do you want to restrict the TYGS analysis to the uploaded genome sequences, or do you want to use the TYGS in standard mode (i.e. the TYGS will determine closely related type strains)?
How does the procedure work?
We will set up an exception for your e-mail address which allows you to submit TYGS
requests with the requested number of user sequences. If the number of strains exceeds
~100, the web server may not cope with this amount of data and we will have to submit the
files on your behalf from within our network. In that case the file exchange will be
organized via a confidential folder on our institutional cloud.
The exception will usually be valid for a week but can normally be extended on request. Please note that your job(s) will use resources of our in-house compute cluster, so please avoid starting multiple redundant jobs of that size.
Q: How should GenBank accessions be specified?
Please see the
for detailed instructions on how to specify accession numbers.
Q: How do I specify more than one GenBank/FASTA file in the submission form?
After clicking the "browse" button in the submission form, you can hold the CTRL key in the file selection
dialog and use the left mouse button to click and select a custom set of genome files. The key combination CTRL+A
will select all files in the folder. If you hold the SHIFT key instead, you can select a range of files. Both
techniques should work in any browser, and these mechanisms are entirely independent of the TYGS.
Q: Why are a 'type strain' and a 'reference strain' usually not the same?
Type strains form the backbone of prokaryotic systematics as nomenclatural types of
species and subspecies, and comparisons with established type strains are mandatory
when classifying novel strains (PMID: 19700448).
The good news is that the TYGS is designed to automatically determine type strains
closely related to your query genome(s).
In contrast, the term 'reference strain' is often used synonymously with the term
'type strain', but it is not sharply defined. It is an arbitrary label that can
basically be put on any strain, even on strains which are not type strains. This can
lead to serious taxonomic confusion if strains are mistaken for type strains. So be
careful when preparing your dataset and when collecting lists of 'reference strains'
from other web pages.
Q: Do I have to include type strain genomes in my TYGS submission?
In default mode, the TYGS already determines a set of closely related type-strain genomes for each of your provided genomes.
The exact procedure is described in the TYGS publication. If you upload type-strain genomes anyway, this will of course result
in duplicate genome sequences throughout your results because your uploaded genomes will perfectly match the respective
type-strain genomes from the TYGS database. If you have ticked the checkbox 'Restrict job to above genome(s)?', the TYGS will skip
the determination of closely related type-strain genomes entirely and focus only on the genome sequences and accessions you have
provided via the submission form.
Q: Why is a particular type-strain genome not included in the TYGS database?
While the TYGS database attempts to be as comprehensive as possible, a specific
type-strain genome may be missing for a variety of reasons:
The genome has not yet been sequenced.
The genome sequence has been obtained but has not been deposited in public databases.
The genome sequence has been deposited in public databases but cannot be
identified because its metadata lack crucial information.
The genome sequence has been deposited in public databases but cannot be
identified because a deposit of the type strain was used that is unknown to
the TYGS database.
The genome sequence was identified in public databases but is still under
investigation by the TYGS team.
The genome sequence can be identified in public databases but fails the
TYGS quality checks.
If you are aware of a type-strain genome sequence that is missing in the TYGS
database, please contact the maintainers and
provide information on this genome sequence as indicated above.
Before sending a report on a missing type-strain genome sequence,
make sure the species or subspecies is not contained in the list for
manual genome selection. If it is contained, you should instead file a report on a
type-strain genome sequence that is contained but not automatically found.
Q: Why did I not receive an e-mail pointing to TYGS results?
Please see the
for detailed explanations of why an e-mail may get lost.
Also note that the TYGS results are additionally displayed on a website.
Q: What do the three different digital DDH formulas (d0,d4,d6) mean?
Table 3 of the TYGS result page contains the pairwise dDDH values between your user genomes and the
selected type-strain genomes. The dDDH values are provided along with their confidence intervals (C.I.)
for the three different GBDP (Genome BLAST Distance Phylogeny) formulas:
formula d0 (a.k.a. GGDC formula 1): length of all HSPs divided by total genome length
formula d4 (a.k.a. GGDC formula 2): sum of all identities found in HSPs divided by overall HSP length
formula d6 (a.k.a. GGDC formula 3): sum of all identities found in HSPs divided by total genome length
More info on these formulas and the underlying GBDP method is found in the literature.
Note: Formula d4 is independent of genome length and is thus robust against the use of incomplete draft
genomes. For other reasons for preferring formula d4, see the FAQ. Formulae d0 and d6 reflect the genome pair's
(dis-)similarity in gene content.
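As a rough illustration of the three ratios listed above, the following Python sketch computes them from a list of HSPs. The `Hsp` record, the function name `gbdp_ratios`, and the toy numbers are hypothetical and not part of the TYGS. "Total genome length" is simply a number supplied by the caller, and the conversion of these ratios into GBDP distances and model-based dDDH values with confidence intervals is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hsp:
    """Hypothetical record for one high-scoring segment pair (HSP)."""
    length: int      # aligned length of the HSP
    identities: int  # number of identical positions within the HSP


def gbdp_ratios(hsps: List[Hsp], total_genome_length: int) -> Dict[str, float]:
    """Compute the three ratios as worded in the list above (illustration only)."""
    hsp_length = sum(h.length for h in hsps)
    identities = sum(h.identities for h in hsps)
    if hsp_length == 0 or total_genome_length <= 0:
        raise ValueError("need at least one HSP and a positive genome length")
    return {
        "d0": hsp_length / total_genome_length,  # HSP length / genome length
        "d4": identities / hsp_length,           # identities / HSP length
        "d6": identities / total_genome_length,  # identities / genome length
    }


# Toy input: the same two HSPs evaluated against two assumed genome lengths.
hsps = [Hsp(length=400, identities=380), Hsp(length=100, identities=90)]
short_total = gbdp_ratios(hsps, total_genome_length=1000)
long_total = gbdp_ratios(hsps, total_genome_length=2000)

# d4 does not involve the genome length, so it is identical in both cases,
# whereas d0 and d6 change with the assumed genome length.
print(short_total)
print(long_total)
```

In this toy case only d0 and d6 change when the assumed genome length changes, which mirrors the note above on why formula d4 is robust against incomplete draft genomes.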
Q: Why is a species sometimes represented in the TYGS phylogenies by more than one strain deposit?
For some species the TYGS database contains the genome sequences of more than one strain deposit (e.g. ATCC and DSM). If a user-provided
genome sequence closely matches such a species, all strain deposits of that species are usually included in the TYGS result.
The main reason is that the scientific literature reports rare cases in which such deposits unexpectedly differ to a considerable extent, which indicates
a strain confusion or contamination (please find an example here). In this way, the TYGS is an important tool for uncovering such irregularities.
Apart from that, we think that having more than one strain deposit of the same species in the dataset is at most a cosmetic issue, not a scientific problem. If you still want to remove such "duplicates", you can download the trees in Newick format and remove them there. However, we advise against manipulating ready-made results afterwards.
Q: Why do the TYGS results sometimes include species whose names are not validly published?
The TYGS shows such matches because it may be important for any taxonomist to be aware of all closely related species or subspecies, even if their names are not (yet) validly published. Since several criteria have to be met before a new species name is validly published (see details on LPSN page), the entire process might take some time. For example, a novel species or subspecies name may already have been proposed in an effective publication but not yet been announced in a Validation List. In theory, a second team might now start working on the description of the same taxon, resulting in redundant work. Hence, if your novel strain is placed in the same species or subspecies cluster as a species or subspecies with a not (yet) validly published name, we recommend getting in touch with its authors. An effective publication may be available and the valid publication of the name may be imminent. Even if a forthcoming validation is unlikely, it is often worth reporting phylogenetically close relationships to taxa that lack a validly published name. Their names occur in databases anyway and their analyses may yield valuable information.
Q: Is it possible that the TYGS does not show the correct name of a species or subspecies?
When two species are regarded as heterotypic synonyms, this does not affect the status of type strains as type strains. The type strain of the younger heterotypic synonym remains the type strain of the younger heterotypic synonym; it neither becomes the type strain of the older heterotypic synonym nor does it lose its status as type strain entirely. The TYGS aims at an unambiguous relationship between taxon names and type strains. For each type strain the taxon name (or set of taxon names) for which it is the type strain must be shown to achieve this, even if the taxon name is actually believed to be a younger heterotypic synonym and thus not believed to be the correct name. Notably, as specified in the International Code of Nomenclature of Prokaryotes, if alternatives are available the choice of the correct name depends on taxonomic opinion.
Moreover, the analyses conducted with the TYGS yield taxon boundaries themselves. Information on heterotypic synonyms is thus part of the outcome and should not be part of the input. Often the TYGS results simply confirm known synonym relationships. But surprises are also possible. Additional information on known heterotypic synonyms is available from LPSN, to which the TYGS results are linked. Heterotypic synonyms at the species or subspecies level may contain distinct genus names. To assess the affiliation to a genus it is necessary to consider the position of the type species of the genus.
Q: The tree is missing! What should I do?
Click on the refresh symbol close to
"Click to load or refresh tree (page needs to be viewed in https session)"
on the tree page.
Q: How do I proceed if the genome-scale phylogeny is not well resolved?
For very diverse datasets of strains, the average branch support may be low even in the
genome-based phylogeny; this is not unusual for such datasets. In general, parts of a
phylogeny that are not well resolved (i.e. have low branch support) cannot be interpreted.
In case of the TYGS, an optional proteome-based GBDP analysis will become available on user request
if the dataset is not too large (< 30 strains) and the average branch support of the genome-based
tree is smaller than 60%. If these conditions are met, you will find an order button on the respective
TYGS result page below the phylogenies table.
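The two conditions stated above can be written down as a small check. This is only a sketch of the rule as worded in this FAQ item: the function name, the constants, and the idea of passing a list of per-branch support values are assumptions made for illustration, and the sketch does not describe how the TYGS evaluates the conditions internally.

```python
from statistics import mean
from typing import Sequence

MAX_STRAINS = 30          # text: dataset "not too large (< 30 strains)"
SUPPORT_THRESHOLD = 60.0  # text: average branch support "smaller than 60%"


def proteome_analysis_offered(n_strains: int,
                              branch_supports: Sequence[float]) -> bool:
    """Return True if both conditions from the text are met.

    branch_supports holds hypothetical support values (in percent) of the
    branches of the genome-based tree.
    """
    if not branch_supports:
        raise ValueError("at least one branch support value is required")
    average_support = mean(branch_supports)
    return n_strains < MAX_STRAINS and average_support < SUPPORT_THRESHOLD


# Example: 12 strains with an average support of 55 % meet both conditions.
print(proteome_analysis_offered(12, [40, 55, 70]))
# Example: the same tree with 30 strains does not, because 30 is not < 30.
print(proteome_analysis_offered(30, [40, 55, 70]))
```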
The method behind proteome-based GBDP analyses is similar to the nucleotide-based GBDP analyses except
that the entire proteome is used in the former case. The method has been described
here and here
and, moreover, has been successfully applied in large-scale phylogenomic studies such as:
1. Lagkouvardos I, Pukall R, Abt B, Foesel BU, Meier-Kolthoff JP, Kumar N, et al. The Mouse Intestinal Bacterial Collection (miBC) provides host-specific insight into cultured diversity and functional potential of the gut microbiota. Nat Microbiol. 2016;1: 16131. doi:10.1038/nmicrobiol.2016.131
2. Barka EA, Vatsa P, Sanchez L, Gaveau-vaillant N, Jacquard C, Meier-Kolthoff JP, et al. Taxonomy, physiology, and natural products of Actinobacteria. Microbiol Mol Biol Rev. 2016;80: iii. doi:10.1128/MMBR.00044-16
3. Nouioui I, Ghodhbane-Gtari F, Montero-Calasanz M del C, Göker M, Meier-Kolthoff JP, Schumann P, et al. Proposal of a type strain for Frankia alni (Woronin 1866) Von Tubeuf 1895, emended description of Frankia alni, and recognition of Frankia casuarinae sp. nov. and Frankia elaeagni sp. nov. Int J Syst Evol Microbiol. 2016;published. doi:10.1099/ijsem.0.001496
4. Hahnke RL, Meier-Kolthoff JP, García-Lopez M, Mukherjee S, Huntemann M, Ivanova NN, et al. Genome-based taxonomic classification of Bacteroidetes. Front Microbiol. 2016;7: 2003. doi:10.3389/fmicb.2016.02003
5. Simon M, Scheuner C, Meier-Kolthoff JP, Brinkhoff T, Wagner-Döbler I, Ulbrich M, et al. Phylogenomics of Rhodobacteraceae reveals evolutionary adaptation to marine and non-marine habitats. ISME J. 2017;11: 1483–1499. doi:10.1038/ismej.2016.198
6. Montero-Calasanz M del C, Meier-Kolthoff JP, Zhang DF, Yaramis A, Rohde M, Woyke T, et al. Genome-scale data call for a taxonomic rearrangement of Geodermatophilaceae. Front Microbiol. 2017;8: 1–15. doi:10.3389/fmicb.2017.02501
7. Mukherjee S, Seshadri R, Varghese NJ, Eloe-Fadrosh EA, Meier-Kolthoff JP, Göker M, et al. 1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life. Nat Biotechnol. 2017;35: 676–683. doi:10.1038/nbt.3886
8. Carro L, Nouioui I, Sangal V, Meier-Kolthoff JP, Trujillo ME, Montero-Calasanz M del C, et al. Genome-based classification of micromonosporae with a focus on their biotechnological and ecological potential. Sci Rep. 2018;8: 525. doi:10.1038/s41598-017-17392-0
9. Nouioui I, Carro L, García-López M, Meier-Kolthoff JP, Woyke T, Kyrpides NC, et al. Genome-based taxonomic classification of the phylum Actinobacteria. Front Microbiol. 2018;9: 1–119. doi:10.3389/fmicb.2018.02007
10. García-López M, Meier-Kolthoff JP, Tindall BJ, Gronow S, Woyke T, Kyrpides NC, et al. Analysis of 1,000 type-strain genomes improves taxonomic classification of Bacteroidetes. Front Microbiol. 2019;10: 2083. doi:10.3389/fmicb.2019.02083
11. Hördt A, García-López M, Meier-Kolthoff JP, Schleuning M, Weinhold LM, Tindall BJ, et al. Analysis of 1,000+ type-strain genomes substantially improves taxonomic classification of Alphaproteobacteria. Front Microbiol. 2020;11: 468. doi:10.3389/fmicb.2020.00468
12. Dedysh SN, Henke P, Ivanova AA, Kulichevskaya IS, Philippov DA, Meier-Kolthoff JP, et al. 100-year-old enigma solved: identification, genomic characterization and biogeography of the yet uncultured Planctomyces bekefii. Environ Microbiol. 2020;22: 198–211. doi:10.1111/1462-2920.14838
13. Strepis N, Naranjo HD, Meier-Kolthoff J, Göker M, Shapiro N, Kyrpides N, et al. Genome-guided analysis allows the identification of novel physiological traits in Trichococcus species. BMC Genomics. 2020;21: 1–13. doi:10.1186/s12864-019-6410-x
14. Schumann P, Kalensee F, Cao J, Criscuolo A, Clermont D, Köhler JM, et al. Reclassification of Haloactinobacterium glacieicola as Occultella glacieicola gen. nov., comb. nov., of Haloactinobacterium album as Ruania alba comb. nov, with an emended description of the genus Ruania, recognition that the genus names Haloactinobacteri. Int J Syst Evol Microbiol. 2021;71. doi:10.1099/ijsem.0.004769
15. Heidler von Heilborn D, Reinmüller J, Hölzl G, Meier-Kolthoff JP, Woehle C, Marek M, et al. Sphingomonas aliaeris sp. nov., a new species isolated from pork steak packed under modified atmosphere. Int J Syst Evol Microbiol. 2021;71: 004973. doi:10.1099/ijsem.0.004973
Q: Why does the TYGS not offer POCP or AAI values for the delineation of prokaryotic genera?
Short answer: Because there are issues with POCP and AAI that have already been discussed in the literature (see below). However, genera can usually be delineated using the TYGS by properly interpreting the TYGS phylogenies (see below).
Long answer: The pragmatic prokaryotic species concept uses DDH and a 70% threshold for the comparison of novel strains to a set of type strains (digital DDH mimics DDH without the known pitfalls of traditional DDH) [Meier-Kolthoff et al. (2013a)]. This concept works relatively well because closely related organisms usually evolved at a similar speed, which results in matrices of pairwise (dis-)similarities that are often nearly ultrametric [Meier-Kolthoff et al. (2014a)]. The latter is important because applying any type of threshold to a given distance or similarity matrix only works properly under this condition [Meier-Kolthoff et al. (2014a)]. Now, the less related organisms are, the less ultrametric the underlying (dis-)similarity data matrix will be. This is frequently the case when working on datasets covering entire genera, families or higher taxa, and it is also the reason why one will not find generally accepted universal delineation cutoffs for genera (or higher taxa) based on dDDH, ANI etc.
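To make the role of ultrametricity more concrete, the following Python sketch checks the standard three-point condition on a small distance matrix and tests whether a threshold yields consistent groups. The function names and the two toy matrices are made up for illustration; they are not TYGS data and this is not code used by the TYGS.

```python
from itertools import combinations
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def is_ultrametric(d: Matrix, tol: float = 1e-9) -> bool:
    """Three-point condition: in every triple of taxa, the two largest
    pairwise distances must be equal (within a tolerance)."""
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if largest - middle > tol:
            return False
    return True


def threshold_is_transitive(d: Matrix, t: float) -> bool:
    """Check whether 'distance <= t' groups taxa consistently: if A is within
    the threshold of B and B of C, then A must also be within it of C."""
    n = len(d)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if d[i][j] <= t and d[j][k] <= t and d[i][k] > t:
                    return False
    return True


# Toy matrices for three taxa A, B, C (hypothetical distances).
ultra = [[0, 2, 6],
         [2, 0, 6],
         [6, 6, 0]]
non_ultra = [[0, 2, 6],
             [2, 0, 9],
             [6, 9, 0]]

print(is_ultrametric(ultra), threshold_is_transitive(ultra, 6))
print(is_ultrametric(non_ultra), threshold_is_transitive(non_ultra, 6))
```

In the second matrix, A is within the threshold of both B and C, yet B and C are not within the threshold of each other, so the threshold no longer defines unambiguous groups.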
Now, even though POCP and AAI were introduced for genus delineation, these approaches were also criticized in the literature in various aspects. A brief summary of these issues is found in [Barco et al. (2020)]:
[...] Methods to demarcate genera have been proposed that are based on either AAI (18) or the percentage of conserved proteins (POCP; 19). The former method provided a range of AAI values (65 to 72%) that were originally obtained by correlation to a now-outdated 16S rRNA gene identity threshold for genus. The POCP method directly relies on the 16S rRNA gene sequence, which is in some cases insensitive to evolutionary changes in the rest of the genome of a given organism, as revealed by different species sharing >99% identity over the length of this gene. This method also arbitrarily sets a genus boundary at a POCP value of 50%. Additionally, the generally used arbitrary genus threshold of 95% 16S rRNA gene identity has been recently revisited to a lower minimum value of 94.5%, with a median sequence identity of 96.4% and confidence interval of 94.55 to 95.05% [...]
Moreover, the original work on AAI suggested that it can provide insights into higher-level taxonomy. But pairwise AAI values alone, even if visualized as a dendrogram, do not replace truly genome-scale phylogenies with branch support (e.g. GBDP-based phylogenies as provided by the TYGS).
Regarding POCP, studies concluded that POCP is not universally applicable:
[...] In this context, the 50% POCP boundary is not an appropriate metric to delineate genera within Methylococcaceae. The use of the POCP has, similarly, been shown to be ineffective in delineating genera within the families Bacillaceae (Aliyu et al., 2016), Burkholderiaceae (Lopes-Santos et al., 2017), Neisseriaceae (Li et al., 2017), and Rhodobacteraceae (Wirth and Whitman, 2018), among others. [...]
But what to do instead?
To the best of our knowledge, decisions on higher level classification should be inferred from well-resolved phylogenies by comparison of, for example, relative subtree heights and by how uniform the proposed taxa (e.g. genera) are in terms of sequence divergence among one another. For example, when we conducted the large taxonomic studies on Actinobacteria or Bacteroidetes, we used the principles of phylogenetic systematics and taxonomic conservatism to repair obviously non-monophyletic taxa.
In general, when one is interested in delineating novel genera, or in the higher-level classification, existing taxa can serve as a guide. For example, by comparing how the different genera in a given family are nested in the phylogenetic tree (relative heights of their subtrees), one can usually find a conservative delineation into genera that makes the newly created genera uniform in terms of sequence divergence when compared to the other genera in the family.
Q: Does the TYGS have an API for the programmatic download of results?
Yes, the TYGS has an API for the programmatic download of results. Please find a detailed description here.
Q: Which user-visible changes have been applied to the TYGS database since the publication?
In addition to the routine import of taxonomic and genomic information,
the user-visible changes that were applied to the database after
the initial Nature Communications publication are listed on the News and