Use of this form is free for academic purposes. For all other uses, please contact the authors via the feedback form.
"Academic purposes" do not include running queries merely to collect, and potentially plagiarize, the type-strain genomes and other associated metadata included in the TYGS database. If you are unsure whether or how to use the TYGS properly, do not hesitate to contact the TYGS team via the contact form.
Q: How is my privacy respected?
Browsing the TYGS web page is free and does not require registration.
For submitting a TYGS job, an e-mail address has to be provided along with some genome sequences and/or GenBank accession IDs, the subject of the subsequent analysis. The e-mail address is the only piece of person-related information stored by the TYGS. All data associated with a TYGS job, including the e-mail address, are deleted after the job has finished and an additional amount of time has passed. The exact time of deletion is indicated in the notification e-mail and in the information badges in the top-right corner of the TYGS result page.
Additional information is found in the general privacy statement of the Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, in accordance with the General Data Protection Regulation.
Q: How to conduct larger analyses?
The TYGS is currently limited to 20 user genomes. This is not a limitation of the method; we simply have to keep compute cluster usage in check. For most use cases this limit is more than sufficient.
However, on request we usually increase the upload cap for a given e-mail address beyond the default of 20 user genomes, provided the requested cap is not exorbitantly high.
Before asking for an increased upload cap, please consider the following questions:
How large is your final dataset?
Is your dataset really final?
If you want to analyze a few of your own isolates together with a rather large list of
'reference strains' obtained from some other web site: which insights do you
expect from such an analysis, and why can't you just analyze your isolates through the
TYGS without these 'reference strains' (please also read the FAQ item on type strains vs. reference strains)?
Have you removed redundant strains (e.g. clonal strains) from the dataset to reduce its size and, if not, why?
Is your dataset a rather diverse selection of strains or do the strains belong to a group of closely related strains?
Are your files properly labelled (e.g. labels will be shown in the trees)?
Does your dataset contain type-strain genomes and, if yes, why? The TYGS already provides type-strain genomes.
Do you want to restrict the TYGS analysis to the uploaded genome sequences, or do you want
to use the TYGS in standard mode (i.e. the TYGS will determine closely related type-strain genomes)?
How does the procedure work?
We will set up an exception for your e-mail address that allows you to submit TYGS
requests with the requested number of user sequences. If the number of strains exceeds
100, the web server may be unable to handle this amount of data, and we will have to submit the
files on your behalf from within our network. In that case the file exchange will be
organized via a confidential folder on our institutional cloud.
The exception will usually be valid for a week but can normally be extended on request. Please note that your job(s) will use resources of our in-house compute cluster, so please do not start multiple redundant jobs of that size if this can be avoided.
Q: How should GenBank accessions be specified?
Please see the
for detailed instructions on how to specify accession numbers.
Q: How do I specify more than one GenBank/FASTA file in the submission form?
After clicking the "browse" button in the submission form, you can hold the CTRL key in the file selection
dialog and click with the left mouse button to select a custom set of genome files. The key combination CTRL+A
will select all files in the folder. If you hold the SHIFT key instead, you can select a range of files. Both
techniques should work in any browser, and these mechanisms are entirely independent of the TYGS.
Q: Why are a 'type strain' and a 'reference strain' usually not the same?
Type strains form the backbone of prokaryotic systematics as nomenclatural types of
species and subspecies, and comparisons with established type strains are mandatory
when classifying novel strains (PMID: 19700448).
The good news is that the TYGS is designed to automatically determine type strains
closely related to your query genome(s).
In contrast, the term 'reference strain' is often used synonymously with the term
'type strain', but the former term is not sharply defined. It is an
arbitrary label that can be put on basically any strain, even on strains that are
not type strains. This can lead to serious taxonomic confusion
if such strains are mistaken for type strains. Therefore, be careful when preparing
your dataset and when collecting lists of 'reference strains' from other web pages.
Q: Do I have to include type strain genomes in my TYGS submission?
In default mode, the TYGS already determines a set of closely related type-strain genomes for each of your provided genomes.
The exact procedure is described in the TYGS publication. If you upload type-strain genomes anyway, this will
result in duplicate genome sequences throughout your results, because each uploaded genome will perfectly match the respective
type-strain genome from the TYGS database. If you have ticked the checkbox 'Restrict job to above genome(s)?', the TYGS will skip
the determination of closely related type-strain genomes entirely and focus only on the genome sequences and accessions you have
provided via the submission form.
Q: Why is a particular type-strain genome not included in the TYGS database?
While the TYGS database attempts to be as comprehensive as possible, a specific
type-strain genome may be missing for a variety of reasons:
The genome has not yet been sequenced.
The genome sequence has been obtained but has not been deposited in public databases.
The genome sequence has been deposited in public databases but cannot be
identified because its metadata lack crucial information.
The genome sequence has been deposited in public databases but cannot be
identified because a deposit of the type strain was used that is unknown to
the TYGS database.
The genome sequence was identified in public databases but is still under
investigation by the TYGS team.
The genome sequence can be identified in public databases but fails the
TYGS quality checks.
If you are aware of a type-strain genome sequence that is missing in the TYGS
database, please contact the maintainers and
provide information on this genome sequence as indicated above.
Before sending a report on a missing type-strain genome sequence,
make sure the species or subspecies is not contained in the list for
manual genome selection. If it is contained, you should instead file a report on a
type-strain genome sequence that is contained but not automatically found.
Q: Why did I not receive an e-mail pointing to TYGS results?
Please see the
for detailed explanations of why an e-mail may get lost.
Also note that the TYGS results are additionally displayed on a website.
Q: What do the three different digital DDH formulas (d0, d4, d6) mean?
Table 3 of the TYGS result page contains the pairwise dDDH values between your user genomes and the
selected type-strain genomes. The dDDH values are provided along with their confidence intervals (C.I.)
for the three different GBDP (Genome BLAST Distance Phylogeny) formulas:
formula d0 (a.k.a. GGDC formula 1): length of all HSPs divided by total genome length
formula d4 (a.k.a. GGDC formula 2): sum of all identities found in HSPs divided by overall HSP length
formula d6 (a.k.a. GGDC formula 3): sum of all identities found in HSPs divided by total genome length
More info on these formulas and the underlying GBDP method is found in the literature.
Note: Formula d4 is independent of genome length and is thus robust against the use of incomplete draft
genomes. For other reasons for preferring formula d4, see the FAQ. Formulas d0 and d6 reflect the genome pair's
(dis-)similarity in gene content.
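The three ratios listed above can be illustrated with a minimal Python sketch. It follows only the wording of the list: the `Hsp` record, the function `formula_ratios` and the single `total_genome_length` argument are hypothetical names introduced for illustration and are not part of the TYGS or GGDC software. How the genome length of a pair is defined, and how the ratios are turned into dDDH values with confidence intervals, is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Hsp:
    """One high-scoring segment pair (hypothetical minimal record)."""
    length: int      # alignment length of the HSP
    identities: int  # identical positions within the HSP


def formula_ratios(hsps: Iterable[Hsp], total_genome_length: int) -> Dict[str, float]:
    """Compute the ratios named in the list above (assumption: one length value)."""
    if total_genome_length <= 0:
        raise ValueError("total_genome_length must be positive")
    hsp_list = list(hsps)
    hsp_length = sum(h.length for h in hsp_list)
    identities = sum(h.identities for h in hsp_list)
    return {
        # d0: length of all HSPs divided by total genome length
        "d0": hsp_length / total_genome_length,
        # d4: identities in HSPs divided by overall HSP length
        "d4": identities / hsp_length if hsp_length else 0.0,
        # d6: identities in HSPs divided by total genome length
        "d6": identities / total_genome_length,
    }


# Same HSPs, two different genome lengths (made-up numbers).
hsps = [Hsp(length=1000, identities=900), Hsp(length=500, identities=450)]
longer = formula_ratios(hsps, total_genome_length=3000)
shorter = formula_ratios(hsps, total_genome_length=2000)
print(longer)
print(shorter)
```

With these made-up numbers, the d0 and d6 ratios change when the genome length changes, whereas the d4 ratio stays the same. This is the property the note above refers to when it calls d4 robust against incomplete draft genomes.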
Q: Why is a species sometimes represented in the TYGS phylogenies by more than one strain deposit?
For some species, the TYGS database contains the genome sequences of more than one strain deposit (e.g. ATCC and DSM). If a user-provided
genome sequence closely matches such a species, all strain deposits of that species are usually included in the TYGS result.
The main reason is that the scientific literature reports rare cases in which such deposits unexpectedly differ to a considerable extent, which indicates
a strain mix-up or contamination (please find an example here). The TYGS is thus an important tool for uncovering such irregularities.
Apart from that, we think that having more than one strain deposit of the same species in the dataset is at most a cosmetic issue, not a scientific problem. If you still want to remove such "duplicates", you can download the trees in Newick format and remove them yourself. However, we advise against manipulating finished results after the fact.
Q: Why do the TYGS results sometimes include species whose names are not validly published?
The TYGS shows such matches because it may be important for a taxonomist to be aware of all closely related species or subspecies, even if their names are not (yet) validly published. Since several criteria have to be met before a new species name is validly published (see details on LPSN page), the entire process might take some time. For example, a novel species or subspecies name may already have been proposed in an effective publication but not yet announced in a Validation List. In theory, a second team might then start working on the description of the same taxon, resulting in redundant work. Therefore, if your novel strain is placed in the same species or subspecies cluster as a species or subspecies with a not (yet) validly published name, we recommend getting in touch with its authors. An effective publication may be available and the valid publication of the name may be imminent. Even if a forthcoming validation is unlikely, it is often worth reporting phylogenetically close relationships to taxa that lack a validly published name. Their names occur in databases anyway and their analyses may yield valuable information.
Q: The tree is missing! What should I do?
Click on the refresh symbol close to
"Click to load or refresh tree (page needs to be viewed in https session)"
on the tree page.
Q: How do I proceed if the genome-scale phylogeny is not well resolved?
For very diverse datasets of strains, the average branch support may be too low, even in the
genome-based phylogeny; this is not unusual for such datasets. In general, parts of a
phylogeny that are not well resolved (i.e. have low branch support) cannot be interpreted.
In the case of the TYGS, an optional proteome-based GBDP analysis becomes available on user request
if the dataset is not too large (< 30 strains) and the average branch support of the genome-based
tree is below 60%. If these conditions are met, you will find an order button on the respective
TYGS result page below the phylogenies table.
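The two conditions above can be written as a small check. This is only a sketch of the rule as worded in this FAQ, not the server's actual logic: the function name `proteome_analysis_offered` and both constants are hypothetical, the branch support values are assumed to be percentages, and whether the strain count includes automatically selected type strains is not stated here.

```python
from statistics import mean
from typing import Sequence

MAX_STRAINS = 30          # the dataset must contain fewer strains than this
SUPPORT_THRESHOLD = 60.0  # average branch support (%) must be below this


def proteome_analysis_offered(n_strains: int, branch_supports: Sequence[float]) -> bool:
    """Return True if both conditions from the FAQ text hold (illustrative only)."""
    if not branch_supports:
        # Without support values no average can be computed.
        return False
    small_enough = n_strains < MAX_STRAINS
    poorly_resolved = mean(branch_supports) < SUPPORT_THRESHOLD
    return small_enough and poorly_resolved


# Made-up branch support values (in percent) from a genome-based tree.
supports = [35.0, 52.0, 70.0, 48.0]
print(proteome_analysis_offered(12, supports))            # small dataset, low support
print(proteome_analysis_offered(45, supports))            # dataset too large
print(proteome_analysis_offered(12, [90.0, 85.0, 60.0]))  # tree already well resolved
```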
Q: Does the TYGS offer an API?
Yes, the TYGS has an API for the programmatic download of results. Please find a detailed description here.
Q: Which user-visible changes have been applied to the TYGS database since the publication?
In addition to the routine import of taxonomic and genomic information,
the user-visible changes that were applied to the database after
the initial Nature Communications publication are listed on the News and