Type Strain Genome Server

Answer

Use of this form is free for academic purposes. For all other uses, please contact the authors via the feedback form.

"Academic purposes" do not include running queries just to collect, and potentially plagiarize, the type-strain genomes and other associated metadata included in the TYGS database. If you are unsure whether and how to use the TYGS properly, do not hesitate to contact the TYGS team via the contact form.

Answer

Browsing the TYGS web page is free and does not require registration.

To submit a TYGS job, you must provide an e-mail address along with the genome sequences and/or GenBank accession IDs to be analyzed. The e-mail address is the only piece of personal information stored by the TYGS. All data associated with a TYGS job, including the e-mail address, are deleted once the job has finished and an additional period of time has passed. The exact time of deletion is indicated in the notification e-mail and in the information badges in the top-right corner of the TYGS result page.

Additional information can be found in the general privacy statement of the Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, in accordance with the General Data Protection Regulation.

Answer

The TYGS is currently limited to 20 user genomes, not because of a limitation of the method but because we have to keep an eye on the usage of our compute cluster. For most use cases, this limit is more than enough. However, on request we usually raise the upload cap for a given e-mail address beyond the default of 20 user genomes, provided the requested cap is not exorbitantly high.

Before asking for an increased upload cap, however, please consider the following questions and prepare your answers:

  1. How large is your final dataset?
  2. Is your dataset really final?
  3. If you want to analyze a few of your own isolates together with a rather large list of 'reference strains' obtained from some other website: what insights do you expect from such an analysis, and why can't you just analyze your isolates through the TYGS without these 'reference strains'? (Please also read the FAQ item on type strains vs. 'reference strains'.)
  4. Have you removed redundant strains (e.g. clonal strains) from the dataset to reduce its size? If not, why not?
  5. Is your dataset a diverse selection of strains, or do the strains belong to a group of closely related strains?
  6. Are your files properly labelled (the labels will be shown in the trees)?
  7. Does your dataset contain type-strain genomes and, if so, why? The TYGS already provides type-strain genomes.
  8. Do you want to restrict the TYGS analysis to the uploaded genome sequences, or do you want to use the TYGS in standard mode (i.e. the TYGS will determine closely related type-strain genomes)?
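
Question 4 can be partly checked before uploading. The following Python sketch (the function names are hypothetical and not part of the TYGS) flags files whose sequences are identical, ignoring FASTA headers, line wrapping, letter case and contig order. It only catches exact duplicates, such as the same assembly saved under two names; clonal strains that differ by even a single base are not detected and require a similarity-based comparison.

```python
import hashlib
from collections import defaultdict


def sequence_fingerprint(fasta_text: str) -> str:
    """Hash the sequences in a FASTA text, ignoring headers, line
    wrapping, letter case and the order of the contigs."""
    contigs: list[str] = []
    current: list[str] = []
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if line.startswith(">"):         # a header starts a new contig
            if current:
                contigs.append("".join(current).upper())
            current = []
        elif line:
            current.append(line)
    if current:                          # last contig in the file
        contigs.append("".join(current).upper())
    digest = hashlib.sha256()
    for contig in sorted(contigs):
        digest.update(contig.encode("utf-8"))
        digest.update(b"\n")             # keeps contig boundaries distinct
    return digest.hexdigest()


def find_identical_genomes(genomes: dict[str, str]) -> list[list[str]]:
    """Group labels of genomes whose FASTA texts hold identical sequences."""
    groups: dict[str, list[str]] = defaultdict(list)
    for label, fasta_text in genomes.items():
        groups[sequence_fingerprint(fasta_text)].append(label)
    return [sorted(labels) for labels in groups.values() if len(labels) > 1]


# Usage with made-up sequences: the second file is the first one with
# renamed, reordered and lower-cased contigs.
genomes = {
    "isolate_1": ">c1\nACGT\nACGT\n>c2\nTTGG\n",
    "isolate_1_copy": ">contig_b\nttgg\n>contig_a\nACGTACGT\n",
    "isolate_2": ">c1\nACGTACGA\n",
}
print(find_identical_genomes(genomes))  # [['isolate_1', 'isolate_1_copy']]
```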

How does the procedure work?

We will set up an exception for your e-mail address that allows you to submit TYGS requests with the requested number of user sequences. If the number of strains exceeds 100, the web server may not handle this large amount of data, in which case we have to submit the files on your behalf from within our network. The file exchange is then organized via a confidential folder on our institutional cloud.

The exception is usually valid for one week but can normally be extended on request. Please note that your job(s) will use resources of our in-house compute cluster, so please avoid starting multiple redundant jobs of that size.

Answer

Please see the GGDC/VICTOR FAQ for detailed instructions on how to specify accession numbers.

Answer

After clicking the "browse" button in the submission form, you can hold the CTRL key in the file selection dialog and click with the left mouse button to select a custom set of genome files. The key combination CTRL+A selects all files in the folder. If you hold the SHIFT key instead, you can select a range of files. These techniques should work in any browser; they are entirely independent of the TYGS.

Answer

Type strains form the backbone of prokaryotic systematics as nomenclatural types of species and subspecies, and comparisons with established type strains are mandatory when classifying novel strains (PMID: 19700448).

The good news is that the TYGS is designed to automatically determine type strains closely related to your query genome(s).

By contrast, the term 'reference strain' is often used synonymously with 'type strain', but it is not sharply defined: it is an arbitrary label that can be put on basically any strain, including strains that are not type strains. This can lead to serious taxonomic confusion if such strains are mistaken for type strains. Be careful, therefore, when preparing your dataset and when collecting lists of 'reference strains' from other web pages.

Answer

In default mode, the TYGS already determines a set of closely related type-strain genomes for each of the genomes you provide. The exact procedure is described in the TYGS publication. If you nevertheless upload type-strain genomes, your results will contain duplicate genome sequences, because your uploaded genomes will perfectly match the respective type-strain genomes from the TYGS database. If you have ticked the checkbox 'Restrict job to above genome(s)?', the TYGS skips the determination of closely related type-strain genomes entirely and only considers the genome sequences and accessions you have provided via the submission form.

Answer

While the TYGS database attempts to be as comprehensive as possible, a specific type-strain genome may be missing for a variety of reasons:

  • The genome has not yet been sequenced.
  • The genome sequence has been obtained but has not been deposited in public databases.
  • The genome sequence has been deposited in public databases but cannot be identified because its metadata lack crucial information.
  • The genome sequence has been deposited in public databases but cannot be identified because a deposit of the type strain was used that is unknown to the TYGS database.
  • The genome sequence was identified in public databases but is still under investigation by the TYGS team.
  • The genome sequence can be identified in public databases but fails the TYGS quality checks.

If you are aware of a type-strain genome sequence that is missing from the TYGS database, please contact the maintainers and provide information on this genome sequence as indicated above.

Before reporting a missing type-strain genome sequence, make sure that the species or subspecies is not contained in the list for manual genome selection. If it is, please file a report on a type-strain genome sequence that is contained in the database but not found automatically instead.

Answer

Please see the GGDC/VICTOR FAQ for detailed explanations of why an e-mail may get lost. Note also that TYGS results are displayed on a web page in addition to being sent by e-mail.

Answer

Table 3 of the TYGS result page contains the pairwise dDDH values between your user genomes and the selected type-strain genomes. The dDDH values are provided along with their confidence intervals (C.I.) for the three different GBDP (Genome BLAST Distance Phylogeny) formulas:

  • formula d0 (a.k.a. GGDC formula 1): length of all HSPs divided by total genome length
  • formula d4 (a.k.a. GGDC formula 2): sum of all identities found in HSPs divided by overall HSP length
  • formula d6 (a.k.a. GGDC formula 3): sum of all identities found in HSPs divided by total genome length
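
For reference, the three ratios above can be written compactly. This restatement assumes the bidirectional notation common in the GBDP literature: for genomes X and Y with lengths λ_X and λ_Y, H_XY is the total length of the HSPs found when X is searched against Y, and I_XY is the number of identical base pairs within those HSPs. GBDP turns each ratio into a distance by subtracting it from 1; the conversion of these distances into dDDH values with confidence intervals is performed by the TYGS and is not reproduced here.

$$
\begin{aligned}
d_0 &= 1 - \frac{H_{XY} + H_{YX}}{\lambda_X + \lambda_Y} \\
d_4 &= 1 - \frac{I_{XY} + I_{YX}}{H_{XY} + H_{YX}} \\
d_6 &= 1 - \frac{I_{XY} + I_{YX}}{\lambda_X + \lambda_Y}
\end{aligned}
$$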

More information on these formulas and the underlying GBDP method can be found in the literature.

Note: Formula d4 is independent of genome length and is thus robust against the use of incomplete draft genomes. For further reasons to prefer formula d4, see the FAQ. Formulas d0 and d6, in contrast, also reflect the (dis)similarity of the genome pair in gene content.
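
The length independence of formula d4 can be illustrated with made-up numbers. The sketch below implements simplified d0, d4 and d6 distances (the class and function names are hypothetical, and details of real GBDP, such as the handling of overlapping HSPs, are ignored). It compares a complete genome pair with the same pair in which only half of genome Y was assembled, assuming the missing half would have contributed HSPs of the same average identity.

```python
from dataclasses import dataclass


@dataclass
class HspSummary:
    """Totals for one search direction, e.g. genome X searched against Y.

    hsp_length: summed length of all HSPs (H in the GBDP notation)
    identities: summed number of identical base pairs in those HSPs (I)
    """
    hsp_length: int
    identities: int


def gbdp_distances(x_len: int, y_len: int,
                   xy: HspSummary, yx: HspSummary) -> tuple[float, float, float]:
    """Return simplified GBDP distances (d0, d4, d6) for genomes X and Y."""
    total_hsp = xy.hsp_length + yx.hsp_length
    total_ident = xy.identities + yx.identities
    total_genome = x_len + y_len
    d0 = 1 - total_hsp / total_genome     # HSP coverage of both genomes
    d4 = 1 - total_ident / total_hsp      # identity within HSPs only
    d6 = 1 - total_ident / total_genome   # identities per genome length
    return d0, d4, d6


# Complete pair of 4 Mbp genomes (made-up HSP totals) ...
complete = gbdp_distances(4_000_000, 4_000_000,
                          HspSummary(3_200_000, 3_040_000),
                          HspSummary(3_200_000, 3_040_000))
# ... and the same pair when only half of genome Y was assembled, so only
# the HSPs within the assembled half can be found.
draft = gbdp_distances(4_000_000, 2_000_000,
                       HspSummary(1_600_000, 1_520_000),
                       HspSummary(1_600_000, 1_520_000))
print(complete)
print(draft)
```

Only d4 stays the same for the draft, because it is normalized by HSP length rather than by genome length; d0 and d6 both increase although the sequences that are present are, by construction, equally similar.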

Answer

For some species, the TYGS database contains the genome sequences of more than one strain deposit (e.g. ATCC and DSM). If a user-provided genome sequence closely matches such a species, all strain deposits of that species are usually included in the TYGS result. The main reason is that the scientific literature reports rare cases in which such deposits unexpectedly differ considerably, indicating strain confusion or contamination (please find an example here). The TYGS is thus an important tool for uncovering such irregularities.

Apart from that, we think that having more than one strain deposit of the same species in the dataset is at most a cosmetic issue, not a scientific problem. If you still want to remove such "duplicates", you can download the trees in Newick format and remove them there. We advise, however, against post-manipulating ready-made results.
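
If you do remove such "duplicates", a tree library is the safer route (for example, Biopython's Bio.Phylo module can read, prune and write Newick trees). For illustration, the following self-contained sketch (all names are hypothetical) removes leaves by label and merges the branch of any node left with a single child into that child, so the remaining leaf keeps its path length. It assumes unquoted labels without spaces and no Newick comments; actual TYGS tree files may need a more complete parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str = ""                   # leaf name or internal support value
    length: Optional[float] = None    # branch length, if present
    children: list[Node] = field(default_factory=list)


def parse_newick(newick: str) -> Node:
    """Parse a simple Newick tree (unquoted labels, no comments)."""
    text = "".join(newick.split())    # drop all whitespace
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1                  # skip "("
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            pos += 1                  # skip ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        node.label = text[start:pos]
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            node.length = float(text[start:pos])
        return node

    return parse_node()


def prune(node: Node, names: set[str]) -> Optional[Node]:
    """Drop leaves labelled with one of `names`; collapse single-child nodes."""
    if not node.children:
        return None if node.label in names else node
    kept = [c for c in (prune(ch, names) for ch in node.children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        only = kept[0]
        if node.length is not None:   # add the collapsed branch to the child
            only.length = (only.length or 0.0) + node.length
        return only
    node.children = kept
    return node


def to_newick(node: Node) -> str:
    def fmt(n: Node) -> str:
        s = "(" + ",".join(fmt(c) for c in n.children) + ")" if n.children else ""
        s += n.label
        if n.length is not None:
            s += ":" + format(n.length, ".10g")
        return s
    return fmt(node) + ";"


def remove_leaves(newick: str, names: set[str]) -> str:
    pruned = prune(parse_newick(newick), names)
    if pruned is None:
        raise ValueError("all leaves were removed")
    return to_newick(pruned)


tree = "((A_DSM_1:0.1,A_ATCC_1:0.2)95:0.05,(B:0.3,C:0.4)80:0.1);"
print(remove_leaves(tree, {"A_ATCC_1"}))  # (A_DSM_1:0.15,(B:0.3,C:0.4)80:0.1);
```

Note that collapsing a node discards its support value, since the split it represented no longer exists in the pruned tree.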

Answer

The TYGS shows such matches because it may be important for any taxonomist to be aware of all closely related species or subspecies, even if their names are not (yet) validly published. Since several criteria have to be met before a new species name is validly published (see details on the LPSN page), the entire process may take some time. For example, a novel species or subspecies name may already have been proposed in an effective publication but not yet announced in a Validation List. In theory, a second team might then start working on the description of the same taxon, resulting in redundant work. Therefore, if your novel strain is placed in the same species or subspecies cluster as a species or subspecies with a not (yet) validly published name, we recommend getting in touch with its authors. An effective publication may be available, and the valid publication of the name may be imminent. Even if a forthcoming validation is unlikely, it is often worth reporting close phylogenetic relationships to taxa that lack a validly published name. Their names occur in databases anyway, and their analyses may yield valuable information.

Answer

Click on the refresh symbol close to "Click to load or refresh tree (page needs to be viewed in https session)" on the tree page.

Answer

For very diverse datasets, the average branch support, even of the genome-based phylogeny, may be low, which is not unusual for such datasets. In general, parts of any phylogeny that are poorly resolved (i.e. have low branch support) cannot be interpreted. For the TYGS, an optional proteome-based GBDP analysis becomes available on user request if the dataset is not too large (fewer than 30 strains) and the average branch support of the genome-based tree is below 60%. If these conditions are met, you will find an order button on the respective TYGS result page below the phylogenies table.
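
The two conditions above can be expressed as a simple check. This is a hypothetical helper, not part of any TYGS interface; it assumes that "average branch support" means the arithmetic mean of the support values (in percent) of the genome-based tree, which the TYGS may compute differently.

```python
def proteome_analysis_offered(n_strains: int, branch_supports: list[float]) -> bool:
    """Return True if both stated conditions for the optional
    proteome-based GBDP analysis hold (supports given in percent)."""
    if not branch_supports:
        raise ValueError("no branch support values given")
    average_support = sum(branch_supports) / len(branch_supports)
    return n_strains < 30 and average_support < 60.0


print(proteome_analysis_offered(12, [40.0, 55.0, 70.0]))  # True (mean 55%)
print(proteome_analysis_offered(30, [40.0, 55.0, 70.0]))  # False (too many strains)
```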

Answer

Yes, the TYGS has an API for the programmatic download of results. Please find a detailed description here.

Answer

In addition to the routine import of taxonomic and genomic information, the user-visible changes that were applied to the database after the initial Nature Communications publication are listed on the News and Changelogs page.