Convert GenBank to FASTA

GenBank/EMBL to FASTA Conversion Tool

Instructions:

This tool accepts a GenBank or EMBL format file and converts it to a FASTA file. You control what kind of sequence is extracted and how the header line is written.

To use it, select a GenBank or EMBL file that contains a feature table, and specify its format. Choose the delimiter characters that will separate the qualifiers from one another in the header line. Then choose whether to extract translated amino acid sequences, the DNA sequence of each feature, or the entire DNA sequence of the whole record. To control how the header line appears, enter the names of the qualifiers, one per field, in the order you want them to appear in the FASTA header line. Common feature and qualifier choices are already filled in, but you may replace them with your own. A description of possible feature keys and qualifiers can be found here.

Stand-Alone Version:

A stand-alone command-line version of this software is available here.

Please process only one file at a time so as not to overload our server. If your input file is larger than a few MB, please use the stand-alone version instead.


General Options:

Input your file:
Please select a file to convert. It must be in GenBank or EMBL format and contain a feature table. (Support for EMBL files is experimental.)
Sample input files: GenBank, EMBL
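To show what "GenBank or EMBL format with a feature table" means in practice, here is a minimal Python sketch that guesses the format from the first line and checks for a feature table. It assumes the usual flat-file conventions: GenBank records start with a LOCUS line and have a FEATURES section, while EMBL records start with an ID line and mark feature lines with the FT line code. The function name detect_format is hypothetical and is not part of this tool.

```python
def detect_format(text):
    """Guess whether text is a GenBank or EMBL flat file and whether it has a feature table.

    Returns (format_name, has_feature_table); format_name is None if unrecognized.
    """
    lines = text.splitlines()
    first = lines[0] if lines else ""
    if first.startswith("LOCUS"):
        # GenBank: the feature table starts with a line beginning "FEATURES".
        return "genbank", any(line.startswith("FEATURES") for line in lines)
    if first.startswith("ID   "):
        # EMBL: feature table lines carry the "FT" line code.
        return "embl", any(line.startswith("FT   ") for line in lines)
    return None, False


print(detect_format("LOCUS       TEST01  24 bp  DNA\nFEATURES             Location/Qualifiers\n//"))
# → ('genbank', True)
```

A file for which this returns False for the second value would have no features to extract.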

Header Item Delimiter Character:
Space Pipe Space ( | )
Tab
Space
Pipe (|)
Dash (-)
Underscore (_)
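The delimiter is simply the string placed between consecutive header items. The Python sketch below illustrates this under a few assumptions: the DELIMITERS mapping and build_header function are hypothetical, "Space Pipe Space" is assumed to mean " | ", and empty qualifier values are assumed to be skipped (the tool may handle them differently).

```python
# Hypothetical mapping from the delimiter menu to the characters inserted
# between header items; " | " for "Space Pipe Space" is an assumption.
DELIMITERS = {
    "Space Pipe Space": " | ",
    "Tab": "\t",
    "Space": " ",
    "Pipe": "|",
    "Dash": "-",
    "Underscore": "_",
}


def build_header(values, delimiter_name):
    """Join qualifier values, in order, into a FASTA header line.

    Empty values are skipped here (an assumption, not confirmed tool behavior).
    """
    sep = DELIMITERS[delimiter_name]
    return ">" + sep.join(v for v in values if v)


print(build_header(["abcX", "hypothetical protein", "", "example_id"], "Space Pipe Space"))
# → >abcX | hypothetical protein | example_id
```

A multi-character delimiter such as " | " keeps the header readable even when qualifier values themselves contain spaces.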


Fill out either the green section or the gray section below, not both.

Extract Individual Features

Output Sequence Type:
Amino Acid sequence
Extracts amino acid sequences from the feature table. This works only if your features include "translation" qualifiers, which is usually the case.
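In a GenBank feature table, the protein sequence is stored as a quoted /translation qualifier that usually wraps over several indented lines. A minimal Python sketch of pulling those values out with a regular expression follows; extract_translations is a hypothetical name, and unlike a real parser it does not associate each value with its feature or its other qualifiers.

```python
import re

# A /translation="..." qualifier; the quoted value may wrap over several lines.
TRANSLATION_RE = re.compile(r'/translation="([^"]*)"')


def extract_translations(feature_table_text):
    """Return every /translation value with line breaks and indentation removed."""
    return [re.sub(r"\s+", "", value) for value in TRANSLATION_RE.findall(feature_table_text)]


sample = '''     CDS             1..21
                     /gene="abcX"
                     /translation="MKVL
                     AG"
'''
print(extract_translations(sample))  # → ['MKVLAG']
```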

Amino Acid sequence translated on the fly
Generates a translation on the fly from the nucleotide sequence, based on the feature's position. This is useful if your file doesn't have "translation" qualifiers in the feature table. Joins are taken into account; in other words, introns are excluded from the exported sequences. RNA editing is not taken into account, and where fuzzy locations are given, we simply pick a position. In cases of RNA editing or fuzzy locations, the DNA sequence used for the translation may differ slightly from the actual cDNA that generated your protein, but it should be close. For this reason, the option above, which takes the amino acid sequence directly from the feature table, is preferable.
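As a rough illustration of on-the-fly translation, the Python sketch below splices the exon segments of a join together and translates the result codon by codon. It is not this tool's code: splice and translate are hypothetical names, only the forward strand is handled, and using the standard genetic code (NCBI table 1) is an assumption; organellar and some other genomes need a different table.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def splice(sequence, segments):
    """Join 1-based inclusive (start, end) segments, leaving out the introns between them."""
    return "".join(sequence[start - 1:end] for start, end in segments)


def translate(nucleotides):
    """Translate codon by codon; stops become '*', codons with ambiguous bases 'X'."""
    nucleotides = nucleotides.upper()
    codons = (nucleotides[i:i + 3] for i in range(0, len(nucleotides) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


record = "ATGAAA" + "CCCCC" + "TTTTAA"    # exon 1, intron, exon 2
cds = splice(record, [(1, 6), (12, 17)])  # like join(1..6,12..17)
print(translate(cds))  # → MKF*
```

Because the intron is dropped before translation, the reading frame carries across the exon boundary; a trailing partial codon is ignored.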

Nucleotide sequence
Extracts the nucleotide sequence of each feature using the position information in the feature table. Joins are taken into account; in other words, introns are excluded from the exported sequences. RNA editing is not taken into account, and where fuzzy locations are given, we simply pick a position. In cases of RNA editing or fuzzy locations, the DNA sequence may differ slightly from the cDNA that generated your protein, but it should be close.
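A minimal Python sketch of extracting a feature's nucleotides from its location string is shown below. It handles only the simple forms "a..b", "join(a..b,c..d,...)" and an outer "complement(...)"; fuzzy markers such as "<" and ">" are dropped so the written position is used as-is, which is one way of "picking a spot". The function names are hypothetical, and real location strings (order(), single bases, nested complements) need a fuller parser.

```python
import re

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract_feature(sequence, location):
    """Return the nucleotides described by a simple GenBank location string.

    Handles "a..b", "join(a..b,c..d,...)" and an outer "complement(...)".
    Fuzzy markers '<' and '>' are dropped, so the stated position is used as-is.
    """
    pairs = re.findall(r"[<>]?(\d+)\.\.[<>]?(\d+)", location)
    spliced = "".join(sequence[int(start) - 1:int(end)] for start, end in pairs)
    if location.startswith("complement("):
        return reverse_complement(spliced)
    return spliced


record = "ATGAAACCCCCTTTTAA"
print(extract_feature(record, "join(1..6,12..17)"))   # → ATGAAATTTTAA
print(extract_feature(record, "complement(<1..>6)"))  # → TTTCAT
```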


Feature to Extract:
CDS
rRNA
tRNA
gene
misc_RNA
other
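Choosing a feature type amounts to keeping only the features with that key. In the GenBank layout, a feature key starts in column 6 and its location in column 22, while qualifier lines are blank in columns 6 to 21. The Python sketch below relies on that layout; features_of_type is a hypothetical name, and locations that wrap onto a second line are not handled.

```python
def features_of_type(feature_table_text, wanted_key):
    """Return the location string of each feature whose key matches wanted_key.

    In the GenBank layout, a feature key starts in column 6 and its location in
    column 22; qualifier lines are blank in columns 6-21.
    """
    locations = []
    for line in feature_table_text.splitlines():
        if not line.startswith("     "):
            continue  # skips "FEATURES", "ORIGIN" and other section headers
        key = line[5:21].strip()
        if key == wanted_key:
            locations.append(line[21:].strip())
    return locations


table = "\n".join([
    f"{'':5}{'gene':<16}1..21",
    f'{"":21}/gene="abcX"',
    f"{'':5}{'CDS':<16}1..21",
    f"{'':5}{'tRNA':<16}complement(30..101)",
])
print(features_of_type(table, "tRNA"))  # → ['complement(30..101)']
```

Any other key, such as one entered under "other", would be matched the same way.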

Select qualifiers to use in the FASTA header line:
First Qualifier:
Second Qualifier:
Third Qualifier:
Fourth Qualifier:
Fifth Qualifier:
Sixth Qualifier:
A note about location: the coordinates given are for the whole feature, including introns. If you want the exact location as given in the GenBank file, use "location_long".
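The difference between the two can be pictured as collapsing the exact location string to its outermost positions. The Python sketch below does that; whole_feature_span is a hypothetical name, and the "start..end" output format is an assumption about how "location" is written, not confirmed tool behavior. Note that this collapsed form also loses the strand and any fuzzy markers.

```python
import re


def whole_feature_span(location_long):
    """Collapse a location string to 'start..end', spanning any introns."""
    positions = [int(p) for p in re.findall(r"\d+", location_long)]
    return f"{min(positions)}..{max(positions)}"


exact = "complement(join(<100..250,400..>612))"
print(exact)                      # the exact location, as in the GenBank file
print(whole_feature_span(exact))  # → 100..612
```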


Extract Whole Sequence

Extracts the DNA sequence of the whole record rather than of individual features. The FASTA header line will be the organism name.


Custom Header:
If you wish, you may specify a custom header line for your FASTA file. Omit the leading ">" character.
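Whole-record extraction needs only the organism name and the ORIGIN section of the record. The Python sketch below reads both from a single GenBank record, uses the organism as the header unless a custom header is supplied (the ">" is added by the code, which is why it is omitted from the custom header), and wraps the sequence at a fixed width. The function name and the 70-character default width are assumptions for illustration.

```python
def whole_record_fasta(genbank_text, custom_header=None, width=70):
    """Build a FASTA entry for the entire sequence of a single GenBank record."""
    organism = None
    chunks = []
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Each sequence line is a position number followed by blocks of bases.
            chunks.append("".join(line.split()[1:]))
        elif organism is None and line.strip().startswith("ORGANISM"):
            organism = line.strip()[len("ORGANISM"):].strip()
    sequence = "".join(chunks).upper()
    header = custom_header or organism or ""
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + wrapped)


sample = """LOCUS       TEST01                    24 bp    DNA     linear   UNK
SOURCE      Example organism
  ORGANISM  Example organism
ORIGIN
        1 atgaaaccgg gttttaatga tttt
//
"""
print(whole_record_fasta(sample))
# >Example organism
# ATGAAACCGGGTTTTAATGATTTT
```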





Support for the development of Convert GenBank to FASTA was provided by the National Science Foundation under Grant EF-0523756 to G. Rocap and R. A. Cattolico.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Genbank to FASTA by Cedar McKay and Gabrielle Rocap, UW Oceanography, is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.