GeneTrans extracts and/or translates coding regions as defined in the feature table of sequences stored in the EMBL or Genbank databases.
GeneTrans is not limited to translating one gene of one sequence at a time; it can translate all genes from a group of sequences, or even from a whole database, in one step. Because the program creates one file for each gene, a single call can easily produce hundreds of files.
The program asks for the database sequence(s) and for the coding sequence(s) to be translated. Coding sequences are defined by the CDS feature key within the feature table; if you want to translate, for example, the coding sequence of the third CDS entry, just enter 3. You may enter an asterisk (*) to translate all coding sequences.
With the -DNA parameter, the program will simply extract the coding sequences but not translate them.
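As a rough illustration of what extraction and translation amount to, the following Python sketch cuts a coding region out of a nucleotide string using 1-based, inclusive feature-table coordinates and translates it with the standard genetic code. It is not GeneTrans's own implementation: the function names extract_cds and translate and the toy sequence are invented for this example, and only simple forward-strand locations are handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def extract_cds(sequence, start, end):
    """Return the region start..end (1-based, inclusive) in upper case."""
    return sequence[start - 1:end].upper()


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


toy = "ccATGGCCTAAgg"            # hypothetical sequence, CDS at 3..11
cds = extract_cds(toy, 3, 11)
print(cds)                       # ATGGCCTAA  (what an extract-only run would keep)
print(translate(cds))            # MA*
```

In this sketch the stop codon is written as an asterisk, as at the end of the sample peptide output; skipping the call to translate corresponds to extracting without translating.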
This program was written by Weiyun Chen and Karl-Heinz Glatting at the German Cancer Research Centre (DKFZ), Heidelberg, Germany.
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Here is a session using GeneTrans to extract and translate the second gene out of a sequence from the EMBL data library called ecpatabc:

% genetrans

 GENETRANS extracts and/or translates coding regions as defined in the
 feature table of sequences stored in the EMBL or Genbank database.

 Translate gene(s) from what sequence(s) ?  embl:ecpatabc

 Translate what protein coding sequence ? Enter * for all. (* 1 *) ?  2

 What should I call the output file (* ecpatabc.pep *) ?

%
Here is some of the output file ecpatabc.pep, which shows common information about the database sequence, the corresponding part of the feature table and the translated coding sequence:

GeneTrans of EMPRO:ECPATABC

ID   ECPATABC   standard; DNA; PRO; 4385 BP.
AC   M64519;
DT   26-APR-1991 (Rel. 28, Created)
DT   14-AUG-1991 (Rel. 29, Last updated, Version 2)
DE   E.coli transport protein (potA, potB, potC and potD) genes,
 . . .
FT   CDS             1529..2356
FT                   /product="transport protein" /gene="potB"
FT                   /codon_start=1

ecpatabc.pep  Length: 276  August 18, 1992 13:59  Check: 5233  ..

       1  VIVTIVGWLV LFVFLPNLMI IGTSFLTRDD ASFVKMVFTL DNYTRLLDPL
      51  YFEVLLHSLN MALIATLACL VLGYPFAWFL AKLPHKVRPL LLFLLIVPFW
     101  TNSLIRIYGL KIFLSTKGYL NEFLLWLGVI DTPIRIMFTP SAVIIGLVYI
     151  LLPFMVMPLY SSIEKLDKPL LEAARDLGAS KLQTFIRIII PLTMPGIIAG
     201  CLLVMLPAMG LFYVSDLMGG AKNLLIGNVI KVQFLNIRDW PFGAATSITL
     251  TIVMGLMLLV YWRASRLLNK KVELE*
Here is a part of the example file embl:ecpatabc showing the feature table (FT) that gives detailed information about specific parts of the sequence:

ID   ECPATABC   standard; DNA; PRO; 4385 BP.
AC   M64519;
DT   26-APR-1991 (Rel. 28, Created)
DT   14-AUG-1991 (Rel. 29, Last updated, Version 2)
DE   E.coli transport protein (potA, potB, potC and potD) genes,
DE   complete cds.
////////////////////////////////////////////////////////////////////////////////
FH   Key             Location/Qualifiers
FH
FT   CDS             379..1515
FT                   /product="transport protein" /gene="potA"
FT                   /codon_start=1
FT   CDS             1529..2356
FT                   /product="transport protein" /gene="potB"
FT                   /codon_start=1
FT   CDS             2353..3147
FT                   /product="transport protein" /gene="potC"
FT                   /codon_start=1
FT   CDS             3144..4190
FT                   /product="transport protein" /gene="potD"
FT                   /codon_start=1
FT   sig_peptide     3144..3212
FT                   /gene="potD" /codon_start=1
FT   mat_peptide     3213..4187
FT                   /product="transport protein" /gene="potD"
FT                   /codon_start=1
////////////////////////////////////////////////////////////////////////////////

In our example session we chose the second coding sequence from position 1529 to 2356 to be translated. If you select all coding sequences, GeneTrans numbers all CDS entries from 1 through N and creates one file for each CDS entry with the extensions .pep.1 through .pep.N, respectively.
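The numbering of CDS entries and the resulting file names can be sketched in Python. The parser here is a simplification that only understands plain start..end locations (no join or complement operators), and the helper names cds_locations and output_names are hypothetical; how GeneTrans itself reads the feature table is not described in this document.

```python
import re

# Matches feature-table lines such as "FT   CDS             1529..2356".
CDS_LINE = re.compile(r"^FT\s+CDS\s+(\d+)\.\.(\d+)\s*$")


def cds_locations(entry_text):
    """Return (start, end) for each CDS key, in the order the keys appear."""
    locations = []
    for line in entry_text.splitlines():
        match = CDS_LINE.match(line)
        if match:
            locations.append((int(match.group(1)), int(match.group(2))))
    return locations


def output_names(base, count):
    """File names with extensions .pep.1 through .pep.N (base name assumed)."""
    return [f"{base}.pep.{n}" for n in range(1, count + 1)]


FEATURE_TABLE = """\
FT   CDS             379..1515
FT                   /product="transport protein" /gene="potA"
FT   CDS             1529..2356
FT                   /product="transport protein" /gene="potB"
FT   CDS             2353..3147
FT                   /product="transport protein" /gene="potC"
FT   CDS             3144..4190
FT                   /product="transport protein" /gene="potD"
FT   sig_peptide     3144..3212
"""

found = cds_locations(FEATURE_TABLE)
for number, (start, end) in enumerate(found, start=1):
    codons = (end - start + 1) // 3
    print(number, start, end, codons)   # entry 2 spans 828 bases, i.e. 276 codons

print(output_names("ecpatabc", len(found)))
```

Entry 2 covers 2356 - 1529 + 1 = 828 bases, or 276 codons, which agrees with the Length: 276 reported in the sample output (the count includes the terminating asterisk).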
Translate translates nucleotide sequences into peptide sequences.
BackTranslate creates a nucleotide sequence from an amino acid sequence. The output helps design synthetic probes.
ExtractPeptide writes a peptide sequence from one or more of the translation frames displayed in the output from Map. Translate supersedes ExtractPeptide for most applications.
Map displays both strands of a DNA sequence with a restriction map shown above the sequence and possible protein translations shown below.
GeneTrans creates a separate file for each extracted gene, so it is very easy to create hundreds of files at a time. Be careful with searches through whole databases or large parts of them!
All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
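The capitalization convention can be illustrated with a short Python sketch. It assumes that a typed qualifier is accepted when it contains at least the capitalized letters and is otherwise a prefix of the full name, compared without regard to case; the source states only that the capitalized letters must be typed, so the rest of that rule and both function names are assumptions made for this example.

```python
def required_prefix(qualifier):
    """Return the part that must be typed, e.g. '-OUTfile' -> '-OUT'."""
    end = 1  # skip the leading '-'
    while end < len(qualifier) and qualifier[end].isupper():
        end += 1
    return qualifier[:end]


def matches(typed, qualifier):
    """Assumed rule: typed text covers the required prefix and stays a prefix."""
    typed_l, full_l = typed.lower(), qualifier.lower()
    needed = required_prefix(qualifier).lower()
    return typed_l.startswith(needed) and full_l.startswith(typed_l)


for name in ("-INfile", "-OUTfile", "-TRANSlate", "-CHEck", "-DNA"):
    print(name, "->", required_prefix(name))

print(matches("-out", "-OUTfile"))    # True
print(matches("-ou", "-OUTfile"))     # False, too short
print(matches("-outx", "-OUTfile"))   # False, not a prefix of the full name
```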
Minimum Syntax: % genetrans [-INfile=]embl:ecpatabc -Default

Required parameters:

[-OUTfile=]seqname.pep      name of the output file
-CDS=1                      select coding sequence number 1
                            (specify 0 for all)

Local Data Files:

-TRANSlate=translate.txt    contains the translation scheme

Optional Switches:

-DNA                        do not translate the sequence; extract only
GeneTrans was written by Karl-Heinz Glatting (DKFZ).
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
The translation of codons to amino acids, the identification of potential start codons and stop codons, and the mappings of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in the Data Files manual.
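To show why an alternative translation table matters, here is a small Python sketch that translates the same nucleotides under the standard code and under the vertebrate mitochondrial code. It models a table as a plain codon-to-letter dictionary; this says nothing about the actual layout of translate.txt, which is covered in the Data Files manual, and the names used are invented for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
STANDARD = dict(zip(
    ("".join(codon) for codon in product(BASES, repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))

# Vertebrate mitochondrial code: four codons differ from the standard code.
VERTEBRATE_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def translate(dna, table):
    """Translate complete codons with the given table; unknown codons -> 'X'."""
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


fragment = "ATGTGAATA"                        # hypothetical nucleotides
print(translate(fragment, STANDARD))          # M*I
print(translate(fragment, VERTEBRATE_MITO))   # MWM
```

With the standard table the TGA codon is read as a stop, whereas the mitochondrial table reads it as tryptophan, so choosing the wrong table would misreport the peptide.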
The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
-DNA

extracts the gene only, without translating it.
Printed: April 22, 1996 15:53 (1162)