Genetrans

GENETRANS

FUNCTION

GeneTrans extracts and/or translates coding regions as defined in the feature table of sequences stored in the EMBL or GenBank databases.

GeneTrans is not limited to one gene of one sequence at a time. In a single run it can translate all genes from a group of sequences, or even from a whole database. Because the program writes a separate file for each gene, however, one call can easily produce hundreds of files.

The program prompts for the database sequence(s) and for the coding sequence(s) to translate. Coding sequences are the entries marked with the CDS feature key in the feature table; to translate, for example, the third CDS entry, enter 3. Enter an asterisk (*) to translate all coding sequences.
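To make the numbering concrete, here is a minimal Python sketch of how CDS entries could be counted and selected from feature-table lines. It is not GeneTrans's own code: the helper names cds_locations and select_cds are hypothetical, and the sketch assumes simple forward-strand locations of the form start..end (joins, complement() and partial locations are ignored).

```python
import re

# Matches a simple forward-strand CDS line such as "FT   CDS             1529..2356".
# Assumption: joins, complement() and partial (<, >) locations are not handled.
CDS_LINE = re.compile(r"^FT   CDS\s+(\d+)\.\.(\d+)\s*$")


def cds_locations(ft_lines):
    """Return (start, end) for every CDS feature, in file order (1-based)."""
    locations = []
    for line in ft_lines:
        match = CDS_LINE.match(line)
        if match:
            locations.append((int(match.group(1)), int(match.group(2))))
    return locations


def select_cds(locations, answer):
    """Mimic the prompt: "*" selects every CDS, "N" selects the Nth one."""
    answer = answer.strip()
    if answer == "*":
        return list(locations)
    number = int(answer)
    if not 1 <= number <= len(locations):
        raise ValueError(f"CDS number must be between 1 and {len(locations)}")
    return [locations[number - 1]]


if __name__ == "__main__":
    table = [
        "FT   CDS             379..1515",
        'FT                   /gene="potA"',
        "FT   CDS             1529..2356",
        'FT                   /gene="potB"',
        "FT   sig_peptide     3144..3212",
    ]
    locs = cds_locations(table)
    print(select_cds(locs, "2"))       # [(1529, 2356)]
    print(len(select_cds(locs, "*")))  # 2
```

Non-CDS keys such as sig_peptide are skipped, so they never shift the numbering.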

With the -DNA switch, the program extracts the coding sequences without translating them.
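The difference between extracting and translating can be sketched as follows. This Python example is an illustration under stated assumptions, not the program's implementation: it uses the standard genetic code, handles only forward-strand start..end locations plus a /codon_start offset, and the names extract_cds, translate and gene_trans (whose dna_only flag stands in for -DNA) are hypothetical.

```python
# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def extract_cds(seq, start, end, codon_start=1):
    """Cut out a forward-strand CDS given a 1-based, inclusive start..end."""
    return seq[start - 1:end][codon_start - 1:].upper()


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(CODE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def gene_trans(seq, start, end, codon_start=1, dna_only=False):
    """dna_only=True plays the role of -DNA: extract, but do not translate."""
    coding = extract_cds(seq, start, end, codon_start)
    return coding if dna_only else translate(coding)


if __name__ == "__main__":
    toy = "ccATGGCTTGGTAAgg"  # made-up sequence with a CDS at 3..14
    print(gene_trans(toy, 3, 14, dna_only=True))  # ATGGCTTGGTAA
    print(gene_trans(toy, 3, 14))                 # MAW*
```

The stop codon is kept as "*", matching the trailing "*" in the example output below.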

AUTHOR

This program was written by Weiyun Chen and Karl-Heinz Glatting at the German Cancer Research Centre (DKFZ), Heidelberg, Germany.

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).

EXAMPLE

Here is a session using GeneTrans to extract and translate the second gene from the EMBL sequence ecpatabc:

  
  
  % genetrans
  
  GENETRANS extracts and/or translates coding regions as defined
  in the feature table of sequences stored in the EMBL or
  Genbank database.
  
  
  Translate gene(s) from what sequence(s) ?  embl:ecpatabc
  
   Translate what protein coding sequence ?
     Enter * for all. (* 1 *) ?  2
  
   What should I call the output file (* ecpatabc.pep *) ?
  
  %

OUTPUT

Here is part of the output file ecpatabc.pep, which shows general information about the database sequence, the corresponding part of the feature table, and the translated coding sequence:

  
  
  GeneTrans of EMPRO:ECPATABC
  
  ID   ECPATABC   standard; DNA; PRO; 4385 BP.
  AC   M64519;
  DT   26-APR-1991 (Rel. 28, Created)
  DT   14-AUG-1991 (Rel. 29, Last updated, Version 2)
  DE   E.coli transport protein (potA, potB, potC and potD) genes, . . .
  FT   CDS             1529..2356
  FT                   /product="transport protein" /gene="potB"
  FT                   /codon_start=1
  
  ecpatabc.pep  Length: 276  August 18, 1992 13:59  Check: 5233  ..
  
    1  VIVTIVGWLV LFVFLPNLMI IGTSFLTRDD ASFVKMVFTL DNYTRLLDPL
  
   51  YFEVLLHSLN MALIATLACL VLGYPFAWFL AKLPHKVRPL LLFLLIVPFW
  
  101  TNSLIRIYGL KIFLSTKGYL NEFLLWLGVI DTPIRIMFTP SAVIIGLVYI
  
  151  LLPFMVMPLY SSIEKLDKPL LEAARDLGAS KLQTFIRIII PLTMPGIIAG
  
  201  CLLVMLPAMG LFYVSDLMGG AKNLLIGNVI KVQFLNIRDW PFGAATSITL
  
  251  TIVMGLMLLV YWRASRLLNK KVELE*
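The residue listing above uses 50 residues per line in blocks of ten, each line starting with the position of its first residue. A small, hypothetical Python formatter that reproduces that layout (without the header line or the blank lines between rows) might look like this:

```python
def format_peptide(peptide, per_line=50, per_block=10):
    """Right-aligned start position, two spaces, then space-separated blocks."""
    lines = []
    for offset in range(0, len(peptide), per_line):
        chunk = peptide[offset:offset + per_line]
        blocks = [chunk[i:i + per_block] for i in range(0, len(chunk), per_block)]
        lines.append(f"{offset + 1:>5}  " + " ".join(blocks))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_peptide("VIVTIVGWLVLFVFLPNLMIIGTSFLTRDDASFVKMVFTLDNYTRLLDPLYFEV"))
```

Applied to the 276-residue translation above (828 bp from 1529 to 2356, divided by 3), the last line comes out as "  251  TIVMGLMLLV YWRASRLLNK KVELE*".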

FEATURE TABLE

Here is part of the example entry embl:ecpatabc, showing the feature table (FT) lines that describe specific parts of the sequence:

  
  
  
  ID   ECPATABC   standard; DNA; PRO; 4385 BP.
  AC   M64519;
  DT   26-APR-1991 (Rel. 28, Created)
  DT   14-AUG-1991 (Rel. 29, Last updated, Version 2)
  DE   E.coli transport protein (potA, potB, potC and potD) genes,
  DE   complete cds.
  
  ////////////////////////////////////////////////////////////////////////////////
  
  FH   Key             Location/Qualifiers
  FH
  FT   CDS             379..1515
  FT                   /product="transport protein" /gene="potA"
  FT                   /codon_start=1
  FT   CDS             1529..2356
  FT                   /product="transport protein" /gene="potB"
  FT                   /codon_start=1
  FT   CDS             2353..3147
  FT                   /product="transport protein" /gene="potC"
  FT                   /codon_start=1
  FT   CDS             3144..4190
  FT                   /product="transport protein" /gene="potD"
  FT                   /codon_start=1
  FT   sig_peptide     3144..3212
  FT                   /gene="potD" /codon_start=1
  FT   mat_peptide     3213..4187
  FT                   /product="transport protein" /gene="potD"
  FT                   /codon_start=1
  
  ////////////////////////////////////////////////////////////////////////////////

In the example session we chose the second coding sequence, from position 1529 to 2356. If you select all coding sequences, GeneTrans numbers the CDS entries from 1 through N and creates one file per entry, with the extensions .pep.1 through .pep.N, respectively.
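The naming rule can be sketched as follows. Only CDS keys are numbered, so in the feature table above the sig_peptide and mat_peptide entries do not count and "all" would give four files. This is a hypothetical Python helper (output_file_names is not part of GCG), and it covers only the translated .pep case; the file names GeneTrans uses with -DNA are not described here.

```python
def output_file_names(base, feature_keys, select_all=True, cds_number=1):
    """Hypothetical helper: name files base.pep.1 ... base.pep.N for N CDS keys."""
    n_cds = sum(1 for key in feature_keys if key == "CDS")
    if select_all:
        return [f"{base}.pep.{i}" for i in range(1, n_cds + 1)]
    if not 1 <= cds_number <= n_cds:
        raise ValueError(f"there is no CDS entry number {cds_number}")
    # A single selection uses the default name offered in the example session.
    return [f"{base}.pep"]


if __name__ == "__main__":
    keys = ["CDS", "CDS", "CDS", "CDS", "sig_peptide", "mat_peptide"]
    print(output_file_names("ecpatabc", keys))
    print(output_file_names("ecpatabc", keys, select_all=False, cds_number=2))
```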

RELATED PROGRAMS

Translate translates nucleotide sequences into peptide sequences.

BackTranslate creates a nucleotide sequence from an amino acid sequence; its output can help you design synthetic probes.

ExtractPeptide writes a peptide sequence from one or more of the translation frames displayed in the output from Map. Translate supersedes ExtractPeptide for most applications.

Map displays both strands of a DNA sequence with a restriction map shown above the sequence and possible protein translations shown below.

CONSIDERATIONS

GeneTrans creates a separate file for each extracted gene, so a single run can easily create hundreds of files. Be careful when processing whole databases or large parts of them!
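Before running GeneTrans with * over many entries, you may want to estimate how many files it would write. One way is to count the CDS lines per entry in a flat file. The Python sketch below assumes EMBL-style line prefixes ("ID   ", "FT   CDS ", and "//" as the entry terminator); the function name cds_counts is hypothetical and not part of GCG.

```python
def cds_counts(flat_file_text):
    """Return [(entry_id, number_of_CDS_features), ...] for an EMBL-style flat file."""
    counts = []
    entry_id, n_cds = None, 0
    for line in flat_file_text.splitlines():
        if line.startswith("ID   "):
            # "ID   ECPATABC   standard; ..." -> "ECPATABC"
            entry_id, n_cds = line.split()[1], 0
        elif line.startswith("FT   CDS "):
            n_cds += 1
        elif line.startswith("//"):
            counts.append((entry_id, n_cds))
            entry_id, n_cds = None, 0
    return counts


if __name__ == "__main__":
    with open("entries.embl") as handle:  # hypothetical input file
        per_entry = cds_counts(handle.read())
    print(sum(n for _, n in per_entry), "output files would be created")
```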

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to add further parameters to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
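The abbreviation convention can be illustrated with a short Python sketch: -OUTfile may be shortened to -OUT but not to -OU. The rule encoded here is an assumption (you must type at least the capitalized letters, and any extra letters must agree with the rest of the name, ignoring case); the functions are hypothetical and are not GCG's own parser.

```python
def minimum_abbreviation(qualifier):
    """Leading capital letters of a documented name, e.g. "OUTfile" -> "OUT"."""
    prefix = []
    for ch in qualifier:
        if not ch.isupper():
            break
        prefix.append(ch)
    return "".join(prefix)


def matches(typed, qualifier):
    """Assumed rule: at least the capitalized letters, and no letters that disagree."""
    # Drop the leading "-" and any "=value" part, then compare case-insensitively.
    name = typed.lstrip("-").split("=", 1)[0].upper()
    return name.startswith(minimum_abbreviation(qualifier)) and qualifier.upper().startswith(name)


if __name__ == "__main__":
    for typed in ["-out=x.pep", "-ou", "-outfile", "-trans=mycode.txt", "-dna"]:
        hits = [q for q in ["OUTfile", "TRANSlate", "DNA", "Default"] if matches(typed, q)]
        print(typed, "->", hits)
```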

  
  
  Minimum Syntax:  % genetrans [-INfile=]embl:ecpatabc -Default
  
  Required parameters:
  
  [-OUTfile=]seqname.pep    name of the output file
  -CDS=1                    select coding sequence number 1
                            (specify 0 for all)
  
  Local Data Files:
  
  -TRANSlate=translate.txt  contains the translation scheme
  
  Optional Switches:
  
  -DNA                      do not translate the sequence;
                            extract only
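If you drive GeneTrans from a script, the parameters above can be assembled into an argument list. The Python sketch below only builds and prints the command; it does not run it. The wrapper name genetrans_command is hypothetical, and whether a given installation accepts exactly this combination and ordering of qualifiers is an untested assumption.

```python
import shlex


def genetrans_command(infile, outfile=None, cds=1, dna_only=False, translate_file=None):
    """Hypothetical wrapper: build a genetrans argument list (cds=0 means all)."""
    args = ["genetrans", f"-INfile={infile}"]
    if outfile:
        args.append(f"-OUTfile={outfile}")
    args.append(f"-CDS={cds}")
    if translate_file:
        args.append(f"-TRANSlate={translate_file}")
    if dna_only:
        args.append("-DNA")
    args.append("-Default")  # accept defaults for anything not given explicitly
    return args


if __name__ == "__main__":
    print(shlex.join(genetrans_command("embl:ecpatabc", "ecpatabc.pep", cds=2)))
    # To execute it you could pass the list to subprocess.run(...).
```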

ACKNOWLEDGEMENTS

GeneTrans was written by Karl-Heinz Glatting (DKFZ).

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information, see Chapter 4, Using Data Files, in the User's Guide.
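The lookup order can be sketched in Python. Two points are assumptions, not statements from this manual: the public directory path is made up, and when both a local file and a command-line file exist, the sketch lets the command-line file win. The function resolve_data_file is hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical location of the public GCG data directory.
PUBLIC_DATA_DIR = Path("/usr/local/gcg/data")


def resolve_data_file(name, command_line_file=None, cwd=".", public_dir=PUBLIC_DATA_DIR):
    # Assumed precedence: a file named on the command line wins over everything else.
    if command_line_file:
        return Path(command_line_file)
    # Next, a file with exactly the same name in the working directory.
    local = Path(cwd) / name
    if local.is_file():
        return local
    # Otherwise, fall back to the public copy.
    return public_dir / name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        print(resolve_data_file("translate.txt", cwd=work))  # public copy
        (Path(work) / "translate.txt").write_text("local copy\n")
        print(resolve_data_file("translate.txt", cwd=work))  # local copy
        print(resolve_data_file("translate.txt", "mycode.txt", cwd=work))  # mycode.txt
```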

The translation of codons to amino acids, the identification of potential start codons and stop codons, and the mappings of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in the Data Files manual.
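The effect of swapping in a different genetic code can be illustrated without reproducing the format of translate.txt, which is not shown here. This Python sketch starts from the standard code and applies the four codon reassignments of the vertebrate mitochondrial code (NCBI table 2) as overrides; the helper names are hypothetical, and the sketch does not read or write GCG translation files.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated with bases in T, C, A, G order.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(amino_acids=STANDARD):
    """Map each of the 64 codons to its one-letter amino acid (or * for stop)."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, amino_acids))


def with_overrides(table, overrides):
    """Return a copy of `table` with some codons reassigned."""
    modified = dict(table)
    modified.update(overrides)
    return modified


def translate(dna, table):
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


# Vertebrate mitochondrial code: TGA = Trp, ATA = Met, AGA/AGG = stop.
VERTEBRATE_MITO = {"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}

if __name__ == "__main__":
    standard = codon_table()
    mito = with_overrides(standard, VERTEBRATE_MITO)
    print(translate("ATATGA", standard))  # I*
    print(translate("ATATGA", mito))      # MW
```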

OPTIONAL PARAMETERS

The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

-DNA

Extracts the coding sequence only, without translating it.

Printed: April 22, 1996 15:53 (1162)