Etostaden

ETOSTADEN


FUNCTION

EToStaden writes a GCG sequence into a file in Staden format. If the file contains a nucleotide sequence, the ambiguity codes are converted as shown in Appendix III of the GCG Program Manual. EToStaden is a version of GCG's ToStaden with command line control.


DESCRIPTION

Any sequence file in GCG format can be converted with EToStaden into a format suitable for use in the Staden programs. If the sequence is a nucleic acid sequence, the compatible ambiguity codes are converted from the IUB-IUPAC versions to Staden's versions. You can see how they are converted in Appendix III or in the example below.
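The conversion can be illustrated with a short Python sketch. This is not the EToStaden source. The table is only a partial mapping, inferred from the example input and output on this page (X and H become "-", M becomes 5, W becomes 7, S becomes 8, while R and Y are unchanged). The entry K to 6 is an assumption based on Staden's usual code set and is not directly verifiable from the example. Appendix III remains the authoritative table. The function name is hypothetical.

```python
# Partial IUB-IUPAC -> Staden symbol table, inferred from the example on
# this page. Appendix III of the GCG Program Manual is authoritative.
STADEN_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "R", "Y": "Y",          # unchanged in the example output
    "M": "5", "W": "7", "S": "8",  # seen in the example output
    "K": "6",                    # assumption: not isolated in the example
}


def to_staden_symbols(sequence: str) -> str:
    """Convert sequence symbols to Staden codes (illustrative sketch).

    Whitespace is skipped, lowercase is folded to uppercase, and any
    symbol without a table entry becomes '-'. The real program appears
    to drop some symbols entirely; this sketch does not reproduce that.
    """
    converted = []
    for symbol in sequence:
        if symbol.isspace():
            continue
        converted.append(STADEN_CODES.get(symbol.upper(), "-"))
    return "".join(converted)


if __name__ == "__main__":
    # Line 201 of the example input file
    print(to_staden_symbols(
        "AAGGCGXAGR MGAMGGMGRM GXTCTTCCTC ATCGAGTAGC TCXAGYWSXA"
    ))
```

Run on line 201 of the example input, the sketch reproduces the corresponding line of the example output. It is only meant to make the substitution rule concrete, not to replace the program.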


AUTHOR

This GCG program was modified by Jaakko Hattula (Tampere University of Technology, Finland) and Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).


EXAMPLE

Here is a session using EToStaden to convert the sequence file test2.seq (which you can fetch as EGenDocData:test2.seq) into a Staden-format file:

  
  
  % etostaden
  
   ETOSTADEN uses any sequence data
  
   ETOSTADEN of what sequence ?  test2.seq
  
              Start (* 1 *) ?
             End (*  389 *) ?
  
   What should I call the output file (* test2.sdn *) ?
  
  %
  
  


OUTPUT

Here is the output file test2.sdn:

  
  
  GCTGCCGCAGCGGC-GATGACAATAACRAYTGTTGCTGYGATGACGAYGA
  AGAGGARTTTTTCTTYGGTGGCGGAGGGGG-CATCACCAYATTATCATAA
  T-AAAAAGAARTTGTTACTTCTCCTACTGTTRCT-YTAYTGYTRYT-ATG
  AATAACAAYCCTCCCCCACCGCC-CAACAGCARCGTCGCCGACGGCGGAG
  AAGGCG-AGR5GA5GG5GR5G-TCTTCCTCATCGAGTAGCTC-AGY78-A
  CTACCACAACGAC-GTTGTCGTAGTGGT-TGG---TATTACTAYGAAGAG
  CAACAG8ARTAATAGTGATARTRATRRA-C---G--6-5---R8TT-7-Y
  ---A-C---G--6-5---R8TT-7-Y---
  


RELATED PROGRAMS

The following programs convert sequences between other formats and GCG format: FromEMBL, FromGenBank, FromIG, FromPIR, FromStaden, FromFastA, ToIG, ToPIR, ToStaden and ToFastA.

DataSet creates a GCG data library from any set of sequences in GCG format. ToBLAST creates a database that can be searched by the BLAST program from any set of sequences in GCG format.


RESTRICTIONS

The GCG and Staden ambiguity codes are not all strictly comparable, so some codes have no exact equivalent after conversion. All documentation and numbering are lost in the Staden-format output file. Make sure that the Staden program you intend to use is compatible with any ambiguity codes used in your sequence.


FILES USED

Here is the input file for the example above:

  
  
  This sequence contains every symbol in the alphabet of
  legitimate GCG sequence characters (Appendix III).
  
  Test.Seq  Length: 389  July 19, 1994 15:05  Type: N  Check: 8468  ..
  
    1
       >starts with the codons from appendix iii>
       GCTGCCGCAG CGGCXGATGA CAATAACRAY TGTTGCTGYG ATGACGAYGA
  
   51  AGAGGARTTT TTCTTYGGTG GCGGAGGGGG XCATCACCAY ATTATCATAA
  
  101  THAAAAAGAA RTTGTTACTT CTCCTACTGT TRCTXYTAYT GYTRYTXATG
  
  151  AATAACAAYC CTCCCCCACC GCCXCAACAG CARCGTCGCC GACGGCGGAG
  
  201  AAGGCGXAGR MGAMGGMGRM GXTCTTCCTC ATCGAGTAGC TCXAGYWSXA
  
  251  CTACCACAAC GACXGTTGTC GTAGTGGTXT GGXXXTATTA CTAYGAAGAG
  
  301  CAACAGSART AATAGTGATA RTRATRR
                                    >continues with all
       uppercase sequence characters>
                                    ABC DEFGHIJKLM NOPQRSTUVW
  
  351  XYZ.+@&*ab cdefghijkl mnopqrstuv wxyz*@&+.
  
                                                 
 


SEQUENCE TYPE

The function of EToStaden depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.
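As a rough illustration of the type check described above, the following Python sketch reads the Type: field from the last heading line of a GCG file. It assumes that this line is the first one ending in "..", as in the example input file on this page. The function name is hypothetical and this is not how the GCG programs themselves are implemented.

```python
import re
from typing import Optional


def gcg_sequence_type(text: str) -> Optional[str]:
    """Return 'N' or 'P' from a GCG heading, or None if no type is found.

    Assumption: the last heading line is the first line that ends
    with '..', as in the example input file.
    """
    for line in text.splitlines():
        if line.rstrip().endswith(".."):
            match = re.search(r"\bType:\s*([NP])\b", line)
            return match.group(1) if match else None
    return None


if __name__ == "__main__":
    heading = (
        "This sequence contains every symbol in the alphabet of\n"
        "legitimate GCG sequence characters (Appendix III).\n"
        "\n"
        "Test.Seq  Length: 389  July 19, 1994 15:05  Type: N  Check: 8468  ..\n"
    )
    print(gcg_sequence_type(heading))
```

A result of None would indicate a file whose type has not been set, which is the case Appendix VI addresses.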


COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

  
  
  Minimal Syntax: % etostaden [-INfile1=]test.seq -Default
  
  Prompted Parameters:
  
  -BEGin=1 -END=389         range of interest
  [-OUTfile1=]test.sdn      output file name
  
  Local Data Files: None
  
  Optional Parameters: None
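
The capitalization rule for qualifier names can be made concrete with a small Python sketch. It assumes only what is stated above: the leading capitalized letters of a qualifier are the letters that must be typed. The helper name is hypothetical and is not part of GCG.

```python
def minimal_qualifier(qualifier: str) -> str:
    """Return the shortest form of a qualifier under the capitalization rule.

    The leading run of uppercase letters after the '-' is treated as the
    required part, e.g. '-BEGin' -> '-BEG'.
    """
    name = qualifier.lstrip("-")
    required = ""
    for character in name:
        if not character.isupper():
            break
        required += character
    return "-" + required


if __name__ == "__main__":
    # Qualifier names taken from the summary above
    for name in ("-INfile1", "-BEGin", "-END", "-OUTfile1", "-Default", "-CHEck"):
        print(name, "->", minimal_qualifier(name))
```

Under this reading, -OUTfile1 may be shortened to -OUT and -Default to -D, while -END must be typed in full.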
  
  


LOCAL DATA FILES

None.


OPTIONAL PARAMETERS

None.

Printed: April 22, 1996 15:53 (1162)