EGCG newfeatures

NEWFEATURES*

FUNCTION

NewFeatures is an interactive editor for entering and modifying the feature table and for minor editing of the sequence itself.

DESCRIPTION

NewFeatures uses the screen of your terminal as a window into a data library entry. It works like the VMS text editor EDT or the GCG sequence editing program SeqEd. You can add, modify and delete feature table items by either typing the sequence positions or by searching for known short sequences and pointing to the correct feature position with the cursor.

You can also update the sequence if it is found to be incorrect. Changes you make in the sequence take place at the cursor position and are reflected immediately on the screen. Feature positions are updated automatically. You can insert or delete bases, move the cursor, and search for patterns.

If the feature table already exists, either from an existing entry, or from the output of a previous NewFeatures run, the feature table will be loaded and an attempt will be made to create the correct feature groups for coding sequences and repeats.

The feature table can be in a file in your current directory, in the GCG copy of the database, or loaded by a locally defined "Grab". There are command line options to tell NewFeatures where to look for the latest version of the feature table and sequence.

The command file EGenCom:Grab.Com (if present) is executed to access local data collections.

NewFeatures will let you change the positions of the keys on your terminal keyboard to make it more convenient to enter the letters G, A, T, and C for sequence editing. The method is the same as for the GCG program SeqEd.

AUTHOR

This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).

All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).

EXAMPLE

Users already familiar with the VMS EDT editor or the SeqEd program will learn to use NewFeatures quickly. When you run NewFeatures with a command like % NewFeatures x12345.dat, your screen will look something like this:


X12345.Dat                                                          NEWFEATURES

         FH   Key             Location/Qualifiers
         FH
    1 >> FT   CDS             262..1299
         FT                   /product="amidase"
    2  * FT   mRNA            <1..>2657



     0        10        20        30        40        50        60        70
 ....|.........|.........|.......:.|.........|.........|.........|.........|....

      AGCTTCCGTGCGAATGATGGCATGCATGCTATCTCAGGCTCGCACCATGTGCTTTCGCGATCGCGCCGATTACA

      S  F  R  A  N  D  G  M  H  A  I  S  G  S  H  H  V  L  S  R  S  R  R  L  H
       A  S  V  R  M  M  A  C  M  L  S  Q  A  R  T  M  C  F  R  D  R  A  D  Y  I
        L  C  P  E  *  W  H  A  C  Y  L  R  L  A  P  C  A  F  A  I  A  P  I  T

EDIT NEW OR EXISTING SEQUENCES

If you name a sequence file that already exists, NewFeatures will display the existing feature table in the top part of the screen followed by the sequence and the three protein sequence translations in the present direction (forward or reversed).

The nucleotide and protein sequences, the command line, and the message line are always in the bottom few lines of the screen. The rest of the screen is used to display the feature table. If you are editing a large feature table, you can use a workstation screen set to 55 lines to see more of the feature table than you can see on a normal terminal.

If the sequence you name does not exist, NewFeatures will start in SeqEdit Mode (see below) to allow you to enter the new sequence. Type followed by NOSEQEDIT to stop editing the sequence.

If you are creating a new feature table, or if the source key is missing, NewFeatures will create a source key and prompt you for the organism name.

SCREEN MODE

Moving the Current Feature Line Pointer

The ">>" pointer in the feature table display shows the current feature line that is being edited. Coding regions and groups of repeats have more than one feature line, indicated by an asterisk (*) on the other lines in the group.

The key on the keypad moves the pointer down to the next feature line. and moves the pointer down to the last feature line.

The key moves the pointer up to the previous feature line. and moves the pointer up to the first feature line.

Typing a number first causes the and keys to move the pointer that many lines up or down for faster scrolling through the feature table.

The SEARCH command moves to the next feature (starting at the present line) that contains the specified text in the feature key or qualifiers.

The pointer is saved for each table, so when you browse another table and return later you will be on the same feature line.

Moving the Sequence Position Cursor

To move the cursor to the right, use the key; to move to the left, use the key. Movements are confined to the length of the sequence.

You can type a number followed by a carriage return and the cursor will move to that sequence position.

You can type a number followed by an arrow key to move that number of bases to the left or right. For example, 10 followed by the right arrow key will move 10 bases to the right.

You can move 50 bases to the right with the ">" key, or with the key.

You can move 50 bases to the left with the "<" key, or with the key.

Nucleotide and Protein Sequence Lines

The command PROtein in Command Mode moves the cursor to the protein sequence and makes NewFeatures search for an exact match to a protein sequence.

The commands NUCleotide and NOPROtein in Command Mode make searches recognize patterns containing IUB nucleotide ambiguity symbols.

The key moves down to the next protein sequence line. If the cursor is already on the lowest line, it moves up to the nucleotide sequence.

The key moves up to the next protein sequence line. If the cursor is already on the top line, it moves up to the nucleotide sequence. If the cursor is on the nucleotide sequence it moves down to the lowest protein sequence line.

By default, NewFeatures shows only the forward strand on the screen. The SHOWREVerse command, and the command line option, tell NewFeatures to also show the complementary strand. This takes up one extra line on the screen, and so leaves a little less space to display feature table lines.

Finding Patterns

To search for a sequence pattern, type a ? or use the key in Screen Mode. The cursor will move to the lower left-hand corner of the screen to let you enter a sequence pattern that you wish to find. You can repeat the last search by simply typing ?. NewFeatures treats all nucleotide sequences as circular and finds your pattern even if it wraps from the end of the sequence into the beginning.

If your cursor is on the nucleotide sequence, NewFeatures will search for a nucleotide pattern, using the normal ambiguity codes.

If your cursor is on one of the protein sequence translations, NewFeatures will search for an exact peptide sequence (using the default translation table).

NewFeatures uses the same rules for pattern definition and recognition as the programs SeqEd, FindPatterns, MapPlot, Map, and MapSort.

Even if NewFeatures is searching the nucleotide sequence, you can request a perfect match search by typing "=" after the "?". For example, ?=RCT will only match RCT (case does not matter) no matter which kind of sequence the cursor is on.
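The search rules described above (circular sequence, ambiguity codes, and the "=" exact-match switch) can be illustrated with a short Python sketch. This is not the NewFeatures source code. The function name find_pattern and its parameters are hypothetical, the sketch assumes the sequence itself contains only A, C, G and T in ambiguity mode, and it assumes the search starts at the base after the cursor.

```python
# IUB nucleotide ambiguity codes mapped to the bases they stand for.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def find_pattern(seq, pattern, start=1, exact=False):
    """Return the 1-based position of the next match after `start`, or None.

    The sequence is treated as circular, so a match may wrap from the end
    of the sequence into the beginning. With exact=True every pattern
    symbol must equal the sequence symbol (the "?=" form of the search).
    """
    seq, pattern = seq.upper(), pattern.upper()
    n, m = len(seq), len(pattern)
    if m == 0 or m > n:
        return None
    for step in range(1, n + 1):          # try every start position once
        pos = (start - 1 + step) % n      # 0-based candidate, wrapping around
        for k, sym in enumerate(pattern):
            base = seq[(pos + k) % n]
            ok = (base == sym) if exact else (base in IUB.get(sym, sym))
            if not ok:
                break
        else:
            return pos + 1                # all symbols matched
    return None
```

For example, find_pattern("GATTACAGGCT", "RCT") finds the GCT that begins at base 9, while the same call with exact=True finds nothing because the sequence contains no literal R.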

Finding a "Marked" Position

You can mark a position in a sequence to which you wish to return. You give the marked position a letter (like giving it a name) using the Command Mode command Mark (see below). Then, in Screen Mode, a single quote followed by the letter used to mark the sequence (for example 'x) will move the cursor to the sequence position where that mark was defined. The marks are not saved when you exit from NewFeatures.

Leaving Screen Mode

Type the key or Ctrl-Z or ":" to leave Screen Mode and enter Command Mode. The commands SINgle and NOSINgle switch the option for NewFeatures to return automatically to Screen Mode after each command.

Use the key to enter a single command and return to Screen Mode if you are not in "single" mode.

EDITING THE FEATURE TABLE

Creating Feature Lines

New feature lines are created with the command FEATURE, or more commonly with the key. The program prompts for the feature key, which need only be the first few letters if they are not ambiguous. For example, mat is enough to specify "mat_peptide", but polya could be "polyA_signal" or "polyA_site" so almost the entire feature type must be given.
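The abbreviation rule for feature keys can be sketched in Python as follows. This is only an illustration. The key list is a small subset chosen for the example, and the name resolve_key is hypothetical rather than part of NewFeatures.

```python
# A small, illustrative subset of feature keys (not the full list).
FEATURE_KEYS = ["CDS", "exon", "intron", "mat_peptide", "mRNA",
                "polyA_signal", "polyA_site", "prim_transcript"]

def resolve_key(abbrev, keys=FEATURE_KEYS):
    """Return the single key that starts with `abbrev` (case-insensitive).

    Raises ValueError listing the candidates if the abbreviation is
    ambiguous, or with an empty list if nothing matches.
    """
    low = abbrev.lower()
    matches = [k for k in keys if k.lower().startswith(low)]
    exact = [k for k in matches if k.lower() == low]
    if len(exact) == 1:
        return exact[0]
    if len(matches) == 1:
        return matches[0]
    raise ValueError(f"{abbrev!r} matches {matches}")
```

With this list, resolve_key("mat") returns "mat_peptide", while resolve_key("polya") raises an error naming both polyA keys. Only "polyA_sit" or "polyA_sig" is long enough to choose between them.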

Once you have a feature line on the screen, you can also create new lines by deleting any feature and then undeleting it more than once (just like "cut and paste" in an editor). You can then change the type of the new copies with the KEY command, and/or change their locations.

When you create a "CDS" feature, several additional lines are automatically created. These include "mat_peptide", "mRNA", and "prim_transcript" if they are regarded as "standard". You are also asked how many exons the CDS has, or whether it is a cDNA. If the CDS is made up of one or more exons, the "exon" and "intron" features are also created. These are then used for defining the feature locations of the "joined" features.

You can also specify a join across entries for a new CDS feature. You are then asked for the accession number of each exon, and the other entries are loaded into memory automatically.

Setting Feature Locations

Feature locations at their simplest are in the form "from".."to" or simply "from" if both positions are the same (a modified base position for example).

You usually set the "from" position by moving the sequence cursor to the first base of the feature (using pattern searching, or typing the sequence position followed by ). Pressing the key then sets the "from" position of the feature to the current sequence position. The FROM command allows you to specify a "from" position without moving the sequence cursor position.

If you know the exact base number of the "from" position, you can set it faster by typing the number and hitting keypad-1 immediately. The sequence position cursor is not changed, only the feature location is updated.

The "to" position is set in the same way, using the key or the TO command.

You can "turn off" the "from" and "to" positions (set them back to "<1" and ">end") by setting the position to unknown using the commands FROM ? and TO ?. Setting the position to 0 (zero) has the same effect.
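As an illustration of how simple locations are written, the Python sketch below builds the location string from a "from" and a "to" position. The function name format_location is hypothetical. The handling of unknown positions in the reverse direction is an assumption, since the text only describes the forward case.

```python
def format_location(frm, to, seq_length, reverse=False):
    """Build a simple location string from "from" and "to" positions.

    A position of None or 0 means "unknown": the low end becomes "<1"
    and the high end becomes ">seq_length".
    """
    if reverse:
        # In the reverse direction "from" is the higher coordinate, so the
        # pair is swapped and shown as complement(to..from).
        frm, to = to, frm
    low = str(frm) if frm else "<1"
    high = str(to) if to else f">{seq_length}"
    loc = low if low == high else f"{low}..{high}"
    return f"complement({loc})" if reverse else loc
```

For the example entry of 2657 bases, format_location(262, 1299, 2657) gives "262..1299" and format_location(0, 0, 2657) gives "<1..>2657", the forms shown in the screen display earlier.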

More complex feature locations are "joins" of several sequence ranges. The individual ranges are set as exons, and are automatically copied to the "join" features and to the "intron" features. The exons usually represent the sections of the primary transcript that appear in the mRNA. The "CDS" and "mat_peptide" features do not extend to the extreme ends of the exons, so these are defined by setting a "from" and "to" position for the "CDS" or "mat_peptide" feature line as for a simple location.

Some other feature keys, such as "tRNA" can also be made by processing of the original sequence. You can convert a feature to a "joined" location with the EXONS command and convert back to a simple location with the NOOPER command.

In some cases the feature's location is a "group", "order" or "one-of" several possible locations. These are set in the same way as "join" locations, except that there is no need for introns. The OPER command sets the type of location.

There are also commands REPLACE and SUBSTITUTE which define a location in the form replace(from..to,"subs") for mutants, conflicts, and so on. The MUTATE command switches the original sequence and the substitute string. Note that this can cause conflicts in an entry with a large number of overlapping "replace" locations.

In a few extreme cases, it may not be possible to enter the feature location by editing exon positions (for example, if RNA editing occurs at many positions). In these cases, the command TEXT LOC provides a location as an edited text string which is not further validated by the program.

There is a simple check for two possible text locations: sequences in the form "acgt" or dot ranges in the form (12.13). Sequences are accepted as complete locations, and dot ranges are accepted as complete locations or as "from" or "to" positions. Any other text location or "from" or "to" position is accepted, but a warning message is issued.

Adding and Changing Qualifiers

Qualifiers are specified by typing "/name=value" in the same way as they appear in the final feature table. The name need only be the first few letters if there is no other qualifier that can match. The same is true for values which are limited by the controlled vocabulary and listed in the Appendix of the Feature Table Definition, for example modified base names and repeat types.

For qualifiers with a larger controlled vocabulary the F13 key starts a query to select an organism name. If this option is available, you will see the message "or [F13] for Controlled Vocab" at the bottom of the screen.

The value is optional. If you do not give one, you will be offered the existing value (if any) to edit. If you decide to make no change, simply press without editing, or (if you have already started editing for example), delete the entire value and press . No change will be made if you do not give a value.

Certain qualifiers may appear only once for a feature line, while others (/note is an obvious example) can occur any number of times. To create a new copy of a qualifier, put + before the qualifier name. For example, /+NOTE will create a new /note qualifier.

If several copies of a qualifier already exist, you will be asked which copy you want to edit.

Qualifiers with complex values, such as /anticodon, can be entered in a simpler form. As the "pos:" and "aa:" parts are fixed, the program only requires a value of (32..34,Met) and will fill in the rest. For /cons_splice a value of (n,y) is sufficient.
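The expansion of the short /anticodon form can be sketched in Python. The function name expand_anticodon is hypothetical, and the sketch assumes a three-letter amino acid abbreviation and a simple from..to range.

```python
import re

def expand_anticodon(value):
    """Expand "(32..34,Met)" to "(pos:32..34,aa:Met)".

    Values that already carry the "pos:" and "aa:" labels are accepted
    too; anything else raises ValueError.
    """
    pattern = (r"\(\s*(?:pos:)?\s*(\d+\.\.\d+)\s*,"
               r"\s*(?:aa:)?\s*([A-Za-z]{3})\s*\)")
    m = re.fullmatch(pattern, value.strip())
    if m is None:
        raise ValueError(f"not an anticodon value: {value!r}")
    return f"(pos:{m.group(1)},aa:{m.group(2)})"
```

Typing (32..34,Met) would then be stored as (pos:32..34,aa:Met).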

Copying Qualifiers

In some cases, it is useful to be able to copy qualifiers to several other features. One way is to add the qualifiers for one feature, then delete and undelete it to create several copies with the same qualifiers. You can also repeat the last qualifier with the command "/=" which is also on the keypad as and . This is a useful way to duplicate a qualifier (/gene for example) on several feature lines.

Editing Qualifier Values

Qualifier values can be too long to fit on a single line. In such cases, a text editor is available. When entering a qualifier value, simply hit or and you will be able to edit the current text in the bottom part of the screen. You leave the editor by typing .

You can move around using the arrow keys, and make insertions and deletions as you wish. The editing window will scroll up and down if there are too many lines to fit on the screen. As in DCL, will move the cursor to the beginning of the line, and will move the cursor to the end. will delete all characters from the current cursor position to the beginning of the line. Extra spaces do not matter, as they will be automatically removed when you leave the editor.

If an existing qualifier value is already too long to fit on one line, you will be put into the editor mode immediately. If the value reaches the end of the line while you are entering it, you will also be put into the editor mode.

For qualifiers with a specified controlled vocabulary the F13 key starts a query to select a value. If a controlled vocabulary is available, you will see the message "or [F13] for Controlled Vocab" at the bottom of the screen.

EDITING THE SEQUENCE

Entering or Editing a Sequence

In Screen Mode the cursor shows your position in the sequence. You can move around in the sequence, add bases, delete bases, and search for patterns. You can insert any valid GCG nucleotide sequence symbol (GCG Program Manual Appendix III) into the sequence by typing the symbol. It will be inserted at the cursor (or the base at the cursor will be overwritten in OverStrike mode).

The INCLUDE command copies a sequence range from any file in GCG format into the present sequence.

Deleting Bases in a Sequence

The key will delete the bases to the left of the cursor, one by one. Typing a number first will delete a specified number of bases. Typing 10 will delete the ten bases to the left of the cursor.

The DELETE command deletes a specific base range in command mode.

Screen Mode Summary

Here is the summary of Screen Mode commands and the keypad layout in the on-line help.


                       Screen Mode Commands
           [n] is an optional numeric parameter.

G, A, T, C, . . .  - insert a sequence character
           - delete a sequence character
/qualname=value    - add/change qualifier
?TAACG        - find the next occurrence of "TAACG"
                           (last pattern is the default)

[n]   - go ahead n characters
[n]    - go back  n characters
         - go up to protein frame (or nucleotide) above
       - go down to protein frame below
'A                 - go to marked position A
37            - go to position 37 (any integer)

  Press  for more:

Keypad Layout Diagram

+---------+---------+---------+ +---pf1---+---pf2---+---pf3---+---pf4---+
!         !         !UNDELFEAT! !         !         !         !UNDELFEAT!
!         !         !         ! !         !         !         !         !
!  FIND   ! NEWFEAT ! DELFEAT ! !  GOLD   !  HELP   !  FIND   ! DELFEAT !
+---------+---------+---------+ +----7----+----8----+----9----+---dash--+
! NOSHOW  ![GO-FROM]! [GO-TO] ! ! COMMAND !NOBETWEEN!NOREPLACE!   /=    !
!         !         !         ! !         !         !         !         !
!  SHOW   ! 50-LEFT ! 50-RIGHT! !  SUBS   ! BETWEEN ! REPLACE ! DELQUAL !
+---------+---------+---------+ +----4----+----5----+----6----+--comma--+
                                ! (GROUP) ! (GROUP) !COMPLEMEN!   TOP   !
Other useful keys:              !         !         !         !         !
                                ! FORWARD !NOFORWARD! NOCOMPL ! PREV-FT !
/  add or change qualifier      +----1----+----2----+----3----+--enter--+
?  find a sequence pattern      ! FROM <  !  TO >   !   ORF   !         !
<  move left 50 base            !         !         !         ! BOTTOM  !
>  move right 50 bases          !  FROM   !   TO    !  STOP   !         !
left  move left                 +---------0---------+---dot---+         +
right move right                !    BROWSE prev    ! NEWFEAT !         !
up    move between nuc/protein  !                   !         ! NEXT-FT !
down  move between nuc/protein  !    BROWSE next    ! NUMBER  !         !
:     enter command mode        +---------+---------+---------+---------+
ctrl-H start of sequence
ctrl-E end of sequence

Top-row keys:

+---help--+---------Do--------+ +---f17---+---f18---+---f19---+---f20---+
!         !                   ! !         !         !         !         !
!         !                   ! !         !         !         !         !
!  HELP   !      COMMAND      ! !         !         !         ! CURRENT !
+---------+---------+---------+ +---------+---------+---------+---------+

  Press  for more:

COMMAND MODE

Type a Ctrl-Z or : in Screen Mode to enter Command Mode, or use the key to enter a single command and return to Screen Mode. The cursor will move down to the lower left-hand corner of the screen next to a colon (:) where you can enter any of the commands shown below followed by a .

Editing NewFeatures Commands

NewFeatures command editing is modelled on VMS DCL command line editing. The and keys let you move your cursor around in a command that you have typed so you can insert or delete characters at any position. As in DCL, Ctrl-H and Ctrl-E move the cursor to the beginning and end of the line, respectively. Ctrl-U deletes all the characters from the current cursor position to the start of the line.

Editing Previous NewFeatures Commands

NewFeatures will let you modify and execute previous commands as in DCL command line editing. The key displays previous commands when you are at the ":" prompt.

Returning to Screen Mode

NewFeatures normally returns to Screen Mode after each command. If you have used the command NOSINGle then NewFeatures remains in Command Mode until you type .

Commands May Be Truncated

Only the capitalized portion of the commands described in the documentation below needs to be typed. All commands may be prefixed with NO, and allow up to two numbers before the command and any command text after. Extra options will be simply ignored if the command does not use them.

Parameters are Used with Commands

Some commands can be preceded with numeric parameters or succeeded with a file name or text. The square brackets ( [ ] ) in the documentation below show command parameters that are optional (you can leave them out and will be prompted for values if needed).

All commands accept up to two numbers before the command name, and a free text value after. Command names may also be prefixed with NO. If any of these options is not used by the command it will simply be ignored. For example, 12 34 NOEXIT testing will simply exit.
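The general command form (up to two numbers, an optional NO prefix, a command name that may be truncated to its capitalized portion, and free text) can be sketched in Python. This is an illustration only. The short command table and the names min_abbrev, matches and parse_command are hypothetical, and the way NewFeatures itself resolves abbreviations may differ.

```python
# Hypothetical command table: capitals mark the minimum abbreviation.
COMMANDS = ["EXit", "Quit", "Write", "WQ", "SEArch", "Seq", "SHow"]

def min_abbrev(spec):
    """Return the leading capitalized portion of a command spec."""
    n = 0
    while n < len(spec) and not spec[n].islower():
        n += 1
    return spec[:n].upper()

def matches(word, spec):
    """True if `word` is spec's minimum abbreviation or a longer prefix."""
    word, full = word.upper(), spec.upper()
    return full.startswith(word) and word.startswith(min_abbrev(spec))

def parse_command(line, commands=COMMANDS):
    """Split a command line into (numbers, negated, command, text)."""
    tokens = line.split()
    numbers = []
    while tokens and tokens[0].isdigit() and len(numbers) < 2:
        numbers.append(int(tokens.pop(0)))
    if not tokens:
        return numbers, False, None, ""   # a bare number, e.g. "37"
    word, text = tokens[0], " ".join(tokens[1:])
    # Try the word as typed first, then with a leading NO stripped.
    for negated, candidate in ((False, word), (True, word[2:])):
        if negated and not word.upper().startswith("NO"):
            continue
        for spec in commands:
            if candidate and matches(candidate, spec):
                return numbers, negated, spec, text
    raise ValueError(f"unknown command: {word!r}")
```

With this table, parse_command("12 34 NOEXIT testing") returns ([12, 34], True, "EXit", "testing"). A real dispatcher would then ignore the numbers, the NO prefix and the text, because EXit does not use them.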

Command Mode Summary

Here is the summary of Command Mode commands you would see with the command Help. Each command is described in detail in the next section.

Command Mode Summary - General

                       General Commands
      Commands end with .  [n] indicates an optional parameter.
      Only the capitalized part of the command is necessary.

37                 - go to base 37
                   - go to "from" position of feature
                   - go to "to" position of feature
BUFFer             - buffer space management
Change or          - enter screen mode
[NO]COMPlement     - sequence direction
ENDFind            - position at start/end of target sequence
EXit [featname]    - write feature table (and sequence) and quit
Find               - find pattern in nucleotide or protein seq
[n] FRAme          - cursor on reading frame [n]
[n] Go             - go to base [n]
Help               - display help screens
NUCleotide         - finds in nucleotide seq with ambiguity codes
PROtein            - finds in translation (perfect matches only)
Quit               - quit without writing features or sequence
[n] READ [method]  - priorities for reading oracle, flatfile, etc.
REDraw or          - redraw the screen
[NO]RESOLVE        - convert all labels into base ranges
Seq                - reports the sequence name and length
SHow [what]        - valid feature keys and qualifiers, loaded tables,
                     known marks, known labels
[NO]SHOWREVerse    - display complementary strand on screen
[n] TABLE [name]   - set default translation table
[NO]THREE          - use (one or) three letter code for protein
WQ [featfile]      - write feature table and quit

  Press  to continue:

Command Mode Summary - Feature Editing

                       Feature Editing Commands
      Commands end with .  [n] indicates an optional parameter.
      b and e are numbers for start and finish or a range of interest
      p is the sequence position, f is a feature number
      Only the capitalized part of the command is necessary.

                          - edit next feature, or after to top
                          - edit previous feature, or after to bottom
                          - create new feature line/group
/qualname=value           - add/change qualifier
/+qualname=value          - add new qualifier even if one exists
/=                        - repeat latest qualifier name and value
[NO]ACCno [accno]         - sets location in another entry
[n f] ADDEXON             - adds another exon for selective use
[f] BETWeen               - set feature location "from^to"
BROWse [accno]            - set display to another entry
[f] CHECK                 - report problems in coding sequence
[NO]CHECKFile             - output file for CHECK
CURrent                   - set display back to main entry
[b e] DELFeat             - delete feature(s) [b to e]
[f] DELQual qualname      - deletes a qualifier for feature [f]
[f f] DUPlicate           - make a copy of feature [f] at line [f]
[f] END [accno][label]    - set accno:label to end a join
[e f] EXONs               - create [e] exons for location of feature [f]
[f] FEATure [type]        - create new feature (or feature group)
[f] [NO]FORWARD           - toggle direction for feature [f]
[p f] FROM [?] [<] [text] - set from position of feature [f]
[f f] GROUP               - put features into same group (CDS, etc.)
[f] [NO]HIde              - don't (do) write out feature [f]
[f] [NO]INTrons           - [delete or] add introns for feature [f] group
[f] KEy [key]             - change feature key of feature [f]
[f] [NO]LABel [label]     - sets location as a label
[n] LOad [accno]          - loads an entry (or [n] entries) for browsing
[f] NUMber                - go to feature [f]
[f] [NO]OPERator [oper]   - sets type of join operator
[f] ORF [description]     - set "from", "to", "desc" for current ORF
PEPWrite [filename]       - write translation of coding region to file
[f] QUALify               - add/change qualifier (same as "/qualname")
RELOad [accno]            - reloads another entry (for a later version)
[f] REName [qualifier]    - renames a qualifier
[f] REPlace               - location of feature [f] is replace(fr..to,"subs")
[f] SEArch [text]         - go to features with text in key or qualifiers
SORT [TYPE] [GROUP]       - sort by from/to positions (and type or group)
[f] STArt [accno][label]  - set accno:label to start a join
[f] STOP                  - go to next stop in this frame
[f] SUBStitute "seq"      - substitution sequence for "replace" location
TEST                      - writes a file FEATURES.TEST for error reports
[f] [NO]TEXT [FR/TO/LOC]  - set from/to/location as edited text
[p f] TO [?] [>]          - set to position [p] of feature [f]
TRANSlate                 - retranslate all reading frames (e.g. in seqedit)
[NO]TRUNCate              - truncate translations at a stop codon
[f f] UNDELFeat           - undelete last deleted feature line
UNLOad [accno]            - unloads table from another entry
[b e] [NO]USE             - uses/excludes exons in a 'join' feature
Write [featname]          - write the feature table to a file

  Press  to continue:

Command Mode Summary - Sequence Editing

                       Sequence editing commands:
      Commands end with .  [n] indicates an optional parameter.
      b and e are numbers for start and finish or a range of interest
      p is the sequence position, f is a feature number
      Only the capitalized part of the command is necessary.

s f Delete                - delete a range of bases
[n] Include [seqname]     - insert another sequence [at position n]
INSert                    - enter insert mode when sequence editing
[n] Mark markcharacter    - mark the sequence [at position n]
[n] MUTate                - swap sequence and substitute for a "replace"
OVERstrike                - enter overstrike mode when sequence editing
SEQEDit                   - (dis)allow sequence editing
[b e] SEQWrite [seqname]  - write [a part of] the sequence to a file

  Press  to continue:

GENERAL COMMANDS
37
or any other numerical value will move the cursor to that sequence position. Values less than 1 move to the start of the sequence, and values greater than the sequence length move to the end.
BUFFer [Command]
provides management of the internal buffer space used to store all text values. The available commands are: SHOW to show how full the buffer is, COMPRESS to clear deleted strings if the buffer is filling up, and LIMIT to set the maximum space used before a compress is done automatically.
Change
returns your session to Screen Mode. Note that the entire command is optional and a simple is equivalent.
[NO]COMPlement
reverses the direction in which the sequence is annotated. The sequence display is unchanged (so it will still match a figure in a paper for example) but the protein sequences are now the translations in the reverse direction and new features are automatically set to be in the reverse direction. When you specify a "from" and "to" position in reverse direction, the positions are displayed as "complement(to..from)" in the feature table.
You can also use keys to set the forward direction (NOCOMPlement) and then to set the reverse direction (COMPlement).
[NO]ENDFind
makes sequence searches put the cursor at the end of the search pattern instead of the start.
EXit [FileName]
works exactly like Write followed by SEQWrite except that the session with NewFeatures ends after the feature table is written out into a new feature file, and the sequence (if modified) is written out into a new sequence file.
Find
is the command line version of the or keys in screen mode.
[n] FRAme
moves the cursor to the present reading frame (taking the cursor position as the first base in the codon), and turns on the PROtein option.
[s] Go
goes to position [s] in the sequence. This is the command line version of typing the base position followed by a carriage return in Screen Mode.
Help
shows the commands available in the Screen and Command Modes of NewFeatures. You can also use the or keys in Screen Mode.
NUCleotide
uses the nucleotide sequence for searching, and moves the cursor to the sequence line at the present base position. All searches will use the IUB base ambiguity codes. The PROtein command is used when searching in the protein translations.
[NO]PROtein
uses the protein sequences for searching, and moves the cursor to the reading frame that has the present cursor position as the first base in a codon. All searches will now be for exact matches, until the NUCleotide command resets the option.
Quit
terminates a session with NewFeatures without writing a new features or sequence file. Use Quit instead of Ctrl-Y to terminate a session with NewFeatures. (If you use Ctrl-Y, the next time you run NewFeatures from the same directory, it will try to recover from what it assumes was a system crash during your previous session.)
[n] READ [method]
sets the priorities for reading other entries by various methods. The available methods are ORACLE to use EGenCom:Grab.Com to unload an entry, DAT to read an existing file called "accno.dat", FLAT to read a flatfile called "accno.seq" (and perhaps accno.ft for the feature table), and GCG to read the entry for the GCG version of the EMBL database.
The command SHOW READ shows the current priorities. A value of 1 is the highest priority, a value of 5 is the lowest. A value of 0 (zero) means that the method is not to be used.
In addition to the priority values, methods with the same priority are tried in the order ORACLE, FLAT, DAT, GCG.
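The ordering rule for read methods can be sketched in Python as follows. The function name read_order and the dictionary layout are hypothetical. Only the priority range and the tie-break order come from the text.

```python
# Tie-break order stated in the text for methods of equal priority.
TIE_ORDER = ["ORACLE", "FLAT", "DAT", "GCG"]

def read_order(priorities):
    """Return method names in the order they would be tried.

    `priorities` maps a method name to 0..5: 1 is tried first, 5 last,
    and 0 (or a missing entry) disables the method.
    """
    active = [m for m in TIE_ORDER if priorities.get(m, 0) > 0]
    # sorted() is stable, so equal priorities keep the TIE_ORDER sequence.
    return sorted(active, key=lambda m: priorities[m])
```

For example, with ORACLE set to 0, DAT and FLAT both set to 1, and GCG set to 2, the methods are tried in the order FLAT, DAT, GCG.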
REDraw
redraws your terminal screen. This is useful if a system message appears on your screen.
You can also use the Ctrl-W key in Screen Mode.
Seq
reports the sequence name and length at the bottom of the screen.
[f]SHOW [option]
shows valid feature keys, qualifiers, and controlled vocabulary. Also shows loaded entries, marked sequence positions, and so on. The possible options are:
KEY - a list of all valid feature keys and mandatory qualifiers
QUAL - a list of valid qualifiers for the feature key of the present line (the [f] value can be used to move to another feature line in one command).
BROWSE - a list of the other entries loaded, with number of features and sequence length
MARK - a list of marked sequence positions (set with the MARK command).
LABEL - a list of known labels in the present entry.
READ - lists the priority values for methods of reading entries, and shows the actual order in which the methods will be tried.
[NO]SHOWREVerse
turns on (or off) the display of the complementary strand on screen. As this takes up an extra line that could be used to display more features, the complementary strand display is off by default.
[NO]SINGLEcommand
returns to screen mode after each command, or (NOSINGLE) remains in command mode until a null command (just typing ).
[n]TABle [name]
sets the default translation table to be the one with standard name [name] or with ID number [n] in Appendix V of the Feature Table Definition.
The default table is normally number 1 (name: SGC0). Other tables should be set with /transl_table qualifiers so that other software can translate the features correctly. A common use for this command is to see the protein sequence displayed on screen using an alternative translation table.
[NO]THRee
uses 3-letter or 1-letter codes for amino acids. The single letter codes are easier to see and can be learned quickly once you begin to use them. The 1-letter code is shown under the first base of the codon in each of the three reading frames. The 3-letter code is shown under the three bases of the codon in each of the three reading frames. This option can be useful where a paper has used the 3-letter codes.
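The two display styles can be illustrated with a Python sketch that lays out one reading frame under the sequence. It is not the NewFeatures code. The name frame_line is hypothetical, the standard genetic code is assumed, and showing a stop as "***" in the 3-letter style is an assumption.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
THREE = {"A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
         "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
         "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
         "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
         "*": "***", "X": "Xxx"}

def frame_line(seq, frame, three=False):
    """Lay out the translation of reading frame 0, 1 or 2 under the sequence.

    The 1-letter code sits under the first base of each codon; the
    3-letter code fills all three columns of the codon.
    """
    seq = seq.upper()
    cells = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODE.get(seq[i:i + 3], "X")   # X for codons with ambiguity symbols
        cells.append(THREE[aa] if three else aa + "  ")
    return (" " * frame + "".join(cells)).rstrip()
```

Applied to the first 24 bases of the example entry (AGCTTCCGTGCGAATGATGGCATG), frames 0 and 1 reproduce the start of the first two translation lines in the screen display shown earlier.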
WQ [filename]
writes the feature table to a file and quits.
FEATURE EDITING COMMANDS
/[+][qualname][=][qualvalue]
enters qualifiers and their values for the current feature line.
The qualifier name is prompted for if none is entered. If the name is invalid the command fails. If the name is ambiguous (for example "polyA" could be "polyA_signal" or "polyA_site") a list of alternatives is given in a numbered menu. If the qualifier has a controlled vocabulary the F13 key can be used to start a query to select a suitable value. If a controlled vocabulary is available, the message "or [F13] for Controlled Vocab" appears at the bottom of the screen.
If the qualifier has a value, one must be entered in the correct syntax.
To edit an existing qualifier, specify the name but not the value. You will then be able to edit the current value.
If you wish to cancel the command, entering a null qualifier name or value will not change anything.
To create a new qualifier when one already exists (for example, a second "note"), use "+" in front of the qualifier name.
As a short-cut when the same qualifier is required on several features, the command "/=" will repeat the previous qualifier name and value. This command is also available on the keypad, so you can simply move the feature pointer to another feature line and use the keypad to copy the last-used qualifier.
[f] ACCno [accno]
sets the location of a feature (usually an exon) to be in another entry. The program will offer to load the entry if it is not already loaded.
[n f]ADDEXon
adds a new exon number [n] to the group for feature [f], which can be included using the USE command. Note that this moves all higher exon numbers up by one. The intended uses are for alternate splicing and for adding internal exons to existing feature table entries.
[NO]BETWeen
sets the feature location as from^to or from..to.
BROWse [accno]
sets the display to the feature table of another entry loaded automatically when a location was specified in that entry, or with the LOAD command.
[f] CHEck
checks a coding feature for a start codon at the beginning, a stop codon at the end, and no stop codons in the rest of the sequence. The feature line must be CDS, mat_peptide or sig_peptide.
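
The three tests that CHEck applies can be sketched in Python as follows. This is a minimal illustration, not the program's own logic: the function name check_coding is hypothetical, the standard genetic code is assumed, and only ATG is treated as a start codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
START_CODON = "ATG"                  # assumption: ATG is the only start codon


def check_coding(seq):
    """Return a list of problems found in the spliced coding sequence."""
    seq = seq.upper()
    # Split into complete codons; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    problems = []
    if not codons or codons[0] != START_CODON:
        problems.append("no start codon at the beginning")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no stop codon at the end")
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon at base {index * 3 + 1}")
    return problems


print(check_coding("ATGAAATAA"))     # a clean coding sequence
print(check_coding("ATGTAAAAATAA"))  # contains an internal stop codon
```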
CURrent
sets the display to the current feature table so that editing can continue.
[b e] DELFeat
deletes features [b] to [e]. If only one line is given, only one line is deleted. If no line is given the current feature line is deleted.
You can also delete the current feature line with the key or the key.
The UNDELFeat command restores the last deleted feature line, including all qualifier values.
[f] DELQual [qualname]
deletes the qualifier and its value from feature line [f]. If [f] is not given, the current feature line is used. This is the command line version of the key in screen mode.
[f f] DUPlicate
makes a copy of feature [f], for example to create a second "CDS" feature from a polycistronic (bacterial) mRNA. If [f] is not given, the current feature line is used. If the second [f] is not given, the copies will be together. If the second [f] is higher than the last feature line, the copy will be at the bottom of the feature table.
This command (or the equivalent of deleting a feature and then undeleting it twice) is the simplest way to create new feature lines. If the feature is a "join" of several exons, the same exons are used for all the duplicate features although each "join" feature can have its own "from" and "to" positions within the range of the exons.
[f] [NO]END label
sets the label for the last exon of a join across entries. If [label] is ":" the program will prompt for accession number and label.
[e f] EXONs
creates [e] exons (and [e-1] introns) and makes the location of feature [f] a "join" of all the exons. The number of exons is prompted for if not given. If [f] is not given the current feature line is used.
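
The relationship between [e] exons, [e-1] introns and the "join" location can be sketched in Python. The function names and the example coordinates are hypothetical; the sketch assumes the exons are sorted, do not overlap, and all lie in the same entry.

```python
def introns_between(exons):
    """Return the intron ranges lying between consecutive exon ranges.

    exons is a list of (from, to) pairs using 1-based inclusive positions.
    """
    return [
        (left_to + 1, right_from - 1)
        for (_, left_to), (right_from, _) in zip(exons, exons[1:])
    ]


def join_location(exons):
    """Return the feature table location string for a set of exons."""
    parts = ",".join(f"{start}..{end}" for start, end in exons)
    return f"join({parts})" if len(exons) > 1 else parts


exons = [(1, 100), (201, 300), (401, 500)]  # made-up positions
print(introns_between(exons))
print(join_location(exons))
```

Three exons give two introns, and the location of the feature becomes a single join over all three exon ranges.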
[b e] FEATure [key]
starts a new feature line, and prompts for the feature key if [key] is not given. If [b] and [e] are given, new feature line(s) are created in that range, but normally the command is used alone to create a new feature line at the bottom of the table.
If the feature key is "CDS" there is a prompt for the number of exons and an "mRNA" line is also created. For 2 or more exons, the "prim_transcript", "exon" and "intron" lines are created automatically.
If the feature key is "CDS ACROSS" there is a prompt for the accession number, label, "from" and "to" position for each exon. If the entry is not yet loaded, there is a prompt for an immediate load. Otherwise there is only a prompt for the "from" and "to" positions.
If the feature key is "repeat_unit" there is a prompt for the number of repeats, and the extra lines are generated automatically.
You can also use the key in Screen Mode, or shorten the command to FT.
[f] [NO]FORWard [GRoup]
sets the feature direction for feature line [f]. If [f] is not given, the current feature line is used. If the command is followed by GRoup the entire feature group is set.
[p f] FRom [?] [<] [text-location]
sets the "from" position for a feature line. If [p] is not given, the present sequence position is used. If [f] is not given, the current feature line is used.
In some cases the position is unknown but must still be included in the feature table as "<1" (or ">end" for the complement direction). You can use 0 as the position or put a "?" after the command to specify an unknown position.
If the exact position is not known, or it is beyond the start or end of the sequence, you can set the "less than" sign (<) by typing "<" at the end of the command.
If the location is more complex (for example, (12.34) to specify a position within a range) you can specify the location as text after the command. If the text is not a valid dot range, it is accepted but a warning message is issued.
The "from" position is also used by "joined" features (such as CDS) to specify the start position. By default the start position is the beginning of the first exon but this is overridden by setting a "from" position on the feature line.
For joins across entries, if the join starts in another entry use the START command to set the accno:range or accno:label of the first (partial) exon used.
[f f] GRoup
puts a feature into the same group as another feature, for example to link a new "CDS" feature to existing "exon" features.
The first number defines any feature in the group you want to add to, and defaults to creating a new group.
The second number defines the feature you are changing, and defaults to the current feature.
[f] [NO]HIde
specifies whether features will appear in the output file. "Hidden" features (for example, exons that are only used to point to positions for a "joined" feature) appear on the screen with the feature key in square brackets, for example [exon]. These feature lines are not included in the output file.
The key is the same as HIDE.
[f] [NO]INTrons
The NOINTrons command deletes all intron features for the current feature group. It can be useful if you have exons that point to another entry, or if you have introns of zero length.
The INTrons command creates all missing intron features for the current feature group.
[f] KEy [key]
defines the feature key. If [f] is not given the current feature line is used. [key] is the feature key to be used. If no feature key is given NewFeatures will prompt for a feature key.
The feature key is checked against a list of valid keys. If more than one feature could match the key, NewFeatures will list the possible feature keys in a numbered menu.
If you decide to cancel the command, simply enter a null feature key.
[f] [NO]LABel [label]
sets the location to be a label rather than a range. This is usually used to specify labels for exons. If the label used is defined by a /label qualifier on the same exon, the location will be written out in the "from..to" form for the exon, but the label will be used in joins etc.
LOad [accno]
loads the feature table and sequence from another entry. The feature table can be displayed on screen with the BROWSE command, and the sequence can be used to validate a join across entries or to write out the peptide sequence from such a join.

[f] Number

moves the current feature pointer to feature line [f]. If no number is given, the pointer moves to the next feature. If -1 is given, the pointer moves to the previous feature.

In Screen Mode you can also use the key to go to the next feature, or the key to go to the previous feature. You can also use the key and the key to go to the next feature, or the key and the key to go to the previous feature.

If you first type a number, the and keys go up or down that many lines.

[f] [NO]OPERator [operator]

sets (or clears) the join operator for a location. Valid operators are JOIN, ORDER, ONE-OF or GROUP. In these cases the location is built from a set of exons for the group of features. The NOOPER command resets the location to a simple base range.

[p f] ORF

sets the "from" position to the cursor position (or to [p] if a position is given), moves the cursor to the end of the open reading frame, and sets the "to" position for feature line [f]. If [f] is not given, the current feature line is used. The "to" position is the last base before the stop codon, unless the feature is a "CDS", in which case the "to" position is the end of the stop codon.

This command is only used for protein coding regions.
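
The rule for the "to" position can be sketched in Python. This is an illustration only: the function name orf_end is hypothetical, the standard stop codons are assumed, and returning None when no stop codon is found is a choice made for the sketch (what NewFeatures does in that case is not stated here).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def orf_end(seq, start, is_cds):
    """Return the 1-based "to" position for an ORF starting at 1-based start."""
    seq = seq.upper()
    for pos in range(start - 1, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            # pos is the 0-based index of the first base of the stop codon,
            # which is also the 1-based position of the base just before it.
            return pos + 3 if is_cds else pos
    return None


sequence = "ATGAAATAACC"
print(orf_end(sequence, 1, is_cds=False))  # last base before the stop codon
print(orf_end(sequence, 1, is_cds=True))   # last base of the stop codon
```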

PEPWrite [filename]

writes the translation of the current coding region to a file. If [filename] is not given, the output file is "seqname".pep.

[f] QUALify

is the same as typing "/" (see above) but allows you to specify a feature other than the current line.

[f] REName [qualname]

renames a qualifier to another of the same type, for example /note to /product. There is a prompt for the new qualifier name.

RELOAD [accno]

reloads the feature table and sequence from another entry. The feature table can be displayed on screen with the BROWSE command, and the sequence can be used to validate a join across entries or to write out the peptide sequence from such a join. RELOAD is useful if the feature table of the other entry has been changed during the current edit session.

[f] REPLace

specifies that the feature location is in the form replace(from..to,"subs") where "from" and "to" are the normal "from" and "to" positions, and the substitute string "subs" is specified with the SUBStitute command.

[f] SEArch [pattern]

searches for the text string [pattern] in the feature key and in the qualifier names and values, starting with feature line [f] (or the current feature line).

This command can be used to find CDS features, or to find where a feature label is defined.

SORT [by]

sorts the feature lines by "from" and "to" positions. You can also specify SORT KEY to sort by feature key (so all the repeats are together for example), or SORT GROUP so all coding region and transcript features (with any exons/introns) for an entry are together.

The normal sort order when you read in an existing feature table is SORT GROUP.
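
The three sort orders can be sketched in Python. The dictionary layout of a feature line, the group numbers and the function name are hypothetical, and ordering within a key or group by position is an assumption made for the sketch.

```python
from operator import itemgetter


def sort_features(features, by="POSITION"):
    """Return the feature lines in the order of the chosen SORT variant."""
    if by == "KEY":
        order = itemgetter("key", "start", "end")
    elif by == "GROUP":
        order = itemgetter("group", "start", "end")
    else:
        order = itemgetter("start", "end")
    return sorted(features, key=order)


features = [
    {"key": "exon", "start": 201, "end": 300, "group": 1},
    {"key": "CDS", "start": 1, "end": 300, "group": 1},
    {"key": "repeat_region", "start": 50, "end": 60, "group": 2},
    {"key": "exon", "start": 1, "end": 100, "group": 1},
]
for variant in ("POSITION", "KEY", "GROUP"):
    print(variant, [f["key"] for f in sort_features(features, variant)])
```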

[f] [NO]STArt label

sets the label for the first exon of a join across entries. If [label] is ":" the program will prompt for accession number and label.

[f] STOp

moves the cursor to the end of the open reading frame, and sets the "to" position for feature line [f]. If [f] is not given, the current feature line is used. The "to" position is the last base before the stop codon, unless the feature is a "CDS" when the "to" position is the end of the stop codon.

This command is only used for protein coding regions.

[f] [NO]SUBstitute ["replacement-sequence"]

Some features are changes to the sequence, using the "replace" operator. These are specified by setting "from" and "to" positions normally, and using the SUBStitute command to specify the replacement sequence, which can be a null string "" if the base range is simply deleted.

[a b] [NO]TEST [text]

writes a file newfeatures.test with the program's version of the feature table. If the feature table locations appear incorrect on the screen, make a note of the errors, and send a printed copy of the newfeatures.test file to Peter Rice.

The [text] part of the command is used to generate extra output if requested. BUFFER generates a report of internal buffer space use, BROWSE generates the browse read-only feature tables, TABLE reports on loaded translation tables, QUAL reports valid qualifier names, KEY reports valid feature keys, ITEM reports valid items in lists (for example, fixed controlled vocabularies). ALL generates all extra details, but produces a very large output file.

[f] [NO]TEXt [FROM/TO/LOC] text-location

Some "from" and "to" positions are not simple numbers or replace operators (see the SUBStitute command), but must be specified as text. This is especially true for the full location, as some problems including RNA editing are not well covered by NewFeatures options.

The command NOTEXT returns to using normal "from" and "to" positions and deletes the previous text.

[p f] TO [?] [>]

sets the "to" position for a feature line. If [p] is not given, the present sequence position is used. If [f] is not given, the current feature line is used.

In some cases the position is unknown but must still be included in the feature table as ">end" (or "<1" for the complement direction). You can use 0 as the position or put a "?" after the command to specify an unknown position.

If the exact position is not known, or it is beyond the start or end of the sequence, you can set the "greater than" sign (>) by typing ">" at the end of the command.

If the location is more complex (for example, (12.34) to specify a position within a range) you can specify the location as text after the command. If the text is not a valid dot range, it is accepted but a warning message is issued.

The "to" position is also used by "joined" features (such as CDS) to specify the end position. By default the end position is the end of the last exon but this is overridden by setting a "to" position on the feature line.

For joins across entries, if the join ends in another entry use the END command to set the accno:range or accno:label of the last (partial) exon used.
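
The simple location strings produced by the FROM, TO and BETWEEN settings can be sketched in Python. The function and parameter names are hypothetical, and the sketch covers only a forward-strand base range in a single entry; text locations, joins and complement features are left out.

```python
def format_location(start, end, partial_start=False, partial_end=False,
                    between=False):
    """Return a feature table location string for a simple base range."""
    if between:
        # BETWEEN: a site between two adjacent bases, written from^to
        return f"{start}^{end}"
    left = f"<{start}" if partial_start else str(start)   # FROM ... <
    right = f">{end}" if partial_end else str(end)        # TO ... >
    return f"{left}..{right}"


print(format_location(12, 340))
print(format_location(1, 340, partial_start=True))
print(format_location(12, 500, partial_end=True))
print(format_location(12, 13, between=True))
```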

TRANSlate

makes certain that the protein sequence lines are correct. This is done automatically after the sequence is edited or reversed, but can be useful during an editing session to show the updated translation of the new sequence.

The translation is otherwise only done when leaving SeqEdit mode to avoid problems in keeping up with fast sequence entry speeds.

[NO]TRUNCate

determines whether protein sequences and checks are terminated at the first detected stop codon or continue to the specified end position.

[f f] UNDELFeat

restores the last feature line deleted. If repeated, this command will make multiple copies of a feature.

The same function is provided by the key and either the key or the key.

UNLOAD [accno]

unloads the feature table and sequence from another entry if loaded for browsing.

[b e] [NO]USe

specifies which "exons" are used in a "join" feature.

By default, all exons are included in a "join", with the exception of those before the "from" or after the "to" position set on a feature. If alternative splicing occurs, this can be annotated by copying the "CDS" or other features, moving the pointer to one of the "join" feature lines, and using the command 2 NOUSE to remove exon 2 from the join operation. The exon is still used for all other "join" features in the group, and can be restored for this feature by the command 2 USE.
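
The selection rule can be sketched in Python. The function and parameter names are hypothetical, and the sketch assumes that "before the from position" means an exon lying entirely before it (and likewise for "to"); partial exons are simply kept.

```python
def exons_in_join(exons, feat_from=None, feat_to=None, not_used=()):
    """Return the numbers of the exons included in one "join" feature.

    exons is a list of (from, to) pairs; exons are numbered from 1.
    not_used holds the exon numbers removed with NOUSE for this feature.
    """
    selected = []
    for number, (start, end) in enumerate(exons, start=1):
        if number in not_used:
            continue  # removed with NOUSE
        if feat_from is not None and end < feat_from:
            continue  # exon ends before the feature's "from" position
        if feat_to is not None and start > feat_to:
            continue  # exon starts after the feature's "to" position
        selected.append(number)
    return selected


exons = [(1, 100), (201, 300), (401, 500)]  # made-up positions
print(exons_in_join(exons))                 # default: every exon
print(exons_in_join(exons, not_used={2}))   # after 2 NOUSE
```

Because not_used is held per feature in this sketch, a copied "CDS" line can skip exon 2 while the original keeps it, which matches the alternative splicing use described above.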

Write [FileName]

writes the current form of the features table into a file. If you name a file, NewFeatures writes the features table into a file with that name instead of using the name of the input file. For example, Write temp.ft would write the features table to a file called temp.ft. The default file extension is ".ft" but any extension may be specified.

SEQUENCE EDITING COMMANDS

If sequence editing is allowed, the bottom right corner of the screen shows the string SEQEDIT. The sequence editing commands are only allowed when SEQEDIT is on. Any character typed in Screen Mode is inserted in the sequence before the present cursor position (unless OverStrike mode is in use).

s f Delete

deletes some or all of the sequence. You must specify a beginning and ending coordinate for the range of symbols you want to delete.

[s] Include [FileName]

includes another sequence within the sequence being edited at the current cursor position or at the position specified by the optional parameter. If no FileName is given, NewFeatures will prompt for one.

INSert

Specifies insert mode when sequence editing. The bottom right corner of the screen shows the string INS SEQEDIT.

[s] Mark markcharacter

You can mark the position where the cursor is in a sequence if you wish to return to it later. You give the marked position a letter (for example MARK x) using this command. Then, in Screen Mode, a single quote followed by the letter used to mark the sequence (for example 'x) will move the cursor back to the sequence position where that mark was defined.

[f] MUTate

swaps the current sequence with the substitute value of a "replace" feature location.

Note that this can cause conflicts in the sequence locations of overlapping "replace" locations.
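
The swap can be sketched in Python. The function name and return layout are hypothetical; positions are 1-based and inclusive, as in the feature table.

```python
def mutate(sequence, start, end, substitute):
    """Swap sequence[start..end] with substitute.

    Returns (new_sequence, new_substitute, new_end). The old bases become
    the new substitute value, so applying the swap twice restores the
    original sequence.
    """
    original = sequence[start - 1:end]
    new_sequence = sequence[:start - 1] + substitute + sequence[end:]
    # With an empty substitute new_end is start - 1 (an empty range); how
    # NewFeatures represents that case is not covered by this sketch.
    new_end = start + len(substitute) - 1
    return new_sequence, original, new_end


seq, subs, end = mutate("AACCGGTT", 3, 4, "T")
print(seq, subs, end)
print(mutate(seq, 3, end, subs))  # swapping again restores the original
```

When the substitute differs in length from the range it replaces, every position after the range shifts, which is one way overlapping "replace" locations can come into conflict.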

OVERstrike

Specifies overstrike mode when sequence editing. The bottom right corner of the screen shows the string OVER SEQEDIT.

SEQEDit

enables sequence editing. Until this command is given, sequence editing is turned off so that commands are not accidentally entered as parts of the sequence.

If sequence editing is allowed, the bottom right corner of the screen shows the string SEQEDIT.

[s f] SEQWrite [FileName]

writes the current form of the sequence into a file. If you supply starting and finishing coordinates, NewFeatures only writes the indicated segment. For example, 1,56 SEQWrite would write bases 1 to 56 into a file. If you name a file, NewFeatures writes the sequence into a file with that name instead of the name of the input file.

RESTRICTIONS

NewFeatures now has very few limits that affect normal use. The program has the following upper limits on storage:

Limits on Feature Table Size

Browse (read-only) feature tables                           200
Total feature lines in browse (read-only) feature tables    2000
Buffer space for general storage                            750kb
EC numbers and names                                        5000
Feature lines in main table                                 1500
Exons in a single join                                      200
Groups of features (joins plus exons) in main table         500
Lists of items (qualifier names, modbases, etc.)            1000
Items in lists                                              2000
Valid qualifiers                                            200
Valid feature keys                                          200
Maximum total qualifier length for a single feature         2047

The System Must Know Your Terminal Type

To use the keypad controls in this program, you must have your terminal set up to correctly emulate a VT200 terminal. So far, we have been unable to find out how to do this completely, although on X-terminals many of the keys are active.

If you are an expert in "termcap", or know someone who is, please contact the EGCG support team as we would very much like to support this capability on Unix.

Until then, there is always the option of using the command line, or we could rewrite the keypad interpreter to use an alternative set of keystrokes.

ACKNOWLEDGEMENTS

NewFeatures was written by Peter Rice (now at the Sanger Centre, Hinxton, UK) while in the EMBL Computer Group. It uses the original code of SeqEd for sequence editing and pattern searching.

The original proposal and specifications for NewFeatures came from Kate Rice. Suggestions for the program have come from many EMBL Data Library staff, especially Bernd Roechert, Guenter Stoesser and David Emmert. Further ideas are always welcome.

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the option -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.


Minimum syntax: % newfeatures [-INfile1=]sample.seq -Default

Prompted Parameters: None

Local Data Files

[-INITialize=]newfeatures.init  command line initializing file
-KEYfile=filename               valid feature keys and qualifiers
-QUALfile=filename              valid qualifier names and values
-[NO]ECfile=filename            reformatted ENZYME database. -NOEC skips
-TABLEfile=filename             GCG version of translation tables

Optional Parameters:

-ORAcle=2            priority for loads from local (Oracle) database
-GCGdatabase=3       priority for loads from GCG database
-DATfile=2           priority for loads from a .DAT file
-FLATfile=4          priority for loads from .FT and .SEQ files
-[NO]SINGlecommand   stays in command mode or returns to screen mode
-TRANSlate=sgc0      default translation table
-TRace=filename      trace parsing of an existing feature table
-TESTfile=filename   output file for the TEST command
-CIRCular            treat sequence as circular (not yet implemented)
-SHOWREVerse         show both strands on screen
-PAUse               read messages before screen mode

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

Customizing Your Keyboard With SetKeys

You can use the program SetKeys to create a set.keys file that tells the editors SeqEd, GelEnter, LineUp, and GelAssemble how to interpret the letters you type at the terminal. When entering gel readings, it is useful to have the symbols for G, A, T, and C under the fingers of one hand in the same positions as the lanes in your gel. SeqEd, GelEnter, LineUp, and GelAssemble automatically read the file set.keys if it is present in your local directory. If set.keys is absent, or if the sequence type is set to Protein (in SeqEd and LineUp, only) the terminal keys retain their conventional meanings.

If you have a set.keys file in your directory, SeqEd, GelEnter, LineUp, and GelAssemble only respond to the sequence characters that it redefines. You can edit the file set.keys with a text editor if some of the keys you want to use are not in it. Any keys not mentioned in set.keys appear to be dead.

Several keys are vital for the control of SeqEd, LineUp, GelEnter, and GelAssemble; this means you are not allowed to redefine the keys for /, [, ], {, }, (, ), :, ,, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, R, D, H, , and E.

Preset Command Line Options

The command line options below may be preset with a local data file called NewFeatures.init. The command line could have an expression like -INITialize=MyFileName if you want to use a different file name. For example, you could create a file called FlatFeat.init that sets -FLATfile=1 to load from a flat file whenever possible, then start NewFeatures with the command newfeatures -Init=flatfeat.init.

OPTIONAL PARAMETERS

The parameters and switches listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.

-ORAcle=2

sets the priority (in the range 1-5) for loading an entry from the local database (Oracle when used at EMBL) using the EGenCom:Grab.Com command procedure. Low numbers are used first. Specify 0 to turn off this method of reading an entry.

-GCGdatabase=3

sets the priority (in the range 1-5) for loading an entry from the GCG EMBL or EMNEW database. Low numbers are used first. Specify 0 to turn off this method of reading an entry.

-DATfile=2

sets the priority (in the range 1-5) for loading an entry from an EMBL or GCG format file with the extension .dat. Low numbers are used first. Specify 0 to turn off this method of reading an entry.

-FLATfile=4

sets the priority (in the range 1-5) for loading an entry from a flat file with the extensions .seq for the sequence and .ft for the feature table. Low numbers are used first. Specify 0 to turn off this method of reading an entry.
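
The way the four priority values determine the order of read methods can be sketched in Python. The defaults are the ones shown in the command-line summary; the function name is hypothetical, and the handling of equal priorities (here, the order in which the options are listed) is an assumption, since it is not documented above.

```python
DEFAULT_PRIORITIES = {
    "ORAcle": 2,
    "GCGdatabase": 3,
    "DATfile": 2,
    "FLATfile": 4,
}


def read_order(priorities):
    """Return the read methods in the order they would be tried."""
    # A priority of 0 turns the method off.
    enabled = [(value, name) for name, value in priorities.items() if value != 0]
    # Low numbers first; sorted() is stable, so ties keep their listed order.
    return [name for value, name in sorted(enabled, key=lambda item: item[0])]


print(read_order(DEFAULT_PRIORITIES))
print(read_order({**DEFAULT_PRIORITIES, "FLATfile": 1}))  # flat files first
print(read_order({**DEFAULT_PRIORITIES, "ORAcle": 0}))    # local database off
```

Inside NewFeatures, the READ command reports the actual order in use.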

-NOSINGlecommand

sets NewFeatures to remain in Command Mode after processing a command.

-TRANSlate=TableName

gives an alternative translation table for (for example) organelle sequences. Valid translation tables are listed in section 7.5.5 of Appendix V in the DDBJ/EMBL/GenBank Feature Table Definition or in the Data Library documentation. New tables are easily built by editing an existing table to create a file in your own directory and using that filename as the TableName.

"TableName" can be the standard name or the ID number of the table. The default is table 1 (name: SGC0).

-TRace=TraceFile

requests a trace of a feature table load to identify problems in converting existing feature tables into features, groups and exons.

-TESTfile=newfeatures.test

specifies the output file name for the TEST command.

-CIRCular

specifies that the input sequence is circular. This at present has no effect, but could be used in future to direct automatic generation of joins across the end of the sequence if there is a demand.

-SHOWREVerse

displays both strands of the sequence on the screen.

-KEYfile=EGenRunData:newfeatures.key

sets the name of the control file that specifies feature keys and their permitted and mandatory qualifiers.

-QUALfile=EGenRunData:newfeatures.qualify

sets the name of the control file that specifies feature qualifiers and their possible values.

-[NO]ECfile=EGenRunData:newfeatures.ec

sets the name of the reformatted ENZYME database used to convert EC numbers into standard enzyme names. The -NOECfile option makes NewFeatures start faster by not reading the ENZYME database, but product names cannot then be automatically updated.

-TABLEfile=EGenRunData:newfeatures.tables

sets the name of the file that holds the GCG version of the translation tables.

Printed: April 22, 1996 15:54 (1162)