FUNCTION
NewFeatures is an interactive editor for entering and modifying the feature table and for minor editing of the sequence itself.
DESCRIPTION
NewFeatures uses the screen of your terminal as a window into a data library entry. It works like the VMS text editor EDT or the GCG sequence editing program SeqEd. You can add, modify and delete feature table items by either typing the sequence positions or by searching for known short sequences and pointing to the correct feature position with the cursor.
You can also update the sequence if it is found to be incorrect. Changes you make in the sequence take place at the cursor position and are reflected immediately on the screen. Feature positions are updated automatically. You can insert or delete bases, move the cursor, and search for patterns.
If the feature table already exists, either from an existing entry, or from the output of a previous NewFeatures run, the feature table will be loaded and an attempt will be made to create the correct feature groups for coding sequences and repeats.
The feature table can be in a file in your current directory, in the GCG copy of the database, or loaded by a locally defined "Grab". There are command line options to tell NewFeatures where to look for the latest version of the feature table and sequence.
The command file EGenCom:Grab.Com (if present) is executed to access local data collections.
NewFeatures will let you change the positions of the keys on your terminal keyboard to make it more convenient to enter the letters G, A, T, and C for sequence editing. The method is the same as for the GCG program SeqEd.
AUTHOR
This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
EXAMPLE
Users already familiar with the VMS EDT editor or the SeqEd program will learn to use NewFeatures quickly. When you run NewFeatures with a command like % NewFeatures x12345.dat, your screen will look something like this:
X12345.Dat NEWFEATURES
FH Key Location/Qualifiers
FH
1 >> FT CDS 262..1299
FT /product="amidase"
2 * FT mRNA <1..>2657
0 10 20 30 40 50 60 70
....|.........|.........|.......:.|.........|.........|.........|.........|....
AGCTTCCGTGCGAATGATGGCATGCATGCTATCTCAGGCTCGCACCATGTGCTTTCGCGATCGCGCCGATTACA
S F R A N D G M H A I S G S H H V L S R S R R L H
A S V R M M A C M L S Q A R T M C F R D R A D Y I
L P C E * W H A C Y L R L A P C A F A I A P I T
EDIT NEW OR EXISTING SEQUENCES
If you name a sequence file that already exists, NewFeatures will display the existing feature table in the top part of the screen followed by the sequence and the three protein sequence translations in the present direction (forward or reversed).
The nucleotide and protein sequences, the command line, and the message line are always in the bottom few lines of the screen. The rest of the screen is used to display the feature table. If you are editing a large feature table, you can use a workstation screen set to 55 lines to see more of the feature table than you can see on a normal terminal.
If the sequence you name does not exist, NewFeatures will start in SeqEdit Mode (see
below) to allow you to enter the new sequence. Type
If you are creating a new feature table, or if the source key is missing, NewFeatures will create a source key and prompt you for the organism name.
SCREEN MODE
Moving the Current Feature Line Pointer
The ">>" pointer in the feature table display shows the current feature line that is being edited. Coding regions and groups of repeats have more than one feature line, indicated by an asterisk (*) on the other lines in the group.
The
The
Typing a number first causes the
The SEARCH command moves to the next feature (starting at the present line) that contains the specified text in the feature key or qualifiers.
The pointer is saved for each table, so when you browse another table and return later you will be on the same feature line.
Moving the Sequence Position Cursor
To move the cursor to the right, use the
You can type a number followed by a carriage return and the cursor will move to that sequence position.
You can type a number followed by an arrow key to move that number of bases to
the left or right. 10
You can move 50 bases to the right with the ">" key, or with the
You can move 50 bases to the left with the "<" key, or with the
Nucleotide and Protein Sequence Lines
The command PROtein in Command Mode moves the cursor to the protein sequence and makes NewFeatures search for an exact match to a protein sequence.
The commands NUCleotide and NOPROTein in Command Mode make searches recognize patterns containing IUB nucleotide ambiguity symbols.
The
The
By default, NewFeatures shows only the forward strand on the screen. The SHOWREVerse command, and the command line option, tell NewFeatures to also show the complementary strand. This takes up one extra line on the screen, and so leaves a little less space to display feature table lines.
Finding Patterns
To search for a sequence pattern, type a ? or use the
If your cursor is on the nucleotide sequence, NewFeatures will search for a nucleotide pattern, using the normal ambiguity codes.
If your cursor is on one of the protein sequence translations, NewFeatures will search for an exact peptide sequence (using the default translation table).
NewFeatures uses the same rules for pattern definition and recognition as the programs SeqEd, FindPatterns, MapPlot, Map, and MapSort.
Even if NewFeatures is searching the nucleotide sequence, you can request a perfect match search by typing "=" after the "?". For example, ?=RCT will only match RCT (case does not matter) no matter which kind of sequence the cursor is on.
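The difference between an ambiguity search and a perfect match search can be sketched in Python. This is only an illustration and not the NewFeatures search code: the function name find_pattern and the 0-based return value are assumptions, the table holds only the standard IUB nucleotide ambiguity symbols, and the fuller GCG pattern syntax shared with FindPatterns is not modelled.

```python
import re

# Standard IUB nucleotide ambiguity symbols, written as regular expression classes.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "H": "[ACT]", "B": "[CGT]",
    "V": "[ACG]", "D": "[AGT]", "N": "[ACGT]",
}

def find_pattern(sequence, query, start=0):
    """Return the 0-based index of the first match at or after start, or -1.

    A query that begins with "=" is matched literally (perfect match);
    otherwise each symbol is expanded through the IUB table.
    Hypothetical helper: a symbol outside the table raises KeyError.
    """
    exact = query.startswith("=")
    pattern = (query[1:] if exact else query).upper()
    if exact:
        regex = re.escape(pattern)
    else:
        regex = "".join(IUB[symbol] for symbol in pattern)
    match = re.compile(regex).search(sequence.upper(), start)
    return match.start() if match else -1

print(find_pattern("AAGCTTT", "RCT"))   # ambiguity search: R matches G
print(find_pattern("AAGCTTT", "=RCT"))  # perfect match: no literal RCT present
```

With these assumptions, RCT finds GCT in the first call, while =RCT matches only a sequence that literally contains the symbols R, C, T.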
Finding a "Marked" Position
You can mark a position in a sequence to which you wish to return. You give the marked position a letter (like giving it a name) using the Command Mode command Mark (see below). Then, in Screen Mode, a single quote followed by the letter used to mark the sequence (for example 'x) will move the cursor to the sequence position where that mark was defined. The marks are not saved when you exit from NewFeatures.
Leaving Screen Mode
Type the
Use the
EDITING THE FEATURE TABLE
Creating Feature Lines
New feature lines are created with the command FEATURE, or more commonly with
the
Once you have a feature line on the screen, you can also create new lines by deleting any feature and then undeleting it more than once (just like "cut and paste" in an editor). You can then change the type of the new copies with the KEY command, and/or change their locations.
When you create a "CDS" feature, several additional lines are automatically created. These include "mat_peptide", "mRNA", and "prim_transcript" if they are regarded as "standard". You are also asked how many exons the CDS has, or whether it is a cDNA. If the CDS is made up of one or more exons, the "exon" and "intron" features are also created. These are then used for defining the feature locations of the "joined" features.
You can also specify a join across entries for a new CDS feature. You are then asked for the accession number of each exon, and the other entries are loaded into memory automatically.
Setting Feature Locations
Feature locations at their simplest are in the form "from".."to" or simply "from" if both positions are the same (a modified base position for example).
You usually set the "from" position by moving the sequence cursor to the first base of the
feature (using pattern searching, or typing the sequence position followed by
If you know the exact base number of the "from" position, you can set it faster by typing the
number and hitting keypad-1 immediately. The sequence position cursor is not
changed, only the feature location is updated.
The "to" position is set in the same way, using the
You can "turn off" the "from" and "to" positions (set them back to "<1" and ">end") by setting
the position to unknown using the commands FROM ? and TO ?. Setting
the position to 0 (zero) has the same effect.
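As an illustration of how simple locations are written, the following Python sketch formats a forward-strand "from" and "to" pair. It is not taken from NewFeatures: the function name format_location is hypothetical, 0 stands for an unknown position as described above, and ">end" is assumed to be written with the sequence length, as in the "<1..>2657" line of the example screen.

```python
def format_location(frm, to, seq_len):
    """Format a simple forward-strand location (hypothetical helper).

    frm, to  -- 1-based positions; 0 means "unknown"
    seq_len  -- length of the sequence, used for the ">end" form
    """
    start = "<1" if frm == 0 else str(frm)
    end = f">{seq_len}" if to == 0 else str(to)
    # A single position is written once, for example a modified base.
    return start if start == end else f"{start}..{end}"

print(format_location(262, 1299, 2657))  # 262..1299
print(format_location(0, 0, 2657))       # <1..>2657
print(format_location(100, 100, 2657))   # 100
```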
More complex feature locations are "joins" of several sequence ranges. The individual ranges
are set as exons, and are automatically copied to the "join" features and to the "intron"
features. The exons usually represent the sections of the primary transcript that appear in
the mRNA. The "CDS" and "mat_peptide" features do not extend to the extreme ends of the
exons, so these are defined by setting a "from" and "to" position for the "CDS" or "mat_peptide"
feature line as for a simple location.
Some other feature keys, such as "tRNA" can also be made by processing of the original
sequence. You can convert a feature to a "joined" location with the EXONS
command and convert back to a simple location with the NOOPER command.
In some cases the feature's location is a "group", "order" or "one-of" several possible locations.
These are set in the same way as "join" locations, except that there is no need for introns. The
OPER command sets the type of location.
There are also commands REPLACE and SUBSTITUTE which define a
location in the form replace(from..to,"subs") for mutants, conflicts, and so on. The
MUTATE command switches the original sequence and the substitute string. Note
that this can cause conflicts in an entry with a large number of overlapping "replace"
locations.
In a few extreme cases, it may not be possible to enter the feature location by editing exon
positions (for example, if RNA editing occurs at many positions). In these cases, the command
TEXT LOC provides a location as an edited text string which is not further validated
by the program.
There is a simple check for two possible text locations: sequences in the form "acgt" or dot
ranges in the form (12.13). Sequences are accepted as complete locations, and dot ranges are
accepted as complete locations or as "from" or "to" positions. Any other text location or "from"
or "to" position is accepted, but a warning message is issued.
Adding and Changing Qualifiers
Qualifiers are specified by typing "/name=value" in the same way as they appear in the final
feature table. The name need only be the first few letters if there is no other qualifier that can
match. The same is true for values which are limited by the controlled vocabulary and listed
in the Appendix of the Feature Table Definition, for example modified base
names and repeat types.
For qualifiers with a larger controlled vocabulary the F13 key starts a query to select
an organism name. If this option is available, you will see the message "or [F13] for
Controlled Vocab" at the bottom of the screen.
The value is optional. If you do not give one, you will be offered the existing value (if any) to
edit. If you decide to make no change, simply press
Certain qualifiers may appear only once for a feature line, while others (/note is an obvious
example) can occur any number of times. To create a new copy of a qualifier, put + before the
qualifier name. For example, /+NOTE will create a new /note qualifier.
If several copies of a qualifier already exist, you will be asked which copy you want to edit.
Qualifiers with complex values, such as /anticodon, can be entered in a simpler form. As the
"pos:" and "aa:" parts are fixed, the program only requires a value of (32..34,Met) and will fill
in the rest. For /cons_splice a value of (n,y) is sufficient.
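The expansion of the short /anticodon form can be sketched as follows. This is a minimal Python illustration, assuming the full value has the form (pos:32..34,aa:Met) implied by the fixed "pos:" and "aa:" parts; the function name expand_anticodon is hypothetical and the check is much simpler than whatever validation NewFeatures performs.

```python
import re

def expand_anticodon(value):
    """Expand "(32..34,Met)" to "(pos:32..34,aa:Met)" (hypothetical helper)."""
    match = re.fullmatch(r"\(\s*(\d+\.\.\d+)\s*,\s*([A-Za-z]+)\s*\)", value)
    if match is None:
        raise ValueError(f"not a short anticodon value: {value!r}")
    base_range, amino_acid = match.groups()
    return f"(pos:{base_range},aa:{amino_acid})"

print(expand_anticodon("(32..34,Met)"))  # (pos:32..34,aa:Met)
```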
Copying Qualifiers
In some cases, it is useful to be able to copy qualifiers to several other features. One way is to
add the qualifiers for one feature, then delete and undelete it to create several copies with the
same qualifiers. You can also repeat the last qualifier with the command "/=" which
is also on the keypad as
Editing Qualifier Values
Qualifier values can be too long to fit on a single line. In such cases, a text editor is available.
When entering a qualifier value, simply hit
You can move around using the arrow keys, and make insertions and deletions as you wish.
The editing window will scroll up and down if there are too many lines to fit on the screen. As in
DCL,
If an existing qualifier value is already too long to fit on one line, you will be put into the
editor mode immediately. If the value reaches the end of the line while you are entering it,
you will also be put into the editor mode.
For qualifiers with a specified controlled vocabulary the F13 key starts a query to
select a value. If a controlled vocabulary is available, you will see the message "or [F13] for
Controlled Vocab" at the bottom of the screen.
EDITING THE SEQUENCE
Entering or Editing a Sequence
In Screen Mode the cursor shows your position in the sequence. You can move around in the
sequence, add bases, delete bases, and search for patterns. You can insert any valid GCG nucleotide
sequence symbol (GCG Program Manual Appendix III) into the sequence by typing the
symbol. It will be inserted at the cursor (or the base at the cursor will be overwritten in OverStrike
mode).
The INCLUDE command copies a sequence range from any file in GCG format into the
present sequence.
Deleting Bases in a Sequence
The
The DELETE command deletes a specific base range in command mode.
Screen Mode Summary
Here is the summary of Screen Mode commands and the keypad layout in the on-line help.
Keypad Layout Diagram
COMMAND MODE
Type a Ctrl-Z or : in Screen Mode to enter Command Mode, or use the
Editing NewFeatures Commands
NewFeatures command editing is modelled on VMS DCL command line editing. The
Editing Previous NewFeatures Commands
NewFeatures will let you modify and execute previous commands as in DCL
command line editing. The
Returning to Screen Mode
NewFeatures normally returns to Screen Mode after each command. If you have
used the command NOSINGle then NewFeatures remains in Command
Mode until you type
Commands May Be Truncated
Only the capitalized portion of the commands described in the documentation below needs to
be typed. All commands may be prefixed with NO, and allow up to two numbers
before the command and any command text after. Extra options will be simply ignored if the
command does not use them.
Parameters are Used with Commands
Some commands can be preceded with numeric parameters or succeeded with a file name or
text. The square brackets ( [ ] ) in the documentation below show command parameters that
are optional (you can leave them out and will be prompted for values if needed).
All commands accept up to two numbers before the command name, and a free text value
after. Command names may also be prefixed with NO. If any of these options is not
used by the command it will simply be ignored. For example, 12 34 NOEXIT testing
will simply exit.
Command Mode Summary
Here is the summary of Command Mode commands you would see with the command
Help. Each command is described in detail in the next section.
Command Mode Summary - General
Command Mode Summary - Feature Editing
Command Mode Summary - Sequence Editing
GENERAL COMMANDS
37
or any other numerical value will move the cursor to that sequence position. Values less than
1 move to the start of the sequence, and values greater than the sequence length move to the
end.
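The rule for out-of-range values amounts to clamping the requested position to the sequence. A minimal Python sketch, with the hypothetical function name goto_position and 1-based positions assumed:

```python
def goto_position(requested, seq_len):
    """Clamp a requested 1-based position to the range 1..seq_len."""
    return max(1, min(requested, seq_len))

print(goto_position(37, 2657))    # 37
print(goto_position(-5, 2657))    # 1, the start of the sequence
print(goto_position(9999, 2657))  # 2657, the end of the sequence
```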
BUFFer [Command]
provides management of the internal buffer space used to store all text values. The available
commands are: SHOW to show how full the buffer is, COMPRESS to clear
deleted strings if the buffer is filling up, and LIMIT to set the maximum space used
before a compress is done automatically.
Change
returns your session to Screen Mode. Note that the entire command is optional and a simple
[NO]COMPlement
reverses the direction in which the sequence is annotated. The sequence display is unchanged
(so it will still match a figure in a paper for example) but the protein sequences are now the
translations in the reverse direction and new features are automatically set to be in the
reverse direction. When you specify a "from" and "to" position in reverse direction, the
positions are displayed as "complement(to..from)" in the feature table.
You can also use keys
[NO]ENDFind
makes sequence searches put the cursor at the end of the search pattern instead of the start.
EXit [FileName]
works exactly like Write followed by SEQWrite except that the session with
NewFeatures ends after the feature table is written out into a new feature file, and
the sequence (if modified) is written out into a new sequence file.
Find
is the command line version of the > or
[n] FRAme
moves the cursor to the present reading frame (taking the cursor position as the first base in
the codon), and turns on the PROTein option.
[s] Go
goes to position [s] in the sequence. This is the command line version of typing the base
position and hitting
Help
shows the commands available to the Screen and Command Modes of NewFeatures. You
can also use the
NUCleotide
uses the nucleotide sequence for searching, and moves the cursor to the sequence line at
the present base position. All searches will use the IUB base ambiguity codes. The
PROTein command is used when searching in the protein translations.
[NO]PROtein
uses the protein sequences for searching, and moves the cursor to the reading frame that has
the present cursor position as the first base in a codon. All searches will now be for exact
matches, until the NUCLeotide command resets the option.
Quit
terminates a session with NewFeatures without writing a new features or sequence
file. Use Quit instead of Ctrl-Y to terminate a session with
NewFeatures. (If you use Ctrl-Y, the next time you run NewFeatures
from the same directory, it will try to recover from what it assumes was a system crash during
your previous session.)
[n] READ [method]
sets the priorities for reading other entries by various methods. The available methods are
ORACLE to use EGenCom:Grab.Com to unload an entry, DAT to read an
existing file called "accno.dat", FLAT to read a flatfile called "accno.seq" (and perhaps
accno.ft for the feature table), and GCG to read the entry from the GCG version of the
EMBL database.
The command SHOW READ shows the current priorities. A value of 1 is the highest
priority, a value of 5 is the lowest. A value of 0 (zero) means that the method is not to be used.
In addition to the priority values, methods with the same priority are tried in the order
ORACLE, FLAT, DAT, GCG.
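The resulting order in which methods are tried can be sketched in Python. This is an illustration of the rules stated above, not the program's own code; the function name read_order and the dictionary input are assumptions.

```python
# Order used to break ties between methods with the same priority.
TIE_ORDER = ["ORACLE", "FLAT", "DAT", "GCG"]

def read_order(priorities):
    """Return the methods in the order they would be tried.

    priorities maps a method name to 1 (highest) .. 5 (lowest);
    0 or a missing entry means the method is not used.
    """
    enabled = [m for m in TIE_ORDER if priorities.get(m, 0) > 0]
    # sorted() is stable, so equal priorities keep the TIE_ORDER sequence.
    return sorted(enabled, key=lambda m: priorities[m])

print(read_order({"ORACLE": 0, "DAT": 1, "FLAT": 1, "GCG": 2}))
```

In this example ORACLE is switched off, FLAT and DAT share priority 1 and so are tried in tie order, and GCG comes last.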
REDraw
redraws your terminal screen. This is useful if a system message appears on your screen.
You can also use the Ctrl-W key in Screen Mode.
Seq
reports the sequence name and length at the bottom of the screen.
[f]SHOW [option]
shows valid feature keys, qualifiers, and controlled vocabulary. Also shows loaded entries,
marked sequence positions, and so on. The possible options are:
KEY - a list of all valid feature keys and mandatory qualifiers
QUAL - a list of valid qualifiers for the feature key of the present line (the [f] value can be
used to move to another feature line in one command).
BROWSE - a list of the other entries loaded, with number of features and sequence length
MARK - a list of marked sequence positions (set with the MARK command).
LABEL - a list of known labels in the present entry.
READ - lists the priority values for methods of reading entries, and shows the actual order in
which the methods will be tried.
[NO]SHOWREVerse
turns on (or off) the display of the complementary strand on screen. As this takes up an extra
line that could be used to display more features, the complementary strand display is off by
default.
[NO]SINGLEcommand
returns to screen mode after each command, or (NOSINGLE) remains in command mode until
a null command (just typing
[n]TABle [name]
sets the default translation table to be the one with standard name [name] or with ID number
[n] in Appendix V of the Feature Table Definition.
The default table is normally number 1 (name: SGC0). Other tables should be set
with /transl_table qualifiers so that other software can translate the features correctly. A
common use for this command is to see the protein sequence displayed on screen using an
alternative translation table.
[NO]THRee
uses 3-letter or 1-letter codes for amino acids. The single letter codes are easier to see and can
be learned quickly once you begin to use them. The 1-letter code is shown under the first base
of the codon in each of the three reading frames. The 3-letter code is shown under the three
bases of the codon in each of the three reading frames. This option can be useful where a
paper has used the 3-letter codes.
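The layout of a 1-letter translation line can be sketched in Python. This is an illustration only: it hard-codes the standard genetic code rather than reading a translation table, the function name frame_line is hypothetical, and codons containing ambiguity symbols are simply shown as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def frame_line(seq, frame):
    """Return a display line with the 1-letter code under the first base
    of each codon. frame is 0, 1 or 2 (offset of the first codon)."""
    seq = seq.upper()
    line = [" "] * len(seq)
    for i in range(frame, len(seq) - 2, 3):
        line[i] = CODON_TABLE.get(seq[i:i + 3], "X")
    return "".join(line).rstrip()

seq = "AGCTTCCGT"
print(seq)
for frame in range(3):
    print(frame_line(seq, frame))
```

Each of the three lines is offset by one base from the previous one, which is how the three reading frames stay aligned with the nucleotide sequence above them.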
WQ [filename]
writes the feature table to a file and quits.
FEATURE EDITING COMMANDS
/[+][qualname][=][qualvalue]
enters qualifiers and their values for the current feature line.
The qualifier name is prompted for if none is entered. If the name is invalid the command
fails. If the name is ambiguous (for example "polyA" could be "polyA_signal" or "polyA_site") a
list of alternatives is given in a numbered menu. If the qualifier has a controlled vocabulary
the F13 key can be used to start a query to select a suitable value. If a controlled
vocabulary is available, the message "or [F13] for Controlled Vocab" appears at the bottom of
the screen.
If the qualifier has a value, one must be entered in the correct syntax.
To edit an existing qualifier, specify the name but not the value. You will then be able to edit
the current value.
If you wish to cancel the command, entering a null qualifier name or value will not change
anything.
To create a new qualifier when one already exists (for example, a second "note"), use "+" in
front of the qualifier name.
As a short-cut when the same qualifier is required on several features, the command
"/=" will repeat the previous qualifier name and value. This command is available
on the keypad with
[f] ACCno [accno]
sets the location of a feature (usually an exon) to be in another entry. The program will offer
to load the entry if it is not already loaded.
[n f]ADDEXon
adds a new exon number [n] to the group for feature [f], which can be included using the
USE command. Note that this moves all higher exon numbers up by one. The
intended uses are for alternate splicing and for adding internal exons to existing feature table
entries.
[NO]BETWeen
sets the feature location as from^to or from..to.
BROWse [accno]
sets the display to the feature table of another entry loaded automatically when a location was
specified in that entry, or with the LOAD command.
[f] CHEck
checks a coding feature for a start codon at the beginning, a stop codon at the end, and no stop
codons in the rest of the sequence. The feature line must be CDS, mat_peptide or sig_peptide.
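The three checks can be sketched in Python as follows. This is an assumption-laden illustration, not the CHEck implementation: it handles only a simple forward-strand from..to range that includes the stop codon, it assumes the standard stop codons TAA, TAG and TGA, and it accepts only ATG as a start codon unless told otherwise. The function name check_coding is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed

def check_coding(seq, frm, to, start_codons=("ATG",)):
    """Return a list of problems found in the coding range frm..to (1-based)."""
    region = seq[frm - 1:to].upper()
    codons = [region[i:i + 3] for i in range(0, len(region) - 2, 3)]
    problems = []
    if not codons or codons[0] not in start_codons:
        problems.append("no start codon at the beginning")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no stop codon at the end")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems

print(check_coding("CCATGAAATAGCC", 3, 11))  # no problems: ATG AAA TAG
print(check_coding("ATGTAAAAATAG", 1, 12))   # TAA inside the range
```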
CURrent
sets the display to the current feature table so that editing can continue.
[b e] DELFeat
deletes features [b] to [e]. If only one line is given, only one line is deleted. If no line is given
the current feature line is deleted.
You can also delete the current feature line with the
The UNDELFeat command restores the last deleted feature line, including all
qualifier values.
[f] DELQual [qualname]
deletes the qualifier and its value from feature line [f]. If [f] is not given, the current feature
line is used. This is the command line version of the
[f f] DUPlicate
makes a copy of feature [f], for example to create a second "CDS" feature from a polycistronic
(bacterial) mRNA. If [f] is not given, the current feature line is used. If the second [f] is not
given, the copies will be together. If the second [f] is higher than the last feature line, the copy
will be at the bottom of the feature table.
This command (or the equivalent of deleting a feature and then undeleting it twice) is the
simplest way to create new feature lines. If the feature is a "join" of several exons, the same
exons are used for all the duplicate features although each "join" feature can have its own
"from" and "to" positions within the range of the exons.
[f] [NO]END label
sets the label for the last exon of a join across entries. If [label] is ":" the program will prompt
for accession number and label.
[e f] EXONs
creates [e] exons (and [e-1] introns) and makes the location of feature [f] a "join" of all the
exons. The number of exons is prompted for if not given. If [f] is not given the current feature
line is used.
[b e] FEATure [key]
starts a new feature line, and prompts for the feature key if [key] is not given. If [b] and [e]
are given, new feature line(s) are created in that range, but normally the command is used
alone to create a new feature line at the bottom of the table.
If the feature key is "CDS" there is a prompt for the number of exons and an "mRNA" line is
also created. For 2 or more exons, the "prim_transcript", "exon" and "intron" lines are created
automatically.
If the feature key is "CDS ACROSS" there is a prompt for the accession number, label, "from"
and "to" position for each exon. If the entry is not yet loaded, there is a prompt for an
immediate load. Otherwise there is only a prompt for the "from" and "to" positions.
If the feature key is "repeat_unit" there is a prompt for the number of repeats, and the extra
lines are generated automatically.
You can also use the
[f] [NO]FORWard [GRoup]
sets the feature direction for feature line [f]. If [f] is not given, the current feature line is used.
If the command is followed by GRoup the entire feature group is set.
[p f] FRom [?] [<] [text-location]
sets the "from" position for a feature line. If [p] is not given, the present sequence position is
used. If [f] is not given, the current feature line is used.
In some cases the position is unknown but must still be included in the feature table as
"<1" (or ">end" for the complement direction). You can use 0 as the position
or put a "?" after the command to specify an unknown position.
If the exact position is not known, or it is beyond the start or end of the sequence, you can set
the "less than" sign (<) by typing "<" at the end of the command.
If the location is more complex (for example, (12.34) to specify a position within a
range) you can specify the location as text after the command. If the text is not a valid dot
range, it is accepted but a warning message is issued.
The "from" position is also used by "joined" features (such as CDS) to specify the start position.
By default the start position is the beginning of the first exon but this is overridden by setting
a "from" position on the feature line.
For joins across entries, if the join starts in another entry use the START command
to set the accno:range or accno:label of the first (partial) exon used.
[f f] GRoup
puts a feature into the same group as another feature, for example to link a new "CDS"
feature to existing "exon" features.
The first number defines any feature in the group you want to add to, and defaults to creating
a new group.
The second number defines the feature you are changing, and defaults to the current feature.
[f] [NO]HIde
specifies whether features will appear in the output file. "Hidden" features (for example,
exons that are only used to point to positions for a "joined" feature) appear on the screen with
the feature key in square brackets, for example [exon]. These feature lines are not
included in the output file.
The key is the same as NOHIDE. The
[f] [NO]INTrons
The NOINTrons command deletes all intron features for the current feature group.
It can be useful if you have exons that point to another entry, or if you have introns of zero
length.
The INTrons command creates all missing intron features for the current feature
group.
[f] KEy [key]
defines the feature key. If [f] is not given the current feature line is used. [key] is the feature
key to be used. If no feature key is given NewFeatures will prompt for a feature key.
The feature key is checked against a list of valid keys. If more than one feature could match
the key, NewFeatures will list the possible feature keys in a numbered menu.
If you decide to cancel the command, simply enter a null feature key.
[f] [NO]LABel [label]
sets the location to be a label rather than a range. This is usually used to specify labels for
exons. If the label used is defined by a /label qualifier on the same exon, the location will be
written out in the "from..to" form for the exon, but the label will be used in joins etc.
LOad [accno]
loads the feature table and sequence from another entry. The feature table can be displayed
on screen with the BROWSE command, and the sequence can be used to validate a
join across entries or to write out the peptide sequence from such a join.
[f] N
moves the current feature pointer to feature line [f]. If no number is given, the pointer moves
to the next feature. If -1 is given, the pointer moves to the previous feature.
In Screen Mode you can also use the
If you first type a number, the
[f] [NO]OPERator [operator]
sets (or clears) the join operator for a location. Valid operators are JOIN, ORDER, ONE-OF or
GROUP. In these cases the location is built from a set of exons for the group of features. The
NOOPER command resets the location to a simple base range.
[p f] ORF
sets the "from" position to the cursor position (or [p] if a position is given), moves the cursor to
the end of the open reading frame, and sets the "to" position for feature line [f]. If [f] is not
given, the current feature line is used. The "to" position is the last base before the stop codon,
unless the feature is a "CDS" when the "to" position is the end of the stop codon.
This command is only used for protein coding regions.
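The rule for the "to" position can be sketched in Python. The sketch assumes a forward-strand reading frame starting at the "from" position and the standard stop codons; the function name orf_to_position is hypothetical, and returning None when no stop codon is found is an assumption, since the behaviour of NewFeatures in that case is not described here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed

def orf_to_position(seq, frm, is_cds):
    """Return the 1-based "to" position for an ORF starting at frm.

    For a CDS the position is the last base of the stop codon;
    for other features it is the last base before the stop codon.
    Returns None if no in-frame stop codon is found (assumption).
    """
    seq = seq.upper()
    pos = frm - 1                      # 0-based index of the current codon
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos + 3 if is_cds else pos
        pos += 3
    return None

seq = "CCATGAAATAGCC"                  # ATG AAA TAG starts at base 3
print(orf_to_position(seq, 3, is_cds=True))   # 11, end of the stop codon
print(orf_to_position(seq, 3, is_cds=False))  # 8, last base before the stop
```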
PEPWrite [filename]
writes the translation of the current coding region to a file. If [filename] is not given, the
output file is "seqname".pep.
[f] QUALify
is the same as typing "/" (see above) but allows you to specify a feature other than
the current line.
[f] REName [qualname]
renames a qualifier to another of the same type, for example /note to /product. There is
a prompt for the new qualifier name.
RELOAD [accno]
reloads the feature table and sequence from another entry. The feature table can be displayed
on screen with the BROWSE command, and the sequence can be used to validate a
join across entries or to write out the peptide sequence from such a join. RELOAD is
useful if the feature table of the other entry has been changed during the current edit session.
[f] REPLace
specifies that the feature location is in the form replace(from..to,"subs") where "from" and "to"
are the normal "from" and "to" positions, and the substitute string is specified with the
SUBStitute command.
[f] SEArch [pattern]
searches for the text string [pattern] in the feature key and in the qualifier names and values,
starting with feature line [f] (or the current feature line).
This command can be used to find CDS features, or to find where a feature label is defined.
SORT [by]
sorts the feature lines by "from" and "to" positions. You can also specify SORT KEY
to sort by feature key (so all the repeats are together for example), or SORT GROUP
so all coding region and transcript features (with any exons/introns) for an entry are together.
The normal sort order when you read in an existing feature table is SORT GROUP.
[f] [NO]STArt label
sets the label for the first exon of a join across entries. If [label] is ":" the program will prompt
for accession number and label.
[f] STOp
moves the cursor to the end of the open reading frame, and sets the "to" position for feature
line [f]. If [f] is not given, the current feature line is used. The "to" position is the last base
before the stop codon, unless the feature is a "CDS" when the "to" position is the end of the
stop codon.
This command is only used for protein coding regions.
[f] [NO]SUBstitute ["replacement-sequence"]
Some features are changes to the sequence, using the "replace" operator. These are specified
by setting "from" and "to" positions normally, and using the SUBStitute command to
specify the replacement sequence, which can be a null string "" if the base range is simply
deleted.
[a b] [NO]TEST [text]
writes a file newfeatures.test with the program's version of the feature table. If the feature
table locations appear incorrect on the screen, make a note of the errors, and send a printed
copy of the newfeatures.test file to Peter Rice.
The [text] part of the command is used to generate extra output if requested. BUFFER
generates a report of internal buffer space use, BROWSE generates the browse read-only
feature tables, TABLE reports on loaded translation tables, QUAL reports valid qualifier
names, KEY reports valid feature keys, ITEM reports valid items in lists (for example, fixed
controlled vocabularies). ALL generates all extra details, but produces a very large output
file.
[f] [NO]TEXt [FROM/TO/LOC] text-location
Some "from" and "to" positions are not simple numbers or replace operators (see the
SUBStitute command), but must be specified as text. This is especially true for the
full location, as some problems, including RNA editing, are not well covered by
NewFeatures options.
The command NOTEXT returns to using normal "from" and "to" positions and
deletes the previous text.
[p f] TO [?] [>]
sets the "to" position for a feature line. If [p] is not given, the present sequence position is
used. If [f] is not given, the current feature line is used.
In some cases the position is unknown but must still be included in the feature table as a
">end" (or "<1" for the complement direction). You can use 0 as the position
or put a "?" after the command to specify an unknown position.
If the exact position is not known, or it is beyond the start or end of the sequence, you can set
the "greater than" sign (>) by typing ">" at the end of the command.
If the location is more complex (for example, (12.34) to specify a position within a
range) you can specify the location as text after the command. If the text is not a valid dot
range, it is accepted but a warning message is issued.
The "to" position is also used by "joined" features (such as CDS) to specify the end position. By
default the end position is the end of the last exon but this is overridden by setting a "to"
position on the feature line.
For joins across entries, if the join ends in another entry use the END command to
set the accno:range or accno:label of the last (partial) exon used.
TRANSlate
makes certain that the protein sequence lines are correct. This is done automatically after the
sequence is edited or reversed, but can be useful during an editing session to show the
updated translation of the new sequence.
The translation is otherwise only done when leaving SeqEdit mode to avoid problems in
keeping up with fast sequence entry speeds.
[NO]TRUNCate
determines whether protein sequences and checks are terminated at the first detected stop
codon or continue to the specified end position.
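The difference between the two settings can be shown with a minimal Python translation sketch. It assumes the standard genetic code and a reading frame that starts at the first base; the function and variable names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, truncate=True):
    """Translate `seq` codon by codon. With truncate=True the protein
    stops at the first stop codon; otherwise stop codons are shown as
    "*" and translation continues to the end of the sequence."""
    protein = []
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        amino = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for ambiguous codons
        if amino == "*" and truncate:
            break
        protein.append(amino)
    return "".join(protein)
```

With the sequence ATGAAATAAGGG, truncation gives MK, while continuing to the end gives MK*G.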
[f f] UNDELFeat
restores the last feature line deleted. If repeated, this command will make multiple copies of a
feature.
The same function is provided by the
UNLOAD [accno]
unloads the feature table and sequence from another entry if loaded for browsing.
[b e] [NO]USe
specifies which "exons" are used in a "join" feature.
By default, all exons are included in a "join", with the exception of those before the "from" or
after the "to" position set on a feature. If alternative splicing occurs, this can be annotated by
copying the "CDS" or other features, moving the pointer to one of the "join" feature lines, and
using the command 2 NOUSE to remove exon 2 from the join operation. The exon is
still used for all other "join" features in the group, and can be restored for this feature by the
command 2 USE.
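The effect of NOUSE on a single "join" feature can be sketched in Python. The exon list and the function name are hypothetical; only the join(...) location syntax comes from the feature table format.

```python
def join_location(exons, unused=()):
    """Build a join location from (from, to) exon pairs, leaving out the
    1-based exon numbers listed in `unused` (as after NOUSE)."""
    parts = [
        f"{start}..{end}"
        for number, (start, end) in enumerate(exons, start=1)
        if number not in unused
    ]
    if len(parts) == 1:
        return parts[0]
    return "join(" + ",".join(parts) + ")"

exons = [(1, 100), (200, 300), (400, 500)]
```

Here join_location(exons) uses all three exons, while join_location(exons, unused={2}) describes the alternatively spliced form without exon 2. Other features in the group keep their own list of unused exons.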
Write [FileName]
writes the current form of the features table into a file. If you name a file,
NewFeatures writes the features table into a file with that name instead of the name of
the input file. For example, Write temp.ft would write the features table to
a file called temp.ft. The default file extension is ".ft" but any extension may be specified.
SEQUENCE EDITING COMMANDS
If sequence editing is allowed, the bottom right corner of the screen shows the string
SEQEDIT. The sequence editing commands are only allowed when SEQEDIT is on. Any
character typed in Screen Mode is inserted in the sequence before the present cursor position (unless
OverStrike mode is in use).
s f Delete
deletes some or all of the sequence. You must specify a beginning and ending coordinate for
the range of symbols you want to delete.
[s] Include [FileName]
includes another sequence within the sequence being edited at the current cursor position or
at the position specified by the optional parameter. If no FileName is given,
NewFeatures will prompt for one.
INSert
Specifies insert mode when sequence editing. The bottom right corner of the screen shows the
string INS SEQEDIT.
[s] Mark markcharacter
You can mark the position where the cursor is in a sequence if you wish to return to it later.
You give the marked position a letter (for example MARK x) using this command.
Then, in Screen Mode, a single quote followed by the letter used to mark the sequence (for
example 'x) will move the cursor back to the sequence position where that mark was
defined.
[f] MUTate
swaps the current sequence with the substitute value of a "replace" feature location.
Note that this can cause conflicts in the sequence locations of overlapping "replace" locations.
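A rough Python sketch of the swap follows. The function name and return values are hypothetical, and the way the program itself records the new "to" position is an assumption.

```python
def mutate(seq, start, end, substitute):
    """Swap bases start..end (1-based, inclusive) with `substitute`.
    Returns the new sequence, the new "to" position, and the new
    substitute value (the bases that were removed)."""
    original = seq[start - 1:end]
    new_seq = seq[:start - 1] + substitute + seq[end:]
    new_end = start + len(substitute) - 1
    return new_seq, new_end, original
```

When the substitute has a different length from the range it replaces, every position after it shifts, which is why overlapping "replace" locations can come into conflict. Applying the swap a second time restores the original sequence.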
OVERstrike
Specifies overstrike mode when sequence editing. The bottom right corner of the screen shows
the string OVER SEQEDIT.
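The difference between insert and overstrike mode can be shown with a small Python sketch. The function name is hypothetical and cursor positions are 1-based.

```python
def type_base(seq, cursor, base, overstrike=False):
    """Return the sequence after typing `base` at 1-based `cursor`.
    Insert mode puts the base before the cursor position; overstrike
    mode replaces the base at the cursor position."""
    i = cursor - 1
    if overstrike:
        return seq[:i] + base + seq[i + 1:]
    return seq[:i] + base + seq[i:]
```

Typing T at position 2 of ACGT gives ATCGT in insert mode and ATGT in overstrike mode.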
SEQEDit
enables sequence editing. Sequence editing is turned off until enabled with this command, so
that commands are not accidentally entered as parts of the sequence.
If sequence editing is allowed, the bottom right corner of the screen shows the string
SEQEDIT.
[s f] SEQWrite [FileName]
writes the current form of the sequence into a file. If you supply starting and finishing
coordinates, NewFeatures only writes the indicated segment. For example,
1,56 SEQWrite would write bases 1 to 56 into a file. If you name a file,
NewFeatures writes the sequence into a file with that name instead of the name of
the input file.
RESTRICTIONS
NewFeatures now has very few limits that affect normal use. The program has the
following upper limits on storage:
Limits on Feature Table Size
Browse (read-only) feature tables: 200
Total feature lines in browse (read-only) feature tables: 2000
Buffer space for general storage: 750kb
EC numbers and names: 5000
Feature lines in main table: 1500
Exons in a single join: 200
Groups of features (joins plus exons) in main table: 500
Lists of items (qualifier names, modbases, etc.): 1000
Items in lists: 2000
Valid qualifiers: 200
Valid feature keys: 200
Maximum total qualifier length for a single feature: 2047
The System Must Know Your Terminal Type
To use the keypad controls in this program, you must have your terminal set up to correctly
emulate a VT200 terminal. So far, we have been unable to find out how to do this completely,
although on X-terminals many of the keys are active.
If you are an expert in "termcap", or know someone who is, please contact the EGCG support
team as we would very much like to support this capability on Unix.
Until then, there is always the option of using the command line, or we could rewrite the
keypad interpreter to use an alternative set of keystrokes.
ACKNOWLEDGEMENTS
NewFeatures was written by Peter Rice (now at the Sanger Centre, Hinxton, UK) while in
the EMBL Computer Group. It uses the original code of SeqEd for sequence editing and
pattern searching.
The original proposal and specifications for NewFeatures came from Kate Rice.
Suggestions for the program have come from many EMBL Data Library staff, especially Bernd
Roechert, Guenter Stoesser and David Emmert. Further ideas are always welcome.
COMMAND-LINE SUMMARY
All parameters for this program may be put on the command line. Use the option -CHEck
to see the summary below and to have a chance to add things to the command line before the
program executes. In the summary below, the capitalized letters in the qualifier names are the
letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose
qualifiers or parameter values that are optional. For more information, see "Using Program
Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG User's Guide.
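The abbreviation rule can be sketched in Python as below. The function names are hypothetical, and the sketch assumes that matching ignores case and that any prefix at least as long as the capitalized part is accepted. It does not handle the [NO] prefix.

```python
def min_abbrev(name):
    """Return the leading capital letters of a qualifier name,
    for example "CHEck" gives "CHE"."""
    n = 0
    while n < len(name) and name[n].isupper():
        n += 1
    return name[:n]

def matches(typed, name):
    """True if `typed` is an acceptable abbreviation of `name`."""
    typed = typed.upper()
    return len(typed) >= len(min_abbrev(name)) and name.upper().startswith(typed)
```

Under these assumptions -che and -check both select -CHEck, while -ch is too short.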
LOCAL DATA FILES
The files described below supply auxiliary data to this program. The program automatically reads
them from a public data directory unless you either 1) have a data file with exactly the same name in
your current working directory; or 2) name a file on the command line with an expression like
-DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the
User's Guide.
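The lookup order can be sketched in Python. The function and parameter names are hypothetical, and giving a file named on the command line precedence over a file in the working directory is an assumption; the text above lists the two cases without ranking them.

```python
from pathlib import Path

def resolve_data_file(name, public_dir, cwd=".", override=None):
    """Choose which copy of a local data file would be read."""
    if override is not None:        # named on the command line
        return Path(override)
    local = Path(cwd) / name
    if local.is_file():             # same name in the working directory
        return local
    return Path(public_dir) / name  # default: public data directory
```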
Customizing Your Keyboard With SetKeys
You can use the program SetKeys to create a set.keys file that tells the editors SeqEd,
GelEnter, LineUp, and GelAssemble how to interpret the letters you type at the terminal.
When entering gel readings, it is useful to have the symbols for G, A, T, and C under the
fingers of one hand in the same positions as the lanes in your gel. SeqEd, GelEnter, LineUp,
and GelAssemble automatically read the file set.keys if it is present in your local directory. If
set.keys is absent, or if the sequence type is set to Protein (in SeqEd and LineUp, only) the
terminal keys retain their conventional meanings.
If you have a set.keys file in your directory, SeqEd, GelEnter, LineUp, and GelAssemble only
respond to the sequence characters that it redefines. You can edit the file set.keys with a text
editor if some of the keys you want to use are not in it. Any keys not mentioned in set.keys
appear to be dead.
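The behaviour described above can be pictured with a Python sketch. The dictionary stands in for the contents of set.keys, whose actual file format is not shown here, and the key assignments are invented for illustration.

```python
def interpret_key(typed, key_map=None):
    """Return the sequence character for a typed key.
    With no set.keys file (key_map is None) keys keep their normal
    meaning; otherwise keys missing from the map are dead (None)."""
    if key_map is None:
        return typed
    return key_map.get(typed)

# Hypothetical mapping that puts G, A, T and C under one hand
key_map = {"j": "G", "k": "A", "l": "T", ";": "C"}
```

With this mapping, typing j enters G, while typing x does nothing because x is not mentioned in the map.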
Several keys are vital for the control of SeqEd, LineUp, GelEnter, and GelAssemble; this
means you are not allowed to redefine the keys for /, [, ],
{, }, (, ), :, ,, 1, 2,
3, 4, 5, 6, 7, 8, 9,
0,
Preset Command Line Options
The command line options below may be preset with a local data file called NewFeatures.init.
The command line could have an expression like -INITialize=MyFileName
if you want to use a different file name. For example, you could create a file called
FlatFeat.init which sets /FLATfile=1 to load from a flat file whenever possible, then start
NewFeatures with the command newfeatures -Init=flatfeat.init.
OPTIONAL PARAMETERS
The parameters and switches listed below can be set from the command line. For more information,
see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the GCG
User's Guide.
-ORAcle=2
sets the priority (in the range 1-5) for loading an entry from the local database (Oracle when
used at EMBL) using the EGenCom:Grab.Com command procedure. Low numbers are used
first. Specify 0 to turn off this method of reading an entry.
-GCGdatabase=3
sets the priority (in the range 1-5) for loading an entry from the GCG EMBL or EMNEW
database. Low numbers are used first. Specify 0 to turn off this method of reading an entry.
-DATfile=2
sets the priority (in the range 1-5) for loading an entry from an EMBL or GCG format file with
the extension .dat. Low numbers are used first. Specify 0 to turn off this method of reading
an entry.
-FLATfile=4
sets the priority (in the range 1-5) for loading an entry from a flat file with the extensions .seq
for the sequence and .ft for the feature table. Low numbers are used first. Specify 0 to turn
off this method of reading an entry.
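The way the four priorities combine can be sketched in Python. The function name is hypothetical, and the order in which two sources with the same priority are tried (here, the order in which they are listed) is an assumption.

```python
def load_order(priorities):
    """Return the source names in the order they would be tried.
    Low numbers come first and 0 turns a source off."""
    enabled = [(p, name) for name, p in priorities.items() if p != 0]
    enabled.sort(key=lambda item: item[0])  # stable: ties keep listed order
    return [name for _, name in enabled]

# Default values shown for the four options above
DEFAULTS = {"ORAcle": 2, "GCGdatabase": 3, "DATfile": 2, "FLATfile": 4}
```

With the defaults, the flat files are tried last. Setting FLATfile to 1 and ORAcle to 0 tries the flat files first and never uses the local database.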
-NOSINGlecommand
sets NewFeatures to remain in Command Mode after processing a command.
-TRANSlate=TableName
gives an alternative translation table, for example for organelle sequences. Valid translation
tables are listed in section 7.5.5 of Appendix V in the DDBJ/EMBL/GenBank
Feature Table Definition or in the Data Library documentation. New tables are easily
built by editing an existing table to create a file in your own directory and using that filename
as the TableName.
"TableName" can be the standard name or the ID number of the table. The default is table 1
(name: SGC0).
-TRace=TraceFile
requests a trace of a feature table load to identify problems in converting existing feature
tables into features, groups and exons.
-TESTfile=newfeatures.test
specifies the output file name for the TEST command.
-CIRCular
specifies that the input sequence is circular. This at present has no effect, but could be used in
future to direct automatic generation of joins across the end of the sequence if there is a
demand.
-SHOWREVerse
displays both strands of the sequence on the screen.
-KEYfile=EGenRunData:newfeatures.key
sets the name of the control file that specifies feature keys and their permitted and mandatory
qualifiers.
-QUALfile=EGenRunData:newfeatures.qualify
sets the name of the control file that specifies feature qualifiers and their possible values.
-[NO]ECfile=EGenRunData:newfeatures.ec
sets the name of the reformatted ENZYME database used to convert EC numbers into
standard enzyme names. The -NOECfile option makes NewFeatures start
faster by not reading the ENZYME database, but product names cannot then be automatically
updated.
-TABLEfile=EGenRunData:newfeatures.tables
sets the name of the file that contains the GCG version of the translation tables.
Printed: April 22, 1996 15:54 (1162)
Screen Mode Commands
[n] is an optional numeric parameter.
G, A, T, C, . . . - insert a sequence character
+---------+---------+---------+ +---pf1---+---pf2---+---pf3---+---pf4---+
! ! !UNDELFEAT! ! ! ! !UNDELFEAT!
! ! ! ! ! ! ! ! !
! FIND ! NEWFEAT ! DELFEAT ! ! GOLD ! HELP ! FIND ! DELFEAT !
+---------+---------+---------+ +----7----+----8----+----9----+---dash--+
! NOSHOW ![GO-FROM]! [GO-TO] ! ! COMMAND !NOBETWEEN!NOREPLACE! /= !
! ! ! ! ! ! ! ! !
! SHOW ! 50-LEFT ! 50-RIGHT! ! SUBS ! BETWEEN ! REPLACE ! DELQUAL !
+---------+---------+---------+ +----4----+----5----+----6----+--comma--+
! (GROUP) ! (GROUP) !COMPLEMEN! TOP !
Other useful keys: ! ! ! ! !
! FORWARD !NOFORWARD! NOCOMPL ! PREV-FT !
/ add or change qualifier +----1----+----2----+----3----+--enter--+
? find a sequence pattern ! FROM < ! TO > ! ORF ! !
< move left 50 bases ! ! ! ! BOTTOM !
> move right 50 bases ! FROM ! TO ! STOP ! !
left move left +---------0---------+---dot---+ +
right move right ! BROWSE prev ! NEWFEAT ! !
up move between nuc/protein ! ! ! NEXT-FT !
down move between nuc/protein ! BROWSE next ! NUMBER ! !
: enter command mode +---------+---------+---------+---------+
ctrl-H start of sequence
ctrl-E end of sequence
Top-row keys:
+---help--+---------Do--------+ +---f17---+---f18---+---f19---+---f20---+
! ! ! ! ! ! ! !
! ! ! ! ! ! ! !
! HELP ! COMMAND ! ! ! ! ! CURRENT !
+---------+---------+---------+ +---------+---------+---------+---------+
Press
General Commands
Commands end with
Feature Editing Commands
Commands end with
Sequence editing commands:
Commands end with
Minimum syntax: % newfeatures [-INfile1=]sample.seq -Default
Prompted Parameters: None
Local Data Files
[-INITialize=]newfeatures.init command line initializing file
-KEYfile=filename valid feature keys and qualifiers
-QUALfile=filename valid qualifier names and values
-[NO]ECfile=filename reformatted ENZYME database. -NOEC skips
-TABLEfile=filename GCG version of translation tables
Optional Parameters:
-ORAcle=2 priority for loads from local (Oracle) database
-GCGdatabase=3 priority for loads from GCG database
-DATfile=2 priority for loads from a .DAT file
-FLATfile=4 priority for loads from .FT and .SEQ files
-[NO]SINGlecommand stays in command mode or returns to screen mode
-TRANSlate=sgc0 default translation table
-TRace=filename trace parsing of an existing feature table
-TESTfile=filename output file for the TEST command
-CIRCular treat sequence as circular (not yet implemented)
-SHOWREVerse show both strands on screen
-PAUse read messages before screen mode