NewFeatures is an interactive editor for entering and modifying the feature table and for minor editing of the sequence itself.
NewFeatures uses the screen of your terminal as a window into a data library entry. It works like the VMS text editor EDT or the GCG sequence editing program SeqEd. You can add, modify and delete feature table items by either typing the sequence positions or by searching for known short sequences and pointing to the correct feature position with the cursor.
You can also update the sequence if it is found to be incorrect. Changes you make in the sequence take place at the cursor position and are reflected immediately on the screen. Feature positions are updated automatically. You can insert or delete bases, move the cursor, and search for patterns.
If the feature table already exists, either from an existing entry, or from the output of a previous NewFeatures run, the feature table will be loaded and an attempt will be made to create the correct feature groups for coding sequences and repeats.
The feature table can be in a file in your current directory, in the GCG copy of the database, or loaded by a locally defined "Grab". There are command line options to tell NewFeatures where to look for the latest version of the feature table and sequence.
The command file EGenCom:Grab.Com (if present) is executed to access local data collections.
NewFeatures will let you change the positions of the keys on your terminal keyboard to make it more convenient to enter the letters G, A, T, and C for sequence editing. The method is the same as for the GCG program SeqEd.
This program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Hinxton Hall, Cambridge, CB10 1RQ, UK).
All EGCG programs are supported by the EGCG Support Team, who can be contacted by E-mail (egcg@embnet.org).
Users already familiar with the VMS EDT editor or the SeqEd program will learn to use NewFeatures quickly. When you run NewFeatures with a command like % NewFeatures x12345.dat, your screen will look something like this:
X12345.Dat                          NEWFEATURES
     FH   Key             Location/Qualifiers
     FH
1 >> FT   CDS             262..1299
     FT                   /product="amidase"
2 *  FT   mRNA            <1..>2657
    0        10        20        30        40        50        60        70
....|.........|.........|.......:.|.........|.........|.........|.........|....
     AGCTTCCGTGCGAATGATGGCATGCATGCTATCTCAGGCTCGCACCATGTGCTTTCGCGATCGCGCCGATTACA
     S  F  R  A  N  D  G  M  H  A  I  S  G  S  H  H  V  L  S  R  S  R  R  L  H
      A  S  V  R  M  M  A  C  M  L  S  Q  A  R  T  M  C  F  R  D  R  A  D  Y  I
       L  C  P  E  *  W  H  A  C  Y  L  R  L  A  P  C  A  F  A  I  A  P  I  T
If you name a sequence file that already exists, NewFeatures will display the existing feature table in the top part of the screen followed by the sequence and the three protein sequence translations in the present direction (forward or reversed).
The nucleotide and protein sequences, the command line, and the message line are always in the bottom few lines of the screen. The rest of the screen is used to display the feature table. If you are editing a large feature table, you can use a workstation screen set to 55 lines to see more of the feature table than you can see on a normal terminal.
If the sequence you name does not exist,
NewFeatures will start in SeqEdit Mode (see below)
to allow you to enter the new sequence.
If you are creating a new feature table,
or if the source key is missing,
NewFeatures will create a source key and prompt you for the organism name.
The ">>" pointer in the feature table display shows the current feature line that is being edited.
Coding regions and groups of repeats have more than one feature line,
indicated by an asterisk (*)
on the other lines in the group.
The SEARCH command moves to the next feature (starting at the present line)
that contains specified text in the feature key or qualifiers.
The pointer is saved for each table,
so when you browse another table and return later you will be on the same feature line.
To move the cursor left or right,
use the arrow keys.
You can type a number followed by a carriage return and the cursor will move to that sequence position.
You can type a number followed by an arrow key to move that number of bases to the left or right.
You can move 50 bases to the right with the ">" key,
and 50 bases to the left with the "<" key.
The command PROtein in Command Mode moves the cursor to the protein sequence and makes NewFeatures search for an exact match to a protein sequence.
The commands NUCleotide and NOPROTein in Command Mode make searches recognize patterns containing IUB nucleotide ambiguity symbols.
By default,
NewFeatures shows only the forward strand on the screen.
The SHOWREVerse command,
and the command line option,
tell NewFeatures to also show the complementary strand.
This takes up one extra line on the screen,
and so leaves a little less space to display feature table lines.
To search for a sequence pattern,
type a ? or use the keypad FIND key.
If your cursor is on the nucleotide sequence,
NewFeatures will search for a nucleotide pattern,
using the normal ambiguity codes.
If your cursor is on one of the protein sequence translations,
NewFeatures will search for an exact peptide sequence (using the default translation table).
NewFeatures uses the same rules for pattern definition and recognition as the programs SeqEd,
FindPatterns,
MapPlot,
Map,
and MapSort.
Even if NewFeatures is searching the nucleotide sequence,
you can request a perfect match search by typing "=" after the "?".
For example,
?=RCT will only match RCT (case does not matter)
no matter which kind of sequence the cursor is on.
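The difference between an ambiguity search and a "=" perfect-match search can be illustrated with a short Python sketch. This is not the NewFeatures or GCG implementation: the function name find_pattern is hypothetical, and only ambiguity symbols in the search pattern are expanded (the full GCG pattern rules are richer than this).

```python
import re

# IUB nucleotide ambiguity symbols mapped to the bases they stand for.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def find_pattern(sequence, query, start=0):
    """Return the 0-based index of the first match at or after start, or -1.

    A leading "=" requests a perfect (literal) match; otherwise each
    symbol of the pattern is expanded with the IUB table above.
    Case does not matter.
    """
    exact = query.startswith("=")
    pattern = (query[1:] if exact else query).upper()
    seq = sequence.upper()
    if exact:
        return seq.find(pattern, start)
    regex = "".join("[" + IUB[symbol] + "]" for symbol in pattern)
    match = re.compile(regex).search(seq, start)
    return match.start() if match else -1
```

With this sketch, find_pattern("AAGCTACT", "RCT") finds GCT because R stands for A or G, while find_pattern("AAGCTACT", "=RCT") finds nothing because the letters R, C, T do not occur literally.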
Finding a "Marked" Position
You can mark a position in a sequence to which you wish to return.
You give the marked position a letter (like giving it a name)
using the Command Mode command Mark (see below).
Then,
in Screen Mode,
a single quote followed by the letter used to mark the sequence (for example 'x)
will move the cursor to the sequence position where that mark was defined.
The marks are not saved when you exit from NewFeatures.
New feature lines are created with the command FEATURE,
or more commonly with the keypad NEWFEAT key.
Once you have a feature line on the screen,
you can also create new lines by deleting any feature and then undeleting it more than once (just like "cut and paste" in an editor).
You can then change the type of the new copies with the KEY command,
and/or change their locations.
When you create a "CDS" feature,
several additional lines are automatically created.
These include "mat_peptide",
"mRNA",
and "prim_transcript" if they are regarded as "standard".
You are also asked how many exons the CDS has,
or whether it is a cDNA.
If the CDS is made up of one or more exons,
the "exon" and "intron" features are also created.
These are then used for defining the feature locations of the "joined" features.
You can also specify a join across entries for a new CDS feature.
You are then asked for the accession number of each exon,
and the other entries are loaded into memory automatically.
Feature locations at their simplest are in the form "from".."to" or simply "from" if both positions are the same (a modified base position for example).
You usually set the "from" position by moving the sequence cursor to the first base of the feature (using pattern searching,
or typing the sequence position followed by a carriage return) and then hitting keypad-1.
If you know the exact base number of the "from" position,
you can set it faster by typing the number and hitting keypad-1 immediately.
The sequence position cursor is not changed,
only the feature location is updated.
The "to" position is set in the same way,
using keypad-2.
You can "turn off" the "from" and "to" positions (set them back to "<1" and ">end")
by setting the position to unknown using the commands FROM ? and TO ?.
Setting the position to 0 (zero)
has the same effect.
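The rules for simple locations can be summarized in a short Python sketch. It is an illustration only, not the program's code: the function name format_location and its parameters are hypothetical, and an unknown position is represented here by 0 or None.

```python
def format_location(frm, to, seq_length, complement=False):
    """Build a simple feature location string.

    frm and to are given in the direction of annotation; 0 or None
    means "unknown". In the reverse direction the "from" position is
    the higher coordinate, so the two are swapped before printing.
    """
    low, high = (to, frm) if complement else (frm, to)
    low_text = str(low) if low else "<1"
    high_text = str(high) if high else ">" + str(seq_length)
    # A single position is written once, not as a range.
    text = low_text if low_text == high_text else low_text + ".." + high_text
    return "complement(" + text + ")" if complement else text
```

For a 2657-base entry, format_location(262, 1299, 2657) gives 262..1299, and format_location(0, 0, 2657) gives <1..>2657, the form shown for the mRNA line in the screen example.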
More complex feature locations are "joins" of several sequence ranges.
The individual ranges are set as exons,
and are automatically copied to the "join" features and to the "intron" features.
The exons usually represent the sections of the primary transcript that appear in the mRNA.
The "CDS" and "mat_peptide" features do not extend to the extreme ends of the exons,
so these are defined by setting a "from" and "to" position for the "CDS" or "mat_peptide" feature line as for a simple location.
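The relationship between exons, introns and a "joined" feature can be sketched in Python as follows. The names join_location and intron_ranges are hypothetical and the code is a simplified illustration (forward direction, one entry), not the program's own logic.

```python
def intron_ranges(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]

def join_location(exons, frm=None, to=None):
    """Build a join(...) location from exon ranges.

    exons is a list of (start, end) pairs in ascending order. frm and to
    optionally trim the feature (for example a CDS) so that it does not
    extend to the extreme ends of the first and last exons.
    """
    frm = exons[0][0] if frm is None else frm
    to = exons[-1][1] if to is None else to
    parts = []
    for start, end in exons:
        if end < frm or start > to:
            continue  # exon lies entirely outside the feature
        lo, hi = max(start, frm), min(end, to)
        parts.append(str(lo) if lo == hi else f"{lo}..{hi}")
    return parts[0] if len(parts) == 1 else "join(" + ",".join(parts) + ")"
```

For exons 1..100, 201..300 and 401..500, an mRNA would use all three ranges unchanged, while a CDS with "from" 50 and "to" 450 would be written join(50..100,201..300,401..450).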
Some other feature keys,
such as "tRNA", can also be made by processing of the original sequence.
You can convert a feature to a "joined" location with the EXONS command and convert back to a simple location with the NOOPER command.
In some cases the feature's location is a "group",
"order" or "one-of" several possible locations.
These are set in the same way as "join" locations,
except that there is no need for introns.
The OPER command sets the type of location.
There are also commands REPLACE and SUBSTITUTE which define a location in the form replace(from..to,"subs")
for mutants,
conflicts,
and so on.
The MUTATE command switches the original sequence and the substitute string.
Note that this can cause conflicts in an entry with a large number of overlapping "replace" locations.
In a few extreme cases,
it may not be possible to enter the feature location by editing exon positions (for example,
if RNA editing occurs at many positions).
In these cases,
the command TEXT LOC provides a location as an edited text string which is not further validated by the program.
There is a simple check for two possible text locations: sequences in the form "acgt" or dot ranges in the form (12.13).
Sequences are accepted as complete locations,
and dot ranges are accepted as complete locations or as "from" or "to" positions.
Any other text location or "from" or "to" position is accepted,
but a warning message is issued.
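The simple check described above might look something like this Python sketch. The regular expressions, the restriction to the letters a, c, g and t, and the function name check_text_location are all assumptions made for illustration; the real check may differ in detail.

```python
import re

# Assumed forms: a quoted or unquoted run of bases, and a dot range "(12.13)".
SEQUENCE = re.compile(r'^(?:"[acgt]+"|[acgt]+)$', re.IGNORECASE)
DOT_RANGE = re.compile(r'^\(\d+\.\d+\)$')

def check_text_location(text, is_endpoint=False):
    """Return (accepted, warning) for a text location.

    Every text is accepted, as described above. Dot ranges are valid
    both as complete locations and as "from"/"to" positions; sequences
    are valid only as complete locations. Anything else gets a warning.
    """
    text = text.strip()
    if DOT_RANGE.match(text):
        return True, None
    if SEQUENCE.match(text) and not is_endpoint:
        return True, None
    return True, "unrecognized text location: " + text
```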
Qualifiers are specified by typing "/name=value" in the same way as they appear in the final feature table.
The name need only be the first few letters if there is no other qualifier that can match.
The same is true for values which are limited by the controlled vocabulary and listed in the Appendix of the Feature Table Definition,
for example modified base names and repeat types.
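Abbreviated names of this kind are usually resolved by unique-prefix matching. The following Python sketch shows the idea; the function name match_qualifier and the short list of qualifier names are illustrative assumptions, not the program's actual tables.

```python
def match_qualifier(abbrev, valid_names):
    """Return the valid names that an abbreviation could stand for.

    An exact match wins outright. Otherwise every name that starts with
    the abbreviation (ignoring case) is returned: one result means the
    abbreviation is unambiguous, several mean the user must choose, and
    an empty list means the name is invalid.
    """
    key = abbrev.lower()
    exact = [name for name in valid_names if name.lower() == key]
    if exact:
        return exact
    return [name for name in valid_names if name.lower().startswith(key)]

# Illustrative subset of qualifier names only.
NAMES = ["note", "product", "partial", "anticodon", "cons_splice"]
```

Here match_qualifier("no", NAMES) returns only note, while match_qualifier("p", NAMES) returns both product and partial, the situation in which a choice has to be offered.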
For qualifiers with a larger controlled vocabulary the F13 key starts a query to select an organism name.
If this option is available,
you will see the message "or [F13] for Controlled Vocab" at the bottom of the screen.
The value is optional.
If you do not give one,
you will be offered the existing value (if any)
to edit.
If you decide to make no change,
simply press return.
Certain qualifiers may appear only once for a feature line,
while others (/note is an obvious example)
can occur any number of times.
To create a new copy of a qualifier,
put + before the qualifier name.
For example,
/+NOTE will create a new /note qualifier.
If several copies of a qualifier already exist,
you will be asked which copy you want to edit.
Qualifiers with complex values,
such as /anticodon,
can be entered in a simpler form.
As the "pos:" and "aa:" parts are fixed,
the program only requires a value of (32..34,Met)
and will fill in the rest.
For /cons_splice a value of (n,y)
is sufficient.
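The expansion of the short /anticodon form can be sketched in Python as below. The function name expand_anticodon is hypothetical; the sketch only shows how the fixed "pos:" and "aa:" parts could be filled in, and it also accepts a value that already contains them.

```python
import re

# Accepts "(32..34,Met)" as well as the full "(pos:32..34,aa:Met)".
ANTICODON = re.compile(r"\(\s*(?:pos:)?(\d+\.\.\d+)\s*,\s*(?:aa:)?(\w+)\s*\)")

def expand_anticodon(value):
    """Return the full /anticodon value for a short or full input form."""
    match = ANTICODON.fullmatch(value.strip())
    if match is None:
        raise ValueError("not an anticodon value: " + value)
    return "(pos:" + match.group(1) + ",aa:" + match.group(2) + ")"
```

With this sketch, expand_anticodon("(32..34,Met)") returns (pos:32..34,aa:Met).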
In some cases,
it is useful to be able to copy qualifiers to several other features.
One way is to add the qualifiers for one feature,
then delete and undelete it to create several copies with the same qualifiers.
You can also repeat the last qualifier with the command "/=", which is also available on the keypad.
Qualifier values can be too long to fit on a single line.
In such cases,
a text editor is available while you are entering a qualifier value.
You can move around using the arrow keys,
and make insertions and deletions as you wish.
The editing window will scroll up and down if there are too many lines to fit on the screen.
If an existing qualifier value is already too long to fit on one line,
you will be put into the editor mode immediately.
If the value reaches the end of the line while you are entering it,
you will also be put into the editor mode.
For qualifiers with a specified controlled vocabulary the F13 key starts a query to select a value.
If a controlled vocabulary is available,
you will see the message "or [F13] for Controlled Vocab" at the bottom of the screen.
In Screen Mode the cursor shows your position in the sequence.
You can move around in the sequence,
add bases,
delete bases,
and search for patterns.
You can insert any valid GCG nucleotide sequence symbol (GCG Program Manual Appendix III)
into the sequence by typing the symbol.
It will be inserted at the cursor (or the base at the cursor will be overwritten in OverStrike mode).
The INCLUDE command copies a sequence range from any file in GCG format into the present sequence.
The DELETE command deletes a specific base range in Command Mode.
Here is the summary of Screen Mode commands and the keypad layout in the on-line help.
Type a Ctrl-Z or : in Screen Mode to enter Command Mode,
or use the COMMAND key shown in the keypad layout.
NewFeatures command editing is modelled on VMS DCL command line editing.
NewFeatures will let you modify and execute previous commands as in DCL command line editing.
NewFeatures normally returns to Screen Mode after each command.
If you have used the command NOSINGle then NewFeatures remains in Command Mode until you enter a null command.
Only the capitalized portion of the commands described in the documentation below needs to be typed.
Some commands can be preceded with numeric parameters or followed by a file name or text.
The square brackets ( [ ] ) in the documentation below show command parameters that are optional (you can leave them out and will be prompted for values if needed).
All commands accept up to two numbers before the command name,
and a free text value after.
Command names may also be prefixed with NO.
If any of these options is not used by the command it will simply be ignored.
For example,
12 34 NOEXIT testing will simply exit.
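The command syntax described above (up to two numbers, an optional NO prefix, a command name that may be truncated to its capitalized portion, then free text) can be sketched in Python. The names parse_command and resolve are hypothetical, the command list is only a small sample taken from this document, and case-insensitive matching is an assumption.

```python
import re

# Sample of command names; the leading capitals are the minimum abbreviation.
COMMANDS = ["EXit", "Quit", "Go", "NUCleotide", "REDraw", "FRAme", "BUFFer"]

def resolve(word):
    """Return the command that word abbreviates, or None."""
    typed = word.upper()
    for spec in COMMANDS:
        required = re.match(r"[A-Z]+", spec).group()
        if typed.startswith(required) and spec.upper().startswith(typed):
            return spec
    return None

def parse_command(line):
    """Split a command line into (numbers, negated, command, text)."""
    tokens = line.split()
    numbers = []
    while tokens and len(numbers) < 2 and tokens[0].isdigit():
        numbers.append(int(tokens.pop(0)))
    if not tokens:
        return numbers, False, None, ""
    word = tokens.pop(0)
    command, negated = resolve(word), False
    if command is None and word.upper().startswith("NO"):
        command = resolve(word[2:])       # try again without the NO prefix
        negated = command is not None
    return numbers, negated, command, " ".join(tokens)
```

In this sketch, parse_command("12 34 NOEXIT testing") yields the numbers 12 and 34, the NO flag, the command EXit and the text "testing"; a command that uses none of these extras would simply ignore them.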
Here is the summary of Command Mode commands you would see with the command Help.
Each command is described in detail in the next section.
37 (or any other numerical value) will move the cursor to that sequence position.
Values less than 1 move to the start of the sequence,
and values greater than the sequence length move to the end.
BUFFer [Command] provides management of the internal buffer space used to store all text values.
The available commands are: SHOW to show how full the buffer is,
COMPRESS to clear deleted strings if the buffer is filling up,
and LIMIT to set the maximum space used before a compress is done automatically.
Change returns your session to Screen Mode.
Note that the entire command is optional.
[NO]COMPlement reverses the direction in which the sequence is annotated.
The sequence display is unchanged (so it will still match a figure in a paper for example)
but the protein sequences are now the translations in the reverse direction and new features are automatically set to be in the reverse direction.
When you specify a "from" and "to" position in reverse direction,
the positions are displayed as "complement(to..from)" in the feature table.
[NO]ENDFind makes sequence searches put the cursor at the end of the search pattern instead of the start.
EXit [FileName] works exactly like Write followed by SEQWrite except that the session with NewFeatures ends after the feature table is written out into a new feature file,
and the sequence (if modified)
is written out into a new sequence file.
Find is the command line version of the ? key in Screen Mode.
[n] FRAme moves the cursor to the present reading frame (taking the cursor position as the first base in the codon),
and turns on the PROTein option.
[s] Go goes to position [s] in the sequence.
This is the command line version of typing the base position and hitting return in Screen Mode.
Help shows the commands available in the Screen and Command Modes of NewFeatures.
You can also use the HELP key shown in the keypad layout.
NUCleotide uses the nucleotide sequence for searching,
and moves the cursor to the sequence line at the present base position.
All searches will use the IUB base ambiguity codes.
The PROTein command is used when searching in the protein translations.
[NO]PROtein uses the protein sequences for searching,
and moves the cursor to the reading frame that has the present cursor position as the first base in a codon.
All searches will now be for exact matches,
until the NUCLeotide command resets the option.
Quit terminates a session with NewFeatures without writing a new features or sequence file.
Use Quit instead of Ctrl-Y to terminate a session with NewFeatures. (If you use Ctrl-Y,
the next time you run NewFeatures from the same directory,
it will try to recover from what it assumes was a system crash during your previous session.)
[n] READ [method] sets the priorities for reading other entries by various methods.
The available methods are ORACLE to use EGenCom:Grab.Com to unload an entry,
DAT to read an existing file called "accno.dat",
FLAT to read a flatfile called "accno.seq" (and perhaps accno.ft for the feature table),
and GCG to read the entry from the GCG version of the EMBL database.
The command SHOW READ shows the current priorities.
A value of 1 is the highest priority,
a value of 5 is the lowest.
A value of 0 (zero)
means that the method is not to be used.
In addition to the priority values,
methods with the same priority are tried in the order ORACLE,
FLAT,
DAT,
GCG.
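The resulting search order can be worked out as in the following Python sketch. The function name read_order and the dictionary representation of the priorities are assumptions made for illustration.

```python
# Tie-break order for methods that share the same priority value.
DEFAULT_ORDER = ["ORACLE", "FLAT", "DAT", "GCG"]

def read_order(priorities):
    """Return the methods in the order they would be tried.

    priorities maps a method name to a value from 0 to 5, where 1 is the
    highest priority, 5 the lowest, and 0 (or a missing entry) means the
    method is not used. Python's sort is stable, so methods with equal
    priority keep the DEFAULT_ORDER sequence.
    """
    enabled = [m for m in DEFAULT_ORDER if priorities.get(m, 0) > 0]
    return sorted(enabled, key=lambda method: priorities[method])
```

For example, with ORACLE at 2, FLAT and DAT both at 1, and GCG at 0, the order tried would be FLAT, DAT, ORACLE, and GCG would not be used at all.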
REDraw redraws your terminal screen.
This is useful if a system message appears on your screen.
You can also use the Ctrl-W key in Screen Mode.
reports the sequence name and length at the bottom of the screen.
[f] SHOW [option] shows valid feature keys,
qualifiers,
and controlled vocabulary.
It also shows loaded entries,
marked sequence positions,
and so on.
The possible options are:
KEY - a list of all valid feature keys and mandatory qualifiers.
QUAL - a list of valid qualifiers for the feature key of the present line (the [f] value can be used to move to another feature line in one command).
BROWSE - a list of the other entries loaded, with number of features and sequence length.
MARK - a list of marked sequence positions (set with the MARK command).
LABEL - a list of known labels in the present entry.
READ - a list of the priority values for methods of reading entries, showing the actual order in which the methods will be tried.
[NO]SHOWREVerse turns on (or off)
the display of the complementary strand on screen.
As this takes up an extra line that could be used to display more features,
the complementary strand display is off by default.
[NO]SINGLEcommand returns to screen mode after each command,
or (NOSINGLE) remains in command mode until a null command is entered.
[n] TABle [name] sets the default translation table to be the one with standard name [name] or with ID number [n] in Appendix V of the Feature Table Definition.
The default table is normally number 1 (name: SGC0).
Other tables should be set with /transl_table qualifiers so that other software can translate the features correctly.
A common use for this command is to see the protein sequence displayed on screen using an alternative translation table.
[NO]THRee uses 3-letter or 1-letter codes for amino acids.
The single letter codes are easier to see and can be learned quickly once you begin to use them.
The 1-letter code is shown under the first base of the codon in each of the three reading frames.
The 3-letter code is shown under the three bases of the codon in each of the three reading frames.
This option can be useful where a paper has used the 3-letter codes.
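The layout of the 1-letter translations (one letter under the first base of each codon, in each of the three forward frames) can be reproduced with a short Python sketch. It is an illustration only: the function name frame_lines is hypothetical, and the table used is the standard genetic code, which corresponds to the default table number 1 mentioned above.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def frame_lines(sequence):
    """Return three display lines, one per forward reading frame.

    Each amino acid letter is placed in the column of the first base of
    its codon; codons containing other symbols are shown as X.
    """
    seq = sequence.upper()
    lines = []
    for frame in range(3):
        line = [" "] * len(seq)
        for i in range(frame, len(seq) - 2, 3):
            line[i] = CODON_TABLE.get(seq[i:i + 3], "X")
        lines.append("".join(line).rstrip())
    return lines
```

Printing the sequence followed by the three returned lines gives a display in the same style as the bottom of the NewFeatures screen.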
WQ [filename] writes the feature table to a file and quits.
/[+][qualname][=][qualvalue] enters qualifiers and their values for the current feature line.
The qualifier name is prompted for if none is entered.
If the name is invalid the command fails.
If the name is ambiguous (for example "polyA" could be "polyA_signal" or "polyA_site")
a list of alternatives is given in a numbered menu.
If the qualifier has a controlled vocabulary the F13 key can be used to start a query to select a suitable value.
If a controlled vocabulary is available,
the message "or [F13] for Controlled Vocab" appears at the bottom of the screen.
If the qualifier has a value,
one must be entered in the correct syntax.
To edit an existing qualifier,
specify the name but not the value.
You will then be able to edit the current value.
If you wish to cancel the command,
entering a null qualifier name or value will not change anything.
To create a new qualifier when one already exists (for example,
a second "note"),
use "+" in front of the qualifier name.
As a short-cut when the same qualifier is required on several features,
the command "/=" will repeat the previous qualifier name and value.
This command is also available on the keypad.
[f] ACCno [accno] sets the location of a feature (usually an exon)
to be in another entry.
The program will offer to load the entry if it is not already loaded.
[n f]ADDEXon adds a new exon number [n] to the group for feature [f],
which can be included using the USE command.
Note that this moves all higher exon numbers up by one.
The intended uses are for alternate splicing and for adding internal exons to existing feature table entries.
[NO]BETWeen sets the feature location as from^to or from..to.
BROWse [accno] sets the display to the feature table of another entry loaded automatically when a location was specified in that entry,
or with the LOAD command.
[f] CHEck checks a coding feature for a start codon at the beginning,
a stop codon at the end,
and no stop codons in the rest of the sequence.
The feature line must be CDS,
mat_peptide or sig_peptide.
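The three tests made by CHEck can be sketched in Python as follows. This is a simplified illustration, not the program's code: the function name check_coding is hypothetical, the input is assumed to be the already spliced coding sequence in the forward direction, and ATG and the three standard stop codons are assumed (the standard genetic code).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed

def check_coding(cds):
    """Return a list of problems found in a coding sequence (empty if none)."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    problems = []
    if not codons or codons[0] != "ATG":
        problems.append("no start codon at the beginning")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no stop codon at the end")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("stop codon inside the coding sequence")
    return problems
```

For example, check_coding("ATGGCCTAA") reports no problems, while check_coding("ATGTAAGCCTAA") reports an internal stop codon.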
CURrent sets the display to the current feature table so that editing can continue.
[b e] DELFeat deletes features [b] to [e].
If only one line is given,
only one line is deleted.
If no line is given the current feature line is deleted.
You can also delete the current feature line with the keypad DELFEAT key.
The UNDELFeat command restores the last deleted feature line,
including all qualifier values.
[f] DELQual [qualname] deletes the qualifier and its value from feature line [f].
If [f] is not given,
the current feature line is used.
This is the command line version of the keypad DELQUAL key.
[f f] DUPlicate makes a copy of feature [f],
for example to create a second "CDS" feature from a polycistronic (bacterial)
mRNA.
If [f] is not given,
the current feature line is used.
If the second [f] is not given,
the copies will be together.
If the second [f] is higher than the last feature line,
the copy will be at the bottom of the feature table.
This command (or the equivalent of deleting a feature and then undeleting it twice)
is the simplest way to create new feature lines.
If the feature is a "join" of several exons,
the same exons are used for all the duplicate features although each "join" feature can have its own "from" and "to" positions within the range of the exons.
[f] [NO]END [label] sets the label for the last exon of a join across entries.
If [label] is ":" the program will prompt for accession number and label.
[e f] EXONs creates [e] exons (and [e-1] introns)
and makes the location of feature [f] a "join" of all the exons.
The number of exons is prompted for if not given.
If [f] is not given the current feature line is used.
[b e] FEATure [key] starts a new feature line,
and prompts for the feature key if [key] is not given.
If [b] and [e] are given,
new feature line(s)
are created in that range,
but normally the command is used alone to create a new feature line at the bottom of the table.
If the feature key is "CDS" there is a prompt for the number of exons and an "mRNA" line is also created.
For 2 or more exons,
the "prim_transcript",
"exon" and "intron" lines are created automatically.
If the feature key is "CDS ACROSS" there is a prompt for the accession number,
label,
"from" and "to" position for each exon.
If the entry is not yet loaded,
there is a prompt for an immediate load.
Otherwise there is only a prompt for the "from" and "to" positions.
If the feature key is "repeat_unit" there is a prompt for the number of repeats,
and the extra lines are generated automatically.
You can also use the keypad NEWFEAT key.
[f] [NO]FORWard [GRoup] sets the feature direction for feature line [f].
If [f] is not given,
the current feature line is used.
If the command is followed by GRoup the entire feature group is set.
[p f] FRom [?] [<] [text-location] sets the "from" position for a feature line.
If [p] is not given,
the present sequence position is used.
If [f] is not given,
the current feature line is used.
In some cases the position is unknown but must still be included in the feature table as "<1" (or ">end" for the complement direction).
You can use 0 as the position or put a "?" after the command to specify an unknown position.
If the exact position is not known,
or it is beyond the start or end of the sequence,
you can set the "less than" sign (<)
by typing "<" at the end of the command.
If the location is more complex (for example,
(12.34)
to specify a position within a range)
you can specify the location as text after the command.
If the text is not a valid dot range,
it is accepted but a warning message is issued.
The "from" position is also used by "joined" features (such as CDS)
to specify the start position.
By default the start position is the beginning of the first exon but this is overridden by setting a "from" position on the feature line.
For joins across entries,
if the join starts in another entry use the START command to set the accno:range or accno:label of the first (partial)
exon used.
[f f] GRoup puts a feature into the same group as another feature,
for example to link a new "CDS" feature to existing "exon" features.
The first number defines any feature in the group you want to add to,
and defaults to creating a new group.
The second number defines the feature you are changing,
and defaults to the current feature.
[f] [NO]HIde specifies whether features will appear in the output file.
"Hidden" features (for example,
exons that are only used to point to positions for a "joined" feature)
appear on the screen with the feature key in square brackets,
for example [exon].
These feature lines are not included in the output file.
The SCREEN MODE
Moving the Current Feature Line Pointer
Moving the Sequence Position Cursor
Nucleotide and Protein Sequence Lines
Finding Patterns
Leaving Screen Mode
EDITING THE FEATURE TABLE
Creating Feature Lines
Setting Feature Locations
Adding and Changing Qualifiers
Copying Qualifiers
Editing Qualifier Values
EDITING THE SEQUENCE
Entering or Editing a Sequence
Deleting Bases in a Sequence
Screen Mode Summary
Screen Mode Commands
[n] is an optional numeric parameter.
G, A, T, C, . . . - insert a sequence character
Keypad Layout Diagram
+---------+---------+---------+    +---pf1---+---pf2---+---pf3---+---pf4---+
!         !         !UNDELFEAT!    !         !         !         !UNDELFEAT!
!         !         !         !    !         !         !         !         !
!  FIND   ! NEWFEAT ! DELFEAT !    !  GOLD   !  HELP   !  FIND   ! DELFEAT !
+---------+---------+---------+    +----7----+----8----+----9----+---dash--+
! NOSHOW  ![GO-FROM]! [GO-TO] !    ! COMMAND !NOBETWEEN!NOREPLACE!   /=    !
!         !         !         !    !         !         !         !         !
!  SHOW   ! 50-LEFT ! 50-RIGHT!    !  SUBS   ! BETWEEN ! REPLACE ! DELQUAL !
+---------+---------+---------+    +----4----+----5----+----6----+--comma--+
                                   ! (GROUP) ! (GROUP) !COMPLEMEN!   TOP   !
                                   !         !         !         !         !
                                   ! FORWARD !NOFORWARD! NOCOMPL ! PREV-FT !
                                   +----1----+----2----+----3----+--enter--+
                                   ! FROM <  !  TO >   !   ORF   !         !
                                   !         !         !         ! BOTTOM  !
                                   !  FROM   !   TO    !  STOP   !         !
                                   +---------0---------+---dot---+         +
                                   !    BROWSE prev    ! NEWFEAT !         !
                                   !                   !         ! NEXT-FT !
                                   !    BROWSE next    ! NUMBER  !         !
                                   +---------+---------+---------+---------+
Other useful keys:
/       add or change qualifier
?       find a sequence pattern
<       move left 50 bases
>       move right 50 bases
left    move left
right   move right
up      move between nuc/protein
down    move between nuc/protein
:       enter command mode
ctrl-H  start of sequence
ctrl-E  end of sequence
Top-row keys:
+---help--+---------Do--------+    +---f17---+---f18---+---f19---+---f20---+
!         !                   !    !         !         !         !         !
!         !                   !    !         !         !         !         !
!  HELP   !      COMMAND      !    !         !         !         ! CURRENT !
+---------+---------+---------+    +---------+---------+---------+---------+
Press
COMMAND MODE
Editing NewFeatures Commands
Editing Previous NewFeatures Commands
Returning to Screen Mode
Commands May Be Truncated
Parameters are Used with Commands
Command Mode Summary
Command Mode Summary - General
General Commands
Commands end with
Command Mode Summary - Feature Editing
Feature Editing Commands
Commands end with
Command Mode Summary - Sequence Editing
Sequence editing commands:
Commands end with
GENERAL COMMANDS
37
Seq
FEATURE EDITING COMMANDS