March 6, 2000

Determining Function from
Sequence
General Books
Adams, M. D., Fields, C. and Venter, J. C. (1994). Automated DNA
Sequencing and Analysis. New York: Academic Press, 368 pages.
Baldi, P. and Brunak, S. (1998).
Bioinformatics: The Machine Learning Approach (1st ed.). Cambridge,
MA: The MIT Press.
Baxevanis, A. D. and Ouellette, B. F. F.
(1998). Bioinformatics: A Practical Guide to the Analysis of Genes
and Proteins. New York, NY: John Wiley & Sons, Inc.,
356 pages.
Bishop, M. J. (1994). Guide to Human Genome Computing. London:
Academic Press, 350 pages.
Brutlag, D. L. and Sternberg, M. J. E. (1996). Sequences and
Topology. London: Current Biology Ltd., 427 pages.
Creighton, T. E. (1993). Proteins: Structures and Molecular
Properties (2nd ed.). New York: Freeman.
Cover, T. M. and Thomas, J. A. (1991). Elements of Information
Theory (1st ed.). New York NY: John Wiley and Sons Inc.
Doolittle, R. F. (1986). Of Urfs and Orfs: A Primer on How to
Analyze Derived Amino Acid Sequences. University Science Books, Mill
Valley, California.
Doolittle, R. F. (1990). Molecular Evolution: Computer Analysis of
Protein and Nucleic Acid Sequences (1st ed.). Methods in Enzymology
Volume 183, New York: Academic Press.
Doolittle, R. F. (1996). Computer Methods
for Macromolecular Sequence Analysis. (Vol. 266). New York: Academic
Press. 711 Pages.
Durbin, R., Eddy, S., Krogh, A. and
Mitchison, G. (1998). Biological Sequence Analysis (1st ed.).
Cambridge, UK: Cambridge University Press.
Fasman, G. D. (1989). Prediction of Protein Structure and the
Principles of Protein Conformation. New York NY: Plenum Press.
Gribskov, M. and Devereux, J. (1991). Sequence Analysis Primer.
New York: Stockton Press, 279 pages.
Gusfield, D. (1997). Algorithms on
Strings, Trees and Sequences (1st ed.). Cambridge, UK: Cambridge
University Press, 534 pages.
Hunter, L. (1993). Artificial Intelligence and Molecular Biology.
Menlo Park, CA: AAAI Press, 470 pages.
Hunter, L., Searls, D. and Shavlik, J. (1993). First International
Conference on Intelligent Systems for Molecular Biology. Menlo Park,
CA: AAAI Press.
Lander, E. S. and Waterman, M. S. (1995). Calculating the Secrets
of Life: Applications of the Mathematical Sciences in Molecular
Biology. Washington D. C.: National Academy Press, 285 pages.
Lesk, A. (1991). Protein Architecture: A Practical Approach.
Oxford: IRL Press at Oxford University Press. 287 pages.
Salzberg, S. L., Searls, D. B. and Kasif,
S. (1998). Computational Methods in Molecular Biology. Amsterdam:
Elsevier, 371 pages.
Schulze-Kremer, S. (1994). Advances in Molecular Bioinformatics.
Washington D.C.: IOS Press, 259 pages.
Smith, D. W. (1994). Biocomputing: Informatics and Genome
Projects. New York: Academic Press Inc., 336 pages.
Trends Guide to Bioinformatics,
Supplement, December 1998, Elsevier.
von Heijne, G. (1987). Sequence Analysis in Molecular Biology:
Treasure Trove or Trivial Pursuit. Academic Press, New York. 188
pages.
Waterman, M. (1988). Mathematical Methods for DNA Sequences, CRC
Press, Cleveland Ohio. 283 Pages.
Waterman, M. S. (1995). Introduction to
Computational Biology. Chapman & Hall Press, London. 430
pages.
General Reviews
Altschul,
S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990). A
Basic Local Alignment Search Tool. J. Mol. Biol., 215,
403-410.
Altschul,
S. F., Boguski, M. S., Gish, W. and Wootton, J. C. (1994). Issues in
searching molecular sequence databases. Nat Genet 6 (2),
119-29.
Altschul,
S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller,
W. and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs. Nucleic Acids Res,
25(17), 3389-402.
Altschul, S. F. and Koonin, E. V. (1998). Iterated profile
searches with PSI-BLAST--a tool for discovery in protein databases.
Trends Biochem Sci, 23(11), 444-447.
Doolittle,
R. F. (1994). Protein sequence comparisons: searching databases and
aligning sequences. Curr Opin Biotechnol 5 (1), 24-8.
Hogue,
C. W. (1997). Cn3D: a new generation of three-dimensional molecular
structure viewer. Trends Biochem Sci, 22(8), 314-6.
Holm,
L. and Sander, C. (1994). Searching protein structure databases has
come of age. Proteins 19 (3), 165-73.
Holm,
L. and Sander, C. (1996). Mapping the protein universe. Science,
273(5275), 595-603.
Rost,
B. and Sander, C. (1994). Structure prediction of proteins--where are
we now? Curr Opin Biotechnol 5 (4), 372-80.
Russell,
R. B. and Sternberg, M. J. (1995). Structure prediction. How good
are we? Curr Biol, 5(5), 488-90.
White,
S. H. (1994). Global statistics of protein sequences: implications
for the origin, evolution, and prediction of structure. Annu Rev
Biophys Biomol Struct 23, 407-39.
Molecular Databases on the Internet
Attimonelli, M. et al. (2000). MitBASE: a comprehensive and
integrated mitochondrial DNA database. The present status [In
Process Citation]. Nucleic Acids Res, 28(1), 148-152.
Attwood,
T. K., Croning, M. D., Flower, D. R., Lewis, A. P., Mabey, J. E.,
Scordis, P., Selley, J. N. and Wright, W. (2000). PRINTS-S: the
database formerly known as PRINTS [In Process Citation].
Nucleic Acids Res, 28(1),
225-227.
Bairoch,
A. (2000). The ENZYME database in 2000 [In Process Citation].
Nucleic Acids Res, 28(1), 304-305.
Bairoch,
A. and Apweiler, R. (2000). The SWISS-PROT protein sequence database
and its supplement TrEMBL in 2000 [In Process Citation].
Nucleic Acids Res, 28(1), 45-48.
Baker,
W., van den Broek, A., Camon, E., Hingamp, P., Sterk, P., Stoesser,
G. and Tuli, M. A. (2000). The EMBL nucleotide sequence database
[In Process Citation]. Nucleic Acids Res, 28(1),
19-23.
Ball,
C. A. et al. (2000). Integrating functional genomic information into
the Saccharomyces Genome Database [In Process Citation].
Nucleic Acids Res, 28(1), 77-80.
Banerjee-Basu,
S., Ryan, J. F. and Baxevanis, A. D. (2000). The homeodomain
resource: a prototype database for a large protein family [In
Process Citation]. Nucleic Acids Res, 28(1),
329-330.
Barker,
W. C. et al. (2000). The Protein Information Resource (PIR) [In
Process Citation]. Nucleic Acids Res, 28(1),
41-44.
Bateman,
A., Birney, E., Durbin, R., Eddy, S. R., Howe, K. L. and Sonnhammer,
E. L. (2000). The Pfam protein families database [In Process
Citation]. Nucleic Acids Res, 28(1), 263-266.
Baxevanis,
A. D. (2000). The molecular biology database collection: an online
compilation of relevant database resources [In Process
Citation]. Nucleic Acids Res, 28(1), 1-7.
Benson,
D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., Rapp, B. A.
and Wheeler, D. L. (2000). GenBank [In Process Citation].
Nucleic Acids Res, 28(1), 15-18.
Berman,
H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig,
H., Shindyalov, I. N. and Bourne, P. E. (2000). The Protein Data Bank
[In Process Citation]. Nucleic Acids Res, 28(1),
235-242.
Blake,
J. A., Eppig, J. T., Richardson, J. E., Davisson, M. T. and the Mouse
Genome Database Group (2000). The mouse genome database (MGD):
expanding genetic and genomic resources for the laboratory mouse
[In Process Citation]. Nucleic Acids Res, 28(1),
108-111.
Brenner,
S. E., Koehl, P. and Levitt, M. (2000). The ASTRAL compendium for
protein structure and sequence analysis [In Process
Citation]. Nucleic Acids Res, 28(1), 254-256.
Brookes,
A. J., Lehvaslaiho, H., Siegfried, M., Boehm, J. G., Yuan, Y. P.,
Sarkar, C. M., Bork, P. and Ortigao, F. (2000). HGBASE: a database of
SNPs and other variations in and around human genes [In Process
Citation]. Nucleic Acids Res, 28(1), 356-360.
Bult,
C. J., Krupke, D. M., Sundberg, J. P. and Eppig, J. T. (2000). Mouse
tumor biology database (MTB): enhancements and current status [In
Process Citation]. Nucleic Acids Res, 28(1),
112-114.
Catalano,
D., Licciulli, F., D'Elia, D. and Attimonelli, M. (2000). Update of
KEYnet: a gene and protein names database for biosequences functional
organisation [In Process Citation]. Nucleic Acids Res, 28(1),
372-373.
Cheung,
K. H., Osier, M. V., Kidd, J. R., Pakstis, A. J., Miller, P. L. and
Kidd, K. K. (2000). ALFRED: an allele frequency database for diverse
populations and DNA polymorphisms [In Process Citation].
Nucleic Acids Res, 28(1), 361-363.
Corpet,
F., Servant, F., Gouzy, J. and Kahn, D. (2000). ProDom and ProDom-CG:
tools for protein domain analysis and whole genome comparisons
[In Process Citation]. Nucleic Acids Res, 28(1),
267-269.
Costanzo,
M. C. et al. (2000). The yeast proteome database (YPD) and
Caenorhabditis elegans proteome database (WormPD): comprehensive
resources for the organization and comparison of model organism
protein information [In Process Citation]. Nucleic Acids Res,
28(1), 73-76.
D'Souza,
M., Romine, M. F. and Maltsev, N. (2000). SENTRA, a database of
signal transduction proteins [In Process Citation]. Nucleic
Acids Res, 28(1), 335-336.
De
Rijk, P., Wuyts, J., Van de Peer, Y., Winkelmans, T. and De Wachter,
R. (2000). The European large subunit ribosomal RNA database [In
Process Citation]. Nucleic Acids Res, 28(1),
177-178.
Dicks,
J. et al. (2000). UK CropNet: a collection of databases and
bioinformatics resources for crop plant genomics [In Process
Citation]. Nucleic Acids Res, 28(1), 104-107.
Discala,
C., Benigni, X., Barillot, E. and Vaysseix, G. (2000). DBcat: a
catalog of 500 biological databases [In Process Citation].
Nucleic Acids Res, 28(1), 8-9.
Dralyuk,
I., Brudno, M., Gelfand, M. S., Zorn, M. and Dubchak, I. (2000).
ASDB: database of alternatively spliced genes [In Process
Citation]. Nucleic Acids Res, 28(1), 296-297.
Ellis,
L. B., Hershberger, C. D. and Wackett, L. P. (2000). The University
of Minnesota Biocatalysis/Biodegradation database: microorganisms,
genomics and prediction [In Process Citation]. Nucleic Acids
Res, 28(1), 377-379.
Erdmann,
V. A., Szymanski, M., Hochberg, A., Groot, N. and Barciszewski, J.
(2000). Non-coding, mRNA-like RNAs database Y2K [In Process
Citation]. Nucleic Acids Res, 28(1), 197-200.
Gai,
X., Lal, S., Xing, L., Brendel, V. and Walbot, V. (2000). Gene
discovery using the maize genome database ZmDB [In Process
Citation]. Nucleic Acids Res, 28(1), 94-96.
Garavelli,
J. S. (2000). The RESID database of protein structure modifications:
2000 update [In Process Citation]. Nucleic Acids Res, 28(1),
209-211.
Ghosh,
D. (2000). Object-oriented transcription factors database (ooTFD)
[In Process Citation]. Nucleic Acids Res, 28(1),
308-310.
Goto,
S., Nishioka, T. and Kanehisa, M. (2000). LIGAND: chemical database
of enzyme reactions [In Process Citation]. Nucleic Acids Res,
28(1), 380-382.
Gromiha,
M. M., An, J., Kono, H., Oobatake, M., Uedaira, H., Prabakaran, P.
and Sarai, A. (2000). ProTherm, version 2.0: thermodynamic database
for proteins and mutants [In Process Citation]. Nucleic Acids
Res, 28(1), 283-285.
Harger,
C., Chen, G., Farmer, A., Huang, W., Inman, J., Kiphart, D.,
Schilkey, F., Skupski, M. P. and Weller, J. (2000). The genome
sequence DataBase [In Process Citation]. Nucleic Acids Res,
28(1), 31-32.
Henikoff,
J. G., Greene, E. A., Pietrokovski, S. and Henikoff, S. (2000).
Increased coverage of protein families with the blocks database
servers [In Process Citation]. Nucleic Acids Res, 28(1),
228-230.
Hishiki,
T., Kawamoto, S., Morishita, S. and Okubo, K. (2000). BodyMap: a
human and mouse gene expression database [In Process
Citation]. Nucleic Acids Res, 28(1), 136-138.
Hoogland,
C., Sanchez, J. C., Tonella, L., Binz, P. A., Bairoch, A.,
Hochstrasser, D. F. and Appel, R. D. (2000). The 1999 SWISS-2DPAGE
database update [In Process Citation]. Nucleic Acids Res,
28(1), 286-288.
Huang,
H., Xiao, C. and Wu, C. H. (2000). ProClass protein family database
[In Process Citation]. Nucleic Acids Res, 28(1),
273-276.
Huret,
J. L., Minor, S. L., Dorkeld, F., Dessen, P. and Bernheim, A. (2000).
Atlas of genetics and cytogenetics in oncology and haematology, an
interactive database [In Process Citation]. Nucleic Acids
Res, 28(1), 349-351.
Jacobs,
G. H., Stockwell, P. A., Schrieber, M. J., Tate, W. P. and Brown, C.
M. (2000). Transterm: a database of messenger RNA components and
signals [In Process Citation]. Nucleic Acids Res, 28(1),
293-295.
Jankowsky,
E. and Jankowsky, A. (2000). The DExH/D protein family database
[In Process Citation]. Nucleic Acids Res, 28(1),
333-334.
Johnson,
G. and Wu, T. T. (2000). Kabat database and its applications: 30
years after the first variability plot [In Process Citation].
Nucleic Acids Res, 28(1), 214-218.
Kanehisa,
M. and Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes
[In Process Citation]. Nucleic Acids Res, 28(1),
27-30.
Karp,
P. D., Riley, M., Saier, M., Paulsen, I. T., Paley, S. M. and
Pellegrini-Toole, A. (2000). The EcoCyc and MetaCyc databases [In
Process Citation]. Nucleic Acids Res, 28(1),
56-59.
Kawashima,
S. and Kanehisa, M. (2000). AAindex: amino acid index database
[In Process Citation]. Nucleic Acids Res, 28(1),
374.
Kawashima,
T., Kawashima, S., Kanehisa, M., Nishida, H. and Makabe, K. W.
(2000). MAGEST: MAboya gene expression patterns and sequence tags
[In Process Citation]. Nucleic Acids Res, 28(1),
133-135.
Kel-Margoulis,
O. V., Romashchenko, A. G., Kolchanov, N. A., Wingender, E. and Kel,
A. E. (2000). COMPEL: a database on composite regulatory elements
providing combinatorial transcriptional regulation [In Process
Citation]. Nucleic Acids Res, 28(1), 311-315.
Kent,
W. J. and Zahler, A. M. (2000). The intronerator: exploring introns
and alternative splicing in Caenorhabditis elegans [In Process
Citation]. Nucleic Acids Res, 28(1), 91-93.
Kikuno,
R., Nagase, T., Suyama, M., Waki, M., Hirosawa, M. and Ohara, O.
(2000). HUGE: a database for human large proteins identified in the
Kazusa cDNA sequencing project [In Process Citation]. Nucleic
Acids Res, 28(1), 331-332.
Kolchanov,
N. A. et al. (2000). Transcription regulatory regions database
(TRRD): its status in 2000 [In Process Citation]. Nucleic
Acids Res, 28(1), 298-301.
Krause,
A., Stoye, J. and Vingron, M. (2000). The SYSTERS protein sequence
cluster set [In Process Citation]. Nucleic Acids Res, 28(1),
270-272.
Lanave,
C., Liuni, S., Licciulli, F. and Attimonelli, M. (2000). Update of
AMmtDB: a database of multi-aligned metazoa mitochondrial DNA
sequences [In Process Citation]. Nucleic Acids Res, 28(1),
153-154.
Lo
Conte, L., Ailey, B., Hubbard, T. J., Brenner, S. E., Murzin, A. G.
and Chothia, C. (2000). SCOP: a structural classification of proteins
database [In Process Citation]. Nucleic Acids Res, 28(1),
257-259.
Lopez,
P. J. and Seraphin, B. (2000). YIDB: the yeast intron DataBase
[In Process Citation]. Nucleic Acids Res, 28(1),
85-86.
Maglott,
D. R., Katz, K. S., Sicotte, H. and Pruitt, K. D. (2000). NCBI's
LocusLink and RefSeq [In Process Citation]. Nucleic Acids
Res, 28(1), 126-128.
Maidak,
B. L. et al. (2000). The RDP (Ribosomal Database Project) continues
[In Process Citation]. Nucleic Acids Res, 28(1),
173-174.
Mewes,
H. W. et al. (2000). MIPS: a database for genomes and protein
sequences [In Process Citation]. Nucleic Acids Res, 28(1),
37-40.
Minoshima,
S., Mitsuyama, S., Ohno, S., Kawamura, T. and Shimizu, N. (2000).
Keio mutation database (KMDB) for human disease gene mutations
[In Process Citation]. Nucleic Acids Res, 28(1),
364-368.
Murvai,
J., Vlahovicek, K., Barta, E., Cataletto, B. and Pongor, S. (2000).
The SBASE protein domain library, release 7.0: a collection of
annotated protein sequence segments [In Process Citation].
Nucleic Acids Res, 28(1), 260-262.
Nagaswamy,
U., Voss, N., Zhang, Z. and Fox, G. E. (2000). Database of
non-canonical base pairs found in known RNA structures [In
Process Citation]. Nucleic Acids Res, 28(1),
375-376.
Nakamura,
Y., Gojobori, T. and Ikemura, T. (2000). Codon usage tabulated from
international DNA sequence databases: status for the year 2000
[In Process Citation]. Nucleic Acids Res, 28(1),
292.
Nakamura,
Y., Kaneko, T. and Tabata, S. (2000). CyanoBase, the genome database
for Synechocystis sp. strain PCC6803: status for the year 2000
[In Process Citation]. Nucleic Acids Res, 28(1),
72.
Nelson,
P. S., Clegg, N., Eroglu, B., Hawkins, V., Bumgarner, R., Smith, T.
and Hood, L. (2000). The prostate expression database (PEDB): status
and enhancements in 2000 [In Process Citation]. Nucleic Acids
Res, 28(1), 212-213.
Overbeek,
R., Larsen, N., Pusch, G. D., D'Souza, M., Selkov, E., Jr., Kyrpides, N.,
Fonstein, M., Maltsev, N. and Selkov, E. (2000). WIT: integrated
system for high-throughput genome sequence analysis and metabolic
reconstruction [In Process Citation]. Nucleic Acids Res,
28(1), 123-125.
Palm,
C. J., Federspiel, N. A. and Davis, R. W. (2000). DAtA: database of
Arabidopsis thaliana annotation [In Process Citation].
Nucleic Acids Res, 28(1), 102-103.
Pearl,
F. M., Lee, D., Bray, J. E., Sillitoe, I., Todd, A. E., Harrison, A.
P., Thornton, J. M. and Orengo, C. A. (2000). Assigning genomic
sequences to CATH [In Process Citation]. Nucleic Acids Res,
28(1), 277-282.
Pelchat,
M., Deschenes, P. and Perreault, J. P. (2000). The database of the
smallest known auto-replicable RNA species: viroids and viroid-like
RNAs [In Process Citation]. Nucleic Acids Res, 28(1),
179-180.
Perier,
R. C., Praz, V., Junier, T., Bonnard, C. and Bucher, P. (2000). The
eukaryotic promoter database (EPD) [In Process Citation].
Nucleic Acids Res, 28(1), 302-303.
Perler,
F. B. (2000). InBase, the intein database [In Process
Citation]. Nucleic Acids Res, 28(1), 344-345.
Perriere,
G., Bessieres, P. and Labedan, B. (2000). EMGLib: the enhanced
microbial genomes library (update 2000) [In Process
Citation]. Nucleic Acids Res, 28(1), 68-71.
Pesole,
G., Gissi, C., Catalano, D., Grillo, G., Licciulli, F., Liuni, S.,
Attimonelli, M. and Saccone, C. (2000). MitoNuc and MitoAln: two
related databases of nuclear genes coding for mitochondrial proteins
[In Process Citation]. Nucleic Acids Res, 28(1),
163-165.
Pesole,
G., Liuni, S., Grillo, G., Licciulli, F., Larizza, A., Makalowski, W.
and Saccone, C. (2000). UTRdb and UTRsite: specialized databases of
sequences and functional elements of 5' and 3' untranslated regions
of eukaryotic mRNAs [In Process Citation]. Nucleic Acids Res,
28(1), 193-196.
Ploger,
R., Zhang, J., Bassett, D., Reeves, R., Hieter, P., Boguski, M. and
Spencer, F. (2000). XREFdb: cross-referencing the genetics and genes
of mammals and model organisms [In Process Citation]. Nucleic
Acids Res, 28(1), 120-122.
Pollet,
N., Schmidt, H. A., Gawantka, V., Vingron, M. and Niehrs, C. (2000).
Axeldb: a Xenopus laevis database focusing on gene expression [In
Process Citation]. Nucleic Acids Res, 28(1),
139-140.
Ponomarenko,
J. V., Orlova, G. V., Ponomarenko, M. P., Lavryushev, S. V., Frolov,
A. S., Zybova, S. V. and Kolchanov, N. A. (2000). SELEX_DB: an
activated database on selected randomized DNA/RNA sequences addressed
to genomic sequence annotation [In Process Citation]. Nucleic
Acids Res, 28(1), 205-208.
Quackenbush,
J., Liang, F., Holt, I., Pertea, G. and Upton, J. (2000). The TIGR
gene indices: reconstruction and representation of expressed gene
sequences [In Process Citation]. Nucleic Acids Res, 28(1),
141-145.
Rawlings,
N. D. and Barrett, A. J. (2000). MEROPS: the peptidase database
[In Process Citation]. Nucleic Acids Res, 28(1),
323-325.
Reichert,
J., Jabs, A., Slickers, P. and Suhnel, J. (2000). The IMB Jena image
library of biological macromolecules [In Process Citation].
Nucleic Acids Res, 28(1), 246-249.
Ringwald,
M., Eppig, J. T., Kadin, J. A., Richardson, J. E. and the Gene
Expression Database Group (2000). GXD: a gene expression database for
the laboratory mouse: current status and recent enhancements [In
Process Citation]. Nucleic Acids Res, 28(1),
115-119.
Roberts,
R. J. and Macelis, D. (2000). REBASE - restriction enzymes and
methylases [In Process Citation]. Nucleic Acids Res, 28(1),
306-307.
Rodriguez-Tome,
P. and Lijnzaad, P. (2000). RHdb: the radiation hybrid database
[In Process Citation]. Nucleic Acids Res, 28(1),
146-147.
Rudd,
K. E. (2000). EcoGene: a genome sequence database for Escherichia
coli K-12 [In Process Citation]. Nucleic Acids Res, 28(1),
60-64.
Ruiz,
M. et al. (2000). IMGT, the international ImMunoGeneTics database
[In Process Citation]. Nucleic Acids Res, 28(1),
219-221.
Sakata,
K., Antonio, B. A., Mukai, Y., Nagasaki, H., Sakai, Y., Makino, K.
and Sasaki, T. (2000). INE: a rice genome database with an integrated
map view [In Process Citation]. Nucleic Acids Res, 28(1),
97-101.
Sakharkar,
M., Long, M., Tan, T. W. and de Souza, S. J. (2000). ExInt: an
Exon/Intron database [In Process Citation]. Nucleic Acids
Res, 28(1), 191-192.
Salgado,
H., Santos-Zavaleta, A., Gama-Castro, S., Millan-Zarate, D.,
Blattner, F. R. and Collado-Vides, J. (2000). RegulonDB (version
3.0): transcriptional regulation and operon organization in
Escherichia coli K-12 [In Process Citation]. Nucleic Acids
Res, 28(1), 65-67.
Sanchez,
R., Pieper, U., Mirkovic, N., de Bakker, P. I., Wittenstein, E. and
Sali, A. (2000). MODBASE, a database of annotated comparative protein
structure models [In Process Citation]. Nucleic Acids Res,
28(1), 250-253.
Saxonov,
S., Daizadeh, I., Fedorov, A. and Gilbert, W. (2000). EID: the
exon-intron database-an exhaustive database of protein-coding
intron-containing genes [In Process Citation]. Nucleic Acids
Res, 28(1), 185-190.
Scharfe,
C. et al. (2000). MITOP, the mitochondrial proteome database: 2000
update [In Process Citation]. Nucleic Acids Res, 28(1),
155-158.
Schisler,
N. J. and Palmer, J. D. (2000). The IDB and IEDB: intron sequence and
evolution databases [In Process Citation]. Nucleic Acids Res,
28(1), 181-184.
Schonbach,
C., Koh, J. L., Sheng, X., Wong, L. and Brusic, V. (2000). FIMM, a
database of functional molecular immunology [In Process
Citation]. Nucleic Acids Res, 28(1), 222-224.
Schultz,
J., Copley, R. R., Doerks, T., Ponting, C. P. and Bork, P. (2000).
SMART: a web-based tool for the study of genetically mobile domains
[In Process Citation]. Nucleic Acids Res, 28(1),
231-234.
Shafer,
R. W., Jung, D. R., Betts, B. J., Xi, Y. and Gonzales, M. J. (2000).
Human immunodeficiency virus reverse transcriptase and protease
sequence database [In Process Citation]. Nucleic Acids Res,
28(1), 346-348.
Skoufos,
E., Marenco, L., Nadkarni, P. M., Miller, P. L. and Shepherd, G. M.
(2000). Olfactory receptor database: a sensory chemoreceptor resource
[In Process Citation]. Nucleic Acids Res, 28(1),
341-343.
Smigielski,
E. M., Sirotkin, K., Ward, M. and Sherry, S. T. (2000). dbSNP: a
database of single nucleotide polymorphisms [In Process
Citation]. Nucleic Acids Res, 28(1), 352-355.
Spirov,
A. V., Bowler, T. and Reinitz, J. (2000). HOX pro: a specialized
database for clusters and networks of homeobox genes [In Process
Citation]. Nucleic Acids Res, 28(1), 337-340.
Stenberg,
K. A., Riikonen, P. T. and Vihinen, M. (2000). KinMutBase, a database
of human disease-causing protein kinase mutations [In Process
Citation]. Nucleic Acids Res, 28(1), 369-371.
Sullivan,
S. A., Aravind, L., Makalowska, I., Baxevanis, A. D. and Landsman, D.
(2000). The histone database: a comprehensive WWW resource for
histones and histone fold-containing proteins [In Process
Citation]. Nucleic Acids Res, 28(1), 320-322.
Szymanski,
M., Barciszewska, M. Z., Barciszewski, J. and Erdmann, V. A. (2000).
5S ribosomal RNA database Y2K [In Process Citation]. Nucleic
Acids Res, 28(1), 166-167.
Szymanski,
M. and Barciszewski, J. (2000). Aminoacyl-tRNA synthetases database
Y2K [In Process Citation]. Nucleic Acids Res, 28(1),
326-328.
Tateno,
Y., Miyazaki, S., Ota, M., Sugawara, H. and Gojobori, T. (2000). DNA
Data Bank of Japan (DDBJ) in collaboration with mass sequencing teams
[In Process Citation]. Nucleic Acids Res, 28(1),
24-26.
Tatusov,
R. L., Galperin, M. Y., Natale, D. A. and Koonin, E. V. (2000). The
COG database: a tool for genome-scale analysis of protein functions
and evolution [In Process Citation]. Nucleic Acids Res,
28(1), 33-36.
van
Batenburg, F. H., Gultyaev, A. P., Pleij, C. W., Ng, J. and Oliehoek,
J. (2000). PseudoBase: a database with RNA pseudoknots [In
Process Citation]. Nucleic Acids Res, 28(1),
201-204.
Van
de Peer, Y., De Rijk, P., Wuyts, J., Winkelmans, T. and De Wachter,
R. (2000). The European small subunit ribosomal RNA database [In
Process Citation]. Nucleic Acids Res, 28(1),
175-176.
Volpetti,
V., Gallerani, R., De Benedetto, C., Liuni, S., Licciulli, F. and
Ceci, L. R. (2000). PLMItRNA, a database for tRNAs and tRNA genes in
plant mitochondria: enlargement and updating [In Process
Citation]. Nucleic Acids Res, 28(1), 159-162.
Wang,
Y., Addess, K. J., Geer, L., Madej, T., Marchler-Bauer, A.,
Zimmerman, D. and Bryant, S. H. (2000). MMDB: 3D structure data in
Entrez [In Process Citation]. Nucleic Acids Res, 28(1),
243-245.
Waugh,
M., Hraber, P., Weller, J., Wu, Y., Chen, G., Inman, J., Kiphart, D.
and Sobral, B. (2000). The Phytophthora Genome Initiative database:
informatics and analysis for distributed pathogenomic research
[In Process Citation]. Nucleic Acids Res, 28(1),
87-90.
Wheeler,
D. L., Chappey, C., Lash, A. E., Leipe, D. D., Madden, T. L.,
Schuler, G. D., Tatusova, T. A. and Rapp, B. A. (2000). Database
resources of the National Center for Biotechnology Information
[In Process Citation]. Nucleic Acids Res, 28(1),
10-14.
Williams,
K. P. (2000). The tmRNA website [In Process Citation].
Nucleic Acids Res, 28(1), 168.
Wingender,
E. et al. (2000). TRANSFAC: an integrated system for gene expression
regulation [In Process Citation]. Nucleic Acids Res, 28(1),
316-319.
Xenarios,
I., Rice, D. W., Salwinski, L., Baron, M. K., Marcotte, E. M. and
Eisenberg, D. (2000). DIP: the database of interacting proteins
[In Process Citation]. Nucleic Acids Res, 28(1),
289-291.
Yona,
G., Linial, N. and Linial, M. (2000). ProtoMap: automatic
classification of protein sequences and hierarchy of protein families
[In Process Citation]. Nucleic Acids Res, 28(1),
49-55.
Zhao,
S. (2000). Human BAC ends [In Process Citation]. Nucleic
Acids Res, 28(1), 129-132.
Zwieb,
C. and Samuelsson, T. (2000). SRPDB (Signal recognition particle
database) [In Process Citation]. Nucleic Acids Res, 28(1),
171-172.
Zwieb,
C. and Wower, J. (2000). tmRDB (tmRNA database) [In Process
Citation]. Nucleic Acids Res, 28(1), 169-170.
Consensus Pattern Matching
Abarbanel,
R. M., Wieneke, P. R., Mansfield, E., Jaffe, D. A. and Brutlag, D. L.
(1984). Rapid searches for complex patterns in biological molecules.
Nucleic Acids Res. 12, 263-280.
Attwood,
T. K. and Beck, M. E. (1994). PRINTS--a protein motif fingerprint
database. Protein Eng, 7(7), 841-8.
Attwood,
T. K., Beck, M. E., Bleasby, A. J., Degtyarenko, K. and Parry Smith,
D. J. (1996). Progress with the PRINTS protein fingerprint database.
Nucleic Acids Res, 24(1), 182-8.
Attwood,
T. K., Beck, M. E., Bleasby, A. J., Degtyarenko, K., Michie, A. D.
and Parry-Smith, D. J. (1997). Novel developments with the PRINTS
protein fingerprint database. Nucleic Acids Res, 25(1),
212-7.
Attwood,
T. K., Beck, M. E., Flower, D. R., Scordis, P. and Selley, J. N.
(1998). The PRINTS protein fingerprint database in its fifth year.
Nucleic Acids Res, 26(1), 304-8.
Bairoch,
A., Bucher, P. and Hofmann, K. (1997). The PROSITE database, its
status in 1997. Nucleic Acids Res, 25(1), 217-21.
Bork,
P. (1989). Recognition of functional regions in primary structures
using a set of property patterns. FEBS Lett 257 (1), 191-5.
Bork,
P. and Koonin, E. V. (1996). Protein Sequence Motifs. Current Opinion
in Structural Biology 6 (3), 366-376.
Henikoff,
S. (1996). Scores for Sequence Searches. Current Opinion in
Structural Biology 6 (3), 353-360.
Koonin,
E. V., Tatusov, R. L. and Rudd, K. E. (1996). Protein sequence
comparison at genome scale. Methods Enzymol, 266, 295-322.
Koonin,
E. V., Tatusov, R. L. and Galperin, M. Y. (1998). Beyond complete
genomes: from sequence to structure and function. Curr Opin Struct
Biol, 8(3), 355-63.
Nevill-Manning,
C., Sethi, K., Wu, T. D. and Brutlag, D. L. (1997). Enumerating
and Ranking Discrete Motifs. ISMB-97, 4, 202-209.
Nevill-Manning,
C. G., Wu, T. D. and Brutlag, D. L. (1998). Highly specific protein
sequence motifs for genome analysis. Proc Natl Acad Sci U S A,
95(11), 5865-71.
Saqi,
M. A. and Sternberg, M. J. (1994). Identification of sequence motifs
from a set of proteins with related function. Protein Eng, 7(2),
165-71.
Smith,
H. O., Annau, T. M. and Chandrasegaran, S. (1990). Finding sequence
motifs in groups of functionally related proteins. Proc Natl Acad Sci
U S A, 87 (2), 826-30.
Smith,
R. (1988). A finite state machine algorithm for finding restriction
sites and other pattern matching applications. Comput Appl Biosci, 4
(4), 459-65.
Stormo,
G. D. (1990). Consensus patterns in DNA. Methods Enzymol 183,
211-21.
Wu,
T. D. and Brutlag, D. L. (1995). Identification of protein motifs
using conserved amino acid properties and partitioning techniques.
ISMB, 3, 402-10.
Quantitative and Probabilistic Pattern Matching
Bowie,
J. U., Luthy, R. and Eisenberg, D. (1991). A Method to Identify
Protein Sequences That Fold Into a Known Three-Dimensional Structure.
Science 253, 164-170.
Brennan,
R. G. and Matthews, B. W. (1989a). The helix-turn-helix DNA binding
motif. J Biol Chem, 264 (4), 1903-6.
Brennan,
R. G. and Matthews, B. W. (1989b). Structural basis of DNA-protein
recognition. Trends Biochem Sci, 14 (7), 286-90.
Dodd,
I. B. and Egan, J. B. (1990). Improved detection of helix-turn-helix
DNA-binding motifs in protein sequences. Nucleic Acids Res 18 (17),
5019-26.
Gribskov,
M., McLachlan, A. D. and Eisenberg, D. (1987). Profile analysis:
Detection of distantly related proteins. Proc. Natl. Acad. Sci. USA,
84, 4355-4358.
Gribskov,
M., Homyak, M., Edenfield, J. and Eisenberg, D. (1988). Profile
scanning for three-dimensional structural patterns in protein
sequences. Comput Appl Biosci, 4 (1), 61-6.
Gribskov,
M. (1994). Profile analysis. Methods Mol Biol 25, 247-66.
Henikoff,
S. and Henikoff, J. G. (1991). Automated assembly of protein blocks
for database searching. Nucleic Acids Res 19 (23), 6565-72.
Henikoff,
S. (1991). Playing with blocks: some pitfalls of forcing multiple
alignments. New Biol 3 (12), 1148-54.
Henikoff,
S. and Henikoff, J. G. (1994). Position-based Sequence Weights. J.
Mol. Biol. 243, 574-578.
Henikoff,
J. G. and Henikoff, S. (1996). Using substitution probabilities to
improve position-specific scoring matrices. Comput Appl Biosci,
12(2), 135-43.
Henikoff,
S. (1996). Scores for Sequence Searches. Current Opinion in
Structural Biology, 6(3), 353-360.
Luthy,
R., Bowie, J. U. and Eisenberg, D. (1992). Assessment of protein
models with three-dimensional profiles. Nature 356 (6364),
83-85.
Luthy,
R., McLachlan, A. D. and Eisenberg, D. (1991). Secondary
structure-based profiles: use of structure-conserving scoring tables
in searching protein sequence databases for structural similarities.
Proteins 10 (3), 229-239.
Pietrokovski,
S., Henikoff, J. G. and Henikoff, S. (1996). The Blocks database--a
system for protein classification. Nucleic Acids Res, 24(1),
197-200.
Vogt,
G., Etzold, T. and Argos, P. (1995). An assessment of amino acid
exchange matrices in aligning protein sequences: the twilight zone
revisited. J Mol Biol, 249(4), 816-31.
Wallace,
J. C. and Henikoff, S. (1992). PATMAT: a searching and extraction
program for sequence, pattern and block queries and databases. Comput
Appl Biosci, 8 (3), 249-54.
Alignment of Biological Sequences
Altschul,
S. F. (1991). Amino acid substitution matrices from an information
theoretic perspective. J Mol Biol, 219(3), 555-65.
Dayhoff, M. O., Schwartz, R. M. and Orcutt, B. C. (1978). A model of
evolutionary change in proteins. Atlas of Protein Sequence and Structure, 5 (Suppl. 3),
345-352.
Doolittle, R. F. (1986). Of Urfs and Orfs: A Primer on How to
Analyze Derived Amino Acid Sequences. Mill Valley, California:
University Science Books.
Feng,
D.F., Johnson, M.S. and Doolittle, R.F. (1985). Aligning amino acid
sequences: comparison of commonly used methods. J. Mol. Evol. 21,
112-125.
Gribskov,
M. (1994). Profile analysis. Methods Mol Biol 25, 247-66.
Grice,
J. A., Hughey, R. and Speck, D. (1995). Parallel sequence alignment
in limited space. Ismb 3, 145-53.
Krogh,
A., Brown, M., Mian, I. S., Sjolander, K. and Haussler, D. (1994).
Hidden Markov models in computational biology. Applications to
protein modeling. J Mol Biol 235 (5), 1501-31.
Needleman,
S. B. and Wunsch, C. D. (1970). A general method applicable to the
search for similarities in the amino acid sequence of two proteins.
J. Mol. Biol. 48, 443-453.
Pearson,
W. R. and Miller, W. (1992). Dynamic programming algorithms for
biological sequence comparison. Methods Enzymol, 210,
575-601.
Pearson,
W. R. (1995). Comparison of methods for searching protein sequence
databases. Protein Sci 4 (6), 1145-60.
Reeck,
G. R., de Haen, C., Teller, D. C., Doolittle, R. F., Fitch, W. M.,
Dickerson, R. E. (1987). "Homology" in Proteins and Nucleic Acids: A
Terminology Muddle and a Way out of It. Cell 50, 667.
Smith,
T. F. and Waterman, M. (1981). Identification of common molecular
subsequences. J. Mol. Biol. 147, 195-197.
Similarity Scoring Systems
Altschul,
S. F. (1993). A protein alignment scoring system sensitive at all
evolutionary distances. J Mol Evol 36 (3), 290-300.
Benner,
S. A., Cohen, M. A. and Gonnet, G. H. (1993). Empirical and
structural models for insertions and deletions in the divergent
evolution of proteins. J Mol Biol 229 (4), 1065-82.
Gonnet,
G. H., Cohen, M. A. and Benner, S. A. (1992). Exhaustive Matching of
the Entire Protein Sequence Database. Science 256 (5062),
1443-5.
Henikoff,
S. and Henikoff, J. G. (1993). Performance evaluation of amino acid
substitution matrices. Proteins, 17(1), 49-61.
Henikoff,
S. (1996). Scores for Sequence Searches. Current Opinion in
Structural Biology 6 (3), 353-360.
Jones,
D. T., Taylor, W. R. and Thornton, J. M. (1992). The rapid generation
of mutation data matrices from protein sequences. Comput Appl Biosci,
8 (3), 275-82.
Schwartz, R. M. and Dayhoff, M. O. (1979). Matrices for Detecting
Distant Relationships. Atlas of Protein Structure 5 (Suppl. 3),
353-358.
Vogt,
G., Etzold, T. and Argos, P. (1995). An assessment of amino acid
exchange matrices in aligning protein sequences: the twilight zone
revisited. J Mol Biol, 249(4), 816-31.
Wilbur,
W. J. (1985). On the PAM matrix model of protein evolution. Mol
Biol Evol 2 (5), 434-47.
Zhu,
Z. Y., Sali, A. and Blundell, T. L. (1992). A variable gap penalty
function and feature weights for protein 3-D structure comparisons.
Protein Eng 5 (1), 43-51.
Rapid Sequence Similarity Search
Altschul,
S. F. (1998). Generalized affine gap costs for protein sequence
alignment. Proteins, 32(1), 88-96.
Altschul,
S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990). A
Basic Local Alignment Search Tool. J. Mol. Biol., 215,
403-410.
Altschul,
S. F., Boguski, M. S., Gish, W. and Wootton, J. C. (1994). Issues in
searching molecular sequence databases. Nat Genet 6 (2),
119-29.
Altschul,
S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller,
W. and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs. Nucleic Acids Res,
25(17), 3389-402.
Barsalou,
T. and Brutlag, D. L. (1991). Searching Gene and Protein Sequence
Databases. MD Computing, 8(3), 144-149.
Brutlag,
D. L., Dautricourt, J. P., Maulik, S. and Relph, J. (1990). Improved
sensitivity of biological sequence database searches. Comput Appl
Biosci, 6(3), 237-45.
Brutlag,
D. L., Dautricourt, J. P., Diaz, R., Fier, J., Moxon, B. and Stamm,
R. (1993). BLAZE: An implementation of the Smith-Waterman Comparison
Algorithm on a Massively Parallel Computer. Computers and Chemistry
17, 203-207.
Gonnet,
G. H., Cohen, M. A. and Benner, S. A. (1992). Exhaustive matching of
the entire protein sequence database. Science, 256, 1443-5.
Gotoh,
O. (1982). An Improved Algorithm for Matching Biological Sequences.
J. Mol. Biol., 162, 705-8.
Gribskov,
M., McLachlan, A. D. and Eisenberg, D. (1987). Profile analysis:
Detection of distantly related proteins. Proc. Natl. Acad. Sci. USA
84, 4355-4358.
Lipman,
D.J. and Pearson, W.R. (1985). Rapid and Sensitive Protein Similarity
Searches. Science 227, 1435-1441.
Pearson,
W. R. (1994). Using the FASTA program to search protein and DNA
sequence databases. Methods Mol Biol 25, 365-89.