Biochemistry 201

Advanced Molecular Biology

March 6, 2000

Determining Function from Sequence

Doug Brutlag

General Books

General Reviews

Molecular Databases on the Internet

Consensus Pattern Matching

Quantitative and Probabilistic Pattern Matching

Alignment of Biological Sequences

Similarity Scoring Systems

Rapid Sequence Similarity Search

General Books

Adams, M. D., Fields, C. and Venter, J. C. (1994). Automated DNA Sequencing and Analysis. New York: Academic Press, 368 pages.

Baldi, P. and Brunak, S. (1998). Bioinformatics: The Machine Learning Approach (1st ed.). Cambridge, MA: The MIT Press.

Baxevanis, A. D. and Ouellette, B. F. F. (1998). Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. New York, NY: John Wiley & Sons, Inc., 356 pages.

Bishop, M. J. (1994). Guide to Human Genome Computing. London: Academic Press, 350 pages.

Brutlag, D. L. and Sternberg, M. J. E. (1996). Sequences and Topology. London: Current Biology Ltd., 427 pages.

Creighton, T. E. (1993). Proteins: Structures and Molecular Properties (2nd ed.). New York: Freeman.

Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory (1st ed.). New York NY: John Wiley and Sons Inc.

Doolittle, R. F. (1986). Of Urfs and Orfs: A Primer on How to Analyze Derived Amino Acid Sequences. University Science Books, Mill Valley, California.

Doolittle, R. F. (1990). Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences (1st ed.). Methods in Enzymology Volume 183, New York: Academic Press.

Doolittle, R. F. (1996). Computer Methods for Macromolecular Sequence Analysis. (Vol. 266). New York: Academic Press. 711 Pages.

Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. (1998). Biological Sequence Analysis (1st ed.). Cambridge, UK: Cambridge University Press.

Fasman, G. D. (1989). Prediction of Protein Structure and the Principles of Protein Conformation. New York NY: Plenum Press.

Gribskov, M. and Devereux, J. (1991). Sequence Analysis Primer. New York: Stockton Press, 279 pages.

Gusfield, D. (1997). Algorithms on Strings, Trees and Sequences. (1st. ed.). Cambridge, UK: Cambridge University Press, 534 pages.

Hunter, L. (1993). Artificial Intelligence and Molecular Biology. Menlo Park, CA: AAAI Press, 470 pages.

Hunter, L., Searls, D. and Shavlik, J. (1993). First International Conference on Intelligent Systems for Molecular Biology. Menlo Park, CA.: AAAI Press.

Lander, E. S. and Waterman, M. S. (1995). Calculating the Secrets of Life: Applications of the Mathematical Sciences in Molecular Biology. Washington D. C.: National Academy Press, 285 pages.

Lesk, A. (1991). Protein Architecture: A Practical Approach. Oxford: IRL Press at Oxford University Press, 287 pages.

Salzberg, S. L., Searls, D. B. and Kasif, S. (1998). Computational Methods in Molecular Biology. Amsterdam: Elsevier, 371 pages.

Schulze-Kremer, S. (1994). Advances in Molecular Bioinformatics. Washington D.C.: IOS Press, 259 pages.

Smith, D. W. (1994). Biocomputing: Informatics and Genome Projects. New York: Academic Press Inc., 336 pages.

Trends Guide to Bioinformatics, Supplement, December 1998, Elsevier.

von Heijne, G. (1987). Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit. New York: Academic Press, 188 pages.

Waterman, M. (1988). Mathematical Methods for DNA Sequences, CRC Press, Cleveland Ohio. 283 Pages.

Waterman, M. S. (1995). Introduction to Computational Biology. Chapman & Hall Press, London 430 pages.

General Reviews

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990). A Basic Local Alignment Search Tool. J. Mol. Biol., 215, 403-410.

Altschul, S. F., Boguski, M. S., Gish, W. and Wootton, J. C. (1994). Issues in searching molecular sequence databases. Nat Genet 6 (2), 119-29.

Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25(17), 3389-402.

Altschul, S. F. and Koonin, E. V. (1998). Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci, 23(11), 444-447.

Doolittle, R. F. (1994). Protein sequence comparisons: searching databases and aligning sequences. Curr Opin Biotechnol 5 (1), 24-8.

Hogue, C. W. (1997). Cn3D: a new generation of three-dimensional molecular structure viewer. Trends Biochem Sci, 22(8), 314-6.

Holm, L. and Sander, C. (1994). Searching protein structure databases has come of age. Proteins 19 (3), 165-73.

Holm, L. and Sander, C. (1996). Mapping the protein universe. Science, 273(5275), 595-603.

Rost, B. and Sander, C. (1994). Structure prediction of proteins--where are we now? Curr Opin Biotechnol 5 (4), 372-80.

Russell, R. B. and Sternberg, M. J. (1995). Structure prediction. How good are we? Curr Biol, 5(5), 488-90.

White, S. H. (1994). Global statistics of protein sequences: implications for the origin, evolution, and prediction of structure. Annu Rev Biophys Biomol Struct 23, 407-39.

Molecular Databases on the Internet

Attimonelli, M. et al. (2000). MitBASE : a comprehensive and integrated mitochondrial DNA database. The present status [In Process Citation]. Nucleic Acids Res, 28(1), 148-152.

Attwood, T. K., Croning, M. D., Flower, D. R., Lewis, A. P., Mabey, J. E., Scordis, P., Selley, J. N. and Wright, W. (2000). PRINTS-S: the database formerly known as PRINTS [In Process Citation]. Nucleic Acids Res, 28(1), 225-227.

Bairoch, A. (2000). The ENZYME database in 2000 [In Process Citation]. Nucleic Acids Res, 28(1), 304-305.

Bairoch, A. and Apweiler, R. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 [In Process Citation]. Nucleic Acids Res, 28(1), 45-48.

Baker, W., van den Broek, A., Camon, E., Hingamp, P., Sterk, P., Stoesser, G. and Tuli, M. A. (2000). The EMBL nucleotide sequence database [In Process Citation]. Nucleic Acids Res, 28(1), 19-23.

Ball, C. A. et al. (2000). Integrating functional genomic information into the saccharomyces genome database [In Process Citation]. Nucleic Acids Res, 28(1), 77-80.

Banerjee-Basu, S., Ryan, J. F. and Baxevanis, A. D. (2000). The homeodomain resource: a prototype database for a large protein family [In Process Citation]. Nucleic Acids Res, 28(1), 329-330.

Barker, W. C. et al. (2000). The protein information resource (PIR) [In Process Citation]. Nucleic Acids Res, 28(1), 41-44.

Bateman, A., Birney, E., Durbin, R., Eddy, S. R., Howe, K. L. and Sonnhammer, E. L. (2000). The pfam protein families database [In Process Citation]. Nucleic Acids Res, 28(1), 263-266.

Baxevanis, A. D. (2000). The molecular biology database collection: an online compilation of relevant database resources [In Process Citation]. Nucleic Acids Res, 28(1), 1-7.

Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., Rapp, B. A. and Wheeler, D. L. (2000). GenBank [In Process Citation]. Nucleic Acids Res, 28(1), 15-18.

Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. and Bourne, P. E. (2000). The protein data bank [In Process Citation]. Nucleic Acids Res, 28(1), 235-242.

Blake, J. A., Eppig, J. T., Richardson, J. E., Davisson, M. T. and the Mouse Genome Database, G. (2000). The mouse genome database (MGD): expanding genetic and genomic resources for the laboratory mouse [In Process Citation]. Nucleic Acids Res, 28(1), 108-111.

Brenner, S. E., Koehl, P. and Levitt, M. (2000). The ASTRAL compendium for protein structure and sequence analysis [In Process Citation]. Nucleic Acids Res, 28(1), 254-256.

Brookes, A. J., Lehvaslaiho, H., Siegfried, M., Boehm, J. G., Yuan, Y. P., Sarkar, C. M., Bork, P. and Ortigao, F. (2000). HGBASE: a database of SNPs and other variations in and around human genes [In Process Citation]. Nucleic Acids Res, 28(1), 356-360.

Bult, C. J., Krupke, D. M., Sundberg, J. P. and Eppig, J. T. (2000). Mouse tumor biology database (MTB): enhancements and current status [In Process Citation]. Nucleic Acids Res, 28(1), 112-114.

Catalano, D., Licciulli, F., D'Elia, D. and Attimonelli, M. (2000). Update of KEYnet: a gene and protein names database for biosequences functional organisation [In Process Citation]. Nucleic Acids Res, 28(1), 372-373.

Cheung, K. H., Osier, M. V., Kidd, J. R., Pakstis, A. J., Miller, P. L. and Kidd, K. K. (2000). ALFRED: an allele frequency database for diverse populations and DNA polymorphisms [In Process Citation]. Nucleic Acids Res, 28(1), 361-363.

Corpet, F., Servant, F., Gouzy, J. and Kahn, D. (2000). ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons [In Process Citation]. Nucleic Acids Res, 28(1), 267-269.

Costanzo, M. C. et al. (2000). The yeast proteome database (YPD) and caenorhabditis elegans proteome database (WormPD): comprehensive resources for the organization and comparison of model organism protein information [In Process Citation]. Nucleic Acids Res, 28(1), 73-76.

D'Souza, M., Romine, M. F. and Maltsev, N. (2000). SENTRA, a database of signal transduction proteins [In Process Citation]. Nucleic Acids Res, 28(1), 335-336.

De Rijk, P., Wuyts, J., Van de Peer, Y., Winkelmans, T. and De Wachter, R. (2000). The european large subunit ribosomal RNA database [In Process Citation]. Nucleic Acids Res, 28(1), 177-178.

Dicks, J. et al. (2000). UK CropNet: a collection of databases and bioinformatics resources for crop plant genomics [In Process Citation]. Nucleic Acids Res, 28(1), 104-107.

Discala, C., Benigni, X., Barillot, E. and Vaysseix, G. (2000). DBcat: a catalog of 500 biological databases [In Process Citation]. Nucleic Acids Res, 28(1), 8-9.

Dralyuk, I., Brudno, M., Gelfand, M. S., Zorn, M. and Dubchak, I. (2000). ASDB: database of alternatively spliced genes [In Process Citation]. Nucleic Acids Res, 28(1), 296-297.

Ellis, L. B., Hershberger, C. D. and Wackett, L. P. (2000). The university of minnesota Biocatalysis/Biodegradation database: microorganisms, genomics and prediction [In Process Citation]. Nucleic Acids Res, 28(1), 377-379.

Erdmann, V. A., Szymanski, M., Hochberg, A., Groot, N. and Barciszewski, J. (2000). Non-coding, mRNA-like RNAs database Y2K [In Process Citation]. Nucleic Acids Res, 28(1), 197-200.

Gai, X., Lal, S., Xing, L., Brendel, V. and Walbot, V. (2000). Gene discovery using the maize genome database ZmDB [In Process Citation]. Nucleic Acids Res, 28(1), 94-96.

Garavelli, J. S. (2000). The RESID database of protein structure modifications: 2000 update [In Process Citation]. Nucleic Acids Res, 28(1), 209-211.

Ghosh, D. (2000). Object-oriented transcription factors database (ooTFD) [In Process Citation]. Nucleic Acids Res, 28(1), 308-310.

Goto, S., Nishioka, T. and Kanehisa, M. (2000). LIGAND: chemical database of enzyme reactions [In Process Citation]. Nucleic Acids Res, 28(1), 380-382.

Gromiha, M. M., An, J., Kono, H., Oobatake, M., Uedaira, H., Prabakaran, P. and Sarai, A. (2000). ProTherm, version 2.0: thermodynamic database for proteins and mutants [In Process Citation]. Nucleic Acids Res, 28(1), 283-285.

Harger, C., Chen, G., Farmer, A., Huang, W., Inman, J., Kiphart, D., Schilkey, F., Skupski, M. P. and Weller, J. (2000). The genome sequence DataBase [In Process Citation]. Nucleic Acids Res, 28(1), 31-32.

Henikoff, J. G., Greene, E. A., Pietrokovski, S. and Henikoff, S. (2000). Increased coverage of protein families with the blocks database servers [In Process Citation]. Nucleic Acids Res, 28(1), 228-230.

Hishiki, T., Kawamoto, S., Morishita, S. and Okubo, K. (2000). BodyMap: a human and mouse gene expression database [In Process Citation]. Nucleic Acids Res, 28(1), 136-138.

Hoogland, C., Sanchez, J. C., Tonella, L., Binz, P. A., Bairoch, A., Hochstrasser, D. F. and Appel, R. D. (2000). The 1999 SWISS-2DPAGE database update [In Process Citation]. Nucleic Acids Res, 28(1), 286-288.

Huang, H., Xiao, C. and Wu, C. H. (2000). ProClass protein family database [In Process Citation]. Nucleic Acids Res, 28(1), 273-276.

Huret, J. L., Minor, S. L., Dorkeld, F., Dessen, P. and Bernheim, A. (2000). Atlas of genetics and cytogenetics in oncology and haematology, an interactive database [In Process Citation]. Nucleic Acids Res, 28(1), 349-351.

Jacobs, G. H., Stockwell, P. A., Schrieber, M. J., Tate, W. P. and Brown, C. M. (2000). Transterm: a database of messenger RNA components and signals [In Process Citation]. Nucleic Acids Res, 28(1), 293-295.

Jankowsky, E. and Jankowsky, A. (2000). The DExH/D protein family database [In Process Citation]. Nucleic Acids Res, 28(1), 333-334.

Johnson, G. and Wu, T. T. (2000). Kabat database and its applications: 30 years after the first variability plot [In Process Citation]. Nucleic Acids Res, 28(1), 214-218.

Kanehisa, M. and Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes [In Process Citation]. Nucleic Acids Res, 28(1), 27-30.

Karp, P. D., Riley, M., Saier, M., Paulsen, I. T., Paley, S. M. and Pellegrini-Toole, A. (2000). The EcoCyc and MetaCyc databases [In Process Citation]. Nucleic Acids Res, 28(1), 56-59.

Kawashima, S. and Kanehisa, M. (2000). AAindex: amino acid index database [In Process Citation]. Nucleic Acids Res, 28(1), 374.

Kawashima, T., Kawashima, S., Kanehisa, M., Nishida, H. and Makabe, K. W. (2000). MAGEST: MAboya gene expression patterns and sequence tags [In Process Citation]. Nucleic Acids Res, 28(1), 133-135.

Kel-Margoulis, O. V., Romashchenko, A. G., Kolchanov, N. A., Wingender, E. and Kel, A. E. (2000). COMPEL: a database on composite regulatory elements providing combinatorial transcriptional regulation [In Process Citation]. Nucleic Acids Res, 28(1), 311-315.

Kent, W. J. and Zahler, A. M. (2000). The intronerator: exploring introns and alternative splicing in caenorhabditis elegans [In Process Citation]. Nucleic Acids Res, 28(1), 91-93.

Kikuno, R., Nagase, T., Suyama, M., Waki, M., Hirosawa, M. and Ohara, O. (2000). HUGE: a database for human large proteins identified in the kazusa cDNA sequencing project [In Process Citation]. Nucleic Acids Res, 28(1), 331-332.

Kolchanov, N. A. et al. (2000). Transcription regulatory regions database (TRRD): its status in 2000 [In Process Citation]. Nucleic Acids Res, 28(1), 298-301.

Krause, A., Stoye, J. and Vingron, M. (2000). The SYSTERS protein sequence cluster set [In Process Citation]. Nucleic Acids Res, 28(1), 270-272.

Lanave, C., Liuni, S., Licciulli, F. and Attimonelli, M. (2000). Update of AMmtDB: a database of multi-aligned metazoa mitochondrial DNA sequences [In Process Citation]. Nucleic Acids Res, 28(1), 153-154.

Lo Conte, L., Ailey, B., Hubbard, T. J., Brenner, S. E., Murzin, A. G. and Chothia, C. (2000). SCOP: a structural classification of proteins database [In Process Citation]. Nucleic Acids Res, 28(1), 257-259.

Lopez, P. J. and Seraphin, B. (2000). YIDB: the yeast intron DataBase [In Process Citation]. Nucleic Acids Res, 28(1), 85-86.

Maglott, D. R., Katz, K. S., Sicotte, H. and Pruitt, K. D. (2000). NCBI's LocusLink and RefSeq [In Process Citation]. Nucleic Acids Res, 28(1), 126-128.

Maidak, B. L. et al. (2000). The RDP (Ribosomal database project) continues [In Process Citation]. Nucleic Acids Res, 28(1), 173-174.

Mewes, H. W. et al. (2000). MIPS: a database for genomes and protein sequences [In Process Citation]. Nucleic Acids Res, 28(1), 37-40.

Minoshima, S., Mitsuyama, S., Ohno, S., Kawamura, T. and Shimizu, N. (2000). Keio mutation database (KMDB) for human disease gene mutations [In Process Citation]. Nucleic Acids Res, 28(1), 364-368.

Murvai, J., Vlahovicek, K., Barta, E., Cataletto, B. and Pongor, S. (2000). The SBASE protein domain library, release 7.0: a collection of annotated protein sequence segments [In Process Citation]. Nucleic Acids Res, 28(1), 260-262.

Nagaswamy, U., Voss, N., Zhang, Z. and Fox, G. E. (2000). Database of non-canonical base pairs found in known RNA structures [In Process Citation]. Nucleic Acids Res, 28(1), 375-376.

Nakamura, Y., Gojobori, T. and Ikemura, T. (2000). Codon usage tabulated from international DNA sequence databases: status for the year 2000 [In Process Citation]. Nucleic Acids Res, 28(1), 292.

Nakamura, Y., Kaneko, T. and Tabata, S. (2000). CyanoBase, the genome database for synechocystis sp. strain PCC6803: status for the year 2000 [In Process Citation]. Nucleic Acids Res, 28(1), 72.

Nelson, P. S., Clegg, N., Eroglu, B., Hawkins, V., Bumgarner, R., Smith, T. and Hood, L. (2000). The prostate expression database (PEDB): status and enhancements in 2000 [In Process Citation]. Nucleic Acids Res, 28(1), 212-213.

Overbeek, R., Larsen, N., Pusch, G. D., D'Souza, M., Jr, E. S., Kyrpides, N., Fonstein, M., Maltsev, N. and Selkov, E. (2000). WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction [In Process Citation]. Nucleic Acids Res, 28(1), 123-125.

Palm, C. J., Federspiel, N. A. and Davis, R. W. (2000). DAtA: database of arabidopsis thaliana annotation [In Process Citation]. Nucleic Acids Res, 28(1), 102-103.

Pearl, F. M., Lee, D., Bray, J. E., Sillitoe, I., Todd, A. E., Harrison, A. P., Thornton, J. M. and Orengo, C. A. (2000). Assigning genomic sequences to CATH [In Process Citation]. Nucleic Acids Res, 28(1), 277-282.

Pelchat, M., Deschenes, P. and Perreault, J. P. (2000). The database of the smallest known auto-replicable RNA species: viroids and viroid-like RNAs [In Process Citation]. Nucleic Acids Res, 28(1), 179-180.

Perier, R. C., Praz, V., Junier, T., Bonnard, C. and Bucher, P. (2000). The eukaryotic promoter database (EPD) [In Process Citation]. Nucleic Acids Res, 28(1), 302-303.

Perler, F. B. (2000). InBase, the intein database [In Process Citation]. Nucleic Acids Res, 28(1), 344-345.

Perriere, G., Bessieres, P. and Labedan, B. (2000). EMGLib: the enhanced microbial genomes library (update 2000) [In Process Citation]. Nucleic Acids Res, 28(1), 68-71.

Pesole, G., Gissi, C., Catalano, D., Grillo, G., Licciulli, F., Liuni, S., Attimonelli, M. and Saccone, C. (2000). MitoNuc and MitoAln: two related databases of nuclear genes coding for mitochondrial proteins [In Process Citation]. Nucleic Acids Res, 28(1), 163-165.

Pesole, G., Liuni, S., Grillo, G., Licciulli, F., Larizza, A., Makalowski, W. and Saccone, C. (2000). UTRdb and UTRsite: specialized databases of sequences and functional elements of 5' and 3' untranslated regions of eukaryotic mRNAs [In Process Citation]. Nucleic Acids Res, 28(1), 193-196.

Ploger, R., Zhang, J., Bassett, D., Reeves, R., Hieter, P., Boguski, M. and Spencer, F. (2000). XREFdb: cross-referencing the genetics and genes of mammals and model organisms [In Process Citation]. Nucleic Acids Res, 28(1), 120-122.

Pollet, N., Schmidt, H. A., Gawantka, V., Vingron, M. and Niehrs, C. (2000). Axeldb: a xenopus laevis database focusing on gene expression [In Process Citation]. Nucleic Acids Res, 28(1), 139-140.

Ponomarenko, J. V., Orlova, G. V., Ponomarenko, M. P., Lavryushev, S. V., Frolov, A. S., Zybova, S. V. and Kolchanov, N. A. (2000). SELEX_DB: an activated database on selected randomized DNA/RNA sequences addressed to genomic sequence annotation [In Process Citation]. Nucleic Acids Res, 28(1), 205-208.

Quackenbush, J., Liang, F., Holt, I., Pertea, G. and Upton, J. (2000). The TIGR gene indices: reconstruction and representation of expressed gene sequences [In Process Citation]. Nucleic Acids Res, 28(1), 141-145.

Rawlings, N. D. and Barrett, A. J. (2000). MEROPS: the peptidase database [In Process Citation]. Nucleic Acids Res, 28(1), 323-325.

Reichert, J., Jabs, A., Slickers, P. and Suhnel, J. (2000). The IMB jena image library of biological macromolecules [In Process Citation]. Nucleic Acids Res, 28(1), 246-249.

Ringwald, M., Eppig, J. T., Kadin, J. A., Richardson, J. E. and the Gene Expression Database, G. (2000). GXD: a gene expression database for the laboratory mouse: current status and recent enhancements [In Process Citation]. Nucleic Acids Res, 28(1), 115-119.

Roberts, R. J. and Macelis, D. (2000). REBASE - restriction enzymes and methylases [In Process Citation]. Nucleic Acids Res, 28(1), 306-307.

Rodriguez-Tome, P. and Lijnzaad, P. (2000). RHdb: the radiation hybrid database [In Process Citation]. Nucleic Acids Res, 28(1), 146-147.

Rudd, K. E. (2000). EcoGene: a genome sequence database for escherichia coli K-12 [In Process Citation]. Nucleic Acids Res, 28(1), 60-64.

Ruiz, M. et al. (2000). IMGT, the international ImMunoGeneTics database [In Process Citation]. Nucleic Acids Res, 28(1), 219-221.

Sakata, K., Antonio, B. A., Mukai, Y., Nagasaki, H., Sakai, Y., Makino, K. and Sasaki, T. (2000). INE: a rice genome database with an integrated map view [In Process Citation]. Nucleic Acids Res, 28(1), 97-101.

Sakharkar, M., Long, M., Tan, T. W. and de Souza, S. J. (2000). ExInt: an Exon/Intron database [In Process Citation]. Nucleic Acids Res, 28(1), 191-192.

Salgado, H., Santos-Zavaleta, A., Gama-Castro, S., Millan-Zarate, D., Blattner, F. R. and Collado-Vides, J. (2000). RegulonDB (version 3.0): transcriptional regulation and operon organization in escherichia coli K-12 [In Process Citation]. Nucleic Acids Res, 28(1), 65-67.

Sanchez, R., Pieper, U., Mirkovic, N., de Bakker, P. I., Wittenstein, E. and Sali, A. (2000). MODBASE, a database of annotated comparative protein structure models [In Process Citation]. Nucleic Acids Res, 28(1), 250-253.

Saxonov, S., Daizadeh, I., Fedorov, A. and Gilbert, W. (2000). EID: the exon-intron database-an exhaustive database of protein-coding intron-containing genes [In Process Citation]. Nucleic Acids Res, 28(1), 185-190.

Scharfe, C. et al. (2000). MITOP, the mitochondrial proteome database: 2000 update [In Process Citation]. Nucleic Acids Res, 28(1), 155-158.

Schisler, N. J. and Palmer, J. D. (2000). The IDB and IEDB: intron sequence and evolution databases [In Process Citation]. Nucleic Acids Res, 28(1), 181-184.

Schonbach, C., Koh, J. L., Sheng, X., Wong, L. and Brusic, V. (2000). FIMM, a database of functional molecular immunology [In Process Citation]. Nucleic Acids Res, 28(1), 222-224.

Schultz, J., Copley, R. R., Doerks, T., Ponting, C. P. and Bork, P. (2000). SMART: a web-based tool for the study of genetically mobile domains [In Process Citation]. Nucleic Acids Res, 28(1), 231-234.

Shafer, R. W., Jung, D. R., Betts, B. J., Xi, Y. and Gonzales, M. J. (2000). Human immunodeficiency virus reverse transcriptase and protease sequence database [In Process Citation]. Nucleic Acids Res, 28(1), 346-348.

Skoufos, E., Marenco, L., Nadkarni, P. M., Miller, P. L. and Shepherd, G. M. (2000). Olfactory receptor database: a sensory chemoreceptor resource [In Process Citation]. Nucleic Acids Res, 28(1), 341-343.

Smigielski, E. M., Sirotkin, K., Ward, M. and Sherry, S. T. (2000). dbSNP: a database of single nucleotide polymorphisms [In Process Citation]. Nucleic Acids Res, 28(1), 352-355.

Spirov, A. V., Bowler, T. and Reinitz, J. (2000). HOX pro: a specialized database for clusters and networks of homeobox genes [In Process Citation]. Nucleic Acids Res, 28(1), 337-340.

Stenberg, K. A., Riikonen, P. T. and Vihinen, M. (2000). KinMutBase, a database of human disease-causing protein kinase mutations [In Process Citation]. Nucleic Acids Res, 28(1), 369-371.

Sullivan, S. A., Aravind, L., Makalowska, I., Baxevanis, A. D. and Landsman, D. (2000). The histone database: a comprehensive WWW resource for histones and histone fold-containing proteins [In Process Citation]. Nucleic Acids Res, 28(1), 320-322.

Szymanski, M., Barciszewska, M. Z., Barciszewski, J. and Erdmann, V. A. (2000). 5S ribosomal RNA database Y2K [In Process Citation]. Nucleic Acids Res, 28(1), 166-167.

Szymanski, M. and Barciszewski, J. (2000). Aminoacyl-tRNA synthetases database Y2K [In Process Citation]. Nucleic Acids Res, 28(1), 326-328.

Tateno, Y., Miyazaki, S., Ota, M., Sugawara, H. and Gojobori, T. (2000). DNA data bank of japan (DDBJ) in collaboration with mass sequencing teams [In Process Citation]. Nucleic Acids Res, 28(1), 24-26.

Tatusov, R. L., Galperin, M. Y., Natale, D. A. and Koonin, E. V. (2000). The COG database: a tool for genome-scale analysis of protein functions and evolution [In Process Citation]. Nucleic Acids Res, 28(1), 33-36.

van Batenburg, F. H., Gultyaev, A. P., Pleij, C. W., Ng, J. and Oliehoek, J. (2000). PseudoBase: a database with RNA pseudoknots [In Process Citation]. Nucleic Acids Res, 28(1), 201-204.

Van de Peer, Y., De Rijk, P., Wuyts, J., Winkelmans, T. and De Wachter, R. (2000). The european small subunit ribosomal RNA database [In Process Citation]. Nucleic Acids Res, 28(1), 175-176.

Volpetti, V., Gallerani, R., De Benedetto, C., Liuni, S., Licciulli, F. and Ceci, L. R. (2000). PLMItRNA, a database for tRNAs and tRNA genes in plant mitochondria: enlargement and updating [In Process Citation]. Nucleic Acids Res, 28(1), 159-162.

Wang, Y., Addess, K. J., Geer, L., Madej, T., Marchler-Bauer, A., Zimmerman, D. and Bryant, S. H. (2000). MMDB: 3D structure data in entrez [In Process Citation]. Nucleic Acids Res, 28(1), 243-245.

Waugh, M., Hraber, P., Weller, J., Wu, Y., Chen, G., Inman, J., Kiphart, D. and Sobral, B. (2000). The phytophthora genome initiative database: informatics and analysis for distributed pathogenomic research [In Process Citation]. Nucleic Acids Res, 28(1), 87-90.

Wheeler, D. L., Chappey, C., Lash, A. E., Leipe, D. D., Madden, T. L., Schuler, G. D., Tatusova, T. A. and Rapp, B. A. (2000). Database resources of the national center for biotechnology information [In Process Citation]. Nucleic Acids Res, 28(1), 10-14.

Williams, K. P. (2000). The tmRNA website [In Process Citation]. Nucleic Acids Res, 28(1), 168.

Wingender, E. et al. (2000). TRANSFAC: an integrated system for gene expression regulation [In Process Citation]. Nucleic Acids Res, 28(1), 316-319.

Xenarios, I., Rice, D. W., Salwinski, L., Baron, M. K., Marcotte, E. M. and Eisenberg, D. (2000). DIP: the database of interacting proteins [In Process Citation]. Nucleic Acids Res, 28(1), 289-291.

Yona, G., Linial, N. and Linial, M. (2000). ProtoMap: automatic classification of protein sequences and hierarchy of protein families [In Process Citation]. Nucleic Acids Res, 28(1), 49-55.

Zhao, S. (2000). Human BAC ends [In Process Citation]. Nucleic Acids Res, 28(1), 129-132.

Zwieb, C. and Samuelsson, T. (2000). SRPDB (Signal recognition particle database) [In Process Citation]. Nucleic Acids Res, 28(1), 171-172.

Zwieb, C. and Wower, J. (2000). tmRDB (tmRNA database) [In Process Citation]. Nucleic Acids Res, 28(1), 169-170.

Consensus Pattern Matching

Abarbanel, R. M., Wieneke, P. R., Mansfield, E., Jaffe, D. A. and Brutlag, D. L. (1984). Rapid searches for complex patterns in biological molecules. Nucleic Acids Res. 12, 263-280.

Attwood, T. K. and Beck, M. E. (1994). PRINTS--a protein motif fingerprint database. Protein Eng, 7(7), 841-8.

Attwood, T. K., Beck, M. E., Bleasby, A. J., Degtyarenko, K. and Parry Smith, D. J. (1996). Progress with the PRINTS protein fingerprint database. Nucleic Acids Res, 24(1), 182-8.

Attwood, T. K., Beck, M. E., Bleasby, A. J., Degtyarenko, K., Michie, A. D. and Parry-Smith, D. J. (1997). Novel developments with the PRINTS protein fingerprint database. Nucleic Acids Res, 25(1), 212-7.

Attwood, T. K., Beck, M. E., Flower, D. R., Scordis, P. and Selley, J. N. (1998). The PRINTS protein fingerprint database in its fifth year. Nucleic Acids Res, 26(1), 304-8.

Bairoch, A., Bucher, P. and Hofmann, K. (1997). The PROSITE database, its status in 1997. Nucleic Acids Res, 25(1), 217-21.

Bork, P. (1989). Recognition of functional regions in primary structures using a set of property patterns. FEBS Lett 257 (1), 191-5.

Bork, P. and Koonin, E. V. (1996). Protein Sequence Motifs. Current Opinion in Structural Biology 6 (3), 366-376.

Henikoff, S. (1996). Scores for Sequence Searches. Current Opinion in Structural Biology 6 (3), 353-360.

Koonin, E. V., Tatusov, R. L. and Rudd, K. E. (1996). Protein sequence comparison at genome scale. Methods Enzymol, 266, 295-322.

Koonin, E. V., Tatusov, R. L. and Galperin, M. Y. (1998). Beyond complete genomes: from sequence to structure and function. Curr Opin Struct Biol, 8(3), 355-63.

Nevill-Manning, C., Sethi, K., Wu, T. D. and Brutlag, D. L. (1997). Enumerating and Ranking Discrete Motifs. ISMB-97, 4, 202-209.

Nevill-Manning, C. G., Wu, T. D. and Brutlag, D. L. (1998). Highly specific protein sequence motifs for genome analysis. Proc Natl Acad Sci U S A, 95(11), 5865-71.

Saqi, M. A. and Sternberg, M. J. (1994). Identification of sequence motifs from a set of proteins with related function. Protein Eng, 7(2), 165-71.

Smith, H. O., Annau, T. M. and Chandrasegaran, S. (1990). Finding sequence motifs in groups of functionally related proteins. Proc Natl Acad Sci U S A, 87 (2), 826-30.

Smith, R. (1988). A finite state machine algorithm for finding restriction sites and other pattern matching applications. Comput Appl Biosci, 4 (4), 459-65.

Stormo, G. D. (1990). Consensus patterns in DNA. Methods Enzymol 183, 211-21.

Wu, T. D. and Brutlag, D. L. (1995). Identification of protein motifs using conserved amino acid properties and partitioning techniques. ISMB, 3, 402-10.

Quantitative and Probabilistic Pattern Matching

Bowie, J. U., Luthy, R. and Eisenberg, D. (1991). A Method to Identify Protein Sequences That Fold Into a Known Three-Dimensional Structure. Science 253, 164-170.

Brennan, R. G. and Matthews, B. W. (1989a). The helix-turn-helix DNA binding motif. J Biol Chem, 264 (4), 1903-6.

Brennan, R. G. and Matthews, B. W. (1989b). Structural basis of DNA-protein recognition. Trends Biochem Sci, 14 (7), 286-90.

Dodd, I. B. and Egan, J. B. (1990). Improved detection of helix-turn-helix DNA-binding motifs in protein sequences. Nucleic Acids Res 18 (17), 5019-26.

Gribskov, M., McLachlan, A. D. and Eisenberg, D. (1987). Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci. USA, 84, 4355-4358.

Gribskov, M., Homyak, M., Edenfield, J. and Eisenberg, D. (1988). Profile scanning for three-dimensional structural patterns in protein sequences. Comput Appl Biosci, 4 (1), 61-6.

Gribskov, M. (1994). Profile analysis. Methods Mol Biol 25, 247-66.

Henikoff, S. and Henikoff, J. G. (1991). Automated assembly of protein blocks for database searching. Nucleic Acids Res 19 (23), 6565-72.

Henikoff, S. (1991). Playing with blocks: some pitfalls of forcing multiple alignments. New Biol 3 (12), 1148-54.

Henikoff, S. and Henikoff, J. G. (1994). Position-based Sequence Weights. J. Mol. Biol. 243, 574-578.

Henikoff, J. G. and Henikoff, S. (1996). Using substitution probabilities to improve position-specific scoring matrices. Comput Appl Biosci, 12(2), 135-43.

Henikoff, S. (1996). Scores for Sequence Searches. Current Opinion in Structural Biology, 6(3), 353-360.

Luthy, R., Bowie, J. U. and Eisenberg, D. (1992). Assessment of protein models with three-dimensional profiles. Nature 356 (6364), 83-85.

Luthy, R., McLachlan, A. D. and Eisenberg, D. (1991). Secondary structure-based profiles: use of structure-conserving scoring tables in searching protein sequence databases for structural similarities. Proteins 10 (3), 229-239.

Pietrokovski, S., Henikoff, J. G. and Henikoff, S. (1996). The Blocks database--a system for protein classification. Nucleic Acids Res, 24(1), 197-200.

Vogt, G., Etzold, T. and Argos, P. (1995). An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. J Mol Biol, 249(4), 816-31.

Wallace, J. C. and Henikoff, S. (1992). PATMAT: a searching and extraction program for sequence, pattern and block queries and databases. Comput Appl Biosci, 8 (3), 249-54.

Alignment of Biological Sequences

Altschul, S. F. (1991). Amino acid substitution matrices from an information theoretic perspective. J Mol Biol, 219(3), 555-65.

Dayhoff, M. O., Schwartz, R. M. and Orcutt, B. C. (1978). A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure, 5 (Suppl. 3), 345-352.

Doolittle, R. F. (1986). Of Urfs and Orfs: A Primer on How to Analyze Derived Amino Acid Sequences. Mill Valley, California: University Science Books.

Feng, D.F., Johnson, M.S. and Doolittle, R.F. (1985). Aligning amino acid sequences: comparison of commonly used methods. J. Mol. Evol. 21, 112-125.

Gribskov, M. (1994). Profile analysis. Methods Mol Biol 25, 247-66.

Grice, J. A., Hughey, R. and Speck, D. (1995). Parallel sequence alignment in limited space. ISMB 3, 145-53.

Krogh, A., Brown, M., Mian, I. S., Sjolander, K. and Haussler, D. (1994). Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol 235 (5), 1501-31.

Needleman, S. B. and Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443-453.

Pearson, W. R. and Miller, W. (1992). Dynamic programming algorithms for biological sequence comparison. Methods Enzymol, 210, 575-601.

Pearson, W. R. (1995). Comparison of methods for searching protein sequence databases. Protein Sci 4 (6), 1145-60.

Reeck, G. R., de Haen, C., Teller, D. C., Doolittle, R. F., Fitch, W. M., Dickerson, R. E. (1987). "Homology" in Proteins and Nucleic Acids: A Terminology Muddle and a Way out of It. Cell 50, 667.

Smith, T. F. and Waterman, M. (1981). Identification of common molecular subsequences. J. Mol. Biol. 147, 195-197.