EGCG 8.0

An Article submitted to embnet.news

Peter Rice (Sanger Centre, Hinxton, UK)
Rodrigo Lopez (Norwegian EMBnet node)
Reinhard Doelz (Swiss EMBnet node)
Jack Leunissen (Netherlands EMBnet node)

Historical Background

EGCG is an abbreviation of "Extended GCG". The package began life at EMBL in Heidelberg, Germany as a small collection of programs to support EMBL's research activities, in particular the development of automated DNA sequencing. The programs that resulted were published [1] and released on the new EMBL Network File Server [2] under the name "GCGEMBL".

In 1989, Rodrigo Lopez came to EMBL from the Norwegian EMBnet node for a sabbatical visit, and developed several new programs which were included in the next release "GCGEMBL 6.0", along with additional collections "GCGDBASE" for database maintenance and "GCGQUICK" for enhanced Quick Searching programs [3].

When GCG 7.0 was released, these collections were merged and renamed as "EGCG". At the time, the "E" stood for "European" but we already intended it to be changed to "Extended" once a non-European contribution appeared. EGCG 7.0 was distributed on the EMBL Network File Server, but was also included on the new EMBL ftp site, and was submitted to GCG for distribution as "unsupported" software to all their licensed VMS users.

When GCG released a Unix version, and as EMBnet nodes began to migrate to Unix systems, the most popular EGCG programs were ported to Unix by Reinhard Doelz (Swiss EMBnet node), Jack Leunissen (Netherlands EMBnet node) and Rodrigo Lopez in time to be distributed with GCG 7.2 as "unsupported".

Jaakko Hattula, a summer student from Tampere University of Technology in Finland, completed the addition of a command line interface to all GCG programs, and contributed to the development of some new ideas for EGCG graphics.

Current Status

When the earliest test versions of GCG 8.0 appeared, it was clear that conversion would be a non-trivial task. We undertook a major rewrite of some sections of the EGCG code, including the development of our own procedure libraries EGENLIB and EAPPLIB, which has made our code much easier to maintain. Additional advantages include the potential to run any program in batch with minor code changes, and to provide many extensions to the graphics options.

Unlike GCG, the Unix and VMS versions of EGCG use identical source code and documentation files. New applications are developed mainly on Unix and tested on VMS with very few problems.

Users of the EMBL Quick service will be happy to know that we have continued to support this in EGCG 8.0. Several users at the Sanger Centre have found Quick Searching to be extremely useful, and we will continue to maintain and extend these programs.

We do not, however, have any plans to support the GCG WPI user interface. Although the EGCG programs will work with WPI, we have no time to spend on writing the configuration files.

EGCG is, like GCG, written in both Fortran and C. We will continue to maintain support for both languages.

Distribution

EGCG is no longer distributed with GCG. We prefer to distribute the package directly to the users through FTP, allowing us to provide new programs and updates more rapidly, and to keep control of our own software. The EGCG programs are distributed freely to academic users throughout the world. The only requirement is that users must have a valid GCG licence to build the programs.

Program Examples

There are over 70 programs in EGCG 8.0. Old favourites include PRETTYPLOT which produces boxed sequence alignments; CPGPLOT which identifies CpG islands; SIGCLEAVE which identifies signal peptide cleavage sites; EQUICKSEARCH which runs QuickSearch faster and with fewer system resources than GCG's version (and QUICKMATCH which removes false hits); GELSTATUS and GELPICTURE which report on fragment assembly projects.
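As an illustration of the kind of calculation behind CpG island detection, here is a minimal Python sketch of the widely used observed/expected CpG ratio evaluated over fixed-size windows. It is not taken from the CPGPLOT source: the function names, the window size and the thresholds (ratio above 0.6, G+C fraction above 0.5) are illustrative assumptions only.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (number of CG * length) / (number of C * number of G)."""
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("C") + seq.count("G")) / len(seq)


def candidate_windows(seq, window=100, step=1, min_ratio=0.6, min_gc=0.5):
    """Return start positions of windows passing both thresholds.

    The default thresholds are assumptions chosen for illustration,
    not the settings of any particular program.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if cpg_obs_exp(w) > min_ratio and gc_fraction(w) > min_gc:
            hits.append(start)
    return hits
```

A real program would typically merge adjacent passing windows into islands and apply a minimum island length; that step is omitted here.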

Unix users will find several new programs that were only in the latest VMS release before. These include MAPSELECT which creates local enzyme data files; PEPCOIL which identifies coiled-coil regions and leucine zippers; NEWFEATURES which analyses and edits the EMBL feature table; TPROFILESEARCH which compares protein profiles to DNA databases; and PEPALLWINDOW which plots hydrophobicity for aligned sequences.

New programs in release 8.0 include POLAND which simulates transition curves for DNA and RNA; PRETTYBOX which produces shaded sequence alignments; and STSSEARCH which searches a DNA database for matches to primer pairs.
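The idea of searching a sequence for matches to a primer pair can be sketched in a few lines of Python. This is a simplified illustration and not the STSSEARCH algorithm: it assumes exact matches only, a single strand, and a hypothetical `max_product` limit on the distance spanned by the two primers. All function and parameter names are invented for the example.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T and N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_all(text, pattern):
    """Return every start position of pattern in text, overlaps included."""
    positions = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)
    return positions


def find_sts(seq, forward, reverse, max_product=1000):
    """Find (start, end, size) of regions bounded by a primer pair.

    The forward primer must match the sequence as given; the reverse
    primer must match the opposite strand downstream of it, so we look
    for its reverse complement. Exact matching is an assumption made
    to keep the sketch short.
    """
    seq = seq.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)
    hits = []
    for f in find_all(seq, fwd):
        for r in find_all(seq, rev_site):
            end = r + len(rev_site)
            if r >= f + len(fwd) and end - f <= max_product:
                hits.append((f, end, end - f))
    return hits
```

For example, `find_sts("TTTACGTACGGGGGGATTCACCC", "ACGTAC", "TGAATC")` reports one region starting at position 3 and ending at position 20 (0-based, end exclusive). Searching a whole database would repeat this for each entry and, in practice, would also need to tolerate mismatches.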

The EMBnet Connection

Throughout the development of the EGCG package, there have been excellent collaborations with the EMBnet community. Academic software such as EGCG, developed for a limited number of users, cannot under normal circumstances be maintained to a sufficient standard for reliable public use. Only the large number of EMBnet users, and the efforts of their support personnel, have made it possible to produce a robust package. In return, we receive many suggestions for further developments which we can provide and test locally and then easily incorporate into the next release.

Acknowledgements

We are very grateful to (in alphabetical order) Rein Aasland, Wilhelm Ansorge, Peer Bork, Thure Etzold, Toby Gibson, Tom Kristensen, David Mathog, Franc Pattus, Kate Rice, Christian Schwager, Peter Sibbald, Julie Thompson, Hartmut Voss, Gert Vriend and Rick Westerman for their many contributions and critical comments as users of the EGCG programs.

We are also deeply indebted to the staff of GCG Inc. who provided rapid and helpful answers to our many questions during the development of the programs. Many thanks to Irv Edelman, Maggie Smith, Donald Katz, Michael Hogan, Joseph King, Mary Schulz and especially John Devereux.

Into the Future

Since EGCG was first released with GCG, it has become very popular at a number of sites around the world. We now have a backlog of program submissions which we plan to include in EGCG as soon as the C interface to the procedure library is in place. Contributing authors in the backlog include William Pearson and Rainer Fuchs. Additional contributions are always welcome.

We also have to work on improvements to our procedure library. After consulting with the users, we intend to extend the scope of EGCG beyond the GCG package. We are especially interested in supporting a wider range of data formats, providing an improved user interface and providing documentation more suited to the needs of novice users. Past experience leads us to strongly prefer European solutions in these areas. At present we are looking at SRS [4], HASSLE [5], the Bioccelerator [6] from Compugen, www2gcg from the Belgian EMBnet node and the UK CCP11 project, among other options.

Distribution

Current major version: 8.0 beta
URL: ftp://ftp.sanger.ac.uk/pub/pmr/egcg8

E-mail contact

egcg@embnet.org
pmr@sanger.ac.uk

References

[1] Edwards A. et al. Automated DNA Sequencing of the Human HPRT Locus. Genomics 6:593-608 (1990).
[2] Stoehr P.J. and Omond R.A. The EMBL Network File Server. Nucleic Acids Res. 17:6763-6764 (1989).
[3] Devereux J. A Rapid Method for Identifying Sequences in Large Nucleotide Sequence Databases. Ph.D. Thesis, 1988 (University Microfilms Inc., Ann Arbor, Michigan, USA.)
[4] Etzold T. The Sequence Retrieval System (SRS) on the WWW. embnet.news 1(2):5-6 (1994).
[5] Doelz R., Eggenberger F. and Wadley R. Biocomputing on a Server Network. embnet.news 1(2):6-8 (1994).
[6] Esterman L. Bioccelerator: A Currently Available Solution for Fast Profile and Smith-Waterman Searches. embnet.news 2(1):5-6 (1994).
