Expressed sequence tags (ESTs) offer a low-cost approach to gene discovery and are being used by an increasing number of laboratories to obtain sequence information for a wide variety of organisms. The challenge lies in processing and organising these data within a genomic context to facilitate large-scale analyses. Here we present PartiGene, an integrated sequence analysis suite which uses freely available public domain software to:
(1) process raw trace chromatograms into sequence objects suitable for submission to dbEST;
(2) place these sequences within a genomic context;
(3) perform customisable annotation of the data; and
(4) present the data as HTML tables and an SQL database resource.
PartiGene has been used to create a number of database resources for non-model organisms, including NEMBASE (http://www.nematodes.org). The packages are readily portable, freely available and can be run on simple Linux-based workstations.
AVAILABLE SOFTWARE:
trace2dbest: software to process sequence trace files ready for submission to NCBI dbEST. Versions for simple processing of sequence trace files, and for processing GSS sequences, are also available.
PartiGene: software for processing sequences (usually ESTs) into clusters representing putative genes, and for annotating and databasing them.
prot4EST: software for accurate prediction of translations from clustered EST datasets.
annot8r: software for functional annotation of gene datasets contained in a PartiGene database.
wwwPartiGene: software for preparing a PartiGene database for presentation and querying via the WWW.
CLOBB: software for clustering sequences into putative gene objects (incorporated in PartiGene).
SimiTri: software for visualisation of the similarity relationships (BLAST, expression, etc.) of one dataset against three others.
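
To illustrate what comparing one dataset against three others can look like, the sketch below places a sequence inside a triangle according to its relative similarity to three comparison datasets. This is an assumption made for illustration only: the text above does not describe how SimiTri computes positions, and the names `VERTICES` and `triangle_position` are hypothetical.

```python
import math

# Hypothetical corner positions for the three comparison datasets (A, B, C),
# arranged as an equilateral triangle with unit side length.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def triangle_position(score_a, score_b, score_c):
    """Map three non-negative similarity scores to a point in the triangle.

    Each score is treated as a weight pulling the point towards the
    corresponding corner (a barycentric combination). Returns None when
    the sequence has no similarity to any of the three datasets.
    """
    total = score_a + score_b + score_c
    if total <= 0:
        return None  # nothing to plot
    weights = (score_a / total, score_b / total, score_c / total)
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES))
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES))
    return (x, y)
```

Under this scheme, a sequence with similarity to only one dataset lands on that dataset's corner, while a sequence with equal scores against all three lands at the centre of the triangle.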