|
1) How to classify dbEST libraries |
-
EST libraries are
downloaded from dbEST at the NCBI GenBank FTP site, converted into
FASTA-formatted sequences, and divided according to library names.
-
All libraries in
dbEST are classified by organism and sequencing center.
dbEST contains a large number of human libraries, more than
for any other species. These human libraries were generated from various
sources, and the terminology used to describe the library sources
differs significantly between them.
-
Human libraries are further categorized by a set of structured
and controlled terms from eVOC. By mapping the 'organ' or
'tissue' names in each library to the anatomical and pathological terms
of eVOC, we assigned the human libraries to the Anatomy ontology and
the Pathology ontology of eVOC. The remaining unmapped libraries
were considered 'unclassifiable' in both ontologies.
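The mapping step described above can be sketched as a keyword lookup with an 'unclassifiable' fallback. This is only an illustration: the vocabulary tables, the term names, and the function `classify_library` are hypothetical placeholders, not the actual eVOC ontologies or the CleanEST implementation.

```python
import re

UNCLASSIFIABLE = "unclassifiable"

# Hypothetical keyword-to-term tables (placeholders, not real eVOC content).
ANATOMY_TERMS = {"brain": "brain", "liver": "liver", "hepatic": "liver"}
PATHOLOGY_TERMS = {"normal": "normal", "carcinoma": "tumor", "tumor": "tumor"}


def classify_library(description, vocabulary):
    """Map a free-text 'organ'/'tissue' description to a controlled term.

    Returns the term of the first word found in the vocabulary,
    or 'unclassifiable' when no word matches.
    """
    for word in re.findall(r"[a-z]+", description.lower()):
        if word in vocabulary:
            return vocabulary[word]
    return UNCLASSIFIABLE


if __name__ == "__main__":
    description = "Adult hepatic carcinoma"
    print(classify_library(description, ANATOMY_TERMS))
    print(classify_library(description, PATHOLOGY_TERMS))
```

A library is classified independently in each ontology, so a description can map to an Anatomy term while remaining 'unclassifiable' in Pathology, or the reverse.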
|
|
2) How to cleanse sequences in dbEST |
-
CleanEST provides two kinds of cleansed sequences: 'pre-cleansed' and 'user-cleansed'.
|
| |
1. pre-cleansed ESTs |
|
- We obtained sequences of major
contamination databases: the UniVec database (for
vector/linker), the Escherichia coli full genome
sequence (for cloning host), and the RefSeq mitochondrial genome
sequences (for organelle).
-
EST sequences were compared against
these three sequence databases, and contaminated regions were masked.
This was performed using the Cross_match program (with
minmatch = 20 and minscore = 20).
-
Masked EST sequences were either
trimmed or discarded using our Perl trimming script. If masked
regions commenced within 100 bases of the 5' or 3' ends, they were
trimmed. EST sequences with internally located masked regions were
discarded.
-
After pre-cleansing, EST sequences
shorter than 100 bases were discarded.
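The trimming and discarding rules above can be sketched in Python as follows. This is not the Perl script used by CleanEST. It assumes that masked bases appear as runs of uppercase 'X', that a run starting within 100 bases of the 5' end is trimmed together with everything upstream of it, and that a run ending within 100 bases of the 3' end is trimmed together with everything downstream of it. The function name `trim_masked` and both constants are hypothetical.

```python
import re

END_WINDOW = 100  # a masked run this close to either end is trimmed
MIN_LENGTH = 100  # ESTs shorter than this after cleansing are discarded


def trim_masked(seq, window=END_WINDOW, min_length=MIN_LENGTH):
    """Return the trimmed sequence, or None if the EST should be discarded."""
    start, end = 0, len(seq)
    for run in re.finditer(r"X+", seq):
        if run.start() < window:
            # Masked run near the 5' end: keep only what follows it.
            start = max(start, run.end())
        elif len(seq) - run.end() < window:
            # Masked run near the 3' end: keep only what precedes it.
            end = min(end, run.start())
        else:
            # Internally located masked region: discard the whole EST.
            return None
    trimmed = seq[start:end]
    return trimmed if len(trimmed) >= min_length else None


if __name__ == "__main__":
    est = "X" * 30 + "ACGT" * 50
    result = trim_masked(est)
    print(len(result) if result else "discarded")
```

Returning `None` for discarded sequences keeps the two outcomes (trimmed or discarded) distinct, so a caller can count discarded ESTs separately when reporting cleansing information.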
|
2. user-cleansed ESTs |
-
CleanEST provides an
automatic user-cleansing pipeline, in which sequences in a
user-selected library are cleansed on-the-fly according to
user-input options.
-
This pipeline
consists of highly reliable open-source tools and public databases.
In the interface of the pipeline, users can select parameters of the
Cross_match program and contamination sources. After user-cleansing,
users can download the cleansed sequences.
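As a rough illustration of how user-selected options might be turned into a Cross_match run, the sketch below only builds a command line and does not execute it. The file names, the `SOURCE_FILES` table, and the function `build_cross_match_command` are hypothetical, and the way CleanEST actually invokes Cross_match is not stated here. The `-minmatch`, `-minscore`, and `-screen` options are standard Cross_match options; `-screen` writes a masked copy of the query sequences.

```python
# Hypothetical mapping from contamination source to a local FASTA file.
SOURCE_FILES = {
    "vector": "UniVec.fasta",
    "ecoli": "ecoli_genome.fasta",
    "mitochondria": "refseq_mito.fasta",
}


def build_cross_match_command(est_fasta, sources, minmatch=20, minscore=20):
    """Build a cross_match argument list from user-selected options."""
    unknown = [s for s in sources if s not in SOURCE_FILES]
    if unknown:
        raise ValueError(f"unknown contamination source(s): {unknown}")
    subject_files = [SOURCE_FILES[s] for s in sources]
    return [
        "cross_match",
        est_fasta,          # query: ESTs of the user-selected library
        *subject_files,     # subjects: selected contamination sources
        "-minmatch", str(minmatch),
        "-minscore", str(minscore),
        "-screen",          # write masked sequences to <est_fasta>.screen
    ]


if __name__ == "__main__":
    print(" ".join(build_cross_match_command("library.fasta", ["vector", "ecoli"])))
```

The defaults of 20 mirror the pre-cleansing parameters, so a user who changes nothing would get the same masking thresholds as the pre-cleansed set.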
|
| 3) How to use CleanEST |
|
1. Searching for organism and sequencing center |
|
-
Searching for organism and sequencing
center is simple.
-
First, select an organism or
sequencing center. Second, list libraries or download EST sequences
of the selected organism or sequencing center.
|
|
2. Searching for human libraries |
|
 |
|
< Figure 1 > |
|
- Select anatomical terms and then list libraries or download sequences by clicking on the button.
- Select pathological terms and then list libraries or download sequences by clicking on the button.
- List libraries or download sequences of the intersection of Anatomy and Pathology. Here, the user should select both eVOC ontologies.
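The third option amounts to a set intersection: only libraries matched by both the selected Anatomy term and the selected Pathology term are listed. A minimal sketch, with hypothetical library names and a hypothetical helper `intersect_libraries`:

```python
def intersect_libraries(anatomy_hits, pathology_hits):
    """Return the libraries present in both result lists, sorted by name."""
    return sorted(set(anatomy_hits) & set(pathology_hits))


if __name__ == "__main__":
    # Hypothetical search results for one Anatomy and one Pathology term.
    anatomy_hits = ["lib_a", "lib_b", "lib_c"]
    pathology_hits = ["lib_b", "lib_c", "lib_d"]
    print(intersect_libraries(anatomy_hits, pathology_hits))
```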
|
|
3. Library list in the search |
 |
|
< Figure 2 > |
|
- Serial numbers.
- The user can sort the result list by clicking on the column titles (organism, library name, organ, tissue).
- The user can download sequences by clicking on the number of raw or pre-cleansed sequences. For pre-cleansed sequences, the user can also download the cleansing information by clicking on 'info'.
|
|
4. Obtaining 'user-cleansed' EST sequences |
|
- To provide user-cleansed sequences, CleanEST uses an automatic
user-cleansing pipeline, in which sequences in a user-selected
library are cleansed on-the-fly according to user-input options.
|
|
1. Click on 'User-cleansed' in Figure 2 to open the popup window below.
|
 |
|
< Figure 3 > |
|
2. After submitting, the user sees the result window below (Figure 4) and can download the 'user-cleansed' ESTs and
their cleansing information.
|
 |
|
< Figure 4 > |