1) How to classify dbEST libraries
  • EST libraries are downloaded from dbEST at the NCBI GenBank FTP site, converted into FASTA-formatted sequences, and divided according to library names.
  • All libraries in dbEST are classified by organism and sequencing center. dbEST contains a large number of human libraries, more than for any other species. These human libraries were generated from various sources, and the terminology used to describe the library sources differs significantly.
  • Human libraries are further categorized by a set of structured, controlled terms from eVOC. By mapping the 'organ' or 'tissue' names of each library to the anatomical and pathological terms of eVOC, we assigned the human libraries to the Anatomy ontology and the Pathology ontology of eVOC. The remaining unmapped libraries were considered 'unclassifiable' in both ontologies.
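The mapping step described above can be sketched roughly as follows. This is only an illustration, not the CleanEST implementation: the lookup tables, the library names (LIB_A, LIB_B), and the simple substring matching are all assumptions, and the real eVOC term lists are far larger.

```python
# Hypothetical keyword-to-term tables standing in for the eVOC ontologies.
ANATOMY_TERMS = {"liver": "liver", "brain": "brain", "kidney": "kidney"}
PATHOLOGY_TERMS = {"normal": "normal", "carcinoma": "carcinoma", "tumor": "tumor"}


def classify_library(description, term_map):
    """Return the first controlled term whose keyword occurs in the description.

    Libraries that match no keyword are labelled 'unclassifiable'.
    """
    text = description.lower()
    for keyword, term in term_map.items():
        if keyword in text:
            return term
    return "unclassifiable"


# Hypothetical library descriptions (organ/tissue fields of a dbEST library).
libraries = {
    "LIB_A": "Liver, hepatocellular carcinoma",
    "LIB_B": "pooled tissue",
}

for name, description in libraries.items():
    anatomy = classify_library(description, ANATOMY_TERMS)
    pathology = classify_library(description, PATHOLOGY_TERMS)
    print(name, anatomy, pathology)
```

Each ontology is handled independently, so a library can be classified in one ontology and remain 'unclassifiable' in the other.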
2) How to cleanse sequences in dbEST
  • CleanEST provides two kinds of cleansed sequences: 'pre-cleansed' and 'user-cleansed'.
  1. pre-cleansed ESTs
  • We obtained sequences of major contamination databases: the UniVec database (for vector/linker), the Escherichia coli full genome sequence (for cloning host), and the RefSeq mitochondrial genome sequences (for organelle).
  • EST sequences are compared against these three databases, and contaminated regions are masked. This is performed using the Cross_match program (with minmatch = 20 and minscore = 20).
  • Masked EST sequences are either trimmed or discarded using our Perl trimming script. If a masked region begins within 100 bases of the 5' or 3' end, it is trimmed. EST sequences with internally located masked regions are discarded.
  • After pre-cleansing, EST sequences shorter than 100 bases are discarded.
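The trim-or-discard rules above can be sketched in Python as follows. This is not the Perl script used by CleanEST; it is a minimal sketch that assumes masked bases are written as 'X', and it interprets "within 100 bases of the 5' or 3' end" as: a masked run that starts fewer than 100 bases from the 5' end, or ends fewer than 100 bases from the 3' end. The function and constant names are hypothetical.

```python
import re

END_WINDOW = 100  # a masked run this close to either end is trimmed (assumed reading)
MIN_LENGTH = 100  # cleansed ESTs shorter than this are discarded


def trim_masked(seq, window=END_WINDOW, min_length=MIN_LENGTH):
    """Return the trimmed sequence, or None if the EST should be discarded."""
    left, right = 0, len(seq)
    for run in re.finditer(r"X+", seq):
        if run.start() < window:
            # Masked run near the 5' end: drop everything up to its end.
            left = max(left, run.end())
        elif len(seq) - run.end() < window:
            # Masked run near the 3' end: drop everything from its start.
            right = min(right, run.start())
        else:
            # Internally located masked region: discard the whole EST.
            return None
    trimmed = seq[left:right]
    return trimmed if len(trimmed) >= min_length else None


print(trim_masked("X" * 30 + "A" * 200) == "A" * 200)    # 5' vector trimmed
print(trim_masked("A" * 150 + "X" * 20 + "A" * 150))     # internal mask, discarded
print(trim_masked("X" * 30 + "A" * 50))                  # too short after trimming
```

Keeping the window and the minimum length as parameters makes it easy to try other thresholds; the defaults simply mirror the values stated above.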
  2. user-cleansed ESTs
  • CleanEST provides an automatic user-cleansing pipeline, in which sequences in a user-selected library are cleansed on-the-fly according to user-input options.
  • This pipeline consists of highly reliable open-source tools and public databases. In the interface of the pipeline, users can select parameters of the Cross_match program and contamination sources. After user-cleansing, users can download the cleansed sequences.
3) How to use CleanEST
1. Searching for organism and sequencing center
  • Searching for organism and sequencing center is simple.
  • First, select an organism or sequencing center. Second, list libraries or download EST sequences of the selected organism or sequencing center.
2. Searching for human libraries
< Figure 1 >                         

  1. Select anatomical terms and then list libraries or download sequences by clicking on the button.
  2. Select pathological terms and then list libraries or download sequences by clicking on the button.
  3. List libraries or download sequences of the intersection of Anatomy and Pathology. Here, the user should select both eVOC ontologies.
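The intersection search in item 3 amounts to a set intersection over the libraries assigned to each term. The sketch below is illustrative only; the index dictionaries, term names, and library names are hypothetical and do not reflect how CleanEST stores its data.

```python
# Hypothetical indexes: eVOC term -> set of library names assigned to it.
anatomy_index = {"liver": {"LIB_A", "LIB_B", "LIB_C"}}
pathology_index = {"carcinoma": {"LIB_B", "LIB_C", "LIB_D"}}


def libraries_in_both(anatomy_term, pathology_term):
    """Return libraries assigned to both the anatomy and the pathology term."""
    anatomy_hits = anatomy_index.get(anatomy_term, set())
    pathology_hits = pathology_index.get(pathology_term, set())
    return sorted(anatomy_hits & pathology_hits)


print(libraries_in_both("liver", "carcinoma"))
```

A term missing from either index yields an empty result, which matches the requirement that both ontologies be selected.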
3. Library list in the search

< Figure 2 >                         

  1. Serial numbers.
  2. The user can sort the result list by clicking on the column titles (organism, library name, organ, tissue).
  3. The user can download sequences by clicking on the number of raw or pre-cleansed sequences. For pre-cleansed sequences, the user can also download cleansing information by clicking on 'info'.
4. Obtaining 'user-cleansed' EST sequences
  • To provide user-cleansed sequences, CleanEST uses an automatic user-cleansing pipeline, in which sequences in a user-selected library are cleansed on-the-fly according to user-input options.

   1. Click on 'User-cleansed' in Figure 2 to open the popup window below.

< Figure 3 >                         

   2. After submitting, the user can see the result window below (Fig. 4) and download the 'user-cleansed' ESTs and
       their cleansing information.

< Figure 4 >                         
   
 
Copyright © 2007 by Korean BioInformation Center (KOBIC). All rights reserved.
Korean BioInformation Center, Korea Research Institute of Bioscience and Biotechnology,
52, Oun-dong, Yusong-gu, Taejon, 305-806, Korea.
TEL. +82-42-860-8511, FAX. +82-42-879-8519,   E-mail :