RNA finder
I. Introduction
COME is designed to calculate COding potential from Multiple fEatures for transcripts.
COME accepts a GTF-format file as input and predicts whether each input transcript is coding or non-coding. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes COME more accurate and robust than other well-known tools for transcripts of different lengths and assembly qualities. First, COME composes the feature matrix for the given transcripts from the pre-calculated feature vectors. Second, COME predicts the coding potential with the pre-trained models, using the feature matrix generated in the first step. COME is currently pre-trained for five model species: human (hg19), mouse (mm10), fly (dm3), worm (ce10) and plant (TAIR10).
II. Input file
The input GTF file should meet the following requirements:
1) It follows the GTF format as described by UCSC.
2) Chromosome names are in lowercase, abbreviated form (e.g. chr1, chrX), except for the worm genome, which uses Roman numerals: chrI, chrII, chrIII, chrIV, chrX, chrY.
3) Only exon is allowed in the third column.
4) Only + or - is allowed in the seventh column.
5) gene_id and transcript_id must be provided for every transcript in the GTF file.
6) Transcripts must be longer than 50 nucleotides.
7) Lines that do not meet these requirements are skipped.
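The requirements above can be checked before submitting a file. The following is a minimal sketch, not COME's own code: the function names parse_exon_line and usable_transcripts are hypothetical, the chromosome check is simplified to a "chr" prefix test, and transcript length is assumed to be the summed length of a transcript's exons.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "g1"; transcript_id "t1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_exon_line(line):
    """Return (chrom, start, end, strand, transcript_id) for a usable exon
    line, or None if the line would be skipped under the rules above."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:  # a GTF record has nine tab-separated columns
        return None
    chrom, _source, feature, start, end, _score, strand, _frame, attrs = fields
    if not chrom.startswith("chr"):  # simplified check (assumption)
        return None
    if feature != "exon" or strand not in ("+", "-"):
        return None
    if not (start.isdigit() and end.isdigit()):
        return None
    start, end = int(start), int(end)
    if end < start:
        return None
    attr = dict(ATTR_RE.findall(attrs))
    if not attr.get("gene_id") or not attr.get("transcript_id"):
        return None
    return chrom, start, end, strand, attr["transcript_id"]


def usable_transcripts(lines, min_length=50):
    """Map transcript_id -> summed exon length, keeping only transcripts
    longer than min_length nucleotides."""
    lengths = defaultdict(int)
    for line in lines:
        record = parse_exon_line(line)
        if record is None:
            continue  # lines that break a rule are skipped
        _chrom, start, end, _strand, transcript_id = record
        lengths[transcript_id] += end - start + 1  # GTF is 1-based, inclusive
    return {tid: n for tid, n in lengths.items() if n > min_length}
```

Passing an open file handle, e.g. usable_transcripts(open("input.gtf")), where input.gtf is a placeholder name, returns the transcripts that pass these checks together with their lengths.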
III. Calculation Time
Transcript length    Transcript number    Running time (seconds)
[0.2k, 0.5k)         1000                 63 ± 5
[0.5k, 1.0k)         1000                 88 ± 27
[1.0k, 1.5k)         1000                 81 ± 31
[1.5k, 2.0k)         1000                 68 ± 10
random sampled       10                   38 ± 13
random sampled       100                  72 ± 18
random sampled       1000                 76 ± 8