RNA finder

Help for COME Server

I. Introduction

II. Input file

III. Calculation Time


I. Introduction

COME is designed to calculate COding potential from Multiple fEatures for transcripts.
COME accepts a GTF-format file as input and predicts whether each input transcript is coding or non-coding. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes COME’s predictions more accurate and robust than those of other well-known tools for transcripts of different lengths and assembly qualities. First, COME composes the feature matrix for the given transcripts from the pre-calculated feature vectors. Second, COME predicts the coding potential with the pre-trained models, using the feature matrix generated in the first step. COME is currently pre-trained for five model species: human (hg19), mouse (mm10), fly (dm3), worm (ce10) and plant (TAIR10).

II. Input file

The input GTF file should meet the following requirements:
1) It follows the UCSC GTF file format description.
2) Chromosome names use the lowercase, abbreviated form (e.g. chr1, chrX), except for the worm genome, which uses Roman numerals: chrI, chrII, chrIII, chrIV, chrX, chrY.
3) Only exon is allowed in the third column.
4) Only + or - is allowed in the seventh column.
5) gene_id and transcript_id must be provided for every transcript in the GTF file.
6) Each transcript must be longer than 50 nucleotides.
7) Lines that do not meet these requirements will be skipped.
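
The requirements above can be checked locally before uploading a file. The following is only a minimal sketch, not the server's own code: the function names (parse_exon_line, usable_transcripts) are hypothetical, the chromosome check only tests for the "chr" prefix, and transcript length is assumed to be the sum of the exon lengths of a transcript.

```python
import re
from collections import defaultdict

# Matches attribute entries such as: gene_id "G1"; transcript_id "T1";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')

MIN_LENGTH = 50  # rule 6: a transcript must be longer than 50 nucleotides


def parse_exon_line(line):
    """Return (transcript_id, exon_length) if the line passes rules 2-5, else None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:               # a GTF record has nine tab-separated columns
        return None
    chrom, _source, feature, start, end, _score, strand, _frame, attrs = fields
    if not chrom.startswith("chr"):    # rule 2 (prefix check only, an assumption)
        return None
    if feature != "exon":              # rule 3
        return None
    if strand not in ("+", "-"):       # rule 4
        return None
    attributes = dict(ATTR_RE.findall(attrs))
    if "gene_id" not in attributes or "transcript_id" not in attributes:  # rule 5
        return None
    try:
        start, end = int(start), int(end)
    except ValueError:
        return None
    if end < start:
        return None
    # GTF coordinates are 1-based and inclusive
    return attributes["transcript_id"], end - start + 1


def usable_transcripts(lines):
    """Sum exon lengths per transcript and keep transcripts longer than MIN_LENGTH."""
    lengths = defaultdict(int)
    for line in lines:
        parsed = parse_exon_line(line)
        if parsed is None:             # rule 7: non-conforming lines are skipped
            continue
        transcript_id, exon_length = parsed
        lengths[transcript_id] += exon_length
    return {tid: n for tid, n in lengths.items() if n > MIN_LENGTH}


example = [
    'chr1\ttest\texon\t100\t180\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttest\tCDS\t100\t180\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttest\texon\t500\t520\t.\t.\t.\tgene_id "G2"; transcript_id "T2";',
    'chr2\ttest\texon\t1\t30\t.\t-\t.\tgene_id "G3"; transcript_id "T3";',
]
print(usable_transcripts(example))  # {'T1': 81}
```

In this made-up example only T1 is kept: the CDS line fails rule 3, the line with strand "." fails rule 4, and T3 is only 30 nucleotides long, so it fails rule 6.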


III. Calculation Time

Transcript length    Transcript number    Running time (seconds)
[0.2k, 0.5k)         1000                 63 ± 5
[0.5k, 1.0k)         1000                 88 ± 27
[1.0k, 1.5k)         1000                 81 ± 31
[1.5k, 2.0k)         1000                 68 ± 10
randomly sampled     10                   38 ± 13
randomly sampled     100                  72 ± 18
randomly sampled     1000                 76 ± 8