RNAfeature

I. Introduction

COME is designed to calculate COding potential from Multiple fEatures for transcripts.
COME accepts a GTF-format file as input and predicts each input transcript as either coding or non-coding. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes COME's predictions more accurate and robust than those of other well-known tools for transcripts of different lengths and assembly qualities. First, COME composes the feature matrix for the given transcripts from the pre-calculated feature vectors. Second, COME predicts the coding potential with the pre-trained models, using the feature matrix generated in the first step. COME is currently pre-trained for five model species: human (hg19), mouse (mm10), fly (dm3), worm (ce10) and plant (TAIR10).

II. Input file

The input GTF file must meet the following requirements:
1) It must follow the GTF format as described by UCSC.
2) Chromosome names must be lowercase and abbreviated (e.g. chr1, chrX), except for the worm genome, which uses Roman numerals: chrI, chrII, chrIII, chrIV, chrX, chrY.
3) Only "exon" is allowed in the third column.
4) Only + or - is allowed in the seventh column.
5) gene_id and transcript_id must be provided for every transcript in the GTF file.
6) Each transcript must be longer than 50 nucleotides.
7) Lines that do not meet these requirements will be skipped.
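
The rules above can be checked locally before uploading a file. The following is a minimal sketch in Python, not part of COME itself: the function names are hypothetical, transcript length is assumed to be the sum of the exon lengths of a transcript, and rule 2 (chromosome naming) is not checked because it depends on the species.

```python
import re

ALLOWED_STRANDS = {"+", "-"}   # rule 4
MIN_LENGTH = 50                # rule 6: transcripts must be longer than this


def parse_exon_line(line):
    """Return (transcript_id, exon_length) if the line passes rules 3-5, else None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:                      # GTF has nine tab-separated columns
        return None
    feature, start, end, strand, attrs = (
        fields[2], fields[3], fields[4], fields[6], fields[8]
    )
    if feature != "exon" or strand not in ALLOWED_STRANDS:
        return None
    gene = re.search(r'gene_id "([^"]+)"', attrs)
    transcript = re.search(r'transcript_id "([^"]+)"', attrs)
    if not gene or not transcript:            # rule 5
        return None
    try:
        length = int(end) - int(start) + 1    # GTF coordinates are 1-based, inclusive
    except ValueError:
        return None
    return transcript.group(1), length


def usable_transcripts(lines):
    """Map transcript_id to total exon length, keeping only transcripts above MIN_LENGTH."""
    totals = {}
    for line in lines:
        parsed = parse_exon_line(line)
        if parsed is None:                    # rule 7: other lines are skipped
            continue
        transcript_id, length = parsed
        totals[transcript_id] = totals.get(transcript_id, 0) + length
    return {tid: n for tid, n in totals.items() if n > MIN_LENGTH}
```

Calling usable_transcripts on the lines of a GTF file gives a quick preview of which transcripts would survive these checks; the server's own filtering may differ in detail.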

III. Calculation Time

Transcript length	Number of transcripts	Running time (seconds)
[0.2k, 0.5k)	1000	63 ± 5
[0.5k, 1.0k)	1000	88 ± 27
[1.0k, 1.5k)	1000	81 ± 31
[1.5k, 2.0k)	1000	68 ± 10
randomly sampled	10	38 ± 13
randomly sampled	100	72 ± 18
randomly sampled	1000	76 ± 8
