sORFs.org: repository of small ORFs identified by ribosome profiling

This section provides a summary of the acquisition and processing of the data. For more detailed information, consult the available literature. Use the navigation bar on the left to move through the different steps.

4. sORF assembly, (non) splice-aware

The sORF assembly screens for in-frame stop codons starting from all identified TIS positions. Either no annotation is taken into account (non splice-aware), or all splice sites and their corresponding rearranged mRNA transcripts are mapped (splice-aware). In addition to the genomic coordinates, the mass of the resulting peptide, and the DNA/AA sequences, a number of other characteristics are calculated to enable downstream evaluation of the identified sORF sequence: annotation (based on TIS location in 5'UTR, exonic, intronic, 3'UTR, ncRNA or intergenic regions), percentage of overlap with an annotated exon region, nearest gene (for intergenic sORFs), and so on. For sORFs with multiple possible Ensembl annotations (e.g. protein-coding and lincRNA), an annotation rank list was constructed and the sORF is assigned the highest-ranked annotation.
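The non splice-aware screen described above can be illustrated with a minimal sketch. It is not the sORFs.org implementation: the function name `assemble_sorf`, the 0-based coordinates, and the restriction to a forward-strand DNA string are assumptions made for illustration only.

```python
# Minimal sketch of a non splice-aware scan (illustrative, not the sORFs.org code).
# Assumptions: forward strand only, 0-based positions, plain DNA string as input.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_sorf(sequence, tis_pos):
    """Walk codon by codon from the TIS until the first in-frame stop codon.

    Returns (start, end, dna) with a half-open [start, end) interval that
    includes the stop codon, or None if no in-frame stop codon is found.
    """
    seq = sequence.upper()
    # Step by 3 so that only codons in the frame set by the TIS are inspected.
    for i in range(tis_pos, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return tis_pos, i + 3, seq[tis_pos:i + 3]
    return None


# The out-of-frame "TGA" inside "ATGA" is skipped; the in-frame "TAG" ends the sORF.
result = assemble_sorf("CCATGAAATTTTAGGG", 2)
```

A splice-aware variant would run the same scan on the spliced transcript sequence and map the resulting positions back to genomic coordinates.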

Annotation rank in descending order: 'protein_coding', 'nonsense_mediated_decay', 'non_stop_decay', 'lincRNA', 'antisense', 'sense_intronic', 'sense_overlapping', '3prime_overlapping_ncrna', 'macro_lncRNA', 'processed_transcript', 'retained_intron', 'processed_pseudogene', 'unprocessed_pseudogene', 'transcribed_unprocessed_pseudogene', 'transcribed_processed_pseudogene', 'unitary_pseudogene', 'polymorphic_pseudogene', 'pseudogene', 'transcribed_unitary_pseudogene', 'translated_unprocessed_pseudogene', 'TEC', 'NA', 'nohit'.
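Selecting the highest-ranked annotation from this list can be sketched as follows. The helper name `best_annotation` is hypothetical, and placing labels that are absent from the list after all listed ones is an assumption, not documented sORFs.org behaviour.

```python
# Rank list as given above, highest rank first.
ANNOTATION_RANK = [
    "protein_coding", "nonsense_mediated_decay", "non_stop_decay", "lincRNA",
    "antisense", "sense_intronic", "sense_overlapping",
    "3prime_overlapping_ncrna", "macro_lncRNA", "processed_transcript",
    "retained_intron", "processed_pseudogene", "unprocessed_pseudogene",
    "transcribed_unprocessed_pseudogene", "transcribed_processed_pseudogene",
    "unitary_pseudogene", "polymorphic_pseudogene", "pseudogene",
    "transcribed_unitary_pseudogene", "translated_unprocessed_pseudogene",
    "TEC", "NA", "nohit",
]
RANK_INDEX = {name: i for i, name in enumerate(ANNOTATION_RANK)}


def best_annotation(candidates):
    """Return the candidate annotation that appears earliest in the rank list.

    Assumption: labels missing from the list are ranked after every listed one.
    """
    return min(candidates, key=lambda a: RANK_INDEX.get(a, len(ANNOTATION_RANK)))


# A sORF overlapping both a lincRNA and a protein-coding transcript.
chosen = best_annotation(["lincRNA", "protein_coding"])
```

Storing the rank as a name-to-index dictionary keeps each lookup constant-time, so the selection stays cheap even when it is applied to every assembled sORF.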


Classification of the different sORFs by the assembly