Small open reading frames (sORFs) can be defined as open reading frames of at most 300 nucleotides (100 amino acids). Although sORFs are inherent to all genomes, they have historically been ignored in gene annotation studies on the assumption that they lack coding potential. Their exclusion emerged as a side effect of the development of gene prediction tools in bioinformatics, genomics and proteomics, which tried to reduce noise imposed by technological limitations. However, recent studies have discovered coding potential in several sORFs with clinical significance, indicating their importance [1, 2, 4]. In particular, the advent of ribosome profiling [5] (RIBO-seq), a next-generation deep sequencing technique that provides a genome-wide snapshot of the translating machinery in a cell, has provided evidence of translation in sORFs. The value and importance of sORFs are becoming widely recognized [6, 7], and ribosome profiling data are becoming more abundant. A public repository for sORFs, providing information from various tools and metrics, therefore seems necessary to aid functional research in the micropeptide field.
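To make the length-based definition concrete, the following minimal Python sketch scans a DNA sequence for candidate sORFs. It is an illustration only and not the pipeline used by the repository. It makes several assumptions that the text above does not state: ORFs start at ATG, only the forward strand is scanned, and the 300-nucleotide limit is counted from the start codon through the stop codon. The function name `find_sorfs` and its parameters are hypothetical.

```python
# Illustrative sketch only; not the method used to build the repository.
# Assumptions: ATG start codons, forward strand only, and ORF length
# measured from the start codon through the stop codon (inclusive).

STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SORF_NT = 300  # upper bound taken from the definition above


def find_sorfs(seq, max_len=MAX_SORF_NT):
    """Return (start, end, orf) tuples for ORFs no longer than max_len.

    start is 0-based and end is exclusive, so seq[start:end] == orf.
    Every in-frame ATG is reported, so ORFs sharing a stop codon
    appear as separate, nested hits.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                if end - start <= max_len:
                    hits.append((start, end, seq[start:end]))
                break  # ORFs without a stop codon are never reported
    return hits


if __name__ == "__main__":
    for start, end, orf in find_sorfs("ccATGAAATAGgg"):
        print(start, end, orf)
```

A real analysis would also need to handle the reverse strand, alternative start codons and splicing, and would combine such candidates with experimental evidence such as RIBO-seq coverage.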
What does the database hold:
With this in mind, we would like to introduce sORFs.org, a public repository for sORFs. Its main purpose is to allow researchers to examine individual sORFs or to perform searches based on several criteria for further large-scale studies. Data from different sources, both experimental and in silico (based on various bioinformatics tools), are collected. sORFs.org currently holds 367582 sORFs across three species (human, mouse and fruit fly), derived from multiple RIBO-seq experiments, and is expanding as more data become available. Available datasets can be inspected HERE.
Two query interfaces were developed for sORFs.org. The default query interface excels at quick lookups of sORFs but offers limited query possibilities. For example, it is well suited to looking up sORFs containing a specific sequence pattern. A tutorial regarding the default query interface can be found HERE.
For advanced querying and export options, a BioMart query interface is implemented. BioMart allows users to filter, view and export data according to their needs. A tutorial regarding the BioMart query interface can be found HERE.
Relevant data and/or papers can be submitted by completing the form on the submit page, found HERE. Submitted data will be manually curated and added to the repository if relevant. All contributions are highly appreciated and will be credited accordingly.
Suggestions, questions or remarks are welcome and can be sent by completing the contact form located HERE.
If you wish to acknowledge sORFs.org in your publication, please cite:
Volodimir Olexiouk; Jeroen Crappé; Steven Verbruggen; Kenneth Verheggen; Lennart Martens; and Gerben Menschaert
Nucleic Acids Research 2015; doi:10.1093/nar/gkv1175