Reliable orthology prediction is central to comparative genomics and to the annotation of newly sequenced genomes. Because orthology and paralogy are both evolutionary concepts, phylogeny-based strategies are expected to provide the most accurate predictions. However, given the high computational cost of phylogenetic analyses, most automated orthology prediction methods rely on faster but less accurate pairwise sequence comparisons. Only recently, thanks to faster computers and better algorithms, has phylogeny-based orthology prediction become feasible at genomic scale.


Several projects have recently addressed the reconstruction of large collections of high-quality phylogenetic trees from which orthology can be inferred. This gives us the opportunity to infer the evolutionary relationship between two genes from multiple independent phylogenetic trees and to use the consistency across predictions as a measure of the reliability of an orthology assignment. Using phylogenetic trees available in the PhylomeDB, Ensembl, TreeFam and Fungal Orthogroups databases, together with those reconstructed for EggNOG, OrthoMCL and COG, we predict orthology and paralogy relationships for over XXX million proteins in XXXX fully sequenced genomes and provide a reliability score for each prediction, based on the number of independent trees and the consistency across predictions.
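As an illustration only, the idea of scoring a gene pair by agreement across independent trees might be sketched as follows. The exact scoring formula is not given here, so this sketch assumes a simple majority vote: the score is the fraction of trees that agree with the most common call, reported together with the number of trees. The names `TreeCall` and `consistency_score` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeCall:
    """One prediction for a gene pair, derived from a single tree."""
    source: str    # tree collection the call came from, e.g. "TreeFam"
    relation: str  # "orthology" or "paralogy"


def consistency_score(calls):
    """Return (majority relation, fraction of trees agreeing, number of trees).

    Assumption: every tree counts equally. Ties are resolved in favour of
    the relation that appears first in the input.
    """
    if not calls:
        return None, 0.0, 0
    counts = Counter(call.relation for call in calls)
    relation, votes = counts.most_common(1)[0]
    return relation, votes / len(calls), len(calls)


# Hypothetical calls for one gene pair found in four independent trees.
example_calls = [
    TreeCall("PhylomeDB", "orthology"),
    TreeCall("Ensembl", "orthology"),
    TreeCall("TreeFam", "orthology"),
    TreeCall("EggNOG", "paralogy"),
]
relation, score, n_trees = consistency_score(example_calls)
print(relation, score, n_trees)  # orthology 0.75 4
```

Reporting the number of trees next to the agreement fraction keeps a unanimous call from a single tree distinguishable from a unanimous call supported by many trees.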