Genome annotation is the process of identifying genomic elements in a genome sequence. The most commonly annotated genomic elements are gene models, but the annotation process can also involve the identification of transposable elements, regulatory regions and three-dimensional structures associated with the DNA.
Genome annotation is divided into two main parts:
- Structural annotation: the identification of gene models and transposable elements. Structural annotation can be based on evidence, such as transcriptomic data or protein sequences, or on ab initio models. There are two major methodologies to develop ab initio models: hidden Markov models (HMMs; e.g., used by the tool Augustus) and convolutional neural networks (CNNs; e.g., used by the tool Helixer).
- Functional annotation: the assignment of functions to the annotated elements through sequence homology searches against known proteins. The identification of regulatory regions can also be considered part of functional annotation, although it is not included in standard pipelines.
This research line has two main goals:
- Development of tools to perform extensive quality control (QC) of genome annotations. A good example is the tool GAQET2.
- Benchmarking of different genome annotation tools. So far we have annotated more than 500 species, some of them with more than 10 different tools and approaches.
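
As a rough illustration of the kind of basic structural statistics that an annotation comparison might start from, the following Python sketch counts genes, transcripts and exons per transcript in GFF3-formatted text. It is a minimal sketch under stated assumptions, not a description of how GAQET2 or any of the benchmarked tools work: the function name `gff3_summary` and the example records are hypothetical, and the input is assumed to be well-formed GFF3 with `ID` and `Parent` attributes.

```python
from collections import defaultdict


def gff3_summary(gff3_text):
    """Return simple structural statistics for GFF3-formatted text.

    Hypothetical helper for illustration only; assumes transcripts are
    typed 'mRNA' or 'transcript' and exons point to them via 'Parent'.
    """
    genes = 0
    transcripts = set()
    exons_per_transcript = defaultdict(int)

    for line in gff3_text.splitlines():
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a valid GFF3 feature line
        feature_type = cols[2]
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if feature_type == "gene":
            genes += 1
        elif feature_type in ("mRNA", "transcript"):
            if "ID" in attrs:
                transcripts.add(attrs["ID"])
        elif feature_type == "exon":
            # An exon may be shared by several transcripts.
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    exons_per_transcript[parent] += 1

    n_transcripts = len(transcripts)
    total_exons = sum(exons_per_transcript[t] for t in transcripts)
    return {
        "genes": genes,
        "transcripts": n_transcripts,
        "mean_exons_per_transcript": (
            total_exons / n_transcripts if n_transcripts else 0.0
        ),
        "single_exon_transcripts": sum(
            1 for t in transcripts if exons_per_transcript[t] == 1
        ),
    }


# Made-up example records: one two-exon gene and one single-exon gene.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\tdemo\texon\t100\t300\t.\t+\t.\tID=e1;Parent=tx1",
    "chr1\tdemo\texon\t500\t900\t.\t+\t.\tID=e2;Parent=tx1",
    "chr1\tdemo\tgene\t2000\t2500\t.\t-\t.\tID=gene2",
    "chr1\tdemo\tmRNA\t2000\t2500\t.\t-\t.\tID=tx2;Parent=gene2",
    "chr1\tdemo\texon\t2000\t2500\t.\t-\t.\tID=e3;Parent=tx2",
])

print(gff3_summary(EXAMPLE_GFF3))
```

Running the same function on the outputs of several annotation tools for one genome would give a first, coarse point of comparison; a real QC workflow would add measures such as completeness and agreement with external evidence.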

Figure 1: Example of a GAQET plot used to assess the different annotations of Nicotiana benthamiana.
