The VariantAnnotation record groups different types of annotation records by Variant.

The TranscriptEffect sub-record holds information on the effect of a specific allele on a specific transcript. Because a Variant may overlap multiple transcripts, it may have multiple TranscriptEffect records. A Variant with multiple alternate alleles has one TranscriptEffect record per alternate allele per transcript; for example, 2 alternate alleles x 3 transcripts = 6 TranscriptEffect records.
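
The allele-by-transcript arithmetic can be sketched in Python as follows. This is a minimal illustration, not part of the schema: the function name and the allele and transcript identifiers are hypothetical.

```python
from itertools import product


def expected_transcript_effect_keys(alternate_alleles, transcript_ids):
    """Return one (allele, transcript) pair per expected TranscriptEffect record."""
    return list(product(alternate_alleles, transcript_ids))


# Hypothetical variant: 2 alternate alleles overlapping 3 transcripts.
pairs = expected_transcript_effect_keys(["A", "T"], ["tx1", "tx2", "tx3"])
print(len(pairs))  # 2 x 3 = 6
```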

VariantAnnotation records belong to VariantAnnotationSets. A VariantAnnotationSet is created by comparing Variants from a VariantSet to a specific set of reference data using specific software tools. A VariantAnnotationSet contains information on the reference data and software versions used in calculating the annotation; it is essential that this information is complete.

message AnalysisResult
  • analysis_id (string) – The ID of the analysis record for this result
  • result (string) – The text-based result for this analysis
  • score (integer) – The numeric score for this analysis

An AnalysisResult record holds the output of a prediction package such as SIFT on a specific allele.

message AlleleLocation
  • start (integer) – Relative start position of the allele in this coordinate system
  • end (integer) – Relative end position of the allele in this coordinate system
  • reference_sequence (string) – Reference sequence in feature (this should be the codon at CDS level)
  • alternate_sequence (string) – Alternate sequence in feature (this should be the codon at CDS level)

An AlleleLocation record holds the location of an allele relative to a non-genomic coordinate system, such as a CDS or protein, and holds the reference and alternate sequence where appropriate.
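
As a rough sketch, the four fields listed above could be mirrored in Python like this. The class is only an illustrative stand-in for the real message type, and the coordinate and codon values are hypothetical; the schema text here does not state whether positions are 0-based or 1-based, so no convention is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleLocation:
    """Illustrative mirror of the AlleleLocation message fields."""
    start: int
    end: int
    reference_sequence: str
    alternate_sequence: str


# Hypothetical values: a single-codon change described at CDS level,
# where the reference and alternate sequences are the full codons.
cds_location = AlleleLocation(
    start=300,
    end=303,
    reference_sequence="GAA",
    alternate_sequence="GTA",
)
print(cds_location)
```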

message VariantAnnotationSet
  • id (string) – The ID of the variant annotation set record
  • variant_set_id (string) – The ID of the variant set to which this annotation set belongs
  • name (string) – The variant annotation set name.
  • analysis (Analysis) – Analysis details. It is essential to supply versions for all software and reference data used.
  • attributes (Attributes) – A map of additional information about the Annotation Set.

A VariantAnnotationSet record groups VariantAnnotation records. It is derived from a VariantSet and holds information describing the software and reference data used in the annotation.

message HGVSAnnotation
  • genomic (string) –
  • transcript (string) –
  • protein (string) –

An HGVSAnnotation record holds Human Genome Variation Society descriptions of the sequence change with respect to genomic, transcript and protein sequences. See: http://

  • Descriptions should be provided at genomic level.
  • Descriptions at transcript level should be provided when the allele lies within a transcript.
  • Descriptions at protein level should be provided when the allele lies within the translated sequence or stop codon.
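
The rules for which description levels to supply can be expressed as a small helper. This is a sketch only: the function and its boolean parameters are hypothetical, and deciding whether an allele lies within a transcript or the translated sequence is assumed to happen elsewhere.

```python
from typing import List


def required_hgvs_levels(in_transcript: bool, in_translated_or_stop: bool) -> List[str]:
    """Return the HGVSAnnotation fields that should be populated for an allele."""
    levels = ["genomic"]  # a genomic description should be provided
    if in_transcript:
        levels.append("transcript")
    if in_translated_or_stop:
        levels.append("protein")
    return levels


# Hypothetical allele inside a transcript but outside the translated sequence.
print(required_hgvs_levels(in_transcript=True, in_translated_or_stop=False))
```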

message TranscriptEffect
  • id (string) – The ID of the transcript effect record
  • feature_id (string) – The ID of the transcript feature the annotation is relative to. TODO: derive unique id from digest of data [location, allele, transcript?]
  • alternate_bases (string) – Alternate allele - a variant may have more than one alternate allele, each of which will have distinct annotation.
  • effects (list of OntologyTerm) – Effect of variant on this feature.
  • hgvs_annotation (HGVSAnnotation) – Human Genome Variation Society variant descriptions.
  • cdna_location (AlleleLocation) – Change relative to cDNA.
  • cds_location (AlleleLocation) – Change relative to coding sequence.
  • protein_location (AlleleLocation) – Change relative to protein.
  • analysis_results (list of AnalysisResult) – Output from prediction packages such as SIFT.
  • attributes (Attributes) – A map of additional information about the Transcript Effect.

A TranscriptEffect record is a set of information describing the effect of an allele on a transcript.

message VariantAnnotation
  • id (string) – The ID of this VariantAnnotation.
  • variant_id (string) – The variant ID.
  • variant_annotation_set_id (string) – The ID of the variant annotation set this record belongs to.
  • created (string) – The time at which this record was created, in ISO 8601 format.
  • transcript_effects (list of TranscriptEffect) – The transcript effect annotation for the alleles of this variant. Each one represents the effect of a single allele on a single transcript.
  • attributes (Attributes) – A map of additional information about the Annotation.

A VariantAnnotation record represents the result of comparing a variant to a set of reference data.
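
To show how the pieces fit together, the sketch below builds a VariantAnnotation as a plain Python dictionary and groups its transcript effects by alternate allele. The dictionary layout simply follows the field names listed above; all IDs and values are hypothetical, the helper function is not part of any API, and only a subset of the TranscriptEffect fields is filled in.

```python
from collections import defaultdict
from datetime import datetime, timezone


def group_effects_by_allele(variant_annotation):
    """Map each alternate allele to the transcript feature IDs it affects."""
    grouped = defaultdict(list)
    for effect in variant_annotation["transcript_effects"]:
        grouped[effect["alternate_bases"]].append(effect["feature_id"])
    return dict(grouped)


annotation = {
    "id": "va-1",
    "variant_id": "variant-1",
    "variant_annotation_set_id": "vas-1",
    # ISO 8601 creation time, as required by the `created` field.
    "created": datetime(2016, 1, 1, tzinfo=timezone.utc).isoformat(),
    "transcript_effects": [
        {"id": "te-1", "feature_id": "tx1", "alternate_bases": "A"},
        {"id": "te-2", "feature_id": "tx2", "alternate_bases": "A"},
        {"id": "te-3", "feature_id": "tx1", "alternate_bases": "T"},
    ],
}

by_allele = group_effects_by_allele(annotation)
print(by_allele)
```

Each entry in `transcript_effects` stays a single allele on a single transcript; the grouping is only a convenience for reading the record.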