See Variants schema for a detailed reference.
Variants Data Model
The Variants data model, although based on the VCF format, allows for more versatile interaction with the data. Rather than sending whole VCF files, the server can send information on specific variants or genomic regions. Likewise, rather than returning the whole genotype matrix, it can return details for just one or more specified individuals.
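To make the contrast with whole-file transfer concrete, here is a minimal Python sketch of the two kinds of narrowing described above: restricting a genotype matrix to a genomic region and to specific individuals. The matrix contents, the individual names (`ind1` to `ind3`) and the `query` function are made up for illustration and are not part of the GA4GH API. Coordinates are assumed to be 0-based and half-open.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Site:
    reference_name: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive

# Hypothetical genotype matrix: one row per site, one column per individual,
# like the sample columns of a VCF file.
SITES = [Site("chr1", 100, 101), Site("chr1", 250, 251), Site("chr2", 40, 41)]
INDIVIDUALS = ["ind1", "ind2", "ind3"]
MATRIX = [
    ["0/1", "0/0", "1/1"],
    ["1/1", "0/1", "0/0"],
    ["0/0", "0/0", "0/1"],
]

def query(reference_name: str, start: int, end: int,
          individuals: List[str]) -> List[Tuple[int, List[str]]]:
    """Return only the sites overlapping [start, end) on one reference,
    and only the columns for the requested individuals."""
    cols = [INDIVIDUALS.index(name) for name in individuals]
    result = []
    for site, row in zip(SITES, MATRIX):
        overlaps = site.start < end and site.end > start
        if site.reference_name == reference_name and overlaps:
            result.append((site.start, [row[c] for c in cols]))
    return result

print(query("chr1", 0, 200, ["ind2"]))  # [(100, ['0/0'])]
```

The client receives one small slice instead of every row and column of the file.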
The API uses four main entities to represent variants. The following diagram illustrates how these entities relate to each other to constitute the genotype matrix.
The lowest-level entity is a Call.
The other entities can be thought of as collections of Calls that have something in common:
- a variant description: a potential difference between experimental DNA and a reference sequence, including the site (position of the difference) and alleles (how the bases differ)
- variant observations: a collection of Calls describing evidence for actual instances of that difference, as seen in analyses of experimental data
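The two parts listed above can be sketched as a single Python class: the site and alleles form the variant description, and a list of Calls forms the variant observations. The class and field names below are loosely modeled on the Variants schema but are simplified assumptions, not its exact definitions. Genotypes follow the VCF convention of allele indexes (0 for the reference allele, 1 for the first alternate allele, and so on), with negative values treated here as no-calls.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Call:
    call_set_id: str      # which CallSet (one analysis of one sample) produced it
    genotype: List[int]   # allele indexes: 0 = reference, 1 = first alternate, ...

@dataclass
class Variant:
    # Variant description: the site and the alleles.
    reference_name: str
    start: int
    end: int
    reference_bases: str
    alternate_bases: List[str]
    # Variant observations: Calls seen at this site with these alleles.
    calls: List[Call] = field(default_factory=list)

    def allele(self, index: int) -> str:
        """Translate a genotype allele index into bases."""
        if index < 0:
            raise ValueError("negative index is treated as a no-call")
        return self.reference_bases if index == 0 else self.alternate_bases[index - 1]

    def carriers(self) -> List[str]:
        """CallSet IDs whose genotype includes at least one alternate allele."""
        return [c.call_set_id for c in self.calls if any(i > 0 for i in c.genotype)]

v = Variant("chr1", 100, 101, "A", ["G", "T"],
            calls=[Call("cs1", [0, 1]), Call("cs2", [0, 0]), Call("cs3", [2, 2])])
print(v.carriers())                                 # ['cs1', 'cs3']
print([v.allele(i) for i in v.calls[2].genotype])   # ['T', 'T']
```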
a CallSet supports working with the subset of Calls in a VariantSet that were generated by the same analysis of the same sample. The CallSet includes information about which sample was analyzed and how it was analyzed, and is linked to information about what differences were found.
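Because each Call belongs to exactly one Variant and one CallSet, the Calls in a VariantSet can be regrouped by CallSet to recover the columns of the genotype matrix. The following Python sketch shows that regrouping; the classes, IDs and the `calls_by_call_set` helper are hypothetical simplifications, not the schema's definitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Call:
    call_set_id: str
    genotype: Tuple[int, ...]

@dataclass
class Variant:
    id: str
    calls: List[Call]

def calls_by_call_set(variants: List[Variant]) -> Dict[str, Dict[str, Tuple[int, ...]]]:
    """Group every Call in a VariantSet by the CallSet that produced it.
    Each group maps variant ID -> genotype, i.e. one matrix column."""
    columns: Dict[str, Dict[str, Tuple[int, ...]]] = defaultdict(dict)
    for variant in variants:
        for call in variant.calls:
            columns[call.call_set_id][variant.id] = call.genotype
    return dict(columns)

variant_set = [
    Variant("v1", [Call("cs1", (0, 1)), Call("cs2", (0, 0))]),
    Variant("v2", [Call("cs1", (1, 1))]),
]
print(calls_by_call_set(variant_set))
# {'cs1': {'v1': (0, 1), 'v2': (1, 1)}, 'cs2': {'v1': (0, 0)}}
```

Reading the same Calls by Variant gives the matrix rows; reading them by CallSet gives the columns.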
The following diagram shows the relationship of these four entities to each other and to other GA4GH API entities. It shows which entities contain other entities (such as ), and which contain IDs that can be used to get information from other entities (such as ’s arrow points from the entity that contains the ID to the entity that can be identified by that ID.
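The distinction between containment and ID references can be sketched in Python. Purely for illustration, this assumes that a Variant embeds its Calls, while a Call refers to its CallSet, and a Variant to its VariantSet, only by ID. The `Store` class is a hypothetical stand-in for "get by ID" requests, and all names and IDs are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class VariantSet:
    id: str
    dataset_id: str

@dataclass
class CallSet:
    id: str
    name: str
    variant_set_ids: List[str]   # ID references, not embedded objects

@dataclass
class Call:
    call_set_id: str             # ID reference to a CallSet
    genotype: List[int]

@dataclass
class Variant:
    id: str
    variant_set_id: str          # ID reference to a VariantSet
    calls: List[Call]            # containment: Calls are embedded in the Variant

class Store:
    """Hypothetical lookup tables standing in for 'get by ID' requests."""
    def __init__(self, variant_sets: List[VariantSet], call_sets: List[CallSet]):
        self.variant_sets: Dict[str, VariantSet] = {vs.id: vs for vs in variant_sets}
        self.call_sets: Dict[str, CallSet] = {cs.id: cs for cs in call_sets}

    def call_set_for(self, call: Call) -> CallSet:
        # Follow the arrow from the entity holding the ID to the entity it identifies.
        return self.call_sets[call.call_set_id]

    def variant_set_for(self, variant: Variant) -> VariantSet:
        return self.variant_sets[variant.variant_set_id]

store = Store([VariantSet("vs1", "ds1")], [CallSet("cs1", "sampleA-run1", ["vs1"])])
v = Variant("v1", "vs1", [Call("cs1", [0, 1])])
print(store.call_set_for(v.calls[0]).name)   # sampleA-run1
print(store.variant_set_for(v).dataset_id)   # ds1
```

Embedding keeps closely related data together in one response, while ID references let large or shared entities be fetched separately and only when needed.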
FIXME: remove the Sample object from the graphic; that object isn’t (yet) defined in the API.