genotype_phenotype
This protocol defines the associations between genotype
and phenotype (G2P). Associations can be made as a
result of literature curation, computational modeling,
inference, etc., and modeled and shared using this schema.
Here, we follow the dogma of:
Genotype + Environment = Phenotype
That is, a G2P association holds between a G(enotype), in the context of
some E(nvironment), and the P(henotype) it gives rise to. These
associations carry further evidence, provenance, and attribution.
We leverage the GenomicFeature in the sequenceAnnotation schema here
because it can accommodate any genomic feature, from a single nucleotide variation
(SNV) up through a gene or a complex rearrangement. Each of these can
be modeled as a genomic feature and linked to a phenotype.
Collections of these features can represent a genotype at different levels
of completeness. Therefore, we can represent single allelic variation,
allelic complement, and multiple variants in a genotype that can each or
collectively be associated with a phenotype.
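As a rough illustration of the levels of completeness described above, the following Python sketch treats a genotype as a set of feature IDs. The feature IDs and the `describe_genotype` helper are hypothetical and are not part of the schema.

```python
from typing import FrozenSet

# Hypothetical feature IDs; real IDs would refer to features in the
# sequenceAnnotation schema.
single_allele: FrozenSet[str] = frozenset({"feature:snv-1"})
allelic_complement: FrozenSet[str] = frozenset({"feature:snv-1", "feature:snv-2"})
multi_variant: FrozenSet[str] = frozenset(
    {"feature:snv-1", "feature:snv-2", "feature:del-7"}
)


def describe_genotype(feature_ids: FrozenSet[str]) -> str:
    """Label a genotype by how many features it holds (illustrative only)."""
    if not feature_ids:
        raise ValueError("a genotype needs at least one feature")
    if len(feature_ids) == 1:
        return "single feature"
    return f"collection of {len(feature_ids)} features"
```

Any of these sets, from the single variant to the larger collection, could be the genotype side of an association.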
To enable standardized integration, this schema relies heavily on
OntologyTerms for typing phenotypes, genomic features, and levels
of evidence. Suggested ontologies to leverage include (with browser links):
-
message
PhenotypeAssociationSet
Fields: |
- id (string) – The phenotype association set ID.
- name (string) – The phenotype association set name.
- dataset_id (string) – The ID of the dataset this phenotype association set belongs to.
- info (map<string, ListValue>) – Optional additional information for this phenotype association set.
|
The top-level container for phenotype association data.
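A minimal sketch of how a client might mirror this message in Python follows. The dataclass is hand-written from the field list above rather than generated from the schema, `map<string, ListValue>` is approximated as a dict of lists, and every value is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhenotypeAssociationSet:
    """Hand-written mirror of the documented fields (an assumption, not generated code)."""

    id: str
    name: str
    dataset_id: str
    # map<string, ListValue> approximated as a dict of lists.
    info: Dict[str, List[object]] = field(default_factory=dict)


# Illustrative values only.
example_set = PhenotypeAssociationSet(
    id="pas-1",
    name="example-curated-associations",
    dataset_id="dataset-1",
    info={"curators": ["alice", "bob"]},
)
```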
-
message
EnvironmentalContext
-
The context in which a genotype gives rise to a phenotype.
This is fairly open-ended; as a stub, it is currently a simple ontology term:
for example, a controlled term for a drug, an instance of a
complex environment including temperature and air quality, or
the anatomical environment (gut vs. tissue type vs. whole organism).
-
message
PhenotypeInstance
-
An association to a phenotype and related information.
This record is intended primarily to be used in conjunction with variants, but
the record can also be composed with other kinds of entities, such as diseases.
-
message
Evidence
Fields: |
- evidence_type (OntologyTerm) – The type of evidence. ECO or OBI terms are recommended.
- description (string) – A textual description of the evidence. This is used to complement the
structured description in the evidence_type field.
- info (map<string, ListValue>) – Additional annotation data in key-value pairs.
|
Evidence for the phenotype association.
This is also a stub for further expansion. We should consider moving this into
its own schema.
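The relationship between the structured `evidence_type` and the free-text `description` can be sketched as follows. The `OntologyTerm` field names (`term_id`, `term`) are assumed here for illustration, and the term identifier is a placeholder, not a real ECO or OBI term.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OntologyTerm:
    # Field names are assumptions made for this sketch.
    term_id: str
    term: str


@dataclass
class Evidence:
    evidence_type: OntologyTerm
    description: str = ""
    # map<string, ListValue> approximated as a dict of lists.
    info: Dict[str, List[object]] = field(default_factory=dict)


# Placeholder term: substitute a real ECO or OBI identifier in practice.
evidence = Evidence(
    evidence_type=OntologyTerm(term_id="ECO:EXAMPLE", term="placeholder evidence term"),
    description="Free-text note complementing the structured evidence_type.",
)
```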
-
message
FeaturePhenotypeAssociation
Fields: |
- id (string) – A unique identifier for the association.
- phenotype_association_set_id (string) – The ID of the PhenotypeAssociationSet this FeaturePhenotypeAssociation
belongs to.
- feature_ids (string) – The set of features of the organism that bears the phenotype.
This could be as complete as a full complement of variants,
or as minimal as the confirmed variants that are known to be causal
for the annotated phenotype.
Examples of features could be variations at the nucleotide level,
large rearrangements at the chromosome level, or relevant epigenetic
markers. Relevant genomic feature types are suggested to be
those typed in the Sequence Ontology (SO).
The feature set may contain as few as one item, but must not be null.
- evidence (list of Evidence) – The evidence for this specific instance of association between the
features and the phenotype.
- phenotype (PhenotypeInstance) – The phenotypic component of this association.
- description (string) – A textual description of the association.
- environmental_contexts (list of EnvironmentalContext) – The context in which the phenotype arises.
Multiple contexts can be specified; these are assumed to all hold together.
- info (map<string, ListValue>) – Additional annotation data in key-value pairs.
|
An association between one or more genomic features and a phenotype.
Modeling each association as its own instance allows a feature to be linked
to the same phenotype multiple times, each time with a potentially different
level of confidence, for example as a result of alternative experiments and analyses.
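To illustrate how one feature can take part in several associations, the sketch below indexes associations by feature ID. The class is a simplified, hand-written stand-in for the message: `phenotype`, `evidence`, and `environmental_contexts` are reduced to plain strings, all IDs are hypothetical, and rejecting an empty feature list is an assumption layered on the documented "must not be null" rule.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class FeaturePhenotypeAssociation:
    id: str
    phenotype_association_set_id: str
    feature_ids: List[str]
    phenotype: str  # stand-in for a PhenotypeInstance
    evidence: List[str] = field(default_factory=list)  # stand-in for Evidence
    environmental_contexts: List[str] = field(default_factory=list)
    description: str = ""

    def __post_init__(self) -> None:
        # Assumption: treat an empty list like a null feature set and reject it.
        if not self.feature_ids:
            raise ValueError("feature_ids must contain at least one feature")


def associations_by_feature(
    associations: Iterable[FeaturePhenotypeAssociation],
) -> Dict[str, List[FeaturePhenotypeAssociation]]:
    """Group associations under every feature ID they mention."""
    index: Dict[str, List[FeaturePhenotypeAssociation]] = defaultdict(list)
    for assoc in associations:
        for feature_id in assoc.feature_ids:
            index[feature_id].append(assoc)
    return dict(index)


# Two associations for the same phenotype, backed by different evidence.
associations = [
    FeaturePhenotypeAssociation(
        "assoc-1", "pas-1", ["feature:snv-1"], "phenotype:example",
        evidence=["literature curation"],
    ),
    FeaturePhenotypeAssociation(
        "assoc-2", "pas-1", ["feature:snv-1", "feature:snv-2"], "phenotype:example",
        evidence=["computational inference"],
        environmental_contexts=["drug:example"],
    ),
]
index = associations_by_feature(associations)
```

Here `index["feature:snv-1"]` holds both associations, so a consumer can compare the evidence behind each before deciding how much confidence to place in the link.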