references

message Reference
Fields:
  • id (string) – The reference ID. Unique within the repository.
  • length (long) – The length of this reference’s sequence.
  • md5checksum (string) – The MD5 checksum uniquely representing this Reference as a lower-case hexadecimal string, calculated as the MD5 of the upper-case sequence excluding all whitespace characters (this is equivalent to SQ:M5 in SAM).
  • name (string) – The unique name of this reference within the Reference Set (e.g. ‘22’).
  • source_uri (string) – The URI from which the sequence was obtained. Specifies a FASTA format file/string containing a single (name, sequence) pair. In most cases, clients should call the getReferenceBases() method to obtain sequence bases for a Reference instead of attempting to retrieve this URI.
  • source_accessions (string) – All known corresponding accession IDs in INSDC (GenBank/ENA/DDBJ), each of which must include a version number, e.g. GCF_000001405.26.
  • is_derived (boolean) – A sequence X is said to be derived from source sequence Y, if X and Y are of the same length and the per-base sequence divergence at A/C/G/T bases is sufficiently small. Two sequences derived from the same official sequence share the same coordinates and annotations, and can be replaced with the official sequence for certain use cases.
  • source_divergence (float) – The fraction of non-indel bases that do not match the reference this message was derived from.
  • species (OntologyTerm) – For a representation of an NCBI Taxon ID as an OntologyTerm, see the NCBITaxon Ontology (http://www.obofoundry.org/ontology/ncbitaxon.html). For example, ‘Homo sapiens’ has the ID 9606. The NCBITaxon ontology ID for this is NCBITaxon:9606, which has the URI http://purl.obolibrary.org/obo/NCBITaxon_9606
  • attributes (Attributes) – A map of additional information.
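
As an illustration of the source_divergence field, the sketch below computes a per-base mismatch fraction between a derived sequence and its source. The function name source_divergence is hypothetical, and the sketch makes two assumptions that the field description does not spell out: the sequences are compared position by position (derived sequences have the same length as their source), and only positions where both bases are A/C/G/T are counted.

```python
def source_divergence(derived: str, source: str) -> float:
    """Fraction of compared bases that differ between two equal-length sequences.

    Assumption: only positions where both sequences hold A, C, G or T
    are compared; other symbols (e.g. N) are skipped.
    """
    if len(derived) != len(source):
        raise ValueError("a derived sequence must have the same length as its source")

    compared = 0
    mismatched = 0
    for d, s in zip(derived.upper(), source.upper()):
        if d in "ACGT" and s in "ACGT":
            compared += 1
            if d != s:
                mismatched += 1

    # With nothing to compare, report zero divergence rather than dividing by zero.
    return mismatched / compared if compared else 0.0
```

For instance, source_divergence("ACGT", "ACGA") compares four positions and finds one mismatch.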

A Reference is a canonical assembled contig, intended to act as a reference coordinate space for other genomic annotations. A single Reference might represent the human chromosome 1, for instance.

References are designed to be immutable.
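
The md5checksum rule for a Reference (MD5 of the upper-case sequence with all whitespace removed, written as lower-case hexadecimal) can be sketched with Python's standard hashlib module. The function name reference_md5 is hypothetical, and the sketch assumes the sequence is plain ASCII text.

```python
import hashlib


def reference_md5(sequence: str) -> str:
    """Lower-case hex MD5 of the upper-cased sequence without whitespace."""
    # str.split() with no arguments splits on any run of whitespace,
    # so re-joining the pieces removes spaces, tabs and newlines.
    normalized = "".join(sequence.split()).upper()
    # hexdigest() already returns lower-case hexadecimal digits.
    return hashlib.md5(normalized.encode("ascii")).hexdigest()
```

Because of the normalization step, a FASTA body with line breaks and lower-case (soft-masked) bases yields the same checksum as the equivalent single-line upper-case string.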

message ReferenceSet
Fields:
  • id (string) – The reference set ID. Unique within the repository.
  • name (string) – The reference set name.
  • md5checksum (string) –

    Order-independent MD5 checksum which identifies this ReferenceSet.

    To compute this checksum, make a list of Reference.md5checksum for all References in this set. Then sort that list, and take the MD5 hash of all the strings concatenated together. Express the hash as a lower-case hexadecimal string.

  • species (OntologyTerm) – For a representation of an NCBI Taxon ID as an OntologyTerm, see the NCBITaxon Ontology (http://www.obofoundry.org/ontology/ncbitaxon.html). For example, ‘Homo sapiens’ has the ID 9606. The NCBITaxon ontology ID for this is NCBITaxon:9606, which has the URI http://purl.obolibrary.org/obo/NCBITaxon_9606
  • description (string) – Optional free text description of this reference set.
  • assembly_id (string) – Public ID of this reference set, such as GRCh37. Together with source_uri, source_accessions and is_derived, this field describes the source of the sequences.
  • source_uri (string) – Specifies a FASTA format file/string.
  • source_accessions (string) – All known corresponding accession IDs in INSDC (GenBank/ENA/DDBJ) ideally with a version number, e.g. NC_000001.11.
  • is_derived (boolean) – A reference set may be derived from a source if it contains additional sequences, or some of the sequences within it are derived (see the definition of is_derived in Reference).
  • attributes (Attributes) – A map of additional information.

A ReferenceSet is a set of References which typically comprise a reference assembly, such as GRCh38. A ReferenceSet defines a common coordinate space for comparing reference-aligned experimental data.
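
The order-independent md5checksum of a ReferenceSet can be sketched as follows: sort the member checksums, concatenate them, and hash the result. The function name reference_set_md5 is hypothetical, and the sketch assumes the sorted strings are joined with no separator, which is one reading of "concatenated together" in the field description.

```python
import hashlib


def reference_set_md5(reference_md5s):
    """Lower-case hex MD5 over the sorted, concatenated member checksums."""
    # Sorting first makes the result independent of the input order.
    # Assumption: the checksums are joined without any separator.
    joined = "".join(sorted(reference_md5s))
    return hashlib.md5(joined.encode("ascii")).hexdigest()
```

Passing the same checksums in a different order gives the same result, which is what lets two repositories agree on a ReferenceSet identity regardless of how they list its members.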