VariantMethods

searchVariants(request)
Parameters:request – SearchVariantsRequest: This request maps to the body of POST /variants/search as JSON.
Return type:SearchVariantsResponse
Throws:GAException

Gets a list of Variant matching the search criteria.

POST /variants/search must accept a JSON version of SearchVariantsRequest as the post body and will return a JSON version of SearchVariantsResponse.
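
As a minimal sketch of the transport described above, the following builds (but does not send) a POST request whose body is the JSON form of a SearchVariantsRequest. The base URL, the variant set ID, and the helper name build_search_request are assumptions made for illustration; the source does not name a server or a client library.

```python
import json
import urllib.request

# Hypothetical server root; the source does not name one.
BASE_URL = "https://example.org/ga4gh"


def build_search_request(path, body):
    """Return an unsent POST request carrying `body` as a JSON document."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "/variants/search",
    {
        "variantSetId": "vs-1",  # placeholder ID, not a real variant set
        "referenceName": "chr20",
        "start": 1000,
        "end": 2000,
    },
)
```

Passing the request to urllib.request.urlopen would send it; the response body would then be decoded as a SearchVariantsResponse.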

getCallSet(id)
Parameters:id – string: The ID of the CallSet.
Return type:org.ga4gh.models.CallSet
Throws:GAException

Gets a CallSet by ID. GET /callsets/{id} will return a JSON version of CallSet.

searchVariantSets(request)
Parameters:request – SearchVariantSetsRequest: This request maps to the body of POST /variantsets/search as JSON.
Return type:SearchVariantSetsResponse
Throws:GAException

Gets a list of VariantSet matching the search criteria.

POST /variantsets/search must accept a JSON version of SearchVariantSetsRequest as the post body and will return a JSON version of SearchVariantSetsResponse.

getVariantSet(id)
Parameters:id – string: The ID of the VariantSet.
Return type:org.ga4gh.models.VariantSet
Throws:GAException

Gets a VariantSet by ID. GET /variantsets/{id} will return a JSON version of VariantSet.

getVariant(id)
Parameters:id – string: The ID of the Variant.
Return type:org.ga4gh.models.Variant
Throws:GAException

Gets a Variant by ID. GET /variants/{id} will return a JSON version of Variant.

searchCallSets(request)
Parameters:request – SearchCallSetsRequest: This request maps to the body of POST /callsets/search as JSON.
Return type:SearchCallSetsResponse
Throws:GAException

Gets a list of CallSet matching the search criteria.

POST /callsets/search must accept a JSON version of SearchCallSetsRequest as the post body and will return a JSON version of SearchCallSetsResponse.

error GAException

A general exception type.

enum Strand
Symbols:NEG_STRAND|POS_STRAND

Indicates the DNA strand associated with some data item.

  • NEG_STRAND: The negative (-) strand.
  • POS_STRAND: The positive (+) strand.

record Position
Fields:
  • referenceName (string) – The name of the Reference on which the Position is located.
  • position (long) –
    The 0-based offset from the start of the forward strand for that Reference.
    Genomic positions are non-negative integers less than Reference length.
  • strand (Strand) – Strand the position is associated with.

A Position is an unoriented base in some Reference. A Position is represented by a Reference name, and a base number on that Reference (0-based).

record ExternalIdentifier
Fields:
  • database (string) –
    The source of the identifier.
    (e.g. Ensembl)
  • identifier (string) –
    The ID defined by the external database.
    (e.g. ENST00000000000)
  • version (string) –
    The version of the object or the database.
    (e.g. 78)

An identifier from a public database.

enum CigarOperation
Symbols:ALIGNMENT_MATCH|INSERT|DELETE|SKIP|CLIP_SOFT|CLIP_HARD|PAD|SEQUENCE_MATCH|SEQUENCE_MISMATCH

An enum for the different types of CIGAR alignment operations that exist. Used wherever CIGAR alignments are used. The different enumerated values have the following usage:

  • ALIGNMENT_MATCH: An alignment match indicates that a sequence can be aligned to the reference without evidence of an INDEL. Unlike the SEQUENCE_MATCH and SEQUENCE_MISMATCH operators, the ALIGNMENT_MATCH operator does not indicate whether the reference and read sequences are an exact match. This operator is equivalent to SAM’s ‘M’.
  • INSERT: The insert operator indicates that the read contains evidence of bases being inserted into the reference. This operator is equivalent to SAM’s ‘I’.
  • DELETE: The delete operator indicates that the read contains evidence of bases being deleted from the reference. This operator is equivalent to SAM’s ‘D’.
  • SKIP: The skip operator indicates that this read skips a long segment of the reference, but the bases have not been deleted. This operator is commonly used when working with RNA-seq data, where reads may skip long segments of the reference between exons. This operator is equivalent to SAM’s ‘N’.
  • CLIP_SOFT: The soft clip operator indicates that bases at the start/end of a read have not been considered during alignment. This may occur if the majority of a read maps, except for low quality bases at the start/end of a read. This operator is equivalent to SAM’s ‘S’. Bases that are soft clipped will still be stored in the read.
  • CLIP_HARD: The hard clip operator indicates that bases at the start/end of a read have been omitted from this alignment. This may occur if this linear alignment is part of a chimeric alignment, or if the read has been trimmed (e.g., during error correction, or to trim poly-A tails for RNA-seq). This operator is equivalent to SAM’s ‘H’.
  • PAD: The pad operator indicates that there is padding in an alignment. This operator is equivalent to SAM’s ‘P’.
  • SEQUENCE_MATCH: This operator indicates that this portion of the aligned sequence exactly matches the reference (i.e., all bases are equal to the reference bases). This operator is equivalent to SAM’s ‘=’.
  • SEQUENCE_MISMATCH: This operator indicates that this portion of the aligned sequence is an alignment match to the reference, but a sequence mismatch (i.e., the bases are not equal to the reference). This can indicate a SNP or a read error. This operator is equivalent to SAM’s ‘X’.

record CigarUnit
Fields:
  • operation (CigarOperation) – The operation type.
  • operationLength (long) – The number of bases that the operation runs for.
  • referenceSequence (null|string) –
    referenceSequence is only used at mismatches (SEQUENCE_MISMATCH)
    and deletions (DELETE). Filling this field replaces the MD tag. If the relevant information is not available, leave this field as null.

A structure for an instance of a CIGAR operation. FIXME: This belongs under Reads (only readAlignment refers to this)
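
The correspondence between CigarOperation symbols and SAM characters listed above can be sketched as a lookup table, here used to render a list of CigarUnit records as a SAM-style CIGAR string. Representing each CigarUnit as a plain dict, the helper name to_cigar_string, and the sample values are assumptions made for illustration.

```python
# SAM characters as listed for each CigarOperation symbol.
SAM_CODES = {
    "ALIGNMENT_MATCH": "M",
    "INSERT": "I",
    "DELETE": "D",
    "SKIP": "N",
    "CLIP_SOFT": "S",
    "CLIP_HARD": "H",
    "PAD": "P",
    "SEQUENCE_MATCH": "=",
    "SEQUENCE_MISMATCH": "X",
}


def to_cigar_string(units):
    """Render CigarUnit-like dicts as a SAM CIGAR string."""
    return "".join(
        f"{unit['operationLength']}{SAM_CODES[unit['operation']]}"
        for unit in units
    )


# referenceSequence is filled only for the DELETE unit, as the field
# description suggests; it stays null (None) elsewhere.
units = [
    {"operation": "CLIP_SOFT", "operationLength": 5, "referenceSequence": None},
    {"operation": "ALIGNMENT_MATCH", "operationLength": 90, "referenceSequence": None},
    {"operation": "DELETE", "operationLength": 2, "referenceSequence": "AC"},
    {"operation": "ALIGNMENT_MATCH", "operationLength": 10, "referenceSequence": None},
]
cigar = to_cigar_string(units)
```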

record VariantSetMetadata
Fields:
  • key (string) – The top-level key.
  • value (string) – The value field for simple metadata.
  • id (string) –
    User-provided ID field, not enforced by this API.
    Two or more pieces of structured metadata with identical id and key fields are considered equivalent. FIXME: If it’s not enforced, then why can’t it be null?
  • type (string) – The type of data.
  • number (string) –
    The number of values that can be included in a field described by this
    metadata.
  • description (string) – A textual description of this metadata.
  • info (map<array<string>>) – Remaining structured metadata key-value pairs.

Optional metadata associated with a variant set.
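
The equivalence rule on the id field (identical id and key means equivalent) can be sketched as a grouping by the (id, key) pair. The dict representation, the helper name group_equivalent, and the VCF-header-like sample values are assumptions made for illustration, not part of the API.

```python
from collections import defaultdict


def group_equivalent(metadata):
    """Group VariantSetMetadata-like dicts that share the same id and key."""
    groups = defaultdict(list)
    for entry in metadata:
        groups[(entry["id"], entry["key"])].append(entry)
    return dict(groups)


# Illustrative entries only; the values are made up for this sketch.
metadata = [
    {"key": "INFO", "id": "DP", "type": "Integer", "number": "1",
     "value": "", "description": "Total depth", "info": {}},
    {"key": "INFO", "id": "DP", "type": "Integer", "number": "1",
     "value": "", "description": "Read depth", "info": {}},
    {"key": "FORMAT", "id": "DP", "type": "Integer", "number": "1",
     "value": "", "description": "Sample depth", "info": {}},
]
groups = group_equivalent(metadata)
```

Here the first two entries fall into one group because both id and key match, while the third differs in key and is kept apart.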

record VariantSet
Fields:
  • id (string) – The variant set ID.
  • name (null|string) – The variant set name.
  • datasetId (string) – The ID of the dataset this variant set belongs to.
  • referenceSetId (string) – The ID of the reference set that describes the sequences used by the variants in this set.
  • metadata (array<VariantSetMetadata>) –
    Optional metadata associated with this variant set.
    This array can be used to store information about the variant set, such as information found in VCF header fields, that isn’t already available in first class fields such as “name”.

A VariantSet is a collection of variants and variant calls intended to be analyzed together.

record CallSet
Fields:
  • id (string) – The call set ID.
  • name (null|string) – The call set name.
  • sampleId (null|string) –
    The sample this call set’s data was generated from.
    Note: the current API does not have a rigorous definition of sample. Therefore, this field actually contains an arbitrary string, typically corresponding to the sampleId field in the read groups used to generate this call set.
  • variantSetIds (array<string>) – The IDs of the variant sets this call set has calls in.
  • created (null|long) – The time at which this call set was created, in milliseconds from the epoch.
  • updated (null|long) –
    The time at which this call set was last updated in
    milliseconds from the epoch.
  • info (map<array<string>>) – A map of additional call set information.

A CallSet is a collection of calls that were generated by the same analysis of the same sample.
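
Because created and updated are nullable counts of milliseconds from the epoch, a client will usually convert them before display. A minimal sketch, assuming the epoch is the Unix epoch in UTC (the source does not say so explicitly) and using a hypothetical helper name:

```python
from datetime import datetime, timezone


def from_epoch_millis(value):
    """Convert a nullable millisecond timestamp to an aware UTC datetime."""
    if value is None:
        return None
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


created = from_epoch_millis(1420070400000)
missing = from_epoch_millis(None)
```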

record Call
Fields:
  • callSetName (null|string) –
    The name of the call set this variant call belongs to.
    If this field is not present, the ordering of the call sets from a SearchCallSetsRequest over this VariantSet is guaranteed to match the ordering of the calls on this Variant. The number of results will also be the same.
  • callSetId (null|string) –

    The ID of the call set this variant call belongs to.

    If this field is not present, the ordering of the call sets from a SearchCallSetsRequest over this VariantSet is guaranteed to match the ordering of the calls on this Variant. The number of results will also be the same.
  • genotype (array<int>) –

    The genotype of this variant call.

    A 0 value represents the reference allele of the associated Variant. Any other value is a 1-based index into the alternate alleles of the associated Variant.

    If a variant had a referenceBases field of “T”, an alternateBases value of [“A”, “C”], and the genotype was [2, 1], that would mean the call represented the heterozygous value “CA” for this variant. If the genotype was instead [0, 1] the represented value would be “TA”. Ordering of the genotype values is important if the phaseset field is present.

  • phaseset (null|string) –
    If this field is not null, this variant call’s genotype ordering implies
    the phase of the bases and is consistent with any other variant calls on the same contig which have the same phaseset string.
  • genotypeLikelihood (array<double>) –
    The genotype likelihoods for this variant call. Each array entry
    represents how likely a specific genotype is for this call as log10(P(data | genotype)), analogous to the GL tag in the VCF spec. The value ordering is defined by the GL tag in the VCF spec.
  • info (map<array<string>>) – A map of additional variant call information.

A Call represents the determination of genotype with respect to a particular Variant.

It may include associated information such as quality and phasing. For example, a call might assign a probability of 0.32 to the occurrence of a SNP named rs1234 in a call set with the name NA12345.
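
The genotype encoding described for the genotype field can be sketched as a lookup into the reference allele followed by the alternate alleles. The second helper gives the position of an unphased diploid genotype in the genotypeLikelihood array, following the GL ordering of the VCF specification; it covers the diploid case only. Both function names are hypothetical.

```python
def genotype_alleles(reference_bases, alternate_bases, genotype):
    """Map genotype indices to bases: 0 is the reference allele,
    any other value is a 1-based index into the alternate alleles."""
    alleles = [reference_bases] + list(alternate_bases)
    return [alleles[index] for index in genotype]


def gl_index(genotype):
    """Index of an unphased diploid genotype j/k (j <= k) in the VCF GL
    ordering, which is k * (k + 1) / 2 + j."""
    j, k = sorted(genotype)
    return k * (k + 1) // 2 + j


first = genotype_alleles("T", ["A", "C"], [2, 1])
second = genotype_alleles("T", ["A", "C"], [0, 1])
```

With the values from the field description, first is the call "CA" and second is "TA", each as a list of alleles.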

record Variant
Fields:
  • id (string) – The variant ID.
  • variantSetId (string) –
    The ID of the VariantSet this variant belongs to. This transitively defines
    the ReferenceSet against which the Variant is to be interpreted.
  • names (array<string>) – Names for the variant, for example a RefSNP ID.
  • created (null|long) – The time at which this variant was created, in milliseconds from the epoch.
  • updated (null|long) –
    The time at which this variant was last updated in
    milliseconds from the epoch.
  • referenceName (string) –
    The reference on which this variant occurs.
    (e.g. chr20 or X)
  • start (long) –
    The start position at which this variant occurs (0-based).
    This corresponds to the first base of the string of reference bases. Genomic positions are non-negative integers less than reference length. Variants spanning the join of circular genomes are represented as two variants, one on each side of the join (position 0).
  • end (long) –
    The end position (exclusive), resulting in [start, end) closed-open interval.
    This is typically calculated by start + referenceBases.length.
  • referenceBases (string) – The reference bases for this variant. They start at the given start position.
  • alternateBases (array<string>) –
    The bases that appear instead of the reference bases. Multiple alternate
    alleles are possible.
  • info (map<array<string>>) – A map of additional variant information.
  • calls (array<Call>) –
    The variant calls for this particular variant. Each one represents the
    determination of genotype with respect to this variant. Calls in this array are implicitly associated with this Variant.

A Variant represents a change in DNA sequence relative to some reference. For example, a variant could represent a SNP or an insertion. Variants belong to a VariantSet. This is equivalent to a row in VCF.
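
The start and end fields form a 0-based, closed-open interval. A minimal sketch of the typical end calculation, using a plain dict in place of a Variant record; the helper name and the sample values are assumptions made for illustration:

```python
def variant_end(start, reference_bases):
    """Exclusive end of the [start, end) interval, computed in the
    typical way as start + referenceBases.length."""
    return start + len(reference_bases)


variant = {
    "referenceName": "chr20",
    "start": 100,
    "referenceBases": "ACG",
    "alternateBases": ["A"],
}
variant["end"] = variant_end(variant["start"], variant["referenceBases"])

# 0-based reference positions covered by the variant; end itself is excluded.
covered = list(range(variant["start"], variant["end"]))
```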

record SearchVariantSetsRequest
Fields:
  • datasetId (string) – The Dataset to search.
  • pageSize (null|int) –
    Specifies the maximum number of results to return in a single page.
    If unspecified, a system default will be used.
  • pageToken (null|string) –
    The continuation token, which is used to page through large result sets.
    To get the next page of results, set this parameter to the value of nextPageToken from the previous response.

This request maps to the body of POST /variantsets/search as JSON.

record SearchVariantSetsResponse
Fields:
  • variantSets (array<org.ga4gh.models.VariantSet>) – The list of matching variant sets.
  • nextPageToken (null|string) –
    The continuation token, which is used to page through large result sets.
    Provide this value in a subsequent request to return the next page of results. This field will be empty if there aren’t any additional results.

This is the response from POST /variantsets/search expressed as JSON.

record SearchVariantsRequest
Fields:
  • variantSetId (string) – The VariantSet to search.
  • callSetIds (null|array<string>) –
    Only return variant calls which belong to call sets with these IDs.
    If an empty array, returns variants without any call objects. If null, returns all variant calls.
  • referenceName (string) – Required. Only return variants on this reference.
  • start (long) –
    Required. The beginning of the window (0-based, inclusive) for
    which overlapping variants should be returned. Genomic positions are non-negative integers less than reference length. Requests spanning the join of circular genomes are represented as two requests, one on each side of the join (position 0).
  • end (long) –
    Required. The end of the window (0-based, exclusive) for which overlapping
    variants should be returned.
  • pageSize (null|int) –
    Specifies the maximum number of results to return in a single page.
    If unspecified, a system default will be used.
  • pageToken (null|string) –
    The continuation token, which is used to page through large result sets.
    To get the next page of results, set this parameter to the value of nextPageToken from the previous response.

This request maps to the body of POST /variants/search as JSON.
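
The request semantics can be illustrated with an in-memory sketch of how a server might apply them. It assumes that "overlapping" means the usual intersection test between two closed-open intervals, which the source does not spell out, and it uses plain dicts and hypothetical function names rather than any real server code.

```python
def overlaps(variant, start, end):
    """True if the variant's [start, end) interval intersects the window."""
    return variant["start"] < end and start < variant["end"]


def select_calls(calls, call_set_ids):
    """Apply callSetIds: null returns all calls, an empty array returns
    none, otherwise only calls from the listed call sets are kept."""
    if call_set_ids is None:
        return list(calls)
    wanted = set(call_set_ids)
    return [call for call in calls if call["callSetId"] in wanted]


def search(variants, request):
    """Filter variants by reference name and window, then trim their calls."""
    hits = []
    for variant in variants:
        if variant["referenceName"] != request["referenceName"]:
            continue
        if not overlaps(variant, request["start"], request["end"]):
            continue
        hit = dict(variant)
        hit["calls"] = select_calls(variant["calls"], request.get("callSetIds"))
        hits.append(hit)
    return hits


variants = [
    {"id": "v1", "referenceName": "chr20", "start": 100, "end": 103,
     "calls": [{"callSetId": "cs1"}, {"callSetId": "cs2"}]},
    {"id": "v2", "referenceName": "chr20", "start": 500, "end": 501,
     "calls": [{"callSetId": "cs1"}]},
]
request = {"variantSetId": "vs-1", "referenceName": "chr20",
           "start": 102, "end": 500, "callSetIds": ["cs2"]}
results = search(variants, request)
```

The variant starting at 500 is excluded because the window end is exclusive.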

record SearchVariantsResponse
Fields:
  • variants (array<org.ga4gh.models.Variant>) –
    The list of matching variants.
    If the callSetId field on the returned calls is not present, the ordering of the call sets from a SearchCallSetsRequest over the parent VariantSet is guaranteed to match the ordering of the calls on each Variant. The number of results will also be the same.
  • nextPageToken (null|string) –
    The continuation token, which is used to page through large result sets.
    Provide this value in a subsequent request to return the next page of results. This field will be empty if there aren’t any additional results.

This is the response from POST /variants/search expressed as JSON.
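
Paging with pageToken and nextPageToken can be sketched as a loop that repeats the request until the response carries no further token. The post callable is a hypothetical stand-in for whatever HTTP client is used; it takes a path and a request body and returns the decoded JSON response. The fake_post function below exists only to exercise the loop.

```python
def iter_variants(post, request):
    """Yield every variant across all pages of POST /variants/search."""
    body = dict(request)
    while True:
        response = post("/variants/search", body)
        yield from response["variants"]
        token = response.get("nextPageToken")
        if not token:  # null or empty: no additional results
            break
        body["pageToken"] = token


def fake_post(path, body):
    """Stand-in server with two pages, keyed by the incoming pageToken."""
    pages = {None: (["v1", "v2"], "t1"), "t1": (["v3"], None)}
    ids, token = pages[body.get("pageToken")]
    return {"variants": [{"id": i} for i in ids], "nextPageToken": token}


request = {"variantSetId": "vs-1", "referenceName": "chr20",
           "start": 0, "end": 1000, "pageSize": 2}
ids = [variant["id"] for variant in iter_variants(fake_post, request)]
```

The same loop shape applies to the other search methods, with the path and the result field name changed.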

record SearchCallSetsRequest
Fields:
  • variantSetId (string) – The VariantSet to search.
  • name (null|string) – Only return call sets with this name (case-sensitive, exact match).
  • pageSize (null|int) –
    Specifies the maximum number of results to return in a single page.
    If unspecified, a system default will be used.
  • pageToken (null|string) –
    The continuation token, which is used to page through large result sets.
    To get the next page of results, set this parameter to the value of nextPageToken from the previous response.

This request maps to the body of POST /callsets/search as JSON.

record SearchCallSetsResponse
Fields:
  • callSets (array<org.ga4gh.models.CallSet>) – The list of matching call sets.
  • nextPageToken (null|string) –
    The continuation token, which is used to page through large result sets.
    Provide this value in a subsequent request to return the next page of results. This field will be empty if there aren’t any additional results.

This is the response from POST /callsets/search expressed as JSON.