Value Object Descriptor Specification
This draft specification, under development by Driver Projects of the GKS Work Stream, specifies standard data classes for the exchange of common information useful for the description of variation but superfluous to the salient elements necessary for specifying a value object. We describe these classes as Value Object Descriptors (VODs). The VOD specification introduced here is version-controlled and extensible, and envisioned to seed a larger collection of VODs used with other GA4GH standards beyond VRSATILE.
Use of VODs provides a convenience mechanism for passing labels and other descriptive information with a value object. As an example, this means that a value object representing a genomic variant in VRS can be conveniently passed alongside human identifiers (e.g. ClinVar IDs), expressions (e.g. HGVS descriptions), and important contexts (e.g. sequence type, gene, transcript) in a standard format. This additional structure is necessary due to the nature of value objects and VRS.
The GA4GH Variation Representation Specification (VRS) is a terminology, information model, and schema for the computational representation of variation. VRS also describes useful conventions for the normalization of variation forms for message passing between systems. Objects compliant with VRS are value objects: data objects that are compared by structure and value, in contrast to entities which are compared by registered identifiers. For example, the variants represented by the NM_004415.2:c.8472_8483del and LRG_423t1:c.8472_8483del HGVS descriptions are not found equivalent by comparing these strings, but by comparing the structure of the reference sequence and indicated change underlying the descriptors. Conversely, the meaning of the variant (a specific deletion on a specific residue sequence) is clear without reference to any external naming authority (in this example, the NM and LRG sequence identifiers), and in fact the referenced entities can only be retrieved through lookup on a sequence registry instead of through inspection of the variant itself.
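The distinction between comparing names and comparing values can be sketched in a few lines of Python. This is only an illustration: the `DeletionValue` class, the digest string, and the coordinates are hypothetical placeholders and are not part of VRS or of this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeletionValue:
    """Hypothetical value object: identity comes from content, not from a name."""
    sequence_digest: str  # stand-in for a digest computed from the reference residues
    start: int
    end: int


# The two HGVS descriptions from the text; as strings they differ.
hgvs_nm = "NM_004415.2:c.8472_8483del"
hgvs_lrg = "LRG_423t1:c.8472_8483del"

# Assumption for illustration: both accessions resolve to the same residues,
# so both descriptions yield the same digest and coordinates (placeholder values).
value_from_nm = DeletionValue("placeholder-digest", 100, 112)
value_from_lrg = DeletionValue("placeholder-digest", 100, 112)

strings_match = hgvs_nm == hgvs_lrg             # compared as registered names
values_match = value_from_nm == value_from_lrg  # compared by structure and value
```

Here `strings_match` is false while `values_match` is true, which mirrors why the names themselves are carried separately in a descriptor rather than inside the value object.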
Value Object Descriptor
Computational Definition
The abstract Value Object Descriptor parent class. All attributes of this parent class are inherited by descendent classes.
Information Model
Field | Type | Limits | Description |
---|---|---|---|
id | | 1..1 | Descriptor ID; MUST be unique within document. |
type | string | 1..1 | MUST be VOD class name. |
label | string | 0..1 | A primary label for the value object. |
description | string | 0..1 | A free-text description of the value object. |
xrefs | | 0..m | List of CURIEs representing associated concepts. |
alternate_labels | string | 0..m | List of strings representing alternate labels for the value object. |
extensions | | 0..m | List of resource-specific Extensions needed to describe the value object. |
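As a rough sketch, the parent class might be modeled in Python as a dataclass. The Python types below are assumptions (for example, `id` and `xrefs` entries are treated as plain strings, and extensions as dictionaries), and `duplicate_ids` is a hypothetical helper for the "unique within document" rule rather than anything defined by this specification.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ValueObjectDescriptor:
    """Sketch of the abstract parent class; field names follow the table above."""
    id: str                                   # 1..1, assumed to be a string identifier
    type: str                                 # 1..1, the VOD class name
    label: Optional[str] = None               # 0..1
    description: Optional[str] = None         # 0..1
    xrefs: List[str] = field(default_factory=list)             # 0..m
    alternate_labels: List[str] = field(default_factory=list)  # 0..m
    extensions: List[Dict[str, Any]] = field(default_factory=list)  # 0..m


def duplicate_ids(descriptors: List[ValueObjectDescriptor]) -> List[str]:
    """Return descriptor IDs that occur more than once in one document."""
    seen, dupes = set(), set()
    for descriptor in descriptors:
        if descriptor.id in seen:
            dupes.add(descriptor.id)
        seen.add(descriptor.id)
    return sorted(dupes)
```

A document-level check can then reject a message whenever `duplicate_ids` returns a non-empty list.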
Variation Descriptor
Computational Definition
This descriptor class is used for describing VRS Variation value objects.
Information Model
Some VariationDescriptor attributes are inherited from Value Object Descriptor.
Field | Type | Limits | Description |
---|---|---|---|
id | | 1..1 | Descriptor ID; MUST be unique within document. |
type | string | 1..1 | MUST be “VariationDescriptor”. |
label | string | 0..1 | A primary label for the value object. |
description | string | 0..1 | A free-text description of the value object. |
xrefs | | 0..m | List of CURIEs representing associated concepts. |
alternate_labels | string | 0..m | List of strings representing alternate labels for the value object. |
extensions | | 0..m | List of resource-specific Extensions needed to describe the value object. |
variation_id | | 0..1 | This SHOULD be provided if variation is omitted. |
variation | | 0..1 | This SHOULD be provided if variation_id is omitted. |
molecule_context | string | 0..1 | The molecular context of this variant. MUST be one of “genomic”, “transcript”, or “protein”. |
structural_type | | 0..1 | The structural variant type associated with this variant. We RECOMMEND a descendent term of SO:0001537. |
expressions | | 0..1 | Typically HGVS or ISCN nomenclature expressions. Other systems relevant to the description of variation MAY be used. |
vcf_record | | 0..1 | A VCF Record of the variant. This SHOULD be a single allele; the VCF genotype (GT) field should be represented in the allelic_state attribute. |
gene_context | | 0..1 | A specific gene context that applies to this variant. |
vrs_ref_allele_seq | | 0..1 | A VRS Sequence corresponding to a “ref allele”, describing the sequence expected at a VRS SequenceLocation reference. |
allelic_state | | 0..1 | We RECOMMEND that the allelic_state of a variation be described by terms from the Genotype Ontology (GENO). These SHOULD descend from concept GENO:0000875 <http://purl.obolibrary.org/obo/GENO_0000875>. |
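A few of the constraints in this table lend themselves to a simple check. The sketch below assumes descriptors are exchanged as JSON-like dictionaries; `check_variation_descriptor` and the `ex:` identifiers are hypothetical and only illustrate how MUST rules could be reported as errors and SHOULD rules as warnings.

```python
ALLOWED_MOLECULE_CONTEXTS = {"genomic", "transcript", "protein"}


def check_variation_descriptor(vd: dict) -> dict:
    """Collect MUST violations as errors and SHOULD violations as warnings."""
    errors, warnings = [], []
    if "id" not in vd:
        errors.append("id is required")
    if vd.get("type") != "VariationDescriptor":
        errors.append('type MUST be "VariationDescriptor"')
    context = vd.get("molecule_context")
    if context is not None and context not in ALLOWED_MOLECULE_CONTEXTS:
        errors.append("molecule_context MUST be genomic, transcript, or protein")
    if "variation_id" not in vd and "variation" not in vd:
        warnings.append("variation_id or variation SHOULD be provided")
    return {"errors": errors, "warnings": warnings}


# Illustrative descriptor; the identifiers are placeholders, not registered IDs.
example_descriptor = {
    "id": "ex:descriptor-1",
    "type": "VariationDescriptor",
    "label": "NM_004415.2:c.8472_8483del",
    "molecule_context": "transcript",
    "variation_id": "ex:variation-1",
}

report = check_variation_descriptor(example_descriptor)
```

Keeping errors and warnings separate lets a receiving system reject only messages that break a MUST rule while still flagging descriptors that omit both `variation_id` and `variation`.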
Location Descriptor
This descriptor is intended to reference VRS Location value objects. In addition to the attributes inherited from its Value Object Descriptor parent class, the Location Descriptor has the following attributes:
Field | Type | Limits | Description |
---|---|---|---|
type | string | 1..1 | MUST be “LocationDescriptor” |
location_id | | 0..1 | This MUST be provided if location is omitted |
location | | 0..1 | This MUST be provided if location_id is omitted |
Sequence Descriptor
This descriptor is intended to reference VRS Sequence value objects. In addition to the attributes inherited from its Value Object Descriptor parent class, the Sequence Descriptor has the following attributes:
Field | Type | Limits | Description |
---|---|---|---|
type | string | 1..1 | MUST be “SequenceDescriptor” |
sequence_id | | 0..1 | This MUST be provided if sequence is omitted |
sequence | | 0..1 | This MUST be provided if sequence_id is omitted |
residue_type | | 0..1 | CURIE MUST be SO:0000348 (nucleic acid), SO:0001407 (peptidyl), or a descendent of one of these concepts. |
Gene Descriptor
This descriptor is intended to reference VRS Gene value objects. In addition to the attributes inherited from its Value Object Descriptor parent class, the Gene Descriptor has the following attributes:
Field | Type | Limits | Description |
---|---|---|---|
type | string | 1..1 | MUST be “GeneDescriptor” |
gene_id | | 0..1 | This MUST be provided if gene is omitted |
gene | | 0..1 | This MUST be provided if gene_id is omitted |
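The Location, Sequence, and Gene Descriptors share one rule: either the identifier or the value object itself MUST be present. A minimal sketch of that rule, assuming dictionary-shaped descriptors, is shown here; `REFERENCE_FIELDS`, `has_required_reference`, and the `ex:` identifier are hypothetical names used only for illustration.

```python
# Maps each descriptor type to its (identifier field, inline value field) pair.
REFERENCE_FIELDS = {
    "LocationDescriptor": ("location_id", "location"),
    "SequenceDescriptor": ("sequence_id", "sequence"),
    "GeneDescriptor": ("gene_id", "gene"),
}


def has_required_reference(descriptor: dict) -> bool:
    """True when at least one of the paired fields is present and not None."""
    id_field, value_field = REFERENCE_FIELDS[descriptor["type"]]
    return (
        descriptor.get(id_field) is not None
        or descriptor.get(value_field) is not None
    )


# Placeholder descriptors: the first references a gene by ID, the second has neither field.
gene_by_id = {"id": "ex:gd-1", "type": "GeneDescriptor", "gene_id": "ex:gene-1"}
gene_without_reference = {"id": "ex:gd-2", "type": "GeneDescriptor", "label": "example"}
```

Unlike the Variation Descriptor, where the corresponding rule is a SHOULD, a descriptor that fails this check violates a MUST and would normally be rejected.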
Categorical Variation Descriptor
Computational Definition
This descriptor class is used for describing Categorical Variation value objects.
Information Model
Some CategoricalVariationDescriptor attributes are inherited from Value Object Descriptor.
Field | Type | Limits | Description |
---|---|---|---|
id | | 1..1 | Descriptor ID; MUST be unique within document. |
type | string | 1..1 | MUST be “CategoricalVariationDescriptor”. |
label | string | 0..1 | A primary label for the value object. |
description | string | 0..1 | A free-text description of the value object. |
xrefs | | 0..m | List of CURIEs representing associated concepts. |
alternate_labels | string | 0..m | List of strings representing alternate labels for the value object. |
extensions | | 0..m | List of resource-specific Extensions needed to describe the value object. |
version | string | 0..1 | The version of the Categorical Variation Descriptor. |
categorical_variation_id | | 0..1 | This SHOULD be provided if categorical_variation is omitted. |
categorical_variation | | 0..1 | This SHOULD be provided if categorical_variation_id is omitted. |
members | | 0..m | VariationMember instances that fall within the functional domain of the Categorical Variation. |
Other Data Classes
VCF Record
Computational Definition
This data class is used when it is desirable to pass data as expected from a VCF record. The class is only used as an optional attribute within a Variation Descriptor. The Genotype field from a VCF should be captured by the allelic_state attribute in the Variation Descriptor.
Information Model
Field | Type | Limits | Description |
---|---|---|---|
genome_assembly | string | 1..1 | Identifier for the genome assembly used to call the allele. |
chrom | string | 1..1 | A chromosome or contig identifier. |
pos | string | 1..1 | The reference residue-coordinate position, with the first residue having position 1. |
id | string | 0..1 | A semicolon-separated list of unique identifiers where available. For example, dbSNP rsIDs. We RECOMMEND storing this information as a list in the Variation Descriptor xrefs field. |
ref | string | 1..1 | Reference base as expected by the VCF specification. |
alt | string | 1..1 | Alternate base as expected by the VCF specification. |
qual | string | 0..1 | Quality: Phred-scaled quality score for the assertion made in ALT. |
filter | string | 0..1 | Filter status: PASS if this position has passed all filters. |
info | string | 0..1 | Additional information: Semicolon-separated series of additional information fields. |
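One way to populate this class is from the first eight tab-separated columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The sketch below is an assumption about how an implementer might do this; `from_vcf_line` and `ids_as_list` are hypothetical helpers, the data line is made up, and the assembly identifier is supplied by the caller because a VCF data line does not carry it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VCFRecord:
    """Sketch of the VCF Record class; every field is a string, as in the table."""
    genome_assembly: str
    chrom: str
    pos: str
    ref: str
    alt: str
    id: Optional[str] = None
    qual: Optional[str] = None
    filter: Optional[str] = None
    info: Optional[str] = None


def from_vcf_line(line: str, genome_assembly: str) -> VCFRecord:
    """Build a record from one data line; '.' (VCF missing value) becomes None."""
    chrom, pos, id_, ref, alt, qual, filter_, info = line.rstrip("\n").split("\t")[:8]

    def optional(value: str) -> Optional[str]:
        return None if value == "." else value

    return VCFRecord(genome_assembly, chrom, pos, ref, alt,
                     optional(id_), optional(qual), optional(filter_), optional(info))


def ids_as_list(record: VCFRecord) -> List[str]:
    """Split the semicolon-separated id field so it can be carried as a list."""
    return record.id.split(";") if record.id else []


# Made-up data line for illustration only.
record = from_vcf_line("1\t100\trs1;rs2\tA\tT\t50\tPASS\t.", "GRCh38")
```

The list returned by `ids_as_list` is the form recommended for the Variation Descriptor `xrefs` field; turning each identifier into a CURIE is left out here because the prefix to use is not stated in this section.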
Extension
The Extension class provides VODs with a means to extend descriptions with other attributes unique to a content provider. These extensions are not expected to be natively understood under VRSATILE, but may be used for pre-negotiated exchange of message attributes when needed.
Field | Type | Limits | Description |
---|---|---|---|
type | string | 1..1 | MUST be “Extension” |
name | string | 1..1 | A name for the Extension |
value | any[] | 0..* | Any primitive or structured object |
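Because extensions are only meaningful to parties that have agreed on them in advance, a consumer typically looks them up by name and ignores the rest. A minimal sketch, assuming dictionary-shaped descriptors: `make_extension`, `get_extension`, and the extension name `example_review_status` are hypothetical.

```python
from typing import Any


def make_extension(name: str, value: Any) -> dict:
    """Build an Extension object with the fixed type discriminator."""
    return {"type": "Extension", "name": name, "value": value}


def get_extension(descriptor: dict, name: str, default: Any = None) -> Any:
    """Return the value of the first extension with the given name, if any."""
    for extension in descriptor.get("extensions", []):
        if extension.get("name") == name:
            return extension.get("value")
    return default


# Placeholder descriptor carrying one provider-specific attribute.
descriptor = {
    "id": "ex:descriptor-1",
    "type": "VariationDescriptor",
    "extensions": [make_extension("example_review_status", {"stars": 2})],
}

known = get_extension(descriptor, "example_review_status")
unknown = get_extension(descriptor, "not_negotiated", default="ignored")
```

Returning a default for unknown names keeps a receiver tolerant of extensions it was never told about.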
Expression
The Expression class is designed to enable descriptions based on a specified nomenclature or syntax for representing an object. Common examples of expressions for the description of molecular variation include the HGVS and ISCN nomenclatures.
Field | Type | Limits | Description |
---|---|---|---|
type | string | 1..1 | MUST be “Expression” |
syntax | | 1..1 | CURIE referencing the expression syntax |
value | string | 1..1 | The concept expression as a string |
version | string | 0..1 | An optional version of the expression syntax |
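The required fields of an Expression can be checked in a few lines. This is a sketch under stated assumptions: `expression_problems` is a hypothetical helper, and the `syntax` value `ex:hgvs` is a placeholder CURIE, since this section does not list the syntax identifiers to use.

```python
REQUIRED_EXPRESSION_FIELDS = ("type", "syntax", "value")


def expression_problems(expression: dict) -> list:
    """List the 1..1 fields that are missing, plus a wrong type discriminator."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_EXPRESSION_FIELDS
        if name not in expression
    ]
    if "type" in expression and expression["type"] != "Expression":
        problems.append('type MUST be "Expression"')
    return problems


# Illustrative Expression for an HGVS description; version is optional and omitted.
example_expression = {
    "type": "Expression",
    "syntax": "ex:hgvs",  # placeholder CURIE, not a registered syntax identifier
    "value": "NM_004415.2:c.8472_8483del",
}

problems = expression_problems(example_expression)
```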