Value Object Descriptor Specification

This draft specification, under development by Driver Projects of the GKS Work Stream, specifies standard data classes for the exchange of common information useful for the description of variation but superfluous to the salient elements necessary for specifying a value object. We describe these classes as Value Object Descriptors (VODs). The VOD specification introduced here is version-controlled and extensible, and envisioned to seed a larger collection of VODs used with other GA4GH standards beyond VRSATILE.

Use of VODs provides a convenience mechanism for passing labels and other descriptive information alongside a value object. As an example, a value object representing a genomic variant in VRS can be conveniently passed alongside human-readable identifiers (e.g. ClinVar IDs), expressions (e.g. HGVS descriptions), and important contexts (e.g. sequence type, gene, transcript) in a standard format. This additional structure is necessary due to the nature of value objects and VRS.

The GA4GH Variation Representation Specification (VRS) is a terminology, information model, and schema for the computational representation of variation. VRS also describes useful conventions for the normalization of variation forms for message passing between systems. Objects compliant with VRS are value objects: data objects that are compared by structure and value, in contrast to entities, which are compared by registered identifiers. For example, the variants represented by the HGVS descriptions NM_004415.2:c.8472_8483del and LRG_423t1:c.8472_8483del are found equivalent not by comparing these strings, but by comparing the structure of the reference sequence and indicated change underlying the descriptions. Conversely, the meaning of the variant (a specific deletion on a specific residue sequence) is clear without reference to any external naming authority (in this example, the NM and LRG sequence identifiers), and in fact the referenced entities can only be retrieved through lookup on a sequence registry instead of through inspection of the variant itself.
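The distinction between value objects and entities can be illustrated with a minimal Python sketch. The class names, field names, and all values here are hypothetical placeholders chosen for illustration; they are not VRS classes or real sequence data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    """Hypothetical value object: equality is derived from its fields."""
    sequence_digest: str  # placeholder for a digest of the reference sequence
    start: int
    end: int


class RegisteredVariant:
    """Hypothetical entity: equality is decided by a registered identifier."""

    def __init__(self, registry_id: str, deletion: Deletion):
        self.registry_id = registry_id
        self.deletion = deletion

    def __eq__(self, other):
        return (
            isinstance(other, RegisteredVariant)
            and self.registry_id == other.registry_id
        )

    def __hash__(self):
        return hash(self.registry_id)


# Two independently built objects with the same structure and value are equal.
a = Deletion(sequence_digest="placeholder-digest", start=100, end=112)
b = Deletion(sequence_digest="placeholder-digest", start=100, end=112)
same_value = a == b

# The same content registered under two identifiers yields two distinct entities.
x = RegisteredVariant("registry:1", a)
y = RegisteredVariant("registry:2", b)
same_entity = x == y
```

Here `same_value` is true while `same_entity` is false, which is why labels and identifiers have to travel alongside a value object rather than inside it.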

Value Object Descriptor

Computational Definition

The abstract Value Object Descriptor parent class. All attributes of this parent class are inherited by descendent classes.

Variation Descriptor

Computational Definition

This descriptor class is used for describing VRS Variation value objects.

Information Model

Some VariationDescriptor attributes are inherited from Entity.

==================  =======================  ======  ===========
Field               Type                     Limits  Description
==================  =======================  ======  ===========
id                  string                   1..1    The ‘logical’ identifier of the entity in the system of record, e.g. a UUID. This ‘id’ is unique within a given system. The identified entity may have a different ‘id’ in a different system, or may refer to an ‘id’ for the shared concept in another system (e.g. a CURIE).
type                string                   1..1    MUST be “VariationDescriptor”.
label               string                   0..1    A primary label for the value object.
extensions          Extension                0..m
description         string                   0..1    A free-text description of the value object.
xrefs               CURIE                    0..m    List of CURIEs representing associated concepts.
alternate_labels    string                   0..m    List of strings representing alternate labels for the value object.
variation           CURIE | Variation        1..1    MUST be a Variation or CURIE reference to a Variation.
molecule_context    string                   0..1    The molecular context of this variant. Must be one of “genomic”, “transcript”, or “protein”.
structural_type     CURIE                    0..1    The structural variant type associated with this variant. We RECOMMEND a descendent term of SO:0001537.
expressions         Expression               0..m    Typically HGVS or ISCN nomenclature expressions. Other systems relevant to the description of variation MAY be used.
gene_context        CURIE | Gene Descriptor  0..1    A specific gene context that applies to this variant.
vrs_ref_allele_seq  Sequence                 0..1    A VRS Sequence corresponding to a “ref allele”, describing the sequence expected at a VRS SequenceLocation reference.
allelic_state       CURIE                    0..1    We RECOMMEND that the allelic_state of a variation be described by terms from the Genotype Ontology (GENO). These SHOULD descend from concept GENO:0000875 <http://purl.obolibrary.org/obo/GENO_0000875>.
==================  =======================  ======  ===========
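As a rough illustration of the fields above, the following Python sketch builds a Variation Descriptor as a plain dictionary and checks the constraints stated in the field list (the three 1..1 fields, the fixed `type` value, and the permitted `molecule_context` values). The function name `check_variation_descriptor` and the identifier values are assumptions made for this example; the sketch is not a substitute for validation against the published schema.

```python
ALLOWED_MOLECULE_CONTEXTS = {"genomic", "transcript", "protein"}
REQUIRED_FIELDS = ("id", "type", "variation")  # the fields listed with limits 1..1


def check_variation_descriptor(descriptor: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in descriptor:
            problems.append(f"missing required field: {field}")
    if "type" in descriptor and descriptor["type"] != "VariationDescriptor":
        problems.append('type must be "VariationDescriptor"')
    context = descriptor.get("molecule_context")
    if context is not None and context not in ALLOWED_MOLECULE_CONTEXTS:
        problems.append(f"unexpected molecule_context: {context}")
    return problems


example = {
    "id": "example:descriptor-1",        # hypothetical identifier
    "type": "VariationDescriptor",
    "label": "NM_004415.2:c.8472_8483del",
    "variation": "example:variation-1",  # hypothetical CURIE reference to a Variation
    "molecule_context": "transcript",
}

problems = check_variation_descriptor(example)
```

Only the hard constraints are checked here. The RECOMMEND and SHOULD guidance for `structural_type` and `allelic_state` depends on ontology lookups and is left out of the sketch.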

Location Descriptor

This descriptor is intended to reference VRS Location value objects. In addition to the attributes inherited from its Value Object Descriptor parent class, the Location Descriptor has the following attributes:

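===========  ============  ======  ===========
Field        Type          Limits  Description
===========  ============  ======  ===========
type         string        1..1    MUST be “LocationDescriptor”
location_id  CURIE         0..1    This MUST be provided if location is omitted
location     VRS Location  0..1    This MUST be provided if location_id is omitted
===========  ============  ======  ===========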
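The paired constraint on `location_id` and `location` means that at least one of the two must be present. A minimal sketch of that check follows; the helper name `has_value_or_reference` and the CURIE values are hypothetical. The helper takes the field names as parameters because the Sequence Descriptor and Gene Descriptor state the same rule for their own field pairs.

```python
def has_value_or_reference(descriptor: dict, value_field: str, id_field: str) -> bool:
    """Return True when the inline value, its CURIE reference, or both are present."""
    return (
        descriptor.get(value_field) is not None
        or descriptor.get(id_field) is not None
    )


# Hypothetical descriptors: one passes the Location by reference, one omits it.
by_reference = {"type": "LocationDescriptor", "location_id": "example:location-1"}
neither = {"type": "LocationDescriptor", "label": "no location given"}

ok = has_value_or_reference(by_reference, "location", "location_id")
bad = has_value_or_reference(neither, "location", "location_id")
```

The field descriptions do not say whether supplying both fields is an error, so this sketch accepts that case.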

Sequence Descriptor

This descriptor is intended to reference VRS Sequence value objects. In addition to the attributes inherited from its Value Object Descriptor parent class, the Sequence Descriptor has the following attributes:

============  ============  ======  ===========
Field         Type          Limits  Description
============  ============  ======  ===========
type          string        1..1    MUST be “SequenceDescriptor”
sequence_id   CURIE         0..1    This MUST be provided if sequence is omitted
sequence      VRS Sequence  0..1    This MUST be provided if sequence_id is omitted
residue_type  CURIE         0..1    CURIE MUST be SO:0000348 (nucleic acid), SO:0001407 (peptidyl), or a descendent of one of these concepts.
============  ============  ======  ===========

Gene Descriptor

This descriptor is intended to reference VRS Gene value objects. In addition to the attributes inherited from its Value Object Descriptor parent class, the Gene Descriptor has the following attributes:

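=======  ========  ======  ===========
Field    Type      Limits  Description
=======  ========  ======  ===========
type     string    1..1    MUST be “GeneDescriptor”
gene_id  CURIE     0..1    This MUST be provided if gene is omitted
gene     VRS Gene  0..1    This MUST be provided if gene_id is omitted
=======  ========  ======  ===========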

Categorical Variation Descriptor

Computational Definition

This descriptor class is used for describing Categorical Variation value objects.

Other Data Classes

VCF Record

Extension

The Extension class provides VODs with a means to extend descriptions with other attributes unique to a content provider. These extensions are not expected to be natively understood under VRSATILE, but may be used for pre-negotiated exchange of message attributes when needed.

=====  ======  ======  ===========
Field  Type    Limits  Description
=====  ======  ======  ===========
type   string  1..1    MUST be “Extension”
name   string  1..1    A name for the Extension
value  any[]   0..*    Any primitive or structured object
=====  ======  ======  ===========
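Because extensions are only meaningful to parties that have agreed on them in advance, a consumer would typically look them up by name and ignore the rest. The sketch below shows one way to do that over a descriptor held as a dictionary. The helper `extension_values`, the attribute name `curation_status`, and the identifier values are invented for illustration.

```python
def extension_values(descriptor: dict, name: str) -> list:
    """Collect the values of every Extension on the descriptor with the given name."""
    return [
        ext.get("value")
        for ext in descriptor.get("extensions", [])
        if ext.get("type") == "Extension" and ext.get("name") == name
    ]


descriptor = {
    "id": "example:descriptor-1",        # hypothetical identifier
    "type": "VariationDescriptor",
    "variation": "example:variation-1",  # hypothetical CURIE
    "extensions": [
        # "curation_status" is an invented, provider-specific attribute name
        {"type": "Extension", "name": "curation_status", "value": {"reviewed": True}},
    ],
}

status = extension_values(descriptor, "curation_status")
unknown = extension_values(descriptor, "not_present")
```

Returning an empty list for an unknown name lets a receiver skip extensions it was never told about without treating the message as invalid.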

Expression

The Expression class is designed to enable descriptions based on a specified nomenclature or syntax for representing an object. Common examples of expressions for the description of molecular variation include the HGVS and ISCN nomenclatures.

=======  ======  ======  ===========
Field    Type    Limits  Description
=======  ======  ======  ===========
type     string  1..1    MUST be “Expression”
syntax   CURIE   1..1    CURIE referencing the expression syntax
value    string  1..1    The concept expression as a string
version  string  0..1    An optional version of the expression syntax
=======  ======  ======  ===========
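A small sketch of how an Expression might be modelled and serialized follows. The dataclass and its `to_message` method are assumptions for this example, and the `syntax` value `hgvs:coding` is a hypothetical CURIE, since the text above does not list the permitted syntax CURIEs.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Expression:
    syntax: str                    # CURIE referencing the expression syntax
    value: str                     # the expression itself, as a string
    version: Optional[str] = None  # optional version of the syntax
    type: str = "Expression"       # fixed by the field list

    def to_message(self) -> dict:
        """Serialize to a dictionary, leaving out optional fields that are unset."""
        return {key: val for key, val in asdict(self).items() if val is not None}


hgvs = Expression(
    syntax="hgvs:coding",  # hypothetical CURIE for the HGVS syntax
    value="NM_004415.2:c.8472_8483del",
)
message = hgvs.to_message()
```

Omitting `version` from the output when it is unset keeps the message consistent with its 0..1 limit.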