Value Object Descriptor Specification

This draft specification, under development by Driver Projects of the GKS Work Stream, specifies standard data classes for the exchange of common information useful for the description of variation but superfluous to the salient elements necessary for specifying a value object. We describe these classes as Value Object Descriptors (VODs). The VOD specification introduced here is version-controlled and extensible, and envisioned to seed a larger collection of VODs used with other GA4GH standards beyond VRSATILE.

Use of VODs provides a convenience mechanism for passing labels and other descriptive information alongside a value object. As an example, a value object representing a genomic variant in VRS can be conveniently passed alongside human-readable identifiers (e.g. ClinVar IDs), expressions (e.g. HGVS descriptions), and important contexts (e.g. sequence type, gene, transcript) in a standard format. This additional structure is necessary due to the nature of value objects and VRS.

The GA4GH Variation Representation Specification (VRS) is a terminology, information model, and schema for the computational representation of variation. VRS also describes useful conventions for the normalization of variation forms for message passing between systems. Objects compliant with VRS are value objects: data objects that are compared by structure and value, in contrast to entities which are compared by registered identifiers. For example, the variants represented by the NM_004415.2:c.8472_8483del and LRG_423t1:c.8472_8483del HGVS descriptions are not found equivalent by comparing these strings, but by comparing the structure of the reference sequence and indicated change underlying the descriptors. Conversely, the meaning of the variant (a specific deletion on a specific residue sequence) is clear without reference to any external naming authority (in this example, the NM and LRG sequence identifiers), and in fact the referenced entities can only be retrieved through lookup on a sequence registry instead of through inspection of the variant itself.
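The distinction between value objects and entities can be illustrated with a minimal Python sketch. The `Deletion` and `RegisteredVariant` classes, their fields, and all values are hypothetical and greatly simplified; they are not VRS classes, and the digest string is only a placeholder for whatever identifies the underlying sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    """Hypothetical value object: equality is derived from its fields."""
    sequence_digest: str  # placeholder for a digest of the reference sequence
    start: int
    end: int


class RegisteredVariant:
    """Hypothetical entity: equality is based on a registered identifier."""

    def __init__(self, registry_id: str, deletion: Deletion):
        self.registry_id = registry_id
        self.deletion = deletion

    def __eq__(self, other):
        return (isinstance(other, RegisteredVariant)
                and self.registry_id == other.registry_id)

    def __hash__(self):
        return hash(self.registry_id)


# Two descriptions that resolve to the same sequence and the same change
# (placeholder values) compare equal as value objects.
a = Deletion("placeholder-digest", 100, 112)
b = Deletion("placeholder-digest", 100, 112)

# The same content registered under two identifiers yields two distinct entities.
e1 = RegisteredVariant("registry:1", a)
e2 = RegisteredVariant("registry:2", b)
```

Here `a == b` holds because the fields match, while `e1 == e2` does not, even though both entities wrap identical content.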

Value Object Descriptor

Computational Definition

The abstract Value Object Descriptor parent class. All attributes of this parent class are inherited by descendent classes.

Information Model

| Field | Type | Limits | Description |
|---|---|---|---|
| id | CURIE | 1..1 | Descriptor ID; MUST be unique within document. |
| type | string | 1..1 | MUST be VOD class name. |
| label | string | 0..1 | A primary label for the value object. |
| description | string | 0..1 | A free-text description of the value object. |
| xrefs | CURIE | 0..m | List of CURIEs representing associated concepts. |
| alternate_labels | string | 0..m | List of strings representing alternate labels for the value object. |
| extensions | Extension | 0..m | List of resource-specific Extensions needed to describe the value object. |
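As a rough illustration of the information model above, the parent class could be sketched as a Python dataclass. This is only one possible mapping, not a reference implementation; the helper `find_duplicate_ids` is a hypothetical name introduced here to show how the "unique within document" requirement on `id` might be checked.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValueObjectDescriptor:
    """Sketch of the abstract parent class; field names follow the table above."""
    id: str                                   # CURIE, 1..1
    type: str                                 # VOD class name, 1..1
    label: Optional[str] = None               # 0..1
    description: Optional[str] = None         # 0..1
    xrefs: List[str] = field(default_factory=list)             # CURIEs, 0..m
    alternate_labels: List[str] = field(default_factory=list)  # 0..m
    extensions: List[dict] = field(default_factory=list)       # 0..m


def find_duplicate_ids(descriptors):
    """Return the sorted IDs that occur more than once in a document."""
    seen, duplicates = set(), set()
    for descriptor in descriptors:
        if descriptor.id in seen:
            duplicates.add(descriptor.id)
        seen.add(descriptor.id)
    return sorted(duplicates)
```

An empty result from `find_duplicate_ids` indicates that every descriptor ID in the document is unique.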

Variation Descriptor

Computational Definition

This descriptor class is used for describing VRS Variation value objects.

Information Model

Some VariationDescriptor attributes are inherited from Value Object Descriptor.

| Field | Type | Limits | Description |
|---|---|---|---|
| id | CURIE | 1..1 | Descriptor ID; MUST be unique within document. |
| type | string | 1..1 | MUST be “VariationDescriptor”. |
| label | string | 0..1 | A primary label for the value object. |
| description | string | 0..1 | A free-text description of the value object. |
| xrefs | CURIE | 0..m | List of CURIEs representing associated concepts. |
| alternate_labels | string | 0..m | List of strings representing alternate labels for the value object. |
| extensions | Extension | 0..m | List of resource-specific Extensions needed to describe the value object. |
| variation_id | CURIE | 0..1 | This SHOULD be provided if variation is omitted. |
| variation | Variation | 0..1 | This SHOULD be provided if variation_id is omitted. |
| molecule_context | string | 0..1 | The molecular context of this variant. MUST be one of “genomic”, “transcript”, or “protein”. |
| structural_type | CURIE | 0..1 | The structural variant type associated with this variant. We RECOMMEND a descendent term of SO:0001537. |
| expressions | Expression | 0..m | Typically HGVS or ISCN nomenclature expressions. Other systems relevant to the description of variation MAY be used. |
| vcf_record | VCF Record | 0..1 | A VCF Record of the variant. This SHOULD be a single allele; the VCF genotype (GT) field should be represented in the allelic_state attribute. |
| gene_context | CURIE or Gene Descriptor | 0..1 | A specific gene context that applies to this variant. |
| vrs_ref_allele_seq | Sequence | 0..1 | A VRS Sequence corresponding to a “ref allele”, describing the sequence expected at a VRS SequenceLocation reference. |
| allelic_state | CURIE | 0..1 | We RECOMMEND that the allelic_state of a variation be described by terms from the Genotype Ontology (GENO). These SHOULD descend from concept GENO:0000875 <http://purl.obolibrary.org/obo/GENO_0000875>. |
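A few of the constraints on the Variation Descriptor can be checked mechanically. The following is a minimal sketch, assuming descriptors are exchanged as plain dictionaries; the function name `check_variation_descriptor` and the example CURIEs are hypothetical, and the check covers only a handful of fields.

```python
MOLECULE_CONTEXTS = {"genomic", "transcript", "protein"}


def check_variation_descriptor(d: dict) -> list:
    """Return a list of human-readable problems found in a descriptor dict."""
    problems = []
    if not d.get("id"):
        problems.append("id is required")
    if d.get("type") != "VariationDescriptor":
        problems.append('type must be "VariationDescriptor"')
    # At least one of variation / variation_id SHOULD be present.
    if "variation" not in d and "variation_id" not in d:
        problems.append("neither variation nor variation_id provided")
    context = d.get("molecule_context")
    if context is not None and context not in MOLECULE_CONTEXTS:
        problems.append(f"unknown molecule_context: {context}")
    return problems


example = {
    "id": "example:descriptor-1",            # hypothetical CURIE
    "type": "VariationDescriptor",
    "label": "NM_004415.2:c.8472_8483del",
    "variation_id": "example:variation-1",   # hypothetical CURIE
    "molecule_context": "transcript",
}
```

Calling `check_variation_descriptor(example)` returns an empty list; a descriptor with an unrecognized `molecule_context` or a wrong `type` yields one message per problem.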

Location Descriptor

This descriptor is intended to reference VRS Location value objects. In addition to the attributes inherited from its Value Object Descriptor parent class, the Location Descriptor has the following attributes:

| Field | Type | Limits | Description |
|---|---|---|---|
| type | string | 1..1 | MUST be “LocationDescriptor” |
| location_id | CURIE | 0..1 | This MUST be provided if location is omitted |
| location | VRS Location | 0..1 | This MUST be provided if location_id is omitted |

Sequence Descriptor

This descriptor is intended to reference VRS Sequence value objects. In addition to the attributes inherited from its Value Object Descriptor parent class, the Sequence Descriptor has the following attributes:

| Field | Type | Limits | Description |
|---|---|---|---|
| type | string | 1..1 | MUST be “SequenceDescriptor” |
| sequence_id | CURIE | 0..1 | This MUST be provided if sequence is omitted |
| sequence | VRS Sequence | 0..1 | This MUST be provided if sequence_id is omitted |
| residue_type | CURIE | 0..1 | CURIE MUST be SO:0000348 (nucleic acid), SO:0001407 (peptidyl), or a descendent of one of these concepts. |

Gene Descriptor

This descriptor is intended to reference VRS Gene value objects. In addition to the attributes inherited from its Value Object Descriptor parent class, the Gene Descriptor has the following attributes:

| Field | Type | Limits | Description |
|---|---|---|---|
| type | string | 1..1 | MUST be “GeneDescriptor” |
| gene_id | CURIE | 0..1 | This MUST be provided if gene is omitted |
| gene | VRS Gene | 0..1 | This MUST be provided if gene_id is omitted |
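The Location, Sequence, and Gene Descriptors share the same rule: either the identifier field or the inline value object MUST be present. A minimal sketch of that check, assuming descriptors are plain dictionaries, might look like the following; `REFERENCE_FIELDS` and `has_required_reference` are hypothetical names, and the example CURIE is made up.

```python
# Maps each descriptor type to its (identifier field, inline value field) pair.
REFERENCE_FIELDS = {
    "LocationDescriptor": ("location_id", "location"),
    "SequenceDescriptor": ("sequence_id", "sequence"),
    "GeneDescriptor": ("gene_id", "gene"),
}


def has_required_reference(d: dict) -> bool:
    """True if the descriptor carries its identifier or its inline value object."""
    id_field, value_field = REFERENCE_FIELDS[d["type"]]
    return d.get(id_field) is not None or d.get(value_field) is not None


gene_descriptor = {
    "id": "example:gene-descriptor-1",   # hypothetical CURIE
    "type": "GeneDescriptor",
    "gene_id": "example:gene-1",         # hypothetical CURIE
}
```

A descriptor that provides neither field, such as a `SequenceDescriptor` with only `id` and `type`, fails the check.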

Categorical Variation Descriptor

Computational Definition

This descriptor class is used for describing Categorical Variation value objects.

Information Model

Some CategoricalVariationDescriptor attributes are inherited from Value Object Descriptor.

| Field | Type | Limits | Description |
|---|---|---|---|
| id | CURIE | 1..1 | Descriptor ID; MUST be unique within document. |
| type | string | 1..1 | MUST be “CategoricalVariationDescriptor”. |
| label | string | 0..1 | A primary label for the value object. |
| description | string | 0..1 | A free-text description of the value object. |
| xrefs | CURIE | 0..m | List of CURIEs representing associated concepts. |
| alternate_labels | string | 0..m | List of strings representing alternate labels for the value object. |
| extensions | Extension | 0..m | List of resource-specific Extensions needed to describe the value object. |
| version | string | 0..1 | The version of the Categorical Variation Descriptor. |
| categorical_variation_id | CURIE | 0..1 | This SHOULD be provided if categorical_variation is omitted. |
| categorical_variation | CategoricalVariation | 0..1 | This SHOULD be provided if categorical_variation_id is omitted. |
| members | VariationMember | 0..m | VariationMember instances that fall within the functional domain of the Categorical Variation. |

Other Data Classes

VCF Record

Computational Definition

This data class is used when it is desirable to pass data as expected from a VCF record. The class is only used as an optional attribute within a Variation Descriptor. The Genotype field from a VCF should be captured by the allelic_state attribute in the Variation Descriptor.

Information Model

| Field | Type | Limits | Description |
|---|---|---|---|
| genome_assembly | string | 1..1 | Identifier for the genome assembly used to call the allele. |
| chrom | string | 1..1 | A chromosome or contig identifier. |
| pos | string | 1..1 | The reference residue-coordinate position, with the first residue having position 1. |
| id | string | 0..1 | A semicolon-separated list of unique identifiers where available. For example, dbSNP rsIDs. We RECOMMEND storing this information as a list in the Variation Descriptor xrefs field. |
| ref | string | 1..1 | Reference base as expected by the VCF specification. |
| alt | string | 1..1 | Alternate base as expected by the VCF specification. |
| qual | string | 0..1 | Quality: Phred-scaled quality score for the assertion made in ALT. |
| filter | string | 0..1 | Filter status: PASS if this position has passed all filters. |
| info | string | 0..1 | Additional information: Semicolon-separated series of additional information fields. |
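To show how a VCF data line might map onto these fields, here is a minimal sketch. It assumes a tab-separated line with at least the eight fixed VCF columns and treats "." as a missing value; the function names and the sample values are hypothetical. Converting the split identifiers into CURIE form for the xrefs field is left to the implementer.

```python
def vcf_line_to_record(line: str, genome_assembly: str) -> dict:
    """Build a VCF Record dict from the first eight tab-separated VCF columns."""
    columns = line.rstrip("\n").split("\t")[:8]
    chrom, pos, id_, ref, alt, qual, filter_, info = columns
    record = {
        "genome_assembly": genome_assembly,
        "chrom": chrom,
        "pos": pos,   # kept as a string, as in the table above
        "ref": ref,
        "alt": alt,
    }
    # Optional fields are omitted when the VCF line marks them as missing.
    for name, value in (("id", id_), ("qual", qual),
                        ("filter", filter_), ("info", info)):
        if value != ".":
            record[name] = value
    return record


def ids_to_xref_candidates(record: dict) -> list:
    """Split the semicolon-separated id field into individual identifiers."""
    return record["id"].split(";") if "id" in record else []


# Illustrative values only; the identifiers are not real dbSNP entries.
sample = vcf_line_to_record("1\t100\trsA;rsB\tG\tA\t.\tPASS\t.\n", "GRCh38")
```

The genotype (GT) column is deliberately not read here, since that information belongs in the allelic_state attribute of the Variation Descriptor.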

Extension

The Extension class provides VODs with a means to extend descriptions with other attributes unique to a content provider. These extensions are not expected to be natively understood under VRSATILE, but may be used for pre-negotiated exchange of message attributes when needed.

| Field | Type | Limits | Description |
|---|---|---|---|
| type | string | 1..1 | MUST be “Extension” |
| name | string | 1..1 | A name for the Extension |
| value | any[] | 0..* | Any primitive or structured object |
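As a small illustration, an Extension can be built and looked up by name as sketched below. The helper names and the `curation_status` extension are hypothetical; what an extension means is agreed between the exchanging parties, not defined by this sketch.

```python
def make_extension(name: str, value) -> dict:
    """Build an Extension dict carrying an arbitrary primitive or structure."""
    return {"type": "Extension", "name": name, "value": value}


def get_extension(descriptor: dict, name: str, default=None):
    """Return the value of the first extension with the given name, if any."""
    for extension in descriptor.get("extensions", []):
        if extension.get("name") == name:
            return extension.get("value")
    return default


descriptor = {
    "id": "example:descriptor-1",   # hypothetical CURIE
    "type": "VariationDescriptor",
    "extensions": [make_extension("curation_status", {"reviewed": True})],
}
```

A receiver that does not recognize `curation_status` can ignore it without affecting the rest of the descriptor.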

Expression

The Expression class is designed to enable descriptions based on a specified nomenclature or syntax for representing an object. Common examples of expressions for the description of molecular variation include the HGVS and ISCN nomenclatures.

| Field | Type | Limits | Description |
|---|---|---|---|
| type | string | 1..1 | MUST be “Expression” |
| syntax | CURIE | 1..1 | CURIE referencing the expression syntax |
| value | string | 1..1 | The concept expression as a string |
| version | string | 0..1 | An optional version of the expression syntax |
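A minimal sketch of constructing an Expression follows. The function name `make_expression` and the syntax CURIE `example-hgvs:coding` are assumptions made for illustration; this text does not enumerate the CURIEs used to reference a syntax.

```python
def make_expression(syntax: str, value: str, version=None) -> dict:
    """Build an Expression dict; version is included only when supplied."""
    expression = {"type": "Expression", "syntax": syntax, "value": value}
    if version is not None:
        expression["version"] = version
    return expression


# The syntax CURIE below is a placeholder, not a registered identifier.
hgvs_expression = make_expression(
    "example-hgvs:coding", "NM_004415.2:c.8472_8483del"
)
```

Leaving `version` out of the dictionary when it is unknown keeps the object consistent with the 0..1 limit on that field.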