Categorical Variation Specification

To facilitate search of biomolecular variation, contemporary biomolecular knowledgebases routinely “flatten” variation concepts to a specific context that enables computable matching to assayed variation, and typically provide related contexts to help characterize the intended biological concept. For example, the variant “BRAF V600E” in the CIViC resource describes a protein change, but is flattened to a representative genomic change (GRCh37 chr7:g.140453136A>T) and contextualized with corresponding transcript (NM_004333.4:c.1799T>A) and protein (NP_004324.2:p.Val600Glu) descriptions. The representative change is linked to its ClinGen Allele Registry identifier (CAID; CA123643) to enable CAID-based matching with ClinGen resources.

However, CA123643 is likewise a collection of variation contexts, including many that would not typically be considered equivalent to BRAF V600E. For example, ENST00000497784.1:n.1834T>A, ENST00000647434.1:n.738-3918T>A, and ENST00000642228.1:c.*877T>A are all contexts associated with CA123643, yet none of them results in an altered protein product. Similarly, CA16602531 could also serve as a linked representative genomic change (through NC_000007.14:g.140753335_140753336delinsTT), but this concept again contains several contexts describing roles of the variant that are not applicable to the V600E protein variation.

In addition, more complex cases of variation exist, in which the closest approximation of the variation amounts to a simple genomic range. Examples include BRAF V600 mutations, TP53 truncating mutations, and EGFR exon 19 deletions. The concepts associated with these variations (any protein mutation at a codon, any truncating mutation in a gene, and any in-frame deletion in an exon) are not clearly definable using a variation description framework such as VRS or HGVS.

To address these shortfalls, we introduce the Categorical Variation Specification. Categorical Variation captures the semantics that are missing or implied in genomic knowledge resources, providing a framework for expressing how genomic knowledge may be matched to assayed variation. Much like the VRS objects used in this specification, Categorical Variation classes are designed to instantiate value objects that are readily usable by genomic knowledge search engines. See also the Categorical Variation Descriptor class, which describes Categorical Variation using the same paradigm as the Value Object Descriptor class.

Categorical Variation

Computational Definition

A representation of a categorically-defined functional domain for variation, in which individual variation instances may be members.

Information Model

| Field | Type   | Limits | Description |
| ----- | ------ | ------ | ----------- |
| _id   | string | 0..1   | Categorical Variation Id. MUST be unique within the document. |
| type  | string | 1..1   | MUST be the Categorical Variation class name. |
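A minimal sketch of checking the two base fields, written in Python with objects modeled as plain dicts. It assumes that the only valid `type` values are the two concrete classes defined in this section (CanonicalVariation and ComplexVariation); the function names are hypothetical and not part of the specification.

```python
from typing import Any, Iterable

# Assumption: the concrete Categorical Variation classes defined in this section.
CATEGORICAL_VARIATION_TYPES = {"CanonicalVariation", "ComplexVariation"}


def validate_categorical_variation(obj: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the base Categorical Variation fields."""
    problems = []
    # type: string, 1..1, must name a Categorical Variation class
    type_name = obj.get("type")
    if type_name is None:
        problems.append("missing required field 'type'")
    elif not isinstance(type_name, str) or type_name not in CATEGORICAL_VARIATION_TYPES:
        problems.append(f"unexpected type {type_name!r}")
    # _id: string, 0..1 (optional, but must be a string when present)
    if "_id" in obj and not isinstance(obj["_id"], str):
        problems.append("'_id' must be a string")
    return problems


def duplicate_ids(objects: Iterable[dict[str, Any]]) -> set[str]:
    """Return each _id occurring more than once among the given objects.

    Only the objects passed in are scanned; nested operands are not traversed.
    """
    seen: set[str] = set()
    dupes: set[str] = set()
    for obj in objects:
        _id = obj.get("_id")
        if _id is None:
            continue  # _id is optional
        if _id in seen:
            dupes.add(_id)
        seen.add(_id)
    return dupes


doc = [
    {"_id": "cv1", "type": "CanonicalVariation"},
    {"_id": "cv1", "type": "ComplexVariation"},
]
print(validate_categorical_variation(doc[0]))  # []
print(duplicate_ids(doc))  # {'cv1'}
```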

Canonical Variation

Computational Definition

A categorical variation domain characterized by a representative Variation context to which members lift over, project, translate, or otherwise directly align.

Information Model

Some CanonicalVariation attributes are inherited from Categorical Variation.

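| Field     | Type      | Limits | Description |
| --------- | --------- | ------ | ----------- |
| _id       | string    | 0..1   | Categorical Variation Id. MUST be unique within the document. |
| type      | string    | 1..1   | MUST be “CanonicalVariation”. |
| variation | Variation | 1..1   | The representative VRS Variation object against which the congruency of members is determined. |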
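To make the CanonicalVariation fields concrete, here is a sketch of the BRAF V600E example from the introduction, with its representative GRCh37 change (chr7:g.140453136A>T) expressed as a VRS Allele. The nested Allele layout follows VRS 1.x conventions as we understand them (interbase coordinates, Number start/end, LiteralSequenceExpression state) and should be checked against the VRS schema version in use. The `_id` value and the RefSeq CURIE used as `sequence_id` are illustrative assumptions; VRS implementations typically use computed GA4GH sequence identifiers instead. `check_canonical_variation` is a hypothetical helper.

```python
from typing import Any

# Illustrative CanonicalVariation for BRAF V600E (GRCh37 chr7:g.140453136A>T).
# The "_id" and the RefSeq CURIE below are assumptions for this example.
braf_v600e: dict[str, Any] = {
    "_id": "example:braf_v600e",
    "type": "CanonicalVariation",
    "variation": {
        "type": "Allele",
        "location": {
            "type": "SequenceLocation",
            "sequence_id": "refseq:NC_000007.13",  # GRCh37 chromosome 7
            "interval": {
                "type": "SequenceInterval",
                # VRS uses interbase coordinates: 1-based position 140453136
                # becomes the interval 140453135..140453136.
                "start": {"type": "Number", "value": 140453135},
                "end": {"type": "Number", "value": 140453136},
            },
        },
        # Alternate base on the genomic (forward) strand
        "state": {"type": "LiteralSequenceExpression", "sequence": "T"},
    },
}


def check_canonical_variation(obj: dict[str, Any]) -> None:
    """Raise ValueError if obj violates the CanonicalVariation field table."""
    if obj.get("type") != "CanonicalVariation":
        raise ValueError("type MUST be 'CanonicalVariation'")
    variation = obj.get("variation")
    # variation: Variation, 1..1, i.e. exactly one typed VRS object
    if not isinstance(variation, dict) or "type" not in variation:
        raise ValueError("variation MUST be a single VRS Variation object")
    if "_id" in obj and not isinstance(obj["_id"], str):
        raise ValueError("_id MUST be a string when present")


check_canonical_variation(braf_v600e)  # passes silently
```

The check only enforces the cardinalities in the table above; validating the inner Allele itself is left to a VRS schema validator.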

Complex Variation

Computational Definition

A categorical variation domain jointly characterized by two or more other categorical variation domains.

Information Model

Some ComplexVariation attributes are inherited from Categorical Variation.

| Field    | Type                  | Limits | Description |
| -------- | --------------------- | ------ | ----------- |
| _id      | string                | 0..1   | Categorical Variation Id. MUST be unique within the document. |
| type     | string                | 1..1   | MUST be “ComplexVariation”. |
| operands | Categorical Variation | 2..m   | The Categorical Variation objects that are evaluated collectively. |
| operator | string                | 1..1   | The logical operation applied when evaluating the operands. MUST be “AND” or “OR”. |
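One way to read the operator semantics is as a recursive membership test: with “AND”, a variation is a member only if it is a member of every operand; with “OR”, membership in any operand suffices. The sketch below assumes that interpretation. How a variation is judged to lift over, project, translate, or otherwise align to a CanonicalVariation is not specified in this section, so it is passed in as a hypothetical `aligns_to` callable, and the toy label-matching function stands in for real alignment logic.

```python
from typing import Any, Callable

Obj = dict[str, Any]
# Hypothetical alignment test: does `variation` lift over, project, translate,
# or otherwise directly align to the canonical context? Supplied by the caller.
AlignsTo = Callable[[Obj, Obj], bool]


def is_member(variation: Obj, category: Obj, aligns_to: AlignsTo) -> bool:
    """Recursively decide whether `variation` belongs to a Categorical Variation."""
    kind = category.get("type")
    if kind == "CanonicalVariation":
        return aligns_to(variation, category["variation"])
    if kind == "ComplexVariation":
        operands = category["operands"]
        if len(operands) < 2:  # operands: 2..m
            raise ValueError("ComplexVariation requires at least two operands")
        operator = category["operator"]
        if operator not in ("AND", "OR"):
            raise ValueError(f"operator MUST be 'AND' or 'OR', got {operator!r}")
        # Lazily evaluated so all()/any() can short-circuit.
        results = (is_member(variation, op, aligns_to) for op in operands)
        return all(results) if operator == "AND" else any(results)
    raise ValueError(f"unknown Categorical Variation type {kind!r}")


# Toy usage: real alignment logic is replaced by comparing a "label" key.
def same_label(variation: Obj, canonical: Obj) -> bool:
    return variation.get("label") == canonical.get("label")


v600e = {"type": "CanonicalVariation", "variation": {"label": "V600E"}}
v600k = {"type": "CanonicalVariation", "variation": {"label": "V600K"}}
either = {"type": "ComplexVariation", "operator": "OR", "operands": [v600e, v600k]}
both = {"type": "ComplexVariation", "operator": "AND", "operands": [v600e, v600k]}

print(is_member({"label": "V600K"}, either, same_label))  # True
print(is_member({"label": "V600K"}, both, same_label))    # False
```

Passing the alignment test in as a parameter keeps the set logic of ComplexVariation separate from the sequence-level work, which this section leaves unspecified. Because operands are themselves Categorical Variation objects, ComplexVariation instances can nest to any depth.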