A Novel CDS Strategy for Molecularly-Guided Clinical Trials Matching

Mar 14, 2022 | Elimu Informatics

Clinical trials play a crucial role in evaluating the safety and efficacy of new life-saving therapeutic options. Matching patients to clinical trials not only advances therapeutic options, but it also potentially provides a patient with additional treatment choices. Molecular profiling of tumors has become standard-of-care, and the number of molecularly-guided precision cancer medicine clinical trials is steadily rising. But fewer than one in 20 adult patients with cancer who could benefit from life-saving, novel therapies enroll in cancer clinical trials [ref], and approximately 40% of trials close prematurely due to lack of patient enrollment [ref]. Current patient- and provider-facing clinical trial matching tools have limited automated matching capabilities, particularly when it comes to matching based on a tumor’s genetic profile. In this blog, we examine a novel clinical decision support (CDS) strategy for molecularly-guided clinical trials matching that has the potential to significantly improve these statistics. 

Of the many demographic, socioeconomic, attitudinal, and computational barriers to clinical trial matching, the computational barrier is the focus of this blog – i.e., does my patient’s tumor genetic profile satisfy the molecular inclusion criteria for a clinical trial? Many groups are driving important work in this area. For instance, ClinicalTrials.gov and NCI Clinical Trials provide a wide range of searchable fields, while folks at My Cancer Genome [ref] and Dana-Farber’s MatchMiner are advancing data models and matching algorithms. Within HL7, the CodeX FHIR Accelerator group is actively designing a clinical trial matching service [ref], while within GA4GH, the Genomics Knowledge Standards Work Stream is actively approaching this use case through formalization of variant and variant annotation representations. 

But what is it that makes molecularly-guided clinical trial matching so hard? A big part of the answer is that for many clinical trials, many of the inclusion criteria are unstructured, or are structured at a level of abstraction that differs from patient observations [ref]. The situation is similar to knowing the medication that a patient is taking (e.g. ‘empagliflozin 10 MG Oral Tablet’) when a clinical trial’s inclusion criterion is based on drug class (e.g. ‘SGLT2 Inhibitor’). In this scenario, determining whether or not the patient is a candidate requires that we somehow compute whether their medication is included in the drug class. Likewise, when matching on genetic profiles, we might know the patient’s variant (e.g. ‘NM_001374258.1(BRAF):c.1525G>A’) while the clinical trial’s inclusion criterion is based on some type of category (e.g. ‘activating BRAF mutation’). In this scenario, we need to somehow determine whether or not our patient’s variant is included in the trial’s category. Our strategy described here, based on the HL7 Clinical Quality Language (CQL), provides a computable process to make this determination, and complements the ongoing efforts listed above.
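
One way to picture this abstraction gap is as a membership test between a specific observation and a broader category. The Python sketch below is illustrative only: the `DRUG_CLASS_MEMBERS` table and the `medication_in_class` function are hypothetical names, and a real system would presumably consult a terminology service rather than a hard-coded table.

```python
# Hypothetical lookup table: drug class -> specific medications it contains.
# A production system would likely query a terminology service instead.
DRUG_CLASS_MEMBERS = {
    "SGLT2 Inhibitor": {"empagliflozin 10 MG Oral Tablet"},
}


def medication_in_class(medication: str, drug_class: str) -> bool:
    """Return True if the specific medication is listed under the drug class."""
    return medication in DRUG_CLASS_MEMBERS.get(drug_class, set())


print(medication_in_class("empagliflozin 10 MG Oral Tablet", "SGLT2 Inhibitor"))
```

Matching a variant against a category such as ‘activating BRAF mutation’ has the same shape; the hard part is defining the category’s membership in a computable way.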

CQL is a high-level, domain-specific language developed to create sharable CDS and clinical quality measurement knowledge artifacts. Let’s take a look at how we might use CQL to define molecularly-guided clinical trial inclusion categories, and how we might then test whether a patient satisfies one or more categories, thereby establishing a possible match. To do this, we have put together a detailed scenario, which can be downloaded below.

In this scenario, we want to determine if a patient is a candidate for a clinical trial. In March 2020, the patient is found to have a lung mass on a chest x-ray. Lung biopsy confirms non-small-cell lung cancer (NSCLC), and the patient is treated with primary resection. In July 2020, the patient presents with a persistent headache, and a mass is found on head CT. Biopsy of the mass is consistent with an NSCLC metastasis. Whole exome sequencing (WES) of the biopsy specimen shows a variant, ‘NM_000222.3(KIT):c.1674G>C’, thought likely to be oncogenic. The patient’s clinician is considering therapeutic options, including a new clinical trial whose inclusion criteria are stage 4 NSCLC and any of the following: MET amplification, EGFR wildtype, BRAF class 2 mutation, or KIT exon 11 mutation. The clinician wants to know if the patient qualifies.
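
The trial’s inclusion logic has a simple boolean shape: one required condition combined with any one of several molecular categories. Here is a minimal Python sketch of that shape, assuming each criterion has already been evaluated to true or false. The function name, argument names, and the values in `example_findings` are hypothetical placeholders, not results from the scenario.

```python
def meets_inclusion_criteria(has_stage_4_nsclc: bool, molecular_findings: dict) -> bool:
    """Stage 4 NSCLC is required, plus at least one molecular category."""
    return has_stage_4_nsclc and any(molecular_findings.values())


# Placeholder values purely to show the call shape.
example_findings = {
    "MET amplification": False,
    "EGFR wildtype": False,
    "BRAF class 2 mutation": False,
    "KIT exon 11 mutation": True,
}

print(meets_inclusion_criteria(True, example_findings))
```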

Each of these inclusion criteria can be expressed in CQL. The key to this formalization is to recognize that the FHIR Genomics Implementation Guide defines a ‘variant’ profile, and CQL can express testable constraints against this profile. For instance, we might define the inclusion categories as:

  • MET amplification: Any duplication or copy number variant with copy number greater than 6 that fully encompasses the MET gene (NC_000007.14:116672195-116798386).
  • EGFR wildtype: EGFR (NC_000007.14:55019016-55211628) has been studied, with no simple or structural variants present.
  • BRAF class 2 mutation: Any normalized* variant in [‘NC_000007.14:140781616:C:T’, ‘NC_000007.14:140781616:C:A’, ‘NC_000007.14:140781601:C:G’, ‘NC_000007.14:140781602:C:T’, ‘NC_000007.14:140781601:C:A’, ‘NC_000007.14:140753333:T:C’, ‘NC_000007.14:140753331:T:G’, ‘NC_000007.14:140753332:T:G’, ‘NC_000007.14:140753344:A:T’, ‘NC_000007.14:140753345:G:C’].
  • KIT exon 11 mutation: Any simple or structural variant overlapping KIT Exon 11 (NC_000004.12:54727415-54727542).

* Variants can be normalized into a canonical representation using NCBI’s Sequence Position Deletion Insertion (SPDI) variation services. Normalization in and of itself is a great technique for enhanced clinical trial matching and other search scenarios.
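
The first three categories can be read as simple predicates over normalized variant data. The Python sketch below is one possible reading, not the CQL from the downloadable example. The function names and the flat arguments (sequence ID, start, end, copy number) are assumptions made for illustration, as is the half-open treatment of the coordinate ranges; the coordinates and SPDI strings are copied from the list above.

```python
from typing import Iterable, Tuple

# Gene regions as (reference sequence, start, end), taken from the category list.
MET_GENE = ("NC_000007.14", 116672195, 116798386)
EGFR_GENE = ("NC_000007.14", 55019016, 55211628)

# Normalized (SPDI) variants that define the BRAF class 2 category.
BRAF_CLASS_2 = frozenset({
    "NC_000007.14:140781616:C:T", "NC_000007.14:140781616:C:A",
    "NC_000007.14:140781601:C:G", "NC_000007.14:140781602:C:T",
    "NC_000007.14:140781601:C:A", "NC_000007.14:140753333:T:C",
    "NC_000007.14:140753331:T:G", "NC_000007.14:140753332:T:G",
    "NC_000007.14:140753344:A:T", "NC_000007.14:140753345:G:C",
})


def is_met_amplification(seq_id: str, start: int, end: int, copy_number: int) -> bool:
    """Copy number above 6 for a variant that fully encompasses the MET gene."""
    ref, gene_start, gene_end = MET_GENE
    return (seq_id == ref and start <= gene_start and end >= gene_end
            and copy_number > 6)


def is_egfr_wildtype(region_studied: bool,
                     variant_spans: Iterable[Tuple[str, int, int]]) -> bool:
    """EGFR was studied and no reported variant overlaps it."""
    ref, gene_start, gene_end = EGFR_GENE
    has_variant = any(
        seq_id == ref and start < gene_end and gene_start < end
        for seq_id, start, end in variant_spans
    )
    return region_studied and not has_variant


def is_braf_class_2(normalized_spdi: str) -> bool:
    """Exact membership test against the enumerated normalized variants."""
    return normalized_spdi in BRAF_CLASS_2
```

Note that the EGFR wildtype check depends on knowing which regions were studied: the absence of a reported variant only means wildtype if the region was actually covered by the test.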

Let’s take this last inclusion criterion, KIT exon 11 mutation, and see how it looks when formalized in CQL:

define "KIT Exon 11 Mutation":
// Simple or structural variant overlapping KIT Exon 11 (NC_000004.12:54727415-54727542)
  [Observation: "Genetic variant assessment"] variant
    let
      ExactStart: singleton from (variant.component ES where ES.code ~ "exactStartEnd"),
      InnerStart: singleton from (variant.component IS where IS.code ~ "innerStartEnd"),
      OuterStart: singleton from (variant.component OS where OS.code ~ "outerStartEnd"),
      Ref: singleton from (variant.component R where R.code ~ "referenceAllele"),
      RefSeq: singleton from (variant.component RS where RS.code ~ "genomicRefSeq")
    where variant.value ~ "Present"
        and (
            // Simple variant: exact start through the length of the reference allele
            (Interval[ExactStart.value.low, ExactStart.value.low + Length(Ref.value)]
              overlaps Interval[54727415,54727542))
            // Structural variant: region bounded by the inner and outer start-end ranges
            or (Interval[Max({OuterStart.value.low, InnerStart.value.low}),
              Min({OuterStart.value.high, InnerStart.value.high})] overlaps
              Interval[54727415,54727542))
        )

This CQL snippet provides a formal definition for ‘KIT Exon 11 Mutation’, based on the FHIR Genomics ‘variant’ profile (a FHIR Observation whose observation code is LOINC 69548-6 | Genetic Variant Assessment). The logic examines various components of the variant observation, looking primarily at the variant’s location, to see if it overlaps the region of KIT Exon 11. Recall that our patient was found to have ‘NM_000222.3(KIT):c.1674G>C’. This variant normalizes to ‘NC_000004.12:54727441:G:C’, which places it on chromosome 4 at position 54727441, within KIT’s exon 11 (which spans 54727415 to 54727542). We can thus conclude that the patient’s variant is a KIT Exon 11 Mutation, and that therefore the patient is a potential candidate for this clinical trial.
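
The arithmetic behind this check can also be sketched outside of CQL. The Python below is a minimal illustration, not the packaged example: `Spdi`, `parse_spdi`, and `is_kit_exon_11_mutation` are hypothetical names, the SPDI string is assumed to spell out the deleted sequence (rather than give its length), and intervals are treated as half-open, which is an assumption about the coordinate convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    seq_id: str
    position: int  # start position of the deleted sequence
    deleted: str
    inserted: str

    @property
    def end(self) -> int:
        """End of the span covered by the deleted (reference) sequence."""
        return self.position + len(self.deleted)


def parse_spdi(text: str) -> Spdi:
    """Split 'sequence:position:deleted:inserted' into its four parts."""
    seq_id, position, deleted, inserted = text.split(":")
    return Spdi(seq_id, int(position), deleted, inserted)


def overlaps(start_a: int, end_a: int, start_b: int, end_b: int) -> bool:
    """True if half-open intervals [start_a, end_a) and [start_b, end_b) overlap."""
    return start_a < end_b and start_b < end_a


# KIT exon 11 region as given in the category definition.
KIT_EXON_11 = ("NC_000004.12", 54727415, 54727542)


def is_kit_exon_11_mutation(spdi_text: str) -> bool:
    variant = parse_spdi(spdi_text)
    ref, exon_start, exon_end = KIT_EXON_11
    return variant.seq_id == ref and overlaps(
        variant.position, variant.end, exon_start, exon_end
    )


print(is_kit_exon_11_mutation("NC_000004.12:54727441:G:C"))  # prints True
```

For the patient’s variant, the span runs from 54727441 to 54727442, which falls inside 54727415 to 54727542 on the same reference sequence, so the check succeeds.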

CQL is emerging as a popular language for encoding decision support and quality measure artifacts [ref]. And just as CQL enables the sharing of standards-based structured and computable criteria in these domains, it likewise supports the sharing of computable clinical trial criteria. The downloadable example has the CQL and mock patient data, packaged into an executable zip file. We are curious to hear your thoughts about this strategy for enhanced clinical trial matching.
