A data-driven approach to clinical trial patient samples

Ana Todorovic, Sophie Gibbons, Philip Craig, Benjamin Fell

Psychotropic drug trials often involve a substantial period of patient recruitment. Psychiatrists typically rely on their own judgment, approaching one patient at a time who might fulfil complex inclusion and exclusion criteria. However, information already present in patient records can also be used to narrow down who is approached for screening. If a list of patients who are likely to pass screening could be pre-selected from electronic health records, recruitment time could decrease by a year or more.

We extracted patient information from de-identified electronic health records in Akrivia Health’s psychiatric database, which includes information from 3.8 million users of secondary mental health services within the NHS. We formed several patient cohorts for prioritisation, based on different interpretations of the broad inclusion/exclusion criteria set by an industry partner and on how these criteria might be mapped onto the data available to us through the NHS. We then applied a funnel approach, using structured data as well as information extracted by Natural Language Processing (NLP) models applied to free-text clinical notes. In this way, health data could be used to locate a group of several thousand patients, out of an initial 3.8 million, who are likely to fulfil the relevant criteria for drug trial screening. We pass this information on to partner clinical teams within the NHS, who can identify and approach these patients for recruitment.
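A funnel of this kind can be sketched as a sequence of filters, each narrowing the cohort and recording how many patients remain. The Python below is a minimal illustration only: the field names (`age`, `diagnosis_code`, `nlp_flags`), the example criteria, and the toy records are all hypothetical and do not reflect the actual database schema or the partner's criteria.

```python
from typing import Any, Callable, Dict, List, Tuple

Patient = Dict[str, Any]
Stage = Tuple[str, Callable[[Patient], bool]]


def run_funnel(patients: List[Patient], stages: List[Stage]):
    """Apply each stage in order; return survivors and the count after each stage."""
    remaining = list(patients)
    counts = [("initial", len(remaining))]
    for name, keep in stages:
        remaining = [p for p in remaining if keep(p)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Toy, made-up records (hypothetical schema).
patients = [
    {"id": 1, "age": 67, "diagnosis_code": "F32", "nlp_flags": {"memory_complaint"}},
    {"id": 2, "age": 45, "diagnosis_code": "F32", "nlp_flags": {"memory_complaint"}},
    {"id": 3, "age": 72, "diagnosis_code": "F20", "nlp_flags": set()},
    {"id": 4, "age": 58, "diagnosis_code": "F32", "nlp_flags": set()},
    {"id": 5, "age": 79, "diagnosis_code": "F32.1", "nlp_flags": {"memory_complaint"}},
]

# Hypothetical criteria: structured fields first, NLP-derived flags last.
stages = [
    ("age 50-80 (structured)", lambda p: 50 <= p["age"] <= 80),
    ("diagnosis code F32* (structured)", lambda p: p["diagnosis_code"].startswith("F32")),
    ("memory complaint in notes (NLP)", lambda p: "memory_complaint" in p["nlp_flags"]),
]

shortlist, counts = run_funnel(patients, stages)
for name, n in counts:
    print(f"{name}: {n}")
```

Recording the count after every stage makes it easy to compare different interpretations of the same criteria, since each interpretation is just a different list of stages.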

Approach for statistical analysis
As recruitment is still ongoing, this project does not include a comparison of how many patients pass screening relative to a traditional recruitment procedure. Instead, we present how data can be extracted to match criteria set by industry partners. For example, evidence of mild cognitive impairment can be defined as either a score on a memory assessment scale or a recent first referral to a memory clinic. The former could be extracted using an NLP model with an 80% success rate, while the latter was available in a structured field completed by the clinician.
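The two-route definition of mild cognitive impairment evidence can be expressed as a simple either/or rule over one NLP-derived field and one structured field. The sketch below is illustrative only: the score range, the one-year meaning of "recent", and the record fields are placeholder assumptions, not clinical values or the criteria actually used.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Placeholder values for illustration; not clinical thresholds.
SCORE_RANGE = (18, 25)                 # hypothetical inclusive score range
REFERRAL_WINDOW = timedelta(days=365)  # hypothetical meaning of "recent"


@dataclass
class Record:
    patient_id: int
    memory_score: Optional[int]                  # NLP-extracted; None if not found
    first_memory_clinic_referral: Optional[date]  # structured field; None if absent


def has_mci_evidence(r: Record, today: date) -> bool:
    """True if either the score route or the referral route is satisfied."""
    low, high = SCORE_RANGE
    score_hit = r.memory_score is not None and low <= r.memory_score <= high
    referral_hit = (
        r.first_memory_clinic_referral is not None
        and timedelta(0) <= today - r.first_memory_clinic_referral <= REFERRAL_WINDOW
    )
    return score_hit or referral_hit


# Made-up records and reference date.
today = date(2024, 6, 1)
records = [
    Record(1, 22, None),                 # score route
    Record(2, 29, date(2024, 3, 1)),     # referral route
    Record(3, None, date(2021, 1, 1)),   # referral too old
    Record(4, None, None),               # no evidence in either field
    Record(5, 30, None),                 # score outside the range
]
flags = {r.patient_id: has_mci_evidence(r, today) for r in records}
```

Keeping the two routes as separate booleans also makes it straightforward to report how many patients were found through the NLP-derived field versus the structured field.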

Results and conclusion
Existing data in psychiatric records can be used to substantially speed up clinical trial screening by identifying and approaching only those patients who are likely to fulfil inclusion/exclusion criteria.