NLP Overview
Key points: NLP is a form of AI which processes language.
NLP is a branch of AI that aims to automate the computational processing of human (natural) language: that is, to develop machines that can read, understand, extract information from, and even converse in human language. Most modern forms of NLP use Machine Learning (ML), which involves developing algorithms that allow computers to learn from (typically large) datasets and perform tasks with increasing accuracy without being explicitly programmed.
Why is NLP important?
Large volumes of textual data
Considering the staggering amount of unstructured data generated every day, from medical records to social media, automation is critical to analysing text and speech data efficiently. NLP already has many applications in health care, including assisting with clinical documentation and coding, supporting clinical decision-making and supporting mental health treatment.
We can access massive quantities of unstructured, text-heavy qualitative data and need a way to process it efficiently. This is especially true of psychiatric health records, where 70-80% of the available information exists as unstructured clinical notes. NLP is important because it helps resolve ambiguity in language and adds useful numeric structure to the data. Text-analysis technology of this kind combines machine learning with human direction, learning from the data as it arrives to generate new insights.
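Our NLP uses a state-of-the-art language model called BERT (Bidirectional Encoder Representations from Transformers). BERT is pretrained on a vast text dataset and is built on the Transformer architecture, whose key innovation is self-attention: the representation of each word is computed by weighing every other word in the sentence. Because BERT draws on context in both directions (the words before and after each word), it can focus on different semantic aspects within a sentence and has a deeper sense of language context than earlier neural language models that read text in only one direction.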
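To make self-attention more concrete, here is a minimal pure-Python sketch of single-head scaled dot-product attention, the building block of the Transformer layers that BERT stacks. It is a simplification for illustration only: real BERT layers use learned query, key and value projections, many attention heads and much larger vectors, and the toy two-dimensional embeddings below are invented.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Single-head scaled dot-product self-attention (simplified sketch).

    Queries, keys and values are the raw embeddings here; a real Transformer
    layer would first multiply them by learned projection matrices.
    Returns the context-aware vectors and the attention weights.
    """
    d = len(embeddings[0])
    outputs, weights = [], []
    for q in embeddings:
        # Score this token against every token in the sentence, before and after it.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in embeddings]
        w = softmax(scores)
        weights.append(w)
        # The weighted sum of value vectors is the token's contextualised representation.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, embeddings)) for i in range(d)])
    return outputs, weights

# Invented toy embeddings for a three-word sentence.
tokens = ["patient", "denies", "pain"]
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(toy)
for tok, row in zip(tokens, attn):
    print(tok, [round(x, 2) for x in row])
```

Each row of attention weights sums to 1 and shows how much that word draws on every other word; in a trained model these weights are what let a mention be read in light of its neighbours.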
To systematically extract information from the free text, we use three models:
- Named Entity Recognition
- Contextual Classification
- Relation Extraction
The combination of these three models gives the outputs for all concepts in our schema.
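One way to picture how the three models combine is as successive stages that each fill in part of a shared record for a concept mention. The sketch below assumes a hypothetical `ConceptMention` structure and illustrative field names (`experiencer`, `dosage`); the real schema is defined concept by concept and is not specified here.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptMention:
    """One concept mention as it passes through the three stages (hypothetical schema)."""
    concept: str   # set by Named Entity Recognition, e.g. "Medication"
    text: str      # the matched span in the clinical note
    implicit_fields: dict = field(default_factory=dict)  # set by Contextual Classification
    related: list = field(default_factory=list)          # set by Relation Extraction

def to_schema_record(mention):
    """Flatten one mention into a single output row for its concept's schema."""
    record = {"concept": mention.concept, "text": mention.text}
    record.update(mention.implicit_fields)
    # Linked mentions (e.g. a dosage) become extra fields on the parent concept.
    for linked in mention.related:
        record[linked.concept.lower()] = linked.text
    return record

dose = ConceptMention("Dosage", "30 mg")
med = ConceptMention("Medication", "Mirtazapine",
                     implicit_fields={"experiencer": "patient"},
                     related=[dose])
print(to_schema_record(med))
```

Keeping the stages' outputs on one object means each model can be developed and swapped independently while the final output row stays the same shape.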
Named Entity Recognition
Key points: Named Entity Recognition locates concept mentions, using either an AI model or a keyword-based solution.
The role of Named Entity Recognition (NER) is to locate mentions of our concepts. NER can be either a BERT-based model trained to spot patterns in sentences that indicate the presence of a concept, or a keyword search (regex) that matches concept mentions against a pre-specified list. The best approach depends heavily on which concept is being identified. For example, for the concept Medication, a keyword search does a good job of identifying valid medication mentions, since we can write a pre-specified list of the medication names we want to look for. Conversely, some concepts, such as Symptoms, are much harder to capture in a pre-specified list, as there are many possible ways a clinician can write them down. Both approaches have their strengths and limitations, so the choice is made concept by concept.
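The keyword (regex) approach for a concept like Medication can be sketched in a few lines of Python. The medication list here is a small hypothetical example; a production list would be much larger and curated, and this is not presented as the exact implementation used.

```python
import re

# Hypothetical pre-specified list; a real list would be far larger and curated.
MEDICATIONS = ["mirtazapine", "venlafaxine", "sertraline", "fluoxetine"]

# One case-insensitive alternation; \b prevents matches inside longer words.
MEDICATION_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(m) for m in MEDICATIONS) + r")\b",
    re.IGNORECASE,
)

def find_medications(text):
    """Return (matched text, start, end) for each medication mention in text."""
    return [(m.group(0), m.start(), m.end()) for m in MEDICATION_PATTERN.finditer(text)]

note = "Switch patient’s medication from Mirtazapine 30 mg to Venlafaxine 75 mg."
print(find_medications(note))
```

The trade-off described above is visible here: the pattern is precise and transparent, but it will miss misspellings, abbreviations and any name not on the list, which is why open-ended concepts such as Symptoms are better served by a trained model.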
Contextual Classification
Key points: Contextual Classification sorts concept mentions according to the context.
Using Contextual Classification, we can classify the “implicit fields” of a concept (e.g. experiencer: does this mention refer to the patient or to someone else?) based on the sentence in which the concept mention is identified. The BERT model interprets the meaning of a concept mention from its neighbouring words. The implicit fields are determined during the development of a concept’s schema, with the aim of classifying each mention meaningfully so that it can be used in research.
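To show what the input and output of this task look like, the sketch below classifies the experiencer field with a crude keyword-cue rule. This deliberately swaps in a simple rule-based technique in place of the BERT classifier described above; the cue list and the 40-character window are hypothetical choices made for illustration.

```python
import re

# Hypothetical cues suggesting a mention refers to someone other than the patient.
OTHER_EXPERIENCER_CUES = re.compile(
    r"\b(mother|father|sister|brother|son|daughter|partner|family history)\b",
    re.IGNORECASE,
)

def classify_experiencer(sentence, mention_start, window=40):
    """Label a mention 'other' if a cue appears shortly before it, else 'patient'.

    Rule-based stand-in for a trained classifier: it only inspects the
    `window` characters that precede the mention.
    """
    context = sentence[max(0, mention_start - window):mention_start]
    return "other" if OTHER_EXPERIENCER_CUES.search(context) else "patient"

s1 = "Her mother has a history of depression."
s2 = "Patient reports low mood and depression."
print(classify_experiencer(s1, s1.index("depression")))
print(classify_experiencer(s2, s2.index("depression")))
```

A fixed window and word list break easily ("Patient, unlike her mother, reports depression"), which illustrates why a model that reads the whole sentence in context is preferred for this task.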
Relation Extraction
Key points: Relation Extraction identifies links between different concept mentions.
Using Relation Extraction, we can link some concepts together. For example, a medication mention can be used in research without being linked to any other concept. However, some concepts, such as medication dosage, cannot stand alone; they add information to another concept only when the two are linked.
Relation Extraction would therefore allow us to link the right medications and dosages together in the following sentence:
Switch patient’s medication from Mirtazapine 30 mg to Venlafaxine 75 mg.
That is, relating Mirtazapine to 30 mg, and Venlafaxine to 75 mg.
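As a point of comparison, the example sentence can be handled by a simple proximity heuristic that links each dosage to the closest medication mentioned before it. This is a swapped-in rule, not the trained Relation Extraction model, and the medication list and dosage pattern are hypothetical.

```python
import re

MEDICATION = re.compile(r"\b(mirtazapine|venlafaxine)\b", re.IGNORECASE)  # hypothetical list
DOSAGE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml)\b", re.IGNORECASE)

def link_dosages(text):
    """Pair each dosage with the closest medication mention that precedes it.

    A simple proximity heuristic, not a trained Relation Extraction model.
    """
    meds = [(m.start(), m.group(0)) for m in MEDICATION.finditer(text)]
    pairs = []
    for d in DOSAGE.finditer(text):
        preceding = [name for start, name in meds if start < d.start()]
        if preceding:
            pairs.append((preceding[-1], d.group(0)))
    return pairs

sentence = "Switch patient’s medication from Mirtazapine 30 mg to Venlafaxine 75 mg."
print(link_dosages(sentence))
```

The heuristic works here but fails on phrasings such as "30 mg of Mirtazapine" or "Mirtazapine and Venlafaxine, 30 mg and 75 mg respectively", which is where a model trained on annotated relations earns its place.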
Benefits of Natural Language Processing
Since 70-80% of essential patient data lies in unstructured clinical notes, gaining immediate access to clinical information is difficult.
Natural language processing lets researchers use artificial intelligence methods to sort through this unstructured data, which can then provide valuable insights into patient care, research efforts and disease diagnosis.
An example of the use of NLP:
The aim of the study was to identify a group of patients presumed to have difficult-to-treat depression (DTD) in UK specialist mental health National Health Service (NHS) Trusts, and to examine demographic, disease and treatment data through the analysis of secondary-care mental health records (‘real-world data’).
An NLP model was used to analyse anonymised electronic health records (EHRs) from five specialist mental health NHS Trusts in the United Kingdom. Data on disease characteristics, comorbidities and treatment histories were extracted from structured fields, and from unstructured fields using natural language algorithms.
(https://akriviahealth.com/insight/difficult-to-treat-depression/)
Understanding psychiatric and other complex conditions is not possible without accurate, systematic case identification, population management and research. NLP can help identify the correct treatment options for patients, as well as assist in disease diagnosis.
Natural language processing holds the power to improve how we live and work. It can help bring progress to areas that have been slow or difficult to change without a partnership between humans and technology.