Maximizing the use of social and behavioural information


  • Precision medicine will require better sources of social and behavioural data.
  • Most structured social and behavioural data fields in EHRs are inadequate.
  • NLP of unstructured EHR text could yield a chronological timeline of rich data.
  • Social and behavioural information could be extracted from unstructured EHR data.


The contribution of social and behavioural factors to the development of mental health conditions and to treatment effectiveness is widely supported, yet population-level data sources on the social and behavioural determinants of mental health remain weak. Filling these data gaps will be crucial to accelerating precision medicine. Some have suggested broader use of electronic health records (EHRs) as a source of non-clinical determinants, although social and behavioural information is not systematically collected in EHRs internationally.


In this commentary, we describe the nature and quality of key structured and unstructured social and behavioural data, using as a case example value counts from secondary care mental health data in the UK Clinical Record Interactive Search (CRIS) database; outline the methodological challenges in using such data; and discuss possible solutions and opportunities involving natural language processing (NLP) of unstructured EHR text.


Most structured non-clinical data fields within secondary care mental health EHR data have too much missing data for adequate use. The utility of other non-clinical fields that are reported semi-consistently (e.g., ethnicity and marital status) depends entirely on handling them appropriately in analyses, including quantifying the many reasons behind missingness and considering selection biases. Advancements in NLP offer new opportunities to exploit unstructured text from secondary care EHR data, particularly given that clinical notes and attachments are available for large numbers of patients and are more routinely completed by clinicians. Finding ways to re-use, harmonize, and improve existing and future secondary care mental health data, leveraging advanced analytics such as NLP, is worth the effort to fill the data gap on social and behavioural contributors to mental health conditions. It will also be necessary to cover all of the domains needed to inform personalized interventions.