Mining veterinary text using Large Language Models to find trends in dogs’ diseases

Adam Williams – A case study of LLMs in veterinary text mining

Challenge

Every day, thousands of people take their beloved pets to the vet. These visits can be for anything from a routine check-up to the diagnosis of a serious health condition. In every case, the details of the visit (what was wrong, the medications prescribed and other notes) are recorded by the attending vet in a veterinary clinical note, an electronic record of the visit. The challenge of my PhD is to discover how best to extract trends in disease and treatment from these records, with a particular focus on diseases affecting dogs. I will be doing this on a dataset of 11 million vet records (6 million of which are specific to dogs) collected by the Small Animal Veterinary Surveillance Network (SAVSNET).

Approach

To extract information from veterinary records at scale, I will be using Large Language Models (LLMs). LLMs are artificial intelligence models trained on large amounts of data (typically sourced from the internet) to model how language is used, and hence to perform functions such as text generation and text classification. Most people will be familiar with LLMs through applications such as ChatGPT and Google Gemini, which allow users to input a text prompt and receive a response generated from what the model has learned from its training data. However, LLMs are much more than just ChatGPT, and my project will explore the effectiveness of several different language models at identifying disease signals in veterinary health records.

In the first year of my PhD, I have been exploring BERT. Well established within the field of biomedical text mining, BERT is a sister model to GPT that takes text as input and generates vector representations of words (known as embeddings) that capture the meaning of words in different contexts. For example, in the sentences “Marley was dead” and “I’m dead chuffed with that”, the word “dead” would need different embeddings as it has a significantly different meaning in each sentence. The original BERT model learned its embeddings by training on Wikipedia and BooksCorpus, both sources that use standard English. However, a strength of BERT is that these embeddings can be tuned to reflect how language is used in specific domains by performing additional training on domain-specific texts. I have used this in my research, allowing BERT to better understand how vets use grammar, words and abbreviations in dog-related consults. I have then been able to fine-tune BERT to create a classifier which performs well when applied to classifying the large number of records in SAVSNET. However, building this classifier requires many manually labelled training examples (on the order of 1,000). Labelling these records takes a long time and can be susceptible to the bias of the labeller.
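As a rough illustration of the fine-tuning step, the sketch below trains a BERT-style checkpoint as a record classifier using the Hugging Face transformers library. The example notes, labels and the `bert-base-uncased` starting checkpoint are placeholders, not the SAVSNET data or the domain-adapted model used in the project; in practice the starting point would be a checkpoint further pre-trained on veterinary text.

```python
# Minimal sketch of fine-tuning a BERT-style classifier on labelled records.
# The texts, labels and checkpoint name are illustrative placeholders only.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"  # placeholder; a vet-domain-adapted checkpoint would sit here
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Hypothetical labelled consultation notes (1 = disease of interest, 0 = not).
records = [
    {"text": "O reports v+ and d+ for 2 days, dog quiet but eating", "label": 1},
    {"text": "Routine booster vaccination, no concerns raised", "label": 0},
]
dataset = Dataset.from_list(records)

def tokenize(batch):
    # Truncate/pad each note to a fixed length so records can be batched.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="vet-classifier", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()
```

Once trained, the same model can be run over every unlabelled record in the dataset to assign a disease label to each consultation.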

Therefore, an attractive option can be to use generative models (e.g. GPT, Meta’s Llama 3). These are the models that sit behind applications such as ChatGPT, and they can learn tasks such as classification through so-called few-shot learning, where a prompt (i.e. what you would type into ChatGPT) is constructed consisting of an instruction and examples for the model to learn from. Typically, on the order of 10 examples are required, far fewer than BERT needs. However, although this approach requires less data to train, it is currently less accurate than BERT classifiers. A potential way to improve accuracy whilst still using generative models is to use so-called Small Language Models (SLMs, e.g. Microsoft Phi). These models consist of fewer parameters than models such as GPT-3.5 but can be trained on a specific domain (like BERT) while maintaining features of generative models such as few-shot learning. I will be exploring both of these approaches throughout the rest of my PhD.
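To make the idea of few-shot learning concrete, the sketch below builds a prompt from an instruction plus a handful of invented example records and sends it to a generative model via the Hugging Face pipeline API. The example notes, the label set and the model name are assumptions for illustration; the actual prompts, categories and models used in the project will differ.

```python
# Minimal sketch of few-shot classification with a generative model.
# The example records, labels and model name are illustrative placeholders.
from transformers import pipeline

# Hypothetical labelled examples the model "learns" from within the prompt itself.
examples = [
    ("O reports v+ and d+ for 2 days, dog quiet but eating", "gastroenteric"),
    ("Pruritic skin, alopecia over flanks, started treatment", "dermatological"),
    ("Routine booster vaccination, no concerns raised", "healthy"),
]
new_record = "Dog presented with vomiting overnight, mild dehydration"

# Build the prompt: an instruction followed by the worked examples.
prompt = "Classify each veterinary consultation note as gastroenteric, dermatological or healthy.\n\n"
for text, label in examples:
    prompt += f"Note: {text}\nLabel: {label}\n\n"
prompt += f"Note: {new_record}\nLabel:"

# Any instruction-tuned generative model could sit here; this name is a placeholder.
generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")
print(generator(prompt, max_new_tokens=5)[0]["generated_text"])
```

The key contrast with the BERT approach is that no model weights are updated: the handful of labelled examples live inside the prompt, which is why so few of them are needed.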

Impact

What difference will it make?

Improving the ability to extract disease signals from large quantities of free text will enable better outbreak monitoring in dogs and, by extension, allow us to examine trends associated with specific diseases (e.g. whether certain breeds are more susceptible, or whether certain medications are over-prescribed). This information can then be used by Dogs Trust to inform policy on communicating with owners about how best to care for their dogs.

What problem will it solve for my industrial partner?

This project will solve two problems for Dogs Trust. Firstly, it will allow Dogs Trust to gain insight from the SAVSNET data. Dogs Trust already has a strong working relationship with SAVSNET, and research conducted specifically on dogs using SAVSNET data will give them better insight into what vets are seeing and what problems are developing in the canine population. Secondly, it will allow Dogs Trust to gain greater insight into the free text collected in their own citizen science surveys by applying these LLM approaches. This could include developing models to identify what is wrong with a dog based on an owner’s description, or models to flag owner-related issues such as cries for help (which are sometimes included in surveys).

My Background

After completing a master’s degree in physics at the University of York, I went on to work for Dogs Trust. I worked there for three years, applying and developing my skills in Python to build tools for research studies such as Generation Pup and the Post Adoption Welfare Study. Helping on these studies gave me an idea of the animal welfare issues Dogs Trust wanted to tackle, including better understanding problems such as aggression and canine health. By undertaking this PhD, I hope to help Dogs Trust better understand these issues, both by looking at how they are diagnosed and treated by vets, and by applying the methods I have learned to Dogs Trust data to help them better understand the free text they collect from their citizen science surveys.
