Tom Williams Harrison is developing a natural language processing service at Exgence
From January to March 2019, LIV.DAT student Tom Williams Harrison worked on a placement at Exgence Ltd. Exgence is a startup that aims to help software companies that spend a lot of resources processing "Invitation to Tender" (ITT) documents. An ITT document is created by a prospective customer of a software vendor who is looking to purchase a particular software package for their business. The document contains a list of the customer's requirements and is sent out to many software vendors. Each vendor then completes each item, detailing the extent to which their own software can fulfil the requirement, and returns the form to the sender. The problem is that determining whether your software packages fulfil a list of requirements is often not trivial, particularly for larger software companies that have many products and whose sales teams are separated from the software engineers.
Tom summarised his secondment: "For my placement at Exgence I worked to develop a solution for this problem, that works by analysing a set of existing ITT documents and performing natural language processing to extract the semantic information from previously completed answers to questions. Exgence's product would integrate with the editor of the ITT document (such as MS Excel) and provide suggested answers to new requirement questions in real time."
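To make the idea of suggesting answers from previously completed documents more concrete, here is a minimal sketch in Python. It is not Exgence's method, which the article does not describe. It assumes the simplest possible approach: compare a new requirement against earlier questions using bag-of-words cosine similarity and return the answer attached to the closest match. The function names and the example question/answer pairs are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_answer(new_question, answered):
    """Return (answer, score) for the most similar previously answered question.

    `answered` is a list of (question, answer) pairs (a hypothetical structure).
    Returns (None, 0.0) when nothing overlaps with the new question.
    """
    query = Counter(tokenize(new_question))
    best_answer, best_score = None, 0.0
    for question, answer in answered:
        score = cosine_similarity(query, Counter(tokenize(question)))
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer, best_score


# Hypothetical example data, invented purely for illustration.
history = [
    ("Does the system support single sign-on?", "Yes, via SAML 2.0."),
    ("Can reports be exported to PDF?", "Yes, all reports export to PDF."),
]
answer, score = suggest_answer("Is single sign-on supported?", history)
```

A real system would likely need something richer than word overlap, since two requirements can mean the same thing while sharing few words, which is presumably why the placement involved extracting semantic information.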
At the beginning of the placement, Tom created a program that could extract the ITT question/answer information from the documents in a minimally supervised fashion. Tom notes: "This was an interesting challenge, since there is no set standard for these documents, so they are completely arbitrary in structure and formatting. They can even be different filetypes, so I wrote the converter to target .xlsx, .docx, and .rtf documents (which basically make up all ITT documents, as sadly the business world hasn't caught up with open standards very much)."
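One common way to structure a converter that handles several file types is to choose a reader based on the file extension. The Python sketch below shows that pattern only; it is an assumption about how such a converter could be organised, not a description of Tom's code. The reader functions are hypothetical placeholders that would need to be filled in with real parsing logic.

```python
from pathlib import Path

# Registry mapping a lowercase file extension to its reader function.
READERS = {}


def reader(extension):
    """Decorator that registers a reader function for one file extension."""
    def register(func):
        READERS[extension] = func
        return func
    return register


@reader(".xlsx")
def read_xlsx(path):
    # Hypothetical placeholder: spreadsheet parsing would go here.
    raise NotImplementedError("xlsx parsing is not implemented in this sketch")


@reader(".docx")
def read_docx(path):
    # Hypothetical placeholder: Word document parsing would go here.
    raise NotImplementedError("docx parsing is not implemented in this sketch")


@reader(".rtf")
def read_rtf(path):
    # Hypothetical placeholder: rich text parsing would go here.
    raise NotImplementedError("rtf parsing is not implemented in this sketch")


def select_reader(path):
    """Return the registered reader for the file, matching case-insensitively."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported ITT document type: {suffix or '(none)'}")
    return READERS[suffix]
```

With this layout, supporting another file type means adding one decorated function, and the rest of the pipeline can call `select_reader(path)(path)` without knowing which format it is handling.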
The later stages of the three-month placement were spent creating suitable pre-processing steps and developing a number of NLP models to process the structured data. This stage was a valuable learning experience in machine learning methods on language-like data, which is very different to the kinds of data Tom had previously worked with.
The secondment was originally limited to three months so that Tom could travel to CERN for six months from March. After his time at CERN, he decided to return to Exgence to carry out the remaining three months of the secondment. This time, the focus has been more on the software engineering and DevOps side of development, in order to turn the product into a completed package. This includes properly packaging the software itself and integrating the frontend deployment code with the backend conversion and training framework that was written previously.
For an online solution like this, it is important that the software build is reproducible and can be easily deployed to production, so Tom has helped to create a full build of the software based on Docker and docker-compose. As code running in production should not break, the team is currently investigating different solutions for continuous integration and deployment (CI/CD). Tom has also learnt more about the available options for applying version control to large binary files, which is important for a solution involving formatted documents and NLP corpus data.
Describing his experience at Exgence, Tom said: "Due to the very small size of the startup, I have been able to enjoy a lot of freedom in creating the framework, as well as very fast communication and decision times. Having higher responsibility to find the right solutions to problems has been a really useful learning experience. Due to the relatively early time in the startup cycle that I joined, I have been able to develop most of the product backend essentially from scratch, and being familiar with the whole codebase has meant that experimenting with new ideas is much easier. I look forward to seeing how the finished product will take shape over the next two months."