
Internship: Information Extraction from user manuals for Knowledge Graphs



User manuals and service manuals play a crucial role in guiding people to operate and maintain machinery effectively and safely. They provide step-by-step instructions, troubleshooting tips, and other essential information. However, one of the challenges with these manuals is the difficulty of searching for specific information across a vast collection of documents. As the number of products, and of versions of those products, continues to grow, users often have to sift through numerous manuals to find the specific details they need. This leads to frustrating, time-consuming searches that hinder the overall user experience.

Large Language Models (LLMs) have revolutionized the field of natural language processing through their remarkable ability to generate human-like text and process vast amounts of information. However, one critical challenge with LLMs is hallucination: they can generate false or inaccurate information that appears convincing. This is particularly problematic when accuracy and reliability are crucial, as wrong information may worsen a customer's or engineer's existing problem. One potential solution is to integrate LLMs with Knowledge Graphs (KGs), structured databases that capture factual information. Grounding LLMs in the curated and validated knowledge stored in Knowledge Graphs makes it possible to enhance the accuracy and reliability of generated content.


Extracting relevant information from textual data presents numerous challenges in industrial applications. Industrial texts often contain a multitude of technical terms, acronyms, domain-specific jargon, implicit facts, and references to distant passages of text or figures, which makes them difficult for traditional text mining. While Natural Language Processing (NLP) is receiving an increasing amount of attention, it is often not ready to address a host of challenges in the technical domain [1]. Technical Language Processing (TLP) narrows its scope to a specialized domain characterized by technical jargon, specific terminology, and domain-specific conventions. Unlike NLP, TLP often lacks suitable, or even any, off-the-shelf models or libraries for a host of tasks.

Research direction

To bridge this gap, the research plan involves exploring domain adaptation techniques. The plan is to take existing NLP models and tools and adapt them to the food processing industry using domain-specific data. By combining the knowledge and techniques from NLP with domain-specific expertise, the research aims to narrow the gap between TLP and NLP, enabling more accurate and context-aware text mining for industrial applications.

Goals & Deliverables

This internship project will focus on semi-automatically preparing data in a format suitable for LLMs (data wrangling). The end result is intended to be used in a follow-up MSc thesis ([Link to MSc project]).

Collect and select documents for desired information needs

Design and implement pipelines to wrangle the content of PDFs into quality text, images and tables that can be used to build KGs

Explore options for integrating this combination of heterogeneous sources into a KG
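
To give a feel for the kind of pipeline meant here, the following is a minimal sketch in Python, not the project's actual design. It assumes the PDF text has already been extracted page by page, and it covers only text (not images or tables). The product name "ExampleSlicer X1", the function names, and the "Key: value" specification pattern are all hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: specification lines of the form "Key: value".
SPEC_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.+?)\s*$")


def clean_pages(pages):
    """Drop lines repeated on every page and re-join hyphenated words."""
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    # A line present on every page is assumed to be a running header or footer.
    repeated = {
        line for line, n in counts.items() if len(pages) > 1 and n == len(pages)
    }
    cleaned = []
    for page in pages:
        kept = [line for line in page.splitlines() if line.strip() not in repeated]
        text = "\n".join(kept)
        # "main-\ntenance" becomes "maintenance".
        text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
        cleaned.append(text)
    return cleaned


def extract_triples(product, pages):
    """Turn "Key: value" lines into (subject, predicate, object, page) tuples."""
    triples = []
    for page_no, text in enumerate(pages, start=1):
        for line in text.splitlines():
            match = SPEC_LINE.match(line)
            if match:
                predicate = match.group(1).lower().replace(" ", "_")
                # The page number is kept as provenance for later validation.
                triples.append((product, predicate, match.group(2), page_no))
    return triples


# Made-up example input standing in for text extracted from a two-page manual.
raw_pages = [
    "ExampleSlicer X1 Manual\nBlade speed: 1200 rpm\nRegular main-\ntenance is required.",
    "ExampleSlicer X1 Manual\nOperating voltage: 400 V",
]
for triple in extract_triples("ExampleSlicer X1", clean_pages(raw_pages)):
    print(triple)
```

Keeping the page number with each fact is one possible way to trace KG content back to its source document, which matters when the KG is later used to check LLM output.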

Student requirements:

Having passed (or intending to take) Text Mining (2AMM30) is a plus

Experience with Python

Experience with text data

About Marel

Marel is a global company specializing in advanced food processing equipment and systems for the poultry, meat, and fish industries. They offer a comprehensive range of solutions for various stages of food production. Marel's technologies and automation solutions are designed to enhance efficiency, improve product quality, and ensure food safety. They also provide software solutions and data analytics services to help food processors monitor and optimize their operations. With a strong emphasis on innovation and sustainability, Marel collaborates with industry partners to drive advancements in food processing methods.

[1] Dima, A., Lukens, S., Hodkiewicz, M., Sexton, T., & Brundage, M. (2021). Adapting natural language processing for technical text. Applied AI Letters, 2. doi:10.1002/ail2.33

Contact: Zeno van Cauter