Project description:
Large Language Models (LLMs) are well known for acquiring knowledge from large-scale corpora and for achieving state-of-the-art (SOTA) performance on many NLP tasks. However, they can suffer from various issues, such as hallucinations, false references, and made-up facts. Knowledge Graphs (KGs), on the other hand, can store enormous amounts of facts in a structured and explicit manner. Unlike LLMs, however, formulating KGs is a laborious process, and querying them can be computationally demanding. An interesting research question is therefore the following: how can KGs and LLMs be combined so that the LLM provides answers grounded in facts and does not hallucinate? We will use a well-established KG (e.g., from biology or chemistry) and focus entirely on the combination of LLMs and KGs.
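As a purely illustrative sketch of one such combination, consider KG-augmented prompting: retrieve triples about the entities in a question from the KG and instruct the LLM to answer only from those facts. The toy in-memory triple store, the example triples, and the function names below are all assumptions for illustration; a real system would query an established KG (e.g., via a SPARQL endpoint) and then pass the prompt to an actual LLM.

```python
# Toy sketch of KG-grounded prompting: retrieve facts from a
# (subject, predicate, object) triple store and prepend them to the
# LLM prompt. The triples below are illustrative placeholders.
TRIPLES = [
    ("aspirin", "inhibits", "COX-1"),
    ("aspirin", "inhibits", "COX-2"),
    ("COX-2", "involved_in", "inflammation"),
    ("ibuprofen", "inhibits", "COX-2"),
]

def retrieve(entity, triples=TRIPLES):
    """Return all triples that mention the given entity."""
    return [t for t in triples if entity in (t[0], t[2])]

def build_prompt(question, entity):
    """Ground the question by prepending retrieved KG facts."""
    fact_lines = "\n".join(f"- {s} {p} {o}" for s, p, o in retrieve(entity))
    return (
        "Answer using ONLY the facts below; otherwise say 'unknown'.\n"
        f"Facts:\n{fact_lines}\n"
        f"Question: {question}"
    )

prompt = build_prompt("What does aspirin inhibit?", "aspirin")
print(prompt)
```

The grounding instruction ("ONLY the facts below") is the key design choice: it constrains the LLM to the retrieved KG facts rather than its parametric knowledge, which is one way the literature attempts to curb hallucination.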
In this thesis: (a) you will study techniques for combining LLMs and KGs, (b) you will formulate and implement your own LLM+KG system, and (c) you will design and carry out evaluations of your LLM+KG system.
Literature (examples):
Prerequisites: