Data and AI cluster


Master projects

Here you can find all our available master projects.

Open Projects (22)

From Querying to Updating: Towards DML Support for a Metadata-Aware and Annotatable Graph Database

Property graph databases such as Neo4j are widely used these days for applications such as knowledge-graph-backed LLM pipelines and modeling interconnected data, but traditional systems treat metadata (e.g., labels and property keys) separately from data and offer limited support for annotating subgraphs. …

Nick Yakovets

Sepehr Sadoughi

Making the Invisible Visible: Identifying and Linking Internal Knowledge Sources in an Industrial KG

Background: Marel, a global leader in the food processing industry, specializes in designing and manufacturing advanced machinery for processing poultry, meat, and fish. Effective knowledge sharing among engineers at Marel is important for sustaining business operations. Description: This project will explore how internally referenced knowledge sources, such as …

George Fletcher

Sepehr Sadoughi

Knowledge Archeology from Homebrew Data Sources: Integrating Informal Information Sources Into the KG

Background: Marel, a global leader in the food processing industry, specializes in designing and manufacturing advanced machinery for processing poultry, meat, and fish. Effective knowledge sharing among engineers at Marel is important for sustaining business operations. Description: “Homebrew” systems are fragmented knowledge artifacts such as spreadsheets, ad-hoc …

George Fletcher

Sepehr Sadoughi

Your own MSc project in databases at a company

If you have found an MSc project at a company with a strong database angle, then I am open to supervising it. Note: an MSc project isn't an internship, so the project must have a clear, relevant, and challenging research problem. Also, a strong …

Robert Brijder

Maintenance and Real-Time Updating of Deployed Knowledge Graphs

(This project is also available as an internship.) Company: Marel. Location: Boxmeer. Background: Knowledge Graphs have emerged as a powerful tool for representing vast amounts of interconnected data. By structuring data in a graph format, enterprises can uncover relationships and insights that are often hidden in traditional …

Nick Yakovets

Sepehr Sadoughi

Context-aware knowledge retrieval from KGs for technical support thinking assistant

Background: Knowledge Graphs (KGs) are structured representations of knowledge that organize information in a graph-based format, where entities (nodes) and the relationships between them (edges) represent facts in an interconnected network. This graph-based structure enables encoding complex interrelationships and semantic information, making it an …

Nick Yakovets

Sepehr Sadoughi

Implementing the Graph Pattern Matching Language (GPML) Fragment for GQL on AvantGraph

Graph databases have emerged as a powerful contender to traditional relational databases, especially in areas where complex relationships and interconnections are required, such as social networks and knowledge graphs. This has led to the development of various query languages to interact with graph databases, …

Nick Yakovets

Sepehr Sadoughi

Show me the path: do people only care about paths?

Paths in graphs are natural, arising in domains as diverse as social networks (e.g., which people are in the same community?), communication networks (e.g., how does information spread via SMS messages?), and literary networks (e.g., which scientific papers are the most influential, in terms …

George Fletcher

Sourav Bhowmick (NTU Singapore)

A Counterpart of SQL for Matrices

Most commercial databases are relational and use SQL to query the data. Often, however, data is not relational. Indeed, data scientists often deal with matrices instead of relations. A counterpart of SQL for the matrices is therefore needed, and initial progress has been reported …

Robert Brijder

Programming Database Theory: Using a theorem prover to formalize database theory

Proving a theorem is similar to programming: in both cases the solution is a sequence of precise instructions to obtain the output/theorem given the input/assumptions. In fact, there are programming languages such as Lean, Coq, and Isabelle that can be used to prove theorems. …

Robert Brijder

Personalized research project in graph data management

This is a wildcard for projects in (knowledge) graph data management. If you took EDS (Engineering Data Systems) and liked what we did there, we offer research+engineering projects in the scope of our database engine AvantGraph (AvantGraph.io). Topics include (but are not limited to):
- graph query …

Nick Yakovets

Bram van de Wall

Schema language engineering in AvantGraph

Schema languages are critical for data system usability, both in terms of human understanding and in terms of system performance [0]. The property graph data model is part of the upcoming ISO standards around graph data management [4]. Developing a standard schema language for …

George Fletcher

Cardinality estimation for factorized query processing

It is well known that processing complex analytical queries over large graph datasets introduces a major pain point: runtime memory consumption. To address this, a method based on factorized query processing (FQP) has recently been proposed. It has been shown that this method …

Nick Yakovets

Building benchmarks for modern graph databases

There exists a wide variety of benchmarks for graph databases, both synthetic and real-world-based. However, one important problem with the current state of the art in graph database benchmarking is that all of the existing benchmarks are inherently based on workloads from relational databases, …

Nick Yakovets

Achieving main-memory query processing performance on secondary storage on graph query workloads

Since DRAM is still relatively expensive and contemporary graph database workloads operate with billion-node-scale graphs, contemporary graph database engines still have to rely on secondary storage for query processing. In this project, we explore how novel techniques such as variable-page sizes and pointer swizzling can …

Nick Yakovets

Bram van de Wall

Explaining schema conformance for knowledge graphs: conformance reporting for WikiProjects members

Wikidata is an open, collaboratively built knowledge base. In the Wikidata community, groups of editors who share an interest in specific topics form WikiProjects. As part of their regular work, members of WikiProjects would like to regularly test the conformance of entity data in Wikidata against schemas for entity classes. …

George Fletcher

Katherine Thornton, Yale University (USA)

Improving knowledge graph completeness with schemas: Wikidata and ShEx

In the collaboratively built knowledge base Wikidata, some editors would appreciate suggestions on how to improve the completeness of items. Currently, some community members use an existing tool, Recoin, described in this paper, to get suggestions of relevant properties to use to contribute additional statements. This process could …

George Fletcher

Katherine Thornton, Yale University (USA)

SIMD-based JSON data processing in a dynamic Language VM

The JSON data format is one of the most popular human-readable data formats, and is widely used in Web and Data-intensive applications. Unfortunately, reading (i.e., parsing) and processing JSON data is often a performance bottleneck due to the inherent textual nature of JSON. Recent …

Daniele Bonetta

ML-based compiler auto-tuning in GraalVM

Machine-learning based approaches [3] are increasingly used to solve a number of different compiler optimization problems. In this project, we want to explore ML-based techniques in the context of the Graal compiler [1] and its Truffle [2] language implementation framework, to improve the performance …

Daniele Bonetta

Dynamic SQL query compilation in GraalVM

Data processing systems such as Apache Spark [1] rely on runtime code generation [2] to speed up query execution. In this context, code generation typically translates a SQL query to some executable Java code, which is capable of delivering high performance compared to query interpretation. …

Daniele Bonetta

ML-based Profile-guided optimization in GraalVM

Profile-guided optimization (PGO) [1] is a compiler optimization technique that uses profiling data to improve program runtime performance. It relies on the intuition that runtime profiling data from previous executions can be used to drive optimization decisions. Unfortunately, collecting such profile data is expensive, …

Daniele Bonetta

Sea-of-nodes graphs query and visualization

Language Virtual Machines such as V8 or GraalVM [3] use Graphs to represent code. One example Graph representation is the so-called Sea-of-nodes model [1]. Sea-of-nodes graphs of real-world programs have millions of edges, and are typically very hard to query, explore, and analyze. In …

Daniele Bonetta

Finished Projects (43)

A feasibility study on automated database exercise generation with large language models (Jul 2024)

Your lecturers here at the university spend a lot of time creating new exercises for our students, both for weekly assignments and for exams. If you extrapolate this to universities and professional training globally, this is a tremendous effort and use of time. It …

Willem Aerts

George Fletcher

Daphne Miedema

Analyzing progression of question difficulty for SQL questions on Stack Overflow (Jul 2024)

SQL is difficult to use effectively, and its users make many errors. Error types and frequency in SQL have been analyzed by various researchers, such as Ahadi, Prior, Behbood and Lister, and Taipalus and Siponen. One method of problem solving that computer scientists apply is posting …

Bert Wijnhoven

George Fletcher

Daphne Miedema

Execution and Visual Representations of SQL queries in case of syntax errors (Sep 2023)

Query formulation in SQL is difficult for novices, and many errors are made in query formulation. Existing research has focused on registering error types and frequencies. Not much attention has been paid to solving these problems. One of the problems in SQL is with …

Bas Witters

George Fletcher

Daphne Miedema

Assigned Projects (3)

High-Throughput Computational and Data Pipeline for iSCAT Microscopy (Jun 2026)

Efficient database infrastructure for libraries (Nov 2025)

Object-relational mapping for key-value databases (Dec 2024)
