Data and AI cluster


Master projects

Here you can find all our available master projects.

Open Projects (28)

Empirical study of the social life of Knowledge graphs

Company: Marel. Location: Boxmeer. Background: Marel, a global leader in the food processing industry, specializes in designing and manufacturing advanced machinery for processing poultry, meat, and fish. Effective knowledge sharing among engineers at Marel is crucial to support business operations. However, not all knowledge …

George Fletcher

Zeno van Cauter
More info
Empirical study of knowledge evolution through Knowledge Graph changes

Company: Marel. Location: Boxmeer. Background: Marel, a global leader in the food processing industry, specializes in designing and manufacturing advanced machinery for processing poultry, meat, and fish. Effective knowledge sharing among engineers at Marel is important for sustaining business operations. Problem description: In this project …

George Fletcher

Sepehr Sadoughi
More info
Maintenance and Real-Time Updating of Deployed Knowledge Graphs

(This project is also available as an internship.) Company: Marel. Location: Boxmeer. Background: Knowledge Graphs have emerged as a powerful tool for representing vast amounts of interconnected data. By structuring data in a graph format, enterprises can uncover relationships and insights that are often hidden in traditional …

Nick Yakovets

Zeno van Cauter
More info
Context-aware knowledge retrieval from KGs for technical support thinking assistant

Background: Knowledge Graphs (KGs) are structured representations of knowledge that organize information in a graph-based format, where entities (nodes) and the relationships between them (edges) represent facts in an interconnected network. This graph-based structure enables encoding complex interrelationships and semantic information, making it an …

Nick Yakovets

Sepehr Sadoughi
More info
Implementing the Graph Pattern Matching Language (GPML) Fragment for GQL on AvantGraph

Graph databases have emerged as a powerful contender to traditional relational databases, especially in areas where complex relationships and interconnections are required, such as social networks and knowledge graphs. This has led to the development of various query languages to interact with graph databases, …

Nick Yakovets

Sepehr Sadoughi
More info
Object-relational mapping for key-value databases

Object-relational mappers (ORM) like Django allow one to interact with a database in an object-oriented manner, and provide constructs for easy deployment of web-based applications that depend on a database. The underlying database of an ORM is typically a SQL database. It is unclear …

Robert Brijder
More info
Efficient database infrastructure for libraries

Database management systems for libraries (as in, institutions for lending books) need to satisfy a number of specific needs, in particular regarding the types of queries that need to be supported and regarding performance of the queries that are most often executed. In this …

Robert Brijder
More info
Show me the path: do people only care about paths?

Paths in graphs are natural, arising in domains as diverse as social networks (e.g., which people are in the same community?), communication networks (e.g., how does information spread via SMS messages?), and literary networks (e.g., which scientific papers are the most influential, in terms …

George Fletcher

Sourav Bhowmick (NTU Singapore)
More info
A survey on correlations and similarities for spatiotemporal data

Finding pairs of locations that present interesting correlations or similarities (e.g., in their weather, development rate, or population statistics through time) can provide useful insights in different contexts/domains. For example, if a country observes that two different cities have a high similarity on the …

Odysseas Papapetrou
More info
Discovery and maintenance of heavy hitters over sliding windows, in a distributed environment.

Synopses are extensively used for summarizing high-frequency streaming data, e.g., input from sensors, network packets, and financial transactions. Some examples include Count-Min sketches, Bloom filters, AMS sketches, samples, and histograms. This project will focus on designing, developing, and evaluating synopses for the discovery of heavy …

Odysseas Papapetrou
More info
Detection of similarities and correlations in multidimensional time series

Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). Most algorithms consider one-dimensional time series. For example, in the context of finance, the time series might represent the …

Odysseas Papapetrou
More info
Lagged multivariate correlations

Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). The 'industry-standard' correlations are pairwise correlations, i.e., correlations between two variables. Multivariate correlations are correlations between three or more …

Odysseas Papapetrou
More info
A Counterpart of SQL for Matrices and Tensors

Most commercial databases are relational and use SQL to query the data. Often, however, data is not relational. Indeed, data scientists often deal with matrices instead of relations. A counterpart of SQL for the matrices and tensors is therefore needed, and initial progress has …

Robert Brijder
More info
Programming Database Theory: Using a theorem prover to formalize database theory

Proving a theorem is similar to programming: in both cases the solution is a sequence of precise instructions to obtain the output/theorem given the input/assumptions. In fact, there are programming languages such as Lean, Coq, and Isabelle that can be used to prove theorems. …

Robert Brijder
More info
Personalized research project in graph data management

This is a wildcard for projects in (knowledge) graph data management. If you took EDS (Engineering Data Systems) and liked what we did there, we offer research and engineering projects in the scope of our database engine AvantGraph (AvantGraph.io). Topics include (but are not limited to): graph query …

Nick Yakovets

Bram van de Wall
More info
Schema language engineering in AvantGraph

Schema languages are critical for data system usability, both in terms of human understanding and in terms of system performance [0]. The property graph data model is part of the upcoming ISO standards around graph data management [4]. Developing a standard schema language for …

George Fletcher
More info
Cardinality estimation for factorized query processing

It is well-known that processing of complex analytical queries over large graph datasets introduces a major pain point - runtime memory consumption. To address this, recently, a method based on factorized query processing (FQP) has been proposed. It has been shown that this method …

Nick Yakovets
More info
Building benchmarks for modern graph databases

A wide variety of benchmarks is available for graph databases, both synthetic and real-world-based. However, one important problem with the current state of the art in graph database benchmarking is that all of the existing benchmarks are inherently based on workloads from relational databases, …

Nick Yakovets
More info
Achieving main-memory query processing performance on secondary storage on graph query workloads

Since DRAM is still relatively expensive and contemporary graph database workloads operate with billion-node-scale graphs, contemporary graph database engines still have to rely on secondary storage for query processing. In this project, we explore how novel techniques such as variable-page sizes and pointer swizzling can …

Nick Yakovets

Bram van de Wall
More info
Social Media Data Analysis - Twitter Case Study

In this project, we will analyze social media datasets to answer interesting questions about human behavior. We aim to study biases using social media data and propose fair solutions. The project also aims to model human behavior on social media (depending on the topic). This …

George Fletcher

Akrati Saxena
More info
Fairness-aware Community Detection Method

In real-world networks, nodes are organized into communities, and the community size follows a power-law distribution. Put simply, there are a few large communities and many small ones. Several methods have been proposed to identify communities using structural properties of …

George Fletcher

Akrati Saxena
More info
Explaining schema conformance for knowledge graphs: conformance reporting for WikiProjects members

Wikidata is an open, collaboratively built knowledge base. In the Wikidata community, groups of editors who share an interest in specific topics form WikiProjects. As part of their regular work, members of WikiProjects would like to regularly test the conformance of entity data in Wikidata against schemas for entity classes. …

George Fletcher

Katherine Thornton, Yale University (USA)
More info
Improving knowledge graph completeness with schemas: Wikidata and ShEx

In the collaboratively built knowledge base Wikidata, some editors would appreciate suggestions on how to improve the completeness of items. Currently, some community members use an existing tool, Recoin, described in this paper, to get suggestions of relevant properties to use to contribute additional statements. This process could …

George Fletcher

Katherine Thornton, Yale University (USA)
More info
SIMD-based JSON data processing in a dynamic Language VM

The JSON data format is one of the most popular human-readable data formats, and is widely used in Web and Data-intensive applications. Unfortunately, reading (i.e., parsing) and processing JSON data is often a performance bottleneck due to the inherent textual nature of JSON. Recent …

Daniele Bonetta
More info
ML-based compiler auto-tuning in GraalVM

Machine-learning based approaches [3] are increasingly used to solve a number of different compiler optimization problems. In this project, we want to explore ML-based techniques in the context of the Graal compiler [1] and its Truffle [2] language implementation framework, to improve the performance …

Daniele Bonetta
More info
Dynamic SQL query compilation in GraalVM

Data processing systems such as Apache Spark [1] rely on runtime code generation [2] to speed up query execution. In this context, code generation typically translates a SQL query to some executable Java code, which is capable of delivering high performance compared to query interpretation. …

Daniele Bonetta
More info
ML-based Profile-guided optimization in GraalVM

Profile-guided optimization (PGO) [1] is a compiler optimization technique that uses profiling data to improve program runtime performance. It relies on the intuition that runtime profiling data from previous executions can be used to drive optimization decisions. Unfortunately, collecting such profile data is expensive, …

Daniele Bonetta
More info
Sea-of-nodes graphs query and visualization

Language virtual machines such as V8 or GraalVM [3] use graphs to represent code. One example of such a graph representation is the so-called Sea-of-nodes model [1]. Sea-of-nodes graphs of real-world programs have millions of edges, and are typically very hard to query, explore, and analyze. In …

Daniele Bonetta
More info

Assigned Projects

No currently assigned Projects.

Finished Projects (21)

A feasibility study on automated database exercise generation with large language models (Jul 2024)

Your lecturers here at the university spend a lot of time creating new exercises for our students, both for weekly assignments and for exams. If you extrapolate this to universities and professional training globally, this is a tremendous effort and use of time. It …

Willem Aerts

George Fletcher

Daphne Miedema
More info
Analyzing progression of question difficulty for SQL questions on Stack Overflow (Jul 2024)

SQL is difficult to use effectively, and creates many errors. Error types and frequency in SQL have been analyzed by various researchers, such as Ahadi, Prior, Behbood and Lister, and Taipalus and Siponen. One method of problem solving that computer scientists apply is posting …

Bert Wijnhoven

George Fletcher

Daphne Miedema
More info
Execution and Visual Representations of SQL queries in case of syntax errors (Sep 2023)

Query formulation in SQL is difficult for novices, and many errors are made in query formulation. Existing research has focused on registering error types and frequencies. Not much attention has been paid to solving these problems. One of the problems in SQL is with …

Bas Witters

George Fletcher

Daphne Miedema
More info
Multivariate correlations for data cleaning (Jul 2023)

Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). The 'industry-standard' correlations are pairwise correlations, i.e., correlations between two variables. Multivariate correlations are correlations between three or more variables. …

Odysseas Papapetrou
More info
Correlation Detective on streaming data (Jul 2023)

Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). The 'industry-standard' correlations are pairwise correlations, i.e., correlations between two variables. Multivariate correlations are correlations between three or more variables. …

Odysseas Papapetrou
More info
Efficient Granger causality (Jul 2023)

Granger causality is among the standard functions for quantifying causal relationships between time series (e.g., closing prices of stocks). However, naïve computation of Granger causality requires pairwise comparisons between all time series, which comes with quadratic complexity. In this project you will focus on …

Odysseas Papapetrou
More info
