Here you can find all our available master projects.
Company: Datacation / aerovision.ai
Location: Eindhoven (AI Innovation Center at High Tech Campus) or Amsterdam (VU)
Project description: Aerovision.ai is a start-up that is building a no-code A.I. platform for drone companies. With this A.I. platform, companies can train, deploy and evaluate their customized computer vision algorithms, …
Multiple projects possible!
General
We are interested in the following topics:
- theoretical aspects of deep generative modeling, e.g., proposing new models, theoretical analysis (e.g., formulating theorems, proving/showing properties);
- applications of deep generative modeling; however, we must be aware of limited computational resources at the University;
- any data modality is …
Proving a theorem is similar to programming: in both cases the solution is a sequence of precise instructions to obtain the output/theorem given the input/assumptions. In fact, there are programming languages such as Lean, Coq, and Isabelle that can be used to prove theorems. …
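As a tiny illustration of the analogy (a sketch in Lean 4, not taken from any specific project), the "program" below is a proof script whose output is the stated theorem; `Nat.add_comm` is an existing library lemma.

```lean
-- A tiny Lean 4 proof: the proof script is a sequence of precise instructions
-- that turns the assumptions (a b : Nat) into the theorem a + b = b + a.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```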
Introduction: Artificial intelligence (AI) has shown great promise in different domains, including the clinical domain. However, the application of AI models in clinical practice remains limited, mainly due to a lack of model explainability. Clinicians, in general, want to know why an …
--update--: This project is now taken by Davis Eisaks
The goal of this project is to study how to train a machine learning model in a gossip-based approach, where if two devices (e.g. smartwatches) pass each other in physical space, they could exchange part of …
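A rough sketch of the gossip idea, assuming each device's model can be flattened into a NumPy parameter vector; the function name, exchange fraction, and data are invented for illustration only.

```python
# Minimal sketch of one gossip step between two devices: when they meet, they
# exchange and average a random subset of their model parameters.
import numpy as np

def gossip_step(params_a, params_b, fraction=0.25, rng=None):
    """Average a random `fraction` of parameters between two devices."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(params_a.size, size=int(fraction * params_a.size), replace=False)
    avg = (params_a[idx] + params_b[idx]) / 2.0
    params_a, params_b = params_a.copy(), params_b.copy()
    params_a[idx] = avg
    params_b[idx] = avg
    return params_a, params_b

# Example: two "smartwatches" with 10-dimensional models meet and partially synchronize.
a, b = np.random.randn(10), np.random.randn(10)
a, b = gossip_step(a, b)
```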
Node-based BNNs assign latent noise variables to hidden nodes of a neural network. By restricting inference to the node-based latent variables, node stochasticity greatly reduces the dimension of the posterior. This allows for inference of BNNs that are cheap to compute and to communicate, …
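A rough sketch of node stochasticity, assuming PyTorch: one multiplicative latent noise variable per hidden node on top of deterministic weights. This is an illustration of the general idea only, not the project's exact formulation.

```python
# Minimal sketch of a node-based stochastic layer: noise is attached to hidden
# nodes, so the stochastic dimension is d_out rather than d_in * d_out.
import torch
import torch.nn as nn

class NodeStochasticLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)                        # deterministic weights
        self.log_sigma = nn.Parameter(torch.full((d_out,), -2.0))   # per-node noise scale

    def forward(self, x):
        h = self.linear(x)
        # One latent noise variable per hidden node, applied multiplicatively.
        z = 1.0 + torch.exp(self.log_sigma) * torch.randn_like(h)
        return h * z

layer = NodeStochasticLinear(16, 8)
out = layer(torch.randn(4, 16))   # each forward pass is one sample of the node noise
```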
Motivation. Recently, the vision transformer architecture (ViT) has excelled at many tasks in computer vision, such as image recognition [1], image segmentation [2], image retrieval [3], image generation [4], visual object tracking [5] and object detection [6]. However, all these different sub-tasks require domain expertise, such as the type, …
Multi-objective Neural Architecture Search (NAS) aims to automatically design network architectures for both efficiency and accuracy. On the one hand, a task-specific neural architecture achieving considerable accuracy needs to be designed. On the other hand, real-time requirements and energy consumption budgets of the network deployment need to be met …
Multi-objective Neural Architecture Search (NAS) aims to automatically design network architectures for both efficiency and accuracy. On the one hand, a task-specific neural architecture achieving considerable accuracy needs to be designed. On the other hand, real-time requirements and energy consumption budgets of the network deployment need to be met …
Multi-objective Neural Architecture Search (NAS) aims to automatically design network architectures for both efficiency and accuracy. On the one hand, a task-specific neural architecture achieving considerable accuracy needs to be designed. On the other hand, real-time requirements and energy consumption budgets of the network deployment need to be met …
Multi-objective Neural Architecture Search (NAS) aims to automatically design network architectures for both efficiency and accuracy. On the one hand, a task-specific neural architecture achieving considerable accuracy needs to be designed. On the other hand, real-time requirements and energy consumption budgets of the network deployment need to be met …
Multi-objective Neural Architecture Search (NAS) aims to automatically design network architectures for both efficiency and accuracy. On the one hand, a task-specific neural architecture achieving considerable accuracy needs to be designed. On the other hand, real-time requirements and energy consumption budgets of the network deployment need to be met …
Multi-objective Neural Architecture Search (NAS) aims to automatically design network architectures for both efficiency and accuracy. On the one hand, a task-specific neural architecture achieving considerable accuracy needs to be designed. On the other hand, real-time requirements and energy consumption budgets of the network deployment need to be met …
Multi-objective Neural Architecture Search (NAS) aims to automatically design network architectures for both efficiency and accuracy. On the one hand, a task-specific neural architecture achieving considerable accuracy needs to be designed. On the other hand, real-time requirements and energy consumption budgets of the network deployment need to be met …
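The NAS project variants above share the same multi-objective core. As a rough, hedged illustration (not taken from any of the project descriptions), the sketch below keeps only the accuracy/latency Pareto-optimal candidates; all names and numbers are invented.

```python
# Minimal sketch: keep the Pareto front of candidate architectures, trading off
# accuracy (maximize) against latency (minimize). Purely illustrative values.
def pareto_front(candidates):
    """candidates: list of dicts with 'accuracy' and 'latency_ms' keys."""
    front = []
    for c in candidates:
        dominated = any(
            o["accuracy"] >= c["accuracy"] and o["latency_ms"] <= c["latency_ms"]
            and (o["accuracy"] > c["accuracy"] or o["latency_ms"] < c["latency_ms"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

archs = [
    {"name": "A", "accuracy": 0.92, "latency_ms": 40},
    {"name": "B", "accuracy": 0.90, "latency_ms": 15},
    {"name": "C", "accuracy": 0.89, "latency_ms": 30},  # dominated by B
]
print(pareto_front(archs))   # A and B survive; C is dominated
```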
Knowledge Graphs (KGs) are an emerging way of modeling data and knowledge, as opposed to the traditional tabular DBMS. Not only do they allow for structured querying, they also provide possibilities for semantic search and a new way of looking at “related entities”.
Building a …
Current state-of-the-art Natural Language Processing (NLP) models generally follow a two-step process: first the model is pre-trained on a large amount (billions of words) of unlabeled text via self-supervised learning, followed by fine-tuning on supervised data for downstream tasks (e.g. Named-Entity Recognition (NER), …
Relation extraction is a well-researched field in which one tries to find a relationship between two entities in a piece of text. For example, the sentence “In 2002 Elon Musk founded SpaceX” contains the entities [Elon Musk] and [SpaceX], and the relation [foundedBy].
Research …
ASML has recently re-confirmed these two projects; a couple more will likely be confirmed in the coming weeks.
XAI in Exceptional Model Mining (--- update --- this project is taken by Yasemin Yasarol)
In the semiconductor industry there are different, diverse and unique failure modes that impact …
--- update --- These projects are no longer available. Theonymfi Anogeianaki will work on FairML.
1. Bayesian inference
We have been doing ‘traditional’ machine learning for years now at Floryn but never investigated Bayesian modeling. We currently make use of probability measures that come from our (frequentist) machine learning …
The success (and the cost) of a machine learning product or project depends to a great extent on the quality of the available data. If the data has significant flaws, it may make a project much more expensive and much more time-consuming than …
This internal project aims at studying and devising new bounds for the computational complexity of inferences in probabilistic circuits and their robust/credal counterpart, including approximation results and fixed-parameter tractability. It requires mathematical interest and good knowledge of theory of computation. This is a theoretical …
This internal project aims at implementing a new approach to learning the structure and parameters of Bayesian networks. It is mostly an implementation project, as the novel ideas are already established (but never published, so the approach is novel). It requires high expertise in …
This is a wildcard for projects in (knowledge) graph data management.
If you took EDS (Engineering Data Systems) and liked what we did there, we offer research+engineering projects in the scope of our database engine AvantGraph (AvantGraph.io). Topics include (but are not limited to):
- graph query …
Survival analysis is a set of techniques for analyzing the expected duration of time until an event occurs, such as death in biological organisms or failure in mechanical systems (Wikipedia).
Selecting the best approach to survival analysis for a given problem is non-trivial. In this project we would …
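As a rough illustration of one candidate approach among the many that would need to be compared (not the project's prescribed method), here is a Kaplan-Meier baseline using the lifelines package; the toy durations and event indicators are invented.

```python
# Minimal sketch of a baseline survival model with lifelines (assuming it is installed).
from lifelines import KaplanMeierFitter

durations = [5, 6, 6, 2, 4, 4, 3, 8]   # time until event or censoring
observed  = [1, 0, 1, 1, 1, 0, 1, 1]   # 1 = event occurred, 0 = censored

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=observed)
print(kmf.survival_function_)           # estimated S(t); one baseline among many to compare
```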
Selecting the best approach for time series forecasting is non-trivial.
In this project we would like to automate this task, which is quite unexplored territory.
The initial approach could be to extend an AutoML tool (GAMA, https://github.com/openml-labs/gama) with a time series forecasting search space (e.g. all methods …
Neural Architecture Search is about automatically creating the best neural networks for a given task. Many techniques are very expensive, though. The goal here is to make it orders of magnitude more efficient by 'warm starting' from pretrained models on similar tasks we've seen …
Photo-chemistry is a technique where input chemicals are first ionised and then new molecules are synthesised through interactions with photons. The exact amounts of the input chemicals are very important for making such reactions run effectively. Currently, Bayesian Optimization is used to find the optimal mixtures, …
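A rough sketch of one Bayesian Optimization iteration over candidate mixtures, using a Gaussian-process surrogate and an upper-confidence-bound rule. The yield function, mixture grid, and exploration constant are invented for illustration and are not taken from the project.

```python
# Minimal sketch: fit a GP surrogate on mixtures tried so far, then pick the next
# candidate that balances predicted yield and uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def measured_yield(x):                     # stand-in for a real lab experiment
    return -np.sum((x - 0.3) ** 2)

X_tried = np.random.rand(5, 2)             # mixtures tried so far (2 input chemicals)
y_tried = np.array([measured_yield(x) for x in X_tried])

gp = GaussianProcessRegressor().fit(X_tried, y_tried)

candidates = np.random.rand(200, 2)        # candidate mixtures to consider next
mean, std = gp.predict(candidates, return_std=True)
next_mixture = candidates[np.argmax(mean + 2.0 * std)]   # explore/exploit trade-off
print(next_mixture)
```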
There is an infinite number of ways to design a machine learning system, and many careful decisions need to be made based on prior experience. The field of automated machine learning (AutoML) aims to make these decisions in a data-driven, objective, and automated way. …
There is an infinite number of ways to design a machine learning system, and many careful decisions need to be made based on prior experience. The field of automated machine learning (AutoML) aims to make these decisions in a data-driven, objective, and automated way.
In …
There are an infinite number of ways to design a machine learning system, and many careful decisions need to be made based on prior experience. The field of automated machine learning (AutoML) aims to make these decisions in a data-driven, objective, and automated way.
There …
Humans are very efficient learners because we can leverage prior experience when learning new tasks. For instance, a child first learns how to walk, and then efficiently learns how to run (obviously without starting from scratch).
Several areas of machine learning aim to …
Bayesian Optimization is often used in automated machine learning (AutoML) to predict which models to evaluate next. It works by learning a 'surrogate model' that is trained on previously evaluated models and can predict which models are interesting to try next. In all current AutoML …
Autonomous vehicles and robots need 3D information such as depth and pose to traverse paths safely and correctly. Classical methods utilize hand-crafted features that can potentially fail in challenging scenarios, such as those with low texture [1]. Although neural networks can be trained on …
Schema languages are critical for data system usability, both in terms of human understanding and in terms of system performance [0]. The property graph data model is part of the upcoming ISO standards around graph data management [4]. Developing a standard schema language for …
Context of the work: Deep Learning (DL) is a very important machine learning area nowadays and it has proven to be a successful tool for all machine learning paradigms, i.e., supervised learning, unsupervised learning, and reinforcement learning. Still, the scalability of DL models is …
Context of the work: Deep Learning (DL) is a very important machine learning area nowadays and it has proven to be a successful tool for all machine learning paradigms, i.e., supervised learning, unsupervised learning, and reinforcement learning. Still, the scalability of DL models is …
Nowadays, data changes very rapidly. Every day new trends appear on social media with millions of images. New topics rapidly emerge from the huge number of videos uploaded to YouTube. Attention to continual lifelong learning has recently increased to cope with this rapid data …
With the rapid development of multi-media social network platforms, e.g., Instagram and TikTok, more and more content is generated in a multi-modal format rather than pure text. This brings new challenges for researchers to analyze the user-generated content and solve some concrete problems …
Deep neural networks (DNN) deployed in the real world are frequently exposed to non-stationary data distributions and required to sequentially learn multiple tasks. This requires that DNNs acquire new knowledge while retaining previously obtained knowledge. However, continual learning in DNNs, in which networks are …
Every second, around 10^7 to 10^8 bits of information reach the human visual system (HVS) [IK01]. Because biological hardware has limited computational capacity, complete processing of massive sensory information would be impossible. The HVS has therefore developed two mechanisms, foveation and fixation, that preserve perceptual performance …
Every second, around 10^7 to 10^8 bits of information reach the human visual system (HVS) [IK01]. Because biological hardware has limited computational capacity, complete processing of massive sensory information would be impossible. The HVS has therefore developed two mechanisms, foveation and fixation, that preserve perceptual performance …
Every second, around 10^7 to 10^8 bits of information reach the human visual system (HVS) [IK01]. Because biological hardware has limited computational capacity, complete processing of massive sensory information would be impossible. The HVS has therefore developed two mechanisms, foveation and fixation, that preserve perceptual …
Self-supervised learning [1, 2] solves pretext prediction tasks that do not require annotations in order to learn feature representations. Recent empirical research has demonstrated that deeper and wider models benefit more from task-agnostic use of unlabeled data than their smaller counterparts; i.e., smaller models …
It is well-known that processing of complex analytical queries over large graph datasets introduces a major pain point - runtime memory consumption. To address this, recently, a method based on factorized query processing (FQP) has been proposed. It has been shown that this method …
Deep clustering is a well-researched field with promising approaches. Traditional nonconvex clustering methods require the definition of a kernel matrix, whose parameters vastly influence the result, and are hence difficult to specify. In turn, the promise of deep clustering is that a feature transformation …
There exists a wide variety of benchmarks available for graph databases: both synthetic and real-world-based. However, one important problem with the current state of the art in graph database benchmarking is that all of the existing benchmarks are inherently based on workloads from relational databases, …
Introduction
The Observe, Orient, Decide and Act (OODA) loop [1] shapes most modern military warfare doctrines. Typically, after gathering sensor and intelligence data in the Observe step, a common tactical operating picture of the monitored aerial, maritime and/or ground scenario is built and shared among …
This project is particularly suited for physics-oriented students who may already be familiar with the statistical physics of glass formation.
Project Description
In order to metastasize, cancer cells need to move. Estimating the ability of cells to move, i.e. their dynamics, or so-called migration potential, …
Since DRAM is still relatively expensive and contemporary graph database workloads operate with billion-node-scale graphs, contemporary graph database engines still have to rely on secondary storage for query processing. In this project, we explore how novel techniques such as variable-page sizes and pointer swizzling can …
Influence blocking and fake news mitigation have been a main research direction for the network science and data mining research communities in the past few years. Several methods have been proposed in this direction [1]. However, none of the existing solutions has proposed feature-blind …
In this project, we will analyze social media datasets to answer interesting questions about human behavior. We aim to study biases using social media data and propose fair solutions. The project also aims to model human behavior on social media (depending on the topic).
This …
In the past 10-15 years, a massive amount of social networking data has been released publicly and analyzed to better understand complex networks and their different applications. However, ensuring the privacy of the released data has been a primary concern. Most of the graph …
In real-world networks, nodes are organized into communities, and the community size follows a power-law distribution. In simple words, there are a few large communities and many small ones. Several methods have been proposed to identify communities using structural properties of …
Deep neural networks (DNNs) are achieving superior performance in perception tasks; however, they are still riddled with fundamental shortcomings. There are still core questions about what the network is truly learning. DNNs have been shown to rely on local texture information to make decisions, …
Context: The financial sector is a tightly regulated environment. All models used in the financial sector are studied under the microscope of developers, validators, regulators, and eventually the end users – the clients – before these models can be deployed and used.
To assess whether a customer should be …
Reinforcement learning (RL) is a general learning, predicting, and decision-making paradigm and applies broadly in many disciplines, including science, engineering, and the humanities. Classical RL approaches have seen prominent successes in many closed-world problems, such as Atari games, AlphaGo, and robotics. However, dealing …
The goal of this thesis is to develop techniques to generate knowledge graph(s) (KG) by: recognizing and extracting entities and predicates from selected structural parts of transcriptions of our customer contacts - via chat and call - with our Customer Services center; mapping the …
At KPN we collect the transcriptions of our customer contacts with our Customer Services centers executed via the chat and call channels. The structure of such dialogues is made up of a number of classifiable parts, some of which always occur, for example greetings, customer …
Customers reach out to KPN for various purposes, and one of the easiest ways for them is to call customer service. There is already a process for analyzing customer calls to classify the correct cause of the call, but there are some challenges in terms of …
Introduction
Consider the problem of anomaly detection in log data. Log messages are typically human-interpretable text strings providing contextual information regarding the execution of a complex software system. They are often designed to help system maintainers make a proper diagnosis of known system faults …
Neural networks typically consist of a sequence of well-defined computational blocks that are executed one after the other to obtain an inference for an input image. After the neural network has been trained, a static inference graph comprising these computational blocks is executed for …
Granger causality is among the standard functions for quantifying causal relationships between time series (e.g., closing prices of stocks). However, naïve computation of Granger causality requires pairwise comparisons between all time series, which comes with quadratic complexity. In this project you will focus on …
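A minimal sketch of the naive pairwise baseline the project aims to improve upon, assuming statsmodels is available; the data is random and only illustrates the quadratic number of tests.

```python
# Naive O(n^2) Granger analysis: one test per ordered pair of series.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

n_series, T, max_lag = 5, 200, 2
data = np.random.randn(T, n_series)          # columns are time series (e.g., stock prices)

p_values = {}
for i in range(n_series):
    for j in range(n_series):
        if i == j:
            continue
        # Does series j help predict series i? (column order: [target, candidate cause])
        res = grangercausalitytests(data[:, [i, j]], maxlag=max_lag, verbose=False)
        p_values[(j, i)] = res[max_lag][0]["ssr_ftest"][1]

print(len(p_values), "pairwise tests")       # n*(n-1): quadratic in the number of series
```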
Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). The 'industry-standard' correlations are pairwise correlations, i.e., correlations between two variables. Multivariate correlations are correlations between three or more variables. …
Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). The 'industry-standard' correlations are pairwise correlations, i.e., correlations between two variables. Multivariate correlations are correlations between three or more variables. …
Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). The 'industry-standard' correlations are pairwise correlations, i.e., correlations between two variables. Multivariate correlations are correlations between three or more variables. …
Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). The 'industry-standard' correlations are pairwise correlations, i.e., correlations between two variables. Multivariate correlations are correlations between three or more variables. …
Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). The 'industry-standard' correlations are pairwise correlations, i.e., correlations between two variables. Multivariate correlations are correlations between three or more variables. …
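The correlation projects above all start from the same observation. A minimal sketch of why pairwise correlations are not enough, using synthetic data invented purely for illustration:

```python
# Pairwise correlations can all be near zero even when three variables are
# strongly related; a simple three-variable statistic reveals the relationship.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(10_000), rng.standard_normal(10_000)
z = x * y + 0.1 * rng.standard_normal(10_000)   # z depends on x and y only jointly

print(np.corrcoef([x, y, z]).round(2))           # all pairwise correlations ~ 0
print(np.corrcoef(z, x * y)[0, 1].round(2))      # three-variable relation ~ 1
```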
Synopses are extensively used for summarizing high-frequency streaming data, e.g., input from sensors, network packets, financial transactions. Some examples include Count-Min sketches, Bloom filters, AMS sketches, samples, and histograms. This project will focus on designing, developing, and evaluating synopses for the discovery of heavy …
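As a rough illustration of one of the synopses named above, here is a minimal Count-Min sketch for approximate frequency counting of stream items; the width/depth parameters are chosen arbitrarily.

```python
# Minimal Count-Min sketch: a small 2D counter table indexed by several hash rows.
import random

class CountMinSketch:
    def __init__(self, width=2048, depth=4, seed=0):
        rnd = random.Random(seed)
        self.width, self.depth = width, depth
        self.seeds = [rnd.getrandbits(32) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        return hash((self.seeds[row], item)) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Never underestimates; overestimates are bounded with high probability.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

cms = CountMinSketch()
for packet in ["a", "b", "a", "c", "a"]:
    cms.add(packet)
print(cms.estimate("a"))   # >= 3, usually exactly 3
```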
Wikidata is an open, collaboratively built knowledge base. In the Wikidata community, groups of editors who share an interest in specific topics form WikiProjects. As part of their regular work, members of WikiProjects would like to regularly test the conformance of entity data in Wikidata against schemas for entity classes. …
In the collaboratively built knowledge base Wikidata, some editors would appreciate suggestions on how to improve the completeness of items. Currently, some community members use an existing tool, Recoin, described in this paper, to get suggestions of relevant properties to use to contribute additional statements. This process could …
See PDF
Missing values occur in every real-world dataset. Yet, most of our prediction and classification methods are developed for and tested on complete datasets. A few simple missing data methods exist, such as dropping incomplete cases (i.e. listwise deletion, complete case analysis) and mean/mode imputation. …
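A minimal sketch of the two simple baselines mentioned above (listwise deletion and mean imputation), assuming pandas and scikit-learn are available; the toy table is invented.

```python
# Listwise deletion versus mean imputation on a tiny table with missing values.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 31], "income": [30_000, 45_000, np.nan, 52_000]})

complete_cases = df.dropna()                               # listwise deletion: incomplete rows dropped
imputed = pd.DataFrame(
    SimpleImputer(strategy="mean").fit_transform(df),      # mean imputation per column
    columns=df.columns,
)
print(complete_cases, imputed, sep="\n\n")
```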
The JSON data format is one of the most popular human-readable data formats, and is widely used in Web and Data-intensive applications. Unfortunately, reading (i.e., parsing) and processing JSON data is often a performance bottleneck due to the inherent textual nature of JSON. Recent …
Machine-learning based approaches [3] are increasingly used to solve a number of different compiler optimization problems. In this project, we want to explore ML-based techniques in the context of the Graal compiler [1] and its Truffle [2] language implementation framework, to improve the performance …
Data processing systems such as Apache Spark [1] rely on runtime code generation [2] to speedup query execution. In this context, code generation typically translates a SQL query to some executable Java code, which is capable of delivering high performance compared to query interpretation. …
Profile-guided optimization (PGO) [1] is a compiler optimization technique that uses profiling data to improve program runtime performance. It relies on the intuition that runtime profiling data from previous executions can be used to drive optimization decisions. Unfortunately, collecting such profile data is expensive, …
Language Virtual Machines such as V8 or GraalVM [3] use graphs to represent code. One example graph representation is the so-called Sea-of-nodes model [1]. Sea-of-nodes graphs of real-world programs have millions of edges, and are typically very hard to query, explore, and analyze. In …
In the Database group, we like to learn more about students’ understanding of query languages. We often do this through user studies, in which we also ask questions about their prior experience with the language. This prior experience may have a large influence on …
SQL has proven to be difficult for students to use effectively. Various papers have been written on the types and frequencies of SQL errors. However, this does not mean that all errors are equal. Some errors may inhibit query formulation much more than others. …
Project description
This project is concerned with the recognition of symbols of piping and process equipment together with the instrumentation and control devices that appear on piping and instrumentation diagrams (P&ID). Each item on the P&ID is associated with a pipeline. Piping engineers often receive drawings …
Bayesian networks are a popular model in AI. Credal networks are a robust version of Bayesian networks created by replacing the conditional probability mass functions describing the nodes by conditional credal sets (sets of probability mass functions). Next to their nodes, Bayesian networks are …
In anomaly detection, we aim to identify unusual instances in different applications, including malicious user detection in OSNs, fraud detection, and suspicious bank transaction detection. Most of the proposed anomaly detection methods depend on network structure, as specific structural patterns can convey …
Reinforcement learning (RL) is a computational approach to automating goal-directed decision making. In this project, we will use the framework of Markov decision processes. Fairness in reinforcement learning [1] deals with removing bias from the decisions made by the algorithms. Bias or discrimination in …
Reinforcement learning (RL) is a computational approach to automating goal-directed decision making. Reinforcement learning problems use either the framework of multi-armed bandits or Markov decision processes (or their variants). In some cases, RL solutions are sample inefficient and costly. To address this issue, some …
Reinforcement learning (RL) is a computational approach to automating goal-directed decision making using the feedback observed by the learning agent. In this project, we will be using the framework of multi-armed bandits and Markov decision processes. Observational data collected from real-world systems can mostly …
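A minimal sketch of the multi-armed bandit framework mentioned above: an epsilon-greedy learner estimating arm means from its own interaction data. This is a generic textbook baseline with invented reward distributions, not the project's method.

```python
# Epsilon-greedy bandit: explore with small probability, otherwise exploit the
# arm with the highest estimated mean reward.
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.8]                 # unknown to the agent
counts, values = np.zeros(3), np.zeros(3)

for t in range(1000):
    if rng.random() < 0.1:                   # explore with probability epsilon = 0.1
        arm = int(rng.integers(3))
    else:                                    # exploit the current best estimate
        arm = int(np.argmax(values))
    reward = rng.normal(true_means[arm], 0.1)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update

print(values.round(2))                       # estimates should approach the true means
```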
In wind farms, one source of reduction in power generation by the turbines is the reduction of wind speed in the wake downstream of each turbine's rotor. Namely, a turbine downstream in the wind direction of another will effectively experience wind with a reduced …
Recommender Systems (RSs) have emerged as a way to help users find relevant information as online item catalogs increased in size. There is an increasing interest in systems that produce recommendations that are not only relevant, but also diverse [1]. In addition to users, increased …
---UPDATE---: This project is now taken by Jonas Niederle
Nanopore sequencing is a third-generation sequencing method that directly measures long DNA or RNA (Figure 1). The method works by translocating a single DNA strand through a nanopore in which an electric current signal is measured. The …
--update--: This project is now taken by Tijs Teulings
The topic of the project is the simulation of bubbles with deep generative models. Bubbles are a fascinating phenomenon in multiphase flow, and they play an important role in chemical and industrial processes. Bubbles can be simulated well with …
--- UPDATE ---: This project is now taken by Tim van Engeland
Meta-learning (also referred to as learning to learn) is a set of Machine Learning techniques that aim to learn quickly from a few given examples in changing environments [1]. One instantiation of the meta-learning …
In a classification task, some instances are classified more robustly than others. Namely, even with a large modification of the training set, these instances (in the test set) will be assigned to the same class. Other instances are non-robust in the sense that a …
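A rough sketch of how per-instance robustness could be measured, assuming a scikit-learn setup: retrain on bootstrap resamples of the training set and record how often each test instance keeps the same prediction. The dataset, model, and number of repeats are placeholders, not the project's prescribed protocol.

```python
# Per-instance robustness: how stable is each test prediction under training-set perturbation?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
preds = []
for _ in range(20):                                   # 20 perturbed (bootstrap) training sets
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)
    preds.append(LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx]).predict(X_te))

preds = np.array(preds)
agreement = (preds == preds[0]).mean(axis=0)          # fraction of runs agreeing with the first run
print(agreement[:10])                                  # values near 1.0 indicate robust instances
```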
Your lecturers here at the university spend a lot of time creating new exercises for our students, both for weekly assignments and for exams. If you extrapolate this to universities and professional training globally, this is a tremendous effort and use of time. It …
Query formulation in SQL is difficult for novices, and many errors are made in query formulation. Existing research has focused on registering error types and frequencies. Not much attention has been paid to solving these problems. One of the problems in SQL is with …
SQL is difficult to use effectively, and many errors are made when writing queries. Error types and frequencies in SQL have been analyzed by various researchers, such as Ahadi, Prior, Behbood and Lister, and Taipalus and Siponen. One method of problem solving that computer scientists apply is posting …
--- Subproject 1 has been filled. Subproject 2 is still open.
In this project, we work together with the Dutch south-west Early Psoriatic Arthritis Registry (DEPAR), which is a collaboration of 15 medical centers in the Netherlands that aims to investigate which patient characteristics, measurements …
See PDF
See PDF
See PDF
No finished projects are available.