Here you can find all our available master projects.
In this project you will consider the use of synopses (course 2AMD15) for continual learning. You will (a) explore how existing synopses can be used to support continual learning tasks, e.g., to mitigate forgetting (b) develop novel sketches, if needed, (c) prove their properties …
Odysseas Papapetrou
Mykola Pechenizkiy
Databases often act as the backend for visualization -- to safely store the data, and to aggregate/serve it to the visualization layer efficiently, such that it is shown to the user in a way that helps decision making. This connection between the two layers …
Odysseas Papapetrou
Minwise sampling (or MinHash) is a collection of methods that estimate similarity between sets. Most methods assume static data. A new method, designed last year in our group, also works with non-static (i.e., streaming) data, and it can support deletion. This thesis will focus …
Odysseas Papapetrou
In [1] we proposed OmniSketch, the first sketch that supports OLAP-like analytics. In this thesis you will consider either of the two options: (a) distributing OmniSketch such that it works efficiently over large clusters, (b) making it able to handle sliding windows queries, by using …
Odysseas Papapetrou
Wieger Punter
The recent work "Synopses for summarizing spatial data streams" describes a framework that allows any existing synopsis to summarize spatial data. This thesis focuses on further extending this work by replacing the simple regular grid structure that is used now with other, more space …
Odysseas Papapetrou
Wieger Punter
The recent work "Synopses for summarizing spatial data streams" describes a framework that allows any existing synopsis to summarize spatial data. This thesis focuses on further extending this work by rethinking the allocation of space in the spatial sketch. For example, areas in the …
Odysseas Papapetrou
Wieger Punter
Training ML models over big data is a time-consuming and energy-hungry process. Furthermore it requires full access over the data, which is challenging in many use cases, due to the size of the data. The problem is particularly challenging when the data is read …
Odysseas Papapetrou
Mykola Pechenizkiy
Time series data is widely generated and used across various fields, including healthcare, finance, and surveillance. For example, in the stock market, the changes in stock prices throughout the day form a time series. In such contexts, it is often important to perform searches—either …
Odysseas Papapetrou
No currently assigned Projects.
Data ingestion in IoT networks frequently utilizes a software called 'IoT hub'. In this project you will: (a) consider the requirements of a large organization (Naturalis), for ingesting data from their IoT network to their Databricks platform, (b) examine the usefulness of existing IoT …
Odysseas Papapetrou
Correlations are instrumental for our understanding on complex systems. For example, after years of studying scientists know that smoking is correlated to cancer. There are however some more nuanced correlations, which are more difficult to detect. These are called ‘deep correlations’ or ‘high-order correlations’. …
Odysseas Papapetrou
When an ambulance is dispatched to assist a patient, it would be highly beneficial for ambulance personnel to have controlled access to the patient’s medical data stored in their general practitioner’s (GP) database. Currently, such access is not feasible due to both technical and …
Odysseas Papapetrou
The plethora of cheap smart devices (particularly smart phones and smart watches) makes it promising for improved monitoring of home-care patients. In this thesis you will investigate the key involved challenges and study and propose technical solutions, using big data technologies (the contents of …
Odysseas Papapetrou
Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). The 'industry-standard' correlations are pairwise correlations, i.e., correlations between two variables. Multivariate correlations are correlations between three or more …
Odysseas Papapetrou
Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). Most algorithms consider one-dimensional time series. For example, in the context of finance, the time series might represent the …
Odysseas Papapetrou
Synopses are extensively used for summarizing high-frequency streaming data, e.g., input from sensors, network packets, financial transactions. Some examples include Count-Min sketches, Bloom filters, AMS sketches, samples, and histogram. This project will focus on designing, developing, and evaluating synopses for the discovery of heavy …
Odysseas Papapetrou
Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). The 'industry-standard' correlations are pairwise correlations, i.e., correlations between two variables. Multivariate correlations are correlations between three or more variables. …
Odysseas Papapetrou
Correlations are extensively used in all data-intensive disciplines, to identify relations between the data (e.g., relations between stocks, or between medical conditions and genetic factors). The 'industry-standard' correlations are pairwise correlations, i.e., correlations between two variables. Multivariate correlations are correlations between three or more variables. …
Odysseas Papapetrou