
Project: Applications of minwise sampling

Description

Minwise sampling (or MinHash) is a family of methods that estimate the similarity between sets, typically their Jaccard similarity, from small sketches rather than from the full sets. Most of these methods assume static data. A new method, designed last year in our group, also works with dynamic (i.e., streaming) data and supports deletions.
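For readers new to the idea, the following is a minimal sketch of classic, static MinHash in Python. It is not the dynamic method developed in our group. The function names, the use of seeded BLAKE2b as the hash family, and the default of 128 hash functions are illustrative assumptions.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Hypothetical hash family: a seeded 64-bit BLAKE2b digest of the item."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=128):
    """For each of num_hashes hash functions, keep the minimum hash over the set."""
    items = list(items)
    if not items:
        raise ValueError("cannot sketch an empty set")
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of positions where two signatures agree estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)
```

Comparing `estimate_jaccard(minhash_signature(A), minhash_signature(B))` with `exact_jaccard(A, B)` shows the sketch at work. In this static form, deleting an element may remove the item that produced a stored minimum, and the signature then cannot be repaired without rescanning the set, which is why deletions call for a different design.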

This thesis will explore applications where such a dynamic similarity algorithm is valuable (e.g., evolving networks, streaming data, content moderation), adapt the method to the task at hand, and benchmark it against current state-of-the-art alternatives. The project is ideal for students interested in scalable algorithms, experimental evaluation, and bridging theory with impactful applications.

Details
Supervisor: Odysseas Papapetrou
Secondary supervisor: Wieger R. Punter (PhD student)
Interested?
Get in contact