Data and AI cluster

Project: [Red Cross] Low-resource Language Support

Description

This project is offered by The Netherlands Red Cross.

Background

The Qualitative Feedback Analysis (QFA) project provides LLM-based intelligence for the internal information-gathering tools of the Red Cross Federation. The tool supports a set of predefined workflows for classifying and summarizing feedback, as well as free-text prompting. As a project of 510, the data and digital team of the Netherlands Red Cross, it is open source; see https://github.com/rodekruis/qualitative-feedback-analysis

During development, it is important to ensure that the system works well in as many contexts as possible. This means that the quality of the output must be reliable, and that personal data must be stripped from the input before it is handed to the language model, which is hosted off-site.
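As a rough illustration of the stripping step, the sketch below replaces e-mail addresses and phone numbers with placeholders before the text would leave the local environment. It is not taken from the QFA code base: the function name `redact` and both patterns are assumptions made for this example, and real personal data (names, addresses, identifiers) takes many more forms, especially outside English.

```python
import re

# Hypothetical patterns for illustration only; they cover two simple cases.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s()-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


sample = "Mail anna@example.org or call +31 6 1234 5678."
redacted = redact(sample)
print(redacted)
```

Pattern-based redaction of this kind is language-agnostic only for structured identifiers. Detecting names or locations usually depends on language-specific models, which is one reason low-resource languages complicate this step.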

Problem statement

The current implementation focuses on a set of key languages that modern language models generally handle well, such as English, French and Ukrainian. However, because the Red Cross is active in many countries, it also has users who work in low-resource languages. Modern LLMs have been shown to perform worse in these languages, which raises non-trivial design decisions about how the input data should be processed.

The QFA project would benefit greatly from a well-thought-out and tested strategy for integrating low-resource languages into its pipelines. A comprehensive testing framework for validating the quality of potential changes would also make future improvements to the project more reliable. Depending on the interests of the research group or student, the academic focus can be on bias, anonymization, or more technical subjects such as efficiency, among others.
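One possible starting point for such a testing framework is to report quality per language rather than as a single aggregate, so that a regression in one low-resource language is not hidden by strong results elsewhere. The sketch below assumes a hypothetical labeled test set of `(language, text, expected_label)` tuples and a stand-in `classify` function; neither reflects the actual QFA interfaces, and the example data is invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Iterable


def accuracy_by_language(
    examples: Iterable[tuple[str, str, str]],
    classify: Callable[[str], str],
) -> dict[str, float]:
    """Return the fraction of correctly classified examples per language."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for language, text, expected in examples:
        total[language] += 1
        if classify(text) == expected:
            correct[language] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Stand-in classifier (an assumption): a real run would call the pipeline.
def keyword_classifier(text: str) -> str:
    return "complaint" if "late" in text.lower() else "other"


# Invented toy data; "xx" stands for an arbitrary low-resource language.
toy_examples = [
    ("en", "The delivery was late", "complaint"),
    ("en", "Thank you for the help", "other"),
    ("xx", "placeholder complaint text", "complaint"),
    ("xx", "placeholder neutral text", "other"),
]

scores = accuracy_by_language(toy_examples, keyword_classifier)
print(scores)
```

Keeping the classifier as a parameter means the same test set can be run against different processing strategies, for example translating the input first versus prompting in the original language.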

Profile

The research student is preferably finishing a Master's degree. Good Python skills are required; an interest in LLMs is a plus.

Details

Supervisor: Joaquin Vanschoren
Secondary supervisor: Dalton Harmsen
Interested?: Get in contact