Project: Multi-modal Representation Learning and Applications


With the rapid development of multimedia social network platforms such as Instagram and TikTok, more and more content is generated in multi-modal formats rather than as pure text. This poses new challenges for researchers who analyze user-generated content and tackle concrete problems in social networks, especially NLP tasks. Fake news detection and sarcasm detection are representative examples. Traditional methods for these tasks focus only on single-modal data, i.e., text, so they are difficult to extend to the multi-modal setting and cannot make effective detections there.

To overcome this limitation, multi-modal representation learning methods have been proposed to fuse information from the different modalities. Moreover, the information fusion strategy can be designed so that it enhances the representation learned from each modality. Representative methods include attention-based strategies [1] and graph-based strategies [2] [3].
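
As a rough illustration of what an attention-based fusion strategy can look like, the sketch below lets each text token attend over a set of image-region features and concatenates the attended visual context with the token itself. It is a minimal example under stated assumptions, not the method of [1]: the function name cross_attention_fuse, the feature shapes, and the random inputs are all hypothetical, and the learned query/key/value projections that a trained model would normally use are omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention_fuse(text_feats, image_feats):
    """Fuse text and image features with scaled dot-product cross-attention.

    text_feats:  array of shape (n_tokens, d), one row per text token
    image_feats: array of shape (n_regions, d), one row per image region
    Returns a fused vector of shape (2 * d,) and the attention weights.
    """
    d = text_feats.shape[-1]
    # Similarity between every text token and every image region.
    scores = text_feats @ image_feats.T / np.sqrt(d)  # (n_tokens, n_regions)
    # Each token's weights over the image regions sum to 1.
    weights = softmax(scores, axis=-1)
    # Visual context gathered for each token.
    attended = weights @ image_feats  # (n_tokens, d)
    # Concatenate each token with its visual context, then mean-pool.
    fused_tokens = np.concatenate([text_feats, attended], axis=-1)
    return fused_tokens.mean(axis=0), weights


# Hypothetical inputs: 5 text tokens and 3 image regions, both 8-dimensional.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(5, 8))
image_feats = rng.normal(size=(3, 8))

fused, weights = cross_attention_fuse(text_feats, image_feats)
print(fused.shape, weights.shape)
```

In a downstream task such as fake news or sarcasm detection, a vector like fused would typically be passed to a classifier. Here the features are random placeholders that stand in for the outputs of text and image encoders.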

In this project, we would like to either explore a general multi-modal representation learning framework that fuses information more effectively, or focus on one specific downstream task, e.g., fake news detection or sarcasm detection, and improve detection performance using multi-modal learning methods.

Some public benchmark datasets are available [4] [5].



Yulong Pei
Secondary supervisor
Tianjin Huang
