
Project: Multi-modal Representation Learning and Applications


With the rapid development of multimedia social network platforms such as Instagram and TikTok, more and more content is generated in multi-modal formats rather than as pure text. This poses new challenges for researchers who analyze user-generated content and tackle concrete problems in social networks, especially NLP tasks. Fake news detection and sarcasm detection are representative examples. Traditional methods for these tasks focus only on single-modal data, i.e., text, so they are difficult to extend to the multi-modal setting for effective detection.

To overcome this limitation, multi-modal representation learning methods have been proposed to fuse information from the different modalities (e.g., text and images). Moreover, the fusion strategy itself can be designed to enhance the representation learned from each modality. Representative methods include attention-based strategies [1] and graph-based strategies [2] [3].
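As a rough illustration of what attention-based fusion can look like, the sketch below lets each text token attend over a set of image regions using plain scaled dot-product attention, then concatenates the token with its image summary. This is a generic toy example, not the method of [1]: the function names, the choice of text and image as the two modalities, the made-up embeddings, and the absence of learned projection matrices are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_modal_attention(text_tokens, image_regions):
    """Fuse text and image embeddings (hypothetical helper).

    Each text token acts as a query; the image regions act as keys and values.
    Returns one fused vector per token: the token embedding concatenated with
    its attention-weighted summary of the image regions.
    """
    dim = len(text_tokens[0])
    scale = math.sqrt(dim)
    fused = []
    for token in text_tokens:
        # Similarity of this token to every image region, scaled by sqrt(dim).
        weights = softmax([dot(token, region) / scale for region in image_regions])
        # Weighted average of the image regions.
        context = [
            sum(w * region[i] for w, region in zip(weights, image_regions))
            for i in range(dim)
        ]
        fused.append(token + context)  # list concatenation: [text ; image context]
    return fused


# Toy, made-up embeddings: 2 text tokens and 3 image regions, dimension 4.
text_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
image_regions = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
fused = cross_modal_attention(text_tokens, image_regions)
```

In a real model the queries, keys, and values would pass through learned linear projections, and the fused vectors would feed a classifier for the downstream task; graph-based strategies instead build explicit edges between text tokens and image regions.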

In this project, we would like to either explore a general multi-modal representation learning framework that fuses information more effectively, or focus on one specific downstream task, e.g., fake news detection or sarcasm detection, and improve detection performance using multi-modal learning methods.

Some public benchmark datasets are available [4] [5].



[1] Hongliang Pan, Zheng Lin, Peng Fu, Yatao Qi, and Weiping Wang. 2020. Modeling Intra and Inter-modality Incongruity for Multi-Modal Sarcasm Detection. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1383–1392.

[2] Bin Liang, Chenwei Lou, Xiang Li, Min Yang, Lin Gui, Yulan He, Wenjie Pei, and Ruifeng Xu. 2022. Multi-Modal Sarcasm Detection via Cross-Modal Graph Convolutional Network. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1767–1777, Dublin, Ireland. Association for Computational Linguistics.

[3] Bin Liang, Chenwei Lou, Xiang Li, Lin Gui, Min Yang, and Ruifeng Xu. 2021. Multi-modal sarcasm detection with interactive in-modal and cross-modal graphs. In Proceedings of the 29th ACM International Conference on Multimedia, pages 4707–4715.

[4] Kai Nakamura, Sharon Levy, and William Yang Wang. 2020. Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6149–6157, Marseille, France. European Language Resources Association.

[5] Yitao Cai, Huiyu Cai, and Xiaojun Wan. 2019. Multimodal sarcasm detection in Twitter with hierarchical fusion model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2506–2515, Florence, Italy. Association for Computational Linguistics.

Yulong Pei
Secondary supervisor
Tianjin Huang