With the
rapid development of multimedia social network platforms, e.g., Instagram
and TikTok, more and more content is generated in a multi-modal format
rather than as pure text. This brings new challenges for researchers analyzing
user-generated content and solving concrete NLP
tasks in social networks; fake news detection and sarcasm detection
are representative examples. Traditional fake news detection and sarcasm detection
methods focus only on single-modal data, i.e., text, and are therefore difficult
to extend to the multi-modal setting for effective detection.
To overcome
this limitation, multi-modal representation learning methods have been proposed
to fuse information from both modalities. Moreover, one can design the information
fusion strategy to enhance the representation learned from each modality.
Representative methods include attention-based strategies [1] and graph-based strategies
[2, 3].
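To make the attention-based family of fusion strategies concrete, below is a minimal sketch of cross-modal attention in plain numpy: each text token attends over a set of image-region features, and the attended visual context is concatenated onto the token representation. This is an illustrative simplification, not the method of [1]; the function name, the single-head dot-product formulation, and the concatenation step are all assumptions for the sketch, and the input features stand in for outputs of hypothetical text and image encoders.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """Fuse modalities by letting each text token attend over image regions.

    text_feats:  (n_text, d) token embeddings (hypothetical text encoder output)
    image_feats: (n_img, d)  region embeddings (hypothetical image encoder output)
    Returns (n_text, 2*d): each token concatenated with its attended image context.
    """
    d = text_feats.shape[1]
    scores = text_feats @ image_feats.T / np.sqrt(d)   # (n_text, n_img) similarities
    attn = softmax(scores, axis=-1)                    # each row sums to 1
    context = attn @ image_feats                       # (n_text, d) visual context
    return np.concatenate([text_feats, context], axis=-1)

rng = np.random.default_rng(0)
fused = cross_modal_attention(rng.normal(size=(5, 16)), rng.normal(size=(3, 16)))
print(fused.shape)  # (5, 32)
```

In a full model the fused token representations would typically be pooled and passed to a classifier head for the downstream detection task; graph-based strategies [2, 3] instead connect token and region nodes in a graph and propagate information with graph convolutions.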
In this
project, we would like either to explore a general multi-modal representation
learning framework that fuses information more effectively, or to focus on one specific
downstream task, e.g., fake news detection or sarcasm detection, and improve
detection performance using multi-modal learning methods.
Several public
benchmark datasets are available [4, 5].
References:
[1] Hongliang
Pan, Zheng Lin, Peng Fu, Yatao Qi, and Weiping Wang. 2020. Modeling intra and
inter-modality incongruity for multi-modal sarcasm detection. In Findings of the
Association for Computational Linguistics: EMNLP 2020, pages 1383–1392.
[2] Bin Liang, Chenwei Lou, Xiang Li, Min Yang, Lin
Gui, Yulan He, Wenjie Pei, and Ruifeng Xu. 2022. Multi-Modal Sarcasm Detection via Cross-Modal Graph
Convolutional Network. In Proceedings of the 60th Annual
Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1767–1777, Dublin, Ireland. Association for
Computational Linguistics.
[3] Bin Liang, Chenwei Lou, Xiang Li, Lin Gui, Min Yang, and
Ruifeng Xu. 2021. Multi-modal sarcasm detection with interactive in-modal and
cross-modal graphs. In Proceedings of the 29th ACM International Conference on
Multimedia, pages 4707–4715.
[4] Kai Nakamura, Sharon Levy, and William Yang Wang. 2020.
Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News
Detection. In Proceedings of the Twelfth Language Resources and Evaluation
Conference, pages 6149–6157, Marseille, France. European Language Resources
Association.
[5] Yitao Cai, Huiyu Cai, and Xiaojun Wan. 2019. Multimodal sarcasm detection in Twitter with hierarchical fusion model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2506–2515, Florence, Italy. Association for Computational Linguistics.