MixSarc: A Bangla-English Code-Mixed Corpus For Implicit Meaning Identification

Publisher

Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh

Abstract

This thesis focuses on detecting humor, sarcasm, offensiveness, and vulgarity in Bangla-English code-mixed text, an area largely overlooked in existing natural language processing (NLP) research. A novel dataset has been proposed, which will be created by scraping and filtering social media content, followed by manual annotation across four attributes. Two transformer-based approaches were explored at small scale: multi-class and multi-label text classification. The study also proposes future directions, including dataset balancing, comparative evaluation of transformer models and large language models (LLMs), and the introduction of a SarOff Score to better capture sarcasm-offense overlap. By addressing the complexities of code-mixed tone detection, this work advances NLP in low-resource, multilingual settings.

Description

Supervised by Mr. Md Rafid Haque, Lecturer, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2025.
