MixSarc: A Bangla-English Code-Mixed Corpus For Implicit Meaning Identification

Ahmed, Tamim; Alam, Kazi Samin Yasar; Chowdhury, Md Tanbir

Files

Primary 50 Fulltext_ CSE_ MixSarc A Bangla-English Code-Mixed Corpus For Implicit Meaning.pdf (1.02 MB)

50 Turnitin Report_ CSE_200041150_200041119_200041114_PR.pdf (561.05 KB)

Date

2025-10-25

Authors

Ahmed, Tamim

Alam, Kazi Samin Yasar

Chowdhury, Md Tanbir

Publisher

Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh

Abstract

This thesis focuses on detecting humor, sarcasm, offensiveness, and vulgarity in Bangla-English code-mixed text, an area largely overlooked in existing natural language processing (NLP) research. A novel dataset has been proposed, which will be created by scraping and filtering social media content, followed by manual annotation across four attributes. Two transformer-based approaches were explored on a small scale: multi-class and multi-label text classification. The study also proposes future directions, including dataset balancing, comparative evaluation of transformer models and large language models (LLMs), and the introduction of a SarOff Score to better capture sarcasm-offense overlap. By addressing the complexities of code-mixed tone detection, this work advances NLP in low-resource, multilingual settings.

Description

Supervised by Mr. Md Rafid Haque, Lecturer, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2025.

URI

https://repository.iutoic-dhaka.edu/handle/123456789/2637

Collections

2025
