SemEval-2022 Task 4: Effective Data Augmentation Methods for Patronizing Language Detection and Multi-label Classification
We are all patronizing and condescending sometimes. And of course, we are all susceptible to being condescended to and patronized by others. But some groups are, unfortunately, more used to receiving this undervaluing treatment. The so-called vulnerable communities seem to be the perfect target for charity- and pity-driven texts, condescension and patronization in news stories.
Patronizing and Condescending Language (PCL) is often involuntary and unconscious, and the authors using such language are usually trying to help the communities in need by raising awareness, moving the audience to action or standing up for the rights of the under-represented. But PCL can potentially be very harmful, as it feeds stereotypes, routinizes discrimination and leads to greater exclusion.
For more details about the task, see the SemEval-2022 Task 4 description.
This paper presents a combination of data augmentation methods to boost the performance of state-of-the-art transformer-based language models on Patronizing and Condescending Language (PCL) detection and multi-label PCL classification. These tasks are inherently different from sentiment analysis, because positive or negative hidden attitudes in the context are not necessarily considered positive or negative for PCL tasks. Our approach relies on fine-tuning pretrained RoBERTa and GPT3 models, such as the Davinci and Curie engines, with an extra-enriched PCL dataset. We augmented the underrepresented class of the annotated data to achieve competitive results among the top-16 SemEval-2022 participants. Furthermore, we discuss the few-shot learning technique as a way to overcome the limitations of low-resource NLP problems.
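The idea of augmenting the underrepresented class can be illustrated with a minimal sketch. This is not the method used in the paper: the `random_swap` transformation, the function names, and the binary label convention (1 for PCL, 0 otherwise) are all assumptions chosen only to show the general pattern of adding augmented minority-class examples until the classes are balanced.

```python
import random
from collections import Counter


def random_swap(text, rng):
    """Hypothetical augmentation: swap two randomly chosen words.

    A stand-in for whatever augmentation is actually used; it only
    serves to produce a slightly altered copy of the input text.
    """
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def augment_minority(examples, minority_label=1, seed=0):
    """Append augmented minority-class examples until classes are balanced.

    examples: list of (text, label) pairs.
    minority_label: assumed label of the underrepresented class.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in examples)
    minority = [text for text, label in examples if label == minority_label]
    if not minority:
        # Nothing to augment from; return an unchanged copy.
        return list(examples)
    # Number of extra examples needed to match the largest class.
    deficit = max(counts.values()) - counts[minority_label]
    augmented = list(examples)
    for _ in range(deficit):
        source = rng.choice(minority)
        augmented.append((random_swap(source, rng), minority_label))
    return augmented
```

Calling `augment_minority(pairs)` on a list of `(text, label)` pairs returns the original pairs followed by the generated ones, so the result can be passed to any fine-tuning pipeline. A fixed `seed` keeps the output reproducible.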
All source code used to generate the results and figures in the paper is in the code folder.
The calculations and figure generation are all run inside Jupyter notebooks.
The data used in this study is available upon request.
You can download a copy of all the files in this repository by cloning the git repository:
git clone https://github.com/daniel-saeedi/PCL_Detection_SemEval2022.git
Run this command to install dependencies:
pip3 install -r requirements.txt