Transformer-Based Ensemble Model For Arabic Dialectal Sentiment Classification
------------------------------------------------------------------------------
This paper examines a variety of machine learning, deep learning, and transformer-based models for the Arabic dialectal sentiment classification task. For the machine learning models, SVM, NB, DT, and XGBoost are used with the TF-IDF feature extraction method. For the deep learning models, both CNN and BLSTM are employed with AraVec (SG & CBOW), AraBERT, FastText, and MARBERT features. For the transformer-based models, CAMeLBERT, XLM-RoBERTa, and MARBERT, along with their ensemble, are experimented with. For improved performance, all the models are optimized. The findings show that the transformer-based ensemble model outperformed all the other models, both before and after optimization.
-------------------------------------------------------------------------------

Paper abstract:
----------------
Social media platforms like Twitter, Facebook, and Instagram have become crucial avenues for individuals to express their opinions and thoughts, especially in the context of global emergencies. These platforms serve as valuable sources of viewpoints, necessitating analysis for decision-making and understanding societal trends. Automatic opinion mining, or sentiment analysis, is employed to examine people's sentiments towards specific subjects. However, when applied to dialectal Arabic, opinion mining presents a challenging task in natural language processing due to the language's complex semantic and morphological analysis, as well as the presence of multiple dialects. Opinion mining goes by several synonymous names, such as sentiment analysis, emotion mining, review mining, and sentiment classification. This paper focuses on the analysis of tweets from three benchmark datasets: the Arabic Sentiment Tweets Dataset (ASTD), A Twitter-based Benchmark Arabic Sentiment Analysis Dataset (ASAD), and the Tweets Emoji Arabic Dataset (TEAD). The analysis involves experimentation with various comparative models, including machine learning models, deep learning models, transformer-based models, and a transformer-based ensemble model. Features are extracted for both machine learning and deep learning models using techniques such as AraVec, FastText, AraBERT, and Term Frequency-Inverse Document Frequency (TF-IDF). Machine learning models, including Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT), and Extreme Gradient Boosting (XGBoost), are compared with deep learning models such as Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BLSTM) deep neural networks. Additionally, transformer-based models such as CAMeLBERT, XLM-RoBERTa, and MARBERT, along with their ensemble, are experimented with. The results indicate that the proposed transformer-based ensemble model achieved the best performance, with average accuracy, recall, precision, and F1-score of 90.4%, 88%, 87.3%, and 87.7%, respectively.
-------------------------------------------------------------------------------
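To make the headline approach concrete, below is a minimal sketch of a soft-voting transformer ensemble: each fine-tuned model scores a tweet and the class probabilities are averaged. The checkpoint paths, the probability-averaging strategy, and the three-class label order are assumptions for illustration; the actual implementation lives in transformers_ensemble.ipynb.

    # Sketch of a soft-voting ensemble over three fine-tuned transformers.
    # Checkpoint paths and label order are assumptions, not the paper's exact setup.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    CHECKPOINTS = [              # hypothetical local paths to the fine-tuned models
        "finetuned/camelbert",
        "finetuned/xlm-roberta",
        "finetuned/marbert",
    ]
    LABELS = ["negative", "neutral", "positive"]    # assumed label order

    def ensemble_predict(text):
        probs = []
        for ckpt in CHECKPOINTS:
            tokenizer = AutoTokenizer.from_pretrained(ckpt)
            model = AutoModelForSequenceClassification.from_pretrained(ckpt)
            model.eval()
            inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
            with torch.no_grad():
                logits = model(**inputs).logits
            probs.append(torch.softmax(logits, dim=-1))
        avg = torch.stack(probs).mean(dim=0)        # average the class probabilities
        return LABELS[avg.argmax(dim=-1).item()]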
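For comparison, here is a sketch of the classical baseline of TF-IDF features feeding an SVM, as trained in ml_models.ipynb. The vectorizer settings and the placeholder data are assumptions, and LinearSVC stands in for the SVM.

    # Sketch of the TF-IDF + SVM baseline; settings and data are illustrative only.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = ["placeholder cleaned tweet", "another placeholder tweet"]  # toy data
    labels = [0, 1]                   # assumed encoding: 0 = negative, 1 = positive

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(texts, labels)
    print(clf.predict(["a new tweet to classify"]))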
In the "Codes" folder, you will find the following scripts:
------------------------------------------------------------
1- training_using_bert_extracted_featrues.ipynb: file for training the BLSTM and CNN models using AraBERT and MARBERT as feature extractors.
2- camelbert_fine-tuning.ipynb: file for fine-tuning the CAMeLBERT model on the three datasets (ASTD, ASAD, TEAD); a minimal fine-tuning sketch is given at the end of this file.
3- training_using_w2v_features.ipynb: file for training the BLSTM and CNN models using FastText and AraVec (SG & CBOW) as feature extractors.
4- transformers_ensemble.ipynb: file for training the proposed transformer-based ensemble model of CAMeLBERT, XLM-RoBERTa, and MARBERT on the three datasets (ASTD, ASAD, TEAD).
5- CNN_hyperparameter_optimization.ipynb: file for optimizing the CNN model with AraVec (SG) features.
6- BLSTM_hyperparameter_optimization.ipynb: file for optimizing the BLSTM model with FastText features.
7- marbert_fine-tuning.ipynb: file for fine-tuning the MARBERT model on the three datasets (ASTD, ASAD, TEAD).
8- ml_models.ipynb: file for training the SVM, NB, DT, and XGBoost models on the three datasets (ASTD, ASAD, TEAD).
9- ml_models_optimization.ipynb: file for optimizing the SVM, NB, DT, and XGBoost models.
10- preprocessing_steps.ipynb: file for cleaning the datasets (ASTD, ASAD, TEAD).
11- xroberta_fine-tuning.ipynb: file for fine-tuning the XLM-RoBERTa model on the three datasets (ASTD, ASAD, TEAD).
-------------------------------------------------------------------------------

To run the conducted experiments, follow these steps:
--------------------------------------------------------
1- Download the datasets and scripts to your local machine.
2- Install the dependencies.
3- Initially clean the datasets by running the preprocessing_steps.ipynb file (a sketch of typical cleaning steps is given at the end of this file).
4- Run each script separately and record the results.

For any inquiries or assistance regarding our study, please contact Eman Aboelela at eman.aboelela@intellaworld.com.
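As referenced from the script list above, here is a minimal sketch of fine-tuning a single transformer (MARBERT, via its public checkpoint UBC-NLP/MARBERT) with the Hugging Face Trainer API. The hyperparameters, the placeholder data, and the three-class setup are assumptions; the actual settings are in the *_fine-tuning.ipynb notebooks.

    # Sketch of fine-tuning MARBERT for 3-class sentiment classification.
    # Hyperparameters and placeholder data are assumptions, not the paper's values.
    import pandas as pd
    from datasets import Dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    # Placeholder data; in practice, load the cleaned ASTD/ASAD/TEAD splits.
    train_df = pd.DataFrame({"text": ["placeholder tweet"], "label": [0]})

    ckpt = "UBC-NLP/MARBERT"
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=3)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=128)

    train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)

    args = TrainingArguments(output_dir="finetuned/marbert",
                             num_train_epochs=3,
                             per_device_train_batch_size=16,
                             learning_rate=2e-5)
    Trainer(model=model, args=args, train_dataset=train_ds,
            tokenizer=tokenizer).train()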
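And, as referenced from step 3 above, here is a sketch of the kind of cleaning preprocessing_steps.ipynb performs. The exact steps live in the notebook; the operations below (removing URLs, mentions, and hashtags; stripping diacritics; normalizing common letter variants) are common choices assumed for illustration.

    # Sketch of typical Arabic tweet cleaning; the regexes and normalization
    # choices are assumptions, not the notebook's exact steps.
    import re

    DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")        # harakat and tatweel

    def clean_tweet(text):
        text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # drop URLs
        text = re.sub(r"[@#]\w+", " ", text)                 # drop mentions and hashtags
        text = DIACRITICS.sub("", text)                      # strip diacritics/tatweel
        text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # normalize alef variants
        text = text.replace("\u0649", "\u064A")              # alef maqsura -> ya
        return re.sub(r"\s+", " ", text).strip()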