# **Named Entity Recognition (NER) Model for Argumentative Essays**

This repository contains the implementation of a Named Entity Recognition (NER) model for analyzing argumentative essays using transformer-based architectures. The goal of this project is to identify entities related to argument structure in student essays in order to evaluate the quality of their arguments.

DOI: 10.5281/zenodo.14829503

## **Features**

* Dataset: Feedback Prize - Evaluating Student Writing (Kaggle), https://www.kaggle.com/competitions/feedback-prize-effectiveness/data
* Model: `google/bigbird-roberta-base` (Transformer-based)
* Libraries: PyTorch, Transformers (Hugging Face), Scikit-learn
* Key functionalities: data preprocessing, NER model training, evaluation, and inference

---

## **Installation**

To use the code, follow these steps:

### **Prerequisites**

1. **Python**: Ensure you have Python 3.9 or higher installed.
2. **Libraries**: Install the required libraries using the following command:
   `pip install torch transformers scikit-learn pandas numpy tqdm`
3. **GPU Support**: For efficient training, ensure that your system has a compatible GPU and CUDA installed.

---

### **Steps for Implementation**

**Clone the Repository**

Clone this repository to your local system:

`git clone https://github.com/username/ner-argument-essays.git`

`cd ner-argument-essays`

**Prepare the Dataset**

Download the *Feedback Prize - Evaluating Student Writing* dataset from Kaggle (linked under Features above) and place it in the `data/` directory.

**Data Preprocessing**

Run the preprocessing script to clean and prepare the dataset for NER:

`python preprocess_data.py`

**Model Training**

Train the NER model using the prepared dataset:

`python train_model.py --epochs 10 --batch_size 16 --learning_rate 5e-5`

Adjust the hyperparameters (`epochs`, `batch_size`, `learning_rate`) as needed.
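
This README does not show what the preprocessing step produces. As a rough illustration only, token-classification training usually needs one tag per word, and a common way to get there is to turn annotated spans into BIO tags. The sketch below assumes a hypothetical span format of `(first_word_index, last_word_index, discourse_type)` with inclusive indices. The function name `spans_to_bio` and the example labels are illustrative and are not taken from `preprocess_data.py`.

```python
from typing import List, Tuple

# Hypothetical span format: (first_word_index, last_word_index, discourse_type),
# both indices inclusive. This is an assumption, not the repository's format.
Span = Tuple[int, int, str]


def spans_to_bio(num_words: int, spans: List[Span]) -> List[str]:
    """Return one BIO tag per word; words outside every span get "O"."""
    tags = ["O"] * num_words
    for start, end, label in spans:
        if not (0 <= start <= end < num_words):
            raise ValueError(f"span {(start, end)} is outside the essay")
        tags[start] = f"B-{label}"          # first word of the span
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"          # remaining words of the span
    return tags


# Toy essay with two illustrative, non-overlapping spans.
essay = "Schools should start later because students need more sleep".split()
spans = [(0, 3, "Claim"), (5, 8, "Evidence")]
for word, tag in zip(essay, spans_to_bio(len(essay), spans)):
    print(f"{word}\t{tag}")
```

Word-level tags like these would still have to be aligned to the tokenizer's subword pieces before training. That step is omitted here.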
**Validation and Testing**

Evaluate the model's performance on the validation dataset:

`python evaluate_model.py`

This will output metrics such as F1 score, precision, and recall.

**Inference**

Use the trained model to perform inference on new, unlabeled data:

`python infer.py --input_file data/new_essays.txt --output_file results/predictions.json`

---

## **Directory Structure**

`ner-argument-essays/`

* `data/`
  * `train.csv`: Training dataset
  * `test.csv`: Testing dataset
  * `new_essays.txt`: New essays for inference
* `models/`
  * `bigbird-roberta/`: Pre-trained transformer model
* `scripts/`
  * `preprocess_data.py`: Data preprocessing script
  * `train_model.py`: Model training script
  * `evaluate_model.py`: Model evaluation script
  * `infer.py`: Inference script
* `README.md`: Project documentation

---

## **Results**

The model achieves an F1 score of **XX.XX%**, precision of **XX.XX%**, and recall of **XX.XX%** on the validation dataset.

---

## **Limitations**

This implementation is optimized for a specific dataset and may require adjustments for other datasets or tasks. Additionally, GPU support is recommended for training, as training on a CPU may be time-consuming.

---

## **Contributing**

Contributions are welcome! If you find issues or have suggestions, feel free to open an issue or submit a pull request.

---
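
For reference, the precision, recall, and F1 figures mentioned under Validation and Testing can be computed at the entity level along the following lines. This is a minimal sketch for a single tag sequence, assuming BIO-tagged gold and predicted labels and exact span matching. `evaluate_model.py` may compute its metrics differently (for example per token, or with Scikit-learn), and the helper names here are hypothetical.

```python
from typing import List, Set, Tuple

Entity = Tuple[int, int, str]  # (start index, end index inclusive, label)


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect (start, end, label) spans from one BIO tag sequence."""
    entities: Set[Entity] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            entities.add((start, i - 1, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction counts only on an exact span and label match."""
    gold_set, pred_set = extract_entities(gold), extract_entities(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the second predicted span is cut short, so it does not match.
gold = ["B-Claim", "I-Claim", "O", "B-Evidence", "I-Evidence"]
pred = ["B-Claim", "I-Claim", "O", "B-Evidence", "O"]
print(precision_recall_f1(gold, pred))
```

Exact-match scoring is strict: a span that is off by one word counts as both a false positive and a false negative.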