# Machine Learning Model Evaluation for Classification

This Python script evaluates and compares classification algorithms on the dataset in `datacsv/rawexperimentaldata.csv`. It trains multiple models and reports accuracy, confusion matrices, cross-validation scores, and ROC curves to identify the most accurate classifier for the task.

## 📁 Project Structure

```
├── Source Code Updated.py
├── datacsv/
│   └── rawexperimentaldata.csv
├── README.md
```

---

## 🚀 Features

- Loads and preprocesses a large CSV dataset in chunks
- Applies multiple classification algorithms including:
  - KNN, SVM, Logistic Regression, Decision Tree, Random Forest, XGBoost, AdaBoost, Extra Trees, LDA, Naive Bayes, MLP, and Stacking
- Scales features before training
- Evaluates performance using:
  - Accuracy
  - Confusion Matrix
  - Cross-validation
  - ROC Curve
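
The model list above can be sketched as a dictionary of scikit-learn estimators. This is a minimal sketch, not the script's actual code. It assumes `StandardScaler` for the scaling step and default hyperparameters. It also omits XGBoost so that it runs with scikit-learn alone; `xgboost.XGBClassifier` could be added the same way. `build_models` is a hypothetical helper name.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=0):
    """Hypothetical helper: map a display name to an unfitted estimator.

    Assumption: StandardScaler is the scaler. Distance- and gradient-based
    models get it as a pipeline step; tree-based models, LDA, and Gaussian
    Naive Bayes are left unscaled in this sketch. XGBoost is omitted.
    """

    def scaled(estimator):
        # The scaler is fitted only on whatever data the pipeline is fitted on.
        return make_pipeline(StandardScaler(), estimator)

    return {
        "KNN": scaled(KNeighborsClassifier()),
        # probability=True enables predict_proba, which ROC curves need.
        "SVM": scaled(SVC(probability=True, random_state=random_state)),
        "Logistic Regression": scaled(LogisticRegression(max_iter=1000)),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "AdaBoost": AdaBoostClassifier(random_state=random_state),
        "Extra Trees": ExtraTreesClassifier(random_state=random_state),
        "LDA": LinearDiscriminantAnalysis(),
        "Naive Bayes": GaussianNB(),
        "MLP": scaled(MLPClassifier(max_iter=500, random_state=random_state)),
    }


models = build_models()
print(sorted(models))
```

Putting the scaler inside a pipeline means it is refitted on the training portion of each split. This keeps test-fold statistics from leaking into training during cross-validation.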

---

## 📦 Requirements

Install the required Python libraries:

```bash
pip install pandas numpy matplotlib seaborn scikit-learn xgboost
```

---

## 🧠 Usage

### 1. Prepare Your Dataset

Ensure that your CSV file is located at:
```
datacsv/rawexperimentaldata.csv
```
- Must use a semicolon (`;`) as the delimiter
- Must include the contiguous range of feature columns from `add-double` through `tableswitch`, plus a `Class` label column
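
A minimal sketch of loading a file in this format, assuming pandas' `chunksize` option is how the script chunks the CSV. The inline sample data, the integer labels, and the middle column name `example-op` are hypothetical. The real file has many more columns between `add-double` and `tableswitch`.

```python
import io

import pandas as pd

# Hypothetical stand-in for datacsv/rawexperimentaldata.csv: ';'-delimited,
# features from `add-double` to `tableswitch`, then a `Class` label.
# `example-op` and the 0/1 labels are invented for illustration.
SAMPLE_CSV = """add-double;example-op;tableswitch;Class
1;0;3;0
0;2;1;1
4;1;0;0
2;2;2;1
"""


def load_dataset(source, chunksize=2):
    """Read a ';'-delimited CSV in chunks and concatenate them into one DataFrame."""
    chunks = pd.read_csv(source, sep=";", chunksize=chunksize)
    return pd.concat(chunks, ignore_index=True)


def split_features_labels(df):
    """Select the contiguous feature range and the `Class` target column."""
    # .loc label slicing includes both endpoints, so `tableswitch` is kept.
    X = df.loc[:, "add-double":"tableswitch"]
    y = df["Class"]
    return X, y


df = load_dataset(io.StringIO(SAMPLE_CSV))
X, y = split_features_labels(df)
print(X.shape, y.value_counts().to_dict())
```

Concatenating every chunk still holds the full dataset in memory. Chunking pays off mainly when each chunk is filtered or downcast before it is concatenated.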

### 2. Run the Script

```bash
python "Source Code Updated.py"
```

### 3. Output

- Prints accuracy and confusion matrices for each model
- Displays confusion matrix heatmaps
- Compares models using cross-validation with boxplots
- Plots ROC curves for trained models
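
The per-model outputs listed above (accuracy, confusion matrix, and ROC data) can be sketched as follows. This is a minimal illustration, not the script's code. It assumes a binary `Class` label, since a multiclass target would need one-vs-rest ROC curves, and it uses synthetic data from `make_classification` in place of the real CSV. Plotting is left out. `seaborn.heatmap(cm, annot=True, fmt="d")` and `matplotlib.pyplot.plot(fpr, tpr)` would draw the heatmap and the ROC curve from these values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic binary data stands in for the real dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Column 1 of predict_proba is the probability of the positive class (label 1).
    y_score = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, y_score)
    acc = accuracy_score(y_test, y_pred)
    auc = roc_auc_score(y_test, y_score)
    cm = confusion_matrix(y_test, y_pred)
    results[name] = {"accuracy": acc, "confusion": cm, "auc": auc, "fpr": fpr, "tpr": tpr}
    print(f"{name}: accuracy={acc:.3f}, ROC AUC={auc:.3f}")
    print(cm)
```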

---

## 📊 Model Evaluation

Models are evaluated in four phases:

1. **Train/Test Accuracy and Confusion Matrix**
2. **Hyperparameter Tuning**
3. **Cross-Validation Scores**
4. **ROC Curve Plots**
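
Phases 2 and 3 can be sketched with `GridSearchCV` and `cross_val_score`. The README does not say which parameters the script tunes, so the KNN `n_neighbors` grid below is hypothetical. The data is again synthetic. Scoring the tuned model on the same folds used for tuning gives a somewhat optimistic estimate; nested cross-validation would avoid that bias.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data in place of the real CSV.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Phase 2 (sketch): tune KNN's n_neighbors over a hypothetical grid.
knn = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
grid = GridSearchCV(knn, param_grid={"knn__n_neighbors": [3, 5, 7, 9]}, cv=cv, scoring="accuracy")
grid.fit(X, y)
print("best params:", grid.best_params_)

# Phase 3 (sketch): per-fold accuracy for each candidate model.
candidates = {
    "KNN (tuned)": grid.best_estimator_,
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
cv_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    for name, model in candidates.items()
}
for name, scores in cv_scores.items():
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

The per-fold arrays in `cv_scores` are what a boxplot comparison would be drawn from, for example by passing `list(cv_scores.values())` to matplotlib's `boxplot`.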

A stacking classifier is also trained to combine the predictions of several base classifiers into a single ensemble.
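
A stacking ensemble can be sketched with scikit-learn's `StackingClassifier`. The base learners (KNN, Random Forest, Gaussian Naive Bayes) and the logistic-regression meta-learner are assumptions for illustration, because the README does not say which models the script stacks. The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data in place of the real CSV.
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Hypothetical choice of base learners.
base_learners = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
    ("nb", GaussianNB()),
]

# With cv=5, the meta-learner is fitted on out-of-fold predictions from the
# base learners; the base learners themselves are then refitted on all of X_train.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"stacking accuracy: {accuracy:.3f}")
```

Training the meta-learner on out-of-fold predictions keeps it from simply learning to trust base models that have memorized the training set.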