# Machine Learning Model Evaluation for Classification

This Python script evaluates and compares several classification algorithms on a dataset loaded from `rawexperimentaldata.csv`. It trains multiple models and uses visualizations and performance metrics to identify the most accurate classifier for the task.

## 📁 Project Structure

```
├── Source Code Updated.py
├── datacsv/
│   └── rawexperimentaldata.csv
└── README.md
```

---

## 🚀 Features

- Loads and preprocesses a large CSV dataset in chunks
- Applies multiple classification algorithms, including:
  - KNN, SVM, Logistic Regression, Decision Tree, Random Forest, XGBoost, AdaBoost, Extra Trees, LDA, Naive Bayes, MLP, and Stacking
- Performs data scaling
- Evaluates performance using:
  - Accuracy
  - Confusion Matrix
  - Cross-validation
  - ROC Curve

---

## 📦 Requirements

Install the required Python libraries:

```bash
pip install pandas numpy matplotlib seaborn scikit-learn xgboost
```

## 🧠 Usage

### 1. Prepare Your Dataset

Ensure that your CSV file is located at:

```
datacsv/rawexperimentaldata.csv
```

- Must use `;` as the delimiter
- Must include the feature columns from `add-double` to `tableswitch`, and a `Class` column

### 2. Run the Script

```bash
python "Source Code Updated.py"
```

### 3. Output

- Prints the accuracy and confusion matrix for each model
- Displays confusion matrix heatmaps
- Compares models using cross-validation, shown as boxplots
- Plots ROC curves for the trained models

---

## 📊 Model Evaluation

Models are evaluated in four phases:

1. **Basic Training/Test Accuracy + Confusion Matrix**
2. **Hyper-parameter tuning**
3. **Cross-Validation Scores**
4. **ROC Curve Plots**

A stacking classifier is also used to ensemble multiple base classifiers.
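
The evaluation flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `Source Code Updated.py`: it uses synthetic data from `make_classification` in place of `rawexperimentaldata.csv`, and the base classifiers, hyper-parameters, split ratio, and fold count are all assumptions chosen to keep the example small. Hyper-parameter tuning and plotting are left out.

```python
# Illustrative sketch only: synthetic data stands in for rawexperimentaldata.csv,
# and every model choice and parameter below is an assumption for the example.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification problem (placeholder for the real dataset).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stacking ensemble: base classifiers feed a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Scaling lives inside the pipeline so it is re-fit on each training fold.
model = make_pipeline(StandardScaler(), stack)

# Test-set accuracy and confusion matrix.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

# Cross-validation scores on the training split.
cv_scores = cross_val_score(model, X_train, y_train, cv=3, scoring="accuracy")

# Area under the ROC curve from predicted probabilities of the positive class.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"Test accuracy: {accuracy:.3f}")
print("Confusion matrix:")
print(cm)
print(f"CV accuracy: mean={cv_scores.mean():.3f}, std={cv_scores.std():.3f}")
print(f"ROC AUC: {auc:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling statistics from leaking out of the held-out folds during cross-validation, which is one reasonable way to combine the scaling and cross-validation steps listed in the features.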