# Image Classification with Vision Transformers and k-Fold Cross-Validation

This project uses Vision Transformers (ViT) for image classification with k-fold cross-validation. The provided Python scripts allow for both model training and evaluation using Hugging Face’s `transformers` and `datasets` libraries. Results include validation accuracy across folds and visualization of confusion matrices.

## Contents

- [Overview](#overview)
- [Setup](#setup)
- [Scripts](#scripts)
  - [1. Model Training](#1-model-training)
  - [2. Evaluation and Confusion Matrix Generation](#2-evaluation-and-confusion-matrix-generation)
- [Usage](#usage)
- [Results](#results)
- [License](#license)

## Overview

These scripts perform image classification using a Vision Transformer model with cross-validation. The `train.py` script trains the model on multiple folds, while the `generate_confusion_matrix.py` script handles model evaluation and confusion matrix visualization for each fold. Together they provide a comprehensive performance assessment, with metrics such as accuracy, precision, recall, and F1 score.

## Setup

### Requirements

- Python 3.7 or higher
- PyTorch
- Hugging Face Transformers
- Datasets
- scikit-learn
- Matplotlib
- Seaborn

Install the dependencies with:

```bash
pip install torch transformers datasets scikit-learn matplotlib seaborn
```

### Directory Structure

Ensure the data is structured as follows for each fold:

```
dataset_directory/
  fold1/
    train/
    test/
  fold2/
    train/
    test/
  ...
  foldN/
    train/
    test/
```

## Scripts

### 1. Model Training (`train.py`)

This script trains the model over multiple folds. Each fold's model is saved for later evaluation, and validation accuracy is recorded for performance analysis.

Parameters:

- `model_name`: Name of the model on the Hugging Face Model Hub (e.g., `"google/vit-base-patch16-224"`).
- `output_dir`: Directory to save the model and logs for each fold.
- `batch_size`: Batch size for training and evaluation.
- `num_epochs`: Number of epochs per fold.
- `num_folds`: Total number of folds.
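A minimal sketch of what such a per-fold training loop can look like is shown below. This is an illustration, not the project's actual script: the helper names (`train_all_folds`, `compute_metrics`) and default parameter values are hypothetical, and some `TrainingArguments` option names vary slightly across `transformers` versions.

```python
# Sketch of per-fold ViT training with Hugging Face Trainer.
# Hypothetical defaults; adapt paths and hyperparameters to your setup.
import numpy as np

def compute_metrics(eval_pred):
    """Compute accuracy from logits and labels, as called by Trainer."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

def train_all_folds(model_name="google/vit-base-patch16-224",
                    data_dir="dataset_directory", output_dir="output",
                    batch_size=16, num_epochs=5, num_folds=5):
    # Heavy imports kept local so the metric helper is importable on its own.
    from datasets import load_dataset
    from transformers import (AutoFeatureExtractor,
                              AutoModelForImageClassification,
                              Trainer, TrainingArguments)

    extractor = AutoFeatureExtractor.from_pretrained(model_name)
    accuracies = []
    for fold in range(1, num_folds + 1):
        # Each foldN/ directory holds train/ and test/ image folders.
        ds = load_dataset("imagefolder", data_dir=f"{data_dir}/fold{fold}")

        def preprocess(batch):
            inputs = extractor([img.convert("RGB") for img in batch["image"]],
                               return_tensors="pt")
            inputs["labels"] = batch["label"]
            return inputs

        ds = ds.with_transform(preprocess)
        model = AutoModelForImageClassification.from_pretrained(
            model_name,
            num_labels=len(ds["train"].features["label"].names),
            ignore_mismatched_sizes=True)  # replace the pretrained head
        args = TrainingArguments(
            output_dir=f"{output_dir}/fold{fold}",
            per_device_train_batch_size=batch_size,
            per_device_eval_batch_size=batch_size,
            num_train_epochs=num_epochs,
            evaluation_strategy="epoch",  # "eval_strategy" in newer versions
            save_strategy="epoch",
            load_best_model_at_end=True,
            metric_for_best_model="accuracy")
        trainer = Trainer(model=model, args=args,
                          train_dataset=ds["train"],
                          eval_dataset=ds["test"],
                          compute_metrics=compute_metrics)
        trainer.train()
        accuracies.append(trainer.evaluate()["eval_accuracy"])
        trainer.save_model(f"{output_dir}/fold{fold}/best")
    print(f"Average validation accuracy: {sum(accuracies) / len(accuracies):.4f}")
    return accuracies
```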
Usage:

```bash
python train.py
```

This script:

- Loads the dataset for each fold and applies preprocessing using `AutoFeatureExtractor`.
- Trains the model with `Trainer` and saves the best model for each fold.
- Stores validation accuracies to calculate the average across all folds.

### 2. Evaluation and Confusion Matrix Generation (`generate_confusion_matrix.py`)

After training the models with `train.py`, this script loads the saved model for each fold, runs inference on the test set, calculates performance metrics, and plots the confusion matrix.

Usage:

```bash
python generate_confusion_matrix.py
```

This script:

- Loads the saved model from `train.py` for each fold.
- Makes predictions and calculates accuracy, precision, recall, and F1 score.
- Generates and displays a confusion matrix for each fold.

## Usage

1. Run `train.py` to train the model on all folds. After training, models and logs are saved in the specified output directory.
2. Use `generate_confusion_matrix.py` to evaluate each fold’s model and visualize the confusion matrices.

The results include:

- Validation accuracy across all folds, reported by `train.py`.
- A confusion matrix for each fold, providing a clear view of the model's per-class performance, produced by `generate_confusion_matrix.py`.

## Results

The average validation accuracy across folds is displayed after training with `train.py`. The evaluation script `generate_confusion_matrix.py` prints and plots metrics and confusion matrices to assess the model’s performance on each fold.

## License

This project is licensed under the MIT License. See the `LICENSE` file for more details.
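For reference, the per-fold evaluation described above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual `generate_confusion_matrix.py`: the helper names (`summarize_fold`, `evaluate_fold`) and the saved-model path layout are assumptions.

```python
# Sketch of per-fold evaluation: metrics plus a seaborn confusion-matrix plot.
# Paths such as output/foldN/best are hypothetical.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

def summarize_fold(y_true, y_pred):
    """Return accuracy, macro precision/recall/F1, and the confusion matrix."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision, "recall": recall, "f1": f1,
            "confusion_matrix": confusion_matrix(y_true, y_pred)}

def evaluate_fold(fold, data_dir="dataset_directory", output_dir="output"):
    # Heavy imports kept local so the metric helper is importable on its own.
    import torch
    import matplotlib.pyplot as plt
    import seaborn as sns
    from datasets import load_dataset
    from transformers import AutoFeatureExtractor, AutoModelForImageClassification

    model_dir = f"{output_dir}/fold{fold}/best"
    extractor = AutoFeatureExtractor.from_pretrained(model_dir)
    model = AutoModelForImageClassification.from_pretrained(model_dir).eval()

    test_ds = load_dataset("imagefolder",
                           data_dir=f"{data_dir}/fold{fold}")["test"]
    y_true, y_pred = [], []
    with torch.no_grad():
        for example in test_ds:
            inputs = extractor(example["image"].convert("RGB"),
                               return_tensors="pt")
            logits = model(**inputs).logits
            y_pred.append(int(logits.argmax(-1)))
            y_true.append(example["label"])

    results = summarize_fold(y_true, y_pred)
    print(f"Fold {fold}: " + ", ".join(
        f"{k}={v:.4f}" for k, v in results.items()
        if k != "confusion_matrix"))
    sns.heatmap(results["confusion_matrix"], annot=True, fmt="d", cmap="Blues")
    plt.title(f"Confusion Matrix - Fold {fold}")
    plt.xlabel("Predicted")
    plt.ylabel("True")
    plt.show()
    return results
```

Computing the metrics in a standalone helper (`summarize_fold`) keeps the numeric summary separate from model loading and plotting, which makes it easy to aggregate results across folds.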