# Image Classification with Vision Transformers and k-Fold Cross-Validation

This project uses Vision Transformers (ViT) for image classification with k-fold cross-validation. The provided Python scripts allow for both model training and evaluation using Hugging Face’s `transformers` and `datasets` libraries. Results include validation accuracy across folds and visualization of confusion matrices.

## Contents

- [Overview](#overview)
- [Setup](#setup)
- [Scripts](#scripts)
  - [1. Model Training](#1-model-training)
  - [2. Evaluation and Confusion Matrix Generation](#2-evaluation-and-confusion-matrix-generation)
- [Usage](#usage)
- [Results](#results)
- [License](#license)

## Overview

These scripts perform image classification using a Vision Transformer model with cross-validation. The `train.py` script trains the model on multiple folds, while the `generate_confusion_matrix.py` script handles model evaluation and confusion matrix visualization for each fold. Together they provide a comprehensive performance assessment, with metrics such as accuracy, precision, recall, and F1 score.

## Setup

### Requirements

- Python 3.7 or higher
- PyTorch
- Hugging Face Transformers
- Datasets
- scikit-learn
- Matplotlib
- Seaborn

Install the dependencies with:

```bash
pip install torch transformers datasets scikit-learn matplotlib seaborn
```

### Directory Structure

Ensure the data is structured as follows for each fold:

```
dataset_directory/
  fold1/
    train/
    test/
  fold2/
    train/
    test/
  ...
  foldN/
    train/
    test/
```

## Scripts

### 1. Model Training (`train.py`)

This script trains the model over multiple folds. Each fold's model is saved for later evaluation, and validation accuracy is recorded for performance analysis.

Parameters:

- `model_name`: Name of the model on the Hugging Face Model Hub (e.g., `"google/vit-base-patch16-224"`).
- `output_dir`: Directory to save the model and logs for each fold.
- `batch_size`: Batch size for training and evaluation.
- `num_epochs`: Number of epochs per fold.
- `num_folds`: Total number of folds.
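A minimal sketch of what such a per-fold training loop can look like is shown below. This is an illustration, not the project's actual script: the helper names (`train_all_folds`, `compute_metrics`) and default parameter values are hypothetical, and some `TrainingArguments` option names vary slightly across `transformers` versions.

```python
# Sketch of per-fold ViT training with Hugging Face Trainer.
# Hypothetical defaults; adapt paths and hyperparameters to your setup.
import numpy as np

def compute_metrics(eval_pred):
    """Compute accuracy from logits and labels, as called by Trainer."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

def train_all_folds(model_name="google/vit-base-patch16-224",
                    data_dir="dataset_directory", output_dir="output",
                    batch_size=16, num_epochs=5, num_folds=5):
    # Heavy imports kept local so the metric helper is importable on its own.
    from datasets import load_dataset
    from transformers import (AutoFeatureExtractor,
                              AutoModelForImageClassification,
                              Trainer, TrainingArguments)

    extractor = AutoFeatureExtractor.from_pretrained(model_name)
    accuracies = []
    for fold in range(1, num_folds + 1):
        # Each foldN/ directory holds train/ and test/ image folders.
        ds = load_dataset("imagefolder", data_dir=f"{data_dir}/fold{fold}")

        def preprocess(batch):
            inputs = extractor([img.convert("RGB") for img in batch["image"]],
                               return_tensors="pt")
            inputs["labels"] = batch["label"]
            return inputs

        ds = ds.with_transform(preprocess)
        model = AutoModelForImageClassification.from_pretrained(
            model_name,
            num_labels=len(ds["train"].features["label"].names),
            ignore_mismatched_sizes=True)  # replace the pretrained head
        args = TrainingArguments(
            output_dir=f"{output_dir}/fold{fold}",
            per_device_train_batch_size=batch_size,
            per_device_eval_batch_size=batch_size,
            num_train_epochs=num_epochs,
            evaluation_strategy="epoch",  # "eval_strategy" in newer versions
            save_strategy="epoch",
            load_best_model_at_end=True,
            metric_for_best_model="accuracy")
        trainer = Trainer(model=model, args=args,
                          train_dataset=ds["train"],
                          eval_dataset=ds["test"],
                          compute_metrics=compute_metrics)
        trainer.train()
        accuracies.append(trainer.evaluate()["eval_accuracy"])
        trainer.save_model(f"{output_dir}/fold{fold}/best")
    print(f"Average validation accuracy: {sum(accuracies) / len(accuracies):.4f}")
    return accuracies
```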
Usage:

```bash
python train.py
```

This script:

- Loads the dataset for each fold and applies preprocessing using `AutoFeatureExtractor`.
- Trains the model with `Trainer` and saves the best model for each fold.
- Stores validation accuracies to calculate the average across all folds.

### 2. Evaluation and Confusion Matrix Generation (`generate_confusion_matrix.py`)

After training the models with `train.py`, this script loads the saved model for each fold, runs inference on the test set, calculates performance metrics, and plots the confusion matrix.

Usage:

```bash
python generate_confusion_matrix.py
```

This script:

- Loads the saved model from `train.py` for each fold.
- Makes predictions and calculates accuracy, precision, recall, and F1 score.
- Generates and displays a confusion matrix for each fold.

## Usage

1. Run `train.py` to train the model on all folds. After training, models and logs are saved in the specified output directory.
2. Use `generate_confusion_matrix.py` to evaluate each fold’s model and visualize the confusion matrices.

The results include:

- Validation accuracy across all folds, reported by `train.py`.
- A confusion matrix for each fold, providing a clear view of the model's per-class performance, produced by `generate_confusion_matrix.py`.

## Results

The average validation accuracy across folds is displayed after training with `train.py`. The evaluation script `generate_confusion_matrix.py` prints and plots metrics and confusion matrices to assess the model’s performance on each fold.

## License

This project is licensed under the MIT License. See the `LICENSE` file for more details.
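For reference, the per-fold evaluation described above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual `generate_confusion_matrix.py`: the helper names (`summarize_fold`, `evaluate_fold`) and the saved-model path layout are assumptions.

```python
# Sketch of per-fold evaluation: metrics plus a seaborn confusion-matrix plot.
# Paths such as output/foldN/best are hypothetical.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

def summarize_fold(y_true, y_pred):
    """Return accuracy, macro precision/recall/F1, and the confusion matrix."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": precision, "recall": recall, "f1": f1,
            "confusion_matrix": confusion_matrix(y_true, y_pred)}

def evaluate_fold(fold, data_dir="dataset_directory", output_dir="output"):
    # Heavy imports kept local so the metric helper is importable on its own.
    import torch
    import matplotlib.pyplot as plt
    import seaborn as sns
    from datasets import load_dataset
    from transformers import AutoFeatureExtractor, AutoModelForImageClassification

    model_dir = f"{output_dir}/fold{fold}/best"
    extractor = AutoFeatureExtractor.from_pretrained(model_dir)
    model = AutoModelForImageClassification.from_pretrained(model_dir).eval()

    test_ds = load_dataset("imagefolder",
                           data_dir=f"{data_dir}/fold{fold}")["test"]
    y_true, y_pred = [], []
    with torch.no_grad():
        for example in test_ds:
            inputs = extractor(example["image"].convert("RGB"),
                               return_tensors="pt")
            logits = model(**inputs).logits
            y_pred.append(int(logits.argmax(-1)))
            y_true.append(example["label"])

    results = summarize_fold(y_true, y_pred)
    print(f"Fold {fold}: " + ", ".join(
        f"{k}={v:.4f}" for k, v in results.items()
        if k != "confusion_matrix"))
    sns.heatmap(results["confusion_matrix"], annot=True, fmt="d", cmap="Blues")
    plt.title(f"Confusion Matrix - Fold {fold}")
    plt.xlabel("Predicted")
    plt.ylabel("True")
    plt.show()
    return results
```

Computing the metrics in a standalone helper (`summarize_fold`) keeps the numeric summary separate from model loading and plotting, which makes it easy to aggregate results across folds.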