Evaluating Video-Based Synthetic Data for Training Lightweight Models in Strawberry Leaf Disease Classification

Description
This repository contains the Python code and dataset for a study evaluating video-based synthetic data, generated with OpenAI's Sora, for training lightweight deep learning models for strawberry leaf disease classification. The project addresses data scarcity in agricultural research by building a synthetic training set of 1,467 images from video frames and using it to fine-tune six models (ResNet-18, DenseNet-121, MobileNetV3-Small, ShuffleNetV2, EfficientNet-B0, and ViT-Tiny). The models are evaluated on real images; the best-performing model, ResNet-18, achieves an accuracy of 98.71% and an F1-score of 98.71% on the test set, and 5-fold cross-validation yields an average accuracy of 98.9%. The code also computes quality metrics (FID and SSIM) for the synthetic dataset.

Dataset Information

Synthetic Dataset (Training):
Size: 1,467 images (739 healthy, 728 damaged).
Source: Generated using Sora with structured text prompts and 16 reference images from the PlantVillage dataset.
Videos: 16 five-second clips at 30 FPS (8 per class), yielding 1,200 frames per class before filtering.
Filtering: Frames were filtered using an SSIM threshold of 0.95 and then manually curated to remove blurry or low-quality images.
Quality Metrics: FID (healthy: 129.1, damaged: 120.68); Diversity Score (healthy: 0.847, damaged: 0.800, overall: 0.845); SSIM Mean (healthy: 0.15, damaged: 0.19); SSIM Std (healthy: 0.12, damaged: 0.09).
Availability: Hosted on Kaggle at https://www.kaggle.com/datasets/amiski/synthetic-strawberry-leaf-disease-dataset/data.
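
The frame counts above follow directly from the clip settings, and the SSIM filter can be pictured as a near-duplicate pass over consecutive frames. The sketch below is illustrative only: it assumes that a frame is dropped when its similarity to the last kept frame exceeds 0.95, which the README does not state explicitly. The names filter_near_duplicates and toy_similarity are hypothetical, and the toy similarity on plain numbers stands in for a real SSIM call.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")

# Figures taken from the dataset description.
CLIPS_PER_CLASS = 8
CLIP_SECONDS = 5
FPS = 30

frames_per_class = CLIPS_PER_CLASS * CLIP_SECONDS * FPS  # 8 * 5 * 30 = 1200
total_frames = 2 * frames_per_class                       # both classes together
retained_fraction = 1467 / total_frames                   # share left after filtering and curation


def filter_near_duplicates(
    frames: Sequence[Frame],
    similarity: Callable[[Frame, Frame], float],
    threshold: float = 0.95,
) -> List[Frame]:
    """Keep a frame only if its similarity to the last kept frame is <= threshold.

    Assumption: a score above the threshold marks a near-duplicate frame.
    """
    kept: List[Frame] = []
    for frame in frames:
        if not kept or similarity(kept[-1], frame) <= threshold:
            kept.append(frame)
    return kept


def toy_similarity(a: float, b: float) -> float:
    """Stand-in for SSIM on scalar 'frames': 1.0 when equal, lower as they differ."""
    return 1.0 - abs(a - b)


toy_frames = [0.0, 0.01, 0.02, 0.5, 0.52, 1.0]
kept_frames = filter_near_duplicates(toy_frames, toy_similarity)
print(frames_per_class, total_frames, kept_frames)
```

With real images, the similarity argument could wrap skimage.metrics.structural_similarity on grayscale frames; the manual curation step has no code equivalent here.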


Real Datasets (Validation and Testing):
Validation: 388 images (194 healthy, 194 damaged).
Test: 618 images (309 healthy, 309 damaged).
Sources: PlantVillage Dataset and Strawberry Leaves Dataset.


Code Information

Main Script: cnn_with_synithatic.py (a single Python file, designed for Google Colab, containing training, evaluation, visualization, and FID/SSIM computation sections).

Sections include: Mounting Google Drive, data loading, model training (e.g., ResNet-18), testing, metrics computation (accuracy, precision, recall, F1-score, confusion matrix, FLOPs via thop), training history plots, and FID/SSIM calculation.
Similar scripts were used for other models (DenseNet-121, EfficientNet-B0, MobileNetV3-Small, ShuffleNetV2, ViT-Tiny), with modifications to the model loading section.
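
To make the reported metrics concrete, the following minimal sketch derives accuracy, precision, recall, and F1-score from a confusion matrix in plain Python. It is not the script's code: the function names are hypothetical, the labels are toy values rather than study results, and macro averaging over the two classes is an assumption (the README does not say which averaging mode was used).

```python
from typing import Dict, List, Sequence


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int = 2
) -> List[List[int]]:
    """Build a matrix whose rows are true classes and columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def macro_metrics(matrix: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels (0 = healthy, 1 = damaged); not data from the study.
toy_true = [0, 0, 0, 0, 1, 1, 1, 1]
toy_pred = [0, 0, 0, 1, 1, 1, 1, 1]
toy_matrix = confusion_matrix(toy_true, toy_pred)
toy_scores = macro_metrics(toy_matrix)
```

Since scikit-learn is listed under Requirements, sklearn.metrics offers equivalent functions (for example accuracy_score and precision_recall_fscore_support), which are the more practical choice in a notebook.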


Language: Python 3.x.
Frameworks: PyTorch 2.6.0, torchvision.

Usage Instructions
This code is intended for execution in Google Colab with GPU access (e.g., Nvidia A100 with 40 GB VRAM). It assumes data is stored in Google Drive.

Setup:
Open the script in Google Colab.
Mount Google Drive: from google.colab import drive; drive.mount('/content/drive')
Install dependencies: !pip install thop pytorch-fid scikit-image


Data Preparation:
Ensure the dataset is in /content/drive/My Drive/strawberry_sora_images with the subfolders train/healthy, train/damaged, val/healthy, val/damaged, test/healthy, and test/damaged.
If needed, download the real datasets from the URLs listed under Citations.
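
Before running the training sections, it can help to confirm that the folder layout described above is complete. The sketch below is one way to do that with the standard library; the helper name missing_class_dirs is hypothetical, and the demo runs against a temporary directory rather than Google Drive.

```python
import tempfile
from pathlib import Path
from typing import List

SPLITS = ("train", "val", "test")
CLASSES = ("healthy", "damaged")


def missing_class_dirs(root) -> List[str]:
    """Return the split/class subfolders that do not exist under root."""
    root = Path(root)
    return [
        f"{split}/{cls}"
        for split in SPLITS
        for cls in CLASSES
        if not (root / split / cls).is_dir()
    ]


# Demo on a throwaway directory that deliberately lacks test/damaged.
with tempfile.TemporaryDirectory() as tmp:
    for split in SPLITS:
        for cls in CLASSES:
            if (split, cls) != ("test", "damaged"):
                (Path(tmp) / split / cls).mkdir(parents=True)
    demo_missing = missing_class_dirs(tmp)

print(demo_missing)  # ['test/damaged']
```

In Colab, the same check would be missing_class_dirs("/content/drive/My Drive/strawberry_sora_images"); an empty list means all six subfolders are present.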


Training and Evaluation:
Run the script sections sequentially.
Training: Runs for 50 epochs with the Adam optimizer (lr=0.001), a ReduceLROnPlateau scheduler (patience=7), and CrossEntropyLoss. The best and final models are saved to /content/drive/My Drive/Strawberry_Sora_Training_Results/ResNet18_Large (adjust the path for other models).
Testing: Loads the best model, evaluates it on the test set, and computes metrics, FLOPs, and inference speed.
Visualizations: Generates training history plots and a confusion matrix.


FID and SSIM Calculation:

Run the dedicated section at the end of the script. It loads the images, computes FID using InceptionV3 features, and computes SSIM-based diversity metrics.
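
For readers unfamiliar with FID, the sketch below shows the Fréchet distance between two sets of feature vectors, which is the quantity FID reports once InceptionV3 features have been extracted. It is a NumPy illustration under stated assumptions, not the script's implementation: the function name frechet_distance is hypothetical, the random 8-dimensional features stand in for InceptionV3 activations, and the trace of the matrix square root is obtained from eigenvalues instead of an explicit matrix square root.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (n_samples, n_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)
    # Tr((Sigma_a Sigma_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of Sigma_a Sigma_b. For covariance matrices these are real
    # and non-negative; small negative or imaginary parts are numerical noise.
    eigvals = np.linalg.eigvals(sigma_a @ sigma_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(sigma_a) + np.trace(sigma_b) - 2.0 * trace_sqrt)


# Toy features: identical sets give ~0; shifting every feature by 1.0 changes
# only the mean term, so the distance is ~8 for 8 features.
rng = np.random.default_rng(0)
toy_real = rng.normal(size=(500, 8))
fid_same = frechet_distance(toy_real, toy_real)
fid_shifted = frechet_distance(toy_real, toy_real + 1.0)
```

In practice the pytorch-fid package listed under Requirements handles both the InceptionV3 feature extraction and this computation; the SSIM-based diversity metrics are a separate calculation not covered by this sketch.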


Requirements

Python Libraries:

torch (2.6.0)
torchvision
matplotlib
seaborn
numpy
scikit-learn
thop
opencv-python (cv2)
Pillow (PIL)
pytorch-fid
scikit-image
tqdm


Hardware: Nvidia GPU (e.g., A100 with 40 GB VRAM) recommended; tested on Google Colab Pro+.
CUDA: Version 12.4.
Other: Access to Sora (OpenAI) for video generation (proprietary; pre-generated data is provided on Kaggle).

Methodology
Data Generation: Synthetic videos were created using Sora with text prompts (describing healthy or damaged leaf conditions) and reference images, producing 1,200 frames per class. Frames were extracted, filtered (SSIM > 0.95), and curated down to 1,467 images.
Model Training: Pre-trained lightweight models were fine-tuned on the synthetic dataset for 50 epochs using the Adam optimizer (learning rate 0.001), batch size 32, and CrossEntropyLoss. On-the-fly augmentations: RandomResizedCrop, HorizontalFlip, Rotation (20°), and ColorJitter (brightness/contrast/saturation=0.2, hue=0.1).
Evaluation: Single-split evaluation on the 618 real test images, reporting accuracy, precision, recall, and F1-score. Five-fold cross-validation was run for ResNet-18 on the combined training and validation sets. FLOPs were computed via thop; inference speed was measured in ms/image.
Quality Metrics: FID was computed between the synthetic training set and the real validation and test sets; SSIM was used to measure diversity within the synthetic sets.

Citations
Mohanty, S. P., et al. (2016). PlantVillage Dataset. https://github.com/spMohanty/PlantVillage-Dataset.
Hariri, A., & Avşar, E. (2022). Strawberry Leaves Dataset. https://data.mendeley.com/datasets/trwfmgjjr6/1.
OpenAI (2025). Sora: Video Generation Model. (Referenced in study).

License
This project is licensed under the MIT License.