
# Hierarchical Multi-Scale GAN for Chinese Watercolor Painting Generation

## Abstract
This project presents a hierarchical multi-scale unconditional generative adversarial network (GAN) designed to synthesize high-quality Chinese watercolor paintings from a single image. Inspired by SinGAN, the model leverages internal patch statistics and learns image-specific distributions across scales. The framework is tailored to capture the stylistic and structural patterns typical of Chinese watercolor art.

## Dataset Overview
- **Content**: The dataset consists of high-resolution images of traditional Chinese watercolor paintings.
- **Sources**: Images were collected from renowned art institutions:
  - Harvard
  - Metropolitan Museum of Art
  - Princeton
  - Smithsonian
- **Structure**: Each source has its own subfolder under `All-Paintings/`, with images in `.jpg` format.
- **Organization**:
  ```
  source_code/
    All-Paintings/
      Harvard/
      Metropolitan/
      Princeton/
      Smithsonian/
  ```
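The layout above can be walked with a few lines of standard-library Python. This is a minimal sketch, not code from the repository: the function name `list_paintings` is hypothetical, and the default root path simply mirrors the tree shown above.

```python
from pathlib import Path


def list_paintings(root="source_code/All-Paintings"):
    """Map each source subfolder name to a sorted list of its .jpg files.

    `list_paintings` is a hypothetical helper; the default root follows the
    directory tree documented in this README.
    """
    root = Path(root)
    return {
        source.name: sorted(source.glob("*.jpg"))
        for source in sorted(root.iterdir())
        if source.is_dir()
    }
```

Calling `list_paintings()` from the repository root should return one entry per institution, which is handy for picking a single training image.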

## Code Overview
- `model.py`: Defines the Generator, Discriminator, and PsiNet networks.
- `pyramid.py`: Constructs an image pyramid for multi-scale training.
- `train.py`: Main training script with GAN training logic.
- `metrics.py`: Implements FID and Diversity Score calculations.
- `utils.py`: Contains helper functions (e.g., image loading, normalization).

## Getting Started

### Setup
Install the required dependencies:
```bash
pip install torch torchvision numpy pillow scikit-image tqdm pytorch-fid lpips
```

### Running the Code

#### 1. Training
Edit the `train.py` file to specify your input image path (e.g., `All-Paintings/Harvard/harvard_0.jpg`). Then run:
```bash
python train.py
```

#### 2. Image Generation
After training, synthesized watercolor paintings can be generated by sampling random noise at the coarsest scale and propagating through the GAN pyramid.
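The coarse-to-fine sampling described above can be sketched as follows. This is an illustration under stated assumptions rather than the repository's generation code: each trained generator is modeled as a plain callable `generator(noise, previous_image)`, the names `upsample_nearest` and `sample_pyramid` are hypothetical, and fresh noise is drawn at every scale as in SinGAN, which may differ from how this project samples.

```python
import numpy as np


def upsample_nearest(image, size):
    """Resize an (H, W, C) array to size=(height, width) by nearest neighbor."""
    rows = np.arange(size[0]) * image.shape[0] // size[0]
    cols = np.arange(size[1]) * image.shape[1] // size[1]
    return image[rows][:, cols]


def sample_pyramid(generators, sizes, channels=3, rng=None):
    """Generate one image by running the generators from coarsest to finest.

    generators: callables taking (noise, previous_image), coarsest first
                (a stand-in for the trained per-scale networks).
    sizes:      matching (height, width) tuples, coarsest first.
    """
    rng = np.random.default_rng() if rng is None else rng
    # The coarsest scale starts from an all-zero image, so it sees only noise.
    image = np.zeros((*sizes[0], channels))
    for generator, size in zip(generators, sizes):
        image = upsample_nearest(image, size)
        noise = rng.standard_normal((*size, channels))
        image = generator(noise, image)
    return image
```

Because the generators are passed in as callables, the same loop works with real networks wrapped in a small adapter or with dummy functions for testing.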

#### 3. Evaluation Metrics
- **FID (Fréchet Inception Distance)**: Measures the distance between the feature distributions of real and generated images (lower is better); implemented in `metrics.py`.
- **Diversity Score**: Measures variability across multiple generated samples, using either LPIPS distance or standard deviation.
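The standard-deviation variant of the diversity score could look like the sketch below. It assumes one plausible definition (per-pixel standard deviation across samples, averaged over all pixels and channels); the function name `pixel_std_diversity` is hypothetical, and the actual computation in `metrics.py` may differ.

```python
import numpy as np


def pixel_std_diversity(samples):
    """Mean per-pixel standard deviation across generated samples.

    samples: sequence of equally shaped arrays, one per generated image.
    This definition is an assumption, not taken from metrics.py.
    """
    stack = np.stack([np.asarray(s, dtype=np.float64) for s in samples])
    if stack.shape[0] < 2:
        raise ValueError("need at least two samples to measure diversity")
    # Standard deviation over the sample axis, then averaged over the image.
    return float(stack.std(axis=0).mean())
```

A score of zero means every sample is identical; larger values indicate more variation between generations.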

## Methodology

### Data Preprocessing
- Each input image is repeatedly downsampled to build an image pyramid of decreasing resolution.
- Patches are sampled at every scale so that texture statistics are preserved.
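To make the pyramid construction concrete, the sketch below computes the resolution of each scale. The function name `pyramid_sizes` and the default values for `scale_factor` and `min_size` are assumptions loosely modeled on SinGAN, not settings read from `pyramid.py`.

```python
def pyramid_sizes(height, width, scale_factor=0.75, min_size=25):
    """Return (height, width) for each scale, ordered coarsest to finest.

    scale_factor and min_size are placeholder defaults, not values taken
    from this repository.
    """
    if not 0 < scale_factor < 1:
        raise ValueError("scale_factor must be between 0 and 1")
    sizes = [(height, width)]
    while True:
        h = int(sizes[-1][0] * scale_factor)
        w = int(sizes[-1][1] * scale_factor)
        # Stop once the next scale would fall below the minimum side length.
        if min(h, w) < min_size:
            break
        sizes.append((h, w))
    return sizes[::-1]
```

Each returned size can then be passed to an image resize call (for example with Pillow) to produce the actual pyramid levels.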

### Model Architecture
- Each scale's GAN is built from stacked 3×3 Convolution → BatchNorm → LeakyReLU blocks.
- Progressive multi-scale training: each scale's GAN is trained sequentially and frozen before moving to finer resolution.
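The block structure and the freeze-after-training step might be expressed in PyTorch roughly as follows. This is a sketch under assumptions: the class name `ScaleGenerator`, the helper names `conv_block` and `freeze`, the channel width, the block count, the LeakyReLU slope, and the residual connection are all illustrative choices, not the definitions in `model.py`.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """3x3 Convolution -> BatchNorm -> LeakyReLU, as listed above."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )


class ScaleGenerator(nn.Module):
    """Hypothetical single-scale generator; sizes are placeholder values."""

    def __init__(self, channels=3, width=32, num_blocks=4):
        super().__init__()
        layers = [conv_block(channels, width)]
        layers += [conv_block(width, width) for _ in range(num_blocks - 1)]
        self.body = nn.Sequential(*layers)
        self.head = nn.Sequential(
            nn.Conv2d(width, channels, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, noise, prev):
        # Residual refinement of the upsampled coarser output (SinGAN-style,
        # assumed here rather than confirmed by the source).
        return prev + self.head(self.body(noise + prev))


def freeze(module):
    """Disable gradients and switch to eval mode once a scale is trained."""
    for param in module.parameters():
        param.requires_grad_(False)
    module.eval()
    return module
```

In a sequential training loop, `freeze` would be called on each scale's generator before training begins at the next finer resolution.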

### Evaluation Metrics
- **FID** for realism.
- **Diversity Score** for variability across generated samples.
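
For reference, the standard FID definition compares Gaussian fits to Inception features of real and generated images. The symbols are notation introduced here, not names from `metrics.py`: $\mu_r, \Sigma_r$ are the mean and covariance of features from real images, and $\mu_g, \Sigma_g$ are those from generated images.

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
$$

The score is zero when both feature distributions have identical mean and covariance, and it grows as they diverge.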

## Requirements
- Python ≥ 3.7
- PyTorch ≥ 1.7
- Additional libraries:
  - `torchvision`
  - `numpy`
  - `pillow`
  - `scikit-image`
  - `tqdm`
  - `pytorch-fid`
  - `lpips`

## Citations
If you use this work in your research, please cite the following papers:

- Gulrajani et al. *Improved Training of Wasserstein GANs* (WGAN-GP) [[arXiv:1704.00028]](https://arxiv.org/abs/1704.00028)  
- Shaham et al. *SinGAN: Learning a Generative Model from a Single Natural Image* [[arXiv:1905.01164]](https://arxiv.org/abs/1905.01164)

## License
Licensed under the [MIT License](LICENSE.md).
