# Economic Insights into Credit Risk: A Deep Learning Model for Predicting Loan Repayment Capacity

## Description
This project implements a deep learning model using a Transformer-based architecture to predict loan repayment capacity. The model analyzes both categorical and numerical features from loan applications, aiming to improve the accuracy of credit risk assessment for financial institutions.

## Dataset Information
The model is trained on the Lending Club dataset, which contains:
- Loan application details
- Borrower characteristics
- Historical credit data
- Loan performance data
- Target variable: loan status (Fully Paid/Charged Off)

The data is split into training, validation, and test sets, located in the `python_code/data/` directory.
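
A binary classifier needs the two loan statuses mapped to numeric labels. The snippet below is a minimal sketch of one such mapping, not the project's actual preprocessing code. The `LABELS` dictionary and the `encode_loan_status` helper are hypothetical names, and treating "Fully Paid" as the positive class is an assumption chosen to match a model that outputs the probability of repayment.

```python
# Assumed mapping: 1 = repaid ("Fully Paid"), 0 = default ("Charged Off").
LABELS = {"Fully Paid": 1, "Charged Off": 0}


def encode_loan_status(status: str) -> int:
    """Map a raw loan status string to a binary label; reject anything else."""
    try:
        return LABELS[status.strip()]
    except KeyError:
        raise ValueError(f"Unexpected loan status: {status!r}") from None


labels = [encode_loan_status(s) for s in ["Fully Paid", "Charged Off", "Fully Paid"]]
print(labels)  # [1, 0, 1]
```

When the data is loaded with pandas, the same dictionary can be passed to `Series.map` on the status column.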

## Code Information
The project is organized as follows:
```
python_code/
├── data/                      # Data directory
│   ├── train.csv              # Training data
│   ├── val.csv                # Validation data
│   └── test.csv               # Test data
├── model.py                   # Model architecture implementation
├── training.py                # Training and data processing logic
├── main.py                    # Command-line interface
├── training_utils.py          # Training utilities
├── data_processing.ipynb      # Data processing notebook
└── checkpoints/               # Directory for saved models
    └── best_model.pth         # Best model checkpoint
```

## Usage Instructions

### Training
To train the model, run the following from the `python_code/` directory:
```bash
python main.py --mode train \
    --data_dir ./data \
    --model_path ./checkpoints/best_model.pth \
    --batch_size 256 \
    --embedding_dim 64 \
    --num_heads 2 \
    --num_layers 1 \
    --dropout 0.1 \
    --learning_rate 1e-4 \
    --num_epochs 50
```

### Prediction
To make predictions on new data:
```bash
python main.py --mode predict \
    --model_path ./checkpoints/best_model.pth \
    --input_file path/to/input.csv \
    --output_file path/to/predictions.csv
```
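
For orientation, the flags used in the training and prediction commands could be declared with `argparse` roughly as follows. This is a hypothetical sketch, not the contents of `main.py`: the `build_parser` name is invented, and the defaults simply copy the values from the example training command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Loan repayment capacity model")
    parser.add_argument("--mode", choices=["train", "predict"], required=True)
    parser.add_argument("--model_path", default="./checkpoints/best_model.pth")
    # Training options; defaults copy the example command and are assumptions.
    parser.add_argument("--data_dir", default="./data")
    parser.add_argument("--batch_size", type=int, default=256)
    parser.add_argument("--embedding_dim", type=int, default=64)
    parser.add_argument("--num_heads", type=int, default=2)
    parser.add_argument("--num_layers", type=int, default=1)
    parser.add_argument("--dropout", type=float, default=0.1)
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--num_epochs", type=int, default=50)
    # Prediction options.
    parser.add_argument("--input_file")
    parser.add_argument("--output_file")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--mode", "predict", "--input_file", "in.csv", "--output_file", "out.csv"]
)
print(args.mode, args.batch_size, args.input_file)  # predict 256 in.csv
```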

## Requirements
- torch >= 1.9.0
- pandas >= 1.3.0
- numpy >= 1.19.0
- scikit-learn >= 0.24.0
- tqdm >= 4.62.0

Install dependencies with:
```bash
pip install -r requirements.txt
```

## Methodology
1. **Feature Encoding**: Categorical features are embedded; numerical features are normalized and projected into the embedding space.
2. **Gated Transformer Processing**: Multi-head self-attention captures feature interactions, with gating mechanisms, layer normalization, and residual connections for stable training.
3. **Prediction Module**: Outputs the probability of loan repayment.
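
The three steps above can be sketched in PyTorch as follows. This is an illustrative approximation under stated assumptions, not the code in `model.py`: the class names, the sigmoid gate on the attention output, the mean pooling, and the feed-forward width are all choices made for this example. It assumes numerical inputs are already normalized and that there is at least one categorical feature.

```python
import torch
import torch.nn as nn


class GatedTransformerBlock(nn.Module):
    """Self-attention with a sigmoid gate, residual connections, and layer norm."""

    def __init__(self, dim: int, num_heads: int, dropout: float):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, dropout=dropout, batch_first=True)
        self.gate = nn.Linear(dim, dim)  # assumed gating form: sigmoid(W x) * attention
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 2 * dim), nn.ReLU(), nn.Linear(2 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)
        self.drop = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)
        gated = torch.sigmoid(self.gate(x)) * attn_out
        x = self.norm1(x + self.drop(gated))              # residual + layer norm
        return self.norm2(x + self.drop(self.ffn(x)))     # residual + layer norm


class LoanRepaymentSketch(nn.Module):
    """Hypothetical end-to-end model: encode features, apply blocks, predict."""

    def __init__(self, cat_cardinalities, num_numeric, dim=64, num_heads=2,
                 num_layers=1, dropout=0.1):
        super().__init__()
        # One embedding table per categorical feature.
        self.cat_embeds = nn.ModuleList([nn.Embedding(c, dim) for c in cat_cardinalities])
        # Per-feature weight and bias project each numeric scalar into the embedding space.
        self.num_weight = nn.Parameter(torch.randn(num_numeric, dim) * 0.02)
        self.num_bias = nn.Parameter(torch.zeros(num_numeric, dim))
        self.blocks = nn.ModuleList(
            [GatedTransformerBlock(dim, num_heads, dropout) for _ in range(num_layers)]
        )
        self.head = nn.Linear(dim, 1)

    def forward(self, x_cat: torch.Tensor, x_num: torch.Tensor) -> torch.Tensor:
        # x_cat: (batch, n_cat) integer codes; x_num: (batch, n_num) normalized floats.
        cat_tokens = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embeds)], dim=1
        )
        num_tokens = x_num.unsqueeze(-1) * self.num_weight + self.num_bias
        tokens = torch.cat([cat_tokens, num_tokens], dim=1)  # (batch, n_cat + n_num, dim)
        for block in self.blocks:
            tokens = block(tokens)
        pooled = tokens.mean(dim=1)                           # average over feature tokens
        return torch.sigmoid(self.head(pooled)).squeeze(-1)  # repayment probability


# Toy usage: two categorical features (5 and 12 levels) and three numeric features.
model = LoanRepaymentSketch(cat_cardinalities=[5, 12], num_numeric=3)
model.eval()
x_cat = torch.tensor([[0, 3], [4, 11]])
x_num = torch.randn(2, 3)
with torch.no_grad():
    prob = model(x_cat, x_num)
print(prob.shape)  # torch.Size([2])
```

Each feature becomes one token, so self-attention operates across features rather than across time steps. The default hyperparameters mirror the values in the training command.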


## Citations
The dataset used in this project is available on Kaggle: [Lending Club Dataset](https://www.kaggle.com/datasets/jeandedieunyandwi/lending-club-dataset).

## License
This project is open source under the MIT License.