# Corporate ESG Scores Prediction

## 1. Description

This repository implements the ExcelFormer deep learning model to predict Corporate ESG (Environmental, Social, and Governance) scores from tabular company data. The project provides a full pipeline from data preprocessing to model training and evaluation.

## 2. Dataset Information

- **Source:** The dataset is located at `source code/raw_data/company_esg_financial_dataset.csv`.
- **Content:** The dataset contains company-level ESG scores along with financial and categorical features relevant for prediction.
- **Format:** CSV file with both numerical and categorical columns.
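As a rough illustration of the mixed column types, the sketch below reads a small in-memory CSV with the standard library and separates numerical from categorical columns. The sample rows and column names (`CompanyName`, `Industry`, `Revenue`, `ESG_Overall`) are hypothetical stand-ins, not the dataset's actual schema, and the repository's own `preprocessing.py` may work differently.

```python
import csv
import io

# Tiny stand-in for the real CSV; these column names and values are hypothetical.
SAMPLE_CSV = """CompanyName,Industry,Revenue,ESG_Overall
Acme,Energy,1200.5,61.2
Globex,Finance,880.0,74.9
"""


def split_columns(rows):
    """Classify each column as numerical or categorical by trying float()."""
    numerical, categorical = [], []
    for name in rows[0].keys():
        try:
            for row in rows:
                float(row[name])
            numerical.append(name)
        except ValueError:
            # At least one value is not a number, so treat the column as categorical.
            categorical.append(name)
    return numerical, categorical


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
numerical_cols, categorical_cols = split_columns(rows)
print("numerical:", numerical_cols)
print("categorical:", categorical_cols)
```

To try this on the real file, replace `io.StringIO(SAMPLE_CSV)` with `open(path, newline="")`, where `path` points at the CSV listed above.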

## 3. Code Information

- **Project Structure:**
  ```
  source code/
  ├── raw_data/
  │   └── company_esg_financial_dataset.csv
  ├── src/
  │   ├── data/
  │   │   ├── preprocessing.py
  │   │   └── dataset.py
  │   ├── models/
  │   │   ├── excelformer.py
  │   │   └── layers.py
  │   ├── training/
  │   │   ├── trainer.py
  │   │   └── metrics.py
  │   └── utils/
  │       └── config.py
  ├── configs/
  │   └── default.yaml
  ├── requirements.txt
  └── main.py
  ```

- **Key Components:**
  - `src/data/`: Data loading and preprocessing scripts.
  - `src/models/`: ExcelFormer model and custom layers.
  - `src/training/`: Training loop and evaluation metrics.
  - `src/utils/`: Configuration utilities.
  - `main.py`: Entry point for training and evaluation.

## 4. Usage Instructions

1. **Clone the repository** and navigate to the project directory.

2. **Create a virtual environment:**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

4. **Train the model:**
   ```bash
   python main.py --config configs/default.yaml
   ```
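The training command passes a configuration path through `--config`. A minimal sketch of how an entry point could parse that flag is shown below; the `parse_args` helper and the default value are assumptions for illustration, not the contents of this repository's `main.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (hypothetical helper, not taken from main.py)."""
    parser = argparse.ArgumentParser(
        description="Train and evaluate ExcelFormer on ESG data."
    )
    parser.add_argument(
        "--config",
        default="configs/default.yaml",  # assumed default, for illustration only
        help="Path to a YAML configuration file.",
    )
    return parser.parse_args(argv)


# Passing an explicit list mirrors the command shown above.
args = parse_args(["--config", "configs/default.yaml"])
print(args.config)
```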

## 5. Requirements

- Python 3.7+
- See `requirements.txt` for all required Python libraries.

## 6. Methodology

- **Model Architecture:**
  - Input embedding using Gated Linear Units (GLU).
  - Semi-Permeable Attention (SPA) with 32 heads.
  - GLU-based feedforward layers.
  - Interaction Attenuated Initialization.
  - Two fully-connected layers for final prediction.
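To give an intuition for Semi-Permeable Attention, the sketch below builds an additive mask that stops a more informative feature from attending to a less informative one, while the reverse direction stays open. It follows the idea as described for ExcelFormer; the importance scores are made-up values, the function names are hypothetical, and the implementation in `src/models/` may differ in its details.

```python
import math


def semi_permeable_mask(importance):
    """Additive mask: entry [i][j] is -inf when feature i is more informative
    than feature j (so i cannot attend to j), and 0 otherwise."""
    n = len(importance)
    return [
        [-math.inf if importance[i] > importance[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def masked_softmax(scores, mask):
    """Row-wise softmax over scores + mask."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        logits = [s + m for s, m in zip(score_row, mask_row)]
        peak = max(logits)  # the diagonal is never masked, so peak is finite
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical importance scores for three features (higher = more informative).
importance = [0.9, 0.5, 0.1]
scores = [[0.0] * 3 for _ in range(3)]  # uniform raw attention scores

mask = semi_permeable_mask(importance)
weights = masked_softmax(scores, mask)
for row in weights:
    print(row)
```

With uniform raw scores, the most informative feature attends only to itself, while the least informative feature spreads its attention over all three.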

- **Evaluation Metrics:**
  - Mean Squared Error (MSE)
  - Mean Absolute Error (MAE)
  - Root Mean Squared Error (RMSE)
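The three metrics are closely related: RMSE is the square root of MSE, and MAE averages absolute rather than squared errors. A minimal standard-library sketch is shown below; the function name `regression_metrics` and the sample scores are illustrative and not taken from `src/training/metrics.py`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, and RMSE for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse)}


# Illustrative ESG scores: errors are +2, -1, and -3.
metrics = regression_metrics([60.0, 70.0, 80.0], [62.0, 69.0, 77.0])
print(metrics)
```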

## 7. Citations

The dataset used in this project is available through [this paper](https://www.ewadirect.com/proceedings/aemps/article/view/18728).


## 8. License & Contribution Guidelines

- **License:** This project is open source under the MIT License.
- **Contributions:** Contributions are welcome! Please open an issue or submit a pull request for improvements or bug fixes.
