# README: A Novel Hybrid TCN-TE-ANN Model for High-Precision Solar Irradiance Prediction

---

## File Structure

    |-- Overview/                        # An overview of the codes
    |-- What the Codes are Used for/     # Key features of the codes
    |-- What the Code Does/              # Steps of the codes
    |-- Introduction of the Codes/       # Code structure
    |-- Prerequisites/                   # The dependencies to install
    |-- Dataset/                         # How to get the dataset and the codes
    |   |-- Source
    |   |-- DOI/URL
    |   |-- Dataset Clarity
    |-- Repository/                      # The publicly available repository
    |-- Steps for Implementation/        # Steps for reproducing the codes
    |   |-- 1. Access and Download the Codes and Dataset
    |   |-- 2. Data Preprocessing
    |   |   |-- 2.1. Feature Selection Process
    |   |-- 3. Executing the Jupyter Notebook File
    |   |   |-- 3.1. Open File Using Jupyter Notebook
    |   |   |-- 3.2. Run All Cells
    |-- Outputs/
    |   |-- 1. Model Metrics
    |   |-- 2. Visualizations
    |-- Usage and License/
    |   |-- Restricted Use
    |   |-- Future Access
    |-- Disclaimer

---

### Overview

This code presents a hybrid machine learning model that combines a **Temporal Convolutional Network (TCN)**, a **Transformer**, and an **Artificial Neural Network (ANN)** for accurate **Global Horizontal Irradiance (GHI)** prediction. The approach leverages time-series meteorological data and deep learning techniques to address challenges in solar irradiance forecasting.

### What the Codes are Used for

- **Accurate Predictions**: Provides high-precision GHI predictions, essential for optimizing solar energy systems and grid stability.
- **Comprehensive Workflow**: Combines advanced machine learning techniques into a streamlined process, ensuring robustness and scalability.
- **Reproducibility**: Fully self-contained, with clear instructions for data preprocessing, model training, and evaluation.
- **Real-World Impact**: Supports renewable energy planning and integration, aligning with global sustainability goals.
### What the Code Does

- **Data Preprocessing**: Cleans the raw dataset, selects relevant features, scales the data, and prepares time-series sequences.
- **Feature Extraction with TCN**: Captures temporal dependencies in sequential data using a Temporal Convolutional Network.
- **Feature Refinement with Transformer Encoder**: Applies attention mechanisms to prioritize relevant patterns in the extracted features.
- **Prediction with ANN**: Uses a dense neural network to predict GHI values from the refined features.
- **Evaluation**: Assesses model performance with metrics such as MAE, MSE, RMSE, and R².
- **Visualization**: Generates plots, including scatter plots, residual plots, learning curves, and error histograms, to analyze predictions and errors.

---

### Introduction of the Codes

The entire code for the model is provided in a single Jupyter Notebook named **"The codes for TCN+Transformer+ANN.ipynb"**, which is publicly accessible on Kaggle via the DOI/URL provided below. It contains all the steps for data preprocessing, model training, evaluation, and visualization:

- **Data Preprocessing**: Handles NASA-sourced solar irradiance data and scaling.
- **Temporal Feature Extraction**: Utilizes a TCN to capture temporal dependencies in the data.
- **Attention Mechanism**: Incorporates a Transformer Encoder to prioritize relevant features.
- **Prediction**: Applies a dense ANN to refine extracted features and predict GHI.
- **Evaluation**: Implements standard metrics, including MAE, MSE, RMSE, and R², to assess model performance.
- **Visualization**: Generates scatter plots, residual plots, error distributions, and learning curves.
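The TCN stage rests on causal, dilated 1-D convolutions, in which each output depends only on the current and past time steps. The following NumPy sketch illustrates the idea only; it is not the notebook's implementation, which relies on the installed TCN package:

```python
import numpy as np

def causal_dilated_conv1d(x, weights, dilation=1):
    """Causal dilated 1-D convolution: the output at time t depends
    only on inputs at t, t - dilation, t - 2*dilation, ..."""
    k = len(weights)
    # Left-pad with zeros so the output keeps the input length
    # and never "sees" future values.
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(weights[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(6, dtype=float)                      # toy GHI-like sequence
y = causal_dilated_conv1d(x, [0.5, 0.5], dilation=2)
print(y)  # [0.  0.5 1.  2.  3.  4. ] -- each output averages x[t] and x[t-2]
```

Stacking such layers with growing dilation factors is what lets a TCN cover long histories (e.g. a full day of 30-minute samples) with few layers.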
---

### Prerequisites

To reproduce these codes, ensure the following dependencies are installed:

- Python 3.7 or higher
- TensorFlow (for deep learning)
- NumPy (for numerical computations)
- Pandas (for data manipulation)
- Scikit-learn (for preprocessing and evaluation)
- TCN (for the Temporal Convolutional Network layer)
- Matplotlib (for visualization)

Install the dependencies using:

```bash
pip install tensorflow numpy pandas scikit-learn tcn matplotlib
```

---

### Dataset

- **Source:** The dataset is sourced from **NASA's API** (https://search.earthdata.nasa.gov/) and spans 22 years (2000–2022) at a 30-minute resolution.
- **DOI/URL:** [10.34740/KAGGLE/DS/6006986](https://www.kaggle.com/datasets/muratiik/solar-radiation-data-for-forcating-fron-nasa). This DOI/URL is also cited in the manuscript to align with reproducibility standards.
- The dataset includes meteorological and irradiance features from three significant U.S. solar sites: Desert Sunlight, Copper Mountain, and Solar Star.
- **Dataset Clarity:** The dataset used in this study is sourced from the National Solar Radiation Database (NSRDB), maintained by the U.S. Department of Energy (DOE) and operated by NREL. The NSRDB provides open access to its datasets under a free license for research and academic purposes. Users are granted the right to use or copy the data, provided proper credit is given to DOE/NREL/ALLIANCE (https://nsrdb.nrel.gov). The dataset spans 22 years (2000–2022) at a 30-minute temporal resolution and includes meteorological and irradiance features. The version of the dataset used in this study has been curated for accuracy and reproducibility; please refer to the article after publication for more details. The curated dataset is hosted on Kaggle for transparency.
- Note: Data preprocessing details are given in the **code** section.
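As a quick sanity check on the stated coverage, the expected number of 30-minute records per site over 2000–2022 can be computed with pandas (the exact row count of the curated files may differ slightly depending on metadata rows):

```python
import pandas as pd

# Half-hourly timestamps spanning the stated 2000-2022 period
idx = pd.date_range("2000-01-01 00:00", "2022-12-31 23:30", freq="30min")
print(len(idx))  # 403248 half-hourly records per site (8401 days x 48)
```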
### Repository

The dataset and the code repository are available on **Kaggle**: [10.34740/KAGGLE/DS/6006986](https://www.kaggle.com/datasets/muratiik/solar-radiation-data-for-forcating-fron-nasa)

---

### Steps for Implementation

- Note: the codes are for a single station point. For other stations, simply change the CSV file accordingly.

#### 1. Access and Download the Codes and Dataset

- Download the dataset and the codes from the provided DOI/URL.
- The code for the feature selection process is available in a single Jupyter Notebook named **"Feature_Selection.ipynb"**.
- The entire codebase for the model is available in a single Jupyter Notebook named **"The codes for TCN+Transformer+ANN.ipynb"**, which is self-contained and provides all steps required for implementation, ensuring reproducibility.
- Place the dataset files and the codes in a folder in Jupyter's root directory.

#### 2. Data Preprocessing

- Ensure that the downloaded dataset is accessible in the same directory as the Jupyter Notebook file.
- The preprocessing step ensures that the raw dataset is structured and ready for model training. The annual files collected from the NSRDB API were consolidated into a single unified file for each data point, covering 2000–2022 at a 30-minute temporal resolution. Initial rows containing metadata such as units and time-zone information were processed to extract essential attributes, including Location ID, Latitude, Longitude, and Elevation, which were converted into individual columns. These metadata rows were then removed to streamline the dataset. Additionally, the Cloud Type feature, a categorical variable, was transformed with One-Hot Encoding into binary features suitable for the machine learning model.
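The One-Hot Encoding of the categorical Cloud Type feature can be sketched with pandas as follows (a toy two-column frame; the real NSRDB files carry many more columns, and the cloud-type codes shown are illustrative):

```python
import pandas as pd

# Hypothetical mini-frame standing in for the consolidated NSRDB file
df = pd.DataFrame({"Cloud Type": [0, 1, 4, 1],
                   "GHI":        [500, 320, 110, 290]})

# One-Hot Encode the categorical Cloud Type into binary indicator columns
df = pd.get_dummies(df, columns=["Cloud Type"], prefix="Cloud")
print(df.columns.tolist())  # ['GHI', 'Cloud_0', 'Cloud_1', 'Cloud_4']
```

After encoding, the original `Cloud Type` column is gone and each cloud-type code becomes its own 0/1 feature, which is what the model consumes.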
- Redundant and irrelevant features, such as Clearsky DHI, Clearsky DNI, Clearsky GHI, Global Horizontal UV Irradiance, and the geographic coordinates, were excluded following a rigorous feature selection process to improve model performance and computational efficiency.
  - The dataset is split into training (90%) and testing (10%) sets. (The test set covers approximately 26 months of data.)
  - Sequential input data is created using time steps.
  - Features are scaled with **MinMaxScaler** to improve model performance.

##### 2.1. Feature Selection Process

- The code for the feature selection process is available in a single Jupyter Notebook named **"Feature_Selection.ipynb"**, which is also available on Kaggle at the same DOI/URL.
- Execute the file from Jupyter Notebook, then analyze the correlation and mutual information scores to select the most relevant features for model training.
  - The results are positive/negative correlation scores and mutual information scores.
- After the feature selection process, only these features are used: `['Year', 'Month', 'Day', 'Hour', 'Minute', 'Temperature', 'Relative Humidity', 'Solar Zenith Angle', 'Surface Albedo', 'Pressure', 'Precipitable Water', 'Wind Direction', 'Wind Speed', 'DHI', 'DNI', 'GHI']`

#### 3. Executing the Jupyter Notebook File

Execute the following steps in sequence:

##### **3.1: Open File Using Jupyter Notebook**

- Launch the Jupyter Notebook interface and open **"The codes for TCN+Transformer+ANN.ipynb"**.
- Note: the codes are for a single station point. For other stations, simply change the CSV file accordingly.

##### **3.2: Run All Cells**

Execute all cells sequentially to:

- Preprocess the dataset.
- Extract temporal features using the TCN.
  - The **Temporal Convolutional Network (TCN)** extracts temporal dependencies from sequential data.
  - TCN features are saved for subsequent processing; this generates feature files saved as `.npy`.
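The scaling and sequence construction described in step 2 can be sketched as follows. This is a minimal NumPy/scikit-learn illustration, assuming a window of 48 half-hourly steps (one day) and GHI as the last column; the notebook's actual window length and column ordering may differ:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def make_sequences(values, time_steps):
    """Turn a (samples, features) array into overlapping windows of
    shape (samples - time_steps, time_steps, features), using the
    value that follows each window as the prediction target."""
    X, y = [], []
    for i in range(len(values) - time_steps):
        X.append(values[i:i + time_steps])
        y.append(values[i + time_steps, -1])   # last column stands in for GHI
    return np.array(X), np.array(y)

# Toy data: 100 half-hourly samples, 3 features
data = np.random.rand(100, 3)
scaled = MinMaxScaler().fit_transform(data)     # scale each feature to [0, 1]

X, y = make_sequences(scaled, time_steps=48)    # 48 steps = one day at 30 min
print(X.shape, y.shape)                         # (52, 48, 3) (52,)

split = int(len(X) * 0.9)                       # chronological 90/10 split
X_train, X_test = X[:split], X[split:]
```

Note that the split is chronological (no shuffling), so the 10% test set corresponds to the most recent portion of the record.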
- Apply the Transformer Encoder for feature refinement.
  - The **Transformer** applies attention mechanisms to enhance the TCN features, focusing on relevant time steps and patterns.
- Train the ANN model and predict GHI values.
  - The **Artificial Neural Network (ANN)** refines the features and predicts the GHI values.
  - The ANN consists of fully connected layers with Batch Normalization and Dropout for robustness.
- Evaluate model performance and generate visualizations.

The model is evaluated using the following performance metrics:

- Mean Absolute Error (**MAE**)
- Mean Squared Error (**MSE**)
- Root Mean Squared Error (**RMSE**)
- Coefficient of Determination (**R²**)

---

### Outputs

1. **Model Metrics**: MAE, MSE, RMSE, and R² values.
2. **Visualizations**:
   - Scatter plot of predicted vs. actual values.
   - Learning curves (training and validation loss).
   - Residual plots and error distributions.

### Usage and License

The dataset and code are part of an ongoing academic study and are shared for **transparency and reproducibility**. Please note:

- **Restricted Use:** The dataset cannot be publicly used, reproduced, or distributed until the associated research paper is officially published.
- **Future Access:** Upon publication, access guidelines will be updated and usage terms clarified.

For queries, please contact the corresponding author: [muratisik@ahievran.edu.tr](mailto:muratisik@ahievran.edu.tr).

---

**Disclaimer:** This code and dataset are shared as part of ongoing research. Usage in violation of these terms is not permitted.