# README: Bike Sharing Forecasting Program

## Overview

This program implements multiple machine learning models to forecast the number of available bikes in a bike-sharing system for the next several time steps. The goal is to provide accurate short-term predictions to help optimize bike distribution and availability.

The models implemented in this project include:

- **Decision Tree Regressor**
- **Random Forest Regressor**
- **XGBoost Regressor**
- **Linear Regression**
- **Support Vector Regressor (SVR)**
- **K-Nearest Neighbors (KNN)**
- **Multi-Layer Perceptron (MLP)**
- **AdaBoost Regressor**
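
As a rough illustration of how the models listed above could be collected in one place, the sketch below builds a dictionary of regressors. It is an assumption about structure, not the script's actual code: the `build_models` helper and all hyperparameter values are hypothetical, scikit-learn's `MLPRegressor` stands in for the Keras-based MLP, and XGBoost is added only if the package is installed.

```python
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def build_models(random_state=42):
    """Return a name -> estimator mapping (hypothetical helper, illustrative settings)."""
    models = {
        "Decision Tree": DecisionTreeRegressor(random_state=random_state),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=random_state),
        "Linear Regression": LinearRegression(),
        "SVR": SVR(),
        "KNN": KNeighborsRegressor(n_neighbors=5),
        # Stand-in for the Keras MLP mentioned in this README.
        "MLP": MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=random_state),
        "AdaBoost": AdaBoostRegressor(random_state=random_state),
    }
    try:
        # Optional dependency: skipped if xgboost is not installed.
        from xgboost import XGBRegressor

        models["XGBoost"] = XGBRegressor(random_state=random_state)
    except ImportError:
        pass
    return models
```

Every estimator here exposes the same `fit`/`predict` interface, so a single loop can train and compare them.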

## Prerequisites

Before running the program, ensure you have Python installed along with the required libraries. You can install all dependencies using the following command:

```bash
pip install numpy pandas scikit-learn matplotlib seaborn xgboost tensorflow keras
```

### Required Libraries

- `numpy`: For numerical computations.
- `pandas`: For handling datasets.
- `scikit-learn`: For implementing machine learning models.
- `matplotlib` & `seaborn`: For data visualization.
- `xgboost`: For the XGBoost regressor.
- `tensorflow` & `keras`: For deep learning models such as MLP.

## Dataset

The program reads its data from `seoul hourly/2020_hourly.csv`, which holds hourly records of bike availability along with related features such as temperature, humidity, and time-related attributes. Note that the directory name contains a space, so quote the path when using it in a shell.

Ensure the dataset file is available in the specified path before executing the script.
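
A minimal sketch of loading the file with an early existence check is shown below. The `load_hourly_data` helper is hypothetical; only the path comes from this README, and no column names are assumed.

```python
from pathlib import Path

import pandas as pd

# Path taken from this README; adjust it if your copy lives elsewhere.
DATA_PATH = Path("seoul hourly/2020_hourly.csv")


def load_hourly_data(path=DATA_PATH):
    """Read the hourly CSV, failing early with a clear message if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at: {path}")
    return pd.read_csv(path)
```

Calling `load_hourly_data()` with no arguments reads the default path; `df.head()` and `df.info()` are then a quick way to confirm the columns and dtypes look as expected.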

## Running the Program

To execute the script, use the following command:

```bash
python <script_name>.py
```

Replace `<script_name>.py` with the actual name of your script.

## Key Functionalities

### 1. Data Preprocessing

- Loads and cleans the dataset.
- Handles missing values and outliers.
- Normalizes numerical features for better model performance.
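
The README does not state which cleaning or scaling methods the script uses, so the following is only one plausible sketch: linear interpolation for gaps, clipping at 1.5 × IQR for outliers, and min-max scaling. The helper names `clean_numeric` and `scale_features` are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def clean_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps and clip outliers in every numeric column (illustrative choices)."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    # Linear interpolation between neighbours; edge gaps take the nearest value.
    out[num_cols] = out[num_cols].interpolate(limit_direction="both")
    # Clip each column to [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
    q1 = out[num_cols].quantile(0.25)
    q3 = out[num_cols].quantile(0.75)
    iqr = q3 - q1
    out[num_cols] = out[num_cols].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)
    return out


def scale_features(train: pd.DataFrame, test: pd.DataFrame):
    """Fit min-max scaling on the training rows only, then apply it to both sets."""
    scaler = MinMaxScaler()
    train_scaled = pd.DataFrame(
        scaler.fit_transform(train), columns=train.columns, index=train.index
    )
    test_scaled = pd.DataFrame(
        scaler.transform(test), columns=test.columns, index=test.index
    )
    return train_scaled, test_scaled, scaler
```

Fitting the scaler on the training rows alone keeps information from the test period out of the model inputs, which matters for time-series evaluation.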

### 2. Feature Engineering

- Extracts relevant statistical features such as:
  - Mean, median, standard deviation, skewness, and kurtosis of bike availability.
- Generates additional time-based features such as:
  - Hour of the day, day of the week, and seasonal indicators.
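
The features above could be derived roughly as follows. This is a sketch under assumptions: the column names `datetime` and `bikes`, the 24-hour window, and the `add_features` helper are all hypothetical, and the statistics are computed over past values only (via `shift(1)`) so that the current target does not leak into its own features.

```python
import pandas as pd


def add_features(df, target_col="bikes", time_col="datetime", window=24):
    """Add time-based and rolling statistical features (hypothetical column names)."""
    out = df.copy()
    ts = pd.to_datetime(out[time_col])

    # Time-based features.
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek      # Monday = 0
    out["season"] = (ts.dt.month % 12) // 3   # 0 = Dec-Feb, 1 = Mar-May, 2 = Jun-Aug, 3 = Sep-Nov

    # Rolling statistics over the previous `window` observations only.
    past = out[target_col].shift(1).rolling(window)
    out[f"{target_col}_mean"] = past.mean()
    out[f"{target_col}_median"] = past.median()
    out[f"{target_col}_std"] = past.std()
    out[f"{target_col}_skew"] = past.skew()
    out[f"{target_col}_kurt"] = past.kurt()

    # The first `window` rows lack a full history and are dropped.
    return out.dropna().reset_index(drop=True)
```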

### 3. Train-Test Split

- Splits data into training and testing sets to evaluate model performance.
- Uses a time-series approach for data partitioning.
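
A time-series split keeps the rows in chronological order and reserves the most recent portion for testing, rather than shuffling. The sketch below assumes the data is already sorted by time; the `chronological_split` helper and the 20% test fraction are illustrative choices, not values taken from the script.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, test_fraction: float = 0.2):
    """Split time-ordered rows into an earlier training set and a later test set."""
    n_test = int(round(len(df) * test_fraction))
    split_at = len(df) - n_test
    # No shuffling: every training row precedes every test row.
    return df.iloc[:split_at], df.iloc[split_at:]
```

For cross-validation with the same ordering guarantee, scikit-learn's `TimeSeriesSplit` is an alternative worth considering.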

### 4. Model Training & Evaluation

- Trains multiple machine learning models.
- Evaluates models using the following metrics:
  - **Root Mean Squared Error (RMSE)**
  - **Mean Absolute Error (MAE)**
  - **Mean Absolute Percentage Error (MAPE)**
  - **R-squared (R²) score**
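
The four metrics can be computed with scikit-learn as sketched below. The `evaluate` helper is hypothetical; `mean_absolute_percentage_error` requires scikit-learn 0.24 or later and returns a fraction, so it is multiplied by 100 here. MAPE becomes unreliable when actual values are at or near zero, which can happen with bike counts.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)


def evaluate(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent), and R-squared for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "MAPE (%)": float(100 * mean_absolute_percentage_error(y_true, y_pred)),
        "R2": float(r2_score(y_true, y_pred)),
    }
```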

### 5. Forecasting

- Predicts bike availability for the next several time steps.
- Provides insights into future demand trends.
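
One common way to produce a multi-step forecast is to turn the series into sliding windows of `n_steps_in` past values and `n_steps_out` future values, then fit a model that predicts all future steps at once. The sketch below is an assumption about how this could work, not the script's actual implementation: `make_windows` and `forecast_next` are hypothetical helpers, and a Random Forest is used because it supports multi-output targets directly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def make_windows(series, n_steps_in, n_steps_out):
    """Build (input window, output window) pairs from a 1-D series."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for start in range(len(series) - n_steps_in - n_steps_out + 1):
        end_in = start + n_steps_in
        X.append(series[start:end_in])
        y.append(series[end_in:end_in + n_steps_out])
    return np.array(X), np.array(y)


def forecast_next(series, n_steps_in=24, n_steps_out=3):
    """Fit on all available windows and predict the next n_steps_out values."""
    series = np.asarray(series, dtype=float)
    X, y = make_windows(series, n_steps_in, n_steps_out)
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X, y)
    # The most recent n_steps_in observations form the input for the forecast.
    last_window = series[-n_steps_in:].reshape(1, -1)
    return np.atleast_1d(model.predict(last_window)[0])
```

Estimators without native multi-output support, such as SVR or AdaBoost, would need a wrapper like scikit-learn's `MultiOutputRegressor` to be used the same way.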

## Output

- Prints the evaluation metrics (RMSE, MAE, MAPE, R²) for each model.
- Prints the forecasted values for the future time steps.
- Optionally plots actual vs. predicted values using matplotlib and seaborn.
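
An actual-vs-predicted plot could look like the sketch below. It uses matplotlib only and is not the script's own visualization code; the `plot_actual_vs_predicted` helper and the axis labels are assumptions.

```python
import matplotlib.pyplot as plt


def plot_actual_vs_predicted(y_true, y_pred, title="Actual vs. predicted bike availability"):
    """Draw both series on one axis and return the figure."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(y_true, label="Actual")
    ax.plot(y_pred, label="Predicted", linestyle="--")
    ax.set_xlabel("Time step")
    ax.set_ylabel("Available bikes")
    ax.set_title(title)
    ax.legend()
    fig.tight_layout()
    return fig
```

Call `plt.show()` afterwards for an interactive window, or `fig.savefig("forecast.png")` to write the plot to disk.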

## Customization

- Modify the `n_steps_in` and `n_steps_out` parameters to adjust the forecasting window.
- Uncomment the visualization section at the end of the script to generate graphical analysis.
- Change hyperparameters of the models to fine-tune performance.

## Notes

- Ensure the dataset is correctly formatted before running the script.
- Data preprocessing steps can be adjusted based on specific dataset characteristics.
- Running deep learning models like MLP may require more computational power.

## Author

**Ganjar Alfian**\
Email: [ganjar.alfian@ugm.ac.id](mailto:ganjar.alfian@ugm.ac.id)

## License

This project is licensed under the MIT License - see the LICENSE file for details.