# README: Bike Sharing Forecasting Program

## Overview

This program implements multiple machine learning models to forecast the number of available bikes in a bike-sharing system for the next several time steps. The goal is to provide accurate short-term predictions to help optimize bike distribution and availability.

The models implemented in this project include:

- **Decision Tree Regressor**
- **Random Forest Regressor**
- **XGBoost Regressor**
- **Linear Regression**
- **Support Vector Regressor (SVR)**
- **K-Nearest Neighbors (KNN)**
- **Multi-Layer Perceptron (MLP)**
- **AdaBoost Regressor**

## Prerequisites

Before running the program, ensure you have Python installed along with the required libraries. You can install all dependencies using the following command:

```bash
pip install numpy pandas scikit-learn matplotlib seaborn xgboost tensorflow keras
```

### Required Libraries:

- `numpy`: For numerical computations.
- `pandas`: For handling datasets.
- `scikit-learn`: For implementing machine learning models.
- `matplotlib` & `seaborn`: For data visualization.
- `xgboost`: For the XGBoost regressor.
- `tensorflow` & `keras`: For deep learning models such as MLP.

## Dataset

The program uses a dataset from the `seoul hourly/2020_hourly.csv` file. This dataset contains hourly records of bike availability in a bike-sharing system, along with other relevant features such as temperature, humidity, and time-related attributes.

Ensure the dataset file is available in the specified path before executing the script.

## Running the Program

To execute the script, use the following command:

```bash
python <script_name>.py
```

Replace `<script_name>.py` with the actual name of your script.

## Key Functionalities

### 1. Data Preprocessing

- Loads and cleans the dataset.
- Handles missing values and outliers.
- Normalizes numerical features for better model performance.

### 2. Feature Engineering

- Extracts relevant statistical features such as:
  - Mean, median, standard deviation, skewness, and kurtosis of bike availability.
- Generates additional time-based features such as:
  - Hour of the day, day of the week, and seasonal indicators.

### 3. Train-Test Split

- Splits data into training and testing sets to evaluate model performance.
- Uses a time-series approach for data partitioning.

### 4. Model Training & Evaluation

- Trains multiple machine learning models.
- Evaluates models using the following metrics:
  - **Root Mean Squared Error (RMSE)**
  - **Mean Absolute Error (MAE)**
  - **Mean Absolute Percentage Error (MAPE)**
  - **R-squared (R²) score**

### 5. Forecasting

- Predicts bike availability for the next several time steps.
- Provides insights into future demand trends.

## Output

- The program prints evaluation metrics (RMSE, MAE, R²) for each model.
- It outputs forecasted values for future time steps.
- Optionally, it visualizes actual vs. predicted values using matplotlib and seaborn.

## Customization

- Modify the `n_steps_in` and `n_steps_out` parameters to adjust the forecasting window.
- Uncomment the visualization section at the end of the script to generate graphical analysis.
- Change hyperparameters of the models to fine-tune performance.

## Notes

- Ensure the dataset is correctly formatted before running the script.
- Data preprocessing steps can be adjusted based on specific dataset characteristics.
- Running deep learning models like MLP may require more computational power.

## Author

**Ganjar Alfian**\
Email: [ganjar.alfian@ugm.ac.id](mailto:ganjar.alfian@ugm.ac.id)

## License

This project is licensed under the MIT License - see the LICENSE file for details.
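
## Example Sketch: Forecast Windows, Time-Series Split, and Metrics

The Key Functionalities and Customization sections mention a time-series train-test split, the `n_steps_in` / `n_steps_out` window parameters, and four evaluation metrics. The sketch below illustrates how these pieces could fit together. It is not taken from the project script: the helper names (`make_windows`, `time_series_split`, `fit_linear`, `predict_linear`, `evaluate`), the 80/20 split, and the synthetic series are all assumptions made for illustration, and a plain least-squares fit in NumPy stands in for the listed models so the example runs without the dataset.

```python
import numpy as np


def make_windows(series, n_steps_in, n_steps_out):
    """Build (X, y) pairs: n_steps_in past values -> n_steps_out future values."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_steps_in - n_steps_out + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window sizes")
    X = np.stack([series[i:i + n_steps_in] for i in range(n_samples)])
    y = np.stack([series[i + n_steps_in:i + n_steps_in + n_steps_out]
                  for i in range(n_samples)])
    return X, y


def time_series_split(X, y, train_fraction=0.8):
    """Chronological split: earliest rows train, latest rows test (no shuffling)."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]


def fit_linear(X, y):
    """Ordinary least squares with an intercept column; returns the weight matrix."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return W


def predict_linear(W, X):
    """Apply the fitted weights to new input windows."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return X1 @ W


def evaluate(y_true, y_pred):
    """RMSE, MAE, MAPE (in percent), and R² pooled over all forecast steps."""
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        # MAPE is undefined when a true value is 0; the synthetic data stays positive.
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100),
        "R2": float(1 - ss_res / ss_tot),
    }


# Synthetic hourly counts with a daily cycle (placeholder for the real dataset).
rng = np.random.default_rng(0)
hours = np.arange(24 * 60)
bikes = 100 + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 3, size=hours.size)

n_steps_in, n_steps_out = 24, 3  # assumed values; adjust the forecasting window here
X, y = make_windows(bikes, n_steps_in, n_steps_out)
X_train, X_test, y_train, y_test = time_series_split(X, y)

W = fit_linear(X_train, y_train)
metrics = evaluate(y_test, predict_linear(W, X_test))
print(metrics)

# Forecast the next n_steps_out values from the most recent n_steps_in observations.
forecast = predict_linear(W, bikes[-n_steps_in:].reshape(1, -1))[0]
print(forecast)
```

Any regressor that accepts a two-dimensional target could presumably replace `fit_linear` and `predict_linear` here. Keeping the split chronological, rather than shuffled, means the test windows always come after the training windows in time.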