# Quantum-inspired Seagull optimised Deep Belief Network approach for cardiovascular disease prediction


## Overview

This project predicts the presence of heart disease using several machine learning algorithms. The data comes from four sources: the Cleveland, Hungarian, Switzerland, and Long Beach VA datasets, which are widely used in cardiovascular research to develop predictive models for heart disease.

## Dataset

The project utilizes the following datasets:
- **Cleveland Data**: `cleveland.data` and `processed.cleveland.data`
- **Hungarian Data**: `hungarian.data` and `processed.hungarian.data`
- **Switzerland Data**: `switzerland.data` and `processed.switzerland.data`
- **Long Beach VA Data**: `long-beach-va.data` and `processed.va.data`

Each record includes attributes such as age, sex, chest pain type, resting blood pressure, cholesterol level, fasting blood sugar, resting ECG results, maximum heart rate achieved, and exercise-induced angina, among others.
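
As a rough illustration of how one of the `processed.*.data` files might be read, the sketch below uses pandas. It assumes the 14-column, comma-separated layout commonly documented for the UCI processed files, with `?` marking missing values. The `load_processed` helper, the `target` column, and the rule that any `num` above 0 counts as disease are assumptions made for this example, not details taken from `Implementation_code.py`.

```python
import io

import pandas as pd

# Assumed column names for the 14-attribute processed files.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def load_processed(source):
    """Read a processed.*.data file (path or file-like object) into a DataFrame."""
    # "?" is treated as a missing value and becomes NaN.
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
    # Assumption: collapse the 0-4 diagnosis code into a binary label.
    df["target"] = (df["num"] > 0).astype(int)
    return df


# Two illustrative rows in the assumed layout (not a substitute for the real files).
sample = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,?,3.0,2\n"
)
demo = load_processed(sample)
print(demo[["age", "ca", "num", "target"]])
```

In practice the same helper could be called with a file path such as `processed.cleveland.data`, and the four resulting frames concatenated with `pd.concat`.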

## Preprocessing

The raw data was preprocessed as follows:
- Removal of duplicate entries.
- Calculation of Body Mass Index (BMI).
- Filtering out unrealistic blood pressure values.
- Splitting the data into training (70%) and test (30%) sets.
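
The steps above can be sketched as follows. This is a minimal example under stated assumptions, not the project's actual code: the `weight_kg` and `height_cm` columns are hypothetical (the README does not say which fields feed the BMI calculation), the blood pressure bounds of 50 to 250 mmHg are illustrative, and stratifying the split on the label is an added choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def preprocess(df, bp_col="trestbps", bp_range=(50, 250)):
    """Deduplicate, add BMI when possible, and drop implausible blood pressure rows."""
    df = df.drop_duplicates().copy()
    # BMI = weight (kg) / height (m)^2, computed only if the hypothetical columns exist.
    if {"weight_kg", "height_cm"}.issubset(df.columns):
        df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2
    low, high = bp_range
    # between() is inclusive; rows with a missing blood pressure are dropped too.
    return df[df[bp_col].between(low, high)]


def split(df, target_col="target", seed=42):
    """Return X_train, X_test, y_train, y_test with a 70/30 split."""
    X = df.drop(columns=[target_col])
    y = df[target_col]
    return train_test_split(X, y, test_size=0.30, random_state=seed, stratify=y)


# Small synthetic frame: 20 valid rows, one duplicate of row 0, one outlier.
demo = pd.DataFrame({
    "trestbps": [120 + i for i in range(20)] + [120, 400],
    "weight_kg": [70.0] * 22,
    "height_cm": [175.0] * 22,
    "target": [i % 2 for i in range(20)] + [0, 1],
})
clean = preprocess(demo)
X_train, X_test, y_train, y_test = split(clean)
print(len(clean), len(X_train), len(X_test))
```

Fixing `random_state` keeps the split reproducible between runs, which matters when comparing models on the same held-out set.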

## Models Used

The following machine learning models were used to predict heart disease:
- **Logistic Regression**
- **Random Forest Classifier**
- **Gradient Boosting Classifier**
- **Support Vector Machines (SVM)**
- **K-Nearest Neighbors (KNN)**
- **Neural Networks (using Keras)**
- **XGBoost**
- **LightGBM**

Additionally, hyperparameter tuning was performed using `hyperopt`.

## How to Run the Code

1. Clone the repository or download the code.
2. Install the required dependencies with pip:
    ```
    pip install -r requirements.txt
    ```
3. Run the implementation script:
    ```
    python Implementation_code.py
    ```

## Dependencies

The project requires the following Python libraries:
- `numpy`
- `pandas`
- `matplotlib`
- `scikit-learn` (imported as `sklearn`)
- `keras`
- `xgboost`
- `lightgbm`
- `hyperopt`

## Results

Each model was evaluated using cross-validation. Ensemble models such as Random Forest and Gradient Boosting outperformed the individual classifiers.
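
A cross-validated comparison of this kind might look like the sketch below. It uses scikit-learn's `cross_val_score` on synthetic data from `make_classification` as a stand-in for the heart disease features. The fold count, the accuracy metric, and the model settings are assumptions for illustration, and the printed scores say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 300 samples, 13 features (an assumed feature count).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

# A subset of the models listed in this README, with illustrative settings.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean accuracy over 5 folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Accuracy is only one option; for a screening task, `scoring="recall"` or `scoring="roc_auc"` may be more informative.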

## Conclusion

This project demonstrates the effectiveness of machine learning models in predicting heart disease. With proper preprocessing and hyperparameter tuning, predictive performance can be improved significantly.

## Acknowledgments

The datasets used in this project are provided by the UCI Machine Learning Repository and have been cited in numerous academic papers related to heart disease prediction.

## Contact

For any questions or issues, please contact [Your Name] at [Your Email].