# Predicting Amyloid Proteins using Attention-Based Long Short-Term Memory

## Introduction

This repository contains the implementation of our attention-based Long Short-Term Memory (LSTM) model for predicting amyloid proteins. The model combines an LSTM network with an attention mechanism to capture patterns in protein sequences that are indicative of amyloid formation. The aim is to improve the accuracy of amyloid protein prediction, which is important for understanding diseases related to amyloid aggregation.

## Requirements

To run the code, ensure that the following dependencies are installed:

```
PyTorch >= 2.0.0
scikit-learn >= 1.4.1
```

You can install the required packages using pip. The version specifiers are quoted so that the shell does not interpret `>` as output redirection:

```sh
pip install "torch>=2.0.0" "scikit-learn>=1.4.1"
```

## Data Processing

The data processing step reads protein sequences from FASTA files located in the `./data` directory. The sequences are encoded and then split into training, validation, and test sets, which prepares the data for model training.

To process the data, run the following command:

```sh
python dataset.py
```

This script generates the data splits used in the model training phase.

## Model Training

The core of the model is implemented in `model.py`, which defines an LSTM network with an attention mechanism that focuses on the relevant parts of the sequence. The `utils.py` file contains functions for metric calculation and for evaluating the model's performance.

To train the model, execute the following command:

```sh
python training.py
```

This starts the training process using the preprocessed data. The script includes configuration for hyperparameters such as the number of epochs, learning rate, and batch size, which you can adjust as needed.

## Implementation Steps

1. **Download the Code and Data:** Obtain the code and data from the Supplemental files.
2. **Install the Required Packages:** Install the dependencies listed in the Requirements section.
3. **Prepare the Data:** Ensure your FASTA files are placed in the `./data` directory. Run `dataset.py` to process and split the data.
4. **Train the Model:** Execute `training.py` to start training the LSTM model. The script saves the trained model and prints performance metrics computed on the test set.
5. **Evaluate the Model:** Assess the model's performance using the saved metrics and compare the results to the benchmarks provided in the paper.
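
To make the data processing step more concrete, the following is a minimal sketch of reading FASTA records, encoding residues as integers, and splitting the result into training, validation, and test sets. It is not the code in `dataset.py`. The function names (`parse_fasta`, `encode_sequence`, `split_dataset`), the 20-letter amino acid alphabet with index 0 reserved for unknown residues, the 80/10/10 split, and the fixed seed are all assumptions made for illustration, and the repository's actual encoding and split ratios may differ.

```python
import random
from typing import Dict, List, Tuple

# Assumption: the 20 standard amino acids; index 0 is reserved for unknown residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX: Dict[str, int] = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may span several lines; collect and join them.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def encode_sequence(sequence: str) -> List[int]:
    """Map each residue to an integer index (0 for anything non-standard)."""
    return [AA_TO_INDEX.get(residue, 0) for residue in sequence]


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Hypothetical two-record FASTA snippet; real input would come from ./data.
    example = ">seq1\nACDE\nFG\n>seq2\nKLMX\n"
    encoded = [encode_sequence(seq) for _, seq in parse_fasta(example)]
    train, val, test = split_dataset(encoded)
    print(len(train), len(val), len(test))
```

In practice each encoded sequence would be paired with its amyloid or non-amyloid label before splitting, and a fixed seed keeps the split reproducible between runs.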