# Predicting adaptability of students in online entrepreneurship education using TabNSA

## Description

This project implements a comprehensive machine learning pipeline for predicting student adaptability levels in online entrepreneurship education using the TabNSA (Tabular Native Sparse Attention) model. The pipeline includes data preprocessing, model training, and evaluation components designed to handle tabular data with categorical features efficiently.

The TabNSA model combines Native Sparse Attention mechanisms with TabMixer modules to capture both local feature dependencies and global feature interactions in educational datasets. Here it is applied to student characteristics to predict how well students adapt to online learning environments.

## Code Information

```
python_code/
├── README.md                                    # Project documentation
├── requirements.txt                             # Python dependencies
├── students_adaptability_level_online_education.csv  # Original dataset
├── data_prep.py                                # Data preprocessing and splitting
├── tabnsa_model.py                             # TabNSA model implementation
├── model_train.py                              # Model training pipeline
├── evaluate.py                                 # Model evaluation and metrics
└── output/                                     # Generated during training/evaluation
    ├── best_model.pth                          # Best model checkpoint
    ├── model.pth                               # Final trained model
    ├── history.json                            # Training history
    ├── test_metrics.json                       # Evaluation metrics
    ├── confusion_matrix.png                    # Confusion matrix visualization
    └── classification_report.txt              # Detailed classification report
```

## Dataset Description

### Overview
The dataset contains information about students' adaptability levels in online entrepreneurship education. It was collected to understand how various student characteristics and environmental factors influence their ability to adapt to online learning platforms.

### Dataset Details
- **Source**: `students_adaptability_level_online_education.csv`
- **Total Samples**: 1,206 original samples → 1,205 after cleaning (removed duplicates)
- **Features**: 13 categorical input features (14 columns including the target)
- **Target Variable**: `Adaptivity Level` (3 classes: Low, Moderate, High)
- **Data Type**: Tabular data with mixed categorical features

### Feature Description
| Feature | Description | Categories |
|---------|-------------|------------|
| Gender | Student's gender | Boy, Girl |
| Age | Age group | 11-15, 16-20, 21-25 |
| Education Level | Current education level | School, College, University |
| Institution Type | Type of educational institution | Government, Non Government |
| IT Student | Whether student is studying IT | Yes, No |
| Location | Urban/Rural location | Yes (Urban), No (Rural) |
| Load-shedding | Frequency of power outages | Low, High |
| Financial Condition | Economic status | Poor, Mid |
| Internet Type | Type of internet connection | Wifi, Mobile Data |
| Network Type | Mobile network generation | 3G, 4G |
| Class Duration | Duration of online classes | 0, 1-3, 3-6 |
| Self Lms | Self-learning management system usage | Yes, No |
| Device | Device used for online learning | Mobile, Tab, Computer |
| Adaptivity Level | Target variable - adaptability level | Low, Moderate, High |

### Data Characteristics
- **Missing Values**: None (complete dataset)
- **Class Distribution**: Balanced across adaptability levels
- **Preprocessing**: One-hot encoding applied to all categorical features
- **Train/Validation/Test Split**: 80/10/10 (stratified)
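
The stratified 80/10/10 split can be illustrated with a dependency-free sketch. This is not the code in `data_prep.py`, which may rely on a library splitter instead; the function name `stratified_split`, the seed, and the toy label counts are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Return (train_idx, val_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return sorted(train), sorted(val), sorted(test)


# Toy labels, made up for illustration (not the real class counts).
labels = ["Low"] * 50 + ["Moderate"] * 60 + ["High"] * 10
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting within each class keeps rare classes represented in the validation and test sets, which a plain random split does not guarantee.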


## Requirements
Create a virtual environment (optional) and install requirements:

```bash
pip install -r requirements.txt
```

## Usage Instructions
1) Preprocess and split (stratified 80/10/10)
```bash
python data_prep.py \
  --data_csv "../students_adaptability_level_online_education.csv"
```
Outputs under `./preprocessed_data/`:
- `train.csv`, `val.csv`, `test.csv` (cleaned rows, not yet encoded)
- `preprocessor.joblib` (OneHotEncoder pipeline for features)
- `label_encoder.joblib` (LabelEncoder for target)
- `feature_names.json` (expanded feature names post-encoding)
- `preprocessing_info.json` (dataset statistics and metadata)
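
To show how one-hot encoding turns categorical columns into the expanded feature names, here is a minimal pure-Python sketch. It is not the fitted `OneHotEncoder` pipeline stored in `preprocessor.joblib`; the helper names and the `Column_Category` naming scheme are assumptions, and the three rows are toy values taken from the feature table.

```python
def fit_categories(rows, columns):
    """Collect the sorted set of categories seen in each column."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def expanded_feature_names(categories):
    """One output name per (column, category) pair."""
    return [f"{col}_{cat}" for col, cats in categories.items() for cat in cats]


def one_hot(row, categories):
    """Encode one row as a flat list of 0.0/1.0 indicators."""
    return [1.0 if row[col] == cat else 0.0
            for col, cats in categories.items() for cat in cats]


# Toy rows for illustration only.
rows = [
    {"Gender": "Boy", "Device": "Mobile"},
    {"Gender": "Girl", "Device": "Computer"},
    {"Gender": "Girl", "Device": "Tab"},
]
categories = fit_categories(rows, ["Gender", "Device"])
names = expanded_feature_names(categories)
encoded = [one_hot(row, categories) for row in rows]
```

Each column contributes as many indicator features as it has categories, so the encoded width equals the total number of categories across all columns.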

2) Train TabNSA model
```bash
python model_train.py \
  --artifacts_dir "./preprocessed_data" \
  --epochs 100 \
  --batch_size 32 \
  --learning_rate 0.001
```
Outputs under `./output/`:
- `best_model.pth` (best model during training)
- `model.pth` (final trained model)
- `history.json` (training history and model config)

3) Evaluate on test set
```bash
python evaluate.py \
  --artifacts_dir "./preprocessed_data"
```
Outputs under `./output/`:
- `test_metrics.json` (accuracy, plus precision, recall, and F1 for each averaging mode)
- `confusion_matrix.png`
- `classification_report.txt`
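
For readers unfamiliar with averaged metrics, the sketch below computes per-class precision, recall, and F1 and then averages them. It assumes the averaging modes are macro and weighted; which modes `evaluate.py` actually reports is not stated here. The function names and the toy predictions are illustrative.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {class: (precision, recall, f1)} from paired label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def averaged(scores, y_true, mode="macro"):
    """Average (precision, recall, f1) over classes, equally or by support."""
    support = Counter(y_true)
    if mode == "macro":
        weights = {c: 1 / len(scores) for c in scores}
    else:  # "weighted": weight each class by its share of true samples
        weights = {c: support[c] / len(y_true) for c in scores}
    return tuple(sum(weights[c] * scores[c][i] for c in scores) for i in range(3))


# Toy predictions for illustration only.
y_true = ["Low", "Low", "Moderate", "Moderate", "High", "High"]
y_pred = ["Low", "Moderate", "Moderate", "Moderate", "High", "Low"]
scores = per_class_scores(y_true, y_pred)
macro = averaged(scores, y_true, mode="macro")
weighted = averaged(scores, y_true, mode="weighted")
```

Macro averaging treats every class equally, while weighted averaging follows class frequency, so the two diverge when classes are imbalanced.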

## TabNSA Model Architecture

The TabNSA (Tabular Native Sparse Attention) model is a hybrid deep learning framework that combines:

### 1. **Native Sparse Attention (NSA)**
- **Sliding-window local attention**: Captures dependencies among neighboring features
- **Block-wise compression**: Aggregates non-overlapping feature blocks
- **Block selection**: Selects most informative blocks for processing
- **Linear complexity**: Scales efficiently for high-dimensional tabular data
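
The three NSA branches can be illustrated on a toy sequence of feature tokens. This is a simplified pure-Python sketch, not the implementation in `tabnsa_model.py`: it assumes a symmetric (non-causal) window, uses mean pooling as a stand-in for whatever learned compression the model applies, and scores blocks with a plain dot product. All function names are hypothetical.

```python
def sliding_window_mask(num_tokens, window_size):
    """mask[i][j] is True when token j lies within window_size // 2 of token i."""
    half = window_size // 2
    return [[abs(i - j) <= half for j in range(num_tokens)]
            for i in range(num_tokens)]


def compress_blocks(tokens, block_size):
    """Mean-pool non-overlapping blocks of token vectors into one vector each."""
    blocks = [tokens[s:s + block_size] for s in range(0, len(tokens), block_size)]
    return [[sum(col) / len(block) for col in zip(*block)] for block in blocks]


def select_blocks(query, compressed, top_k):
    """Indices of the top_k compressed blocks by dot-product score with query."""
    scores = [sum(q * c for q, c in zip(query, block)) for block in compressed]
    ranked = sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:top_k])


# Eight toy tokens with two channels each (illustrative values).
tokens = [[float(i), 1.0] for i in range(8)]
mask = sliding_window_mask(num_tokens=8, window_size=4)
compressed = compress_blocks(tokens, block_size=4)
selected = select_blocks(query=[1.0, 0.0], compressed=compressed, top_k=1)
```

Each token attends to a bounded window plus a small number of selected blocks rather than to every other token, which is where the efficiency gain over dense attention comes from.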

### 2. **TabMixer Module**
- **Channel-wise MLP**: Processes each feature embedding independently
- **Token-wise MLP**: Captures interactions between different features
- **Non-linear activations**: ReLU functions for enhanced representational capacity
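
The difference between the two mixing directions is easiest to see on a small matrix of shape (features × embedding). The sketch below uses a single linear map plus ReLU in place of each MLP, which is a simplification; the weights, helper names, and toy values are assumptions for illustration.

```python
def relu(x):
    return max(0.0, x)


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def channel_mix(tokens, weight):
    """Mix along the embedding axis: each feature token is transformed alone.

    tokens: F x D, weight: D x D.
    """
    return [[relu(v) for v in row] for row in matmul(tokens, weight)]


def token_mix(tokens, weight):
    """Mix along the feature axis: every output token blends all input tokens.

    tokens: F x D, weight: F x F.
    """
    return [[relu(v) for v in row] for row in matmul(weight, tokens)]


# Three feature tokens with two channels each (toy values).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
averaging = [[1 / 3] * 3 for _ in range(3)]

per_token = channel_mix(tokens, identity)    # rows stay independent
cross_token = token_mix(tokens, averaging)   # each row becomes the column mean
```

With an averaging weight, every output row of `token_mix` depends on all three input tokens, whereas `channel_mix` never lets one feature's embedding influence another's.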

### 3. **Feature Processing Pipeline**
- **Feature embedding**: Projects each feature to D-dimensional space
- **Multi-layer processing**: Stacked NSA + TabMixer layers
- **Fusion mechanism**: Element-wise summation of NSA and TabMixer outputs
- **Global pooling**: Mean pooling across feature dimension
- **Classification head**: Two-layer MLP with GELU activations
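
The fusion and pooling steps reduce to simple shape arithmetic, sketched here with nested lists. The two inputs stand in for the NSA and TabMixer outputs of one layer; the values and function names are made up for illustration.

```python
def fuse(nsa_out, mixer_out):
    """Element-wise sum of two (features x embedding) matrices."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(nsa_out, mixer_out)]


def mean_pool(tokens):
    """Average over the feature axis, leaving one embedding vector per sample."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


# Two feature tokens with two channels each (toy values).
nsa_out = [[1.0, 2.0], [3.0, 4.0]]
mixer_out = [[0.5, 0.5], [1.5, 1.5]]

fused = fuse(nsa_out, mixer_out)   # still features x embedding
pooled = mean_pool(fused)          # single vector passed to the classifier
```

Because the sum requires matching shapes, both branches must emit the same number of tokens and the same embedding width.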

## Model Hyperparameters

Default configuration:
- `embed_dim`: 128 (embedding dimension)
- `num_heads`: 8 (attention heads)
- `num_layers`: 3 (number of NSA+TabMixer layers)
- `window_size`: 8 (sliding window size for local attention)
- `block_size`: 4 (block size for compression)
- `mlp_ratio`: 4.0 (MLP expansion ratio)
- `dropout`: 0.1 (dropout rate)
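
One way to keep these defaults together is a small configuration object, sketched below. The class name `TabNSAConfig`, the derived properties, and the divisibility check are assumptions (the check reflects the usual multi-head attention convention of splitting the embedding evenly across heads); the actual code may store its configuration differently.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TabNSAConfig:
    """Hypothetical container for the default hyperparameters listed above."""
    embed_dim: int = 128
    num_heads: int = 8
    num_layers: int = 3
    window_size: int = 8
    block_size: int = 4
    mlp_ratio: float = 4.0
    dropout: float = 0.1

    def __post_init__(self):
        # Assumed constraint: heads split the embedding evenly.
        if self.embed_dim % self.num_heads != 0:
            raise ValueError("embed_dim must be divisible by num_heads")

    @property
    def head_dim(self):
        """Embedding width handled by each attention head."""
        return self.embed_dim // self.num_heads

    @property
    def mlp_hidden_dim(self):
        """Hidden width of an MLP that expands by mlp_ratio."""
        return int(self.embed_dim * self.mlp_ratio)


config = TabNSAConfig()
config_dict = asdict(config)  # plain dict, convenient for writing to JSON
```

With the defaults, each head works on 16 dimensions and an MLP expanded by `mlp_ratio` has a hidden width of 512.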

## Training Features
- **Class Weighting**: Handles class imbalance using sklearn's balanced class weights
- **Early Stopping**: Stops training when validation loss doesn't improve for 10 epochs
- **Learning Rate Scheduling**: Reduces LR by factor of 0.5 when validation loss plateaus
- **Weight Decay**: AdamW optimizer with decoupled weight decay
- **Device Support**: Automatically uses CUDA if available, falls back to CPU
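
The class weighting and the two patience-based rules can be replayed without any training framework. In the sketch below, the class-weight formula is the one scikit-learn documents for its `balanced` mode; the scheduler patience of 5 epochs is an assumption, since only the early-stopping patience (10) and the reduction factor (0.5) are specified here. Function names and the loss sequence are illustrative.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Balanced weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}


def run_schedule(val_losses, lr=0.001, stop_patience=10, lr_patience=5, factor=0.5):
    """Replay validation losses; return (epochs_run, final_lr, best_loss)."""
    best = float("inf")
    since_best = 0     # epochs since the best loss, drives early stopping
    since_lr_cut = 0   # bad epochs since the last improvement or LR cut
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best, since_lr_cut = loss, 0, 0
            continue
        since_best += 1
        since_lr_cut += 1
        if since_lr_cut > lr_patience:   # plateau: shrink the learning rate
            lr *= factor
            since_lr_cut = 0
        if since_best >= stop_patience:  # no improvement for too long: stop
            return epoch, lr, best
    return len(val_losses), lr, best


# Toy inputs for illustration only.
weights = balanced_class_weights(["Low"] * 4 + ["Moderate"] * 5 + ["High"] * 1)
epochs_run, final_lr, best_loss = run_schedule([1.0, 0.8] + [0.9] * 20)
```

Rare classes receive proportionally larger weights, and the learning rate is cut before early stopping triggers because the scheduler patience is shorter than the stopping patience.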

## Citation
- Dataset DOI: 10.1109/icccnt51525.2021.9579741
- TabNSA DOI: 10.48550/ARXIV.2503.09850

## License
This code is licensed under the MIT License.
