README

1. Folder Overview

This folder contains the following:
- ML_IoT_Device_ID.Rproj: The RStudio project file for this experiment.
- 5 R scripts: These scripts store the code required for the experiment.
- 14 RData files: These files store the experimental data.
- Two datasets:
  (1) IotDataset.csv
  (2) IotSentinal.csv

2. Prerequisites

Before running ML_IoT_Device_ID.Rproj, ensure that the following software is installed:
- R Programming Language
  Download link: https://www.r-project.org
- RStudio
  Description: RStudio is an integrated development environment (IDE) for R, designed to manage and execute R projects, including .Rproj files.
  Download link: https://posit.co/products/open-source/rstudio/

3. Code Overview

The experimental code is divided into 5 parts:

(1) experiment.R

Purpose: Executes the experiment, stores results, and visualizes outcomes.

experiment.R is the experiment execution script. It conducts a series of experiments on two IoT datasets ('IotDataset.csv' and 'IotSentinal.csv') using various feature selection methods and classification algorithms. The primary goal is to evaluate the performance of different classifiers (SVM, Neural Networks, Decision Trees, Random Forest) on feature sets derived from the IoT datasets using different feature selection techniques:
  a. Pearson Correlation Coefficient (PCC)
  b. Mutual Information (MI)
  c. Binary Grey Wolf Optimizer (BGWO)
  d. Binary Genetic Algorithm (BGA)
  e. Binary Particle Swarm Optimization (BPSO)

The experiment follows these key steps:
- Step 1: Loading and initializing the datasets.
- Step 2: Performing feature selection using the aforementioned techniques (PCC, MI, BGWO, BGA, BPSO).
- Step 3: Training and evaluating classifiers on the selected feature sets.
- Step 4: Repeating the experiments with multiple random seeds to make the results statistically more reliable.
- Step 5: Storing results for comparison and further analysis.
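
The five steps above can be sketched in a few lines of R. This is only an illustrative outline, not the code in experiment.R: the built-in iris data stands in for IotDataset.csv, the helpers select_by_pearson and evaluate_tree are hypothetical names, the 0.9 threshold and 70/30 split are arbitrary choices, and a decision tree from the rpart package (normally bundled with R) stands in for the project's classifiers.

```r
# Illustrative sketch only; names, threshold, and data are assumptions.
library(rpart)  # decision trees; a recommended package bundled with R

# Step 1: load a dataset (iris stands in for IotDataset.csv).
data <- iris
label_col <- "Species"
features <- setdiff(names(data), label_col)

# Step 2: Pearson-based selection. For every pair of features whose
# absolute correlation exceeds the threshold, drop the later one.
select_by_pearson <- function(x, threshold) {
  corr <- abs(cor(x))
  keep <- colnames(x)
  for (i in seq_len(ncol(x))) {
    for (j in seq_len(i - 1)) {
      both_kept <- all(colnames(x)[c(i, j)] %in% keep)
      if (both_kept && corr[i, j] > threshold) {
        keep <- setdiff(keep, colnames(x)[i])
      }
    }
  }
  keep
}

# Step 3: train a classifier on the selected features and return the
# accuracy on a held-out 30% split.
evaluate_tree <- function(data, selected, label_col, seed) {
  set.seed(seed)
  train_idx <- sample(nrow(data), size = floor(0.7 * nrow(data)))
  model <- rpart(reformulate(selected, response = label_col),
                 data = data[train_idx, ], method = "class")
  pred <- predict(model, newdata = data[-train_idx, ], type = "class")
  mean(pred == data[-train_idx, label_col])
}

# Step 4: repeat the evaluation with several random seeds.
seeds <- 1:5
selected <- select_by_pearson(data[, features], threshold = 0.9)
results <- data.frame(
  seed = seeds,
  n_features = length(selected),
  accuracy = vapply(seeds,
                    function(s) evaluate_tree(data, selected, label_col, s),
                    numeric(1))
)

# Step 5: store the results for later comparison.
result_file <- file.path(tempdir(), "pcc_dt_results.RData")
save(results, file = result_file)
```

Calling load(result_file) in a later session restores the results data frame, which is the same mechanism that lets stored RData files be compared without rerunning an experiment.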

The script is organized into the following sections:
- Section 1: Full Feature Set Evaluation: Evaluates classifiers using all available features.
- Section 2: Pearson Correlation Coefficient (PCC): Features are selected based on correlation thresholds.
- Section 3: Mutual Information (MI): Features are selected based on mutual information thresholds.
- Section 4: Binary Grey Wolf Optimizer (BGWO): Optimized feature selection and classification.
- Section 5: Binary Genetic Algorithm (BGA): Optimized feature selection and classification.
- Section 6: Binary Particle Swarm Optimization (BPSO): Optimized feature selection and classification.

Results are saved at each stage, enabling easy comparison between different feature selection methods and classifiers.

(2) featureSelectionMethods.R

Purpose: Contains all the feature selection algorithm functions.

Functions:
- pearsonCorrelation: Performs feature selection using the Pearson correlation coefficient to remove highly correlated features and evaluates the resulting feature subset with a classifier.
- mutualInfo: Performs feature selection using Mutual Information (MI) to evaluate the relationship between features and the target variable. Selects features with MI above a given threshold and evaluates a classifier's performance.
- BGWO: Implements Binary Grey Wolf Optimizer for feature selection using a fitness function to find the optimal subset of features.
- BGA: Implements Binary Genetic Algorithm for feature selection by evolving a population of feature subsets through crossover and mutation based on a fitness function.
- BPSO: Implements Binary Particle Swarm Optimization for feature selection. Updates particle positions based on their best-known positions and the global best in the swarm until optimization completes.

(3) classifier.R

Purpose: Stores all supervised machine learning algorithms used for IoT device classification, including SVM, Neural Networks, Decision Trees, and Random Forests.

Functions:
- svm/nn/dt/rf_fitnessFunction1/2: Each function evaluates the performance of one classifier (SVM, Neural Network, Decision Tree, or Random Forest) on a specific dataset.
- fitnessFunction: Calculates the fitness value for a model based on prediction performance and feature selection.
- feature_set_summary: Summarizes the results of the feature selection algorithm.
- rf_classifier_d/s: Random Forest classifier functions that return a confusion matrix.

(4) drawGraph.R

Purpose: Contains functions for visualizing experimental results.

Functions:
- drawPearsonResult: Visualizes the results of feature selection based on Pearson Correlation Coefficient (PCC). It shows the number of selected features at various correlation thresholds as a bar chart and overlays the classification accuracy (by classifier type) to illustrate the relationship between accuracy and the number of selected features.
- drawMutualInfoResult: Visualizes the results of feature selection based on Mutual Information (MI). It uses a bar chart to display the number of features selected at various MI thresholds and combines it with accuracy data to show the relationship between the number of features selected and classification accuracy.
- drawResult: Visualizes the feature selection and classification result of each repeated experiment. It plots a bar chart and a line graph to show the number of selected features and the corresponding accuracy. The x-axis represents the experiment number, and the y-axis shows the number of features selected, with accuracy values annotated.
- drawResultSumNew: Summarizes and visualizes all experimental results, displaying the number of features selected and the classification accuracy for different feature selection methods. It generates a boxplot for each classifier and feature selection method, showing the distribution of accuracy and of the number of features selected.
- drawConfusionMatrix: Visualizes the confusion matrix of a classifier, which compares the predicted labels to the real labels. The matrix uses color gradients and numeric labels to provide an intuitive view of the classifier's performance.

(5) otherFunction.R

Purpose: Includes miscellaneous functions used in the experiment, such as data calculation and format conversion.
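
As an illustration of the kind of confusion matrix that rf_classifier_d/s returns and drawConfusionMatrix plots, the sketch below builds one with base R and draws it with a color gradient and numeric labels. It is a minimal example under stated assumptions, not the project's implementation: the device labels are made up, draw_confusion_matrix is a hypothetical helper, and the color palette is an arbitrary choice.

```r
# Illustrative sketch only; labels and helper name are assumptions.

# Hypothetical real and predicted device labels for six samples.
actual <- factor(c("camera", "camera", "plug", "plug", "hub", "hub"),
                 levels = c("camera", "plug", "hub"))
predicted <- factor(c("camera", "plug", "plug", "plug", "hub", "camera"),
                    levels = levels(actual))

# Confusion matrix: rows are predicted labels, columns are real labels.
cm <- table(Predicted = predicted, Real = actual)

# Overall accuracy is the share of samples on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)

# Draw the matrix as a colored grid with the count printed in each cell.
draw_confusion_matrix <- function(cm) {
  classes <- rownames(cm)
  n <- length(classes)
  counts <- t(unclass(cm))  # counts[i, j]: real class i, predicted class j
  image(seq_len(n), seq_len(n), counts,
        col = rev(heat.colors(12)), axes = FALSE,
        xlab = "Real label", ylab = "Predicted label")
  axis(1, at = seq_len(n), labels = classes)
  axis(2, at = seq_len(n), labels = classes)
  text(as.vector(row(counts)), as.vector(col(counts)),
       labels = as.vector(counts))
}
```

Running draw_confusion_matrix(cm) in an RStudio session draws the grid in the Plots pane; cells off the diagonal mark devices that were confused with one another.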