We used the UCF101 data set, an action recognition data set of realistic action videos collected from YouTube, covering 101 action categories. It is an extension of the UCF50 data set, which has 50 action categories. The data set is freely available at https://www.crcv.ucf.edu/data/UCF101.php. The permanent link to the IEMOCAP data set is https://www.kaggle.com/datasets/samuelsamsudinng/iemocap-emotion-speech-database?resource=download