Subject independent emotion recognition using EEG and physiological signals – a comparative study

Manju Priya Arthanarisamy Ramaswamy (Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India)
Suja Palaniswamy (Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India)

Applied Computing and Informatics

ISSN: 2634-1964

Article publication date: 29 September 2022

Abstract

Purpose

The aim of this study is to investigate the subject independent emotion recognition capabilities of EEG and peripheral physiological signals, namely electrooculogram (EOG), electromyography (EMG), electrodermal activity (EDA), temperature, plethysmograph and respiration. The experiments are conducted on both modalities independently and in combination. This study ranks the physiological signals based on the prediction accuracy obtained on test data using time and frequency domain features.

Design/methodology/approach

The DEAP dataset is used in this experiment. Time and frequency domain features of EEG and physiological signals are extracted, followed by correlation-based feature selection. Classifiers, namely Naïve Bayes, logistic regression, linear discriminant analysis, quadratic discriminant analysis, logit boost and stacking, are trained on the selected features. Based on the performance of the classifiers on the test set, the best modality for each dimension of emotion is identified.

Findings

The experimental results with EEG as one modality and all physiological signals as another modality indicate that EEG signals are better at arousal prediction than physiological signals by 7.18%, while physiological signals are better at valence prediction than EEG signals by 3.51%. The valence prediction accuracy of EOG is superior to zygomaticus electromyography (zEMG) and EDA by 1.75%, at the cost of a higher number of electrodes. This paper concludes that valence can be measured from the eyes (EOG) while arousal can be measured from changes in blood volume (plethysmograph). The sorted order of physiological signals based on arousal prediction accuracy is plethysmograph, EOG (hEOG + vEOG), vEOG, hEOG, zEMG, tEMG, temperature, EMG (tEMG + zEMG), respiration and EDA, while based on valence prediction accuracy the sorted order is EOG (hEOG + vEOG), EDA, zEMG, hEOG, respiration, tEMG, vEOG, EMG (tEMG + zEMG), temperature and plethysmograph.

Originality/value

Many of the emotion recognition studies in the literature are subject dependent, and the limited subject independent emotion recognition studies report the average of leave-one-subject-out (LOSO) validation results as accuracy. The work reported in this paper sets the baseline for subject independent emotion recognition using the DEAP dataset by clearly specifying the subjects used in the training and test sets. In addition, this work specifies the cut-off score used to classify the scale as low or high in the arousal and valence dimensions. Generally, statistical features are used for emotion recognition with physiological signals as a modality, whereas in this work, time and frequency domain features of physiological signals and EEG are used. This paper concludes that valence can be identified from EOG while arousal can be predicted from the plethysmograph.

Citation

Arthanarisamy Ramaswamy, M.P. and Palaniswamy, S. (2022), "Subject independent emotion recognition using EEG and physiological signals – a comparative study", Applied Computing and Informatics, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/ACI-03-2022-0080

Publisher: Emerald Publishing Limited

Copyright © 2022, Manju Priya Arthanarisamy Ramaswamy and Suja Palaniswamy

License

Published in Applied Computing and Informatics. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode


1. Introduction

Subject independent emotion recognition using single or multiple modalities is a burgeoning area of research in affective computing. Emotion recognition (ER) plays a vital role in human–computer interaction (HCI), as it tries to make HCI similar to human–human interaction (HHI) by incorporating ER and emotion expression capabilities in machines. The distinguishing feature between HCI and HHI is the ER and emotion expression capability of humans.

Humans recognize others’ emotions in day-to-day life via facial expressions and contextual information. Emotions serve as evolved communication and hence should evoke behaviors that reveal the subject’s emotional state to others [1]. The emotional state of a person can be inferred from facial, vocal and whole-body behavior, as well as from observer ratings. James’s emotion theory [2] states that emotional response can be measured using peripheral physiological signals. Some of the peripheral physiological signals used in ER are electrodermal activity (EDA), cardiovascular activity and respiration activity. Cannon’s emotion theory [3] suggests that emotions are derived from subcortical centers, and this led to the study of emotional responses in central nervous system (CNS) signals using EEG, neuroimaging techniques and the electrooculogram (EOG).

Subject dependent unimodal and multimodal ER provides considerable accuracy, while subject independent ER needs improvement. One obstacle to establishing baselines for subject independent ER models is the non-availability of subject independent test sets for the publicly available multimodal ER datasets. Many of the subject independent ER studies in the literature report the average of leave-one-subject-out (LOSO) validation scores as the final accuracy. In this work, the test subjects used for validation of the model are specified explicitly so that any future work can use these model scores as a baseline.

This research explores the subject independent ER capabilities of time and frequency domain features of EEG and peripheral physiological signals, namely EOG, EMG, EDA, temperature, plethysmograph and respiration, both independently and in combination, on the DEAP dataset in the arousal and valence dimensions using the classifiers Naïve Bayes, logistic regression, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logit boost and stacking. Through this research work, it is found that, from an ergonomic perspective, valence can be measured from the eyes while arousal can be measured from changes in blood volume. The model scores of this work can be used as a baseline for future work, as this work reports results on a truly subject independent test set.

2. Related works

The recent advances in multimodal ER are in the areas of feature extraction, feature selection, modeling and fusion strategies. Multimodal ER involves three important aspects: extracting shared representations from multiple modalities, removing redundant features and learning key features from each modality. To address all three aspects, a multimodal deep belief network (MDBN) was investigated [4]. Recent studies in the literature used global average pooling [5], a deep belief network (DBN) [6] and a multi-hypergraph neural network [7] to investigate the correlation among features in multimodal ER. The optimal combination of features plays a significant role in multimodal ER and was studied using a multi-kernel learning approach [8] and a deep learning based hierarchical feature fusion approach [9]. Recent studies have also explored the significance of individual features in ER [10].

The body of work in the literature has explored the feature extraction ability of deep learning networks for end-to-end ER architectures, whose performance was determined by the strength of the input signals [11]. Deep learning architectures such as the ensemble convolutional neural network (ECNN) [5], DBN [6], Inception ResNet v2 [12], spiking neural networks (SNN) [13], autoencoders [14], the hierarchy modular neural network (HMNN) [15], MDBN [16], transfer learning [17], transformer-based architectures using CNN [17] and the high resolution network (HRNet) [18] have been explored for ER.

Decision level fusion versus feature level fusion is a long-standing contention in the field of multimodal ER. Decision level fusion improves accuracy by 5% in comparison to unimodal accuracy [19], whereas feature level fusion provides ER accuracy comparable to decision level fusion with less computation time [10]. Some of the literature [10, 12, 19] reported the LOSO validation score as final accuracy, which is a limitation in subject independent ER.

The work reported in this paper sets the baseline for subject independent ER using the DEAP dataset by clearly specifying the subjects used in the training and test sets. In addition, this work specifies the cut-off score used to classify the scale as low or high in the arousal and valence dimensions. Generally, statistical features are used for ER with physiological signals as a modality, whereas in this work, time and frequency domain features of physiological signals and EEG are used. The experiment is conducted on both modalities independently and in combination. This work ranks the physiological signals based on the prediction accuracy obtained on test data using time and frequency domain features.

3. Materials and methods

The DEAP dataset is used to compare the prediction ability of time and frequency domain features of EEG and physiological signals over a similar set of classifiers and to rank the physiological signals. In this experiment, two ensemble classifiers (logit boost and stacking) and two statistical classifiers (Naïve Bayes and QDA) are used. All four classifiers are applied to EEG and physiological signals, both independently and in combination. The feature selection and training of classifiers are performed using the Weka software [20]. The proposed methods for arousal and valence prediction in multimodal and unimodal environments are shown in Figure 1.

3.1 DEAP dataset description

The DEAP [21] dataset has EEG and peripheral physiological signal recordings of 32 participants (16 of each gender). The signals were recorded while the participants watched one-minute music videos. Each participant watched 40 music videos and rated the valence, arousal, dominance and liking of each video. For each trial, 32 channels of EEG signals and 12 channels of peripheral signals were recorded using the Biosemi ActiveTwo system at 512 Hz.
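For readers who wish to reproduce this setup, a minimal loading sketch is given below. It assumes the preprocessed Python distribution of DEAP (per-participant pickle files such as s01.dat, each holding a data array of 40 trials × 40 channels × 8064 samples at 128 Hz and a labels array of 40 × 4 ratings); the file name and channel grouping are illustrative.

```python
import pickle
import numpy as np

# Sketch: load one participant from the preprocessed Python distribution of DEAP.
# File name and array shapes follow the DEAP documentation and are assumptions here.
with open("s01.dat", "rb") as f:
    subject = pickle.load(f, encoding="latin1")

data = subject["data"]      # (40 trials, 40 channels, 8064 samples at 128 Hz)
labels = subject["labels"]  # (40 trials, 4): valence, arousal, dominance, liking

eeg = data[:, :32, :]         # first 32 channels: EEG
peripheral = data[:, 32:, :]  # remaining channels: EOG, EMG, EDA, respiration, plethysmograph, temperature
print(eeg.shape, peripheral.shape, labels.shape)
```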

3.2 Evaluation measures

The evaluation metrics used to compare the different models in this experiment are accuracy and F1-score. Additional metrics, namely ROC area and kappa statistic, are reported for the proposed methods.

3.2.1 Accuracy

Accuracy is the proportion of correctly classified instances. The accuracy percentage ranges from 0 to 100, where 100 is the best possible value; accuracy is computed as shown in equation (1).

(1) $\mathrm{Accuracy} = \dfrac{\mathrm{TruePositive} + \mathrm{TrueNegative}}{\mathrm{TruePositive} + \mathrm{FalsePositive} + \mathrm{TrueNegative} + \mathrm{FalseNegative}}$

3.2.2 F1-score

The F1-score is the harmonic mean of precision and recall. It ranges from 0 to 1, where 0 is the worst possible score and 1 is the best possible score; it is computed as shown in equation (2). The F1-score gives a better measure of the incorrectly classified instances than accuracy.

(2) $\mathrm{F1\text{-}score} = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$

3.2.3 ROC area

The area under the ROC curve measures the ability of the binary classifier to distinguish between classes. The value ranges from 0 to 1, where 1 implies the classifier is able to perfectly distinguish between the classes.

3.2.4 Cohen’s kappa

Cohen’s kappa values range from −1 to 1, where 1 implies perfect agreement between the model’s predictions and the true labels. A kappa value of 0 indicates that the model is no better than a chance classifier.
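All four measures can be computed directly from the confusion-matrix counts reported later in Table 6. The sketch below is illustrative, not the Weka implementation (Weka reports class-weighted precision, recall and F1, so those values may differ slightly from the single-class figures computed here); it reproduces the accuracy and kappa values of the EEG arousal model.

```python
def binary_metrics(tp, fp, tn, fn):
    """Evaluation measures of Section 3.2 computed from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total                               # equation (1), as a fraction
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0  # equation (2)
    # Cohen's kappa: observed agreement versus agreement expected by chance.
    p_obs = accuracy
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa}

# Counts of the EEG arousal model (Table 6): accuracy 0.56 and kappa of roughly 0.121.
print(binary_metrics(tp=114, fp=95, tn=110, fn=81))
```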

3.3 Training and test dataset split

The dataset is split into subject independent training and test sets in the ratio 70:30. The data of 22 participants is used as the training set, while the data of the remaining 10 participants is used as the test set. Subjects s02, s04, s05, s09, s15, s20, s23, s28, s29 and s30 form the test set, while the rest of the subjects form the training set.
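A sketch of this subject-wise split (subject identifiers as listed above; variable names are illustrative):

```python
# Subject independent 70:30 split used in this work: the ten listed participants
# are held out for testing, the remaining 22 form the training set.
TEST_SUBJECTS = {"s02", "s04", "s05", "s09", "s15", "s20", "s23", "s28", "s29", "s30"}
ALL_SUBJECTS = [f"s{i:02d}" for i in range(1, 33)]

train_subjects = [s for s in ALL_SUBJECTS if s not in TEST_SUBJECTS]
test_subjects = [s for s in ALL_SUBJECTS if s in TEST_SUBJECTS]
assert len(train_subjects) == 22 and len(test_subjects) == 10
```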

3.4 Labeling strategy

As the objective is to train classifiers using supervised learning, the continuous scale ratings of valence and arousal are converted into labels by splitting the continuous scale. The scale range [0, 5.0] is considered low, while (5.0, 9.0] is considered high. The scale value of 5.0 is chosen as the split point because the mean value of the ratings lies approximately at 5.0. The labeling strategy and the distribution of labels across the training and test sets are shown in Table 1.
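The mapping from continuous ratings to binary labels can be expressed in one line; a small sketch (the rating values are illustrative):

```python
import numpy as np

# Ratings in [0, 5.0] become the low class (0), ratings in (5.0, 9.0] the high class (1).
ratings = np.array([2.3, 5.0, 5.1, 8.7])   # illustrative arousal or valence ratings
labels = (ratings > 5.0).astype(int)       # -> [0, 0, 1, 1]
```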

3.5 Pre-processing

The DEAP dataset provides pre-processed data, and the pre-processing is described in this sub-section. The EEG signals were down-sampled to 128 Hz and EOG artefacts were removed. A bandpass filter with a frequency range of 4.0–45.0 Hz was applied. The EEG data were averaged to a common reference and the pre-trial baseline was removed. The physiological signals were down-sampled to 128 Hz and the pre-trial baseline was removed.
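The DEAP distribution already provides these steps; purely for illustration, an equivalent band-pass and downsampling stage could be sketched with SciPy as follows (the filter order and design are assumptions, not the exact DEAP pipeline):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate

def bandpass_and_downsample(sig: np.ndarray, fs: int = 512, target_fs: int = 128,
                            low: float = 4.0, high: float = 45.0) -> np.ndarray:
    """Rough equivalent of the described pre-processing: 4.0-45.0 Hz band-pass,
    then downsampling from 512 Hz to 128 Hz. Not the exact DEAP pipeline."""
    sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, sig)
    return decimate(filtered, fs // target_fs, zero_phase=True)  # 512 / 128 = 4
```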

3.6 Feature extraction

In a prior study, time domain and frequency domain features were used to find the electrode positions of the top-30 features, and frequency-based power spectral density was found to provide better accuracy [22]. In contrast, another study found that power spectral density did not perform well [23]. As emotions vary with time, Hjorth features have been widely used in ER because they are useful in monitoring time-varying EEG signals [24]. Hence, this work extracts both time domain features (Hjorth activity and Hjorth complexity) and a frequency domain feature (power spectral density, PSD), and uses a feature selection process to select the best performing features [25]. Hjorth activity and Hjorth complexity are computed over the entire time range of the signal. A total of 280 features are computed, as shown in Table 2.

Horizontal EOG, vertical EOG, zygomaticus major EMG and trapezius EMG are computed by subtracting the corresponding channel pairs, as shown in Table 2. In this work, EEG is considered as one modality, and all other signals are grouped under physiological signals. For the feature extraction process, y(t) denotes the signal and dy(t)/dt its first derivative.

3.6.1 Hjorth activity

The Hjorth activity [24] parameter is the total power of the signal. It corresponds to the surface of the power spectrum in the frequency domain and is shown in equation (3).

(3) $\mathrm{Activity} = \mathrm{Variance}(y(t))$

3.6.2 Hjorth complexity

Hjorth complexity [24] is a dimensionless parameter defined as the ratio of mobility of the first derivative of the signal to the mobility of the signal, as shown in equation (4). The mobility is defined as the square root of the ratio of the variance of the first derivative of the signal to the variance of the signal, as shown in equation (5). The mobility of the signal represents the frequency variance of the power spectrum and can be illustrated as the standard deviation of the power spectrum along the frequency axis. The Hjorth complexity gives an estimate of the bandwidth of the signal and indicates the shape similarity of the signal to a pure sine wave.

(4) $\mathrm{Complexity} = \dfrac{\mathrm{Mobility}\!\left(\frac{dy(t)}{dt}\right)}{\mathrm{Mobility}(y(t))}$

where mobility is defined as in equation (5).

(5) $\mathrm{Mobility} = \sqrt{\dfrac{\mathrm{Variance}\!\left(\frac{dy(t)}{dt}\right)}{\mathrm{Variance}(y(t))}}$
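Equations (3)–(5) translate directly into code; a minimal sketch, approximating the derivative with a first difference:

```python
import numpy as np

def hjorth_activity(y: np.ndarray) -> float:
    """Equation (3): variance of the signal."""
    return float(np.var(y))

def hjorth_mobility(y: np.ndarray) -> float:
    """Equation (5): sqrt(variance of the first derivative / variance of the signal)."""
    return float(np.sqrt(np.var(np.diff(y)) / np.var(y)))

def hjorth_complexity(y: np.ndarray) -> float:
    """Equation (4): mobility of the first derivative divided by mobility of the signal."""
    return hjorth_mobility(np.diff(y)) / hjorth_mobility(y)

# Example: complexity approaches 1 for a pure sine wave and grows with added noise.
t = np.linspace(0, 1, 128, endpoint=False)
y = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.randn(t.size)
print(hjorth_activity(y), hjorth_complexity(y))
```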

3.6.3 Power spectral density

PSD refers to the spectral energy distribution of the signal per unit time [26] and is computed separately for the alpha (8–12 Hz), beta (12–30 Hz), gamma (30–45 Hz), theta (4–8 Hz) and delta (0–4 Hz) bands of each channel using the Welch method.
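A per-band Welch PSD feature for a single channel can be sketched as below (the window length and band averaging are illustrative choices; the paper does not specify them):

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.0, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 12.0),
         "beta": (12.0, 30.0), "gamma": (30.0, 45.0)}

def band_psd_features(sig: np.ndarray, fs: int = 128) -> dict:
    """Welch PSD averaged within each frequency band of a single channel."""
    freqs, psd = welch(sig, fs=fs, nperseg=2 * fs)   # 2-second windows (assumed)
    return {band: float(np.mean(psd[(freqs >= lo) & (freqs < hi)]))
            for band, (lo, hi) in BANDS.items()}
```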

3.7 Feature selection

In this work, feature selection is performed using correlation-based feature subset selection [27] with a best first search strategy, in which the correlation between each feature and the output class, as well as the intercorrelation among the features, is computed. Features are selected such that the subset is highly correlated with the class while the intercorrelation among the selected features is low. The best first search starts from an empty feature set and iteratively adds or removes single attributes: single features that have a high correlation with the class are added to the search space, and if an added feature does not improve the subset evaluation, the algorithm backtracks to the last best subset in the feature space and continues the search. A stopping criterion is used to avoid exploring the entire feature space; in this work, the search terminates if there is no improvement over the last five iterations.
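The study uses Weka for this step; the simplified greedy sketch below conveys the idea of correlation-based subset selection with a patience-based stopping criterion (it omits Weka's full backtracking and is not the exact implementation):

```python
import numpy as np

def cfs_merit(X: np.ndarray, y: np.ndarray, subset: list) -> float:
    """CFS merit [27]: k*r_cf / sqrt(k + k*(k-1)*r_ff), where r_cf is the mean
    feature-class correlation and r_ff the mean feature-feature correlation."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return float(r_cf)
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))

def greedy_cfs(X: np.ndarray, y: np.ndarray, patience: int = 5) -> list:
    """Greedy forward search over the CFS merit, stopping after `patience`
    consecutive non-improving additions (a simplification of best first search)."""
    selected, remaining = [], list(range(X.shape[1]))
    best_merit, stale = 0.0, 0
    while remaining and stale < patience:
        merit, j = max((cfs_merit(X, y, selected + [f]), f) for f in remaining)
        remaining.remove(j)
        if merit > best_merit:
            selected.append(j)
            best_merit, stale = merit, 0
        else:
            stale += 1
    return selected
```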

The features selected for the EEG, physiological and combined modalities are shown in Table 3. From the features selected, it is observed that only frequency-based PSD features are selected for the physiological signals, while time-domain Hjorth features are selected for the T7, P7, Fz, FP1 and FC6 electrodes of the EEG signal. The positions of the T7, P7, Fz, FP1 and FC6 electrodes are associated with the superior temporal gyrus, lateral occipital cortex, superior frontal gyrus, frontal pole and precentral gyrus, respectively [28]. Thus, the feature selection process selects the time domain features of electrodes associated with gyrus and frontal pole brain regions [28]. This is in accordance with the literature, which states that the gyrus [29] and the frontal pole [30] have a role in emotion regulation.

3.8 Classifiers

In this work, the features from the different modalities are fed to the supervised machine learning algorithms Naïve Bayes [31], logistic regression [32], LDA [33], QDA [33], logit boost [34] and stacking [35], and the resulting accuracies are compared. These classifiers are explained briefly in the supplementary material at https://github.com/armanjupriya-er/er-comparison-supplementary.
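Although the experiments were run in Weka, a rough scikit-learn analogue of the same classifier set, including the stacking ensemble, is sketched below (the gradient boosting stand-in for logit boost and the logistic meta-learner are assumptions, not taken from the paper):

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier

# Rough scikit-learn analogue of the Weka classifier set used in this work.
base_learners = [
    ("nb", GaussianNB()),
    ("logreg", LogisticRegression(max_iter=1000)),
    ("lda", LinearDiscriminantAnalysis()),
    ("qda", QuadraticDiscriminantAnalysis()),
    ("boost", GradientBoostingClassifier()),   # stands in for Weka's LogitBoost
]
stacking = StackingClassifier(estimators=base_learners,
                              final_estimator=LogisticRegression(max_iter=1000))

# Usage (X_* are the selected features, y_* the low/high labels):
# stacking.fit(X_train, y_train)
# print(stacking.score(X_test, y_test))
```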

4. Results and discussion

The accuracy and F1-score of the experiment are shown in Table 4. A graphic illustration is available as Figure S1 at https://github.com/armanjupriya-er/er-comparison-supplementary. The results obtained using EEG and physiological signals as independent modalities indicate that EEG signals are better at arousal prediction than physiological signals by 7.18%, while physiological signals are better at valence prediction than EEG signals by 3.51%. When the EEG and physiological modalities are combined, arousal prediction is better than the physiological modality by 2.39% and inferior to the EEG modality by 4.46%, while valence prediction of the combined modality is better than the EEG modality by 3.07% and inferior to the physiological modality by 0.42%. From the prediction accuracy in the arousal and valence dimensions, it is observed that EEG as a single modality and physiological signals as a single modality perform better than the combination of EEG and physiological signals.

A one-way ANOVA test was conducted in order to validate whether there is any significant difference in prediction ability between the EEG and physiological modalities using the same set of features. One-way ANOVA for arousal accuracy (F (1,6) = 7.05, p = 0.0378) shows that there is a significant difference in accuracy levels reported by EEG and physiological signals, while the difference in F1-Score (F (1,6) = 5.07, p = 0.0653) is not significant at 5% level of significance. One-way ANOVA for valence accuracy (F (1,6) = 0.08, p = 0.7874) and F1-score (F (1,6) = 0.25, p = 0.6372) shows that there is no significant difference in accuracy and F1-score between EEG and physiological modalities at 5% level of significance.
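The reported arousal statistic can be reproduced from the per-classifier accuracies in Table 4, treating the two modalities as groups; a short sketch using SciPy:

```python
from scipy.stats import f_oneway

# Per-classifier arousal accuracies from Table 4, grouped by modality.
eeg_arousal = [53.75, 51.75, 51.75, 56.00]
phys_arousal = [49.50, 49.00, 48.75, 52.25]

f_stat, p_value = f_oneway(eeg_arousal, phys_arousal)
print(f_stat, p_value)   # approximately F(1, 6) = 7.05, p = 0.038, as reported
```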

Feature selection shows that the EEG electrodes F3, T7, CP5, P3, P7, Fz, FC6, CP6 and P4 are used in arousal prediction, while signals from FP1 and FC6 are used in valence prediction. Since the FC6 electrode is common to arousal and valence prediction, the prediction ability of the FC6 electrode is studied and shown in Table 5. The experimental results suggest that the ability of the FC6 electrode to predict valence is 59.00%, which is equal to the best prediction accuracy obtained using all of the physiological signals. The prediction accuracy of the FC6 electrode with respect to arousal is 52.25%, which is on par with the prediction accuracy of the physiological signals. The FC6 electrode position corresponds to the primary motor cortex area of the brain, which is associated with controlling different muscle groups [36]. This suggests that the prediction accuracy of the FC6 electrode in the valence dimension arises from muscle activity.

On further analysis of the selected feature list, it is observed that zEMG plays a significant role in the prediction of both arousal and valence. To study their individual prediction ability, classifiers were trained on the physiological signals EOG, EMG, EDA, temperature, plethysmograph and respiration using the features listed in Table 2. To rank the physiological signals based on prediction accuracy, the same set of features is fed to the classifiers listed earlier. The best prediction accuracy obtained and the corresponding classifier for each of the physiological signals are shown in Table 5. A graphic illustration is available as Figure S2 at https://github.com/armanjupriya-er/er-comparison-supplementary.

The study of the prediction capability of time and frequency domain features of EOG, EMG, EDA, temperature, plethysmograph and respiration indicates that the plethysmograph shows an arousal prediction accuracy of 55.50%, which is inferior to the EEG modality by 0.89%, while EOG shows a valence prediction accuracy of 60.00%, which is better than the combination of all physiological signals by 1.69%. Features of EDA and zEMG each resulted in a valence prediction accuracy of 58.25%. The sorted order of physiological signals based on arousal prediction accuracy is as follows: plethysmograph, EOG (hEOG + vEOG), vEOG, hEOG, zEMG, tEMG, temperature, EMG (tEMG + zEMG), respiration, EDA, whereas based on valence prediction accuracy the sorted order is EOG (hEOG + vEOG), EDA, zEMG, hEOG, respiration, tEMG, vEOG, EMG (tEMG + zEMG), temperature, plethysmograph. The valence prediction accuracy of EOG is superior to zEMG and EDA by 1.75%, at the cost of a higher number of electrodes (EOG requires four electrodes, whereas zEMG and GSR each require two electrodes). The results indicate that the valence prediction accuracy comes from muscle activity. Another notable observation is that ensemble classifiers (logit boost, stacking) perform better in a high dimensional feature space (Table 4), while statistical models (logistic regression, LDA, QDA) perform better in a low dimensional feature space (Table 5), which is in line with the literature [37].

The results of the experiment in comparison with the state-of-the-art (SOTA) are presented as supplementary Table S1 at https://github.com/armanjupriya-er/er-comparison-supplementary. It is observed that only 36.4% of the studies [10, 12, 19, 38] in the literature use subject independent ER, while all the other studies are on subject specific ER. All the subject independent studies have used LOSO validation, and the reported results are the average of the LOSO results across subjects. The current experiment is subject independent, and it differs from all previous studies as it splits the subjects into a 70:30 ratio for training and testing, respectively. The results reported in Tables 4, 5, 6 and S1 are from the test set.

The performance of regularized deep fusion of kernel machines (RDFKM) on EEG, EMG, EDA and respiratory rate [38] and of a pretrained Inception ResNet v2 on facial expression, EEG and GSR modalities [12] were explored in recent literature. Similarly, recent research investigated the performance of statistical features on combinations of multiple modalities [10, 19]. Unlike the experiment carried out in this work, all the above-mentioned studies reported the average LOSO validation score as the final accuracy. Also, some of the recent works did not publish the cut-off score used to distinguish low and high values in the arousal and valence dimensions [10, 38], whereas two other recent works mentioned the cut-off score as 4.5 [19] and the cut-off ranges as [1.0, 3.0] (low) and [7.0, 9.0] (high) [12]. This is in contrast to the experiment carried out in this work, which uses the scale ranges [0, 5.0] and (5.0, 9.0] as low and high, respectively.

The accuracy obtained by the proposed unimodal valence recognition using EOG and the multimodal valence recognition using zEMG and EOG is better than the accuracy obtained in the literature [10, 12] by 5.44% and 11.52%, respectively, but less than the accuracy obtained in the literature [19, 38] by 16.87% and 6.97%, respectively. The accuracy obtained by the proposed unimodal arousal recognition using EEG or plethysmograph is better than the accuracy obtained in Ref. [12] by 4.08% and is less than all other methods. This research work is not compared with the subject dependent ER studies listed in Table S1, as this experiment is about subject independent ER. The lower accuracy reported in this experiment can be partly attributed to the dataset used to report the test accuracy: this experiment uses a separate test set, while all other subject independent ER works [10, 12, 19, 38] listed in Table S1 report an average of LOSO accuracy. Also, in this experiment, the same set of features is used across different modalities. More research is needed to determine whether modality specific features improve the prediction accuracy. Table 6 shows additional evaluation metrics for the proposed methods, including accuracy, ROC area, kappa statistic, precision, recall, true positive rate (TPR), false positive rate (FPR), F1-score, true positive count (TP count), false positive count (FP count), true negative count (TN count) and false negative count (FN count). According to the kappa statistic, F1-score and accuracy, EEG is better suited for arousal prediction, whereas EOG is better suited for valence prediction. The ROC area reported for arousal prediction by the EEG modality is less than that of the plethysmograph modality by 0.028. From an ergonomic perspective, obtaining a plethysmograph signal is easier than obtaining EEG signals.

5. Conclusion

The experimental results of this work suggest that arousal dimension prediction ability is high for EEG signals, while valence dimension prediction ability is high for the combination of EOG and zEMG signals. In addition, valence can be measured from the eyes (EOG) while arousal can be measured from the changes in blood volume (plethysmograph). Also, muscle activity plays a significant role in valence prediction.

Further research is required to examine whether the prediction ability of the EEG signal is resulting from brain regions associated with muscle activity or not. Whether modality specific features improve the prediction accuracy or not is yet to be explored. The experiment needs to be repeated on other existing or new datasets to identify the best modality for each emotion dimension. To determine the effect of stimulus on eye muscle, further study of eye movements while expressing emotions can be performed.

Figures

Figure 1. Proposed multimodal and unimodal emotion recognition methods for the arousal and valence dimensions

Table 1. Labeling strategy and distribution of labels across the training and test sets

| Attribute | Scale range | Label | Training set [Count (%)] | Test set [Count (%)] |
|---|---|---|---|---|
| Valence | >=0 and <=5 | LV | 398 (45.23%) | 174 (43.50%) |
| Valence | >5 and <=9 | HV | 482 (54.77%) | 226 (56.50%) |
| Arousal | >=0 and <=5 | LA | 348 (39.55%) | 195 (48.75%) |
| Arousal | >5 and <=9 | HA | 542 (60.45%) | 205 (51.25%) |

Table 2. Features for EEG channels and physiological channels

| Modality | Channel name | Features | Number of features |
|---|---|---|---|
| EEG | FP1, AF3, F3, F7, FC5, FC1, C3, T7, CP5, CP1, P3, P7, PO3, O1, Oz, Pz, FP2, AF4, Fz, F4, F8, FC6, FC2, Cz, C4, T8, CP6, CP2, P4, P8, PO4, O2 | Hjorth Activity, Hjorth Complexity, Alpha Band Welch PSD, Beta Band Welch PSD, Gamma Band Welch PSD, Theta Band Welch PSD, Delta Band Welch PSD | 32 × 7 = 224 |
| EOG | EXG1 – EXG2, EXG3 – EXG4 | Same as above | 2 × 7 = 14 |
| EMG – Zygomaticus Major, Trapezius | EXG5 – EXG6, EXG7 – EXG8 | Same as above | 2 × 7 = 14 |
| EDA | GSR1 | Same as above | 1 × 7 = 7 |
| Respiration | Resp | Same as above | 1 × 7 = 7 |
| Plethysmograph | Plet | Same as above | 1 × 7 = 7 |
| Temperature | Temp | Same as above | 1 × 7 = 7 |

Table 3. Features selected from each of the modalities

| Modality | Features selected (Arousal – LA, HA) | Features selected (Valence – LV, HV) |
|---|---|---|
| EEG | GAMMA_F3_Welch_PSD, T7_HjorthActivity, GAMMA_CP5_Welch_PSD, ALPHA_P3_Welch_PSD, P7_HjorthComplexity, Fz_HjorthComplexity, ALPHA_FC6_Welch_PSD, GAMMA_CP6_Welch_PSD, GAMMA_P4_Welch_PSD | FP1_HjorthActivity, FC6_HjorthComplexity, ALPHA_FC2_Welch_PSD |
| Physiological | GAMMA_hEOG_Welch_PSD, GAMMA_zEMG_Welch_PSD | GAMMA_hEOG_Welch_PSD, GAMMA_zEMG_Welch_PSD |
| EEG + Physiological | GAMMA_F3_Welch_PSD, T7_HjorthActivity, GAMMA_CP5_Welch_PSD, ALPHA_P3_Welch_PSD, P7_HjorthComplexity, Fz_HjorthComplexity, ALPHA_FC6_Welch_PSD, GAMMA_CP6_Welch_PSD, GAMMA_P4_Welch_PSD, GAMMA_zEMG_Welch_PSD | FC6_HjorthComplexity, GAMMA_zEMG_Welch_PSD |

Table 4. Accuracy and F1-score for the EEG, physiological and EEG + physiological modalities

| Modality | Classifier | Arousal (LA, HA) Accuracy (%) | Arousal (LA, HA) F1-score | Valence (LV, HV) Accuracy (%) | Valence (LV, HV) F1-score |
|---|---|---|---|---|---|
| EEG | Logit Boost | 53.75 | 0.536 | 55.00 | 0.504 |
| EEG | Naïve Bayes | 51.75 | 0.468 | 56.75 | 0.513 |
| EEG | QDA | 51.75 | 0.470 | 57.00 | 0.524 |
| EEG | Stacking | 56.00 | 0.560 | 56.75 | 0.538 |
| Physiological | Logit Boost | 49.50 | 0.415 | 59.00 | 0.592 |
| Physiological | Naïve Bayes | 49.00 | 0.395 | 55.00 | 0.405 |
| Physiological | QDA | 48.75 | 0.394 | 55.00 | 0.401 |
| Physiological | Stacking | 52.25 | 0.508 | 57.75 | 0.576 |
| EEG + Physiological | Logit Boost | 46.75 | 0.451 | 58.75 | 0.588 |
| EEG + Physiological | Naïve Bayes | 49.50 | 0.446 | 56.50 | 0.566 |
| EEG + Physiological | QDA | 50.25 | 0.464 | 55.25 | 0.402 |
| EEG + Physiological | Stacking | 53.50 | 0.534 | 56.50 | 0.566 |

Table 5. Prediction accuracy of physiological signals and the FC6 EEG electrode

| Modality | Arousal model | Arousal accuracy (%) | Arousal F1-score | Valence model | Valence accuracy (%) | Valence F1-score |
|---|---|---|---|---|---|---|
| EMG (tEMG + zEMG) | Logistic | 52.00 | 0.414 | Logistic | 56.75 | 0.502 |
| EOG (hEOG + vEOG) | Naïve Bayes | 53.75 | 0.513 | LDA | 60.00 | 0.601 |
| EDA | Logit Boost | 51.25 | 0.427 | Logistic | 58.25 | 0.446 |
| hEOG | QDA | 53.00 | 0.460 | Logit Boost | 57.25 | 0.573 |
| Plethysmograph | LDA | 55.50 | 0.460 | Logistic | 55.00 | 0.417 |
| Respiration | Logit Boost | 51.75 | 0.454 | LDA | 57.25 | 0.469 |
| tEMG | Naïve Bayes | 52.50 | 0.397 | Logit Boost | 57.25 | 0.537 |
| Temperature | Logit Boost | 52.25 | 0.385 | LDA | 56.75 | 0.426 |
| vEOG | QDA | 53.75 | 0.528 | Logistic | 57.25 | 0.536 |
| zEMG | Logistic | 52.50 | 0.404 | Logistic | 58.25 | 0.522 |
| FC6 EEG electrode | Stacking | 52.25 | 0.475 | QDA | 59.00 | 0.500 |

Table 6. Evaluation metrics for the proposed models

| Metric | Proposed Unimodal-1 | Proposed Unimodal-2 | Proposed Multimodal-1 | Proposed Unimodal-3 | Proposed Unimodal-4 | Proposed Unimodal-5 |
|---|---|---|---|---|---|---|
| Modality | EEG | Plethysmograph | zEMG, EOG | EOG | EDA | zEMG |
| Class labels | LA, HA | LA, HA | LV, HV | LV, HV | LV, HV | LV, HV |
| Method | Stacking | LDA | Logit Boost | LDA | Logistic | Logistic |
| Accuracy (%) | 56.00 | 55.50 | 59.00 | 60.00 | 58.25 | 58.25 |
| Precision | 0.561 | 0.644 | 0.596 | 0.614 | 0.760 | 0.575 |
| Recall | 0.560 | 0.555 | 0.590 | 0.600 | 0.583 | 0.583 |
| F1-score | 0.560 | 0.460 | 0.592 | 0.601 | 0.446 | 0.522 |
| Kappa statistic | 0.121 | 0.091 | 0.177 | 0.207 | 0.045 | 0.081 |
| ROC area | 0.554 | 0.582 | 0.608 | 0.638 | 0.586 | 0.546 |
| TPR | 0.560 | 0.555 | 0.590 | 0.600 | 0.583 | 0.583 |
| FPR | 0.439 | 0.466 | 0.411 | 0.387 | 0.542 | 0.508 |
| TP count (0 as 0) | 114 | 25 | 102 | 114 | 7 | 33 |
| TN count (1 as 1) | 110 | 197 | 134 | 126 | 226 | 200 |
| FN count (0 as 1) | 81 | 170 | 72 | 60 | 167 | 141 |
| FP count (1 as 0) | 95 | 8 | 92 | 100 | 0 | 26 |

References

1Darwin C. The expression of the emotions in man and animals. Chicago: The University of Chicago Press; 1965.

2Lange CG, James W (Eds). The emotions. Williams & Wilkins Co; 1922. 1.

3Cannon WB. The James–Lange theory of emotions: a critical examination and an alternative theory. The Am J Psychol. 1927; 39: 106-24.

4Wang Z, Zhou X, Wang W, Liang C. Emotion recognition using multimodal deep learning in multiple psychophysiological signals and video. Int J Machine Learn Cybernetics. 2020; 11: 923-34. doi: 10.1007/s13042-019-01056-8.

5Huang H, Hu Z, Wang W, Wu M. Multimodal emotion recognition based on ensemble convolutional neural network. IEEE Access. 2020; 8: 3265-71. doi: 10.1109/access.2019.2962085.

6Hassan MM, Alam MGR, Uddin MZ, Huda S, Almogren A, Fortino G. Human emotion recognition using deep belief network architecture. Inf Fusion. 2019; 51: 10-18. doi: 10.1016/j.inffus.2018.10.009.

7Zhu J, Wei Y, Feng Y, Zhao X, Gao Y. Physiological signals-based emotion recognition via high-order correlation learning. ACM Trans Multimedia Comput Commun Appl. 2020; 15: 1-18. doi: 10.1145/3332374.

8Poria S, Peng H, Hussain A, Howard N, Cambria E. Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis. Neurocomputing. 2017; 261: 217-30. doi: 10.1016/j.neucom.2016.09.117.

9Yin Z, Zhao M, Wang Y, Yang J, Zhang J. Recognition of emotions using multimodal physiological signals and an ensemble deep learning model. Computer Methods Programs Biomed. 2017; 140: 93-110. doi: 10.1016/j.cmpb.2016.12.005.

10Bota P, Wang C, Fred A, Silva H. Emotion assessment using feature fusion and decision fusion classification based on physiological data: are we there yet?. Sensors (Switzerland). 2020; 20: 4723. doi: 10.3390/s20174723.

11Dzieżyc M, Gjoreski M, Kazienko P, Saganowski S, Gams M. Can we ditch feature engineering? End-to-end deep learning for affect recognition from physiological sensor data. Sensors (Switzerland). 2020; 20: 1-21. doi: 10.3390/s20226535.

12Cimtay Y, Ekmekcioglu E, Caglar-Ozhan S. Cross-subject multimodal emotion recognition based on hybrid fusion. IEEE Access. 2020; 8: 168865-78. doi: 10.1109/access.2020.3023871.

13Tan C, Ceballos G, Kasabov N, Subramaniyam NP. FusionSense: emotion classification using feature fusion of multimodal data and deep learning in a brain-inspired spiking neural network. Sensors (Switzerland). 2020; 20: 5328. doi: 10.3390/s20185328.

14Fu J, Mao Q, Tu J, Zhan Y. Multimodal shared features learning for emotion recognition by enhanced sparse local discriminative canonical correlation analysis. Multimedia Syst. 2017; 25: 451-61. doi: 10.1007/s00530-017-0547-8.

15Li W, Chu M, Qiao J. Design of a hierarchy modular neural network and its application in multimodal emotion recognition. Soft Comput. 2019; 23: 11817-28. doi: 10.1007/s00500-018-03735-0.

16Wang Z, Zhou X, Wang W, Liang C. Emotion recognition using multimodal deep learning in multiple psychophysiological signals and video. Int J Machine Learn Cybernetics. 2020; 11: 923-34. doi: 10.1007/s13042-019-01056-8.

17Kaya H, Gürpınar F, Salah AA. Video-based emotion recognition in the wild using deep transfer learning and score fusion. Image Vis Comput. 2017; 65: 66-75. doi: 10.1016/j.imavis.2017.01.012.

18Tzirakis P, Chen J, Zafeiriou S, Schuller B. End-to-end multimodal affect recognition in real-world environments. Inf Fusion. 2021; 68: 46-53. doi: 10.1016/j.inffus.2020.10.011.

19Ayata D, Yaslan Y, Kamasak ME. Emotion recognition from multimodal physiological signals for emotion aware healthcare systems. J Med Biol Eng. 2020; 40: 149-57. doi: 10.1007/s40846-019-00505-7.

20Frank E, Hall MA, Witten IH. The WEKA workbench. Online appendix for data mining: practical machine learning tools and techniques. 4th ed. Morgan Kaufmann; 2016.

21Koelstra S, Muhl C, Soleymani M, Lee J-S, Yazdani A, Ebrahimi T, Pun T, Nijholt A, Patras I. DEAP: a database for emotion analysis using physiological signals. IEEE Trans Affective Comput. 2012; 3: 18-31. doi: 10.1109/t-affc.2011.15.

22Wang XW, Nie D, Lu BL. EEG-based emotion recognition using frequency domain features and support vector machines. In: Lu BL, Zhang L, Kwok J (Eds.). Neural information processing. ICONIP 2011. Lecture notes in computer science, Vol. 7062. Berlin, Heidelberg: Springer; 2011. doi: 10.1007/978-3-642-24955-6_87.

23Jenke R, Peer A, Buss M. Feature extraction and selection for emotion recognition from EEG. IEEE Trans Affective Comput. 2014; 5(3): 327-39. doi: 10.1109/TAFFC.2014.2339834.

24Hjorth B. The physical significance of time domain descriptors in EEG analysis. Electroencephalography Clin Neurophysiol. 1973; 34(3): 321-5. ISSN 0013-4694 doi: 10.1016/0013-4694(73)90260-5.

25Hjorth B. EEG analysis based on time domain properties. Electroencephalography Clin Neurophysiol. 1970; 29(3): 306-10. doi: 10.1016/0013-4694(70)90143-4.

26Welch P. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans Audio Electroacoustics. 1967; 15(2): 70-3. doi: 10.1109/TAU.1967.1161901.

27Hall MA. Correlation-based feature subset selection for machine learning. Hamilton, New Zealand: University of Waikato; 1998.

28Scrivener CL, Reader AT. Variability of EEG electrode positions and their underlying brain regions: visualizing gel artifacts from a simultaneous EEG-fMRI dataset. Brain Behav. 2022; 12(2): e2476. doi: 10.1002/brb3.2476.

29Portugal LCL, Alves RDCS, Orlando FJ, Sanchez TA, Mocaiber I, Volchan E, Smith Erthal F, Antunes David I, Kim J, Oliveira L, Padmala S, Chen G, Pessoa L, Garcia Pereira M. Interactions between emotion and action in the brain. NeuroImage. 2020; 214: 116728. ISSN 1053 - 8119 doi: 10.1016/j.neuroimage.2020.116728.

30Koch SBJ, Mars RB, Toni I, Roelofs K. Emotional control, reappraised. Neurosci Biobehavioral Rev. 2018; 95: 528-34. ISSN 0149-7634 doi: 10.1016/j.neubiorev.2018.11.003.

31John GH, Langley P. Estimating continuous distributions in Bayesian classifiers. In: Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo; 1995. p. 338-45.

32le Cessie S, van Houwelingen JC. Ridge estimators in logistic regression. Appl Stat. 1992; 41(1): 191-201.

33James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning : with applications in R. New York: Springer; 2013.

34Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting. The Annals of Statistics. 2000; 28: 337-407. doi: 10.1214/aos/1016218223.

35Wolpert D. Stacked generalization. Neural Networks. 1992; 5: 241-59. doi: 10.1016/S0893-6080(05)80023-1.

36Rimbert S, Al-Chwa R, Zaepffel M, Bougrain L. Electroencephalographic modulations during an open- or closed-eyes motor task. Peer J. 2018; 6: e4492. doi: 10.7717/peerj.4492.

37Pappu V, Pardalos PM. High-dimensional data classification. In: Aleskerov F, Goldengorin B, Pardalos P, (Eds.). Clusters, orders, and trees: methods and applications. Springer optimization and its applications, Vol. 92. New York, NY: Springer; 2014. doi: 10.1007/978-1-4939-0742-7_8.

38Zhang X, et al. Emotion recognition from multimodal physiological signals using a regularized deep fusion of kernel machine. IEEE Trans Cybernetics. 2021; 51(9): 4386-99. doi: 10.1109/TCYB.2020.2987575.

Further reading

39Huang Y, Yang J, Liu S, Pan J. Combining facial expressions and electroencephalography to enhance emotion recognition. Future Internet. 2019; 11: 1-17. doi: 10.3390/fi11050105.

40Kwon YH, Shin SB, Kim SD. Electroencephalography based fusion two-dimensional (2d)-convolution neural networks (CNN) model for emotion recognition system. Sensors (Switzerland). 2018; 18: 1383. doi: 10.3390/s18051383.

41Karanchery S, Palaniswamy S. Emotion recognition using one-shot learning for human-computer interactions. In: 2021 International Conference on Communication, Control and Information Sciences (ICCISc); 2021. p. 1-8. doi: 10.1109/ICCISc52257.2021.9485024.

42Kuruvayil S, Palaniswamy S. Emotion recognition from facial images with simultaneous occlusion, pose and illumination variations using meta-learning. J King Saud Univ - Computer Inf Sci. 2021. ISSN 1319-1578. doi: 10.1016/j.jksuci.2021.06.012 (In press).

43Sasidharakurup H, Nutakki C, Rajendran A, Venugopal P, Sumon M, Navaneethkumar L, Madhu H, Bipin GN, Shyam D. Spectral correlations in speaker-listener behavior during a focused duo conversation using EEG. In: Proceedings of the Seventh International Conference on Advances in Computing, Communications and Informatics (ICACCI-2018), Bangalore, Karnataka, India, Sept 19-22, 2018.

44Bodda S, Maya S, Potti M, Naryanan E, Sohan U, Bhuvaneshwari Y, Mathiyoth R, Diwakar S. Computational analysis of EEG activity during stance and swing gait phases. In: Proceedings of the Third International Conference on Computing and Network Communications (CoCoNet’19) (accepted), Trivandrum, Kerala, India, 2019.

45Keshari T, Palaniswamy S. Emotion recognition using feature-level fusion of facial expressions and body gestures. In: 2019 4th International Conference on Communication and Electronics Systems (ICCES 2019), Coimbatore, TamilNadu, India; 2019. p. 1184-9.

46Lawrance D, Palaniswamy S. Emotion recognition from facial expressions for 3D videos using siamese network. In: 2021 International Conference on Communication, Control and Information Sciences (ICCISc); 2021. p. 1-6. doi: 10.1109/ICCISc52257.2021.9484949.

Acknowledgements

The authors of the paper thank Dr. Karthik R.M. for providing his valuable feedback and suggestions on the manuscript.

Corresponding author

Suja Palaniswamy can be contacted at: p_suja@blr.amrita.edu
