
    Mental Illness Disorder Diagnosis Using Emotion Variation Detection from Continuous English Speech

2021-12-15
    Computers, Materials & Continua, December 2021

S. Lalitha, Deepa Gupta, Mohammed Zakariah and Yousef Ajami Alotaibi

1Department of Electronics & Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India

    2Department of Computer Science & Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India

    3Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Saudi Arabia

Abstract: Automatic recognition of human emotions in a continuous dialog model remains challenging because a speaker's utterance may include several sentences that do not all carry a single emotion. Only limited work on standalone speech emotion recognition (SER) systems for continuous speech has been reported. In the recent decade, various effective SER systems have been proposed for discrete speech, i.e., short speech phrases. It would be more helpful if these systems could also recognize emotions from continuous speech. However, if such systems are applied directly to continuous speech, emotion recognition performance degrades due to the mismatch between training data (from discrete training speech) and testing data (from continuous speech). The problem may be resolved by enhancing an existing SER system for discrete speech. Thus, in this work, the authors' existing effective SER system for multilingual and mixed-lingual discrete speech is enhanced by enriching the cepstral speech feature set with bi-spectral speech features and a unique functional set of Mel frequency cepstral coefficient features derived from a sine filter bank. Data augmentation is applied to combat skewness of the SER system toward certain emotions. Classification is performed using a random forest. The enhanced SER system is used to predict emotions from continuous speech with a uniform segmentation method. Due to data scarcity, several audio samples of discrete speech from the SAVEE database, which has recordings in a universal language, i.e., English, are concatenated, resulting in multi-emotional speech samples. Anger, fear, sad, and neutral emotions, which are vital during the initial investigation of mentally disordered individuals, are selected to build six categories of multi-emotional samples. Experimental results demonstrate the suitability of the proposed method for recognizing emotions from continuous speech as well as from discrete speech.

Keywords: Continuous speech; cepstral; bi-spectral; multi-emotional; discrete; emotion; filter bank; mental illness

    1 Introduction

A mental disorder, also called a mental illness or psychiatric disorder [1], is a mental or behavioral pattern that causes significant impairment or distress in personal functioning [2]. Mental disorders affect emotion, behavioral control, and cognition, and substantially interfere with the learning ability of children as well as the capacity of adults to function at work and within their families. Mental disorders tend to originate at an early age, and if not diagnosed and treated, the individual suffers in a chronic, recurrent manner [3]. The recent decade has witnessed a significant increase in the number of people suffering from mental illness [4–7]. Further, the COVID-19 pandemic has had an adverse effect on the mental health not only of people directly affected by the coronavirus but also of their family members and friends as well as the general public [8–10]. Thus, there exists an urgency to advance human mental health globally, resulting in a great demand for health care professionals for diagnosis and treatment [11–13].

Patients suffering from different mental disorders typically experience certain specific emotions. Anxiety and fear are associated with individuals undergoing stress [14] and seasonal affective disorder [15]. Major depressive disordered [16] and mood disordered individuals [17] are prone to sadness and, in some cases, such individuals remain emotionally neutral and do not respond to situations that would typically cause an emotional response. Anger and fear are usually experienced by COVID-19 affected patients [18]. Borderline Personality Disorder (BPD) is a prevalent mental disorder that has an identifiable emotional component. It is reported that approximately 1.6% of the general population and 20% of the psychiatric population suffer from BPD [19]. Typically, BPD patients have rapid mood swings, tend to be emotionally unstable, and experience intense negative emotions (also referred to as affective dysregulation). People suffering from BPD do not feel the same emotion at all times [20–23]. Apart from mental illness, individuals with medical issues, for example hormonal and heart-related issues, experience fear, anger, or sad emotions [24–27]. Thus, anger, fear, sad, and neutral emotions are indicators of mental disorders and other medical conditions. If these emotions could be predicted, this would greatly help mental healthcare professionals during an initial investigation to diagnose the ailment.

Human emotions can be detected through speech, facial expressions, gestures, electroencephalography signals, and autonomic nervous system signals. Among these modalities, recognition of emotions from speech is the most popular because data collection and speech sample processing are more convenient. During the primary investigation of a mental illness, doctors spend time counseling patients [28]. During continuous conversation, the sequence of emotions experienced by the patient is vital to understanding the symptoms and the associated disorder. This situation would benefit from a speech-based automated system that can continuously detect the sequence of a patient's emotions during counseling. Such a system would help doctors identify the mental illness.

Speech-based automated systems have been developed for health care [29–31]. Equipping these systems with emotional intelligence would further strengthen mental health services. Various automated systems that recognize emotions using text or multimodal analysis, i.e., a combination of text, images, and the linguistics of speech, have been designed [32–34]. However, most existing automated speech emotion recognition (SER) systems are monolingual and can recognize emotions only from discrete speech. If these systems could be further enhanced to recognize emotions from continuous speech, they would be more beneficial for doctors diagnosing patients with mental illness. Such a continuous SER system is proposed in this research work.

The remainder of this paper is organized as follows. Section 2 briefly reviews state-of-the-art SER systems. Section 3 outlines the proposed approach and performance measures. Experiments are described and the results are discussed in Section 4. Conclusions and suggestions for future work are presented in Section 5.

    2 State-of-the-Art Models

A typical SER system processes and classifies various speech signals to recognize the embedded emotions. Several approaches exist to model emotions; however, categorical and dimensional models are the most common [35–38]. Categorical models deal with discrete human emotions experienced most commonly in day-to-day life. For example, Ekman proposed six basic human emotions, i.e., anger, disgust, fear, surprise, happiness, and sadness [39]. A dimensional model interprets discrete emotions in terms of valence and arousal dimensions [40]. In the literature, SER based on dimensional models is referred to as continuous emotion recognition [41–43]. In both categorical and dimensional SER models, emotion is recognized from a short-duration phrase (2–4 s) for monolingual, multilingual, cross-lingual, and mixed-lingual contexts [44,45].

However, conversational/continuous speech lasts for a longer duration, and the same emotion might not persist throughout the spoken utterance. Therefore, to deal with such situations, an SER system for continuous speech is essential. Few studies have investigated SER systems for continuous speech, and emotion databases with continuous speech are not available. Yeh et al. [46] investigated a continuous SER system using a segmentation-based approach to recognize emotions in continuous Mandarin emotional speech. Their study involved discrete emotion samples with the categories angry, happy, neutral, sad, and boredom. Multi-emotional samples of variable length were created by combining any two discrete emotion samples belonging to different categories, such as angry–happy, neutral–sad, and boredom–happy, resulting in a total of 10 categories. Frame-based and voiced segmentation techniques were designed to evaluate the two emotions in each multi-emotional sample. A 128-feature set comprising jitter, shimmer, formants, linear predictive coefficients, linear prediction cepstral coefficients, Mel Frequency Cepstral Coefficients (MFCC) and MFCC derivatives, log frequency power coefficients, Perceptual Linear Prediction (PLP), and Rasta-PLP served as the speech features. Relevant features were extracted using sequential forward and sequential backward selection methods. A weighted discrete k-nearest neighbor classifier was trained using variable-length utterances created from the database [46]. Fan et al. [47] investigated a multi-scaled time window for continuous SER. Their work involved recognizing two emotions from two classes of voice samples, i.e., angry–neutral or happy–neutral samples from the Emo-dB database and a Chinese database [47]. Various MFCC features, modulation spectral features, and global statistical features were employed in experiments. The LIBSVM library was applied for classification. The training data was combined and segmented uniformly to train the classifier. System performance was compared with a baseline Hidden Markov Model (HMM) system [48]. The best results were obtained using global statistical features.

Summary and Limitations of State-of-the-Art Approaches:

From the survey conducted, it is evident that various SER systems for monolingual, multilingual, cross-lingual, and mixed-lingual discrete speech have been proposed in the past decade. However, few studies have considered continuous SER. In addition, the existing continuous SER studies used segmented continuous speech to train the classifier. Dedicated segmentation methods were incorporated to detect emotion variation boundaries in continuous speech using German and Chinese language voice samples.

It would be more practical and useful if a well-established SER system that works for discrete speech could also be applied to continuous speech. To a large extent, existing discrete SER systems may not be able to capture the sequence of emotions in continuous speech due to the variation in emotion boundaries between training samples (derived from discrete speech) and test samples (derived from continuous speech). To address this, if some enhancements are incorporated into prevailing SER systems for discrete speech, then continuous emotions could be better detected. Further, an SER system should be robust in detecting emotions from a universal language, such as English, so that it can be versatile across the globe. Such an SER system is proposed in this article.

    The primary contributions of this study are as follows.

a. Unique sine filter bank-based Mel-coefficient functionals are explored to recognize speech emotion.

    b. A distinctive, compact cepstral and bi-spectral feature combination is proposed for effective SER.

    c. The proposed SER system efficiently recognizes emotions in continuous speech as well as discrete speech using a simple uniform segmentation technique.

    3 Proposed Approach

The workflow of the implemented methodology is shown in Fig. 1. The principal constituent modules include database preparation, preprocessing, speech feature extraction, classification, and post-processing.

Figure 1: Proposed continuous SER workflow

    3.1 Database Preparation

Globally, the majority of people communicate in English. Hence, the SAVEE database, which contains recordings of utterances from four male native British English speakers, was selected. The focus of this work is the recognition of emotions from the continuous speech of mentally disordered individuals during counseling. Thus, angry, neutral, sad, and fear emotions are considered. In the database used, the recordings comprise fifteen phonetically balanced sentences per emotion from the standard TIMIT corpus, with an additional 30 sentences for neutral emotion [49].

    Creation of Multi-Emotional Voice Samples

Here, the focus is on continuous emotion detection. Due to the lack of available continuous speech emotion samples, a database needed to be created from available discrete emotion samples. In the database under consideration, each sample includes a discrete emotion of 2–4 s. In a practical situation, human emotions persist for a certain period. Thus, 3–4 samples of the same emotion class are concatenated to form a voice sample of a single emotion category, as shown in Figs. 2 and 3. Two such voice samples from different emotion categories are concatenated to create a continuous multi-emotional speech sample, as depicted in Fig. 4. Thus, in this work, continuous speech samples are multi-emotional with a duration of 7–12 s. Six categories of multi-emotional voice samples, i.e., angry–neutral, sad–angry, angry–fear, sad–neutral, fear–neutral, and fear–sad, are created using Audacity [50], an open-source audio editing and recording application (a minimal code sketch of this step follows Fig. 4). Any two emotions from angry, neutral, sad, and fear are considered in multi-emotional speech creation, as identification of these emotions is significant in any clinical investigation of an individual thought to be suffering from a mental disorder.

Figure 2: Creation of an angry utterance

    Figure 3: Creation of a neutral utterance

    Figure 4: Combining utterances
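The sample-creation step can be illustrated with a short Python sketch. This is a minimal illustration only, assuming same-rate WAV clips with hypothetical file names; the authors used Audacity [50] for this step.

```python
# Sketch of multi-emotional sample creation (file names are hypothetical).
import numpy as np
import soundfile as sf

def concatenate_clips(paths):
    """Concatenate several same-rate WAV clips into one signal."""
    signals, rate = [], None
    for p in paths:
        x, sr = sf.read(p)
        if rate is None:
            rate = sr
        assert sr == rate, "all clips must share one sampling rate"
        signals.append(x)
    return np.concatenate(signals), rate

# 3-4 clips of one emotion form a single-emotion utterance (Figs. 2 and 3) ...
angry, sr = concatenate_clips(["a01.wav", "a02.wav", "a03.wav"])
neutral, _ = concatenate_clips(["n01.wav", "n02.wav", "n03.wav"])
# ... and two single-emotion utterances form a multi-emotional sample (Fig. 4).
sf.write("angry_neutral.wav", np.concatenate([angry, neutral]), sr)
```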

    3.2 Preprocessing

This phase involves segmentation of continuous speech. As shown in Fig. 5, the speech signal is segmented uniformly into segments of constant length (e.g., 2 s), and two consecutive frames make an independent speech sample. Framing is performed without overlap. The emotion of each segment can then be recognized (see the sketch after Fig. 5).

Figure 5: Uniform segmentation of continuous speech
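A minimal sketch of the uniform segmentation of Fig. 5, assuming the signal is already loaded as a NumPy array:

```python
import numpy as np

def uniform_segments(signal, sr, seg_seconds=2.0):
    """Split a continuous signal into non-overlapping, fixed-length segments."""
    seg_len = int(seg_seconds * sr)   # samples per segment (e.g., 2 s)
    n_segs = len(signal) // seg_len   # any trailing partial segment is dropped
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]
```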

    3.3 Speech Feature Extraction

    In this study, the speech feature set includes cepstral features, bispectral features, and modified sine-based MFCC coefficients.

    3.3.1 Cepstral Features

Unique cepstral speech feature functionals derived from Mel, Bark, and inverted Mel filter banks, along with modified H-coefficients and additional parameters, were found to be quite robust for multilingual and mixed-lingual SER on discrete samples from Indian and Western language backgrounds [51]. The feature set from the previous study [51] comprises 151 coefficients, as shown in Tab. 1, and forms part of the speech feature set in this work.

Table 1: Cepstral feature set [51]

    3.3.2 Bispectral Features

Fundamentally, the bispectrum is the two-dimensional Fourier transform of the third-order cumulant function, as shown in Eq. (1).
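Following the standard definition of the bispectrum (the symbols are described below), Eq. (1) takes the form

$$P(f_x, f_y) = E\big[\,X(f_x)\,X(f_y)\,X^{*}(f_x + f_y)\,\big] \quad (1)$$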

Here, P(fx, fy) denotes the bispectrum at frequencies (fx, fy), X(f) represents the Fourier transform, * signifies the complex conjugate, and E[·] denotes the expectation operator [52]. The bispectrum of a speech signal includes redundant data. Thus, bispectral features are selected from the non-redundant area (Ω), as shown in Fig. 6.

Figure 6: Non-redundant area

The frequencies represented in Fig. 6 are normalized by the Nyquist frequency. Eqs. (2)–(11) illustrate the procedural steps to derive the bispectral speech features. The mean magnitude of the bispectrum is expressed as follows:
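Assuming the standard formulation used in the bispectral-feature literature [53], the mean magnitude is

$$M_{\mathrm{avg}} = \frac{1}{p} \sum_{(f_x, f_y)\,\in\,\Omega} \big|P(f_x, f_y)\big|$$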

where p denotes the number of points in that region [53]. The weighted center of bispectrum (WCOB) is derived using Eqs. (5)–(8).
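One plausible form of Eqs. (5)–(8), following the WCOB definitions common in the bispectral-feature literature [54], is

$$g_1 = \frac{\sum_{\Omega} c\,P(c,d)}{\sum_{\Omega} P(c,d)}, \qquad g_2 = \frac{\sum_{\Omega} d\,P(c,d)}{\sum_{\Omega} P(c,d)},$$

with g3 and g4 computed analogously from the absolute values of the bispectrum.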

Here, c and d are the bin indices of the frequencies in the region, g1, g2 represent the WCOB, and g3, g4 are the WCOB absolute values [54].

The log amplitude summation (Ta) of the bispectrum is derived as follows:
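Assuming the standard form,

$$T_a = \sum_{\Omega} \log\big|P(f_x, f_y)\big|$$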

Similarly, the log amplitude summation of the diagonal elements (Tb) of the bispectrum is derived as follows:
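Assuming the standard form,

$$T_b = \sum_{k} \log\big|P(f_k, f_k)\big|$$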

The amplitudes of the diagonal elements (Tc, Td, Te), together with the first- and second-order spectral moments, are derived by:
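A plausible reconstruction, assuming the spectral-moment definitions common in this literature, is

$$T_c = \sum_{k} k\,\log\big|P(f_k, f_k)\big|, \qquad T_d = \sum_{k} (k - T_c)^2\,\log\big|P(f_k, f_k)\big|,$$

with Te presumably a related higher-order moment; the exact forms are given in the cited source [54].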

A total of six features, comprising the bispectrum mean amplitude and five bispectrum log-amplitude features, are derived and form part of the proposed speech feature set.

    3.3.3 Modified Sine-Based MFCC Coefficients

The process flow for extraction of the sine-based Mel coefficients is shown in Fig. 7. Initially, the power spectrum of the preprocessed speech signal is derived. Differing from the conventional triangular filter bank used for MFCC feature extraction, as discussed in an earlier SER study [55], here sinusoidal filter banks, as shown in Fig. 8, are applied to the power spectrum. The center frequencies of the filter banks are given by Eq. (12).
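One common form of the Mel-spaced center frequencies, assumed here (with fl and fh the lower and upper band-edge frequencies, M the number of filters, and B(·) the Hz-to-Mel mapping), is

$$f(p) = \frac{N}{f_s}\, B^{-1}\!\left(B(f_l) + p\,\frac{B(f_h) - B(f_l)}{M + 1}\right) \quad (12)$$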

where B^-1(m) is defined in Eq. (13) as follows:
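Assuming the conventional Mel scale B(f) = 1125 ln(1 + f/700), its inverse is

$$B^{-1}(m) = 700\left(e^{m/1125} - 1\right) \quad (13)$$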

Here, f(p) denotes the center frequency, fs represents the sampling frequency, and N is the window length.

Figure 7: Sine-based MFCC feature extraction process

    Figure 8: Sine-based filter bank

In this study, for each speech signal, 151 cepstral features, six bispectral features, and six sine-based MFCC functionals (i.e., 163 coefficients in total) are extracted.
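The sine-shaped filter bank can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the filter placement follows conventional Mel spacing with half-sine lobes substituted for the usual triangles, and the number of filters and FFT size are assumed values.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def sine_filter_bank(n_filters=26, n_fft=512, sr=16000):
    """Mel-spaced filter bank with half-sine lobes in place of triangles."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, hi = bins[i], bins[i + 2]
        width = max(hi - lo, 1)
        # sine lobe: zero at the band edges, peaking near the centre frequency
        bank[i, lo:hi] = np.sin(np.pi * (np.arange(lo, hi) - lo) / width)
    return bank

def sine_mfcc(power_spec, bank, n_coeffs=13):
    """Log filter-bank energies followed by a DCT, as in conventional MFCC."""
    energies = np.maximum(power_spec @ bank.T, 1e-10)  # avoid log(0)
    return dct(np.log(energies), norm="ortho")[..., :n_coeffs]
```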

    3.4 Classification and Post Processing

For the proposed SER work, various classifiers available in Python [56] were evaluated. Compared with the other classifiers, superior performance was achieved with the random forest (RF) classifier; therefore, the RF classifier was adopted in this work [57]. With the knowledge acquired by the classifier during training on feature vectors of discrete samples, referred to as learning from the discrete SER model, an emotion is predicted for each continuous speech segment. The feature vector comprises cepstral, bi-spectral, and sine filter bank-based MFCC functionals. In the post-processing phase, a decision rule is deployed to determine the sequence of emotions: for every three consecutive speech segments, the emotion predicted the maximum number of times is taken as the emotion of that window. These predicted emotions form the emotion sequence of the continuous speech.
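A minimal sketch of the classification and majority-vote post-processing, assuming per-segment feature vectors are already extracted (scikit-learn's RandomForestClassifier stands in for the RF classifier of [57], with assumed hyperparameters):

```python
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

# Discrete SER model: trained on 163-dimensional feature vectors
# extracted from discrete emotion samples.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
# clf.fit(discrete_features, discrete_labels)

def emotion_sequence(segment_features, clf, window=3):
    """Predict an emotion per segment, then majority-vote every `window`
    consecutive segments to obtain the emotion sequence."""
    preds = clf.predict(segment_features)
    return [Counter(preds[i:i + window]).most_common(1)[0][0]
            for i in range(0, len(preds), window)]
```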

    3.5 Evaluation Metrics

    In this study, performance measures of recall, precision, F-measure, and accuracy are considered to evaluate the system [58].

    3.5.1 Recall

Recall is the number of relevant instances retrieved among the total number of relevant instances. Recall is also known as sensitivity.
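In standard form,

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$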

where True Positive (TP) is the number of samples predicted positive that are actually positive, and False Negative (FN) is the number of samples predicted negative that are actually positive.

    3.5.2 Precision


Precision gives the number of relevant instances among the instances retrieved. Precision is also known as the positive predictive value.
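In standard form,

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

where False Positive (FP) is the number of samples predicted positive that are actually negative.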

    Precision quantifies the number of correct positive predictions.

    3.5.3 F-Measure

    F-measure is the harmonic mean of recall and precision.
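$$F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$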

    3.5.4 Accuracy

    Accuracy is the number of test samples of a particular emotion classified accurately with respect to the total number of test samples of the emotion under consideration.
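$$\mathrm{Accuracy}_e = \frac{\text{correctly classified test samples of emotion } e}{\text{total test samples of emotion } e}$$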

4 Experimental Work, Results, and Discussion

The experimental work is performed in two successive modules. The first module focuses on enhancing the authors' previously proposed SER system for discrete speech [51]. This is required because, although the existing SER system is suitable for recognizing emotions from discrete speech in various languages, its performance does not carry over when continuous speech is tested. Therefore, the initial experimental work involves increasing the robustness of the existing SER system for discrete speech by adding a few more important speech features so that emotions can also be detected from continuous speech. The enhanced SER system is referred to as the proposed SER system. The second module involves experimentation on continuous speech using the proposed SER system. Both modules involve extraction of cepstral, bi-spectral, and sine-based MFCC functional speech features. An RF classifier is used. Fivefold cross-validation is applied to analyze system performance.

4.1 Module 1: Experimentation and Analysis for the Proposed SER System

The previously proposed SER system for multilingual and mixed-lingual discrete speech [51] is considered in this work. The previous system comprised cepstral speech feature functionals of 151 coefficients for each speech sample and used a simple RF classifier. Data augmentation was applied to avoid system bias toward any specific set of emotion categories. The current study focuses on recognizing emotions that are indicators of mental illness, i.e., angry, sad, fear, and neutral emotions. Thus, the initial phase of the work involved investigating the performance of the previous SER system [51] in recognizing these four emotions from discrete samples of the SAVEE database. The results obtained are shown in Tab. 2.

Table 2: Previous SER system performance using cepstral features

From Tab. 2, it can be observed that, among the four indicative emotions, the system best recognizes angry and neutral emotions, with recall rates of 96.7% and 95.0%, respectively. Precision and F-Score rates are also above 80.0% for these emotions. The min–max rates achieved are recall 70.0%–96.7%, precision 77.8%–90.6%, and F-Score 73.7%–93.5%. In addition, weighted averages of approximately 82.0% are obtained across all performance measures. Samples of sad emotion are misclassified as neutral, while fear is primarily classified as angry. Thus, considerably lower rates are reported for sad and fear emotions. The previously proposed system has to be made more robust in recognizing fear and sad emotions along with angry and neutral emotions, so that emotions in continuous speech can be well detected.

One probable solution the authors considered to overcome this limitation was to expand the existing speech feature set to enhance system performance for fear and sad emotions. Thus, in this work, the cepstral feature set used in the previous system [51] is enhanced with bi-spectral features that capture the higher-order statistics of the signal spectra. The experimental work now involves extracting cepstral–bi-spectral feature combinations and analyzing the SER system. A speech feature set of 157 coefficients (151 cepstral features and 6 bi-spectral features) was therefore extracted from all audio samples of the SAVEE database. The feature sets derived from the speech samples were subjected to an emotion recognition task. The SER system performance is shown in Tab. 3.

Table 3: Performance of the discrete SER system using cepstral and bi-spectral features

From the results shown in Tab. 3, it is evident that the higher-order statistics of the bi-spectral features, along with the cepstral features, are significant for emotion recognition, and all emotions show performance measures greater than 85.0%. Fear and sad emotions show increased recall rates of approximately 16% and 5%, respectively, compared with the results in Tab. 2. The min–max rates achieved are recall 86.7%–100.0%, precision 86.4%–100%, and F-Score 87.7%–97.6%. The minimum rates across the three measures, which were previously less than 80%, have improved to above 85%. In addition, with the inclusion of bi-spectral features, weighted averages were approximately 92.0% across all performance measures, which is 10% higher than in Tab. 2, where only cepstral features were considered. Note that, although SER performance has improved, some errors persist, i.e., sad is recognized as neutral and fear is recognized as angry. Thus, the recall rates for sad and fear emotions were between 80.0% and 90.0%.

To overcome this and further enhance the emotion prediction of the SER system, the speech feature set is expanded further. For this purpose, the authors focused on altering the filter bank shape used to derive cepstral features. As an initial step in this direction, the authors considered altering one of the cepstral feature filter bank shapes listed in Tab. 1. Among this set, MFCC has been a popular feature for various speech applications, including emotion recognition [59]. Thus, in this study, the filter bank shape of MFCC is altered. Traditionally, triangular filter banks have been used for MFCC feature extraction. In this work, sine-shaped filter banks are considered instead, and the MFCC features are derived accordingly. The extraction procedure is discussed in Section 3.

Six functionals of the Mel coefficients are derived from the sine filter bank and appended to the feature vector of the cepstral and bispectral feature combination, resulting in 163 coefficients for each speech sample. Classification was performed, and the robustness of this feature combination was analyzed. The results obtained are shown in Tab. 4. With the incorporation of the new speech feature, all four emotions are optimally recognized, with performance rates greater than 95% for all measures. This indicates that the shape of the filter bank has a considerable effect on the extracted Mel coefficients and hence on the emotion discriminating capability. The average accuracy across all three performance measures was 97.9%. The min–max band for recall was 95.8%–100%, for precision 96.6%–99.2%, and for F-Score 96.2%–99.6%.

Based on the analysis of the results presented in Tab. 4, the previous SER system [51] is enhanced with the inclusion of bi-spectral and sine filter bank-based MFCC coefficients. This enhanced system has proven robust in recognizing all four emotions in discrete speech and is henceforth referred to as the proposed SER system.

Table 4: Performance of the SER model using cepstral, bi-spectral, and modified sine-based MFCC features

4.2 Module 2: Experimentation and Analysis of the Proposed SER System for Continuous Speech

In this module, experiments are conducted on the recognition of emotions from continuous multi-emotional speech samples using the proposed SER system. The detailed workflow is explained in Section 3. Each continuous speech sample created contains the emotions of one of the pairs angry–neutral, fear–neutral, sad–neutral, angry–fear, angry–sad, or fear–sad. The feature vectors of the discrete speech samples were input to train the classifier, which was subsequently applied to recognize emotions in continuous speech. In this experimental procedure, consider, for example, recognizing the emotions of angry–neutral speech: all samples created for this category are divided into five folds. When the continuous speech samples of a particular angry–neutral fold are tested, all discrete angry and neutral samples used for data creation in that fold are removed from the discrete training input to avoid bias during testing. The same procedure is repeated when testing the remaining five categories of continuous speech samples (a minimal sketch of this exclusion step follows).
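A minimal sketch of the bias-avoiding exclusion step (names and record layout are hypothetical):

```python
def training_subset(discrete_samples, fold_source_ids):
    """Drop every discrete clip whose id was used to build a continuous
    sample in the fold currently under test."""
    return [s for s in discrete_samples if s["id"] not in fold_source_ids]
```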

In this experimental context, a multi-emotional angry–neutral sample, as shown in Fig. 9, was tested using the proposed SER system. For each segment, the system recognizes the associated emotion. The angry emotion is denoted A, and the neutral emotion is denoted N. The decision rule is as follows: for every three consecutive segments, the maximally recognized emotion is taken as the emotion of that window. Finally, the sequence of these emotions forms the emotion sequence of the continuous speech. As observed in Fig. 9, Angry–Angry–Angry–Neutral is the emotion sequence of the speech sample tested.

Figure 9: Recognized emotions in the continuous speech sample tested

All continuous emotion samples were tested, and the obtained results were analyzed. First, the performance of the proposed SER system across each fold of the fivefold cross-validation was investigated. The bar charts in Figs. 10a–10f depict how each emotion paired with another in a speech sample of each fold is recognized in continuous speech using the proposed method. Every fold consists of eight test samples for each continuous emotion category. From the plots, it is observed that both emotions in angry–neutral and angry–sad are consistently recognized from continuous speech across all folds. However, recognition of sad in the sad–neutral combination shows a large variation across the folds, as does fear combined with neutral or angry. In addition, both emotions in the fear–sad combination remained consistent across the folds; however, fear is confused with angry, and sad is confused with neutral, resulting in lower recognition performance. Following the investigation of emotions recognized across folds, the next step involved overall performance analysis, as illustrated in Tab. 5.

The performance of the proposed SER system across each of the six multi-emotional pairs in continuous speech is shown in Tab. 5. Emotions in continuous speech are best recognized from the angry–neutral emotion pair, with accuracies of 85.0% and 97.0% for angry and neutral, respectively. Angry is recognized with an accuracy of at least 80.0% in any continuous speech sample. The fear emotion is often confused with angry and is moderately recognized in the fear–neutral and angry–fear scenarios. Sad is primarily classified as neutral, resulting in lower recognition performance, as observed in the sad–neutral combination. Thus, recognition of sad remains challenging when associated with the neutral emotion. Accuracy for the fear–sad continuous emotion category was found to be considerably lower, i.e., 54.6% for fear and 45.8% for sad.

An analysis of the accuracy across the six multi-emotional categories considered in this work is shown in Fig. 11. Considerable recognition accuracy rates, higher than 75.0%, are achieved for most continuous emotion categories. The angry–neutral emotion pair demonstrated superior recognition rates, with an average accuracy of 91.0%. With the exception of the fear–sad combination, the min–max average accuracy band was 71.3%–91.0%.

Figure 10: Accuracy (%) performance of the proposed SER system for continuous speech across each fold during fivefold cross-validation: (a) angry–neutral, (b) angry–fear, (c) angry–sad, (d) fear–neutral, (e) sad–neutral, and (f) fear–sad continuous speech

As shown in Tab. 5, similar performance results were obtained when the first and second emotion samples of a multi-emotional combination were interchanged. For example, angry–neutral and neutral–angry continuous samples were recognized with the same accuracy by the proposed SER system.

Table 5: Accuracy (%) of the proposed SER system for continuous speech

    Figure 11: Average accuracy (%) of the SER system for various continuous emotion combinations

    4.3 Comparative Analysis of the Proposed System with Existing Continuous SER Studies

In this section, the proposed SER system is compared with existing works on recognition of emotions in continuous speech. As shown in Tab. 6, two similar studies were identified. Although each work is validated on a different database, the comparison illustrates the robustness of the proposed methodology relative to existing techniques. The proposed work uses uniform segmentation, independent of the detection of emotion variation boundaries performed in existing studies [46,47]. The proposed SER system achieved a considerable average recognition accuracy of 74.2% using a unique cepstral feature functional set of size 163, with four discrete emotions and six multi-emotional categories involved. In the work of Yeh et al. [46], although five discrete emotions with 10 different multi-emotional categories from a Mandarin database were involved, the model could achieve only 40% accuracy with uniform segmentation; by applying end point detection in the segmentation method and a feature selection method, 89.0% was achieved. Similarly, the study by Fan et al. [47] involved three discrete emotions with only two multi-emotional categories. Although only a small feature set of size 85 was applied, a multi-time-scale window was used during the segmentation stage, with the additional constraint that training and testing samples were chosen to be of the same length. Although both studies [46,47] demonstrated accuracies of approximately 89.0%, they considered only continuous speech, and validation was performed on the Mandarin (Chinese voice samples) and Emo-dB (German voice samples) databases, in which the speakers' emotional recordings are highly expressive. Note that Mandarin and German are not universal languages. In contrast, the SAVEE database considered in this work contains voice samples from male speakers in English, a universal language, and its emotions are very flat and not expressive. The proposed SER system exhibits considerable emotion recognition for both discrete and continuous speech, proving the robustness of the emotion-carrying capability of the chosen speech feature combination, which avoids detection of emotion variation boundaries, feature selection techniques, and the use of segmented continuous speech during training for continuous emotion recognition.

Table 6: Comparative analysis of the proposed continuous SER with previous studies

    5 Conclusion and Future Research

This study focused on the recognition of human emotions in continuous speech in a mental health context. An existing SER system for discrete speech that is quite robust in multilingual and mixed-lingual contexts was enhanced to capture emotion variations in continuous speech. It was demonstrated that altering the filter bank shape during MFCC extraction is effective in improving SER. Sine filter bank-based Mel cepstral coefficients and a cepstral–bi-spectral feature set proved capable of recognizing emotions from continuous speech. In addition, uniform segmentation is employed, so the proposed system is independent of any dedicated segmentation techniques and feature selection algorithms. Differing from existing SER systems, the proposed system is well suited for recognizing continuous emotions in continuous speech as well as in discrete speech. Thus, the proposed SER system is suitable for deployment in bots for effective mental disorder investigations.

The proposed SER system recognizes emotions from continuous English speech. Since this system is an enhanced version of an existing system suitable for multilingual and mixed-lingual contexts, emotions from continuous speech in other languages should also be well recognized. Therefore, in future, the performance of the proposed system on continuous speech in other languages could be tested. This study is aimed at recognizing mental illness based on the emotional content of speech; therefore, real-time audio recorded during counseling sessions with mentally ill patients could be used to test the proposed system. In this study, two emotions are included in each multi-emotional voice sample; future research could include more emotion categories. In addition, features could be added to the existing feature set so that the sad emotion can be better recognized in the presence of neutral or fear emotion in continuous speech. More significantly, cepstral features could be derived from different filter bank shapes.

Funding Statement: This work was partially supported by the Research Groups Program (Research Group Number RG-1439-033), under the Deanship of Scientific Research, King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
