
    Improved Speech Emotion Recognition Focusing on High-Level Data Representations and Swift Feature Extraction Calculation

Computers, Materials & Continua, 2023, Issue 12

Akmalbek Abdusalomov, Alpamis Kutlimuratov, Rashid Nasimov and Taeg Keun Whangbo*

1 Department of Computer Engineering, Gachon University, Sujeong-Gu, Seongnam-Si, Gyeonggi-Do, 13120, Korea

2 Department of AI Software, Gachon University, Seongnam-Si, 13120, Korea

3 Department of Artificial Intelligence, Tashkent State University of Economics, Tashkent, 100066, Uzbekistan

ABSTRACT The performance of a speech emotion recognition (SER) system is heavily influenced by the efficacy of its feature extraction techniques. The study was designed to advance the field of SER by optimizing feature extraction techniques, specifically through the incorporation of high-resolution Mel-spectrograms and the expedited calculation of Mel Frequency Cepstral Coefficients (MFCC). This initiative aimed to refine the system’s accuracy by identifying and mitigating the shortcomings commonly found in current approaches. Ultimately, the primary objective was to elevate both the intricacy and effectiveness of our SER model, with a focus on augmenting its proficiency in the accurate identification of emotions in spoken language. The research employed a dual-strategy approach for feature extraction. Firstly, a rapid computation technique for MFCC was implemented and integrated with a Bi-LSTM layer to optimize the encoding of MFCC features. Secondly, a pretrained ResNet model was utilized in conjunction with feature Stats pooling and dense layers for the effective encoding of Mel-spectrogram attributes. These two sets of features underwent separate processing before being combined in a Convolutional Neural Network (CNN) outfitted with a dense layer, with the aim of enhancing their representational richness. The model was rigorously evaluated using two prominent databases: CMU-MOSEI and RAVDESS. Notable findings include an accuracy rate of 93.2% on the CMU-MOSEI database and 95.3% on the RAVDESS database. Such exceptional performance underscores the efficacy of this innovative approach, which not only meets but also exceeds the accuracy benchmarks established by traditional models in the field of speech emotion recognition.

KEYWORDS Feature extraction; MFCC; ResNet; speech emotion recognition

    1 Introduction

The affective disposition of human beings, in other words their emotional state, serves as a significant determinant in how they interact with one another and with machines. This emotional underpinning is not trivial; rather, it deeply informs and molds a multitude of communication pathways. Such pathways extend from visual cues evident in facial expressions and auditory signals highlighted in vocal characteristics to the semantic structures embedded within verbal exchanges.

Notably, spoken language, an integral component of human communication, is not simply a means of exchanging information. It serves as a critical medium for the articulation and conveyance of a myriad of human emotions. This emotive dimension of speech is often implicitly encoded in tonality, pitch, pace, and volume, thereby making it a rich and multifaceted source of information [1]. Consequently, when we consider the interaction between humans and machines, especially in the context of developing interfaces that are natural and user-friendly, the ability to correctly understand, interpret, and respond to these emotional nuances becomes increasingly paramount. Endowing machines with a proficiency akin to that of an emotionally intelligent human listener has the potential to profoundly enhance the empathy, engagement, and overall effectiveness of human-machine interactions. In this light, the affective computing field stands as a promising domain, committed to endowing machines with these emotionally cognizant capabilities.

The process of discerning emotional cues in verbal communication, also known as Speech Emotion Recognition (SER), bears immense relevance for gaining insights into human communicative behaviors. At its core, SER provides a mechanism to interpret a speaker’s emotional condition using the acoustic and prosodic attributes of their utterances. This creates an intriguing nexus of linguistics, psychology, and artificial intelligence [2–4]. The broad-ranging implications and applications of SER techniques have permeated various fields, such as trading [5], tele-consultation (health prediction) [6], and education [7], each with its unique requirements and objectives.

Emotion recognition hinges on pinpointing features that capture emotional cues within datasets. Finding a precise feature set for this task is intricate because of the multifaceted and personal nature of emotions. These ‘emotion cues’ in data are attributes that resonate with specific emotions, and recognizing them, such as pitch variations in speech, is vital for creating emotion-aware models. References [8,9] show no consensus on the best features; the quest is to identify features that are broad yet precise enough to cover different emotions, a continuing challenge in the field.

In recent years, a predominant portion of studies has leaned towards the use of deep learning models, trained specifically to distill relevant feature sets from the data corpus [10–12]. The precision of categorization in SER is predominantly influenced by the procurement and choice of effective features. For instance, the extraction of features from speech signals leverages techniques such as MFCC, Mel-scale spectrograms, tonal power, and spectral flux. To enhance learning performance by reducing feature size, the Deer Hunting with Adaptive Search (DH-AS) algorithm is employed for optimal feature selection in [11]. These selected features are then subjected to emotion classification via a Hybrid Deep Learning (HDL) approach, which combines a Deep Neural Network (DNN) and a Recurrent Neural Network (RNN).

Many trusted studies in SER have leaned on acoustic features such as pitch, energy, MFCC, and the Discrete Fourier Transform (DFT), among others [13]. Digging deeper, the effectiveness of emotion classification in speech emotion recognition is primarily anchored in the capability to distill and pinpoint the most influential features within speech data; these distinguishing traits shed light on a speaker’s emotional state. Therefore, the steps of feature extraction and selection are paramount, often shaping the classification algorithm’s outcome. Notably, MFCCs furnish a more detailed insight into speech signals than rudimentary acoustic attributes. At their core, MFCCs capture the power spread across frequencies in a speech signal, presenting a glimpse of the speaker’s unique vocal tract dynamics.

Our proposed solution employs a bifurcated strategy to mitigate the feature-extraction shortcomings outlined above. First, capitalizing on our rapid computation of MFCC, we introduce an expedited method for the extraction of MFCC from vocal signals, serving as our MFCC feature encoder. The primary objective of this strategy is to enhance the efficiency of the feature extraction process, thereby enabling accurate and swift decomposition of speech data. This novel computational tactic emphasizes improving the velocity and effectiveness of MFCC attribute extraction. We further incorporate a bidirectional Long Short-Term Memory (Bi-LSTM) layer, tasked with capturing and encoding the complex temporal dependencies within the MFCC sequence. Second, serving as our Mel-spectrogram feature encoder, we exploit a pre-trained Residual Neural Network (ResNet) along with feature Stats pooling and fully connected layers to encode high-resolution Mel-spectrogram features.

Subsequently, the outputs from both feature encoders are concatenated and fed to a CNN, and subsequently to a Fully Connected Network (FCN). Ultimately, a softmax function is employed to perform emotion classification. This all-encompassing strategy is formulated to enhance the intricacy and efficiency of our speech emotion recognition paradigm.
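As a rough illustration of this fusion stage, a minimal Keras sketch is given below. The embedding sizes (128 for the Bi-LSTM branch, 512 for the ResNet/stats-pooling branch), the Conv1D and Dense widths, and the eight-class output are assumptions chosen for illustration, not values reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Outputs of the two encoders (dimensions are illustrative assumptions).
mfcc_emb = layers.Input(shape=(128,), name="mfcc_encoding")     # from the Bi-LSTM branch
mel_emb = layers.Input(shape=(512,), name="melspec_encoding")   # from the ResNet/stats-pooling branch

z = layers.Concatenate()([mfcc_emb, mel_emb])                   # joint representation
z = layers.Reshape((640, 1))(z)                                 # treat the fused vector as a 1-D signal
z = layers.Conv1D(32, kernel_size=3, activation="relu")(z)      # CNN over the fused features
z = layers.GlobalMaxPooling1D()(z)
z = layers.Dense(64, activation="relu")(z)                      # fully connected network
outputs = layers.Dense(8, activation="softmax")(z)              # emotion probabilities (e.g., 8 RAVDESS classes)

fusion_head = Model([mfcc_emb, mel_emb], outputs, name="fusion_head")
fusion_head.summary()
```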

    The main contributions of the proposed system are as follows:

– A novel system for SER has been introduced, exhibiting remarkable accuracy when benchmarked against existing base models. This cutting-edge approach signifies a promising trajectory for subsequent research within the SER sphere.

– Pioneering techniques were employed to extract fast MFCC features and Mel-spectrogram features from audio signals.

– A novel method for the swift calculation of MFCC features was formulated. The accelerated computation of MFCC features greatly improves the efficiency of the feature extraction phase in the SER process, thereby reducing the overall processing time and increasing system responsiveness. The extracted MFCC features provide essential insights into the spectral properties of the speech signal, making them invaluable for emotion detection and recognition tasks.

– A parallel processing methodology was introduced for the implementation of the Hanning window and value reduction operations.

– Collectively, the outcomes of this research have significantly enriched our comprehension of SER, offering crucial insights into the development of more proficient speech recognition models. The ramifications of these results span various fields, including emotion identification in human-robot exchanges, progressions in speech therapy methodologies, and enhancements in psychiatric health diagnostics.

The structure of this manuscript is as follows: Section 2 provides a comprehensive review of current research on SER modeling employing deep learning methodologies. Sections 3 and 4 are dedicated to a comprehensive elucidation of the proposed SER model, supplemented by empirical substantiation of its effectiveness and comparative analyses against established benchmarks. The aim is to equip the reader with an in-depth comprehension of the model’s structure and competencies, as well as positioning it within the broader context of the SER domain. In Section 5, a definitive summary is provided, along with discussions on potential avenues for future exploration. The document culminates with a reference list that includes a broad spectrum of recent academic publications related to SER.

    2 Related Work

In this section, we offer a synopsis of the prevailing scholarly works pertinent to the subject of SER. By examining the current state of research in this area, we hope to provide valuable context and insights, and to highlight key areas where further investigation is needed. This overview should serve as a valuable foundation upon which to build future research endeavors, driving innovation and advancement in this exciting field of study.

Over the recent timespan, systems underpinned by deep neural network architectures [14–17] have demonstrated significant triumphs in discerning emotions from vocal signals. In particular, the combination of CNNs and LSTMs in end-to-end methods [18,19] provided a robust way to capture both spatial features (through the CNN) and temporal dynamics (through the LSTM) in speech data. This fusion allowed for a thorough analysis of the speech signal, leveraging the strengths of both models to yield a more accurate and nuanced understanding of the emotional content. This innovative blend of CNN and LSTM architectures paved the way for more sophisticated and effective speech emotion recognition models, revolutionizing the field’s landscape.

The research [20] detailed a SER-focused integrated deep neural network model. Developed using advanced multi-task learning, it sets new SER standards with remarkable results on the renowned IEMOCAP dataset. It efficiently utilizes pretrained wav2vec 2.0 for speech feature extraction, refined on SER data. This model serves a dual purpose, emotion identification and automatic speech recognition, also producing valuable speech transcriptions. The research [21] investigated the nuances of implementing dilation/stride in 2D dilated convolution. It presents a method for the efficient execution of the inference stage, free from constraints on input size, filter size, dilation factor, or stride parameters. This approach is built on a versatile 2D convolution architecture and reimagines 2D dilated convolution through strategic matrix manipulation. Notably, its computational complexity remains constant regardless of dilation factor changes. Additionally, the method seamlessly integrates stride, resulting in a framework proficient in handling both dilation and stride simultaneously. The scholarly investigation [22] orchestrated a fusion of MFCCs and time-domain characteristics, generating an innovative hybrid feature set aimed at amplifying the performance of SER systems. The resultant hybrid features, coined MFCCT, are employed within the architecture of a CNN to create a sophisticated SER model. Notably, this synergistic amalgamation of MFCCT features with the CNN model markedly transcends the effectiveness of standalone MFCCs and time-domain elements across universally acknowledged datasets.

Furthermore, research [23] addressed the challenging task of effectively merging multimodal data due to their inherent differences. While past methods like feature-level and decision-level fusion often missed intricate modal interactions, a new technique named ‘multimodal transformer augmented fusion’ is introduced. This method combines feature-level with model-level fusion, ensuring a deep exchange of information between the various modalities. Central to this model is the fusion module, housing three Cross-Transformer Encoders, which generate multi-modal emotional representations to enhance data integration. Notably, the hybrid approach uses multi-modal features from feature-level fusion and text data to better capture nuances in speech.

The operational efficiency of SER systems often encounters roadblocks owing to the intricate complexity inherent in these systems, the lack of distinctiveness in features, and the intrusion of noise. In an attempt to overcome these hurdles, the research [24] introduced an enhanced acoustic feature set, a composite of MFCC, Linear Prediction Cepstral Coefficients (LPCC), Wavelet Packet Transform (WPT), Zero Crossing Rate (ZCR), spectrum centroid, spectral roll-off, spectral kurtosis, Root Mean Square (RMS), pitch, jitter, and shimmer. These collectively serve to magnify the distinctive nature of the features. Further augmenting this proposition is the deployment of a streamlined one-dimensional deep convolutional neural network (1-D DCNN), designed both to reduce computational complexity and to effectively encapsulate the long-term dependencies embedded within speech emotion signals. Acoustic parameters, typically embodied in the form of a feature vector, play a pivotal role in determining the salient characteristics of speech. The research [25] unfolded a pioneering SER model adept at simultaneously learning the Mel spectrogram (MelSpec) and acoustic parameters, thereby harnessing their respective advantages while curbing their potential shortcomings. For the acoustic parameters, the model leverages the Geneva Minimalistic Acoustic Parameter Set (GeMAPS), a comprehensive compilation of 88 parameters acclaimed for their efficacy in SER. The model, as proposed, is a multi-input deep learning architecture comprising a trinity of networks, each catering to a specific function: one dedicated to the processing of MelSpec in image format, another engineered to handle GeMAPS in vector format, and the final one synergizing the outputs from the preceding two to forecast emotions.

In spite of the individual models posited by authors within the previously referenced literature for the SER task, the persistent presence of certain limitations and the obstacle of sub-optimal prediction accuracy continue to warrant further exploration and resolution. The ensuing sections of this document thoroughly illuminate the exhaustive process flow of the proposed system, buttressed by detailed empirical results that serve as corroborative evidence.

    3 The Proposed SER System

This section explicates the complex nuances inherent in the proposed system, explicitly engineered to discern emotional indications within vocal articulations. The system consists of two primary constituents, each integral to generating a precise interpretation of the speaker’s emotional predilection. A thorough operational sequence is illustrated in Fig. 1, portraying the ordered progression of stages involved in the system’s deployment. The disparate components of the model operate synergistically to fulfill the goal of detecting emotional cues in speech. The architecture is constructed via the incorporation of MFCC and Mel-spectrogram characteristics, employing diverse deep learning techniques in accordance with their designated objectives. In sum, the proposed model embodies a holistic and robust strategy for auditory emotion identification, demonstrating versatility across a wide spectrum of pragmatic applications. This adaptability enhances the model’s operational capacity and positions it as a potent tool within the ever-evolving field of auditory emotion recognition.

    3.1 MFCC Feature Encoder

    3.1.1 Accelerated MFCC

MFCCs have gained recognition as informative attributes for analyzing speech signals, finding widespread usage in the field of speech recognition. These characteristics are built upon two fundamental principles: cepstral analysis and the Mel scale. With their ability to capture crucial aspects of the speech signal, MFCC features have become a cornerstone of speech recognition systems. The process of extracting MFCC features involves separating them from the recorded speech signals. This separation allows for the isolation of specific acoustic properties that contribute to the discriminative power of the features. By focusing on these distinctive aspects, the MFCC algorithm enhances the accuracy and effectiveness of speech recognition. MFCCs offer a concise yet informative representation of the speech signal, facilitating robust speech recognition. These features serve as vital inputs to classification algorithms and models, enabling accurate identification and understanding of spoken words and phrases.

This study introduces an expedited approach for extracting the MFCC from speech signals. The primary objective is to streamline the process of feature extraction, enabling efficient and accurate analysis of speech data. The proposed approach focuses on optimizing the speed and effectiveness of MFCC feature extraction. By employing innovative techniques and algorithms, the study aims to reduce the computational complexity and time required for extracting MFCC features from speech signals.

Figure 1: Operational sequence of the proposed system

The research delves into the development of efficient algorithms that leverage parallel processing and optimization strategies to expedite the extraction process. These advancements enable real-time or near real-time extraction of MFCC features, making the approach more practical for applications requiring swift processing, such as speech recognition, audio classification, and voice-based systems.

Fig. 2 presents the operational order of the suggested approach for the swift derivation of MFCC features from spoken discourse; a NumPy sketch of the complete calculation pipeline follows the four steps below.

Figure 2: Suggested fast calculation approach of the MFCC

1) Framing – after initial filtering, the speech signal undergoes segmentation into frames of 16 milliseconds. Except for the initial frame, each subsequent frame overlaps the final portion of the frame before it, thereby generating a seamless and overlapping sequence of frames that covers the whole length of the signal. In this particular study, the frame length (N) is set to 256 samples, considering the speech signal’s sampling rate of 16 kHz. The offset length (M) is defined as 160 samples (10 milliseconds), indicating the displacement between consecutive frames. As a result, adjacent frames share 96 samples (6 milliseconds, or 37.5% of the frame length), so each frame carries a significant portion of data in common with the preceding and subsequent frames. This combination of frame length and offset keeps the analysis within the range commonly advised for speech signals. By including this level of overlap, the study aims to capture sufficient temporal information and provide a more comprehensive representation of the speech signal within each frame.

2) Parallel processing of the Hanning window and value reduction (Fig. 3). A one-dimensional Hanning window was employed in this study. The application of the Hanning window aims to curtail disruptions from high-frequency constituents and diminish energy leakage. The structure of the Hanning window permits its side lobes to counterbalance each other, consequently mitigating the effect of high-frequency disturbances on the intended signal. By doing so, it helps to achieve improved spectral resolution and minimize energy leakage, where signal energy spills into adjacent frequency bins. To minimize distortion and ensure smoother transitions within individual frames, a weighting window is employed; it reduces abrupt changes and promotes a more gradual variation in the signal. The signal under examination comprises a dense and continuous tone whose strength is primarily influenced by the amplitude of a pure tone at a distinct frequency, represented as f. This pure tone is filtered through the Hanning window, which shapes and modifies its characteristics. Because the window’s frequency response is incorporated, the magnitude of the flat tone is affected when the Hanning window is applied to the pure tone. This process allows for the manipulation and adjustment of the signal’s spectral properties, leading to a more controlled and refined representation of the signal. An important feature of this window is that it establishes zero boundaries for the frames, which facilitates the computation of short-term energies as the signal traverses the window. These energies can then be retrieved from the sequence to identify the lowest-energy frames. The aim is to exclude low-energy segments from the total signal by evaluating the signal’s energy while simultaneously refining it with this window. Formula (1) below is used to calculate the signal energy in this procedure:

E_n = \sum_{i=1}^{N} x_i^2        (1)

In this equation, E_n represents the energy of the input signal fragment, while x_i denotes the signal values within the fragment. In the subsequent step, the signal undergoes processing that effectively reduces the quantity of values passed on to the processor. The window size, determined by both the number of samples and the duration, serves as a crucial parameter in the analysis. It is influenced by factors such as the fundamental frequency, intensity, and variations within the signal.

3) Short-Time Fourier Transform (STFT) – the STFT is a transform technique closely associated with the Fourier transform. It is employed to analyze the frequency and phase characteristics of localized portions of a signal as they vary over time. In practical terms, the computation of the STFT involves dividing a longer time signal into shorter segments of equal length; each segment is then individually subjected to a Fourier transform, thereby revealing the Fourier spectrum of that segment. In practical applications, discrete-time signals are commonly used, and the conversion from the time domain to the frequency domain is achieved through a discrete Fourier transform, which maps a length-N signal x_n to a complex-valued frequency-domain representation of N coefficients. The STFT is a widely utilized tool in speech analysis and processing, as it captures the time-varying evolution of frequency components. One notable advantage of the STFT, like that of the spectrum itself, is that its parameters have meaningful and intuitive interpretations. To visualize the STFT, it is often represented using the logarithmic spectra 20 \log_{10} |X(h, j)|; these 2D log spectra can then be displayed as a spectrogram using a thermal color map. During the third stage of the algorithm, the frames that have undergone the weight-windowing process are subjected to the STFT spectral transformation. This involves taking the Discrete Fourier Transform (DFT) of each window, resulting in the STFT of the signal. The transformation of the x_n and w_n windows of the input signal can be defined as follows:

X(k) = \sum_{n=0}^{N-1} x_n w_n e^{-j 2\pi k n / N}        (2)

where the index k represents the frequency bins, x_n is the signal window, w_n is the window function, and N is the total number of samples in the window.

4) Mel-Filterbank. In the fourth phase, the signal, now transformed to the frequency spectrum, is divided into segments using triangular filters, the boundaries of which are determined by the Mel frequency scale. The transition to the Mel frequency scale is guided by the following formula:

Mel(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)        (3)

where f represents the frequency in Hz.
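To make the four steps concrete, the NumPy sketch below walks a dummy one-second, 16 kHz signal through framing, Hanning windowing with energy-based value reduction, the windowed DFT of Eq. (2), and a Mel filterbank followed by a log and a DCT. The filter count (26), the cepstral order (13), and the 10th-percentile energy threshold are illustrative assumptions; the parallelization of step 2 and the other low-level optimizations of the paper’s accelerated algorithm are not reproduced here.

```python
import numpy as np

def hz_to_mel(f):
    """Standard Hz -> Mel mapping, Eq. (3)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with edges spaced evenly on the Mel scale (step 4)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fbank

def fast_mfcc(signal, sr=16000, frame_len=256, hop=160,
              n_filters=26, n_mfcc=13, energy_quantile=0.1):
    # Step 1: framing (16 ms frames, 10 ms hop -> 96 shared samples per pair).
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx]

    # Step 2: Hanning window plus energy-based value reduction; the percentile
    # threshold is an illustrative choice, not a value given in the paper.
    frames = frames * np.hanning(frame_len)
    energy = np.sum(frames ** 2, axis=1)                 # E_n = sum_i x_i^2, Eq. (1)
    frames = frames[energy >= np.quantile(energy, energy_quantile)]

    # Step 3: STFT, i.e. the windowed DFT of Eq. (2), one power spectrum per frame.
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Step 4: Mel filterbank, log compression, and a DCT to obtain the cepstrum.
    fbank = mel_filterbank(n_filters, frame_len, sr)
    logmel = np.log(spectra @ fbank.T + 1e-10)
    dct = np.cos(np.pi * np.arange(n_mfcc)[:, None]
                 * (np.arange(n_filters) + 0.5)[None, :] / n_filters)
    return logmel @ dct.T                                # (kept_frames, n_mfcc)

mfcc = fast_mfcc(np.random.randn(16000))  # one second of dummy audio
print(mfcc.shape)
```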

    3.1.2 Bi-LSTM Encoder

The processing pipeline for the MFCC sequence is meticulously crafted to harness the power of recurrent neural networks. We employ a bidirectional LSTM layer, imbued with a dropout rate of 0.5, to capture and encode the intricate temporal dependencies within the MFCC sequence. This bidirectional nature enables the model to effectively leverage both past and future context, ensuring a holistic understanding of the speech data. Furthermore, to counteract the risk of overfitting and enhance model generalization, a dropout rate of 0.1 is introduced in a subsequent linear layer, leveraging the rectified linear unit (ReLU) activation function to facilitate non-linear transformations and foster expressive feature representations. This thoughtful design seamlessly integrates regularization techniques, underscoring our commitment to achieving robust and reliable SER performance. The output from the Bi-LSTM encoder is subsequently amalgamated with the resultant output from the Mel-spectrogram feature encoder. This fusion ensures a comprehensive representation of the data by combining temporal sequence learning from the Bi-LSTM with the frequency-based understanding from the Mel-spectrogram feature encoder. It optimizes the system’s learning capability by exploiting the complementary information inherent in these two distinct yet interrelated sources.
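A minimal Keras sketch of such an encoder is shown below. The sequence length, MFCC order, and LSTM width are illustrative assumptions, while the 0.5 and 0.1 dropout rates follow the values quoted above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_mfcc_encoder(n_frames=300, n_mfcc=13, units=128):
    """Bi-LSTM encoder for the MFCC sequence (shapes are illustrative)."""
    inputs = layers.Input(shape=(n_frames, n_mfcc), name="mfcc")
    x = layers.Bidirectional(layers.LSTM(units, dropout=0.5))(inputs)  # temporal encoding
    x = layers.Dense(units, activation="relu")(x)                      # linear layer + ReLU
    x = layers.Dropout(0.1)(x)                                         # light regularisation
    return tf.keras.Model(inputs, x, name="mfcc_encoder")

mfcc_encoder = build_mfcc_encoder()
mfcc_encoder.summary()
```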

Figure 3: Parallel processing of the Hanning window and value reduction

    3.2 Mel-Spectrogram Feature Encoder

Conventional SER methodologies have conventionally relied upon an extensive repertoire of low-level time- and frequency-domain features to effectively capture and represent the multifaceted tapestry of emotions conveyed through speech. However, recent advancements have witnessed a paradigm shift towards cutting-edge SER systems that harness the formidable prowess of complex neural network architectures, enabling direct learning from spectrograms or even raw waveforms. In the pursuit of furthering this research frontier, our study capitalizes on the ResNet [26] architecture, skillfully employing Mel-spectrograms as input features. By extracting high-resolution spectrograms, our model adeptly encodes the subtle intricacies of the spectral envelope and the coarse harmonic structures, deftly unraveling the rich tapestry of emotions permeating speech signals. Through this sophisticated framework (Fig. 4), our model transcends the limitations of traditional feature-based approaches, culminating in an elevated degree of accuracy and efficacy in the discernment and recognition of emotions in speech data.

Figure 4: Mel-spectrogram feature encoder

Our Mel-spectrogram feature encoder commences with a meticulous two-step process. Firstly, we subject a ResNet model to pre-training on the expansive LibriSpeech dataset [27]. This initial phase endows the model with a comprehensive foundation of knowledge, enabling it to glean essential insights into the underlying speech representations. Subsequently, we strategically replace the fully connected (FC) layers of the pre-trained model with Stats Pooling and FC layers. This deliberate replacement serves to prime the model for the precise task at hand, SER, employing the CMU-MOSEI and RAVDESS datasets as our experimental bedrock. To holistically capture the intricate temporal dynamics and contextual information pervasive within the speech data, our proposed system leverages the indispensable statistics pooling layer [28]. Serving as a pivotal component within the architecture, this layer adeptly assimilates frame-level information over temporal sequences. Through the astute concatenation of the mean and standard deviation computed over the frames, it ingeniously distills the sequence of frames into a single, compact vector representation. This judiciously crafted vector encapsulates vital statistical information reflecting the nuanced emotional content ingrained within the speech signal. Notably, our system operates across distinct levels of granularity, intelligently harnessing the disparate capabilities of the ResNet model’s components. The convolutional layers, meticulously designed to extract salient features, operate at the frame level, diligently capturing local patterns and structures that underpin the speech signal’s intrinsic characteristics. In a complementary fashion, the FC layers assume the role of segment-level interpreters, harmoniously synthesizing the accumulated frame-level information within a given segment of speech. This segment-based perspective engenders a holistic grasp of the temporal dynamics while facilitating a comprehensive interpretation of higher-level emotional patterns. As a result, our SER system manifests heightened discrimination capabilities, adroitly striking a balance between fine-grained temporal dynamics and the discernment of overarching emotional patterns.
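The statistics pooling step can be expressed compactly as a custom Keras layer. In the sketch below, an untrained ResNet50 serves purely as a stand-in for the LibriSpeech-pretrained ResNet described above, and the input resolution and FC width are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

class StatsPooling(layers.Layer):
    """Concatenate the mean and standard deviation over the time axis,
    turning (batch, frames, features) into (batch, 2 * features)."""
    def call(self, x):
        mean = tf.reduce_mean(x, axis=1)
        std = tf.math.reduce_std(x, axis=1)
        return tf.concat([mean, std], axis=-1)

# Stand-in backbone: an untrained ResNet50 applied to a 128-band log-Mel input.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, input_shape=(256, 128, 1))

inputs = layers.Input(shape=(256, 128, 1), name="logmel")   # (time, mel, channel)
feat = backbone(inputs)                                      # frame-level feature maps
feat = layers.Reshape((-1, feat.shape[-1]))(feat)            # flatten spatial grid to frames
emb = StatsPooling()(feat)                                   # segment-level statistics
emb = layers.Dense(512, activation="relu")(emb)              # replacement FC layer
melspec_encoder = tf.keras.Model(inputs, emb, name="melspec_encoder")
melspec_encoder.summary()
```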

In order to capture a rich and detailed representation of the audio signals, we employ a meticulous procedure for extracting high-resolution log-Mel spectrograms. These spectrograms are meticulously engineered to possess a dimensionality of 128, allowing for a comprehensive encoding of the acoustic features embedded within the speech signals. The extraction process entails a frame-based approach, where each frame spans a duration of 25 ms and is sampled at a rate of 100 Hz, effectively capturing temporal dynamics with fine-grained precision at intervals of 10 ms. To ensure appropriate feature normalization, we employ a segment-level mean and variance normalization technique. However, we acknowledge that this normalization approach falls short of the optimal scenario where normalization is conducted at the recording or conversation level. In light of this limitation, we recognize the value of a more holistic normalization strategy that takes into account the broader context of the recording or conversation. By considering statistics computed at the conversation level, such as the mean and variance, we can effectively capture the inherent variations and nuances within the conversation. Significantly, our meticulous experimentation has unveiled a compelling finding: normalizing the segments using conversation-level statistics yields substantial enhancements in the performance of the SER system, particularly when applied to the CMU-MOSEI and RAVDESS datasets. This empirical observation underscores the criticality of incorporating context-aware normalization techniques in order to effectively capture the subtle emotional cues embedded within real-world conversational scenarios and elevate the overall accuracy of SER systems.
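A sketch of this extraction, assuming librosa is available and using simple segment-level normalization, is given below; switching to conversation-level normalization would mean computing the mean and variance over all segments belonging to a conversation instead of per segment.

```python
import numpy as np
import librosa

def high_res_logmel(y, sr=16000):
    """128-band log-Mel spectrogram with 25 ms frames and a 10 ms hop,
    followed by segment-level mean/variance normalisation."""
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr,
        n_fft=int(0.025 * sr),       # 25 ms analysis window
        hop_length=int(0.010 * sr),  # 10 ms hop -> 100 frames per second
        n_mels=128,
    )
    logmel = np.log(mel + 1e-6)
    mean = logmel.mean(axis=1, keepdims=True)
    std = logmel.std(axis=1, keepdims=True) + 1e-6
    return (logmel - mean) / std     # (128, n_frames)

logmel = high_res_logmel(np.random.randn(16000))  # one second of dummy audio
print(logmel.shape)
```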

The ResNet model architecture employed in our study incorporates a first block comprising 32 channels, signifying the number of parallel convolutional filters utilized. To optimize the training process, we leverage the stochastic gradient descent (SGD) optimizer with a momentum value of 0.9, ensuring efficient convergence towards an optimal solution. Concurrently, a batch size of 32 is employed, allowing for parallel processing and expedited training. For the purpose of fine-tuning the convolutional layers, we adopt a learning rate of 0.001, enabling precise adjustments to the network parameters during this critical phase. Notably, the learning rate strategy is intelligently modulated, remaining constant for the initial 10 epochs to establish a stable training foundation. Subsequently, for each subsequent epoch, the learning rate is halved, facilitating finer parameter updates as the training progresses. To introduce the crucial element of non-linearity and enhance the model’s expressive capabilities, ReLU activation functions are applied across all layers, excluding the output layer. This choice of activation function enables the ResNet model to effectively capture complex patterns and salient features, facilitating the extraction of meaningful representations. In order to expedite the training process and bolster the model’s generalization properties, we integrate layer-wise batch normalization. This technique normalizes the inputs to each layer, ensuring consistent distribution and alleviating internal covariate shift, thereby accelerating model convergence and enhancing its ability to generalize to unseen data.
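The optimizer and learning-rate schedule described here (SGD with momentum 0.9, an initial rate of 0.001 held for 10 epochs and halved every epoch thereafter) can be expressed with a standard Keras callback, as in the sketch below; only these quoted values are taken from the text, the rest is illustrative.

```python
import tensorflow as tf

def halve_after_warmup(epoch, lr):
    """Keep the learning rate constant for the first 10 epochs,
    then halve it at every subsequent epoch."""
    return lr if epoch < 10 else lr * 0.5

lr_callback = tf.keras.callbacks.LearningRateScheduler(halve_after_warmup)
sgd = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)

# Hypothetical usage with a compiled Keras model:
# model.compile(optimizer=sgd, loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=50, batch_size=32, callbacks=[lr_callback])
```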

    4 Experiments and Discussion

    4.1 Datasets

    4.1.1 The CMU-MOSEI

Table 1: RAVDESS and the CMU-MOSEI dataset content

    4.1.2 RAVDESS

The RAVDESS [30] dataset stands as a notable collection designed specifically for cutting-edge research pursuits in emotion recognition. Distinguished by its accessibility and public availability, the RAVDESS dataset encompasses a rich reservoir of recordings covering both emotional speech and song, rendering it an invaluable resource within the scientific community. A notable facet of the dataset lies in its diverse repertoire of emotional expressions, spanning a spectrum that encompasses neutral, calm, happy, sad, angry, fearful, surprised, and disgusted states, each articulated in a neutral North American English accent. Actors delivered lexically matched statements, each produced with the specified emotional tones.

The data collection process is marked by meticulousness, with a cohort of 24 highly skilled individuals, equally balanced between males and females, actively participating. This gender parity ensures an equitable representation, fostering a holistic understanding of emotional expressions across diverse demographics. The speech dataset has 1440 files, derived from 60 trials for each of the 24 actors. The dataset is in WAV format with a 16-bit depth and a 48 kHz sampling rate. As Table 1 shows, while each of the other emotions in RAVDESS is represented by 192 samples, the “Neutral” emotion has only 96 samples.

    4.2 Implementation Details

To fairly evaluate our model on the CMU-MOSEI and RAVDESS datasets, we strictly trained it using the method described in [31]. We split the data into 80% for training and 20% for testing. This allows the model to train on a large part of the data and still have a significant portion for validation, ensuring a complete evaluation of its performance. We obtained 1152 training samples and 288 testing samples from the RAVDESS dataset through this process. In a similar vein, for the CMU-MOSEI dataset, we assigned 22852 training instances and 5714 testing instances. Unlike the method in [32], we did not use 10-fold cross-validation, owing to the challenges and resources needed to apply cross-validation to deep learning models.
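An 80/20 split of this kind can be reproduced with scikit-learn; the snippet below uses dummy arrays in place of the extracted features and yields the 1152/288 counts quoted for RAVDESS.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the extracted features and labels (RAVDESS: 1440 clips, 8 classes).
features = np.random.randn(1440, 256)
labels = np.random.randint(0, 8, size=1440)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels)
print(len(X_train), len(X_test))  # 1152 288
```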

In speech emotion recognition, precision, recall, and accuracy are common metrics used to evaluate model performance. They are widely accepted in the SER field because they offer a well-rounded view of how a model performs, considering both its relevance and comprehensiveness.
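These metrics (together with the F1 score reported in the tables below) are available directly in scikit-learn; the helper below, using macro averaging over the emotion classes, is one possible way to compute them.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluation_report(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 over all classes."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }

print(evaluation_report(np.array([0, 1, 2, 2]), np.array([0, 1, 2, 1])))
```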

Our system uses TensorFlow with Python, a popular open-source machine learning platform. We chose the Adam optimizer with a learning rate of 0.0001 and used ‘categorical_crossentropy’ as the loss function. We also applied L2 regularization for better efficiency and reliable convergence.

We trained our model for 200 epochs using batches of 32 on a system with an Nvidia GeForce GTX 1660 Ti (6 GB) GPU and an Intel Core i7-1265UE 10-core CPU. Running on Windows 11 with 64 GB of RAM, this setup allowed us to efficiently carry out deep learning tasks during both training and testing.
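The training configuration quoted above (Adam at 0.0001, categorical cross-entropy, L2 regularization, batches of 32) translates into a short TensorFlow setup such as the one below; the stand-in model, dummy data, and two-epoch run are illustrative only.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Minimal stand-in model with L2-regularised dense layers.
model = tf.keras.Sequential([
    layers.Input(shape=(256,)),
    layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(8, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Dummy data in place of the real MFCC/Mel-spectrogram features.
X = np.random.randn(64, 256).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 8, 64), num_classes=8)
model.fit(X, y, epochs=2, batch_size=32)  # the paper trains for 200 epochs
```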


    4.3 Recognition Results

Tables 2 and 3 represent a detailed breakdown of the performance of the proposed emotion recognition system for each class of emotion. The classes include “Angry”, “Sadness”, “Disgust”, “Fear”, “Happy”, “Surprise”, “Calm”, and “Neutral”. The performance measures used are precision, recall, and F1 score.

From Table 2, we can infer that the model has relatively high precision, recall, and F1 scores for all the emotion classes of the RAVDESS dataset, indicating that it performs well in recognizing different emotions from the speech data. The lowest precision is for the “Neutral” class at 89.9%, but this is still quite high. The lowest recall is for the “Fear” class at 93.7%, which again is relatively high. The F1 scores also indicate a good balance between precision and recall across all classes, with the lowest F1 score being 91.8% for the “Neutral” class. For example, in the “Surprise” class, the model’s predictions of surprise were correct 96.6% of the time (precision), and it identified 97.3% of all actual instances of surprise (recall). The F1 score of 96.9% suggests a good balance between precision and recall.

Table 2: The system’s recognition performance on different emotions of the RAVDESS dataset

Overall, these numbers suggest the model performs well across different emotion categories, successfully recognizing each type of emotion from the given speech data.

In the case of the CMU-MOSEI dataset, we evaluated the performance of our emotion recognition model across six different emotions: Angry, Sadness, Disgust, Fear, Happy, and Surprise. The results, as outlined in Table 3, show strong performance across all tested emotions. Our model achieved exceptional precision, particularly for the ‘Happy’ emotion, which scored 98.2%. Similarly, recall was highest for ‘Happy’ at 96.8%, indicating the model’s robustness in identifying instances of this emotion. The model demonstrated balanced performance in the ‘Sadness’ category, with both precision and recall scoring around 95.0%. Furthermore, the ‘Happy’ emotion resulted in the highest F1 score (97.7%), denoting an excellent harmony between precision and recall.

Table 3: The system’s recognition performance on different emotions of the CMU-MOSEI dataset

Table 4: The confusion matrix on the RAVDESS dataset

Table 5: The confusion matrix on the CMU-MOSEI dataset

However, our analysis also highlighted some areas for potential improvement. Both ‘Surprise’ and ‘Disgust’ had relatively lower precision, recall, and F1 scores compared to other emotions, which suggests room for further optimization. This robust evaluation provides important insights into our model’s performance, emphasizing its strengths and revealing potential areas for future enhancements. These results are an encouraging step forward for the development of more effective and accurate emotion recognition models.

The evaluation process was expanded by employing confusion matrices, as shown in Tables 4 and 5. These tables supply a visual interpretation and detailed explanation of how the model performed. They illustrate that the model surpassed a 92% accuracy rate on the RAVDESS dataset and 90% on the CMU-MOSEI dataset for each individual emotion class. These results suggest a high level of precision in the classification tasks, indicating the model’s sturdy and trustworthy ability to categorize emotions.
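For reference, a per-class confusion matrix of this kind can be produced directly from the predicted and true labels; the snippet below uses scikit-learn with dummy labels as placeholders for the test-set ground truth and model outputs.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

emotions = ["Angry", "Sadness", "Disgust", "Fear", "Happy", "Surprise", "Calm", "Neutral"]

# Dummy stand-ins for the test-set ground truth and model predictions.
y_true = np.random.randint(0, len(emotions), size=288)
y_pred = y_true.copy()
y_pred[::10] = (y_pred[::10] + 1) % len(emotions)   # inject a few errors

cm = confusion_matrix(y_true, y_pred, labels=range(len(emotions)))
per_class_recall = cm.diagonal() / cm.sum(axis=1)   # diagonal fraction per emotion class
for name, value in zip(emotions, per_class_recall):
    print(f"{name}: {value:.1%}")
```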

Table 6 below presents a comparative analysis of the proposed SER system against benchmark methods applied to two distinct datasets: RAVDESS and CMU-MOSEI.

Table 6: Comparison with the benchmark SER methods

For the RAVDESS dataset, we compare four SER methods, namely Bhangale et al. [24], Ephrem et al. [33], Pulatov et al. [34], and UA et al. [35]. The proposed system is also included for reference. Notably, the proposed system exhibits the highest accuracy of 95.3%, outperforming the other methods.

On the CMU-MOSEI dataset, we assess the performance of three SER methods, specifically Mittal et al. [36], Xia et al. [37], and Jing et al. [38], along with the proposed system. Here, the proposed system achieves an accuracy of 93.2%, which surpasses the results obtained by the other methods.

The findings from this comparative analysis demonstrate the efficacy of the proposed system on both datasets, RAVDESS and CMU-MOSEI, showcasing its robustness in recognizing emotions from speech signals. It is important to consider the limitations and biases in each dataset, as they can impact the performance of SER methods. Careful evaluation and validation on diverse datasets are essential for developing robust emotion recognition models.

    4.4 Discussion and Limitations

The results of our study demonstrate that the speech emotion recognition system we developed outperforms existing models in terms of detection accuracy. By leveraging high-resolution Mel-spectrograms for feature extraction and swiftly computing MFCCs, our system streamlines the emotion recognition process. This innovation effectively tackles the pressing challenge of feature extraction, a crucial component that significantly impacts the effectiveness of SER systems.

Upon evaluating our proposed model, we garnered several compelling insights. The system showcased an impressive level of accuracy, as evidenced by the data in the accompanying tables. Across multiple emotion categories, our model consistently achieved precision, recall, and F1 scores exceeding 90% for the majority of them. These metrics highlight the model’s robustness and skill in emotion classification tasks.

The ‘Happy’ emotion category stood out, with exceptional precision and recall rates of 98.2% and 96.8%, respectively. This underlines the model’s proficiency in correctly identifying and categorizing instances of happiness. Conversely, the categories for ‘Surprise’ and ‘Disgust’ showed slightly weaker performance metrics, suggesting that the model may face challenges in accurately categorizing these specific emotions.

These variations could be attributed to the inherently complex and subjective nature of human emotions, which can differ significantly across individual experiences and settings. Nonetheless, the overall high performance metrics affirm the model’s potential and efficacy.

While our findings are encouraging, there are several limitations to consider. Our study primarily utilized the CMU-MOSEI and RAVDESS databases, which mainly consist of acted emotions that may not fully represent the nuances of spontaneous emotional expressions. For future research, extending to databases that capture more naturalistic emotional behavior would be beneficial. Additionally, our system is optimized for English and may display varying performance levels in different linguistic and cultural contexts. Future work should aim to improve the system’s linguistic and cultural adaptability. Moreover, our current model is audio-focused, overlooking the potential benefits of integrating visual or textual cues. Investigating multi-modal systems could offer a more holistic approach to emotion recognition in future studies.

Finally, it is worth noting that the performance of our system can be influenced by various factors such as background noise, the distance of the speaker from the microphone, and other environmental elements. To enhance robustness, future iterations of this model could incorporate noise reduction techniques and additional preprocessing measures to maintain high recognition accuracy under diverse recording conditions.

    5 Conclusion

The primary focus of our study was to design a sophisticated SER system that makes the most of Mel-spectrograms for intricate spectrogram extraction, coupled with the swift computation capabilities of MFCC, making the feature extraction phase both efficient and effective. At the heart of our effort was a commitment to go beyond existing benchmarks. We sought to address and overcome the limitations of current techniques, driven by an unwavering commitment to heightened accuracy in emotion recognition. In order to validate our advancements, we subjected our proposed system to rigorous evaluations using two distinct databases: CMU-MOSEI and RAVDESS. The ensuing results not only met our expectations but in many respects exceeded them. The system showcased its mettle by recording an accuracy rate of 93.2% on the CMU-MOSEI dataset and an even more commendable 95.3% on the RAVDESS dataset.

Our findings in this research signify more than just technical advancements; they herald a new era in speech recognition systems. The insights we have garnered underscore several compelling avenues that warrant deeper investigation. To elaborate, we are currently delving into the fusion of our established model with the nuanced mechanisms of transformer architectures and self-attention. Additionally, there is a concerted effort underway to harness the power of pretrained audio models. Our overarching aim remains clear: to sift through speech and extract features that are not only abundant but also meaningful, thereby elevating the finesse of emotion detection. Recognizing the evolving landscape of spoken language and its myriad emotional undertones, we are also directing our energies towards assimilating a broader emotional speech database. Such a move is anticipated to fortify our model’s adaptability, ensuring it remains robust when faced with a spectrum of emotional expressions and intricate speech variations. By doing so, we aim to make our model not just technically adept but also practically invaluable in diverse real-world scenarios.

In closing, the advancements birthed from this research project hold profound potential. The ripple effects of our work are anticipated to be felt far and wide, from making machine-human interactions more intuitive and genuine to refining therapeutic speech interventions and offering sharper mental health evaluations. We stand at the cusp of a transformative era, and our work seeks to be a beacon, lighting the way for future explorations and innovations that have the power to enrich and reshape the tapestry of our daily interactions and experiences.

Acknowledgement: The authors A.A., A.K. and R.N. would like to express their sincere gratitude and appreciation to the supervisor, Taeg Keun Whangbo (Gachon University), for his support, comments, remarks, and engagement over the period in which this manuscript was written. Moreover, the authors would like to thank the editor and anonymous referees for the constructive comments in improving the contents and presentation of this paper.

Funding Statement: This work was supported by the GRRC program of Gyeonggi Province (GRRCGachon2023(B02), Development of AI-based medical service technology).

Author Contributions: A.A. developed the method; A.A., A.K. and R.N. performed the experiments and analysis; A.A. wrote the paper; T.K.W. supervised the study and contributed to the analysis and discussion of the algorithm and experimental results. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: All datasets are publicly available at www.kaggle.com.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
