
    A Robust Framework for Multimodal Sentiment Analysis with Noisy Labels Generated from Distributed Data Annotation

    2024-03-23 08:16:50

    Kai Jiang, Bin Cao and Jing Fan

    School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310023, China

    ABSTRACT Multimodal sentiment analysis utilizes multimodal data such as text,facial expressions and voice to detect people’s attitudes.With the advent of distributed data collection and annotation,we can easily obtain and share such multimodal data.However,due to professional discrepancies among annotators and lax quality control,noisy labels might be introduced.Recent research suggests that deep neural networks(DNNs)will overfit noisy labels,leading to the poor performance of the DNNs.To address this challenging problem,we present a Multimodal Robust Meta Learning framework(MRML)for multimodal sentiment analysis to resist noisy labels and correlate distinct modalities simultaneously.Specifically,we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training.Besides,a multiple meta-learner(label corrector)strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels.We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.

    KEYWORDS Distributed data collection;multimodal sentiment analysis;meta learning;learn with noisy labels

    1 Introduction

    Sentiment analysis detects people’s attitudes,emotions,moods,and other subjective information[1–3] which can benefit many applications,such as emotional care service,mental health test and depression detection.The advent of distributed data collection and annotation has ushered in a new era,enabling the acquisition of extensive multimodal sentiment datasets from diverse sources such as search engines,video media,and social platforms like WeChat,Twitter,and Weibo[4].This abundance of data sources has greatly accelerated progress in the field of multimodal sentiment analysis.Regrettably,the inherent differences in annotators’proficiency levels have led to the introduction of a significant number of noisy labels[5–7].Recent unimodal research reveals that deep neural networks(DNNs)will overfit to noisy labels leading to a poor performance[8].So,it is a challenging problem for multimodal sentiment analysis with noisy labels.

    To address this problem, numerous unimodal methods have been proposed for robust training of DNNs in the presence of noisy labels. Sample selection methods [9–12] adopt a clean-sample selection strategy to identify and discard noisy data before DNN training, and label correction methods [13–16] attempt to find correct labels for noisy data. Although these noisy-label learning methods reach promising performance on unimodal data, they are not designed for multimodal scenarios such as multimedia data.

    Moreover,existing multimodal sentiment analysis methods are not explicitly tailored to address noisy labels,potentially leading to overfitting the noisy data[17,18].We conducted an empirical study on an existing multimodal sentiment analysis method tensor fusion network(TFN)[19]trained with noisy labels.Fig.1 illustrates the accuracy of TFN on different training epochs.We can observe that the accuracy on the training dataset has been increasing,but the accuracy on the validation dataset is declining which shows the DNNs tend to memorize the noisy labels rapidly,leading to a deterioration in performance.Hence,it is valuable and significant to explore how to train a robust multimodal sentiment analysis model with noisy labels,but as far as we know,there has been little related literature in this direction over the past years.

    Figure 1:We train an existing multimodal sentiment analysis model TFN on the Yelp-5 dataset with clean labels and 80% symmetric noisy labels (introduced in Section 4.1).The accuracy on different epochs is shown in the figures:(a)accuracy for the clean and noisy training dataset;(b)accuracy for the clean validation dataset

    In fact, given a multimodal dataset with noisy labels, designing a noise-tolerant multimodal sentiment analysis method requires careful consideration of two sub-tasks: how to correct the noisy labels, and how to conduct multimodal sentiment analysis.

    In this paper, we introduce the Multimodal Robust Meta Learning (MRML) framework, designed to enhance multimodal sentiment analysis by mitigating the effects of noisy labels while establishing correlations between the modalities. The framework optimizes the whole procedure of label correction and network training through a three-stage process. In the first stage, we propose a two-layer fusion net to correlate the different modalities deeply. Inspired by the attention mechanism [20], we first use feature fusion, where we calculate the weight for each modality feature and then average them. Second, instead of simply concatenating the two feature vectors, we use modality early fusion, where we apply two linear layers to calculate the attention weights for each modality feature. Compared with a unimodal feature, the multimodal fused feature carries complementary information for label correction and network training.

    In the second stage,we present a multiple meta-learner strategy to automatically correct the noisy labels iteratively by using the multimodal fused feature.Similar to the recent noisy label learning work called Co-teaching[10],we use two meta-learners and exploit the different information from multiple models during the label correction procedure to increase the quality of the generated correct label and prevent the model from overfitting to noisy labels.After label correction,we train the learner with the corrected labels generated by the meta-learner.In the third stage,we update the meta-learner by minimizing the loss of clean validation data.Such a three-stage optimization process is expected to train a faithful meta label corrector and a robust learner by leveraging the clean validation data.

    The main contributions of our paper are as follows:

    • We propose a robust multimodal sentiment analysis framework that can robustly train the network with multimodal noisy labels.

    • We introduce a two-layer fusion network that effectively integrates information from diverse modalities. This integration enhances the quality of the extracted multimodal data features, thereby contributing to improved label correction and network training outcomes.

    • A novel multiple meta-learner strategy is proposed to robustly map noisy labels to corrected ones by using the different information from multiple meta-learners.

    • We conduct experiments on three popular multimodal sentiment analysis datasets with varying noise levels and types to demonstrate the robust performance of our method.

    The organization of the forthcoming sections of this paper is as follows: Section 2 outlines the standard unimodal meta label correction network,while Section 3 delves into the comprehensive implementation details of MRML.In Section 4,we provide an account of the outcomes attained from our experimental evaluation.The examination of relevant research is presented in Section 5,with the final summary and conclusions offered in Section 6.

    2 Preliminaries

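    In this section, we briefly summarize the typical unimodal meta label correction net [16,21]. For a unimodal sentiment analysis task, $(x, y)$ denotes an input and its corresponding label. We are given a noisy training dataset $D = \{(x_i, y_i), 1 \le i \le N\}$, where $x_i$ is the $i$-th sample and $y_i$ is the original (potentially noisy) label. Let $D_v = \{(x_j^v, y_j^v), 1 \le j \le M\}$ be the clean validation dataset, where $M \ll N$. We denote the meta-learner (label corrector), which generates corrected labels, as $\tilde{y}_i = g_\varphi(h(x_i), y_i)$, where $h(x_i)$ is a feature representation of input $x_i$, $\tilde{y}_i$ is the corrected (pseudo) label output by the meta-learner, $y_i$ denotes the original label and $\varphi$ denotes the meta-learner parameters.

    Meanwhile, we denote the learner (classifier) as $\hat{y}_i = f_\theta(x_i)$, where $\hat{y}_i$ is the predicted value and $\theta$ denotes the parameters of the learner. The training objective (the goal of the learner) is to minimize the loss on the training dataset $D$:

    $$\theta^{*}(\varphi) = \arg\min_{\theta} L_D(\theta, \varphi), \qquad L_D(\theta, \varphi) = \frac{1}{N} \sum_{i=1}^{N} L\big(f_\theta(x_i),\, g_\varphi(h(x_i), y_i)\big) \tag{1}$$

    For a given $\varphi$, we can obtain the optimal $\theta^{*}(\varphi)$ through Eq. (1). There is therefore a functional relationship between $\theta$ and $\varphi$, which we denote as $\theta = \theta^{*}(\varphi)$. The meta-training objective (the objective of the meta-learner) is then to minimize the loss on the validation dataset $D_v$:

    $$\min_{\varphi} L_v\big(\theta^{*}(\varphi)\big) = \frac{1}{M} \sum_{j=1}^{M} L\big(f_{\theta^{*}(\varphi)}(x_j^v),\, y_j^v\big) \tag{2}$$

    where $L$ is the per-sample loss and $L_v$ denotes the meta-training loss on the clean validation dataset.

    Bi-Level Optimization. The learner $\theta$ depends on the meta-learner $\varphi$, so the optimal $\theta^{*}$ must be recomputed whenever $\varphi$ is updated; this is known as a bi-level optimization procedure. Recently, Ren et al. [22] proposed a one-step stochastic gradient descent (SGD) method that approximates the optimal $\theta^{*}$ for each update of $\varphi$. Specifically, at the $t$-th iteration, the method updates $\theta$ as

    $$\theta'_t = \theta_t - \eta \nabla_\theta L_D(\theta_t, \varphi_t) \tag{3}$$

    where $\eta$ is the step size for $\theta$. Then it uses gradient descent to update $\varphi$ as

    $$\varphi_{t+1} = \varphi_t - \eta \nabla_\varphi L_v\big(\theta'_t(\varphi_t)\big) \tag{4}$$

    where $\eta$ is the step size for $\varphi$. Then it uses $\varphi_{t+1}$ to update $\theta$ as

    $$\theta_{t+1} = \theta_t - \eta \nabla_\theta L_D(\theta_t, \varphi_{t+1}) \tag{5}$$

    where $\theta_{t+1}$ is a better parameter than $\theta'_t$.

    Finally, the method uses Eqs. (3)–(5) to optimize $\theta$ and $\varphi$ until convergence.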
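
    The alternating updates described above can be illustrated on a deliberately tiny problem. The sketch below is not the paper's implementation: it assumes a scalar linear learner `theta * x`, a hypothetical meta-learner that corrects each noisy label by adding a single offset `phi`, squared-error losses, and a separate meta step size `eta_phi`. These choices make the gradient of the look-ahead learner with respect to `phi` available in closed form, so the three steps (look-ahead, meta update, learner update) are visible without an autodiff library.

```python
def mean(values):
    return sum(values) / len(values)


def train_grad_theta(theta, phi, xs, ys):
    """Gradient w.r.t. theta of the training loss mean((theta*x - (y + phi))**2)."""
    return mean([2.0 * x * (theta * x - (y + phi)) for x, y in zip(xs, ys)])


def val_grad_theta(theta, xs, ys):
    """Gradient w.r.t. theta of the clean validation loss mean((theta*x - y)**2)."""
    return mean([2.0 * x * (theta * x - y) for x, y in zip(xs, ys)])


def bilevel_step(theta, phi, train, val, eta=0.05, eta_phi=0.5):
    """One iteration of the one-step bi-level scheme on the toy problem."""
    xs, ys = train
    vx, vy = val
    # Look-ahead learner: one SGD step using labels corrected by the current phi.
    theta_look = theta - eta * train_grad_theta(theta, phi, xs, ys)
    # For this toy loss, d(theta_look)/d(phi) = -eta * d2L/(dtheta dphi) = 2*eta*mean(x).
    dlook_dphi = 2.0 * eta * mean(xs)
    # Meta update: chain rule through the look-ahead learner on clean validation data.
    phi_new = phi - eta_phi * val_grad_theta(theta_look, vx, vy) * dlook_dphi
    # Learner update from the original theta, now using the refreshed corrector.
    theta_new = theta - eta * train_grad_theta(theta, phi_new, xs, ys)
    return theta_new, phi_new


def run(steps=200):
    train = ([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])  # noisy labels: y = 2x + 1
    val = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])    # clean labels: y = 2x
    theta, phi = 0.0, 0.0
    for _ in range(steps):
        theta, phi = bilevel_step(theta, phi, train, val)
    return theta, phi
```

    In this toy setting the clean validation set drives `phi` toward the offset that undoes the systematic label error, and the learner then fits the corrected labels. In a real classifier the closed-form derivative would be replaced by automatic differentiation through the look-ahead step.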

    Analysis.The effectiveness of employing an uncontaminated validation dataset to steer model training in the presence of noisy labels is evident.The bi-level optimization approach is well-suited for implementing this strategy,enabling the framework to be trained seamlessly from start to finish.

    However, the description above also exposes two shortcomings of the existing unimodal meta label correction net. First, the framework can only handle unimodal data and is not suitable for multimodal application scenarios. Second, because of the inherent uncertainty and inconsistency introduced by noisy data, the predictions of a single meta-learner can fluctuate greatly during training with noisy labels, which further degrades the correctness of the corrected labels [23].

    3 MRML Implementation

    Fig.2 shows our novel Multimodal Robust Meta Learning(MRML)framework for multimodal sentiment analysis with noisy labels where we treat the whole procedure of label correction and network training as a three-stage optimization process,i.e.,Multimodal Data Fusion,Label Correction and Learner Training,Meta-Learner Optimization.The corresponding pseudo-code is provided in Algorithm 1.

    3.1 Notations

    Figure 2: Overview of the MRML architecture and computation flow. The model's operational flow is as follows: (1) Noisy training data input: the noisy training data are fed into the learner, which produces the logits and fused features. (2) Label correction: the fused feature is fed into the meta-learner, which generates corrected labels. (3) Training loss computation: the training loss is calculated from the logits and corrected labels to update the learner. (4) Validation loss computation: the updated learner receives clean validation data and calculates the validation loss. (5) Meta-learner parameter update: the gradient of the meta-learner's parameters is calculated through the validation loss to update the meta-learner

    3.2 Overview

    For clarity, we first briefly introduce the MRML architecture and the three-stage optimization process. Three models are involved in the framework: one learner and two meta-learners. The learner is defined as

    $$\hat{y}_i = f_\theta(x_i), \qquad \theta = \{\theta_c, \theta_f\}$$

    where $\theta$ denotes the parameters of the learner, in which $\theta_c$ and $\theta_f$ denote the parameters of the classification net and the two-layer fusion net, respectively. The two meta-learners are defined as

    $$\tilde{y}_i^{(k)} = g_{\varphi_k}(h(x_i), y_i), \qquad k \in \{1, 2\}$$

    where $h(x_i)$ is the fused feature of input $x_i$, $\varphi_k$ denotes the parameters of the $k$-th meta-learner and $\tilde{y}_i^{(k)}$ denotes the corrected label.

    The three-stage workflows of MRML are:

    Stage 1:Multimodal Data Fusion.The primary objective of this stage is to construct the input for Stage 2,facilitating label correction and learner training.For this purpose,we introduce a two-layer fusion network that individually represents text and image data,followed by the amalgamation of these features.

    Stage 2: Label Correction and Learner Training. In this stage, we propose a multiple meta-learner strategy to generate corrected labels from the fused feature $h(x_i)$. We then compute the training loss from the learner's logits $f_\theta(x_i)$ and the corrected label, and use it to update the learner from $\theta$ to $\theta'$.

    Stage 3: Meta-Learner Optimization. This stage uses a clean validation dataset $D_v$ for meta-learner optimization. Specifically, we feed the multimodal validation data to the updated learner $\theta'$ and compute the validation loss, then compute the gradient of the validation loss with respect to the meta-learner parameters to update the meta-learner.

    3.3 A Two-Layer Fusion Net

    As shown in the right part of Fig. 2, the two-layer fusion net $\theta_f$ is the main component of the learner $\theta$. It correlates the modalities of each sample and produces the input for Stage 2, so that label correction can draw on more information through the fused feature. Because the corrected label is generated from the fused feature, the quality of the feature extracted by the two-layer fusion net is crucial for label correction. First, we use BERT [24] and ResNet [25] to represent text and image data as follows:

    Text representation. We apply mean pooling to the hidden states of all tokens from BERT to represent the text data.

    Image representation. The image representation is based on the ResNet model. We use the final output vector of the ResNet after the global pooling layer. The output size of the last convolutional layer in ResNet is 14×14×$d_r$, where 14×14 denotes 196 block regions $I_{i,j}$ ($i, j = 1, 2, \ldots, 14$) in an image. Each regional feature representation can be defined as $V_{i,j} = \mathrm{ResNet}(I_{i,j})$. The extracted features of the block regions are arranged into an image block embedding sequence $b_1 = V_{1,1} W_r, \ldots, b_{196} = V_{14,14} W_r$, where each $V_{i,j}$ is a $d_r$-dimensional vector and $W_r$ projects it to match the embedding size of BERT; $d_r = 2048$ when working with ResNet-50.
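
    The shapes involved in the two representations above can be made concrete with a small sketch. It uses plain Python lists and toy dimensions instead of real BERT or ResNet outputs; the function names `mean_pool` and `regions_to_sequence`, and the example sizes, are illustrative assumptions and not part of the original implementation.

```python
def mean_pool(token_states):
    """Average the hidden states of all tokens into one text vector."""
    n_tokens = len(token_states)
    dim = len(token_states[0])
    return [sum(tok[d] for tok in token_states) / n_tokens for d in range(dim)]


def regions_to_sequence(feature_map, w_r):
    """Flatten an H x W x d_r feature map row by row and project each
    regional vector with w_r (shape d_r x d_out) to the text embedding size."""
    d_out = len(w_r[0])
    sequence = []
    for row in feature_map:
        for v in row:
            sequence.append(
                [sum(v[k] * w_r[k][o] for k in range(len(v))) for o in range(d_out)]
            )
    return sequence


# Toy sizes: a 14 x 14 grid with d_r = 4 projected to an embedding size of 3.
toy_map = [[[1.0, 0.0, 2.0, 1.0] for _ in range(14)] for _ in range(14)]
toy_w_r = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
blocks = regions_to_sequence(toy_map, toy_w_r)  # 196 block embeddings
```

    With ResNet-50 the regional vectors would have 2048 entries and the projection would target BERT's hidden size; only the bookkeeping is shown here.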

    where $n_r$ is the number of regions, which is 196 in this paper. Hence, each modality's representation feature can be defined as

    After representing the two modalities, we use two fusion strategies, namely feature fusion and modality early fusion, to combine the features.

    where $j, j' \in \{\text{image}, \text{text}\}$ denote modalities; the pairwise weight is the weight for modality $j$ under the guidance of modality $j'$; $w_j$ is the final reconstruction weight for modality $j$; $W_1, W_2$ are weight matrices and $b_1, b_2$ are biases. After feature fusion, the reweighted features are treated as the feature vectors of each modality and serve as inputs to the next layer.

    (2)Modality early fusion.Motivated by the work of[27],we perform modality early fusion instead of simply concatenating the different modalities’feature vectors.We implement two linear layers to calculate the attention weights for each modality feature.
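
    As a rough illustration of attention-weighted fusion with two linear layers, the sketch below scores each modality vector, normalizes the scores with a softmax, and returns the weighted sum. The exact formulation used by MRML is not recoverable from the text, so the `tanh` between the two layers, the softmax normalization, and all names (`attention_fuse`, `w1`, `b1`, `w2`, `b2`) are assumptions made for the example.

```python
import math


def linear(x, weight, bias):
    """Dense layer: weight is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, w1, b1, w2, b2):
    """features maps a modality name to its vector (all of the same size).
    Returns the per-modality attention weights and the fused vector."""
    names = sorted(features)
    scores = []
    for name in names:
        hidden = [math.tanh(v) for v in linear(features[name], w1, b1)]  # first layer
        scores.append(linear(hidden, w2, b2)[0])                         # second layer
    weights = softmax(scores)
    dim = len(features[names[0]])
    fused = [sum(w * features[n][d] for w, n in zip(weights, names)) for d in range(dim)]
    return dict(zip(names, weights)), fused


# Toy parameters: the scorer looks only at the first feature dimension.
example_weights, example_fused = attention_fuse(
    {"text": [1.0, 0.0], "image": [0.0, 1.0]},
    w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
    w2=[[1.0, 0.0]], b2=[0.0],
)
```

    Unlike plain concatenation, the fused vector keeps the dimensionality of a single modality, and the learned weights decide how much each modality contributes.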

    3.4 Multiple Meta-Learner Strategy

    Multi-network strategies and ensemble learning have proven effective for numerous deep learning problems [10,28,29]. Their main goal is to enhance the robustness of DNNs against noise. Hence, we add a second meta-learner to increase the quality of label correction, which can be defined as

    The multiple meta-learner strategy offers two benefits [30]. First, the second meta-learner enhances label correction, yielding more accurate labels; because the corrected labels no longer rely on a single model, the risk of overfitting is reduced. Second, the learner gains additional information from these improved labels. Conversely, a good learner generates a high-quality fused feature, which is crucial for the meta-learner to correct noisy labels. The meta-learner and the learner thus help each other learn with noisy labels. We demonstrate these two perspectives in the ablation study.

    3.5 Bi-Level Optimization

    As mentioned in Section 2,the bi-level optimization in MRML can be defined as

    where $L$ is the loss function for classification, i.e., cross-entropy, and $h(x)_{\text{fusion}}$ is the fused feature.

    One-step SGD method for bi-level optimization. Outside of meta label correction research, various other studies [31–33] have also addressed a similar bi-level problem. Instead of computing the optimal $\theta^{*}$ for each $\varphi$, a one-step SGD optimization method is employed to update $\theta$ and approximate the optimal learner for a given $\varphi$:

    where $\eta$ is the learning rate of the learner. Since the loss of the meta-learner can then be evaluated on this one-step-updated learner, the bi-level optimization problem with one-step SGD now becomes

    4 Experiments

    In this section,we describe the extensive experiments performed to evaluate the effectiveness of MRML and compare it with the baselines under different noisy types and ratios.

    4.1 Datasets and Noise Settings

    Datasets. Without loss of generality, we assess the performance of MRML on three widely used datasets for multimodal sentiment analysis, as detailed in Table 1. We briefly introduce them as follows:

    Table 1:The statistics of datasets used

    • Yelp-5, a dataset of online reviews scraped from Yelp.com in the food and restaurants category [34]. Altogether, the dataset comprises over 44,000 reviews paired with corresponding images. Each individual review is associated with a single image.

    • Twitter-15, a dataset of image-text reviews, where each multimodal sample contains text, a corresponding image, and an emotion target [35]. It contains 3179 training samples, 1122 testing samples and 1037 development samples.

    • Multi-ZOL, a dataset of online reviews about shopping, economy, society, people's livelihood, news, etc. [36]. The dataset encompasses 5288 multimodal reviews, each containing both textual content and a set of images.

    Noise settings.Following the related work[13],as shown in Fig.3,we corrupt the label of training data with two settings:

    • Symmetric noise: at noise ratio $p$, a clean sample's label is corrupted to each of the other labels with probability $p/(n-1)$ and is kept as the original label with probability $1-p$, where $n$ is the number of classes.

    • Asymmetric noise: at noise ratio $p$, a clean sample's label is corrupted to one of the other $n-1$ labels with probability $p$ and is kept as the original label with probability $1-p$, where $n$ is the number of classes.
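
    The two noise settings can be expressed as row-stochastic transition matrices, from which noisy labels are sampled. The sketch below is one possible construction, not the authors' code. For the asymmetric case it assumes that class `i` is flipped to class `(i + 1) % n`; the source only states that the label is flipped to one other class, so this pairing is a placeholder. The function names and the seed are also illustrative.

```python
import random


def symmetric_matrix(n, p):
    """Row i keeps label i with probability 1 - p and spreads p evenly
    over the other n - 1 labels."""
    off = p / (n - 1)
    return [[1.0 - p if i == j else off for j in range(n)] for i in range(n)]


def asymmetric_matrix(n, p):
    """Row i keeps label i with probability 1 - p and flips it to a single
    other label with probability p (assumed here to be the next class)."""
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0 - p
        matrix[i][(i + 1) % n] = p
    return matrix


def corrupt(labels, matrix, seed=0):
    """Sample one noisy label per clean label from the matching matrix row."""
    rng = random.Random(seed)
    classes = list(range(len(matrix)))
    return [rng.choices(classes, weights=matrix[y])[0] for y in labels]


# Six classes at a 50% noise ratio, the setting used as the example in Fig. 3.
sym = symmetric_matrix(6, 0.5)
asym = asymmetric_matrix(6, 0.5)
noisy = corrupt([0, 1, 2, 3, 4, 5] * 10, sym)
```

    In this construction every row sums to one, which is also why the per-class corruption probability in the symmetric case must be $p/(n-1)$ when the original label is kept with probability $1-p$.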

    Figure 3:Examples of the noise transition matrix for symmetric and asymmetric noise(taking 6 classes and noise ratio p=50% as an example)

    4.2 Baselines and Experiment Details

    Baselines. Since multimodal sentiment analysis with noisy labels has rarely been addressed by previous methods, we evaluate our method against the following baseline methods in multimodal sentiment analysis:

    • MIMN, the multi-interactive memory network, which incorporates a pair of interactive memory networks. These networks are designed to model both textual and visual information, guided by the provided aspect [36].

    • VistaNet, a framework that harnesses both textual and visual elements, using visual cues to align and highlight essential sentences within a document through attention mechanisms [34].

    • HFIR, a hybrid fusion method based on information relevance for multimodal sentiment analysis [27].

    • ITIN, an Image-Text Interaction Network that explores the intricate relationship between affective image regions and textual content for multimodal sentiment analysis [37].

    Data preparation.Since our method needs additional clean validation data,we follow related work[13,22]to randomly select 100 samples per class from the training dataset before adding noise as clean validation data.
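
    The hold-out step described above amounts to a stratified split performed before any noise is injected. The sketch below shows one way to do it with the standard library; the function name `split_clean_validation` and the fixed seed are assumptions for illustration.

```python
import random
from collections import defaultdict


def split_clean_validation(labels, per_class=100, seed=0):
    """Return (train_idx, val_idx), holding out `per_class` randomly chosen
    samples of every class as the clean validation set.
    random.sample raises ValueError if a class has fewer than `per_class` samples."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    val_idx = []
    for y in sorted(by_class):
        val_idx.extend(rng.sample(by_class[y], per_class))
    held_out = set(val_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    return train_idx, sorted(val_idx)


# Toy usage: 3 classes with 10 samples each, holding out 2 per class.
toy_labels = [i % 3 for i in range(30)]
train_idx, val_idx = split_clean_validation(toy_labels, per_class=2)
```

    Label noise would then be applied only to the samples indexed by `train_idx`, so the validation labels stay clean.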

    Model preparation. (1) For data representation, we use BERT (mean pooling over the hidden states of all tokens) and ResNet-50 (the final output vector after the global pooling layer) to represent text and image data, respectively. (2) For the two meta-learners, as shown in Fig. 2, we use the same 3-layer fully connected network with dimensions (768, 128), (128, 128) and (128, label_numbers), initialized with different parameters, for label correction. We apply the ReLU and Tanh activation functions to enhance the model's learning ability and use a classification layer to output the corrected label distribution. (3) For the classification net in the learner, we use a simple 4-layer fully connected network, as given in Table 2.
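
    To make the meta-learner dimensions concrete, the following dependency-free sketch builds two 3-layer fully connected networks of the stated sizes with different random initializations and runs a forward pass. The placement of ReLU after the first layer and Tanh after the second, the softmax output, the uniform initialization, and feeding only the 768-dimensional fused feature (without the noisy label) are assumptions; the text does not specify them.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Randomly initialized dense layer stored as (weight rows, bias)."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def linear(x, layer):
    weight, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_meta_learner(num_labels, seed, in_dim=768, hidden=128):
    """Layers of shape (768, 128), (128, 128), (128, num_labels)."""
    rng = random.Random(seed)
    return [
        make_layer(in_dim, hidden, rng),
        make_layer(hidden, hidden, rng),
        make_layer(hidden, num_labels, rng),
    ]


def correct_label(fused_feature, layers):
    """Forward pass returning a corrected label distribution."""
    h = [max(0.0, v) for v in linear(fused_feature, layers[0])]  # ReLU
    h = [math.tanh(v) for v in linear(h, layers[1])]             # Tanh
    return softmax(linear(h, layers[2]))                         # classification layer


# Two meta-learners with the same architecture but different initial parameters.
meta_a = build_meta_learner(num_labels=5, seed=1)
meta_b = build_meta_learner(num_labels=5, seed=2)
feature = [0.01 * (i % 7) for i in range(768)]
dist_a = correct_label(feature, meta_a)
dist_b = correct_label(feature, meta_b)
```

    Because the two networks start from different parameters, they produce different label distributions for the same fused feature, which is the source of the complementary information the strategy relies on.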

    Table 2:The classification net in learner

    Table 3:Test accuracy(%)of all baselines on Yelp-5 dataset under different noise ratios and types

    Table 4:Test accuracy(%)of all baselines on Twitter-15 dataset under different noise ratios and types

    Training details. (1) In early training epochs, the meta-learner corrects labels poorly and would produce many erroneous labels. We therefore start label correction only after an initial warm-up of 5 epochs. (2) In all experiments, we use the ADAM optimizer [38] to train our approach. We set a maximum of 100 epochs for each dataset and initialize the learning rate to 0.0001. For all methods, we report the test result obtained at the point where the best result is achieved on the development set. Our experiments were carried out using Python 3.8 and PyTorch 1.8 on an RTX 3090Ti GPU. The reported results are averaged over five separate runs.
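
    The warm-up rule and the model-selection rule above can be written as two small helpers. This is only a sketch of the bookkeeping: the zero-based epoch numbering, the function names, and the `(epoch, dev_acc, test_acc)` record layout are assumptions.

```python
def uses_corrected_labels(epoch, warmup_epochs=5):
    """Label correction is switched on only after the warm-up epochs
    (epochs are counted from 0 here)."""
    return epoch >= warmup_epochs


def select_by_dev(history):
    """history is a list of (epoch, dev_acc, test_acc) records.
    Report the test accuracy at the epoch with the best development accuracy;
    ties are resolved in favor of the earliest epoch."""
    epoch, dev_acc, test_acc = max(history, key=lambda record: record[1])
    return {"epoch": epoch, "dev_acc": dev_acc, "test_acc": test_acc}


# Toy usage with made-up numbers, purely to show the selection logic.
toy_history = [(0, 0.50, 0.40), (1, 0.70, 0.60), (2, 0.60, 0.65)]
chosen = select_by_dev(toy_history)
```

    Selecting on the development set rather than on test accuracy means that, in the toy history, epoch 1 is reported even though epoch 2 has a higher test score.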

    4.3 Comparison with the Baselines

    We perform multimodal sentiment analysis across three distinct datasets to assess both MRML and the baseline methods.The accuracy results of our experiments are presented in Tables 3–5 for the respective datasets.Our method MRML achieves the best performance on all test cases.For example,MRML outperforms HFIR by up to 24.1%,31.4% and 23.9% on Yelp-5,Twitter-15 and Multi-ZOL datasets,respectively.It shows that our MRML is more robust to noisy labels and could provide guidance for future multimodal sentiment analysis with noisy labels.

    One trend common to the three tables is that the performance of all baselines degrades as the noise ratio goes up, which confirms that noisy labels strongly affect existing multimodal sentiment analysis methods. In contrast, our method is much less affected. MRML achieves 30.8% on the Multi-ZOL dataset under 80% symmetric noise, which is significantly higher than VistaNet (8.3%), MIMN (19.6%), HFIR (6.9%) and ITIN (21.46%). In particular, VistaNet degrades even faster (from 45.5% to 6.9% between 20% and 80% symmetric noise), because it has no dedicated mechanism for dealing with noisy labels. On the other hand, MIMN and ITIN show a certain degree of noise tolerance. For example, on the Multi-ZOL dataset with 80% symmetric noise, MIMN achieves 19.6%, clearly higher than the 8.3% of VistaNet and the 6.9% of HFIR. Similarly, ITIN outperforms VistaNet, MIMN and HFIR by up to 12.6%, 1.5% and 10.7%, respectively, on the Twitter-15 dataset with 80% symmetric noise. A likely reason is that, like MRML, they use a multiple-model strategy (MIMN uses two memory networks for text and image data, and ITIN uses an image-text interaction network), which supports the value of our multiple meta-learner strategy.

    Observing the data presented in Table 5,it is evident that the performance of all methods is comparatively lower on the Multi-ZOL dataset in comparison to the other two datasets,particularly in instances of elevated noise ratios.This phenomenon highlights the influence of class count on the ability to counteract interference caused by noisy labels.Notably,the robust fitting capabilities of DNNs can lead to a higher susceptibility to overfitting in more challenging tasks,particularly those involving a larger number of classes and the presence of noisy labels.

    Table 5:Test accuracy(%)of all baselines on Multi-ZOL dataset under different noise ratios and types

    Table 6:Test accuracy(%)of ablation study on Yelp-5 dataset under different noise ratios and types

    Table 7:Test accuracy (%) of ablation study on Multi-ZOL dataset under different noise ratios and types

    4.4 Ablation Study

    MRML introduces two main components: the two-layer fusion net and a second meta-learner. It is therefore necessary to conduct further experiments for an in-depth analysis of the contribution of each component.

    (1) Two-Layer Fusion Net. We implement MRML with a single modality, with multiple modalities, and with a concatenation fusion strategy.

    • Text. Text vectors obtained by mean pooling over the hidden states of all BERT tokens are the inputs of the classification net and meta-learner.

    • Image. Image vectors taken after the pooling layer of ResNet are the inputs of the classification net and meta-learner.

    • Concat. Previous research concatenates multimodal feature vectors. We implement this concatenation strategy to fuse multimodal data [39].

    (2) Multiple Meta-Learner Strategy. We conduct experiments using a single meta-learner for label correction, with everything else unchanged.

    Tables 6 and 7 show the results in terms of classification accuracy on Yelp-5 and Multi-ZOL datasets.In general,we can see that both components provide an improvement over other methods.Moreover,the collaborative integration of the two components within MRML results in a more effective synergy,leading to enhanced classification accuracy through their combined efforts.The most significant improvements are gained on Multi-ZOL under 20%-symmetric noise with up to 13.5% increase in accuracy.

    In addition, the feature based only on the image modality does not perform well, while text performs much better, demonstrating the important role of the text modality. Compared with the concatenation fusion strategy, our proposed two-layer fusion net further improves classification performance, revealing that it leverages the features of the two modalities more effectively.

    Fig. 4 shows the results in terms of label correction accuracy on the Yelp-5 dataset. Similar to the classification results above, the two meta-learners with the fused feature generated by our two-layer fusion net achieve the best label correction performance, indicating that high-quality multimodal features and a second meta-learner are both beneficial for label correction. Based on this insight, it is reasonable to anticipate that introducing a third network could lead to additional performance gains. However, because bi-level optimization is computationally expensive, we only consider adding more models when sufficient computation resources are available.

    Figure 4: Label correction accuracy on the Yelp-5 dataset

    5 Related Work

    In this section,we describe the related works about unimodal learning with noisy labels methods and multimodal sentiment analysis methods.

    5.1 Learning with Noisy Labels

    So far, few methods have been proposed for effectively conducting multimodal sentiment analysis with noisy labels. However, many unimodal methods for learning with noisy labels have been proposed, which can be divided into three categories.

    Sample selection.Sample selection methods focus on using a data selection method to identify and discard noisy samples before training the model.Confident learning[11]calculated the confidence value of data and discarded the noisy data from the training dataset.Co-teaching[10]simultaneously trained two networks,and each model chooses the data with less loss to each other.Elkan[40]estimated the noisy data through positive-unlabeled learning.SELF[41]proposed a noisy data filtering method through model ensemble learning which utilizes the model’s predictions in different epochs to remove the noisy samples.AUM[9]identified the noisy data by measuring the mean difference between the logits of the sample’s assigned class.These methods have a common shortcoming in that a large amount of data would be discarded which reduces the robustness of the model when the noise ratio is high.

    Sample reweighting.Many existing methods aim to reweight the noisy data.Ren et al.[22]used a meta-reweighting method to assign small weights to the noisy data which could reduce the model’s negative impact.Wang et al.[42] reweighted the model’s noisy data through a weighting scheme.Shu et al.[43] also used a meta-reweight framework with a clean validation dataset and learned a loss-weighting function.All of these methods need a clean validation dataset to reweight noisy data.Xue et al.[44] estimated the noisy probability of data by using a probabilistic local outlier factor.Jiang et al.[12]proposed a model named MentorNet which leverages lesson plans by learning samples that are likely to be correct and dynamically learns data-driven lessons through StudentNet.Harutyunyan et al.[45] reduced the memorization of noisy labels through the mutual information between weights and updated the weights of data based on the gradients of the last layers.These sample reweighting methods always assign small weights to noisy data which would cause a waste of data information and degenerate the robustness of the model.

    Sample relabeling.The sample relabeling methods aim to correct the noisy labels which could leverage all the training data.Mixup [46] corrects the noisy labels by using data augmentation techniques.Hendrycks et al.[13]estimated the label corruption matrix,and then trained the network leveraging this corruption matrix.Mixmatch [47] used data augmentation and a single model’s prediction to relabel data.DivideMix[48]first identified the noisy training data through the Mixture of Gaussians.Then it utilizes two networks based on the co-teaching mechanism to correct noisy labels.Finally,it used the Mixmatch strategy[47]to train the two networks.Recently,many methods based on meta-learning[16,21,32,49,50]have been proposed.They adopt the meta-process as label correction,which aims to generate corrected labels for noisy data.All these methods use a clean validation dataset to guide the network training with noisy labels.

    5.2 Multimodal Sentiment Analysis

    Given the widespread use of diverse user-generated content,such as text,images,and speech,sentiment analysis has expanded beyond just text-based analysis.The field of multimodal sentiment analysis is dynamic,involving the automated extraction of people’s sentiments from various forms of communication channels.

    Multimodal data often comprises both text and image information,which can synergistically enhance and complement each other.Early research primarily focused on feature-based approaches.For instance,Borth et al.[51]introduced textual features derived from English grammar,spelling,and style scores,alongside visual features obtained through the extraction of adjective-noun pairs from images.More recently,the advancement of deep learning has led to the emergence of numerous neural network-based techniques for multimodal sentiment analysis.An example is the work by Yu et al.[52],who pre-trained separate models for text and images to capture their respective feature representations.These features were subsequently combined and used to train a logistic regression model.Some works [53,54] concatenated features from different modalities and fed them into the model.Other works applied late-fusion methods that combine the predicted values of the individual unimodal models through a learning model [55,56] or an ensemble strategy such as a voting scheme [57–59].Salur et al.[60]proposed a soft voting-based ensemble model that takes advantage of the effective performance of different classifiers on different modalities.However,these methods ignore the connection between modalities.To address this limitation,numerous researchers have employed LSTM cells and gating mechanisms to capture interaction dynamics within multimodal data[61–64].Han et al.[65]employed a gated control mechanism within the Transformer architecture to further enhance the final output.Zadeh et al.[66] introduced a multi-view gated memory unit to capture and forecast cross-modality interactions.Zhu et al.[37] presented a novel Image-Text Interaction Network (ITIN) for exploring the intricate connection between emotional image regions and textual content.While these techniques significantly enhance performance,their intricate architectures and substantial computational demands impede model interpretability.To address these limitations,our paper introduces an innovative fusion approach based on lightweight attention mechanisms.
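To make the distinction between the two families of fusion discussed above concrete, the following minimal sketch contrasts feature-level fusion by concatenation with decision-level (late) fusion by soft voting. It is an illustration only and does not reproduce the method of any cited work: the function names `early_fusion`, `soft_vote` and `predict`, the three-class setting, and the example probabilities are all assumptions introduced here.

```python
from typing import List, Optional, Sequence


def early_fusion(features: Sequence[Sequence[float]]) -> List[float]:
    """Feature-level fusion: concatenate the per-modality feature vectors."""
    fused: List[float] = []
    for vector in features:
        fused.extend(vector)
    return fused


def soft_vote(
    probabilities: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Late fusion: weighted average of per-modality class probabilities."""
    if not probabilities:
        raise ValueError("at least one modality is required")
    n_classes = len(probabilities[0])
    if any(len(p) != n_classes for p in probabilities):
        raise ValueError("all modalities must predict the same classes")
    if weights is None:
        # Hypothetical default: every unimodal classifier counts equally.
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("one weight per modality is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


def predict(
    probabilities: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = soft_vote(probabilities, weights)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Made-up outputs of three unimodal classifiers over
    # (negative, neutral, positive); these are not results from the paper.
    text_probs = [0.2, 0.3, 0.5]
    image_probs = [0.6, 0.3, 0.1]
    audio_probs = [0.1, 0.2, 0.7]
    print(soft_vote([text_probs, image_probs, audio_probs]))
    print(predict([text_probs, image_probs, audio_probs]))
```

In both functions the modalities are combined without modeling how they influence one another, which is the limitation that the interaction-based methods discussed above try to overcome.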

    6 Conclusion

    This paper offers a concise examination of the challenge of multimodal sentiment analysis with noisy labels.Recent advancements in unimodal meta label correction have shown promise in mitigating the impact of noisy labels.Building upon this foundation,we introduce a novel framework named Multimodal Robust Meta Learning(MRML)for multimodal sentiment analysis.This framework aims to counteract the influence of noisy labels in multimodal scenarios and simultaneously establish correlations across distinct modalities.Our MRML framework encompasses a three-stage optimization process.

    In the first stage,we propose a two-layer fusion network to merge multimodal features.The second stage applies a multiple meta-learner strategy that generates corrected labels and trains the learner on them.In the final stage,we leverage a clean validation dataset to fine-tune the meta-learner.Through comprehensive experiments on three widely used datasets,we validate the efficacy of MRML.In future work,we plan to enhance the MRML framework and extend its application to other domains.

    Acknowledgement:Thanks to the three anonymous reviewers and the journal editors for their invaluable contributions to the improvement of the logical organization and content quality of this paper.

    Funding Statement:This research was partially supported by STI 2030-Major Projects 2021ZD0200400,National Natural Science Foundation of China(62276233 and 62072405)and Key Research Project of Zhejiang Province(2023C01048).

    Author Contributions:The authors confirm contribution to the paper as follows:study conception and design:Kai Jiang,Bin Cao,Jing Fan;data collection:Kai Jiang;analysis and interpretation of results:Kai Jiang,Bin Cao,Jing Fan;draft manuscript preparation:Kai Jiang,Bin Cao.All authors reviewed the results and approved the final version of the manuscript.

    Availability of Data and Materials:The data used in this article are freely available in the mentioned references.

    Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
