
    Arabic Named Entity Recognition: A BERT-BGRU Approach

    Computers, Materials & Continua, 2021, Issue 7

    Norah Alsaaran and Maha Alrabiah

    Department of Computer Science, Imam Muhammad Ibn Saud Islamic University, Riyadh, Saudi Arabia

    Abstract: Named Entity Recognition (NER) is one of the fundamental tasks in Natural Language Processing (NLP); it aims to locate, extract, and classify named entities into predefined categories such as person, organization, and location. Most earlier research on identifying named entities relied on hand-crafted features and very large knowledge resources, which is time consuming and not adequate for resource-scarce languages such as Arabic. Recently, deep learning has achieved state-of-the-art performance on many NLP tasks, including NER, without requiring hand-crafted features. In addition, transfer learning has proven its efficiency in several NLP tasks by exploiting pre-trained language models that transfer knowledge learned from large-scale datasets to domain-specific tasks. Bidirectional Encoder Representations from Transformers (BERT) is a contextual language model that generates semantic vectors dynamically according to the context of the words. The BERT architecture relies on multi-head attention, which allows it to capture global dependencies between words. In this paper, we propose a deep learning-based model that fine-tunes BERT to recognize and classify Arabic named entities. The pre-trained BERT context embeddings were used as input features to a Bidirectional Gated Recurrent Unit (BGRU) and were fine-tuned using two annotated Arabic Named Entity Recognition (ANER) datasets. Experimental results demonstrate that the proposed model outperforms state-of-the-art ANER models, achieving F-measure values of 92.28% and 90.68% on the ANERCorp dataset and the merged ANERCorp and AQMAR dataset, respectively.

    Keywords: Named entity recognition; Arabic; deep learning; BGRU; BERT

    1 Introduction

    Textual information represents a wide share of digital content, and it continues to grow rapidly, which calls for linguistic and deep semantic analysis techniques to achieve a better and faster understanding of this information. NER is one of the techniques used to identify and classify each word in a given text into predefined semantic categories such as person name, location name, organization name, and miscellaneous. NER plays an essential role in several NLP tasks such as information retrieval, question answering, machine translation, and text summarization. However, NER is considered a challenging task with several difficulties. For example, the presence of homonyms, synonyms, and acronyms causes text ambiguity, which is one of the main challenges facing NER. Additionally, a word may appear as a Named Entity (NE) in one context and as a common noun in another, and a word may also refer to more than one entity type. Moreover, names make up a large part of language and are constantly evolving in different domains; since NER is a domain-dependent task, this requires updating annotated corpora, which is costly and requires linguistic and domain expertise [1]. Furthermore, due to the international dominance of the English language in fields such as communications, science, information technology, and business, most NER research has focused on English, which has caused a lack of variety in text genres for other languages such as Arabic.

    There is a considerable amount of work on NER; different techniques relying on rule-based, machine learning-based, and hybrid approaches have been introduced. Although these approaches have shown significant and successful results, they often require a lot of human effort in designing rules and/or features. Recently, deep learning has become a trending approach attracting significant attention due to its state-of-the-art performance in various NLP tasks, including NER. The main strength of deep learning is its reliance on non-linear transformations for learning complicated features from data, in addition to reducing the time and effort required to design NER features by learning useful representations from raw data automatically [2]. A Recurrent Neural Network (RNN) is a neural network architecture that can handle sequential data through feedback loops that memorize the outputs of previous time steps and use them when processing the current time step's input. RNNs are often used in NLP tasks that require sequential processing of text, such as translation, where understanding the next word depends on the context of the preceding text [3,4]. Bidirectional RNNs, such as the BGRU, can learn long-range dependencies in the input, which makes them superior in many NLP tasks.

    NLP tasks, especially for low-resource languages such as Arabic, suffer from insufficient training data, which causes poor model generalization. Transfer learning helps solve this issue by transferring knowledge across domains or tasks: it aims to extract knowledge from one or more source tasks and apply it to a target task, improving learning performance and avoiding expensive data-labeling efforts. Fine-tuning is one transfer learning strategy, in which the downstream NLP task model is initialized with the pre-trained language model's parameters and then all parameters are fine-tuned using the downstream task's labelled data. This requires less time to train the downstream model because a large part of the training was already done by the pre-trained language model [5]. Word2vec [6], Glove [7], and Fasttext [8] are examples of pre-trained language models, called word embeddings, that can be used in transfer learning. Although these word embeddings have achieved great success in many NLP tasks, they are non-contextual language models: they capture only static meanings of words and represent each word by a single vector, which makes polysemous words a challenge [9]. Contextual language models, on the other hand, have the ability to understand the full context of each word in an input sequence and build more than one vector for each word depending on its position in the text; such representations have achieved state-of-the-art performance in many complex NLP tasks. BERT is an example of a contextual language model that can be fine-tuned; it is based on a multilayer bidirectional Transformer encoder that takes both left and right contexts into account [10].
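    To make the fine-tuning strategy concrete, the following minimal PyTorch sketch (illustrative only; the encoder, head, and tag count are placeholders rather than the paper's implementation) shows the key difference from pure feature extraction: the optimizer is built over all parameters, so the pre-trained weights are updated together with the task-specific head.

```python
import torch
import torch.nn as nn

# A pre-trained encoder (weights assumed to be loaded elsewhere) plus a newly
# initialized task head for a token-level task.
encoder = nn.GRU(input_size=768, hidden_size=384, batch_first=True, bidirectional=True)
head = nn.Linear(2 * 384, 9)  # e.g., 9 IOB tags as in ANERCorp

# Fine-tuning: the optimizer updates *all* parameters, i.e., both the pre-trained
# encoder weights and the randomly initialized head, on the labelled task data.
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)

# (A pure feature-extraction setup would instead freeze the encoder by setting
#  p.requires_grad = False for every encoder parameter and train only the head.)
```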

    Although the last decade has witnessed significant progress in ANER, developing a robust ANER system remains more challenging than for other languages. One reason is that Arabic has no capitalization of proper nouns, a feature that helps identify names in some languages. In addition, Arabic has a complex morphological system that allows conjunctions, prepositions, and pronouns to be attached to words as prefixes and suffixes, as in the word waletasa'ado, which means "and so that you can become happy." Common Arabic writing also omits the short vowels (diacritics) that are essential for identifying a word's correct form and meaning; for example, the same undiacritized form كتب can be read as kataba meaning "wrote", kutub meaning "books", or kutiba meaning "was written." Moreover, certain Arabic letters are written in more than one way, especially in transliterated words such as "gram", which appears with more than one spelling. Additionally, similarity or inherent disagreement in writing some characters may cause orthographic confusion; a common example is the hamza, which is often written as the plain alif (ا) instead of أ or إ. Furthermore, the lack of computational linguistic resources for ANER, as mentioned earlier, poses another challenge [11].

    In this work, we fine-tune the pre-trained BERT language model for Arabic NER using a BGRU-based deep learning model. The pre-trained BERT parameters were trained extensively on large unannotated data to encode information about the language. The vector representations produced by the pre-trained BERT model are fine-tuned on the ANER task-specific dataset rather than being used directly, and a BGRU layer is added on top of the pre-trained BERT model for further training to improve the final context vectors. Fine-tuning helps train the model in less time and with fewer epochs, because a large part of the training was already performed by the pre-trained BERT model. In addition, the model can reach good performance with a small training dataset because the ANER model weights are initialized with the pre-trained BERT weights.

    The rest of this paper is organized as follows. Section 2 provides an overview of the underlying research field. Section 3 presents the proposed BERT-BGRU architecture for ANER in detail. Section 4 describes the used datasets, the experimental settings, the baselines, and the results. Section 5 provides a discussion of the obtained results. Finally, Section 6 presents the conclusions and future work.

    2 Related Work

    There is a fair amount of literature on NER covering a wide range of approaches, including rule-based, machine learning-based, deep learning-based, and hybrid approaches. Among these, deep learning has achieved state-of-the-art results using various models, with the Bidirectional Long Short-Term Memory with Conditional Random Fields (BLSTM-CRF) model being the most dominant [12–17]. Different variations of the model were introduced to improve results, such as adding a part-of-speech attention mechanism to utilize part-of-speech information [12], adding a multi-head attention layer to capture the meaning of the words and their syntactic features [13], and using additional word features such as capital letters [14]. Other deep learning models were also applied, including Convolutional Neural Networks (CNNs). Wang et al. [18] used a hierarchical CNN with a gating mechanism in the convolutional layer to extract context information. A bidirectional gated CNN was used in [19] for Chinese NER, and Kosasih et al. [20] used a BGRU-CRF model for Indonesian NER.

    It can be noticed that the success of an NER model depends heavily on its input representation. Word and character embeddings are commonly used, where word-level representations encode the syntactic and semantic features of words and character-level representations help in dealing with the Out-Of-Vocabulary (OOV) problem. Zirikly et al. [21] demonstrated that an embedding representation scheme can replace the use of dictionaries and gazetteers and still yield high performance even when the training dataset is small. For word embeddings, several models have been used for the input representation, including word2vec [22–24], Glove [25–27], Fasttext [28–30], ELMo [31–33], and BERT [34–36], whereas for character embeddings, BLSTM and CNN are the most commonly used models [37–39]. BERT has produced more effective embeddings than the other models in terms of both word and character representation [34,40]. Different layers have been added on top of BERT to fine-tune it for the NER task; a comparative study demonstrated that fine-tuning BERT with a BLSTM-CRF layer outperformed a linear CRF layer [41], and another attempt showed that a BGRU-CRF layer yielded better results still [42]. Yan et al. [43] added a multi-head attention layer to the BGRU-CRF layer. Straka et al. [44] combined the representations of Fasttext, BERT, and Flair for Czech NER, which outperformed the BERT-only version of their model. Li et al. [41] applied dictionary features and radical features to the fine-tuned BERT model to improve its performance for Chinese clinical NER.

    Regarding ANER, BLSTM-CRF has also been a dominant model in the literature. Gridach [45] used a BLSTM-CRF model with BLSTM-based character-level embeddings and pre-trained word2vec word embeddings to extract named entities from Arabic social media. Khalifa et al. [46] presented the first ANER model that utilized a CNN for constructing character embeddings together with BLSTM-CRF and word2vec word embeddings. El Bazi et al. [47] used the BLSTM-CRF model with pre-trained Fasttext word embeddings and character-based representations constructed by a CNN. Gridach et al. [48] used character-level representations constructed by a BGRU model and pre-trained word2vec word representations to improve their system's performance, with CRF used as the tag decoder layer instead of decoding each tag independently. To overcome the problem of insufficient training data, Helwe et al. [49] used deep co-learning, a semi-supervised learning approach that can be trained on both labeled and unlabeled data; their model used two classifiers that learn from each other using two different views of the data. Ali et al. [50] added an attention embedding layer that combined the representation of a pre-trained word2vec model and character embeddings provided by a CNN in order to create a good word representation; BLSTM and BGRU encoders were compared, with BLSTM outperforming BGRU. The authors later improved the model by adding a self-attention layer on top of the encoder in order to give words higher or lower weight based on their contribution to the sentence meaning [51]. Ali et al. [52] used BLSTM as an encoder-decoder model for ANER; their model outperformed the BGRU-CRF model of [48] and the BLSTM-CRF model of [53].

    Fine-tuning-based transfer learning has recently been investigated for ANER. The authors of [54] utilized transfer learning with deep neural networks to build a Pooled-GRU model combined with the Multilingual Universal Sentence Encoder (MUSE) language model; their model outperformed the BLSTM-CRF model proposed by [55]. AraBERT [56] is a BERT model trained specifically for Arabic, and it was tested on the ANER task, outperforming both the BLSTM-CRF model proposed by [55] and multilingual BERT.

    3 Proposed Model Architecture

    Our proposed model consists of three main layers. The first layer produces the input representation using the pre-trained BERT language model. The second layer learns a context representation from the previous layer's output using a BGRU context encoder. Finally, the tag prediction layer takes the context representation as input and produces a sequence of tags corresponding to the input sequence. Fig. 1 shows the model architecture.

    Figure 1: BERT-BGRU model architecture
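    The following is a minimal PyTorch sketch of the three layers in Fig. 1 (BERT input representation, BGRU context encoder, and dense + Softmax tag prediction). It is an illustrative approximation rather than the authors' exact implementation; the Hugging Face checkpoint name and the hyper-parameter choices are assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BertBGRU(nn.Module):
    def __init__(self, bert_name: str = "aubmindlab/bert-base-arabertv01", num_tags: int = 9):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)        # input representation layer
        hidden = self.bert.config.hidden_size                   # 768 for BERT-Base
        self.bgru = nn.GRU(hidden, hidden // 2, batch_first=True,
                           bidirectional=True)                  # context encoder layer
        self.dropout = nn.Dropout(0.2)
        self.classifier = nn.Linear(hidden, num_tags)           # tag prediction layer

    def forward(self, input_ids, attention_mask):
        bert_out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        context, _ = self.bgru(self.dropout(bert_out.last_hidden_state))
        logits = self.classifier(context)                       # (batch, seq_len, num_tags)
        return torch.log_softmax(logits, dim=-1)                # per-token tag log-probabilities
```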

    3.1 Input Representation Layer: BERT Contextual Language Model

    BERT is a contextual language model developed by Devlin et al. [10] in 2019; it is based on a multilayer bidirectional Transformer encoder that takes both left and right contexts into account. BERT is trained using two unsupervised tasks: masked language modeling and next sentence prediction. In masked language modeling, 15% of the tokens in the input sequence are randomly masked and the model learns to predict them from the surrounding multi-layered context; the final hidden vectors corresponding to the masked tokens are fed into an output Softmax over the vocabulary. Next sentence prediction aims to understand the relationship between two sentences. For every input sequence, the first token is a special classification token [CLS], and the final hidden state corresponding to this token is used as the sequence representation. BERT uses the WordPiece tokenizer by Wu et al. [57], which breaks down any word not found in the vocabulary into sub-words to overcome the OOV problem. The final input embedding of BERT is the sum of the token embedding, the segment embedding, and the position embedding: the token embedding is the embedding of the current word piece, the segment embedding is the index embedding of the sentence in which the current word is located, and the position embedding is the index embedding of the current word's position. The BERT model produces two outputs: the [CLS] sentence embedding vector and word-level embeddings that take the context into account.
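    As an illustration of the sum of token, segment, and position embeddings described above, the following sketch builds the three embedding tables explicitly. The dimensions match BERT-Base, the token ids are arbitrary examples, and this is not the library's internal implementation.

```python
import torch
import torch.nn as nn

vocab_size, max_len, hidden = 30000, 512, 768
tok_emb = nn.Embedding(vocab_size, hidden)   # token (WordPiece) embedding
seg_emb = nn.Embedding(2, hidden)            # segment embedding (sentence A / B)
pos_emb = nn.Embedding(max_len, hidden)      # position embedding

token_ids = torch.tensor([[101, 2023, 2003, 102]])       # [CLS] ... [SEP] (illustrative ids)
segment_ids = torch.zeros_like(token_ids)                # a single-sentence input
positions = torch.arange(token_ids.size(1)).unsqueeze(0)

input_embeddings = tok_emb(token_ids) + seg_emb(segment_ids) + pos_emb(positions)
print(input_embeddings.shape)  # torch.Size([1, 4, 768])
```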

    Since BERT is well suited to languages with very rich morphological systems, such as Arabic, we use the AraBERTv0.1 [56] pre-trained model as the input representation for the context encoder layer in our proposed ANER system. AraBERTv0.1 is trained using the BERT-Base configuration, which has 12 attention heads, 12 layers, 768 hidden units per layer, and a total of 110M parameters. The corpora used for training the model are manually scraped Arabic news articles and two publicly available large Arabic corpora: the 1.5 billion words Arabic Corpus and the Open Source International Arabic News Corpus (OSIAN). The final dataset size is 70 million sentences, corresponding to 24 GB of text. The dataset covers news from different media in different Arab regions and can therefore be considered representative of a wide range of topics discussed in the Arab world. In addition, words that include Latin characters were preserved, since it is common to mention named entities and scientific and technical terms in their original language, and removing them would cause information loss. The final vocabulary size is 64k tokens.
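    A pre-trained AraBERT checkpoint of this kind can be loaded, for example, through the Hugging Face transformers library as sketched below; the model identifier shown is an assumption and should be replaced with the checkpoint actually used.

```python
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "aubmindlab/bert-base-arabertv01"   # assumed identifier for AraBERTv0.1
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
bert = AutoModel.from_pretrained(MODEL_NAME)

# WordPiece tokenization adds [CLS] and [SEP] and splits OOV words into sub-tokens.
encoded = tokenizer("النص العربي هنا", return_tensors="pt")
outputs = bert(**encoded)
print(outputs.last_hidden_state.shape)   # (1, seq_len, 768) contextual token vectors
```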

    3.2 Context Encoder Layer: BGRU Model

    The encoder captures the context and order of the words in a sentence and encodes them into a fixed-length representation. RNNs are commonly used as context encoders for NER; an RNN processes the timesteps of its input sequence in order, so shuffling or reversing the timesteps can completely change the representations it extracts. However, RNNs become less accurate with long sequences because it is difficult for the network to retain information from distant previous time steps [3]; this is known as the vanishing gradient problem. GRU is a variation of RNN that mitigates the vanishing gradient problem and can learn long-range dependencies. It uses update and reset gates to control and update the cell state. The update gate controls what information from the previous time step should be carried forward to the current time step, while the reset gate controls what information from the previous time step should be forgotten and what is relevant to the current input [3,4]. At time step t, the outputs of the reset gate r_t and the update gate u_t are computed using their own weight matrices W_r, U_r, W_u, and U_u, followed by a sigmoid activation function σ that acts as a binary-like mask: information with values near zero is blocked and information with values near one is allowed to pass. The output of the reset gate is used to compute the candidate memory content h̃_t through a Hadamard product (element-wise multiplication) of r_t with the previous hidden state h_{t-1}. Finally, the output of the update gate u_t determines how much of the candidate memory content is used as the new hidden state h_t that is passed to the next unit [58]. The hidden state h_t at time step t is computed as follows:

    $r_t = \sigma(W_r x_t + U_r h_{t-1})$
    $u_t = \sigma(W_u x_t + U_u h_{t-1})$
    $\tilde{h}_t = \tanh(W x_t + U(r_t \odot h_{t-1}))$
    $h_t = u_t \odot \tilde{h}_t + (1 - u_t) \odot h_{t-1}$
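    The recurrence above can be written directly in code. The following sketch implements one GRU time step from the equations (bias terms omitted for brevity); in practice torch.nn.GRUCell provides the same recurrence, up to minor differences in gate naming and bias handling.

```python
import torch

def gru_step(x_t, h_prev, W_r, U_r, W_u, U_u, W, U):
    """One GRU time step following the equations above (biases omitted)."""
    r_t = torch.sigmoid(x_t @ W_r + h_prev @ U_r)            # reset gate
    u_t = torch.sigmoid(x_t @ W_u + h_prev @ U_u)            # update gate
    h_tilde = torch.tanh(x_t @ W + (r_t * h_prev) @ U)       # candidate memory content
    return u_t * h_tilde + (1.0 - u_t) * h_prev              # new hidden state h_t

# Tiny usage example with random weights (input size 4, hidden size 3).
x, h = torch.randn(1, 4), torch.zeros(1, 3)
W_r, W_u, W = (torch.randn(4, 3) for _ in range(3))
U_r, U_u, U = (torch.randn(3, 3) for _ in range(3))
h_next = gru_step(x, h, W_r, U_r, W_u, U_u, W, U)            # shape (1, 3)
```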

    GRU has a more simplified architecture than LSTM, which is another variation of the standard RNN. The main difference is that GRU has fewer parameters than LSTM, which makes it faster to train; in addition, LSTM tends to show better results on large datasets, while GRU performs better on small datasets [3]. The final hidden state of a unidirectional GRU reflects more information about the end of the input sequence than about its beginning. In the NER task, it is important to capture both prior and posterior information. Consider the two sentences "Ford was born in Detroit" and "Ford Motors company": the word "Ford" in the first sentence should be tagged as a person named entity, while in the second it should be tagged as an organization named entity. Since "Ford" appears at the beginning of both sentences, a unidirectional encoder would have to make its prediction before seeing any of the following words, which can cause incorrect predictions. BGRU, the bidirectional variant of GRU, solves this problem by parsing the input sequence from both directions; in fact, it has outperformed the regular GRU in many NLP tasks [59]. In a BGRU, the forward GRU reads the source sequence one word at a time from left to right and the backward GRU reads it from right to left. At time step t, the forward hidden state covers the input sequence in the forward direction and the backward hidden state covers it in the backward direction, and the final hidden state h_t is their concatenation, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$ [58].
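    In PyTorch, the bidirectional behaviour described above is obtained by setting bidirectional=True, and the forward and backward hidden states are concatenated automatically, as the following short example (with dimensions chosen to match BERT-Base outputs) illustrates.

```python
import torch
import torch.nn as nn

# Forward and backward hidden states are concatenated, so each output vector
# is 2 * hidden_size wide.
bgru = nn.GRU(input_size=768, hidden_size=384, batch_first=True, bidirectional=True)
tokens = torch.randn(1, 10, 768)          # (batch, seq_len, bert_hidden)
context, _ = bgru(tokens)
print(context.shape)                      # torch.Size([1, 10, 768]) = [..., 2 * 384]
```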


    3.3 Tag Prediction Layer: Dense Layer with Softmax Function

    The prediction layer produces a sequence of tags corresponding to the input sequence. At each time step t, the hidden state h_t is fed to a fully connected (dense) layer to obtain a score for each tag. A score matrix P of size k × n is formed, where n is the number of words in the input sequence and k is the number of possible named entity tags; each entry of P represents the score of a tag for a word. The Softmax function then turns these scores into a probability distribution. The probability of assigning tag y to word x is computed as follows, where Y denotes the set of all possible tags and s(x, y) is the score assigned to tag y for word x:

    $P(y \mid x) = \dfrac{e^{s(x,y)}}{\sum_{y' \in Y} e^{s(x,y')}}$

    The tag with the highest probability at each word position is then chosen, i.e., $y^{*} = \arg\max_{y \in Y} P(y \mid x)$. The Softmax function treats NER as a multi-class classification problem in which the tag for each word is predicted independently from its context vector, without considering the tags of its neighbours [2].
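    The following short sketch illustrates this prediction layer: a dense layer scores each token's context vector against the k tags, Softmax turns the scores into probabilities, and the most probable tag is selected per token (dimensions and tag count are illustrative).

```python
import torch
import torch.nn as nn

num_tags, hidden = 9, 768                  # e.g., the 9 IOB tags of ANERCorp
dense = nn.Linear(hidden, num_tags)

context = torch.randn(1, 10, hidden)       # BGRU output for a 10-token sentence
probs = torch.softmax(dense(context), dim=-1)   # P(y | x) for every token and tag
pred_tags = probs.argmax(dim=-1)           # y* = argmax_y P(y | x), shape (1, 10)
```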

    4 Experiments

    Two experiments are performed to evaluate our proposed BERT-BGRU model: one on the ANERCorp dataset only and the other on the merged ANERCorp and AQMAR dataset.

    4.1 Dataset

    We use two annotated datasets to train and evaluate our model: ANERCorp and AQMAR. ANERCorp by Benajiba et al. [60] is a manually annotated Arabic named entity corpus that is freely available for research purposes. It contains 150,286 tokens and 32,114 named entities from 316 articles selected from different source types and different newspapers in order to obtain a corpus that is as generalized as possible. The corpus was annotated following the MUC-6 annotation scheme with IOB tagging. Nine classes were used for the annotation: B-PERS, I-PERS, B-LOC, I-LOC, B-ORG, I-ORG, B-MISC, I-MISC, and O, where PERS denotes a person named entity, LOC a location, ORG an organization, MISC a miscellaneous named entity that does not belong to any of the other classes, and O a word that is not a named entity. The CONLL training file format [61] is used, in which the file is organized in two columns: the first column holds the words and the second the tags. Named entities are distributed as 39% person, 30.4% location, 20.6% organization, and 10% miscellaneous. On the other hand, AQMAR by Mohit et al. [62] is a corpus of Arabic Wikipedia articles annotated with named entity information. It contains about 3,000 sentences from 31 Arabic Wikipedia articles covering several genres including history, technology, science, and sports. ACE guidelines were followed in identifying named entity boundaries and tags, and the IOB tagging scheme was used to annotate 74,000 tokens with person, location, organization, miscellaneous, and other tags. Tab. 1 presents the training, validation, and testing statistics for the ANERCorp dataset, and Tab. 2 presents the same statistics for the merged ANERCorp and AQMAR dataset. Both datasets were split into ~80% training, ~10% validation, and ~10% testing.

    Table 1: Training, validation, and testing statistics for the ANERCorp dataset

    Table 2: Training, validation, and testing statistics for the merged ANERCorp and AQMAR dataset
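    For reference, a minimal reader for the two-column CoNLL-style files described in Section 4.1 could look as follows; the file name is illustrative, and the assumed format is one word and one IOB tag per line, with a blank line between sentences.

```python
def read_conll(path):
    """Read a two-column CoNLL-style file into (words, tags) sentence pairs."""
    sentences, words, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                      # blank line marks a sentence boundary
                if words:
                    sentences.append((words, tags))
                    words, tags = [], []
                continue
            parts = line.split()
            words.append(parts[0])            # column 1: word
            tags.append(parts[-1])            # last column: IOB tag
    if words:
        sentences.append((words, tags))
    return sentences

# train = read_conll("ANERCorp_train.txt")    # ~80% / 10% / 10% split as in Tabs. 1-2
```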

    4.2 Data Preprocessing

    ANERCorp and AQMAR do not follow the same tagging scheme: the AQMAR dataset uses the MIS and PER tags to denote miscellaneous and person named entities, respectively, while ANERCorp uses MISC and PERS. Therefore, we performed the necessary preprocessing to unify the tags before merging the two datasets. In addition, in order to achieve high accuracy, the data was cleaned by removing punctuation, other special characters, and diacritic signs. To use the pre-trained BERT model, input sentences must be preprocessed in the same way the BERT model was trained. Therefore, we enclosed each input sentence between the [CLS] and [SEP] tokens. The words of each input sentence are tokenized into a list of tokens available in the BERT vocabulary, and out-of-vocabulary words are progressively split into sub-tokens. We assigned the word's tag to the first sub-token only and treated the remaining sub-tokens as padding; for example, a word split into two sub-tokens and tagged 'O' would be given the tags ['O', 'PAD'].
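    The sub-token tag alignment described above can be sketched as follows; the tokenizer is assumed to be the AraBERT WordPiece tokenizer, and the 'PAD' label name is an illustrative choice.

```python
def align_tags(words, word_tags, tokenizer, pad_tag="PAD"):
    """Tokenize a tagged sentence and keep each word's tag on its first sub-token only."""
    tokens, tags = ["[CLS]"], [pad_tag]
    for word, tag in zip(words, word_tags):
        sub_tokens = tokenizer.tokenize(word) or ["[UNK]"]   # WordPiece sub-tokens
        tokens.extend(sub_tokens)
        tags.extend([tag] + [pad_tag] * (len(sub_tokens) - 1))  # tag first sub-token only
    tokens.append("[SEP]")
    tags.append(pad_tag)
    return tokens, tags
```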

    4.3 Baseline

    To evaluate our BERT-BGRU model, we compare it to five recently published state-of-the-art deep learning-based ANER models. The first baseline is the BGRU-Softmax model by Ali et al. [50], which used a pre-trained word2vec embedding layer, an attention embedding layer, and a CNN-based character embedding layer. The second baseline is the fine-tuned pre-trained AraBERT model for ANER by Antoun et al. [56]. The third baseline is the BLSTM-CRF model proposed by [46], which used word2vec word embeddings and a CNN-based character embedding layer. The fourth baseline [51] is an improvement of [50] that used a BLSTM-Softmax model with a self-attention layer on top of the encoder. Finally, the fifth baseline is a variation of [50] with a BLSTM encoder-decoder model [52]. The first three baselines are trained and evaluated on ANERCorp, and the rest on the merged ANERCorp and AQMAR dataset.

    4.4 Experiment Setting

    In this experiment, the PyTorch API was used for model implementation, and the whole experiment was run on the Google Colab platform (https://colab.research.google.com/) with a Tesla T4 GPU. We used the training dataset to train our model and the validation dataset to choose the hyper-parameters. The maximum input sequence length is set to 100; longer sequences are truncated and shorter sequences are padded to obtain the same length. The number of GRU hidden units is set to 384, which is half of the BERT hidden size. The batch size is 32 for training and 8 for testing and validation. The Adam optimizer was used with a 0.1 warmup proportion and a learning rate of 1e-4. The model is trained for 5 epochs with a 0.2 dropout rate to avoid overfitting.
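    The following sketch reproduces these settings (Adam, learning rate 1e-4, 10% warmup, 5 epochs) in runnable form; the stand-in model and data are placeholders for the BERT-BGRU model and the tokenized ANERCorp batches, and the linear warmup schedule from the transformers library is an assumption about the exact warmup implementation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import get_linear_schedule_with_warmup

# Stand-in model and data so the snippet runs; in the real experiment these are
# the BERT-BGRU model and the tokenized ANERCorp batches (batch size 32, max length 100).
model = torch.nn.Linear(768, 9)
dummy = TensorDataset(torch.randn(64, 768), torch.randint(0, 9, (64,)))
train_loader = DataLoader(dummy, batch_size=32, shuffle=True)

EPOCHS, LR, WARMUP_RATIO = 5, 1e-4, 0.1
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
total_steps = EPOCHS * len(train_loader)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(WARMUP_RATIO * total_steps),
    num_training_steps=total_steps)

for epoch in range(EPOCHS):
    for features, labels in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(features), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()   # warmup over the first 10% of steps, then linear decay
```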

    4.5 Results

    Table 3: Results of the BERT-BGRU model on the ANERCorp dataset

    Two experiments are performed: one on ANERCorp only and the other on the merged ANERCorp and AQMAR dataset. The detailed evaluation results for the five entity classes identified by the BERT-BGRU model are shown in Tabs. 3 and 4, where 'P', 'R', and 'F' denote precision, recall, and macro-averaged F1-score, respectively.

    Table 4: Results of the BERT-BGRU model on the merged ANERCorp and AQMAR dataset

    5 Discussion

    To evaluate our results, we compare the performance of our BERT-BGRU model against the state-of-the-art baselines. To compare our work with [50,56], we assumed that the authors identified the conventional named entity categories used in the literature, namely person, location, organization, and miscellaneous, since this information is not specified in their papers. To compare our work to [46], we identified person, location, and organization named entities, and to compare our work to [51,52] we identified person, location, organization, and other named entities, as stated by the authors. In addition, we used the dataset splits provided by the authors when available, and otherwise the standard split of 80% training, 10% validation, and 10% testing. Moreover, to compare our model with [52], we re-ran the model on the merged ANERCorp and AQMAR dataset split into 70% training, 10% validation, and 20% testing. The performance of our model against the first [50] and second [56] baselines is illustrated in Tab. 5. The performance of our model compared to baseline [46] on the ANERCorp dataset is shown in Tab. 6. Tabs. 7 and 8 illustrate the performance of our model against the fourth [51] and the fifth [52] baselines, respectively, on the merged ANERCorp and AQMAR dataset.

    Table 5: Performance of our BERT-BGRU model and baseline models on the ANERCorp dataset

    It can be noticed from Tab. 5 that our BERT-BGRU model outperformed [50] by 3.39 F-measure points, which indicates that the contextual semantic representation of dynamically generated word vectors produced by the pre-trained BERT language model captures sentence features better than non-contextual word vector representations. Our model also outperformed [56] by 6.31 points, which reflects the benefit of transferring the knowledge represented by BERT to a BGRU deep learning model to help learn more context information. Furthermore, as shown in Tab. 6, our model outperformed Khalifa et al. [46] by 6.78 F-measure points even though they used a pre-trained word2vec model with a vocabulary size of 6,261,756 words, while the pre-trained BERT model used in our work has a vocabulary size of only 64k tokens. On the other hand, our model outperformed [51] by 1.77 F-measure points (Tab. 7) and showed comparable results to [52] (Tab. 8), with an increase of 0.06 F-measure points. Note that [51,52] applied embedding attention layers on top of the word- and character-level representations in order to determine how to consolidate the information for each word, whereas no such layers were needed in our model since this is already achieved by the Transformer multi-head attention in the BERT architecture. In addition, our model required only 5 training epochs, while the model in [52] was trained for 20 epochs. This demonstrates the effectiveness of fine-tuning pre-trained BERT in reducing the training time required for the downstream task model compared to training the model from scratch.

    Table 6: Performance of our BERT-BGRU model and baseline models on the ANERCorp dataset across Person, Location, and Organization named entities

    Table 7: Performance of our BERT-BGRU model and the Ali et al. [51] model on the merged ANERCorp and AQMAR dataset

    Table 8: Performance of our BERT-BGRU model and the Ali et al. [52] model on the merged ANERCorp and AQMAR dataset

    6 Conclusion

    In this paper, we investigated the use of contextualized embedding methods, namely BERT, by utilizing these embeddings as input to a BGRU deep neural network. Our experiments show the effectiveness of fine-tuning a pre-trained BERT language model for a morphologically rich, low-resource language, specifically for the NER task. Furthermore, fine-tuning BERT as the underlying input representation model effectively reduces the training time of the downstream deep neural network-based model, since a major part of the training was already done by BERT. Moreover, BERT stacks multiple layers of attention, making it able to produce rich representations without the need to add extra embedding attention layers to the NER model. Our proposed BERT-BGRU model outperformed the compared baselines and achieved state-of-the-art results on the ANERCorp dataset and the merged ANERCorp and AQMAR dataset, with F-measure values of 92.28% and 90.68%, respectively, without any feature engineering. Our future work will be committed to using a BERT-BGRU-CRF model for the ANER task, in addition to applying additional features, such as dictionary features, to further improve our results.

    Acknowledgement: The authors extend their appreciation to the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University for funding and supporting this work through the Graduate Students Research Support Program.

    Funding Statement: This research is funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University through the Graduate Students Research Support Program.

    Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
