
    Machine Learning-Based Advertisement Banner Identification Technique for Effective Piracy Website Detection Process

    Computers, Materials & Continua, 2022, Issue 5

    Lelisa Adeba Jilcha and Jin Kwak

    1 ISAA Lab., Department of AI Convergence Network, Ajou University, Suwon, 16499, Korea

    2 Department of Cyber Security, Ajou University, Suwon, 16499, Korea

    Abstract: In the contemporary world, digital content that is subject to copyright faces significant challenges from copyright infringement. Billions of dollars are lost annually because of this illegal act. The most effective current trend for tackling this problem is believed to be blocking those websites, particularly through affiliated government bodies. To do so, an effective detection mechanism is a necessary first step. Some researchers have used various approaches to analyze the possible common features of suspected piracy websites. For instance, most of these websites serve online advertisements, which are considered their main source of revenue. In addition, these advertisements have some common attributes that make them unique compared to advertisements posted on normal or legitimate websites. They usually encompass keywords such as click-words (words that redirect to install malicious software) and words frequently used in illegal gambling, illegal sexual acts, and so on. This makes them ideal as one of the key features for successfully detecting websites involved in copyright infringement. Research has been conducted to identify advertisements served on suspected piracy websites. However, these studies use a static approach that relies mainly on manual scanning for the aforementioned keywords. This brings some limitations, particularly in coping with the dynamic and ever-changing behavior of advertisements posted on these websites. Therefore, we propose a technique that can continuously fine-tune itself and is intelligent enough to effectively identify advertisement (Ad) banners extracted from suspected piracy websites. We do this by leveraging machine learning algorithms, particularly the support vector machine with the word2vec word-embedding model. After applying the proposed technique to 1015 Ad banners collected from 98 suspected piracy websites and 90 normal or legitimate websites, we were able to identify Ad banners extracted from suspected piracy websites with an accuracy of 97%. We present this technique in the hope that it will be a useful tool for various piracy website detection approaches. To our knowledge, this is the first approach that uses machine learning to identify Ad banners served on suspected piracy websites.

    Keywords: Copyright infringement; piracy website detection; online advertisement; advertisement banners; machine learning; support vector machine; word embedding; word2vec

    1 Introduction

    In this age, following the remarkable advancement of technology, industries and individuals involved in digitally creating, connecting, and distributing works of art are earning more multidimensional benefits than ever. Nevertheless, countless piracy websites on the Internet illegally distribute digital content without the consent of copyright holders, exploiting loopholes on the Internet [1–3]. Consequently, the creative industry faces substantial challenges from illegal and unauthorized distribution of its works, including damaged reputation, reduced production funding, and security risks. As several piracy sites join the network every day, billions of dollars worldwide are lost each year [3].

    Some tech giants are trying different approaches to weaken the piracy industry. Google, particularly through YouTube, attempts to combat infringement by providing easy access to legitimate music, videos, and other media [2,3]. Although this approach is effective in the long-term effort to weaken the entities involved in this industry, the ever-increasing number of piracy sites still requires serious attention. On the other hand, various studies have focused on the detection and blocking of websites involved in copyright infringement. For detection, two approaches have been utilized so far: internet traffic analysis and webpage feature analysis [2,4–6]. Matthew et al. [6] performed website traffic analysis on legal video streaming websites such as Netflix, YouTube, and Twitch, as well as on various pirated websites. They used machine learning algorithms to detect the unique signatures of both types of websites based on network traffic features such as the size and number of packets, source and destination port numbers, byte distribution, and so on. Similarly, Kim et al. [4] and Choi et al. [5] proposed webpage feature analysis techniques to automate the detection and blocking processes. The central idea of these works is to scan for the presence of features (keywords) related to illegal torrent sites, video streaming sites, and webtoons. However, the keyword analysis approaches used so far are mainly based on static comparisons between words extracted from target websites and a dictionary of predefined keywords collected from suspected piracy sites.

    Most piracy sites serve online advertisements to generate revenue [2]. These advertisements have some attributes that make them unique compared to advertisements posted on normal or legitimate websites, as shown in Fig. 1 below. They usually encompass click-words (words that redirect to install malicious software), words that promise free use, and suspicious content that bears a significant degree of similarity to words and sentences frequently used in illegal gambling, illegal sexual acts, and so on [2]. This makes them ideal as one of the key features for successfully detecting websites involved in copyright infringement.

    Kim et al. [4] analyzed Ad banners extracted from various websites and proposed a technique that distinguishes between Ad banners extracted from normal sites and those from their piracy counterparts. The authors successfully used a static keyword comparison approach. However, after deeply analyzing the behavior of banners posted on randomly selected suspected piracy sites, we observed that the contents of Ads on piracy sites are highly dynamic and continue to change in order to avoid detection by static approaches. Consequently, such techniques become less accurate over time and fail to cope with the dynamic and adaptive nature of these features.

    Figure 1: Illustration of Ads posted on suspected piracy websites

    Therefore, we propose a machine learning-based technique that can continuously fine-tune itself and is intelligent enough to effectively overcome the aforementioned challenges. We propose this technique in the hope that it will help enhance the process of effectively identifying copyright infringement sites. In this work, we used support vector machines (SVMs) as our base machine learning model, along with a pretrained word2vec vectorization model. After analyzing Ad banners extracted from 98 suspected piracy and 90 normal sites, we were able to distinguish between the two domains with 97% accuracy. The remainder of this paper is structured as follows: In Section 2, we review related work on word embedding and classification models, focusing on the particular models we selected; in Section 3, we discuss the proposed method in detail; in Section 4, we evaluate the performance of the proposed technique; finally, we provide concluding remarks in Section 5.

    2 Related Work

    2.1 Word Embedding Overview

    In the process of classifying text using machine learning algorithms, the most important step after data cleaning is converting the words or documents into numerical data through a vectorization or embedding process. This process allows statistical classifiers to perform the required mathematical calculations on words, such as adding, subtracting, and finding the distances between words [7]. Unlike word encoding, word embedding attempts to capture the semantic, contextual, and syntactic meaning of each word in a vocabulary. It represents single words with a set of vectors that accurately groups similar words around one point in the vector space [8]. We can then calculate the cosine distance between any two target words to determine their relationship.

    There are various commonly used approaches to perform this operation: frequency-based embedding approaches such as the count vector, TF-IDF (term frequency–inverse document frequency), and the co-occurrence vector; and prediction-based embedding approaches such as continuous bag of words (CBOW) and Skip-gram [8,9]. Frequency-based embedding approaches are relatively easy to work with; however, they cannot identify semantic features or preserve the relationships between words. Prediction-based embedding approaches, on the other hand, are neural network (NN)-based architectures that provide state-of-the-art solutions to this problem. The CBOW and Skip-gram models are widely used architectures in many text classification problems [10].

    2.2 Vectorization Using word2vec Model

    Word2vec, in general, is a technique that combines two prediction-based algorithms, Skip-gram and CBOW, with two moderately efficient training methods: hierarchical softmax and negative sampling [11]. The CBOW and Skip-gram models are mirrored versions of each other: CBOW is trained to predict a single word from a fixed window of context words, whereas Skip-gram performs the opposite, as illustrated in Fig. 2 [8].

    Figure 2: Diagram of the CBOW and Skip-gram models

    Unlike CBOW, Skip-gram is used to find a distributed word representation by learning word vectors that are good enough to predict nearby words [8]. For each estimation step, the model takes one word from the input as the center word and predicts the words in its context up to a certain window size. In other words, it determines the conditional probability of the context words (in each window) given the center word, as described mathematically in Eq. (1). The probability of prediction, or the average log probability, is maximized by efficiently choosing the vector representation of words [11].

    $$\frac{1}{T}\sum_{t=1}^{T}\sum_{-m\le j\le m,\ j\ne 0}\log p\left(\omega_{t+j}\mid\omega_{t}\right)\tag{1}$$

    Here, $m$ is the radius of the window and, for each word $t = 1,\ldots,T$, $\omega_t$ is the center word and $\omega_{t+j}$ is a word in the context of a window of size $2m$.

    The problem with this model is that it is computationally intensive when the corpus contains a very large vocabulary. Reference [11] solved this problem using the hierarchical softmax model, which was later replaced by negative sampling. Negative sampling is the process of sampling negative words (words out of context, i.e., outside the window range) while ensuring that the positive words (words in the context) are classified with a higher probability than their negative counterparts for given input words, as shown in Eq. (2). Skip-gram with negative sampling outperforms the other methods and is used as part of the final word2vec model [11].

    $$\log\sigma\left(v_{w_O}'^{\top}v_{w_I}\right)+\sum_{i=1}^{k}\mathbb{E}_{w_i\sim P_n(w)}\left[\log\sigma\left(-v_{w_i}'^{\top}v_{w_I}\right)\right]\tag{2}$$

    Eq. (2) maximizes the dot product between the center word $v_{w_I}$ and the context word $v'_{w_O}$ by having the softmax iterate only over a subset of $k$ classes, where $w_i\sim P_n(w)$ denotes a randomly sampled negative word.

    2.3 Support Vector Machines (SVM)

    SVM is a supervised machine learning algorithm that finds a hyperplane classifying all training vectors into two classes. During this process, each data item is plotted as a point in an n-dimensional space (where n is the number of features), with the value of each feature being a particular coordinate [12]. Classification is then performed by finding the hyperplane that best differentiates the two classes. SVM works on vectorized data and can be used for both linear and nonlinear classification tasks such as text processing, image categorization, and handwriting recognition [13].

    In the case of linear classification, the model creates a hyperplane with two parallel planes at distance $d$ from the central hyperplane, such that one plane passes through the nearest positive point and the other through the nearest negative point, as shown in Fig. 3 [14]. The points closest to the decision boundary are called support vectors. In this case, the classifier function is defined in Eq. (3).

    $$f(x)=w^{\top}x+b\tag{3}$$

    where $w$ is a weight vector used for defining the decision boundary and $b$ is a bias. If $f(x)=1$, then $x$ belongs to the first class; if $f(x)=-1$, the object belongs to the second class; and $f(x)=0$ indicates the optimal hyperplane.

    Figure 3: Decision boundary and margin of SVM

    On the other hand, nonlinear classification is achieved by changing the dimensionality of the input feature space using kernel functions such as the polynomial, sigmoid, and RBF (radial basis function) kernels [15]. These functions map the input features to a much higher-dimensional feature space. The mapping assumes the generation of a new function, $x\rightarrow\varphi(x)$, which is used for the kernel computation $K(x_i,x_j)=\varphi(x_i)^{\top}\varphi(x_j)$; the decision function is then described in Eq. (4):

    $$f(x)=\operatorname{sign}\left(\sum_{i}\alpha_i y_i K(x_i,x)+b\right)\tag{4}$$

    The SVM then applies a linear classifier to those higher-dimensional features and thereby learns nonlinear decision boundaries. In such cases, the efficiency of the model depends on the choice of kernel [16].

    3 Proposed Technique

    Our objective was to build a machine learning model that can categorize sentences or texts extracted from Ad banners into their respective classes: the class of suspected piracy sites and the class of normal or legitimate websites. To do so, we propose a machine learning-based technique that uses an SVM algorithm as the base architecture along with the word2vec vectorization model. The proposed technique goes through a three-stage process: data collection and preprocessing, feature extraction (vectorization), and modeling and testing the classifier. The first stage involves crawling Ad banners from the target web pages and extracting features from them. The second stage involves data cleaning and vectorization, while the final stage involves modeling, training, testing, and optimizing a statistical classifier. Fig. 4 illustrates the process of the proposed method. Each stage is described in detail in the following subsections.

    Figure 4: Overall process of the proposed technique

    3.1 Data Collection

    We collected data from 98 suspected piracy sites and 90 normal or legitimate websites. We used Alexa Rank and the Google Transparency Report as our main sources of information for nominating websites for data collection. Alexa is a web traffic analysis company that provides popularity indices for various websites [17]; we selected top-rated websites based on this information. The Google Transparency Report, on the other hand, provides information related to the involvement of websites in illegal acts such as copyright infringement [18]; we identified suspected piracy sites based on this information.

    The data collection process involved three steps. The first step was identifying and crawling Ad banners from the target websites by manually analyzing and inspecting the corresponding hypertext markup language (HTML) code. After confirming the existence of Ad banners, we extracted those banners with the Selenium web crawler based on their unique URLs (uniform resource locators); Selenium is an open-source tool that helps automate web crawling, as sketched below. The second step was extracting any text from the Ad banners using the Google Vision API (application programming interface), as described in Section 3.1.1. Finally, the third step involved preparing a dataset for further processing and classification. We were able to collect 1015 Ad banners from 188 websites, as given in Tab. 1 below.
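
    The following minimal sketch illustrates this first step under stated assumptions: the target URL, the CSS selector (`div.ad-banner img`), and the output paths are hypothetical placeholders, since banner locations were identified manually per site; only the Selenium and standard-library calls reflect the tools named above.

```python
# Hypothetical sketch of crawling Ad banner images with Selenium.
import os
import urllib.request

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://suspected-piracy-site.example")  # hypothetical target

# Collect the unique source URLs of <img> elements inside ad containers
# (the selector is a placeholder; real containers vary per site).
banner_urls = {
    img.get_attribute("src")
    for img in driver.find_elements(By.CSS_SELECTOR, "div.ad-banner img")
}
driver.quit()

# Download each banner image for the OCR step (Section 3.1.1).
os.makedirs("banners", exist_ok=True)
for i, url in enumerate(banner_urls):
    urllib.request.urlretrieve(url, f"banners/banner_{i}.png")
```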

    Table 1: Number and types of websites used

    3.1.1 Text Extraction and Dataset Preparation

    As mentioned in Section 1, the main purpose of this work is to identify Ad banners posted on suspected piracy sites based on the words or sentences that appear on them. However, Ad banners are usually integrated into websites as images. Consequently, we used an optical character recognition (OCR) tool, namely the Google Vision API, to recognize and extract characters from the Ad banners. The Vision API is a machine learning tool that helps detect and extract text from images [19]. The return value of this operation is in JSON (JavaScript Object Notation) format and includes the entire extracted string as well as the individual words and their bounding boxes. Because we were interested only in individual words rather than structured sentences, we filtered out the bounding-box values and used the rest as is. Finally, we labeled the data according to their corresponding domain ("1" for data from normal or legitimate sites and "0" for data from suspected piracy sites) and stored them in a CSV (comma-separated values) file for further processing.
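
    A minimal sketch of this OCR step with the Google Cloud Vision Python client is shown below; the file paths and the CSV layout are illustrative, and authentication setup is omitted.

```python
# Sketch of word extraction from one Ad banner via the Vision API.
import csv

from google.cloud import vision

client = vision.ImageAnnotatorClient()

def extract_words(image_path):
    """Return the individual words detected on an Ad banner image."""
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)
    # text_annotations[0] holds the entire string; the remaining
    # entries are single words (their bounding boxes are discarded).
    return [a.description for a in response.text_annotations[1:]]

# Label each row by domain: 1 = normal/legitimate, 0 = suspected piracy.
with open("dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "label"])
    writer.writerow([" ".join(extract_words("banners/banner_0.png")), 0])
```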

    3.2 Preprocessing and Vectorization

    3.2.1 Preprocessing

    In this stage, we first cleaned our data and then prepared a vector representation for each word in the dataset to make it ready for further classification operations. Text on Ad banners usually includes special characters such as "!@#$%&()*" and misspelled words, as well as stop words such as "the," "a," "in," and "about," along with hyperlinks, emails, and URLs. We removed all these attributes because they carry little or no useful information that can influence the decision of our machine learning model; in fact, they can reduce the quality of the classifier output. Removal of URLs and special characters was achieved through pattern matching with the Python regular expression (re) library. Similarly, removal of stop words was carried out with NLTK (Natural Language Toolkit), a popular and widely used Python library for NLP (natural language processing). Finally, the dataset was split into training (75%) and test (25%) sets, as sketched below. Fig. 5 shows part of the dataset after the data cleaning operation.
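
    The following minimal sketch of this cleaning pipeline assumes that `raw_texts` and `labels` stand for the columns of the CSV dataset from Section 3.1.1; the exact regular expressions are illustrative, not the authors' code.

```python
# Sketch of cleaning banner text and splitting the dataset 75/25.
import re

import nltk
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def clean(text):
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"\S+@\S+", " ", text)                # email addresses
    text = re.sub(r"[^A-Za-z\s]", " ", text)            # special characters
    return " ".join(w for w in text.lower().split() if w not in STOP_WORDS)

texts = [clean(t) for t in raw_texts]  # raw_texts: banner strings from the CSV
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42
)
```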

    Figure 5: Portion of the dataset after the data cleaning operation

    3.2.2 Vectorization

    As described in Section 2.1, before applying a machine learning model to text documents, we first need to turn the text content into numerical feature vectors. For this purpose, after comparing most of the existing vectorization models, we found that the word2vec model is robust and outperforms the other vectorization models in our context. Using this method, we tested two approaches: one was to train the word2vec vectorization model from scratch on our dataset, and the other was to use pretrained vectorization models. We achieved better results with the latter approach; the details are discussed in Section 4.2.
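
    The paper does not spell out how a whole banner becomes a single feature vector; one common choice, assumed in the sketch below, is to average the word2vec vectors of the banner's words (here `kv` is a loaded Gensim KeyedVectors model, as in Section 4.2).

```python
# Sketch: one fixed-length feature vector per banner by averaging
# the word2vec vectors of its words (an assumed, common approach).
import numpy as np

def banner_vector(text, kv):
    vectors = [kv[w] for w in text.split() if w in kv]
    if not vectors:                      # no in-vocabulary words
        return np.zeros(kv.vector_size)
    return np.mean(vectors, axis=0)

X_train_vec = np.array([banner_vector(t, kv) for t in X_train])
X_test_vec = np.array([banner_vector(t, kv) for t in X_test])
```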

    3.3 Classification

    After formatting and preparing the dataset for classification, the next step was to choose, model, and train the machine learning classifier until it was efficient enough to distinguish between the two classes (suspected piracy and legitimate sites). As we already had a labeled dataset, we needed a supervised machine learning model. Additionally, because our target was classification, we required a model that outputs a discrete class label (such as a deep learning classifier, SVM, or naïve Bayes) rather than a raw probabilistic score. Therefore, we needed to choose carefully among these models to obtain the best result.

    3.3.1 Selecting Classification Algorithm

    We considered several conditions for the effective selection of our machine learning-based classifier model. The first condition was the size of the dataset. As described in Section 3.1, our dataset comprised sentences and words extracted from Ad banners, which limited its size. On the other hand, after deeply analyzing the nature of our dataset, we noted that there was a probability of overlap between the two classes. For instance, some sentences related to shopping advertisements could plausibly be served on both website classes. Therefore, owing to the size and nature of our dataset, we chose SVM as the classifier model. Compared with deep learning models, SVM works well when the dataset is relatively small, and at times it performs better than neural network-based models [15]. Additionally, compared with the naïve Bayes model, SVM tries to find the best margin separating the two classes and thus better reduces the risk of error due to class overlap [20].

    3.3.2 Modeling and Training the Classifier

    We used the scikit-learn (sklearn) machine learning library for the Python programming language and built our model using SVM. This library includes SVMs along with various classification, regression, and clustering algorithms and all their respective features [21], which greatly reduces the burden of modeling the classifier. We imported the support vector classifier (SVC) from the SVM module and initialized its parameters, such as regularization (C), gamma, and the kernel function; the trick was to tune these parameters until we found the classifier that optimally categorizes sentences or texts extracted from Ad banners into their respective classes, as sketched below. The regularization parameter sets the degree of tolerance for each misclassified training example, while the gamma parameter defines the influence of each training example on drawing the decision boundary [15]. The kernel function, on the other hand, transforms the decision surface into a linear equation in a higher-dimensional space whenever the distribution of the training dataset is nonlinear [15].
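
    A minimal sketch of this modeling step is given below; the vectorized features come from the sketch in Section 3.2.2, and the parameter values are illustrative starting points rather than the tuned ones.

```python
# Sketch of building and evaluating the SVC classifier.
from sklearn.svm import SVC

clf = SVC(C=1.0, gamma="scale", kernel="linear")
clf.fit(X_train_vec, y_train)            # fit on the 75% training split
print(clf.score(X_test_vec, y_test))     # accuracy on the 25% test split
```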

    After deeply analyzing and visualizing the nature of our dataset, we observed that it was largely separable with a linear function, with only a small portion of its training elements crossing the decision boundary. Hence, we chose a linear kernel function with an optimally low value for the regularization parameter and the default value for the gamma parameter. However, we obtained a better result after optimizing our parameters using the cross-validation function "GridSearchCV()" from the sklearn library, which loops through predefined hyperparameters and optimally fits the model on the training dataset. We examine this process further in Section 4.3.

    4 Evaluation of the Proposed Technique

    4.1 Implementation and Evaluation Environment

    As we used a pretrained vectorization model, we needed our computational environment to be sufficiently fast and capable of accommodating all the required data. Therefore, we set up the testing environment as shown in Tab. 2.

    Table 2: Implementation environment setup

    4.2 Vectorizer Evaluation

    As mentioned in Section 3.2, we used the word2vec model to vectorize our preprocessed dataset. However, owing to the size and nature of our dataset, we tested two approaches to find the vectorization model that could most efficiently distribute our dataset with an optimal cosine distance between words. The first approach was to build the vectorization model from scratch, and the second was to use pretrained models trained on a sufficiently large corpus. The second approach showed better results for vectorizing our dataset.

    In the first approach, we built our word2vec model using Gensim, an NLP library for Python that includes a word2vec class. We loaded our dataset and performed the preprocessing operations after installing all the required libraries, such as gensim and python-levenshtein, in a Jupyter notebook. Then, we built our word2vec model using Skip-gram with a window size of 5, a minimum word count of 2 (owing to the small number of words per Ad banner), and 4 workers (the number of CPU threads used in training). Finally, we prepared a vocabulary of unique words and trained our model for 10 epochs, as sketched below.
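
    A sketch of this training setup with Gensim's Word2Vec class is shown below; the vector size is our assumption, as the paper does not state it.

```python
# Sketch of training word2vec from scratch on the banner corpus.
from gensim.models import Word2Vec

tokenized = [t.split() for t in texts]  # one word list per Ad banner

model = Word2Vec(
    sentences=tokenized,
    sg=1,             # Skip-gram architecture
    window=5,         # window size of 5
    min_count=2,      # minimum word count of 2
    workers=4,        # 4 CPU threads
    vector_size=100,  # assumed; not stated in the paper
    epochs=10,        # trained for 10 epochs
)
print(model.wv.most_similar("free", topn=5))  # nearest-neighbor check
```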

    After the training phase was completed, we evaluated the performance of the model by finding the closest words to a randomly selected word from the dataset. However, we observed a disappointing result, as shown in Fig. 6. This may be due to the relatively small size of our data compared with what is required to successfully train such models. In addition, we assumed that it might be due to the type and nature of the texts used on various Ad banners, which are unstructured and weakly correlated. Therefore, we decided to test a pretrained vectorization model that had been trained on a large dataset with sufficient vector dimensionality and a related data domain.

    Figure 6: Vectorization result using the word2vec model trained on our dataset

    In the second approach, we used the pretrained Google word2vec vectorization model, which is trained on part of the Google News dataset containing approximately 100 billion words. The training uses Skip-gram with a window size of 10 as the base architecture and negative sampling as the training algorithm, and the model contains 300-dimensional vectors for 3 million words and phrases. To accommodate the 3 million words, each represented by a 300-dimensional vector of 4-byte floats, it requires at least 3.6 GB of free RAM (3 M × 300 × 4 bytes ≈ 3.6 GB). To be safe, we set up our environment with 12 GB of RAM and an Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz x64-based processor. The vectorization results were satisfactory, as shown in Fig. 7.
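
    A sketch of loading this model with Gensim's KeyedVectors is shown below; the binary file name is the one Google distributes, and the nearest-neighbor query mirrors the check in Fig. 7.

```python
# Sketch of loading the pretrained Google News word2vec vectors
# (300-dimensional vectors for 3M words; ~3.6 GB once in memory).
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)
print(kv.most_similar("gambling", topn=5))
print(kv.similarity("betting", "gambling"))  # cosine similarity
```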

    However, there was still a gap in the cosine distances between words in our corpus. For instance, as shown in Fig. 8, the cosine distance between the words "betting" and "gambling" is 0.6097108, which is somewhat unsatisfactory. Hence, we used spaCy, an NLP library for Python that provides various pretrained models in different languages and sizes. Using a model trained on written blogs, news, and comments in English, we were able to obtain a satisfactory result, as shown in Fig. 9. A portion of our vectorized dataset is shown in Fig. 10.
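
    A sketch of the spaCy-based check is given below; the paper does not name the exact pipeline, so `en_core_web_lg` (an English model with word vectors trained on web text) is assumed here.

```python
# Sketch of word and document similarity with spaCy.
import spacy

nlp = spacy.load("en_core_web_lg")  # assumed pretrained English pipeline
print(nlp("betting").similarity(nlp("gambling")))

# A whole banner text maps to one vector (the mean of its token
# vectors), the per-row representation shown in Fig. 10.
print(nlp("free bet casino bonus").vector.shape)  # (300,)
```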

    Figure 7: Vectorization result using the Google pretrained word2vec model

    Figure 8: Cosine distance between two words using the Google pretrained word2vec model

    Figure 9: Cosine distance between two words using a pretrained model in spaCy

    4.3 Classifier Evaluation

    We tested the performance of the SVM model using various combinations of its parameter values. The tests were performed on 25% of the dataset, and we achieved an accuracy of 97% through the optimization process. The worst performance was recorded when using the SVM with the sigmoid kernel function, whose overall accuracy of 86% was the lowest compared with the 95%, 95%, and 97% achieved by methods 1, 2, and 3, respectively. The performance metrics used were accuracy, precision, recall, and F1-score; the formulae used to calculate them are given in Eqs. (5)–(8), respectively, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives.

    $$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\tag{5}$$

    $$\text{Precision}=\frac{TP}{TP+FP}\tag{6}$$

    $$\text{Recall}=\frac{TP}{TP+FN}\tag{7}$$

    $$\text{F1-score}=\frac{2\times\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}}\tag{8}$$
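
    For illustration, these metrics can be computed on the held-out test set with scikit-learn, as sketched below (assuming the fitted `clf` and the vectorized splits from the earlier sketches).

```python
# Sketch of evaluating the classifier with the metrics of Eqs. (5)-(8).
from sklearn.metrics import classification_report, confusion_matrix

y_pred = clf.predict(X_test_vec)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred,
                            target_names=["piracy", "normal"]))
```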

    Figure 10: Vector representation of our dataset using a pretrained model in spaCy

    4.3.1 Method 1: Using the Default SVM Configuration

    Here, we used the SVM with the default value for each parameter (C = 1, gamma = "scale", kernel = "rbf"). The model was efficient enough to classify both domains, as shown in Fig. 11. The test was performed on 254 data elements, 127 from each class. An overall accuracy of 95% was achieved using this method. The accuracy, precision, recall, and F1-score results are given in Tab. 3.

    Figure 11: Confusion matrix using SVC with the default configuration

    Table 3: Overall performance of our SVM model with the default SVC configuration

    4.3.2 Method 2: Using SVM with a Polynomial Kernel

    In this approach, we modeled our classifier with a polynomial (poly) kernel function while keeping the default values for the remaining parameters. This test was also performed on 254 data elements, with 127 elements from each class. The overall result was quite similar to that obtained using method 1, as shown in Fig. 12. The accuracy, precision, recall, and F1-score results are given in Tab. 4.

    Figure 12: Confusion matrix after using SVC with the polynomial kernel

    Table 4: Overall performance of our SVM model with the polynomial kernel

    4.3.3 Method 3: Using the Optimization Function

    Here, we used the cross-validation function "GridSearchCV()" from the sklearn library to optimally fit our model to the training dataset. This was achieved by looping through predefined hyperparameters and obtaining an optimal value for each parameter. Our hyperparameter grid was {'C': [0.5, 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 'gamma': ['scale', 1, 0.1, 0.01, 0.001, 0.0001], 'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}, with the scoring metric set to "accuracy"; a sketch is given below. The size of the test dataset remained the same as in the methods above. The outcome was more satisfactory than with the aforementioned approaches, as shown in Fig. 13, and the model was slightly better at recognizing data from suspected piracy sites. Furthermore, an overall accuracy of 97% was achieved using this approach. The accuracy, precision, recall, and F1-score results are given in Tab. 5.
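
    The sketch below illustrates this optimization step using the exact grid and scoring metric stated above; the number of cross-validation folds is our assumption, as the paper does not state it.

```python
# Sketch of hyperparameter optimization with GridSearchCV.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": [0.5, 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "gamma": ["scale", 1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
}
search = GridSearchCV(SVC(), param_grid, scoring="accuracy", cv=5)  # cv assumed
search.fit(X_train_vec, y_train)
print(search.best_params_)               # optimal parameter values
print(search.score(X_test_vec, y_test))  # accuracy of the best model
```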

    Figure 13: Confusion matrix after using the optimization function

    Table 5: Overall performance of our SVM model with the optimization function

    The graph in Fig. 14 summarizes the comparison of the accuracy, precision, recall, and F1-score values obtained when using the SVM classifier with the different parameter configurations.

    5 Discussion

    In general, two approaches have been utilized so far for the automated detection of suspected piracy websites: internet traffic analysis and webpage feature analysis [2,4–6]. However, given the dynamic and adaptive nature of those websites, it is important to use the two approaches in combination for a better outcome. In this study, we focused on the webpage feature analysis approach, particularly the analysis of Ad banners posted on various pirated and non-pirated websites. Even though there are some previous studies in this area, none of them used machine learning techniques. Our experiments, conducted on data collected from various websites over time, showed the promise of using machine learning to overcome the aforementioned challenges. The limitation of this study, however, is that the experimental analysis was performed on a relatively small dataset. We believe that a larger dataset would help improve the outcome in two ways: first, by improving the representation of our dataset in vector space, and second, by enhancing our classifier model by allowing us to apply more complex deep learning algorithms such as LSTM (long short-term memory) and BERT (bidirectional encoder representations from transformers).

    Figure 14: Summary of the experimental results

    6 Conclusions

    After analyzing the negative impact and dynamic nature of websites involved in copyright infringement, we proposed in this paper a machine learning-based technique that can intelligently analyze and extract meaningful features from Ad banners and effectively categorize those banners to determine whether they belong to suspected piracy sites. Owing to the size and nature of our dataset, we chose SVM as the classifier model: as we already had a labeled dataset, we needed a supervised machine learning model, and, compared with deep learning models, SVM works well when the dataset is relatively small. Additionally, compared with the naïve Bayes model, SVM tries to find the best margin separating the two classes and thus better reduces the risk of error due to domain overlap. In general, the proposed technique involves three major steps. The first step is data collection, which includes crawling Ad banners from the target webpages and extracting the words or sentences that appear on them. The second step is data processing and feature extraction, which includes data cleaning through preprocessing operations and applying the word2vec algorithm for vectorization. The final step is classification, which includes building, training, and testing a classifier model that can distinguish Ad banners collected from suspected piracy sites from those collected from normal or legitimate sites. Performance was evaluated by applying the proposed technique to 1015 Ad banners collected from 98 suspected piracy websites and 90 normal or legitimate websites. We used various approaches to model our SVM classifier and finally achieved an accuracy of 97% using an SVM with the optimization function. We believe that this work will be very useful as an input to effective piracy site detection processes. In the future, we plan to collect more data and apply various advanced deep learning models to extend this research.

    Funding Statement: This research project was supported by the Ministry of Culture, Sports, and Tourism (MCST) and the Korea Copyright Commission in 2021 (2019-PF-9500).

    Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
