
    A new focused crawler using an improved tabu search algorithm incorporating ontology and host information

    2023-07-06

    Jingfa LIU, Zhen WANG1,2, Guo ZHONG, Zhihe YANG

    1 School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China

    2 China Unicom Central South Research Institute, Changsha 410000, China

    Abstract: To solve the problems of incomplete topic description and repetitive crawling of visited hyperlinks in traditional focused crawling methods, in this paper we propose a novel focused crawler using an improved tabu search algorithm with domain ontology and host information (FCITS_OH), where a domain ontology is constructed by formal concept analysis to describe topics at the semantic and knowledge levels. To avoid crawling visited hyperlinks and expand the search range, we present an improved tabu search (ITS) algorithm and a strategy of host information memory. In addition, a comprehensive priority evaluation method based on Web text and link structure is designed to improve the assessment of topic relevance for unvisited hyperlinks. Experimental results on both the tourism and rainstorm disaster domains show that the proposed focused crawlers outperform traditional focused crawlers on different performance metrics.

    Key words: Focused crawler; Tabu search algorithm; Ontology; Host information; Priority evaluation

    1 Introduction

    Currently, Internet resources are growing explosively. The data update speed is increasing, and users' needs for Web information are becoming more personalized. Traditional search engines can no longer satisfy the need for customized information, so focused crawlers (Chakrabarti et al., 1999; Deng, 2020) have been proposed to collect topical information. Compared with traditional crawlers, a focused crawler can retrieve larger quantities of higher-quality topic-relevant webpages. Therefore, in recent years, focused crawlers have attracted the attention of many scholars (Yu and Liu, 2015; Hosseinkhani et al., 2021).

    At present, focused crawlers face three main issues: topic description, evaluation of the topic relevance of unvisited hyperlinks, and design of crawling strategies. The methods of topic description mainly include topic words (Fei and Liu, 2018), context graphs (CGs) (Du et al., 2014; Guan and Luo, 2016), and domain ontology (Khan and Sharma, 2016; Rani et al., 2017). Topic words are collected through the experience of domain experts, but they suffer from semantic ambiguity. The construction of CGs relies on the user's historical crawling information and may deviate from the topic if the user lacks topic-relevant knowledge. Because ontology can describe a specific domain at the semantic and knowledge levels, most semantic-based crawlers (Khan and Sharma, 2016; Lakzaei and Shmasfard, 2021) use ontology to describe topics.

    The methods of evaluating unvisited hyperlinks include hyperlink-structure-based methods and webpage-content-based methods. Hyperlink-structure-based methods, such as the PageRank algorithm (Yuan et al., 2017) and the hyperlink-induced topic search (HITS) algorithm (Asano et al., 2007), focus on the link structure itself and ignore topic relevance, which may cause crawlers to suffer from "topic drifting." Webpage-content-based methods evaluate the priorities of unvisited hyperlinks mainly by calculating and analyzing the relevance of the webpage text and anchor text, such as the fish-search algorithm (de Bra et al., 1994) and the shark-search algorithm (Prakash and Kumar, 2015). These algorithms ignore the characteristics of the global hyperlink structure and perform well only when searching nearby webpages. Most researchers ignore the benefit of combining these two kinds of methods, and the considered metrics are not sufficiently comprehensive.

    The crawling strategy determines the order in which hyperlinks with different priorities are visited. Traditional algorithms mainly include breadth-first search (BFS) (Li et al., 2015) and optimal priority search (OPS) (Rawat and Patil, 2013). BFS neglects the accessing order of hyperlinks during crawling, so it has the worst performance. OPS takes only the best priority value into account, and its greedy strategy leads to a greater possibility of falling into a choice of a hyperlink with no prospects. To avoid the inherent flaws of greedy algorithms, many scholars have proposed intelligent focused crawling methods based on metaheuristic strategies. For instance, He et al. (2009) proposed a focused crawling strategy based on the simulated annealing (SA) algorithm, allowing crawlers to obtain suboptimal hyperlinks for expanding the search range. Yan and Pan (2018) considered users' browsing behavior to optimize genetic operations and proposed a heuristic focused crawling strategy based on an improved genetic algorithm (GA). Tong (2008) considered the distribution characteristics of website resources and proposed a heuristic focused crawling strategy based on an adaptive dynamic evolutionary particle swarm optimization (PSO) algorithm. Xiao and Chen (2018) analyzed the priority of crawlers in global crawling and proposed a focused crawling strategy based on the gray wolf optimization (GWO) algorithm. Recently, Liu JF et al. (2022a) proposed a heuristic focused crawling strategy combining ontology learning and the multiobjective ant colony algorithm (OLMOACO). In OLMOACO, a method of the nearest farthest candidate solution (NFCS) combined with fast nondominated sorting is used to select a set of Pareto-optimal hyperlinks and guide the crawler's search directions. Liu JF et al. (2022b) built a multiobjective optimization model for evaluating unvisited hyperlinks based on Web text and link structure, and proposed a focused crawling strategy combining the Web space evolution algorithm and domain ontology (FCWSEO). Both the OLMOACO and FCWSEO algorithms guide the crawling direction by building a multiobjective optimization model to select the next hyperlink to visit. However, both algorithms tend to crawl the visited hyperlinks under a few hosts, which causes the crawler to converge prematurely.

    To overcome the above issues, in this paper we propose a novel focused crawler using an improved tabu search algorithm with domain ontology and host information (FCITS_OH). The main contributions of this paper are as follows:

    1. Two domain ontologies of tourism and rainstorm disaster based on formal concept analysis (FCA) are constructed to describe topics at the semantic and knowledge levels.

    2. In the crawling process, an improved tabu search (ITS) strategy with host information is presented to select the next hyperlink, where the modified tabu object and acceptance principles are used to avoid crawling visited hyperlinks, and the host information memory of hyperlinks is proposed to prevent the crawler from cycling under a few hosts, which controls the convergence speed of the algorithm.

    2 Topic description

    In this study, we use domain ontology to describe the topic. This section first introduces the construction process of domain ontology based on the FCA method and then computes the topic semantic weighted vector based on domain ontology semantics.

    2.1 Ontology construction

    FCA (Zhu et al., 2017) is a semiautomatic method of constructing ontology, whose main data structure is the concept lattice. The process of generating a concept lattice is concept clustering, which formalizes the hierarchical relationships between concepts. The detailed steps of ontology construction in this study are as follows: (1) select five keywords for the determined domain and search for these keywords through search engines such as Baidu and Google to obtain the top 50 webpages of each search engine; (2) use the tool IK-Analyzer (Wang and Meng, 2014) to perform word segmentation; (3) extract the document sets and term sets that describe the topic; (4) build a document–term matrix, which is input into the tool ConExp (https://sourceforge.net/projects/conexp/) to generate a concept lattice and obtain a Hasse diagram; (5) describe the hierarchical relations among concepts in the ontology Web language (OWL) (https://www.w3.org/TR/owl-features/); (6) visualize the ontology with Protégé (https://protege.stanford.edu/).
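
    As an illustration of step (4), the sketch below builds the binary document–term incidence matrix (the formal context) that an FCA tool such as ConExp consumes. The class, method, and example terms are hypothetical, and a real pipeline would export the matrix in ConExp's own import format rather than printing it.

        import java.util.*;

        // Minimal sketch: build the document-term incidence matrix that FCA tools
        // (e.g., ConExp) take as a formal context. Assumes each document has already
        // been segmented into a set of terms (e.g., by IK-Analyzer).
        public class FormalContextBuilder {

            public static boolean[][] buildIncidenceMatrix(List<Set<String>> docTerms,
                                                           List<String> vocabulary) {
                boolean[][] matrix = new boolean[docTerms.size()][vocabulary.size()];
                for (int d = 0; d < docTerms.size(); d++) {
                    for (int t = 0; t < vocabulary.size(); t++) {
                        // A cross in the formal context: document d contains term t.
                        matrix[d][t] = docTerms.get(d).contains(vocabulary.get(t));
                    }
                }
                return matrix;
            }

            public static void main(String[] args) {
                List<Set<String>> docs = List.of(
                        Set.of("tourism", "attraction", "route"),
                        Set.of("tourism", "hotel", "accommodation"));
                List<String> vocab = List.of("tourism", "attraction", "route", "hotel", "accommodation");
                boolean[][] ctx = buildIncidenceMatrix(docs, vocab);
                for (boolean[] row : ctx) System.out.println(Arrays.toString(row));
            }
        }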

    Applying the above method, we construct a tourism ontology and a rainstorm disaster ontology. The tourism ontology includes seven branches: tourist attractions, tourism purpose, accommodation, service agencies, tourism routes, means of transportation, and tourists. The whole ontology includes 61 concepts in a seven-level hierarchical structure. The rainstorm disaster ontology includes three branches: disaster management, secondary disaster, and disaster grade. The whole ontology contains 50 concepts in a six-level hierarchical structure.

    2.2 Topic semantic weighted vector computation

    Referring to Liu JF et al. (2022a), we consider five impact factors, including semantic distance (IFDis), concept density (IFDen), concept depth (IFDep), concept coincidence degree (IFCoi), and concept semantic relationship (IFRel), to measure the topic semantic similarity between concepts based on the constructed domain ontology. The calculation formula of the semantic similarity between concept C1 and concept C2, Sem(C1, C2), is shown as follows:

    Here, the adjustment factors k1, k2, k3, k4, and k5 are non-negative and satisfy k1+k2+k3+k4+k5=1. To obtain the topic semantic weighted vector, we first determine a topic concept C, which is tourism or rainstorm disaster in this study. Suppose the topic word vector is T=(t1, t2, ..., tn). We calculate the semantic similarity between each topic word ti (i=1, 2, ..., n) and topic concept C based on Eq. (1) to obtain the corresponding topic semantic weighted vector WT=(wt1, wt2, ..., wtn), where wti (i=1, 2, ..., n) is the weight of the ith topic word ti in T. Thus, the topic semantic weighted vector between topic concept C and topic word vector T is shown as follows:
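
    The equation images for Eqs. (1) and (2) are not reproduced above. A plausible form, assuming a simple weighted combination of the five impact factors under the stated constraint on k1–k5 (the exact combination used by the authors may differ), is:

        \mathrm{Sem}(C_1, C_2) = k_1\,\mathrm{IF_{Dis}} + k_2\,\mathrm{IF_{Den}} + k_3\,\mathrm{IF_{Dep}} + k_4\,\mathrm{IF_{Coi}} + k_5\,\mathrm{IF_{Rel}}, \qquad \sum_{j=1}^{5} k_j = 1,

        W_T = (wt_1, wt_2, \ldots, wt_n), \qquad wt_i = \mathrm{Sem}(t_i, C), \quad i = 1, 2, \ldots, n.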

    3 Comprehensive evaluation method of hyperlinks

    We use the vector space model (VSM) (Farag et al., 2018) to calculate the topic relevance of a webpage and propose a comprehensive priority evaluation method for predicting the topic relevance of unvisited hyperlinks.

    3.1 Topic relevance of webpages

    Most webpages are represented as HTML files, and the content of a webpage is presented in the form of tags. Tags at different positions carry different degrees of importance in the entire webpage. We choose the main tags from HTML files and divide them into five groups. Each tag group is assigned a specific weight Wk (k=1, 2, ..., 5), as shown in Table 1.

    Table 1 Division of labels and their weights

    We map the webpage text into a webpage feature vector D=(d1, d2, ..., dn) and obtain the corresponding webpage feature weighted vector WD=(wd1, wd2, ..., wdn), where wdi (i=1, 2, ..., n) represents the weight of the ith feature word and is computed by the improved term frequency–inverse document frequency (TF-IDF) (Wu YL et al., 2017). Its expression is shown as follows:

    Here, K=5, tfi,k represents the normalized TF of the ith topic word at the kth position (group) of the webpage text, maxfi,k represents the maximum TF of the ith topic word over all label groups, and Wk represents the weight of the kth group of labels. We adopt VSM (Farag et al., 2018) to calculate the topic relevance R(p) of webpage p. Its expression is shown in Eq. (4):
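
    The images for Eq. (3) (the improved TF-IDF weight) and Eq. (4) (the topic relevance of webpage p) are not reproduced above. Under the definitions just given, a plausible reading combines the position-weighted TF with a standard IDF term, followed by the usual cosine form; the exact IDF factor in the authors' Eq. (3) may differ:

        wd_i = \Big( \sum_{k=1}^{K} W_k \cdot \frac{tf_{i,k}}{\max f_{i,k}} \Big) \cdot \mathrm{idf}_i, \qquad K = 5,

        R(p) = \cos(W_T, W_D) = \frac{\sum_{i=1}^{n} wt_i \, wd_i}{\sqrt{\sum_{i=1}^{n} wt_i^{2}} \, \sqrt{\sum_{i=1}^{n} wd_i^{2}}}.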

    VSM is a well-known measure of cosine similarity and transforms a language problem into a mathematical problem. The cosine similarity between two vectors is taken as the similarity of the text to the given topic. When the angle between the two vectors is 0°, the relevance between them is the maximum and equals 1, indicating that they are the most relevant. When the angle is 90°, the relevance is the minimum and equals 0, indicating that they are irrelevant. Assume that the threshold of webpage topic relevance is α. If R(p)>α, webpage p is considered topic-relevant.

    3.2 Topic relevance of anchor text

    The anchor text usually contains only a few words or phrases, but it is an important resource for predicting the relevance of the webpage to which the hyperlink points. Generally, the TF-IDF model (Wu YL et al., 2017) is used to evaluate the importance of keywords. However, it is not comprehensive to use TF alone to measure the importance of a word in the whole anchor text. Therefore, we use the improved BM25 model (Wu TY, 2018) to evaluate the importance of keywords in the anchor text. It retains the important IDF indicator of the TF-IDF model and improves the computation of TF. The BM25 algorithm is generally used to evaluate the relevance between words and documents. In this study, we use the BM25 algorithm to obtain the weights of words in the anchor text. The weight wai of the ith topic word in the anchor text is calculated as follows:

    Here, N is the number of crawled webpages, Ni denotes the number of webpages containing the ith topic word, a>1, m represents the number of webpages containing the anchor text of the considered hyperlink, k=2 and b=0.75 are adjustment factors, dlj represents the length (i.e., the number of words) of the jth webpage containing the anchor text, avgdl is the average length of all crawled webpages, and fi,j denotes the frequency of the ith topic word in the anchor text located in the jth webpage. After obtaining the anchor text feature weighted vector WA=(wa1, wa2, ..., wan), we calculate the cosine similarity between the topic semantic weighted vector WT and the anchor text feature weighted vector WA to obtain the topic relevance R(Al) of anchor text Al. The topic relevance of anchor text Al is computed as follows:

    where A=(a1, a2, ..., an) denotes the anchor text feature vector.
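
    The images for Eqs. (5) and (6) are not reproduced above. Following the standard BM25 form and the symbols just defined (treating a as the base of the logarithm is an assumption), a plausible reading is:

        wa_i = \sum_{j=1}^{m} \log_a\!\frac{N - N_i + 0.5}{N_i + 0.5} \cdot \frac{f_{i,j}\,(k+1)}{f_{i,j} + k\big(1 - b + b\,\frac{dl_j}{\mathrm{avgdl}}\big)}, \qquad k = 2,\ b = 0.75,

        R(A_l) = \cos(W_T, W_A) = \frac{\sum_{i=1}^{n} wt_i \, wa_i}{\sqrt{\sum_{i=1}^{n} wt_i^{2}} \, \sqrt{\sum_{i=1}^{n} wa_i^{2}}}.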

    3.3 Improved PageRank value computation

    The PageRank algorithm is an essential algorithm for evaluating unvisited hyperlinks. For webpage p, the traditional calculation formula of the PageRank (PR) value is

    Here, d is the damping factor and is set to 0.85, h represents the total number of in-links of webpage p, pi is the ith in-link webpage of webpage p, PR(pi) denotes the PR value of webpage pi, and C(pi) represents the total number of out-links of webpage pi. To avoid the topic drifting of the traditional PR calculation, referring to Ma et al. (2016), we integrate the anchor text topic relevance into the PR value calculation and propose an improved PR value calculation method for webpage p, which is shown as follows:

    Here, ω represents an adjustment factor and is set to 0.6 in this study, and R(Ai) represents the topic relevance of anchor text Ai of the ith in-link of webpage p (Section 3.2).
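
    The images for Eqs. (7) and (8) are not reproduced above. Eq. (7) is the classical PageRank recurrence; for Eq. (8), one plausible way to blend the anchor-text relevance into it with the factor ω, consistent with the description (the authors' exact combination may differ), is:

        \mathrm{PR}(p) = (1 - d) + d \sum_{i=1}^{h} \frac{\mathrm{PR}(p_i)}{C(p_i)}, \qquad d = 0.85,

        \mathrm{PR}'(p) = (1 - d) + d \sum_{i=1}^{h} \frac{\mathrm{PR}(p_i)}{C(p_i)} \big(\omega + (1 - \omega)\,R(A_i)\big), \qquad \omega = 0.6.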

    3.4 Topic relevance evaluation of hyperlinks

    A comprehensive priority evaluation method is given to evaluate the topic relevance of an unvisited hyperlink l. Its expression is shown as follows:

    Here, r1, r2, and r3 represent weighting factors and satisfy r1+r2+r3=1, P(l) represents the comprehensive priority value of the unvisited hyperlink l, R(Al) represents the topic relevance of anchor text Al of hyperlink l, R(pi) represents the topic relevance of webpage pi that contains hyperlink l, m is the number of webpages containing hyperlink l, and PR(pl) is the PR value of webpage pl containing hyperlink l. To filter irrelevant hyperlinks, we set a comprehensive priority threshold β. If P(l)≥β, the unvisited hyperlink l is considered topic-relevant and is added to the waiting queue (Qwait).
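
    The image for Eq. (9) is not reproduced above. From the definitions of r1, r2, r3, m, and the three relevance signals, the weighted sum assumed here is:

        P(l) = r_1\,R(A_l) + \frac{r_2}{m} \sum_{i=1}^{m} R(p_i) + r_3\,\mathrm{PR}(p_l), \qquad r_1 + r_2 + r_3 = 1.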

    4 Focused crawler based on tabu search with ontology and host information

    In this section, we first introduce the tabu search (TS) algorithm and subsequently propose the improved tabu search (ITS) algorithm by modifying the tabu object and acceptance principles. Finally, by incorporating the domain ontology and host information memory into the focused crawling strategy based on ITS, a new focused crawler using ITS with ontology and host information (FCITS_OH) is proposed.

    4.1 Tabu search algorithm

    The TS algorithm was first proposed by Fred Glover. The TS algorithm (Liu JF et al., 2021) is in essence a random heuristic algorithm based on local search. It generates new candidate solutions in the neighborhood of the current solution. The basic flow of TS is as follows: (1) Given an initial solution, select some candidate solutions from the neighborhood of the current solution. (2) If the objective function value of the optimal candidate solution is better than that of the current optimal solution, ignore its tabu property, replace the current solution and the current optimal solution with the optimal candidate solution, add it to the tabu list, and simultaneously update the term of each object in the tabu list; otherwise, select the nontabu optimal solution from the candidate solutions as the new current solution, add it to the tabu list, and update the term of each object in the tabu list. (3) Repeat the above process until the algorithm meets the ending condition. The TS algorithm involves several related elements, such as the objective function, neighborhood, tabu list, and aspiration criterion, which directly affect the optimization performance of the algorithm.

    4.2 Objective function

    The objective function is also called the fitness function, which is used to compute the objective value of a solution. In the focused crawler, the objective function is expressed by the comprehensive priority of hyperlink l (Eq. (9)), and P(l) represents the objective function value.

    4.3 Neighborhood set and extended neighborhood set

    Definition 1 (Neighborhood set) The set of all hyperlinks in the webpage to which the current hyperlink Plink points is called the neighborhood set of Plink, denoted as N(Plink).

    Definition 2 (Candidate neighborhood set) The set of hyperlinks with a comprehensive priority higher than the threshold β, located in the webpage to which the current hyperlink Plink points, is called the candidate neighborhood set of Plink, denoted as C(Plink). Obviously, C(Plink) ⊆ N(Plink).

    Definition 3 (Extended neighborhood set) The set of hyperlinks whose comprehensive priority is higher than the threshold β in the webpage where the current hyperlink Plink is located is called the extended neighborhood set of Plink, denoted as E(Plink).

    In the entire crawling process, the traditional neighborhood search range considers only hyperlinks in the webpage to which the current hyperlink Plink points, i.e., the neighborhood set or the candidate neighborhood set. To expand the search range of the crawler, our ITS algorithm extends the neighborhood set to the extended neighborhood set. After the candidate neighborhood set of the current hyperlink Plink has been accessed a specified number of times without finding a suitable hyperlink, the next hyperlink is selected from the extended neighborhood set.

    4.4 Tabu list

    The tabu list contains the tabu objects and the tabu length. A tabu object is an object stored in the tabu list. When updating the crawler queue based on the neighborhood set N(Plink), it is possible for the crawler to repeatedly select a certain hyperlink Plink with the highest comprehensive priority. To avoid this, in the traditional TS algorithm, if the comprehensive priority of Plink is higher than that of the current optimal hyperlink, the algorithm ignores its tabu property, replaces the current optimal hyperlink and the current hyperlink with Plink, and at the same time sets it as a tabu object and puts it into the tabu list; otherwise, the nontabu hyperlink with the highest comprehensive priority from N(Plink) is selected as the current hyperlink and regarded as a new tabu object. In the ITS algorithm, however, we do not consider whether the current hyperlink Plink is a tabu object. As long as each of the comprehensive priorities of five randomly selected hyperlinks from C(Plink) is lower than Plink's comprehensive priority, we set Plink as a tabu object, put Plink into the tabu list, and then select a nontabu hyperlink with the highest comprehensive priority from E(Plink) as the current hyperlink. Obviously, when the hyperlink is selected from E(Plink), Plink is not selected again. This improved tabu object strategy not only gives the current hyperlink more opportunities to select the next hyperlink with a better comprehensive priority from the candidate neighborhood set, but also effectively extends the search range of the crawler through the extended neighborhood set.

    The tabu length denotes the maximum number of iterations during which a tabu object remains in the tabu list and cannot be picked out, unless the aspiration criterion applies. In this study, the tabu length is set to five.

    4.5 Aspiration criterion and improved acceptance principles

    The aspiration criterion means that when a tabooed hyperlink has a higher comprehensive priority than the current optimal hyperlink, the tabu property of this tabooed hyperlink will be ignored, and it will be accepted as the current hyperlink. In the traditional TS algorithm, when the tabooed hyperlink does not satisfy the aspiration criterion, the nontabu hyperlink with the highest comprehensive priority is selected from the neighborhood set as the current hyperlink (ignoring its comparison with the current hyperlink). This method easily accepts hyperlinks with a low comprehensive priority. The ITS algorithm refines the acceptance principles by the following steps while retaining the aspiration criterion:

    1. If the hyperlink Glink selected from C(Plink) is a tabu object and satisfies the aspiration criterion, Glink will be released and accepted as the current hyperlink Plink.

    2. If hyperlink Glink is a tabu object and does not satisfy the aspiration criterion, Glink will not be accepted as the current hyperlink. Thereafter, a new hyperlink is randomly selected from C(Plink). If its comprehensive priority is higher than that of the current hyperlink, it will be accepted as the new current hyperlink; otherwise, another hyperlink will be selected from C(Plink) and judged for acceptance. This process is repeated up to five times until a selected hyperlink is accepted. If none of the five can be accepted, we set the hyperlink Plink as a tabu object and put it into the tabu list. Then, select a nontabu hyperlink with the highest comprehensive priority from E(Plink) as the current hyperlink Plink. Update the tabu list and release the objects whose term is 0.

    3. If hyperlink Glink is not a tabu object and its comprehensive priority is higher than that of the current hyperlink Plink, Glink will be accepted as the current hyperlink Plink.

    4. If hyperlink Glink is not a tabu object and its comprehensive priority is not higher than that of the current hyperlink Plink, Glink will not be accepted as the current hyperlink. Thereafter, five different hyperlinks are selected from C(Plink), as in step 2. If the comprehensive priority of a selected hyperlink is higher than that of the current hyperlink, this hyperlink is accepted as the new current hyperlink. If none of them can be accepted as the current hyperlink, we set the hyperlink Plink as a tabu object and put it into the tabu list. Then, select a nontabu hyperlink with the highest comprehensive priority from E(Plink) as the current hyperlink Plink. Update the tabu list and release the objects whose term is 0.

    4.6 Focused crawler based on the improved tabu search algorithm

    The ITS algorithm is obtained by improving the tabu object and acceptance principles of the traditional TS algorithm. The ITS algorithm is applied to determine the next hyperlink to be visited from the waiting queue Qwait.

    First, initialize the tabu list H1. Suppose that Hlink is the current optimal hyperlink and that Plink is the current hyperlink selected randomly from Qwait. Construct a candidate neighborhood set C(Plink) and an extended neighborhood set E(Plink) based on the current hyperlink Plink. Randomly select a hyperlink Glink from C(Plink) as a candidate hyperlink. Then, judge whether Glink is accepted according to the improved acceptance principles. If it is accepted, replace the current hyperlink Plink with Glink and output the hyperlink Plink. If it is not accepted, select another candidate hyperlink Glink from C(Plink) and continue the judgment process. If five different candidate hyperlinks are selected and none of them is accepted, we set Plink as a tabu object and put it into tabu list H1. Reselect a nontabu hyperlink with the highest comprehensive priority from the extended neighborhood set E(Plink) as the current hyperlink. Update the tabu list H1 by subtracting 1 from the term of each tabu object in the list, and release the objects whose term is 0. The above iteration process is repeated until a hyperlink is accepted. The detailed process of the ITS(Qwait) algorithm is presented in Algorithm A1 of the Appendix.
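
    To make this selection loop concrete, the minimal Java sketch below mirrors it under stated assumptions: Hyperlink, priority(), candidateNeighborhood(), and extendedNeighborhood() are hypothetical placeholders, and the acceptance test is a simplified combination of the aspiration criterion and the priority comparison of Section 4.5, not the authors' exact implementation (Algorithm A1).

        import java.util.*;

        // Sketch of the ITS hyperlink-selection step described above. Hyperlink,
        // priority(), candidateNeighborhood() and extendedNeighborhood() are
        // illustrative placeholders, not the authors' actual classes.
        class ItsSelector {
            private final Map<Hyperlink, Integer> tabuList = new HashMap<>(); // tabu object -> remaining term
            private static final int TABU_LENGTH = 5;
            private static final int MAX_TRIES = 5;

            Hyperlink selectNext(Deque<Hyperlink> qWait, Hyperlink best, Random rnd) {
                Hyperlink current = qWait.peek();                        // current hyperlink Plink
                List<Hyperlink> cand = candidateNeighborhood(current);   // C(Plink)
                for (int i = 0; i < MAX_TRIES && !cand.isEmpty(); i++) {
                    Hyperlink g = cand.get(rnd.nextInt(cand.size()));    // candidate Glink
                    boolean tabu = tabuList.containsKey(g);
                    boolean aspiration = g.priority() > best.priority(); // aspiration criterion
                    if ((tabu && aspiration) || (!tabu && g.priority() > current.priority())) {
                        decayTabuTerms();
                        return g;                                        // accepted as the new current hyperlink
                    }
                }
                // No candidate accepted after five tries: tabu the current hyperlink and
                // fall back to the best nontabu link in the extended neighborhood E(Plink).
                tabuList.put(current, TABU_LENGTH);
                decayTabuTerms();
                return extendedNeighborhood(current).stream()
                        .filter(h -> !tabuList.containsKey(h))
                        .max(Comparator.comparingDouble(Hyperlink::priority))
                        .orElse(current);
            }

            private void decayTabuTerms() {
                tabuList.replaceAll((h, term) -> term - 1);
                tabuList.values().removeIf(term -> term <= 0);           // release objects whose term is 0
            }

            // Placeholders for the neighborhood construction described in Section 4.3.
            List<Hyperlink> candidateNeighborhood(Hyperlink p) { return List.of(); }
            List<Hyperlink> extendedNeighborhood(Hyperlink p) { return List.of(); }
        }

        interface Hyperlink { double priority(); }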

    4.7 Focused crawler combining ontology and the improved tabu search algorithm

    By introducing the ITS algorithm into the focused crawler and using the domain ontology to describe the topic, we design a focused crawling strategy combining ontology and the ITS algorithm (FCOITS), which is used to fetch topic-relevant webpages from the Internet.

    First, determine the topic, build the domain ontology for this topic, and add the seed uniform resource locators (URLs) to Qwait. Suppose that α is the threshold of topic-relevant webpages and that β is the threshold of the hyperlink's comprehensive priority. Then, the ITS(Qwait) algorithm is used to select the next hyperlink phead to visit and download the webpage phead-page to which the hyperlink phead points. If R(phead-page)>α, it is considered a topic-relevant webpage; otherwise, it is considered an irrelevant webpage. Subsequently, all hyperlinks in the webpage phead-page are extracted and added to the set of child-links. Calculate the comprehensive priority of every hyperlink child-linki in child-links based on Eq. (9). If P(child-linki)>β, add child-linki to Qwait; otherwise, discard it. The above iteration process is repeated until the end conditions are met. Fig. 1 shows the flowchart of the proposed FCOITS algorithm. The detailed process of the FCOITS algorithm is presented in Algorithm A2 of the Appendix.

    Fig. 1 Flowchart of the proposed FCOITS algorithm (DP: number of downloaded webpages; LP: number of downloaded topic-relevant webpages; Qwait: waiting queue)
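
    The crawl loop of Fig. 1 can be sketched in Java as below; selectNext(), download(), topicRelevance(), and priority() are hypothetical stand-ins for the components described in Sections 3 and 4 (Algorithm A1 and Eqs. (4) and (9)), not the authors' actual code.

        import java.util.*;

        // Sketch of the FCOITS crawl loop (Fig. 1). The helper methods are placeholders
        // for the components defined earlier: selectNext() is the ITS(Qwait) step,
        // topicRelevance() is Eq. (4), priority() is Eq. (9).
        public class FcoitsCrawler {

            record Webpage(String url, String text, List<String> outLinks) {}

            public void crawl(Deque<String> qWait, double alpha, double beta, int maxPages) {
                int dp = 0, lp = 0;                               // downloaded / topic-relevant counts
                while (!qWait.isEmpty() && dp < maxPages) {
                    String head = selectNext(qWait);              // ITS(Qwait): next hyperlink to visit
                    Webpage page = download(head);
                    dp++;
                    if (topicRelevance(page) > alpha) lp++;       // R(phead-page) > alpha
                    for (String child : page.outLinks()) {
                        if (priority(child, page) > beta) qWait.add(child); // P(child-link) > beta
                    }
                }
                System.out.printf("DP=%d, LP=%d, AC=%.4f%n", dp, lp, lp / (double) Math.max(dp, 1));
            }

            // Placeholders only; the real versions are described in Sections 3 and 4.
            String selectNext(Deque<String> qWait) { return qWait.poll(); }
            Webpage download(String url) { return new Webpage(url, "", List.of()); }
            double topicRelevance(Webpage p) { return 0.0; }
            double priority(String link, Webpage parent) { return 0.0; }
        }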

    4.8 Focused crawler combining the FCOITS algorithm and host information

    It is possible for the crawler to recursively crawl under a few hosts, resulting in premature convergence and limiting its ability to retrieve more topic-relevant webpages. The hostgraph (Jiang and Zhang, 2007) reveals the connection between hyperlinks and common hosts. For example, "klme.nuist.edu.cn" in the hyperlink "https://klme.nuist.edu.cn/index.htm" is the host, and any hyperlink that contains it is under the same host. In this study, we analyze the hyperlink's syntactic structure, leverage the host of the hyperlink, and propose a new focused crawler that integrates host information into FCOITS, called FCITS_OH.

    At the beginning of FCITS_OH, hyperlinks that are located under different hosts and have higher comprehensive priorities are selected as seed hyperlinks to avoid premature convergence of the algorithm. Then, put the selected hyperlinks into Qwait. Apply the ITS(Qwait) algorithm to obtain the head hyperlink phead of Qwait, whose host is denoted by phead_host. Suppose that the number of hosts in Qwait is QN_host and the number of downloaded webpages is DP. The algorithm ends when DP reaches 15 000. The algorithm completion rate is defined by com_rate=DP/15 000. During crawling, if some hyperlinks are visited many times under the same host, the crawler will select another hyperlink in Qwait located at a host different from the current one. To avoid the crawler circularly crawling under a few hosts, a tabu list H2 for hosts is defined. The following four steps are then applied: (1) If phead_host is a tabooed host, call the ITS(Qwait) algorithm to obtain another head hyperlink phead of Qwait. (2) If com_rate<0.3 and QN_host<10, select three hyperlinks in descending order of comprehensive priority from the discarded hyperlinks whose hosts do not belong to the set of hosts in Qwait, and add them into Qwait. This is conducive to expanding the number of hosts of hyperlinks in Qwait. (3) If the number of visited hyperlinks under the current host phead_host is smaller than 50, continue to visit other hyperlinks under the current host phead_host; otherwise, compute the percentage ph_ratio (the ratio of the number of topic-relevant webpages to the number of all visited webpages under the current host phead_host). (4) If the number of visited hyperlinks under the current host phead_host is smaller than 100 and ph_ratio>0.8, continue to visit other hyperlinks under the current host phead_host; otherwise, set phead_host as a tabooed host and put it into H2. After the head hyperlink phead is obtained, continue with the remaining steps of Algorithm A2 until the end conditions are met. The detailed process of the FCITS_OH algorithm is presented in Algorithm A3 of the Appendix.
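
    The host bookkeeping in steps (1), (3), and (4) can be sketched as below; the class and method names are hypothetical, and the thresholds simply restate the values given above (Algorithm A3 contains the authors' full procedure).

        import java.net.URI;
        import java.util.*;

        // Sketch of the host-information memory added in FCITS_OH. Thresholds
        // (50/100 visits, ph_ratio > 0.8, com_rate < 0.3, QN_host < 10) follow the
        // description above; the class and method names are illustrative only.
        public class HostMemory {
            private final Set<String> tabuHosts = new HashSet<>();          // tabu list H2
            private final Map<String, Integer> visited = new HashMap<>();   // visits per host
            private final Map<String, Integer> relevant = new HashMap<>();  // topic-relevant hits per host

            static String hostOf(String url) {                 // e.g., "klme.nuist.edu.cn"
                return URI.create(url).getHost();
            }

            public boolean isTabu(String url) {                // step (1): skip tabooed hosts
                return tabuHosts.contains(hostOf(url));
            }

            // Record one downloaded page and decide whether its host becomes tabooed (steps (3) and (4)).
            public void record(String url, boolean topicRelevant) {
                String host = hostOf(url);
                visited.merge(host, 1, Integer::sum);
                if (topicRelevant) relevant.merge(host, 1, Integer::sum);
                int n = visited.get(host);
                double phRatio = relevant.getOrDefault(host, 0) / (double) n;
                if (n >= 50 && !(n < 100 && phRatio > 0.8)) {
                    tabuHosts.add(host);                        // stop circling under this host
                }
            }

            // Step (2): below 30% completion with fewer than 10 hosts in Qwait,
            // re-admit a few high-priority discarded links from unseen hosts.
            public boolean shouldExpandHosts(double comRate, int qnHost) {
                return comRate < 0.3 && qnHost < 10;
            }
        }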

    5 Experimental results and analysis

    In this study, the initial seed hyperlinks are acquired from Baidu, which is the most authoritative and widely used search engine in China. We obtain webpages by searching the keywords "tourism" and "rainstorm disaster" separately, and choose the 30 top-ranked webpages as the initial seed hyperlinks in the tourism domain and the rainstorm disaster domain (Tables S1 and S2 in the supplementary materials).

    In addition, some important parameters (α and β) have a great impact on the experimental results. For example, if the topic relevance threshold α is too high, the number of crawled topic-relevant webpages will be reduced because some topic-relevant webpages are filtered out. If the threshold α is too low, some irrelevant webpages will be wrongly considered topic-relevant. We conducted parameter experiments on different values of α in the range of 0.5–0.8 based on the lattice search method in different domains, referring to Liu WJ and Du (2014). The results show that when α=0.7 in the tourism domain and α=0.62 in the rainstorm disaster domain, the crawler correctly captures topic-relevant webpages and achieves the best performance. The other parameters are set similarly. Here, we set β=0.19 for the tourism domain and β=0.15 for the rainstorm disaster domain. In addition, r1=0.55, r2=0.25, and r3=0.20.

    5.1 Performance metrics

    The effectiveness of focused crawlers can generally be evaluated by accuracy (AC) and recall (RC). AC equals the ratio of the number of downloaded topic-relevant webpages to the total number of downloaded webpages. RC equals the ratio of the number of downloaded topic-relevant webpages to the total number of all topic-relevant webpages on the Internet. Because it is difficult to count the total number of topic-relevant webpages on the Internet, in this study we do not use RC as an evaluation metric. In addition, we use the average topic relevance (AR) and the standard deviation (SD) of downloaded webpages as evaluation metrics. These three metrics are as follows:

    Here, SD is the standard deviation of the topic relevance of all downloaded webpages with respect to AR, used to measure the spread of the topic relevance of all downloaded webpages. The value of SD is in [0, 1].
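
    The formula images for the three metrics are not reproduced above. From the definitions of AC, AR, and SD, the standard forms assumed here are:

        \mathrm{AC} = \frac{\mathrm{LP}}{\mathrm{DP}}, \qquad \mathrm{AR} = \frac{1}{\mathrm{DP}} \sum_{i=1}^{\mathrm{DP}} R(p_i), \qquad \mathrm{SD} = \sqrt{\frac{1}{\mathrm{DP}} \sum_{i=1}^{\mathrm{DP}} \big(R(p_i) - \mathrm{AR}\big)^{2}},

    where DP is the number of downloaded webpages, LP is the number of downloaded topic-relevant webpages, and R(pi) is the topic relevance of the ith downloaded webpage.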

    5.2 Experimental results of different crawlers

    In this study, we first test six focused crawling algorithms in the tourism domain and then test seven focused crawling algorithms in the rainstorm disaster domain under the same experimental environment, including BFS (Li et al., 2015), OPS (Rawat and Patil, 2013), the focused crawler based on the simulated annealing algorithm (FCSA) (Liu JF et al., 2019), FCWSEO (Liu JF et al., 2022b), OLMOACO (Liu JF et al., 2022a), FCOITS, and FCITS_OH. The last two algorithms are proposed in this study. We implement all crawling algorithms in Java and run them on an Intel Core i7-7700 PC with a 3.6 GHz CPU and 8.0 GB RAM. When the number of downloaded webpages reaches 15 000, all algorithms tend to be stable and terminate. The same evaluation metrics are used to test the different crawling algorithms on the two topics of tourism and rainstorm disaster, which is conducive to investigating the validity, superiority, and adaptability of each algorithm.

    5.2.1 Experimental results in the tourism domain

    Experimental results of LP, AC, AR, and SD obtained by six crawling algorithms, including BFS, OPS, FCSA, FCWSEO, FCOITS, and FCITS_OH, in the tourism domain are shown in Figs. 2–5 for comparison. Fig. 2 shows the results of LP obtained by the six crawling algorithms in the tourism domain. With the increase in the number of downloaded webpages, the LP of the five crawling algorithms other than BFS increases rapidly. Obviously, the LP obtained by FCITS_OH is significantly greater than that of the five other crawling algorithms; the LP obtained by FCITS_OH is 13 082 when DP reaches 15 000. Fig. 3 shows the results of AC obtained by the six crawling algorithms in the tourism domain. It is not hard to see from the figure that the AC of FCITS_OH becomes higher than that of the five other crawling algorithms after DP exceeds 8000. The AC of the BFS, OPS, FCSA, FCWSEO, FCOITS, and FCITS_OH crawling algorithms is 0.3740, 0.6820, 0.7503, 0.8086, 0.8453, and 0.8721, respectively, when DP reaches 15 000.

    Fig. 2 Results of LP obtained by six crawling algorithms in the tourism domain (LP: number of downloaded topic-relevant webpages; DP: number of downloaded webpages)

    Fig. 3 Results of AC obtained by six crawling algorithms in the tourism domain (AC: accuracy; DP: number of downloaded webpages)

    Fig. 4 shows the results of AR obtained by the six crawling algorithms in the tourism domain. According to Fig. 4, the AR of FCITS_OH is obviously higher than that of the five other crawling algorithms after DP exceeds 8000. The AR of the BFS, OPS, FCSA, FCWSEO, FCOITS, and FCITS_OH crawling algorithms is 0.4247, 0.6966, 0.7292, 0.7553, 0.7806, and 0.7912, respectively, when DP reaches 15 000. Fig. 5 shows the results of SD obtained by the six crawling algorithms in the tourism domain. From Fig. 5, FCITS_OH maintains a low SD throughout the crawling process. The SD of the BFS, OPS, FCSA, FCWSEO, FCOITS, and FCITS_OH crawling algorithms stabilizes at 0.2848, 0.2317, 0.1769, 0.1293, 0.1413, and 0.1340, respectively, when DP reaches 15 000. The SD reflects the stability of the topic relevance of the webpages captured by each algorithm: the lower the SD, the more stable the algorithm. Although the SD of FCITS_OH is slightly higher than that of FCWSEO, FCITS_OH outperforms FCWSEO in the other evaluation metrics.

    Fig. 4 Results of AR obtained by six crawling algorithms in the tourism domain (AR: average topic relevance; DP: number of downloaded webpages)

    Fig. 5 Results of SD obtained by six crawling algorithms in the tourism domain (SD: standard deviation; DP: number of downloaded webpages)

    5.2.2 Experimental results in the rainstorm disaster domain

    Experimental results obtained by seven crawling algorithms, including BFS, OPS, FCSA, FCWSEO, OLMOACO, FCOITS, and FCITS_OH, in the rainstorm disaster domain for four evaluation metrics (LP, AC, AR, and SD) are shown in Figs. 6–9. Fig. 6 shows the results of LP obtained by the seven crawling algorithms in the rainstorm disaster domain. From Fig. 6, we find that when DP reaches 15 000, FCITS_OH obtains 12 393 topic-relevant webpages, indicating that FCITS_OH can collect more topic-relevant webpages than the six other crawling algorithms. Fig. 7 shows the results of AC obtained by the seven crawling algorithms in the rainstorm disaster domain. From Fig. 7, we find that the AC of FCITS_OH tends to stabilize gradually after DP exceeds 10 000. Finally, the AC of the BFS, OPS, FCSA, FCWSEO, OLMOACO, FCOITS, and FCITS_OH crawling algorithms is 0.2366, 0.6542, 0.7004, 0.8103, 0.7417, 0.7969, and 0.8262, respectively. Compared with the six other crawling algorithms, FCITS_OH has a higher AC.

    Fig. 6 Results of LP obtained by seven crawling algorithms in the rainstorm disaster domain (LP: number of downloaded topic-relevant webpages; DP: number of downloaded webpages)

    Fig. 7 Results of AC obtained by seven crawling algorithms in the rainstorm disaster domain (AC: accuracy; DP: number of downloaded webpages)

    Fig. 8 shows the results of AR obtained by the seven crawling algorithms in the rainstorm disaster domain. Throughout the crawling process, the AR of FCITS_OH is relatively high and stable among the seven crawling algorithms. Finally, the AR of the BFS, OPS, FCSA, FCWSEO, OLMOACO, FCOITS, and FCITS_OH crawling algorithms stabilizes at 0.2947, 0.6376, 0.6627, 0.8200, 0.7781, 0.7306, and 0.7421, respectively. Fig. 8 shows that although the AR of FCWSEO and OLMOACO is slightly higher than that of FCITS_OH, FCITS_OH grabs topic-relevant webpages relatively stably.

    Fig. 8 Results of AR obtained by seven crawling algorithms in the rainstorm disaster domain (AR: average topic relevance; DP: number of downloaded webpages)

    Fig. 9 shows the results of SD obtained by the seven crawling algorithms in the rainstorm disaster domain. The SD of FCITS_OH maintains a downward trend in the whole crawling process. Finally, the SD of the BFS, OPS, FCSA, FCWSEO, OLMOACO, FCOITS, and FCITS_OH crawling algorithms is 0.3096, 0.2599, 0.1953, 0.1570, 0.1375, 0.1495, and 0.1444, respectively. Although the SD of FCITS_OH is slightly higher than that of OLMOACO, the two are comparable.

    Fig. 9 Results of SD obtained by seven crawling algorithms in the rainstorm disaster domain (SD: standard deviation; DP: number of downloaded webpages)

    5.3 Analysis and discussion

    From Figs. 2, 3, 4, 6, 7, and 8, it is not hard to find that the OPS algorithm performs better in the early crawling stage in both the tourism and rainstorm disaster domains, but its performance degrades in the later crawling stage as a result of its greedy strategy. The OPS algorithm always selects the highest-priority hyperlink from the waiting queue to crawl the next webpage. When it falls into a choice of a hyperlink with no prospects, the webpage to which it points may contain few valuable hyperlinks, which is not conducive to expanding the search range. The FCSA algorithm is also a kind of greedy strategy but modifies the optimal search by accepting hyperlinks with relatively low priority with a certain probability. However, the performance of the FCSA algorithm depends highly on its parameters, such as the initial temperature and annealing speed, which are difficult to determine. Therefore, the ability of the FCSA algorithm to grab topic-relevant webpages is only slightly higher than that of the BFS and OPS algorithms.

    Fig. 3 shows that the FCITS_OH algorithm outperforms the FCWSEO algorithm on AC in the tourism domain, and it can also be seen from Fig. 7 that the FCITS_OH algorithm outperforms the FCWSEO and OLMOACO algorithms on AC in the rainstorm disaster domain. The FCWSEO and OLMOACO algorithms grow fast in the early stage and tend to stabilize without improvement later in the crawling process. This is because the FCWSEO algorithm is a kind of multiobjective optimization algorithm that produces nondominated hyperlinks within circular regions. However, as the circular regions expand, it can easily catch hyperlinks with no prospects compared with the adjacent hyperlinks, which affects the crawling performance. For the OLMOACO algorithm, it is easier to accumulate pheromones to find an optimal search path at the beginning, but as the crawl proceeds, the feedback mechanism makes it difficult to improve the pheromone of the optimal path again. As a result, it is challenging to continue to enhance its ability to fetch more topic-relevant webpages. The FCITS_OH algorithm uses the tabu object and aspiration criterion to avoid crawling visited hyperlinks and introduces host information to expand the search range, so it is easier for it to find the optimal crawling path and fetch more topic-relevant hyperlinks during the entire crawling process.

    The specific values of all evaluation metrics and the running time of the abovementioned seven crawling algorithms in the tourism domain and the rainstorm disaster domain when DP reaches 15 000 are displayed in Table 2. From Table 2, we can find that the running time of BFS is the shortest, while FCWSEO and OLMOACO require longer running times than the other crawling algorithms in the tourism domain and the rainstorm disaster domain, respectively. This is because FCWSEO and OLMOACO are multiobjective optimization crawling algorithms, where the optimization process of hyperlink selection based on a multiobjective optimization model increases the time consumption. The running time of FCITS_OH is slightly longer than that of the other crawling algorithms except FCWSEO and OLMOACO, because constructing the ontology and extracting host information take extra running time.

    To further investigate the effectiveness of the improved tabu object and acceptance principles in the ITS algorithm, we design the focused crawler based on the improved tabu search algorithm (FCITS) and the focused crawler based on the traditional tabu search algorithm (FCTS). For convenience of presentation, Table 2 also shows the LP, AC, AR, SD, and running time of FCITS and FCTS in the tourism and rainstorm disaster domains when DP reaches 15 000. We find that the experimental results of FCITS for all evaluation metrics except the running time are better than those of FCTS, which further confirms the effectiveness of the improved strategies in the ITS algorithm. With regard to the running time of the ITS algorithm, we analyze the time complexity of the ITS algorithm and the TS algorithm in the crawling process as follows.

    Suppose that there are m hyperlinks in the waiting queue Qwait. The time complexity of selecting a hyperlink Plink from Qwait is O(m). The time consumption of selecting a hyperlink Glink from the neighborhood C(Plink) is assumed to be k1~O(m). The time consumption for determining the tabu object is constant, and the time complexity of computing the topic relevance of the hyperlink is k2×O(DP×n). Here, k2 is the time consumption of word segmentation, word frequency statistics, and link extraction from webpages; O(DP×n) represents the time complexity of calculating R(pi), PR(pl), and R(Al); DP and n are the number of downloaded webpages and the number of topic words, respectively. The time consumption of selecting a hyperlink from the extended neighborhood set is assumed to be k3~O(m). Therefore, the time complexity of the ITS algorithm can be expressed as O(m)×[k1×k2×O(DP×n)×k3]. Because k1~O(m), k2~O(n), and k3~O(m), the time complexity of the ITS algorithm is O(m^3×DP×n^2).

    Different from the ITS algorithm, the TS algorithm selects a nontabu link with the best comprehensive priority from the neighborhood C(Plink) when the tabooed hyperlink does not satisfy the aspiration criterion, so its time complexity is O(m^2×DP×n^2). By comparing the time complexities of ITS and TS, it can be seen that the time complexity of the ITS algorithm is higher than that of the TS algorithm, which results in a longer running time for the FCITS algorithm than for the FCTS algorithm.

    It can be seen from Table 2 that not all evaluation metrics of FCITS_OH have the optimal results. To better evaluate the effectiveness and superiority of FCITS_OH, the Friedman test (Derrac et al., 2011), a nonparametric statistical test, is used to comprehensively evaluate the performance of these algorithms. In this study, when DP=15 000, the results obtained by the nine crawling algorithms for the four representative metrics (LP, AC, AR, and SD) are converted to average ranks. The best-performing algorithm for each metric has rank 1, the second best rank 2, and so on; the smaller the average rank, the better the performance. Table 3 displays the experimental results of the nine crawling algorithms based on the four evaluation metrics by the Friedman test when DP reaches 15 000. From Table 3, we can find that the FCITS_OH algorithm is the best-performing algorithm among the nine algorithms in the two domains in terms of the four metrics. In summary, the experimental results show that FCITS_OH achieves impressive and satisfactory results in most performance evaluation metrics, particularly prevailing over the other eight crawlers in LP and AC. Therefore, we can conclude that the proposed FCITS_OH crawler is an effective semantic retrieval method.

    Table 3 Friedman ranks of nine crawling algorithms for the four representative evaluation metrics in the tourism and rainstorm disaster domains when DP reaches 15 000

    6 Conclusions

    The drawback of traditional crawlers is that they cannot provide enough topic-relevant information for a specific domain. To overcome the shortcomings of traditional crawlers, this paper focuses on focused crawlers. We propose a novel focused crawling algorithm, namely FCITS_OH. Specifically, we construct a domain ontology based on the FCA method for topic description at the semantic and knowledge levels. The ITS strategy and host information are used to select the next hyperlink in the focused crawler. In addition, we design a comprehensive priority evaluation method for evaluating unvisited hyperlinks and preventing the problem of topic drifting. To demonstrate the effectiveness and superiority of the FCITS_OH algorithm, we compare the experimental results of FCITS_OH and FCOITS with those of BFS, OPS, FCSA, and FCWSEO in the literature in the tourism domain, and with those of BFS, OPS, FCSA, OLMOACO, and FCWSEO in the rainstorm disaster domain, under the same experimental environment. The experimental results show that FCITS_OH outperforms the other focused crawling algorithms and can collect larger quantities of higher-quality webpages. Furthermore, we compare the experimental results of FCTS based on the original TS and FCITS based on ITS. The experimental results confirm the effectiveness of the proposed ITS.

    The proposed FCITS_OH has some disadvantages, however, such as not considering the tunnel crossing technique; it is possible for a hyperlink to cross an irrelevant webpage to reach a relevant webpage. In addition, in the topic-relevance evaluation of unvisited hyperlinks, the traditional single-objective optimization method based on the weighted sum is adopted, for which it is difficult to determine the optimal weight coefficients reasonably. In future work, we intend to study focused crawlers based on the tunnel crossing technique and multiobjective intelligent optimization algorithms to further improve these evaluation metrics.

    Contributors

    Jingfa LIU designed the research. Zhen WANG drafted the paper, implemented the software, and performed the experiments. Guo ZHONG and Zhihe YANG revised and finalized the paper.

    Compliance with ethics guidelines

    Jingfa LIU, Zhen WANG, Guo ZHONG, and Zhihe YANG declare that they have no conflict of interest.

    Data availability

    Data are available in a public repository.

    List of supplementary materials

    Table S1 Seed uniform resource locators (URLs) in the tourism domain

    Table S2 Seed uniform resource locators (URLs) in the rainstorm disaster domain

    Appendix: Proposed focused crawling algorithms
