
    A Survey of Algorithms Utilized by Focused Web Crawlers


    Yong-Bin Yu, Shi-Lei Huang, Nyima Tashi, Huan Zhang, Fei Lei, and Lin-Yang Wu

    Abstract—Focused crawlers (also known as subject-oriented crawlers), as the core part of a vertical search engine, collect as many topic-specific web pages as they can to form a subject-oriented corpus for later data analysis or user querying. This paper demonstrates that the popular algorithms utilized in the process of focused web crawling basically refer to webpage analyzing algorithms and crawling strategies (which prioritize the uniform resource locators (URLs) in the queue). The advantages and disadvantages of three crawling strategies are shown in the first experiment, which indicates that best-first search with an appropriate heuristic is a smart choice for topic-oriented crawling, while depth-first search is helpless in focused crawling. Besides, another experiment comparing improved versions (with a webpage analyzing algorithm added) is carried out to verify that crawling strategies alone are not efficient enough for focused crawling, and that in most cases their mutual efforts should be taken into consideration. In light of the experimental results and recent research, some points on the research tendency of focused crawler algorithms are suggested.

    1. Introduction

    With the skyrocketing growth of information and the explosion of web pages on the World Wide Web, it becomes much harder and less convenient for “advanced users” to retrieve relevant information. In other words, it poses a big challenge: how to distill the miscellaneous web resources and get the results we really want. The limitations of general-purpose search engines, such as a lack of precision, lead to enormous amounts of “topic-irrelevant” information, or an inability to deal with the complicated information needs of certain individuals or organizations. These problems have increased the tendency of users to choose vertical search engines for specialized, high-quality, and up-to-date results. For a vertical search engine, the focused crawler [1] is a core part. In contrast to a general-purpose spider, which crawls almost every web page and stores them in a database, a focused crawler tries to download as many pages pertinent to the predefined topic as it can, while keeping the number of irrelevant pages downloaded to a minimum.

    Before starting scraping, both kinds of crawlers are given a set of seed uniform resource locators (URLs) as their start points. For focused crawlers, the choice of seeds is of paramount importance for increasing the precision of fetching topic-specific webpages. Selection of appropriate seed URLs relies on external services, usually general web search engines like Google and Yahoo, and several attempts have already been made to improve the harvest rate (the percentage of relevant pages retrieved) by utilizing search engines as a source of seed URLs [2],[3]. The approach of seed selection is beyond the scope of this survey, but given its importance, we still bring it up.

    The complete process of crawling is modeled as a directed graph (with no loops once duplicate-URL checking is applied, and without taking into account the order in which URLs are downloaded): each webpage is represented by a node, and a link from one page to another is represented by an edge. Furthermore, the seeds together with their offspring pages form a forest, in which each seed page becomes the root of a single tree, as shown in Fig. 1. The spider uses a certain crawler algorithm to traverse the whole graph (forest).

    At the beginning of crawling, the seed URL (seed page) is sent to the URLs manager. Then the downloader fetches and downloads the URL after the manager confirms that it is a new URL. The contents of the webpage are fed to a specifically designed parser, which distills the irregular data so that the new URLs we truly want are queued in the URLs manager. Webpages or parsed data are stored in a container, which can be a local file system or a database, depending on the requirements of the crawler design. The process repeats while the focused crawler traverses the web, until the URLs manager is empty or certain preset stop conditions are fulfilled. The simplified crawler workflow is shown in Fig. 2, and a minimal code sketch of this loop is given after the figure captions below.

    Fig. 1. Web crawling forest.

    Fig. 2. Simplified crawler workflow.
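    The workflow in Fig. 2 can be condensed into a short loop. Below is a minimal sketch in Python (the language used for the experiments in Section 5), assuming a crude regex-based link extractor and an in-memory store; a production crawler would add politeness delays, robots.txt handling, and error retries.

        # Minimal sketch of the workflow in Fig. 2 (illustration only).
        from collections import deque
        from urllib.parse import urljoin
        import re
        import urllib.request

        LINK_RE = re.compile(r'href="([^"]+)"')  # crude link extraction

        def crawl(seeds, max_pages=100):
            frontier = deque(seeds)       # URLs manager: queue of URLs to visit
            visited = set(seeds)          # duplicate-URL checking
            store = {}                    # container: in-memory here, FS/DB in practice
            while frontier and len(store) < max_pages:   # preset stop condition
                url = frontier.popleft()
                try:                      # downloader fetches the page
                    html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
                except Exception:
                    continue              # fetch failed; skip this URL
                store[url] = html         # store the webpage
                for link in LINK_RE.findall(html):       # parser distills new URLs
                    link = urljoin(url, link)
                    if link.startswith("http") and link not in visited:
                        visited.add(link)
                        frontier.append(link)            # queue in the URLs manager
            return store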

    A well-designed subject-oriented crawler must be equipped with appropriate algorithms for the URLs manager and the parser: crawling strategies and webpage analyzing algorithms, respectively [4],[5]. Crawling strategies determine the downloading sequence of the URLs, namely prioritizing the URLs, which mostly relies on evaluating their relatedness to a predefined subject. The parser, in turn, is designed to extract topic-relevant URLs, dump irrelevant ones, and evaluate the pertinence of the webpage to the subject. The two components are not mutually exclusive but cooperate closely with each other to address the key challenge of focused crawling: identifying the next most relevant link to fetch from the URLs queue.

    The contributions of this paper are as follows:

    1) A comprehensive review of webpage analyzing algorithms and crawling strategies is provided.

    2) The close relationships between the two types of algorithms are stated in detail.

    3) Experiments are carried out to prove that the combination of the two types of algorithms tremendously improves the efficiency of the focused crawler.

    The rest of this paper is organized as follows. Related work in recent years is introduced in Section 2. Sections 3 and 4 provide overviews of webpage analyzing algorithms and crawling strategies, respectively. Experimental results and comparisons, as well as some thoughts, are shown in Section 5. Finally, concluding remarks are given in Section 6.

    2. Related Work

    This section provides a quick review of recent studies on focused crawlers, from which future research tendencies can be identified.

    To alleviate the problems of local search algorithms (like depth-first and breadth-first), the genetic algorithm (GA) has been proposed to improve the quality of search results in focused crawling [5]. In [6], the results showed that GA can effectively prevent the search agent from being trapped in local optima and can also significantly improve the quality of search results. As a global search algorithm, GA allows the crawler to find pertinent pages without any distance limit, and it does not introduce noise into the page collections. GA is an adaptive and heuristic method for solving optimization problems. In B. W. Yohanes et al.'s paper [5], heuristics referred to an effective combination of content-based and link-based page analysis. Their results show that the GA-crawler can traverse the web search space more comprehensively than a traditional focused crawler. In C. Yu et al.'s research [7], GA was used to acquire the optimal combination of the full text and anchor text, instead of a linear combination which may lack objectivity and authenticity. Apart from GA, a focused crawler applying the cell-like membrane computing optimization algorithm (CMCFC) was proposed in W. Liu and Y. Du's work [8]. Similarly, the purpose of this algorithm is to overcome the weakness of an empirical linear combination of weighted factors.

    A two-stage effective deep web harvesting framework, namely SmartCrawler, was proposed in [9] to explore web resources beyond the reach of search engines. The framework encompasses two stages. In the first stage, SmartCrawler performs site-based searching for the most relevant web sites for a given topic with the help of search engines. In the second stage, it achieves fast in-site searching by excavating the most pertinent links with adaptive link ranking. The studies show the approach achieves wide coverage of deep web interfaces while crawling highly efficiently. In contrast to other crawlers, it gets a higher harvest rate.

    Ontology can be used to represent the knowledge underlying topics and web documents [10]. Semantic focused crawlers are able to download relevant pages precisely and efficiently by automatically understanding the semantics underlying the web information and the predefined topic. The uncertain quality of ontologies limits ontology-based semantic focused crawlers, so ontology learning technologies are integrated to work this out. With ontology learning, facts and patterns can be semi-automatically extracted from a corpus of data and turned into a machine-readable ontology. H. Dong et al. [11] and K. S. Dilip et al. [12] proposed self-adaptive focused crawlers based on ontology learning. H. Dong et al. managed to precisely discover, format, and index relevant web documents in the uncontrolled web environment, while K. S. Dilip et al. aimed at overcoming the issues caused by tagged web assets. R. Gaur et al. [13] leveraged an ontology-based method to semantically expand the search topic and make it more specific. To find the most relevant webpages that meet the requirements of users, G. H. Agre et al. [14] utilized knowledge paths based on the ontology tree to get relevant URLs. R. Zunino et al. [15] presented an approach integrating semantic networks and ontology to build a flexible focused crawler. More research on semantics-based focused crawlers [16]-[21] has been published in recent years.

    It is quite normal for one irrelevant webpage to contain several segments pertinent to the predefined topic, and for one relevant webpage to encompass irrelevant parts. To predict more precisely, the webpage is divided into smaller units for further analysis. A. Seyfi et al. [22],[23] proposed a focused crawler based on the T-graph to prioritize each unvisited URL with high accuracy. A heuristic-based approach, content block partition-selective link context (CBP-SLC), was proposed in [24], with which highly pertinent regions in an excessively jumbled webpage are not obscured. Similarly, B. Ganguly et al. designed an algorithm which segments the webpage into content blocks according to its headings and calculates the relevancy of each block separately [25].

    To analyze the features of documents, they can be represented by the vector space model (VSM). The relevancy between documents is measured by their similarity, and a basic method used to do this correlation analysis was introduced in [26].
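    As an illustration of VSM-based correlation analysis, the sketch below computes the cosine similarity of two documents under simple term-frequency weighting; real systems typically use TF-IDF, and the whitespace tokenizer here is our simplifying assumption.

        # Minimal sketch of VSM cosine similarity over term-frequency vectors.
        import math
        from collections import Counter

        def cosine_similarity(doc_a, doc_b):
            a, b = Counter(doc_a.split()), Counter(doc_b.split())  # TF vectors
            dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
            norm = (math.sqrt(sum(v * v for v in a.values()))
                    * math.sqrt(sum(v * v for v in b.values())))
            return dot / norm if norm else 0.0

        # e.g. cosine_similarity(topic_description, page_text) yields a
        # relevance score in [0, 1].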

    With the boom of artificial intelligence (AI), some researchers tend to study learning-based focused crawlers. The classifier is a commonly used tool in learning-based focused crawlers, as presented in [27] and [28]. J. A. Torkestani proposed an adaptive algorithm based on learning automata, with which the focused crawler can get the most relevant URLs and learn how to approach the potential target documents [29]. M. Kumar et al. [21] proposed to design a database for information retrieval about academicians of Indian origin. H. Lu et al. [30] presented an approach which uses web page classification and link priority evaluation.

    As this paper is a review work, it should be noted that some references may have been overlooked owing to the numerous variants, such as focused crawlers based on both links and contents. In the next two sections, we consecutively review the two basic sorts of algorithms: webpage analyzing algorithms and crawling strategies.

    3. Webpage Analyzing Algorithms

    In this section, four algorithms are introduced: PageRank, SiteRank, vision-based page segmentation (VIPS), and densitometric segmentation.

    3.1 PageRank

    PageRank [3],[31] was proposed by S. Brin and L. Page, the founders of Google. The PageRank of a page represents the probability that a random surfer, who follows links randomly from one page to another, will be on that page at any given time. A page's score depends recursively upon the scores of the pages that point to it, and source pages distribute their PageRank across all of their outlinks. However, PageRank is based purely on hyperlink analysis, with no notion of page contents: analyzing the authority of hyperlinks undoubtedly makes it an excellent URL ordering metric, but this is only useful if we are looking for hot pages rather than topic-specific pages [32].
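    The recursive score propagation described above can be computed by power iteration. The following minimal sketch assumes the link graph is given as a dict mapping each page to its outlinked pages and uses the customary damping factor d = 0.85; it is an illustration, not a production-scale computation.

        # Minimal power-iteration sketch of PageRank over a small link graph.
        def pagerank(graph, d=0.85, iters=50):
            pages = list(graph)
            n = len(pages)
            rank = {p: 1.0 / n for p in pages}
            for _ in range(iters):
                new = {p: (1 - d) / n for p in pages}    # random-jump component
                for p, outlinks in graph.items():
                    if outlinks:
                        share = rank[p] / len(outlinks)  # spread over outlinks
                        for q in outlinks:
                            if q in new:
                                new[q] += d * share
                    else:
                        for q in pages:                  # dangling page: spread evenly
                            new[q] += d * rank[p] / n
                rank = new
            return rank

        # e.g. pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})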

    Based on the fact that the original PageRank algorithm is a topic-independent measure of the importance of webpages, several improvements have been made to guide topic-driven crawlers. In [33], a combination of PageRank and similarity with the topic was used to guide the crawler. An improved PageRank algorithm called T-PageRank was proposed in [34] and [35], which is based on a “topical random surfer” in contrast to the “random surfer” of PageRank; for better performance, it is combined with the topic similarity of the hyperlink metadata.

    3.2 SiteRank

    Classical centralized algorithms like PageRank are both time-consuming and costly over the whole web graph. J. Wu and K. Aberer in their work [36] put forward a higher abstraction level of the web graph, namely the SiteGraph instead of the DocGraph. They distributed the task of computing page rankings to a set of distributed peers, each of which crawled and stored a small fraction of the web graph. In complete contrast to setting up a centralized storage, indexing, and link analysis system that computes the global PageRank of all documents from the global web graph and document link structure, they built a decentralized system whose participating servers computed the global ranking of their locally crawled and stored subset of the web based on the local document link structure and the global SiteRank.

    The results show that computing the SiteRank of such a web-scale SiteGraph is fully tractable on a low-end personal computer (PC). Compared with PageRank, the cost is largely reduced while SiteRank keeps good quality in the ranking results. Additionally, SiteRank is difficult for a PageRank spammer to spam, since they would have to set up a large number of spamming web sites to take advantage of the spamming SiteLinks.

    3.3 VIPS

    There are many links on one page, only a few of which have something to do with the predefined topic. PageRank and SiteRank do not distinguish these links, which leads to noise such as ad links while analyzing the webpages. PageRank and SiteRank are algorithms of page granularity and site granularity, respectively. VIPS, a block-granularity webpage algorithm used to compute relevance and accurately prioritize unvisited URLs [37], was proposed in [38]. The algorithm spans three phases: visual block extraction, visual separator detection, and content structure construction. These three steps as a whole are regarded as one round. The visited webpage is segmented into several big blocks, and for each big block the same segmentation process repeats recursively until we get adequately small blocks.

    VIPS here is adopted to segment webpages so as to extract relevant links or contents, utilizing visual cues and the document object model (DOM) tree to better partition a page at the semantic level. The algorithm is based on the fact that semantically related contents are usually grouped together, and the entire page is divided into regions for different contents using explicit or implicit visual separators such as lines, blank areas, images, font sizes, and colors. The output of the algorithm is a content structure tree whose nodes depict semantically coherent content [39].

    In [38], the working process of VIPS is diagrammed in detail. The algorithm first extracts all the suitable blocks from the hypertext markup language (HTML) DOM tree, and then it tries to find the separators, i.e., the horizontal and vertical lines in a webpage that cross no blocks. Finally, based on these separators, the semantic structure of the webpage is constructed. The URLs and their anchor text in the pertinent blocks are extracted for calculating scores, based on which the URLs are ranked in the URLs manager.

    3.4 Densitometric Segmentation

    According to Kohlschütter, for the segmentation problem, only three types exist: segmentation as a visual problem, segmentation as a linguistic problem, and segmentation as a densitometric problem [40].

    Visual segmentation is the easiest for people to understand: a person looks at a webpage and is able to immediately distinguish one section from another, and a computer vision algorithm can act similarly. However, it has to render each webpage and do a pile of things that are computationally quite expensive. The linguistic approach is somewhat more reasonable: distributions of linguistic units such as words, syllables, and sentences have been widely used as statistical measures to identify structural patterns in plain text. Its shortcoming is that it only works for large blocks of text, which means small blocks like the header and footer usually lack enough linguistic content for analysis.

    Kohlschütter’s densitometric approach tends to work as well as a visual algorithm while being as fast as a linguistic approach. The basic process is to walk through the nodes and assign a text density to each node (the density is defined as the number of tokens divided by the number of ‘lines’), then merge neighboring nodes with the same densities and repeat the process until the desired granularity is reached, somewhat like VIPS.
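    The following minimal sketch illustrates the idea, under the assumptions that a ‘line’ means a line of text after wrapping at a fixed width (80 characters here) and that an absolute density gap eps decides where blocks fuse; both choices are ours for illustration.

        # Minimal sketch of densitometric block merging.
        import textwrap

        def text_density(text, width=80):
            tokens = text.split()
            lines = textwrap.wrap(" ".join(tokens), width) or [""]
            return len(tokens) / len(lines)   # tokens per wrapped line

        def merge_blocks(blocks, eps=1.0):
            """Merge neighboring text blocks whose densities differ by < eps."""
            if not blocks:
                return []
            merged = [blocks[0]]
            for block in blocks[1:]:
                if abs(text_density(block) - text_density(merged[-1])) < eps:
                    merged[-1] = merged[-1] + " " + block  # same density: fuse
                else:
                    merged.append(block)                   # density jump: new segment
            return merged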

    What is brilliant is the simplicity of this algorithm and that they managed to get the processing time down to 15 ms per page on average. By contrast, visual segmentation takes 10 s to process a single webpage on average.

    4. Crawling Strategies

    In this section, three crawling strategies are taken into consideration: breadth-first search, depth-first search, and best-first search.

    4.1 Depth-First Search Strategy

    Depth-first search is an algorithm for traversing graph data structures. Generally, when the algorithm starts, a node is chosen as the root node, and the next nodes are selected such that the algorithm explores as far as possible along a branch. In short, the algorithm begins by diving down as quickly as possible to the leaf nodes of the tree. When there is no node for the next search step, it goes back to the previous node to check whether another unvisited node exists. Taking the left tree in Fig. 1 as an example, seed page 1 is searched first, then page 1.1, page 1.1.1, and page 1.1.2 are searched in sequence, and finally page 1.2, page 1.2.1, and page 1.2.2 are searched in order.

    However, this algorithm might end up in an infinite loop [41]; that is, the crawler may get trapped so that only a very tiny part of the tree is explored, given that the web graph we want to traverse is so tremendously enormous that we can consider it an infinite graph, one with infinitely many nodes or paths. A depth limit (the algorithm can only go down a limited number of levels) is one attempt to solve this trap problem, even if it is not exhaustive. Depth-first search alone is not appropriate for focused crawling, in which we hope to cover as many relevant webpages as possible, whereas depth-first ignores quite a few.

    For the experiments, the standard version of the algorithm was modified in order to apply it to the web search domain [42]. Instead of selecting a single first node, a list of seed links is selected as the starting point of the algorithm. Thereafter, a link is chosen as the first link from the list. A classification algorithm evaluates whether it satisfies the criteria of the specified domain: if it is classified as a proper website, it lands in the bucket of found links; otherwise, the link is only marked as visited. Then the chosen website is parsed in order to extract new links from the page, and all found links are inserted at the beginning of the URLs queue. The queue behaves in this case like a last-in-first-out (LIFO) queue.

    4.2 Breadth-First Search Strategy

    Like depth-first search, breadth-first search is an algorithm for traversing graph data structures. The breadth-first search strategy is the simplest form of crawling algorithm: it starts with a link and carries on traversing the connected links (at the same level) without taking into consideration any knowledge about the topic. It goes down to the next level only after completely going through one level. For example, in the left tree of Fig. 1, breadth-first search proceeds so that seed page 1 is searched at the beginning, after which page 1.1 and page 1.2 are searched in sequence, and finally page 1.1.1, page 1.1.2, page 1.2.1, and page 1.2.2 are searched in order. Since it does not take into account the relevancy of the path while traversing, it is also known as a blind search algorithm. The reason why breadth-first search can be adopted for a focused crawler is the theory that webpages at the same level as, or within a certain number of levels away from, the seed URL have a large probability of being topic-relevant [32]. For instance, in Fig. 1 there are 3 levels in total, and the offspring of each seed are likely to share the pertinent subject of their own seed page.

    When the algorithm starts, a starting node is chosen; in the case of focused web crawling, this is a set of chosen root URLs (seed URLs), and the algorithm explores them by first downloading pages from the same level on the branch. From each page, external links are extracted, and all these links land in the search tree as leaves of the current node (one level below it). When the crawler has visited all the URLs on the same level, it goes one level deeper and starts the same operation again, so it traverses the tree level by level. The URLs queue behaves in this situation as a first-in-first-out (FIFO) queue [42].
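    The sketch below contrasts the two blind strategies of Sections 4.1 and 4.2: with the frontier held in a double-ended queue, popping from the left gives the FIFO behavior of breadth-first search, while popping from the right gives the LIFO behavior of depth-first search. The get_links callback standing in for the downloader and parser is a hypothetical helper.

        # Minimal sketch: one traversal loop, two blind strategies.
        from collections import deque

        def blind_crawl(seeds, get_links, strategy="breadth"):
            frontier = deque(seeds)
            visited = set(seeds)
            order = []
            while frontier:
                # FIFO (popleft) -> breadth-first; LIFO (pop) -> depth-first
                url = frontier.popleft() if strategy == "breadth" else frontier.pop()
                order.append(url)
                for link in get_links(url):   # parser extracts outlinks
                    if link not in visited:
                        visited.add(link)
                        frontier.append(link)
            return order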

    4.3 Best-First Search Strategy

    The best-first search algorithm is also a graph-based traversal algorithm. According to a certain webpage analyzing algorithm, it predicts the similarity between each candidate URL and the target webpage, or the candidate's relatedness to the predefined topic, and picks the best one to fetch.

    Firstly, a rule defining what is “best” should be specified, which usually takes the form of a score or rank. In most cases, a classification or scoring algorithm is applied, such as naive Bayes, cosine similarity, support vector machines (SVMs), or string matching.

    Unlike the blind search algorithms (depth-first and breadth-first), best-first search uses heuristic methods to improve its results. Heuristics here refers to a general problem-solving rule or set of rules that do not guarantee the best solution but serve useful information for solving problems. In our case, the role of the heuristic is to evaluate a website's URL based on the given knowledge, before fetching its contents. The name best-first refers to the method of exploring the node with the best score first: an evaluation function is used to assign a score to each candidate node. There are many different ways of evaluating a link before fetching its whole contents, e.g., evaluating the words in the URL or evaluating the anchor text of the link; more detailed research in this area is described in [43] to [49]. The algorithm maintains two lists, one containing the candidates to explore and one containing the visited nodes. Since all unvisited successor nodes of every visited node are included in the list of candidates, the algorithm is not restricted to exploring only the successor nodes of the most recently visited node. In other words, the algorithm always chooses the best of all unvisited nodes discovered so far, rather than being restricted to only a small subset such as the nearest neighbors. The previously stated algorithms, depth-first search and breadth-first search, have this restriction.
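    A minimal sketch of the strategy follows, with the frontier held in a priority queue; get_scored_links is a hypothetical helper that returns each extracted link together with its heuristic score (computed, e.g., from the URL string or anchor text before fetching).

        # Minimal best-first sketch: a priority queue keyed by heuristic score
        # (heapq is a min-heap, so scores are negated).
        import heapq

        def best_first_crawl(seeds, get_scored_links, max_pages=100):
            frontier = [(0.0, url) for url in seeds]   # (negated score, URL)
            heapq.heapify(frontier)
            visited = set(seeds)
            fetched = []
            while frontier and len(fetched) < max_pages:
                _, url = heapq.heappop(frontier)       # best candidate overall
                fetched.append(url)
                for link, score in get_scored_links(url):  # scored before fetching
                    if link not in visited:
                        visited.add(link)
                        heapq.heappush(frontier, (-score, link))
            return fetched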

    5. Experiments and Comparisons

    5.1 Comparisons of Three Crawling Strategies

    In order to compare the efficiency of these three crawling strategies in crawling topic-pertinent webpages, an experiment was conducted. The crawling platform was written in Python and deployed on the Windows 10 operating system.

    Before starting crawling, we needed to choose appropriate seed URLs and set a standard by which one webpage could be classified as relevant. Thus, two keyword lists were created for seed generation, correlation testing, and link analysis. In this experiment, the main topic was ‘西藏建設(shè)成果’, which means the development achievements of Tibet. One keyword list contained ‘西藏’, ‘拉薩’, ‘日喀則’, ‘青藏’, ‘昌都’, ‘林芝’, ‘山南’, ‘那曲’, ‘阿里’, the geographic names that usually appear in news relevant to Tibet. The other keyword list consisted of ‘發(fā)展’, ‘建設(shè)’, ‘成果’, ‘成績’, ‘偉業(yè)’, ‘創(chuàng)舉’, ‘事業(yè)’, ‘業(yè)績’, ‘成就’, ‘果實(shí)’, ‘碩果’, ‘進(jìn)展’, which are synonyms of, or share the same semantics with, ‘achievement’ or ‘developing’.

    Famous search engines like Google, Yahoo, Baidu, and Bing were adopted to retrieve high-quality seed URLs. After the most relevant results were returned by these search engines with the topic ‘development achievements of Tibet’ as input, we selected the most pertinent ones manually to ensure quality and authority.

    For judging whether a webpage was relevant or not, two parts were taken into consideration: the title and the main contents, extracted from the ‘title’ and ‘p’ elements of the webpage, respectively. The estimate involved two stages. A webpage with at least one geographic keyword in its title and at least three occurrences of such keywords in its main contents was sent to the subsequent processing; otherwise it was dumped. In general, word counts can be a powerful predictor of document relevance and can help distinguish documents that are about a particular subject from those that discuss that subject in passing [50] or have nothing to do with it. Hence, the minimum of three occurrences was used to increase the probability that a webpage was relevant to our topic, while at the same time the number three was not so large as to discard potentially pertinent documents. Afterwards, a processed webpage whose main contents contained more than 3 total occurrences of any keyword in the other keyword list was marked as relevant. We have to admit that this relevance-check approach is quite rough and not greatly accurate, but it works to a certain extent in our experiment, since all three algorithms are judged by the same method.
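    A minimal sketch of this two-stage check follows, assuming the geographic and achievement keyword lists are given as geo_keywords and topic_keywords and that ‘occurrences’ means raw substring counts:

        # Minimal sketch of the two-stage relevance check described above.
        def is_relevant(title, body, geo_keywords, topic_keywords):
            # Stage 1: geographic filter on title and main contents
            # (we read "at least three times" as >= 3 total geographic hits).
            if not any(k in title for k in geo_keywords):
                return False
            if sum(body.count(k) for k in geo_keywords) < 3:
                return False
            # Stage 2: the page must also discuss development achievements
            # (total hits from the other keyword list greater than 3).
            return sum(body.count(k) for k in topic_keywords) > 3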

    In this experiment, a list is taken as the data structure for URL scheduling. For simplicity, link evaluation is grounded on just one factor: anchor text is the only basis of link analysis for the best-first search, because not only can it be easily fetched from the parsed webpage, but it also has two properties that make it particularly useful for ranking webpages [51]. First, it tends to be very short, generally several words that often succinctly describe the topic of the linked page. Second, many queries are very similar to anchor texts, which are also short topical descriptions of webpages. The value of an outlink in a webpage depends on how many times the keywords in both keyword lists show up in its anchor text, and if keywords from both lists occur in the anchor text at the same time, we add an extra 2 to its value. The list is sorted by this value in descending order, and in each crawl step the first element is popped out.
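    A minimal sketch of this anchor-text scoring rule, under the same substring-count reading as above:

        # Minimal sketch of the anchor-text link value used in this experiment.
        def score_link(anchor_text, geo_keywords, topic_keywords):
            geo_hits = sum(anchor_text.count(k) for k in geo_keywords)
            topic_hits = sum(anchor_text.count(k) for k in topic_keywords)
            value = geo_hits + topic_hits
            if geo_hits and topic_hits:   # keywords from both lists co-occur
                value += 2
            return value

        # The frontier list is then sorted by this value in descending order
        # and the first element is popped on each crawl step.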

    Fig. 3 and Fig. 4 illustrate the comparison of the number of relevant webpages obtained and the frontier (crawler's request queue) size for the three different algorithms. Fig. 3 depicts the number of relevant webpages versus the number of crawl steps.

    Fig. 4 presents the size of the frontier versus the number of crawl steps.

    Fig. 3. Amount of relevant webpages changing with crawling times.

    Fig. 4. Frontier size changing with the crawling times.

    As can be seen in Fig. 3, depth-first search hardly gives good results and provides nearly no help for focused crawling: over 200 crawl steps, the count remained unchanged at one, the straight line in the figure. Breadth-first search slowly finds the relevant links. As expected, best-first search performs best, since it is the first of the three tested algorithms to reach high-quality links, i.e., links ranked high in the sorted list.

    From Fig. 4, depth-first search stabilizes the size of its frontier at a certain level owing to its low efficiency, and best-first search, due to its URL evaluation rule, shows no great disparity with depth-first in frontier size. Breadth-first search produces links the fastest and in the greatest quantity.

    5.2 Comparisons of Three Improved Crawling Strategies

    The previous subsection showed the performance of the three crawling strategies; in this subsection, we introduce another experiment based on the former one, with a certain webpage analyzing algorithm preprocessing webpages in order to improve the performance of the focused crawler.

    We choose a block-granularity webpage algorithm which utilizes DOM features and visual cues, like VIPS; however, it does page segmentation by visual clustering. The algorithm works as follows. After downloading the webpage, we extract the useful elements (those most likely to contain text information) and store their XPath, position (offsets from the top and left), width, and height. Density-based spatial clustering of applications with noise (DBSCAN), a data clustering algorithm, is used to do the visual clustering. The features fed to DBSCAN are composed of two parts: visual features and DOM features. In this experiment, visual features are represented as a 6-dimensional vector, and DOM features are expressed as an n-dimensional vector, where n is determined by the number of distinct paths in the XPath list (the XPath of each element in the webpage is processed to build this list). Six values construct the visual feature vector of an element: ‘left’, ‘left + width’, ‘top’, ‘top + height’, ‘(2×left + width)/2’, and ‘(2×top + height)/2’. We locate every path of each element's XPath in the XPath list and assign a value to the corresponding dimension of its n-dimensional vector; all other dimensions are assigned the default value zero. The two vectors are joined as one feature array. The output of DBSCAN is a cluster label for each element in the dataset. Clusters with one element are dumped, as they are regarded as noise. Intuitively and empirically, the header and footer of a webpage offer no help for content and potentially useful link extraction, and thus we remove these two parts from the clusters. Figs. 5 and 6 show the segmentation results and the final output after removing the noise, header, and footer, respectively.

    Each single color represents one cluster. The clusters in red circles (Fig. 5) have been removed (the middle one is noise), yielding the final result illustrated in Fig. 6.
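    The sketch below illustrates the feature construction and clustering step, assuming each element is given with its XPath and bounding box; scikit-learn's DBSCAN is used for illustration, the DOM encoding is a simplified binary reading of the paper's description, and the eps and min_samples values are placeholders that would need tuning.

        # Minimal sketch of feature construction for the visual clustering step.
        import numpy as np
        from sklearn.cluster import DBSCAN

        def element_features(elements, xpath_list):
            rows = []
            for e in elements:   # e: dict with 'xpath', 'left', 'top', 'width', 'height'
                l, t, w, h = e["left"], e["top"], e["width"], e["height"]
                visual = [l, l + w, t, t + h, (2 * l + w) / 2, (2 * t + h) / 2]
                dom = [0.0] * len(xpath_list)      # n-dimensional DOM vector
                for i, path in enumerate(xpath_list):
                    if path in e["xpath"]:         # path occurs in element's XPath
                        dom[i] = 1.0
                rows.append(visual + dom)          # joined feature array
            return np.array(rows)

        # labels[i] == -1 marks DBSCAN noise; single-element clusters are
        # dropped afterwards, as in the text:
        # labels = DBSCAN(eps=30.0, min_samples=2).fit_predict(element_features(els, paths))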

    The value of a queuing link consists of two parts: one comes from its anchor text, as described in the first experiment, and the other takes cluster information into consideration. We assume that the cluster with the largest area contains the elements that form the main contents, which accords with common sense. When the number of clusters is larger than 5, we assign a value of 3 to the largest cluster, a value of 2 to the second largest cluster, and a default value of 1 to the remaining clusters (‘largest’ here refers to the area of the cluster, i.e., the sum of the areas of the single elements in it). If the number of clusters is less than 5, we only assign a value of 2 to the largest cluster and a value of 1 to the rest. The final value of a link is the weighted combination of the two parts: value = 0.9 × value_anchor + 0.1 × value_cluster.

    Fig. 6. Final result with noise, header, and footer discarded.

    Each value has a weight, because the anchor text conveys direct information about the topic while cluster information only reveals the relations between elements within an individual webpage; hence the weights 0.9 and 0.1 are assigned to the two values, respectively.
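    A minimal sketch of the combination, with the weights taken from the text:

        # Minimal sketch of the final link value under the 0.9/0.1 weighting.
        def final_link_value(anchor_value, cluster_value, w_anchor=0.9, w_cluster=0.1):
            # anchor text carries direct topical evidence, while cluster
            # information only captures intra-page structure, hence the
            # asymmetric weights
            return w_anchor * anchor_value + w_cluster * cluster_value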

    The second experiment was carried out with the same platform, the same data, and the same hardware environment.

    Comparing Fig. 7 with Fig. 3 and Fig. 8 with Fig. 4, we can conclude that the variation trends remain unchanged. For a better contrast between the two experiments, the results are put together in Fig. 9 and Fig. 10.

    What is interesting in Fig. 9 is that segmentation does not show any improvement for depth-first and breadth-first search, because, as blind searches, they use no domain knowledge of the problem state and work without heuristics. To be more concrete, the estimation information for each queuing link is not utilized by depth-first and breadth-first search. In this experiment, all that segmentation does for them is remove links in the noise, header, and footer that are not very relevant to our topic, which cannot fundamentally change their low harvest rate in focused crawling. For best-first search, however, there is a significant improvement: in our experiments, segmentation by visual clustering increases the harvest rate from 40% to 60%. Since our experiments are quite rough, the harvest rate cannot be taken as definitely accurate, but under the circumstance that the two experiments were conducted under the same conditions and judged by the same mechanism, the improvement must mean something. It can be explained by the more accurate outlink estimation, which provides a more precise prediction.

    Fig. 7. Amount of relevant webpages, with page segmentation.

    Fig. 8. Frontier size, with page segmentation.

    Fig. 9. Compound of Fig. 7 and Fig. 3.

    Fig. 10. Compound of Fig. 8 and Fig. 4.

    In Fig. 10, it is obvious that after segmentation and filtering of useless clusters, the frontier size keeps increasing but stays below the curve of the first experiment. In the central area of a webpage, the links usually lead to pertinent webpages, while links in the header and footer probably do not. Our topic is quite narrow, so the links in the central area tend to point to the same website where the topic is widely discussed; that is why the frontier size grows slowly in the second experiment, whereas a quantity of irrelevant links led to the skyrocketing of the frontier size in the first.

    Based on the two experiments, comparisons of the six crawling processes are shown in Table 1.

    Table 1: Comparison of the six crawling runs on three indicators

    5.3 Points on Research Tendency

    Undoubtedly, our ultimate target is to raise the harvest rate and cut the processing time down. Even though the harvest rate is the priority, a higher rate cannot be obtained without the cost of time. Visual segmentation, whether VIPS or clustering, takes too much time to process webpages; therefore, densitometric segmentation is a better choice. However, segmentation is just one side of webpage analyzing, and what we really want is the relevance information on the webpage. So, extracting comprehensive features (not just the anchor text, position, and size information) in order to make a more reliable prediction, while keeping the processing time as low as possible, are the priorities of future research. Until breakthroughs are made, we have to strike a balance between the harvest rate and computational costs.

    In light of recent research, more and more algorithms from the rapidly growing area of AI are being integrated with focused crawlers to improve their efficiency, which means methodologies in AI are also future directions in focused crawler research.

    6. Conclusions

    As stated above, this survey gives a quick review of the algorithms commonly utilized in the two main parts of the focused crawler. The basic crawling strategies alone are not appropriate for a topic-driven crawler, and neither are webpage analyzing algorithms alone. In reality, the output of the webpage analyzing algorithms is fed to the crawling strategies; in other words, the latter are based on the former, and through this mutual effort the URLs in the queue are prioritized. Thus the key challenge of the focused crawler is worked out: identifying the next most proper link to fetch from the URLs queue. Depending on the different needs of applications, we choose the corresponding algorithms.

    The future target is to improve the two sorts of algorithms and find a better combination to enhance crawling efficiency, basically in two respects: a higher harvest rate and lower computational costs.
