
    A case study of 3D RTM-TTI algorithm on multicore and many-core platforms①

High Technology Letters, 2017, Issue 2

Zhang Xiuxia (張秀霞)②***, Tan Guangming*, Chen Mingyu*, Yao Erlin*

    (*State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, P.R.China) (**University of Chinese Academy of Sciences, Beijing 100049, P.R.China)


3D reverse time migration in tilted transversely isotropic media (3D RTM-TTI) is the most precise model for complex seismic imaging. However, the vast computing time of 3D RTM-TTI prevents it from being widely used. This is addressed by providing parallel solutions for 3D RTM-TTI on multicores and many-cores. After data parallelism and memory optimization, the hot spot function of 3D RTM-TTI gains 35.99X speedup on two Intel Xeon CPUs, 89.75X speedup on one Intel Xeon Phi, and 89.92X speedup on one NVIDIA K20 GPU compared with the serial CPU baseline. This study makes RTM-TTI practical in industry. Since the computation pattern in RTM is stencil-based, the approaches also benefit a wide range of stencil applications.

    3D RTM-TTI, Intel Xeon Phi, NVIDIA K20 GPU, stencil computing, many-core, multicore, seismic imaging

    0 Introduction

3D reverse time migration in tilted transverse isotropy (3D RTM-TTI) is the most precise model used in complex seismic imaging, which remains challenging due to technological complexity, stability, computational cost and the difficulty of estimating anisotropic parameters for TTI media[1,2]. The reverse time migration (RTM) model was first introduced by Baysal in 1983[3]. The 3D RTM-TTI model is more recent[1,2,4] and is much more precise and intricate for complex seismic imaging. Normally, RTM-TTI needs thousands of iterations to obtain image data at a given precision. On our practical medium-scale data set, it takes around 606 minutes to run 1024 iterations with five processes on Intel Xeon processors, and the cost grows further with larger data sets or more iterations for higher accuracy. This enormous computing time prevents 3D RTM-TTI from being widely used in industry.

The limitations of current VLSI technology (the memory wall, power wall and ILP wall) and the desire to transform the ever-increasing number of transistors on a chip dictated by Moore's Law into faster computers have led most hardware manufacturers to design multicore processors and specialized hardware accelerators. In the last few years, specialized hardware accelerators such as the Cell B.E.[5] and general-purpose graphics processing units (GPGPUs)[6] have attracted the interest of developers of scientific computing libraries. The more recent Intel Xeon Phi[7] has also appeared in Graph500 rankings. These accelerators feature high energy efficiency and a high performance-price ratio. Our work addresses the enormous computing time of 3D RTM-TTI by utilizing them.

The core computation of the RTM model is a combination of three basic stencil calculations, the x-stencil, y-stencil and z-stencil, as explained later. Although existing stencil optimization methods could be adopted on GPU and CPU, it is more compelling to design an efficient parallel RTM-TTI by considering the relationships among these stencils. Besides, there is little performance optimization research on Intel Xeon Phi; fundamental work on it is needed to identify the similarities and differences of the three platforms.

In this paper, implementation and optimization of the 3D RTM-TTI algorithm on CPUs, Intel Xeon Phi and GPU are presented, considering both architectural features and algorithm characteristics. Taking the algorithm characteristics into account, a low-data-coupling task partitioning method is designed. Considering architectural features, a series of optimization methods is adopted, explicitly or implicitly, to reduce high-latency memory accesses and the number of memory accesses. On CPU and Xeon Phi, we start from parallelization with multi-threading and vectorization; kernel memory access is then optimized by cache blocking, huge pages and loop splitting. On GPU, considering the GPU memory hierarchy, a new 1-pass algorithm is devised to reduce computation and global memory access. The main contributions of this paper can be summarized as follows:

1. The complex 3D RTM-TTI algorithm is systematically implemented and evaluated on three different platforms: CPU, GPU, and Xeon Phi. To the best of our knowledge, this is the first time 3D RTM-TTI has been implemented and evaluated on all three platforms at the same time.

2. With deliberate optimizations, 3D RTM-TTI obtains considerable performance speedup, which makes RTM-TTI practical in industry.

3. The optimization methods are quantitatively evaluated, which may guide other developers and gives some insight into the architectures from the software perspective. By analyzing the process of designing the parallel codes, general guidance on writing and optimizing parallel programs on Xeon Phi, GPUs and CPUs is given.

The rest of the paper is organized as follows: an overview of the algorithm and platforms is given in Section 1. Sections 2 and 3 highlight the optimization strategies used on CPU and Xeon Phi, and on GPU, respectively. In Section 4, the experimental results and their analysis are presented. Related work is discussed in Section 5. Section 6 concludes the paper.

    1 Background

To make this paper self-contained, a brief introduction to the 3D RTM-TTI algorithm is given first; then the architectures and programming models of the Intel MIC and the NVIDIA K20 GPU are described.

    1.1 Sequential algorithm

The RTM model is a reverse engineering process. The main technique of seismic imaging is to generate acoustic waves and record the earth's response at some distance from the source. RTM models the propagation of waves in the earth with the two-way wave equation, once from the source and once from the receiver. The acoustic isotropic wave can be written as partial differential equations[8]. Fig.1 shows the overall 3D RTM-TTI algorithm, which is composed of a shots loop, a nested iteration loop and a nested grid loop. Inside each iteration, it computes the forward and backward propagation wave fields, boundary processing and cross-correlation. In the timing profile, most of the computation time of the 3D RTM-TTI algorithm is spent in the wave field computing step. Fig.2 shows the main wave updating operations within RTM after discretization of the partial differential equations. The wave updating function is composed of derivative computations; like most finite-difference computations, they belong to stencil computing. Three base stencils are combined to form the xy, yz, and xz stencils, as Fig.3 shows. Each cell in the wave field needs a 9×9×9 cube to update, as Fig.4 shows. All three stencils have overlapped memory accesses.

    1.2 Architecture of Xeon Phi

Xeon Phi (also called MIC)[7] is a brand name given to a series of manycore architectures. Knights Corner is the codename of Intel's second-generation manycore architecture, which comprises up to sixty-one processor cores connected by a high performance on-die bidirectional interconnect. Each core supports 4 hardware threads. Each thread replicates some of the architectural state, including registers, which makes switching between hardware threads very fast. In addition to the IA cores, there are 8 memory controllers supporting up to 16 GDDR5 channels delivering up to 5.5GT/s. Each MIC core has two in-order pipelines, one scalar and one vector, and 32 registers of 512-bit width. Programs on the Phi can run both natively, like on a CPU, and in offload mode, like on a GPU.

    1.3 Kepler GPU architecture

An NVIDIA GPU[6] is presented as a set of multiprocessors, each equipped with its own CUDA cores and shared memory (a user-managed cache). Kepler is the codename for the GPU microarchitecture developed by NVIDIA as the successor to Fermi. It has 13 to 15 SMX units; for the K20, the number of SMX units is 13. All multiprocessors have access to global device memory. Memory latency is hidden by executing thousands of threads concurrently. Registers and shared memory resources are partitioned among the currently executing threads, so context switching between threads is essentially free.

    Fig.3 One wave field point updating

    Fig.4 Stencil in a cubic

    2 Implementation and optimization on Intel Xeon Phi and CPU

Since optimizing RTM on Intel Xeon Phi and on CPU is similar due to their similar programming models, the optimization methods for these two platforms are presented together in detail in this section.

    2.1 Parallelization

    2.1.1 Multi-threading

The Intel Threading Building Blocks (TBB) thread library is used to parallelize the 3D RTM-TTI code on CPU and Xeon Phi. Since the grid size is much larger than the number of threads, the task is partitioned into 3D sub-cubes. Fig.5 demonstrates the TBB template for the 3D task partition; the task size is (bx,by,bz). On both CPU and Xeon Phi platforms, each thread computes the derivatives in its sub-cube. An automatic tuning technique is used to search for the best number of threads. For the RTM application, the optimal number of threads on Xeon Phi is 120, and the best thread count on one Intel Xeon CPU NUMA node is 12.

    2.1.2 Instruction level parallel: SIMDization

One of the most remarkable features of the Xeon Phi is its vector computing unit. The vector length is 512 bits, larger than the CPU's 256-bit AVX vectors. One Xeon Phi vector instruction can thus process 512/(8×4) = 16 single-precision floats at once. Vector instructions are exploited by unrolling the innermost loop and using the #pragma simd directive.

2.2 Memory optimization

2.2.1 Cache blocking

Cache blocking is a standard technique for improving cache reuse, because it reduces the memory bandwidth requirement of an algorithm. The data set on a single computing node in our application is 4.6GB, whereas the cache size of the processors in CPU and Xeon Phi is limited to a few MBs. The fact that higher performance can be achieved for smaller data sets fitting into cache memory suggests a divide-and-conquer strategy for larger problems. Cache blocking is an effective way to improve locality: it increases spatial locality, i.e. referencing nearby memory addresses consecutively, and reduces the effective memory access time of the application by keeping blocks of future array references in the cache for reuse. Since the total data used is far beyond cache capacity and the memory access is non-contiguous, cache misses are unavoidable. Cache blocking is easy to implement on the basis of our previous parallel TBB implementation: because TBB is a task-based thread library, each thread can execute several tasks, so a parallel program can have more tasks than threads. The task size (bx,by,bz) is adjusted to a small cube that fits in the L2 cache.

2.2.2 Loop splitting

Loop splitting, or loop fission, is a simple approach that breaks a loop into two or more smaller loops. It is especially useful for reducing the cache pressure of a kernel, which can translate to better occupancy and overall performance improvement. If multiple operations inside a loop body rely on different inputs and these operations are independent, loop splitting can be applied. The splitting leads to smaller loop bodies and hence reduces the loop register pressure. The data flows of P and Q are quite decoupled, so it is better to split them to reduce cache pressure, iterating over data sets P and Q respectively.

2.2.3 Huge page table

Since TLB misses are expensive, TLB hit rates can be improved by mapping large contiguous physical memory regions with a small number of pages, so that fewer TLB entries are required to cover large virtual address ranges. A reduced page table size also means a reduction in memory management overhead. To use larger page sizes for shared memory, huge pages must be enabled, which also locks these pages in physical memory. The total memory used is 4.67GB, so more than 1M pages of 4kB size would be needed, which exceeds what the L1 and L2 TLBs can hold. By observing the algorithm, it is found that P and Q are used many times, so huge pages are allocated for them; regular 4kB pages and huge pages are mixed together. The usage is simple: first, interact with the OS by writing our input into the proc directory to reserve enough huge pages; then use the mmap function to map huge page files into process memory.

    3 Implementation and optimizations on GPU

    3.1 GPU implementation

The process of RTM is to compute a series of derivatives and combine them to update the wave fields P and Q. In the GPU implementation, there are several separate kernels, one per derivative. Without loss of generality, we give an example of how to compute dxy in parallel. The output of this step is a 3D grid of dxy, and the task partition is based on this result: each thread computes nz points, each block computes a bx·by panel, and many blocks together cover the total grid.

3.2 Computation reduction and 1-pass algorithm optimization

Fig.3 shows several kinds of derivatives. The traditional 2-pass computation first computes the 1st-order derivatives dx, dy, dz, and then computes dxy, dyz, dxz based on them. This method brings additional global reads, global writes and storage space. A method to reduce global memory access is devised by using shared memory and registers: the 1-pass algorithm. Similar to the 2-pass algorithm, each thread computes a z-direction column of dxy. The 1st-order result xy-panel is stored in shared memory, and register double buffering is used to reduce shared memory reads. Fig.6 shows a snapshot of register buffering.

    Fig.6 1-pass computing window snapshot

    4 Evaluation

    4.1 Experiment setup

The experiment is conducted on three platforms; the main parameters are listed in Table 1. The input of RTM is single-pulse data with a grid dimension of 512×312×301, and the algorithm iterates 1000 times. The times reported in this section are the average time of one iteration.

    Table 1 Architecture parameters

    4.2 Overall performance

Fig.7 shows the performance comparison of the three platforms. Our optimized 3D RTM-TTI gains considerable performance speedup: the hotspot function of 3D RTM-TTI gains 35.99X speedup on two Intel Xeon CPUs, 89.75X speedup on one Intel Xeon Phi, and 89.92X speedup on one NVIDIA K20 GPU compared with the serial CPU baseline. Our work makes RTM-TTI practical in industry. The results also show clearly that accelerators suit the 3D RTM-TTI algorithm better than traditional CPUs: the hotspot function gains around 2.5X more speedup on GPU and Xeon Phi than on two CPUs. On the one hand, because the data dependencies in the RTM algorithm are decoupled, plenty of parallelism can be exploited, and accelerators have more cores, more threads, and wider vector instructions; for example, Xeon Phi has 60 computing cores and 512-bit wide vector instructions, and the Tesla K20 GPU has 2496 cores. Hence, accelerators are good at data-parallel computing. On the other hand, the RTM algorithm is memory-bound, and accelerators like Xeon Phi and GPU have 7X and 5X more theoretical memory bandwidth than the CPU, as shown in Table 1.

    Fig.7 Performance evaluations of three platforms

    4.3 Performance analysis

On the CPU, the wave updating function gains 35.99X speedup compared with the single-thread CPU baseline: 20.12X comes from the parallelism of multi-threading and vector instructions, and 1.96X comes from memory optimization, such as cache blocking, loop splitting and huge page configuration, as Figs 8 and 9 show.

Fig.10 and Fig.11 show the parallelism and memory optimization performance on Xeon Phi, respectively. RTM gains 13.81X from using the 512-bit vector instructions on the Phi. From Table 1, the ideal speedup for single-precision SIMD on Xeon Phi is 16X, so the SIMD speedup is close to the ideal limit; the gap is due to cache misses, which stall the pipeline. Multi-threading on the 60-core Xeon Phi gains 40.13X speedup; the Xeon Phi thus scales very well with multi-threading and wide vector instructions. RTM gains 2.08X speedup from cache blocking, because cache blocking reduces the cache miss rate and provides good memory locality, which benefits both SIMD and multi-threading. RTM gains 1.44X by using huge pages to reduce the L2 TLB miss rate, and loop splitting gains 1.69X speedup by reducing cache pressure. Compared on the same platform, a 2806.13X speedup is gained over the single-thread Xeon Phi baseline: 554.53X from the parallelism of multi-threading and vector instructions, and 5.06X from memory optimization. The Intel Phi is thus more sensitive to data locality, as more of its speedup comes from explicit memory optimization.

    Fig.8 Parallelism evaluation on CPU (MT:multi-threading, Vec: vectorization)

    Fig.9 Memory optimization on CPU (Ca: cache blocking, Sp:splitting)

As Fig.12 shows, RTM gains 1.23X speedup by using the 1-pass algorithm on GPU, and a further 1.20X speedup by using texture memory in the 1-pass algorithm. In total, the hot spot function gains 2.33X speedup compared with the baseline parallel GPU implementation. Thread block and grid size selection are very important to application performance, and making full use of fast memory, such as shared memory and texture memory, benefits the application considerably. Explicit data locality plays an important role in application performance on GPU.

    Fig.10 Parallelization on Phi

    Fig.11 Memory optimization on Phi (HP:huge page)

    Fig.12 Memory optimization on GPU evaluation

    5 Related work

Araya-Polo, et al.[9] assessed the RTM algorithm on three kinds of accelerators, IBM Cell/B.E., GPU, and FPGA, and suggested a wish list for programming models and architecture design. However, they only listed some optimization methods and did not evaluate their impact on RTM performance quantitatively; their paper also predates the Intel Xeon Phi, so Xeon Phi performance is not included. In this paper, we choose more popular platforms and evaluate each optimization method quantitatively. Heinecke, et al.[10] discussed the performance of regression and classification algorithms for data mining problems on Intel Xeon Phi and GPGPU, and demonstrated that the Intel Xeon Phi was better at sparse problems than the GPU with fewer optimizations and less porting effort. Micikevicius[11] optimized RTM on GPU and demonstrated considerable speedups. Our work differs in that the model in his paper uses the average derivative method, while our model is 3D RTM-TTI, which is more compelling.

    6 Conclusion and Future work

In this paper, we addressed the enormously time-consuming but important seismic imaging application 3D RTM-TTI with parallel solutions, and presented our optimization experience on three platforms: CPU, GPU, and Xeon Phi. To the best of our knowledge, this is the first simultaneous implementation and evaluation of 3D RTM-TTI on these three platforms. Our optimized 3D RTM-TTI gains considerable performance speedup. Optimization on the Intel Xeon Phi is similar to the CPU due to the similar x86 architecture and programming model; thread parallelization, vectorization and explicit memory locality are particularly critical for this architecture to achieve high performance. Vector instructions play an important role on Xeon Phi, and loop dependences must be eliminated in order to use them, otherwise performance will suffer. On GPU, memory optimizations should be made explicit, such as using shared memory and constant memory, and bank conflicts should be avoided to achieve higher practical bandwidth. In the future, we will evaluate our distributed 3D RTM-TTI algorithm and analyze its communication.

[ 1] Alkhalifah T. An acoustic wave equation for anisotropic media. Geophysics, 2000, 65(4):1239-1250

    [ 2] Zhang H, Zhang Y. Reverse time migration in 3D heterogeneous TTI media. In: Proceedings of the 78th Society of Exploration Geophysicists Annual International Meeting, Las Vegas, USA, 2008. 2196-2200

[ 3] Baysal E, Kosloff D D, Sherwood J W. Reverse time migration. Geophysics, 1983, 48(11):1514-1524

    [ 4] Zhou H, Zhang G, Bloor B. An anisotropic acoustic wave equation for modeling and migration in 2D TTI media. In: Proceedings of the 76th Society of Exploration Geophysicists Annual International Meeting, San Antonio, USA, 2006. 194-198

[ 5] Gschwind M, Hofstee H P, Flachs B, et al. Synergistic processing in Cell's multicore architecture. IEEE Micro, 2006, 26(2):10-24

[ 6] NVIDIA Corporation. NVIDIA's next generation CUDA compute architecture: Fermi. http://www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf, White Paper, 2009

[ 7] Intel Corporation. Intel Xeon Phi coprocessor system software developers guide. https://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-coprocessor-system-software-developers-guide.html, White Paper, 2014

    [ 8] Micikevicius P. 3D finite difference computation on GPUs using CUDA. In: Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, Washington, D.C., USA, 2009. 79-84

[ 9] Araya-Polo M, Cabezas J, Hanzich M, et al. Assessing accelerator-based HPC reverse time migration. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(1):147-162

[10] Heinecke A, Klemm M, Bungartz H J. From GPGPU to many-core: NVIDIA Fermi and Intel many integrated core architecture. Computing in Science & Engineering, 2012, 14(2):78-83

    [11] Zhou H, Ortigosa F, Lesage A C, et al. 3D reverse-time migration with hybrid finite difference pseudo spectral method. In: Proceedings of the 78th Society of Exploration Geophysicists Annual Meeting, Las Vegas, USA, 2008. 2257-2261

    Zhang Xiuxia, born in 1987, is a Ph.D candidate at Institute of Computing Technology, Chinese Academy of Sciences. Her research includes parallel computing, compiler and deep learning.

    10.3772/j.issn.1006-6748.2017.02.010

    ①Supported by the National Natural Science Foundation of China (No. 61432018).

    ②To whom correspondence should be addressed. E-mail: zhangxiuxia@ict.ac.cn

Received on Apr. 16, 2016
