
    A case study of 3D RTM-TTI algorithm on multicore and many-core platforms①

    High Technology Letters, 2017, No. 2

    Zhang Xiuxia (張秀霞)②***, Tan Guangming*, Chen Mingyu*, Yao Erlin*

    (*State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, P.R.China) (**University of Chinese Academy of Sciences, Beijing 100049, P.R.China)


    3D reverse time migration in tilted transversely isotropic media (3D RTM-TTI) is the most precise model for complex seismic imaging. However, the vast computing time of 3D RTM-TTI prevents it from being widely used. This is addressed by providing parallel solutions for 3D RTM-TTI on multicores and many-cores. After data parallelization and memory optimization, the hot spot function of 3D RTM-TTI gains a 35.99X speedup on two Intel Xeon CPUs, an 89.75X speedup on one Intel Xeon Phi, and an 89.92X speedup on one NVIDIA K20 GPU compared with the serial CPU baseline. This study makes RTM-TTI practical in industry. Since the computation pattern in RTM is stencil-based, the approaches also benefit a wide range of stencil-based applications.

    3D RTM-TTI, Intel Xeon Phi, NVIDIA K20 GPU, stencil computing, many-core, multicore, seismic imaging

    0 Introduction

    3D reverse time migration in tilted transversely isotropic media (3D RTM-TTI) is the most precise model used in complex seismic imaging, which remains challenging due to technological complexity, stability, computational cost and the difficulty of estimating anisotropic parameters for TTI media[1,2]. The reverse time migration (RTM) model was first introduced by Baysal in 1983[3]. The 3D RTM-TTI model is more recent[1,2,4] and is much more precise and intricate for complex seismic imaging. Normally, RTM-TTI needs thousands of iterations to produce image data at a given precision. On our practical medium-scale data set, it takes around 606 minutes to run 1024 iterations with five processes on Intel Xeon processors, and the cost grows further with larger data sets or more iterations for higher accuracy. This enormous computing time prevents 3D RTM-TTI from being widely used in industry.

    The limitations of current VLSI technology, resulting in the memory wall, power wall and ILP wall, together with the desire to turn the ever-increasing number of transistors dictated by Moore's Law into faster computers, have led most hardware manufacturers to design multicore processors and specialized hardware accelerators. In the last few years, specialized hardware accelerators such as the Cell B.E.[5] and general-purpose graphics processing units (GPGPUs)[6] have attracted the interest of developers of scientific computing libraries. The more recent Intel Xeon Phi[7] has also appeared in the Graph500 rankings. These accelerators feature high energy efficiency and a high performance-to-price ratio. Our work addresses the enormous computing time of 3D RTM-TTI by utilizing them.

    The core computation of the RTM model is a combination of three basic stencil calculations: the x-stencil, y-stencil and z-stencil, as explained later. Although existing stencil optimization methods could be adopted on GPU and CPU, it is more compelling to design a more efficient parallel RTM-TTI by exploiting the relationships among these stencils. Besides, there is little performance optimization research on Intel Xeon Phi; fundamental work on Xeon Phi is needed to identify the similarities and differences of the three platforms.

    In this paper, the implementation and optimization of the 3D RTM-TTI algorithm on CPUs, Intel Xeon Phi and GPU are presented, considering both architectural features and algorithm characteristics. Taking the algorithm characteristics into account, a task partitioning method with low data coupling is designed. Considering architectural features, a series of optimization methods is adopted, explicitly or implicitly, to reduce high-latency memory accesses and the number of memory accesses. On CPU and Xeon Phi, we start from parallelization with multi-threading and vectorization; kernel memory access is then optimized by cache blocking, huge pages and loop splitting. On GPU, considering the GPU memory hierarchy, a new 1-pass algorithm is devised to reduce computation and global memory accesses. The main contributions of this paper can be summarized as follows:

    1. The complex 3D RTM-TTI algorithm is systematically implemented and evaluated on three different platforms: CPU, GPU, and Xeon Phi. To the best of our knowledge, this is the first time 3D RTM-TTI has been implemented and evaluated on all three platforms at the same time.

    2. With deliberate optimizations, 3D RTM-TTI obtains considerable performance speedups, which makes RTM-TTI practical in industry.

    3. The optimization methods are quantitatively evaluated, which may guide other developers and provide software-level insight into the architectures. By analyzing the process of designing the parallel codes, some general guidelines for writing and optimizing parallel programs on Xeon Phi, GPUs and CPUs are given.

    The rest of the paper is organized as follows: An overview of the algorithm and platforms is given in Section 1. Sections 2 and 3 highlight the optimization strategies used on CPU and Xeon Phi, and on GPU, respectively. In Section 4, the experimental results and their analysis are presented. Related work is discussed in Section 5. Section 6 concludes the paper.

    1 Background

    To make this paper self-contained, a brief introduction is given to the 3D RTM-TTI algorithm; then the architectures and programming models of Intel MIC and the NVIDIA K20 GPU are described respectively.

    1.1 Sequential algorithm

    RTM is a reverse-engineering process. The main technique in seismic imaging is to generate acoustic waves and record the earth's response at some distance from the source. RTM models the propagation of waves in the earth with the two-way wave equation, once from the source and once from the receiver. The acoustic isotropic wave can be written as partial differential equations[8]. Fig.1 shows the overall 3D RTM-TTI algorithm, which is composed of a shot loop, a nested iteration loop and a nested grid loop. Inside each iteration, it computes the forward- and backward-propagated wave fields, boundary processing and cross correlation. Timing profiles show that most of the computation time of the 3D RTM-TTI algorithm is spent in the wave-field updating step. Fig.2 shows the main wave updating operations within RTM after discretization of the partial differential equations. The wave updating function is composed of derivative computations; like most finite-difference computations, they belong to stencil computing. Three base stencils are combined to form the xy, yz and xz stencils, as Fig.3 shows. Each cell in the wave field needs a 9×9×9 cube to update, as Fig.4 shows. All three stencils have overlapping memory accesses.
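    The basic building block of this kernel can be illustrated with a one-axis derivative stencil. The sketch below, a hedged illustration rather than the paper's actual kernel, uses standard 8th-order central-difference coefficients, whose 9-point (radius-4) footprint matches the 9×9×9 update cube mentioned above; the coefficients actually used in the production code are not given in the paper.

    ```python
    # Standard 8th-order central-difference coefficients for d/dx (unit grid
    # spacing), weights for offsets 1..4. These are assumed, illustrative values.
    C = [4/5, -1/5, 4/105, -1/280]

    def dx_stencil(u, i):
        """First derivative of 1D array u at interior index i (radius 4)."""
        return sum(c * (u[i + k] - u[i - k]) for k, c in enumerate(C, start=1))

    # Sanity check: on a linear ramp u[i] = i, the derivative is exactly 1.
    u = [float(i) for i in range(16)]
    d = dx_stencil(u, 8)
    ```

    Applying the same pattern along y and z, and composing pairs of them, yields the xy, yz and xz stencils of Fig.3.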

    1.2 Architecture of Xeon Phi

    Xeon Phi (also called MIC)[7] is a brand name given to a series of manycore architectures. Knights Corner is the codename of Intel's second-generation manycore architecture, which comprises up to sixty-one processor cores connected by a high-performance on-die bidirectional interconnect. Each core supports 4 hardware threads. Each thread replicates some of the architectural state, including registers, which makes switching between hardware threads very fast. In addition to the IA cores, there are 8 memory controllers supporting up to 16 GDDR5 channels delivering up to 5.5GT/s. Each MIC core has two in-order pipelines, one scalar and one vector, and 32 registers of 512-bit width. Programs on the Phi can run either natively, as on a CPU, or in offload mode, as on a GPU.

    1.3 Kepler GPU architecture

    An NVIDIA GPU[6] is presented as a set of multiprocessors, each equipped with its own CUDA cores and shared memory (a user-managed cache). Kepler is the codename for the GPU microarchitecture developed by NVIDIA as the successor to Fermi. It has 13 to 15 SMX units; the K20 has 13. All multiprocessors have access to global device memory. Memory latency is hidden by executing thousands of threads concurrently. Registers and shared memory resources are partitioned among the currently executing threads, so context switching between threads is essentially free.

    Fig.3 One wave field point updating

    Fig.4 Stencil in a cubic

    2 Implementation and optimization on Intel Xeon Phi and CPU

    Because the programming models of Intel Xeon Phi and CPU are similar, optimizing RTM on the two platforms is also similar; the optimization methods for both are presented in detail in this section.

    2.1 Parallelization

    2.1.1 Multi-threading

    The Intel threading building blocks (TBB) thread library is used to parallelize the 3D RTM-TTI code on CPU and Xeon Phi. Since the grid size is much larger than the number of threads, the task is partitioned into 3D sub-cubes. Fig.5 demonstrates the TBB template for 3D task partition; the task size is (bx, by, bz). On the CPU and Xeon Phi platforms, each thread computes derivatives within its sub-cube. An auto-tuning technique is used to search for the best number of threads: for the RTM application, the optimal number of threads on Xeon Phi is 120, and the best number on one Intel Xeon CPU NUMA node is 12.
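    The 3D partition described above can be sketched as follows. This is a hedged illustration using a plain thread pool in place of TBB; the tile sizes and the per-tile work are placeholders, not values from the paper's code.

    ```python
    # Tile an (nx, ny, nz) grid into (bx, by, bz) sub-cubes and hand each
    # sub-cube to a worker thread, mimicking TBB's blocked_range3d partition.
    from concurrent.futures import ThreadPoolExecutor

    def tiles(n, b):
        """Split [0, n) into [start, stop) chunks of width b (last may be short)."""
        return [(s, min(s + b, n)) for s in range(0, n, b)]

    def partition_3d(nx, ny, nz, bx, by, bz):
        return [(tx, ty, tz)
                for tx in tiles(nx, bx)
                for ty in tiles(ny, by)
                for tz in tiles(nz, bz)]

    def process(task):
        # The real kernel would compute derivatives for every cell in the tile;
        # here we just count the cells to show full, non-overlapping coverage.
        (x0, x1), (y0, y1), (z0, z1) = task
        return (x1 - x0) * (y1 - y0) * (z1 - z0)

    # Grid dimensions from the experiments; tile sizes are illustrative.
    tasks = partition_3d(512, 312, 301, 64, 32, 301)
    with ThreadPoolExecutor(max_workers=8) as pool:
        cells = sum(pool.map(process, tasks))
    ```

    Because the tiles partition the grid exactly, the per-tile cell counts sum back to the full 512×312×301 grid, and having more tasks than threads lets the runtime balance the load.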

    2.1.2 Instruction level parallel: SIMDization

    One of the most remarkable features of Xeon Phi is its vector computing unit. The vector length is 512 bits, wider than the CPU's 256-bit AVX vectors. One Xeon Phi vector instruction can thus process 512/(8×4) = 16 single-precision floats at once. Vector instructions are used by unrolling the innermost loop and applying the #pragma simd directive.
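    The lane counts behind these figures are simple arithmetic: a register of a given bit width holds (width / (8 bits per byte × 4 bytes per float)) single-precision values.

    ```python
    # Vector lanes per instruction: register bits / (8 bits/byte * 4 bytes/float).
    def float_lanes(register_bits):
        return register_bits // (8 * 4)

    phi_lanes = float_lanes(512)  # Xeon Phi 512-bit vectors
    avx_lanes = float_lanes(256)  # CPU AVX 256-bit vectors
    ```

    This 16-vs-8 lane ratio is why the ideal single-precision SIMD speedup quoted later for the Phi is 16X.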

    2.2 Memory optimization

    2.2.1 Cache blocking

    Cache blocking is a standard technique for improving cache reuse, because it reduces the memory bandwidth requirement of an algorithm. The data set on a single computing node in our application is 4.6GB, whereas the cache size of the processors in the CPU and Xeon Phi is limited to a few MBs. The fact that higher performance can be achieved for smaller data sets fitting into cache memory suggests a divide-and-conquer strategy for larger problems. Cache blocking is an effective way to improve locality: it increases spatial locality, i.e. referencing nearby memory addresses consecutively, and reduces the effective memory access time of the application by keeping blocks of future array references in the cache for reuse. Since the total data used is far beyond cache capacity and the memory access is non-contiguous, cache misses are unavoidable. It is easy to implement cache blocking on the basis of our previous parallel TBB implementation: because TBB is a task-based thread library, each thread can execute several tasks, so a parallel program can have more tasks than threads. The task size (bx, by, bz) is adjusted to a small cube that can be covered by the L2 cache.
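    The traversal-order change at the heart of cache blocking can be sketched on a small 2D grid. This is a hedged illustration: the tile size and grid are placeholders, and the real code blocks in 3D via the TBB task size.

    ```python
    # Loop tiling: visit the grid tile-by-tile, so each b x b tile's working
    # set can stay resident in cache, instead of streaming whole rows.
    def sweep_blocked(n, b):
        """Return the visit order of all (i, j) cells of an n x n grid,
        traversed in b x b tiles."""
        order = []
        for ii in range(0, n, b):          # tile origin in i
            for jj in range(0, n, b):      # tile origin in j
                for i in range(ii, min(ii + b, n)):
                    for j in range(jj, min(jj + b, n)):
                        order.append((i, j))
        return order

    visited = sweep_blocked(8, 4)
    ```

    The blocked sweep touches exactly the same cells as a plain row-major sweep; only the order changes, trading no extra work for far better reuse of cached neighbors.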

    2.2.2 Loop splitting

    Loop splitting, or loop fission, is a simple approach that breaks a loop into two or more smaller loops. It is especially useful for reducing the cache pressure of a kernel, which translates into better occupancy and overall performance improvement. If multiple operations inside a loop body rely on different inputs and these operations are independent, then loop splitting can be applied. The splitting leads to smaller loop bodies and hence reduces the loop's register pressure. The data flows of P and Q are quite decoupled, so it is better to split them to reduce cache pressure and iterate over the data sets P and Q separately.
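    The transformation can be sketched as below. The update rules are placeholders (the paper's actual P and Q updates are the discretized wave equations); the point is that when the two updates are independent, the split loops produce identical results while each loop streams only one array.

    ```python
    # Fused loop: one body touches both wave fields P and Q per iteration.
    def update_fused(P, Q):
        for i in range(len(P)):
            P[i] = P[i] + 1.0   # placeholder for the P wave-field update
            Q[i] = Q[i] * 2.0   # placeholder for the Q wave-field update

    # After loop fission: each loop touches only one array, halving the
    # per-iteration working set and easing cache/register pressure.
    def update_split(P, Q):
        for i in range(len(P)):
            P[i] = P[i] + 1.0
        for i in range(len(Q)):
            Q[i] = Q[i] * 2.0

    P1, Q1 = [0.0] * 8, [1.0] * 8
    P2, Q2 = [0.0] * 8, [1.0] * 8
    update_fused(P1, Q1)
    update_split(P2, Q2)
    ```

    The legality condition stated above is visible here: neither placeholder update reads the other array, so reordering across the split is safe.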

    2.2.3 Huge page table

    Since TLB misses are expensive, the TLB hit rate can be improved by mapping large contiguous physical memory regions with a small number of pages, so that fewer TLB entries are required to cover larger virtual address ranges. A reduced page table size also means reduced memory-management overhead. To use larger page sizes for shared memory, huge pages must be enabled, which also locks these pages in physical memory. The total memory used is 4.67GB, so more than one million pages of 4kB size would be needed, which exceeds what the L1 and L2 TLBs can hold. By observing the algorithm, it is found that P and Q are used many times, so huge pages are allocated for them; regular 4kB pages and huge pages are used together. The usage is simple: first, interact with the OS by writing the desired count into the proc filesystem to reserve enough huge pages; then use the mmap function to map huge-page files into process memory.
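    The arithmetic behind the page counts above can be checked directly. The 2MB huge-page size below is the common x86 size and is an assumption; the paper does not state which huge-page size was used.

    ```python
    # TLB coverage arithmetic for the 4.67 GB working set.
    working_set = 4.67 * 2**30            # bytes
    small_pages = working_set / (4 * 2**10)   # 4 kB base pages
    huge_pages  = working_set / (2 * 2**20)   # 2 MB huge pages (assumed size)
    ```

    Over a million 4kB pages cannot all be covered by a TLB with on the order of tens to hundreds of entries, while a few thousand huge pages come within reach of the second-level TLB, which is why huge pages cut the L2 TLB miss rate.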

    3 Implementation and optimizations on GPU

    3.1 GPU implementation

    RTM proceeds by computing a series of derivatives and combining them to update the wave fields P and Q. In the GPU implementation, there are several separate kernels, one for each derivative. Without loss of generality, we give an example of how to compute dxy in parallel. The output of this step is a 3D grid of dxy, and the task partition is based on this result grid: each thread computes nz points, each block computes a bx·by panel, and the blocks together cover the whole grid.
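    The index arithmetic of this partition can be sketched as follows; it mirrors CUDA's blockIdx/threadIdx decomposition, with illustrative block dimensions rather than the paper's tuned values.

    ```python
    # Each (x, y) grid column is owned by exactly one thread, which walks all
    # nz points in z. Blocks of bx*by threads cover bx-by-by panels of the
    # nx-by-ny plane, exactly as a CUDA launch with blockDim = (bx, by) would.
    def column_owner(x, y, bx, by):
        """Map column (x, y) to its (block index, thread-in-block index)."""
        block = (x // bx, y // by)
        thread = (x % bx, y % by)
        return block, thread

    # Example with illustrative block dimensions 64 x 4:
    block, thread = column_owner(70, 5, 64, 4)
    ```

    Making each thread own a whole z-column keeps its global reads for consecutive z points coalesced with its neighbors in the same panel.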

    3.2 Computing reduction and 1-pass algorithm optimization

    Fig.3 shows several kinds of derivatives. The traditional 2-pass computation first computes the first-order derivatives dx, dy, dz, and then computes dxy, dyz, dxz from them. This method incurs additional global reads, global writes and storage space. A method to reduce global memory accesses is devised using shared memory and registers: the 1-pass algorithm. As in the 2-pass algorithm, each thread computes a z-direction column of dxy; the first-order xy-panel is stored in shared memory, and register double buffering is used to reduce shared memory reads. Fig.6 shows a snapshot of the register buffering.
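    The 2-pass/1-pass contrast can be shown on a tiny 2D example. This is a hedged sketch with radius-1 central differences and Python containers; the real kernel uses the radius-4 stencils above and keeps the live dx values in shared memory and registers instead of recomputing them from global memory.

    ```python
    def dx(u, i, j):
        """Radius-1 central difference in x at interior point (i, j)."""
        return (u[i + 1][j] - u[i - 1][j]) / 2.0

    def dxy_2pass(u, n):
        # Pass 1: materialize the full dx grid (the extra global writes/storage).
        dxg = [[dx(u, i, j) if 0 < i < n - 1 else 0.0
                for j in range(n)] for i in range(n)]
        # Pass 2: read dx back to form the mixed derivative dxy.
        return {(i, j): (dxg[i][j + 1] - dxg[i][j - 1]) / 2.0
                for i in range(1, n - 1) for j in range(1, n - 1)}

    def dxy_1pass(u, n):
        # Fused: produce only the two dx values needed for each output point,
        # never storing the intermediate grid.
        return {(i, j): (dx(u, i, j + 1) - dx(u, i, j - 1)) / 2.0
                for i in range(1, n - 1) for j in range(1, n - 1)}

    # On u(i, j) = i*j the mixed derivative d2u/dxdy is exactly 1 everywhere.
    u = [[float(i * j) for j in range(6)] for i in range(6)]
    a, b = dxy_2pass(u, 6), dxy_1pass(u, 6)
    ```

    Both schemes compute identical values; the 1-pass version simply trades the intermediate grid's global traffic for on-chip reuse (shared memory for the xy-panel, register double buffering along z).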

    Fig.6 1-pass computing window snapshot

    4 Evaluation

    4.1 Experiment setup

    The experiments are conducted on three platforms; the main parameters are listed in Table 1. The input of RTM is single-pulse data with a grid dimension of 512×312×301. The algorithm iterates 1000 times, and the times reported in this section are the average time of one iteration.

    Table 1 Architecture parameters

    4.2 Overall performance

    Fig.7 shows the performance comparison of the three platforms. Our optimized 3D RTM-TTI gains considerable speedup: the hotspot function gains 35.99X on two Intel Xeon CPUs, 89.75X on one Intel Xeon Phi, and 89.92X on one NVIDIA K20 GPU compared with the serial CPU baseline. This makes RTM-TTI practical in industry. The results also show clearly that the accelerators handle the 3D RTM-TTI algorithm better than traditional CPUs: the hotspot function runs around 2.5X faster on GPU and Xeon Phi than on the two CPUs. On one hand, the data dependencies in the RTM algorithm are decoupled, so plenty of parallelism can be exploited, and the accelerators offer more cores, more threads and wider vector instructions; for example, Xeon Phi has 60 computing cores plus 512-bit vector instructions, and the Tesla K20 GPU has 2496 cores, making accelerators well suited to data-parallel computing. On the other hand, RTM is a memory-bound application, and accelerators like Xeon Phi and GPU have 7X and 5X higher theoretical memory bandwidth than the CPU, as shown in Table 1.

    Fig.7 Performance evaluations of three platforms

    4.3 Performance analysis

    On the CPU, the wave updating function gains a 35.99X speedup compared with the single-thread CPU baseline: 20.12X comes from the parallelism of multi-threading and vector instructions, and 1.96X comes from memory optimizations such as cache blocking, loop splitting and huge page configuration, as Figs 8 and 9 show.

    Fig.10 and Fig.11 show the parallelism and memory optimization performance on Xeon Phi, respectively. RTM gains 13.81X from the 512-bit vector instructions on the Phi. From Table 1, the ideal speedup for single-precision SIMD on Xeon Phi is 16X, so the SIMD speedup is close to the ideal; the remaining gap is due to cache misses that stall the pipeline. Multi-threading on Xeon Phi gains a 40.13X speedup on its 60 cores, so Xeon Phi scales well in both multi-threading and wide vector instructions. RTM gains 2.08X from cache blocking, because cache blocking reduces the cache miss rate and provides good memory locality, which benefits both SIMD and multi-threading. RTM gains 1.44X from huge pages by reducing the L2 TLB miss rate, and 1.69X from loop splitting by reducing cache pressure. Compared on the same platform, a 2806.13X speedup is gained over the single-thread Xeon Phi baseline: 554.53X from the parallelism of multi-threading and vector instructions, and 5.06X from memory optimization. The Intel Phi is thus more sensitive to data locality, as indicated by the larger gains from explicit memory optimization.

    Fig.8 Parallelism evaluation on CPU (MT:multi-threading, Vec: vectorization)

    Fig.9 Memory optimization on CPU (Ca: cache blocking, Sp:splitting)

    As Fig.12 shows, RTM gains a 1.23X speedup from the 1-pass algorithm on GPU, and a further 1.20X from using texture memory in the 1-pass algorithm. In total, the hot spot function gains a 2.33X speedup over the baseline parallel GPU implementation. Thread block and grid configuration are very important to application performance, and making full use of fast memory, such as shared and texture memory, benefits the application substantially. Explicit data locality plays an important role in application performance on GPU.

    Fig.10 Parallelization on Phi

    Fig.11 Memory optimization on Phi (HP:huge page)

    Fig.12 Memory optimization on GPU evaluation

    5 Related work

    Araya-Polo, et al.[9] assessed the RTM algorithm on three kinds of accelerators, IBM Cell/B.E., GPU, and FPGA, and suggested a wish list for programming models and architecture design. However, they only listed optimization methods and did not evaluate their impact on RTM performance quantitatively; their paper also predates Intel Xeon Phi, so Xeon Phi performance is not included. In this paper, we choose more popular platforms and evaluate each optimization method quantitatively. Heinecke, et al.[10] discussed the performance of regression and classification algorithms for data mining problems on Intel Xeon Phi and GPGPU, and demonstrated that Intel Xeon Phi handled sparse problems better than GPU with fewer optimizations and less porting effort. Micikevicius[11] optimized RTM on GPU and demonstrated considerable speedups. Our work differs in that the model in his paper uses the average derivative method, while our model is 3D RTM-TTI, which is more compelling.

    6 Conclusion and Future work

    In this paper, the enormously time-consuming but important seismic imaging application 3D RTM-TTI was addressed with parallel solutions, and our optimization experience on three platforms, CPU, GPU, and Xeon Phi, was presented. To the best of our knowledge, this is the first simultaneous implementation and evaluation of 3D RTM-TTI on these three platforms. Our optimized 3D RTM-TTI gains considerable performance speedups. Optimization on the Intel Xeon Phi architecture is similar to that on the CPU due to the similar x86 architecture and programming model; thread parallelization, vectorization and explicit memory locality are particularly critical for achieving high performance on it. Vector instructions play an important role on Xeon Phi, and loop dependences must be eliminated in order to use them, otherwise performance suffers. On GPU, memory optimizations such as using shared memory and constant memory should be applied explicitly, and bank conflicts should be avoided to achieve higher practical bandwidth. In the future, we will evaluate our distributed 3D RTM-TTI algorithm and analyze its communication.

    [ 1] Alkhalifah T. An acoustic wave equation for anisotropic media. Geophysics, 2000, 65(4):1239-1250

    [ 2] Zhang H, Zhang Y. Reverse time migration in 3D heterogeneous TTI media. In: Proceedings of the 78th Society of Exploration Geophysicists Annual International Meeting, Las Vegas, USA, 2008. 2196-2200

    [ 3] Baysal E, Kosloff D D, Sherwood J W. Reverse time migration. Geophysics, 1983, 48(11):1514-1524

    [ 4] Zhou H, Zhang G, Bloor B. An anisotropic acoustic wave equation for modeling and migration in 2D TTI media. In: Proceedings of the 76th Society of Exploration Geophysicists Annual International Meeting, San Antonio, USA, 2006. 194-198

    [ 5] Gschwind M, Hofstee H P, Flachs B, et al. Synergistic processing in cell's multicore architecture. IEEE Micro, 2006, 26(2):10-24

    [ 6] NVIDIA Corporation. NVIDIA's next generation CUDA compute architecture: Fermi. http://www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf, White Paper, 2009

    [ 7] Intel Corporation. Intel Xeon Phi coprocessor system software developers guide. https://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-coprocessor-system-software-developers-guide.html, White Paper, 2014

    [ 8] Micikevicius P. 3D finite difference computation on GPUs using CUDA. In: Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, Washington, D.C., USA, 2009. 79-84

    [ 9] Araya-Polo M, Cabezas J, Hanzich M, et al. Assessing accelerator-based HPC reverse time migration. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(1):147-162

    [10] Heinecke A, Klemm M, Bungartz H J. From GPGPU to many-core: NVIDIA Fermi and Intel many integrated core architecture. Computing in Science & Engineering, 2012, 14(2):78-83

    [11] Zhou H, Ortigosa F, Lesage A C, et al. 3D reverse-time migration with hybrid finite difference pseudo spectral method. In: Proceedings of the 78th Society of Exploration Geophysicists Annual Meeting, Las Vegas, USA, 2008. 2257-2261

    Zhang Xiuxia, born in 1987, is a Ph.D candidate at Institute of Computing Technology, Chinese Academy of Sciences. Her research includes parallel computing, compiler and deep learning.

    10.3772/j.issn.1006-6748.2017.02.010

    ①Supported by the National Natural Science Foundation of China (No. 61432018).

    ②To whom correspondence should be addressed. E-mail: zhangxiuxia@ict.ac.cn

    Received on Apr. 16, 2016
