
    A case study of 3D RTM-TTI algorithm on multicore and many-core platforms①

2017-06-27
    High Technology Letters, 2017, No. 2

Zhang Xiuxia (張秀霞)②***, Tan Guangming*, Chen Mingyu*, Yao Erlin*

    (*State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, P.R.China) (**University of Chinese Academy of Sciences, Beijing 100049, P.R.China)


3D reverse time migration in tilted transversely isotropic media (3D RTM-TTI) is the most precise model for complex seismic imaging. However, the vast computing time of 3D RTM-TTI prevents it from being widely used. This work addresses the problem by providing parallel solutions for 3D RTM-TTI on multicore and many-core platforms. After data parallelism and memory optimization, the hot spot function of 3D RTM-TTI gains a 35.99X speedup on two Intel Xeon CPUs, an 89.75X speedup on one Intel Xeon Phi, and an 89.92X speedup on one NVIDIA K20 GPU compared with the serial CPU baseline. This study makes RTM-TTI practical in industry. Since the computation pattern in RTM is stencil-based, the approaches also benefit a wide range of stencil applications.

    3D RTM-TTI, Intel Xeon Phi, NVIDIA K20 GPU, stencil computing, many-core, multicore, seismic imaging

    0 Introduction

3D reverse time migration in tilted transverse isotropy (3D RTM-TTI) is the most precise model used in complex seismic imaging, which remains challenging due to technology complexity, stability, computational cost and the difficulty of estimating anisotropic parameters for TTI media[1,2]. The reverse time migration (RTM) model was first introduced by Baysal in 1983[3]. The 3D RTM-TTI model is more recent[1,2,4] and is much more precise and intricate for complex seismic imaging. Normally, RTM-TTI needs thousands of iterations to reach a given imaging precision. On our practical medium-scale data set, it takes around 606 minutes to iterate 1024 times with five processes on Intel Xeon processors, and the cost grows further with larger data sets or more iterations for more accurate results. This enormous computing time prevents 3D RTM-TTI from being widely used in industry.

The limitations of current VLSI technology (the memory wall, power wall and ILP wall), together with the desire to turn the ever-increasing number of transistors dictated by Moore's Law into faster computers, have led most hardware manufacturers to design multicore processors and specialized hardware accelerators. In the last few years, specialized accelerators such as the Cell B.E.[5] and general-purpose graphics processing units (GPGPUs)[6] have attracted the interest of developers of scientific computing libraries, and the more recent Intel Xeon Phi[7] has also appeared in the Graph500 rankings. These accelerators feature high energy efficiency and a high performance-price ratio. Our work addresses the enormous computing time of 3D RTM-TTI by utilizing them.

The core computation of the RTM model is a combination of three basic stencil calculations, the x-stencil, y-stencil and z-stencil, as explained later. Although existing stencil optimization methods could be adopted on GPU and CPU, it is more compelling than ever to design a more efficient parallel RTM-TTI by considering the relationship among these stencils. Besides, there is little performance optimization research on Intel Xeon Phi, so fundamental work on it is needed to find the similarities and differences of the three platforms.

In this paper, the implementation and optimization of the 3D RTM-TTI algorithm on CPUs, Intel Xeon Phi and GPU are presented, considering both architectural features and algorithm characteristics. Taking the algorithm characteristics into account, a task partitioning method with low data coupling is designed. Considering architectural features, a series of optimization methods is adopted, explicitly or implicitly, to reduce both the latency and the number of memory accesses. On CPU and Xeon Phi, we start from parallelization with multi-threading and vectorization, then optimize kernel memory access by cache blocking, huge pages and loop splitting. On GPU, considering the GPU memory hierarchy, a new 1-pass algorithm is devised to reduce computation and global memory accesses. The main contributions of this paper can be summarized as follows:

1. The complex 3D RTM-TTI algorithm is systematically implemented and evaluated on three different platforms, CPU, GPU, and Xeon Phi; to our knowledge this is the first time 3D RTM-TTI has been implemented and evaluated on all three platforms at the same time.

2. With deliberate optimizations, 3D RTM-TTI obtains considerable speedups, which makes RTM-TTI practical in industry.

3. The optimization methods are quantitatively evaluated, which may guide other developers and gives insight into the architectures from the software perspective. By analyzing the process of designing the parallel codes, general guidelines for writing and optimizing parallel programs on Xeon Phi, GPUs and CPUs are given.

The rest of the paper is organized as follows: an overview of the algorithm and platforms is given in Section 1. Sections 2 and 3 describe the optimization strategies used on CPU and Xeon Phi, and on GPU, respectively. In Section 4, the experimental results and their analysis are presented. Related work is discussed in Section 5, and Section 6 concludes.

    1 Background

To make this paper self-contained, a brief introduction to the 3D RTM-TTI algorithm is given first, and then the architectures and programming models of Intel MIC and the NVIDIA K20 GPU are described.

    1.1 Sequential algorithm

The RTM model is a reverse engineering process. The main technique for seismic imaging is to generate acoustic waves and record the earth's response at some distance from the source. RTM models the propagation of waves in the earth with the two-way wave equation, once from the source and once from the receiver. The acoustic isotropic wave can be written as partial differential equations[8]. Fig.1 shows the overall 3D RTM-TTI algorithm, which is composed of a shot loop, a nested iteration loop and a nested grid loop. Inside an iteration, it computes the forward and backward propagation wave fields, boundary processing and cross correlation. Profiling shows that most of the computation time of the 3D RTM-TTI algorithm is spent in the wave field updating step. Fig.2 shows the main wave updating operations within RTM after discretization of the partial differential equations. The wave updating function is composed of derivative computations; like most finite difference computations, they are stencil computations. Three base stencils are combined to form xy, yz and xz stencils, as Fig.3 shows. Each cell in the wave field needs a 9×9×9 cube to update, as Fig.4 shows. All three stencils have overlapping memory accesses.

    1.2 Architecture of Xeon Phi

Xeon Phi (also called MIC)[7] is a brand name for a series of many-core architectures. Knights Corner is the codename of Intel's second-generation many-core architecture, which comprises up to sixty-one processor cores connected by a high-performance on-die bidirectional interconnect. Each core supports 4 hardware threads. Each thread replicates some of the architectural state, including registers, which makes switching between hardware threads very fast. In addition to the IA cores, there are 8 memory controllers supporting up to 16 GDDR5 channels delivering up to 5.5 GT/s. Each MIC core has two in-order pipelines, a scalar pipeline and a vector pipeline, and 32 registers of 512-bit width. Programs on Phi can run both natively, like on a CPU, and in offload mode, like on a GPU.

    1.3 Kepler GPU architecture

An NVIDIA GPU[6] is presented as a set of multiprocessors, each equipped with its own CUDA cores and shared memory (a user-managed cache). Kepler is the codename for the GPU microarchitecture developed by NVIDIA as the successor to Fermi. It has 13 to 15 SMX units; the K20 has 13. All multiprocessors have access to global device memory. Memory latency is hidden by executing thousands of threads concurrently. Registers and shared memory are partitioned among the currently executing threads, and context switching between threads is free.

    Fig.3 One wave field point updating

Fig.4 Stencil in a cube

    2 Implementation and optimization on Intel Xeon Phi and CPU

Optimizing RTM on Intel Xeon Phi and CPU is similar due to their similar programming models, so the optimization methods for these two platforms are presented together in this section.

    2.1 Parallelization

    2.1.1 Multi-threading

The Intel Threading Building Blocks (TBB) thread library is used to parallelize the 3D RTM-TTI code on CPU and Xeon Phi. Since the grid size is much larger than the thread count, the task is partitioned into 3D sub-cubes. Fig.5 demonstrates the TBB template for 3D task partition, where the task size is (bx, by, bz). On the CPU and Xeon Phi platforms, each thread computes the derivatives in its sub-cube. An auto-tuning technique is used to search for the best number of threads. For the RTM application, the optimal number of threads on Xeon Phi is 120, and the best thread count on the Intel Xeon NUMA CPUs is 12.
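The 3D sub-cube partition can be sketched as follows. This is a plain-threads analog of the TBB task model, not the paper's code (the function name, the atomic task counter, and the sizes are our assumptions): each worker pulls sub-cube tasks until the grid is covered, so there can be more tasks than threads, as the text describes.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Cut the (nx,ny,nz) grid into (bx,by,bz) sub-cubes; each sub-cube is one
// task, and nthreads workers pull tasks from a shared atomic counter.
template <typename F>
void parallel_3d(int nx, int ny, int nz, int bx, int by, int bz,
                 int nthreads, F body) {
    int tx = (nx + bx - 1) / bx, ty = (ny + by - 1) / by,
        tz = (nz + bz - 1) / bz;
    int total = tx * ty * tz;
    std::atomic<int> next{0};
    auto worker = [&] {
        for (int t; (t = next.fetch_add(1)) < total; ) {
            int ix = t % tx, iy = (t / tx) % ty, iz = t / (tx * ty);
            body(ix * bx, std::min(nx, (ix + 1) * bx),   // sub-cube bounds
                 iy * by, std::min(ny, (iy + 1) * by),
                 iz * bz, std::min(nz, (iz + 1) * bz));
        }
    };
    std::vector<std::thread> pool;
    for (int i = 0; i < nthreads; ++i) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

A task-stealing counter like this is also what makes cache blocking (Section 2.2.1) easy to layer on top: shrinking (bx, by, bz) just creates more tasks per thread.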

    2.1.2 Instruction level parallel: SIMDization

One of the most remarkable features of Xeon Phi is its vector computing unit. The vector length is 512 bits, larger than the CPU's 256-bit AVX vectors. One Xeon Phi vector instruction can therefore process 512/32 = 16 single-precision floats at once. Vector instructions are used by unrolling the innermost loop and applying the #pragma simd directive.
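A minimal sketch of the vectorization pattern (the loop body is a placeholder, not the RTM derivative; we use the portable `#pragma omp simd` spelling rather than the Intel-specific `#pragma simd`): a unit-stride innermost loop with no loop-carried dependence, annotated so the compiler emits SIMD code.

```cpp
// With 512-bit vectors, one iteration batch covers 512/32 = 16 floats.
// The pragma asserts independence across iterations; it is ignored
// harmlessly by compilers that do not recognize it.
void axpy(const float* a, const float* b, float* out, int n) {
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        out[i] = 2.0f * a[i] + b[i];
}
```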

2.2 Memory optimization

2.2.1 Cache blocking

Cache blocking is a standard technique for improving cache reuse, because it reduces the memory bandwidth requirement of an algorithm. The data set on a single computing node in our application is 4.6GB, whereas the cache size of the processors in CPU and Xeon Phi is limited to a few MBs. The fact that higher performance can be achieved for smaller data sets fitting into cache memory suggests a divide-and-conquer strategy for larger problems. Cache blocking increases spatial locality, i.e. referencing nearby memory addresses consecutively, and reduces the effective memory access time of the application by keeping blocks of future array references in the cache for reuse. Since the total data used is far beyond cache capacity and the memory access is non-contiguous, cache misses are unavoidable. Cache blocking is easy to implement on top of our previous parallel TBB implementation: because TBB is a task-based thread library, each thread can execute several tasks, so a parallel program can have more tasks than threads. The task size (bx, by, bz) is adjusted to a small cube that can be covered by the L2 cache.
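The blocking structure can be sketched on a 2D slice as below, with hypothetical block sizes and a placeholder loop body (the real kernel applies the stencil inside each tile): the (bx, by) tile is chosen small enough to stay resident in L2, so the stencil's repeated neighbour reads hit cache instead of DRAM.

```cpp
#include <algorithm>

// Outer loops walk tiles; inner loops stay inside one cache-resident tile.
void blocked_sweep(float* q, const float* p, int nx, int ny, int bx, int by) {
    for (int yy = 0; yy < ny; yy += by)
        for (int xx = 0; xx < nx; xx += bx)
            for (int y = yy; y < std::min(ny, yy + by); ++y)
                for (int x = xx; x < std::min(nx, xx + bx); ++x)
                    q[y * nx + x] = 0.5f * p[y * nx + x];  // placeholder body
}
```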

2.2.2 Loop splitting

Loop splitting, or loop fission, is a simple approach that breaks a loop into two or more smaller loops. It is especially useful for reducing the cache pressure of a kernel, which can translate into better occupancy and overall performance improvement. If multiple operations inside a loop body rely on different inputs and these operations are independent, loop splitting can be applied. The splitting leads to smaller loop bodies and hence reduces register pressure. The data flows of P and Q are quite decoupled, so it is better to split them to reduce cache pressure, iterating over data sets P and Q respectively.

2.2.3 Huge page table

Since TLB misses are expensive, the TLB hit rate can be improved by mapping large contiguous physical memory regions with a small number of pages, so that fewer TLB entries are required to cover larger virtual address ranges. A reduced page table size also means a reduction in memory management overhead. To use larger page sizes for shared memory, huge pages must be enabled, which also locks these pages in physical memory. The total memory used is 4.67GB, so more than 1M pages of 4kB size would be needed, exceeding what the L1 and L2 TLBs can hold. Observation of the algorithm shows that P and Q are used many times, so huge pages are allocated for them; regular 4kB pages and huge pages are used together. The usage is simple: first, interact with the OS by writing our input into the proc directory to reserve enough huge pages, then use the mmap function to map the huge page files into process memory.
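On Linux, the described mixed huge/regular allocation can be sketched as below. This is an assumed variant, not the paper's code: the paper maps huge page files after reserving pages through /proc (e.g. /proc/sys/vm/nr_hugepages), while this sketch uses anonymous MAP_HUGETLB with a fallback to regular 4kB pages when no huge pages are reserved.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Try to back a large wave-field allocation with 2MB huge pages; fall back
// to normal 4kB pages if the system has none reserved. MAP_HUGETLB lengths
// should be a multiple of the huge page size.
void* alloc_wavefield(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED)          // no huge pages available: regular pages
        p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}
```

The fallback keeps the program correct on machines where huge pages are not configured; only the TLB behaviour differs.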

    3 Implementation and optimizations on GPU

    3.1 GPU implementation

RTM computes a series of derivatives and combines them to update the wave fields P and Q. In the GPU implementation, there are several separate kernels, one per derivative. Without loss of generality, we give an example of how to compute dxy in parallel. The output of this step is a 3D grid of dxy, and the task partition is based on this result: each thread computes nz points, each block computes a bx·by panel, and the blocks together cover the total grid.
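The partition can be sketched in plain C++ as a serial stand-in for the CUDA launch (block/thread names and sizes are illustrative, not the paper's kernel): each thread (tx, ty) in a (bx, by) block owns one (x, y) column and walks all nz points in z, and the grid of blocks tiles the whole nx×ny panel.

```cpp
#include <vector>

// Mark how many "threads" touch each cell; a correct partition covers every
// cell exactly once, with an edge guard for partial blocks.
void dxy_partition(int nx, int ny, int nz, int bx, int by,
                   std::vector<int>& owner) {
    for (int blockY = 0; blockY * by < ny; ++blockY)
        for (int blockX = 0; blockX * bx < nx; ++blockX)
            for (int ty = 0; ty < by; ++ty)
                for (int tx = 0; tx < bx; ++tx) {
                    int x = blockX * bx + tx, y = blockY * by + ty;
                    if (x >= nx || y >= ny) continue;   // grid edge guard
                    for (int z = 0; z < nz; ++z)        // one column/thread
                        owner[(z * ny + y) * nx + x] += 1;
                }
}
```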

3.2 Computing reduction and 1-pass algorithm optimization

Fig.3 shows several kinds of derivatives. The traditional 2-pass computation first computes the 1st-order derivatives dx, dy, dz, and then computes dxy, dyz, dxz from them. This method brings additional global reads, global writes and storage space. A method to reduce global memory accesses is devised by using shared memory and registers: the 1-pass algorithm. As in the 2-pass algorithm, each thread computes a z-direction column of dxy. The 1st-order result for an xy panel is stored in shared memory, and register double buffering is used to reduce shared memory reads. Fig.6 shows a snapshot of the register buffering.
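The register double buffering idea can be shown with a CPU analog, simplified by us to a 1D march along z (the paper's GPU version buffers an xy panel in shared memory instead): the plane values a stencil point needs are kept in scalar "registers" and rotated, so each step issues one new load instead of re-reading all neighbours.

```cpp
#include <vector>

// Rolling 3-point window along z computing a second derivative: one new
// load per step (above), then the window rotates down.
void z_stencil_1pass(const std::vector<float>& p, std::vector<float>& out,
                     int nz) {
    float below = p[0], mid = p[1], above;
    for (int z = 1; z < nz - 1; ++z) {
        above = p[z + 1];                      // the only new load per step
        out[z] = above - 2.0f * mid + below;   // d2/dz2 at z
        below = mid;                           // rotate the window
        mid = above;
    }
}
```

On the GPU the same rotation happens in registers while the xy panel of 1st-order results sits in shared memory, which is what removes the 2-pass algorithm's extra global traffic.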

    Fig.6 1-pass computing window snapshot

    4 Evaluation

    4.1 Experiment setup

The experiment is conducted on three platforms, whose main parameters are listed in Table 1. The input of RTM is single-pulse data with a grid dimension of 512×312×301. The algorithm iterates 1000 times, and the times reported in this section are the average time of one iteration.

    Table 1 Architecture parameters

    4.2 Overall performance

Fig.7 shows the performance comparison of the three platforms. Our optimized 3D RTM-TTI gains considerable speedups: the hotspot function of 3D RTM-TTI gains 35.99X on two Intel Xeon CPUs, 89.75X on one Intel Xeon Phi, and 89.92X on one NVIDIA K20 GPU compared with the serial CPU baseline. Our work makes RTM-TTI practical in industry. The results also clearly show that accelerators are better at the 3D RTM-TTI algorithm than traditional CPUs: the hotspot function gains around 2.5X more speedup on GPU and Xeon Phi than on two CPUs. On one hand, because the data dependencies in the RTM algorithm are decoupled, plenty of parallelism can be exploited, and accelerators have more cores, more threads and wider vector instructions; for example, Xeon Phi has 60 computing cores and 512-bit vector instructions, and the Tesla K20 GPU has 2496 cores, so accelerators are good at data-parallel computing. On the other hand, RTM is a memory-bound application, and accelerators like Xeon Phi and GPU have 7X and 5X more theoretical memory bandwidth than the CPU, as shown in Table 1.

    Fig.7 Performance evaluations of three platforms

    4.3 Performance analysis

On the CPU, the wave updating function gains a 35.99X speedup compared with the single-thread CPU baseline: 20.12X comes from the parallelism of multi-threading and vector instructions, and a further 1.96X from memory optimizations such as cache blocking, loop splitting and huge page configuration, as Figs 8 and 9 show.

Fig.10 and Fig.11 show the parallelism and memory optimization performance on Xeon Phi respectively. RTM gains 13.81X from using 512-bit vector instructions on Phi. From Table 1, the ideal speedup for single-precision SIMD on Xeon Phi is 16X, so SIMD is near the ideal limit; the remaining gap is due to cache misses stalling the pipeline. Multi-threading on Xeon Phi gains a 40.13X speedup on its 60 cores, so Xeon Phi scales very well in both multi-threading and wide vector instructions. RTM gains 2.08X from cache blocking, which reduces the cache miss rate and provides good memory locality benefiting both SIMD and multi-threading. RTM gains 1.44X from huge pages, which reduce the L2 TLB miss rate, and 1.69X from loop splitting, which reduces cache pressure. Compared on the same platform, a 2806.13X speedup is gained over the single-thread Xeon Phi baseline, of which 554.53X is from the parallelism of multi-threading and vector instructions and 5.06X from memory optimization. Intel Phi is thus more sensitive to data locality, as more of its speedup comes from explicit memory optimization.

    Fig.8 Parallelism evaluation on CPU (MT:multi-threading, Vec: vectorization)

    Fig.9 Memory optimization on CPU (Ca: cache blocking, Sp:splitting)

As Fig.12 shows, RTM gains a 1.23X speedup by using the 1-pass algorithm on GPU, and a further 1.20X by using texture memory in the 1-pass algorithm. In total, the hot spot function gains a 2.33X speedup compared with the baseline parallel GPU implementation. Thread block and grid selection are very important to application performance, and making full use of fast memory, such as shared memory and texture memory, benefits the application considerably. Explicit data locality plays an important role in application performance on GPU.

    Fig.10 Parallelization on Phi

    Fig.11 Memory optimization on Phi (HP:huge page)

    Fig.12 Memory optimization on GPU evaluation

    5 Related work

Araya-Polo[9] assessed the RTM algorithm on three kinds of accelerators, IBM Cell/B.E., GPU, and FPGA, and suggested a wish list for programming models and architecture design. However, they only listed some optimization methods and did not evaluate their impact on RTM performance quantitatively; their paper also predates Intel Xeon Phi, so Xeon Phi performance is not included. In this paper, we choose more popular platforms and evaluate each optimization method quantitatively. Heinecke[10] discussed the performance of regression and classification algorithms for data mining problems on Intel Xeon Phi and GPGPU, and demonstrated that Intel Xeon Phi was better at sparse problems than GPU with less optimization and porting effort. Micikevicius[11] optimized RTM on GPU and demonstrated considerable speedups. Our work differs in that the model in his paper uses the average derivative method, while our model is 3D RTM-TTI, which is more compelling.

    6 Conclusion and Future work

In this paper, we addressed the enormously time-consuming but important seismic imaging application 3D RTM-TTI with parallel solutions, and presented our optimization experience on three platforms: CPU, GPU, and Xeon Phi. To the best of our knowledge, this is the first simultaneous implementation and evaluation of 3D RTM-TTI on these three platforms. Our optimized 3D RTM-TTI gains considerable speedups. Optimization on the Intel Xeon Phi architecture is similar to the CPU due to the similar x86 architecture and programming model; thread parallelization, vectorization and explicit memory locality are particularly critical for achieving high performance on it. Vector instructions play an important role on Xeon Phi, and loop dependences must be removed in order to use them, otherwise performance is penalized. On GPU, memory optimizations should be made explicit, such as using shared memory and constant memory, and bank conflicts should be avoided to achieve higher effective bandwidth. In the future, we will evaluate our distributed 3D RTM-TTI algorithm and analyze its communication.

[ 1] Alkhalifah T. An acoustic wave equation for anisotropic media. Geophysics, 2000, 65(4):1239-1250

    [ 2] Zhang H, Zhang Y. Reverse time migration in 3D heterogeneous TTI media. In: Proceedings of the 78th Society of Exploration Geophysicists Annual International Meeting, Las Vegas, USA, 2008. 2196-2200

[ 3] Baysal E, Kosloff D D, Sherwood J W. Reverse time migration. Geophysics, 1983, 48(11):1514-1524

    [ 4] Zhou H, Zhang G, Bloor B. An anisotropic acoustic wave equation for modeling and migration in 2D TTI media. In: Proceedings of the 76th Society of Exploration Geophysicists Annual International Meeting, San Antonio, USA, 2006. 194-198

[ 5] Gschwind M, Hofstee H P, Flachs B, et al. Synergistic processing in cell's multicore architecture. IEEE Micro, 2006, 26(2):10-24

[ 6] NVIDIA Corporation. NVIDIA's next generation CUDA compute architecture: Fermi. http://www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf, White Paper, 2009

[ 7] Intel Corporation. Intel Xeon Phi coprocessor system software developers guide. https://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-coprocessor-system-software-developers-guide.html, White Paper, 2014

    [ 8] Micikevicius P. 3D finite difference computation on GPUs using CUDA. In: Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, Washington, D.C., USA, 2009. 79-84

[ 9] Araya-Polo M, Cabezas J, Hanzich M, et al. Assessing accelerator-based HPC reverse time migration. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(1):147-162

[10] Heinecke A, Klemm M, Bungartz H J. From GPGPU to many-core: NVIDIA Fermi and Intel many integrated core architecture. Computing in Science & Engineering, 2012, 14(2):78-83

    [11] Zhou H, Ortigosa F, Lesage A C, et al. 3D reverse-time migration with hybrid finite difference pseudo spectral method. In: Proceedings of the 78th Society of Exploration Geophysicists Annual Meeting, Las Vegas, USA, 2008. 2257-2261

Zhang Xiuxia, born in 1987, is a Ph.D candidate at the Institute of Computing Technology, Chinese Academy of Sciences. Her research includes parallel computing, compilers and deep learning.

    10.3772/j.issn.1006-6748.2017.02.010

    ①Supported by the National Natural Science Foundation of China (No. 61432018).

    ②To whom correspondence should be addressed. E-mail: zhangxiuxia@ict.ac.cn

Received on Apr. 16, 2016
