
Design and implementation of near-memory computing array architecture based on shared buffer

High Technology Letters, 2022, No. 4

SHAN Rui (山蕊), GAO Xu, FENG Yani, HUI Chao, CUI Xinyue, CHAI Miaomiao

(*School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R. China)

(**School of Computer, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R. China)

    Abstract

Key words: near-memory computing, shared buffer, reconfigurable array processor, convolutional neural network (CNN)

    0 Introduction

At present, the central processing unit (CPU) + graphics processing unit (GPU) combination is often used to accelerate convolutional neural networks (CNNs), but a large amount of data inevitably has to be transferred between CPU memory and GPU memory, and the long latency of this data transmission limits the improvement of system performance[1]. As the scale of deep learning grows, it places high demands on computing performance and memory capacity, and the 'Memory Wall' problem caused by the mismatch between processing speed and memory speed has become increasingly serious[2]. Hierarchical memory structures such as multi-level caches are generally used to reduce access latency and increase computing speed[3]. However, the three-level memory hierarchy composed of registers, cache, and dynamic random access memory (DRAM) can hardly meet the high-bandwidth, high-energy-efficiency memory access requirements of new applications. For this reason, large-scale processing systems are shifting from the traditional computing-centric model to a data-centric model[4].

A general near-data processing architecture was proposed in Ref.[5], suitable for concurrent data structures such as linked lists, skip lists, and first-in-first-out queues. However, experimental analysis shows that the potential benefit of concurrent data structures based on near-data processing is limited. Efficient synchronization between near-data processing cores is essential to exploit near-data processing fully and achieve high performance on parallel workloads; due to the lack of shared cache and hardware cache coherence, supporting inter-core synchronous communication is still a challenge for many near-data processing structures[6]. Owing to the limitations of DRAM technology, the processing efficiency of earlier near-data processing (NDP) designs was poor. With the advancement of through-silicon via (TSV) technology, however, three-dimensional (3D) stacked memory has been developed, which allows logic dies and DRAM dies to coexist and makes NDP easier to implement. In particular, GPU-based NDP is attractive because it can handle a wider range of algorithms. Although there is strong demand for efficient image processing in many fields, research on NDP for image processing remains insufficient; considering the common characteristics of image processing algorithms and the constraints of NDP, a widely applicable programmable NDP architecture is needed[7]. A logic unit was designed inside the three-dimensional stacked hybrid memory cube (HMC)[8], which allows all memory vaults of the HMC to execute CNNs in parallel. Taking into account the data-access-intensive characteristics of CNNs, static random-access memory (SRAM) was added to the logic unit to cache the convolution kernels used in the convolution operation, thereby reducing data communication with DRAM, but this also increases the memory cost.

Aiming at accelerating CNN computation to meet the demands of image processing algorithms, a near-memory computing array architecture based on a shared buffer is proposed, which enables flexible transfer of computing tasks under the requirements of different applications and fast switching between computing and memory access modes. Meanwhile, a CNN parallel mapping method is designed, which further improves the CNN computation speed by reducing memory access latency. The main contributions of this paper are summarized as follows.

(1) Exploiting the low ratio of computation to memory access that characterizes data-intensive applications, a coprocessor is designed in the near-memory computing array to reduce data communication between the reconfigurable array processor and the main memory. Through the tight integration of the near-memory computing array with the processors inside the reconfigurable array, CNN computation can be accelerated by near-memory computing without changing the programming interface of the existing architecture.

(2) Exploiting the memory access locality and computational parallelism of the near-memory computing array, an array-type shared buffer structure is designed to further improve the parallel access and computation speed of the near-memory computing array. The shared buffer array also supports non-aligned memory access modes and can update the access mode in real time.

The rest of this article is organized as follows. Section 1 introduces the related work and motivation. Section 2 describes the architecture and characteristics of the near-memory computing array and the shared buffer array. Section 3 presents the structure of the CNN and its parallel implementation. Section 4 reports the experiments. Finally, Section 5 gives some concluding remarks.

    1 Related work and motivation

With the continuous development of data-intensive applications, the memory access cost of moving data to the computation location when processing large amounts of data has gradually become a bottleneck for system performance, and more and more energy is consumed in data transfers between off-chip memory and the processor[9]. To further improve energy efficiency, near-memory computing places computing resources near the location of the data, and application programs are reorganized to take advantage of the distributed computing architecture[10]. Ref.[11] used the HMC to realize near-memory computing and, based on the atomic instructions provided by HMC 2.0, proposed a hardware and software mechanism that effectively exploits near-data processing functions; however, the excessively long data transmission latency still limits the gain in processing speed. To deliver the high performance required by many CNN applications, dedicated hardware accelerators are usually designed. Ref.[12] proposed and assessed a novel mechanism that operates at the cache level, leveraging both data proximity and the parallel processing capability enabled by dedicated fully-digital vector functional units (FUs), and demonstrated its integration in a conventional CPU; the reported results show performance improvements on CNNs ranging from 3.92× to 16.6×. However, due to the physical limitations of traditional memory devices, it is difficult to achieve further breakthroughs in near-memory computing simply by improving existing memory, so new types of memory have begun to receive widespread attention. Ref.[13] used binary resistive random access memory (ReRAM) arrays to process deep neural networks without peripheral computing circuits; this structure was trained on the MNIST data set and reaches a verification accuracy of 96.33%. However, ReRAM write operations usually consume more time and energy than read operations, and the reliability of ReRAM still needs further research and exploration[14].

Based on the above, a near-memory computing array architecture based on a shared buffer is proposed. On the one hand, it can be combined effectively with the processing element (PE) resources in the reconfigurable array through user-defined instructions, and it can switch between computing and memory access modes in real time. On the other hand, the shared buffer array not only meets the high bandwidth requirements of new applications but also supports parallel access to further improve computing efficiency.

    2 Near-memory computing array architecture based on shared buffer

The proposed architecture is composed of a near-memory computing array and a shared buffer array, as shown in Fig.1. The near-memory computing array is mainly responsible for executing the computation part through instructions with store-calculation-integrated semantics. The shared buffer array buffers the data with a high reuse rate and further improves the parallel access and computation speed of the near-memory computing array.

    Fig.1 Near-memory computing architecture based on reconfigurable array processor

Without changing the programming interface of the original reconfigurable architecture, the near-memory computing array is integrated with the reconfigurable array processor through custom instructions. The key components of the reconfigurable array are the processing element groups (PEGs). Each PEG contains a 4×4 array of processing elements (PEs); the 16 PEs are interconnected through adjacency links and shared registers, and each PE is implemented on a reduced instruction set computer (RISC) architecture. Each PE in the reconfigurable array serves as a main processor, and the computing and memory access modes are switched in real time according to memory access requirements. The PE array can access local memory directly, and it can also issue near-memory computing instructions to the coprocessors to perform simple computations on the main memory.

    2.1 Near-memory computing array

The core of the near-memory computing array is its 16 coprocessors, each of which has a one-to-one correspondence with a main processor. When a main processor PE receives the configuration information issued in real time through the H-tree configuration network[15], it sends instructions to the coprocessor through their communication interface. Before the coprocessor executes an instruction, the operation code in the upper 6 bits of the instruction is used as a flag to determine whether the issued instruction is a computation instruction supported by the coprocessor. If the opcode does not point to the coprocessor, the instruction is executed in the main processor and the coprocessor stays idle; otherwise, the remainder of the instruction, excluding the upper 6-bit opcode, is issued to the coprocessor for execution.
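As an illustration, the following Python sketch shows how such an opcode check could route an instruction between the main PE and the coprocessor. It is a behavioral model only: the opcode values and field layout are assumptions for illustration, since the paper does not publish the instruction encoding.

    # Hypothetical opcode values; the paper does not disclose the encoding.
    COPROC_OPCODES = {0x20, 0x21, 0x22}

    def dispatch(instr):
        """Route a 32-bit instruction by its upper 6-bit opcode."""
        opcode = (instr >> 26) & 0x3F      # upper 6 bits act as the flag
        payload = instr & 0x03FF_FFFF      # remaining bits go to the coprocessor
        if opcode in COPROC_OPCODES:
            return ("coprocessor", payload)  # issued without the 6-bit opcode
        return ("main_pe", instr)            # main processor executes it itself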

According to the characteristics of convolutional computation, the near-memory computing instructions are designed with store-calculation-integrated semantics: a single instruction can fetch its operands from memory and update the data in memory. The data therefore does not need to travel through multiple levels of buffering to the PE array for computation, which speeds up the CNN calculation.

Table 1 lists some of the special instructions supported by the coprocessor. Among them, M[Rs] and M[Rt] indicate that a source operand comes from external memory, and M[Rd] indicates that the operation result is written back to memory. To further increase the computation speed of the CNN algorithm, a multiply-accumulate instruction MAC and an accumulation register Rm are designed around the computational pattern of the convolution operation. Input image pixels fetched from external storage are multiplied by the weights of the convolution kernel, and the products are added directly to the partial convolution result already held in the register. The accumulated result is read from Rm by executing the STRM2 instruction, and the accumulation register is cleared in the clock cycle after the result is read.

    Table 1 Some special instructions supported by the coprocessor
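The following Python sketch models the behavior of the MAC/STRM2 pair described above: operands are fetched from memory, products accumulate in Rm, and STRM2 writes the sum back and clears the register. The class and the flat dictionary memory are illustrative assumptions, not the paper's implementation.

    class Coprocessor:
        def __init__(self, memory):
            self.mem = memory   # stands in for the shared buffer / main memory
            self.rm = 0         # accumulation register Rm

        def MAC(self, rs, rt):
            # M[Rs] * M[Rt] accumulated into Rm: both operands come from
            # memory, so no round trip through the PE register file is needed.
            self.rm += self.mem[rs] * self.mem[rt]

        def STRM2(self, rd):
            # Read the accumulated result out to M[Rd]; Rm is then cleared,
            # modeling the clear in the cycle after the result is read.
            self.mem[rd] = self.rm
            self.rm = 0

    # Example: a 1x3 window computed entirely in memory.
    mem = {0: 1, 1: 2, 2: 3, 10: 4, 11: 5, 12: 6, 100: 0}
    cp = Coprocessor(mem)
    for i in range(3):
        cp.MAC(i, 10 + i)   # pixels at addresses 0..2, weights at 10..12
    cp.STRM2(100)           # mem[100] == 1*4 + 2*5 + 3*6 == 32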

To improve processing efficiency, the coprocessor adopts a three-stage pipeline, as shown in Fig.2: instruction buffering, decode and operand fetch, and execute and write-back. The first stage buffers the computation instructions issued by the main processor. The second stage decodes the instructions and obtains the source operands required for execution according to the decoding results. The third stage executes the instructions and writes the results back to the destination addresses.

    Fig.2 The architecture of coprocessor
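A minimal serial model of the three stages is sketched below. It only illustrates the division of responsibilities (buffer, decode and fetch, execute and write-back) and ignores stage overlap and hazards; the instruction tuple format and the two opcodes shown are assumptions.

    from collections import deque

    def run_coprocessor(instrs, mem):
        fifo = deque(instrs)               # stage 1: instruction buffer
        while fifo:
            op, rs, rt, rd = fifo.popleft()
            a, b = mem[rs], mem[rt]        # stage 2: decode + operand fetch
            if op == "ADD":                # stage 3: execute + write-back
                mem[rd] = a + b
            elif op == "MUL":
                mem[rd] = a * b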

    2.2 Shared buffer array

Near-memory computing arrays are highly parallel, and a traditional shared storage structure would suffer from problems such as heavy access conflicts. Moreover, the continuous sliding of the convolution window during CNN processing causes the data in the main memory to be accessed frequently, resulting in a large memory access overhead. Therefore, to reduce data communication between the near-memory computing array and the main memory, an array-type shared buffer storage structure is proposed; the array organization also further improves the parallel access capability of the near-memory computing array. Since each coprocessor in the near-memory computing array can access the main memory, at least 16 buffer units are needed. In addition, because the convolution kernel is reused across the convolution computation of the entire feature image, its global reusability is high and the kernel weights need a buffer unit of their own. The designed shared buffer array therefore consists of 17 buffers that support non-aligned memory access.

The shared buffer array is mainly composed of a judge unit, 17 buffer units (buffer00-buffer16), and an arbiter, as shown in Fig.3. When a coprocessor sends an access request, the judge unit first receives it and determines whether the access hits. If it hits, the request is sent to the corresponding buffer to directly read or update the data there. If it misses, the request is arbitrated for access to the main memory. Each buffer is composed of a tag register unit and a data buffer unit; the tag register unit stores the state of the buffer unit and the first address of the buffered data. If a write access hits, the dirty bit is set while the data is updated.

    Fig.3 Shared buffer array structure
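The sketch below models the judge unit and the tag check of a buffer unit: each tag holds a valid/dirty state and the first address of the buffered line, a request that hits is served in place (a write hit setting the dirty bit), and a miss falls through to the arbiter. The line size follows the sixteen-word figure of Section 3.3; everything else is an illustrative simplification.

    LINE_WORDS = 16  # each buffer unit holds 16 consecutive 32-bit words

    class BufferUnit:
        def __init__(self):
            self.valid = False
            self.dirty = False
            self.base = 0                   # tag: first address of buffered data
            self.data = [0] * LINE_WORDS

        def hit(self, addr):
            return self.valid and self.base <= addr < self.base + LINE_WORDS

    def judge(buffers, addr, write=False, value=0):
        for buf in buffers:                 # probe the 17 buffer units
            if buf.hit(addr):
                if write:
                    buf.data[addr - buf.base] = value
                    buf.dirty = True        # a write hit sets the dirty bit
                    return ("hit", None)
                return ("hit", buf.data[addr - buf.base])
        return ("miss", None)               # arbiter forwards to main memory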

    3 CNN parallel mapping

3.1 AlexNet

CNNs can be used to construct hierarchical classifiers[16] and can also be used in fine-grained recognition to extract discriminative image features for other classifiers to learn from[17]. On the hardware platform formed by the near-memory computing array and the shared buffer array, the parallel mapping of AlexNet is completed, and test verification and performance analysis are conducted on the proposed architecture. The specific structure of AlexNet is shown in Table 2.

    Table 2 Specific structure of AlexNet

Table 3 shows the computational complexity and memory occupancy of each layer type of the AlexNet network model[18]. As the table shows, the total amount of AlexNet model parameter data is very large, reaching 230 MB, which far exceeds the on-chip memory capacity, so all parameters can only be stored in off-chip memory. When classifying one frame of the input feature map, the convolution operations in the CNN consume plenty of computational resources: about 1.33 GOPs are required for AlexNet. Among all sub-network layers, the convolutional layers are the most computationally intensive part, accounting for more than 90% of the total computational load of the entire network.

    Table 3 AlexNet network scale analysis
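The 1.33 GOPs figure can be reproduced from the standard AlexNet convolutional layer shapes, as the short calculation below shows (the layer dimensions are the usual AlexNet definition with its two-group C2/C4/C5 split; one multiply-accumulate is counted as two operations). The 230 MB parameter figure is likewise consistent with AlexNet's roughly 60 M parameters stored as 32-bit words.

    # (name, out_channels, in_channels_per_filter, kernel, out_h, out_w)
    conv_layers = [
        ("C1",  96,   3, 11, 55, 55),
        ("C2", 256,  48,  5, 27, 27),
        ("C3", 384, 256,  3, 13, 13),
        ("C4", 384, 192,  3, 13, 13),
        ("C5", 256, 192,  3, 13, 13),
    ]
    total_macs = 0
    for name, oc, ic, k, oh, ow in conv_layers:
        macs = oc * ic * k * k * oh * ow
        total_macs += macs
        print(f"{name}: {macs / 1e6:.1f} M MACs")
    print(f"conv total = {2 * total_macs / 1e9:.2f} GOPs")  # prints 1.33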

    3.2 Parallel mapping

According to the structural characteristics of the near-memory computing array and the potential parallelism of CNNs, a parallel mapping method is proposed. All memory access computing instructions are executed in the coprocessors, while the main processors mainly control the loops of the convolution operation. In this way, the parallelism of the CNN computation is increased and the computational efficiency of the processor is improved.

In AlexNet, the convolutional layers use three kernel sizes: 11×11, 5×5, and 3×3[19]. The parallel mapping of the convolution operation is implemented as follows.

(1) When the convolution kernel size is 11×11, the input image size is 227×227. PE00-PE22 each complete a 1×11 convolution computation, and PE30 accumulates the intermediate results of these eleven PEs to obtain the final convolution result.

After the instruction is issued and the operands required by the computation instruction are ready, PE00 starts the 1×11 multiply-accumulate operation on the first line of the input image, PE01 performs the 1×11 multiply-accumulate operation on the second line, and so on. Each of PE00-PE22 sends a handshake signal to PE30 after completing a 1×11 convolution. When PE30 has received the handshake signals, it reads out the eleven partial results of PE00-PE22, accumulates them, and stores the sum at the corresponding memory location; PE30 then sends a handshake signal back to the eleven PEs to indicate that reception is complete. On receiving PE30's handshake signal, PE00-PE22 slide the convolution window to the right and continue with the next 11×11 convolution until all convolution results of this layer are obtained. The mapping structure is shown in Fig.4.

    Fig.4 11×11 convolution operation map
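The functional effect of this mapping can be captured in a few lines of Python: each worker PE computes the 1×k dot products of one kernel row, and the collector PE (PE30 in Fig.4) sums the k partial results per window position. The sketch below is a serial stand-in for the parallel PEs and does not model the handshake protocol; it applies equally to the 5×5 and 3×3 cases that follow.

    import numpy as np

    def mapped_conv(image, kernel):
        k = kernel.shape[0]
        out = image.shape[0] - k + 1
        result = np.zeros((out, out), dtype=image.dtype)
        for y in range(out):
            for x in range(out):
                # each kernel row r runs as a 1xk MAC on its own worker PE
                partials = [image[y + r, x:x + k] @ kernel[r] for r in range(k)]
                result[y, x] = sum(partials)   # collector PE accumulation
        return result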

(2) When the convolution kernel size is 5×5, the input image size is 27×27. PE00-PE10 each complete a 1×5 convolution operation, and PE30 accumulates the intermediate results of these five PEs to obtain one 5×5 convolution result. At the same time, after PE11-PE21 each complete a 1×5 convolution operation, they send their results to PE31, which accumulates them and outputs another 5×5 convolution result.

After the instruction is issued, PE00 performs the 1×5 multiply-accumulate operation on the first line of this layer's input, PE01 on the second line of the input image, and so on; meanwhile, PE11-PE21 perform the multiply-accumulate operations on lines 2 to 6 of the input image. Take the PE00-PE10 group accumulating in PE30 as an example: after PE00-PE10 complete a 1×5 convolution, they each send a handshake signal to PE30. When PE30 receives the handshake signals, it reads out the intermediate results of the five PEs and writes the accumulated sum back to the main memory; PE30 also sends handshake signals to PE00-PE10 to indicate that reception is complete. On receiving PE30's handshake signal, PE00-PE10 slide the convolution window to the right and continue with the next 5×5 convolution until all results are obtained. The mapping structure is shown in Fig.5.

    Fig.5 5×5 convolution operation map

(3) When the convolution kernel size is 3×3, the input image size is 15×15. PE00-PE02 each complete a 1×3 convolution operation, and PE30 accumulates the intermediate results of these three PEs to obtain one 3×3 convolution result. At the same time, PE03-PE11 each complete a 1×3 convolution operation and send their results to PE31, which accumulates them and outputs another 3×3 convolution result; likewise, PE12-PE20 each complete a 1×3 convolution operation and PE32 accumulates them into a third 3×3 convolution result.

When the instruction is issued, PE00 starts the 1×3 multiply-accumulate operation on the first line of the input image, PE01 on the second line, and PE02 on the third line. PE03-PE11 simultaneously perform 1×3 multiply-accumulate operations on rows 2 to 4, while PE12-PE20 do the same on rows 3 to 5. Take the PE00-PE02 group accumulating in PE30 as an example: when PE00-PE02 complete a 1×3 convolution, they send handshake signals to PE30; when PE30 receives them, it reads the intermediate results of PE00-PE02, sums them, and writes the result back to the corresponding memory location. Meanwhile, PE30 sends handshake signals to these PEs to indicate that reception is complete. On receiving PE30's handshake signal, PE00-PE02 slide the convolution window to the right and continue with the next 3×3 convolution until all results are obtained. The mapping structure is shown in Fig.6.

    Fig.6 3×3 convolution operation map

    3.3 Data reuse analysis

The shared buffer array buffers data to realize data reuse. Take the 3×3 convolution operation as an example, as shown in Fig.7. When the convolution kernel size is 3×3, twelve coprocessors process three 3×3 convolution operations at the same time: nine coprocessors perform the 1×3 convolution operations, and three coprocessors accumulate the intermediate results. When the twelve coprocessors compute simultaneously, twelve buffer units are required. Because the stride of the 3×3 convolution is one, each buffer unit buffers sixteen 32-bit words at consecutive addresses. Therefore, when the convolution window slides to the right and the buffer hits, the data at the destination address can be read or updated directly, which greatly improves data reuse in the convolution process, reduces data communication between the near-memory computing array and the main memory, and lowers the memory access latency and overhead. The buffer occupancy of the 5×5 and 11×11 convolutions is similar, but compared with them, the 3×3 convolution operation achieves the highest data reuse rate.

    Fig.7 3×3 convolutional data reuse situation
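Under these assumptions, the expected hit rate of the sliding 3×3 window is easy to estimate: with one sixteen-word line per PE buffering a 15-word input row, only the first access of each row misses. The sketch below (addresses are row-local word indices; replacement policy is ignored) prints a hit rate of about 97%, illustrating why the 3×3 case reuses data best.

    LINE_WORDS = 16            # words per buffer unit (32-bit each)
    K, ROW = 3, 15             # 3x3 kernel on a 15x15 input (Section 3.2)

    hits = misses = 0
    for row in range(K):                  # the three rows a window reads
        cached = set()                    # addresses resident in this buffer
        for x in range(ROW - K + 1):      # window slides right, stride 1
            for addr in range(x, x + K):  # the 1x3 segment read this step
                if addr in cached:
                    hits += 1
                else:
                    misses += 1
                    base = (addr // LINE_WORDS) * LINE_WORDS
                    cached.update(range(base, base + LINE_WORDS))
    print(f"hit rate = {hits / (hits + misses):.2%}")   # ~97.4%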

    4 Experimental results and analysis

Xilinx’s Virtex-6 field programmable gate array (FPGA) development board is chosen to verify the proposed architecture, and the AlexNet CNN is implemented on it. The resource utilization is shown in Table 4. The working frequency on the FPGA is 110 MHz.

    Table 4 Hardware resource usage

The computation time of a single convolution is reported in Table 5. Compared with Ref.[20], the performance is improved by 69.60%, 75.00%, and 55.32%, respectively.

The execution clock cycles of each convolutional layer are shown in Table 6. The proposed parallel computing method is used to perform the convolution operations on the near-memory computing array architecture based on the reconfigurable array processor, and the designed architecture is verified and tested. In this paper, C1 takes 3 input images and outputs 96 feature maps, and C2 takes 96 input feature maps and outputs 256 feature maps. The CNN accelerator proposed in Ref.[21] processes faster than this work because each of its layers C1-C5 has only 2 input feature maps, so its data processing load is much lower; when comparing the processing of a single feature map, this work is faster than Ref.[21]. Ref.[20] proposed an architecture that parallelizes the CNN forward propagation process based on the multiple forms of parallelism in CNN computation, but its computation speed is slower than that of this work. Ref.[22] used a reconfigurable array processor that does not support near-memory computing; compared with it, the overall processing speed of the architecture designed in this paper is increased by 8.81%.

    Table 5 Comparison of the consumption time of single convolution

    Table 6 Convolutional layer execution time comparison

Ref.[23] proposed a CNN hardware accelerator with an array architecture that can reconfigure layer parameters to adapt to different CNN structures. By using multiple PEs to perform convolution operations simultaneously, the computational parallelism is improved and the convolution processing speed is further enhanced; the circuit frequency under that architecture can reach 100 MHz. Ref.[24] designed and implemented an efficient and reusable CNN FPGA accelerator: based on a modified roofline model, the microarchitecture of the accelerator was optimized and the utilization of the underlying FPGA computation and bandwidth resources was maximized, but its accuracy is lower and its power consumption higher than those of this work. Ref.[25] proposed a configurable neural network computing architecture that uses reconfigurable data quantization to reduce power consumption and on-chip memory requirements, but that architecture is not universal. The maximum frequency of this work reaches 110 MHz, a clear improvement over Ref.[17] and Ref.[19], and it supports a 32-bit operand width. Table 7 compares the frequency, precision, and power of the different architectures.

    Table 7 Comparison of frequency,precision and power

    5 Conclusion

For data-intensive applications such as deep learning, a near-memory computing array architecture based on a shared buffer is designed to improve the speed of intensive computation and alleviate bandwidth pressure. The memory occupancy is analyzed, and a parallel computation method for the CNN is designed. The experimental results show that the proposed architecture increases the speed of the convolution operation while reducing memory access latency and improving data reuse. The highest frequency reaches 110 MHz. Compared with previous studies, the computation speed of a single convolution operation is increased by 66.64% on average; compared with a reconfigurable array processor that does not support near-memory computing, the processing speed of the entire convolutional layer is increased by 8.81%.
