
    SoftSSD: enabling rapid flash firmware prototyping for solid-state drives

    2023-06-02 12:30:50

    Jin XUE, Renhai CHEN, Tianyu WANG, Zili SHAO

    1Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China

    2College of Intelligence and Computing, Tianjin University, Tianjin 300354, China

    Abstract: Recently, solid-state drives (SSDs) have been used in a wide range of emerging data processing systems. Essentially, an SSD is a complex embedded system that involves both hardware and software design. For the latter, firmware modules such as the flash translation layer (FTL) orchestrate internal operations and flash management, and are crucial to the overall input/output performance of an SSD. Despite the rapid development of new SSD features in the market, the research of flash firmware has been mostly based on simulations due to the lack of a realistic and extensible SSD development platform. In this paper, we propose SoftSSD, a software-oriented SSD development platform for rapid flash firmware prototyping. The core of SoftSSD is a novel framework with an event-driven programming model. With the programming model, new FTL algorithms can be implemented and integrated into a full-featured flash firmware in a straightforward way. The resulting flash firmware can be deployed and evaluated on a hardware development board, which can be connected to a host system via peripheral component interconnect express and serve as a normal non-volatile memory express SSD. Different from existing hardware-oriented development platforms, SoftSSD implements the majority of SSD components (e.g., host interface controller) in software, so that data flows and internal states that were once confined in the hardware can now be examined with a software debugger, providing the observability and extensibility that are critical to the rapid prototyping and research of flash firmware. We describe the programming model and hardware design of SoftSSD. We also perform experiments with real application workloads on a prototype board to demonstrate the performance and usefulness of SoftSSD, and release the open-source code of SoftSSD for public access.

    Key words: Solid-state drives; Storage system; Software/hardware co-design

    1 Introduction

    Recently, solid-state drives (SSDs) have been used in a wide range of emerging data processing systems (Lu LY et al., 2016; Lee et al., 2019). Compared to traditional magnetic disks, SSDs deliver higher input/output (I/O) throughput as well as lower latency, which makes them one of the best choices as the underlying storage for large-scale data processing systems demanding high I/O performance. The design of modern SSDs has gone through intensive optimizations to reduce request latency and improve throughput. Meanwhile, novel storage interface protocols, such as non-volatile memory express (NVMe), have been proposed to reduce communication overhead between the host and the device to harvest the high I/O performance enabled by modern SSDs. Under the hood, SSDs persist data on NAND flash memory, which retains data after power loss (Boukhobza et al., 2017). The flash memory chips typically form an array with multiple channels so that user requests can be distributed among them to better use the I/O parallelism.

    Although NAND flash memory is the core component of SSDs, it does not work off the shelf. A smart storage controller must be used to orchestrate the internal operations of SSDs and the data flows between the host interface and the flash memory. By nature, the storage controller is a complex embedded system that involves both hardware and software design. The former includes physical interfaces to the buses through which the storage controller is connected to the host and the underlying flash memory array. For the latter, firmware modules such as the flash translation layer (FTL) (Gupta et al., 2009; Ma et al., 2020) are implemented to handle internal operations and flash management of SSDs. In summary, the main task of a storage controller is to accept requests from the host interface, perform necessary transformations on them, and finally dispatch them through multiple channels to the underlying flash memory array.

    Storage controllers are responsible for various operations on the critical path of request processing. Thus, the performance delivered by an SSD depends heavily on the storage controller. To handle a tremendous number of concurrent I/O requests arriving from multiple submission queues (SQs) provided by the flexible host interface protocol, the storage controller, especially the flash firmware running on top of it, needs to perform internal SSD operations such as address translation and cache lookup with high efficiency. On the other hand, the architecture of modern SSDs has been designed in a hierarchical manner, so that flash transactions can be distributed across multiple channels and executed in parallel to obtain the maximal throughput (Gao et al., 2014). Thus, the storage controller needs to handle the queuing and scheduling of flash transactions to fully use the massive internal I/O parallelism. These internal tasks can incur a high computational load when there are a large number of concurrent user requests, and the flash firmware can soon become a bottleneck if the FTL algorithms are not well designed. Furthermore, manufacturers have begun to employ high-performance multi-core microprocessors on SSDs to adapt to the massive I/O parallelism of the underlying flash storage. With a multi-core processor, the flash firmware can start multiple worker threads to process the computational work in parallel to improve throughput and hide request latency (Zhang et al., 2020). This opens new opportunities in flash firmware research, in which an extensible SSD development board is required for rapid prototyping and evaluation.

    Despite the rapid development of new features of SSDs in the market, the research of flash firmware has been mostly based on simulations due to the lack of a realistic and extensible SSD development platform. Existing SSD simulators (Kim et al., 2009; Hu et al., 2011; Yoo et al., 2013; Jung et al., 2016, 2018; He et al., 2017; Li HC et al., 2018; Tavakkol et al., 2018) typically model only some parts that are considered to have the most significant impact on the end-to-end latency, e.g., data transfer and flash command execution. The simplified latency model used by these simulators may not accurately capture the performance characteristics of real SSDs. For instance, many simulators model a single-queue host interface that reflects older protocols such as serial AT attachment (SATA), instead of newer storage protocols such as NVMe, which provide multi-queue support and higher performance. Meanwhile, due to the physical characteristics of NAND flash, errors may occur during the read and the program processes, and data may be lost as a result of NAND flash wear-out after many program/erase cycles. Such physical characteristics have a negative effect on SSD performance (Shi et al., 2016), but are not modeled by existing SSD simulators. As such, these simulation tools may not provide SSD performance results that match those of the real hardware.

    Conventionally, many components of a storage controller, such as the host interface controller, the flash transaction scheduler, and the error correction code (ECC) engine, are implemented in hardware on a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). Although such a hardware implementation can generally achieve higher performance compared to its software counterpart, it causes several difficulties for investigating and prototyping new flash firmware. Due to the complexity of modern storage protocols, implementing a host interface controller in pure hardware requires non-trivial effort. Also, new extensions and new transports, such as the NVMe key-value (KV) command set and remote direct memory access (RDMA), are continuously added to existing storage protocols to adapt to emerging data models used in current data processing systems. If the host interface controller is implemented in hardware, it can be time-consuming to extend the hardware design to support the new extensions, making prototyping flash firmware for these new interfaces difficult.

    To bridge the gap between pure software simulation-based platforms and hardware-oriented platforms, we propose SoftSSD, a novel software-oriented and flexible SSD development platform for rapid flash firmware prototyping, as shown in Fig.1b. The core of SoftSSD is a development board that can be plugged into a host machine and serve as a normal NVMe SSD. Different from existing hardware-oriented approaches, we implement only a small number of necessary components of the storage controller in hardware on an FPGA. These components include the physical layer interfaces to the host bus, e.g., the peripheral component interconnect express (PCIe) link, and the NAND flash chips. Such interfaces are defined by specifications and thus not subject to frequent changes. Furthermore, these hardware components are required only to handle simple tasks such as receiving raw transaction layer packets (TLPs) from the PCIe link, so that they can be extended to support newer revisions of the interfaces. Other components of SoftSSD, including the NVMe interface and the flash transaction scheduler, are developed in pure software as parts of the flash firmware modules and run on an ARM-based embedded processor. These components as well as the flash firmware can be reconfigured and reprogrammed for research purposes. With SoftSSD, data flows and internal states that were once confined in the hardware design are now processed by the software and can be examined with a debugger, which provides observability and visibility in rapid prototyping and flash firmware research.

    Fig.1 Comparison of existing hardware-oriented platforms (a) and SoftSSD (b)

    However, implementing SSD components in software brings new challenges. First, compared to specialized hardware implementations, software implementations provide more flexibility at the cost of lower performance, which poses a great challenge in SoftSSD. Second, as SSD components are now an integrated part of the flash firmware, a new programming model is required to hide the details of interacting with the hardware and enable implementing and assembling these modules, including the FTL, into flash firmware in a straightforward way. To this end, we propose a novel framework with an event-driven threaded programming model, on which the flash firmware is built. Under the framework, user requests are handled as small tasks that can be assigned to multiple threads and scheduled to maximize central processing unit (CPU) utilization and thus enhance the SoftSSD performance. Furthermore, the flash firmware built with the proposed framework can be deployed on a multi-core heterogeneous microprocessor to process I/O requests in parallel.

    We implement both the programming model and the hardware components of SoftSSD and carry out a performance evaluation. We connect SoftSSD as a standard NVMe SSD to a real host system and conduct experiments with application workloads to demonstrate the performance of SoftSSD. We also compare our software-implemented NVMe interface with an FPGA-based NVMe controller. Experimental results show that SoftSSD can achieve good performance for real I/O workloads while providing observability and extensibility for rapid flash firmware prototyping. SoftSSD has been released as open source at https://github.com/Jimx-/softssd.

    This work is a revised version based on a preliminary version of our work published in Proceedings of the 2022 IEEE 40th International Conference on Computer Design (Xue et al., 2022). The new contributions of this paper are summarized as follows:

    1. We add a new section to present more background information about existing SSD development platforms and open-channel SSDs (Section 2.2). We also add Table 1 to summarize the differences between our proposed platform and existing SSD platforms.

    Table 1 Summary of different SSD development platform designs

    2. We analyze the overhead of the software-based PCIe TLP processing and its impact on direct memory access (DMA) request performance (Section 3.1). We add a new optimization that offloads the packet processing to the FPGA fabric to improve the DMA performance.

    3. We add a new optimization on the ECC engine to separate the detection and correction of errors in data blocks and bypass the expensive software-based correction for clean blocks (Section 3.2).

    4. We analyze the performance overhead of the software-based host interface controller by implementing a hardware NVMe-over-PCIe controller in the FPGA and comparing its performance with that of SoftSSD (Section 6.5).

    2 Background

    2.1 NVMe SSD devices

    Fig.1 illustrates the overall architecture of a modern NVMe SSD. As shown in Fig.1, the SSD is connected to the host via a physical bus, most commonly PCIe. The physical bus interface allows the SSD and the host to exchange requests and data. Based on that, an NVMe controller manages the internal states of the storage protocol and parses incoming requests. The requests are processed by an FTL executing on a microprocessor. The FTL performs the necessary transformations on the requests, converting them into flash commands (e.g., read and program) executed by the NAND flash memory. The processor is connected to the flash memory components through a NAND flash interface which contains multiple bus channels. Each channel is connected to one or more NAND flash memory chips.

    To maximize the I/O parallelism, a flash chip is further organized in a hierarchical structure, as shown in Fig.1. Specifically, a flash chip is divided into multiple dies or logical units (LUNs), where each die has its own command and status registers. This allows dies to execute different commands independently in parallel. Each die is further divided into multiple planes, where each plane has separate cache and data registers. Through multi-plane commands, multiple planes in the same die can execute the same operation at the same offset within the plane concurrently. Finally, a plane is divided into multiple blocks, and each block contains multiple flash pages.
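As a concrete (and purely illustrative) example of this hierarchy, the following C sketch decodes a flat physical page number into channel/chip/die/plane/block/page coordinates. The geometry constants and the channel-first striping order are assumptions for illustration, not SoftSSD's actual configuration.

```c
#include <assert.h>

/* Illustrative geometry (not SoftSSD's actual configuration). */
enum { CHANNELS = 8, CHIPS = 2, DIES = 2, PLANES = 2, BLOCKS = 1024, PAGES = 256 };

struct flash_addr {
    unsigned channel, chip, die, plane, block, page;
};

/* Decompose a flat physical page number, striping consecutive pages
 * across channels first so that sequential accesses exploit
 * channel-level parallelism. */
struct flash_addr decode_ppn(unsigned long ppn)
{
    struct flash_addr a;
    a.channel = ppn % CHANNELS;  ppn /= CHANNELS;
    a.chip    = ppn % CHIPS;     ppn /= CHIPS;
    a.die     = ppn % DIES;      ppn /= DIES;
    a.plane   = ppn % PLANES;    ppn /= PLANES;
    a.page    = ppn % PAGES;     ppn /= PAGES;
    a.block   = ppn % BLOCKS;
    return a;
}
```

Striping consecutive pages across channels first is a common allocation choice because it lets a sequential request keep all channels busy at once.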

    Due to the physical limitations of NAND flash memory, errors may occur during the read and the program processes. The cells that store data bits may wear out as a result of continuous program and erase operations, so that some memory locations may no longer be programmed or erased properly. To ensure data integrity, it is crucial to equip the SSD with a mechanism that is capable of detecting and correcting a certain number of bit errors (Wang et al., 2017). The ECC engine encodes data programmed on the NAND flash memory in such a way that errors can be identified and corrected by the decoder when the data are read (Li S and Zhang, 2010; Ho et al., 2013). For write operations, the ECC engine generates a code word for the flash page data, which is stored in the out-of-band (OOB) area of the data page. When reading data from the NAND flash memory, both the flash page data and the code word are retrieved. By comparing the code word re-computed from the read data with the stored code word, error bits can be identified. If a flash page contains more error bits than can be corrected by the ECC engine, the block of this flash page is marked as a bad block by the flash firmware and will not be used.

    In a NAND flash memory chip, a flash page (typically 4–16 KB) is the basic data unit for a read or write operation. In addition to the user data area, each flash page has a small OOB area that can be used by the flash firmware to store internal metadata such as the ECC. The flash pages are grouped into blocks, which are the granularity for an erase operation. Once data are written to a flash page, the flash page cannot be updated until it is later reclaimed and erased. Thus, SSDs generally do not support in-place updates for flash pages. Instead, pages are updated in an out-of-place manner. When a logical page is updated, the flash firmware allocates a newly erased physical page. After that, existing data for the logical page are read from the old physical location, merged with the update, and written to the new location. The logical-to-physical address mapping is updated so that subsequent requests to the logical page will be redirected to the new physical location. In this way, the old physical page (and all pages in the same flash block) does not need to be erased for the update, and thus the high erase latency is avoided for a write operation. However, because the logical page is now mapped to a new physical location, the old physical page must be marked invalid so that background garbage collection (GC) can run later to reclaim the invalid flash pages (Yang et al., 2014). These internal details of NAND flash memory are handled by the FTL to expose a symmetric block access interface.
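The out-of-place update flow described above can be sketched as a tiny page-mapped FTL in C. All names, the bump allocator, and the table sizes are hypothetical simplifications; real firmware must also handle GC, free-block exhaustion, and persistence of the mapping table.

```c
#include <assert.h>
#include <string.h>

#define NPAGES  64
#define INVALID (~0u)

/* Hypothetical page-mapped FTL state: logical-to-physical table,
 * per-physical-page valid bits, and a bump allocator standing in for
 * the free erased-page pool. */
static unsigned l2p[NPAGES];
static int valid[NPAGES];
static unsigned next_free;

void ftl_init(void)
{
    memset(valid, 0, sizeof(valid));
    for (unsigned i = 0; i < NPAGES; i++) l2p[i] = INVALID;
    next_free = 0;
}

/* Out-of-place update: write to a fresh physical page, invalidate the
 * old one (GC reclaims it later), and redirect the logical page. */
unsigned ftl_write(unsigned lpn)
{
    unsigned new_ppn = next_free++;      /* newly erased physical page */
    if (l2p[lpn] != INVALID)
        valid[l2p[lpn]] = 0;             /* old copy becomes garbage */
    l2p[lpn] = new_ppn;
    valid[new_ppn] = 1;
    return new_ppn;
}

unsigned ftl_read(unsigned lpn) { return l2p[lpn]; }
```

Note how a rewrite of the same logical page lands on a different physical page, which is exactly why background GC is needed to reclaim the invalidated copies.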

    2.2 SSD development platforms

    In recent years, open-source hardware SSD platforms such as Jasmine OpenSSD and Cosmos Plus OpenSSD (Kwak et al., 2020) were introduced to provide a realistic platform for the research of storage controllers. Both implementations provide development boards that can be connected to the host through a specific host interface (e.g., SATA and NVMe) and used as a normal SSD. Based on that, some parts of the system can be reconfigured or re-implemented for research purposes. However, Jasmine OpenSSDs are based on a commercial ARM-based SATA controller system-on-a-chip (SoC). Thus, with Jasmine OpenSSDs, only the flash firmware can be modified to implement new flash management algorithms, but the underlying hardware components cannot be extended to support newer storage protocols such as NVMe. Cosmos Plus OpenSSD, on the contrary, implements the hardware components of the storage controller, e.g., the host interface controller, the flash controller, and the ECC engine, on an FPGA, as shown in Fig.1a. Although this design makes it possible to modify the host interface and the flash controller to support newer protocols, it still requires significant effort to develop and maintain these components on an FPGA. Also, compared to a software implementation, hardware implementations based on FPGAs lack the observability and extensibility that are critical to the prototyping and debugging of newer storage techniques. For example, the Cosmos Plus OpenSSD hardware NVMe controller supports only up to eight NVMe I/O queues, which is hard-coded in the design. To support more queues, the hardware design must be modified to add state storage and arbitration logic for the extra queues, which is error-prone. On the contrary, a software NVMe controller allows the host to transparently create a large number of I/O queues as long as they do not exceed the device DRAM capacity. In this way, a software-based development platform can be extended to support workloads with a higher degree of queue-level parallelism without modifying the hardware components. For observability, with a hardware-based development platform, the hardware design needs to be modified to capture relevant internal signals for examination, whereas for a software-based platform, all internal states of the software components can be exposed to developers with a software debugger.

    Due to the difficulty in implementing all SSD functionalities on the development platform, some works proposed open-channel SSDs (Lu YY et al., 2013; Bjørling et al., 2017) to reduce the device-side complexity. Different from traditional SSDs, open-channel SSDs do not have an FTL implemented on the device. Instead, the device exposes the internal organization and characteristics of the flash memory and leaves the SSD management tasks to the host. In this way, FTL components such as the mapping table and GC can be implemented in the host operating system. Although open-channel SSDs provide a flexible approach for evaluating different FTL algorithms, they may require a significant amount of host memory and CPU cycles for SSD management. Furthermore, by treating the SSD as a raw flash memory device without even minimal computation capabilities, the applications that can be built on such platforms are restricted. For example, in-storage computing (ISC) allows the host to offload computation tasks to the storage device for execution to reduce host-device data movement cost. However, such ISC applications are not supported by open-channel SSDs because they lack both the processing elements required to execute in-storage computation tasks and the FTL metadata for in-situ flash data access. Table 1 summarizes the differences among these SSD platforms.

    3 Hardware design

    SoftSSD is a development platform that involves both hardware and software design. In this section, we focus on the hardware design of SoftSSD, which comprises a minimal set of functionalities that manage the interactions with the host and the underlying NAND flash array. The overall architecture of the SoftSSD hardware components is shown in Fig.2.

    Fig.2 Overall architecture of the SoftSSD hardware components

    3.1 PCIe interface

    The software-oriented approach of SoftSSD enables building a storage protocol controller on top of different transports. In this work, we focus on the NVMe protocol over a PCIe transport layer. A PCIe interconnect can be regarded as a packet-based network. Similar to TCP/IP networks, PCIe adopts a layered model that consists of a physical layer, a data link layer, and a transaction layer. The data link layer manages the link state and ensures the data integrity of PCIe packets as they are delivered across the PCIe link. The PCIe transaction layer delivers TLPs between two PCIe endpoints across the PCIe links. The root complex and PCIe switches route and forward TLPs based on the destination addresses provided in the packets. Other functionalities of PCIe are built on top of this packet-based communication. For example, to perform DMA to the host memory, a PCIe device sends a memory read (MRd) or memory write (MWr) transaction packet with the data payload to the root complex. The root complex has access to the host memory; it completes the transaction based on the TLP and optionally sends back a completion packet to the requesting device for non-posted requests (e.g., MRd). Message-signaled interrupts (MSIs) are sent from a PCIe device to the host by writing pre-configured data to a memory address specified by the interrupt controller.
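To make the packet-based view concrete, the following C sketch packs the three header double words of a simplified 3DW memory-request TLP (32-bit addressing only). The field layout follows the PCIe transaction layer specification, but the helper names are our own, and details such as traffic class, attributes, and byte enables for single-DW requests are deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 3DW memory-request TLP header builder (32-bit addressing).
 * DW0: fmt/type and payload length in DWs; DW1: requester ID, tag, and
 * byte enables; DW2: DW-aligned target address. */
uint32_t tlp_dw0(int is_write, uint16_t len_dw)
{
    uint32_t fmt = is_write ? 0x2u : 0x0u;  /* 010b: 3DW with data (MWr),
                                               000b: 3DW no data (MRd) */
    return (fmt << 29) | (0x00u << 24) | (len_dw & 0x3FFu);
}

uint32_t tlp_dw1(uint16_t requester_id, uint8_t tag, uint8_t first_be)
{
    /* Assumes a multi-DW request, so last byte enables are 1111b. */
    return ((uint32_t)requester_id << 16) | ((uint32_t)tag << 8) |
           (0xFu << 4) | (first_be & 0xFu);
}

uint32_t tlp_dw2(uint32_t addr)
{
    return addr & ~0x3u;  /* address is DW-aligned; low bits reserved */
}
```

The software transaction layer in SoftSSD would parse headers of exactly this shape when processing raw TLPs copied into the device DRAM.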

    In SoftSSD, we build only the physical and the data link layers in hardware on the FPGA. The two layers receive/transmit raw TLPs from/to the PCIe link. TLPs received from the link or to be transmitted are divided into two streams based on whether the device is the requester or the completer of the transaction. Two DMA interfaces move the raw TLPs between the PCIe controller and the device DRAM. After the raw TLPs arrive at the device DRAM, they are parsed and processed by the software PCIe transaction layer. The completer interface handles MRd or MWr requests from the host to the memory-mapped I/O regions exposed by SoftSSD through the base address registers (BARs). For MWr requests, the write data are attached as payload in the TLPs and the device does not need to send back completion packets to the host. MRd requests contain a unique tag but do not have any data payload. Upon processing an MRd TLP, the software transaction layer sends a completion TLP containing the read data with the same tag as the request packet. The requester interface is used by the flash firmware to initiate DMA requests to the host memory. To send a DMA request, the flash firmware prepares a TLP in the device DRAM and configures the requester DMA controller to move the packet to the PCIe controller for transmission to the PCIe link. For DMA read requests, the host sends back response data with a completion TLP, which is moved to the device DRAM and processed by the software transaction layer.

    The performance of DMA requests to the host memory depends mainly on the requester DMA interface. For a PCIe device, the maximum size of the payload in a TLP may be different from the maximum size allowed for an MRd request. For this reason, an MRd request may be completed with one or multiple completion TLPs. With a full software transaction layer, all completion TLPs for a read request need to be copied to the device DRAM and processed by the software. The software transaction layer may further copy the data payload of the completion TLPs and re-assemble the response data in a data buffer, incurring high data movement and software packet processing overhead and causing a bottleneck for DMA request performance.
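The splitting of one MRd into several completions is simple arithmetic. The sketch below (a hypothetical helper, not part of the SoftSSD API) computes the completion count under the simplifying assumption that the completer always fills completions up to the negotiated maximum payload size; real completers may also split at the read completion boundary.

```c
#include <assert.h>

/* Number of completion TLPs needed for one MRd request, assuming each
 * completion carries up to the maximum payload size (MPS) of data.
 * Splitting at the read completion boundary is ignored here. */
unsigned completion_count(unsigned read_bytes, unsigned max_payload)
{
    return (read_bytes + max_payload - 1) / max_payload;  /* ceiling */
}
```

For example, a 4 KB MRd on a link with a 256-byte MPS produces 16 completion TLPs, each of which the software transaction layer would otherwise have to copy and process.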

    To improve host DMA request performance, we offload the processing of requester completion TLPs from the software transaction layer to the requester DMA controller. As shown in Fig.3, the requester DMA controller internally records the data buffer addresses of up to 64 outstanding host memory DMA read requests in the buffer descriptor table. The buffer descriptor table also stores the current offset for each data buffer, which is the number of bytes already received for the read request. When initiating a new DMA read request to the host memory, the flash firmware allocates a unique tag for it and registers the buffer base address in the buffer descriptor table. Upon receiving new completion TLPs with response data, the requester DMA controller extracts the tag from the TLP and obtains the buffer address from the table. The data mover copies the data payload to the corresponding position in the data buffer and updates the offset. After the last completion for a DMA read request is received, the requester DMA controller sets the corresponding bit in the completion bitmap and notifies the flash firmware. With the offloading, the number of TLPs processed by the software transaction layer is reduced and the software does not need to repeatedly copy the data payload to re-assemble the response data, thus improving the host DMA request performance.
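The bookkeeping performed by the requester DMA controller can be sketched in C as follows. Structure and function names are illustrative; the real controller implements this logic in FPGA fabric with a hardware data mover rather than memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NTAGS 64  /* up to 64 outstanding DMA read requests */

/* One buffer descriptor per outstanding tag, plus a completion bitmap
 * (names are illustrative, mirroring the description in the text). */
struct buf_desc { uint8_t *base; uint32_t offset; uint32_t total; };

static struct buf_desc table[NTAGS];
static uint64_t done_bitmap;

/* Firmware side: allocate a tag and register the buffer base address. */
void dma_read_start(uint8_t tag, uint8_t *buf, uint32_t total)
{
    table[tag] = (struct buf_desc){ buf, 0, total };
    done_bitmap &= ~(1ull << tag);
}

/* Controller side, per completion TLP: look up the tag, copy the
 * payload to the recorded position, and flag the tag when the last
 * byte arrives. */
void dma_completion(uint8_t tag, const uint8_t *payload, uint32_t len)
{
    struct buf_desc *d = &table[tag];
    memcpy(d->base + d->offset, payload, len);
    d->offset += len;
    if (d->offset == d->total)
        done_bitmap |= 1ull << tag;  /* notify the flash firmware */
}

int dma_done(uint8_t tag) { return (int)((done_bitmap >> tag) & 1); }
```

Because the controller reassembles the response data itself, the software transaction layer never sees the individual completion TLPs, which is the source of the performance win described above.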

    Fig.3 Completion TLP processing offloading to the requester DMA controller

    Overall, the following listing shows the application programming interface (API) provided by the PCIe controller and the software PCIe transaction layer. The flash firmware defines the callback functions to handle memory read/write requests to the memory-mapped BAR regions. pcie_send_completion sends response data to complete a memory read request by the host. The flash firmware can use pcie_dma_read/write to transfer data between the host memory and the device DRAM. Finally, pcie_send_msi can be used to send an MSI/MSI-X (short for message-signaled interrupt/extended message-signaled interrupt) to the host with the provided interrupt vector.

    void pcie_mem_read_callback(int bar_nr, unsigned long addr, u16 requester_id, u8 tag, size_t len);

    void pcie_mem_write_callback(int bar_nr, unsigned long addr, const u8 *buf, size_t len);

    int pcie_send_completion(unsigned long addr, u16 requester_id, u8 tag, const u8 *buffer, size_t count);

    int pcie_dma_read(unsigned long host_addr, u8 *buffer, size_t count);

    int pcie_dma_write(unsigned long host_addr, const u8 *buffer, size_t count);

    void pcie_send_msi(u16 vector);
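As a usage illustration, a firmware might implement pcie_mem_write_callback to treat host writes to a BAR0 region as NVMe doorbell updates. The doorbell base offset, the per-queue stride, and the queue count below are illustrative assumptions, not SoftSSD's actual register map.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short u16;
typedef unsigned char u8;

/* Hypothetical register layout: submission-queue tail doorbells start
 * at DOORBELL_BASE in BAR0, one 8-byte stride per queue. */
#define DOORBELL_BASE 0x1000ul
#define NQUEUES 8

static unsigned sq_tail[NQUEUES];

/* Callback invoked by the software transaction layer for each MWr TLP
 * targeting a BAR region (signature from the listing above). */
void pcie_mem_write_callback(int bar_nr, unsigned long addr,
                             const u8 *buf, size_t len)
{
    if (bar_nr != 0 || addr < DOORBELL_BASE || len < 4)
        return;                                 /* not a doorbell write */
    unsigned qid = (unsigned)((addr - DOORBELL_BASE) / 8);
    if (qid < NQUEUES)                          /* little-endian 32-bit value */
        sq_tail[qid] = buf[0] | ((unsigned)buf[1] << 8) |
                       ((unsigned)buf[2] << 16) | ((unsigned)buf[3] << 24);
}
```

After updating the tail pointer, a real firmware would fetch the new submission queue entries with pcie_dma_read and later raise pcie_send_msi when completions are posted.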

    3.2 Error correction bypass

    In this work, we use the Bose–Chaudhuri–Hocquenghem (BCH) code as the encoding scheme for ECC. With the BCH code, user data are divided into multiple blocks of fixed size, and a block of code bits is calculated and appended to each data block. The code bits are stored with the user data in a flash page. When the data are read from the NAND flash memory, a decoder processes the data and code bits to detect and correct errors in the data. The decoding of the BCH code involves several phases. First, the syndromes are calculated from the read data and code bits. Based on the syndromes, we can detect whether there are errors in the data. If the syndromes are all zero, the read data do not contain any error bit and the decoding is finished. Otherwise, error locations are computed from the syndromes and the corresponding bits are flipped to correct the errors. Although the error correction is complex and requires more computational power, the error detection phase can be done in a streaming manner with fewer hardware resources. For this reason, we implement the complex error correction logic in software but offload the encoding and the error detection logic to the hardware.
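The detect-then-correct structure can be illustrated with a much simpler code than BCH. The following C sketch uses a Hamming(7,4) code as a stand-in: a zero syndrome means the codeword is clean (the cheap fast path), while a non-zero syndrome directly names the bit to flip (the expensive path that SoftSSD moves to software). Real BCH syndromes and error locators are far more involved; this is only the control-flow analogy.

```c
#include <assert.h>
#include <stdint.h>

/* Encode 4 data bits into a 7-bit Hamming codeword with parity bits
 * at positions 1, 2, and 4 (standard layout: p1 p2 d1 p3 d2 d3 d4). */
uint8_t hamming_encode(uint8_t d)
{
    uint8_t d1 = d & 1, d2 = (d >> 1) & 1, d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
    uint8_t p1 = d1 ^ d2 ^ d4, p2 = d1 ^ d3 ^ d4, p3 = d2 ^ d3 ^ d4;
    return p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) |
           (d2 << 4) | (d3 << 5) | (d4 << 6);
}

/* Syndrome = XOR of the positions (1-based) of all set bits.
 * Zero for a valid codeword; otherwise the position of the bad bit. */
uint8_t syndrome(uint8_t cw)
{
    uint8_t s = 0;
    for (int i = 1; i <= 7; i++)
        if ((cw >> (i - 1)) & 1)
            s ^= (uint8_t)i;
    return s;
}

/* The "slow path": only run when detection reports an error. */
uint8_t hamming_correct(uint8_t cw)
{
    uint8_t s = syndrome(cw);
    return s ? (uint8_t)(cw ^ (1 << (s - 1))) : cw;  /* flip the bad bit */
}
```

The syndrome computation streams over the codeword bit by bit, which is why detection fits cheaply in hardware, while correction (locating and flipping bits) is the part worth deferring to software.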

    As shown in Fig.4, instead of storing ECC code bits in the OOB area of a flash page, we store the data blocks and code bits in an interleaved manner. This data layout allows us to perform data encoding and decoding in a streaming manner during data transfer without buffering any data. When writing data to a flash page, the DMA controller reads the user data from the device DRAM, separates them into data blocks, and streams them into the BCH encoder. The BCH encoder appends BCH code bits after each data block, which will be written to the flash page by the NAND flash controller. When reading flash pages from the NAND flash memory, the data are first segmented into frames that contain the data blocks and the stored code bits. The frames are streamed into the BCH detector, which detects whether there are errors in the data blocks without attempting to correct them. The error information is recorded in an error bitmap which can be accessed by the software ECC engine later. After the BCH error detector processes the data, the DMA controller uses scatter-gather DMA requests to write the data blocks and code bits to the device DRAM. During this process, the data blocks and ECC code bits are separated, and the original user data layout is restored. The software error correction engine then checks the error bitmap to determine which blocks in the user data contain error bits and corrects the error bits in these blocks.
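A minimal sketch of the read-side de-interleaving and bypass decision is shown below. The block and code sizes are illustrative, and the real data movement is performed by the scatter-gather DMA engine rather than memcpy; only the layout logic and the error-bitmap check mirror the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK  512   /* data block size (illustrative) */
#define CODE 32    /* code bits per block, in bytes (illustrative) */

/* Restore the user-data layout from an interleaved flash page:
 * [block0][code0][block1][code1]... -> contiguous data blocks.
 * Blocks whose error-bitmap bit is clear bypass software correction;
 * the return value counts the blocks that still need it. */
unsigned deinterleave(const uint8_t *page, size_t nblocks,
                      uint8_t *user, uint64_t err_bitmap)
{
    unsigned need_correction = 0;
    for (size_t i = 0; i < nblocks; i++) {
        memcpy(user + i * BLK, page + i * (BLK + CODE), BLK);
        if ((err_bitmap >> i) & 1)
            need_correction++;  /* only these go to the software ECC engine */
    }
    return need_correction;
}
```

In the common case the error bitmap is all zero, so every block takes the bypass and the software ECC engine is never invoked.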

    Fig.4 Data flow for writing to (a) and reading from (b) flash pages

    Compared to error correction, the encoding and error detection logic consumes fewer hardware resources, which allows us to build one ECC encoder and one ECC error detector per flash channel. This enables us to perform data encoding and error detection during the data transfer without incurring extra overhead. Once the ECC detector confirms that a data block is free of errors, the block can be forwarded to the flash firmware without further processing. Only blocks that contain error bits need to be sent to the software ECC engine to correct the errors. With the error correction bypass, we can skip the expensive software error correction logic for most data blocks, which is more efficient than existing methods that share the hardware ECC engine among multiple channels in a time-division manner, and significantly reduces request latencies. Although we reuse the same encoding scheme for both hardware error detection and software error correction in this work, it is also possible to implement error detection with a different encoding scheme, such as the cyclic redundancy check (CRC). In this way, we can implement the entire ECC engine in pure software for better flexibility.

    Overall, the hardware components in SoftSSD enable the flash firmware to interact with the host and the NAND flash memory via different interfaces. On one hand, the PCIe controller and the software PCIe transaction layer allow the flash firmware to initiate and process memory transactions over a PCIe link. On the other hand, the flash channel controllers retrieve data from the NAND flash memory and check data integrity for the flash firmware.

    4 Software design

    Based on the minimal set of functionalities implemented by the hardware design, we can build the flash firmware to serve user requests. In this section, we discuss the programming model and the implementation of the software components in SoftSSD.

    4.1 Programming model

    FTL processing of a user request involves multiple blocking operations and may be suspended before such operations complete.For example,before issuing a flash command to write data to the NAND flash memory,the flash firmware needs to wait for the PCIe controller to transfer the write data from the host memory into internal buffers in the device DRAM.To exploit the data access parallelism,the flash firmware should continue to process new requests when the current request is blocked on host DMA/flash access operations instead of waiting for the operation to complete.For this reason,the flash firmware needs to efficiently switch between requests once a request is blocked so that multiple requests can be processed in a concurrent manner.Existing methods tackle this problem by separating the request processing into several stages.A user request of an arbitrary size is divided into slice commands that request fixed-sized data.Each stage processes one slice command at a time and when a slice command is blocked,the stage puts the command in an internal output queue for the next stage to retrieve.After the blocking operation completes,the slice command is resumed by the next stage.As such,user requests are processed in a pipelined manner to maximize the throughput.However,existing FTL algorithms still need to be re-designed and manually divided into multiple stages based on the possible suspension points to be used on the existing SSD platform.Furthermore,there can be dependencies between different stages.For example,during the address translation stage,requests may be issued to the flash transaction stage to read in pages that contain translation information.Such dependencies make it more difficult to implement the FTL algorithm as multiple stages.

    In SoftSSD,instead of dividing the flash firmware into stages,we map user requests to threads.Each request accepted through the NVMe interface is assigned to a thread,which runs the request until its completion.Once a request is blocked on data transfer or flash operations,its thread is switched out so that other requests can be processed in an interleaving way.Fig.5 shows an example of concurrent request processing with two threads.During the processing of a user request,the execution may be blocked due to various operations.For example,during the address translation,the FTL may need to issue flash read commands to load missing mapping table pages from the flash memory(mapping table read).It can also be blocked on DMA transfers between the host memory,the device DRAM,and NAND flash memory (host DMA and flash DMA).Whenever a thread is blocked on a specific operation,it puts itself in a wait queue and suspends itself.The scheduler then picks another thread with pending user requests to continue execution.Later,when the operation completes,the scheduler is notified and it checks the wait queue to resume the suspended threads.With the framework,we can overlap computation tasks with blocking I/O operations so that multiple NVMe requests can be processed concurrently to fully use data access parallelism and maximize the throughput.Also,the entire request processing is implemented as a monolithic threaded function in a straightforward way without dividing FTL algorithms into multiple stages connected through message queues.

    Fig.5 An example of request processing under the proposed programming model with two threads

    In this work,we use a coroutine-based asynchronous framework to implement the proposed programming model.Coroutines(Conway,1963;Moura and Ierusalimschy,2009;Belson et al.,2019)are similar to normal functions,but their execution can be suspended and resumed at manually defined suspension points.Local variables used by a coroutine are preserved across suspensions until the coroutine is destroyed.Specifically,we implement one variant called stackful coroutines,in which each coroutine has its own separate runtime stack instead of sharing the stack of the underlying thread.With stackful coroutines,local states and the call stack of a coroutine are preserved on the private stack.Nested coroutine calls are simply function calls that do not incur extra overhead.A context switch between coroutines amounts to saving and restoring a small number of callee-saved registers on the coroutine stack,including the stack pointer and the program counter.Stackful coroutines can be implemented as a general-purpose library without special syntax (e.g.,async/await) provided by the language compilers,and FTL algorithms can be integrated into the asynchronous framework with minor modifications.
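The suspend/resume mechanism can be illustrated with POSIX `ucontext`, one way to build stackful coroutines as a pure library. The sketch below interleaves two request-handler coroutines in the manner of Fig.5; all names (`co_yield`, `run_two_requests`, the `trace` string) are illustrative rather than SoftSSD's actual API, and the yield stands in for blocking on a host/flash DMA.

```c
/* Minimal stackful-coroutine sketch using POSIX ucontext (Linux).
   Two "request handler" coroutines each run in two phases separated
   by a yield that models blocking on a DMA/flash operation. */
#include <ucontext.h>
#include <stdio.h>
#include <string.h>

#define STACK_SZ (64 * 1024)

static ucontext_t sched_ctx, co_ctx[2];   /* scheduler + two coroutines */
static char stacks[2][STACK_SZ];          /* private stack per coroutine */
static char trace[64];                    /* records the execution order */
static int cur;                           /* index of the running coroutine */

/* Suspend the running coroutine and return to the scheduler. */
static void co_yield(void) { swapcontext(&co_ctx[cur], &sched_ctx); }

static void request_handler(int id)
{
    char buf[8];
    /* phase 1: e.g. address translation, then block on a flash read */
    snprintf(buf, sizeof buf, "%dA", id); strcat(trace, buf);
    co_yield();
    /* phase 2: resumed by the scheduler after the "I/O" completes */
    snprintf(buf, sizeof buf, "%dB", id); strcat(trace, buf);
}

const char *run_two_requests(void)
{
    trace[0] = '\0';
    for (int i = 0; i < 2; i++) {
        getcontext(&co_ctx[i]);
        co_ctx[i].uc_stack.ss_sp = stacks[i];
        co_ctx[i].uc_stack.ss_size = STACK_SZ;
        co_ctx[i].uc_link = &sched_ctx;   /* return here when handler exits */
        makecontext(&co_ctx[i], (void (*)(void))request_handler, 1, i);
    }
    /* round-robin scheduler: each pass resumes every coroutine once */
    for (int pass = 0; pass < 2; pass++)
        for (cur = 0; cur < 2; cur++)
            swapcontext(&sched_ctx, &co_ctx[cur]);
    return trace;
}
```

Each scheduler pass resumes every coroutine once, so the two handlers interleave their phases (0A, 1A, 0B, 1B) instead of running to completion one after another, which is exactly the overlap of computation and blocking I/O described above.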

    4.2 Heterogeneous multi-core architecture

    To handle a tremendous number of concurrent I/O requests arriving over multiple queues,manufacturers have begun to employ high-performance multi-core microprocessors with more computational power on SSDs.In SoftSSD,we run the flash firmware on an ARM-based heterogeneous multi-core microprocessor with one 64-bit Cortex-A53 core and two 32-bit Cortex-R5 cores,as shown in Fig.6.To better use the parallel processing enabled by the multi-core processor,we adopt a layered design for the flash firmware and divide it into three modules that can independently run on different CPU cores.The FTL module implements the software NVMe interface,which communicates with the host OS driver through the PCIe controller.It also handles most computational tasks(i.e.,cache lookup and address mapping)for a user request.Thus,we run the FTL module on the Cortex-A53 core with the highest computational power.The flash interface layer(FIL)module manages the low-level flash transactions issued by the FTL module and dispatches them to the NAND flash memory through the flash channel controllers.It also continuously sends commands to the NAND flash memory to poll the NAND device status.Finally,the ECC engine module implements the software error correction logic to ensure data integrity.

    Fig.6 Multi-core heterogeneous architecture of SoftSSD

    The FTL module communicates with the FIL and the ECC engine modules with two ring queue pairs located in a shared memory region.Each queue pair has an available ring and a used ring.When the FTL module needs to send a request to the other two modules,it enqueues a command in the corresponding available ring.The FIL and the ECC engine modules continuously poll the available ring for new commands.For the FIL module,each command represents a read or write flash transaction that needs to be executed on the target flash chip.For the ECC engine module,each command contains the data block and the corresponding stored code bits that need to be corrected.After a command is processed,the two modules enqueue a completion entry in the used ring and send an inter-processor interrupt (IPI) to the FTL module to notify it of the command completion.
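The available/used ring pair can be sketched as single-producer/single-consumer rings in shared memory. The layout below is a minimal illustration; `RING_SZ` and the field names are assumptions, and real firmware would add memory barriers and the IPI notification described above.

```c
/* SPSC ring sketch for the FTL<->FIL/ECC command queues.
   head and tail only ever advance (unsigned wraparound is well defined),
   so empty is head == tail and full is tail - head == RING_SZ. */
#include <stdint.h>
#include <stdbool.h>

#define RING_SZ 16U   /* must be a power of two in a real design */

struct ring {
    volatile uint32_t head;       /* advanced by the consumer core  */
    volatile uint32_t tail;       /* advanced by the producer core  */
    uint64_t entries[RING_SZ];    /* command or completion payloads */
};

bool ring_push(struct ring *r, uint64_t cmd)
{
    if (r->tail - r->head == RING_SZ)
        return false;                       /* ring full */
    r->entries[r->tail % RING_SZ] = cmd;
    r->tail++;    /* real firmware: release barrier + IPI to the peer core */
    return true;
}

bool ring_pop(struct ring *r, uint64_t *cmd)
{
    if (r->head == r->tail)
        return false;                       /* ring empty */
    *cmd = r->entries[r->head % RING_SZ];
    r->head++;
    return true;
}
```

Because each index is written by exactly one core, the two modules can exchange commands without locks; the consumer polls `ring_pop` (or waits for the IPI) while the producer enqueues with `ring_push`.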

    4.3 NVMe interface

    In SoftSSD,we implement the NVMe protocol completely in software for better observability and extensibility.With NVMe,the host OS creates multiple queue pairs in the host memory for submitting commands and receiving completions.Internally,the flash firmware starts multiple coroutine threads to handle the requests from the queue pairs.Each NVMe worker thread is bound to an NVMe SQ.After a worker thread is started,it continuously polls the SQ for new commands.If the SQ is empty (i.e.,the head pointer and the tail pointer meet),then the worker thread suspends itself and waits until new commands arrive.Otherwise,it fetches the command at the head pointer with a host DMA request through the PCIe controller and updates the head pointer.The command is handled as either an admin command or an I/O command based on the SQ from which it is received,and the command result will be posted to the corresponding completion queue(CQ).Finally,the worker thread sends an MSI to the host to notify it of the command completion.When the host writes to the SQ tail doorbells,a callback function will be invoked with the new tail pointer.If the new tail pointer is different from the current value,it updates the tail pointer and wakes up all worker threads assigned to the SQ to process the new commands.
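The emptiness check and head update described above reduce to pointer arithmetic with wraparound. A minimal sketch with assumed names (`sq_empty`, `sq_consume`); a real worker would additionally issue the host DMA fetch for the returned slot and suspend its coroutine when the queue is empty.

```c
/* Software SQ state sketch. Per NVMe convention the queue is empty
   when head == tail; the device advances head as it consumes entries. */
#include <stdint.h>
#include <stdbool.h>

struct sq {
    uint16_t head;    /* next entry the firmware will fetch   */
    uint16_t tail;    /* written via the host's tail doorbell */
    uint16_t depth;   /* number of slots in the queue         */
};

bool sq_empty(const struct sq *q) { return q->head == q->tail; }

/* Returns the slot index to fetch via host DMA and advances the
   head pointer with wraparound. Caller must check sq_empty first. */
uint16_t sq_consume(struct sq *q)
{
    uint16_t slot = q->head;
    q->head = (uint16_t)((q->head + 1) % q->depth);
    return slot;
}
```

The doorbell callback mirrors this on the producer side: it stores the new tail value and wakes the worker threads, which then loop on `sq_empty`/`sq_consume` until the queue drains.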

    4.4 Flash interface layer

    The FIL manages the low-level flash transaction queues and the underlying NAND flash array.To avoid contention between FTL tasks and flash management tasks,we offload the FIL to a dedicated CPU core.The FTL core and the FIL core communicate via a pair of ring queues(the available ring and the used ring) located in a shared memory.The FTL divides user requests of arbitrary sizes into fixed-sized flash transactions and submits them to the available ring.Each flash transaction reads/writes data from/to a flash page identified by its physical address.The physical address is represented as a location vector(channel,way,die,plane,block,page) based on the flash array hierarchy.At runtime,the FIL first polls the available ring for incoming flash transactions.The flash transactions are organized into per-chip transaction queues based on the channel and the way number in their location vectors.Each flash transaction is executed in two phases: the command execution phase and the data transfer phase.Based on the flash transaction type,the command execution phase either reads data from the NAND flash cells into its internal data cache or programs the data in the cache to the flash cells.The data transfer phase transfers data between the device DRAM and the internal data cache.The two phases can be started only if the target die or the channel is idle.Thus,the FIL goes through the transaction queues of all chips that are connected through an idle channel or have at least one die that is not executing any flash command and dispatches the flash transactions to them if possible.The FIL also maintains the list of flash channels and dies with outstanding data transfers/flash commands.It polls the status of the flash channel controllers and the flash dies to check if the data transfers or the flash commands have completed.Once the two execution phases for a flash transaction complete,the flash transaction is added to the used ring and the FIL generates an IPI to notify the FTL core of the completion.
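The location vector can be represented compactly with a bitfield. The field widths below are assumptions sized for the 8-channel, 4-way array used in this work, not SoftSSD's exact encoding; the point is that the per-chip transaction queue is selected by the (channel, way) prefix alone.

```c
/* Packed physical-address sketch following the
   (channel, way, die, plane, block, page) hierarchy. */
#include <stdint.h>

struct flash_addr {
    uint32_t channel : 3;   /* 8 channels (assumed width) */
    uint32_t way     : 2;   /* 4 ways per channel         */
    uint32_t die     : 2;
    uint32_t plane   : 2;
    uint32_t block   : 12;
    uint32_t page    : 9;
};

/* Per-chip transaction queues are indexed by (channel, way). */
static inline unsigned chip_index(struct flash_addr a)
{
    return (unsigned)(a.channel * 4 + a.way);
}
```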

    5 Implementation

    We implement the prototype of SoftSSD on a hardware development board as shown in Fig.7.The core of the SoftSSD board includes a dual-core ARM Cortex-A53 application processing unit(APU),a dual-core ARM Cortex-R5 real-time processing unit (RPU),and programmable logic built into the FPGA.We use one APU core to run the worker coroutine threads of the FTL and two RPU cores to run the FIL and the ECC engine,separately.The SoftSSD board is designed as a PCIe expansion card that can be connected to the host system with PCIe Gen3×8 and serve as a normal NVMe SSD.The flash memory modules are mounted in eight test sockets.The NAND flash packages are connected to the FPGA with eight NV-DDR2 channels specified by the open NAND flash interface(ONFI).The two packages in the same row share two channels and the storage controller supports up to an 8-channel,4-way configuration.SoftSSD enables rapid flash firmware development by implementing a large number of SSD components in software.

    Fig.7 Hardware development board of SoftSSD prototype

    Multiple FTL algorithms can be integrated with the coroutine-based asynchronous framework to provide full-featured flash firmware.In this work,we implement FTL based on the demand-based flash translation layer(DFTL) (Gupta et al.,2009).The FTL uses a page-level mapping table to directly translate a logical page address into a physical one.Mapping table entries are stored as translation pages on the NAND flash memory and partially cached in the device DRAM for faster accesses.The global translation directory maintained in the embedded multimedia card (eMMC) keeps track of the physical locations of the translation pages.
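The cached page-level lookup in a DFTL-style design can be sketched as below. The entry count, structure, and names (`cached_tp`, `dftl_lookup`) are illustrative assumptions rather than the exact data structures of this implementation; on a miss, the firmware's coroutine would block on a flash read of the translation page and retry.

```c
/* DFTL-style lookup sketch: translate a logical page number (lpn)
   through one cached translation page of the page-level mapping table. */
#include <stdint.h>
#include <stdbool.h>

#define ENTRIES_PER_TP 512U         /* mapping entries per translation page */

struct cached_tp {
    bool     valid;
    uint32_t tvpn;                  /* which translation page is cached */
    uint32_t ppn[ENTRIES_PER_TP];   /* physical page numbers            */
};

/* Returns true on a cache hit and stores the physical page number.
   On a miss, the caller must read translation page lpn/ENTRIES_PER_TP
   from flash (its location comes from the global translation directory),
   install it in the cache, and retry. */
bool dftl_lookup(const struct cached_tp *c, uint32_t lpn, uint32_t *ppn)
{
    if (!c->valid || c->tvpn != lpn / ENTRIES_PER_TP)
        return false;               /* miss: blocks on a flash read */
    *ppn = c->ppn[lpn % ENTRIES_PER_TP];
    return true;
}
```

Because the miss path is just another blocking flash read, it slots directly into the coroutine framework of Section 4.1: the worker thread suspends on the translation-page read exactly as it does on a data read.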

    6 Evaluation

    6.1 Experiment setup

    All experiments are conducted on a host PC with a 6-core Intel Core i7-8700 CPU and 16-GB RAM.The host PC runs Linux kernel 5.16.The workloads used in the experiments are generated by FIO-3.28.The SoftSSD prototype board is connected to the host PC via PCIe and used as an NVMe block device driven by the OS driver.An additional development PC is used to collect software output and statistics with a debug console via the universal asynchronous receiver/transmitter(UART)serial port.The development PC also runs the development tools for programming the prototype board.

    For the SoftSSD prototype board,we install eight Micron MT29F1T08 MLC NAND flash memory chips(128 GB each).We enable NV-DDR2 timing mode 7 with a maximum data rate of 400 MT/s per channel.The OS driver creates 12 NVMe I/O queues on the device (one per host CPU hardware thread) and the maximum queue depth is set to 1024.Table 2 summarizes the SoftSSD configuration and the flash microarchitecture that are used in the experiments.

    Table 2 SSD configurations used in the experiments

    6.2 Sequential access

    We first measure the sequential access performance of the SoftSSD prototype board.We use flexible I/O tester (FIO) to generate workloads that issue read or write requests to consecutive logical addresses.The queue depth is set to 1 and the number of outstanding requests to the SSD matches the number of threads.The request size is 1 MB for read requests and 16 KB for write requests.All experiments start with an empty mapping table and every accessed logical address is assigned a new physical address by the address mapping unit.For write requests,we issue enough requests to fill the write cache and cause the data cache to evict pages to NAND flash memory.

    As shown in Fig.8,as the number of threads increases from 1 to 16,the throughput for both read and write requests increases.The throughput peaks at 32 threads,where all 32 flash chips (8-channel,4-way) are saturated with the outstanding flash commands.When the number of threads is small (e.g.,1–4),write requests have a lower throughput than read requests due to the longer flash command latency.As the number of threads increases,there are enough requests to saturate the bandwidth of the software host interface so that read and write requests achieve similar peak throughput.

    Fig.8 Sequential access performance of SoftSSD with different numbers of I/O threads: (a) read 1 MB throughput;(b) read 1 MB latency CDF;(c) write 16 KB throughput;(d) write 16 KB latency CDF (CDF:cumulative distribution function; T: number of threads)

    Figs.8b and 8d also show the cumulative distribution functions (CDFs) of end-to-end read and write latencies,respectively.Compared to write latencies,read latencies have a more uniform distribution because data read from the NAND flash memory are not cached in the device DRAM and every read request incurs the same number of transactions to the NAND flash memory.For write requests,data can be buffered in the device DRAM before the data cache is full.After the data cache is filled,data must be evicted and written back to the NAND flash memory to make room for new data,and write requests may incur extra flash write transactions,causing the longer tail latency.

    6.3 Random access

    Fig.9 shows the average throughput and latency CDF of random accesses in SoftSSD.The workloads in the experiments issue read/write requests of 16 KB of data to random logical addresses.The queue depth is set to 1 and the number of outstanding requests to the SSD matches the number of threads.As shown in Fig.9,as the number of threads increases from 1 to 16,the input/output operations per second (IOPS) increase,and peak at 32 threads.Compared to sequential accesses,random read/write accesses achieve a lower maximum throughput due to worse locality.This has a negative impact on different components in the FTL.The mapping table needs to allocate and maintain more translation pages to store the physical addresses of the random logical addresses.Also,the data cache cannot use the temporal locality to coalesce multiple write requests with a cached page,incurring more flash write transactions.

    Fig.9 Random access performance (16 KB) of SoftSSD with different numbers of I/O threads: (a) read throughput;(b)read latency CDF;(c)write throughput;(d)write latency CDF(CDF:cumulative distribution function; T: number of threads)

    6.4 Flash transaction latency

    In SoftSSD,each flash transaction is executed in two phases.In the transfer phase,data are transferred between the device DRAM and the internal data cache in the NAND flash memory.In the command phase,the NAND flash memory executes a flash command to read data from the NAND flash cells into the data cache or program data from the cache to the cells.With the software implementation,we can use the audit framework to collect performance statistics of different internal operations.Fig.10 shows the latencies of individual execution phases of flash transactions by running 16 KB aligned random read/write requests.

    Fig.10 Latency CDFs of individual phases of flash transactions: (a) read transfer;(b) read command;(c)write transfer;(d) write command (CDF: cumulative distribution function; T: number of threads)

    As shown in Fig.10,when a single thread is used,SoftSSD achieves deterministic latencies for the transfer phase (74.0 μs read,108.0 μs write)and the read command phase (~105 μs).When 32 threads are used,the flash transaction latencies increase due to the scheduling overhead and contention at the bus interconnects which connect the flash channel controllers to the device DRAM.Specifically,when there is a large number of outstanding flash transactions,the FIL core needs to continuously monitor the completion status of up to 64 flash dies,resulting in longer command latencies.Also,when used in multi-level cell(MLC)mode,each flash cell is shared by a lower LSB (short for least significant bit) page and an upper MSB (short for most significant bit)page with different read/program latencies,which is reflected by the steps of the write command latencies.

    6.5 Software overhead

    Different from existing methods,SoftSSD implements the majority of components in software for better extensibility.However,this could also incur overhead and degrade the overall performance compared to a specialized hardware implementation.For example,the host interface controller needs to transfer all PCIe TLPs to the device DRAM and process them in software.It also needs to maintain the internal states and provide a software implementation of the NVMe protocol.To evaluate the overhead of the software layer,we implement a hardware NVMe-over-PCIe controller in the FPGA and compare its performance with that of SoftSSD.The hardware host interface controller manages the NVMe queue pairs created by the host system and fetches NVMe commands from the host memory.Incoming NVMe commands are presented to the flash firmware through a memory-mapped first-in-first-out (FIFO) queue so that the software layer does not need to process packets from the host interface.We also modify the flash firmware of both SoftSSD and the hardware-based implementation to immediately send a successful completion without further FTL processing to measure the raw performance of the host interface controller.

    Fig.11 shows the average throughput of the software and the hardware host interface controller implementations.Compared to SoftSSD,the hardware implementation achieves ~10× higher performance.After excluding other FTL components from the system,SoftSSD can process ~25 000 commands per second.When the write size is 16 KB,SoftSSD can achieve a maximum throughput of 400 MB/s.Other request processing stages,such as address translation and flash command execution,can introduce latencies and further reduce the performance,which accounts for the lower performance of SoftSSD,as shown in Sections 6.2 and 6.3.However,compared to SoftSSD,it also demands significant effort to develop and maintain the hardware implementation.Adding new storage protocol features requires synthesis and implementation which can be time-consuming when the hardware design becomes complex.Furthermore,the hardware host interface controller is strongly coupled to the underlying transport (i.e.,PCIe) and is not portable to new transports for the NVMe protocol.To provide a trade-off between flexibility and performance,we design the hardware implementation to expose the same set of APIs as SoftSSD,so that flash firmware can be prototyped on SoftSSD and later integrated with the hardware implementation when the performance is prioritized over extensibility and observability.

    Fig.11 Requests per second of the software (SW)and the hardware (HW) host interface controller implementation

    7 Conclusions

    In this paper,we propose SoftSSD,which enables rapid prototyping of flash firmware on a real hardware platform.The key technique for achieving this is that we implement the majority of components in pure software so that SoftSSD can provide better observability of the internal states in the SSD,and new storage protocol features can be integrated into the flash firmware without modifying the hardware components.We conduct experiments with real I/O workloads to demonstrate the performance of SoftSSD as a standard NVMe SSD.We believe that the observability and extensibility provided by the SoftSSD platform can contribute to future flash firmware development in the research communities.

    Contributors

    Jin XUE designed the research.Renhai CHEN and Tianyu WANG designed the experiment platform.Jin XUE processed the data,and drafted the paper.Tianyu WANG helped organize the paper.Jin XUE,Tianyu WANG,and Zili SHAO revised and finalized the paper.

    Compliance with ethics guidelines

    Zili SHAO is a guest editor of this special feature,and he was not involved with the peer review process of this manuscript.Jin XUE,Renhai CHEN,Tianyu WANG,and Zili SHAO declare that they have no conflict of interest.

    Data availability

    The data that support the findings of this study are available from the corresponding author upon reasonable request.
