
    A survey on design and application of open-channel solid-state drives*


    Junchao CHEN, Guangyan ZHANG, Junyu WEI

    1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

    2 Xi’an Satellite Control Center, Xi’an 710043, China

    3 Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China

    Abstract: Compared with traditional solid-state drives (SSDs), open-channel SSDs (OCSSDs) expose their internal physical layout and provide a host-based flash translation layer (FTL) that allows host-side software to control internal operations such as garbage collection (GC) and input/output (I/O) scheduling. In this paper, we comprehensively survey research works built on OCSSDs in recent years. We show how they leverage the features of OCSSDs to achieve high throughput, low latency, long lifetime, strong performance isolation, and high resource utilization. We categorize these efforts into five groups based on their optimization methods: adaptive interface customizing, rich FTL co-designing, internal parallelism exploiting, rational I/O scheduling, and efficient GC processing. We discuss the strengths and weaknesses of these efforts and find that almost all of them face a dilemma between performance effectiveness and management complexity. We hope that this survey can provide fundamental knowledge to researchers who want to enter this field and further inspire new ideas for the development of OCSSDs.

    Key words: Domain-specific storage; Flash translation layer; Garbage collection; Internal parallelism; Open-channel solid-state drives (OCSSDs)

    1 Introduction

    In recent years, solid-state drives (SSDs) have been widely used in data centers (Bjørling et al., 2017), cloud storage systems (Zhang XY et al., 2021), and mobile devices (Lee C et al., 2015) due to their performance advantages over hard-disk drives (HDDs), such as higher input/output (I/O) throughput, lower access latency, and lower power consumption.

    However, the design principles of traditional file systems target the physical media of HDDs and do not consider the features of flash memory, which leads to inefficient utilization of SSDs. Even worse, frequent random writes and flush operations can seriously aggravate an SSD's wear, cause internal fragmentation, and reduce its lifetime (Lee C et al., 2015). Storing data on an aged SSD may trigger garbage collection (GC) operations frequently, which causes poor quality of service (QoS).

    Some traditional file systems are designed to overcome the challenges of frequent random writes and flush operations. For example, the log-structured file system (LFS) (Rosenblum and Ousterhout, 1991) writes data to disks sequentially in a log-structured manner, converting small synchronous random writes into large asynchronous sequential writes. Although LFS was designed for HDDs, this approach can eliminate the negative effect of frequent small writes on SSDs and significantly improve the write throughput. The copy-on-write (COW) strategy of BTRFS (Rodeh et al., 2013) and the delayed allocation mechanism of Ext4 (Mathur et al., 2007) improve resistance to fragmentation and can thus mitigate the aging problem of flash media. However, these works do not consider the physical features of SSDs and fail to exploit their strengths.

    Some research works design tailored file systems or specific optimization methods to exploit the potential of SSDs. For example, F2FS (Lee C et al., 2015) is a Linux kernel file system designed for flash storage devices; it uses a flash-friendly data layout and an effective GC mechanism to enhance flash storage performance and prolong device lifetime. Park et al. (2010) proposed external request rescheduling and dynamic write mapping to exploit an SSD's internal plane- and die-level parallelism. Yang LH et al. (2019) observed the external behaviors of SSDs and exploited their internal parallelism for specific applications. However, these works still do not fully explore the performance advantage of SSDs, mainly because they treat an SSD as a black box: instead of building directly upon raw flash memory, they rely on the block interface to the built-in flash translation layer (FTL). This results in a huge semantic gap between file systems and flash devices and makes it impossible to fully leverage the parallelism and scheduling mechanisms inside SSDs.

    Therefore, more researchers have shifted their attention to the key idea of treating an SSD as a white box, namely, moving the FTL from the device side to the host side and letting external software directly interact with the raw flash memory. This idea gave birth to open-channel SSDs (OCSSDs). An OCSSD exposes its internal physical layout and moves the FTL to the host side, allowing the FTL to be controlled and co-designed by the host. This openness allows the host side to fully exploit the internal parallelism and to design optimization methods, such as rational I/O scheduling policies and efficient GC operations, to achieve goals such as high I/O throughput, stable performance, low tail latency, and high resource utilization.

    This paper reviews nearly 10 years of exploring the advantages of OCSSDs in academia and industry; the representative papers we surveyed are shown in Table 1. These works can be roughly divided into three time periods: works before 2015 initially explored the open-channel advantages of SSDs and gradually formed and perfected the concept of OCSSDs; works between 2016 and 2018 focused mainly on the systematic design of OCSSDs and produced some relatively mature file systems; works after 2018 mainly address domain-specific problems based on LightNVM or simulation platforms, while optimizing them for better performance.

    Table 1 Representative works in 2013–2022

    2 Overview of open-channel solid-state drives

    An SSD is a type of persistent storage that mainly uses NAND flash as its medium. Different from HDDs, SSDs are made of pure electronic circuits and store data in semiconductors, avoiding the bottleneck of physical mechanical movement; they can reach better performance under the automated management of the embedded FTL, which handles GC, address mapping, and error handling (Fig. 1a).

    Fig. 1 FTL in SSDs (a) compared to FTL in OCSSDs (b). FTL: flash translation layer; SSD: solid-state drive; OCSSD: open-channel SSD; GC: garbage collection; I/O: input/output; ECC: error correcting code; WL: wear leveling

    OCSSDs have evolved from SSDs in recent years (SSD refers to a traditional SSD in the remainder of this paper unless explicitly stated otherwise). Unlike SSDs, they move the FTL to the host side and expose the internal physical features of the device. Concretely, as shown in Fig. 1b, OCSSDs move application-related functions such as GC and I/O scheduling into the host-side FTL and, in most cases, leave the functions closely tied to the device, such as wear leveling (WL) and error correcting code (ECC), in a simple FTL on the device side. By doing so, they expose the physical page addresses (PPAs) of the NAND storage units, rather than logical block addresses (LBAs), to the host-side software for better hardware access. OCSSDs enable customizing the FTL for specific scenarios, and they inherit the advantages of SSDs while having the potential to avoid their shortcomings. A typical OCSSD architecture is shown in Fig. 2.

    Fig. 2 A typical OCSSD architecture. OCSSD: open-channel solid-state drive; POSIX: portable operating system interface; FTL: flash translation layer; GC: garbage collection; I/O: input/output; ECC: error correcting code; PU: parallel unit; PPA: physical page address

    2.1 Physical layout

    The physical layout of the storage media of a typical OCSSD is organized as a hierarchy of channel, chip, die, plane, block, and page. This physical layout brings an opportunity to exploit the internal parallelism of OCSSDs. Existing works have explored parallelism at four levels (Hu et al., 2013): channel level (e.g., Lu et al., 2019b), chip level (e.g., Wang Y et al., 2019), die level (e.g., Bjørling et al., 2017), and plane level (e.g., Ouyang et al., 2014). From the software perspective, the open-channel specification (The Open-Channel SSD Community, 2023) suggests exposing three levels: channels, parallel units (PUs), and planes, where the PU level integrates the chip and die levels.
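
    To make the hierarchy concrete, the sketch below models a physical page address as a tuple that is flattened into a linear index; the geometry constants are hypothetical, not those of any particular device:

    ```python
    from dataclasses import dataclass

    # Hypothetical geometry; real devices report these values via the
    # open-channel specification's geometry command.
    CHANNELS, PUS, PLANES, BLOCKS, PAGES = 16, 8, 2, 1024, 512

    @dataclass(frozen=True)
    class PPA:
        """A physical page address in the channel/PU/plane/block/page hierarchy."""
        channel: int
        pu: int      # parallel unit (integrates the chip and die levels)
        plane: int
        block: int
        page: int

        def linear(self) -> int:
            """Flatten the tuple into a single linear page index."""
            idx = self.channel
            idx = idx * PUS + self.pu
            idx = idx * PLANES + self.plane
            idx = idx * BLOCKS + self.block
            return idx * PAGES + self.page

    # Pages that differ only in their channel index sit on independent
    # channels and can be accessed in parallel.
    stripe = [PPA(ch, 0, 0, 0, 0) for ch in range(CHANNELS)]
    print([p.linear() for p in stripe])
    ```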

    2.2 Properties of the flash translation layer

    The FTL is the core component of an OCSSD and sits directly above the hardware. The design quality of an FTL directly determines the quality of an OCSSD in terms of performance, reliability, durability, and so on. An FTL can take two forms: split between the device side and the host side, where a lightweight device-side FTL helps the OCSSD manage hardware-related functions; or placed fully on the host side to allow flexible design and management. The host-side FTL can reside in the kernel space to achieve efficient I/O interactions or in the user space to coordinate conveniently with applications.

    A well-designed FTL can bridge the semantic gap between file systems and flash devices, enable co-design with host software, and eliminate redundant functions, such as space allocation and GC operations duplicated across multiple layers, which further reduces the performance overhead.

    2.3 Interface

    OCSSDs often provide three types of interfaces for interaction: a general block interface, a portable operating system interface (POSIX), and a user-customized interface often co-designed with host-side software. The FTL can access the raw flash memory through hardware interfaces such as the PPA interface (Fig. 2). The PPA interface relies on a hierarchical address space (Picoli et al., 2017), which allows the host side to see the media in a fine-grained manner and helps developers directly access the NAND flash cells using data commands, including read/write/erase operations.

    Generally, these operations have different granularities: the page is the minimum unit for reads and writes, while the block is the minimum unit for erases; the mismatch between write and erase granularities may lead to write amplification (WA). These operations also have different access latencies: a read is an order of magnitude faster than a write/erase, and if a read is scheduled behind a write/erase operation, it may suffer a much higher latency. Therefore, a well-designed interface for monitoring and scheduling these operations is very important, as it determines whether an OCSSD can deliver high and stable performance.
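
    As a rough illustration of these asymmetries, the sketch below models a single flash block with page-granularity writes, block-granularity erases, and the erase-before-write rule; the latency constants are placeholders for illustration, not measured values:

    ```python
    # Hypothetical latencies (microseconds); real values depend on the
    # flash generation and vendor.
    READ_US, WRITE_US, ERASE_US = 100, 1000, 3000

    class FlashBlock:
        """One erase unit: pages are programmed sequentially and only an
        erase resets them (erase-before-write)."""
        def __init__(self, pages_per_block: int = 256):
            self.pages = pages_per_block
            self.next_page = 0          # program pointer; no in-place updates

        def write(self, n_pages: int = 1) -> int:
            if self.next_page + n_pages > self.pages:
                raise RuntimeError("block full: erase required before rewriting")
            self.next_page += n_pages
            return n_pages * WRITE_US   # page is the write granularity

        def erase(self) -> int:
            self.next_page = 0          # block is the erase granularity
            return ERASE_US

    blk = FlashBlock()
    print(blk.write(4), blk.erase())    # -> 4000 3000
    ```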

    3 Opportunities

    3.1 Key challenges of solid-state drives

    SSDs have many performance advantages; however, due to the erase-before-write limitation (Lee C et al., 2015), the different write/erase granularities, and the disk-like block interface (Bjørling et al., 2017), there are several well-documented shortcomings that prevent SSDs from providing high QoS and fully exploiting the benefits of flash memory, as listed below:

    1. Log-on-log problem (Yang JP et al., 2014)

    For functional completeness, each software layer may implement the same functions, such as data mapping and GC, resulting in additional performance overhead.

    2. Flash endurance reduction (Lu et al., 2013)

    Extra write operations from outer systems and valid page migration during GC may incur WA and frequent program/erase (P/E) cycles, resulting in expensive bandwidth consumption and reduced flash lifetime.

    3. Long tail latency (Hao et al., 2016; Kang et al., 2017)

    This is usually caused by time-consuming GC operations and irrational I/O scheduling, which may incur busy writes/erases and starved reads, resulting in long and unpredictable tail latencies.

    4. Inadequate resource utilization (Ouyang et al., 2014)

    SSDs may provide less than 50% of the raw bandwidth to applications and often reserve 20%–30% or even more of their space as over-provisioning (OP) for GC and other operations. This low resource utilization may result in a significant waste of expensive resources, in both initial and recurring costs.

    These shortcomings present great challenges and are often difficult to address because SSDs are treated as black boxes managed by the built-in FTL. An FTL can provide interoperability (Lee S et al., 2016) and automated management, but it also hides the design details of SSDs, such as the execution of erase operations and the internal parallelism, and has become a bottleneck for further improving SSD performance.

    Compared to SSDs, OCSSDs can provide new ideas for many research breakthroughs and problem solutions, especially in domain-specific scenarios. OCSSDs expose their internal physical features, allowing host-side applications to customize the FTL according to their own needs. This brings unprecedented opportunities for developers to fully leverage the performance potential of OCSSDs, while also posing the challenge of keeping the management complexity and cost under control.

    Some studies argue that the internal structure of SSDs should not be opened (Yang LH et al., 2019), since doing so may reduce the security and flexibility of application accesses and raise design complexity. However, existing works on OCSSDs suggest that open designs for domain-specific scenarios can bring obvious performance benefits when properly designed.

    3.2 Opportunities for open-channel solid-state drives

    OCSSDs bring many opportunities for researchers in academia and industry to achieve specific performance goals or address practical problems in different scenarios. We focus mainly on the following five metrics: throughput, latency, lifetime, isolation, and resource utilization. In this subsection, we give only a rough discussion of some representative works to illustrate the performance opportunities presented by OCSSDs. More systematic methods are analyzed in detail in Section 4 for clarity. More works on leveraging these opportunities to achieve better performance are summarized in Table 2.

    Table 2 Summary of the main optimization goals that some existing works try to achieve

    3.2.1 High throughput

    SSDs can achieve only sub-optimal I/O throughput because of inefficient GC and under-utilized internal parallelism. Even flash-optimized file systems such as F2FS do not perform well under heavy write traffic. OCSSDs can address this shortcoming by allowing users to optimize the GC mechanism, redesign I/O scheduling, and fully exploit the parallelism to significantly improve the throughput.

    Compared to F2FS, ParaFS (Zhang JC et al., 2016) proposes a two-dimensional (2D) data allocation scheme to fully leverage OCSSDs' channel-level parallelism while reducing valid page migration during GC operations, which would otherwise impact the read/write throughput. Evaluations show that ParaFS can outperform F2FS by 1.6–3.1 times under write-intensive workloads.

    FlashKV (Zhang JC et al., 2017) and LOCS (Wang P et al., 2014) leverage multi-channel parallelism to improve the throughput of log-structured merge tree (LSM-tree) based key-value (KV) stores. FlashKV uses a parallelism-friendly data layout to manage the raw flash space in the user space, which improves the throughput by 1.5–4.5 times compared with LevelDB. LOCS directly uses software-defined flash (SDF) (Ouyang et al., 2014) as the underlying hardware and adopts a dynamic I/O dispatching policy to fully exploit the channels, which improves the throughput by more than 4 times. NVMKV (Marmol et al., 2015) leverages the advanced host-side FTL capabilities to enable its hash-based KV store, achieving high throughput and low WA while ensuring scalable and ACID (atomicity, consistency, isolation, and durability) compliant KV operations.

    3.2.2 Low latency

    Enterprise-grade systems need to provide high QoS for their tenants, since unpredictable or unstable latency hurts users' productivity. Even worse, long tail latency may slow down the response time or even cause service timeouts, failing to meet the requirements of real-time and quality-critical systems (Kang et al., 2017). Fortunately, OCSSDs can solve these problems explicitly by adopting rational GC mechanisms and I/O scheduling strategies.

    1. Predictability

    LightNVM (Bjørling et al., 2017) is the first generic subsystem designed for OCSSDs. It exposes the physical layout of OCSSDs through the physical block device (PBLK) and narrows the semantic gap through a high-level I/O interface. It allows developers to explicitly manage the OCSSD's PUs and design the GC and I/O scheduling according to workloads, which helps achieve predictable latency. By controlling the GC process in the flash array, IODA (Li et al., 2021) can achieve strong latency predictability, keeping the P95–P99.99 latencies within about 2% of the ideal.

    2. Stability

    ParaFS adopts a two-phase scheduling policy to fairly assign read/write/erase requests to the flash channels and achieves consistent performance under write-intensive workloads. TTFLASH (Yan et al., 2017) introduces four key strategies to remove GC blocking from all software stacks and can achieve guaranteed high performance. Evaluation results show that TTFLASH is 2.5–7.8 times faster than the baseline in average latency.

    3.2.3 Long flash lifetime

    SSDs face severe WA and inefficient GC problems, which excessively reduce flash endurance. Fortunately, with the host-side FTL and the exposed physical layout of OCSSDs, developers can control data dispatching and placement to prolong flash lifetime by reducing WA, achieving efficient GC, and addressing WL issues.

    1. WA

    For example, OFTL (Lu et al., 2013) uses an in-kernel FTL co-designed with the file system to reduce data updates; it reduces WA by 47.4%–89.4% in the SYNC mode and by 19.8%–64.0% in the ASYNC mode compared with ext2, ext3, and btrfs. By bypassing the file system layer and avoiding partial page updates, FlashKV uses in-file indexing to eliminate cascading updates and can greatly reduce the WA from the file system and FTL.

    2. GC

    SSW (Du et al., 2020) uses strictly sequential writes to ensure that writes are not shuffled, which reduces the number of GC operations by 26.65% on average. Some other works, such as StageFS (Lu et al., 2019a) and QBLKe (Qin et al., 2021b), improve the flash lifetime by reducing valid page migrations during the GC process.

    3. WL

    To prevent flash memory cells from wearing out, Prism-SSD (Shen et al., 2022) uses a global WL module that shuffles hot/cold PUs based on the "update likelihood" (or hotness) (Lee C et al., 2015), and SOFA (Chiueh et al., 2014) uses a global FTL that provides both intra-disk and inter-disk global WL to prolong the lifetime of the flash array.

    3.2.4 Strong performance isolation

    In multi-tenant cloud environments, cloud providers need to provide isolated and stable services to their tenants. Dedicating devices to each tenant may provide great performance but incurs unacceptable cost and management overhead as the number of tenants grows dramatically. Many providers choose SSDs as their storage devices. They use software methods such as Docker containers to create an isolated space for each tenant, or hardware techniques to separate tenants onto different flash blocks. However, unpredictable global GC operations triggered by one tenant may affect other tenants' experiences. Meanwhile, tenants still share the flash channels, which may cause excessive conflicts under heavy workloads and reduce the overall performance. OCSSDs bring new opportunities for strong performance isolation because of their exposed device geometry.

    Multi-Partition (González and Bjørling, 2017) instantiates several PBLKs (Bjørling et al., 2017) to extend LightNVM with multi-target support and to separate tenants onto dedicated instances and PUs (Fig. 3a). Each instance executes independent GC operations and I/O scheduling, as if every tenant owned the hardware resources, and is not disturbed by other tenants.

    Fig. 3 Design of strong performance isolation: (a) Multi-Partition; (b) OC-Cache. I/O: input/output; FS: file system; PBLK: physical block device; PU: parallel unit; Ch: channel; OCSSD: open-channel solid-state drive

    On the other hand, OC-Cache (Wang HT et al., 2018) achieves strong performance isolation by using OCSSDs as I/O caches. As shown in Fig. 3b, it assigns one or more flash channels exclusively to each tenant to ensure strong performance isolation and reserves some shared channels for dynamic assignment according to the miss-ratio curves of the tenants' workloads, which provides better QoS while leveraging the OCSSD's internal parallelism.

    Lee S et al. (2019) aimed to achieve performance isolation between multiple processes rather than between tenants. They proposed a host-level workload-aware budget compensation (hWA-BC) scheduler based on LightNVM. The scheduler focuses mainly on each application's contribution to GC and takes two factors (the read–write request ratio and the valid–invalid flash page ratio) into consideration. Evaluation results show that hWA-BC succeeds in compensating/penalizing the workloads and ensures performance isolation.

    3.2.5 High resource utilization

    By exposing the physical layout and moving the FTL to the host side, OCSSDs can reach higher resource utilization than SSDs. The resource utilization discussed in this subsection mainly includes bandwidth utilization, device-side capacity utilization, host-side CPU/memory utilization, and data reliability, which avoids unnecessary accesses to erroneous data and improves the utilization of reliable flash. More information can be found in Table 2.

    1. Capacity

    DIDACache (Shen et al., 2017) can dynamically adjust the OP space to maximize the usable flash space for KV caching and increase cost efficiency. SDF introduces a field-programmable gate array (FPGA)-based controller and implements a host-side FTL to directly access the raw flash channels, avoiding the space reserved for GC and parity coding in SSDs; this allows SDF to achieve 99% user capacity and reduce the per-gigabyte hardware cost by 50% on average.

    2. Bandwidth

    SDF can also achieve 95% of the flash's raw bandwidth by fully exploiting the channel-level parallelism. LightNVM can reach a high write bandwidth on OCSSDs. However, it performs poorly under multi-thread workloads due to the global spinlocks used in PBLK. Qin et al. (2021b) proposed QBLKe to improve the scalability and reduce software overhead. By adopting three techniques (see details in Section 4.2.1) to avoid the global spinlocks, QBLKe improves the write bandwidth by up to 78.9% compared with PBLK under a 32-thread 4 KB random-write test.

    3. Data reliability

    Three-dimensional (3D) NAND flash memory provides ultra-high capacity compared to planar flash; however, it suffers from severe thermal problems that lead to data loss and a sharp decrease in performance. ThermAlloc (Wang Y et al., 2019) distributes centrally accessed data to physically distant locations and postpones or reduces unnecessary GC operations to mitigate heat issues on OCSSDs, which enables ThermAlloc to reduce the peak temperature by 30% and improve data reliability. PATCH (Chen et al., 2019) focuses on the unreliability caused by low cell current in 3D flash. It can identify unreliable blocks and allocate/reallocate the write data to other, reliable blocks, which reduces uncorrectable bit errors and avoids unnecessary data accesses. PVSensing (Wang Y et al., 2022) considers process variation and allocates write requests to the corresponding physical blocks to reduce uncorrectable bit errors. Compared to PATCH, PVSensing reduces uncorrectable bit errors more effectively and improves the utilization of reliable flash memory.

    From the above-mentioned performance benefits, we can see that OCSSDs outperform SSDs in many specific scenarios. These opportunities come mainly from the carefully designed host-side FTL and the exposed multi-level parallelism. Meanwhile, developers need to consider their workloads and application behaviors when designing the data access interfaces, the GC mechanism, and the I/O scheduling strategy to better explore the performance advantages of OCSSDs.

    4 Methodologies

    Although OCSSDs bring many opportunities for better performance, we should realize that achieving optimal rather than sub-optimal performance is challenging. Some issues have to be addressed at the software level for these opportunities to be fully turned into real performance improvements, especially designing effective methods according to the workloads and application behaviors under complex software/hardware environments. In this section, we survey the related works with respect to their research methods from five different perspectives. An overview of some typical optimization methods is shown in Table 3. We discuss these methods in detail and try to answer the following questions:

    1. What approaches do existing works adopt to exploit the performance benefits of OCSSDs?

    2. What unique issues are addressed with these approaches?

    3. What are the advantages and potential shortcomings of these approaches?

    4.1 Interface design

    Unlike well-packaged SSDs, OCSSDs let developers customize the interfaces exposed to the host while ensuring effective access to the device. The low-level interfaces should consider the device geometry; the high-level interfaces need to ensure high efficiency, efficient delivery of semantic information, and good compatibility with existing systems, which can bring many performance advantages and good usability/scalability.

    LightNVM proposes a classic three-layer architecture: a high-level I/O interface for user-space applications, an FTL in kernel space for the main functionalities, and a non-volatile memory express (NVMe) device driver for hardware management. Each layer provides a different abstraction based on its characteristics, and the layers interact with each other through appropriate interfaces. LightNVM provides a PPA interface to expose its device geometry and parallelism, implemented to meet the NVMe standard (NVM Express, Inc., 2023). Meanwhile, host applications or file systems can access OCSSDs through a traditional block I/O interface provided by PBLK to reduce software changes. Moreover, LightNVM allows developers to customize their own application-specific interfaces by exposing geometric addresses to the user space.

    Many other studies also customize interfaces for their practical needs. For example, as an object store, OCStore (Lu et al., 2019b) uses an object interface toward host applications for better interaction. FlashKV combines the compaction procedure with GC and I/O scheduling, leaving only an input/output control (ioctl) interface to dispatch requests to the FTL for simplification. OFTL (Lu et al., 2013) co-designs software and hardware in embedded systems and provides fine-grained byte-unit object interfaces to file systems, which can compact small updates into fewer pages to reduce WA. To provide different operational granularities for GC efficiency, SDF (Ouyang et al., 2014) exposes an asymmetric read/write/erase operation interface to external software.

    All these designs try to ensure better interaction across the external software, the FTL, and the underlying hardware. However, application-friendly flexible interfaces and optimal performance are difficult to achieve simultaneously. To bridge this gap, Prism-SSD (Shen et al., 2022) proposes a three-level interface abstraction to expose the OCSSD hardware, namely the raw-flash-level abstraction, the flash-function-level abstraction, and the user-policy-level abstraction. Prism-SSD provides many application programming interfaces (APIs) and allows developers to choose the best trade-off between usability and efficiency; this multi-level abstraction may inspire better OCSSD interface designs.

    4.2 Flash translation layer design

    The FTL provides OCSSDs with rich functions such as GC, I/O scheduling, and data placement. It is a challenge for FTLs to provide high performance and low software overhead while ensuring high scalability and usability (Qin et al., 2021b). Without a properly designed FTL software architecture, it is hard to fully leverage the OCSSD's advantages, and a heavy load may be put on the host hardware.

    4.2.1 Generic design

    As the first generic subsystem for OCSSDs, LightNVM implements its FTL, called PBLK, in the kernel space and leaves a simple controller on the device side to manage the hardware. PBLK provides rich functions for LightNVM, such as GC, error handling, and logical-to-physical (L2P) address mapping, and deals with specific constraints of the flash media and device controller. With the help of PBLK, LightNVM provides high performance and small software overhead for host applications.

    However, general-purpose systems usually face a dilemma between performance and safety; as Qin et al. (2021b) found, PBLK is limited under multi-thread workloads. The limitation comes from the global spinlocks used in three components, i.e., the data buffer, the translation map, and the direct memory access (DMA) memory pool. The global spinlocks offer unnecessarily conservative protection and hurt the scalability of PBLK, which prevents host software from fully exploiting the OCSSD performance.

    To address this limitation, they proposed QBLKe, an extended version of QBLK (Qin et al., 2019). QBLKe uses three main optimization techniques: a per-CPU ring buffer, which allocates a ring buffer to each CPU to minimize cross-CPU contention; a lock-free translation map, which uses atomic primitives to ensure the atomicity of operations; and a per-CPU DMA pool, which assigns a DMA memory pool to each CPU to eliminate lock contention. Meanwhile, QBLKe allocates a GC thread to each channel to decrease GC page-migration overhead. All these optimizations make QBLKe more scalable under multi-thread workloads, achieving higher throughput and write bandwidth. Leveraging the advantages of QBLKe, they further developed a Linux file system called NBFS (Qin et al., 2021a), which achieves atomic writes with the elimination of I/O orderings and delivers roughly twice the performance of F2FS.
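
    A minimal user-space sketch of the per-CPU buffering idea follows. It keeps one ring per CPU so that submitters on different CPUs rarely contend; the in-kernel version additionally replaces locks on the translation map with atomic primitives, which this simplified model does not show:

    ```python
    import os
    import threading

    class PerCPURings:
        """One ring buffer per CPU, so writers on different CPUs never
        share a hot lock (a rough user-space analogue of QBLKe's
        per-CPU data buffers)."""
        def __init__(self, ncpus: int, depth: int = 1024):
            self.rings = [[] for _ in range(ncpus)]
            self.locks = [threading.Lock() for _ in range(ncpus)]
            self.depth = depth

        def submit(self, cpu: int, entry) -> bool:
            # Only threads running on this CPU touch this ring, so the
            # per-ring lock is almost never contended.
            with self.locks[cpu]:
                if len(self.rings[cpu]) >= self.depth:
                    return False        # back-pressure: ring is full
                self.rings[cpu].append(entry)
                return True

    rings = PerCPURings(os.cpu_count() or 1)
    rings.submit(0, ("write", 0x1000))
    ```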

    4.2.2 Specific design

    Some previous works (e.g., Lu et al., 2013; Zhang JC et al., 2016) prefer to design dedicated FTLs with different placement strategies to achieve specific performance goals in domain-specific scenarios. Putting the FTL in the user space is beneficial for co-designing with applications. Leaving the FTL in the kernel space can reduce workloads on the device side and lets it interact well with the file system while sensing upper-level semantic information. Some other works (e.g., Lu et al., 2019b; Zhang XY et al., 2021) choose to leave a simple FTL on the device side, which uses the device's capability to perform some simple operations closely related to hardware management and offloads host-side workloads to the device. The four classic domain-specific FTL design patterns are shown in Fig. 4.

    Fig. 4 Four types of domain-specific FTL designs: (a) OFTL; (b) ParaFS; (c) FlashKV; (d) OCStore. FTL: flash translation layer; GC: garbage collection; I/O: input/output; WL: wear leveling; ECC: error correcting code

    1. Only in kernel

    OFTL implements all the FTL functions in the kernel space (Fig. 4a), which interacts well with the upper system and directly accesses the raw flash memory. To significantly extend flash memory lifetime by reducing WA in the FTL, OFTL buffers and compacts frequent partial page updates to reduce the number of updates, and designs a lazy indexing technique that uses backward pointers to reduce index metadata persistence and journaling overhead. Although OFTL is an early work and does not consider data hotness or GC optimization, its software/hardware co-design concept allows flexible approaches to be designed to solve specific bottlenecks for better performance.

    2. FS-kernel

    ParaFS (Zhang JC et al., 2016) uses a simple lightweight in-kernel FTL called S-FTL only for WL, ECC, and block-level mapping, leaving application-related functions such as GC and I/O scheduling in the file system (Fig. 4b). In particular, the mapping is static and log-structured (O'Neil et al., 1996) with no in-place updates, which means that it needs no remapping operations and incurs nearly zero overhead. In addition, S-FTL knows the behaviors of the file system through the exposed read/write/erase interfaces and ensures that the data grouping at the file system layer will not be shuffled across the flash channels. In sum, ParaFS makes efficient GC and reasonable I/O scheduling easy to achieve by integrating them with the file system; moreover, letting S-FTL execute only simple commands directly related to the flash memory reduces redundant functions between multiple layers and improves the overall performance.

    3. Application-kernel

    User-space applications often need to cross multiple software layers, such as the application layer, the file system layer, and the FTL, to access devices, which induces severe WA and GC overhead under write-intensive workloads if these layers implement redundant functions (Fig. 5; functions with the same color indicate duplicated designs). FileStore (Weil et al., 2006) implements mapping at all three layers and journaling in the application and file system layers, while BlueStore (Weil et al., 2006) implements GC and space allocation on both the host side and the device side, which introduces extra software overhead.

    Fig. 5 Redundant functions across multiple layers. GC: garbage collection; FTL: flash translation layer (References to color refer to the online version of this figure)

    The redundant functions are usually caused by weak semantic interfaces. Each layer prefers to design independent functions for its own needs rather than co-design with other layers. Therefore, optimizing the software stack design, breaking through the multi-layer semantic isolation, and reducing the overhead of redundant or even conflicting functions are important ways to improve performance.

    A direct and effective way to eliminate the redundant functions of multiple layers is to bypass the intermediate coupling layers. For example, FlashKV accesses flash memory directly, bypassing the embedded FTL and the file system. It separates its FTL into two parts, a thick open-lib in the user space co-designed with the KV store and a thin open-channel driver in the kernel space (Fig. 4c); the two interact through an ioctl interface. The open-lib implements most of the functions, such as GC and I/O scheduling, to better match application characteristics, while the open-channel driver implements only simple functions that are deeply tied to the hardware. This design lets the FTL know more about application behaviors while better leveraging the OCSSD features, which reduces the overhead of multi-layer management.

    Another typical example is DIDACache (Shen et al., 2017), which co-designs the KV cache with a user-space FTL and eliminates unnecessary functions by bypassing multiple intermediate layers to bridge the semantic gap. This co-designed, bypassing approach lets the KV cache manager directly drive the underlying flash devices and brings many benefits, such as directly mapping the "keys" to physical flash pages to reduce redundant mapping cost, eliminating multi-layer redundant GCs to reduce processing overhead, and sensing real-time workloads to adaptively tune the OP space size. These optimizations give DIDACache high throughput and low latency while removing unnecessary erase operations.

    The bypass method is an effective way to eliminate redundant functions. However, due to the complexity of computer system architectures and the implementation benefits of low-coupling design, the multi-layer software stack will not disappear, and not every design can bypass the intermediate software layers. It is still a challenge to bridge the semantic gap and reduce redundant functions while keeping the design and implementation of each layer independent. This involves not only the design of the FTL but also the collaborative design of the other software layers.

    4. Kernel-device

    The fourth FTL placement strategy keeps a simple FTL on the device side that performs only functions closely tied to the hardware, such as ECC and WL, to reduce interaction overhead, while performing the other functions on the host side; OCStore takes this approach (Fig. 4d). OCStore co-designs the FTL functions with the object storage system in the kernel space and removes redundant functions across multiple software layers by directly managing the flash memory, which helps OCStore better understand software behaviors and reduce additional performance overhead.

    In addition to the FTL placement strategy, different mapping granularities and strategies affect the FTL's performance and management overhead. The page-level mapping adopted by LightNVM can reduce WA and provide fine-grained address management. However, as the capacity increases, the mapping table consumes host memory excessively and can hardly be loaded entirely into memory.

    To address this, AMF (Lee S et al., 2016) and MT-Cache (Oh and Ahn, 2021) adopt block-level mapping to shrink the mapping tables. MT-Cache further employs a selective mapping-table loading scheme that loads only part of the mapping table into host memory to reduce the memory footprint, and designs a cache replacement policy to ensure a low cache miss ratio, ultimately retaining 80% of the I/O performance with only 32% of the memory consumption. FlashKV introduces an even coarser-grained mapping, namely super-block mapping, to reduce the number of mapping entries so that they can be entirely cached in host memory. However, coarse-grained mapping produces WA and extra write latency under small writes, so the choice between fine- and coarse-grained mapping should depend on the workloads and the parallelism design.
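
    The following sketch illustrates the selective-loading idea in the abstract: only a fixed number of mapping-table segments stay resident in host memory, with LRU eviction. The segment size and the replacement policy here are illustrative assumptions, not MT-Cache's actual parameters:

    ```python
    from collections import OrderedDict

    class MappingTableCache:
        """Block-level mapping with only a subset of table segments
        resident in host memory (in the spirit of MT-Cache's selective
        loading); an evicted segment would be re-read from flash on a
        miss."""
        def __init__(self, resident_segments: int, entries_per_segment: int = 4096):
            self.capacity = resident_segments
            self.eps = entries_per_segment
            self.cache = OrderedDict()          # segment_id -> mapping entries

        def _load_segment(self, seg_id: int):
            # Placeholder for a flash read of one mapping-table segment.
            return [None] * self.eps

        def lookup(self, lba_block: int):
            seg_id, off = divmod(lba_block, self.eps)
            if seg_id in self.cache:
                self.cache.move_to_end(seg_id)  # LRU hit
            else:
                if len(self.cache) >= self.capacity:
                    self.cache.popitem(last=False)   # evict the coldest segment
                self.cache[seg_id] = self._load_segment(seg_id)
            return self.cache[seg_id][off]

    cache = MappingTableCache(resident_segments=32)
    cache.lookup(123456)
    ```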

    4.3 Internal parallelism exploitation

    The exposure of the internal parallelism of OCSSDs gives external software great opportunities to fully exploit the potential performance. Existing works (e.g., Wang P et al., 2014; Zhang JC et al., 2017) focus mainly on channel-level parallelism; they achieve performance goals such as high throughput and efficient resource utilization, and some of them (e.g., Wang HT et al., 2018) also leverage the independent channels to solve specific bottleneck problems such as strong performance isolation. There are three main challenges in exploiting the internal parallelism: (1) How to ensure that the semantic data grouping and data order are not shuffled? (2) How to dispatch data to ensure a wide parallel stripe and high utilization of PUs? (3) How to design a good data placement to better use the different levels of parallelism? To address these challenges, existing works take a range of notable approaches.

    1. Independent channel parallelism

    SDF exploits channel- and plane-level parallelism to match the shape of its large-write workloads. As shown in Fig. 6a, it treats each channel as an independent device (44 channels, from /dev/sda0 to /dev/sda43), and each channel has its own FTL engine ("SSD Ctrl" in the figure). SDF deploys multiple threads over different channels to achieve high concurrency.

    Fig. 6 Four instances of exploiting OCSSD's internal parallelism: (a) SDF; (b) FlashKV; (c) ParaFS; (d) AOCBLK. Chan: channel; Blk: block; D: dimension; PU: parallel unit; OCSSD: open-channel solid-state drive (References to color refer to the online version of this figure)

    LOCS (Wang P et al., 2014) uses SDF as a fast storage device to improve the performance of a user-space LSM-tree-based KV store. The advantage of using SDF is that it enables the extended LevelDB to schedule I/O operations across the 44 channels. To exploit SDF's channel-level parallelism, LOCS uses multiple immutable MemTables instead of one to hold more incoming data, which allows write requests to be issued concurrently. Additionally, each flash channel is assigned its own I/O request queue, which reduces the queuing latency of multiple requests and effectively leverages channel concurrency.

    Independent channel management facilitates the strong performance isolation discussed in Section 3.2.4; it also contributes to SSD virtualization, which can provide isolated virtual SSDs for tenants on dedicated channels (Huang et al., 2017). BlockFlex (Reidys et al., 2022) even enables dynamic storage harvesting with independent channels in modern cloud platforms to maximize storage utilization while ensuring performance isolation between virtual machines. However, this management style is friendlier to isolated objects; most of the time, we need to consider multi-channel co-parallelism according to the hotness/locality of the workloads.

    2. Coarse-grained striping

    Coarse-grained striping not only reduces address-mapping management overhead but also facilitates channel-level parallelism. For example, FlashKV implements its KV store with a parallel data layout that adopts super-block-level mapping to exploit the internal parallelism while keeping the metadata overhead low. As shown in Fig. 6b, a super-block has the same size as an SSTable, which makes the SSTables easy to manage and keeps the number of mapping entries small. Flash blocks within the same super-block come from different channels, so they can easily be accessed in parallel to fully leverage the channel-level parallelism.
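
    In the abstract, a super-block can be modeled as one mapping entry that names the same block index in every channel; the sketch below is a minimal illustration with a hypothetical channel count:

    ```python
    CHANNELS = 8  # hypothetical channel count

    def super_block(blk_index: int) -> list[tuple[int, int]]:
        """A super-block groups one flash block from every channel under
        a single mapping entry, so its members can be written and read
        in parallel.  Returns (channel, block) pairs."""
        return [(ch, blk_index) for ch in range(CHANNELS)]

    # One mapping entry now covers CHANNELS blocks instead of one page,
    # shrinking the mapping table while widening the write stripe.
    print(super_block(42))
    ```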

    However, LOCS may cause serious channel contention due to its coarse-grained striping method, and FlashKV's large SSTables may induce high tail latency for KV writes. KVSSDs (Wu et al., 2021) integrate LSM-trees and the FTL at the device side and introduce a fine-grained dynamic striping policy to exploit the internal parallelism of SSDs. However, KVSSDs have been evaluated only in a simulated environment and are hard to integrate into real SSDs. We suggest that KVSSDs integrate LSM-trees with a host-side FTL, which may benefit more from the exposed internal parallelism of OCSSDs.

    3. Channel-level parallelism with hotness grouping

    F2FS fails to fully exploit the internal parallelism of flash devices due to the semantic isolation caused by the built-in FTL. ParaFS absorbs the good points of F2FS but uses OCSSDs as the flash devices. ParaFS designs a 2D data allocation mechanism, with a channel-level dimension and a hotness-level dimension, to fully use the flash channels while keeping effective hot/cold data grouping (Fig. 6c). In the channel-level dimension, the write data are divided into pages, which are striped across different channels and can be persisted to flash memory in parallel. The allocation process then turns to the hotness-level dimension, which groups pages of different hotness (in the figure, the yellow pages and blue pages are classified into different groups) and assigns these groups to different allocator heads of the same hotness. This parallel mechanism lets ParaFS fully leverage the channel-level parallelism while keeping GC efficient.
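
    A minimal sketch of this two-dimensional idea, assuming a hypothetical hotness classifier, is shown below; a real system would classify hotness from update frequency and file semantics rather than from a toy predicate:

    ```python
    def allocate_2d(pages, n_channels: int, hotness_of) -> dict:
        """First stripe pages round-robin across channels (channel
        dimension), then group them by hotness within each channel
        (hotness dimension), loosely mimicking ParaFS's 2D allocation.
        `hotness_of` is an assumed classifier, e.g. based on update
        frequency."""
        layout = {ch: {} for ch in range(n_channels)}
        for i, page in enumerate(pages):
            ch = i % n_channels                      # channel-level striping
            group = hotness_of(page)                 # e.g. "hot" or "cold"
            layout[ch].setdefault(group, []).append(page)
        return layout

    pages = [f"p{i}" for i in range(8)]
    print(allocate_2d(pages, 4, lambda p: "hot" if int(p[1:]) % 2 else "cold"))
    ```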

    4. Fine-grained parallelism

    AOCBLK (Zhang XY et al., 2021) uses a fine-grained data placement to exploit PU-level parallelism. In its data layout, chunks in different PUs with the same index are managed as a band that is orthogonal to the PUs (Fig. 6d). Data are written sequentially to a band; each band has a header recording unique numbers for consistency checks and an epilog at the end storing metadata. AOCBLK further adopts a subpage-based (e.g., 16 KB) data interleaving placement policy to better exploit the internal parallelism within PUs. Compared with traditional page-based (e.g., 96 KB) data placement, this fine-grained policy can better leverage the advantages of parallelism.

    Although many works have tried to exploit the internal parallelism of SSDs, Qiu et al. (2021) concluded that multi-level interleaving and cache-mode pipelining were still not well exploited. They proposed an open-way NAND flash controller (oNFC) to support all four levels of parallelism, together with a dual-level command scheduler integrated into the NFC to enable fine-grained plane-level interleaving. Evaluation results show 93% of the theoretical bandwidth and average speedups of 1.9–3.1 times in page read and program latencies, respectively.

    4.4 I/O scheduling optimization

    Due to the black-box design, SSDs lack the semantic information of upper systems. Weak upper-level scheduling policies may lead to starved reads and busy erases, resulting in long tail latencies and performance decline. Although OCSSDs expose their physical geometry, designing rational I/O scheduling policies according to the I/O features is still a challenge.

    1. For overall throughput

    To improve the I/O throughput, LOCS designs four dispatching policies according to the I/O requests from LevelDB. When write requests dominate, the round-robin dispatching policy is selected, which assigns write requests evenly to all the channels to keep each channel load-balanced (Fig. 7a). The advantage of this policy is its simplicity; however, when read/erase requests are intensive (both must be directed to specific channels), this policy does not work well, and the least weighted-queue-length write dispatching policy is adopted instead (Fig. 7b). This policy maintains a weighted queue-length table and assigns different weights to read/write/erase requests to predict the processing latency of each channel. With the weighted queue length, the channels become more balanced, but the policy is not friendly to read requests. On top of these dispatching policies, LOCS further optimizes dispatching for write compaction and erase operations by delaying erase requests and scheduling them only when there are enough write requests (Fig. 7c). This optimization builds on the idea that the erase process is not on the critical path and that write requests can balance the queue lengths. Evaluation results show that the dynamic dispatching policy improves the overall throughput.
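
    The least weighted-queue-length policy can be sketched as follows; the per-operation weights here are placeholders, whereas LOCS derives them from measured read/write/erase latencies:

    ```python
    # Assumed per-operation weights (placeholders only).
    WEIGHTS = {"read": 1, "write": 8, "erase": 20}

    def queue_weight(queue) -> int:
        """Predicted processing latency of a channel queue: the sum of
        the weights of its pending operations."""
        return sum(WEIGHTS[op] for op, _ in queue)

    def dispatch_write(queues, req) -> int:
        """Append the write request to the channel whose queue has the
        smallest predicted latency."""
        target = min(range(len(queues)), key=lambda ch: queue_weight(queues[ch]))
        queues[target].append(("write", req))
        return target

    queues = [[("erase", None)], [("read", 0)], []]
    print(dispatch_write(queues, 0x2000))   # picks channel 2 (empty queue)
    ```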

    Fig. 7 I/O scheduling policies in LOCS and FlashKV: (a) round-robin dispatching; (b) least weighted-queue-length; (c) scheduling optimization for erase; (d) priority-based scheduling. I/O: input/output; Op: operation; Chan: channel; Pri: priority

    ParaFS adopts a parallelism-aware scheduling policy that handles I/O requests in two phases under heavy-write workloads. In the dispatching phase, as shown in Fig. 8a, read and write requests are assigned different weights (W_read and W_write are set to 1 and 8, respectively), and the channel with the smallest calculated weight is considered the least busy one and is selected for dispatching the write request. In the scheduling phase, time is sliced evenly between read and write/erase requests to ensure predictable read latency. Within the write/erase time slice, an algorithm schedules the write/erase requests. As shown in Fig. 8b, the algorithm considers the used space, the free space (parameter f), and the percentage of erase requests processed (parameter N_e). When the resulting value e is larger than 1, the scheduler sends write requests to the corresponding channel; otherwise, it sends erase requests. This balanced scheduling helps ParaFS achieve more consistent performance.

    Fig. 8 The two scheduling phases in ParaFS: (a) dispatching phase; (b) scheduling phase. Chan: channel

    2. For read throughput

    Read requests may be blocked by write requests, which causes severe read throughput degradation. FlashKV designs an adaptive scheduling mechanism that dispatches read/write requests in the compaction procedure using two compaction threads. Under write-intensive workloads, it uses full write parallelism to accelerate the compaction, while under read-intensive workloads, the compaction threads limit write requests to help improve the read throughput. To further optimize read operations, FlashKV separates read requests into clients' reads and compaction reads: clients' reads are typically small and discontinuous and are managed in pages, while compaction reads are typically large and localized and are managed in batches. Moreover, as shown in Fig. 7d, FlashKV designs a priority-based scheduler (a smaller number means a higher priority) that executes high-priority requests earlier according to the response time or the available capacity; e.g., read requests are executed before write requests, and foreground requests before background ones, which ensures better performance and user experience.
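
    A minimal sketch of such priority-based scheduling is given below, with an assumed priority order loosely following the description above (smaller numbers mean higher priority):

    ```python
    import heapq
    import itertools

    # Assumed priority order (an illustration, not FlashKV's exact table).
    PRIORITY = {"client_read": 0, "compaction_read": 1,
                "client_write": 2, "compaction_write": 3}

    class PriorityScheduler:
        """Pop requests in priority order; the counter keeps FIFO order
        among requests of equal priority."""
        def __init__(self):
            self.heap, self.seq = [], itertools.count()

        def submit(self, kind: str, req):
            heapq.heappush(self.heap, (PRIORITY[kind], next(self.seq), kind, req))

        def next_request(self):
            _, _, kind, req = heapq.heappop(self.heap)
            return kind, req

    s = PriorityScheduler()
    s.submit("compaction_write", "sst-7")
    s.submit("client_read", "key-42")
    print(s.next_request())   # -> ('client_read', 'key-42')
    ```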

    3. For latency

    Transactions are latency-sensitive, yet the slowest I/O request determines a transaction's latency. OCStore designs a transaction-aware scheduling policy with three strategies to achieve low transaction latency and stable performance. The first strategy balances the stripe width (the number of parallel channels) and the stripe length (the number of pages in one channel) to reduce latency jitter: the number of channels determines the parallelism, while the per-channel latency affects the overall stability, meaning that the larger the stripe width, the higher the parallelism but also the greater the performance jitter. The second strategy finds the lightest-loaded channel (using an estimated execution time) to dispatch the transaction requests and maintain balanced latency. The third strategy assigns the I/O requests in each channel equal time slices and sets a deadline for each request so that every request is processed in time, avoiding long tail latency. However, this last strategy may not be friendly to high-priority reads; although it achieves stable performance, the read performance is limited.

    Although the above-mentioned works provide rational I/O scheduling strategies, they still suffer from frequent synchronous I/O overheads under partial-page writes and poor hot/cold grouping of small data, which incur severe WA and performance degradation. Therefore, Lu et al. (2019a) proposed a flash-friendly data layout and designed StageFS to provide balanced I/O scheduling with content- and structure-separated data updating. StageFS introduces a staging phase to absorb synchronous writes and a patching phase to lazily perform data allocation and grouping. In the staging phase, small writes are appended using byte-addressable record units rather than page units to reduce WA. Evaluation results show that StageFS improves performance by up to 211.4% and achieves lower GC overhead under workloads with frequent synchronous I/O.

    4.5 Garbage collection optimization

    GC is an essential but expensive function of flash devices, since they do not support in-place updates and must erase a location before rewriting it. Due to the serious asymmetry of write and erase granularities, ill-considered small random writes and data assignments may cause inefficient GC operations, which incur severe I/O delays and valid-page-migration overhead, resulting in serious WA, performance jitter, and long tail latency. Therefore, changing the write granularity, reasonably using data locality, and reducing unnecessary erase operations are effective ways to optimize the GC mechanism.

    1. Data locality

    Writing data in a log-structured way can not only increase write throughput but also reduce the effects of random writes, achieving higher GC efficiency. To keep sequential writes from being interrupted by random writes, SSW proposes strictly sequential writing, which differentiates random requests from sequential requests by checking the metadata. SSW caches sequential writes and random writes in different buffers and then dispatches them to different channels for concurrency while keeping them separate (as shown in Fig. 9a, pages 0–11 are identified as sequential writes and 13–16 as random writes). From the figure, we can see that SSW tends to maintain data locality within the same flash block, which benefits GC and reduces valid page migration to improve the flash lifetime; however, the channel-level parallelism is not well exploited and the data stripe is limited.
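
    The separation step can be sketched as follows; the run-length rule used here to detect sequential writes is a simplification of SSW's metadata-based check:

    ```python
    def classify_writes(requests, seq_threshold: int = 4):
        """Split a request stream (LBAs) into sequential and random
        buffers.  A run of consecutive LBAs at least `seq_threshold`
        long counts as sequential; the threshold is an assumption."""
        seq_buf, rnd_buf, run = [], [], [requests[0]]
        for lba in requests[1:]:
            if lba == run[-1] + 1:
                run.append(lba)
                continue
            (seq_buf if len(run) >= seq_threshold else rnd_buf).extend(run)
            run = [lba]
        (seq_buf if len(run) >= seq_threshold else rnd_buf).extend(run)
        return seq_buf, rnd_buf

    print(classify_writes([0, 1, 2, 3, 4, 90, 7, 8]))
    # -> ([0, 1, 2, 3, 4], [90, 7, 8])
    ```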

    Fig. 9 Data placement strategies in SSW (a) and AMF (b). Chan: channel

    Different from SSW, AMF introduces a block I/O abstraction that dispatches data to every channel for high parallelism (Fig. 9b). It can achieve high stripe bandwidth but incurs inefficient GC operations. To fix this, AMF increases the number of inode-map segments so that inode-map blocks accumulate more invalid data before being selected as victims; meanwhile, AMF separates hot-data blocks from cold-data segments by identifying data features to improve the GC efficiency.

    2. Hotness grouping

    Separating data according to their access/update frequencies places data with the same hotness in the same group, which brings two advantages. First, data with the same hotness can be striped across different channels in parallel and accessed concurrently (Fig. 6c). Second, we can estimate data lifetime from hotness: data with the same hotness tend to have the same life span and can be assigned to the same flash block to reduce the migration of valid pages during GC processing. This hot/cold data grouping can significantly affect GC performance.

    Take the Ext4 journaling file system as an example: journal data and file data usually have different hotness. In most cases, the journal can be deleted after the corresponding file data are written, so recording them in the same block increases the GC overhead. However, SSDs can hardly separate journals from file data. Thus, Son and Ahn (2021) proposed a host-side FTL that distinguishes the two data types and reduces the GC overhead by writing journal pages to separate blocks. However, this scheme is too simple to identify the hotness of file data and yields only a limited GC improvement.

    In recent years, multi-stream SSDs (Bhimani et al., 2017) have been developed to write data with the same or similar lifetimes to the same erase unit according to stream IDs, which can greatly improve the GC efficiency and reduce WA. However, being unaware of host data features, multi-stream SSDs must rely on lifetime-based stream IDs identified and generated by the host side. Multi-stream SSDs are more of a compromise between SSDs and OCSSDs; they are not as efficient as OCSSDs, because an OCSSD FTL can directly identify the data lifetime and group data into channels for parallel writing.

    3. Write granularity

    ParaFS also designs a large write segment whose size and address are aligned to the flash block for both memory allocation and GC. This aligned design avoids valid page migration from victim blocks, reducing WA and GC overhead. SDF likewise adjusts its write granularity to suit its large-write workloads: it enlarges the write granularity to several times the erase size and aligns the write units with the erase blocks. The benefit is obvious: during GC, no block can contain both valid and invalid data pages, so no valid pages need to be migrated and the WA is eliminated. However, this strategy is not friendly to small writes.

    4. Coordinated GC mechanism

    F2FS uses a foreground/background coordinated GC mechanism to improve its overall GC efficiency. The foreground GC adopts the greedy policy, which selects the section with the fewest valid blocks to minimize latency, and the background GC adopts the cost-benefit policy, which considers both the number of valid blocks in a section and the section's age. The foreground GC is triggered when there is not enough free space, while the background GC is woken up periodically by the kernel. ParaFS adopts a similar mechanism at both the file system level and the FTL level to further optimize GC efficiency. It employs several greedy foreground threads (one thread per channel) to recycle blocks quickly under write-intensive workloads and assigns a manager thread to wake up these GC threads as necessary. The cost-benefit background thread is triggered when the file system is idle. Both the foreground and background GCs in the file system migrate only the valid pages and mark the victim blocks as erasable; the GC in the FTL then performs the erases according to the trim commands from the file system, avoiding additional copies. This coordinated GC improves ParaFS's GC efficiency and also reduces performance jitter.
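
    The two victim-selection policies can be sketched as below. The cost-benefit score uses the classic LFS-style formula, age × (1 − u) / (2u) with utilization u, which F2FS also adopts; the section sizes and ages are made-up examples:

    ```python
    def greedy_victim(sections):
        """Foreground policy: pick the section with the fewest valid
        blocks to minimize migration time."""
        return min(sections, key=lambda s: s["valid"])

    def cost_benefit_victim(sections):
        """Background policy: balance reclaimed space against migration
        cost, weighting in the section's age."""
        def score(s):
            u = s["valid"] / s["total"]     # utilization of the section
            if u == 0:
                return float("inf")         # free section: reclaim immediately
            return s["age"] * (1 - u) / (2 * u)
        return max(sections, key=score)

    sections = [{"id": 0, "valid": 50, "total": 64, "age": 10},
                {"id": 1, "valid": 20, "total": 64, "age": 3},
                {"id": 2, "valid": 30, "total": 64, "age": 90}]
    print(greedy_victim(sections)["id"], cost_benefit_victim(sections)["id"])
    # -> 1 2 (greedy favors few valid blocks; cost-benefit favors old,
    #         moderately utilized sections)
    ```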

    5. GC-free optimization

    A well-designed fine-grained GC mechanism on OCSSDs can bring significant latency benefits. TTFLASH (Yan et al., 2017) introduces four key strategies to remove GC blocking at all software levels and achieve low tail latency. First, TTFLASH adopts plane-blocking GC (Fig. 10b) rather than channel-blocking GC (Fig. 10a), so that GC affects only the involved planes via an intra-plane copyback mechanism while the channel continues to serve I/O for the other, non-GC planes. Second, leveraging the idea of a redundant array of independent NAND (RAIN), TTFLASH combines parity-based redundancy with GC operations in a technique called GC-tolerant read (GTR), which proactively regenerates the content of a blocked page by reading the parity from another plane instead of waiting for GC completion. For example, in Fig. 10c, if page 2 cannot be fetched during GC, its content can be regenerated by XORing pages 0, 1, and the parity page, which takes several orders of magnitude less time than waiting for the GC to complete. In addition, TTFLASH designs a rotating GC strategy that enforces that, in each plane group, only one plane performs GC at a time (the planes with pages 2/4/8 in Fig. 10c), which yields zero GC blocking overhead at plane-level parallelism. Finally, TTFLASH introduces a GC-tolerant flush (GTF) strategy that uses capacitor-backed random access memory (RAM) to quickly absorb writes and later flush them to flash pages at a proper, GC-tolerant time. GTR and rotating GC ensure that page eviction is not blocked by GC, which allows GTF to flush N−1 of every N pages per stripe directly to flash media. These strategies bring a nearly GC-free benefit and significantly reduce the average and P99 latencies. However, TTFLASH needs the support of an intra-SSD copy-page command and dedicates one channel to parity.
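
    The parity-based regeneration at the heart of GTR reduces to an XOR over the surviving stripe members; below is a minimal sketch with toy four-byte pages:

    ```python
    def xor_pages(pages):
        """Bytewise XOR of equal-sized pages."""
        out = bytearray(len(pages[0]))
        for p in pages:
            for i, b in enumerate(p):
                out[i] ^= b
        return bytes(out)

    # A stripe of three data pages plus one parity page across planes.
    d0, d1, d2 = b"\x11" * 4, b"\x22" * 4, b"\x44" * 4
    parity = xor_pages([d0, d1, d2])

    # If d2's plane is busy with GC, its content is regenerated from the
    # surviving pages plus parity instead of waiting for GC to finish.
    rebuilt = xor_pages([d0, d1, parity])
    assert rebuilt == d2
    print(rebuilt.hex())
    ```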

    Fig. 10 Key strategies in TTFLASH: (a) channel-blocking GC; (b) plane-blocking GC; (c) GC-tolerant read/flush & rotating GC. GC: garbage collection; Chan: channel
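
    The parity regeneration behind GTR reduces to a simple XOR, as the following Python sketch illustrates (a toy stripe layout with hypothetical names, not TTFLASH code): when one page of a RAIN stripe is blocked by GC, its content is rebuilt from the surviving pages and the parity:

        def xor_pages(pages):
            """Bytewise XOR of equally sized page buffers."""
            out = bytearray(len(pages[0]))
            for page in pages:
                for i, b in enumerate(page):
                    out[i] ^= b
            return bytes(out)

        # A 3-data + 1-parity RAIN stripe across planes (toy 4-byte "pages").
        p0, p1, p2 = b"\x11\x11\x11\x11", b"\x22\x22\x22\x22", b"\x33\x33\x33\x33"
        parity = xor_pages([p0, p1, p2])

        # The plane holding p2 is busy with GC: rebuild p2 instead of waiting.
        rebuilt = xor_pages([p0, p1, parity])
        assert rebuilt == p2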

    6. GC copyback scheme

    A host-side GC scheme brings many benefits while reducing device overloads. However, it may be triggered frequently under heavy write workloads and can severely consume host dynamic RAM (DRAM) bandwidth and space (Fig. 11a). To reduce this consumption, AOCBLK adopts a GC copyback scheme that deploys the GC buffer inside the OCSSD device and exposes its address to the host side. With two dedicated copyback read/write commands, the host side only issues the commands, which transfer no data and thus consume nearly zero bandwidth; it is the OCSSD device that copies the valid data from flash memory to the GC buffer and writes it back from the GC buffer to flash memory (Fig. 11b). Meanwhile, AOCBLK adopts a quota-based GC policy to coordinate the host and GC write speeds. In this way, AOCBLK significantly reduces the impact of GC on host I/O performance. However, the in-device GC requires expensive device resources and may place a heavy burden on the device.

    Fig. 11 The GC copyback scheme of AOCBLK: (a) GC without copyback; (b) GC with copyback. GC: garbage collection; FTL: flash translation layer; CMD: command; OCSSD: open-channel solid-state drive
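
    A quota-based GC policy of this kind can be sketched as a simple token budget that couples the host write speed to GC progress (an illustrative Python sketch with assumed parameters; AOCBLK's actual policy may differ):

        class QuotaGC:
            """Couple host writes to GC progress with a page-quota budget."""
            def __init__(self, quota_pages=64):
                self.quota = quota_pages        # host pages allowed per GC round
                self.remaining = quota_pages

            def admit_host_write(self):
                """Admit a host page write only while quota remains."""
                if self.remaining == 0:
                    return False                # throttle host until GC catches up
                self.remaining -= 1
                return True

            def on_gc_progress(self, freed_pages):
                """Each page freed by device-side GC replenishes the host quota."""
                self.remaining = min(self.quota, self.remaining + freed_pages)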

    7. Postponed GC strategy

    GC in 3D NAND flash differs from the 2D case in that it must pay more attention to thermal issues while remaining efficient. ThermAlloc (Wang Y et al., 2019) proposes a postponed GC strategy to achieve low heat production and high efficiency. It maps a logical block onto three types of physical blocks (PBlk, RBlk, and BBlk) placed sufficiently far apart, so that GC can run in different physical areas in parallel and the heat is scattered across different chips. It also designs an algorithm that postpones GC, triggering it only when the RBlk runs out of space, and a temperature-aware strategy to decide where to place write requests.
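
    The postponement and placement logic can be summarized by the following Python sketch (hypothetical per-chip state; ThermAlloc's actual algorithm is more elaborate): GC is deferred until the RBlks are exhausted, and writes are steered toward cooler chips:

        # Assumed per-chip state: free replacement blocks (RBlks) and temperature.
        chips = [{"id": 0, "free_rblks": 3, "temp": 41.0},
                 {"id": 1, "free_rblks": 0, "temp": 55.5},
                 {"id": 2, "free_rblks": 5, "temp": 38.2}]

        def place_write(chips):
            """Temperature-aware placement: prefer the coolest chip with free RBlks."""
            candidates = [c for c in chips if c["free_rblks"] > 0]
            return min(candidates, key=lambda c: c["temp"])

        def needs_gc(chip):
            """Postponed GC: trigger only when a chip's RBlks are exhausted."""
            return chip["free_rblks"] == 0

        print(place_write(chips)["id"])   # -> 2: coolest chip that still has RBlks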

    The works above describe five main methodologies to improve the performance of an individual OCSSD; the main optimization goals they achieve are summarized in Table 4. In addition, it is meaningful and challenging to coordinately optimize a flash disk array built from OCSSDs.

    Table 4 Main goals achieved by various optimization methods

    Existing works on traditional SSD-based RAID (redundant array of independent disks) systems, such as FusionRAID (Jiang et al., 2021), have made many optimizations for flash arrays and achieved high consistency and low latency on commodity SSD arrays. However, due to FTL isolation, they cannot fully exploit the advantages of flash disks and may suffer unstable throughput and latency under heavy workloads. SOFA (Chiueh et al., 2014) proposes a log-structured flash array architecture that lifts the FTL above the disk array management logic layer to reduce redundant software overhead; it also leaves a set of on-disk controllers to work with the host-side FTL (Fig. 12), so that all raw flash disks can be scheduled by the same FTL. Under the management of this global FTL, SOFA can reasonably schedule all flash resources to prevent some disks from being worn out earlier than others, thereby achieving global WL and load balance. The global FTL also allows SOFA to perform global GC and to aggregate small random writes while supporting I/O aggregation and distributed logging. All these optimizations significantly improve the performance and resource utilization of flash arrays.

    Fig. 12 FTL abstraction in SOFA. FTL: flash translation layer; OCSSD: open-channel solid-state drive
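
    Under such a global FTL, wear leveling can choose allocation targets across all disks rather than within a single device. A minimal Python sketch (assumed fields, not SOFA code) picks the least-worn disk that still has free blocks:

        def pick_disk(disks):
            """Global WL: allocate on the least-worn disk that still has free blocks."""
            usable = [d for d in disks if d["free_blocks"] > 0]
            return min(usable, key=lambda d: d["avg_erase_count"])

        disks = [{"id": 0, "avg_erase_count": 1200, "free_blocks": 80},
                 {"id": 1, "avg_erase_count": 950,  "free_blocks": 10},
                 {"id": 2, "avg_erase_count": 980,  "free_blocks": 0}]
        print(pick_disk(disks)["id"])     # -> 1: least worn among disks with space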

    IODA (Li et al., 2021) leverages the "PLM (predictable latency mode) windows" notion (NVM Express, Inc., 2023), which alternates a flash device between predictable and busy windows, and carries this concept into the design of flash arrays. As shown in Fig. 13, time is divided into equal time windows (TWs). In a predictable TW, IODA guarantees that the device performs only reads or writes and that no GC is triggered, ensuring strongly predictable performance. In a busy TW, the device may perform GC to reclaim enough space for later writes. IODA alternates the busy/predictable TWs and guarantees that at most one device is in the busy state executing GC, while the others still provide strongly predictable performance. However, there is a strict constraint that GC must be controllable, which requires OCSSDs to guarantee deterministic GC in order to realize predictable performance in real flash array scenarios.

    Fig. 13 Time window mechanism in IODA. OCSSD: open-channel solid-state drive
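
    The rotating schedule can be sketched as follows (an illustrative Python sketch under assumed parameters): with N devices and time split into equal TWs, device i may run GC only in windows where window_index mod N equals i, so at most one device is busy at a time:

        def busy_device(window_index, num_devices):
            """The single device allowed to run GC in this time window."""
            return window_index % num_devices

        def is_predictable(device_id, window_index, num_devices):
            """A device serves I/O with no GC whenever it is not the busy one."""
            return busy_device(window_index, num_devices) != device_id

        # With 4 devices, exactly one device is busy per window:
        for w in range(4):
            print(f"TW {w}: device {busy_device(w, 4)} busy, others predictable")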

    In practice, the above methods are often combined, since no individual optimization can fully exert the performance advantages of OCSSDs; researchers often adopt a combination of multiple methods to maximize the benefits. For example, LOCS combines channel-level parallelism with I/O scheduling policies to fully leverage OCSSD features, increasing throughput by more than four times. On top of exploiting internal parallelism, ParaFS adopts a hot/cold data grouping layout, a parallelism-aware scheduling policy, and a foreground/background coordinated GC mechanism to achieve high throughput. Optimization methods should therefore be chosen and coordinated rationally, according to the system and workload features of each domain-specific scenario, to fully leverage OCSSDs' advantages.

    5 Challenges and future directions

    5.1 Challenges

    OCSSDs have great advantages over traditional SSDs in both performance and customizability owing to their openness. However, moving the built-in FTL up to the host side and exposing the physical layout also bring challenges that limit the further research and application of OCSSDs.

    1. Complexity and lack of versatility of the FTL design

    Although a host-side FTL can bring many advantages, such as providing rich functions, co-designing with host applications, and achieving higher performance, three big challenges remain. First, designing a host-side FTL is complicated; to replace the already robust device-side FTL, developers must consider the design of the I/O interface, the behaviors of applications, and the physical features of the raw flash memory, which demands strong design expertise. Second, a host-side FTL incurs high design, maintenance, and management costs, and may not be as stable as the device-side FTL. Third, a host-side FTL is customized for specific host-side software rather than designed for generic purposes, which makes it hard to generalize to other software in practice. As far as we know, there is no universal host-side FTL in real business use.

    2. Lack of standards

    There have been two versions of standards for OCSSDs, but both have failed. To date, OCSSDs still have no universal standard in the storage industry (Picoli et al., 2020). Moreover, the open-channel specification in the context of LightNVM defines only the most universal part of OCSSDs; the other parts, which may involve sensitive content that is inconvenient to make public, are implemented by individual manufacturers and vendors. These fragmented implementations introduce divergent methods, making it difficult to unify the standards. The lack of standards severely hinders the development of OCSSDs and is an urgent challenge to be addressed in the future.

    3. Flexibility, security, and universality

    Yang LH et al. (2019) argued that SSDs' internal physical layout should not be opened to external users, since doing so can lead to insecure access and hurt availability and flexibility. OCSSDs are usually co-designed with specific external applications to achieve higher performance, so it is difficult to apply them directly to other applications. Moreover, changes in the design of the upper-layer software may force co-changes in the OCSSDs. All these give OCSSDs low flexibility and extensibility. On the other hand, for academic research, the openness of OCSSDs is conducive to innovation and technology breakthroughs, but for engineering, this openness may bring more design and maintenance risks, and a small mistake by developers may cause significant security problems. Besides, owing to the high complexity, low flexibility, and expense of the software development involved, OCSSDs are studied mainly in academia and in enterprise-grade large-scale systems, mostly in domain-specific scenarios. Although LightNVM provides generic OCSSD support in the Linux kernel and liblightnvm (González et al., 2016) provides a user-space library to facilitate developers' use of LightNVM, considerable effort is still needed to make OCSSDs as universal in practice as they are in research.

    5.2 Future development

    In recent years, with the rise and development of NVMe, the complex address transformations in OCSSDs can be avoided by zoned namespaces (ZNS) (Han et al., 2021), which expose LBAs instead of PPAs while still exploiting the SSDs' internal parallelism well. NVMe SSDs achieve better performance, such as lower latency and higher I/O throughput, and multi-stream SSDs significantly improve GC efficiency and reduce WA. Although their performance still falls short of OCSSDs', they enjoy great application prospects thanks to their standardized, semi-open design, which provides the functional interfaces that need to be opened to external software while hiding the access that would degrade security and increase complexity. NVMe SSDs have already replaced OCSSDs as the underlying storage devices in many specific scenarios.

    Despite the many huge challenges, the open nature of OCSSDs gives them a promising future and has attracted many scholars and developers to dedicate their work to them. OCSSDs allow upper-level systems and applications to customize functions according to their own needs, and users in different scenarios have more choices to leverage OCSSD features for optimal performance and specific goals. On the other hand, the continuous improvement of generic system support such as LightNVM may enable the adoption of OCSSDs in more application scenarios. Developers should choose the most appropriate devices according to their specific needs.

    OCSSDs should learn from NVMe SSDs and form their own standards and norms to better promote their research and application. Future research efforts, especially in domain-specific areas, need to take all factors, including standards, universality, products, and performance, into consideration, which can further promote the development of OCSSDs.

    6 Conclusions

    By moving the FTL to the host side and co-designing it with host-side software, OCSSDs allow SSDs' inherent characteristics and functionalities to be fully controlled, thereby achieving better performance such as higher throughput, lower latency, and better resource utilization. These advantages give OCSSDs more opportunities in many domain-specific scenarios. Existing studies offer many insightful ideas, such as rich FTL co-design at different software layers, full exploitation of the different levels of internal parallelism, rational I/O scheduling policies to improve performance, and efficient GC mechanisms to reduce performance overhead and long tail latency. These studies make OCSSDs far superior to conventional SSDs in many QoS- and latency-sensitive scenarios and provide a positive reference and guidance for future research. However, some major challenges remain unresolved, and some research issues are still open for further investigation.

    Contributors

    Junchao CHEN designed the research. Junchao CHEN and Junyu WEI drafted the paper. Guangyan ZHANG helped organize the paper. Junchao CHEN and Guangyan ZHANG revised and finalized the paper.

    Compliance with ethics guidelines

    Guangyan ZHANG is a guest editor of this special feature, and he was not involved with the peer review process of this manuscript. Junchao CHEN, Guangyan ZHANG, and Junyu WEI declare that they have no conflict of interest.
