
    Dual-Port Content Addressable Memory for Cache Memory Applications

    2022-03-14 09:23:02 Allam Abumwais, Adil Amirjanov, Kaan Uyar and Mujahed Eleyat
    Computers, Materials & Continua, March 2022 issue

    Allam Abumwais, Adil Amirjanov, Kaan Uyar and Mujahed Eleyat

    1 Department of Computer Engineering, Near East University, Nicosia, N. Cyprus via Mersin-10, Turkey

    2 Computer Systems Engineering, Arab American University, Jenin, 240, Palestine

    Abstract: Multicore systems often use multiple levels of cache to bridge the gap between processor and memory speed. This paper presents a new design of a dedicated pipeline cache memory for multicore processors called dual-port content addressable memory (DPCAM). In addition, it proposes a new hardware-based replacement algorithm, called the near-far access replacement algorithm (NFRA), to reduce the cost overhead of the cache controller and improve the cache access latency. The experimental results indicate that the latencies of write and read operations are significantly lower than those of a set-associative cache memory. Moreover, the latency of a read operation is nearly constant regardless of the size of the DPCAM. However, an estimation of the power dissipation showed that the DPCAM consumes about 7% more power than a set-associative cache memory of the same size. These results encourage embedding the DPCAM within multicore processors as a small shared cache memory.

    Keywords: Multicore system; content addressable memory; dual port CAM; cache controller; set-associative cache; power dissipation

    1 Introduction

    A microprocessor that contains multiple cores (processors) in a single integrated circuit (IC) is called a multicore processor [1]. Manycore processors are multicore processors designed for a high degree of parallel processing, containing numerous processor cores. In multicore and manycore processor systems, shared memory plays the key role in providing efficient communication between the cores. When multiple cores try to access the same shared memory at the same time, hazards may occur. There are many studies in the literature on improving the performance of shared memory. Reducing access latency and power consumption are the main directions for improving the efficiency of shared memory. Improvements in cache architecture and cache replacement algorithms are two options to pursue in this direction.

    Most multi/many-core systems use an associative memory (AM) cache as a shared memory [1,2]. Enhanced cache architectures aim to enable parallel search and rapid access [1,2]. On the other hand, replacement algorithms help the cache controller choose which data to discard to make room for new data [3,4]; an efficient replacement algorithm also improves the cache access latency. Content addressable memory (CAM) is similar to direct-mapped AM: it refers to memory whose locations are accessed by comparing tags (part of the contents) rather than by providing their addresses [2], and it has features that make it suitable as a shared memory [2,5,6].
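The difference between address-based and content-based access can be sketched in a few lines of Python. This is a conceptual illustration only: the names and data are invented, and real CAM hardware performs all tag comparisons in parallel rather than scanning sequentially.

```python
# Hypothetical sketch: content-addressable lookup vs. address-based lookup.
# In a RAM, a location is selected by its address; in a CAM, every stored
# tag is compared with the search key and the matching line's data is
# returned.

def ram_read(memory, address):
    """Address-based access: index directly into the array."""
    return memory[address]

def cam_read(lines, search_tag):
    """Content-based access: compare the key against every stored tag.
    Hardware does all comparisons in parallel; software models it as a scan."""
    for tag, data in lines:
        if tag == search_tag:
            return data
    return None  # miss

ram = ["a", "b", "c"]
cam = [(0x1A, "a"), (0x2B, "b"), (0x3C, "c")]
assert ram_read(ram, 1) == "b"
assert cam_read(cam, 0x2B) == "b"
assert cam_read(cam, 0x99) is None
```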

    This work presents a special-purpose shared memory architecture based on content addressable memory, together with a replacement algorithm. The main purpose of this work is to allow simultaneous access to the cache memory by multicore processors, achieving better access latency for various CAM cache sizes compared to a set-associative cache.

    The rest of the paper is organized as follows. Section 2 briefly reviews the literature related to CAM and shared caches in multi-core systems. Section 3 presents the architecture of the proposed dual-port CAM (DPCAM) and the near-far access replacement algorithm (NFRA). Section 4 describes the implementation of the DPCAM on a Field Programmable Gate Array (FPGA), discusses the functional and timing simulation, and presents the power estimation analysis. Section 5 presents the conclusion.

    2 Related Works

    In multicore processors there are multiple levels of cache, and most of them are set-associative [1]. The shared level is usually shared across the cores and is placed on the system chip [1,4]. Various types of AM and CAM have been designed and implemented on FPGAs for special-purpose applications. However, some of these works [7-11] suffer from limited memory efficiency due to limited size, update latency, power cost, and low density. Thus, they cannot be used as the shared cache level in a modern multi-core system [12]. Two approaches have been pursued to improve memory efficiency: architectural design [8,13-15] and efficient cache replacement algorithms [16-20].

    Resistive configurable associative memory (ReCAM) has been used to address some of these issues. ReCAM improves the access latency by limiting the load/store fatigue at the beginning of execution [13]. ReCAM uses a Hamming-distance algorithm that searches for the nearest cell for reading and writing, and it exploits this feature to design a memory with better efficiency in both performance and power consumption. In the ReCAM architecture, the processing element has two kinds of execution units: the first is composed of ReCAM arrays whose memories are connected using a crossbar; the second is a traditional core. The main disadvantage of this design is that it was implemented on a single-processor system and does not suit multi-core systems, whereas the DPCAM is intended for multi/many-core processors.

    An AM architecture using the Virtex-6 FPGA series was presented that was designed to work as a look-up table inside the cache controller, with a size of 1 KiB and a block size of 16 bytes [14]. The simulation results show that the cache controller's setup latency is 1.66 ns and the total power consumed is 5.53 mW. The main disadvantage of this design is that it is difficult to scale to a bigger size; therefore, it cannot be used as a shared memory.

    The design called gate-based area-efficient ternary content addressable memory (G-AETCAM) uses flip-flops as FPGA storage elements and can be configured as binary or ternary CAM, where gate levels reduce the resources used on the FPGA [6]. The design has been implemented in different sizes for the Virtex-5, -6, and -7 FPGA series. Its performance is increased by 28 percent compared to other FPGA-based ternary content-addressable memories (TCAMs), and it offers better scalability than other TCAMs due to the lower complexity of its architecture. In [15], the authors presented a design, called Zi-CAM, that is efficient in FPGA resources and power consumption. It has lower complexity and power consumption than traditional RAM-based CAM designs on FPGAs. Its internal structure consists of two main units, a RAM unit and a lookup-table unit, and each unit is activated according to the sequence of data. The design has been implemented on a Virtex-6 FPGA. The presented results showed that Zi-CAM improved FPGA resource cost, power consumption, and update latency compared to common FPGA-based CAMs; its main benefit is that its update latency is nearly constant across different sizes. These two designs have attractive features for networking applications of limited size, especially routing tables. However, they are difficult to implement as shared memory in a multi/many-core system.

    In [8], the authors presented a logic-based high-performance BiCAM architecture (LH-CAM) on a Xilinx FPGA. Multiple data words may be written simultaneously if enough I/Os are available on the FPGA device, thereby improving write latency. It also provides faster updating algorithms, but the complexity increases linearly with the CAM depth, and hence the access latency also increases linearly.

    The second approach to improving memory efficiency is to use efficient cache replacement algorithms. Least Recently Used (LRU), random, round-robin, and modified LRU are commonly used. Many other advanced strategies have been proposed, most of which are based on LRU, to address miss rate and access latency issues; however, they are designed for general-purpose applications rather than multi-core processors [16]. On the other hand, few studies have evaluated multi-core system performance with these types of replacement algorithms. In [17], the authors presented a Least Error Rate (LER) replacement algorithm for a shared L2 cache that minimizes the write error rate. LER modifies the storage algorithm so that incoming data is placed in the cache line with the minimum write error rate; to accomplish this, LER compares the incoming data with the contents of the set's lines simultaneously. The experimental results for this algorithm were compared to LRU and show a 1.4% improvement in miss rate and 3.6% less overhead. A Random First Flash Enlargement (RFFE) scheme has also been proposed [18]. It enhances overwriting in shared L2 caches when a cache line must be replaced, based on a random Gaussian-coding scheme; however, this replacement algorithm increases the complexity of the cache controller.

    In [19], a new update algorithm was proposed that focuses on high-speed intelligent updating for a RAM-based TCAM, since updating is the main factor affecting power, performance, and scalability in TCAM. The design was successfully implemented on the Virtex-6 FPGA series. The results show the functional simulation of the design, where the authors demonstrate that it requires less latency for updating the blocks. The authors of [20] designed another updating technique based on RAM-based TCAM, which automatically adds incoming data and deletes old data whenever the TCAM becomes full. The main disadvantage of these two algorithms is their complexity, especially as the memory size increases, and hence they consume a large amount of power.

    Moreover, it should be emphasized that all of the previous replacement algorithms add new overhead for accessing data and increase the non-computational time needed to update a location, because they do not exploit the cache hardware architecture.

    The main function of a cache replacement algorithm is determining the effective response of the cache. Although the goal of a replacement algorithm is to erase the block that will not be accessed in the near future, some of the erased blocks will in fact be accessed in the far future while executing instructions. The data written to the shared cache level can be divided into two classes: near-access data, which may be used by upcoming instructions, and far-access data, which will be accessed a relatively long time after being written. Far-access data stored in a line can be overwritten before being used. In most cases the controller can handle a far-access operation such that the interested core(s) read the data from a lower level and write it into their private caches [1,14].
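The near/far distinction above can be illustrated by classifying written values by how soon they are read again. The sketch below is a rough illustration under an assumed reuse-distance heuristic; the threshold, function names, and data are invented and are not the paper's compiler algorithm.

```python
# Illustrative classification (assumed heuristic): split writes into
# near-access and far-access by the number of instructions until the
# value is next read (its reuse distance).

NEAR_THRESHOLD = 8   # arbitrary cut-off for the sketch

def classify(writes, reads):
    """writes/reads: {tag: instruction index}.  Returns tag -> 'near'/'far'."""
    out = {}
    for tag, w in writes.items():
        r = reads.get(tag)
        if r is None or r - w > NEAR_THRESHOLD:
            out[tag] = "far"   # read much later, or never read
        else:
            out[tag] = "near"  # reused by a nearby instruction
    return out

labels = classify({"a": 0, "b": 1, "c": 2}, {"a": 3, "b": 50})
assert labels == {"a": "near", "b": "far", "c": "far"}
```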

    The goal of this work is to improve the cache access latency by designing a standalone memory that can be used in multi-core systems as a shared pipeline cache. In pipelined processors a dual-port memory is usually used to avoid stalling during simultaneous accesses [21,22]. The proposed design is based on a dual-port CAM (DPCAM), which provides simultaneous write and search operations within the CAM when more than one core tries to access the memory. In addition, the NFRA is proposed and implemented as a hardware component inside the DPCAM to reduce the cost overhead and the complexity of the cache controller.

    3 Proposed DPCAM

    The proposed DPCAM works as a standalone pipelined shared cache in which one port is used for writing and the other for reading. The DPCAM comprises two ports (Ds31-Ds0 for writing and Dd31-Dd0 for reading), a Tag Field, a Data Field, a Control Unit, and comparators (CMP), as shown in Fig. 1.

    Figure 1: DPCAM design

    In the store back (SB) stage, the core provides the source data [Ds31-Ds0] and the source tag [Ts15-Ts0] to be written to a selected cache line. In the operand fetch (OF) stage, the core provides the destination tag [Td15-Td0] to be compared with all cache lines simultaneously, and the stored data from the Data Field is read onto the destination data bus [Dd31-Dd0]. These two ports work concurrently.

    Each cache line (L) is composed of two fields, the Data Field and the Tag Field, and is associated with a simple 2x1 comparator (CMP). The Data Field contains the shared data to be stored, while the Tag Field contains a unique tag (a part of the data plus a version number) for each Data Field. The length of each field depends on the architecture in which the CAM is used. The tag field can be sized to suit the number of unique shared data versions; e.g., a 32-bit tag can accommodate up to 4 Giga versions of data. The CMP is needed for read operations: it compares the tag coming from the OF stage [Td15-Td0] with the tags [Ts15-Ts0] stored in the cache lines.
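The tag structure and the per-line comparators can be modeled as follows. This is a minimal software sketch under an assumed tag encoding (the exact packing of data part and version number is not specified in the paper); the hardware performs all comparisons in parallel.

```python
# Minimal sketch (assumed encoding): each DPCAM line stores a tag that
# combines part of the data with a version number, and a per-line
# comparator asserts its output-enable (OE) when the incoming tag matches.

TAG_BITS = 16  # [Ts15-Ts0] per the paper

def make_tag(data_part, version, version_bits=4):
    """Pack a data fragment and a version number into one tag word."""
    return ((data_part << version_bits) | version) & ((1 << TAG_BITS) - 1)

def match_lines(stored_tags, incoming_tag):
    """Model the per-line CMPs: one boolean OE per cache line,
    all comparisons conceptually happening in parallel."""
    return [tag == incoming_tag for tag in stored_tags]

tags = [make_tag(0xAB, 0), make_tag(0xAB, 1), make_tag(0xCD, 0)]
oe = match_lines(tags, make_tag(0xAB, 1))
assert oe == [False, True, False]   # only one line's OE goes high
```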

    Another main component of the DPCAM architecture is the Control Unit. It has two functions: controlling the write operation and implementing the replacement algorithm. The Control Unit includes a pointer that produces an active-high latch enable (LE) signal for each memory line on a rotating basis. Fig. 2 depicts the architecture of the Control Unit. The control circuit selects the location where data will be stored; locations are selected sequentially for writing, with a simple overwriting technique that updates the contents and erases the old ones. A set of D flip-flops (D-FF), equal in number to the number of locations, is used, and the output of each D-FF points to the corresponding DPCAM location. When the system is reset, the pointer points to LE0, the first memory location, so that the first write operation is performed on line 0 of the memory. After writing to the current location, the pointer moves to the next location, and so on until location n-1 (Ln-1).
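The rotating-pointer behavior described above can be modeled in a few lines. This is a software sketch of the one-hot LE chain, not the actual D-FF circuit; the class and method names are invented for the illustration.

```python
# Behavioral sketch of the Control Unit's rotating pointer: a one-hot
# chain of D flip-flops in which exactly one latch-enable (LE) signal is
# high at a time, advancing after each completed write and wrapping from
# line n-1 back to line 0.

class WritePointer:
    def __init__(self, n_lines):
        self.n = n_lines
        self.current = 0  # reset points at LE0 (line 0)

    def le_signals(self):
        """One-hot LE vector: only the current line's enable is high."""
        return [i == self.current for i in range(self.n)]

    def advance(self):
        """Move to the next line after a write; wrap at n-1."""
        self.current = (self.current + 1) % self.n

ptr = WritePointer(4)
assert ptr.le_signals() == [True, False, False, False]
for _ in range(4):      # four writes bring the pointer back to line 0
    ptr.advance()
assert ptr.le_signals() == [True, False, False, False]
```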

    Figure 2: Control unit

    The write operation in the DPCAM is controlled by the Control Unit and the WR signal on the write port. In the SB stage of a write, the core provides the data [Ds31-Ds0], the tag [Ts15-Ts0], and the active-low WR signal. On the negative edge of the WR signal (the end of WR), the control circuit moves the LE to LE1 in preparation for the next write, which will go to line 1. The read operation occurs when the OF stage of the reading core applies the destination tag [Td15-Td0] and an active-high read (RD) signal to all Tag Fields simultaneously. The RD signal outputs the stored source tags to the CMP of each memory line so they can be compared with the applied tag simultaneously. If a match occurs, the equality signal of the comparator acts as an output enable (OE) signal that drives the stored data from the Data Field onto the destination data bus [Dd31-Dd0], where it can be read by the OF unit of the reading core. When reading and writing target the same memory location, the Control Unit gives priority to the write and issues a WAIT signal to the read operation. When reading and writing target different memory locations simultaneously, both ports operate concurrently, which significantly reduces the cache access latency. In that case, the SB stage of the writing core provides the data [Ds31-Ds0] and the tag [Ts15-Ts0], which are written to the location determined by the Control Unit, while the OF stage of the reading core provides the destination tag [Td15-Td0] and the RD signal to all tag fields simultaneously, and the stored data from the Data Field is read onto the destination data bus [Dd31-Dd0].
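The arbitration between the two ports can be summarized in a cycle-level sketch. This is an illustrative model of the behavior the paper describes (write priority and a WAIT on same-location conflicts), not the hardware implementation; the class and its interface are invented for the example.

```python
# Sketch (assumed behavior): when a read and a write target the same line
# in the same cycle, the write wins and the read is stalled (WAIT) to the
# next cycle; reads and writes to different lines proceed in parallel.

class DPCAMPort:
    def __init__(self, n_lines):
        self.tags = [None] * n_lines
        self.data = [None] * n_lines
        self.wp = 0  # write pointer maintained by the control unit

    def cycle(self, write=None, read_tag=None):
        """One clock cycle; write is (tag, data) or None.
        Returns (read_result, wait) for the read port."""
        # Conflict: the line about to be written currently holds read_tag.
        conflict = (write is not None and read_tag is not None
                    and self.tags[self.wp] == read_tag)
        if write is not None:
            tag, value = write
            self.tags[self.wp] = tag
            self.data[self.wp] = value
            self.wp = (self.wp + 1) % len(self.tags)
        if read_tag is None or conflict:
            return None, conflict  # WAIT asserted on conflict
        for i, t in enumerate(self.tags):   # parallel compare in hardware
            if t == read_tag:
                return self.data[i], False
        return None, False

m = DPCAMPort(4)
m.cycle(write=(0x12, "A"))                               # write line 0
value, wait = m.cycle(write=(0x13, "B"), read_tag=0x12)  # parallel access
assert (value, wait) == ("A", False)                     # no conflict
```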

    A replacement algorithm called NFRA, based on simple hardware units, is implemented in the DPCAM. In the proposed architecture, as an alternative to having the cache controller read the data from a lower memory level, which increases the access latency, a new small DPCAM module is added. The main DPCAM contains the near-access data, while the new module holds the far-access data, as shown in Fig. 3. Because far-access data is less frequently used, this module can be smaller than the near-access one. For example, with four cores and a 64 KiB shared DPCAM, each core can write 2 K operands, each comprising eight bytes of data and tag, to the DPCAM before overwriting becomes necessary.

    Figure 3: Near-access, far-access DPCAM modules

    The Control Unit and its write pointer are the main parts of the NFRA: if the processor writes to location Lx, the next instruction writes its operand to location Lx+1. This process repeats until location Ln-1 is reached, after which the pointer moves back to LE0 and starts overwriting the old data and tags. This technique is applied in both the near-access and far-access modules. Note that the NFRA can be implemented at the hardware level with little cost and little access overhead in compiler computation. The cost overhead is mostly related to the cache controller's complexity and latency. As a new solution, the proposed NFRA is implemented using simple hardware inside the DPCAM Control Unit instead of a complex algorithm in the cache controller; thus the NFRA improves the cache access latency. Using an algorithm at compile time allows the near-access and far-access data to be stored in and loaded from different DPCAM modules [1]. The far-access module works on demand; in other words, it is activated only when a core needs it for storing or loading data. Otherwise its blocks are transferred to an inactive mode. This is a well-known concept in caches, called the migration principle, used to save power [23,24].
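The NFRA policy described above amounts to sequential overwrite in two modules of different sizes. The sketch below illustrates this under assumed, arbitrary module sizes; it is a software model, not the hardware.

```python
# Illustrative NFRA sketch: two DPCAM modules (a larger near-access one
# and a smaller far-access one), each replaced by simple sequential
# overwrite, as the paper describes.

class NFRAModule:
    def __init__(self, n_lines):
        self.lines = [None] * n_lines
        self.ptr = 0

    def store(self, tag, data):
        """Write to the pointed line, then advance; old contents at the
        wrap-around position are simply overwritten (the NFRA policy)."""
        self.lines[self.ptr] = (tag, data)
        self.ptr = (self.ptr + 1) % len(self.lines)

near = NFRAModule(8)   # near-access data: reused by upcoming instructions
far = NFRAModule(2)    # far-access data: smaller, activated on demand

for t in range(3):
    near.store(t, f"near{t}")
far.store(100, "far0")
far.store(101, "far1")
far.store(102, "far2")        # wraps: overwrites the oldest far entry
assert far.lines == [(102, "far2"), (101, "far1")]
```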

    4 Performance Analysis

    The DPCAM has been implemented, compiled, simulated, and verified using Quartus Prime Pro 19.1, which includes the ModelSim package for design and simulation, supported by Intel [25]. The DPCAM was built and evaluated as a standalone memory using the Intel Cyclone V FPGA family with 28 nm technology [26]. This is the first step toward demonstrating that the DPCAM can replace the shared cache in the memory hierarchy of a multi-core processor. For testing the DPCAM, two cores were used to assess the latency of read and write operations. The DPCAM was implemented with both block schematic files and Verilog HDL code. The files were then verified and debugged using ModelSim and Vector Waveform Files (VWF) in both functional and timing simulation. Special tests were written to simulate and observe the latency of the DPCAM's read and write operations. In addition, the Power Analyzer Tool was used to estimate the static and dynamic power consumption of the DPCAM. The performance of the DPCAM was compared to a set-associative cache, which is the most popular architecture used as shared memory in multi-core systems [1,2].

    4.1 Functional Simulation

    During the functional simulation, the following tests were performed to assess the performance of the DPCAM: write and read operations, simultaneous read and write operations to different memory locations, and simultaneous read and write operations to the same memory location.

    The test-bench program starts by resetting the control unit. It then generates random 16-bit tags and 32-bit data and puts them on the input pins to perform the write operation. After that, it keeps generating repeated read/write signals until the end of the simulation time. Finally, it puts the 16-bit tagd on the input pins to compare against the stored tags and outputs the data on the output pins to perform the read operation. The test-bench is used for the functional simulation of the DPCAM. Moreover, it is applied to the DPCAM with NFRA and to a set-associative cache with the LRU replacement algorithm, in order to compare them in terms of latency and power dissipation. The diagram in Fig. 4 illustrates the use of the test-bench program, a special benchmark written to analyze latency and power dissipation for all read and write cases.
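A rough software analogue of this test-bench flow is shown below: random tags and data are written into a small round-robin store, then read back. The memory size and counts are invented for the illustration; the real test-bench drives HDL signals in ModelSim, not Python.

```python
# Sketch of the test-bench idea: write random 16-bit tags with random
# 32-bit data into a small DPCAM-like store with sequential overwrite,
# then read tags back and check what survives the wrap-around.

import random

LINES = 16                                   # small DPCAM for the sketch
rng = random.Random(1)
tags = rng.sample(range(1 << 16), 20)        # unique random 16-bit tags
data = [rng.getrandbits(32) for _ in tags]   # random 32-bit data words

store = [None] * LINES
for i, (t, d) in enumerate(zip(tags, data)):
    store[i % LINES] = (t, d)                # sequential overwrite (NFRA)

def read(tag):
    """Content lookup: scan stands in for the parallel tag compare."""
    for entry in store:
        if entry is not None and entry[0] == tag:
            return entry[1]
    return None                              # miss

assert read(tags[19]) == data[19]            # most recent write is present
assert read(tags[0]) is None                 # overwritten after wrap-around
```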

    Figure 4: Test-bench program

    Fig. 5 shows a snapshot of several clock cycles (clock period equal to 10 ns) of reading and writing to the 64 KiB DPCAM. In the first interval (0 to 10 ns), the control unit was reset to point at the first location. As soon as the write (WR) signal goes low, because it operates on the negative edge, the written data (outI) appear on the DPCAM locations, which means the input values have been stored in the targeted DPCAM locations. In the second interval (10 to 20 ns), the processor retrieves the data stored in a DPCAM location: it drives the corresponding tag (tagd) of the previously written data (outI) and applies the RD signal. As soon as the RD signal goes high, the stored data appear on the processor's output buses (outE). Interval 4 (30 to 40 ns) shows simultaneous read and write operations to different DPCAM locations, where new data with tag [0]13 is written to the target location while data already written with tagd [0]12 is read correctly. Interval 5 (40 to 50 ns) shows simultaneous read and write operations to the same location, with the write operation taking priority and the read delayed to the next cycle.

    Figure 5: Functional simulation

    4.2 Latency Assessments

    Fig. 6 illustrates the timing simulation of both read and write operations on the 64 KiB DPCAM with the near and far modules, using the Intel Cyclone V FPGA.

    Figure 6: Latency assessments

    In the first interval (0 to 10 ns), the control unit was set to input data and tags into the first location. As soon as the WR signal goes low, the written data (pin outI) appear on the DPCAM locations after a time delay. Running the simulator 100 times, it was observed that the average write delay of the DPCAM is about 0.9529±0.03393 ns. The second (10 to 20 ns) and third (20 to 30 ns) intervals show the simulation used to assess the latency of a read operation. To read data already stored in any DPCAM location, the tagd ([0]10 in the second interval) is compared simultaneously to the tags in all locations along with the RD signal. A match occurs with the tag associated with the data in the first location, and the data appears on the output buses (outE) after a delay. Averaging around one hundred test-bench intervals, the delay for a read operation is around 1.1782±0.08830 ns. The fourth (30 to 40 ns) and fifth (40 to 50 ns) intervals were used to assess the latencies of simultaneous read and write operations.

    Memory latency is the time between initiating a request for data and the actual data transfer. For simultaneous write and read operations of the DPCAM, the memory latency is the time from the two simultaneous requests until both operations complete. To assess the latency for these modes, the following expressions can be used:

    • l_SDL = max(l_WR, l_RD) for simultaneous write and read operations to different memory locations

    • l_SSL = t_CL + l_RD for simultaneous write and read operations to the same memory location

    where l_WR is the latency of a write operation, l_RD is the latency of a read operation, and t_CL is the cycle time.
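These two expressions can be checked numerically against the averages reported in this section. The t_CL value assumes the 10 ns clock period used in the simulations; all values are in ns.

```python
# Worked check of the two latency expressions above, using the average
# latencies reported in the paper (values in ns).

l_WR = 0.9529   # average write latency
l_RD = 1.1782   # average read latency
t_CL = 10.0     # simulation clock period (10 ns per Fig. 5)

# Different locations: the ports run in parallel, so the slower dominates.
l_SDL = max(l_WR, l_RD)
# Same location: the read waits one cycle, then completes.
l_SSL = t_CL + l_RD

# The measured l_SDL of 1.2201 ns is statistically equal to l_RD per the
# paper's T-test, consistent with the read being the slower operation.
assert l_SDL == l_RD
assert abs(l_SSL - 11.1782) < 1e-9
```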

    The simulations for both modes were also performed 100 times. For the mode with simultaneous write and read operations to different memory locations, l_SDL = 1.2201±0.0914 ns, which indicates that the latency equals the read-operation latency measured earlier in the second test interval. This was confirmed by a T-test at a 95% confidence level.

    For the mode with simultaneous write and read operations to the same memory location, the data is written to the target location with a latency of 0.9828±0.0412 ns, whereas the read operation waits until the next cycle and then reads the targeted data from the same location with a latency of 1.2226±0.09446 ns, which is the common latency for this mode.

    The same test-bench values were used with a 64 KiB four-way set-associative cache to assess the latency of the different operations for the different memory-access cases. The experiments, repeated 100 times, showed that the average latency of a write operation is 1.9434±0.0382 ns, while for a read operation it is 2.1584±0.1056 ns. Simultaneous read and write operations were not tested because they are not allowed in set-associative caches. To compare the write and read operations of the DPCAM and the set-associative memory, a T-test was used, which showed evidence that the write and read latencies of the DPCAM are lower than those of the set-associative cache at a 95% confidence level.
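The reported comparison can be reproduced from the summary statistics alone (mean ± standard deviation, n = 100 runs each). The sketch below computes a two-sample Welch t statistic directly; the paper does not state which t-test variant was used, so this is an assumed reconstruction.

```python
# Two-sample t statistic computed from summary statistics, avoiding any
# external statistics package.

import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / se

# Write latency: DPCAM vs. four-way set-associative (ns)
t_write = welch_t(0.9529, 0.03393, 100, 1.9434, 0.0382, 100)
# Read latency
t_read = welch_t(1.1782, 0.08830, 100, 2.1584, 0.1056, 100)

# Both statistics lie far beyond the ~1.97 critical value at the 95%
# confidence level, supporting the paper's conclusion that the DPCAM's
# latencies are significantly lower.
assert t_write < -100 and t_read < -50
```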

    The read latency of the tested DPCAM is lower than that of the tested set-associative cache because a set-associative cache must first determine an index to access a location whose tag is then compared with the tag part of the target address, which increases the latency, whereas in the DPCAM the incoming tag is compared directly with the stored tags. Typically, a 64 KiB AM-based cache memory has a read latency of around 2 ns [27]; it is 1.66 ns for a 1 KiB AM and 1.69 ns for a 2 KiB four-way set-associative cache used in a cache controller [14]. The write latency of an AM-based cache memory usually exceeds 2 ns at 64 KiB [27].

    The latency of a write operation is a critical issue that can jeopardize the adoption of any design in the multi-core memory hierarchy. The DPCAM design with near-far access modules and various memory sizes was simulated and compared with a traditional four-way set-associative cache of equivalent sizes on the same FPGA technology. To find the average write latency for various memory sizes, the memory size was modified and the test-bench was run with a new time interval for each size. Fig. 7 and Tab. 1 show that the DPCAM has a small write latency that is nearly constant across sizes; this is because the control unit points directly to the memory location, so there is no need to generate the address of the next write location, as an AM cache memory must, which makes selecting the appropriate location for a write operation faster. Fig. 7 also illustrates that the write-latency gap between the DPCAM and the set-associative cache grows as the memory size increases.

    Figure 7: Latency of a write operation (ns)

    Table 1: Latency of a write operation (ns)

    Based on the latency estimates for write and read operations, the NFRA replacement algorithm used by the DPCAM was compared with the LRU algorithm used by the set-associative cache memory. At a size of 64 KiB, the DPCAM achieves lower access latency: its write operation takes about 0.9529±0.03393 ns and its read operation 1.1782±0.08830 ns, whereas for the set-associative cache the write latency is 1.9434±0.0382 ns and the read latency is 2.1584±0.1056 ns.

    4.3 Estimation of a Power Dissipation

    Despite the recent trends toward smaller and faster memories, power management has become increasingly important. As chip technology shrinks, the overall size, performance, and cost improve, but the power density increases. Hence, estimating power dissipation is essential to guide architects in identifying the components that consume the most power and in modifying and improving the design. For the power-dissipation estimation, the Power Analyzer Tool of the Quartus simulator, which provides an average accuracy of ±10% [28], was used. The script with the DPCAM design provided by ModelSim was run to generate a file for the Power Analyzer Tool. Static, dynamic, I/O, and total power are calculated from the waveform file generated by the Power Analyzer during the gate-level simulation.

    In this section, the power dissipation of the DPCAM and a four-way set-associative cache is compared for different memory sizes. For this assessment the DPCAM includes the near-far access modules.

    Static power is the thermal power consumed on a chip. Except for the I/O ports, static power mainly comprises the leakage power of the functional units on the FPGA, whereas dynamic power is the additional power consumed due to unit activity and signal toggling. I/O power, in turn, is generated by the pins, which drive components off-chip or on-chip and thus affect dynamic power [28].
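The bookkeeping behind the comparison in the next tables is simple: total dissipation is the sum of the three components, and the DPCAM's overhead is its total relative to the set-associative cache's. The mW figures in the sketch below are invented for illustration and are not the paper's measurements.

```python
# Accounting sketch for the power breakdown described above.

def total_power(static_mw, dynamic_mw, io_mw):
    """Total power as the sum of the three reported components (mW)."""
    return static_mw + dynamic_mw + io_mw

def overhead_pct(dpcam_total_mw, assoc_total_mw):
    """Extra power of the DPCAM relative to the set-associative cache (%)."""
    return 100.0 * (dpcam_total_mw - assoc_total_mw) / assoc_total_mw

dpcam = total_power(3.2, 1.5, 0.65)   # hypothetical DPCAM components
assoc = total_power(3.0, 1.5, 0.5)    # hypothetical set-associative cache
assert abs(overhead_pct(dpcam, assoc) - 7.0) < 0.1   # ~7%, as reported
```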

    Tabs. 2 and 3 show the static, dynamic, I/O, and total power dissipation of the DPCAM and the set-associative cache, respectively, for different sizes. The main drawback of the DPCAM is its static power dissipation, especially as the size increases; this is because of the hardware complexity of the control unit, which is part of the DPCAM architecture, and because the internal wires cover more area and increase the power dissipation. The dynamic power of the DPCAM is close to that of the set-associative cache for sizes below 512 K, but it increases dramatically beyond 256 K; this is due to the large number of locations that are active at the same time, especially during read operations. The I/O power of the DPCAM is very close to that of the set-associative cache across the different sizes, because the off-chip pins are constant regardless of the internal memory size.

    Table 2: DPCAM power dissipation

    The total power consumed by the DPCAM is slightly greater than that of the set-associative cache, estimated at about 7% of the total power. Since this increase is small and acceptable, it should not hinder the adoption of the DPCAM in multi-core systems. The slight increase in power dissipation can also be reduced using various power-saving techniques [23,24,29-31], which mainly target static and dynamic power dissipation.

    Table 3: Set-associative memory power dissipation

    5 Conclusion

    CAM memories are commonly used in various computing applications, mainly in different types of networks and inside CPUs, and several further applications can benefit from the capabilities of CAM memory. In this article, a new pipelined DPCAM architecture and a simple hardware-based replacement algorithm called NFRA have been proposed. The DPCAM is designed so that it can be used in multi-core processors either at the shared cache level or in special-purpose caches inside the interconnection networks between cores.

    This work has demonstrated that the DPCAM can replace the shared cache in the memory hierarchy of a multi-core processor. This conclusion was drawn from evaluating the design on the Intel Cyclone V FPGA. The latency of read and write accesses and the power characteristics of the DPCAM have been investigated. The DPCAM achieves an average latency of 1.2±0.09138 ns for read operations and 0.9679±0.0642 ns for write operations, which is clearly better than other types of AM. Moreover, the DPCAM's write latency was nearly constant across different memory sizes. On the other hand, the DPCAM consumes about 7% more power than a set-associative memory, which can be reduced with power-saving techniques.

    As future work, the DPCAM should be embedded inside a multi/many-core system as a shared cache memory. In that setting, other relevant parameters should be evaluated, such as cache hit rate, cache miss rate, read and write latency, memory utilization, and cache power consumption. In addition, a comparative study can be made between a multi/many-core that uses the DPCAM with NFRA and a traditional multi/many-core architecture with other state-of-the-art replacement algorithms, in order to study the efficiency of the new architecture. The gem5 simulator and a set of SPEC CPU2006 benchmark programs can support these goals.

    Funding Statement: The authors received no specific funding for this study.

    Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

亚洲精品中文字幕在线视频| 免费无遮挡裸体视频| 欧美在线黄色| 亚洲五月天丁香| 一本综合久久免费| 琪琪午夜伦伦电影理论片6080| 观看免费一级毛片| 国产欧美日韩精品亚洲av| 成人欧美大片| 久久久国产精品麻豆| 男男h啪啪无遮挡| 国产精品精品国产色婷婷| 日日爽夜夜爽网站| 丰满的人妻完整版| 成人三级做爰电影| av国产免费在线观看| 熟女少妇亚洲综合色aaa.| 国产一区在线观看成人免费| 欧美极品一区二区三区四区| 一级a爱片免费观看的视频| 亚洲欧美激情综合另类| 日本黄色视频三级网站网址| 亚洲五月婷婷丁香| 久久婷婷成人综合色麻豆| 亚洲人成电影免费在线| 亚洲精品在线观看二区| 日本一区二区免费在线视频| 99国产极品粉嫩在线观看| 日韩高清综合在线| 不卡av一区二区三区| av国产免费在线观看| 色在线成人网| 青草久久国产| 精品国产超薄肉色丝袜足j| 岛国在线免费视频观看| 麻豆一二三区av精品| 亚洲中文av在线| 国产精品亚洲美女久久久| 免费搜索国产男女视频| 久久久久国产精品人妻aⅴ院| 51午夜福利影视在线观看| 成人国产综合亚洲| 丝袜人妻中文字幕| 亚洲人成网站在线播放欧美日韩| 禁无遮挡网站| 久久香蕉精品热| 在线观看午夜福利视频| 久久国产精品影院| 91国产中文字幕| 999久久久国产精品视频| xxxwww97欧美| 在线观看一区二区三区| 日韩欧美国产在线观看| 88av欧美| 可以在线观看毛片的网站| 在线免费观看的www视频| 99热这里只有是精品50| 麻豆国产97在线/欧美 | 男男h啪啪无遮挡| 亚洲av美国av| 日日爽夜夜爽网站| 国产黄a三级三级三级人| 天堂av国产一区二区熟女人妻 | 久久久精品欧美日韩精品| 最近最新中文字幕大全免费视频| 一本大道久久a久久精品| 久久久精品欧美日韩精品| 好看av亚洲va欧美ⅴa在| 狠狠狠狠99中文字幕| 国产精品99久久99久久久不卡| 久久精品成人免费网站| 国产精品1区2区在线观看.| 18美女黄网站色大片免费观看| 国产精品一区二区精品视频观看| a级毛片在线看网站| 精品少妇一区二区三区视频日本电影| 1024手机看黄色片| 精品电影一区二区在线| 免费在线观看亚洲国产| 国产真人三级小视频在线观看| 日韩欧美在线二视频| 国产精品久久久人人做人人爽| 91大片在线观看| 亚洲激情在线av| 久久热在线av| 老汉色av国产亚洲站长工具| 国产aⅴ精品一区二区三区波| 久久中文看片网| 国产精品电影一区二区三区| 露出奶头的视频| 成人永久免费在线观看视频| 51午夜福利影视在线观看| 老司机在亚洲福利影院| 最近在线观看免费完整版| 国产男靠女视频免费网站| 午夜福利成人在线免费观看| 99热只有精品国产| 99久久99久久久精品蜜桃| 99久久无色码亚洲精品果冻| 国产亚洲精品久久久久久毛片| 久久九九热精品免费| 国产亚洲av高清不卡| 国产亚洲av嫩草精品影院| 久久久久亚洲av毛片大全| 国产久久久一区二区三区| 亚洲精品美女久久av网站| 国产成+人综合+亚洲专区| 亚洲黑人精品在线| 国产亚洲精品久久久久久毛片| 午夜老司机福利片| 国产男靠女视频免费网站| 国产精品 欧美亚洲| 国产午夜精品久久久久久| 不卡av一区二区三区| 日韩欧美国产一区二区入口| 欧美极品一区二区三区四区| 男女床上黄色一级片免费看| 国产精品野战在线观看| 一个人免费在线观看电影 | 国产激情欧美一区二区| 天天添夜夜摸| 久久九九热精品免费| 国产1区2区3区精品| 88av欧美| 国产成人精品久久二区二区免费| 国产成人精品无人区| 色哟哟哟哟哟哟| 美女 人体艺术 gogo| 国产不卡一卡二| 九色国产91popny在线| 哪里可以看免费的av片| 日本熟妇午夜| 国产欧美日韩一区二区精品| 国产成人系列免费观看| 男女那种视频在线观看|