
    A Scalable Interconnection Scheme in Many-Core Systems

    2023-12-12 15:50:20 Allam Abumwais and Mujahed Eleyat
    Computers, Materials & Continua, 2023, Issue 10

    Allam Abumwais and Mujahed Eleyat

    Computer Systems Engineering, Arab American University, Jenin, 240, Palestine

    ABSTRACT Recent multi-core architectures may contain a relatively large number of cores, typically ranging from tens to hundreds, and are therefore called many-core systems. Such systems require an efficient interconnection network that addresses two major problems. The first is the power and area overhead of the network and its effect on scalability. The second is the high access latency caused by multiple cores simultaneously accessing the same shared module. This paper presents an interconnection scheme called N-conjugate Shuffle Clusters (NCSC), based on a multi-core multi-cluster architecture, to reduce the overhead of both problems. NCSC eliminates the need for router devices and their complexity and hence reduces the power and area costs. It also redesigns and distributes the shared caches across the interconnection network to increase the capacity for simultaneous access and hence reduce the access latency. For intra-cluster communication, Multi-port Content Addressable Memory (MPCAM) is used. Experimental results using four clusters of four cores each indicate that the average access latency for a write is 1.14785 ± 0.04532 ns, which is nearly equal to the latency of a write operation in MPCAM. Moreover, the average read latency within a cluster is 1.26226 ± 0.090591 ns, and the latency for read access between cores in different clusters is around 1.92738 ± 0.139588 ns.

    KEYWORDS Many-core; multi-core; N-conjugate shuffle; multi-port content addressable memory; interconnection network

    1 Introduction

    In multiprocessor systems, and later in multi-core processors, the processors (cores) compete for access to shared resources, mainly the interconnection network and the shared memory. They compete both while executing the main program instructions and while executing cache coherence protocol instructions. This competition leads to contention, arbitration, bottlenecks, and the resulting time delays, so a longer program execution time is expected. In previous papers, the authors presented an organization that rids the system of these problems [1–3].

    During the past decade, processor designers began expanding multi-core systems into many-core systems, which contain a large number of cores, from tens to hundreds [4–6], and several many-core systems have been developed over the last few years [7–9]. Because the interconnection network architecture is at the core of such systems, various architectures have been proposed in the literature. The router-based Network on Chip (NoC) architecture became the preferred solution to the long-standing challenges of traditional bus interconnection networks because of features such as scalability and effective bandwidth. A NoC, which uses routers to provide multiple paths between cores to enhance throughput and scalability, comes in two schemes: buffered and buffer-less. Recently, most research groups have worked to improve the performance of these schemes by producing hybrid NoCs that merge buffered and buffer-less designs to handle distributed resources in many-core systems, reduce contention and power consumption, and thus provide a high-efficiency NoC [10,11].

    Every many-core interconnection network that uses a router-based NoC pays a heavy penalty in area and power consumption due to the complex router structure. Research has shown that the router structure in a mesh or torus topology consumes about 28% of the total power and 17% of the total area in the Intel chip [12]. In addition, the routers add latency because the extra hardware structures negatively affect performance, which in turn reduces scalability. Moreover, a bottleneck remains when more than one core simultaneously accesses the same shared cache module, because this increases the access latency and hence degrades system performance.

    The main contribution of this paper is the design of an effective network topology that reduces the penalties that arise in a large-scale system. The goal is to present a router-less, highly effective interconnection network topology based on a multi-cluster architecture that is capable of improving performance metrics such as scalability, size, bandwidth, complexity, and latency. It addresses the scalability problem caused by the router structure and reduces the shared cache access latency.

    This paper presents an interconnection scheme called NCSC that connects N multi-core clusters, each consisting of n cores, where n ≥ N. The authors assume that n = N, i.e., the proposed system includes N² cores. The design consists of two parts: the MPCAM organization for communication between cores within the same cluster, and the conjugate shuffle interconnection for inter-cluster communication. NCSC eliminates router devices from both parts, which increases system scalability. In addition, the shared cache organization is redesigned and distributed across the interconnection network to increase the capacity for simultaneous access and hence reduce the access latency.

    The remainder of this paper is organized as follows. Section 2 briefly reviews the literature on many-core interconnection networks. Section 3 explains the main components of the proposed interconnection networks within a cluster and between clusters. Section 4 discusses the implementation of the NCSC in Field Programmable Gate Arrays (FPGA), while Section 5 presents the simulation of the main parts and the latency estimation analysis in different scenarios. Finally, Section 6 draws conclusions from this study and suggests future work.

    2 Related Works

    Because of the problems of router-based NoCs, researchers have worked to reduce the number of routers and ultimately to produce a router-less NoC. Awal et al. [13] proposed a combination of 2-D meshes on a multilayer NoC, which reduces the number of routers needed in the network by relying on a multilayer chip. This network has several attractive metrics, such as network cost and constant degree. Compared to other interconnection networks, it offers moderate latency, a fault-tolerant structure, and a moderate link count. Its drawback is that the design is difficult to scale, because scaling requires additional chip layers, which is not feasible in current technology. Moreover, this architecture does not take into account the connection of the cores to shared memory.

    Li et al. [14] proposed the Nesting Ring NoC (NRO). This topology consists of a set of clusters, each with a fixed size of four cores. NRO achieves attractive performance and latency, but it has an obvious scalability problem as the number of cores grows to hundreds or more: the network diameter increases, causing higher delays and more traffic along inter-cluster paths. The authors of [15] try to address these problems with a large-scale NRO interconnection network that modifies NRO in two ways: first, by introducing new links between cores and clusters to reduce the network diameter; second, by exploiting multilevel chips to combine a large number of cores in each cluster. However, this work does not discuss the shared memory issue within and between clusters.

    Udipi et al. [6] designed a new interconnection structure that eliminates the routers between segments. This architecture is based on a shorted bus and a segmented bus. The main idea is to divide the chip into segments of cores, with a shorted bus interconnection in each segment. Each sub-segment bus is connected to a central bus that is directly connected to the manager core. A simple control unit, called a “Filter”, is implemented on each central bus to allow data to be transmitted between segments. This design has a scalability problem, because large numbers of cores need large links and complex control units, especially to preserve consistency for shared cache memory.

    Liu et al. [16] proposed an architecture called Isolated Multi-Ring (IMR) that connects up to 1024 cores with a multi-ring topology. In IMR, any two cores can be connected via one or more isolated rings, so each packet can reach its destination directly along the same ring. This eliminates the need for complex routers and so improves performance and reduces hardware costs. IMR improves throughput and latency but still has issues such as a large number of rings and a large number of buffers at the interfaces.

    Alazemi et al. [17] proposed a novel router-less architecture that fully exploits the bus resources to achieve the shortest path and solve the scalability problem. As new technologies scale chips to smaller dimensions, they support a larger number of metal layers for integration. With this trend, the increasing number of layers can be exploited for routing. For example, the Intel Xeon Phi series is designed with 13 metal layers [18]. The simulation results show that this architecture achieves a significant advantage in latency, performance, and power consumption. Although the idea is promising, the specific architecture has several issues and assumes an ideal many-core architecture without considering issues related to shared caches.

    The authors of [19,20] proposed a machine learning approach based on deep reinforcement learning to decide on the ideal loop placement for router-less NoCs under different design constraints. The approach successfully solves issues with the earlier design, but it still uses many interconnection loops, which increases power consumption.

    Cache coherence protocols cause additional delay because each core must notify the other cores of any change it makes to a shared variable. The authors in [3] suggested a coherency approach in the MPCAM that guarantees cache coherence for all shared variables across the cores. With this method, cache coherence operations are no longer needed, and the delay of accessing the shared cache becomes the same as that of accessing the core’s private memory.

    This literature informs the research topic of this article: router-less interconnection networks in a many-core architecture. The main goal of this article is to propose an effective interconnection network that improves many-core performance by exploiting MPCAM shared memory and router-less techniques.

    3 The Component of the Proposed Scheme

    During the early stages of the NCSC design, it was decided not to use router devices, because they add burden and complexity to the network architecture. NCSC uses N-shuffle stages to interconnect N crossbar network-based clusters, each consisting of N processors. N-shuffle stages have been used before in many interconnection networks [21–23]. Here the scheme is modified into an N-conjugate shuffle to meet the needs of multi-cluster connection, where the conjugate core (manager core) in each cluster is responsible for inter-cluster communication. Combining the N-conjugate shuffle with the MPCAM, presented in [2], as the shared cache of each cluster yields a many-core system with the following features:

    A. Within a single cluster (the multi-core system), communication between cores (intra-cluster communication) is accomplished through the shared cache (the MPCAM) with an access time equal to that of the local (private) cache.

    B. In inter-cluster communication, a core can access the shared cache of any other cluster with an access time equal to one or two local cache accesses. The length of the access time (one or two clock cycles) depends on whether a local core is requesting that shared cache at the same time.

    C. Across the whole system, each shared data item, in whichever shared cache it resides, has a unique tag. Any core in any cluster can use this tag to access the data. The tag includes the variable identity (for example, address + version number) plus two bits. The first bit indicates whether the data is local or shared, and the second indicates whether the shared data resides in the cluster’s own shared memory (MPCAM) or in the shared memory of another cluster. This means that the shared cache address space is homogeneous for all cores of the system.
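
    The tag layout described in item C can be sketched in a few lines of Python. This is only an illustrative model: the 16-bit identity width, the bit positions, and the names `Tag`, `pack_tag`, and `unpack_tag` are assumptions, since the text specifies only that a tag carries the variable identity plus two flag bits.

```python
from dataclasses import dataclass

# Assumed width of the variable identity (address + version); not given in the text.
IDENTITY_BITS = 16


@dataclass(frozen=True)
class Tag:
    identity: int  # variable identity, e.g. address + version number
    shared: bool   # first flag bit: False = local data, True = shared data
    remote: bool   # second flag bit: False = own cluster's MPCAM, True = another cluster


def pack_tag(tag: Tag) -> int:
    """Pack a Tag into one integer laid out as [shared | remote | identity]."""
    if not 0 <= tag.identity < (1 << IDENTITY_BITS):
        raise ValueError("identity does not fit in the assumed field width")
    flags = (int(tag.shared) << 1) | int(tag.remote)
    return (flags << IDENTITY_BITS) | tag.identity


def unpack_tag(word: int) -> Tag:
    """Inverse of pack_tag."""
    identity = word & ((1 << IDENTITY_BITS) - 1)
    remote = bool((word >> IDENTITY_BITS) & 1)
    shared = bool((word >> (IDENTITY_BITS + 1)) & 1)
    return Tag(identity, shared, remote)


# A shared variable that lives in the requesting core's own cluster.
word = pack_tag(Tag(identity=0x00A1, shared=True, remote=False))
print(hex(word), unpack_tag(word))
```

    Because the two flag bits travel with the identity, every core can decide from the tag alone whether to search its private cache, its own MPCAM, or another cluster.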

    3.1 The N-Conjugate Shuffle

    The N-conjugate shuffle was chosen for the proposed system because it connects each conjugate core to another one in the multi-cluster system simply and with a small number of links. To understand the scheme, it is necessary to explain the structure and function of the N-conjugate shuffle. It is a passive N-shuffle (no silicon devices are involved) that connects each element to its conjugate, i.e., it connects the element Eij to the element Eji and vice versa. Fig. 1a shows a 4-conjugate shuffle.

    Note that the buses leaving the same cluster do not cross each other. By aligning each opposite bus with these buses, a group of buses that do not cross each other is obtained. As can be seen in Fig. 1b, the 12 buses can be arranged in two groups, i.e., they can be accommodated in two layers of the chip.
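
    The wiring rule of the conjugate shuffle (Eij connects to Eji) can be enumerated with a short Python sketch. The function names `conjugate` and `shuffle_buses` are hypothetical helpers for illustration; the sketch only counts buses and does not model the layout into chip layers.

```python
from __future__ import annotations


def conjugate(i: int, j: int) -> tuple[int, int]:
    """Element E_ij is wired to its conjugate E_ji."""
    return (j, i)


def shuffle_buses(n: int) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """List every unidirectional bus (source, destination) of an n-conjugate shuffle.

    Diagonal elements E_ii are their own conjugates, so they need no bus.
    """
    return [((i, j), conjugate(i, j)) for i in range(n) for j in range(n) if i != j]


buses = shuffle_buses(4)
# A bus and its opposite direction together form one bidirectional link.
links = {frozenset(bus) for bus in buses}
print(len(buses), "buses,", len(links), "bidirectional links")
```

    For n = 4 this yields the 12 buses mentioned above, i.e., n(n - 1) unidirectional buses forming n(n - 1)/2 bidirectional links.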

    Figure 1: (a) The 4-conjugate shuffle. (b) The equivalent bidirectional connection 4-conjugate shuffle

    3.2 The MPCAM Organization

    The MPCAM organization and the MPCAM-based multi-core system were presented in [1,2]. As shown in Fig. 2, the MPCAM is organized as an array of Dual Port Content Addressable Memory (DPCAM) modules distributed and embedded at the cross points of the multi-core interconnection. The two ports of a DPCAM module allow concurrent read and write operations as long as they do not access the same memory location. Through the input port, the Store Back (SB) unit of the core pipeline writes the data and its tag to the least recently written memory line. Through the output port, the Operand Fetch (OF) unit of the core pipeline applies the tag of the required data, so that the DPCAM searches all memory lines simultaneously and reads the data if found. In the MPCAM organization, the DPCAM modules are connected to the horizontal buses of the SB units, each in its row, and to the vertical buses of the OF units, each in its column, allowing any core to access the shared cache simultaneously without blocking. This also reduces the access latency, which becomes equal to that of the local cache.

    Figure 2: The organization of the MPCAM

    3.3 The MPCAM-Based Multi-Core System

    In the MPCAM-based multi-core system, if n cores are used, the MPCAM must have n horizontal buses, n vertical buses, and n² DPCAM modules: n modules in each row and n modules in each column. The SB units of the core pipelines are connected to the horizontal (row) buses, and the OF units are connected to the vertical (column) buses. Through its horizontal bus, an SB unit writes the data and its tag to all DPCAMs in its row, so each column holds a copy of the data. The OF unit of any core can therefore search for and read the data through the vertical bus of its column. Fig. 3 shows an MPCAM-based multi-core organization. This organization achieves simultaneous access to the shared cache and eliminates the need for router devices between cores inside the same multi-core cluster. This result was fully presented and explored in [3].

    Figure 3: The MPCAM-based multi-core cluster
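
    The row-broadcast write and column-search read described above can be illustrated with a toy behavioural model in Python. The class `MPCAM` and its methods are assumptions made for this sketch; timing, the two physical ports, and the least-recently-written line replacement are deliberately not modelled.

```python
class MPCAM:
    """Toy behavioural model of an n x n array of DPCAM modules.

    Row r belongs to the SB (write) unit of core r; column c belongs to the
    OF (read) unit of core c. Each module is modelled as a dict tag -> data.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.modules = [[{} for _ in range(n)] for _ in range(n)]

    def write(self, core: int, tag: int, data: int) -> None:
        """SB unit of `core` broadcasts (tag, data) to every module in its row."""
        for col in range(self.n):
            self.modules[core][col][tag] = data

    def read(self, core: int, tag: int):
        """OF unit of `core` searches every module in its column for `tag`."""
        for row in range(self.n):
            if tag in self.modules[row][core]:
                return self.modules[row][core][tag]
        return None  # tag not present in this cluster


cache = MPCAM(4)
cache.write(core=0, tag=0x00A1, data=42)
# Every column now holds a copy, so each core can read through its own column.
print([cache.read(core=c, tag=0x00A1) for c in range(4)])
```

    In hardware the column search happens in all modules at once; the loop in `read` only stands in for that parallel comparison.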

    4 The NCSC Interconnection Scheme

    An efficient many-core system requires an efficient and simple interconnection scheme, which is why the NCSC scheme was chosen for the MPCAM-based clusters. The NCSC uses N-conjugate shuffle combinations to connect the cores of the N system clusters. It is a simple connection method that removes most of the contention at the inter-cluster level, and it is easy to program.

    Fig. 4 shows how a bidirectional link (two buses) of the conjugate shuffle connects the OF units of two cores in two different clusters of the many-core system. The switches shown in Fig. 4 guarantee that only one core at a time can access a column of the cluster’s shared cache, regardless of which cluster that core belongs to.

    Three bits of the address or the tag are enough to control these switches. The only competition occurs when the OF unit of core i in cluster j (OFji) tries to access column j in cluster i while OFij is trying to access the same column of MPCAMi in its own cluster. The same occurs if OFij tries to fetch data from column i in MPCAMj of cluster j while OFji is trying to access the same column in its own cluster (cluster j). In this case, the request coming from within the cluster is given higher priority, and the request coming from the other cluster waits for an extra clock cycle. This also gives the core extra time to broadcast the shared variable to all modules in its row of the MPCAM.

    Note that when a core writes shared data to all modules in its row, each column of the MPCAM receives a copy of the variable. Any core and its conjugate can therefore access it regardless of which core produced it.

    Figure 4: Connecting two cores of two different clusters via the conjugate shuffle
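
    The priority rule for a column shared by a local OF unit and its conjugate can be sketched as below. The function `arbitrate` is a hypothetical helper, and cycle numbers are relative (0 is the current clock cycle); it is a behavioural illustration, not the switch logic of Fig. 4.

```python
def arbitrate(local_request: bool, remote_request: bool) -> dict:
    """Return the cycle (0 = current, 1 = next) in which each request is served.

    Models one MPCAM column shared by a local OF unit and its conjugate in
    another cluster. A request that was not issued maps to None.
    """
    local_cycle = 0 if local_request else None
    if not remote_request:
        remote_cycle = None
    elif local_request:
        remote_cycle = 1  # column busy: the remote request waits one extra cycle
    else:
        remote_cycle = 0  # column free: the remote request is served immediately
    return {"local": local_cycle, "remote": remote_cycle}


for local, remote in [(True, False), (False, True), (True, True)]:
    print(local, remote, arbitrate(local, remote))
```

    Only the last case, where both requests target the same column in the same cycle, costs the remote core an additional clock cycle.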

    Consider various scenarios in which a path is constructed from a source core to a destination, as shown in Fig. 5. In scenario 1, assume that core OF00 in cluster 0 wants to access core OF03 in the same cluster while, simultaneously, core OF23 in cluster 2 wants to read from core OF31 in cluster 3. For OF00, because the source and destination cores are in the same cluster, the communication proceeds as follows: OF00 searches the local MPCAM0 and then accesses the data of core 03 if it has already been produced.

    If OF23 does not find the data in the local MPCAM2, it recognizes that the data does not belong to this cluster and communicates directly, through the NCSC, with OF32, the conjugate of OF23, to access MPCAM3. In the next cycle, SB32 sends the data directly to SB23 to be stored in MPCAM2. The same happens in scenario 2, but with different source and destination addresses.

    In scenario 3, assume that core SB33 in cluster 3 and OF21 in cluster 2 want to access the same destination, core 13 (core 3 in cluster 1). That is, SB33 wants to write to, and OF21 wants to read from, the same destination simultaneously. Since the source and destination are in different clusters, SB33 uses SB31, which is the conjugate of SB13. By a simple mechanism, SB31 recognizes that the data does not belong to this cluster and communicates directly with SB13 through the NCSC to send the data to be stored in MPCAM1. At the same time, OF21 searches the local MPCAM2 and, if the data is not found, searches the destination cluster. Because OF21 is the conjugate of OF12, the two are directly connected through the NCSC, which gives access to MPCAM1. In the next cycle, SB12 sends the data directly to SB21 to be stored in MPCAM2. In this case, both sources can write to and read from the same destination without any delay or path deflection. Scenario 4 is another example of communication in which the source and destination cores are in different clusters.

    This simple scheme thus avoids the complexity of router devices and solves the problem of simultaneous access to the same shared memory. Furthermore, the latency (access time) of the shared data is improved, as explained in the next section.

    Figure 5: Examples of cores communication
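
    The path selection in the scenarios above can be summarized in a small Python sketch. It follows this section's (cluster, core) index order, so OF23 is core 3 of cluster 2, and it encodes one reading of the scenarios: a remote read leaves through the local core whose index equals the destination cluster and arrives at that core's conjugate. The function `read_route` and its result fields are hypothetical names introduced for illustration.

```python
def read_route(src_cluster: int, src_core: int, dst_cluster: int) -> dict:
    """Describe where an OF (read) request is served.

    Returns the MPCAM that is searched, the column used, the local core whose
    shuffle link is taken (`via`), and that core's conjugate in the remote cluster.
    """
    if src_cluster == dst_cluster:
        # Intra-cluster read: the core searches its own column of the local MPCAM.
        return {"mpcam": src_cluster, "column": src_core, "via": None, "conjugate": None}
    via = (src_cluster, dst_cluster)   # local core indexed by the destination cluster
    conj = (dst_cluster, src_cluster)  # its conjugate, E_ij <-> E_ji
    # The conjugate link lands on the column indexed by the source cluster.
    return {"mpcam": dst_cluster, "column": src_cluster, "via": via, "conjugate": conj}


print(read_route(0, 0, 0))  # scenario 1, OF00 reading inside cluster 0
print(read_route(2, 3, 3))  # scenario 1, OF23 reading from cluster 3 via OF32
print(read_route(2, 1, 1))  # scenario 3, OF21 reading from cluster 1 via OF12
```

    No routing table or router is involved: the destination cluster index alone selects the link, which is the property the scheme relies on.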

    5 Performance Analysis

    NCSC and MPCAM were implemented, compiled, and verified in a many-core system using Quartus Prime 20.1, which includes the Intel-supported ModelSim package and the Nios II Embedded Design Suite (EDS) for design and simulation [24]. NCSC was designed for the Cyclone IV-E Field Programmable Gate Array (FPGA) device family, which offers attractive features, especially in the number of input/output pins and in power consumption [25]. Sixteen cores, four in each cluster, were used to evaluate the access latency. Both schematic files and Verilog Hardware Description Language (Verilog HDL) code were used to implement the NCSC in a multi-cluster system. ModelSim and Vector Waveform Files (VWF) were used to verify and debug the design in both functional and timing simulations.

    5.1 Functional Simulation

    The test bench is written in Verilog HDL and covers all possible access scenarios between cores, whether they belong to the same cluster or to different clusters. Running the simulator several times showed that the NCSC interconnection network based on the multi-cluster architecture functions correctly in functional simulation.

    In addition to functional simulation, the test bench was also used to evaluate NCSC access latency. Both the functional simulation and the latency evaluation are organized into three scenarios. In the first scenario, different cores issue multiple simultaneous write and read operations for data in the same cluster. In the second scenario, the read and write operations are for data in different clusters. In the third scenario, simultaneous read and write operations from different clusters are made to the same core. The test bench is described in detail in Fig. 6.

    Figure 6: Test benchmark program

    Fig. 7 depicts several intervals in the functional simulation of NCSC; the clock period is set to 10 ns for reading and writing. In the first interval (0 to 10 ns), core11 in the first cluster and core22 in the second cluster broadcast their shared data with tag source (tags) (A1) and (10C1), respectively. In the second interval (10 to 20 ns), core OF31 in the first cluster reads the shared data produced by core11 in the same cluster and, simultaneously, core OF44 in the fourth cluster reads the shared data produced by core22 in another cluster (the second cluster). For OF31, because the source and destination cores are in the same cluster, the communication uses the MPCAM1 organization within the local cluster. Because OF44 does not belong to the same cluster as core22, the wanted shared data is not found in the local cluster. So, by comparing the tags, OF44 communicates directly through the NCSC using OF24, which is the conjugate to the second cluster, to read MPCAM2. In this interval, the wanted shared data appears correctly on Dout-core31 and Dout-core44. In the third interval (20 to 30 ns), cores from different clusters read and write simultaneously: core11 in the first cluster writes its shared data with tags (A7), core OF23 in the third cluster reads the shared data with tags (A1) produced in another cluster (the first cluster), and core OF44 in the fourth cluster reads the shared data with tags (A1), also produced in the first cluster. In this scenario, core11 broadcasts its shared data to MPCAM1 in cluster one, and OF23 and OF44 use OF13 and OF14, respectively, which are the conjugates of these cores to cluster one. The output appears correctly on Dout-core23 and Dout-core44. Finally, in the fourth interval (30 to 40 ns), core OF44 in the fourth cluster reads the shared data with tags (A7) produced in another cluster (the first cluster). Here, OF44 uses OF14, which is the conjugate to cluster one in the NCSC interconnection network. The output appears correctly on the output bus Dout-core44.

    Figure 7: Function simulation for a read operation

    Fig. 8 shows several intervals of write operations over NCSC; a new DI-core pin was added to monitor the write operation. In the first interval (0 to 10 ns), core21 in the first cluster, core42 in the second cluster, and core24 in the fourth cluster write their shared data with tags (0211), (1422), and (3074), respectively, each to its own MPCAM. The written data are stored correctly and displayed on DI-core21, DI-core42, and DI-core24. In the second interval (10 to 20 ns), core21, core42, and core24 write shared data to their MPCAMs simultaneously, each using its own tags. The written data appear correctly on DI-core21, Dout-core42, and Dout-core24.

    Figure 8: Function simulation for a write operation

    In the functional simulation, the written or read data appear on the DI-core and Dout-core pins without accounting for the delay introduced by the design components.

    5.2 Latency Assessments

    The Timing Analyzer tool is used to evaluate the read/write latency of the MPCAM and the NCSC. In this section, all scenarios presented in the functional simulation are assessed in timing simulation. The timing simulations for read/write operations of the proposed system are shown in Figs. 9–12.

    Fig. 9 shows the timing simulation for the MPCAM. In the first interval (0 to 10 ns), all cores simultaneously broadcast their shared data with their own tags. The written data (pin DI-core) appears on all modules in the row after a short delay once the WR signal goes low. After running the test bench on the simulator one hundred times, the average write access latency of the MPCAM organization was found to be around 1.084115 ± 0.03384 ns. The second interval (10 to 20 ns) shows the latency assessment for a read access. To read data that is already written in all MPCAM modules, the tag destination (Tagd) values (1A, 2A, 3A, and 4A), provided by core1, core2, core3, and core4, respectively, are simultaneously compared to the tags in all MPCAM modules in the same column. The results appear on the output buses (Dout-core) after the delay time. Averaged over roughly one hundred test bench intervals, the delay for a read operation is around 1.27804 ± 0.086823 ns. The latency assessments for concurrent read and write operations are shown in the third (20 to 30 ns) and fourth (40 to 50 ns) periods, each of which was performed around 100 times. When many different memory locations are read and written simultaneously, the latency is 1.1909 ± 0.02363 ns, which is almost the same as the latency of separate read operations. On the other hand, when write and read operations are performed simultaneously on the same memory location, the data is written to the destination location with a latency of 1.3105 ± 0.091955 ns, while the read waits for the following interval and the shared data is then read with a total latency of 1.2780 ± 0.086823 ns plus the time of the interval. All these measurements were verified using the Statistical Package for the Social Sciences (SPSS) and t-tested with a 95% confidence interval.

    Fig. 10 shows two read/write scenarios over the MPCAM. In the first interval (0 to 10 ns), core 1 broadcasts shared data with tags (0078) to all DPCAM modules in its row. The written data (pin DI-core) appears after a delay of 1.084115 ± 0.03384 ns once the WR signal goes low, which is almost the same as the latency of separate write operations. In the second interval (10 to 20 ns), all cores simultaneously read the shared data produced by core 1 in the first interval. The results appear on the Dout-core pins with a delay of 1.27804 ± 0.086823 ns, which is almost the same as the latency of separate read operations. The timing simulation shows that write and read operations to the shared cache take no more than about 1.3 ns, compared to 5.3 ns for the L2 cache and 19.5 ns for the L3 cache in the Intel Nehalem i7, and 12 ns for the L2 cache and 21 ns for the L3 cache in AMD’s Bulldozer family [26].
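
    To put the quoted figures side by side, the following Python snippet computes how many times longer each reference cache latency is than the roughly 1.3 ns bound reported for the MPCAM. It is plain arithmetic on the numbers quoted above; the dictionary keys are labels chosen for this sketch, and the comparison inherits the caveat that an FPGA timing simulation is being set against latencies of commercial processors.

```python
# Upper bound quoted above for MPCAM read/write operations, in nanoseconds.
mpcam_ns = 1.3

# Reference cache latencies quoted above from [26], in nanoseconds.
reference_ns = {
    "Nehalem i7 L2": 5.3,
    "Nehalem i7 L3": 19.5,
    "Bulldozer L2": 12.0,
    "Bulldozer L3": 21.0,
}

# Ratio of each reference latency to the MPCAM bound, rounded to one decimal.
ratios = {name: round(latency / mpcam_ns, 1) for name, latency in reference_ns.items()}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio}x the MPCAM latency")
```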

    Figure 9: MPCAM organization Timing Simulation 1

    Figure 10: MPCAM organization Timing Simulation 2

    Figure 11: NCSC organization Timing Simulation for a read operation

    Figure 12: NCSC organization Timing Simulation for a write operation

    This simulation demonstrates that all cores in the MPCAM organization can access shared data simultaneously without contention, blocking, or arbitration issues.

    Fig. 11 shows the timing simulation of the proposed NCSC over several intervals. In the first interval (0 to 10 ns), core11 and core22 write their shared data with tags (00A1) and (10C1), respectively. In the second interval (10 to 20 ns), core OF31 reads the shared data with tags (00A1) from the same cluster and, simultaneously, core OF44 reads the shared data with tags (10C1) from another cluster. In this interval, the read latency of the two cases differs. To read data already written in the same cluster, the Tagd (00A1) provided by core31 is simultaneously compared to the tags in the MPCAM column within cluster one. The results appear on the output bus (Dout-core31) after some delay. Averaged over roughly one hundred test bench intervals, the read delay within the same cluster is around 1.26226 ± 0.090591 ns. On the other hand, to read data already written in another cluster, as when core44 reads data produced by core22, OF44 communicates directly through the NCSC using OF24, which is the conjugate to the second cluster, to read MPCAM2. The results appear on the output bus (Dout-core44) after some delay. Averaged over one hundred test bench intervals, the delay for read access between cores that belong to different clusters is around 1.92738 ± 0.139588 ns. In the third interval (20 to 30 ns), core11 in the first cluster writes its shared data with tags (00A7), core23 in the third cluster reads the shared data with tags (00A1) produced by core11, and core OF44 simultaneously reads the shared data with tags (00A1). In this interval, core11 broadcasts its shared data to MPCAM1 in cluster one, and OF23 and OF44 connect to core11 in the first cluster through the NCSC interconnection network using OF13 and OF14, respectively, which are the conjugates of these cores to cluster one. The results appear on the output bus (Dout-core23) after some delay. The delay for read access between cores in different clusters is around 1.92738 ± 0.139588 ns, which is nearly equal to the read latency observed on Dout-core44. The fourth interval (30 to 40 ns) shows the same behavior as the third interval (20 to 30 ns).

    Fig. 12 shows two intervals used to assess the write latency over NCSC. In the first interval (0 to 10 ns), core21, core42, and core24 write their shared data with tags (0211), (1422), and (3074), respectively, each to its own MPCAM. The written data are stored and appear on the DI-core21, DI-core42, and DI-core24 pins after an average delay of 1.14785 ± 0.04532 ns, which is almost the same as the latency of separate write operations in the MPCAM organization. In the second interval (10 to 20 ns), core21, core42, and core24 write shared data to their MPCAMs simultaneously with an average delay of 1.15235 ± 0.06132 ns, which is nearly identical to the latency in the previous interval.

    5.3 Area and Power Consumption Analysis

    Cache memories and routers on any NoC topology are the dominant factors in area and power consumption.Unfortunately,NoC consumes a lot of power,increasing the chip’s total power consumption.Research has already verified that NoC consumes about 40%of the chip’s power without counting the cache power consumption [4].On the other hand,other studies revealed that routers’structure in mesh or tour topology consumes about 28% of the total power and 17% of the total area in the Intel chip[17],and also adds additional latency due to increased hardware structures that negatively affect the performance.Therefore,since routers are the most power-consuming components of the interconnect network,this paper proposed the NCSC interconnection scheme that eleminates the need for router devices and enhances both area and power overheads.Furthermore,NoC with appropriate power consumption plays a leading role in increasing scalability in many-core systems.

    In future work, a power analysis simulator, as was used for this purpose with MPCAM [1,3], can be used to estimate the power and area overhead of NCSC.

    6 Conclusion

    In on-chip many-core systems, NoC topologies have been considered the best option; however, they have disadvantages such as high scalability cost, power consumption, latency, and contention during access to the shared cache, which are mostly due to the use of router-based structures. Therefore, researchers try to find router-less alternatives to these topologies. In this paper, a scalable topology for many-core processor systems called NCSC was presented. The proposed topology offers high scalability, fixed latency at the intra-cluster and inter-cluster levels, and the elimination of routers and arbiters, which solves the problem of simultaneous access to the shared cache.

    NCSC has been implemented using the Cyclone IV-E FPGA device family. After running the test bench program several times, it was found that the main functions of the NCSC organization (reading, writing, and simultaneous read-write) are accomplished inside and between clusters. The latency of reading and writing by multiple cores within a cluster and between clusters has been assessed. NCSC provides non-blocking access between cores: the average latency for write access within the same cluster is around 1.14785±0.04532 ns, which is almost the same as a separate write operation in MPCAM; the average read latency within the same cluster is 1.26226±0.090591 ns; and the latency for read access between cores from different clusters is around 1.92738±0.139588 ns. On the other hand, the simulation results show that competition happens only if an access from within the cluster and a request from another cluster target the same column in MPCAM simultaneously. In this case, the access coming from the same cluster is given higher priority, whereas the request coming from another cluster has to wait until the next clock cycle. The read latency then becomes 1.92738±0.139588 ns plus the delay from the competition.
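
    The priority rule described above can be sketched as a small function. This is an illustrative model under stated assumptions, not the implemented hardware logic: the function name `arbitrate`, the request tuples, and the `clock_period_ns` parameter are hypothetical, and the clock period is left as a parameter because the text does not give it.

```python
INTER_READ_NS = 1.92738  # mean inter-cluster read latency reported in the text


def arbitrate(requests, clock_period_ns):
    """Resolve simultaneous requests to the same MPCAM column.

    requests: list of (core_name, is_local) tuples, where is_local is True
    when the request comes from the cluster that owns the MPCAM.
    Returns a dict mapping core_name to its extra wait in ns.
    """
    has_local = any(is_local for _, is_local in requests)
    waits = {}
    for core, is_local in requests:
        # A remote request is deferred by one clock cycle only when a local
        # request targets the same column at the same time.
        deferred = has_local and not is_local
        waits[core] = clock_period_ns if deferred else 0.0
    return waits


# The 10.0 ns clock period is an assumed placeholder, not a value from the paper.
waits = arbitrate([("core31", True), ("core44", False)], clock_period_ns=10.0)
remote_read_ns = INTER_READ_NS + waits["core44"]
```

    With no competing local request, the remote read keeps its usual latency, since `arbitrate` returns a zero wait for it.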

    The realization of the MPCAM-based multi-core cluster and the conjugate shuffle network in many-core systems opens the door to massively parallel processing on a chip and simplifies the work of chip designers and programmers.

    In future work, more research can be conducted on the NCSC topology. Other crucial dynamic performance metrics such as throughput, latency overhead, area, and power consumption can be evaluated. In addition, all components of this topology, including core interfaces, can be built to verify the validity of NCSC within a many-core system. After that, it can be implemented in Verilog, then verified and synthesized using design tools.

    Acknowledgement: We would like to thank Prof. Adil Amirjanov for his valuable advice and continuous support. We would also like to acknowledge the advice and support from the Department of Computer System Engineering, Arab American University.

    Funding Statement: The authors received no specific funding for this study.

    Author Contributions: The authors confirm their contribution to the paper as follows: study conception and related work: Allam Abumwais and Mujahed Eleyat; data analysis: Allam Abumwais; the components of the proposed system: Allam Abumwais; performance analysis: Allam Abumwais and Mujahed Eleyat; draft manuscript preparation: Mujahed Eleyat. All authors reviewed the results and approved the final version of the manuscript.

    Availability of Data and Materials: Available upon request.

    Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
