
    A Scalable Interconnection Scheme in Many-Core Systems

    2023-12-12 15:50:20 Allam Abumwais and Mujahed Eleyat
    Computers, Materials & Continua, 2023, Issue 10

    Allam Abumwais and Mujahed Eleyat

    Computer Systems Engineering, Arab American University, Jenin, 240, Palestine

    ABSTRACT Recent multi-core architectures may contain a relatively large number of cores, typically ranging from tens to hundreds, and are therefore called many-core systems. Such systems require an efficient interconnection network that addresses two major problems. First, the power and area overhead and its effect on scalability. Second, the high access latency caused by multiple cores simultaneously accessing the same shared module. This paper presents an interconnection scheme called N-conjugate Shuffle Clusters (NCSC), based on a multi-core multi-cluster architecture, to reduce the overhead of these two problems. NCSC eliminates the need for router devices and their complexity and hence reduces the power and area costs. It also redesigns the shared caches and distributes them across the interconnection network to increase the capacity for simultaneous access and hence reduce the access latency. For intra-cluster communication, Multi-port Content Addressable Memory (MPCAM) is used. The experimental results, using four clusters of four cores each, indicate that the average access latency for a write operation is 1.14785±0.04532 ns, which is nearly equal to the latency of a write operation in MPCAM. Moreover, the average read latency within a cluster is 1.26226±0.090591 ns, and around 1.92738±0.139588 ns for read access between cores from different clusters.

    KEYWORDS Many-core; multi-core; N-conjugate shuffle; multi-port content addressable memory; interconnection network

    1 Introduction

    In multiprocessor systems, and later in multi-core processors, the processors (the cores) of the system compete for access to the shared resources, mainly the interconnection network and the shared memory. They compete both while executing the main program instructions and while executing the cache coherence protocol instructions. This competition leads to contention, arbitration, and bottlenecks, and hence to time delays. As a result, a longer program execution time is expected. In previous papers, the authors presented an organization that rids the system of these problems [1–3].

    During the past decade, processor designers started expanding multi-core systems into many-core systems, which contain a large number of cores, from tens to hundreds [4–6], and they have developed several many-core systems over the last few years [7–9]. Because the interconnection network architecture is at the core of such systems, various architectures have been proposed in the literature. The router-based Network on Chip (NoC) architecture has become the preferred solution to the long-standing challenges of traditional bus interconnection networks because of features such as scalability and effective bandwidth. The NoC, which uses routers to provide multiple paths between cores to enhance throughput and scalability, has two schemes: buffered and buffer-less. Recently, most research groups have worked to improve the performance of these schemes by producing a hybrid NoC that merges buffered and buffer-less NoC to handle distributed resources in many-core systems and to reduce contention and power consumption, and so provide a high-efficiency NoC [10,11].

    Every many-core interconnection network that uses a router-based NoC pays a heavy penalty in area and power consumption because of the complex router structure. Research has revealed that the router structure in a mesh or torus topology consumes about 28% of the total power and 17% of the total area in the Intel chip [12]. In addition, the router structure adds latency because the additional hardware negatively affects performance, which in turn reduces scalability. Furthermore, a bottleneck remains when more than one core simultaneously accesses the same shared cache module, because this increases the access latency and hence degrades system performance.

    The main contribution of this paper is the design of an effective network topology that reduces the penalties that arise in a large-scale system. The goal is to present a router-less, highly effective interconnection network topology, based on a multi-cluster architecture, that is capable of improving performance metrics such as scalability, size, bandwidth, complexity, and latency. It addresses the scalability problem caused by the router structure and reduces the shared cache access latency.

    This paper presents an interconnection scheme called NCSC that connects N multi-core clusters, each of which consists of n cores, where n ≥ N. The authors assume that n = N, i.e., the proposed system includes N² cores. The design consists of two parts: the MPCAM organization for communication between cores within the same cluster, and the conjugate shuffle interconnection for inter-cluster communication. NCSC eliminates router devices from both parts, which increases the system scalability. In addition, the shared cache organization is redesigned and distributed across the interconnection network to increase the capacity for simultaneous access and hence reduce the access latency.

    The remainder of this paper is organized as follows: Section 2 briefly reviews the literature on many-core interconnection networks. Section 3 explains the main components of the proposed interconnection network within a cluster and between clusters. Section 4 discusses the implementation of the NCSC in Field Programmable Gate Arrays (FPGA), while Section 5 presents the simulation of the main parts and the latency estimation analysis in different scenarios. Finally, Section 6 draws some conclusions from this study and suggests future work.

    2 Related Works

    Because of the problems of router-based NoC, researchers have worked to reduce the number of routers and eventually to produce a router-less NoC. Awal et al. [13] proposed a combination of 2-D meshes on a multilayer NoC. It reduces the number of routers needed in the network by relying on a multilayer chip. This network has several attractive metrics, such as network cost and constant degree. Compared to other interconnection networks, it has moderate latency, fault-tolerant structure, and link count. Despite these attractive metrics, the design is difficult to scale because scaling requires additional chip layers, which makes it infeasible in current technology. Moreover, this architecture does not take into account the connection of the cores to shared memory.

    Li et al. [14] proposed the Nesting Ring NoC (NRO). This topology consists of a set of clusters, each with a fixed size of four cores. NRO achieves attractive performance and latency, but it has an obvious scalability problem, specifically as the number of cores increases to hundreds or more. This increases the network diameter, causing higher delays and increased traffic along inter-cluster paths. In [15], the authors try to address these problems using a large-scale NRO interconnection network that modifies NRO in two ways. First, it introduces new links between cores and clusters to reduce the network diameter. Second, it exploits the advantages of multilevel chips to combine large cores on each cluster. However, this work does not discuss the shared memory issue within and between clusters.

    Udipi et al. [6] designed a new interconnection structure that eliminates the routers between segments. This architecture is based on a shorted bus and a segmented bus. The main idea is to divide the chip into segments of cores, with a shorted bus interconnection in each segment. Each sub-segment bus is connected to a central bus that is directly connected to the manager core. A simple control unit, called “Filter”, is implemented on each central bus to allow data to be transmitted between segments. This design has a scalability problem because a large number of cores requires large links and complex control units, especially to preserve consistency for the shared cache memory.

    Liu et al. [16] proposed a new architecture called Isolated Multi-Ring (IMR) that tries to connect up to 1024 cores with a multi-ring topology. In IMR, any two cores can be connected via one or more isolated rings, so each packet can reach its destination directly on the same ring. This eliminates the need for complex routers and so improves performance and reduces hardware costs. IMR improves throughput and latency but still has issues such as a large number of rings and a large number of buffers at the interfaces.

    Alazemi et al. [17] proposed a novel router-less architecture that exploits the bus resources to achieve the shortest path and solve the scalability problem. As new technologies scale chips to smaller dimensions, they support a larger number of metal layers for integration. With this trend, the increasing number of layers can be exploited for routing. For example, the Intel Xeon Phi series is designed with 13 metal layers [18]. The simulation results show that this architecture achieves a significant advantage in latency, performance, and power consumption. Although this idea is promising, the specific architecture has several issues and deals with an ideal many-core architecture without considering issues related to shared caches.

    The authors in [19,20] proposed a new machine learning approach based on deep reinforcement learning to decide on the ideal loop placement for router-less NoCs under different design constraints. The new approach successfully solves issues with the old design, but it still uses many interconnection loops, which increases power consumption.

    Cache coherence protocols cause a larger delay because each core must notify the other cores of any change it makes to a shared variable. The authors in [3] suggested a new coherency approach in the MPCAM that guarantees cache coherence for all shared variables across the cores. With this method, cache coherence operations are no longer needed, and the delay of accessing the shared cache becomes the same as that of accessing the core’s private memory.

    This literature informs the research topic of router-less interconnection networks in many-core architectures. The main goal of this article is to propose an effective interconnection network that improves many-core performance by exploiting MPCAM shared memory and router-less techniques.

    3 The Component of the Proposed Scheme

    During the early stages of the NCSC design, it was decided not to use router devices because they add burden and complexity to the network architecture. NCSC uses N-shuffle stages connected by N crossbar network-based clusters, each consisting of N processors. N-shuffle stages have been used before in many interconnection networks [21–23]. This scheme has been modified to an N-conjugate shuffle to meet the design needs of a multi-cluster connection, where the conjugate core (manager core) in each cluster is responsible for inter-cluster communication. Combining the N-conjugate shuffle with the MPCAM, presented in [2] as a shared cache for each cluster, yields a many-core system with the following features:

    A. Within a single cluster (the multi-core system), communication between cores (intra-cluster communication) is accomplished using the shared cache (the MPCAM), with an access time equal to that of the local (private) cache.

    B. In inter-cluster communication, a core can access the shared cache of any other cluster with an access time equal to the time of one or two local cache accesses. The length of the access time (one or two clock cycles) depends on whether a local core is requesting that shared cache at the same time.

    C. In the whole system, each item of shared data, in whichever shared cache it exists, has a unique tag. This tag can be used equally by any core in any cluster of the system to access its data. The tag includes the variable identity (which can be address + version number) in addition to two bits. The first bit indicates whether the data is local or shared, and the second indicates whether the shared data exists in the cluster’s shared memory (MPCAM) or in the shared memory of another cluster. This means that the shared cache address space is homogeneous for all cores of the system.
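
    The tag layout in feature C can be sketched as follows. This is a minimal Python sketch, not the authors' implementation: the class name `SharedTag`, the 14-bit identity width, and the positions of the two flag bits are assumptions made for illustration, since the text only states that a tag holds the variable identity plus two bits.

```python
from dataclasses import dataclass

IDENTITY_BITS = 14  # assumed field width; the paper does not specify it


@dataclass(frozen=True)
class SharedTag:
    identity: int  # variable identity (e.g. address + version number)
    shared: bool   # first flag bit: False = local data, True = shared data
    remote: bool   # second flag bit: False = this cluster's MPCAM,
                   # True = the shared memory of another cluster

    def pack(self) -> int:
        """Pack the two flag bits above the identity field (assumed layout)."""
        if not 0 <= self.identity < (1 << IDENTITY_BITS):
            raise ValueError("identity does not fit in the assumed field width")
        flags = (int(self.shared) << 1) | int(self.remote)
        return (flags << IDENTITY_BITS) | self.identity

    @staticmethod
    def unpack(word: int) -> "SharedTag":
        """Recover the three fields from a packed tag word."""
        identity = word & ((1 << IDENTITY_BITS) - 1)
        flags = word >> IDENTITY_BITS
        return SharedTag(identity, bool(flags & 0b10), bool(flags & 0b01))

    def location(self) -> str:
        """Name the memory that should be searched for this tag."""
        if not self.shared:
            return "private cache"
        return "remote MPCAM" if self.remote else "local MPCAM"


tag = SharedTag(identity=0x00A1, shared=True, remote=True)
print(tag.location(), SharedTag.unpack(tag.pack()) == tag)
```

    Because both flag bits travel with the tag, any core can decide where to search from the tag alone, which is one way to read the claim that the address space is homogeneous.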

    3.1 The N-Conjugate Shuffle

    The N-conjugate shuffle was chosen for the proposed system because it connects each conjugate core to another one in the multi-cluster system in a simple way and with a reduced link cost. To better understand this scheme, it is necessary to explain the structure and function of the N-conjugate shuffle. It is a passive N-shuffle (no silicon devices are involved) that connects each element to its conjugate, e.g., it connects the element Eij to the element Eji and vice versa. Fig. 1a shows a 4-conjugate shuffle.

    It can easily be noted that the buses coming out of the same cluster do not cross each other. By aligning the opposite buses with these buses, a group of buses that do not cross each other is obtained. As can be seen in Fig. 1b, the 12 buses can be put in two groups, i.e., they can be accommodated in two layers of the chip.

    Figure 1: (a) The 4-conjugate shuffle. (b) The equivalent bidirectional connection 4-conjugate shuffle
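
    The conjugate mapping can be enumerated in a few lines. The sketch below is illustrative only: the function names are hypothetical, and it assumes that a diagonal element Eii, being its own conjugate, needs no bus, which is consistent with the 12 buses counted for the 4-conjugate shuffle. It does not model how the buses are assigned to chip layers.

```python
def conjugate(element):
    """Return the conjugate of element E_ij, i.e. E_ji."""
    i, j = element
    return (j, i)


def shuffle_buses(n):
    """List every one-way bus of an n-conjugate shuffle as (source, destination)."""
    return [((i, j), conjugate((i, j)))
            for i in range(n) for j in range(n)
            if i != j]  # assumption: E_ii is its own conjugate and needs no bus


buses = shuffle_buses(4)
links = {frozenset(bus) for bus in buses}  # merge the two directions of each pair
print(len(buses), len(links))  # 12 one-way buses, 6 bidirectional links
```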

    3.2 The MPCAM Organization

    The MPCAM organization and the MPCAM-based multi-core system were presented in [1,2]. As shown in Fig. 2, the MPCAM is organized as an array of Dual Port Content Addressable Memory (DPCAM) modules distributed and embedded at the cross points of the multi-core interconnection. The two ports of a DPCAM module allow concurrent read and write operations as long as they do not access the same memory location. Through the input port, the Store Back (SB) unit of the core pipeline writes the data and its tag to the least recently written memory line. Through the output port, the Operand Fetch (OF) unit of the core pipeline applies the tag of the required data so that the DPCAM searches for the data in all memory lines simultaneously and reads it if found. In the MPCAM organization, DPCAM modules are connected to the horizontal buses for the SB units of the cores, each in its row, and to the vertical buses for the OF units of the cores, each in its column, allowing any core to access the shared cache simultaneously without blocking. This also reduces the access latency, which becomes equal to that of the local cache.

    Figure 2: The organization of the MPCAM
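
    The behavior of a single DPCAM module described above can be modeled in software as a small sketch. The class below is a behavioral illustration under stated assumptions, not the hardware design: the class name, the default of four memory lines, and the method names are hypothetical, and the sequential loop in `read` stands in for the parallel tag comparison that the hardware performs.

```python
class DPCAM:
    """Behavioral model of one dual-port content addressable memory module."""

    def __init__(self, lines=4):  # the line count is an assumed parameter
        self.tags = [None] * lines
        self.data = [None] * lines
        self.next_line = 0  # index of the least recently written line

    def write(self, tag, value):
        """Input port: store data and tag in the least recently written line."""
        self.tags[self.next_line] = tag
        self.data[self.next_line] = value
        # After a write, the following line is now the oldest one.
        self.next_line = (self.next_line + 1) % len(self.tags)

    def read(self, tag):
        """Output port: compare the tag against every line, return data if found."""
        for stored_tag, value in zip(self.tags, self.data):
            if stored_tag is not None and stored_tag == tag:
                return value
        return None  # miss


cam = DPCAM()
cam.write(0x0078, "shared value")
print(cam.read(0x0078), cam.read(0x0001))
```

    Always writing to the least recently written line makes the module behave like a circular buffer, so a single pointer is enough to pick the next line.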

    3.3 The MPCAM-Based Multi-Core System

    In the MPCAM-based multi-core system, if n cores are used, the MPCAM must have n horizontal buses, n vertical buses, and n² DPCAM modules: n modules in each row and n modules in each column. The SB units of the core pipelines are connected to the horizontal buses, and the OF units are connected to the vertical buses. Through its horizontal bus, an SB unit can write the data and its tag to all DPCAMs in the row. This means that each column holds a copy of the data, so the OF unit of any core can search for and read the data through the vertical bus of its column. Fig. 3 shows an MPCAM-based multi-core organization. This organization achieves simultaneous access to the shared cache and eliminates the need for router devices between cores inside the same multi-core cluster. This result was fully presented and explored in [3].

    Figure 3: The MPCAM-based multi-core cluster
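
    The row-write and column-read pattern can be sketched as below. This is a simplified illustration, not the authors' design: the class and method names are hypothetical, and each DPCAM module is reduced to a Python dictionary, so line replacement inside a module is not modeled.

```python
class MPCAM:
    """n x n grid of modules: core i writes along row i, core j reads down column j."""

    def __init__(self, n):
        self.n = n
        # modules[row][col] stands in for one DPCAM (simplified to a dict).
        self.modules = [[{} for _ in range(n)] for _ in range(n)]

    def store_back(self, core, tag, value):
        """SB unit of `core` broadcasts the tag and data to every module in its row."""
        for col in range(self.n):
            self.modules[core][col][tag] = value

    def operand_fetch(self, core, tag):
        """OF unit of `core` searches every module in its own column for the tag."""
        for row in range(self.n):
            if tag in self.modules[row][core]:
                return self.modules[row][core][tag]
        return None  # tag not present in this cluster


cluster = MPCAM(4)
cluster.store_back(core=1, tag=0x0078, value="produced by core 1")
print([cluster.operand_fetch(core, 0x0078) for core in range(4)])
```

    Since the write lands in one module of every column, each core finds the data in its own column and no two readers share a bus, which is the property the text relies on for non-blocking access.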

    4 The NCSC Interconnection Scheme

    An efficient many-core system can be created if an efficient and simple interconnection scheme is provided, which is why the NCSC scheme was chosen for the MPCAM-based clusters. The NCSC uses N-conjugate shuffle combinations to connect the cores of the N system clusters. It is a simple connection method that removes most of the system contention at the inter-cluster level, and it is easy to program.

    Fig. 4 shows how a bidirectional link (two buses) of the conjugate shuffle connects the OF units of two cores in two different clusters of the many-core system. The switches shown in Fig. 4 guarantee that only one core at a time can access a column of the cluster’s shared cache, regardless of which cluster this core belongs to.

    Three bits of the address or the tag are enough to control these switches. The only competition occurs when the OF unit of core i in cluster j (OFji) tries to access column j in cluster i while OFij is trying to access the same column of MPCAMi in its own cluster. The same occurs if OFij tries to fetch data from column i in MPCAMj of cluster j while OFji is trying to access the same column in its own cluster (cluster j). In this case, the request coming from within the cluster is given higher priority, and the request coming from the other cluster has to wait for an extra clock cycle. This extra cycle also gives the core time to broadcast the shared variable to all modules in its row of the MPCAM.

    It should be noted that, as the core writes the shared data to all modules in its row, each column of the MPCAM holds a copy of this variable. So, any core and its conjugate can access it regardless of which core produced it.
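
    The priority rule for a contended column can be written as a small decision function. This is a sketch of the rule as stated in the text, with a hypothetical function name and return format; it does not model the switches or the three control bits.

```python
def arbitrate_column(local_request, remote_request):
    """Return the extra wait, in clock cycles, for each requester of one MPCAM column.

    The local core always proceeds immediately. The conjugate core from
    the other cluster waits one extra cycle only when the local core is
    requesting the same column in the same cycle.
    """
    waits = {}
    if local_request:
        waits["local"] = 0
    if remote_request:
        waits["remote"] = 1 if local_request else 0
    return waits


print(arbitrate_column(local_request=True, remote_request=True))
print(arbitrate_column(local_request=False, remote_request=True))
```

    This matches the earlier statement that an inter-cluster access takes one or two clock cycles depending on whether a local request is present.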

    Figure 4: Connecting two cores of two different clusters via the conjugate shuffle

    Let us consider various scenarios in which a path is to be constructed from a source core to a destination, as shown in Fig. 5. In scenario 1, assume that core OF00 in cluster 0 wants to access core OF03 in the same cluster while, simultaneously, core OF23 in cluster 2 wants to read from core OF31 in cluster 3. Because the source and destination cores of the first access are in the same cluster, the communication proceeds as follows: OF00 searches the local MPCAM0 and then accesses the data of core 03 if it has already been produced.

    If OF23 does not find the data in the local MPCAM2, it recognizes that the data does not belong to this cluster and communicates directly, through the NCSC, with OF32, which is the conjugate of OF23, to access the internal MPCAM3. In the next cycle, SB32 sends the data directly to SB23 to be stored on MPCAM2. The same happens in scenario 2 but with different source and destination addresses.

    In scenario 3, assume that core SB33 in cluster 3 and OF21 in cluster 2 want to access the same destination, core 13 (core 3 in cluster 1). This means that SB33 wants to write to, and OF21 wants to read from, the same destination simultaneously. Since the source and destination are located in different clusters, SB33 uses SB31, which is the conjugate of SB13. By a simple mechanism, SB31 recognizes that the data does not belong to this cluster and communicates directly with SB13 through the NCSC to send the data to be stored internally on MPCAM1. At the same time, OF21 searches the local MPCAM2 and, if the data is not found, searches in the destination cluster. Because OF21 is the conjugate of OF12, they are directly connected through the NCSC to access the internal MPCAM1. In the next cycle, SB12 sends the data directly to SB21 to be stored on MPCAM2. In this case, both sources can write to and read from the same destination without any delay or path deflection. Scenario 4 is another example of communication where the source and destination cores are in different clusters.
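
    The path selection in these scenarios follows one rule: a request for cluster d that originates in cluster s travels over the link between core d of cluster s and its conjugate, core s of cluster d. The sketch below illustrates that rule with a hypothetical function name, using the cluster-first, core-second indexing of the scenarios above; it returns labels only and does not model timing.

```python
def inter_cluster_path(src_cluster, dst_cluster):
    """Return the hops a request takes to reach the shared cache of dst_cluster."""
    if src_cluster == dst_cluster:
        # Intra-cluster access goes straight to the local MPCAM.
        return [f"MPCAM{src_cluster}"]
    gateway = (src_cluster, dst_cluster)  # core in the source cluster that owns the link
    peer = (dst_cluster, src_cluster)     # its conjugate in the destination cluster
    return [
        f"core{gateway[0]}{gateway[1]}",
        f"core{peer[0]}{peer[1]}",
        f"MPCAM{dst_cluster}",
    ]


print(inter_cluster_path(0, 0))  # scenario 1, local access in cluster 0
print(inter_cluster_path(2, 3))  # scenario 1, read from cluster 2 into cluster 3
print(inter_cluster_path(3, 1))  # scenario 3, write from cluster 3 into cluster 1
```

    Every inter-cluster path has the same fixed length, which is why no routing table or router is needed.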

    This simple scheme thus accomplishes the goals of avoiding the complexity of router devices and solving the problem of simultaneous access to the same shared memory. Further, the latency (access time) of the shared data is improved, as will be explained in the next section.

    Figure 5: Examples of cores communication

    5 Performance Analysis

    NCSC and MPCAM have been implemented, compiled, and verified in a many-core system using Quartus Prime 20.1, which includes the Intel-supported ModelSim package and the Nios II Embedded Design Suite (EDS) for design and simulation [24]. NCSC was designed using the Cyclone IV-E Field Programmable Gate Array (FPGA) device family, which has attractive features, especially in the number of input/output pins and in power consumption [25]. Sixteen cores, with four cores in each cluster, were used to evaluate the access latency. Both schematic files and Verilog Hardware Description Language (Verilog HDL) code were used to implement the NCSC in a multi-cluster system. ModelSim and Vector Waveform Files (VWF) were used to verify and debug the files in both functional and timing simulations.

    5.1 Functional Simulation

    The test bench is written in Verilog HDL and covers all possible access scenarios between cores, whether they belong to the same cluster or to different clusters. Running the simulator several times shows that the NCSC interconnection network based on the multi-cluster architecture functions correctly in functional simulation.

    In addition to functional simulation, the test bench has also been used to evaluate the NCSC access latency. Both the functional simulation and the access latency evaluation are classified into three scenarios. In the first scenario, different cores issue multiple simultaneous write and read operations for data in the same cluster. In the second scenario, the read and write operations are for data in different clusters. In the third scenario, simultaneous read and write operations from different clusters are made to the same core. The test bench is described in detail in Fig. 6.

    Figure 6: Test benchmark program

    Fig. 7 depicts several intervals in the functional simulation of NCSC; the clock period is set to 10 ns for reading and writing. In the first interval (0 to 10 ns), both core11 in the first cluster and core22 in the second cluster broadcast their shared data with tag source (Tags) (A1) and (10C1), respectively. In the second interval (10 to 20 ns), core OF31 in the first cluster reads the shared data produced by core11 in the same cluster and, simultaneously, core OF44 in the fourth cluster reads the shared data produced by core22 in another cluster (the second cluster). For OF31, because the source and destination cores are in the same cluster, the communication uses the MPCAM1 organization within the local cluster. Because OF44 does not belong to the same cluster as core22, the wanted shared data will not be found in the local cluster. So, by comparing the tags, OF44 communicates directly, through the NCSC, using OF24, which is the conjugate to the second cluster, to read the internal MPCAM2. In this interval, the wanted shared data appear correctly on Dout-core31 and Dout-core44. In the third interval (20 to 30 ns), various cores from different clusters read and write simultaneously: core11 in the first cluster writes its shared data with tags (A7), core OF23 in the third cluster reads the shared data with tags (A1) produced by another cluster (the first cluster), and core OF44 in the fourth cluster reads the shared data with tags (A1) produced by a different cluster (the first cluster). In this scenario, core11 broadcasts its shared data to MPCAM1 in cluster one, and OF23 and OF44 use OF13 and OF14, respectively, which are the conjugates of these cores to cluster one. The output appears correctly on Dout-core23 and Dout-core44. Finally, in the fourth interval (30 to 40 ns), core OF44 in the fourth cluster reads the shared data with tags (A7) produced by another cluster (the first cluster). Here, OF44 uses OF14, which is the conjugate to cluster one in the NCSC interconnection network. The output appears correctly on the output bus Dout-core44.

    Figure 7: Functional simulation for a read operation

    Fig. 8 shows several intervals of a write operation over NCSC; a new DI-core pin was added to monitor the write operation. In the first interval (0 to 10 ns), core21 in the first cluster, core42 in the second cluster, and core24 in the fourth cluster write their shared data with tags (0211), (1422), and (3074), respectively, each to its own MPCAM. It can be observed that the written data are stored correctly and displayed on DI-core21, DI-core42, and DI-core24. In the second interval (10 to 20 ns), core21, core42, and core24 write shared data to their MPCAMs simultaneously, each using its own tags. The written data appear correctly on DI-core21, Dout-core42, and Dout-core24.

    Figure 8: Functional simulation for a write operation

    In the functional simulation, the written or read data appear on the DI-core and Dout-core pins without accounting for the delay introduced by the design components.

    5.2 Latency Assessments

    The Timing Analyzer tool is used to evaluate the read/write latency of the MPCAM and NCSC. In this section, all scenarios presented in the functional simulation are assessed in timing simulation. The timing simulation for read/write operations of the proposed system is shown in Figs. 9–12.

    Fig. 9 shows the timing simulation for MPCAM. In the first interval (0 to 10 ns), all cores simultaneously broadcast their shared data with their own Tags. The written data (pin DI-core) appear on all modules in the row after a short delay once the WR signal goes low. After running the test bench on the simulator one hundred times, the average write latency on the MPCAM organization was found to be around 1.084115±0.03384 ns. The second interval (10 to 20 ns) shows the latency assessment for a read access. To read data that is already written in all MPCAM modules, the tag destinations (Tagd) (1A, 2A, 3A, and 4A), provided by core1, core2, core3, and core4, respectively, are simultaneously compared to the tags in all MPCAM modules in the same column. The results appear on the output buses (Dout-core) after the delay time. The read delay, calculated as an average over roughly one hundred test bench intervals, is around 1.27804±0.086823 ns. The latency assessments for concurrent read and write operations are shown in the third (20 to 30 ns) and fourth (40 to 50 ns) periods, each of which was run around 100 times. When many memory locations are read and written simultaneously, the latency is 1.1909±0.02363 ns, which is almost the same as the latency of separate read operations. On the other hand, when write and read operations are performed simultaneously on the same memory locations, the data is written to the destination location with a latency of 1.3105±0.091955 ns, while the read waits for the following interval, and the shared data is then read with a total latency of 1.2780±0.086823 ns plus the time of the interval. All these measurements were validated using the Statistical Package for the Social Sciences (SPSS) with a t-test at a 95% confidence level.
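
    For readers who want to reproduce this kind of summary from their own simulation runs, the sketch below computes a sample mean and the half-width of a t-based confidence interval. It is an assumption-laden illustration: the text does not state whether the reported ± values are confidence half-widths or standard deviations, the function name is hypothetical, the sample values are synthetic placeholders rather than measured data, and the critical value 1.984 is the approximate two-sided 95% Student t value for 99 degrees of freedom.

```python
import math
import statistics


def mean_with_interval(samples, t_critical):
    """Return (mean, half_width) of a t-based confidence interval.

    t_critical is supplied by the caller for the chosen confidence
    level and len(samples) - 1 degrees of freedom.
    """
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, t_critical * std_err


# Synthetic placeholder latencies in ns (NOT data from the paper).
placeholder_runs = [1.05 + 0.001 * (k % 7) for k in range(100)]
mean_ns, half_width_ns = mean_with_interval(placeholder_runs, t_critical=1.984)
print(f"{mean_ns:.5f} ± {half_width_ns:.5f} ns")
```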

    Fig. 10 shows two read/write scenarios over MPCAM. In the first interval (0 to 10 ns), core 1 broadcasts shared data with Tags (0078) to all DPCAM modules in its row. The written data (pin DI-core) appear after a delay of 1.084115±0.03384 ns once the WR signal goes low, which is almost the same as the latency of separate write operations. In the second interval (10 to 20 ns), all cores simultaneously read the shared data that core 1 produced in the first interval. The results appear on the Dout-core pins with a delay of 1.27804±0.086823 ns, which is almost the same as the latency of separate read operations. The timing simulation shows that write and read operations to the shared cache take no more than about 1.3 ns, compared to 5.3 ns for the L2 cache and 19.5 ns for the L3 cache in the Intel Nehalem i7, and 12 ns for the L2 cache and 21 ns for the L3 cache in AMD’s Bulldozer family [26].
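
    The comparison can be made concrete by dividing each quoted cache latency by the roughly 1.3 ns shared-cache figure. The snippet below is plain arithmetic on the numbers quoted in the text; the variable names are illustrative.

```python
SHARED_CACHE_NS = 1.3  # approximate upper bound quoted for MPCAM read/write

reported_ns = {
    "Intel Nehalem i7 L2": 5.3,
    "Intel Nehalem i7 L3": 19.5,
    "AMD Bulldozer L2": 12.0,
    "AMD Bulldozer L3": 21.0,
}

# Ratio of each quoted latency to the shared-cache latency.
ratios = {name: ns / SHARED_CACHE_NS for name, ns in reported_ns.items()}
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.1f}x the MPCAM latency")
```

    On these figures, the quoted L2 and L3 latencies are roughly 4 to 16 times the shared-cache latency.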

    Figure 9: MPCAM organization Timing Simulation 1

    Figure 10: MPCAM organization Timing Simulation 2

    Figure 11: NCSC organization Timing Simulation for a read operation

    Figure 12: NCSC organization Timing Simulation for a write operation

    This simulation demonstrates that all cores in the MPCAM organization can access shared data simultaneously without contention, blocking, or arbitration issues.

    Fig. 11 shows the timing simulation of the proposed NCSC over several intervals. In the first interval (0 to 10 ns), core11 and core22 write their shared data with tags (00A1) and (10C1), respectively. In the second interval (10 to 20 ns), core OF31 reads the shared data with tags (00A1) from the same cluster and, simultaneously, core OF44 reads the shared data with tags (10C1) from another cluster. In this interval, the read latency differs between the two cases. To read data that is already written in the same cluster, the Tagd (00A1) provided by core31 is simultaneously compared to the tags in the MPCAM column within cluster one. The results appear on the output bus (Dout-core31) after some delay. Averaged over roughly one hundred test bench intervals, the read delay within the same cluster is around 1.26226±0.090591 ns. On the other hand, to read data that is already written in another cluster, core44 wants to read data produced by core22. OF44 communicates directly, through the NCSC, using OF24, which is the conjugate to the second cluster, to read the internal MPCAM2. The results appear on the output bus (Dout-core44) after some delay. Averaged over one hundred test bench intervals, the delay for read access between cores that belong to different clusters is around 1.92738±0.139588 ns. In the third interval (20 to 30 ns), core11 in the first cluster writes its shared data with tags (00A7), core23 in the third cluster reads the shared data with tags (00A1) produced by core11, and core OF44 simultaneously reads the shared data with tags (00A1). In this interval, core11 broadcasts its shared data to MPCAM1 in cluster one, and OF23 and OF44 connect to the first cluster using OF13 and OF14, respectively, which are the conjugates of these cores to cluster one through the NCSC interconnection network. The results appear on the output bus (Dout-core23) after some delay. The delay for read access between cores in different clusters is around 1.92738±0.139588 ns, which is nearly equal to the read latency observed on Dout-core44. The fourth interval (30 to 40 ns) shows the same behavior as the third interval (20 to 30 ns).

    Fig. 12 shows two intervals used to assess the write latency over NCSC. In the first interval (0 to 10 ns), core21, core42, and core24 write their shared data with tags (0211), (1422), and (3074), respectively, each to its own MPCAM. The written data appear on the DI-core21, DI-core42, and DI-core24 pins after an average delay of 1.14785±0.04532 ns, which is almost the same as the latency of separate write operations in the MPCAM organization. In the second interval (10 to 20 ns), core21, core42, and core24 write shared data to their MPCAMs simultaneously with an average delay of 1.15235±0.06132 ns, which is nearly identical to the latency in the previous interval.

    5.3 Area and Power Consumption Analysis

    Cache memories and routers are the dominant factors in area and power consumption in any NoC topology. NoC consumes a lot of power, increasing the chip’s total power consumption. Research has already verified that NoC consumes about 40% of the chip’s power, not counting the cache power consumption [4]. Other studies revealed that the router structure in a mesh or torus topology consumes about 28% of the total power and 17% of the total area in the Intel chip [17], and also adds latency because the additional hardware negatively affects performance. Therefore, since routers are the most power-consuming components of the interconnection network, this paper proposes the NCSC interconnection scheme, which eliminates the need for router devices and reduces both area and power overheads. Furthermore, an NoC with appropriate power consumption plays a leading role in increasing scalability in many-core systems.

    As future work, a power analysis simulator, as was used for this purpose for MPCAM [1,3], can be used to estimate the power and area overhead of NCSC.

    6 Conclusion

    In on-chip many-core systems, NoC topologies have been considered the best option; however, they have disadvantages such as high scalability cost, power consumption, latency, and contention during access to the shared cache, mostly due to the use of router-based structures. Therefore, researchers are trying to find router-less alternatives to these topologies. In this paper, a scalable topology for many-core processor systems called NCSC was presented. The proposed topology offers high scalability, fixed latency at the intra-cluster and inter-cluster levels, and the elimination of routers and arbiters, which solves the problem of simultaneous access to the shared cache.

    NCSC has been implemented using the Cyclone IV-E FPGA device family. After running the test bench program several times, it was found that the main functions of the NCSC organization (reading, writing, and simultaneous read-write) are accomplished inside and between clusters. The latency of reading and writing by multiple cores within a cluster and between clusters has been assessed. NCSC provides non-blocking access between cores: the average write latency within the same cluster is around 1.14785±0.04532 ns, which is almost the same as a separate write operation in MPCAM; the average read latency within the same cluster is 1.26226±0.090591 ns; and the read latency between cores from different clusters is around 1.92738±0.139588 ns. The simulation results also show that competition happens only when an access from within a cluster and a request from another cluster target the same column in MPCAM simultaneously. In this case, the access coming from the same cluster is given higher priority, whereas the request coming from the other cluster has to wait until the next clock cycle. The read latency then becomes 1.92738±0.139588 ns plus the delay from the competition.
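
    The priority rule above can be expressed as a small latency model in Python. This is a sketch under stated assumptions: the function name is hypothetical, the competition delay is modeled as one full clock period (an upper bound on waiting until the next cycle), and the clock period is a free parameter because this section does not state it.

```python
def read_latency_ns(inter_cluster: bool, contended: bool,
                    clock_period_ns: float,
                    intra_ns: float = 1.26226,
                    inter_ns: float = 1.92738) -> float:
    """Mean read latency under the same-cluster-first priority rule.

    inter_cluster: the reader and the data are in different clusters.
    contended: a local access targets the same MPCAM column at the same time.
    clock_period_ns: assumed clock period (not given in the text).
    """
    base = inter_ns if inter_cluster else intra_ns
    # Only the remote request is deferred; the local access keeps priority.
    wait = clock_period_ns if (inter_cluster and contended) else 0.0
    return base + wait


if __name__ == "__main__":
    # 10.0 ns is an illustrative clock period, not a value from the paper.
    print(read_latency_ns(inter_cluster=True, contended=True, clock_period_ns=10.0))
    print(read_latency_ns(inter_cluster=False, contended=True, clock_period_ns=10.0))
```

    In this model a local read is never delayed by competition, while a remote read pays the inter-cluster latency plus the wait, matching the behavior described for simultaneous access to the same MPCAM column.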

    The realization of the MPCAM-based multi-core cluster and conjugate shuffle network in many-core systems opens the door wide for massively parallel processing on a chip and makes life easier for chip designers and programmers.

    In future work, more research can be conducted on the NCSC topology. Other crucial dynamic performance metrics such as throughput, latency overhead, area, and power consumption can be evaluated. In addition, all components of this topology, including the core interfaces, can be built to validate NCSC within a many-core system. After that, it can be implemented in Verilog, then verified and synthesized using design tools.

    Acknowledgement: We would like to thank Prof. Adil Amirjanov for his valuable advice and continuous support. We would also like to acknowledge the advice and support from the Department of Computer System Engineering, Arab American University.

    Funding Statement: The authors received no specific funding for this study.

    Author Contributions: The authors confirm their contribution to the paper as follows: study conception and related work: Allam Abumwais and Mujahed Eleyat; data analysis: Allam Abumwais; the components of the proposed system: Allam Abumwais; performance analysis: Allam Abumwais and Mujahed Eleyat; draft manuscript preparation: Mujahed Eleyat. All authors reviewed the results and approved the final version of the manuscript.

    Availability of Data and Materials: Available upon request.

    Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
