
COPPER: a combinatorial optimization problem solver with processing-in-memory architecture*

2023-06-02 12:31:02

Qiankun WANG, Xingchen LI, Bingzhe WU, Ke YANG, Wei HU, Guangyu SUN 6,7, Yuchao YANG

    1School of Software & Microelectronics, Peking University, Beijing 100871, China

    2School of Computer Science, Peking University, Beijing 100871, China

    3School of Integrated Circuits, Peking University, Beijing 100871, China

    4Tencent AI Lab, Shenzhen 518057, China

    5College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China

    6Beijing Advanced Innovation Center for Integrated Circuits, Beijing 100871, China

    7Beijing Academy of Artificial Intelligence, Beijing 100080, China

Abstract: The combinatorial optimization problem (COP), which aims to find the optimal solution in discrete space, is fundamental in various fields. Unfortunately, many COPs are NP-complete and require much more time to solve as the problem scale increases. Because of this, researchers may prefer fast methods even if they are not exact, so approximation algorithms, heuristic algorithms, and machine learning have been proposed. Some works proposed chaotic simulated annealing (CSA) based on the Hopfield neural network and achieved good results. However, CSA is not something that current general-purpose processors can handle easily, and there is no specialized hardware for it. To perform CSA efficiently, we propose a software and hardware co-design. In software, we quantize the weights and outputs using appropriate bit widths, and then modify the calculations that are not suitable for hardware implementation. In hardware, we design a specialized processing-in-memory architecture named COPPER based on the memristor. COPPER can efficiently run the modified quantized CSA algorithm and supports pipelining for further acceleration. The results show that COPPER performs CSA remarkably well in terms of both speed and energy.

Key words: Combinatorial optimization; Chaotic simulated annealing; Processing-in-memory

    1 Introduction

Combinatorial optimization problems, abbreviated COPs, are prevalent but intractable in our daily lives. A classic COP is the traveling salesman problem (TSP), whose purpose is to find the shortest path through all cities and then back to the point of departure. Another COP we often meet is the work planning problem, which looks for an order of handling tasks that minimizes the total timeout. Unfortunately, many COPs are non-deterministic polynomial complete (NPC). Consequently, we cannot solve them quickly when the problem size grows large. Prior theory proved that NPC problems can be transformed into each other in polynomial time (Karp, 1972). In other words, we can focus on only one specific problem.

Many methods have been proposed to solve COPs. Though exact algorithms, such as enumeration and branch and bound, can obtain the optimal solution, they are usually extremely time-consuming. Approximate algorithms, such as the nearest-neighbor algorithm for TSP, can obtain acceptable, though suboptimal, solutions in polynomial time. However, they are normally designed for particular problems. Heuristic algorithms, which rely on general problem-solving strategies, are able to obtain decent but usually not optimal solutions in a specified time. However, none of the above methods can balance cost and performance well, especially when the scale of the problem is large. Machine learning has been proven to perform extremely well in many fields, so some works suggested applying it to COPs, such as reinforcement learning (Mirhoseini et al., 2020), Hopfield neural networks (HNNs) (Hopfield and Tank, 1985), and pointer networks (Vinyals et al., 2015).

HNN is a typical machine-learning-based method. HNN was proposed in the 1980s (Hopfield, 1982). It is a type of fully connected recurrent neural network: every neuron receives outputs from all other neurons and sends its output to all other neurons. An important property of HNN is its Hamiltonian, or energy, which accounts for the stability of HNN. Starting with the initialized weights and neuron states, the energy of HNN keeps decreasing with computation until it reaches a minimum.

Hopfield and Tank (1985) proved that TSP could be solved using HNN. As shown in Fig. 1, when solving a COP with HNN, we need to carefully map the energy of a network to the solution space of the COP. The mapping should ensure that the lower the energy, the better the corresponding solution to the problem. Therefore, we obtain a sub-optimal/optimal solution when the HNN energy reaches a local/global minimum.

    Fig.1 Solving the traveling salesman problem (TSP)with a Hopfield neural network (HNN)

However, because the HNN energy only decreases during computation, it is easy to get stuck at a locally optimal solution. Thus, avoiding local optima becomes the most challenging issue of the optimization problem. Therefore, Chen LN and Aihara (1995) proposed chaotic simulated annealing (CSA), making neurons in HNN receive not only the outputs of other neurons but also their own outputs. The self-feedback decays with time, so HNN is chaotic at first and tends to stabilize gradually, which expands the exploration space and increases the probability of finding the global optimum.

Although the CSA method achieves great performance, it also suffers from the “memory wall” and low parallelism when processed on current processors. The processing-in-memory (PIM) architecture based on the memristor crossbar is a potential key to this dilemma. The memristor is an electronic device whose resistance can be adjusted, and it can be easily integrated into a crossbar to compute matrix-vector multiplication efficiently. Many works have applied the memristor PIM architecture to accelerate neural network computation (Chi et al., 2016; Shafiee et al., 2016; Song et al., 2017). Yang et al. (2020) suggested that memristor PIM had the potential to implement a COP solver using CSA. However, the hardware architecture and many implementation details, such as quantization and the design of the peripheral circuit, were not taken into consideration. Zhu et al. (2019) studied the quantization technique and PIM architecture design well, but focused on multi-layer convolutional neural networks, while HNN is a fully connected cyclic neural network, so reanalysis is needed. Also, previous PIM accelerators cannot directly perform CSA efficiently. Co-design of the algorithm and hardware is necessary.

In this paper, we propose an efficient PIM architecture, called COPPER, as a COP solver. COPPER is designed for CSA based on memristor crossbars. Thus, it can run a raw HNN naturally. Users just need to input the parameters of CSA, and COPPER will give an appropriate solution. Our main contributions are as follows:

1. We explore the influence of quantization on CSA and determine the appropriate bit widths.

2. We adapt CSA to hardware without reducing its performance.

3. We design an efficient COP solver with CSA and PIM, and propose a pipeline method.

    Experimental results show that the speed and energy efficiency of COPPER are significantly better than those of CPU and GPU.

    2 Preliminaries

In this section, we first provide the formulation of CSA and evaluate its performance. Then we introduce the memristor and the PIM architecture based on it.

2.1 Chaotic simulated annealing (CSA)

CSA is based on HNN, which is inspired by the Ising model (Cipra, 1987). Ernst Ising proposed the Ising model to explain the phenomenon of ferromagnetic phase transition in physics. In the Ising model, each spin can perceive only the spins adjacent to it. The Ising model succeeded in both physics and computer science. It was discovered that the Ising model could be used to solve COPs, as long as the problem is modeled suitably. Lucas (2014) showed Ising formulations for many NP problems, most of which are COPs.

Different from the Ising model, HNN is a fully connected cyclic neural network, which means that each neuron can perceive all other neurons. The discrete HNN (DHNN) was proposed first, and the continuous HNN (CHNN) was proposed years later. The former can be used as content-addressable memory, and the latter can be used as a COP solver. Although CHNN is a little different from the Ising model, the formulations for COPs used by Lucas (2014) can also be applied to CHNN. Because each neuron can directly perceive more neurons, CHNN can achieve a better solution in fewer iterations.

CHNN is also troubled by local minima, so Chen LN and Aihara (1995) suggested adding self-feedback to CHNN, which is CSA. Here we use the symbols of Chen LN and Aihara (1995) to describe CSA. As shown in Fig. 2, every neuron i has state y_i and output x_i, and is influenced by the bias I_i. There is a connection weight w_ij between any two neurons n_i and n_j, and w_ij = w_ji. The formulation of CSA is

x_i(t) = 1 / (1 + exp(−y_i(t)/ε)),  (1)

y_i(t+1) = k·y_i(t) + α·(Σ_{j≠i} w_ij·x_j(t) + I_i) − z_i(t)·(x_i(t) − I_0),  (2)

z_i(t+1) = (1 − β)·z_i(t),  (3)

Fig.2 Chaotic simulated annealing (CSA) (a) and its neuron (b)

Fig.3 Memristor crossbar for multiplication

where ε, k, α, β, and I_0 are constants, and z_i is the self-feedback coefficient of neuron i. The original CHNN has no self-feedback; that is, z_i = 0. At every tick, every neuron updates its own state and output according to Eqs. (1)–(3). The order of neuron operation is not essential: all neurons can run asynchronously or synchronously in any order.

CSA has the same energy representation as the CHNN, which is shown in Eq. (4):

E = −(1/2)·Σ_i Σ_j w_ij·x_i·x_j − Σ_i I_i·x_i.  (4)

It can be proved that without self-feedback (as in the original HNN), the network energy gradually decreases with time. When the self-feedback coefficients are not zero, the direction of the network's energy change is unpredictable, namely chaotic. After the self-feedback coefficients all reach zero, the energy decreases until reaching a local or the global minimum. That is why it is called CSA.
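For concreteness, the CSA iteration of Eqs. (1)–(3) can be sketched in NumPy as follows. This is an illustrative software simulation only; the parameter defaults are placeholders, not the constants used in the paper's experiments.

```python
import numpy as np

def sigmoid(a):
    # clip the argument to avoid floating-point overflow in exp
    return 1.0 / (1.0 + np.exp(-np.clip(a, -500.0, 500.0)))

def csa(W, I, eps=0.02, k=0.9, alpha=0.015, beta=0.001,
        z0=0.1, I0=0.65, iters=1000, seed=0):
    """One CSA run on an n-neuron network.

    W: (n, n) symmetric weight matrix; I: (n,) bias vector.
    Returns the final outputs x, each in [0, 1].
    """
    rng = np.random.default_rng(seed)
    n = len(I)
    y = rng.uniform(-1.0, 1.0, n)       # internal states y_i
    z = np.full(n, z0)                  # self-feedback coefficients z_i
    x = sigmoid(y / eps)                # Eq. (1): sigmoid output
    for _ in range(iters):
        # Eq. (2): decayed state + weighted inputs and bias - self-feedback
        y = k * y + alpha * (W @ x + I) - z * (x - I0)
        x = sigmoid(y / eps)
        z = (1.0 - beta) * z            # Eq. (3): self-feedback decays
    return x
```

The update order of neurons does not matter, as noted above; this vectorized version updates all neurons synchronously.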

    2.2 CSA performance

Here we test the CSA performance on TSP. To solve TSP, we need to model the problem. Assuming that there are n cities in all, an n×n binary matrix X represents a path we choose. In X, x_ij = 1 means that the salesman visits city i at the j-th position. Therefore, the constraints consist of two parts:

1. Hard constraint: any position corresponds to one and only one city; any city corresponds to one and only one position.

2. Soft constraint: the shorter the path length, the better the path.

Now we give the whole constraint, that is,

E = (W_1/2)·[Σ_i (Σ_j x_ij − 1)² + Σ_j (Σ_i x_ij − 1)²] + (W_2/2)·Σ_i Σ_{k≠i} Σ_j d_ik·x_ij·(x_{k,j+1} + x_{k,j−1}),  (5)

where d_ik represents the distance between city i and city k. The first part of the right-hand side of Eq. (5) is the hard constraint, which should be zero when X represents a feasible solution. The second part is the soft constraint, which should be smaller when the solution is better. By adjusting W_1 and W_2, we can indicate whether we prefer to obtain a feasible solution or a better solution.
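The hard and soft constraints can be checked directly on a candidate matrix X. A small helper (function and variable names are ours, for illustration) might look like:

```python
import numpy as np

def tsp_score(X, D):
    """X: (n, n) 0/1 matrix, X[i, j] = 1 iff city i is visited at position j.
    D: (n, n) symmetric distance matrix. Returns (feasible, tour_length)."""
    X = np.asarray(X)
    # hard constraint: exactly one city per position, one position per city
    feasible = bool(np.all(X.sum(axis=0) == 1) and np.all(X.sum(axis=1) == 1))
    if not feasible:
        return False, None
    order = np.argmax(X, axis=0)   # city visited at each position
    # soft constraint: total length of the closed tour
    n = len(order)
    length = sum(D[order[j], order[(j + 1) % n]] for j in range(n))
    return True, length
```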

By matching the coefficients of every x in Eqs. (4) and (5), we can compute the values of w_ij and I_i needed by CSA, and then perform calculations according to Eqs. (1)–(3). CSA returns a solution after reaching stability. The initial state can be set randomly.

We evaluate the performance of CSA on different numbers of cities. Positions of cities are generated randomly, with coordinates in [0, 1]. Table 1 shows the constant parameters we set to test the performance of CSA. For problems of different scales, we use different constant parameters. The constant parameters used in the 10-city problem are from Yang et al. (2020), and we fine-tune the constant parameters for the others. As the number of cities increases, we tend to favor feasible solutions (by decreasing W_2) and increase the amount of annealing (by decreasing β and increasing z_0) to explore a larger solution space.

    Table 1 Constant parameters used in chaotic simulated annealing when solving the traveling salesman problem

Table 2 shows the results. The best paths are obtained using Concorde, a standard software package for solving TSP. For each scale, we apply CSA 100 times and the simulated annealing (SA) algorithm 100 times for comparison. For fairness, we set different annealing rates for SA so that the amount of annealing approximately equals the number of CSA iterations. We show the best and average path lengths found by CSA and SA.

    Table 2 Performance of chaotic simulated annealing (CSA) and simulated annealing (SA)

According to Table 2, CSA needs multiple iterations to reach the stable state, and it does not always obtain a feasible solution. As for solution quality, CSA is better than SA. In fact, the solution quality is still not as good as that in Chen LN and Aihara (1995), because we do not spend much time tuning the constant parameters. However, we can see that CSA obtains good solutions even as the scale of the problem increases.

    2.3 Memristor-based processing-in-memory architecture

Because CSA is fully connected, the amount of computation required increases rapidly, which is not something that current general-purpose processors can handle easily. In fact, many neural networks are troubled by such extensive computation: a huge number of calculations are needed, although most of them are similar. Therefore, PIM has been proposed, which performs simple operations directly in memory. PIM offers a high degree of parallelism and does not require much time for fetching data.

Yang et al. (2020) attempted to implement CSA using a memristor crossbar. In that work, the weights and self-feedback coefficients were written into the memristor cells, and other operations were simulated. The results showed that the memristor crossbar can perform CSA correctly. In addition, that work explored the convergence when self-feedback coefficients decayed with linear decay, exponential decay, and long-term depression, respectively. However, the programming of the memristor cell is time-consuming (Chen JR et al., 2020), so the time cost increases rapidly if the memristor cells are adjusted frequently. The aim of Yang et al. (2020) was to verify feasibility, so many practical problems, such as hardware architecture, scale, and algorithm quantization, were not taken into consideration. In the following sections, we address these problems.

    3 Algorithm

In this section, we first quantize CSA and find the proper bit widths. Then we propose schemes to adapt CSA to hardware implementation. Finally, we evaluate the performance of the quantization and schemes on the max-cut problem.

    3.1 Quantization

Quantization is necessary before implementing CSA on specialized hardware. Because matrix-vector multiplication is deployed on memristor crossbars, quantization of the weights and outputs must be taken into consideration. We analyze the influence of quantization by solving a 30-city TSP 100 times. The positions and constant parameters are the same as in Section 2.2. We pre-calculate float32 weights with float32 position coordinates and then quantize the weights into different bit widths for subsequent computing. Here we use symmetric uniform quantization, which is easy to implement.
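As a sketch of what such a symmetric uniform quantizer looks like (the paper does not spell out its exact quantizer, so the rounding and clipping choices below are assumptions):

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Symmetric uniform quantization: map w onto 2^(bits-1)-1 signed levels
    scaled by the max magnitude, then back to float for simulation."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    if scale == 0:
        return np.zeros_like(w)
    q = np.clip(np.round(w / scale), -qmax, qmax)  # integer codes
    return q * scale                               # dequantized values
```

For 4-bit weights this gives 15 signed levels, so the worst-case rounding error is half a step.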

Fig. 4 shows the influence of quantizing the weights and outputs. The solid line represents the ratio between the path lengths of float CSA and quantized CSA. The dotted line represents the rate of obtaining a feasible solution. Interestingly, quantized CSA sometimes obtains better solutions (a float-to-quantized path-length ratio >1), which may be owing to randomness. We can see that CSA is more sensitive to quantization of the weights than to that of the outputs. A low weight bit width may incur a high failure ratio and poor solutions.

    Fig.4 Quantization influence: (a) weight quantization;(b) output quantization

We think that the weights determine the trend of the network energy change. After the weights are quantized with too few bits, the network cannot find the right direction in every iteration, so it eventually fails. In addition, fewer bits erase the gap between a longer and a shorter distance, so the network cannot perceive the better path. According to Fig. 4, we suggest using 4 bits for the weights. The finer the output granularity, the more states each neuron can take, and the larger the space the entire network can explore. Also, CHNN and CSA then have richer dynamic characteristics, which increases their adaptability. Therefore, we suggest using 8 bits for the output. The following subsections are based on 4-bit weights and 8-bit outputs.

    3.2 Adaptation

    We will propose some schemes in this subsection to adapt CSA to hardware.

The implementation of the sigmoid function is very expensive. Fortunately, few bits are enough for the output according to Section 3.1, so we can use a lookup table (LUT) to reduce the cost.

Because the output of the sigmoid is between 0 and 1 and we use 8 bits to represent it, the input of the sigmoid, denoted a, needs only to lie between −5.54 and +5.54. According to Eq. (1), we have a = y/ε. We can specify ε as the −p-th power of 2 (ε = 2^−p), which does not influence the performance of CSA, so the division can be transformed into a shift. Note that we do not change y itself but just use it to calculate a. The result a is truncated to 8 bits and used to address the sigmoid LUT. Therefore, Eq. (1) could be replaced by

x_i(t) = LUT(trunc_8(y_i(t) ≪ p)),  (6)

where ≪ denotes a left shift and trunc_8(·) truncates to 8 bits.
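A software model of this shift-plus-LUT scheme, assuming a 256-entry table over the saturating input range (the index mapping here is our own choice, for illustration):

```python
import numpy as np

A_MAX = 5.54   # beyond |a| = 5.54, an 8-bit sigmoid output saturates

# 256-entry lookup table: 8-bit output codes for a in [-A_MAX, A_MAX]
LUT = np.round(255.0 / (1.0 + np.exp(-np.linspace(-A_MAX, A_MAX, 256)))
               ).astype(np.uint8)

def sigmoid_lut(y, p):
    """Approximate x = sigmoid(y / eps) with eps = 2**-p.
    Division by eps becomes multiplication by 2**p (a shift in hardware);
    the result is clamped and mapped to an 8-bit LUT index."""
    a = np.asarray(y, dtype=float) * (2.0 ** p)
    a = np.clip(a, -A_MAX, A_MAX)                     # saturate out of range
    idx = np.round((a + A_MAX) / (2 * A_MAX) * 255).astype(int)
    return LUT[idx]                                   # 8-bit output code
```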

In addition, z decays exponentially in Eq. (3), which is not hardware-friendly. We can replace the exponential decay with linear decay. That is, Eq. (3) could be replaced by

z_i(t+1) = z_i(t) − δ,  (7)

where δ (our notation for the decrement) is a positive constant.

We apply this adaptation to the 30-city TSP. As shown in Table 3, linear decay has little effect on the algorithm.

    Table 3 Influence of the adaptation

Eq. (2) could also be transformed into

y_i(t+1) = k·y_i(t) + α·(Σ_{j≠i} w_ij·x_j(t) + I_i) − z_i(t)·x_i(t) + z_i(t)·I_0.  (8)

Although fusing the self-feedback term −z_i(t)·x_i(t) into the weight matrix may seem natural, we do not do so, because modifying the weights, represented by the conductance of memristors, takes a lot of time.

Another problem we need to consider is stopping the computation promptly after the network becomes stable. Stability of the network means that no value changes any more, including the energy of the network and the states and outputs of the neurons. According to Eq. (4) or (5), the calculation of energy is especially complex, so we prefer using the outputs as the criterion. To reduce the overhead of comparing the last and the current outputs, we suggest contrasting only their highest bit. This is reasonable, because the final outputs always converge to 0 or 1. We regard the network as stable if the outputs do not change 10 times in a row. According to Table 3, comparison results between rounded values track the raw results very closely. In addition, using rounded output values evidently reduces the number of iterations, because the redundant iterations merely push outputs from >0.5 to 1.0 in most cases.
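The stopping rule can be modeled as a small detector that tracks only the highest bit of the outputs; here the threshold x ≥ 0.5 stands in for the most significant bit of the 8-bit output, and the class itself is illustrative:

```python
import numpy as np

class StableDetector:
    """Declare the network stable when the rounded outputs (the highest bit
    of the 8-bit output, i.e. x >= 0.5) are unchanged for `patience` ticks."""
    def __init__(self, patience=10):
        self.patience = patience
        self.prev = None
        self.streak = 0

    def update(self, x):
        msb = np.asarray(x) >= 0.5          # highest output bit per neuron
        if self.prev is not None and np.array_equal(msb, self.prev):
            self.streak += 1                # unchanged for one more tick
        else:
            self.streak = 0                 # any flip resets the count
        self.prev = msb
        return self.streak >= self.patience
```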

    3.3 Performance on the max-cut problem

CSA can solve many COPs directly as long as the energy function is suitably constructed. Also, many COPs can be transformed into each other, so CSA can also solve COPs indirectly by transforming them into ones that can be solved.

To evaluate the feasibility of CSA and the above schemes on other COPs, we carry out experiments on the max-cut problem, which is also an NPC problem. The max-cut problem is to divide the node set of a graph into two complementary subsets so that the number of edges between the two subsets is maximum. Assuming that the graph includes n nodes, the following formulation can be used to evaluate a segmentation:

C = Σ_i Σ_j a_ij·(x_i + x_j − 2·x_i·x_j),  (9)

where i and j represent the nodes, a_ij = 1 if node i connects with node j and a_ij = 0 otherwise, and x_i ∈ {0, 1} indicates the subset to which node i belongs.

According to Eq. (9), the energy function of the max-cut problem is as follows:

E = −Σ_i Σ_j a_ij·(x_i + x_j − 2·x_i·x_j).  (10)

By building a Hopfield network of n nodes and comparing Eqs. (4) and (10), we can see that w_ij = −4a_ij and I_i = 2·Σ_j a_ij.
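The cut size used above can be computed directly from the adjacency matrix; a small checker (names ours, for illustration) is:

```python
import numpy as np

def cut_size(A, x):
    """Number of edges crossing the partition. A: (n, n) 0/1 symmetric
    adjacency matrix; x: (n,) 0/1 subset labels."""
    A = np.asarray(A)
    x = np.asarray(x)
    # an edge (i, j) is cut iff x_i != x_j; halve to undo double counting
    return int(np.sum(A * (x[:, None] != x[None, :])) // 2)
```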

Fig. 5 is a graph whose edges are randomly generated. Fig. 5 also shows one of the optimal segmentations, in which the number of cut edges is 12. We evaluate CSA with different schemes 100 times, and the results are shown in Table 4. The constant parameters in the experiments are consistent with those in the previous 10-city TSP experiments. Weights and outputs are quantized with 4 and 8 bits, respectively. Obviously, even with the various schemes, CSA can still solve the max-cut problem well.

    Fig.5 Max-cut problem

    4 Architecture design

    4.1 Overview

COPPER is composed mainly of a subprocessor element (subPE) array and peripheral circuits, as shown in Fig. 6. Each subPE can calculate matrix-vector multiplication independently. The peripheral circuit, responsible for everything except matrix-vector multiplication, is implemented with traditional CMOS circuits. As the scale of CSA increases, the subPE array grows quadratically while the peripheral circuitry grows linearly; that is, each column of subPEs is equipped with one peripheral circuit. COPPER covers only the operation of the network, so the calculation of weights is the host's concern. Considering the sign of the parameters, the weights may need an equivalent linear transformation to make them all non-negative. This was discussed in Yang et al. (2020).

    Fig.6 COPPER overview (a) and subPE (b)

After the parameters are written, COPPER runs until the network stabilizes or the cycle limit is reached. During operation, data are transmitted between subPEs. Specifically, a subPE communicates from left to right and from top to bottom, which saves bandwidth between the subPEs and the buffer. Pipelining between the subPEs and the peripheral circuit saves time, and the delay of the peripheral circuit is hidden by the subPEs.

    4.2 SubPE

Each subPE has 16 128×128 memristor crossbars, digital-to-analog converters (DACs), analog-to-digital converters (ADCs), sample-and-hold (S+H) devices, registers, and shift-and-add (S+A) units.

Due to IR drop, thermal noise, random telegraph noise, and so on, memristor crossbars may not always work ideally, which results in computing errors. Writing a 4-bit weight into just one cell is not yet a mature technique and may lead to more errors. However, with the help of compensation circuits and other means, memristors composed of single-bit cells can run stably. In Hung et al. (2022), the system-level inference accuracy of a ResNet-20 model did not decrease at all. Therefore, we split one 4-bit weight into four cells. Similarly, we divide one input into 8 bits and send one bit into the crossbar at a time, because using an 8-bit DAC is risky. In this way, one bit of the input produces 4 bits, which are put in the appropriate positions separately. One input produces eight groups of outputs, which are shifted and accumulated. In fact, this is the same as the column multiplication of binary multiplication. In a word, we spend more time on inputs and more hardware on weights, but achieve stable operation.
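A functional model of this bit-serial scheme, with 1-bit weight cells and one input bit per cycle combined by shift-and-add (unsigned operands only; sign handling and analog effects are omitted):

```python
import numpy as np

def bit_serial_mvm(W, x, w_bits=4, x_bits=8):
    """Emulate the crossbar scheme: each weight is split into w_bits 1-bit
    cells, each input is fed 1 bit per cycle, and the partial products are
    shifted and added. W and x must hold non-negative integers."""
    W = np.asarray(W, dtype=np.int64)
    x = np.asarray(x, dtype=np.int64)
    # decompose weights into bit planes: W = sum_b Wb * 2^b
    w_planes = [(W >> b) & 1 for b in range(w_bits)]
    acc = np.zeros(W.shape[0], dtype=np.int64)
    for t in range(x_bits):                    # one input bit per cycle
        x_bit = (x >> t) & 1
        for b, Wb in enumerate(w_planes):      # each group of 1-bit cells
            acc += (Wb @ x_bit) << (b + t)     # shift-and-add partial sums
    return acc
```

Exactly as in binary long multiplication, the shifted partial products sum to the full-precision result.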

Because each row of subPEs shares the same input, we make each subPE receive the input from its left and immediately transmit the input to the subPE on its right. The leftmost subPE obtains data from the data line. Because the results of each column of subPEs are accumulated together, we make each subPE receive the result from above, add its own result, and send the new result to the subPE below. The results received by the uppermost subPEs are all zeros. Therefore, we can use different numbers of subPEs to adapt to problems of different scales.
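This dataflow amounts to blocked matrix-vector multiplication: each block of output rows (one column of subPEs) accumulates its partial sums starting from zero. A functional sketch, with an illustrative tile size:

```python
import numpy as np

def tiled_mvm(W, x, tile=128):
    """Blocked matrix-vector product mimicking the subPE array: for each
    block-row of W (one column of subPEs), partial results start from zeros
    and are accumulated block by block (top to bottom in hardware)."""
    W = np.asarray(W, dtype=float)
    x = np.asarray(x, dtype=float)
    n_rows, n_cols = W.shape
    result = np.zeros(n_rows)
    for r0 in range(0, n_rows, tile):           # one output block at a time
        acc = np.zeros(min(tile, n_rows - r0))  # uppermost subPE sees zeros
        for c0 in range(0, n_cols, tile):       # accumulate partial sums
            acc += W[r0:r0 + tile, c0:c0 + tile] @ x[c0:c0 + tile]
        result[r0:r0 + tile] = acc
    return result
```

Because the accumulation is associative, the grid size can grow or shrink with the problem, which is exactly the scalability argument above.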

    4.3 Peripheral circuit

The peripheral circuit starts after the subPEs provide their results. In detail, the peripheral circuit completes the remaining computation and judges whether the network has reached stability. Fig. 7 shows the main data path of the peripheral circuit. As mentioned in Section 3.2, we make the self-feedback z decay linearly and replace the sigmoid function with an LUT. The final results are x_{t+1}, y_{t+1}, and z_{t+1}. We compare the highest bits of x_t and x_{t+1} to confirm whether the network is stable.

    Fig.7 Peripheral circuit data path (References to color refer to the online version of this figure)

    4.4 Pipeline

Because each iteration of CSA depends on the results of the previous iteration, it seems that there is no way to pipeline. However, considering the possible failure of CSA and the fact that the optimal solution cannot be obtained every time, we may need to run CSA several times to obtain an excellent solution for a particular problem. Therefore, we can run two CSAs for one problem at a time without the CSAs interfering with each other; this is a pipeline not across iterations of the same network but across different networks.

Specifically, for the same problem, the weights and biases are exactly the same, but the initial states differ, so we do not need to rewrite the weights and biases. We can write the first batch of initial states y_1(t=0), calculate x_1(t=0), and feed x_1(t=0) into the subPEs. After obtaining w·x_1(t=0), we input the second batch of initial states y_2(t=0), calculate x_2(t=0), and feed x_2(t=0) into the subPEs. At the same time, we use w·x_1(t=0) to calculate the new y_1(t=1), thus realizing the pipeline. By simply incurring the cost of storing two sets of constant parameters, we can obtain twice as many solutions in the same amount of time.

    5 Evaluation

    5.1 Experiment setup

Because no prior work designs a specialized hardware architecture for CSA, we compare COPPER with general-purpose CPU and GPU processors. For TSPs with different numbers of cities, we run CSA on the CPU and GPUs, and estimate the energy cost and speed. The CPU is an Intel Core i7-6700K CPU @ 4.00 GHz with 32 GB RAM, GPU 1 is an NVIDIA GeForce GTX 1080 Ti, and GPU 2 is an NVIDIA A100 80 GB. We use the Python packages pyRAPL and pynvml to estimate the cost of the CPU and GPUs, respectively. As for COPPER, we use 10×10 subPEs, which can solve a 50-city TSP, in the evaluation. The parameters in Table 5 are from Shafiee et al. (2016).

    Table 5 Device parameters

Obviously, as the scale of the problem increases, the workload of the peripheral circuits increases linearly, while that of the subPEs grows quadratically. As shown in Table 5, each column of subPEs can obtain no more than 256 products in 1.2×2^−21 s. It is enough for the peripheral circuit to perform 256 operations in the same interval; that is, the frequency should be 600 MHz. The critical path (shown as the red line in Fig. 7) contains two additions, one multiplication, and an LUT access, so the peripheral circuit can easily reach 600 MHz. Therefore, matrix-vector multiplication is the main bottleneck, and COPPER does not require high-speed peripheral circuits, so we focus only on the comparison of matrix-vector multiplication.

    5.2 Speed

We run CSA 10 000 times and compute the average time cost on the CPU and GPUs. According to Table 6, as the scale of the problem increases, the acceleration of COPPER becomes more obvious.

    Table 6 Comparison of COPPER with CPU and GPU

The practical speedup may not be as significant, because many details, such as communication, buffering, and synchronization, are not taken into consideration. However, an obvious speedup is predictable due to the significant reduction in communication, and the bigger the scale, the more obvious the speedup.

    5.3 Energy efficiency

We run CSA 10 000 times and compute the average energy cost on the CPU and GPUs. For the CPU, the Python package pyRAPL can directly estimate the energy cost of selected code. For the GPUs, the Python package pynvml can monitor the real-time power, and we calculate the energy by multiplying the power by the time used.

As shown in Table 6, COPPER is much more efficient than the CPU and GPUs. As the problem scale increases, the improvement in efficiency becomes more and more obvious. We think there are two main reasons. First, we assume that all subPEs are running even if the problem scale is not enough to use all the crossbars, so the bigger the problem, the better for COPPER. Second, with the increase of the problem scale, general-purpose processors spend much more energy on data handling and cyclic calculation, while COPPER keeps the calculation in memory, which significantly reduces the energy cost.

    5.4 Comparison with SA

Although COPPER can implement CSA efficiently, it is necessary to compare COPPER with other algorithms on general-purpose processors. Here, we test the time and energy cost of the SA algorithm on the CPU. Because the SA algorithm is not computation-intensive, GPUs are not included in this test. The SA setup is the same as in Section 2.2, and its time and energy are again measured with the Python package pyRAPL. The time of CSA on COPPER equals the average iteration count divided by the frequency of COPPER, and the energy equals this time multiplied by the power. Note that the energy of the peripheral circuit is ignored because it is negligible (Li et al., 2022).

Table 7 shows the results. Compared with SA on the CPU, COPPER still exhibits a clear speedup but loses in energy. The reason is that CSA involves a lot of computation (matrix-vector multiplication), while SA does not. COPPER reduces the time through parallel computing, but the total energy required remains high. However, according to Tables 2 and 7, COPPER has advantages in speed and solution quality, so it is very competitive.

    Table 7 Speed and efficiency of CSA compared to those of SA

    6 Related works

The Ising model was proposed to explain the phenomenon of ferromagnetic phase transition in physics. It regards the object as a d-dimensional lattice, and each lattice site represents a small magnet. Adjacent magnets interact with each other, which can be modeled as coefficients/weights. The whole system has a Hamiltonian/energy, which is related to the weights, magnet properties, and external magnetic fields.

After it was discovered that the Ising model could be used to solve COPs, a series of hardware was designed, such as quantum annealing (Johnson et al., 2011; Shin et al., 2014) and CMOS annealing (Yamaoka et al., 2016; Takemoto et al., 2019, 2021; Yamamoto et al., 2020). They all strived to simulate more and more lattice sites to solve problems on a larger scale, and Takemoto et al. (2021) even implemented 1.44×10^5 lattice sites. To escape from local minima, these designs introduced various annealing mechanisms. For example, Yamaoka et al. (2016) and Takemoto et al. (2019) made the state of a spin, corresponding to a small magnet in the Ising model, flip randomly, while Takemoto et al. (2021) used probabilistic inversion. In addition, they equipped tens of thousands of spins to handle large-scale problems.

    7 Conclusions and discussion

To solve combinatorial optimization problems, we improve the chaotic simulated annealing algorithm and design COPPER. COPPER uses advanced storage technology to improve the parallelism of operation and obtains great performance in power consumption and speed. COPPER is scalable and can solve large-scale COPs. In addition, COPPER can not only run chaotic simulated annealing but also support the original continuous Hopfield neural network for other applications.

However, we also see that the Ising model, CHNN, and CSA all require a high degree of parallelism, while the current hardware does not support reuse, so the hardware scale must grow with the problem scale. Physically, the hardware scale cannot increase without limit. So far, the biggest ReRAM nonvolatile PIM macro has been 8 Mb (Hung et al., 2022). Using one such macro, we can solve a TSP with at most 37 cities when quantizing the weights into 4 bits. Using more macros, TSPs with more cities can be solved. Meanwhile, beyond the development of hardware, supporting temporal reuse of hardware or implementing CSA on quantum devices may be a promising direction.

    Contributors

Qiankun WANG led the research and was mainly responsible for implementing the algorithm, designing the hardware, and drafting the paper. Xingchen LI provided the design ideas and some data for the hardware part. Bingzhe WU sorted out the algorithm and pointed out the possibility of combining software and hardware. Ke YANG and Yuchao YANG laid the foundation for this research and provided some parameters for the algorithm. Wei HU provided the stability analysis of ReRAM and the latest research progress on ReRAM PIM macros. Guangyu SUN made many suggestions on the research, and revised and finalized the paper.

    Compliance with ethics guidelines

    Qiankun WANG, Xingchen LI, Bingzhe WU, Ke YANG, Wei HU, Guangyu SUN, and Yuchao YANG declare that they have no conflict of interest.

    Data availability

    The data that support the findings of this study are available from the corresponding author upon reasonable request.
