
    Design space exploration of neural network accelerator based on transfer learning①

    High Technology Letters, No. 4, 2023

    WU Yuzhang(吳豫章), ZHI Tian②, SONG Xinkai, LI Xi

    (*School of Computer Science, University of Science and Technology of China, Hefei 230027, P.R.China)

    (**State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, P.R.China)

    Abstract

    Key words: design space exploration (DSE), transfer learning, neural network accelerator, multi-task learning

    0 Introduction

    Artificial intelligence (AI) algorithms, particularly deep neural networks, have demonstrated remarkable results in various fields such as image recognition and natural language processing. Initially, neural networks were executed on central processing units (CPUs), but as the algorithms developed, it was discovered that graphics processing units (GPUs), with their parallel computing capabilities, were better suited for these calculations. To maintain computational efficiency while reducing power consumption, dedicated neural network accelerators such as Eyeriss[1], Google tensor processing unit (TPU)[2], Tesla neural network processing unit (NPU)[3], and others have been proposed.

    As algorithmic advancements continue, the architecture of dedicated accelerators must be updated to accommodate these changes. Different accelerators have varying memory hierarchies, data flows, and mappings of processing element (PE) arrays. To find a globally optimal solution rather than a locally optimal one, these configurations must be explored in the parameter space. Consequently, researchers are seeking ways to automate the design of neural network accelerators while accounting for constraints. In recent years, many studies have focused on the design automation of deep neural network (DNN) accelerators, including DnnWeaver[4], DNNBuilder[5], AutoDNNchip[6], and ConfuciuX[7], among others.

    Design space exploration (DSE) plays a critical role in optimizing the performance and efficiency of neural network accelerators, which are specialized hardware devices designed to accelerate the computation of neural networks. DSE algorithms typically explore a large number of possible parameter configurations, such as the number and type of processing elements (PEs), memory sizes, and interconnect architectures, to find the set of parameters that best meets certain design objectives, such as throughput, power consumption, or area. Different DSE tools and frameworks have been proposed to automate and optimize the exploration of the design space, using techniques such as simulation, modeling, and machine learning. Recent DSE work includes Interstellar[8], simulating machine learning applications using gem5-aladdin (SMAUG)[9], Accelergy[10], dMazeRunner[11], and loop-order-based memory allocation (LOMA)[12]. These DSE tools have the potential to significantly reduce the design time and cost of neural network accelerators, and enable the design of highly customized and efficient hardware for specific neural network workloads. However, DSE also presents challenges, such as the large search space, the complexity of the hardware design, and the need to balance multiple conflicting design objectives.

    In this work, an extension of the concept of transfer learning is proposed to improve the performance of a target learning task by reusing knowledge from a related source task or domain. Unlike traditional machine learning algorithms, transfer learning has the advantage of reducing the training time of a pre-trained model, which can be applied to similar processor architectures. The parameters abstracted from different processor architectures are reused: part of the hidden layers of the pre-trained model is retained, and the input hidden layer is retrained to adapt to the target task.

    The similarities in the design spaces of different tasks executed on a unified intelligent processor architecture are explored. Transfer learning can use either hard parameter sharing or soft parameter sharing to optimize the shared model. Hard parameter sharing means that multiple tasks share a common model, while soft parameter sharing allows each task to have its own model and parameters, with some parameter values shared among all tasks. To address the challenge of different input sizes, hard parameter sharing is used to connect sub-model structures, where each task has its own sub-model.

    The paper is organized as follows. Section 1 describes the computing task on a neural network accelerator and how design parameters affect the latency and energy results. Section 2 reviews related work. Section 3 describes the proposed methodology for DSE based on transfer learning. Section 4 presents experimental results that demonstrate the effectiveness of the methodology. Conclusions and future research directions are discussed in Section 5.

    1 Problem description

    1.1 Convolutional neural network (CNN) accelerator

    The applications of CNNs are numerous and cross many disciplines. They are particularly useful in the fields of robot control, image identification, and natural language processing. CNNs can recognize and categorize items in photos in image recognition, recognize text and produce relevant outputs in natural language processing, and aid navigation and object recognition in robot control.

    A CNN uses convolution operations to process input data. A convolution operation involves taking a small matrix, known as a kernel or filter, and sliding it over the input matrix to compute a new output matrix.

    Fig.1 depicts the convolution operation, where O stands for the output matrix, I for the input matrix, and W for the convolution matrix; CI denotes the number of input channels, CO the number of output channels, WO and HO the width and height of the output tensor, and KW and KH the width and height of the convolution kernel.

    Fig.1 Convolutional computation written as a loop
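
    Expressed in plain Python, the loop nest of Fig.1 looks as follows (a minimal sketch assuming stride 1 and no padding; loop bounds are named after the symbols above):

        # Minimal sketch of the convolution loop nest in Fig.1.
        # O: output [CO][HO][WO], I: input [CI][HO+KH-1][WO+KW-1],
        # W_k: kernels [CO][CI][KH][KW]. Stride 1, no padding assumed.
        def conv_loop(I, W_k, CO, CI, HO, WO, KH, KW):
            O = [[[0.0] * WO for _ in range(HO)] for _ in range(CO)]
            for co in range(CO):                      # output channels
                for ci in range(CI):                  # input channels
                    for ho in range(HO):              # output rows
                        for wo in range(WO):          # output columns
                            for kh in range(KH):      # kernel rows
                                for kw in range(KW):  # kernel columns
                                    O[co][ho][wo] += (I[ci][ho + kh][wo + kw]
                                                      * W_k[co][ci][kh][kw])
            return O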

    A dedicated neural network accelerator is a hardware device specifically designed to accelerate convolution operations in deep neural networks. These accelerators leverage the parallelism inherent in convolution operations to perform them faster and more efficiently than general-purpose CPUs or GPUs.

    The memory hierarchy in a neural network accelerator typically consists of several levels of memory as shown in Fig.2, ranging from on-chip registers and caches to off-chip dynamic random-access memory (DRAM). This hierarchy is designed to minimize memory access time and optimize data transfer between different levels of memory.

    Fig.2 In a neural network accelerator, the PEs usually have dedicated registers that store intermediate results during computation
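
    For DSE purposes, such a hierarchy can be summarized as an ordered list of levels with capacities and per-access costs; a hedged sketch (field names and numbers are illustrative, not taken from the paper):

        from dataclasses import dataclass

        @dataclass
        class MemLevel:
            name: str          # e.g. "register", "SRAM", "DRAM"
            size_bytes: int    # capacity of this level
            energy_pj: float   # assumed energy per access, in picojoules
            latency_cyc: int   # assumed latency per access, in cycles

        # Illustrative three-level hierarchy (all numbers are placeholders).
        hierarchy = [
            MemLevel("register", 512, 0.1, 1),
            MemLevel("SRAM", 128 * 1024, 5.0, 4),
            MemLevel("DRAM", 4 * 1024**3, 200.0, 100),
        ]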

    Design parameters in dedicated neural network accelerators involve various hardware-level optimizations to improve performance, energy efficiency, and accuracy. These parameters include the size of on-chip memory, the number and type of processing units, the dataflow and scheduling of computations, and the precision of data representation. The design parameters can significantly affect the performance and power consumption of the accelerator, as well as the accuracy of the neural network inference. Therefore, exploring the design space and identifying the optimal set of design parameters is crucial for achieving high-performance and energy-efficient neural network accelerators.

    1.2 Spatial mapping and temporal mapping

    The mappings determine the approach for executing the computation of a neural network on a hardware accelerator, and can be classified into two distinct types.

    (1) Spatial mapping. The movement of operands across PE arrays in the spatial dimension is governed by spatial mappings. The selected mapping for a neural network computation on a hardware accelerator can greatly impact the level of data reuse per operand. As a result, the frequency of accessing storage outside the array can be reduced.

    (2) Temporal mapping. Temporal mapping determines the order in which multiply-accumulate (MAC) operations are executed within each PE for every neural network layer. This is achieved by using a set of nested for loops that operate on the operands of the MAC operations, which are distributed across the memory hierarchy. Each loop is mapped to a specific memory level, such as different static random-access memories (SRAMs). The order, size, and layer dimension of the nested loops define the temporal map, which plays a critical role in maximizing data reuse and minimizing memory accesses (see the tiling sketch after this list).
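
    To make this concrete, the sketch below splits one output-column loop of the convolution so that the inner tile streams from a small near-PE buffer while the outer loop works at a larger memory level; the tile size WO_TILE is an illustrative mapping parameter, not a value from the paper.

        # Sketch: temporal mapping as loop tiling. The wo loop is split so the
        # inner loop (wo_i) reuses data held in a small near-PE buffer, while
        # the outer loop (wo_o) fetches each tile from a larger memory level.
        WO, WO_TILE = 64, 8          # illustrative sizes
        buffer_loads = 0
        for wo_o in range(0, WO, WO_TILE):   # tile loop -> served by larger SRAM
            buffer_loads += 1                # one tile transfer per outer iteration
            for wo_i in range(WO_TILE):      # intra-tile loop -> near-PE buffer
                wo = wo_o + wo_i             # MAC operations for output column wo
        print(buffer_loads)                  # 8 tile transfers instead of 64 accesses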

    Both spatial and temporal mappings are crucial considerations for optimizing hardware and software design in terms of their impact on DNN accelerator performance. Ultimately, the best mapping strategies will depend on the specific requirements of the DNN being accelerated, as well as the characteristics of the hardware and software being utilized.

    The complete temporal and spatial mapping can be constructed based on the levels and characteristics of the memory hierarchy that is provided. However, given that there are potentially millions to billions of uniform or uneven mappings in the very wide mapping space, evaluating each mapping point to determine the best one would be time-consuming (note that this is only mapping one neural network layer on a fixed architecture). To address this issue, a neural network model based on transfer learning is utilized to predict the architecture parameter space for the next iteration, thus eliminating a significant amount of traversal work.

    1.3 Pareto front in multi-objective optimization

    The formal definition of the multi-objective optimization problem is as follows:

    $$\min_{x}\; F(x) = \big(f_1(x), f_2(x), \ldots, f_H(x)\big)$$
    $$\text{s.t.}\quad g_j(x) \le 0, \quad j = 1, 2, \ldots, J$$
    $$\qquad\;\; h_k(x) = 0, \quad k = 1, 2, \ldots, K$$

    In the above equations, the variable $x = (x_1, x_2, \ldots, x_N)$ denotes the decision vector, where $N$ denotes the number of independent variables $x_i$; $f_h(x)$ denotes the $h$-th objective function, $g_j(x)$ denotes an inequality constraint, $h_k(x)$ denotes an equality constraint, and $H$, $J$, and $K$ represent their respective quantities.

    When $x_1$ and $x_2$ meet the following requirements, $x_1$ is said to dominate $x_2$: for all objective functions, $x_1$ is not worse than $x_2$, and for at least one objective function, $x_1$ is strictly better than $x_2$.

    The Pareto optimal solution set is the set of all non-dominated solutions, and the Pareto front is the boundary of the Pareto optimal solution set in objective space. The Pareto front is a useful tool for analyzing and visualizing the trade-offs between different objectives in these problems.

    The concept of Pareto optimality is incorporated in this work to maintain a set of non-dominated solutions during the search process. The main objective is to generate a set of solutions that are not dominated by any other solution in the search space in terms of the multiple objective functions.

    In each iteration of the algorithm, a new solution is generated and evaluated based on one or more objective functions that measure its performance. If the solution is not dominated by any of the solutions in the current set, it is added to the set, and any solutions it dominates are removed from the set. The Pareto front, a set of solutions representing the optimal trade-offs between different objectives as shown in Fig.3, is used to guide the search process.

    Fig.3 The scatter plot of latency and energy under different design parameters

    It is possible that the search algorithm may not find a satisfactory result, meaning it may not converge to a single optimal solution. In such a scenario, the Pareto optimal solutions found during the search process can still be output. These solutions are considered non-dominated and represent the trade-offs between the different objectives. Even though the search algorithm did not converge to a single optimal solution, the set of Pareto optimal solutions provides useful information to users on the best trade-offs between different objectives (energy, latency, and area).
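
    A minimal Python sketch of the archive update described above, using the two objectives tracked in this work (energy and latency, lower is better for both):

        # Sketch: maintaining the non-dominated (Pareto) set during the search.
        # A point is (energy, latency); lower is better for both objectives.
        def dominates(a, b):
            """True if a is no worse than b in every objective and strictly
            better in at least one."""
            return (all(x <= y for x, y in zip(a, b))
                    and any(x < y for x, y in zip(a, b)))

        def update_archive(archive, candidate):
            if any(dominates(p, candidate) for p in archive):
                return archive                   # candidate is dominated: discard
            # keep only points the candidate does not dominate, then add it
            return [p for p in archive if not dominates(candidate, p)] + [candidate]

        archive = []
        for point in [(5.0, 9.0), (4.0, 10.0), (6.0, 8.0), (4.0, 8.0)]:
            archive = update_archive(archive, point)
        print(archive)   # [(4.0, 8.0)] -- the only non-dominated point here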

    1.4 DSE

    DSE is a process of searching for an optimal set of architecture parameters and mapping parameters for a given set of computation tasks, subject to certain constraints. The goal is to find a design that optimizes a specific objective function, such as minimizing latency, maximizing throughput, or minimizing power consumption.

    In this work, the input to the DSE process includes the architecture parameters, such as communication bandwidths, memory hierarchy, and interconnect topology, as well as the computation tasks, such as the data size, the number of operations, and the computational requirements of each task. The constraints may include resource limitations, power budgets, and timing requirements.

    The DSE process involves exploring the design space to find the best architecture and mapping parameters that satisfy the constraints and optimize the objective function. The design space is defined by the possible values of the design and mapping parameters, and can be very large and complex. The search can be performed by using various techniques, such as brute-force search, heuristic search, or optimization algorithms.

    The output of the DSE process is a set of architecture parameters and mapping parameters that satisfy the constraints and optimize the objective function. The architecture parameters determine the hardware resources and organization, such as the number and size of processing elements, RAM sizes, and configurations. The mapping parameters determine how the computation tasks are mapped onto the hardware resources, such as the allocation of data to memory, the assignment of processing elements to tasks, and the scheduling of data transfers between the memory and the processing elements.

    Let $D$ denote the design parameters, $M$ the mapping parameters, and $T$ the computational task. Let $E(D, M, T)$ denote the energy consumption of the accelerator with design parameters $D$ and mapping parameters $M$ when executing computational task $T$, and let $L(D, M, T)$ denote the corresponding latency.

    The goal of the design space exploration is to find the optimal set of design parameters and mapping parameters $(D, M)$ that minimize the energy consumption and latency of the accelerator, subject to any constraints on the design parameters and mapping parameters.

    This can be expressed as the following optimization problem:

    $$\min_{D \in \mathcal{D},\; M \in \mathcal{M}} \big(E(D, M, T),\; L(D, M, T)\big)$$

    where $\mathcal{D}$ and $\mathcal{M}$ are the sets of valid design parameters and mapping parameters, respectively.

    Solving this optimization problem requires exploring the design space by evaluating the energy consumption and latency for different combinations of design parameters and mapping parameters, and selecting the optimal set of parameters that satisfies the constraints and minimizes the objective function. This can be done by using various optimization techniques, such as simulated annealing (SA), heuristic search, or the proposed transfer learning-based approach.
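
    As a baseline picture of what any such search looks like, below is a hedged random-search sketch over the $(D, M)$ space; evaluate_design is a toy stand-in for the simulator, and the parameter grids and cost surface are illustrative, not the paper's:

        import random

        # Illustrative parameter grids, not the paper's actual design space.
        DESIGN_SPACE = {"pe_rows": [8, 16, 32], "sram_kb": [64, 128, 256]}
        MAPPING_SPACE = {"tile_wo": [4, 8, 16], "unroll_co": [2, 4, 8]}

        def evaluate_design(D, M):
            """Toy stand-in for the simulator: returns (energy, latency)."""
            e = 1000.0 / (D["pe_rows"] * M["unroll_co"]) + D["sram_kb"] * 0.5
            l = 500.0 / (M["tile_wo"] * D["pe_rows"])
            return e, l

        def random_search(n_iters=200, alpha=1.0):
            best, best_cost = None, float("inf")
            for _ in range(n_iters):
                D = {k: random.choice(v) for k, v in DESIGN_SPACE.items()}
                M = {k: random.choice(v) for k, v in MAPPING_SPACE.items()}
                e, l = evaluate_design(D, M)
                cost = e + alpha * l       # scalarized objective for the sketch
                if cost < best_cost:
                    best, best_cost = (D, M), cost
            return best, best_cost

        print(random_search())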

    2 Related work

    Design space exploration is a critical step in optimizing the performance and efficiency of hardware accelerators for deep neural networks. However, exploring the design space can be time-consuming and computationally expensive.

    Traditional methods, such as exhaustive search[13], random search[14], and SA[15], often require extensive search iterations to find the optimal design parameters and mapping parameters. This can result in long wait times and limited scalability, particularly when exploring complex design spaces.

    To address this challenge, researchers have proposed several methods to accelerate design space exploration. Refs[16,17] introduced a fast approach for micro-architectural design space exploration and later improved the robustness of design space modeling. Chen et al.[18] explored the microprocessor design space by using unlabeled design configurations. Additionally, there are also some works exploring the design space of field programmable gate array (FPGA)-based accelerators, such as the design space exploration of FPGA accelerators for convolutional neural networks proposed by Rahman et al.[19] and the exploration of FPGA-based accelerators with multi-level parallelism presented by Zhong et al.[20]. These methods for micro-architectural design space exploration may have limited applicability due to their focus on specific aspects of design optimization.

    Zig-Zag[21] designed a mapping search engine that uses heuristic search strategies to locate the optimal mapping points for energy and performance. Heuristics are problem-solving techniques that use practical methods to solve problems in a reasonable amount of time. In this case, the mapping search engine uses heuristics to guide the search towards more promising areas of the design space, rather than searching the entire space exhaustively.

    AIRCHITECT[22] described a novel approach to design space exploration that uses a multilayer perceptron (MLP) to learn the optimization task and predict optimal parameters for the custom architecture design and mapping space. This approach bypasses the iterative sampling of the design space using simulation or heuristic tools, which can be a costly process.

    ConfuciuX[7] leveraged the reinforcement learning (RL) method to guide the search process. The RL agent generates resource assignments as 'actions' that are evaluated by MAESTRO, a fast analytical model for DNN accelerators, and the environment outputs 'rewards' that are used to train the underlying policy network.

    3 Methodology

    3.1 Overview

    A novel approach based on transfer learning is proposed in this work to conduct design space exploration accurately and quickly, as shown in Fig.4. The optimal design parameters and mapping parameters are predicted by an MLP on the basis of previously learned experience. Compared with traditional methods, using an MLP greatly reduces the number of iterations and can be accelerated by GPU.

    Fig.4 Flow chart for design space exploration

    The design rationale for using transfer learning in this work is based on the principle that models trained on a particular task can be transferred to improve learning on a new, related task. By pre-training the MLP on a set of similar design problems, the learned knowledge can be used to guide the search process for a new problem.

    Furthermore, it is demonstrated that the proposed method is superior to existing approaches that use MLPs alone. MLP-based approaches require a large amount of data and are prone to overfitting, resulting in limited scalability and slower training times. In contrast, the transfer learning-based approach can use previously learned experience to train the model, resulting in faster and more efficient training.

    The algorithm is designed to provide optimal design and mapping parameters for a given processor architecture and computing task to achieve minimum power consumption and latency. It consists of four main phases: the parser phase, training phase, exploration phase, and evaluation phase.

    In the parser phase, the input processor architecture and computing task information are translated into specific cycle scales, memory hierarchy, latency, and power consumption of different cell modules for further processing.

    In the exploration phase, the algorithm generates candidate design parameter combinations to explore the design space. These combinations are either initialized at the start of the algorithm or updated based on the output of the neural network in the training phase. Once the candidate combinations are generated, the algorithm runs various mapping strategies in the simulator to calculate the latency and power consumption values for each combination.

    In the training phase, a neural network model is trained on the design and mapping parameters and their corresponding latency and power consumption. The neural network model is then used to make predictions, and the best candidates obtained are used as input for the next iteration of the exploration phase.

    In the evaluation phase, the obtained candidate points are evaluated to determine whether the latency and power consumption meet the requirements. If the requirements are met, the design and mapping parameters are output directly. Otherwise, the algorithm enters the training phase. This process iterates until the evaluation phase meets the exit criteria, as sketched below.
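
    A hedged sketch of the four-phase loop (the simulator and the transfer-learning predictor are replaced by toy stand-ins so the flow is runnable; all names and numbers are illustrative, not the paper's API):

        import random

        def simulate(design):                 # exploration: toy simulator
            energy = (design - 20) ** 2 + 50  # pretend cost surface
            latency = abs(design - 20) + 5
            return design, energy, latency

        def train_and_predict(history):       # training: toy "model" that
            best = min(history, key=lambda r: r[1] + r[2])
            return [best[0] + random.randint(-3, 3) for _ in range(4)]

        def dse_loop(targets=(60, 10), max_iters=50):
            candidates = [random.randint(0, 64) for _ in range(4)]  # init
            history = []
            for _ in range(max_iters):
                history += [simulate(d) for d in candidates]        # explore
                best = min(history, key=lambda r: r[1] + r[2])
                if best[1] <= targets[0] and best[2] <= targets[1]: # evaluate
                    return best
                candidates = train_and_predict(history)             # train
            return best           # fall back to the best point found so far

        print(dse_loop())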

    3.2 Parser phase

    In the initial phase of the algorithm, the input accelerator architecture parameters are processed, including information on the memory structure from DRAM to PE, the optional calculation modes, and the power consumption and latency of each component during the calculation process. Based on this information, a processor object is instantiated in the simulator and the required design parameters can be determined.

    Next, the layer information of the neural network is considered. This includes the sizes of the input and output matrices, the shape of the convolution kernel, stride, padding, dilation rates, and other relevant parameters. This information is then converted into a computational task in the form of a loop nest to facilitate later operation and evaluation.
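
    For instance, a layer record could be parsed into loop-nest bounds roughly as follows (a sketch with assumed field names, mirroring the Fig.1 symbols and the standard convolution output-size rule; this is not the paper's parser):

        # Sketch: parsing a conv-layer description into loop-nest bounds.
        def layer_to_loop_task(layer):
            def out_dim(i, k, s, p, d):
                # standard conv output-size formula
                return (i + 2 * p - d * (k - 1) - 1) // s + 1
            return {
                "CO": layer["out_channels"], "CI": layer["in_channels"],
                "HO": out_dim(layer["in_h"], layer["KH"], layer["stride"],
                              layer["pad"], layer["dilation"]),
                "WO": out_dim(layer["in_w"], layer["KW"], layer["stride"],
                              layer["pad"], layer["dilation"]),
                "KH": layer["KH"], "KW": layer["KW"],
            }

        # AlexNet conv1-like example: 3->96 channels, 11x11 kernel, stride 4.
        print(layer_to_loop_task({"in_channels": 3, "out_channels": 96,
                                  "in_h": 227, "in_w": 227, "KH": 11, "KW": 11,
                                  "stride": 4, "pad": 0, "dilation": 1}))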

    3.3 Exploration phase

    During the exploration phase, the accelerator parameters are initialized for the first run. This includes the size of each level of memory, the choice of computation mode, and other factors that depend on the given accelerator architecture.

    The initialization of the accelerator parameters can be performed using completely random initialization, or a pre-trained model can be used to estimate the most promising region of the architecture space based on the results of previous designs.

    If it is not the first run, the candidate design parameter points are obtained from the training phase, where the neural network has been trained on previous combinations of design and mapping parameters and their corresponding latency and power consumption results. The neural network is then used to predict the most advantageous design parameter combinations for the next iteration of the exploration phase.

    Different mapping strategies are run in the simulator using the initialized or candidate design parameters. Each mapping strategy generates a set of mapping parameter combinations and their corresponding latency and power consumption results. The mapping parameters are the parameters used to map the computation onto the hardware accelerator, such as the choice of loop unrolling factors, tiling factors, and parallelization factors.

    Once the mapping strategies are executed, the algorithm evaluates the results and selects the mapping parameter combination with the optimal latency and power consumption. This combination is then used as input for the training phase. The exploration phase continues until the exit criteria are met, such as achieving the desired latency and power consumption targets or reaching a maximum number of iterations.

    3.4 Training phase

    In order to predict the optimal performance and power consumption for the optimal mapping case in architectures with various design parameters, a transfer learning model is trained.

    The neural network model consists of seven fully connected layers. The input layer determines the width of the input according to the design parameters required by the accelerator architecture obtained in the parser stage, and the output layer outputs the latency and energy corresponding to the architecture parameters.

    Before the training process starts, the 1st, 2nd, 6th, 7th, and 8th layers are randomly initialized; after pruning and fine-tuning the weights of pre-trained models trained under other architectures, those weights are applied to the 3rd, 4th, and 5th layers, as shown in Fig.5.

    Fig.5 The transfer learning neural network structure consists of various layers that accept design parameters and task information as input and output corresponding energy and latency
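
    A hedged PyTorch sketch of this wiring (which layers are transferred follows the text above; the hidden width and block structure are placeholders, not values from the paper):

        import torch.nn as nn

        # Sketch: layers 3-5 reuse (pruned and fine-tuned) blocks from a
        # source model; the remaining layers are freshly initialized.
        def build_transfer_mlp(in_dim, pretrained_mid, hidden=128, out_dim=2):
            new = lambda i, o: nn.Sequential(nn.Linear(i, o), nn.ReLU())
            return nn.Sequential(
                new(in_dim, hidden),        # layer 1: sized to the design params
                new(hidden, hidden),        # layer 2: randomly initialized
                *pretrained_mid,            # layers 3-5: transferred from source
                new(hidden, hidden),        # layer 6: randomly initialized
                new(hidden, hidden),        # layer 7: randomly initialized
                nn.Linear(hidden, out_dim), # layer 8: outputs (energy, latency)
            )

        # pretrained_mid would be three hidden blocks taken from a source model:
        source_mid = [nn.Sequential(nn.Linear(128, 128), nn.ReLU())
                      for _ in range(3)]
        model = build_transfer_mlp(in_dim=10, pretrained_mid=source_mid)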

    Pruning and fine-tuning can improve the performance of the pre-trained model on the new task while reducing the amount of new data needed for training. By removing unimportant or redundant parameters through pruning, the model becomes more efficient and easier to train. Fine-tuning allows the pre-trained model to be adapted to the new task by adjusting its existing weights and biases to fit the new data, which can lead to better performance on the new task. A method similar to PAC-Net[23] is used.

    Suppose $f_{w_s}: X \to Y$ is a model trained on the source data set with weight vector $w_s$; the pre-trained model is fine-tuned by minimizing the standard negative log-likelihood.

    Then, for the weight vector $w = (w_1, w_2, \ldots, w_N)$, the mask $m = (m_1, m_2, \ldots, m_N)$ is used to remove smaller weights:

    $$m_i = \begin{cases} 1, & |w_i| \ge w_t \\ 0, & |w_i| < w_t \end{cases}$$

    where $w_t$ is a manually set threshold.

    Next, the remaining information is embedded in the unpruned vector, which is represented as $w_U = w \odot m$.

    Finally, the pruned vector $w_P = w \odot (1 - m)$ is adjusted.

    The target weight vector is obtained by adding $w_U$ and $w_P$, yielding the target model $f_{w_t}$.
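
    A minimal NumPy sketch of this split (the weight values and threshold are illustrative):

        import numpy as np

        # Sketch of the PAC-Net-style split: mask out small weights, keep the
        # unpruned part w_U as transferred source knowledge, and adapt the
        # pruned part w_P on the target task.
        w = np.array([0.8, -0.02, 0.5, 0.01, -0.9])
        w_t = 0.1                              # illustrative threshold
        m = (np.abs(w) >= w_t).astype(float)   # mask: 1 keeps, 0 prunes

        w_U = w * m                  # unpruned weights (kept fixed)
        w_P = w * (1 - m)            # pruned weights (fine-tuned on target)
        w_target = w_U + w_P         # recombined target weight vector
        print(m, w_U, w_P)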

    Since this is a multi-objective optimization, the loss function needs to balance the two objectives, e.g. as a weighted sum of the per-sample energy and latency errors:

    $$\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\Big[\big(en_i - \widehat{en}_i\big)^2 + \alpha\,\big(la_i - \widehat{la}_i\big)^2\Big]$$

    where $N$ represents the total number of samples, $en_i$ and $\widehat{en}_i$ represent the predicted and actual energy of the $i$-th sample, and $la_i$ and $\widehat{la}_i$ represent the predicted and actual latency of the $i$-th sample. Because the magnitudes of the two objectives differ, the weight $\alpha$ needs to be set to balance them.
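
    In code, the balanced loss can be written as a small PyTorch function (the value of $\alpha$ is an assumption, chosen only to illustrate rescaling one term onto the other's magnitude):

        import torch

        def balanced_loss(pred, target, alpha=1e-3):
            # pred/target columns: 0 = energy, 1 = latency
            en_err = (pred[:, 0] - target[:, 0]) ** 2   # energy error term
            la_err = (pred[:, 1] - target[:, 1]) ** 2   # latency error term
            return (en_err + alpha * la_err).mean()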

    According to the results of the exploration phase, the accelerator parameters and the optimal latency and power data of this iteration are substituted into the model for training, and the network weights of the transfer learning model are adjusted. The network is then used to predict the energy and latency of other points, so as to select the next batch of candidate parameters for the optimal solution. This iteration is repeated until the evaluation phase is successfully completed.

    3.5 Evaluation phase

    In this phase, it is determined whether the search process can be terminated by judging whether the output satisfies a predetermined threshold for latency and power consumption.

    If both energy and latency are below the targets,

    $$E(D, M, T) \le E_{\mathrm{target}} \quad \text{and} \quad L(D, M, T) \le L_{\mathrm{target}}$$

    the search can be ended immediately.

    However, this condition is difficult to satisfy directly during training, so it cannot be used by itself to evaluate the quality of intermediate results; a quality metric Q that compares the achieved energy and latency against the targets is used for comparison instead. If no result that simultaneously meets all objectives has been found, an upper limit on the number of iterations can be set to force the algorithm to stop and output the result with the highest Q achieved up to that iteration.

    4 Experiments

    4.1 Overview

    In this article, the experiments use Python, taking AlexNet, VGG, and other networks as benchmarks to explore the design parameters of Google TPU, Eyeriss, and Tesla prototype architectures; the design parameters are shown in Table 1. The running example decomposes each network into layer-by-layer convolution or vector operations. Taking AlexNet as an example, the main computing tasks include its different convolutional layers.

    Table 1 The design parameters

    4.2 Comparison with other methods

    To assess the effectiveness of this method, several other methods are selected and modified to enable a fair comparison. These methods are as follows.

    SA: SA is used as a baseline comparison method. SA iteratively explores the design space by making random modifications to the current architecture, accepting or rejecting changes based on a probability distribution that depends on the current temperature. The temperature decreases over time, becoming more restrictive and allowing the algorithm to converge towards a minimum. The initial temperature is set as $2 \times 10^{-8}$.
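
    The standard Metropolis acceptance rule this describes, as a one-function sketch:

        import math, random

        # Metropolis acceptance rule used by SA: always accept improvements;
        # accept a worse candidate with probability exp(-delta / T).
        def accept(cost_old, cost_new, T):
            if cost_new <= cost_old:
                return True
            return random.random() < math.exp(-(cost_new - cost_old) / T)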

    Deep reinforcement learning (DRL): the policy gradient approach is used in the DRL method, similar to ConfuciuX[7]. An actor network is used to update the policy network, with the current network parameters and configurations as states and modifications to the configurations as actions. Rewards are obtained when the current action approaches the states that satisfy the objectives, with a bonus added if the current state already satisfies the objectives. Hyperparameter tuning is performed to obtain relatively better results.

    Large multilayer perceptron (Large MLP): the results were tested by applying an MLP alone, similar to AIRCHITECT[22], and by applying the design selector to improve the results. AlexNet was run on the Google TPU architecture, and energy and latency were predicted based on the design parameters, with k-fold cross-validation used for training. Once training is complete, a new training task is performed on the Eyeriss architecture to run VGG16, and a randomly initialized network is compared with a network that has transferred the previous training results. As shown in Fig.6, the training efficiency of the transferred network is significantly improved.

    Fig.6 The training losses of the transfer learning model and the MLP model without transfer learning

    Heuristic method: similar to Zig-Zag[21], this is a DSE algorithm that iteratively searches for promising designs and refines them. The algorithm takes advantage of the fact that a design can often be decomposed into a set of sub-designs, each of which can be optimized independently. An optimal solution can eventually be converged upon by iteratively refining these sub-designs.

    The transfer learning method is evaluated by running different convolutional network layers ten times using the different methods, allowing a margin of error of 1% for the resulting latency and power measurements. This means that if the latency and power of the DSE output exceed the user's targets by no more than this margin, the targets are still considered satisfied.

    The starting points of the design space exploration are randomly initialized before the DSE algorithms start, and the different DSE algorithms run until they meet the conditions or the number of iterations reaches the upper limit. The total running time of 10 independent runs is then counted, and the average iterations and numbers of satisfied results are displayed in Table 2; the source model is trained on a Google-TPU-like architecture, with CONV 1-3 running on an Eyeriss-like architecture, CONV 4-6 on a Tesla-NPU-like architecture, and CONV 7-9 on a Meta-prototype-like architecture. Each method is run 10 times on the 9 cases, for a total of 90 runs, and the number of final outputs that satisfy the requirements is counted. The efficiency of the different algorithms is compared by measuring the total time to obtain a result under a given constraint against the SA baseline, as shown in Fig.7. The comparisons with the other DSE methods demonstrate the effectiveness of the transfer learning method.

    Fig.7 Comparison of runtimes of different DSE methods

    Table 2 Average iterations of different methods

    4.3 Multi-task learning

    An accelerator needs to handle multiple tasks, so the situation of exploring multiple tasks simultaneously needs to be considered. Three different scenarios are compared in this study.

    Repeat: the DSE framework can be run repeatedly for a variety of different inputs to obtain the optimal parameter configuration under each set of conditions. The biggest advantage of this method is that it does not require changing the framework or the input.

    Merge: this approach merges the computing tasks of multiple problems into a single problem, which increases the time per iteration but reduces the number of iterations required. This results in a significant improvement in operating efficiency compared with repeated runs. Although the lengths of the input parameters may differ, the solution is simply to expand the input width.

    Multi-task: to simultaneously evaluate the performance of multiple tasks, a multi-task model based on transfer learning is employed. To enable the model to process different tasks simultaneously, the transfer learning model is modified as shown in Fig.8. Specifically, an independent 1st layer is created for each input, with no interference between the 1st layers of different tasks. This approach addresses the issue of different numbers of input parameters across tasks. Layers 2-6 remain the same as described in Section 3, with the 2nd and 6th layers randomly initialized and the 3rd, 4th, and 5th layers transferred from the pre-trained model. The 7th and 8th layers are also divided into independent modules, each corresponding to the output of a different task.

    Fig.8 The structure of transfer learning model in different tasks
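
    Extending the earlier sketch, per-task input and output heads around a shared trunk might look like this (hedged; the hidden width and task count are placeholders, not values from the paper):

        import torch
        import torch.nn as nn

        # Sketch of the multi-task variant: one input head (layer 1) and one
        # output head (layers 7-8) per task, around a shared trunk (layers 2-6).
        class MultiTaskDSE(nn.Module):
            def __init__(self, in_dims, hidden=128, out_dim=2):
                super().__init__()
                self.heads_in = nn.ModuleList(
                    [nn.Sequential(nn.Linear(d, hidden), nn.ReLU())
                     for d in in_dims])
                self.trunk = nn.Sequential(                 # layers 2-6
                    *[nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
                      for _ in range(5)])
                self.heads_out = nn.ModuleList(             # layers 7-8
                    [nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, out_dim))
                     for _ in in_dims])

            def forward(self, xs):                          # one tensor per task
                return [self.heads_out[t](self.trunk(self.heads_in[t](x)))
                        for t, x in enumerate(xs)]

        model = MultiTaskDSE(in_dims=[10, 14])              # two tasks
        outs = model([torch.randn(4, 10), torch.randn(4, 14)])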

    The efficiency of the three proposed scenarios is evaluated by conducting multiple runs of the DSE algorithm with each scheme on different test sets, terminating the algorithm once the final results meet the performance goal. Ten independent runs are conducted for each scenario and the average performance is calculated, as illustrated in Fig.9. The results demonstrate that the multi-task model outperforms the repeat model and is marginally superior to the merge model. The multi-task model significantly improves efficiency and enables a more comprehensive analysis of the design space.

    Fig.9 Comparison of DSE times using different methods in multi-tasking scenarios

    5 Conclusion

    In this study, a new DSE method based on transfer learning is proposed for neural network accelerator design. A new approach is presented that can accurately predict energy and latency performance parameters using neural networks that learn from various accelerator architectures and tasks. Compared with other DSE tools, higher operating efficiency is achieved by this method. Furthermore, improved performance is achieved through the multi-task learning model, which allows simultaneous evaluation of multiple neural network workloads in design space exploration. The framework is not limited to the simplified model used in this study; it can also be applied to more complex models, such as cycle-accurate electronic system level (ESL) simulators, to explore design spaces. Future work will focus on extending the scope of the framework to various simulators.
