
    An Improved Graphics Processing Unit Acceleration Approach for Three-Dimensional Structural Topology Optimization Using the Element-Free Galerkin Method

2021-11-08 08:07:18

Haishan Lu, Shuguang Gong, Jianping Zhang, Guilan Xie and Shuohui Yin

School of Mechanical Engineering, Xiangtan University, Xiangtan, 411105, China

ABSTRACT We proposed an improved graphics processing unit (GPU) acceleration approach for three-dimensional structural topology optimization using the element-free Galerkin (EFG) method. This method can effectively eliminate the race condition under parallelization. We established a structural topology optimization model by combining the EFG method and the solid isotropic microstructures with penalization model. We explored the GPU parallel algorithms for assembling the stiffness matrix, solving the discrete equations, analyzing sensitivity, and updating design variables in detail. We also proposed a node pair-wise method for assembling the stiffness matrix and a node-wise method for sensitivity analysis to eliminate race conditions during parallelization. Furthermore, we investigated the effects of the thread block size, the number of degrees of freedom, and the convergence error of the preconditioned conjugate gradient (PCG) method on GPU computing performance. Finally, the results of three numerical examples demonstrated the validity of the proposed approach and showed a significant acceleration of structural topology optimization. To reduce the cost of the optimization calculation, we recommended appropriate values for the thread block size and the convergence error of the PCG method.

    KEYWORDS Topology optimization; EFG method; GPU acceleration; race condition; preconditioned conjugate gradient

    1 Introduction

Topology optimization, as a powerful tool at the early stages of the design process, aims to find the optimal layout of material that fulfills the design requirements for conceptual designs [1]. Until now, topology optimization has been applied successfully to a wide range of problems, such as compliant mechanisms [2], heat conduction systems [3], material microstructures [4], and additive manufacturing [5]. These applications have displayed the advantages of the topology optimization technique. Numerical instabilities, such as checkerboards and mesh dependencies, can occur in applications of topology optimization with mesh-based methods [6]. Furthermore, for mesh-based methods, mesh distortion or entanglement always appears in large-deformation or moving-boundary problems [7].

To overcome these difficulties related to the mesh-based method, a meshless method, which relies solely on a set of scattered nodes and eliminates the grid [8], has been introduced into topology optimization problems. Cho et al. [9] developed a variable density approach based on the reproducing kernel method for the topology optimization of geometrically nonlinear structures. Lin et al. [10] presented a topology optimization method using the smoothed particle hydrodynamics (SPH) method. In particular, the element-free Galerkin (EFG) method, with its good convergence, computing accuracy, and stability [11], has gained popularity in topology optimization. Gong et al. [12] presented a structural modal topology optimization method for continuum structures using the EFG method. He et al. [13] applied the EFG method to the design of compliant mechanisms involving geometrical nonlinearity. Shobeiri et al. [14,15] carried out the topology optimization of continuum structures and cracked structures using the evolutionary structural optimization method integrated with the EFG method. More recently, Zhang et al. [16] introduced a meshless-based topology optimization coupled with the finite element method (FEM) for the large displacement problem of nonlinear hyperelastic structures. Khan et al. [17] proposed a combination of the EFG method and the level set method (LSM) for topology optimization. This combination worked independently of the initial guessed topology and could automatically nucleate holes in the design. Zhang et al. [18] presented a numerical method of topology optimization for isotropic and anisotropic thermal structures by combining the EFG method and the rational approximation of material properties model.

The aforementioned studies have shown the advantages of the meshless method for topology optimization, namely, that it can avoid checkerboard patterns and mesh distortion. Because the computational cost of meshless methods is much higher than that of traditional mesh-based methods, these studies have been limited to small-scale and two-dimensional topology optimization problems. To improve the computational efficiency, Metsis et al. [19] proposed a novel approach employing domain decomposition techniques on physical as well as algebraic domains. Trask et al. [20] proposed a fast multigrid preconditioner in the generalized minimal residual (GMRES) iterative solver for fluid flow problems discretized by the SPH method. Singh et al. [21] developed a preconditioned biconjugate gradient stabilized solver for the meshless local Petrov-Galerkin method applied to heat conduction in three-dimensional (3D) complex geometry. For solving more complex and large-scale problems, parallel computing has been applied to the meshless method [22,23].

In recent years, with the prevalence of the Compute Unified Device Architecture (CUDA) released by NVIDIA, the graphics processing unit (GPU), with its massively parallel architecture, has been applied widely in computational mechanics [24-27]. The meshless method is computationally intensive and, because of the absence of a mesh, well suited for parallelization on GPU systems. Karatarakis et al. [28] proposed a node pair-wise algorithm for stiffness matrix assembly with GPU parallel computing. Dong et al. [29] presented a parallel computing strategy for the material point method (MPM) with multiple GPUs for large-scale problems, with a maximum speedup of 1280 using 16 GPUs. Frissane et al. [30] simulated a blunt projectile penetrating a thin steel plate by combining the 3D SPH method and the GPU. Chen et al. [31] applied GPU acceleration techniques to simulate large-scale 3D violent free-surface flow problems using the moving particle semi-implicit method. Afrasiabi et al. [32] identified the parameters of a friction model in metal machining using GPU-accelerated meshless simulations. In terms of topology optimization, Challis et al. [33] carried out GPU parallel computing of topology optimization with the LSM and FEM. They found that the GPU is utilized more effectively at larger problem sizes. To solve large FE models in topology optimization, Martinez-Frutos et al. [34] presented a parallelization method using multi-GPU systems for robust topology optimization and proposed a well-suited strategy for large-scale 3D topology optimization problems [35,36]. Ramirez-Gil et al. [37] designed 3D electrothermomechanical actuators using the GPU as a coprocessor for the most intensive and intrinsically parallel tasks. Xia et al. [38] proposed a new level-set-based topology optimization method using a parallel strategy of GPU and isogeometric analysis.

According to the literature, if GPU parallel acceleration is combined with the EFG method and applied to topology optimization, the advantages of both can be fully exploited. Therefore, to minimize the computational cost of topology optimization, we present an entire GPU parallel computing procedure for topology optimization using the EFG method. We explore the parallel algorithms for assembling the stiffness matrix, solving the discrete equations, analyzing sensitivity, and updating the design variables in detail and provide the flow chart of the GPU parallelization. We also propose a node pair-wise method for assembling the stiffness matrix and a node-wise method for sensitivity analysis to eliminate race conditions during the parallelization. Then, we verify the proposed approach and evaluate it through three numerical examples. Finally, we discuss the influence of the thread block size, the number of degrees of freedom (DOFs), and the preconditioned conjugate gradient (PCG) convergence error on the computational efficiency of the GPU parallel algorithm.

    2 Review of the EFG Method

In the EFG method, the moving least squares (MLS) scheme is used to construct the shape function using only the nodes scattered in the domain. The local approximation of the unknown function u(x) can be written as follows:

where p^T(x) is a complete polynomial basis of order m. For 3D problems, the linear basis p^T(x) is given by the following:

The unknown coefficient vector a(x) is determined at any point x by minimizing the weighted functional J(x), defined as follows:

where u_i is the nodal parameter at node x_i, and the weight function w(x - x_i) used in this work is the cubic spline weight, given as follows:

    with

where d_mi = d_max * c_i is the size of the domain of influence of node i, d_max is the scaling parameter of the domain of influence, and c_i is the distance to the nearest neighboring node.
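The cubic spline expression itself is not reproduced above, so the following is a minimal serial sketch of the standard cubic spline weight used in the EFG literature; the normalization r = ||x - x_i|| / d_mi is an assumption consistent with the definition of d_mi given here.

```python
def cubic_spline_weight(r):
    """Standard EFG cubic spline weight w(r), with r = ||x - x_i|| / d_mi
    the distance normalized by the domain-of-influence size (assumed form)."""
    if r <= 0.5:
        return 2.0/3.0 - 4.0*r**2 + 4.0*r**3
    elif r <= 1.0:
        return 4.0/3.0 - 4.0*r + 4.0*r**2 - (4.0/3.0)*r**3
    else:
        return 0.0  # node has no influence outside its domain
```

The weight is C^2-continuous, equals 2/3 at the node itself, and vanishes smoothly at the edge of the domain of influence, which is what gives the MLS approximation its compact support.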

Setting the derivative ∂J/∂a to zero leads to the following relation:

By substituting Eq. (6) into Eq. (1), the MLS approximant can be defined as follows:

where Φ^T(x) is the MLS shape function, given by the following:

where Φ_i(x), the shape function at node i, is defined as follows:

Because the shape function lacks the Kronecker delta property, the essential boundary conditions cannot be imposed in the same way as in FEM. In the present work, we used the penalty method to enforce the essential boundary conditions. The discrete algebraic equation of 3D elasticity problems can be written as follows:

    with

where B_i and Φ_i are given by

    3 Topology Optimization Based on the EFG Method

    3.1 Topology Optimization Model

We selected the nodal relative density parameter ρ_i as the design variable in the topology optimization model based on the EFG method, and the density at any point can be obtained from the nodal values as follows:

According to the solid isotropic microstructures with penalization (SIMP) model, the relationship between the nodal relative density parameter ρ_i and Young's modulus E can be defined as follows:

where p is a penalization factor, and E_0 represents the initial full-solid material property.
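The SIMP interpolation is a one-liner; the sketch below uses E_0 = 200 GPa from the numerical experiments in Section 6, while the penalization factor p = 3 and the lower bound ρ_min = 1e-3 are typical values assumed here, not values stated by the paper.

```python
def simp_youngs_modulus(rho, E0=200e9, p=3.0, rho_min=1e-3):
    """SIMP interpolation E = rho^p * E0.

    rho_min (assumed value) bounds the density from below to avoid the
    numerical singularity mentioned for the compliance problem."""
    rho = max(rho, rho_min)
    return (rho ** p) * E0
```

With p > 1, intermediate densities are penalized: a half-dense node contributes only 12.5% of the solid stiffness at p = 3, which pushes the optimizer toward 0/1 designs.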

    The structural compliance minimization problem can be mathematically defined by the following:

where c is the compliance of the structure, V is the volume of material after topology optimization, V_0 is the initial volume of material, f is the volume fraction, and ρ_min is a lower bound on the density to avoid numerical singularity. Substituting Eq. (17) into Eq. (11), the stiffness matrix can be formulated as follows:

    The material volume of the design domain is calculated by the following:

It can be seen from Eq. (19) that many entries of the stiffness matrix approach zero as the void regions expand during topology optimization. As a result, the condition number of the global stiffness matrix tends to degrade as the optimization problem converges to the optimal solution.

3.2 Optimality Criteria Method

The optimality criteria (OC) method is used to update the design variables. The updating scheme for the design variables can be formulated as follows:

where ξ is a positive move limit, η is a numerical damping coefficient, and B_i can be found from the optimality condition as follows:

where λ is a Lagrange multiplier that can be obtained by a bisection algorithm.
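The OC update with bisection on λ can be sketched in a few lines. Since the update formulas are not reproduced above, this follows the widely used form B_i = -(∂c/∂ρ_i)/(λ ∂V/∂ρ_i) with move limit ξ and damping η, all of which are assumptions consistent with, but not quoted from, the paper; the default move = 0.2 and η = 0.5 are conventional values.

```python
def oc_update(rho, dc, dv, vol_frac, move=0.2, eta=0.5, rho_min=1e-3):
    """Optimality-criteria update (serial sketch, assumed standard form).

    rho: current design variables; dc: compliance sensitivities (negative);
    dv: volume sensitivities; vol_frac: target volume fraction f.
    Bisection on the Lagrange multiplier enforces the volume constraint."""
    l1, l2 = 1e-9, 1e9
    n = len(rho)
    while (l2 - l1) / (l1 + l2) > 1e-4:
        lmid = 0.5 * (l1 + l2)
        new = []
        for r, c, v in zip(rho, dc, dv):
            B = -c / (lmid * v)                      # optimality ratio
            cand = r * B ** eta                      # damped update
            cand = min(r + move, min(1.0, cand))     # upper move limit
            cand = max(r - move, max(rho_min, cand)) # lower move limit
            new.append(cand)
        if sum(new) / n > vol_frac:
            l1 = lmid   # too much material: increase lambda
        else:
            l2 = lmid
    return new
```

Each design variable is clipped independently within its move limits, which is exactly why this stage parallelizes well on the GPU, as discussed in Section 5.4.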

    The sensitivity of the objective function and the volume constraint function can be expressed as follows:

    where

    3.3 Topology Optimization Procedure

The detailed procedure of topology optimization with the EFG method for the minimum compliance problem is illustrated in Fig. 1. The main steps in the loop are assembling the stiffness matrix, solving the discrete equations, analyzing sensitivity, and updating the design variables. Because local data (e.g., the shape function and its derivatives, the load vector, and the initial volume of material in the domain) are used a large number of times, all calculations of these data are performed in the initial calculation. The amount of storage required for these data is small, so storing them temporarily is not an issue.

    4 GPU Parallel Programming

    4.1 CUDA

CUDA is a general-purpose parallel computing platform and programming model. In the CUDA context, a thread is the smallest unit of execution of kernel functions, which are defined on the GPU. As shown in Fig. 2, a large number of threads are executed concurrently on the GPU. All threads generated by a kernel are defined as a grid, a certain number of threads in a grid are organized as a thread block, and a grid consists of a number of blocks. Similar to the thread organization, the GPU has a variety of different memories. Global memory can be accessed by all threads on the GPU, but it has noticeable latency. Constant memory can be read quickly by all threads but cannot be written to. Shared memory can store input and output data for all threads in a block and is almost as fast as registers. Registers are allocated to individual threads, meaning that each thread can access only its own registers. This hierarchy of threads and memory makes it easier for programmers to develop high-performance parallel programs.

Figure 1: Topology optimization procedure

Figure 2: Hierarchy of CUDA threads and memory

    4.2 The Reduction Summation Method

In the GPU parallel algorithm, it is necessary to calculate a summation of data over all threads in a thread block. Because a single-thread accumulator would lead to poor performance on a GPU, we adopted a parallel reduction summation strategy in this study. As shown in Fig. 3, b/2 threads (threads 1 to b/2) executed a pairwise summation during the first step. Analogously, each subsequent step halved the number of partial sums. Ultimately, the sum of b values was obtained after log2(b) steps. Clearly, the number of threads b should be a power of 2.
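The stride-halving scheme above can be mimicked serially; the following Python sketch reproduces the loop that the b/2, b/4, ..., 1 active threads would execute, where the inner loop over i is what actually runs concurrently on the GPU.

```python
def reduction_sum(vals):
    """Pairwise (tree) reduction over a 'thread block' of b values.

    Mirrors the log2(b)-step scheme: at each step, the first `stride`
    threads add the value of their partner `stride` positions away."""
    b = len(vals)
    assert b & (b - 1) == 0, "number of 'threads' must be a power of 2"
    vals = list(vals)
    stride = b // 2
    while stride > 0:
        for i in range(stride):      # these b/2, b/4, ... adds are parallel on a GPU
            vals[i] += vals[i + stride]
        stride //= 2
    return vals[0]                   # partial sums collapse into element 0
```

Because every step writes to a distinct half of the array, no two "threads" ever update the same slot in the same step, which is why this pattern is race-free without atomics.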

Figure 3: The reduction summation method

    4.3 The Race Conditions

As shown in Fig. 4, the same memory location can receive conflicting updates when two or more threads attempt to access the same memory unit (e.g., the units marked in red) concurrently. This is an example of racing among CUDA threads, which can lead to an uncertain outcome. For example, parallelizing the Gauss point-wise approach for the assembly of the stiffness matrix might produce such an uncertain outcome. These race conditions can be avoided with proper algorithms that make each thread write data only to its own memory unit. In addition, atomic operations can avert race conditions by serializing all updates. In a massively parallel system, however, thousands of threads may be working simultaneously, and such serialization can hamper performance. Therefore, atomic operations are suitable only for a small number of parallel threads.

Figure 4: The race conditions

    5 GPU Parallel Algorithm

    5.1 Assembling Stiffness Matrix

Figure 5: The Gauss point-wise and node pair-wise methods for assembling the stiffness matrix

The amenability to parallelism of the node pair-wise method makes it well suited for GPU parallel computation. As shown in Fig. 6, the GPU parallel algorithm of the node pair-wise method had two levels of parallelism. A thread block handled one node pair, related to a specific submatrix, and each thread in the thread block was assigned to a Gauss point of the shared domains of influence of the node pair. When all threads of a thread block completed their calculation, the corresponding 3×3 subblock was obtained by the reduction summation method. Note that the number of threads in a thread block should be a power of 2. Finally, the first thread in each thread block wrote the submatrix to the corresponding location in global memory.
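The key property of the node pair-wise scheme is that each (i, j) pair owns its own 3×3 subblock, so no two "thread blocks" ever write the same location. The serial sketch below illustrates that gather structure; the `contribution(i, j, g)` callback is a hypothetical placeholder for the per-Gauss-point integrand (in the EFG setting, B_i^T D B_j times the quadrature weight), which is not reproduced in the text.

```python
import numpy as np

def assemble_node_pair_wise(node_pairs, shared_gauss_pts, contribution):
    """Serial sketch of node pair-wise stiffness assembly.

    node_pairs: list of (i, j) interacting node pairs.
    shared_gauss_pts: dict mapping (i, j) -> Gauss points in the shared
        domains of influence.
    contribution(i, j, g): 3x3 Gauss-point contribution (assumed callback).
    """
    K = {}
    for (i, j) in node_pairs:                  # one thread block per node pair
        sub = np.zeros((3, 3))
        for g in shared_gauss_pts[(i, j)]:     # one thread per Gauss point
            sub += contribution(i, j, g)       # reduction-summed on the GPU
        K[(i, j)] = sub                        # single owner: no race condition
    return K
```

Contrast this with the Gauss point-wise scatter, where one Gauss point updates many subblocks and two Gauss points can target the same (i, j) entry concurrently, forcing atomics.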

Figure 6: The GPU parallel algorithm for assembling the stiffness matrix

    5.2 Solving Discrete Equations

In the process of topology optimization, the discrete equations formed by the EFG method need to be solved repeatedly to obtain the structural response. For large-scale problems, in addition to the huge computation time, the direct method is expensive in terms of storage, because many zero entries become nonzero during the factorization process. Therefore, an iterative method is usually employed in the EFG method. The linear equations in topology optimization, however, are extremely ill-conditioned because of the highly heterogeneous material distribution. To improve the rate of convergence, a preconditioning technique is necessary. In this study, we used the PCG method with the Jacobi preconditioner J to solve the discrete equations. The GPU algorithm of the PCG method using the CUBLAS and CUSPARSE libraries is listed as Algorithm 1.

Algorithm 1: PCG with Jacobi preconditioner using CUBLAS/CUSPARSE subroutines
Data: K̄, F, U0, Tol, Itmax, J;  Result: U
1  k = 0
2  K̄u = K̄ U0                 → cusparseDcsrmv
3  r0 = F - K̄u                → cublasDaxpy
4  z0 = J r0                   → cusparseDcsrmv
5  p0 = z0                     → cublasDcopy
6  rr2 = r0^T z0, ff = F^T F   → cublasDdot
7  for k = 0, 1, ..., Itmax
8      rr1 = rr2
9      if sqrt(rr1/ff) ≤ Tol, break
10     K̄p = K̄ pk              → cusparseDcsrmv
11     αk = rr1 / (pk^T K̄p)   → cublasDdot
12     Uk+1 = Uk + αk pk        → cublasDaxpy
13     rk+1 = rk - αk K̄p       → cublasDaxpy
14     zk+1 = J rk+1            → cusparseDcsrmv
15     rr2 = rk+1^T zk+1        → cublasDdot
16     βk = rr2 / rr1
17     pk+1 = zk+1 + βk pk      → cublasDaxpy
18 end
19 U = Uk

The convergence error Tol in the PCG method had a remarkable effect on the solution of the linear equations. More accurate results could be obtained with a stricter error value. Nevertheless, the stricter the convergence error, the more PCG iterations are needed at each optimization iteration, which increases the computing time. On the other hand, topology optimization is often used as a preprocessing stage to find the optimal material layout that fulfills the design requirements, and the resulting topology is then used as the initial guess for further detailed design. If there is a slight change in the density values, the resulting input to the detailed design will not change significantly. Therefore, if the PCG method with a loose error can solve the equations without affecting the topological results, the time to solve the linear equations can be reduced significantly.
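Algorithm 1 can be mirrored serially in a few lines. The NumPy sketch below follows the same steps and the same stopping criterion sqrt(r^T z / F^T F) ≤ Tol, replacing the cuBLAS/cuSPARSE calls with dense operations purely for illustration; a real implementation would keep K̄ in CSR format as the paper does.

```python
import numpy as np

def pcg_jacobi(K, F, U0, tol=1e-5, itmax=1000):
    """Serial sketch of Algorithm 1: PCG with Jacobi preconditioner.

    The preconditioner applies diag(K)^-1, and convergence is tested
    against the relative criterion sqrt(r^T z / F^T F) <= tol."""
    Jinv = 1.0 / np.diag(K)       # Jacobi preconditioner (diagonal inverse)
    U = U0.copy()
    r = F - K @ U                 # initial residual
    z = Jinv * r
    p = z.copy()
    rr2 = r @ z
    ff = F @ F
    for _ in range(itmax):
        rr1 = rr2
        if np.sqrt(rr1 / ff) <= tol:
            break
        Kp = K @ p
        alpha = rr1 / (p @ Kp)    # step length
        U = U + alpha * p
        r = r - alpha * Kp
        z = Jinv * r              # re-precondition the residual
        rr2 = r @ z
        p = z + (rr2 / rr1) * p   # new search direction
    return U
```

For an SPD system the iteration converges in at most n steps in exact arithmetic; in practice, the Jacobi preconditioner mainly compensates for the strongly varying diagonal produced by the SIMP density field.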

    5.3 Analyzing Sensitivity

As shown in Fig. 7, the sensitivity of the objective function to the design variable ρ_i is the sum of the contributions of all Gauss points (red crosses) in the influence domain of node i. Similar to the assembly of the stiffness matrix, the objective sensitivity of Eq. (23) is usually calculated by looping over all Gauss points and accumulating their contributions. The parallelization of this Gauss point-wise method, however, also suffers from race conditions, in which multiple threads may access the same memory unit at the same time, as shown by the red arrow in Fig. 7. We propose herein a node-wise method to eliminate the race conditions in the calculation of the objective sensitivity. By substituting Eq. (25) into Eq. (23), the sensitivity of the objective function can be expressed as follows:

Figure 7: The Gauss point-wise and node-wise methods for analyzing sensitivity

    where

According to Eq. (26), the computation of the objective sensitivity for each node is split into two steps. In the first step, we calculated the Q values for each Gauss point and stored them for the calculation of the objective sensitivity in the next stage. The storage required for all Q values is small, so storing them temporarily is not an issue. In the second step, we computed the objective sensitivity value corresponding to each node. For node j, the objective sensitivity value ∂c/∂ρ_j was calculated over all influencing Gauss points (green crosses) and summed to form the final value of the corresponding objective sensitivity, as shown schematically in Fig. 7. Both steps of the presented node-wise method can be parallelized without race conditions, which makes it well suited for GPU parallel systems. Moreover, compared with the Gauss point-wise method, the node-wise method does not increase the amount of computation.

In the first step, we calculated the Q values for all influenced node pairs of every Gauss point, as described in Eq. (27). The two levels of parallelism were as follows: the major level over the Gauss points and the minor level over the influenced node pairs. This is shown schematically in Fig. 8. We assigned a Gauss point to each thread block (with a thread count that is a power of 2), and each thread handled one influenced node pair at a time. Because each Gauss point corresponded to one thread block, all Q values of the Gauss points were stored in their respective global memory units.

Figure 8: The GPU parallel algorithm for computing the Q values

The GPU parallel algorithm for the calculation of the final value of the objective sensitivity is shown in Fig. 9. We adopted a mode of parallelization similar to the aforementioned strategy, in which each thread block (the number of threads in a thread block should be a power of 2) processed one node at a time, and each influencing Gauss point was assigned to one thread. After all influencing Gauss points were processed, the objective sensitivity values of the threads in the block were summed by a reduction summation into the corresponding memory. Because each objective sensitivity value had its own memory unit in global memory, no conflicts occurred at this stage.

Figure 9: The GPU parallel algorithm for computing the final value of the objective sensitivity

    5.4 Updating Design Variables

We split the update of the design variables for each node into three stages: calculating the relative density parameter of the nodes, the relative density of the Gauss points, and the volume of material. Because each design variable is updated independently of the other design variables according to Eq. (21), the first stage has good parallelism. The GPU parallel strategy of the first stage is illustrated in Fig. 10. To adapt to the hierarchy of the CUDA architecture, we organized all nodes into a number of node groups. Each thread block processed a node group, and each thread in the thread block handled a node in the group. After all nodes were processed, the design variable value related to each thread was stored in the corresponding storage unit.

Figure 10: The GPU parallel strategy for calculating the relative density parameter of nodes

The second stage had two levels of parallelism: the major level over the Gauss points and the minor level over the influencing nodes. As shown in Fig. 11, each Gauss point was assigned to a thread block, and each thread handled one affected node. Finally, the reduction sum of the values of all threads in each thread block gave the relative density of each Gauss point. Obviously, this stage does not involve any race conditions.

Figure 11: The GPU parallel strategy for calculating the relative density of Gauss points

In the third stage, we adopted a mode of parallelization similar to that of the first stage, as illustrated in Fig. 12. Each thread block dealt with a Gauss point group, and each Gauss point was assigned to a thread. Because the number of Gauss point groups was much smaller than the number of Gauss points, we used atomic operations to avoid race conditions when the first thread in each thread block wrote the reduction summation to global memory.

Figure 12: The GPU parallel strategy for calculating the volume of material

    5.5 Parallel Flowchart

The GPU parallel flowchart of topology optimization using the EFG method is shown in Fig. 13. The initial calculation, executed just once in the flowchart, was performed on the CPU. The main steps that had to be repeated in the topology optimization process, namely, assembling the stiffness matrix, solving the discrete equations, analyzing sensitivity, and updating the design variables, were parallelized on the GPU to reduce the computation time.

Figure 13: The GPU parallel flowchart of topology optimization using the EFG method

    6 Numerical Experiments and Discussions

We evaluated the performance of the proposed GPU parallel algorithm of topology optimization using three numerical examples: the cantilever beam, the quarter annulus, and the cube. The design domain was discretized with a set of meshless field nodes, and a number of virtual background cells were used only for the numerical integration. We used the 2×2×2 Gauss quadrature rule for each virtual background cell. The Young's modulus for the full-solid region was E_0 = 200 GPa, and the Poisson's ratio for all material inside the design domain was ν = 0.3. The scaling factor of the domain of influence was 2.0. The numerical experiments were run on the following hardware: the CPU was an Intel Core i7-8750H with six physical cores (12 logical cores) at 2.20 GHz, the RAM was 16 GB DDR4, and the GPU was an NVIDIA GeForce RTX 2070 with 2304 CUDA cores and 8 GB GDDR6 memory. The compiler for the CPU code, written in C, was Microsoft Visual Studio 2012, and the compiler for the GPU code was NVIDIA CUDA 9.2. To ensure the accuracy of the calculation, we performed all floating-point calculations in double precision. To compare efficiency between the CPU and GPU codes, we defined the speedup ratio as the running time of the CPU serial code divided by the running time of the GPU parallel code.

    6.1 Numerical Examples

    (1) Example I:Cantilever Beam

Fig. 14 displays the design domain of a 3D cantilever beam structure. The model size was 100 mm × 50 mm × 4 mm. The left face of the domain was fixed. We applied a vertically downward force F = 1000 N at the center of the right face. The allowed material usage was limited to 40%. The design domain of the cantilever beam structure was discretized with 16,605 nodes, and the total number of DOFs was 49,950. The iteration process and staged results of the topology optimization with the CPU and GPU programs are shown in Fig. 15.

Figure 15: Iteration history and staged results of the cantilever beam topology optimization with the CPU and GPU programs

    (2) Example II:Quarter Annulus

Fig. 16 displays the design domain of a 3D quarter annulus structure and the model size (unit: mm). The bottom end of the domain was fixed. As shown in Fig. 16, a horizontal force F = 1000 N was applied toward the left. The allowed material usage was limited to 50%. The design domain was discretized with 14,840 nodes, and the total number of DOFs was 44,520. The iteration process and staged results of the topology optimization with the CPU and GPU programs are shown in Fig. 17.

Figure 16: Design domain of the quarter annulus

Figure 17: Iteration history and staged results of the quarter annulus topology optimization with the CPU and GPU programs

    (3) Example III:Cube

Fig. 18 displays the design domain of a cube structure. The edge length of the cube was 200 mm. The model was fixed at the gray zones of its lower face, which indicates that the displacements at these surfaces were fixed. We applied a concentrated force F = 10,000 N at the central point of the upper face. The allowed material usage was limited to 5%. The design domain was discretized with 28,830 nodes, and the total number of DOFs was 86,490. The iteration process and staged results of the topology optimization with the CPU and GPU programs are shown in Fig. 19.

Figure 18: Design domain of the cube

Figure 19: Iteration history and staged results of the cube topology optimization with the CPU and GPU programs

Figs. 15, 17, and 19 show that the staged results and iteration history of topology optimization obtained using the GPU program proposed in this study are completely consistent with those of the CPU program. This finding further verifies the feasibility of the proposed GPU acceleration algorithm and strategy.

    6.2 Analysis of Speedup Characteristic

    (1) The thread block size

Fig. 20 shows the relationship between the thread block size and the speedup ratio at different stages of the three examples.

Figure 20: Speedup ratios under different thread block sizes. (a) Example I; (b) Example II; (c) Example III

Fig. 20 shows that the speedup ratios first increase and then decrease with increasing thread block size. In the GPU algorithm, a larger thread block size is not necessarily better. Under the calculation conditions in this study, the best choice of thread block size was 32 for the stiffness matrix assembly and density update, and 64 for the sensitivity analysis. This finding also indicates that the GPU algorithm proposed in this study effectively reduces the thread block size, thereby reducing the occupation of computing resources.

    (2) The computing time for different stages

    When the PCG error was defined as 1e-5 and the thread block sizes with the best values were chosen, the total computational time for all OC iterations was as given in Tab.1.

Table 1: Total computing time and speedup ratios of the different stages

Tab. 1 shows that assembling the stiffness matrix, solving the equations, and the sensitivity analysis took up most of the time of the CPU serial program. Therefore, the computing efficiency of topology optimization was effectively improved by accelerating these three parts on the GPU. In addition, although the density update was not time-consuming, it also had to be computed on the GPU to avoid repeated transmission of large amounts of data between the CPU and GPU. Meanwhile, although the structures of the three numerical examples were different, the speedup ratios of the equation solving and of the overall optimization calculation increased with the number of DOFs, as shown in Tab. 1, which demonstrates the high efficiency of the proposed GPU parallel approach.

    (3) The number of DOFs

When the PCG error was set to 1e-5 and the best thread block sizes were chosen, the total computing times under different numbers of DOFs for Example I were as given in Tab. 2. The global speedup ratios under different numbers of DOFs for Example I are shown in Fig. 21. Because of the limitation of computer resources, the maximum total number of DOFs reached only 110,880.

Table 2: Total computing time under different numbers of DOFs for Example I

Fig. 21 shows that, for the same model, the speedup ratio increases with the number of DOFs until the model scale exceeds the maximum parallel capacity of the GPU. The GPU is well suited for solving computation-intensive problems; however, the rate of increase of the speedup ratio diminished gradually, owing to the limited computing resources of the GPU.

Figure 21: Global speedup ratios under different numbers of DOFs for Example I

Simultaneously, because of the use of the interacting node pair method during the stiffness matrix assembly and the node-wise method during the sensitivity analysis, the speedup ratios were close to 20 and 70, respectively. Compared with the traditional Gauss point-wise method, the computational efficiency greatly improved. This means that the use of the interacting node pair and node-wise methods not only improved the parallelism performance of the GPU but also effectively avoided the race conditions that would occur in the thread-block store procedures.

    6.3 Convergence Error in PCG

In general, the convergence error in the PCG method had a remarkable effect on the solution of the discrete equations. More accurate results could be obtained with a stricter error value. Nevertheless, the stricter the error value, the greater the number of PCG iterations at each optimization iteration, leading to increased time. The effect of the convergence error in the PCG method on the topology results and the GPU computing time is shown in Fig. 22.

    Fig. 22 shows that when the PCG error value varied from 1e-8 to 1e-1, the final optimal results were basically unchanged, and the number of iterations was about the same. As the error value was relaxed, the equation-solving time and the overall optimization time first decreased rapidly and then increased at the later stage; the increase in total time was accompanied by an increase in the number of iterations. This means that the PCG error value cannot be relaxed too much; the recommended error is between 1e-4 and 1e-2.
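    The iteration-count side of this trade-off can be reproduced with a small Jacobi-preconditioned CG sketch in NumPy (a serial stand-in for the GPU solver; the tridiagonal test matrix is illustrative, not the paper's EFG stiffness system):

```python
import numpy as np

def jacobi_pcg(A, b, tol=1e-5, max_iter=500):
    """Jacobi-preconditioned conjugate gradient.

    Stops when the relative residual ||r|| / ||b|| drops below tol.
    Returns (x, iterations).
    """
    x = np.zeros_like(b)
    r = b - A @ x
    Minv = 1.0 / np.diag(A)          # Jacobi (diagonal) preconditioner
    z = Minv * r
    p = z.copy()
    rz = r @ z
    bnorm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) / bnorm < tol:
            return x, k
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# A small SPD system standing in for the discrete equation K u = f
n = 50
A = (np.diag(2.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), 1)
     + np.diag(-1.0 * np.ones(n - 1), -1))
b = np.ones(n)

x_loose, it_loose = jacobi_pcg(A, b, tol=1e-2)
x_tight, it_tight = jacobi_pcg(A, b, tol=1e-8)
print(it_loose, it_tight)  # the looser tolerance stops in no more iterations
```

    Relaxing the tolerance shortens each solve, but in the optimization loop an inaccurate displacement field can perturb the sensitivities and add OC iterations, which is why the total time eventually rises again.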

    Figure 22: GPU computing performance under different PCG error values. (a) Example I, (b) Example II, (c) Example III

    7 Conclusions

    To improve computational efficiency, we presented a novel GPU acceleration approach for 3D structural topology optimization problems using the EFG method. We developed GPU parallel algorithms for assembling the stiffness matrix, solving the discrete equations, analyzing sensitivity, and updating design variables. The following conclusions can be drawn from the example analysis:

    (1) The final optimal structures obtained by the proposed GPU acceleration approach were in good agreement with the CPU results, so the presented approach is feasible. Meanwhile, the boundary profiles were clear even without sensitivity filtering, and no checkerboard patterns or other numerical instabilities appeared.

    (2) Compared with the conventional CPU serial approach, the proposed GPU acceleration approach greatly improved the computational efficiency of 3D structural topology optimization using the EFG method. The maximum global speedup ratio over the whole topology optimization process in the examples reached 67.8, and the speedup ratio also increased as the number of DOFs increased. In addition, the thread block size had a significant effect on the computational efficiency. For 3D topology optimization problems, the recommended thread block size was 32 for the stiffness matrix assembly and density update, and 64 for the sensitivity analysis.

    (3) We used the interacting node pair method during the stiffness matrix assembly and the node-wise method during the sensitivity analysis, which effectively improved the parallelism of the GPU and avoided the race conditions that can occur when multiple threads in a block write to the same memory location.

    (4) The results showed that the convergence error of the PCG solver used to solve the discrete equations at each OC iteration can be loosened without qualitatively affecting the resulting topology, leading to a considerable reduction in the solution time of the discrete equations. However, the overall computing time may increase because of the additional OC iterations incurred when the PCG convergence error is loosened too far. Therefore, it is essential to use an appropriate PCG error for topology optimization problems based on the EFG method; the reasonable range of the PCG convergence error was between 1e-4 and 1e-2.

    Acknowledgement: The support from the National Natural Science Foundation of China (Nos. 51875493, 51975503, 11802261) is appreciated.

    Funding Statement: This work is supported by the National Natural Science Foundation of China (Nos. 51875493, 51975503, 11802261). The financial support to the first author is gratefully acknowledged.

    Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
