
    A New Hybrid Hierarchical Parallel Algorithm to Enhance the Performance of Large-Scale Structural Analysis Based on Heterogeneous Multicore Clusters


    Gaoyuan Yu, Yunfeng Lou, Hang Dong, Junjie Li and Xianlong Jin*

    1 School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China

    2 Aerospace System Engineering Shanghai, Shanghai, 201108, China

    3 School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, Sydney, NSW, 2006, Australia

    ABSTRACT Heterogeneous multicore clusters are becoming more popular for high-performance computing due to their great computing power and cost-to-performance effectiveness. Nevertheless, parallel efficiency degradation remains a problem in large-scale structural analysis based on heterogeneous multicore clusters. To address it, a hybrid hierarchical parallel algorithm (HHPA) is proposed on the basis of the conventional domain decomposition algorithm (CDDA) and the parallel sparse solver. In this new algorithm, a three-layer parallelization of the computational procedure is introduced to separate the communication of inter-nodes, heterogeneous core groups (HCGs) and inside heterogeneous core groups by mapping computing tasks to the various hardware layers. This approach not only achieves load balancing at different layers efficiently but also improves the communication rate significantly through hierarchical communication. Additionally, the proposed hybrid parallel approach reduces the interface equation size and hence the solution time, which makes up for the shortcoming of growing communication overheads with increasing interface equation size when employing CDDA. Moreover, distributed sparse storage of large amounts of data is introduced to improve memory access. By solving benchmark instances on the Shenwei-Taihuzhiguang supercomputer, the results show that the proposed method obtains higher speedup and parallel efficiency than CDDA and better extensibility of parallel partition than the two-level parallel computing algorithm (TPCA).

    KEYWORDS Heterogeneous multicore; hybrid parallel; finite element analysis; domain decomposition

    Nomenclature

    1 Introduction

    With the development of transportation, energy exploration and exploitation, the aerospace industry, etc., there have been increasing demands for developing complex large or ultra-large systems, such as high-speed multiple-unit trains, 3000-meter ultra-deep drilling rigs, large aircraft and heavy-duty fracturing trucks, and for optimizing major engineering projects, including cross-river tunnels and skyscrapers [1,2]. Research and development related to these equipment systems usually involve systematic evaluations of their complex dynamics and employ the finite element method for high-performance computing [3]. However, applying finite element analysis to complex systems involves a large number of degrees of freedom (DOFs) and various factors such as nonlinearities and complicated boundary conditions, which makes it computationally complex and demands enormous amounts of computing power. It therefore challenges traditional finite element methods and tools [4]. In the conventional finite element approach, some vital local details are often sacrificed for a simplified model to guarantee calculation efficiency in large or ultra-large systems. As a result, the ability to predict these details is lost, accompanied by lower computational precision. Compared to the conventional finite element method, fine modeling of complex dynamical systems has higher accuracy and can predict essential details, but it requires heavy computing based on large-scale or ultra-large-scale finite element models. Consequently, efficient solutions of fine models cannot be obtained on traditional serial computers [5]. Nowadays, parallel supercomputers are developing rapidly; research and exploration of parallel algorithms provide a feasible way to solve large and ultra-large complex systems and thus attract researchers' attention worldwide.

    Currently, two main classes of algorithms are commonly used for parallel finite element solutions. One starts from the most time-consuming part of finite element structural analysis, the solution of linear equations, and seeks effective parallel methods for solving them [6-8]. The other starts from the parallelism of the finite element method itself to develop parallel domain decomposition methods [9-11]. Specifically, research on parallel algorithms for linear equations focuses mainly on two methods, the direct scheme [6,12] and the iterative scheme [13,14]. In the direct method, the exact solution of the system can be obtained within the expected number of calculation steps through reordering, triangular decomposition and back substitution. However, the required memory space and computing power grow rapidly with the calculation scale, so its scalability is not as good as that of the iterative method [6,12]. The iterative approach improves the solution through repeated iterations until it converges within the allowable error range of the exact solution. The memory required in the iterative process is relatively small, and parallelization is easy to achieve. However, the iterative method cannot guarantee convergence within a reasonable time, particularly for ill-conditioned problems with large condition numbers [7,13]. In the field of parallel computing based on CDDA, hybrid methods combining the direct and iterative approaches have been employed. They start from the parallelism of the finite element problem itself, dividing the complex problem into smaller sub-tasks for parallel processing, applying the direct method within sub-tasks and the iterative method across sub-domains. The hybrid method combines the advantages of the direct and iterative schemes, which improves parallel efficiency, and thus has been widely used in the finite element structural field [2,15-19]. Based on CDDA, Miao et al. [15] designed a hierarchical parallel calculation method for static structural finite element analysis, which was later applied to a high-rise building to find the parallel solution of its static structural characteristics. El Gharbi et al. [16] utilized CDDA to design a two-level parallel mesh algorithm and applied it to mesh a turbine blade. Koric et al. [17] used CDDA to solve the static structural characteristics of a charge air cooler on unstructured grids. Fialko et al. [18] used CDDA to obtain the parallel solution of the static characteristics of a multi-storey building. Klawonn et al. [19] combined CDDA with the FE² approach to take advantage of the largest available supercomputers, and those of the upcoming exascale era, for virtual material testing of micro-heterogeneous materials. However, with an increasing number of sub-tasks in complex problems, the scale of the interface equations formed by the sub-tasks and their condition number also increase dramatically. Moreover, applying the iterative method to solve the interface equations involves overheads generated by global communication between sub-tasks and local communication within sub-tasks, which further reduces the efficiency of parallel computing.

    Regarding the hardware for high-performance computing, parallel computing systems usually employ a distributed-storage parallel mode, which mainly includes homogeneous supercomputers [20] and heterogeneous supercomputers [21]. Heterogeneous supercomputers have become the mainstream in high-performance computing due to their excellent computing power and high cost-to-performance effectiveness. They are often equipped with multicore CPUs together with GPUs, many integrated core (MIC) processors or heterogeneous core groups to complete the computing tasks. The APU project from AMD is one such example [22]; other examples include the Echelon project led by NVIDIA [23], the Runnemede project led by Intel Corporation [24], the 'Shenwei-Taihuzhiguang' project from the Wuxi-Hending Company [2], etc. On this basis, researchers have ported and improved CDDA to solve the structural performance of large-scale and ultra-large-scale finite element systems and achieved remarkable results [25,26]. Xue et al. [25] performed numerical simulation of a three-dimensional compressible Euler atmospheric model system using CDDA on a CPU-MIC architecture. Based on the CPU-MIC architecture and CDDA, Miao et al. [26] designed a new hybrid solver with TPCA for large-scale structural analysis and applied it to the parallel computation of the dynamic characteristics of a tunnel. Moreover, the different storage mechanisms and non-uniform communication latencies of heterogeneous multicore distributed groups also introduce new challenges for the design of efficient parallel algorithms in large-scale structural analysis [27]. Thus, the keys to improving the efficiency of finite element parallel algorithms on heterogeneous supercomputers are: (1) to balance the loads of the computational tasks against the hardware topology of the heterogeneous multicore clusters; (2) to store the large-scale data generated in the computing process; and (3) to manage the communication between and within nodes, and within and between heterogeneous groups.

    The main contribution of this paper is a novel hybrid solver that is aware of the characteristics of heterogeneous multicore clusters and fully exploits their computing power to achieve optimal performance. In the proposed algorithm, hybrid parallel partitioning is adopted based on mesh partition and data partition, which reduces the assembling scale of the global interface equations, and the memory access efficiency is improved by introducing distributed sparse storage of the data in the computing process. By mapping the computing tasks onto the hardware topology of the heterogeneous multicore clusters, the method realizes a three-layer parallelization of the computation procedure: inter-node parallelization, HCG parallelization and inside-HCG parallelization. It not only achieves load balancing at different levels in an effective way but also improves communication efficiency by separating the communication. Consequently, the work in this paper can serve as a reference for porting finite element structural analysis to the 'Shenwei' heterogeneous multicore processor and other heterogeneous processors to optimize the performance of large-scale parallel implementations.

    The rest of this paper is organized as follows. In Section 2, the related works CDDA and TPCA for large-scale structural finite element analysis are introduced. In Section 3, the proposed HHPA, with its mesh partition, data partition and an implementation that best respects the architecture of the Shenwei heterogeneous multicore processor, is described. In Section 4, two numerical experiments are presented. Finally, conclusions are drawn in Section 5.

    2 Related Works

    Before presenting the proposed algorithm, some existing parallel computing algorithms for large-scale finite element analysis are introduced for better understanding.

    2.1 Review of CDDA

    When applying CDDA to large-scale finite element structural analysis, the finite element mesh is first partitioned into a series of sub-domains [28], as shown in Fig. 1.

    The system equations of the partitioned sub-domains at the moment t + Δt can be expressed equivalently in the following Schur formulation [15-19]:

    Figure 1: Partitioning and condensation of CDDA

    In Eqs. (2)~(3), the values of a_0~a_7 are decided jointly by α, δ, α_m and α_f, which are calculated by Eq. (4):

    Eliminating the internal DOFs yields the interface equation, which involves only the unknowns of the boundary DOFs:
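
For orientation, the condensation just described can be written in the standard Schur-complement form below. The block names (internal I, boundary B) and the use of K and F as shorthand for the effective stiffness matrix and effective load vector are notational assumptions for this sketch, not the paper's Eqs. (5)~(7).

```latex
% Static condensation of one sub-domain (standard Schur-complement form):
\tilde{K}_{BB} = K_{BB} - K_{BI}\,K_{II}^{-1}\,K_{IB}, \qquad
\tilde{F}_{B}  = F_{B}  - K_{BI}\,K_{II}^{-1}\,F_{I},
```

so that the interface equation of the sub-domain reads K̃_BB X_B(t + Δt) = F̃_B.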

    Combining the interface equations from Eq. (5) of every sub-domain and solving them using the parallel PCG algorithm determines X_B(t + Δt) for each sub-domain. The internal displacement of each sub-domain can then be solved by back substitution according to Eq. (8):
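
As a hedged illustration of the preconditioned conjugate gradient (PCG) iteration applied to the interface equations, the sketch below shows a single-process version with a diagonal (Jacobi) preconditioner and a dense matrix for brevity. The parallel variant in the paper additionally exchanges interface data between sub-domains and reduces dot products across nodes; all names here are illustrative, not the authors' implementation.

```c
/* Minimal serial PCG with Jacobi preconditioner (illustrative sketch only). */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

static double dot(const double *a, const double *b, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;                       /* in the parallel version this would be an MPI_Allreduce */
}

/* y = A*x, dense row-major n x n (the paper uses compressed sparse column storage instead). */
static void matvec(const double *A, const double *x, double *y, int n) {
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += A[i * n + j] * x[j];
        y[i] = s;
    }
}

/* Solve A x = b for symmetric positive-definite A; returns the iteration count. */
static int pcg(const double *A, const double *b, double *x, int n, double tol, int maxit) {
    double *r = malloc(n * sizeof *r), *z = malloc(n * sizeof *z);
    double *p = malloc(n * sizeof *p), *Ap = malloc(n * sizeof *Ap);
    matvec(A, x, Ap, n);
    for (int i = 0; i < n; ++i) r[i] = b[i] - Ap[i];
    for (int i = 0; i < n; ++i) z[i] = r[i] / A[i * n + i];   /* Jacobi preconditioning */
    for (int i = 0; i < n; ++i) p[i] = z[i];
    double rz = dot(r, z, n);
    int k;
    for (k = 0; k < maxit && sqrt(dot(r, r, n)) > tol; ++k) {
        matvec(A, p, Ap, n);
        double alpha = rz / dot(p, Ap, n);
        for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        for (int i = 0; i < n; ++i) z[i] = r[i] / A[i * n + i];
        double rz_new = dot(r, z, n);
        double beta = rz_new / rz;
        rz = rz_new;
        for (int i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    free(r); free(z); free(p); free(Ap);
    return k;
}

int main(void) {
    /* Tiny SPD test system standing in for a condensed interface problem. */
    double A[4] = {4, 1, 1, 3};
    double b[2] = {1, 2}, x[2] = {0, 0};
    int it = pcg(A, b, x, 2, 1e-8, 100);   /* 1e-8 matches the tolerance used in Section 4 */
    printf("iters=%d  x=(%f, %f)\n", it, x[0], x[1]);
    return 0;
}
```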

    Finally, the stress σ and strain ε of each sub-domain are calculated through Eqs. (9) and (10) by substituting the internal and boundary displacements synchronously, as follows:

    To reduce memory usage and the amount of calculation, the procedure of finding inverse matrices should be avoided [31]. Considering that K_II is symmetric positive-definite and its inverse is reused many times in Eqs. (6)~(8), the LDL^T algorithm is used in the condensation to avoid forming inverse matrices, as shown in Eq. (11). Matrix-vector operations involving K_II^{-1} can then be converted into the solution of linear equations.
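
As a hedged sketch of the LDL^T idea in Eq. (11), the code below factors a small dense symmetric positive-definite matrix as K = L D L^T and then solves K y = v by forward substitution, diagonal scaling and backward substitution, so products with K_II^{-1} never require forming the inverse explicitly. The paper uses a sparse parallel solver (SuperLU_D) rather than this dense toy version.

```c
/* Dense LDL^T factorization and solve (illustrative only). */
#include <stdio.h>

#define N 3

/* Factor symmetric A into unit-lower L and diagonal D with A = L D L^T. */
static void ldlt_factor(const double A[N][N], double L[N][N], double D[N]) {
    for (int j = 0; j < N; ++j) {
        double d = A[j][j];
        for (int k = 0; k < j; ++k) d -= L[j][k] * L[j][k] * D[k];
        D[j] = d;
        L[j][j] = 1.0;
        for (int i = j + 1; i < N; ++i) {
            double s = A[i][j];
            for (int k = 0; k < j; ++k) s -= L[i][k] * L[j][k] * D[k];
            L[i][j] = s / d;
        }
    }
}

/* Solve (L D L^T) x = b: forward solve, diagonal scale, backward solve. */
static void ldlt_solve(const double L[N][N], const double D[N],
                       const double b[N], double x[N]) {
    double y[N];
    for (int i = 0; i < N; ++i) {              /* L y = b */
        y[i] = b[i];
        for (int k = 0; k < i; ++k) y[i] -= L[i][k] * y[k];
    }
    for (int i = 0; i < N; ++i) y[i] /= D[i];  /* D z = y */
    for (int i = N - 1; i >= 0; --i) {         /* L^T x = z */
        x[i] = y[i];
        for (int k = i + 1; k < N; ++k) x[i] -= L[k][i] * x[k];
    }
}

int main(void) {
    /* A small SPD matrix standing in for K_II of one sub-domain. */
    double A[N][N] = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};
    double L[N][N] = {{0}}, D[N], b[N] = {1, 2, 3}, x[N];
    ldlt_factor(A, L, D);
    ldlt_solve(L, D, b, x);   /* x now equals K_II^{-1} b without forming the inverse */
    printf("%f %f %f\n", x[0], x[1], x[2]);
    return 0;
}
```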

    2.2 Review of TPCA

    TPCA [16] was proposed to improve the parallel efficiency of large-scale structural dynamic analysis through two-level partitioning and two-level condensation based on CDDA, as shown in Fig. 2. Compared with CDDA, TPCA can reduce the solution scale of the overall interface equations of the system through multilayer partition and multi-level condensation, and thus can effectively improve the parallel efficiency of the solution. However, TPCA requires additional time for condensing sub-domains and for assembling and solving the interface equations of the level-2 sub-domains as the number of sub-tasks increases, and therefore limits the efficiency of the parallel system to a certain degree.

    Figure 2: Partitioning and condensation of TPCA

    3 Proposed HHPA Based on Shenwei Heterogeneous Multicore Processor

    3.1 Proposed HHPA

    To reduce the time spent condensing sub-domains and assembling and solving the interface equations of the level-2 sub-domains as the number of sub-tasks increases, HHPA is proposed on the basis of CDDA and the parallel sparse solver, as shown in Fig. 3.

    Figure 3: Partitioning and condensation of HHPA

    There are two partition methods, namely mesh partition and data partition. For each sub-domain, the nodes can be divided into internal nodes and boundary nodes according to the mesh partition. The system equations of the partitioned sub-domains at the moment t + Δt can be expressed as Eq. (1), and the effective stiffness matrix and effective load vector can be calculated in parallel by Eqs. (2) and (3). Compressed sparse column techniques and distributed storage are applied to store K and M according to the data partition. Considering the solution scale of K_II and its multiple reuses, the parallel sparse solver is used in Eqs. (6)~(8) and Eq. (11). Then, combining the interface equations from Eq. (5) of every sub-domain and solving them using the parallel PCG algorithm determines X_B(t + Δt) for each sub-domain. Finally, the internal displacement, stress σ and strain ε of each sub-domain can be calculated by parallel back substitution according to Eqs. (8)~(10).

    The differences between the approaches CDDA, TPCA and HHPA are shown in Table 1, and Fig. 4 presents the block diagram of the proposed algorithm. During the parallel computing, each sub-domain in level 1 is assigned to a node, and all the sub-domains in level 2 derived from the same level-1 sub-domain are assigned to different cores within the same node. Besides, MPI process 0 in each node takes charge of the operations of the corresponding level-1 sub-domain and the solution of the global interface equations. The process of HHPA can be divided into the following steps; a schematic sketch of the resulting loop is given after the step list.

    Table 1: Differences between the approaches CDDA, TPCA and HHPA

    Step 1: Prepare data for domain decomposition parallel computing of the mesh partition. The data contains information on the finite element model (element stiffness matrices, mass matrices and external load vectors) and partition information files. There is no communication in this step.

    Step 2: Create p × m processes on the node side synchronously. Every m MPI processes are responsible for reading the data file of one sub-domain, where p represents the number of nodes participating in parallel computing and m represents the number of heterogeneous core groups on each node. There is no communication in this step.

    Step 3: Each MPI process derives q threads from the heterogeneous core group to assemble the effective stiffness matrix and the effective load vector F within its sub-domain, where q represents the number of computing cores. The effective stiffness matrix of each sub-domain only needs to be calculated once, in the first calculation; later calculations only require updating the effective load vector at every step. There exists communication within the same MPI process.

    Figure 4: Block diagram of the proposed HHPA

    Step 4: Every MPI process of each node uses its derived q threads to join the condensation based on Eqs. (5)~(7) for its sub-domain synchronously, and then distributes and saves the condensed matrices K̃ and F̃ to the corresponding shared memory space of the MPI process. There exists communication within and among the MPI processes.

    Step 5: Use the derived q threads to solve the interface equations by the parallel PCG algorithm, with communication among nodes.

    Step 6: Use the heterogeneous clusters in each node to solve the internal displacement by parallel back substitution according to Eq. (8). There exists communication within and among the MPI processes.

    Step 7: Use the derived q threads to back substitute to obtain the stress σ and strain ε for each sub-domain, and then distribute and store the calculation results in the corresponding shared memory space of the MPI process. There exists communication within and among the MPI processes.

    Step 8: If no further iterations are needed, stop the calculation; otherwise, return to Step 2.
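
The MPI skeleton below is a hedged sketch of how Steps 1-8 could be orchestrated. The Subdomain type and routine names such as read_subdomain() and condense_subdomain() are placeholders, not functions from the authors' code, and the offload of loops to the 64 computing cores of an HCG (done with Athread in the paper) is only indicated in comments.

```c
/* Hedged sketch of the HHPA driver loop (Steps 1-8); illustrative only. */
#include <mpi.h>
#include <stdio.h>

typedef struct { int id; } Subdomain;                 /* stand-in for real sub-domain data  */

static Subdomain read_subdomain(int rank) {           /* Step 2: read mesh/partition file   */
    Subdomain s = { rank }; return s;
}
static void assemble_effective_system(Subdomain *s) { /* Step 3: effective K and F          */
    (void)s;                                          /* loops fanned out to computing cores */
}
static void condense_subdomain(Subdomain *s) {        /* Step 4: Schur condensation (LDL^T) */
    (void)s;                                          /* communication inside the node       */
}
static void solve_interface_pcg(Subdomain *s, MPI_Comm comm) { /* Step 5: inter-node PCG    */
    (void)s; (void)comm;                              /* one HCG per node joins this stage   */
}
static void back_substitute(Subdomain *s) {           /* Steps 6-7: internal DOFs, stress    */
    (void)s;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Step 1: model and partition files are assumed to be prepared offline. */
    Subdomain sd = read_subdomain(rank);              /* one MPI process per HCG             */

    for (int step = 0; step < 3; ++step) {            /* Step 8: loop over time steps        */
        assemble_effective_system(&sd);
        condense_subdomain(&sd);
        solve_interface_pcg(&sd, MPI_COMM_WORLD);
        back_substitute(&sd);
    }

    if (rank == 0) printf("done\n");
    MPI_Finalize();
    return 0;
}
```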

    3.2 Architecture and Execution Mode of Shenwei Processor

    The Shenwei-Taihuzhiguang supercomputer is based on the Shenwei heterogeneous multicore processor [2]. The architecture of the Shenwei heterogeneous multicore processor is shown in Fig. 5. Every Shenwei heterogeneous multicore processor includes four HCGs, which share 32 GB of memory. Every single HCG consists of one computational control core and 64 computing cores. Communication between the HCGs uses bidirectional links with 14 Gbit/s bandwidth, and communication between the computational control core and the computing cores adopts the DMA method to get bulk access to the main memory. The local storage space of each computing core is 64 KB, and its instruction storage space is 16 KB.

    Figure 5: Architecture of the Shenwei heterogeneous multicore processor

    The application modes of the Shenwei heterogeneous multicore processor mainly include the private mode and the full-chip shared mode of the HCGs. In the private mode, every HCG at each node has 8 GB of memory space; in the full-chip shared mode, the full chip has 32 GB of memory, or 16 GB of memory can be used by one MPI process. Since the computing power in the full-chip mode of the HCGs is weaker, it is often used only in cases with large memory requirements. Therefore, our project mainly adopts the private mode of the HCGs to design the hybrid hierarchical parallel computing method for large-scale finite element structures.

    3.3 Task Mapping of Parallel Algorithms

    Mapping the computing tasks to the different hardware layers of heterogeneous multicore supercomputers can achieve load balance among different nodes, as well as implement effective partitioning of communication, thereby significantly reducing communication overheads [32].

    Based on CDDA and considering the hardware architecture of heterogeneous multicore supercomputers, the task mapping of HHPA for large-scale finite element structural analysis is designed as shown in Fig. 6. When performing task mapping, the initial mesh of the sub-domains and the interface equation systems are mapped according to the node order. The condensation of each sub-domain and the internal back substitution within each sub-domain are mapped to the HCGs within each node, and the mapping of data calculations within the HCGs is based on the computing cores.

    Figure 6: Task mapping of HHPA for large-scale structural finite element analysis

    3.4 Large-Scale Parallel Computing Mechanism

    Consider that the communication efficiency within a node is much higher than between nodes, and that the efficiency within the same HCG is much higher than that among HCGs in heterogeneous multicore distributed-storage computers. Thus, the key to reducing the overheads of communication and cooperation in the calculation is to separate and minimize the communication between intra-node and inter-node, and the communication among and within the HCGs. Based on the multilayer and multigrain parallel computing approach to large-scale finite element structural analysis, a three-layer parallel computing strategy has been designed, as shown in Fig. 7.

    Figure 7: Scheme of the three-layer parallelization

    The first layer of parallelization: In every node, m processes are responsible for processing one mesh sub-domain, and all processes operate synchronously. There is no data interaction between processes, but data interactions exist between the computational control cores and the computing cores within each process. The data processing procedure includes the computational control cores reading the model data of the sub-domain and the computing cores assembling the system equations for parallel computing of the sub-domains. To save memory space and reduce the amount of computation, compressed sparse column techniques are applied to store the local stiffness matrix of each sub-domain.
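
As a brief illustration of compressed sparse column (CSC) storage, the sketch below stores a small matrix in CSC arrays and performs a matrix-vector product; the array names are generic, not the paper's data structures.

```c
/* Compressed sparse column (CSC) storage of a small 3x3 matrix and y = A*x. */
#include <stdio.h>

int main(void) {
    /* A = [ 4 0 1
     *       0 3 0
     *       2 0 5 ]  stored column by column (only nonzeros). */
    double val[]     = {4, 2, 3, 1, 5};    /* nonzero values                */
    int    row_idx[] = {0, 2, 1, 0, 2};    /* row index of each nonzero     */
    int    col_ptr[] = {0, 2, 3, 5};       /* start of each column in val[] */
    int    n = 3;

    double x[] = {1, 1, 1}, y[3] = {0, 0, 0};
    for (int j = 0; j < n; ++j)                       /* loop over columns       */
        for (int k = col_ptr[j]; k < col_ptr[j + 1]; ++k)
            y[row_idx[k]] += val[k] * x[j];           /* scatter column j into y */

    printf("y = %g %g %g\n", y[0], y[1], y[2]);       /* expected: 5 3 7         */
    return 0;
}
```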

    The second layer of parallelization: In this layer, based on the LDL^T algorithm, each node starts m processes simultaneously to carry out condensation and back substitution for the corresponding mesh sub-domain. Communications exist between different HCGs within the same node, and also between the computational control core and the computing cores in one HCG. Compressed sparse column techniques are used here for distributed storage of the computational data in this procedure. After the calculations, the result data are sent to the main HCG of each node, named No. 0, which is in charge of assembling the effective stiffness matrix and the effective load vector corresponding to its mesh sub-domain.

    SuperLU [33] is a library that implements algorithms for solving sparse linear equations in serial and in parallel. SuperLU_S provides a serial LDL^T algorithm, and SuperLU_D provides a parallel LDL^T algorithm, which is used in the second layer of parallelization of the proposed HHPA. However, SuperLU_D cannot use the computing cores of an HCG directly; through the MPI or OpenMPI library, SuperLU_D can only call the computational control core of the HCG. Considering that the implementation of the parallel LDL^T algorithm mainly includes parallel LDL^T decomposition and the parallel solution of triangular systems of linear equations, heterogeneous multicore acceleration is utilized to improve this process. The primary computing operations are matrix-vector calculations and data communications. The communication between different HCGs is realized with the MPI library, and the Athread library is utilized to achieve communication within each HCG. Matrix-vector calculations mainly include addition, subtraction, multiplication and division. Taking the vector multiplication a = b × c as an example (a, b and c are storage arrays in the process of arbitrary matrix-vector operations), the implementation is shown in Fig. 8. The 64 computing cores on each HCG read the corresponding data from the memory space synchronously, where each data segment must be smaller than 64 KB, and the data are returned to their specific locations after calculation. Communication exists only between the computational control cores and the computing cores.
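
A hedged sketch of the chunked vector multiplication in Fig. 8: the loop below processes the arrays in blocks small enough to fit a 64 KB local store, mimicking how each computing core would copy a segment in, multiply, and write it back. The real implementation uses the Athread library and DMA transfers, which are not reproduced here; memcpy simply stands in for those transfers.

```c
/* Chunked elementwise multiply a = b * c with blocks sized for a 64 KB local store. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LOCAL_BYTES (64 * 1024)
#define CHUNK (LOCAL_BYTES / (3 * sizeof(double)))    /* blocks of a, b and c must all fit */

static void chunked_multiply(double *a, const double *b, const double *c, size_t n) {
    double lb[CHUNK], lc[CHUNK], la[CHUNK];           /* stand-ins for core-local buffers  */
    for (size_t start = 0; start < n; start += CHUNK) {
        size_t len = (n - start < CHUNK) ? (n - start) : CHUNK;
        memcpy(lb, b + start, len * sizeof(double));  /* "DMA in" a block of b             */
        memcpy(lc, c + start, len * sizeof(double));  /* "DMA in" a block of c             */
        for (size_t i = 0; i < len; ++i) la[i] = lb[i] * lc[i];
        memcpy(a + start, la, len * sizeof(double));  /* "DMA out" the result block        */
    }
}

int main(void) {
    size_t n = 100000;
    double *a = malloc(n * sizeof *a), *b = malloc(n * sizeof *b), *c = malloc(n * sizeof *c);
    for (size_t i = 0; i < n; ++i) { b[i] = (double)i; c[i] = 2.0; }
    chunked_multiply(a, b, c, n);
    printf("a[12345] = %g\n", a[12345]);              /* expected: 24690 */
    free(a); free(b); free(c);
    return 0;
}
```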

    Figure 8: Vector-matrix multiplication with heterogeneous multicore acceleration

    The third layer of parallelization: In the third layer, the interface equations of the sub-domains are solved by the parallel PCG algorithm. Only one HCG in each node participates in computing and communication, as illustrated in Fig. 9. The diagonal preconditioners are constructed locally using the condensed stiffness matrices of the level-1 sub-domains [26,34]. The diagonal preconditioning requires solving a system whose matrix contains only the diagonal entries of the condensed stiffness matrices; solving this system is equivalent to dividing each element of r by the diagonal entry of the corresponding row of the condensed stiffness matrices. Since the rows of the stiffness matrix and the corresponding elements of the vectors reside in the same HCG, this operation does not require any communication among HCGs.
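
Written out (with the notation assumed as in the earlier sketch), the preconditioning step reduces to an elementwise division:

```latex
h_i = \frac{r_i}{\tilde{K}_{ii}}, \qquad i = 1, \dots, n_B,
```

where r is the current PCG residual of the interface system, K̃_ii the i-th diagonal entry of the locally stored condensed stiffness matrix, and n_B the number of boundary DOFs held by the HCG.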

    Figure 9:Parallel PCG algorithms

    During the solving procedure, the effective stiffness matrix and effective load vector of each sub-domain are distributed and stored in the memory space of the corresponding HCG. Similarly, the intermediate results are stored in matrix-vector product form based on compressed sparse column distributed storage. Point-to-point communications exist only between neighboring sub-domains, and just a few dot product operations and computations of the overall iterative error require global communications. Local communications are restricted to the computational control core and the computing cores of each HCG. Hence, this method can remarkably reduce the amount of communication and improve computing efficiency.

    To sum up, HHPA confines the majority of local communications within a node and ensures that only one HCG per node participates in global communication. At the same time, each HCG confines a large amount of communication to the computational control core and the computing cores through multicore acceleration, and guarantees that only the computational control core participates in the communication between core groups. Through these measures, communications between intra-node and inter-node, and the communication within and among HCGs, can be separated effectively. As a result, the communication and cooperation overheads in the computing process are reduced and the communication efficiency is improved.

    4 Numerical Experiments and Discussion

    To verify the performance of the proposed algorithm, the Shenwei-Taihuzhiguang supercomputer is employed for testing, and each node operates four HCGs during the test.

    4.1 Case of a Cube in Bending

    A cube model was used to evaluate the performance of the parallel algorithm. The model and its load-time curve are shown in Fig. 10. The cube is fixed at its left end face, and the right end face is under a uniform sinusoidal load with a peak force of 10^6 N. The cube is 20 m in length, 10 m in width and 10 m in height. After discretizing with tetrahedral elements, the model contains 6,886,981 elements, 7,037,638 nodes and 11,007,819 DOFs. The material assigned to this model has an elastic modulus of 210 GPa, a Poisson's ratio of 0.3 and a density of 7850 kg/m³.

    Figure 10: Finite element model of the cube and load-time curve for p

    In the parallel computing, the number of operating nodes is set to 32, 64, 96 and 128, respectively. The convergence criterion of the parallel PCG in CDDA, TPCA and HHPA is 1e-8. The sub-domains should be prepared in advance to correspond to the total number of nodes when applying the HHPA proposed in this article. When applying TPCA, the level-2 sub-domains should be prepared to correspond to four times the total number of nodes [15], and for CDDA the pre-prepared sub-domains should also match four times the total number of nodes. SuperLU_S is adopted by CDDA and TPCA in the condensation of the sub-domains.

    In order to verify the accuracy of the parallel algorithm computation, 12 sampling points are picked, as shown in Fig. 10. The goodness-of-fit with the theoretical solution is adopted to evaluate the precision of the solution based on the three algorithms, as shown in Fig. 11. The definition of goodness-of-fit is given in Eq. (12):

    Figure 11: R² values of different methods

    In Eq. (12), n represents the number of sampling points along the time curve, which is 25. Y_i is the theoretical solution (the displacement varying with time in the x, y and z directions), X_i is the finite element solution (likewise), and X̄ is the average value of the sampled points. The error becomes smaller as the value of R² approaches 1.
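
As a hedged illustration, the sketch below computes a goodness-of-fit value between a theoretical curve and a finite element solution. Eq. (12) itself is not restated in the text above, so the code assumes the common coefficient-of-determination form R² = 1 − Σ(Y_i − X_i)² / Σ(X_i − X̄)², built from the quantities n, Y_i, X_i and X̄ defined in the paragraph; it is an assumption, not the paper's exact formula.

```c
/* Goodness-of-fit between a theoretical curve Y[] and a finite element
 * solution X[] at n sampling instants (assumed R^2 form; illustrative only). */
#include <stdio.h>

static double goodness_of_fit(const double *Y, const double *X, int n) {
    double xbar = 0.0;
    for (int i = 0; i < n; ++i) xbar += X[i];
    xbar /= n;
    double ss_res = 0.0, ss_tot = 0.0;
    for (int i = 0; i < n; ++i) {
        ss_res += (Y[i] - X[i]) * (Y[i] - X[i]);
        ss_tot += (X[i] - xbar) * (X[i] - xbar);
    }
    return 1.0 - ss_res / ss_tot;     /* closer to 1 means smaller error */
}

int main(void) {
    double Y[5] = {0.0, 1.0, 2.0, 1.0, 0.0};          /* "theoretical" samples    */
    double X[5] = {0.0, 1.02, 1.97, 1.01, -0.02};     /* "finite element" samples */
    printf("R^2 = %f\n", goodness_of_fit(Y, X, 5));
    return 0;
}
```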

    It can be observed from Fig. 11 that, whether using CDDA, TPCA or the HHPA proposed in this paper, the displacement-time curves in all directions for all sampling points are in good agreement with the theoretical solution based on elasticity and solid mechanics, with the goodness-of-fit close to 1. Hence, the computational accuracy of CDDA, TPCA and HHPA is reasonable. However, there are some minor differences between CDDA, TPCA and HHPA, because the ordering of operations and the rounding errors in the calculations with the three methods are slightly different.

    The detailed results for the cube model with CDDA, TPCA and HHPA are shown in Tables 2-3, and the performance of the parallel algorithms for the cube model is shown in Fig. 12.

    Table 2: Interface problem sizes and iteration for the cube model

    Table 3: Statistics of time and performance of parallel computing for the cube model

    Figure 12: Performance of parallel algorithms for the cube

    In Table 3, the total time for parallel computing starts from calculating the system equations and ends when the deformation/strain/stress solutions are obtained for every sub-domain. The solution time for level 1 with TPCA includes: assembling the system equations of the level-1 and level-2 sub-domains, condensing the level-2 sub-domains, solving the interface equations with the parallel algorithm for the level-1 and level-2 sub-domains, and back substituting the internal DOFs for the level-1 and level-2 sub-domains. The solution time for level 1 with HHPA includes: assembling the level-1 sub-domains, condensing the level-1 sub-domains with the parallel solution of the level-2 sub-domains, solving the interface equations with the parallel algorithm for the level-1 sub-domains, and parallel back substitution of the internal DOFs for the level-2 sub-domains. Compared to CDDA, both TPCA and the HHPA proposed in this article achieve higher speedup and parallel efficiency, as shown in Fig. 12. This is because the scale and condition number of the interface equations grow dramatically with the increase of sub-domains in CDDA, which results in a longer solving time for the interface equations and weaker overall parallel efficiency. Compared with TPCA, when the number of level-1 sub-domains is small, the computing efficiency and speedup of the HHPA proposed in this article are relatively low; but as the number of level-1 sub-domains increases, this algorithm attains relatively high parallel computing efficiency and speedup. From the mathematical point of view, TPCA and HHPA have the same interface equations at level 1. When the number of level-1 sub-domains is small, the scale and condition number of the sub-domain system equations are larger. Although utilizing the parallel solvers can save the time for assembling the level-2 sub-domain system equations and for condensing and solving the level-2 interface equations, the time to solve the sub-domain systems is still relatively high; thus HHPA has lower parallel efficiency and speedup. Nevertheless, with more sub-domains in level 1, the scale and condition number of the sub-domain system equations become smaller, and there is no need for assembling, condensing and solving the level-2 sub-domains, so the parallel solvers take full advantage in reducing the total solving time. At the same time, HHPA realizes the separation of inter-node communication and of the communication within and among heterogeneous core groups through the three-layer parallelization. As a consequence, it noticeably reduces the time for solving the level-1 sub-domains and achieves better speedup and parallel efficiency under large-scale partitions.
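
For reference, the speedup and parallel efficiency plotted in Fig. 12 are assumed here to follow the usual definitions:

```latex
S_p = \frac{T_{\mathrm{ref}}}{T_p}, \qquad E_p = \frac{S_p}{p},
```

where T_p is the total parallel computing time on p nodes and T_ref is the time on the baseline node count.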

    4.2 Case of a Cable-Stayed Bridge in Shock Wave

    A double-layer cable-stayed bridge system is used to perform dynamic structural analysis based on the computing framework of HHPA. The development of the double-layer cable-stayed bridge system is significant for solving the problem of dual use by highways and light rail and for improving traffic efficiency. The strict and high requirements for the safety and stability of the system make it a tough challenge for large-scale numerical modeling. The overall modeling of the finite element system needs to consider detailed modeling and be able to reflect the local response. The overall finite element model is shown in Fig. 13, including the H-shaped main tower base, H-shaped main tower crown, transitional pier, assistant pier, girder, etc. After meshing with tetrahedral elements, the model is composed of 17,637,501 elements, 10,367,815 nodes and 21,835,902 DOFs. Under dangerous working conditions, the main loads usually come from seismic waves and the structure's own weight. Therefore, the structural dynamics analysis mainly considers the deformation and stress of the bridge under seismic loads and gravity. Specifically, the seismic waves are the MEX natural waves, which are input in three directions at the bridge bottom simultaneously, and their overall amplitude is modulated proportionally.

    Figure 13: Whole finite element model of the cable-stayed bridge

    In the parallel computing, the number of operating nodes is 96, 192, 288 and 384, respectively. The calculation results of TPCA and the HHPA proposed in this article are presented in Tables 4-5, and the performance of the parallel algorithms for the cable-stayed bridge with CDDA, TPCA and HHPA is shown in Fig. 14.

    Figure 14: Performance of parallel algorithms for the cable-stayed bridge

    Table 4: Interface problem sizes and iterations for the cable-stayed bridge model

    It can be seen from Table 5 that HHPA and TPCA obtain almost the same computing time for the interface equations, because the scales of the interface equations of the two methods are approximately the same. With 96 or 192 nodes, the total computing time of the HHPA in this paper is higher than that of TPCA. This result is consistent with Miao's findings [15] based on parallel computation in a multicore environment. The main reason behind this phenomenon is that when the partition scale is relatively small, the computing scale of each sub-domain is rather large, causing the bandwidth of every sub-domain system to be too large in turn. This raises the memory requirements for each sub-system and is accompanied by a serious increase in the amount of computation; as a consequence, it affects the parallel computing efficiency. On the contrary, with 288 or 384 nodes, our HHPA consumes less time than TPCA. This is because when the partition scale gets larger, the sub-domain scale shrinks effectively, which further decreases the bandwidth of every sub-domain system. Also, compared to the homogeneous groups in a multicore environment, the heterogeneous core groups have superior computational ability, which can speed up the parallel solution of the equations. In general, these factors make the total computation time for the level-1 equations based on HHPA shorter than that for TPCA, which must also assemble, condense, solve and back substitute the level-2 equations. Therefore, under larger-scale partitions, the HHPA proposed in this article has higher efficiency and speedup, as shown in Fig. 14.

    Table 5: Results of parallel computation for cable-stayed bridge

    The calculation results of HHPA with different parallel solvers for the solution of the global interface equations are shown in Table 6.

    In Table 6, the global interface equations are solved by different parallel solvers: SuperLU_D, KSPCG in PETSc and the parallel PCG. It can be seen from Table 6 that when using HHPA with SuperLU_D to solve the global interface equations, the solution time increases sharply with the number of sub-domains. Although the global interface equations adopt the compressed sparse column technique for storage, and SuperLU_D is only applied to the lower triangular factor during the triangular decomposition, the global interface equations are still highly dense, and the triangular decomposition further increases the density of the original equations. As a consequence, the setup and triangular decomposition require a large amount of memory as well as a lot of communication and calculation. With the increase of sub-domains, the scale of the global interface equations also gets larger, accompanied by more expense in storage, communication and computing, and it takes a longer time to solve the global interface equations. On the contrary, when utilizing KSPCG and the parallel PCG, because of the use of the iterative method, the local communication involved in the parallel solution of the equations only exists between neighboring sub-domains, and only a few dot product operations and computations of the overall iterative error need global communication. Thus, KSPCG and the parallel PCG can complete the calculation in a shorter computing time. Additionally, the solution time of the parallel PCG is less than that of KSPCG, because all processes need to participate in global communication when solving the global interface equations with KSPCG, whereas the parallel PCG only uses one MPI process within each node; the extra overhead of interprocess communication and synchronization therefore greatly increases the solution time of the global interface equations for KSPCG.

    Table 6: Results of parallel computation for the cable-stayed bridge with different parallel solvers

    5 Conclusions

    (1) To lower the loss of computational efficiency when solving large-scale or ultra-large-scale structural finite element problems on heterogeneous multicore distributed-storage computers, the HHPA for finite element analysis is proposed based on research and understanding of CDDA, TPCA, parallel solvers and multilayer communication strategy architecture. This method not only reduces the scale of the interface equations but also considerably saves computing time. Moreover, a three-layer parallelization has been applied successfully during the computational procedure to separate the communication of inter-nodes, heterogeneous core groups and inside heterogeneous core groups; thereby, this method largely increases communication efficiency.

    (2) The typical numerical examples show that when the scale of the partition is comparatively small, the parallel efficiency of this approach is better than CDDA but worse than TPCA. When the scale of the partition is relatively large, this method achieves higher speedup and parallel efficiency than TPCA. The results of this paper can serve as a reference for finite element structural analysis software operating on 'Shenwei' heterogeneous multicore processors and can be ported to other heterogeneous processors for large-scale parallel computing. They can also serve as a practical reference for parallel computing of large equipment systems and the dynamics of complex engineering systems under fine modeling.

    Funding Statement: This work is supported by the National Natural Science Foundation of China (Grant No. 11772192).

    Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
