
    A Multilevel Hierarchical Parallel Algorithm for Large-Scale Finite Element Modal Analysis

Computers, Materials & Continua, 2023, Issue 9

Gaoyuan Yu, Yunfeng Lou, Hang Dong, Junjie Li and Xianlong Jin

1 School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

2 Department of Structural System, Aerospace System Engineering Shanghai, Shanghai 201108, China

3 School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia

ABSTRACT The strict, high-standard requirements for the safety and stability of major engineering systems make large-scale finite element modal analysis a tough challenge. At the same time, systematic analysis of the entire structure of these engineering systems is extremely meaningful in practice. This article proposes a multilevel hierarchical parallel algorithm for large-scale finite element modal analysis to reduce the loss of parallel computational efficiency when heterogeneous multicore distributed storage computers are used for large-scale finite element modal analysis. Based on two-level partitioning and four-transformation strategies, the proposed algorithm not only improves the memory access rate through the sparsely distributed storage of a large amount of data but also reduces the solution time by reducing the scale of the generalized characteristic equations (GCEs). Moreover, a multilevel hierarchical parallelization approach is introduced during the computational procedure to separate the communication of inter-nodes, intra-nodes, heterogeneous core groups (HCGs), and inside HCGs by mapping computing tasks to different hardware layers. This method can efficiently achieve load balancing at different layers and significantly improve the communication rate through hierarchical communication. It can therefore enhance the efficiency of parallel computing for large-scale finite element modal analysis by fully exploiting the architecture of heterogeneous multicore clusters. Finally, typical numerical experiments were used to validate the correctness and efficiency of the proposed method. A parallel modal analysis of a cross-river tunnel with over ten million degrees of freedom (DOFs) was then performed on ten thousand processor cores to verify the feasibility of the algorithm.

KEYWORDS Heterogeneous multicore; multilevel hierarchical parallel; load balancing; large-scale modal analysis

    1 Introduction

Continuous development and research in engineering and scientific technology have brought about complex equipment and large or ultra-large systems [1,2]. The strict requirements of these systems in terms of safety and stability pose severe challenges to the numerical simulation of their dynamic performance, as the resulting finite element systems have large-scale DOFs and are difficult to solve [3–5]. Specifically, modal analysis is the most time-consuming step among all calculation processes of these complex dynamic systems and is also the basis for other calculations. High-precision modal analysis requires large-scale finite element models, but such problems usually cannot be solved satisfactorily on traditional serial computers [6,7]. Consequently, the research and development of corresponding large-scale parallel modal analysis algorithms provides a feasible way to solve such problems on parallel supercomputers.

The mathematical essence of modal analysis reduces to a large sparse generalized eigenvalue problem. Solutions to this kind of problem are often based on subspace projection techniques, mainly the Davidson subspace method and the Krylov subspace method. The Davidson subspace method is primarily applied to the eigenvalues of diagonally dominant symmetric matrices, but its applicability is not as broad as that of the Krylov subspace method [8,9]. The Krylov subspace method can be traced back to the 1950s, with proposed algorithms including the Lanczos algorithm [10], the Arnoldi algorithm [11], the Krylov-Schur algorithm [12], etc. Later, researchers worldwide conducted a series of explorations of parallel modal analysis algorithms based on the Lanczos, Arnoldi, and Krylov-Schur algorithms. The research mainly focused on two types of parallel computing methods. One starts from the most time-consuming linear equations and seeks a high-efficiency parallel computing method; the other starts from the parallelism of the finite element problem itself to form the modal synthesis method [13,14]. Direct and iterative approaches are the two fundamental algorithms for the parallel solution of modal analysis. In parallel modal synthesis computing, researchers have combined direct and iterative methods. Starting from the parallelism of the finite element problem itself, they divide a complex modal analysis problem into smaller sub-modal tasks for parallel processing. The direct method is used to solve the equations in the sub-modal problems. The system-level modal GCEs after condensation are obtained through two modal coordinate transformations to reduce their overall scale. The iterative scheme is then applied to solve the linear equations involved in the condensed system-level modal GCEs. The advantages of the direct and iterative methods can thus both be exploited to improve parallel efficiency, so this approach has been widely employed in structural modal parallel computing. Rong et al. [15] designed a hybrid parallel solution for large-scale eigenvalue problems based on the modal synthesis algorithm (MSA) and the transfer matrix method; this method was later applied to the large-scale parallel modal analysis of a vacuum vessel. Heng et al. [16] delivered a large-scale parallel modal solution of a cantilever beam on a shared-memory parallel computer based on MSA. Aoyama et al. [17] improved the coupling method at the interfaces of each subsystem in MSA and successfully conducted a large-scale parallel modal analysis of a rectangular plate. Parik et al. [18] used MSA to build a parallel modal analysis computing system based on OpenMP and applied it to the parallel modal analysis of a shaft system. Cui et al. [19] developed a simultaneous iterative procedure for the fixed-interface MSA, which has merits such as high computational efficiency, especially for reanalysis and parallel programming. However, when solving large-scale problems with the parallel modal synthesis approach (PMSA), as the number of subdomains increases, the scale and condition of the condensed system-level modal GCEs also increase sharply, making them difficult to solve. Additionally, all processes must participate in global communication when solving the system-level modal GCEs with the iterative method. The resulting increase in inter-process communication and synchronization overhead greatly reduces the parallel efficiency.

In terms of hardware, the most widely used parallel computers in scientific and engineering computing are distributed memory parallel computers, whose architectures mainly include pure Central Processing Unit (CPU) homogeneous supercomputers [20] and heterogeneous supercomputers [21]. In particular, heterogeneous supercomputers have become the mainstream architecture among distributed memory parallel computers due to their excellent computing power and cost-to-performance effectiveness. For heterogeneous distributed storage parallel computers, the most important aspects are the different storage mechanisms in the cluster environment, the communication and cooperation between processors, and the load balance of the hardware architecture at all levels. These factors can affect parallel efficiency significantly. Therefore, the keys to improving the parallel efficiency of heterogeneous distributed memory parallel computers are to balance the load between computing tasks and the hardware topology of heterogeneous clusters, to ensure the storage of large-scale data, and to handle the communication and cooperation between processors properly.

This paper focuses on providing a multilevel hierarchical parallel modal synthesis algorithm (MHPMSA) that is aware of the characteristics of heterogeneous multicore distributed storage computers and fully exploits their computing power to achieve optimal performance. The proposed algorithm, based on two-level partitioning and four-transformation strategies, not only realizes the sparsely distributed storage of a large amount of data and speeds up the memory access rate but also effectively reduces the scale of the condensed GCEs and saves equation solution time. Moreover, this algorithm achieves three-layer parallelization by mapping computational tasks onto the hardware architecture of heterogeneous distributed storage parallel computers. It improves the load balance between different layers and accelerates communication by separating the communication between inter-nodes and intra-nodes, as well as the communication between HCGs and inside HCGs. There are various types of finite element methods; this article targets the use of heterogeneous distributed storage parallel computers to solve large-scale modal problems, and the approach can be ported as a reference to other types of large-scale structural problems.

The rest of this paper is organized as follows: Section 2 introduces related work on the conventional parallel modal synthesis algorithm (CPMSA) for large-scale modal analysis. Section 3 then describes the proposed MHPMSA, with its two-level partitioning and four-transformation strategies, and the implementation of MHPMSA with the best respect for the architecture of the 'Shenwei' heterogeneous multicore processor. In Section 4, two numerical experiments are presented. Finally, conclusions are drawn in Section 5.

    2 Related Works

MSA can be divided into three categories according to how the DOFs of the substructure system interface are treated: the free-interface MSA [22], the fixed-interface MSA [23], and the hybrid-interface MSA [24], as shown in Fig. 1. In the MSA solution procedure, the low-order modal information of the substructure is used to represent the high-order modal information. When applying the free-interface and hybrid-interface MSA, the existence of rigid body modes leads to singular stiffness matrices that cannot be inverted. The fixed-interface MSA was first proposed by Tian et al. [25]. Its substructure modes include rigid body modes, constraint modes, and low-order modes of the fixed-interface substructure. Later, Craig and Monjaraz Tec et al. [26] improved the fixed-interface MSA by controlling the additional constraints on the interface of the substructure system. Since rigid body modes no longer need to be considered in this method, the calculation process is simpler, and it has been widely used.

Figure 1: Three categories of MSA

The mathematical description of modal analysis is [27]:

$$K\varphi = \lambda M\varphi \tag{1}$$

In this equation, K is the stiffness matrix of the overall modal system; M is the mass matrix of the overall modal system; λ is the generalized eigenvalue of the overall modal system; φ is the corresponding mode shape vector. K and M are large, sparse, symmetric positive (semi-)definite matrices and can be obtained through the finite element discretization and integration of the engineering system structure. The essence of modal analysis is to solve for multiple low-order eigenpairs of Eq. (1).
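As a quick illustration of Eq. (1), the sketch below computes the first 20 eigenpairs of a toy generalized eigenvalue problem with SciPy's shift-invert Lanczos solver; the matrix sizes, the 1D-Laplacian stand-ins for K and M, and the shift sigma = 0 are illustrative assumptions, not the paper's solver.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 2000
# Toy symmetric positive definite K (1D Laplacian) and lumped mass M.
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
K = sp.diags([off, main, off], [-1, 0, 1], format="csr")
M = sp.identity(n, format="csr")

# First 20 eigenpairs of K*phi = lambda*M*phi via shift-invert Lanczos.
lam, phi = eigsh(K, k=20, M=M, sigma=0.0, which="LM")
freq = np.sqrt(lam) / (2.0 * np.pi)  # natural frequencies in Hz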

The finite element mesh is divided once in the conventional parallel modal synthesis algorithm (CPMSA), as shown in Fig. 2. There are two mesh-partitioning algorithms: the 1D algorithm and the 2D algorithm. Compared with the 1D algorithm, the 2D algorithm is more suitable for models with extremely large DOFs or complicated configurations [16,28]. According to the numbering principle, each of the m generated sub-regions first numbers its internal DOFs (the I set) and then its boundary DOFs (the B set). The mass matrix M_sub and the stiffness matrix K_sub of each sub-domain system can therefore be expressed in the partitioned form:

$$K_{sub}=\begin{bmatrix}K_{II}&K_{IB}\\K_{BI}&K_{BB}\end{bmatrix},\qquad M_{sub}=\begin{bmatrix}M_{II}&M_{IB}\\M_{BI}&M_{BB}\end{bmatrix} \tag{2}$$

Figure 2: Partitioning and transformation of CPMSA

When performing the first coordinate transformation, the relationship between the physical coordinates of each sub-domain system and the modal coordinates can be expressed as:

In this formula, U_I and U_B represent the displacements of the internal and boundary nodes, respectively. [I_BB] is an identity matrix, [0_Bk] is a zero matrix, {p_k} contains the k modal coordinates corresponding to the internal nodes, and {p_B} contains the l modal coordinates corresponding to the boundary nodes. [Φ] is the modal transformation matrix of each substructure, and [Φ_Ik] is the principal modal matrix formed by the internal nodes: (K_II, M_II) form GCEs whose eigenvectors corresponding to the first k low-order eigenvalues constitute [Φ_Ik], as Eq. (4) shows. [Φ_IB] contains the constraint modes and can be calculated as in Eq. (5).
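In the standard fixed-interface form consistent with the definitions above, the first transformation and its ingredients read as follows (a sketch in the usual notation, which may differ from the paper's exact symbols):

$$\begin{Bmatrix}U_I\\U_B\end{Bmatrix}=[\Phi]\begin{Bmatrix}p_k\\p_B\end{Bmatrix}=\begin{bmatrix}\Phi_{Ik}&\Phi_{IB}\\0_{Bk}&I_{BB}\end{bmatrix}\begin{Bmatrix}p_k\\p_B\end{Bmatrix} \tag{3}$$

$$K_{II}\,\Phi_{Ik}=M_{II}\,\Phi_{Ik}\,\Lambda_k \tag{4}$$

$$\Phi_{IB}=-K_{II}^{-1}K_{IB} \tag{5}$$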

The second coordinate transformation is performed as shown below, utilizing the displacement coordination conditions among the sub-domain systems:

After assembling all sub-domains and transforming the equivalent stiffness and mass matrices, we obtain the reduced GCEs for the whole system:

The reduced stiffness matrix [K*] and the reduced mass matrix [M*] are:

Employing the parallel Krylov subspace iteration method to solve Eq. (10) yields the low-order modal frequencies and mode shapes required by the overall system, and substituting them into Eq. (3) gives the mode shapes of each sub-structure.
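To make the condensation concrete, the following sketch reduces one subdomain along the lines of Eqs. (3)~(9); the helper name condense, the dense assembly of the toy subdomain matrices, and the use of SciPy in place of the paper's parallel solvers are all illustrative assumptions.

import numpy as np
from scipy.sparse.linalg import eigsh, spsolve

def condense(K, M, I, B, k=10):
    # Interior/boundary blocks of the subdomain matrices (sparse input).
    K_II = K[I, :][:, I].tocsc()
    K_IB = K[I, :][:, B].tocsc()
    M_II = M[I, :][:, I].tocsc()
    # Constraint modes, Eq. (5): Phi_IB = -K_II^{-1} K_IB.
    Phi_IB = -spsolve(K_II, K_IB.toarray())
    # First k fixed-interface normal modes of (K_II, M_II), Eq. (4).
    _, Phi_Ik = eigsh(K_II, k=k, M=M_II, sigma=0.0, which="LM")
    # Transformation matrix of Eq. (3) in the [I; B] ordering.
    nB = len(B)
    T = np.block([[Phi_Ik, Phi_IB],
                  [np.zeros((nB, k)), np.eye(nB)]])
    order = np.r_[np.asarray(I), np.asarray(B)]
    K_sub = K[order, :][:, order].toarray()  # dense only for this toy sketch
    M_sub = M[order, :][:, order].toarray()
    # Reduced matrices: K* = T^T K_sub T, M* = T^T M_sub T.
    return T.T @ K_sub @ T, T.T @ M_sub @ T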

    3 Proposed MHPMSA Based on Shenwei Multicore Processor Architecture

MHPMSA, based on two-level partitioning, four-transformation strategies, and sparsely distributed data storage, not only improves the load balancing of different levels through parallel task mapping but also realizes communication separation and effectively improves communication efficiency. Furthermore, it reduces the scale of the system-level modal GCEs and saves iterative convergence time.

    3.1 Proposed MHPMSA

To reduce the time spent condensing sub-domains and assembling and solving the system-level modal GCEs at level 1 as the number of sub-tasks increases, MHPMSA was proposed on the basis of two-level partitioning and four-transformation strategies.

In the two-level partitioning, the initial finite element mesh is first partitioned into p level 1 subdomains with the 2D algorithm on the basis of ParMetis [29]. Then, every level 1 subdomain is divided into m level 2 subdomains according to the characteristics of the heterogeneous multicore architecture, which adapts to the heterogeneous multicore distributed storage supercomputer 'Shenwei-Taihuzhiguang'. Here p should be the total number of node machines started for parallel computing, and m represents the number of HCGs in a single node machine, which is 4. For example, as shown in Fig. 3, if p equals 2, the finite element mesh is first divided into two level 1 sub-domains, and then every level 1 subdomain is partitioned into four level 2 subdomains (a partitioning sketch follows Fig. 3).

Figure 3: Two-level partitioning
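The two-level split can be sketched on the host side as below; PyMetis is used here as a serial stand-in for the paper's ParMetis-based partitioner, and the function name two_level_partition and the adjacency-list mesh graph are illustrative assumptions.

import pymetis

def two_level_partition(adjacency, p, m=4):
    # Level 1: p subdomains, one per node machine.
    _, level1 = pymetis.part_graph(p, adjacency=adjacency)
    level2 = {}
    for d in range(p):
        cells = [i for i, part in enumerate(level1) if part == d]
        local = {c: j for j, c in enumerate(cells)}
        # Induced subgraph of subdomain d, re-partitioned into m pieces
        # (one level 2 subdomain per HCG).
        sub_adj = [[local[n] for n in adjacency[c] if n in local] for c in cells]
        _, membership = pymetis.part_graph(m, adjacency=sub_adj)
        level2[d] = {cells[j]: membership[j] for j in range(len(cells))}
    return level1, level2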

The four transformations realize the transformation process by successively applying Eqs. (3) and (7) to the level 1 and level 2 subdomains, as shown in Fig. 4. First, the mass and stiffness matrices are formed for every level 2 subdomain. Then, through the first coordinate transformation, the GCEs containing only internal DOFs are constituted. Next, the equivalent stiffness and mass matrices are calculated by condensation in each subdomain. Then, grouping all equivalent stiffness and mass matrices of the level 2 subdomains within the same level 1 subdomain and applying the second coordinate transformation yields the GCEs of the level 1 subdomains containing only independent coordinates. After a series of condensations, groupings, and coordinate transformations according to Eqs. (7)~(11), we obtain the GCEs of the whole system involving only independent coordinates. The global GCEs can then be solved by the parallel Krylov subspace iteration method to determine the low-order modal frequencies and mode shapes required by the overall system. Finally, the modal frequencies and mode shapes of each sub-structure can be calculated by parallel back substitution according to Eq. (3). The differences between CPMSA and MHPMSA are shown in Table 1.

Figure 4: Four transformations

Table 1: Differences between CPMSA and MHPMSA

    3.2 Architecture and Execution Mode of Shenwei Multicore Processor

Heterogeneous supercomputers are usually equipped with multicore CPUs, Graphics Processing Units (GPUs), Many Integrated Core (MIC) processors, or heterogeneous core groups that perform the computational tasks together. Examples include the APU project by AMD [30], the Echelon project led by NVIDIA [31,32], the Runnemede project conducted by Intel, and Wuxi-Hending's 'Shenwei-Taihuzhiguang' project [1] considered in this article.

Because there is a wide variety of heterogeneous multicore supercomputers, this paper focuses on the parallel solution of large-scale modal problems on the 'Shenwei-Taihuzhiguang' supercomputer in a targeted manner. The 'Shenwei-Taihuzhiguang' supercomputer is built on the 'Shenwei' multicore processor architecture, as shown in Fig. 5. There are four HCGs in every Shenwei heterogeneous multicore processor, and these four HCGs share 32 GB of memory. Every HCG owns one main core (computational control core) and 64 computing cores (the computing core array). To implement MHPMSA with the best respect for the architecture of the 'Shenwei' heterogeneous multicore processor, each node should make full use of all four HCGs in parallel computing, and every HCG should make full use of the main core and 64 computing cores. Besides, the 'Shenwei' multicore processors are not ranked in any particular order of importance: the network among 'Shenwei' multicore processors adopts a peer-to-peer communication protocol. Hence, the number of 'Shenwei' multicore processors can be selected arbitrarily without compromising parallel efficiency or the storage of large-scale data.

Figure 5: Shenwei multicore processor architecture

Communication between the HCGs uses a bidirectional 14 Gbit/s bandwidth, and communication between the computational control core and the computing cores uses DMA for bulk access to main memory. The local data storage space of each computing core is 64 KB, and its instruction storage space is 16 KB. The communication features are: the communication efficiency within a node is much higher than that between nodes, and, similarly, the communication rate within the same HCG is much greater than that between HCGs.


    3.3 Parallel Task Mapping

Considering the communication features of the Shenwei multicore processor architecture, the keys to improving the parallel efficiency of large-scale finite element modal analysis on heterogeneous distributed memory parallel computers are: (1) to balance the load between computing tasks and the hardware topology of heterogeneous clusters; (2) to ensure the storage of large-scale data; and (3) to handle the communication and cooperation between processors properly. Mapping the computing tasks to the different hardware layers of heterogeneous multicore supercomputers can achieve load balancing among the layers and accomplish effective partitioning of communication [33].

Based on CPMSA and considering the hardware architecture of the heterogeneous multicore supercomputer 'Shenwei-Taihuzhiguang', we formed the task mapping of MHPMSA for structural large-scale modal analysis, as shown in Fig. 6.

When performing task mapping, the level 1 mesh subdomains are mapped according to the order of the nodes, and the level 2 mesh subdomains are mapped to the heterogeneous core groups within the nodes. The floating-point operations in each HCG are mapped to the computing cores.

Each HCG of the 'Shenwei' heterogeneous multicore processor is configured with 8 GB of private memory that can be accessed independently. Therefore, to speed up memory access during the calculation, the data and information of each subdomain are stored in its corresponding HCG within the node machine through multiple file streams. Besides, compared with CPMSA, MHPMSA further reduces the scale of the overall GCEs of the system and accelerates their iterative convergence with the two-level partitioning and four-transformation strategies.
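This mapping can be mirrored in MPI communicators, as in the hedged mpi4py sketch below: one process per HCG, an intra-node communicator for the four HCGs of a node, and an inter-node communicator containing only each node's No. 0 HCG; the communicator names are illustrative assumptions.

from mpi4py import MPI

world = MPI.COMM_WORLD
rank = world.Get_rank()

# All processes on the same node machine (its four HCGs) share one
# intra-node communicator for local condensation traffic.
intra_node = world.Split_type(MPI.COMM_TYPE_SHARED, key=rank)

# Only the No. 0 HCG of each node joins the inter-node communicator;
# the rest get MPI.COMM_NULL and stay out of global communication.
color = 0 if intra_node.Get_rank() == 0 else MPI.UNDEFINED
inter_node = world.Split(color, key=rank)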

Figure 6: Task mapping of MHPMSA for structural large-scale modal analysis

    3.4 The Three-Layer Parallelization Computational Mechanism

Based on the communication characteristics of heterogeneous multicore distributed storage parallel computers, the key to improving communication efficiency lies in separating the communication between inter-nodes and intra-nodes, as well as the communication between different HCGs and inside HCGs, when controlling the communication cooperation, and in reducing the communication and synchronization overheads generated in the whole solution procedure. That is, according to the communication architecture of the heterogeneous multicore distributed storage parallel computer, the large amount of local communication should be restricted within the nodes while the global communication between nodes is minimized. MHPMSA satisfies exactly these conditions. As shown in Fig. 7, it realizes three-layer parallelization based on the two-level partitioning and four-transformation approach described above.

Figure 7: Scheme of three-layer parallelization for large-scale finite element modal analysis

First-layer parallelization: In the first layer, the m processes of each node are responsible for the processing of the corresponding level 2 subdomain systems. All processes operate synchronously, and there is no data interaction between different processes. Within a process, data interaction exists between the computing control core and the computing core array; the maximum amount of data in one interaction between the computing control core and a single computing core is 64 KB. The data processing steps are: the computing control core of each HCG reads the model data of its subdomain; each computing control core, together with its computing core array, carries out the calculations to form the equivalent stiffness matrix K_sub and equivalent mass matrix M_sub of its subdomain; the parallel Lanczos method is applied to the level 2 GCEs (K_II, M_II) to obtain the condensed equivalent stiffness and mass matrices; and the mode shapes of the level 2 subdomains are recovered by back substitution. Since these computations operate on element-level information, each process can proceed independently using the system information of its level 2 subdomain. Therefore, no data needs to be exchanged between processes, and data interactions exist only between the computing control core and the computing core array.

To save memory and reduce the amount of calculation, the local overall stiffness and mass matrices of each level 2 subdomain are stored with the compressed sparse column technique. The matrix-vector operations involved in Eqs. (2)~(9), mainly addition, subtraction, multiplication, and division, are performed with the Athread library. Take the vector multiplication a = b × c as an example (a, b, and c are the storage arrays in any matrix-vector operation); its operation is shown in Algorithm 1. The 64 computing cores on each HCG read the corresponding data segments from memory synchronously and cyclically; the memory required for each data segment must be less than 64 KB. After calculation, the data are written back to their specific locations. Communication exists only between the computing control core and the computing cores in this process.
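A host-side Python analogue of Algorithm 1 is sketched below; the real kernel distributes the tiles over the 64 computing cores with Athread DMA transfers, so the single sequential loop and the CHUNK sizing here are illustrative assumptions.

import numpy as np

LDM_BYTES = 64 * 1024            # local store of one computing core
CHUNK = LDM_BYTES // (3 * 8)     # a, b, c tiles of float64 must all fit

def tiled_multiply(b, c):
    # a = b * c computed in tiles no larger than the 64 KB local store.
    a = np.empty_like(b)
    for start in range(0, b.size, CHUNK):
        end = min(start + CHUNK, b.size)
        # On Sunway, each tile would be DMA-ed in, computed, and DMA-ed back.
        a[start:end] = b[start:end] * c[start:end]
    return a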

Second-layer parallelization: Each node machine simultaneously performs the assembly of its corresponding level 1 subdomain, the parallel condensation, and the back substitution. Communication exists between different HCGs within the same node and, within each HCG, between the control core and the computing cores. The data in the calculation adopt the compressed sparse column technique for distributed storage. During the calculation, the No. 0 HCG in each node machine is responsible for summarizing the results, for the internal back substitution and solution of the assembled system, for data distribution, and for parallel coordination. At the same time, all HCGs in the same node machine participate in the parallel computation of that node's level 1 subdomain. The parallel Lanczos method is applied to solve the GCEs (K_II, M_II) of the level 1 subdomain, and the linear equations involved in the procedure are preconditioned through the parallel modified Cholesky decomposition (PMCD), as shown in Algorithm 2.
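As a serial stand-in for Algorithm 2, the sketch below preconditions conjugate gradients with an incomplete factorization from SciPy; spilu is used here only as a rough substitute for the paper's PMCD, and the drop tolerance and fill factor are illustrative assumptions.

from scipy.sparse.linalg import cg, spilu, LinearOperator

def pcg_solve(K, f):
    # Incomplete LU of K as an approximate Cholesky-type preconditioner.
    ilu = spilu(K.tocsc(), drop_tol=1e-5, fill_factor=10)
    M = LinearOperator(K.shape, matvec=ilu.solve)
    # Preconditioned conjugate gradients; info == 0 signals convergence.
    x, info = cg(K, f, M=M)
    return x, info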

Figure 8: Parallel Krylov algorithm and task mapping

In summary, MHPMSA confines a considerable amount of local communication within every node machine while ensuring that each node machine has one HCG, the No. 0 HCG, participating in global communication. This strategy realizes the communication separation of intra-node and inter-node traffic, reducing the communication and cooperation overheads in the computational procedure. As a consequence, MHPMSA can significantly improve communication efficiency.

    4 Numerical Study and Discussion

    4.1 Numerical Experiments

The 'Shenwei-Taihuzhiguang' supercomputer is used to verify the correctness and effectiveness of the proposed algorithm. Each node starts four HCGs during the test, and every HCG starts one main core and 64 computing cores with 8 GB of memory. The proposed algorithm and CPMSA are used in the modal analysis of a disc brake rotor in an ultra-deep drilling rig; the FEM mesh model is shown in Fig. 9. The elastic modulus assigned to the model is 210 GPa, the density is 7800 kg/m3, and the Poisson's ratio is 0.3. Different DOF scales are applied, and fixed constraints are set at the 8 bolt-hole positions on the inner surface. The first 20 natural frequencies of the structure are calculated and compared with the results computed by the Lanczos algorithm [10,34]. The test scales are shown in Table 2, the maximum relative errors of the first 20 natural frequencies for each test scale are calculated by Eq. (12), and the results are shown in Fig. 10.
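A maximum relative error consistent with this description (a hedged reconstruction of Eq. (12), with the Lanczos frequencies as reference) is:

$$e_{\max}=\max_{1\le i\le 20}\frac{\left|f_i-f_i^{\mathrm{Lanczos}}\right|}{f_i^{\mathrm{Lanczos}}}\times 100\% \tag{12}$$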

Table 2: Testing cases for the main structure of the rotor on the brake system

Figure 10: The modal relative errors for different DOFs

To reveal the necessity of large-scale modal calculation of complex systems, we conducted comparative analyses of the modal frequency results of Case1~Case4. Table 3 compares the modal frequencies at the four different scales. The source of the original experimental data in Table 3 is the modal frequency of the brake-system rotor at the different scales, calculated by the Lanczos algorithm.

From Table 3, it can be seen that as the calculation scale increases, the modal frequency of each order gradually decreases. This phenomenon is caused by the 'hardening effect' of stiffness matrices in complex-system finite element analysis, which yields higher modal frequencies at relatively small DOF calculation scales. Comparing the modal frequencies of Case1~Case3, the maximum rate of change is 4.68%, while from Case3 to Case4 the rates of change are much lower, indicating that for the finite element analysis of complex systems it is necessary to increase the calculation scale accordingly to improve accuracy.

For a specific problem, if the available node machines for parallel computing range from i to j (0 < i < j), the speedup and parallel efficiency of a run on j node machines are defined relative to the smallest run on i node machines, as given in Eqs. (13) and (14).

The MHPMSA proposed in this article and CPMSA are tested by starting the corresponding numbers of node machines. Considering the features of the 'Shenwei' multicore processors and the storage of large-scale data, the total numbers of startup node machines during the test are 16, 32, 64, and 128 in sequence. Since the total number of level 1 subdomains in the two-level partitioning equals the total number of startup node machines, the numbers of level 1 subdomains are likewise 16, 32, 64, and 128. Every node of the 'Shenwei-Taihuzhiguang' supercomputer contains 4 HCGs, so in the second partition every level 1 subdomain is divided into 4 independent level 2 subdomains. The parallel calculation results of Case2 to Case4 are shown in Tables 4~6. The source of the original experimental data in Tables 4~6 is the solution time at the different scales, calculated by CPMSA and MHPMSA with different numbers of node machines. The speedup and parallel efficiency are then calculated by Eqs. (13) and (14).
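One common convention for these quantities, assuming the smallest run with i node machines is the baseline (a hedged reconstruction of Eqs. (13) and (14)), is:

$$S_j=\frac{i\,T_i}{T_j} \tag{13}$$

$$E_j=\frac{S_j}{j}\times 100\% \tag{14}$$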

Table 4: Results of parallel computation for Case2

In Tables 4~6, the total parallel computing time starts from calculating the subdomains and ends when the mode shape has been computed for every subdomain. The solution time for the level 1 subdomains includes: the time to form the equivalent stiffness and mass matrices of the level 1 subdomain, the time for condensation and reduction of the level 1 subdomain, the time for the parallel solution of the equations, and the time to back substitute the level 1 subdomain modal coordinates. In Table 4, the global system equations are solved by the parallel solver SuperLU_D [35]. In Tables 5 and 6, the global system equations are solved by parallel PCG. Based on the results in Tables 4~6, it can be observed that the proposed MHPMSA obtains a higher speedup ratio and parallel efficiency than CPMSA. This is because, with CPMSA, as the number of subdomains increases, the scale and condition of the GCEs also expand rapidly, significantly increasing the solution time and thereby seriously affecting the overall parallel efficiency of the system. From a mathematical point of view, the MHPMSA solution of the level 1 subdomains essentially solves system GCEs of the same scale as CPMSA while greatly shortening the solution time. For example, in Table 6, the solution time for the level 1 subdomains is 732.1 s with 512 computing cores, saving 1289.1 s compared to CPMSA. The main reason is that MHPMSA, based on the two-level partitioning and four-transformation strategies, improves communication efficiency by separating inter-node from intra-node communication, together with the communication between HCGs and inside HCGs. Besides, it further reduces the scale of the condensed system equations and accelerates their computation and iteration convergence. As a result, it effectively saves the solution time for the level 1 subdomains and obtains a satisfactory speedup ratio and parallel efficiency. It can also be observed that using parallel PCG for the global system equations yields a higher speedup ratio and parallel efficiency than SuperLU_D. This is because SuperLU_D further increases the density of the original equations; consequently, the assembly and triangular decomposition require a large amount of memory as well as substantial communication and calculation.

Table 5: Results of parallel computation for Case3

The performance of the parallel algorithms at different test scales is shown in Fig. 11, based on Tables 5 and 6. It can be observed that the performance of both CPMSA and MHPMSA improves as the test scale increases. This is because the proportion of communication relative to computation decreases as the test scale grows. When the test scale is relatively large, MHPMSA obtains a higher speedup ratio and parallel efficiency than CPMSA. Hence, the proposed method shows good scalability with respect to test scale.

Figure 11: Performance of the parallel algorithms at different test scales

    4.2 A Typical Application of Cross-River Tunnel System

In real engineering applications, complex engineering structures can contain multiple types of elements. To test the parallel efficiency on a complex engineering system at the scale of tens of millions of DOFs with multi-element hybrid modeling, a cross-river tunnel, shown in Fig. 12, is taken as an example. The model has 2,896,781 solid elements, 186,121 beam elements, 21,685 mass elements, 1,012,581,369 non-zero entries in the stiffness matrix, an average bandwidth of 412, and 13,167,203 DOFs. We aim to solve its first 20 natural modal frequencies.

In the parallel computation, the total numbers of startup nodes are 32, 64, 128, and 256 in sequence, considering the features of the 'Shenwei' multicore processors and the storage of large-scale data. Each node starts four HCGs during the test, and every HCG starts one main core and 64 computing cores with 8 GB of memory. The calculation results of CPMSA and the proposed MHPMSA are shown in Table 7. The source of the original experimental data in Table 7 is the solution time calculated by CPMSA and MHPMSA with different numbers of node machines. The speedup and parallel efficiency are then calculated by Eqs. (13) and (14).

It can be seen from Table 7 that when CPMSA solves the overall system equations by the direct method, the solution time increases sharply with the number of subdomains, seriously reducing the parallel efficiency of the system. Although the overall system-level modal GCEs adopt the compressed sparse column technique for storage, when conducting the triangular decomposition, the PMCD is only applied to the lower-triangle decomposition. The overall interface equation is still highly dense, and the triangular decomposition further increases the density of the original equations. Consequently, the assembly and triangular decomposition require a large amount of memory, as well as substantial communication and calculation. As the number of subdomains increases, the scale of the GCEs after the overall reduction also grows, accompanied by greater expense in storage, communication, and computing, and a longer time for the overall solution of the system. On the contrary, when MHPMSA is used, because of the iterative method, there is no need to form the reduced GCEs of the whole system explicitly. Moreover, the local communication involved in the parallel equation solution exists only between neighbouring subdomains. As mentioned before, only a few dot product operations and the computation of the overall iterative errors need global communication. Thus, the proposed algorithm achieves a better speedup ratio and parallel efficiency in a shorter computing time.
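The only global traffic the iterative solve needs can be sketched as a reduced dot product, as in the mpi4py fragment below; the function name global_dot and the use of allreduce are illustrative assumptions.

import numpy as np
from mpi4py import MPI

def global_dot(u_local, v_local):
    # Each subdomain contributes its local partial sum; one Allreduce
    # per inner product is the only inter-node communication required.
    return MPI.COMM_WORLD.allreduce(np.dot(u_local, v_local), op=MPI.SUM)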

Figure 12: FEM of the cross-river tunnel structure system

Table 7: Results of parallel computation for the cross-river tunnel

The calculation results of MHPMSA with different parallel solvers for the global GCEs are shown in Table 8.

Table 8: Results of parallel computation for the cross-river tunnel with different parallel solvers

In Table 8, the global GCEs are solved by different parallel solvers: SuperLU_D, KSPCG in PETSc, and parallel PCG. It can be seen from Table 8 that solving the GCEs with KSPCG or parallel PCG achieves the calculation in a shorter computing time than with SuperLU_D. This is because, when the global GCEs are solved with KSPCG or parallel PCG, the local communication involved in the parallel equation solution exists only between neighbouring subdomains; only a few dot product operations and the computation of the overall iterative errors need global communication. Additionally, the solution time of KSPCG is longer than that of parallel PCG. This is because only 1 MPI process within each node needs to participate in global communication when solving the global GCEs with parallel PCG, whereas KSPCG uses all MPI processes within the same node. Hence, the increased inter-process communication and synchronization greatly increase the solution time of the global GCEs.

    5 Conclusion

With the increasing size and complexity of numerical simulation problems, heterogeneous supercomputers are becoming more and more popular in high-performance computing. However, it remains an open question how current applications can exploit the capabilities of parallel calculation with the best respect for the architecture and execution mode of such heterogeneous systems. The main contribution of this paper is to provide an MHPMSA that is aware of the characteristics of 'Shenwei' heterogeneous multicore processors and fully exploits their computing power to achieve optimal performance.

MHPMSA is based on two-level partitioning and four-transformation strategies. To match the features of 'Shenwei' heterogeneous multicore processors, the computing tasks of large-scale modal analysis are divided into three layers: inter-nodes, intra-nodes, and HCGs (between and inside HCGs). Mapping the computing tasks to the different hardware layers of 'Shenwei' heterogeneous multicore supercomputers achieves load balancing among the layers and accomplishes effective partitioning of communication. MHPMSA thus not only realizes the sparsely distributed storage of a large amount of data and speeds up the memory access rate but also effectively reduces the scale of the condensed GCEs and saves equation solution time. Finally, the typical numerical examples show that, compared with CPMSA, the proposed method gains a higher speedup ratio and parallel efficiency. Under multiple work conditions, compared with the direct method on the reduced GCEs of the whole system, this method also obtains a higher speedup ratio and parallel efficiency.

Although the authors' current research only focuses on large-scale modal analysis, MHPMSA is a general tool for solving many kinds of structural analysis problems, including linear and nonlinear dynamic analysis. Hence, the research outcome of this paper can serve as a reference for improving large-scale parallel computation efficiency and can be ported to other structural finite element analysis software for various analyses on Shenwei heterogeneous multicore processors or other heterogeneous multicore processors. It can also be used as a practical example and reference for the parallel computation of large-scale equipment systems and complex engineering systems with finite element modeling.

Acknowledgement: Not applicable.

Funding Statement: This work is supported by the National Natural Science Foundation of China (Grant No. 11772192).

Author Contributions: Gaoyuan Yu: Methodology, Software, Validation, Formal Analysis, Investigation, Writing-Original Draft. Yunfeng Lou: Methodology, Resources. Hang Dong: Writing-Review & Editing. Junjie Li: Methodology, Software. Xianlong Jin: Conceptualization, Supervision, Writing-Review & Editing.

Availability of Data and Materials: The raw data required to reproduce these findings cannot be shared at this time, as the data also form part of ongoing studies.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
