
    A Comparison of Techniques for Solving the Poisson Equation in CFD

民用飛機設計與研究 (Civil Aircraft Design and Research), 2017, Issue 3

    Nick Brown

(Edinburgh Parallel Computing Centre, The University of Edinburgh, Edinburgh EH9 3FD, UK)

CFD is a ubiquitous technique central to much of computational simulation, such as that required by aircraft design. Solving the Poisson equation is a frequent requirement in CFD and there are a number of possible approaches one may leverage. The dynamical core of the MONC atmospheric model is one example of CFD which requires solving the Poisson equation to determine pressure terms. Traditionally this aspect of the model has been very time consuming, so it is important to consider how we might reduce its runtime cost. In this paper we survey the different approaches implemented in MONC to perform the pressure solve. Designed to take advantage of large scale, modern HPC machines, we are concerned with the computation and communication behaviour of the available techniques, and in this text we focus on direct FFT and indirect iterative methods. In addition to describing the implementation of these techniques, we illustrate both the performance and scalability of our approaches on up to 32768 processor cores of a Cray XC30. Raw runtime is not the only measure, so we also make some comments on the stability and accuracy of the solutions. The results of this work are a number of techniques, optimised for large scale HPC systems, and an understanding of which is most appropriate in different situations.

    Poisson equation; CFD; FFT solver; iterative solver; MONC; PETSc

    1 Introduction

Computational Fluid Dynamics (CFD) is used in a variety of applications, from the modelling of airflow over aircraft wings to the dynamical core of atmospheric models. The fundamental component of CFD is the solving of the Navier-Stokes equations, and the problem can be made substantially simpler by not resolving all scales, instead concentrating only on quantities (or scales) of interest and parameterising, or estimating, other values. Large Eddy Simulation (LES) is a CFD technique used to efficiently simulate and study turbulent flows. In atmospheric science, LES models are often coupled to cloud microphysics and radiative transfer schemes to create a high resolution modelling framework that is employed to develop and test physical parametrisations and assumptions used in numerical weather and climate prediction.

In this paper we concentrate on the recently developed Met Office NERC Cloud model (MONC)[1], an open source high resolution modelling framework that employs large eddy simulation to study the physics of turbulent flows. The goal of this model is to further develop and test physical parametrisations and assumptions used in numerical weather and climate prediction. At its heart, however, MONC utilises CFD for the dynamical core, and the purpose of the model in this paper is as a driver for investigating the underlying computational methods. MONC replaced an existing model called the Large Eddy Model (LEM)[2], which had been an instrumental tool for the weather and climate communities since the 1980s, used for activities such as the development and testing of the Met Office Unified Model (UM) boundary layer scheme[5][6], convection scheme[8][7] and cloud microphysics[9][4]. A key aspect of the LES dynamical core in MONC is the solving of pressure terms via the Poisson equation. In the LEM this consumed a significant portion of the overall runtime and as such we considered it crucially important to optimise this step and understand the best choice of computational technique.

The ubiquity of CFD means that lessons learnt about optimal computational approaches in one area, such as atmospheric science in MONC, are of relevance to other fields such as aircraft design. Section two introduces the different approaches that we developed for solving the pressure terms in MONC and discusses their implementation in more detail. The performance, scalability and stability of the techniques on up to 32768 processor cores of a Cray XC30 are described in section three, before conclusions are drawn in section four and further work discussed.

    2 Computational approaches for solving pressure terms

Calculation of the pressure term involves solving a Poisson-like elliptic equation[3] for the pressure field, p. This represents a pressure correction, where the dynamical core generates source terms from other aspects such as advection and buoyancy, which are then used to define the input to the solver. The Poisson equation is then solved, which determines the current pressure at every point in the three dimensional grid, measured in Pascals. From a computational perspective not only are there numerous ways to actually solve this equation, but many of these also contain numerous choices and configuration options. It is fair to say that this is an area of the model often less well understood by users, who tend to rely on the default choices; however, as we will see, if they tune their configuration for the actual test case then significant improvements in overall performance can be obtained.
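
For reference, the continuous problem being solved has the standard form below, with the source term S assembled from the advection and buoyancy contributions (the exact construction of S is model specific and not reproduced here):

    \nabla^2 p = \frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2} = S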

    2.1 Direct FFT

One approach to solving the Poisson equation is based upon Fourier transformation. By performing a forward Fourier transformation into the frequency domain, we obtain a vertical ordinary differential equation (ODE) that can then be solved using a tri-diagonal solver. Once completed, a backwards Fourier transform reverts from the frequency domain back into the spatial domain, giving the pressure solution. The tri-diagonal solver itself is computationally trivial and works vertically (in the Z dimension), up and down each individual column of the system, without any interaction with neighbouring columns. This works well as long as complete columns are held on processes, which is the case here (MONC decomposes in the X and Y dimensions only.) The computationally intensive parts of the direct FFT solver are found in computing the forward and backward Fourier transforms, and the Fast Fourier Transform (FFT) algorithm[10] is commonly used for this. When, as in this case, working with a 3D system, each process handles the dimensions separately, executing the FFT algorithm for all of the data in one dimension before moving onto the next. In this implementation we use the Fastest Fourier Transform in the West (FFTW)[11] library for the actual FFT computational kernel. FFTW is a fast, commonly available and open source library that implements numerous FFT algorithms. The programmer provides details of their problem in an initial planning phase, which FFTW uses to select the most appropriate kernel to execute. The idea is that the planning and selection of kernel(s) is expensive but only done once (such as during an initialisation step), as the plan can then be reused time and time again (for instance every timestep.)
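
A minimal sketch of this plan-once, execute-many pattern, using FFTW's modern Fortran interface on a one dimensional real-to-complex transform for illustration (MONC's actual usage is more involved):

    ! Sketch of FFTW's plan/execute pattern (Fortran 2003 interface);
    ! a 1D real-to-complex transform for illustration only.
    program fftw_plan_example
      use, intrinsic :: iso_c_binding
      implicit none
      include 'fftw3.f03'
      integer(C_INT), parameter :: n = 128
      real(C_DOUBLE) :: in(n)
      complex(C_DOUBLE_COMPLEX) :: out(n/2 + 1)
      type(C_PTR) :: plan

      ! Expensive planning step, done once at initialisation
      plan = fftw_plan_dft_r2c_1d(n, in, out, FFTW_MEASURE)

      ! Cheap execution, reused every timestep (fill the input after
      ! planning, since FFTW_MEASURE overwrites the arrays)
      in = 1.0_C_DOUBLE
      call fftw_execute_dft_r2c(plan, in, out)

      call fftw_destroy_plan(plan)
    end program fftw_plan_example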

When computing Fourier transforms in parallel, one typically has a choice between splitting the system up (decomposing the domain) in one or two dimensions. Decomposing in one dimension, known as a slab decomposition, is simpler to implement but limits the amount of parallelism available in contrast to a two dimensional, or pencil, decomposition. FFTW supports parallelism through a one dimensional slab decomposition, however this is notoriously inefficient. Therefore in our code we have leveraged FFTW for the actual computational kernels but implemented our own pencil decomposition and parallelism to distribute the system amongst the processes in two dimensions. Hence FFTW is responsible for the computation, and our own code for the decomposition and communication.

The pressure field, p, that we are working on is initially distributed as in figure 1. The domain is decomposed between processes in two dimensions, with complete columns resident on a process. In this example there are four processes, each in possession of a unique set of columns. Communication is then issued to transpose in the Y dimension, where each process communicates with a subset of processes (all processes in that dimension's specific direction) to gather together data and distribute its own, forming another pencil in this dimension. This can be thought of as a parallel transposition in that dimension, and the result of this communication is illustrated in figure 2. After this transposition, data is allocated to processes in rows rather than columns. For each row residing on a process, the forward Fourier transform is performed by calling into FFTW, which executes the appropriate internal kernel as per its plan.
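
Each such transposition is essentially an all-to-all exchange amongst the processes of one dimension of the process grid. A minimal sketch of a single transposition step follows; the routine name and the idea of precomputed counts and displacements are illustrative assumptions rather than MONC's actual code:

    ! Illustrative single transposition step: every process exchanges its
    ! blocks with all others in the dimension's communicator in one
    ! collective call. Counts/displacements would be precomputed (see the
    ! planning discussion below) and reused every timestep.
    subroutine transpose_step(comm, send_buf, send_counts, send_displs, &
                              recv_buf, recv_counts, recv_displs)
      use mpi
      implicit none
      integer, intent(in) :: comm
      real(kind=8), intent(in) :: send_buf(:)
      real(kind=8), intent(out) :: recv_buf(:)
      integer, intent(in) :: send_counts(:), send_displs(:)
      integer, intent(in) :: recv_counts(:), recv_displs(:)
      integer :: ierr

      call MPI_Alltoallv(send_buf, send_counts, send_displs, MPI_DOUBLE_PRECISION, &
                         recv_buf, recv_counts, recv_displs, MPI_DOUBLE_PRECISION, &
                         comm, ierr)
    end subroutine transpose_step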

Once work in the Y dimension has completed, a further transposition and FFT in the X dimension must be performed. As before, communication is required with processes in this dimension in order to gather together blocks forming a pencil in X, as illustrated by figure 3. Once again a forward FFT is performed for all data in this dimension.

At this point we have completed the forward Fourier transform for the 3D data and the pressure field, p, is in the frequency domain. Effectively we have an ODE which our simple tri-diagonal solver can operate upon. However, a significant restriction of our solver is that it must work in vertical columns, whilst the data, as per figure 3, is split up and distributed in the X dimension. As the algorithm requires data to be oriented in the vertical, Z, dimension, further communication is required to gather the data back to its original orientation, and we do this by reverse transposing, from the X dimension (figure 3) to the Y dimension (figure 2) and then to the Z dimension (figure 1).
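
The per-column solve is the classic Thomas algorithm for tri-diagonal systems; a minimal real-valued sketch follows (the routine name and interface are ours for illustration, and the coefficients in the actual frequency-space solve differ):

    ! Illustrative Thomas algorithm for one vertical column: solves
    ! a(k)*x(k-1) + b(k)*x(k) + c(k)*x(k+1) = d(k) for k = 1..n,
    ! where a(1) and c(n) are unused.
    subroutine solve_tridiagonal(n, a, b, c, d, x)
      implicit none
      integer, intent(in) :: n
      real(kind=8), intent(in) :: a(n), b(n), c(n), d(n)
      real(kind=8), intent(out) :: x(n)
      real(kind=8) :: cp(n), dp(n), m
      integer :: k

      ! Forward sweep up the column eliminates the sub-diagonal
      cp(1) = c(1) / b(1)
      dp(1) = d(1) / b(1)
      do k = 2, n
        m = b(k) - a(k) * cp(k-1)
        cp(k) = c(k) / m
        dp(k) = (d(k) - a(k) * dp(k-1)) / m
      end do

      ! Back substitution down the column recovers the solution
      x(n) = dp(n)
      do k = n - 1, 1, -1
        x(k) = dp(k) - cp(k) * x(k+1)
      end do
    end subroutine solve_tridiagonal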

Once the tri-diagonal solver has executed we need to transform from the frequency domain back into the spatial domain. This requires a backwards 3D Fourier transform, and first we must transpose back to the X dimension via the Y dimension. A backwards FFT is executed by FFTW in this X dimension before transposing to the Y dimension and performing another backwards FFT in Y. Lastly a transposition from Y to Z is executed, which gives us the pressure field in the spatial domain and holds the solution to the pressure equation.

From our description here it can be seen that there is significant communication between the processes when performing all these data transpositions. Not only this, but because we are splitting the data into blocks that are then distributed between numerous processes, this can result in many small messages between many processes, which is known to be inefficient. More general optimisation techniques, such as overlapping communication with computation, could offset this inefficiency but are not applicable here because we must be in possession of all of a dimension's data before performing the Fourier transform. In short, due to the cost of communication, it is widely accepted that FFTs do not scale to large numbers of processes.

An uneven decomposition of data is where the global domain does not divide evenly between processes, so they hold different numbers of elements. This can add considerable complexity to the code but is an important case for us to handle. We found that the best way to implement this has been to follow an initial planning approach similar to that which FFTW exposes. Instead of selecting computational kernels, our code determines a plan for the decomposition and communication. Specifically, this plan is made up of a number of transpositions, each transposition containing specifics of what local data needs to be sent, what remote data to receive, and where to place it. Associated with the set of transpositions in the plan are a number of buffers. These are needed for communication, and we aim to maximise reuse of these buffers for memory efficiency reasons. Whilst the resulting implementation is inevitably complex, the fact that performing a forward Fourier transform and then a backwards transform returns the original values means that testing correctness is trivial. By temporarily disabling the tri-diagonal solver we have been easily able to test that the input values match the output values from the transpositions, providing confidence that the decomposition and communication are correct irrespective of the system size and number of processes.
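
A sketch of this round-trip check (forward_fft_3d and backward_fft_3d are hypothetical wrappers around the transpositions and FFTW calls described above; FFTW's transforms are unnormalised, hence the rescaling):

    ! Round-trip test with the tri-diagonal solve disabled: a forward then
    ! backward transform should reproduce the input field. forward_fft_3d
    ! and backward_fft_3d are hypothetical wrappers, not MONC routines.
    original = p_field
    call forward_fft_3d(plan, p_field)
    call backward_fft_3d(plan, p_field)
    p_field = p_field / real(nx * ny, kind=8)  ! undo FFTW's unnormalised scaling
    if (maxval(abs(p_field - original)) > 1.0d-10) then
      print *, 'Round trip mismatch: decomposition/communication bug'
    end if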

Previously the model utilised an old FFT implementation which, written in Fortran 66, only implemented a slab decomposition and exhibited very poor performance due to naive parallel communications and a poor FFT kernel. By leveraging FFTW for the kernel and reimplementing the decomposition and communication routines with our own pencil approach, we were able to obtain a very significant improvement in the overall performance of MONC compared to the LEM.

    2.2 Iterative solver

The second way of solving the Poisson equation is by an iterative method, where the system starts off with an initial guess and makes progress towards the final answer via a series of iterations. A major advantage this technique holds for MONC is that it does not require the lateral periodic boundary conditions imposed by the direct FFT solver, and so extends the flexibility of the model in general. We are looking to solve an equation of the form Ax = b, where A is the matrix, x is the target solution and b is the right hand side (RHS). Mapping this to the Poisson equation that we are solving here for pressure terms, one obtains ∇²x = b. Whilst there are many variants of iterative methods, such as Jacobi and Gauss-Seidel, Krylov subspace methods form the basis of the most efficient iterative solvers. There are numerous Krylov based solvers such as Conjugate Gradient (CG), BiConjugate Gradient Stabilised (BiCGStab) and Generalized Minimal RESidual (GMRES). Each of these varies in its performance and stability for specific problems, so the trick is to pick the most appropriate solver for the equation in question. Before starting to solve the system it is common to precondition the matrix in order to lower its condition number. Broadly speaking, the condition number can be used to determine how hard a system will be to solve, and a high condition number will result in more iterations of the solver. Therefore running an algorithm on the matrix to lower the condition number, and hence reduce the number of solver iterations by making the system easier to solve, can be of significant performance benefit.

Unlike the direct FFT approach, iterative methods proceed in iterations: instead of producing the exact solution, they terminate once a specific accuracy has been met, and the relative residual, i.e. by how much the error of the solution has decreased, is a common measure and the one used in this work. The calculation of each grid point in the system requires the neighbouring points. This is simple in the middle of the local domain, as it just requires a local memory read, but on the boundaries communication is required with the neighbouring process to obtain the needed values. For efficiency a technique called halo swapping is used, illustrated in figure 4, where a halo of width one element surrounds the local domain on each process. After an iteration, values are exchanged with all neighbours (in this case a maximum of four neighbours, because we decompose in two dimensions) en masse. This is optimal because fewer, larger messages are far more efficient than many smaller ones, and it also provides the possibility of overlapping some computation with communication. In contrast to the direct FFT approach, the halo swapping communication required by the iterative solver is more regular, localised and easier to optimise. In short, the communication pattern of the iterative solver should provide significant performance benefits over the communication of the FFT solver when considering larger core counts.
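
The neighbour dependence comes directly from the finite difference discretisation of ∇²x = b. On a uniform grid of spacing h, for example, a Jacobi-style update of each point takes the textbook form below, requiring the six face neighbours of every grid point (the actual stencil coefficients in MONC differ, for instance to account for the vertical grid):

    x^{(n+1)}_{i,j,k} = \frac{1}{6} \Big( x^{(n)}_{i-1,j,k} + x^{(n)}_{i+1,j,k} + x^{(n)}_{i,j-1,k} + x^{(n)}_{i,j+1,k} + x^{(n)}_{i,j,k-1} + x^{(n)}_{i,j,k+1} - h^2\, b_{i,j,k} \Big)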

    2.2.1 PETSc iterative solver

The Portable, Extensible Toolkit for Scientific Computation (PETSc)[12] is a general purpose framework for solving linear systems. Combining a variety of solvers, preconditioners and methods for parallelisation, one can define their matrix and RHS using the provided data structures and abstractions. This is handed to PETSc, which then executes the actual solve and provides the solution to the user. For parallelisation a Distributed Memory (DM) interface is provided. This enables the programmer to define general details about their problem and desired parallelism, with the library then making the lower level choices and implementing the physical mechanisms such as communication. A major advantage of PETSc is the availability of a rich collection of solvers and preconditioners. As such, it is trivial to experiment with different solution choices simply by setting general options in the code, enabling experimentation to find the most appropriate solver for performance and stability.
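
By way of illustration, the sketch below shows how a solver and preconditioner might be selected through PETSc's Fortran interface; the pairing shown (BiCGStab with ILU, which in parallel PETSc typically applies per process within block Jacobi) is the one used for the runs in section 3, and KSPSetFromOptions additionally allows runtime overrides such as -ksp_type gmres:

    ! Illustrative PETSc solver/preconditioner selection; error checking
    ! and surrounding declarations abbreviated.
    KSP :: ksp
    PC :: pc
    PetscErrorCode :: ierr

    call KSPCreate(PETSC_COMM_WORLD, ksp, ierr)
    call KSPSetType(ksp, KSPBCGS, ierr)   ! BiCGStab
    call KSPGetPC(ksp, pc, ierr)
    call PCSetType(pc, PCILU, ierr)       ! ILU preconditioning
    call KSPSetFromOptions(ksp, ierr)     ! allow command line overrides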

Whilst it is possible to run PETSc in a matrix-free manner, i.e. without physically constructing the matrix or RHS, this requires significantly more work and neither the existing solvers nor preconditioners can be used directly with it. Therefore, in MONC we take the more common approach of explicitly constructing the matrix and RHS in memory. Listing 1 illustrates the Fortran subroutine called by PETSc to set up the RHS, which is based upon source terms that have already been computed for the current timestep. The distributed matrix is first retrieved, and then the current process's global starting grid point (xs, ys and zs) along with the local data size in each dimension (xm, ym and zm) are determined. A Fortran pointer to the underlying RHS array is retrieved, and then the copy_data_to_petsc_pointer subroutine, which we developed, copies the pressure source terms p_source (a global variable) into this array.

Listing 1 PETSc RHS for each solve
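
A sketch of the shape of this routine, reconstructed from the description above using PETSc's Fortran API (the exact MONC code may differ):

    #include "petsc/finclude/petscksp.h"
    ! Reconstructed sketch of the RHS setup callback; p_source and
    ! copy_data_to_petsc_pointer are the global variable and subroutine
    ! described above, assumed available via a module.
    subroutine compute_RHS(ksp, b, ctx, ierr)
      use petscksp
      implicit none
      KSP :: ksp
      Vec :: b
      integer :: ctx(*)
      PetscErrorCode :: ierr
      DM :: dm
      PetscInt :: xs, ys, zs, xm, ym, zm
      PetscScalar, pointer :: rhs(:)

      call KSPGetDM(ksp, dm, ierr)                           ! the distributed grid
      call DMDAGetCorners(dm, xs, ys, zs, xm, ym, zm, ierr)  ! local start and size
      call VecGetArrayF90(b, rhs, ierr)                      ! pointer to RHS storage
      call copy_data_to_petsc_pointer(rhs, p_source, xs, ys, zs, xm, ym, zm)
      call VecRestoreArrayF90(b, rhs, ierr)
    end subroutine compute_RHS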

Matrix generation is also done via a procedure callback, omitted for brevity, which is called on the first timestep only, as once the matrix is initialised it remains unchanged. This matrix implements the Poisson equation in a standard manner and has been tested for correctness against the direct FFT solver. Listing 2 illustrates the calling of PETSc at each timestep to perform the pressure solve. The p_source global variable is set to the current source terms, which are held in the p (pressure) data array, and this is then read in the compute_RHS procedure called from PETSc. Execution of the KSPSolve procedure actually performs the solve and, once this has completed, the KSPGetSolution procedure retrieves the solution as a PETSc vector, which is then read and copied back into the p array. The prev_p global variable is set to this value, which will provide the initial conditions for the pressure solve at the next timestep.

Listing 2 Executing PETSc iterative solver
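
Again as a reconstructed sketch (surrounding declarations abbreviated and the exact MONC code may differ; with the DM-based setup the RHS and solution vectors are managed by PETSc, hence the null vector arguments to KSPSolve):

    ! Reconstructed sketch of the per-timestep pressure solve.
    ! x is a Vec, sol a PetscScalar pointer; p, prev_p and p_source are
    ! the model arrays described above.
    p_source = p                     ! source terms, read by compute_RHS
    call KSPSolve(ksp, PETSC_NULL_VEC, PETSC_NULL_VEC, ierr)
    call KSPGetSolution(ksp, x, ierr)
    call VecGetArrayReadF90(x, sol, ierr)
    p = sol                          ! copy the solution back into the model
    call VecRestoreArrayReadF90(x, sol, ierr)
    prev_p = p                       ! initial conditions for the next solve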

In contrast to the direct FFT solver, leveraging PETSc for an iterative approach is considerably simpler from a code perspective. This is mainly due to the mature support of the library, but it should also be noted that, because the neighbour to neighbour communication pattern of iterative solvers is more regular and far simpler than that required by the Fourier transform, it is easier to abstract in a common library.

    2.2.2 Bespoke solver implementation

The benefit of the PETSc based approach is that numerous existing solvers and preconditioners are made immediately available; however, explicit bespoke implementations of the CG and BiCGStab methods were also developed for solving the Poisson equation in MONC. A benefit of this bespoke approach is that we can tune our implementations to explicitly match the problem, rather than having to rely on PETSc's general implementation. Our approach is matrix-free, i.e. the matrix and RHS are not explicitly constructed, and as such it utilises less memory than the PETSc implementation. This implementation post-conditions rather than pre-conditions, where we effectively solve AM(Nx) = b, with N the inverse of M. This post-conditioner has been designed specifically for our problem, and the use of a post-conditioner means that there is, in theory, flexibility to change M at every iteration, which might improve performance. When relying on pre-conditioning this isn't the case, since the RHS has been altered and so the LHS needs to stay consistent.
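
To make the substitution explicit (a standard statement of right preconditioning): writing y = Nx, the post-conditioned system actually iterated on, and the recovery of the true solution, are

    (AM)\, y = b, \qquad x = M\, y, \qquad N = M^{-1}

so the RHS b is untouched and only the operator seen by the Krylov method changes, which is what leaves M free to vary between iterations.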

The specific calculation procedures of the solvers, such as calculating Ax (the matrix-vector product), are hard coded to the problem by physically implementing the calculations necessary for the Poisson equation in code. It would therefore be impossible to target these solvers at other equations without a significant re-implementation, but it does mean that we can heavily tune for the problem in hand. One example of this is in the avoidance of communication, where we can combine multiple inner products, each requiring a collective reduction, into one. We are also able to overlap communication with computation by initiating halo swapping in a non-blocking manner. Whilst this is in progress we start working on the inner local domain calculations, which do not require neighbouring data. Only once all the inner value calculations have completed, and these represent the vast majority of the domain, will the code wait for the halo swapping communications to complete and handle the boundary values. The intention is that whilst the relatively long running inner computations have been in progress the halo swapping has completed, and so the cost of communication is greatly reduced because the process continues doing useful work instead of stalling whilst it completes.
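
A minimal sketch of this overlap pattern follows (start_halo_swap, compute_interior and compute_boundaries are hypothetical helpers standing in for MONC's actual routines):

    ! One solver iteration's worth of overlapped communication/computation.
    subroutine overlapped_iteration(field)
      use mpi
      implicit none
      real(kind=8), intent(inout) :: field(:,:,:)
      integer :: requests(8), ierr   ! send + receive for up to four neighbours

      call start_halo_swap(field, requests)  ! posts non-blocking MPI_Isend/MPI_Irecv
      call compute_interior(field)           ! bulk of the domain, needs no halo data
      call MPI_Waitall(8, requests, MPI_STATUSES_IGNORE, ierr)
      call compute_boundaries(field)         ! edge points, now the halos have arrived
    end subroutine overlapped_iteration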

    3 Comparison

As mentioned in section 2.1, traditionally a very naive FFT implementation had been used to solve the Poisson equation in the LEM. This exhibited very poor performance and would not scale beyond a couple of hundred processes. Our initial plan was to address this by utilising an iterative solver. However, at the same time we also re-implemented the FFT solver as discussed in section 2.1, which in itself resulted in very significant improvements to the runtime. Even though it is significantly improved, the FFT solver still suffers from two distinct disadvantages: firstly the significant, complex and inefficient communication required, and secondly the fact that it is a direct method and hence computes the exact solution every time, whether or not that accuracy is needed. Therefore it is useful to provide an alternative to this FFT approach and to understand the different performance and scalability characteristics that we might expect. It is very possible that different configurations of the model will favour different approaches and/or methods, so providing a choice is positive.

All performance tests in this section have been run on ARCHER, a Cray XC30 and the UK national supercomputer. In all cases the Cray compiler was used and the results presented are the average of three runs. A standard dry boundary layer MONC test case, concerned with the atmosphere at the interface between two different air masses, has been used to drive the solvers.

Figure 5 illustrates the performance of the FFT and PETSc iterative solvers. In this experiment we are weak scaling with 65536 local grid points per process, and hence 2.1 billion global grid points at 32768 cores. The PETSc solver is leveraging BiCGStab with the ILU preconditioner. It can be seen that the FFT solver is significantly faster than the PETSc iterative solver at smaller core counts, but as one scales up the difference diminishes: the direct FFT solver involves less computation but more communication. The results of figure 5 are averaged over the entire run, which includes significant model spin-up time where the field varies considerably between timesteps. This variation causes the number of iterations of the iterative solver to be high, whereas later in the run, when the model settles down, the number of iterations becomes minimal because a steady state has been reached. Conversely, the FFT solver requires the same amount of computation regardless; this is advantageous where there is significant variance but less so for stable systems.

A major benefit of utilising an iterative solver in contrast to a direct solver is that we can set the termination residual and therefore explicitly choose how accurately to solve the system. Table 1 illustrates executions of the iterative solver, solving to different relative residual accuracies, against the direct FFT solver. The Mean difference column illustrates the average difference per field element between the pressure values calculated by the iterative solver and those from the direct solver. The focus here is on computation rather than communication, hence the timing results are over executions of the test case on one core only; the FFT solver took on average 3.78e-3 seconds per timestep. It can be seen that solving iteratively to 1e-2 is significantly faster than a direct FFT solve, although this also shows the most significant difference in terms of solution accuracy. It is up to the user of MONC to determine the exact termination residual for their own configurations, and our general suggestion is that somewhere between 1e-2 and 1e-4 is most appropriate. The fact that one can leverage the target residual accuracy to considerably reduce the overall solver time is important for users to understand, as for some systems, especially large coarse grained ones, a high level of accuracy in the pressure calculation is not required.

Table 1 Iterative solver accuracy in comparison to FFT solver and average time per solve

Table 2 illustrates the performance difference between solvers in both PETSc and our bespoke implementation. The ILU preconditioner has been used for all PETSc runs, as this was found to be the optimal choice for our problem. It can be seen that GMRES, which is PETSc's default solver, is marginally slower than BiCGStab in PETSc. Our own implementation of BiCGStab is faster than PETSc's, and the fastest of all is our bespoke implementation of CG. Whilst the number of solver iterations varies dramatically during the model runs, PETSc's GMRES tends to involve the greatest number and our bespoke CG implementation the fewest. The number of iterations between PETSc's BiCGStab and our bespoke BiCGStab is roughly similar (even though a different approach to lowering the condition number of the matrix is adopted.) In all it is felt that our bespoke solvers outperform PETSc due to the optimisations that our more specialist implementations have allowed us to apply, both in terms of the targeted post-conditioning and the communication optimisation (avoidance and overlapping.)

In terms of solver stability, the default GMRES solver of PETSc is able to solve the pressure equations of all our test case configurations. PETSc's BiCGStab solver works for most test cases, but can become unstable with some non-symmetric systems. By contrast, our own bespoke BiCGStab solver is stable for all systems, whilst the CG solver requires symmetric systems, which represent the smallest subset of the test case configurations. It is important for the user to understand whether CG would be an appropriate choice for their problem because, if so, it will result in the lowest runtime.

    Table 2 Performance of iterative solvers

In addition to the runtime there is also the question of memory usage. The runs in this paper have focussed on weak scaling with 65536 local grid points per process. In reality this represents a rather small local domain that can easily fit into RAM (2GB per core on ARCHER.) Whilst this is generally a recommended local domain size, some users do want to run with significantly larger local domains. Therefore it is important to consider the memory requirements of these different techniques, as this will ultimately limit the amount of local data that can be stored. Our PETSc iterative solver requires substantially more memory than the FFT or bespoke solvers; this is not only because we explicitly construct the matrix and RHS in memory, but also because PETSc itself abstracts parallelism and operations through matrices, vectors and index sets. Hence, internal to PETSc, a number of these data structures need to be constructed, which results in additional memory overhead. The bespoke solver implementation is the most memory efficient, only requiring a small number of buffers for the halo swapping communications, quite closely followed by the direct FFT solver which requires slightly more memory. In reality a user will see very little difference in memory limitation between the FFT and bespoke iterative solvers, but for larger local system sizes they will be impacted by the memory requirements of the PETSc approach.

    4 Conclusions

In this paper we have considered two different computational techniques for solving the Poisson equation as part of the pressure correction step of the MONC model's dynamical core. It has been seen that there is a trade-off between computation and communication: the cost of computation associated with a direct FFT method is lower, but the cost of communication is more significant. At lower core counts our direct FFT approach is significantly faster because communication is not a significant factor, but as we scale up to greater numbers of cores the difference diminishes. In terms of runtime performance there is a slightly increased cost in using the general PETSc framework instead of the bespoke solvers; however, implementation of the former was far quicker, and the benefit of PETSc is that we can trivially switch between solvers and preconditioners. Therefore, whilst currently a bespoke iterative solver implementation targeted at the specific problem is advantageous, as the large PETSc community improves the library and develops further methods these might very well outperform our own solvers.

In truth we were surprised that the direct FFT solver scaled as well as it did. However, one should bear in mind that Cray machines, with their Aries routing technology, have a very efficient interconnect between the nodes, diminishing the impact of communication inefficiencies. It is likely that HPC machines interconnected with other technologies, such as Infiniband, would exhibit poorer scaling. One should also bear in mind that as we progress towards exascale and look to leverage these methods to solve challenging scientific and engineering problems, 32768 processes will not be considered a large amount of parallelism. From figure 5 it is fairly clear that if one were to target a hundred thousand processes and beyond, which are future plans for MONC, then the iterative approaches would be preferable irrespective.

In terms of further work there is the possibility of optimising the communication in the direct FFT solver by eliminating the need to go back to a column based (Z) decomposition for the tri-diagonal solver. If this could be eliminated then it would halve the overall amount of communication, which would be a significant improvement. Whilst modifying the tri-diagonal solver away from a column based approach might be difficult, in our current code, instead of transposing directly from an X decomposition to a Z decomposition, the code goes via the Y dimension, requiring extra communication. This was done for simplicity and to reuse the transposition implementation, but removing this intermediate step would reduce the overall amount of communication, albeit at the cost of code complexity.

[1] Nick Brown, Michele Weiland, Adrian Hill, Ben Shipway, Chris Maynard, Thomas Allen, and Mike Rezny. A highly scalable Met Office NERC cloud model. In Proceedings of the Third International Conference on Exascale Applications and Software, EASC 2015, April 2015.

[2] A.R. Brown, M.E.B. Gray, and M.K. MacVean. Large-eddy simulation on a parallel computer. Turbulence and Diffusion, (240), 1997.

[3] M.E.B. Gray, J. Petch, S.H. Derbyshire, A.R. Brown, A.P. Lock, H.A. Swann, and P.R.A. Brown. Version 2.3 of the Met Office large eddy model. Part 2, scientific documentation. 2003.

[4] A.A. Hill, P.R. Field, K. Furtado, A. Korolev, and B.J. Shipway. Mixed-phase clouds in a turbulent environment. Part 1. Large-eddy simulation experiments. Quarterly Journal of the Royal Meteorological Society, 140(680), 2014.

[5] A.P. Lock. The parametrization of entrainment in cloudy boundary layers. Quarterly Journal of the Royal Meteorological Society, 124(552), 1998.

[6] A.P. Lock, A.R. Brown, M.R. Bush, G.M. Martin, and R.N.B. Smith. A new boundary layer mixing scheme. Part I: scheme description and single-column model tests. Part II: tests in climate and mesoscale models. Monthly Weather Review, 128(9), 2000.

[7] J.C. Petch. Sensitivity studies of developing convection in a cloud-resolving model. Quarterly Journal of the Royal Meteorological Society, 132(615), 2006.

[8] J.C. Petch and M.E.B. Gray. Sensitivity studies using a cloud-resolving model simulation of the tropical west Pacific. Quarterly Journal of the Royal Meteorological Society, 127(557), 2001.

[9] S.J. Abel and B.J. Shipway. A comparison of cloud-resolving model simulations of trade wind cumulus with aircraft observations taken during RICO. Quarterly Journal of the Royal Meteorological Society, 133(624), 2007.

[10] R.C. Singleton. On computing the fast Fourier transform. Communications of the ACM, 10(10): 647-654, 1967.

[11] M. Frigo and S. Johnson. FFTW: an adaptive software architecture for the FFT. IEEE Conference on Acoustics, Speech, and Signal Processing, 3, 1381-1384, 1998.

[12] S. Balay, S. Abhyankar, M. Adams et al. PETSc Users Manual. Argonne National Laboratory, ANL-95/11 Revision 3.7, 2016.

DOI: 10.19416/j.cnki.1674-9804.2017.03.015
