Shen Di, Jiang Yong, Zhu Xianli, Li Mengjie
State Key Laboratory of Fire Science, University of Science and Technology of China, Hefei 230027, China
Abstract: This paper aims to accurately predict the smoke temperature in a single-room fire. Because both high-fidelity simulations and single-fidelity surrogate models require considerable computational time, they can hardly meet the emergency needs of fire safety management. Therefore, a multi-fidelity model named CoKriging was introduced, which was trained on simulation data from the Consolidated Fire and Smoke Transport (CFAST) model and the Fire Dynamics Simulator (FDS). Leave-one-out cross-validation suggests that this model is effectively trained when the data ratio of CFAST to FDS is 10∶1. Further comparisons among different methods show that the prediction accuracy of CoKriging is comparable to that of the artificial neural network (ANN) and Kriging, while its modeling time is only about 1/10 of theirs. Additionally, the temperatures predicted by CoKriging are very close to the FDS simulation results, and once the CoKriging model has been constructed, a new prediction takes far less time than an FDS run. An exploratory study of how the proportion of high- and low-fidelity data affects the CoKriging predictions shows no obvious correlation between them, and the prediction accuracy can still be ensured even if only a small amount of FDS data participates in model training. In conclusion, the CoKriging model can be used as a fast and effective regression analysis method for temperature prediction in a single-room fire.
Keywords: multi-fidelity; surrogate model; CoKriging; smoke layer temperature; single-room fire
The high temperature and hot smoke generated in fires pose a serious threat to emergency evacuation. Hence, predicting the smoke temperature accurately and rapidly can provide scientific guidance for fire safety management. Recently, numerical simulation technologies have been widely used in fire research, but high-fidelity simulation methods (HFSM, e.g. FDS) require considerable computational time and resources, which makes them unsuitable for problems requiring many repeated calculations (e.g. sensitivity analysis). Much faster alternatives to HFSM, referred to as surrogate models or meta methods, have been developed[1]. However, these methods are either too simple to describe a complex system[1], or rely on large amounts of training data, which cannot satisfy the needs of rapid modeling[2,3]. Therefore, it is of great significance to develop a surrogate model with relatively high prediction accuracy but low time cost.
The Kriging method, also known as Gaussian Process Regression (GPR)[4], has attracted much attention due to its capability of achieving good predictions with a relatively small training sample size[5]. However, the computational effort in the training data preparation stage is still challenging. To solve this problem, a more effective approach called CoKriging has been developed. It originates from Kriging and estimates a poorly sampled variable with the help of a well-sampled variable[6]. In 2000, Kennedy and O'Hagan[7] first extended the CoKriging model from geostatistics to the field of engineering science, providing a realistic way to evaluate the target function with data of other fidelity levels. Among these data, the majority are generated from a low-fidelity model and are responsible for the trend prediction, while only a few come from a high-fidelity model and are generally used to correct the prediction results.
In the past 10 years, the CoKriging model has been extended to a framework with more than two fidelity levels[8], but its applications are still concentrated in areas closely related to geostatistics, such as the interpolation prediction of particle distribution with wind speed curves as the auxiliary variable[9], the spatial probability estimation of excess nitrate content with a transmissivity map as the covariate[10], and the evaluation of trace elements (Cr, Cu, Cd, Pb, and Zn) via the multivariable method[11]. These studies all find that CoKriging effectively improves prediction accuracy, because the addition of secondary variables helps reduce uncertainty. In the aircraft flutter analysis by CoKriging[12], fewer infill points are required for convergence than with the single-fidelity approach. Similar conclusions are drawn in the reliability analysis of nondestructive testing systems[13], which means that this multivariate meta method makes it possible to reduce the modeling cost further.
Nevertheless, the prediction performance of CoKriging greatly depends on the correlation between high-fidelity and low-fidelity data. If the training set is too random, the classic ordinary or simple Kriging constructed merely from the high-fidelity data may perform better than CoKriging[14]. Another drawback of ordinary CoKriging is that some model assumptions may make the covariates less relevant than they should be, resulting in a loss of information[15]. Currently, several improved CoKriging-based models, such as the recursive CoKriging model[16] and CoPhIK[17], have been proposed. At the same time, more advanced sampling methods are being developed as an alternative optimization strategy[18, 19].
Although the CoKriging model has been studied for more than two decades, its applications in fire science are still very limited. In the numerical experiments by Rémi Storch[20] and Séverine Demeyer[4], the high- and low-fidelity models were constructed with multiple levels of discretization and with different numerical simulation methods, respectively. Both sets of results show that a well-trained CoKriging model can effectively predict the failure probability of ventilation facilities when a fire breaks out. This multi-fidelity method is also introduced as a new fire investigation technique in the study of Nan Li et al[5].
In this paper, we selected FDS and CFAST simulations of the same fire scenario as the high- and low-fidelity data, respectively, for CoKriging training. The model takes three inputs, which are used to predict the smoke layer temperatures. By comparing CoKriging with numerical simulation methods (FDS, CFAST) and with single-fidelity surrogate models (ANN, Kriging), the following sections illustrate its effectiveness and characteristics. This work may provide a new alternative method for the quick and accurate prediction of fire temperatures.
CoKriging, also known as Multi-fidelity Kriging or Multivariate Kriging, is essentially a supervised machine learning method based on Gaussian process regression (Kriging)[20]. The interpolation prediction by Kriging is assumed to be the realization of a stochastic process[21], and the predicted value is treated as a random variable Z following a Gaussian distribution.
$$Z \sim GP(\mu, \Sigma) \tag{1}$$
where μ is the maximum likelihood estimate (MLE) of the mean, and Σ is the covariance matrix between Z and the samples.
As a natural extension of Kriging, the CoKriging method also estimates the value at an unknown point through spatial interpolation[21]. It specializes in exploiting the auto-correlation and cross-correlation of different Gaussian processes (multi-fidelity data) to learn about the predictor[5], which leads to a complex notation. To simplify the method, only two fidelity levels are generally considered.
According to the above description, two Gaussian processes ZH ~ GP(μH, ΣH) and ZL ~ GP(μL, ΣL) are introduced, representing the high- and low-fidelity datasets respectively[8]. Following the auto-regressive model of Kennedy & O'Hagan[7], CoKriging approximates the high-fidelity outputs as the low-fidelity data multiplied by a constant scaling factor ρ plus another Gaussian process independent of ZL.
$$Z_H(x) = \rho Z_L(x) + Z_D(x) \tag{2}$$
where ZD ~ GP(μD, ΣD) measures the difference between the high- and low-fidelity processes[5]. As a consequence, μH = ρμL + μD and ΣH = ρ²ΣL + ΣD.
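The mean and covariance relations for the high-fidelity process follow from Eq. (2) together with the assumed independence of ZD and ZL. A short derivation, written here for a pair of points x and x′ (this pointwise notation is an illustrative choice, not the paper's), is:

```latex
\begin{aligned}
\mathbb{E}[Z_H(x)] &= \rho\,\mathbb{E}[Z_L(x)] + \mathbb{E}[Z_D(x)] = \rho\mu_L + \mu_D, \\
\operatorname{cov}\bigl(Z_H(x), Z_H(x')\bigr)
  &= \rho^2 \operatorname{cov}\bigl(Z_L(x), Z_L(x')\bigr)
   + \operatorname{cov}\bigl(Z_D(x), Z_D(x')\bigr)
   + \rho \underbrace{\operatorname{cov}\bigl(Z_L(x), Z_D(x')\bigr)}_{=0}
   + \rho \underbrace{\operatorname{cov}\bigl(Z_D(x), Z_L(x')\bigr)}_{=0}, \\
\operatorname{cov}\bigl(Z_H(x), Z_L(x')\bigr)
  &= \rho \operatorname{cov}\bigl(Z_L(x), Z_L(x')\bigr).
\end{aligned}
```

The last line is the cross-covariance that couples the two fidelity levels; it is the term through which the low-fidelity samples inform the high-fidelity prediction.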
From a Bayesian point of view, the existing low-fidelity data is firstly regarded as prior information, and then a high-fidelity Gaussian process will be established between inputs and outputs through the covariance matrix. The target posterior distribution is finally obtained by Bayes theorem[22].
Here we assume that two sets of samples are given, consisting of XL (nL inputs for the low-fidelity model) and XH (nH inputs for the high-fidelity model) in a continuous multi-dimensional search space, with observations YLH. The above parameters are expressed as
(3)
(4)
(5)
Accordingly, the following set of random vectors YLH={yL,yH} can be used to define these stochastic processes, and more details are shown as
(6)
For simplicity, (XL, XL), (XL, XH), (XH, XL), and (XH, XH) are denoted as XLL, XLH, XHL, and XHH respectively. The covariance matrix C between the random variables yL and yH is given by
(7)
(8)
where c is the vector of covariances between the known samples XHL and the new input x.
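To make the role of the covariance matrix C and the vector c more concrete, the following is a minimal one-dimensional sketch of a two-level CoKriging predictor built from the auto-regressive model of Eq. (2). Everything specific in it is an assumption made for illustration: the squared-exponential correlation, the zero prior mean, the hand-fixed hyperparameters (`RHO`, `VAR_L`, `VAR_D`, `THETA_L`, `THETA_D`), and the analytic test functions `f_high` and `f_low`, which stand in for FDS and CFAST. It is not the MuFi Cokriging implementation, and its notation may differ from that of the paper's equations.

```python
import math

def f_high(x):
    """High-fidelity stand-in: Forrester's one-dimensional test function."""
    return (6 * x - 2) ** 2 * math.sin(12 * x - 4)

def f_low(x):
    """Low-fidelity stand-in: a scaled and shifted version of f_high."""
    return 0.5 * f_high(x) + 10 * (x - 0.5) - 5

# Assumed hyperparameters, fixed by hand instead of being estimated by MLE.
RHO = 2.0                      # scaling factor rho of Eq. (2)
VAR_L, VAR_D = 1.0, 1.0        # process variances of Z_L and Z_D
THETA_L, THETA_D = 10.0, 10.0  # correlation decay rates

def corr(a, b, theta):
    """Squared-exponential correlation (an assumed kernel choice)."""
    return math.exp(-theta * (a - b) ** 2)

def cov_ll(a, b):
    """cov(Z_L(a), Z_L(b))."""
    return VAR_L * corr(a, b, THETA_L)

def cov_lh(a, b):
    """cov(Z_L(a), Z_H(b)) = rho * cov(Z_L(a), Z_L(b))."""
    return RHO * cov_ll(a, b)

def cov_hh(a, b):
    """cov(Z_H(a), Z_H(b)) = rho^2 * cov_LL + cov_DD."""
    return RHO ** 2 * cov_ll(a, b) + VAR_D * corr(a, b, THETA_D)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(M[k][c] * z[c] for c in range(k + 1, n))
        z[k] = (M[k][n] - s) / M[k][k]
    return z

# Nested design: every high-fidelity input is also a low-fidelity input.
X_LOW = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
X_HIGH = [0.0, 0.4, 1.0]
y = [f_low(x) for x in X_LOW] + [f_high(x) for x in X_HIGH]

# Block covariance matrix C = [[C_LL, C_LH], [C_HL, C_HH]].
C = [[cov_ll(a, b) for b in X_LOW] + [cov_lh(a, b) for b in X_HIGH]
     for a in X_LOW]
C += [[cov_lh(b, a) for b in X_LOW] + [cov_hh(a, b) for b in X_HIGH]
      for a in X_HIGH]
weights = solve(C, y)  # C^{-1} y, reused for every prediction

def predict(x):
    """Zero-mean CoKriging mean prediction of the high-fidelity output."""
    c = [cov_lh(a, x) for a in X_LOW] + [cov_hh(a, x) for a in X_HIGH]
    return sum(ci * wi for ci, wi in zip(c, weights))
```

Without a nugget term, this predictor reproduces the high-fidelity observations at the points in `X_HIGH`, while between those points the six cheaper low-fidelity samples shape the trend.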
This study focuses on a typical single-room fire case at 1 atmosphere and 50% humidity. Referring to the steady-state experiments by Steckler et al[23], the test was conducted in a 2.8 m by 2.8 m by 2.18 m single compartment shown in Figure 1. The lightweight walls and ceiling were covered with a ceramic fiber insulation board, and the floor was made of concrete. There was a door (height: 1.83 m) serving as the natural vent on one of the walls. The fire was ignited by a 30 cm wide porous square diffusion burner supplied with commercial grade methane at a fixed rate, and it was located at the wall opposite to the door, roughly 0.02 m above the floor. In order to measure the smoke temperatures at different heights, an array of aspirated thermocouples was fixed in the room. Each thermocouple returns a time evolution of temperature. As depicted in Figure 1, they were 0.305 m away from the left and back walls (x=y=0.305 m), and equally spaced at an interval of 0.114 m in the vertical direction. More details are given in Table 1.
Figure 1. Schematic diagram of the single-room fire.
The size of the natural vent (i.e. door width, L) and the heat release rate (HRR) are key parameters for fire growth, and thus they were chosen together with the ambient temperature (Ta) as the inputs of CoKriging. According to real natural environments, the variation range of the ambient temperature Ta was set as 6 ℃ to 36 ℃. The other two inputs varied consistently with those of Steckler’s experiments[23]; see Table 2 for details. In this work, the average temperatures of the upper layer (Tu) and lower layer (Tl) were selected as the model outputs to characterize the fire risk.
Table 1. Distance distribution of the thermocouples in z-axis direction.
Table 2. Controlled variables of the experiment scenarios (The three input variables are uniformly distributed).
A well-performing CoKriging model requires a strong cross-correlation between low- and high-fidelity data, and the different fidelity levels principally originate from three categories[24]: simplifying the mathematical model of the physical reality, changing the discretization model, and using experimental results. Two fire simulation programs developed by the National Institute of Standards and Technology (NIST), the Fire Dynamics Simulator[25] (FDS, version 6.4.0) and Consolidated Fire and Smoke Transport[26] (CFAST, version 7.4.2), were used to provide multi-fidelity data for this work. The simulation type of FDS was chosen as ‘Large Eddy (LES)’. It solves the Navier-Stokes equations for low-Mach-number flow driven by fire buoyancy, usually leading to relatively accurate results. The ‘two-zone model’ is the default option for CFAST, and this built-in model rests on two basic assumptions: first, each compartment is divided into two control volumes, namely the hot upper layer containing fire smoke and the cold lower layer filled with clean air; second, every control volume is internally uniform in temperature and composition. Because of these strong simplifications, the two-zone model yields only approximations to the FDS results. Accordingly, the two fire simulation models were tentatively assigned as the high- and low-fidelity solving methods respectively. The rationality of this choice will be verified in Section 4.3.
The following numerical experiments were conducted in a full-scale (1∶1) structure of the fire scenario shown in Figure 1, and all the basic parameters were defined consistently between CFAST and FDS, matching those of Steckler’s study. Some properties of the building materials are specified in Table 3. Due to the low conductivity of the ceramic fiber, the ceiling and side walls were approximately insulated. The only path for heat and mass transfer to the outside was through the open door. After a default ramp-up time for HRR, the methane burner in FDS kept a steady fire for 50 s. Accordingly, the ignition time for CFAST was set to 1 s after the simulation began. The methane fuel was assumed to react completely, with a combustion heat of 55644 kJ/kg and a radiation fraction of 0.14.
Table 3. Material properties required for the single-room modelling (T=297.15 K).
Unlike the easy-to-operate CFAST, a detailed grid division is required for FDS. Whether it is properly designed directly determines the output precision and simulation time cost. The grid size is initially determined by the characteristic length scale (D) of a fire plume structure[27].
$$D = \left(\frac{Q}{\rho\, C\, T \sqrt{g}}\right)^{2/5} \tag{9}$$
where ρ, C and T denote the density (kg/m³), specific heat (kJ/(kg·K)), and temperature (K) of the air respectively; g represents the acceleration of gravity (m/s²); Q is the heat release rate (HRR, kW) of the fire source. It can be seen that the characteristic length increases with HRR, so the smallest HRR gives the most restrictive grid size. Since a finer discretization helps improve the simulation accuracy, the grid size under the minimum HRR (i.e. 30 kW) was applied to all the FDS experiments in this study: the characteristic length was calculated to be 0.227 m, and dividing it by 10 gives a cell size of about 0.0227 m. In the x and y directions, the cell size was set to 0.0232 m and 0.0219 m respectively.
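The grid-sizing rule described above can be sketched in a few lines. The air properties used as defaults below (density, specific heat, and temperature at roughly 20 ℃) are assumed typical values rather than the ones used in the study, so the computed length is of the same order as, but not identical to, the 0.227 m quoted in the text. The function name is likewise hypothetical.

```python
import math

def characteristic_fire_diameter(q_kw, rho=1.204, cp=1.005, t_amb=293.15, g=9.81):
    """Characteristic length scale of a fire plume, in metres.

    q_kw  : heat release rate in kW
    rho   : air density in kg/m^3 (assumed default)
    cp    : air specific heat in kJ/(kg K) (assumed default)
    t_amb : air temperature in K (assumed default)
    g     : acceleration of gravity in m/s^2
    """
    return (q_kw / (rho * cp * t_amb * math.sqrt(g))) ** 0.4

d_min = characteristic_fire_diameter(30.0)  # minimum HRR of the study
cell_size = d_min / 10.0                    # ten cells across the length scale

print(f"characteristic length: {d_min:.3f} m, suggested cell size: {cell_size:.4f} m")
```

Because the length scale grows as the 2/5 power of the HRR, a cell size chosen for the smallest fire is at least as fine as the rule would demand for any larger fire in the test matrix.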
To verify the grid independence, some adjustments were made on the basis of 0.0227 m: the cell size along the z-axis was selected from 0.010 m, 0.015 m, 0.020 m, 0.025 m, and 0.030 m. Under these grid conditions, the temperature changes over time are shown in Figure 2. These continuous records came from a measurement point located in the corner, 0.969 m above the ground. From Figure 2, it can be observed that the predictions under different grids are close to each other. Considering the computational cost and flow resolution, the median value 0.020 m was chosen as the grid size in the z-axis direction. Overall, the cell size was set to 0.0232 m×0.0219 m×0.020 m. There were 1244160 grid cells in total. To improve the solving efficiency, the computational domain was divided into 7 meshes, and each was assigned to one processor.
In order to get better predictions near the opening, the computational domain was extended 0.4 m beyond the left wall. As shown in Figure 3, all the sides of this extra region were specified as free boundaries where the static pressure equaled atmospheric pressure (i.e. 101.3 kPa).
Figure 2. Grid independence verification.
After a simulation of 50 s, Tu and Tl can be directly obtained from the CFAST interactive interface. For FDS, however, the procedure is a little more complicated.
From the FDS measurements of the aspirated thermocouples, we can only draw a scatter plot of temperature varying with height. After connecting the scattered points with a smooth curve as indicated in Figure 4, it is clear that the slope at the beginning and end of the curve approaches zero, which means that the temperature distributions within these heights are almost uniform.
The acquisition for target outputs requires some extra processing: the extremal values of sensor temperatures are computed, and then 1% of the difference between the maximum value and minimum value is taken as the acceptable error level to measure whether the temperature deviation between two measurement points exceeds the standard. If the deviations of consecutive measurement points remain at the acceptable error level, the spatial extents of these thermocouples can be considered as steady smoke layers, which is similar to the ‘two-zone’ assumption in CFAST.
Take the test case of Ta=7 ℃, L=0.67 m, HRR=45.7 kW as an example. The highest temperature measurement was 148.05 ℃ for point 19, and the lowest was 11.28 ℃ for point 1, so only when the temperature difference between any two points at either end of the curve is less than 1%×(148.05 ℃ − 11.28 ℃)≈1.37 ℃ can the corresponding space be approximated as the lower or upper layer. As illustrated in Figure 4, the average ordinate values of the blue points on the left and right sides can be taken as Tl and Tu respectively for this example:
The leftmost four measurements are 11.28 ℃, 11.68 ℃, 12.48 ℃, and 13.78 ℃; the rightmost three measurements are 146.81 ℃, 147.71 ℃, and 148.05 ℃.
Tu=(147.71 ℃+148.05 ℃)/2=147.88 ℃
Tl=(11.28 ℃+11.68 ℃+12.48 ℃)/3=11.81 ℃
David[28] summarized some modeling considerations of multi-fidelity Kriging, and particularly pointed out that a strong correlation between high- and low-fidelity functions plays an important role in improving accuracy and reducing the computational budget. It was therefore necessary to investigate the degree of correlation between CFAST and FDS prior to the follow-up research.
Based on the single-room fire scenario shown in Figure 1, a random sample of 10 test cases was generated from Table 2 for the correlation check. For each case, the average temperatures of the upper and lower layers simulated by CFAST and FDS are recorded in Table 4.
Table 4. 10 sets of randomly generated cases for the correlation test between FDS and CFAST simulations.
The Pearson correlation coefficient ρ is generally applied to indicate the linear correlation between two variables[29]
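$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - E[X])(Y - E[Y])\right]}{\sigma_X \sigma_Y} \tag{10}$$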
where the operators σ and E stand for the standard deviation and the expected value of the variables respectively, and cov(X,Y) is the covariance between X and Y. The closer ρ is to 1, the stronger the positive correlation.
The calculated Pearson correlation coefficient between the FDS and CFAST predictions was 0.63 for the average temperature of the upper layer and 0.65 for the lower layer. Both values are greater than 0.6, indicating a relatively strong positive correlation[30,31]. In conclusion, FDS and CFAST were accepted as the high- and low-fidelity methods for the model construction of CoKriging.
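The correlation check can be sketched as follows. The two temperature lists are hypothetical placeholders that only illustrate the call; they are not the values of Table 4, and the function name is an illustrative choice.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient: cov(X, Y) / (sigma_X * sigma_Y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)

# Hypothetical upper-layer temperatures in deg C (NOT the data of Table 4).
cfast_tu = [110.0, 135.0, 150.0, 172.0, 190.0]
fds_tu = [95.0, 140.0, 128.0, 160.0, 185.0]

r = pearson(cfast_tu, fds_tu)
print(f"Pearson correlation between the two samples: {r:.2f}")
```

The coefficient only captures linear association, which matches the auto-regressive form of Eq. (2), where the high-fidelity response is modelled as a scaled low-fidelity response plus a discrepancy term.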
The proportion of training data with different fidelities has a significant influence on the predicted results[28]. In order to reduce the modeling complexity, an initial CoKriging model was tested with fitted high- and low-fidelity functions before the temperature predictions. The details are omitted in this article, but the test results are given: 140 sets of low-fidelity data combined with 14 sets of high-fidelity data could obtain relatively good predictions. Therefore, the subsequent research followed this proportion and used it as the initial criterion for the proportion of CFAST and FDS data. In other words, 140 sets of CFAST simulation data, together with 14 sets of FDS simulation data, constituted a hybrid training set for CoKriging. The influence of data proportion on the predicted results will be explored further in Section 6.3.
There were significant dimensional differences among the model variables. To eliminate their impact on the predicted results, all the inputs and responses were normalized prior to the model training[32]. Furthermore, a space-filling design of experiments (DOE) contributes to establishing an effective data-driven model with a low computational budget. In this work, the algorithm proposed by Julien Bect[33] was adopted to complete the nested Latin hypercube design of 140 sets of CFAST samples and 14 sets of FDS samples in the MATLAB environment.
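A minimal sketch of the two preparation steps, min-max normalization and a nested space-filling design, is given below. It is a simplified stand-in and not the algorithm of Bect[33]: the 14 high-fidelity points are simply drawn at random from the 140 low-fidelity points, which guarantees nesting but not that the subset is itself a Latin hypercube. The function names and the random seed are illustrative assumptions; only the ambient temperature range of 6 ℃ to 36 ℃ is taken from the text.

```python
import random

def latin_hypercube(n, dim, rng):
    """Return n points in [0, 1]^dim with one point per stratum in each dimension."""
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(k + rng.random()) / n for k in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]

def min_max_normalize(values):
    """Rescale a list of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

rng = random.Random(0)  # fixed seed so the design is reproducible

# 140 low-fidelity (CFAST) inputs in three dimensions: Ta, L, HRR.
low_design = latin_hypercube(140, 3, rng)

# Nested high-fidelity (FDS) inputs: a random subset of the low-fidelity design.
# Simplifying assumption; a dedicated nested-LHS algorithm does better.
high_design = rng.sample(low_design, 14)

# Map the first unit coordinate onto the ambient temperature range 6-36 deg C,
# then normalize it back for training.
ta_celsius = [6.0 + 30.0 * point[0] for point in low_design]
ta_normalized = min_max_normalize(ta_celsius)
```

Nesting matters because the auto-regressive model compares the two fidelity levels at shared inputs: each FDS run then has a CFAST counterpart at exactly the same input point.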
The MuFi Cokriging toolbox[34] creates CoKriging models of the 'multiple inputs-single output' type, so the two target parameters had to be modeled separately. Except for the training data, the parameter settings of the two models are identical. A flow chart of the model construction is given in Figure 5.
Figure 5. Flowchart for developing a fire prediction model based on CoKriging.
Figure 6. Comparison of the predicted average temperature among FDS, CFAST, and CoKriging.
The linear trend of the Gaussian process was determined by default. To be admissible, the covariance function of the associated random process has to be chosen beforehand from a parametric family of kernels known to be positive definite. Five such options are available in MuFi Cokriging, representing different levels of smoothness, and the recommended kernel, Matérn ν=5/2, was applied here. More details on covariance functions can be found in Olivier and David (2010)[35]. The remaining unspecified parameters are typically estimated by Maximum Likelihood Estimation (MLE). Several tentative experiments showed that ‘BFGS’ (the quasi-Newton procedure with the method ‘L-BFGS-B’) provided a better optimum than ‘gen’ (the Genoud algorithm from the R package rgenoud), so the former optimization method was adopted for the following iterations. In addition, the use of analytical gradients helps accelerate convergence.
However, the covariance kernel Matérn ν=5/2 is only suitable for a Gaussian process conforming to the continuity hypothesis, while in practical simulations a slight change in the input vector may sometimes cause a jump in the response, which makes the covariance matrix non-invertible or ill-conditioned. Fortunately, such instabilities can be eliminated by introducing a constant term into the covariance kernel. This modification is the so-called nugget effect in geostatistics. Since the required homogeneous nugget effect in this study was not known exactly, we specified ‘nugget.estim=TRUE’ when modeling.
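For illustration, the Matérn ν=5/2 kernel and the stabilizing effect of a nugget can be sketched as follows. The length scale, process variance, and nugget value are hypothetical; in the study these parameters are estimated by the toolbox rather than set by hand, and the function names below are illustrative.

```python
import math

def matern52(d, theta=1.0, sigma2=1.0):
    """Matern covariance with smoothness 5/2 for a distance d and length scale theta."""
    s = math.sqrt(5.0) * abs(d) / theta
    return sigma2 * (1.0 + s + s * s / 3.0) * math.exp(-s)

def covariance_matrix(xs, theta=1.0, sigma2=1.0, nugget=0.0):
    """Covariance matrix of scalar inputs xs; the nugget is added on the diagonal only."""
    n = len(xs)
    return [[matern52(xs[i] - xs[j], theta, sigma2) + (nugget if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Two nearly coincident inputs make the matrix almost singular ...
close_inputs = [0.5, 0.5 + 1e-9]
det_plain = det2(covariance_matrix(close_inputs))
# ... while a small (hypothetical) nugget restores a clearly positive determinant.
det_nugget = det2(covariance_matrix(close_inputs, nugget=0.01))
```

With the nugget, the model no longer interpolates the training data exactly, which is the price paid for a well-conditioned covariance matrix.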
The prepared 154 sets of training data (140 sets of CFAST and 14 sets of FDS) were imported into the predefined CoKriging model, whose parameters were then fitted iteratively. The model effectiveness was quantified by leave-one-out cross-validation (LOO). Because of the nested experimental design, LOO was conducted by deleting one of the high-fidelity data points at each verification test. This process eventually generated a set of error values whose dimension was consistent with the number of high-fidelity data
$$\{x_{i,\mathrm{pre}} - x_{i,\mathrm{sim}}\},\quad i = 1, 2, \ldots, 14 \tag{11}$$
where xi,pre was the prediction result of CoKriging, and xi,sim was the deleted simulation result of FDS.
Accordingly, the mean absolute error (MAE) and the root mean square error (RMSE) could be calculated respectively, both of which quantify the deviation of the predictions from the simulations. The results with normalization are shown in Table 5.
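$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_{i,\mathrm{pre}} - x_{i,\mathrm{sim}}\right| \tag{12}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{i,\mathrm{pre}} - x_{i,\mathrm{sim}}\right)^2} \tag{13}$$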
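The leave-one-out procedure and the two error measures can be sketched generically as below. The `fit_predict` callable is a hypothetical placeholder: in the setting of this work it would retrain CoKriging on all the low-fidelity data plus the remaining high-fidelity points and return the prediction at the deleted input. A trivial mean predictor and made-up numbers are used here only to make the sketch runnable.

```python
import math

def mae(errors):
    """Mean absolute error of a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean square error of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def leave_one_out_errors(high_x, high_y, fit_predict):
    """Delete each high-fidelity point in turn and predict it from the rest."""
    errors = []
    for i in range(len(high_x)):
        train_x = high_x[:i] + high_x[i + 1:]
        train_y = high_y[:i] + high_y[i + 1:]
        predicted = fit_predict(train_x, train_y, high_x[i])
        errors.append(predicted - high_y[i])
    return errors

def mean_predictor(train_x, train_y, x_new):
    """Placeholder surrogate: ignores the inputs and returns the training mean."""
    return sum(train_y) / len(train_y)

# Hypothetical normalized data, for illustration only.
xs = [0.1, 0.5, 0.9]
ys = [1.0, 2.0, 3.0]
loo_errors = leave_one_out_errors(xs, ys, mean_predictor)
print(f"MAE = {mae(loo_errors):.3f}, RMSE = {rmse(loo_errors):.3f}")
```

RMSE penalizes large individual errors more strongly than MAE, so a gap between the two values points to a few poorly predicted points rather than a uniform bias.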
Table 5. LOO cross-validation of the trained CoKriging model.
As indicated in Table 5, all the calculated error measures are low, suggesting that the two CoKriging models established in this work were effectively trained.
From the validity verification, it can be concluded that the CoKriging model has been effectively trained with a small amount of high-fidelity data plus a large amount of low-fidelity data, demonstrating its strong ability to correlate the fine and coarse training data. The advantages of CoKriging over fire numerical simulation methods (FDS, CFAST) are discussed below, especially in terms of prediction accuracy and computational cost.
First of all, ten sets of input parameters for the single-room fire, independent of the training data, were randomly generated and then imported into FDS, CFAST, and the CoKriging model respectively. Each predicted output was processed into the average temperature of the upper or lower layer. Among them, the predictions of FDS were regarded as the comparison benchmark due to its high fidelity.
On the one hand, the prediction accuracy directly reflects the model performance. As depicted in Figure 6, for both the upper and lower layers, the average temperatures predicted by CoKriging are very close to the simulation results of FDS, and the accuracy is much higher than that of CFAST. For further quantitative comparison, the predictions of CoKriging were evaluated by the mean absolute error (MAE), mean relative error (MRE), and root mean squared error (RMSE). For the average temperature of the upper layer, these errors were 7.86 ℃, 0.03, and 9.12 ℃; for the lower layer, they were 1.78 ℃, 0.055, and 2.36 ℃. In summary, both the qualitative observation and the quantitative error calculation indicate that this well-trained CoKriging model can predict the temperatures accurately.
Figure 7. Regression analyses on the average temperatures predicted by CoKriging.
Figure 8. Regression analyses on the average temperatures predicted by Kriging.
Figure 9. Regression analyses on the average temperatures predicted by ANN.
On the other hand, time saving is the biggest incentive for applying the CoKriging model. According to numerical experiments, a CFAST simulation on an ASUS desktop (Intel Core Processor i7-4790@3.60 GHz; 8 GB RAM; Windows 10 Pro., 64 Bit) cost about one second, whereas the running time of an FDS simulation was much longer. It took more than 16 hours to run four FDS simulations in parallel on an Ubuntu server (Intel Xeon Processor E5-2690 v2 @ 3.00 GHz; 62 GB RAM; GNU/Linux 4.4.0-137-generic x86_64). On this basis, the time cost for the CoKriging construction was estimated at approximately 56 hours, almost equivalent to the running time required for one FDS simulation on a PC. It is worth noting that the well-trained CoKriging model can make a prediction as fast as CFAST, but with a much higher accuracy.
The above comparison results are summarized in Table 6. In short, the CoKriging model is effective, and it may serve as a substitute for FDS to some degree.
Table 6. Comparison between FDS, CFAST and CoKriging from time cost & prediction accuracy.
There have been many studies on how to translate numerical simulation models into computationally cheap surrogate models[1], and the most commonly used method, the artificial neural network (ANN), was adopted in this study. As noted before, the training process of ANN and Kriging relies on data of only one fidelity, so these alternatives can be collectively referred to as single-fidelity surrogate models. By comparison with ANN and Kriging, this part further explores the performance of CoKriging in terms of modeling time cost and prediction accuracy.
To control the variables, the inputs of the training data were kept consistent between the single-fidelity and multi-fidelity surrogate models, but the outputs of the training data differed. Traditional single-fidelity surrogate models are trained with reliable data only from high-fidelity simulations. Therefore, to ensure the prediction accuracy of ANN and Kriging, the 140 sets of CFAST simulation results used for CoKriging training had to be replaced with the corresponding FDS simulation results. It was roughly estimated that the preparation of 154 sets (140 sets + 14 sets) of FDS simulation results cost about 576 hours. In contrast, the CoKriging models constructed in this study had only 14 sets of training data derived from FDS, and the modeling time was reduced to 56 hours, about a tenth of that for the FDS-based (single-fidelity surrogate) models. It is worth noting that a well-trained CoKriging model takes only 1 second to make a new prediction, the same as Kriging and ANN.
Besides modeling time cost, another evaluation index for model performance is accuracy. The coefficient of determination (R²) is often used to assess how closely a model's predictions approximate the actual data[36]
$$R^2 = \frac{SSR}{SST} \tag{14}$$
where SSR is the sum of squares due to regression, and SST is the total sum of squares. This index reflects to a certain extent how well the independent variables explain the dependent variable: the higher the R² value of a regression model, the better its approximation capability.
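A minimal sketch of this index, following the SSR/SST definition stated above, is given below. The function name and the sample numbers are hypothetical placeholders, not results of this study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination computed as SSR / SST.

    SSR: sum of squared deviations of the predictions from the mean of the actual data.
    SST: sum of squared deviations of the actual data from their own mean.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_actual = sum(actual) / len(actual)
    ssr = sum((p - mean_actual) ** 2 for p in predicted)
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return ssr / sst

# Hypothetical benchmark and surrogate temperatures in deg C.
benchmark = [100.0, 120.0, 140.0, 160.0]
surrogate = [102.0, 118.0, 141.0, 158.0]

print(f"R^2 = {r_squared(benchmark, surrogate):.3f}")
```

A perfect prediction gives SSR = SST and hence R² = 1. The ratio SSR/SST coincides with the other common form, 1 − SSE/SST, only for a least-squares fit with an intercept, so for a general surrogate model the two forms can differ slightly.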
Here, the simulation results of FDS were again regarded as the benchmark for the accuracy comparison. Figures 7 to 9 plot the prediction results versus the FDS simulation results for CoKriging, Kriging, and ANN respectively. The calculated R² values of the two CoKriging models are both greater than 0.98, suggesting a relatively high prediction accuracy. The values of this index for the three models are very close to each other, which demonstrates that although only one-tenth of the high-fidelity data is used, the CoKriging model remains as valid as the single-fidelity alternatives.
Figure 10. Prediction results of Tu by CoKriging method under different proportion of high and low fidelity data.
Figure 11. Prediction results of Tl by CoKriging method under different proportion of high and low fidelity data.
Table 7 summarizes the above comparative analyses. Obviously, the CoKriging model is specialized in combining a small amount of high-fidelity data to achieve the same prediction accuracy as single-fidelity surrogate models like ANN and Kriging. This helps relieve the computational burden significantly. Therefore, the CoKriging model can be used as a fast and effective prediction method to replace the traditional single-fidelity surrogate one.
Table 7. Comparisons between ANN, Kriging and CoKriging from time cost & prediction accuracy.
As one kind of machine learning approach, CoKriging's prediction results are closely dependent on training data. Earlier in the article, the ratio of CFAST∶FDS=10∶1 was determined by the test results of two fitted functions.
Here, we discuss more specifically how the prediction accuracy varies with the proportion of high- and low-fidelity data. Unlike the prediction accuracy, the modelling time cost is affected by computer load and other uncertain factors, so only a rough estimate can be obtained. Therefore, the influence of the data ratio on the modelling time cost is not discussed here, although the time cost clearly increases significantly as high-fidelity data are added.
In the following tests, the size of the training data is kept consistent with the initial set, that is, the total number is fixed at 154 as far as possible. As shown in Figure 10, more data divisions meeting the nested experiment design (the low-fidelity data outnumber the high-fidelity data, and the former quantity must be a multiple of the latter) are generated on this condition.
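The admissible divisions can be enumerated with a short sketch. It assumes that the 'multiple relationship' means the number of low-fidelity sets is an integer multiple of the number of high-fidelity sets, and that the total is exactly 154; the function name is illustrative.

```python
def nested_splits(total):
    """List (n_low, n_high, ratio) with n_low + n_high = total,
    n_low > n_high, and n_low an integer multiple of n_high."""
    splits = []
    for n_high in range(1, total):
        n_low = total - n_high
        if n_low > n_high and n_low % n_high == 0:
            splits.append((n_low, n_high, n_low // n_high))
    return splits

for n_low, n_high, ratio in nested_splits(154):
    print(f"CFAST {n_low:3d} + FDS {n_high:2d}  ->  ratio {ratio}:1")
```

Under these assumptions the initial 140 + 14 division (10∶1) appears in the list, together with the 143 + 11 (13∶1) and 147 + 7 (21∶1) divisions that also total 154.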
According to subgraph (a) of Figures 10 and 11, even when the model is trained with different proportions of CFAST and FDS data, the final predictions remain close to the FDS simulation results. The exception is the prediction of Tl for test group 2 when the ratio is 13∶1, probably because the experimental design of the training data was inappropriate, leaving the model inadequately trained. As shown in subgraph (b) of Figures 10 and 11, the relative error of the test results shows no clear pattern with the data ratio. Even when only a very small amount of FDS data participates in training (a CFAST-to-FDS ratio of 21∶1, meaning that only seven of the 154 training sets are high-fidelity), the prediction error remains low, reflecting the strong data-fusion ability of CoKriging.
Overall, it is found that the proportion of high- and low-fidelity data does not have a significant impact on the prediction results of CoKriging. In future work, we may continue to discuss the effects of the degree of correlation, the experimental design method, and the separate addition of high- or low-fidelity points on the prediction results, or even explore the relative importance of these factors to provide some guidance for CoKriging modeling.
In this paper, two CoKriging models have been established and applied to the temperature predictions of a single-room fire. These multi-fidelity models were trained with 14 sets of FDS data and 140 sets of CFAST data. To demonstrate the model performances on prediction accuracy and computational cost, they were compared with single-fidelity surrogate models and numerical simulation methods. The major conclusions are as follows:
(Ⅰ) For the temperature predictions of a single-room fire, the high- and low-fidelity training data of CoKriging are obtained from FDS and CFAST calculation results respectively. It is concluded that the CoKriging model can be effectively trained when the data ratio of FDS to CFAST is 1∶10.
(Ⅱ) Compared to FDS, the well-trained CoKriging model offers great savings in computational resources. For a single-room fire, the CoKriging model takes only about 1 s to make a new prediction, whereas FDS takes at least 16 h. Meanwhile, the temperatures predicted by CoKriging are very close to the simulated results of FDS, and the mean absolute error is much lower than that of CFAST. Owing to this good performance, CoKriging can be used as an alternative model to FDS in a single-room fire.
(Ⅲ) Unlike the single-fidelity surrogate models ANN and Kriging, CoKriging does not require large amounts of high-fidelity data for training. As a consequence, this multi-fidelity surrogate model can shorten the modeling time to about 1/10 while keeping the results as reliable as those of ANN and Kriging, which significantly lowers the computational cost of constructing a temperature prediction model for the single-room fire.
(Ⅳ) The change in the proportion of high-and low-fidelity data has little effect on the prediction results of CoKriging. Even if only a very small amount of high-fidelity data participates in model training, good prediction results can still be obtained.