Ruizhi Zhang, Xiaojing Jia*, Qifeng Qian
a Key Laboratory of Geoscience Big Data and Deep Resource of Zhejiang Province, School of Earth Sciences, Zhejiang University, Hangzhou, China
b Zhejiang Institute of Meteorological Science (Chinese Academy of Meteorological Sciences, Zhejiang Branch), Hangzhou, China
Keywords: Heatwave frequency; Eastern Europe; Summer; Machine learning
ABSTRACT A machine-learning (ML) model, the light gradient boosting machine (LightGBM), was constructed to simulate the variation in the summer (June–July–August) heatwave frequency (HWF) over eastern Europe (HWF_EUR) and to analyze the contributions of various lower-boundary climate factors to the HWF_EUR variation. The examined lower-boundary climate factors were those that may contribute to the HWF_EUR variation, namely the sea surface temperature, soil moisture, snow-cover extent, and sea-ice concentration from the simultaneous summer, preceding spring, and preceding winter. These selected climate factors were significantly correlated with the summer HWF_EUR variation and were used to construct the ML model. Both the hindcast simulation of HWF_EUR for the period 1981–2020 and its real-time simulation for the period 2011–2020, each produced with the constructed ML model, were investigated. To evaluate the contributions of the climate factors, model experiments using different combinations of the climate factors were examined and compared. The results indicated that the LightGBM model performed comparatively well in simulating the HWF_EUR variation. The sea surface temperature contributed more to the ML model simulation than the other climate factors. Further examination showed that the best ML simulation was the one that used the climate factors from the preceding winter, suggesting that the lower-boundary conditions in the preceding winter may be critical in forecasting the summer HWF_EUR variation.
In recent decades, many European areas have experienced severe and frequent extreme temperature events under global warming (Schär et al., 2004; García-Herrera et al., 2010; Yang et al., 2021). As heat waves are dangerous to human health, the environment, and ecosystems, and can cause substantial economic losses, it is essential to understand the variation in, and dynamics of, heatwave events. Improving the ability of climate models to predict heatwave events is critical to contingency planning and decision making (Coumou and Rahmstorf, 2012; Perkins, 2015; Wulff and Domeisen, 2019).
Previous work has revealed that heatwave events generally coincide with long-lasting anomalous high-pressure systems, which provide favorable warm conditions for heat waves (Fischer and Schär, 2010; Pezza et al., 2012; Perkins, 2015). It has also been demonstrated that many heatwave events in Europe are caused by blocking-high events over the midlatitudes, for instance the European heatwave event in 2003 (Feudale and Shukla, 2011), the central European and Russian event in 2010 (Grumm, 2011), and the heatwave over northwestern Europe in 2018 (Kueh and Lin, 2020). Certain lower-boundary conditions have also been revealed to contribute to extreme temperatures through complex feedback mechanisms. These lower-boundary contributors include sea surface temperature (SST), snow cover, soil moisture (SM), and sea ice (Hong and Kalnay, 2000; Fischer et al., 2007; Koster et al., 2009; Wu et al., 2012, 2013, 2016; Chen and Zhou, 2018; Wu and Francis, 2019).
Currently, climate models generally have limited ability to capture the characteristics of extreme temperature events. Some real-time climate models fail to predict heat waves owing to misrepresentation of the feedback processes between the atmosphere and lower-boundary fields (Koster et al., 2011; Perkins, 2015; Quandt et al., 2017; Ford et al., 2018). Nevertheless, recent work has shown that while it is difficult to precisely simulate extreme temperature events because of limitations in our understanding, it may be possible to capture the variation in heatwave frequency (HWF) (Zhang et al., 2022). Recently, benefiting from the rapid development of computing technology, a number of machine-learning (ML) models and techniques have been applied in climate research (Badr et al., 2014; Ham et al., 2019; Hwang et al., 2019; Qian et al., 2020, 2021). In this respect, several studies have demonstrated that ML models possess forecasting skill comparable to, or in some cases even better than, that of dynamic numerical models (Qian et al., 2020, 2021; Ham et al., 2021). However, whether ML models can be applied to simulate some of the characteristics of extreme temperature events remains unclear.
In this work,an ML model,the light gradient boosting machine(LightGBM) model,was applied to simulate the variation in summer HWF in eastern Europe (HWF_EUR) during the period 1981–2020.The ML model’s performance was assessed and the contributions of various lower-boundary climate factors used in the ML model were analyzed.
The daily maximum 2-m temperature data used in this study were retrieved from the ERA5-Land hourly dataset. This reanalysis dataset covers the period from 1979 to the present day with global horizontal coverage at a resolution of 0.1° × 0.1° (Muñoz Sabater, 2019).
The monthly mean SST and sea ice concentration (SIC) data were obtained from the Met Office Hadley Center (Rayner et al.,2003).These datasets (resolution: 1°×1°) cover the period from 1870 to the present day.
The snow-cover extent (SCE) dataset was obtained from Rutgers University Global Snow Laboratory (Robinson and Estilow,2012).It has a temporal range from October 1966 to the present day,and a spatial resolution of 25 km.The SCE data were transformed into monthly mean data to facilitate the analysis in the current work.
The monthly mean SM dataset, with a resolution of 1.875° × 1.9° and covering the period from 1979 to the present day, was obtained from the NCEP Reanalysis II datasets (Kanamitsu et al., 2002).
The LightGBM model,which has been shown in previous work to perform reasonably well (e.g.,Song et al.,2019 ;Qian et al.,2021),was applied in this study to simulate the summer HWF_EUR.In addition,LightGBM is a tree-ensemble ML model with high operational efficiency and scalability (Ke et al.,2017),and therefore we were able to analyze the relative contributions of the climate factors used in the model simulation.Based on the algorithm proposed by Breiman et al.(1984),the contributions of the climate factors could be calculated.Moreover,we also utilized a linear regression (LR) model to conduct a similar simulation and compared its results to the LightGBM model.
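The text does not spell out the importance measure itself. As a hedged sketch, the form usually attributed to Breiman et al. (1984) for a single tree $T$ with $J-1$ internal nodes, and its average over an ensemble of $M$ trees, can be written as follows. The symbols ($\ell$ for a climate factor, $v(t)$ for the variable split at node $t$, $\hat{\imath}_t^2$ for the improvement in squared error from that split) and the final normalization to a percentage are assumptions made here for illustration, not notation taken from this paper.

```latex
\[
  \mathcal{I}_\ell^2(T) = \sum_{t=1}^{J-1} \hat{\imath}_t^{\,2}\, \mathbf{1}\!\left[v(t) = \ell\right],
  \qquad
  \mathcal{I}_\ell^2 = \frac{1}{M} \sum_{m=1}^{M} \mathcal{I}_\ell^2(T_m),
\]
\[
  C_\ell = 100\% \times \frac{\mathcal{I}_\ell^2}{\sum_{k} \mathcal{I}_k^2}.
\]
```

Under this reading, a factor that is never chosen for a split has zero contribution, and the contributions of all factors sum to 100%.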
In the present work, the summer HWF of a grid point denotes the total number of days during June–July–August (JJA) on which the daily maximum 2-m temperature (Tmax) exceeds the heatwave criterion for at least six consecutive days. The heatwave criterion is the 90th percentile of Tmax on each calendar day, calculated with a centered 15-day window. More details can be found in Perkins and Alexander (2013).
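The HWF definition above can be sketched for a single grid point as follows. This is a minimal illustration rather than the authors' code: the function names, the input layout (one row per year, with the season padded by half a window on each side so that the centered window is defined at the season edges), and the use of a strict ">" comparison are assumptions.

```python
import numpy as np

def count_heatwave_days(hot, min_len=6):
    """Count days that belong to runs of at least `min_len` consecutive hot days."""
    total, run = 0, 0
    for flag in hot:
        if flag:
            run += 1
        else:
            if run >= min_len:
                total += run
            run = 0
    if run >= min_len:  # a run may extend to the last day of the season
        total += run
    return total

def heatwave_frequency(tmax, window=15, pct=90.0, min_len=6):
    """tmax: (n_years, n_days + 2 * (window // 2)) daily Tmax for one grid point.

    Assumed layout: the first and last `window // 2` columns are padding days
    outside the season, so every season day has a full centered window.
    Returns one HWF value (in days) per year.
    """
    half = window // 2
    n_years, n_padded = tmax.shape
    n_days = n_padded - 2 * half
    # Calendar-day threshold: percentile over all years and the centered window.
    thresh = np.array([np.percentile(tmax[:, d:d + window], pct)
                       for d in range(n_days)])
    hot = tmax[:, half:half + n_days] > thresh
    return np.array([count_heatwave_days(hot[y], min_len) for y in range(n_years)])
```

Applying `heatwave_frequency` to every grid point in turn would give the gridded summer HWF field that is analyzed afterwards.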
Following Qian et al. (2020, 2021), the simulation method utilized a seasonal forecast scheme with an empirical orthogonal function (EOF) algorithm. The core algorithm is expressed as
\[ \mathrm{HWF\_EUR}(x, y, t) = \sum_{i=1}^{n} \mathrm{EOF}_i(x, y)\,\mathrm{PC}_i(t), \qquad (1) \]
where $x$ and $y$ denote the spatial coordinates and $t$ represents the time. $\mathrm{EOF}_i$ and $\mathrm{PC}_i$ represent the pattern and time series of the $i$th EOF of HWF_EUR, and $n$ is the number of EOF modes. In this work, only EOF1 and PC1 were analyzed. They were used in the ML model to perform the HWF_EUR simulation. The HWF_EUR simulation was built according to Eq. (1) with the observational EOF1 and the model-simulated PC1.
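The single-mode reconstruction described here can be illustrated with a short NumPy sketch. It is written under stated assumptions: the field is already flattened to (time, space), no latitude weighting is applied, and the function names `leading_eof` and `reconstruct` are hypothetical. The EOF1 pattern is scaled so that it equals the regression of the anomalies onto the standardized PC1.

```python
import numpy as np

def leading_eof(field):
    """field: (n_time, n_space) array. Returns (eof1, pc1, explained_fraction).

    pc1 is standardized (zero mean, unit standard deviation); eof1 is in field
    units per one standard deviation of PC1.
    """
    anomalies = field - field.mean(axis=0)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    pc_raw = u[:, 0] * s[0]
    scale = pc_raw.std()
    pc1 = pc_raw / scale
    eof1 = vt[0] * scale
    explained = s[0] ** 2 / np.sum(s ** 2)
    return eof1, pc1, explained

def reconstruct(eof1, pc1):
    """One-mode version of the EOF sum: field(t, space) = PC1(t) * EOF1(space)."""
    return np.outer(pc1, eof1)
```

In a hindcast, the observed `pc1` passed to `reconstruct` would be replaced by the PC1 simulated by the statistical model, while `eof1` stays fixed at its observed pattern.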
Fig. 1. (a) The standard deviation (contours; units: days) and climatological mean (shading; units: days) of HWF_EUR for the period 1981–2020. The framed area is eastern Europe (46°–62°N, 28°–60°E). (b) The EOF1 of HWF_EUR (shading; units: days) calculated by regression against PC1 for 1981–2020. The dotted areas denote the HWF anomalies significant at the 0.05 level.
The standard deviation and climatological mean of HWF_EUR were calculated and depicted in Fig.1 (a).High HWF values,as well as high variability,can be observed over eastern Europe (HWF_EUR;46°–62°N,28°–60°E),which is denoted by the blue box in Fig.1 (a).The EOF1 of HWF_EUR for 1981–2020 accounts for 31.8% of the total HWF variance and passes the separation criteria of North et al.(1982).The spatial structure of EOF1 (Fig.1 (b)) also shows significant anomalous positive HWF over eastern Europe,consistent with Fig.1 (a).PC1 is closely correlated to an area-averaged HWF_EUR index,which is significant at the 0.01 level (not shown).
To test the feasibility of the ML simulation experiment,hindcast experiments for PC1 were run for the period from 1981 to 2020.The variables from the lower-boundary conditions that may impact the variation in HWF_EUR,i.e.,SM,SST,SCE,and SIC,in three seasons,i.e.,the simultaneous summer (JJA),preceding spring (March–April–May,MAM),and winter (December–January–February,DJF) were examined.The climate factors selected to build the ML model were calculated by standardizing and area-averaging the lower-boundary variables over the specific regions where they were significantly correlated with PC1.Details regarding the lower-boundary variables and selected areas to construct the climate factors are provided in Figs.S1–S4.
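One way to read the construction of a climate factor is sketched below: correlate a lower-boundary field with PC1 at every grid point, keep the points whose correlation passes a threshold, and area-average the standardized field over them. Several details are assumptions made for illustration: the function name, the cosine-latitude weights, the `sign` argument that keeps only one polarity, and the default critical value of 0.312 (approximately the two-tailed 0.05 level for 40 independent years). The paper additionally restricts each factor to a specific region, which this sketch omits.

```python
import numpy as np

def climate_factor(field, pc1, lats, r_crit=0.312, sign=1):
    """field: (n_time, n_lat, n_lon); pc1: (n_time,); lats: (n_lat,) in degrees.

    Returns the standardized factor time series and the boolean grid mask used.
    """
    n_time = field.shape[0]
    f_std = field.std(axis=0)
    # Constant grid points get z = 0, hence zero correlation, and are excluded.
    z = (field - field.mean(axis=0)) / np.where(f_std > 0, f_std, 1.0)
    p = (pc1 - pc1.mean()) / pc1.std()
    r = np.tensordot(p, z, axes=(0, 0)) / n_time  # correlation map (n_lat, n_lon)
    mask = sign * r >= r_crit
    if not mask.any():
        raise ValueError("no grid point passes the correlation threshold")
    weights = np.cos(np.deg2rad(lats))[:, None] * mask
    factor = (z * weights).sum(axis=(1, 2)) / weights.sum()
    return (factor - factor.mean()) / factor.std(), mask
```

Calling the function once per variable and season (for example SST in the preceding DJF) would yield one predictor column per call.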
Cross-validation with a grid search scheme was performed to determine the hyperparameters of the LightGBM model. A five-fold cross-validation method was adopted to evaluate the model, and the averaged RMSE was the metric used to assess the performance of the model. Fig. 2 shows the simulation of PC1 in the LR and LightGBM models for the period 1981–2020. The take-10-years-out method was used in the simulation. The simulated PC1 from LightGBM correlated significantly with the observed PC1, with a temporal correlation coefficient (TCC) of 0.36 (significant at the 0.05 level), which was higher than the TCC of 0.31 for the LR model. Then, the simulated PC1 and the EOF1 from the observations were employed to build the HWF_EUR for the LR (Fig. 2(b)) and LightGBM (Fig. 2(c)) models. Results showed that the LightGBM and LR model simulations matched the observations reasonably well, with significant TCCs appearing over eastern Europe. Comparatively, the LightGBM model performed better than the LR model, especially over the key region of HWF_EUR. Take-1-year-out and take-4-years-out hindcast experiments were also conducted, and the TCC maps of the same model with different take-out windows were generally consistent (not shown).
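The take-out hindcast and the TCC score can be sketched as below, assuming that "take-k-years-out" means withholding consecutive blocks of k years, fitting on the remaining years, and predicting the withheld block. The sketch uses an ordinary least-squares regression, in the spirit of the LR baseline; a LightGBM regressor would take the place of the fit and predict steps. Function names and the block layout are assumptions.

```python
import numpy as np

def take_k_out_hindcast(X, y, k=10):
    """X: (n_years, n_factors) predictors; y: (n_years,) target such as PC1.

    Each block of k consecutive years is predicted by a linear model that was
    fitted only on the years outside that block.
    """
    n = len(y)
    pred = np.empty(n)
    for start in range(0, n, k):
        test = np.arange(start, min(start + k, n))
        train = np.setdiff1d(np.arange(n), test)
        design = np.column_stack([np.ones(train.size), X[train]])
        coef, *_ = np.linalg.lstsq(design, y[train], rcond=None)
        pred[test] = np.column_stack([np.ones(test.size), X[test]]) @ coef
    return pred

def tcc(obs, sim):
    """Temporal correlation coefficient between observed and simulated series."""
    return float(np.corrcoef(obs, sim)[0, 1])
```

Setting `k=1` or `k=4` gives the take-1-year-out and take-4-years-out variants under the same assumption.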
Fig.2.(a) The standardized PC1 in the observations (black line) and the hindcast PC1 from the LR (blue line) and LightGBM (red line) model.(b,c) The TCCs between the observed and simulated HWF_EUR in the (b) LR and (c) LightGBM model for the period 1981–2020.Areas with TCCs significant at the 0.1 level are dotted.
Real-time simulation experiments were conducted with the same method as the hindcast experiments,but the PC1 and corresponding EOF1 were obtained from data for the period 1981–2010.The variables and regions used to construct the climate factors were those that were significantly correlated with PC1 for the same period.Details and a description of the climate factors are listed in Tables S1–S4,and the variables and regions selected are depicted in Figs.S5–S8.To mimic real-time simulation,the LightGBM models were trained with climate factors from 1981 to 2010 and simulated the variation in HWF_EUR for the period 2011–2020.
Fig.3 (a) and Fig.3 (b) depict the TCCs between the observed and simulated HWF_EUR from the LR and LightGBM models,respectively.It is shown that the real-time simulation result of the LightGBM model with all potential climate factors clearly outperformed that of the LR model,especially over the northern Black Sea where the TCC skill was negative for the LR model.
To evaluate the effects of these selected climate factors,several additional real-time simulation experiments using only some of the climate factors were conducted.The TCC maps of model experiments with single-field climate factors,i.e.,SST (Fig.3 (c)),SCE (Fig.3 (d)),and SM(Fig.3 (e)),in all three seasons,show positive TCCs over most regions of eastern Europe,while negative TCCs appear in the model experiment with SIC as the single factor (Fig.3 (f)).The results indicate that SST,SCE,and SM make positive contributions to the model skill in the experiments.The model experiments with climate factors in a single season,i.e.,the preceding DJF (Fig.3 (g)),preceding MAM (Fig.3 (h)),and JJA(Fig.3 (i)),show that the experiment with climate factors in the preceding DJF exhibits better simulation skill than that of MAM and JJA.
Fig. 3. (a, b) The TCCs between the observed and (a) LR- and (b) LightGBM-simulated HWF_EUR using all climate predictors for the period 2011–2020. (c–f) As in (b) but with only (c) SST, (d) SCE, (e) SM, and (f) SIC as the factor in the three seasons. (g–i) The TCCs of LightGBM using all climate factors in a single season: (g) the preceding DJF; (h) the preceding MAM; and (i) JJA. Areas with TCCs significant at the 0.1 level are dotted.
As LightGBM is a tree-ensemble model,the relative contributions to the model simulation experiments of each climate factor can be evaluated.Fig.4 (a) demonstrates the contributions of the top 10 climate factors to the LightGBM model simulation with all climate factors.SST factors account for 70% of the top 10 climate factors,while SM and SCE factors account for 20% and 10%,respectively,consistent with Fig.3 (c–f),which indicates that SST factors make relatively more contributions to the LightGBM model experiment than other factors.Similarly,SST is the factor that contributes the most to the model simulation with climate factors in the preceding winter (Fig.4 (b)).
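The bookkeeping behind statements such as "SST factors account for 70% of the top 10 climate factors" can be sketched as follows. The input is assumed to be a mapping from factor name to a non-negative importance value (for instance a gain-based importance exported from a fitted tree ensemble), and the naming scheme `VARIABLE_SEASON_index` is hypothetical.

```python
from collections import Counter

def top_factor_shares(importance, top_n=10):
    """importance: dict mapping factor name -> non-negative importance value.

    Returns (percent, shares): the percentage contribution of each top-ranked
    factor, and the share of each variable type among the top-ranked factors.
    """
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    percent = {name: 100.0 * value / total for name, value in ranked}
    # Variable type is taken to be the text before the first underscore.
    counts = Counter(name.split("_")[0] for name, _ in ranked)
    shares = {var: 100.0 * c / len(ranked) for var, c in counts.items()}
    return percent, shares
```

With ten ranked factors, a variable type that supplies seven of them has a share of 70%, which is how the proportions quoted above would be obtained under this reading.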
Considering the poor performance of current climate models in simulating heatwave events,ML models may be better than traditional climate models for capturing the nonlinear relationships between factors and such events.Therefore,in this study,an ML model,LightGBM,was used to simulate the variation in summer HWF_EUR for the period 2011–2020.The relative contributions of various climate factors that may contribute to the HWF_EUR variation,including SST,SCE,SM,and SIC,were analyzed in three seasons,i.e.,the simultaneous summer,preceding spring,and winter.
Results showed that the LightGBM model had good skill in simulating the variation in summer HWF_EUR and clearly outperformed the LR model. The SST, SCE, and SM factors contributed more than the SIC factor in the model experiments. Among them, SST played the most critical role in the ML model simulation. In addition, model experiments using climate factors from the preceding DJF showed better skill than those using the other two seasons, indicating that the lower-boundary conditions in the preceding DJF may strongly influence the summer HWF_EUR variation and may contribute to its forecasting. Note that the SM results may be subject to uncertainty across datasets. For example, the SM data from NCEP and ERA5-Land may bear some inconsistencies, especially in North America, and therefore the contribution of SM should be interpreted with caution and needs to be examined further in the future.
Fig.4.Contributions of the top 10 climate factors (units: %) to the simulation of PC1 in LightGBM with all climate factors: (a) in all three seasons;(b) in DJF only.
Funding
This research was supported by the National Natural Science Foundation of China [grant number 42075050].
Supplementary materials
Supplementary material associated with this article can be found,in the online version,at doi: 10.1016/j.aosl.2022.100256.
Atmospheric and Oceanic Science Letters, 2022, Issue 5