Matthias Schmidt, Johannes Breidenbach and Rasmus Astrup
The prediction of tree height is of central importance, not only for the calculation of growing stock from sample inventories, but also in the prognosis of medium- and long-term forest development for forest planning and in the analysis of timber supply. Single tree volume and assortment are estimated using species-specific taper functions which use tree height and the diameter at breast height (dbh) (and occasionally other stem diameters) as input parameters. Single tree volumes are then the basis for expanding total timber volume from sample forest inventories for any given evaluation and planning unit. With respect to dbh, information from fully documented experimental plots, or at least from concentric sample plots, can frequently be relied upon for tree height imputation in sample forest inventories and for generating realistic start values to initialize forest growth simulators. Measurements of tree height, however, are considerably more costly to obtain, so that often few or no data are available. If one to several height measurements are available in a stand or sample plot, height-dbh curves based on simple mixed models are applied which allow for local calibration of a mean population relationship and thus local prediction (e.g. Corral-Rivas et al. 2014). These models, which employ diameter as the only predictor, are purely data imputation tools and do not, for example, describe explicitly the effects of site or competition on the height-dbh relationship. Generalized height curves describe these effects (Larsen and Hann 1987; López et al. 2003; Temesgen and Gadow 2004). However, frequently the information on measured height-dbh pairs is not used for local calibration of the height predictions. A combination of both model approaches leads to generalized height-diameter (h-d) models, which can be locally and temporally calibrated. Hence, these models are developed using either linear or non-linear mixed models in which site, stand and competition variables, but also regional units or geographic coordinates, are used as covariates (Lappi 1997; Eerikäinen 2003; Calama and Montero 2004; Mehtätalo 2004; Nanos et al. 2004; Hökkä 1997; Schmidt et al. 2011). From a more general point of view, mixed models also provide a solution to the problem of correlated errors that results from grouped data structures, and they quantify the variability between groups via random effects (Pinheiro and Bates 2000). This is highly relevant in h-d modeling since in most forest growth and yield databases several measurements originate from the same sample plot or trial and measurement occasion.
In forest growth simulations the actual projection of future tree heights is frequently based not on height-dbh curves but on height growth functions of dominant trees. These can be adapted for single trees using additional tree covariates such as competition indices (Pretzsch 2009). However, if a longitudinal covariate, such as age or quadratic mean diameter, is used, then valid future height projections can be obtained directly from the generalized h-d model.
H-d models for Norway spruce (Picea abies (L.) Karst.), Scots pine (Pinus sylvestris L.) and silver birch (Betula pendula Roth) are presented in this paper, which allow an optimal height prediction for any given dbh in all of the example situations described below. This is true regardless of the number of available height measurements, as well as in those cases in which only measurements from an earlier inventory are available. Furthermore, an optimal combination of information from stand and site variables together with local height measurements is ensured. These requirements are fulfilled by using a generalized height-diameter model which has been parameterized as a mixed model. Mixed models facilitate the local or temporal calibration of global models which have been determined using fixed covariate effects (Lappi 1997; Mehtätalo et al. 2015). As only few causal site variables were available in our investigation, the covariates used are mainly proxies. In order to guarantee the highest possible accuracy of prediction in those cases for which no height-dbh pairs are available at all, complex linear predictors for the fixed effects are parameterized in generalized additive mixed models (gamm). Moreover, for forecasts of future height development, it seems advantageous to describe the largest possible share of the variance as a function of dynamic (i.e. time-varying) covariates through their fixed effects, because it can be assumed that the information from measured dbh-height pairs becomes less meaningful with increasing simulation period. The use of longitudinal variables, such as age or quadratic mean diameter (qmD), extends the possible applications of the models from pure data imputation to height projection in growth simulations. Finally, implausible effects, resulting for certain data ranges of covariates in the gamm, are forced into plausible (as decided by experts) patterns by defining monotonicity restrictions in shape constrained additive models (scam).
The developed longitudinal h-d models provide solutions for the following applications throughout Norway:
· Height imputation, taking into account site and tree effects, for single trees in the NFI and also for initializing growth simulators when no representative height measurements are available. For the application, a measured or estimated dbh must be available.
· Medium-term future height predictions for the analysis of timber supply, forest development scenarios and silviculture scenario simulations, taking (fixed) site and tree effects into account.
· Ensuring plausible height predictions for the whole data range of covariates by applying monotonicity constraints where necessary for the fixed model effects.
· Model calibration, i.e. local adaptation of height predictions and projections using height-diameter measurements.
Data from the Norwegian national forest inventory (NFI) for the period 1986–2012 were available. For all three tree species studied, steep gradients were evident in the data for the potential covariates and their combinations. This is extremely advantageous for the development of statistical models, and in particular for obtaining generally acceptable, stable and plausible estimates of model effects (Tables 1 and 2).
Annually, ca. one fifth of the sample plots are inventoried in the NFI (inter-penetrating panel design). Below the coniferous forest limit, the permanent sample plots are laid out in a systematic 3 km × 3 km grid. Since 2005, sample plots in high mountain areas above the coniferous limit and in Finnmark are measured on a 3 km × 9 km and a 9 km × 9 km sampling grid, respectively. Sample plots in Finnmark below the coniferous limit are measured on a 3 km × 3 km grid as in other parts of the country. Between 1986 and 1993, concentric circular plots of 100 m² (for trees with a dbh of less than 20 cm) and 250 m² (for trees with a dbh greater than or equal to 20 cm) were used. From 1994 on, simple circular plots with an area of 250 m² were used. Over the complete inventory period only trees with a dbh of 5 cm or greater were sampled. Tree height measurements in the NFI are made with Vertex inclinometers for a subsample of trees. The subsample of trees is selected proportional to tree diameter. While the expected number of height trees per sample plot was three per species until 2004, it was 10, independent of tree species, from 2005 onwards. Therefore, and due to the inclusion of high mountain areas and Finnmark, there was a clear increase in the number of h-d value pairs with time (Fig. 1, left). The greatest numbers of trees were sampled between 200 and 250 m above sea level (Fig. 1, right). Spruce and birch are more evenly distributed across the altitude gradient than pine. The relatively small proportion of spruce at the lower altitudes (predominantly coastal areas) and its dominance between 300 and 600 m stand out, while above 800 m birch is the most frequently occurring species.
Table 1 Summary statistics of the continuous variables used in the development of the longitudinal height-diameter models. Statistics for quadratic mean diameter (species-specific) and stand age were calculated for sample plot/measurement occasion means; statistics for altitude were calculated for sample plot means
Table 2 Number of sample plots by soil depth category and tree species used in the development of the longitudinal height-diameter models
Birch has the greatest regional range, the highest natural tree line (Fig. 1, right) and the most northerly range limit (Fig. 2, right) of the three species. Although pine also has a large regional range (Fig. 2, middle), its natural tree line lies at a much lower altitude (Fig. 1, right) and it occurs much less frequently in the provinces of Nordland, Troms and Finnmark. The data for spruce show that it has a higher natural tree line than pine (Fig. 1, right) but that the northern limit of its range lies at a lower latitude (Fig. 2, left). More clearly than for the other two species, a separation of spruce into two distinct ranges can be seen, one in south-east Norway east of the main watershed divide, the other lying in the province of Nord-Trøndelag and parts of Sør-Trøndelag. The limited spruce distribution in the coastal regions of south and mid Norway is due to the fact that spruce is not part of the potential natural vegetation in these regions. In total there are 68,426 spruce, 50,852 pine and 59,112 birch h-d data pairs and respective covariate vectors available (Tables 1 and 2).
Model development was a multi-step process. In a first step, gamm were parameterized in order to identify covariates with significant effects and to test model effects for non-linearity. Based on this unrestricted model selection, the model effects were tested for plausibility. If necessary, conditions such as monotonicity were specified and scam parameterized. The scam were then validated against the unconstrained models by comparing the fit statistics standard error, explained deviance and Akaike information criterion (AIC). Because of the computational intensity, a direct parameterization of shape constrained additive mixed models (scamm) was only possible for small datasets given the available computing facilities. To develop models which can be locally calibrated, generalized linear mixed models (glmm) are therefore parameterized in which the conditional expectation values predicted by the scam are entered as "a priori" information. The specification as a mixed model enables the partitioning of the total variance into different levels and, thereby, the calibration of a mean population model using additional h-d measurements (Mehtätalo et al. 2015). Moreover, the mixed model approach accounts for the grouped structure of the NFI data used and the related correlated errors. The integration of the qmD as a covariate gives the models their longitudinal character and, consequently, the shifting h-d relationship with time can be described as a function of the developmental stage of a stand. Even if the shift in the h-d relationship should not be confused with incremental height growth, the approach opens up the possibility of site-sensitive height projections in growth simulations.
Fig. 1 Distribution of stem number by inventory year (left) and altitude level (50 m classes) (right)
Fig. 2 Regional ranges of Norway spruce, Scots pine and silver birch in the Norwegian National Forest Inventory
The choice of the basic model, or rather, of the specific height-diameter function, is crucial for the longitudinal h-d model that is developed from it. Here, a special form of the Korf function developed by Lappi (1997) is used, which is distinguished by the biological interpretability and comparatively low correlation of its parameters. These qualities are particularly advantageous when, as is the case here, the parameters, and thereby the realisation, of height curves are to be described as a function of site, stand, and single tree variables. Mehtätalo (2004, 2005) built on the work of Lappi (1997) and adapted and validated the model for spruce, pine and birch in Finland, a country with partly similar growth conditions. Additionally, it is very important that the model is linear, which enables the estimation of site, stand and tree effects and their testing for non-linearity in a one-step procedure using gamm. Finally, in our application the modified Korf function showed adequate flexibility, which is illustrated in the results chapter. The basic version of the Korf function used here (Eq. 1) is an alternative to the more frequently used variant, in which breast height (1.3 m) is subtracted from the tree height. In order to prevent the expected height values from taking on the value "zero" when the dbh is very small, Lappi added a small constant λ to the dbh, where dbh + λ can be interpreted as the diameter at ground height. Lappi (1997) then re-parameterized the function (Eq. 2) because the expected values and the standard errors of the "linear" parameters A and B are strongly correlated and the trend of B with age is difficult to interpret. This re-parameterization, on the basis of expected values of the logarithmic tree height for trees with dbh of 30 and 10 cm (Eqs. 2 and 2.1), yields biologically meaningful parameters, as well as a clear reduction in correlation. Parameter A can then be interpreted as the expected value of the natural logarithm [ln(.)] of the height of a tree with dbh = 30 cm, while parameter B is the difference between the expected values of ln(tree height) of trees with dbh = 30 cm and dbh = 10 cm of the respective tree species. The parameters A, B, C and λ are referred to in this paper as first order parameters (of the Korf function) in order to distinguish them from the second order parameters which describe the effects of site, stand, and single tree variables that are integrated into the model later.
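The display equations referred to as Eqs. 1, 2 and 2.1 can be sketched as follows. This is a reconstruction from the verbal definitions in this section and from the usual form of the Lappi (1997) function, not a verbatim copy of the original equations; error terms are omitted, and the labels A^{0} and B^{0} are introduced here only to keep the parameters of the basic form apart from the re-parameterized A and B.

```latex
\begin{align}
\ln(h_{kti}) &= A^{0}_{kt} - B^{0}_{kt}\,(dbh_{kti} + \lambda)^{-C} \tag{1}\\
\ln(h_{kti}) &= A_{kt} - B_{kt}\,x_{kti} \tag{2}\\
x_{kti} &= \frac{(dbh_{kti} + \lambda)^{-C} - (30 + \lambda)^{-C}}
               {(10 + \lambda)^{-C} - (30 + \lambda)^{-C}} \tag{2.1}
\end{align}
```

Under this form x = 0 for dbh = 30 cm and x = 1 for dbh = 10 cm, so that A is the expected ln(height) at dbh = 30 cm and B is the difference in expected ln(height) between dbh = 30 cm and dbh = 10 cm, in line with the interpretation given in the text.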
with:
h_kti: height of tree i at time of inventory t at sample plot k;
dbh_kti: dbh of tree i at time of inventory t at sample plot k;
x_kti: re-parameterized dbh of tree i at inventory date t at sample plot k;
A_kt, B_kt, C, λ: first order parameters of the height-diameter model at time of inventory t at sample plot k;
ln(.): natural logarithm.
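As a numerical illustration of the re-parameterized function, the following minimal sketch assumes the form ln(h) = A - B·x with x scaled to 0 at dbh = 30 cm and 1 at dbh = 10 cm, as implied by the interpretation of A and B given above. The function names and the values chosen for A and B are hypothetical; λ = 7 and C = 1.564 are the spruce values of Mehtätalo (2004, 2005) quoted later in the text.

```python
import math


def reparam_dbh(dbh_cm, lam, c):
    """Re-parameterized dbh x: 0 at dbh = 30 cm, 1 at dbh = 10 cm (assumed form)."""
    def korf_term(d):
        return (d + lam) ** (-c)

    return (korf_term(dbh_cm) - korf_term(30.0)) / (korf_term(10.0) - korf_term(30.0))


def predict_height(dbh_cm, a, b, lam, c):
    """Height in m from ln(h) = A - B * x; A and B are illustrative inputs."""
    return math.exp(a - b * reparam_dbh(dbh_cm, lam, c))


# Illustrative values only: A = ln(25 m) at dbh = 30 cm, B = 0.6.
A_EXAMPLE = math.log(25.0)
B_EXAMPLE = 0.6
for dbh in (10.0, 20.0, 30.0, 40.0):
    print(dbh, round(predict_height(dbh, A_EXAMPLE, B_EXAMPLE, 7.0, 1.564), 2))
```

With these inputs the predicted height at dbh = 30 cm equals exp(A), and the ratio of the heights at 30 cm and 10 cm equals exp(B), which is what makes the two parameters directly interpretable.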
In keeping with Lappi (1997), the function (Eq. 2) is subsequently linearized by iteratively determining the combination of λ and C for which the corresponding model has the lowest AIC. Differences in the underlying data result, at this point, in a fundamental difference to the approach of Lappi (1997) and Mehtätalo (2004). Lappi (1997) used experimental plots and Mehtätalo (2004) a subsample of the Finnish NFI which, because of the large or at least sufficient number of h-d value pairs, allow an ordinary least squares estimate of separate h-d curves for each plot and inventory date. From these individual parameterizations Lappi (1997) derived not only the optimal parameter combination of λ and C, but also the age trends for the parameters A and B. In contrast, the choice of optimal combinations of λ and C in this study was made using a glmm based on the re-parameterized Korf function (Eq. 2), because the number of measurements per plot in the Norwegian NFI rarely allowed fitting of stable, separate plot-specific models. This glmm includes plot-level random effects with mean 0 and constant variance for the parameters A and B (Eq. 2.2). Moreover, during model development it turned out that the variance of random effects for an inventory date level nested within plot level was close to zero for all 3 tree species. Hence all further model selection was restricted to plot-level mixed models.
h_kti ~ Gamma(μ, ν) with dispersion parameter φ = 1/ν = σ².
with additionally:
A, B: Fixed effects for the first order parameters of the height-diameter model (re-parameterized Korf function);
α_k, β_k: Random effects for sample plot k, with the vector of random parameters b_k = (α_k, β_k)′ ~ N(0, D) and D denoting the corresponding variance-covariance matrix.
In this study all models are parameterized as glmm or gamm with a log-link function and the Gamma distribution as distributional assumption. By employing the Gamma distribution we assume a constant coefficient of variation σ, with [Var(h_kti)]^(1/2) = σ E(h_kti) and Var(h_kti) = φ [E(h_kti)]². This corresponds to a log-linear model but, because generalized models are used, no transformation bias occurs when the prediction is back-transformed. We show in the results chapter that assuming a Gamma distribution leads to sufficient variance stabilization in our case, and in contrast to Lappi (1997) and Mehtätalo (2004, 2005) we did not model the residual variance explicitly. However, the ongoing development of approaches like gam for location and scale (Wood et al. 2016) will allow for more flexible variance modelling in the future.
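The constant coefficient of variation implied by the Gamma assumption can be illustrated with a minimal sketch. It assumes the shape/scale parameterization of the Gamma distribution with shape ν = 1/σ² and scale μ/ν, which gives mean μ and variance σ²μ²; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
import random
from statistics import mean, pstdev


def gamma_shape_scale(mu, sigma):
    """Shape and scale of a Gamma distribution with mean mu and constant CV sigma."""
    nu = 1.0 / sigma ** 2      # shape; the dispersion parameter is 1 / nu
    return nu, mu / nu         # scale chosen so that shape * scale = mu


def simulate_heights(mu, sigma, n, seed=1):
    """Draw n heights with expected value mu and coefficient of variation sigma."""
    rng = random.Random(seed)
    shape, scale = gamma_shape_scale(mu, sigma)
    return [rng.gammavariate(shape, scale) for _ in range(n)]


# The standard deviation grows in proportion to the expected height.
for mu in (10.0, 20.0, 30.0):
    sample = simulate_heights(mu, 0.1, 20000)
    print(mu, round(mean(sample), 2), round(pstdev(sample) / mean(sample), 3))
```

The simulated ratio of standard deviation to mean stays close to σ for every expected height, which is the variance stabilization property the text relies on.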
The iterative search for the parameters λ and C for spruce, pine, and birch, with 5613, 5219, and 7606 sample plots respectively, proved to be too computationally intensive given the available computing facilities. Instead, 20 samples, each containing 500 sample plots, were drawn from the dataset and models with different combinations of λ and C were parameterized (Eq. 2.2). Starting from the optimal values determined by Mehtätalo (2004, 2005) for spruce (λ = 7, C = 1.564) and birch (λ = 6, C = 1.809), the value of λ was varied between 3 and 20 (in increments of 1), the value of C was varied between 0.3 and 2.5 (in increments of 0.1) and all of the resulting combinations were tested. The AIC values of the resulting 20 models for each parameter combination were then averaged and the optimal parameter combination was determined as the one with the lowest average AIC value.
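The search procedure described above can be sketched as a plain grid search. The grid bounds and increments follow the text; the callable fit_aic is a hypothetical placeholder for fitting the glmm (Eq. 2.2) to one sample of plots and returning its AIC, and is not part of any package.

```python
from statistics import mean


def lambda_c_grid():
    """All (lambda, C) pairs: lambda = 3..20 in steps of 1, C = 0.3..2.5 in steps of 0.1."""
    lambdas = list(range(3, 21))
    cs = [round(0.3 + 0.1 * i, 1) for i in range(23)]
    return [(lam, c) for lam in lambdas for c in cs]


def best_combination(samples, fit_aic):
    """Average the AIC over all samples for each pair and return the pair with the minimum.

    fit_aic(sample, lam, c) is a hypothetical user-supplied function.
    """
    scores = {}
    for lam, c in lambda_c_grid():
        scores[(lam, c)] = mean(fit_aic(sample, lam, c) for sample in samples)
    return min(scores, key=scores.get), scores


# Toy stand-in for a model fit, used only to show the mechanics.
def toy_aic(sample, lam, c):
    return (lam - 12) ** 2 + 100.0 * (c - 1.7) ** 2 + sample


best, all_scores = best_combination([0.0, 1.0, 2.0], toy_aic)
print(best, len(all_scores))
```

The grid contains 18 × 23 = 414 combinations, so with 20 samples each species requires 8280 model fits, which explains why subsamples of 500 plots were used instead of the full data.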
In contrast to Lappi (1997), further model selection in this study followed a one-step procedure with the help of gamm, without the effects of the longitudinal covariate (age or qmD) on the first order parameters A and B being approximated first. Lappi (1997), on the other hand, assumed that the effects of further covariates would be linear and would act on the previously approximated age effects. In this study, all further covariate effects are estimated simultaneously with the effect of the longitudinal covariate qmD, whereby, because of the log-link function, the effects act multiplicatively on tree height (Eq. 3). Model effects on the first order parameter A are indicated by f_1a … f_na or f_spa, the latter indicating a structured spatial effect. Terms affecting the first order parameter B are described by the varying coefficient terms f_1b … f_nb. Through the simultaneous estimation of the parameters of the 2-dimensional trend function f_spa(east_k, north_k) and the plot-level random effects (α_k, β_k), the spatial autocorrelation is separated into a structured and an unstructured spatial effect (Brezger and Lang 2006). The first captures the large-scale autocorrelation, while the second describes the small-scale correlation within sample plots. The models were fitted using the software default values (R package mgcv, Wood 2006) for the spline basis dimensions of k = 10 for the 1-dimensional and k = 30 for the 2-dimensional splines.
h_kti ~ Gamma(μ, ν) with dispersion parameter φ = 1/ν = σ².
with additionally:
x_1 … x_n: Covariates with 1-dimensional effects on the h-d relationship;
east_k, north_k: Easting and northing of sample plot k (UTM coordinates);
f_1a(x_1) … f_na(x_n): 1-dimensional penalised regression P-splines describing the level of the h-d relationship (first order parameter A);
f_1b(x_1) … f_nb(x_n): 1-dimensional penalised regression P-splines describing the slope of the h-d relationship (first order parameter B);
f_spa: 2-dimensional isotropic penalised thin-plate regression spline capturing the structured spatial effect on the level of the h-d relationship (first order parameter A).
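Putting these definitions together, the linear predictor of the gamm referred to as Eq. 3 can be sketched as below. The expression is a reconstruction that assumes the same sign convention as the re-parameterized Korf function (level minus slope times re-parameterized dbh) and a log link; it is not a verbatim copy of the original equation.

```latex
\begin{equation}
\ln\!\big(E[h_{kti}]\big) =
  A + \sum_{j=1}^{n} f_{ja}(x_j) + f_{spa}(east_k, north_k) + \alpha_k
  - \Big(B + \sum_{j=1}^{n} f_{jb}(x_j) + \beta_k\Big)\, x_{kti}
\tag{3}
\end{equation}
```

Because the smooth terms enter on the log scale, each of them acts as a multiplicative factor on the expected tree height, and the terms f_jb act as varying coefficients on x.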
The estimated model parameters were checked for logical validity. If deemed necessary, monotonicity constraints were defined to enforce plausible patterns. A scamm describing the h-d relationship under conditions of monotonicity for all 1-dimensional effects can be written as follows, with all monotonic model effects denoted by m instead of f:
h_kti ~ Gamma(μ, ν) with dispersion parameter φ = 1/ν = σ².
with, differing from Eq. 3:
m_1a(x_1) … m_na(x_n): 1-dimensional monotonic penalised regression P-splines describing the level of the h-d relationship (first order parameter A);
m_1b(x_1) … m_nb(x_n): 1-dimensional monotonic penalised regression P-splines describing the slope of the h-d relationship (first order parameter B).
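One common way to impose monotonicity on a P-spline is to re-parameterize its coefficients so that they can only increase, since a B-spline with non-decreasing coefficients is itself non-decreasing. The following minimal sketch illustrates this idea, which underlies shape constrained P-splines as described by Pya and Wood; it is an illustration with hypothetical names and is not the code of the scam package.

```python
import math


def monotone_increasing_coefs(working):
    """Map unconstrained working parameters to strictly increasing spline coefficients.

    The first coefficient is free; every later coefficient adds a positive
    increment exp(w) to its predecessor, so the sequence cannot decrease.
    """
    coefs = [working[0]]
    for w in working[1:]:
        coefs.append(coefs[-1] + math.exp(w))
    return coefs


# Any real-valued working parameters yield an increasing coefficient sequence.
print(monotone_increasing_coefs([0.5, -2.0, 0.0, 1.0]))
```

Because the working parameters are unconstrained, the penalised likelihood can be optimized without explicit inequality constraints, at the price of a non-linear dependence of the smooth on its parameters.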
Due to the extensive dataset, with many thousands of sample plots, parallelization is necessary. All plot-level gamm were parameterized using the R (R Core Team 2016) package mgcv (Wood 2004, 2006, 2011), which can handle parallel calculations. The investigations concerning 2-level gamm with an additional inventory date level were conducted by combining functions from the packages mgcv, nlme (Pinheiro et al. 2013) and MASS (Venables and Ripley 2002). The scam were parameterized using the R package scam (Pya 2015), which is based on the mgcv library and also allows a parameterization of scamm. Parallel computing has not been supported by the scam package up to now, so in this study a 2-step procedure is used. In a first step, scam were parameterized (Eq. 5), whose estimates of the conditional expected values of ln(tree height) were the only covariate in a subsequent glmm (Eq. 6). The resulting glmm makes local calibration using height-dbh measurements possible, while the pattern of the fixed shape constrained model effects remains the same. Because the glmm builds on the scam, it will henceforth be labeled scam_m.
h_kti ~ Gamma(μ, ν) with dispersion parameter φ = 1/ν = σ².
with additionally: the prediction of ln(tree height) using the scam (Eq. 5) for tree i at inventory date t and sample plot k.
Within the studied parameter boundaries the optimal combination for spruce was λ = 20 and C = 2.5, for pine λ = 19 and C = 2.5, and for birch λ = 16 and C = 2.4 (Fig. 3). For each of the 3 tree species studied, several different parameter combinations resulted in AIC values near the minimum. Depending on species, the optimal values of one or both parameters were at or near the upper boundary of the studied parameter ranges. It can, therefore, be assumed that the true optima lie at higher values of λ and C. However, further improvements would be marginal, as can be seen from the development of the AIC values within the studied parameter boundaries (Fig. 3). There were relatively large differences between the optimal parameter combinations and those determined by Mehtätalo (2004, 2005), although those optima would also lead to relatively low AIC values if applied to the NFI data (Fig. 3). For pine, Mehtätalo (2005) modelled parameter C as dependent on qmD, so that in this case no constant value pairs were available for comparison.
In the course of model selection of the unrestricted gamm, quadratic mean diameter qmD, the competition index BAL (basal area larger; the sum of the basal areas of all trees larger than the reference tree), altitude Alt, soil depth SD, as well as regional location (easting, northing) were all selected as covariates with a significant effect on the first order parameter A. Only qmD showed an additional significant effect on the slope of the h-d relationship, the first order parameter B.
h_kti ~ Gamma(μ, ν) with dispersion parameter φ = 1/ν = σ².
with additionally:
qmD_kt: Quadratic mean diameter of the tree species at inventory date t at sample plot k;
Alt_k: Altitude of sample plot k;
BAL_kti: Basal area larger (sum of the basal areas of all trees larger than the reference tree) of tree i at inventory date t at sample plot k;
SD_k: Soil depth category of sample plot k: I (0–25 cm), II (25–50 cm), III (50–100 cm), IV (>100 cm);
east_k, north_k: Easting and northing of sample plot k (UTM coordinates);
f_1a(.) … f_3a(.): 1-dimensional penalised regression P-splines describing the level of the h-d relationship (first order parameter A);
f_1b(.): 1-dimensional penalised regression P-spline describing the slope of the h-d relationship (first order parameter B);
?SD:Vector of the regression coefficients for soil depth categories;
f_spa: 2-dimensional isotropic penalised thin-plate regression spline capturing the structured spatial effect on the level of the h-d relationship (first order parameter A);
α_k, β_k: Random effects for sample plot k, with the vector of random parameters b_k = (α_k, β_k)′ ~ N(0, D) and D denoting the corresponding variance-covariance matrix.
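The two stand and tree covariates defined above can be computed from a plot tree list as in the following minimal sketch. It assumes that all trees were recorded on one 250 m² circular plot, that BAL is expanded to m² per hectare, and that trees of equal dbh do not count as larger; these conventions and the function names are assumptions made for illustration, not details given in the text. For a species-specific qmD only the trees of that species would be passed in.

```python
import math


def basal_area_m2(dbh_cm):
    """Basal area of a single tree in m2 from its dbh in cm."""
    return math.pi * (dbh_cm / 200.0) ** 2


def qmd_cm(dbhs_cm):
    """Quadratic mean diameter: diameter of the tree of mean basal area."""
    return math.sqrt(sum(d * d for d in dbhs_cm) / len(dbhs_cm))


def bal_per_tree(dbhs_cm, plot_area_m2=250.0):
    """BAL in m2/ha for each tree: basal area of all strictly larger trees on the plot."""
    expansion = 10000.0 / plot_area_m2   # plot to hectare; plot size is an assumption
    return [
        expansion * sum(basal_area_m2(other) for other in dbhs_cm if other > d)
        for d in dbhs_cm
    ]


plot_dbhs = [10.0, 20.0, 30.0]
print(round(qmd_cm(plot_dbhs), 2))
print([round(b, 2) for b in bal_per_tree(plot_dbhs)])
```

The largest tree on a plot always has BAL = 0, and BAL increases as the social rank of a tree decreases, which is why it serves as a simple index of competition pressure.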
Two-level mixed models were excluded from the further process of model selection since the estimated variances of random effects for an additional inventory date level nested within plot level were extremely low, and hence irrelevant, for all 3 tree species. For all three tree species the 1-dimensional effects of the continuous covariates on the first order parameters A and B were more or less non-linear, whereas the effects of BAL showed only minor deviations from linearity (Fig. 4). Based on expert knowledge, the flexibility of the 1-dimensional splines was judged to be sufficient and the default spline basis dimension of k = 10 was not increased. In modeling using scam, an assessment of the unrestricted model effects is part of the model building process, as a decision must be made as to what degree plausible model effects should be forced by imposing restrictions. Since h-d curves are fitted, the model effects have to be assessed with regard to their effect on tree height over and above their effect on diameter growth.
For all three tree species the effect of qmD on the first order parameter A decreases in the range of large values (Fig. 4). This was judged to be implausible since, for spruce from a qmD of ca. 25 cm and for pine and birch from a qmD of ca. 40 cm onwards, a decreasing level of the height curve with increasing qmD would be predicted (Fig. 4). It is assumed that the cause of this frequently occurring pattern is that the share of unfavourable sites is much higher in stands in advanced development stages than in younger stands, because such sites have on average poorer access, lower management intensity and a lower timber felling rate. It can also be assumed that, because of lower tree heights, unfavourable sites are less vulnerable to storm damage and that their share will therefore increase with advancing stand development stage.
The effects of BAL, Alt and the different categories of SD on the first order parameters seem plausible. All three species show monotone decreasing effects with increasing Alt (Fig. 4). Under Norwegian growth conditions it can be assumed that Alt is primarily a proxy variable for temperature, which decreases with increasing Alt. Precipitation, which increases with increasing Alt, is not able to fully compensate for the limiting factor of the temperature sum. The weak gradient between 0 and 150 m Alt for all three tree species seems plausible, because it can be assumed that the growth conditions at these altitudes are relatively uniform if all other influencing factors are constant.
Fig. 5 Model effects of stand age on the first order parameters A (left) and B (right) of a non-shape-constrained gamm for Norway spruce, where stand age instead of the quadratic mean diameter has been used as the longitudinal covariate (Eq. 7.1)
All three tree species show increasing effects with increasing soil depth SD (Fig. 4), although for pine and birch between categories III and IV, and for spruce between categories II, III and IV, the differences were not significant. The ranking is plausible, because with increasing soil depth better conditions with respect to water regime and nutrient supply can be assumed. The much lower level of the SD category I sites with very shallow soils can also be judged to be plausible.
The effect of BAL is uniformly monotone increasing with a more or less weak degressive tendency, depending on the tree species, which is most pronounced for pine (Fig. 4). BAL is a simple index describing the social rank and competition pressure of a tree within a sample plot. In assessing the effect it was assumed that with increasing BAL (or increasing competition) light would become a growth-limiting factor. Thus, with increasing BAL the relation of height growth to diameter growth shifts in favour of height growth and greater tree heights are predicted if all other factors remain constant (Fig. 4). With decreasing BAL the social rank of a tree increases and lower tree heights are predicted if all other variables are equal, because dominant trees, and in the extreme case solitary trees, invest more into diameter than into height growth for stability reasons. Another way of interpreting the BAL effect is to compare trees that grow under equal site conditions but under different competition. These trees will have similar heights but different dbh as a function of growing space. Hence larger h-d ratios can be assumed in denser stands and for higher competition (Zhang and Burkhart 1997; Zeide and Vanderschaaf 2002; Calama and Montero 2004). The model effect of competition is in accordance with several investigations of the effects of stand density on the h/d ratio (Calama and Montero 2004). The choice of BAL as the competition index implies that the h-d relationship is not influenced by the harvest or death of trees which are smaller than the reference tree.
The effect of qmD on the first order parameter B is monotone increasing for spruce, with an asymptotic tendency from a qmD of ca. 45 cm onwards (Fig. 4). For pine this effect increases nearly linearly, while for birch it is monotone increasing with a degressive tendency.
The spatial trend function f_spa(east_k, north_k) captured large-scale correlated differences in the h-d relationship which were not described by the other covariates, and led to a clear improvement in model accuracy for all tree species (Fig. 4). Apart from the north-south gradients of temperature sum and length of the vegetation period, it is perhaps above all the effect of distance to the coast, and the resulting site differences, that is modelled by this effect. It can also be supposed that the effects of further causal factors, for instance large-scale geological differences, are accounted for by the regional location proxy variable. No further investigation concerning the optimal variance partition between the spatial trend function and the random plot effects was made at this point.
If stand age is used instead of the quadratic mean diameter as the longitudinal covariate (Eq. 7.1), there is a clear decrease in model accuracy for all tree species. This is illustrated here using spruce as an example (Fig. 5).
h_kti ~ Gamma(μ, ν) with dispersion parameter φ = 1/ν = σ².
with, differing from Eq. 7:
Age_kt: Non-species-specific stand age at inventory date t at sample plot k.
The effects of stand age on the first order parameters A and B are not very sensitive and display implausible patterns (Fig. 5). In this context it must be mentioned that the specification of conditions in scam should only be applied in those cases in which the patterns of the unconstrained effects seem basically plausible. The specification of conditions serves solely to suppress excessive and implausible flexibility, especially at the boundaries of the covariate data ranges. The problem of insensitive model effects and entirely implausible patterns, especially in those data ranges with many available data points, cannot be solved using the scam approach.
Hence, the subsequent integration of shape constraints in scam to ensure plausible model effects is done on the basis of the gamm in which qmD (Eq. 7) instead of stand age (Eq. 7.1) is used as the longitudinal covariate. For this purpose, monotone increasing effects of qmD on the first order parameters A and B and a monotone increasing and concave effect of BAL on the first order parameter A were parameterized. The remaining model effects were included in the model without shape constraints (Eq. 8).
Table 3 Standard errors, explained deviance and AIC of shape constrained and unconstrained generalized h-d models for spruce, pine and birch in Norway
h_kti ~ Gamma(μ, ν) with dispersion parameter φ = 1/ν = σ².
with, differing from Eq. 7:
m_1a(qmD_kt): 1-dimensional monotone increasing penalised regression P-spline describing the effect of qmD on the level of the h-d relationship;
mcc_3a(BAL_kti): 1-dimensional monotone increasing and concave penalised regression P-spline describing the effect of BAL on the level of the h-d relationship;
m_1b(qmD_kt): 1-dimensional monotone increasing penalised regression P-spline describing the effect of qmD on the slope of the h-d relationship.
In addition to the forced monotone or monotone-concave patterns, clearer and mostly significant contrasts between the effects of the soil depth categories occur for all tree species as a side effect of this process. Now only the soil depth categories III and IV for spruce display no significant difference. The clearer contrasts in the effects of the soil depth categories can be seen as an indication that the database is unbalanced with respect to the combinations of qmD and SD. Only after monotonic restriction of the effect of qmD are more distinct, significant differences in the h-d relationship depicted by the causal covariate SD. The basic pattern of the unconstrained effects of Alt and of regional location is scarcely changed by the shape constraints (Fig. 6).
The prediction accuracy of the models (standard error of tree height estimation) is only slightly influenced by the specification of shape constraints (Table 3). If only the fixed effects are taken into account (mean population model), the standard error of the gamm (Eq. 7) differs only slightly from the standard error of the scam (Eq. 8). For spruce and pine the scam standard errors are even a little lower. For the purpose of comparison, additional generalized additive models (gam) were parameterized, because the prediction accuracy of the mean population model of a mixed model is normally a little lower than that of a corresponding model fitted without random effects. When compared to the scam, the prediction accuracy of the gam is only marginally higher (spruce, pine) or the same (birch).
A comparison of the prediction accuracy of the gamm and scam_m, taking both fixed and random effects into account, also shows only minimal differences. The standard errors of the gamm for spruce and pine are slightly lower, and for birch slightly higher, than those of the scam_m. Comparing the gam and scam (or gamm and scam_m) based on their explained deviance confirms again that shape constraints only result in marginal differences. The AIC values of the gam are also only slightly lower than those of the scam (Table 3). Because of the stepwise parameterization of the scam_m, a comparison of scam_m and gamm by means of AIC is not possible.
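The two comparison criteria used here can be illustrated with the standard definitions for a Gamma response. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the heights and fitted values are invented for illustration, and the null model is assumed to be an intercept-only fit.

```python
import math


def gamma_deviance(observed, fitted):
    """Gamma deviance: 2 * sum((y - mu) / mu - log(y / mu))."""
    return 2.0 * sum((y - mu) / mu - math.log(y / mu)
                     for y, mu in zip(observed, fitted))


def explained_deviance(observed, fitted):
    """Share of the null deviance (intercept-only fit) removed by the model."""
    mean_height = sum(observed) / len(observed)
    null_fit = [mean_height] * len(observed)
    return 1.0 - gamma_deviance(observed, fitted) / gamma_deviance(observed, null_fit)


def aic(log_likelihood, n_parameters):
    """Akaike information criterion; lower values indicate the preferred model."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


# Hypothetical tree heights (m) and fitted values from two competing models.
heights = [8.2, 12.5, 15.1, 19.8, 23.4, 26.0]
fit_unconstrained = [8.6, 12.0, 15.5, 19.1, 23.9, 25.6]
fit_constrained = [8.8, 12.1, 15.3, 19.0, 23.7, 25.9]

dev_expl_unconstrained = explained_deviance(heights, fit_unconstrained)
dev_expl_constrained = explained_deviance(heights, fit_constrained)
```

For penalised smooth models the parameter count in the AIC is usually replaced by the effective degrees of freedom. An AIC comparison also presupposes that both models are fitted by maximizing a single likelihood, which is why a model parameterized in separate steps cannot be compared in this way.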
Table 4 Estimated parameters for the variance components (standard deviation SD) and dispersion parameter for shape constrained and unconstrained generalized h-d models for spruce, pine and birch in Norway. Lower and upper limits of the 95% confidence intervals are given in brackets
Fig. 7 Residuals on response scale of the scam_m (Eq. 6) over 0.2 interval classes of plot-wise standardized dbh for spruce, pine and birch
Fig. 8 Residuals on logarithmic scale of the scam_m (Eq. 6) over 2 cm interval classes of dbh for spruce, pine and birch
The standard errors of height prediction applying only fixed effects are highest for pine, followed by spruce and birch (Table 3). By comparison, the reduction in standard error when using the full mixed models is about 1 m for pine, 0.7 m for spruce and 0.6 m for birch. The standard errors of the mixed models are rather similar for spruce and pine and lowest for birch. The explained deviance using only fixed effects is considerably higher for spruce than for pine and birch, whereas the values of the mixed models are similar for spruce and pine and lowest for birch.
A comparison of variance components shows that pine has the highest inter-plot variability for both first-order parameters A and B. Birch has a higher variability in A than spruce, whereas the variability in parameter B is higher for spruce than for birch (Table 4). However, the plot-level variance components estimated by Mehtätalo (2004, 2005) for the same tree species in Finland are considerably lower, even though our linear predictor is more flexible. This might be a result of the much more variable growth conditions in Norway with its mountain ranges and complex coastlines.
Based on a residual analysis, the predictions of the scam_m can be regarded as more or less unbiased (Fig. 7), which confirms the suitability of the modified Korf function as the basic model. Only for Scots pine is a very slight overestimation present for standardized dbh greater than or equal to 1, whereas the unsystematic deviations at the edges of the dbh ranges are assumed to be random because they rest on very few observations.
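A residual check of this kind amounts to averaging residuals within classes of the covariate. The sketch below does this for 2 cm dbh classes. It is an illustration with invented values, not the authors' procedure: the function name is hypothetical, and the residual is assumed to be defined as observed minus predicted height, so that a negative class mean indicates overestimation.

```python
from collections import defaultdict


def mean_residual_by_class(dbh, observed, predicted, class_width=2.0):
    """Average residual (observed - predicted) within dbh classes.

    Returns {lower class limit: (mean residual, number of trees)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, obs, pred in zip(dbh, observed, predicted):
        lower = (d // class_width) * class_width
        sums[lower] += obs - pred
        counts[lower] += 1
    return {lower: (sums[lower] / counts[lower], counts[lower])
            for lower in sorted(sums)}


# Hypothetical trees: dbh (cm), measured height (m), predicted height (m).
dbh = [10.4, 11.2, 12.9, 13.5, 20.1, 21.7]
observed = [9.0, 10.0, 11.5, 12.0, 17.0, 18.0]
predicted = [9.5, 9.5, 11.0, 12.5, 17.5, 18.5]

summary = mean_residual_by_class(dbh, observed, predicted)
```

Reporting the number of trees per class alongside the mean helps to judge whether a deviation at the edge of the dbh range rests on enough observations to be meaningful.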
The analysis of residuals on logarithmic scale indicates that the assumption of a Gamma distribution stabilizes the variance sufficiently (Fig. 8). However, this finding is in contrast to the investigations of Lappi (1997) and Mehtätalo (2004, 2005).
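Why a Gamma assumption can be expected to give roughly constant variance on the logarithmic scale follows from a first-order (delta method) approximation. The lines below are a sketch under the usual mean-shape parameterization, writing $\phi = 1/\nu$ for the dispersion parameter; they are a textbook argument, not a derivation taken from the paper.

```latex
\begin{aligned}
\operatorname{E}(h) &= \mu, \qquad
\operatorname{Var}(h) = \frac{\mu^{2}}{\nu} = \phi\,\mu^{2}, \\
\operatorname{Var}(\log h) &\approx \left(\frac{1}{\mu}\right)^{2}\operatorname{Var}(h) = \phi .
\end{aligned}
```

The variance on the response scale grows with the squared mean, while the approximate variance of the log-transformed height does not depend on $\mu$. Residuals on logarithmic scale should therefore show a similar spread across dbh classes if the assumption is adequate.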
Based on h-d models for spruce, pine and birch in Norway, a model comparison of unconstrained gamm and scam_m was made. For the h-d models it was shown that scam_m combines the flexibility of gamm with the assurance that all model effects will be plausible. Plausible model effects can be enforced by setting conditions such as monotonicity, convexity, concavity or combinations thereof. Within these constraints, the full flexibility of additive regression models is retained. As was shown in the cases studied, constrained and unconstrained effects can be combined within the same model. There were only marginal differences in the predictive accuracy of h-d models parameterized as gamm or scam_m. At the same time, the ability to take expert knowledge into account makes the scam_m models more generally applicable, especially for predictions based on external data. Because forest growth data are normally more or less unbalanced, the number of potential uses for scam_m models is large.
Abbreviations
AIC: Akaike information criterion; Alt: Altitude; BAL: Basal area larger; dbh: Diameter at breast height; east: UTM-easting coordinate; gam: Generalized additive model; gamm: Generalized additive mixed model; glmm: Generalized linear mixed model; h: Tree height; NFI: National forest inventory; north: UTM-northing coordinate; qmD: Quadratic mean diameter; scam: Shape constrained additive models; scam_m: 2-step shape constrained additive mixed models; scamm: Shape constrained additive mixed models; SD: Soil depth category
Acknowledgements
We would like to thank two anonymous reviewers for constructive comments.
Funding
This study was supported by the Norwegian Institute of Bioeconomy Research (NIBIO).
Availability of data and materials
Data are available upon request under certain constraints.
Authors' contributions
The first author (MS) conducted the data analysis and model development and wrote the first draft. JB substantially contributed to the data preparation. All authors jointly discussed the results, drew conclusions and finalized the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
The authors give their consent for publication.
Competing interests
The authors declare that they have no competing interests.
Author details
1 Northwest German Forest Research Institute, Forest Growth, Grätzelstr. 2, 37079 Göttingen, Germany. 2 Norwegian Institute for Bioeconomy Research, National Forest Inventory, Postboks 115, 1431 Ås, Norway.
Received: 2 June 2017 Accepted: 20 December 2017
References
Brezger A, Lang S (2006) Generalized structured additive regression based on Bayesian P-splines. Comput Stat Data Anal 50(4):967–991
Calama R, Montero G (2004) Interregional nonlinear height-diameter model with random coefficients for stone pine in Spain. Can J For Res 34:150–163
Corral-Rivas S, Álvarez-González JG, Crecente-Campo F, Corral-Rivas JJ (2014) Local and generalized height-diameter models with random parameters for mixed, uneven-aged forests in Northwestern Durango, Mexico. Forest Ecosystems 1:6. https://doi.org/10.1186/2197-5620-1-6
Eerikäinen K (2003) Predicting the height-diameter pattern of planted Pinus kesiya stands in Zambia and Zimbabwe. For Ecol Manag 175(1–3):355–366. https://doi.org/10.1016/S0378-1127(02)00138-X
Hökkä H (1997) Height–diameter curves with random intercepts and slopes for trees growing on drained peatlands. For Ecol Manag 97:63–72
Lappi J (1997) A longitudinal analysis of height/diameter curves. For Sci 43(4):555–570
Larsen DR, Hann DW (1987) Height-diameter equations for seventeen tree species in southwest Oregon, vol 49. Oregon State University, College of Forestry, Forest Research Laboratory, Corvallis, p 16
López Sánchez CA, Gorgoso JJ, Castedo F, Rojo A, Rodríguez R, Álvarez González JG, Sánchez Rodríguez F (2003) A height–diameter model for Pinus radiata D. Don in Galicia (Northwest Spain). Ann For Sci 60:237–245
Mehtätalo L (2004) A longitudinal height-diameter model for Norway spruce in Finland. Can J For Res 34:131–140
Mehtätalo L (2005) Height-diameter models for Scots pine and birch in Finland. Silv Fenn 39(1):55–66
Mehtätalo L, de-Miguel S, Gregoire T (2015) Modeling height-diameter curves for prediction. Can J For Res 45(7):826–837. https://doi.org/10.1139/cjfr-2015-0054
Nanos N, Calama R, Montero G, Gil L (2004) Geostatistical prediction of height/diameter models. For Ecol Manag 195(1–2):221–235
Pinheiro JC, Bates DM (2000) Mixed-effects models in S and S-plus. Springer, Berlin Heidelberg New York
Pinheiro JC, Bates DM, DebRoy S, Sarkar D, R Development Core Team (2013) nlme: linear and nonlinear mixed effects models. R package version 3, pp 1–108. https://CRAN.R-project.org/package=nlme. Accessed 12 May 2017
Pretzsch H (2009) Forest dynamics, growth and yield. Springer Verlag, Berlin
Pya N (2015) scam: shape constrained additive models. R package version 1, pp 1–9. https://cran.r-project.org/web/packages/scam/index.html. Accessed 12 May 2017
R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/. Accessed 12 May 2017
Schmidt M, Kiviste A, Gadow K (2011) A spatially explicit height-diameter model for Scots pine in Estonia. Eur J For Res 130:303–315. https://doi.org/10.1007/s10342-010-0434-8
Temesgen H, Gadow K (2004) Generalized height-diameter models: an application for major tree species in complex stands of interior British Columbia. Eur J For Res 123(1):45–51
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York
Wood SN (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. J Am Stat Assoc 99:673–686
Wood SN (2006) Generalized additive models: an introduction with R. Chapman and Hall/CRC, Florida
Wood SN (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J Royal Stat Soc (B) 73(1):3–36
Wood SN, Pya N, Säfken B (2016) Smoothing parameter and model selection for general smooth models. J Am Stat Assoc. https://doi.org/10.1080/01621459.2016.1180986
Zeide B, Vanderschaaf C (2002) The effect of density on the height-diameter relationship, Gen Tech Rep SRS-48. U.S. Department of Agriculture, Forest Service, Southern Research Station, Asheville, pp 463–466
Zhang S, Burkhart HE (1997) The influence of thinning on tree height and diameter relationships in loblolly pine plantations. South J Appl For 21:199–205