Yan Zhang,Weiping Zhang,Xiao Guo(Dept.of Statistics and Finance,University of Science and Technology of China,Anhui 230026,PR China)
EMPIRICAL LIKELIHOOD APPROACH FOR LONGITUDINAL DATA WITH MISSING VALUES AND TIME-DEPENDENT COVARIATES??
Yan Zhang,Weiping Zhang?,Xiao Guo
(Dept.of Statistics and Finance,University of Science and Technology of China,Anhui 230026,PR China)
Missing data and time-dependent covariates often arise simultaneously in longitudinal studies,and directly applying classical approaches may result in a loss of efficiency and biased estimates.To deal with this problem,we propose weighted corrected estimating equations under the missing at random mechanism,followed by developing a shrinkage empirical likelihood estimation approach for the parameters of interest when time-dependent covariates are present.Such procedure improves efficiency over generalized estimation equations approach with working independent assumption,via combining the independent estimating equations and the extracted additional information from the estimating equations that are excluded by the independence assumption. The contribution from the remaining estimating equations is weighted according to the likelihood of each equation being a consistent estimating equation and the information it carries.We show that the estimators are asymptotically normally distributed and the empirical likelihood ratio statistic and its profile counterpart follow central chi-square distributions asymptotically when evaluated at the true parameter.The practical performance of our approach is demonstrated through numerical simulations and data analysis.
empirical likelihood;estimating equations;longitudinal data; missing at random
2000 Mathematics Subject Classification 62G05
Longitudinal data frequently occur in many studies such as medical follow-up studies.A key characteristic of longitudinal data is that outcomes measured repeatedly on the same subject are typically correlated.Regression methods for such datasets accounting for within-subject correlation is abundant in the literatures[3,4]. Among which,the generalized estimating equations (GEEs) method by[12]has been widely used since it considers the mean structure and the correlation structure separately.When the marginal mean outcome given the covariates at current time is the same as that on all the past,present and future covariate values,the GEEs method assures consistency of the mean estimates even if the correlation is misspecified,and achieves efficiency if the correlation is correctly specified.
In practice,however,it is common that some covariates may vary over time in a longitudinal study,that is,some of the covariates may be time-dependent.For example,in the Mother’s Stress and Children’s Morbidity Study (MSCM,[1]) ,the daily ratings of child illness Yitand maternal stress Xitare measured during a 28-day follow-up period.Obviously,Yitand Xitare time-dependent,both of which vary over time and may correlate with the other measurements.It has been noted that the consistency of GEEs is not assured with arbitrary working correlation structures when there are time-dependent covariates[17].The reason is that the estimating functions generated by the longitudinal data are no longer unbiased under an arbitrary correlation structure unless the marginal mean outcome given the covariates at the current time is the same as that on all the past,present and future covariate values.Clearly,the use of an independence assumption guarantees the consistency of GEEs but can result in a substantial loss in efficiency due to the fact that, only a subset of all the unbiased estimating functions is used.Some methods have been developed for such situations,for example,[10]classified the time-dependent covariates into three different types based on the moment conditions that are valid to the covariates.They then introduced a generalized method of moments (GMM) to combine all available valid estimating equations optimally.In general,the basic idea is to select estimating functions by minimizing some criteria,see[9,15,24] among others.Recently,[10]proposed a shrinkage empirical likelihood (EL,[16]) approach by including all the estimating functions under the independence correlation assumption and/or those are known to be unbiased a priori,and shrinking all other estimating functions according to the likelihood of each being a biased,uninformative or informative estimating equation.Their approach avoids identifying the uninformative and biased estimating functions and allows different shrinkage parameters for different estimating functions.
Another common problem in the longitudinal data analysis is the missing data problem.For example,there were approximately 4%of dropout in the illness record during the 4 weeks in MSCM study.Statistical methods are available to handle such issue by incorporating missing data mechanisms to provide valid statistical inference,including complete-case analysis method,imputation method,inverseprobability weighted method and likelihood based method[13].Among which,the complete-case method is a straight forward way to handle missing data and has been widely used.It excludes all units in the dataset with one or more unobserved values.This can dramatically decrease the amount of data to be analyzed since a small percentage of missing values can lead to a considerable proportion of excluded units.Additionally,to be valid,it also requires data to be strictly MCAR (missing completely at random) ,that is,the probability of missingness is independent of all the covariates and observations at other measurement occasions.This is a rather strong assumption and is seldom met,especially for longitudinal data.Therefore,it often leads to biased or misleading results when directly applying the complete-case method in longitudinal data analysis.The GEEs method also encounters such embarrassment for missing data.[20]proposed a weighting method for rendering GEEs analysis correct under missing-at-random (MAR) mechanisms,that is,the probability of missingness does not depend on the unobserved data given the observed data.
Since it is very common in longitudinal study that some measurements may be missing and some of the covariates are time-dependent simultaneously,and available statistical methods are insufficient to deal with such situation,it is of great interest to jointly study the missing data and time-dependent covariates issues.In this paper, we propose a shrinkage empirical likelihood (EL) approach to estimate the regression parameters under the MAR mechanism and the existence of time-dependent covariates at the same time.The empirical likelihood approaches have been proved to be advantageous in combing multiple sources of information and flexibly formulating models when dealing with missing data,see,for example,[2,19,22,26].Recently, [14]proposed an EL approach to construct the confidence intervals for the parameters of interest in linear regression models under nonignorable missing mechanism. When some covariates are missing at random,[7]introduced an augmented inverse probability weighted-type empirical likelihood ratio for the parameters of interest for single index model.To deal with missing data,we first multiply impute the missing values by the inverse probability weighting method similar to[22],which was shown to be a key to achieve semiparametric efficiency in their case.Then the estimating equations are partitioned into two groups.The first group contains those unbiased estimating functions and are always used,the second group consists of all other estimating functions which need to be appropriately shrank by the shrinkage parameters.The shrinkage parameters are given according to the likelihood of each estimating equation being unbiased.Finally the EL approach is employed by combining these two groups of estimating functions to gain efficiency.
The rest of this paper is organized as follows.Section 2 describes the models andthe missing mechanism.We describe the details of the shrinkage empirical likelihood estimation approach in Section 3.Numerical studies and real data analysis are presented in Sections 4 and 5 respectively.All technical proofs are relegated to the appendix.
Consider a longitudinal study with data set
The probability p (xit) is called the propensity score or selection probability function.
Let the marginal mean outcome at the tth time point for the ith observation be
where g is a known monotone link function and β= (β1,···,βp)τis a p-vector of unknown parameters.Let Vibe an estimate of var (yi) or a“working”covariance matrix and,T be the (s,t) th entry of.Then the GEEs estimate of β is obtained by solving the following equations
whereμi= (μi1,···,μiT)τ.Under the assumption of (asymptotic) unbiasedness of the estimating equations,[3]showed that the estimate of β by GEEs is nearly efficient relative to the maximum likelihood estimates of β in many practical situations,provided that var (yi) has been reasonably approximated.However, when time-dependent covariates are present,the unbiasedness assumption might not hold for arbitrary working correlation structure and consequently the GEEs estimate of β is not necessarily consistent[17].Note that we do not assume thatwhich means that the conditional mean outcome given the covariates at current time is the same as that on all the past, present and future covariate values.Although such condition guarantees unbiasedness of GEEs analysis,it is too strict in many situations.For example in MSCM study,captures the association of the current daily rating of child illness and maternal stress,butincludes,among others,the association between maternal stress history and current daily ratings of child illness.So even though we may have a marginal model for the outcome and current maternal stress,the influence from historical maternal stress could render the GEEs analysis biased.
In case of missing data occurring in some estimating equations under working independence,we use the inverse probability weighted estimating equations approach similar to[23]and[22]to get a weighted estimating function with multiple imputations:
Since we have a total of pT2estimating functions available,selecting the appropriate estimating functions for parameter estimation becomes a key to produce consistent and more efficient estimates than the class of usual GEEs estimators.To improve efficiency over a GEEs estimate with an independence correlation assumption,we introduce an EL with moment shrinkage approach in next section,as EL allows the easy incorporation of additional information and the number of estimating equations can be greater than the number of parameters.
Under the independence correlation assumption,a subject-wise empirical likelihood for the regression parameters β is
Similar to[25]and[22],we can easily have the following result.
Theorem 1 Under Conditions 1-5 in the appendix,as n→∞and nl→∞,converges in distribution to a normal random vector with mean zero and variancewhere
To extract additional information from the estimating equations that are excluded by the independence assumption,and to improve efficiency over a GEEs estimate and subject-wise EL estimate with independence correlation assumption, denote bythe main estimating functions which are always selected and are known a priori to be unbiased,and other estimating functions are denoted as auxiliary estimating functions,We introduce a vector γ of shrinkage parameters withdimensions;each element of γ is a real number in[0,1]that depends on the data.LetIdeally, the shrinkage parameters should down weight those biased or un-informative estimating functions in SA.For a particular choice of γ,we have estimating functionsapplied tois similarly defined.Thus,consider the maximizerof the following empirical likelihood ratio function
In application,we need to choose appropriate γ.Without loss of generality,let the estimating functions inLet each element ofvector c,,take 1 if the corresponding estimating functionis deemed unbiased by the testing procedure of[10]and be 0 otherwise.Then the shrinkage parameteris defined as
Let c0and γ0be the true parameters of c and γ,respectively.Ifis truly unbiased,we haveotherwise.Defineand letbe the estimator of γ by (14) .We can obtain the following result similar to Theorem 1.
Theorem 2 Under Conditions 1-5 in the appendix,asconverges in distribution to a normal random vector with mean zero and varianceevaluated at β0,where
Because of unknown nuisance parameters in the limiting variances,it is difficult to make statistical inference using the above theorem.For this purpose,we consider the properties of the empirical likelihood ratios induced from L2(β) .Let R (β0) =We have the following theorem.
Theorem 3 Under Conditions 1-5,as n→∞and nl→∞,R (β0) converges in distribution to a chi-square random variable with p degrees of freedom.
The profile empirical likelihood ratio can be used to test a subvector of parameters when existing nuisance parameters.dimensional subvector of β.To test H0:β1=β01,denote the profile empirical likelihood ratiois the maximizer under the null hypothesis.We have the following theorem for the profile empirical likelihood ratio statistic.
Theorem 4 Under Conditions 1–5,asconverges in distribution to a chi-square random variable with q degrees of freedom.
The central chi-square distributions in Theorems 3 and 4 are convenient for hypotheses testing and constructing confidence regions for β and its components.
In this section,we present simulation studies to compare the finite sample properties of our proposed method with the other two alternative methods in terms of the bias and the root mean squared error (RMSE) .The three methods are:
1.GEEs using an independence working correlation (GEE1) ,where the missing data yitis generated from the conditional distribution F (y|xit) ;
2.Subject-wise empirical likelihood method in (11) using an independence working correlation (EL1,EL with no shrinkage) ;
3.Using the empirical likelihood approach in (13) ,with the vector of shrinkage parameters (EL2,EL with shrinkage) .
The data are generated by two different models under four different percentages of missing.We consider sample sizes n=150 and 300,and T=6 throughout the simulation.The number of replications is 1000 for each situation.The following two models are used by many other authors,see[10,27,11]for example.We are mainly interested in the regression coefficients ofin each case.
Model 1 A TypeⅠⅠtime-dependent covariate.The data generating process is
Model 2 Three TypeⅠⅠⅠtime-dependent covariates.The data generating process is
We use product Gaussian kernels with possible bandwidth when computing (6) , (7) and (8) .It is well known that selecting a suitable bandwidth is a critical issue in nonparametric or semiparametric inferences.The classical optimal rate for the bandwidth is h=O (n?1/5) ,see[21].However,as[27]pointed out,the optimal rate h=O (n?1/5) is not allowed here since we require nh2m→ 0 for the mth order kernel in Condition 5.Hence,we use a simple bandwidthin model 1in model 2 respectively.Set nl=20 in (9) and (10) .
The simulation results are tabulated in the following two tables.We only give results for the time-dependent slope parameter in each model as the intercept is a time-independent covariate and is seldom of interest.Table 1 corresponds to the situation where there is a single Type II time-dependent covariate,and shows that all estimators are approximately unbiased for n=150 and 300,and our approach outperforms the other two methods especially in the case with higher missing percentage in the response.
Table 1:Bias and RMSE for estimates of β1=ζ1+ζ2ρ using three methods under Model 1
Table 2 corresponds to the situation with three Type III time-dependent covariates.Obviously,our approach performs better than the other two methods especially in cases with relatively higher missing percentages.Both simulation results under these two models show that our method works much better than GEE1 and EL1 which simply use an independence working correlation,especially under the situations with relatively higher missing percentage.
Table 2:Bias and RMSE for estimatesusing three methods under Model 2
Table 2:Bias and RMSE for estimatesusing three methods under Model 2
?
Figure 1 shows the plots of the empirical quantiles of empirical likelihood ratios for EL2 evaluated at the true parameters versus the theoretical quantiles offor model 1 andfor model 2,respectively.Both plots indicate the validity of Theorems 3 and 4.
Figure 1:Quantile-Quantile plot ofrelative todistribution withbeing the true parameters of the time-dependent covariates,where the missing percentage is aboutthe empirical quantiles of EL2 relative toin Model 1; (b) the empirical quantiles of EL2 relative toin Model 2.
We apply the methods in the previous section to the data from the Mothers’Stress and Children’s Morbidity Study (MSCM,[1]) .This study enrolled n=167 preschool children between the ages of 18 months and 5 years that attended an inner-city paediatric clinic.During 4 weeks of follow-up daily measures of maternal stress and child illness were recorded.Additional baseline covariates including child race,maternal educational attainment,marital status,and household size were also available (see[3]for example) .Time-dependent measures for household i included daily ratings of child illness Yitand maternal stress Xitduring a 28-day follow-up period t=1,2,···,28.Here we only focus on maternal educational attainment among all the other time-independent covariates to examine the effect of mothers’stress on paediatric care utilization.
In this study,the child illness is missing at random during the 4 weeks of follow-up daily record for each child.We first divide the 28-day follow-up period into 4 weeks. Hence Yitdenotes the number of days that the ith child was ill in the tth week,and is missing if some daily records in that week are missing.Data with missing covariates are excluded.There were approximately 4%of dropout in the illness record during the 4 weeks.Similarly,Xitdenotes the number of days the ith child’s mother felt stressful in the tth week and there is no missingness in Xit,t=1,···,4.Since we are interested in a time-dependent covariate,stress and a time-independent covariate, education,that is,β= (β1,β2,β3)τ,where β1is the intercept,β2and β3are the coefficients for stress and education,respectively.We apply the three methods in the previous section to this dataset.The estimates︿β are (0.7860,0.2765,?0.0390)τ,and (0.7813,0.3195,?0.0423)τand (0.7960,0.3180,?0.0477)τby EL2,EL1 and GEE1 respectively.These estimates are very close as they are all consistent estimates.
Table 3:Bias (RMSE) for the timedependent covariate
We use the bootstrap method to obtain the bias of the estimator.Table 3 shows the bias and the root mean squared error (RMSE) of estimator for each method based on 500 bootstrap sampling.It can be seen that EL2 outperforms GEE and EL1.For model testing H0:β=0,the p-value is smaller than 1%by Theorem 3,that is,the model is significant.Using Theorem 4,we can test H0:βi=0 for i=1,2,3,the p-values are 4.51×10?5,1.96×10?6and 0.614 for i=1,2,3respectively.While the p-value for testing H0:β2=β3=0 is 9.89×10?6,we can conclude that the covariate stress is significant while education is nonsignificant.
In this paper,we have proposed a shrinkage empirical likelihood estimate for the parameters of interests when the response might be missing at random and the covariates may vary at time.After imputing the missing values,the estimating equations that are known a priori to be unbiased are always selected in the estimation procedure,and the contribution of other estimating functions is controlled by the shrinkage parameters.The shrinkage parameters put the appropriate weights on the additional information from the estimating equations that are excluded by the independence assumption,and therefore we obtain guaranteed improved efficiency over classical generalized estimating equations approach and subject-wise empirical likelihood approach.
Acknowledgements We would like to thank the editor and two referees for all the constructive comments and detailed suggestions,which lead to a substantially improvement of our paper.
[1]C.S.Alexander and R.Markowitz,Maternal employment and use of pediatric clinic services,Medical Care,24 (1986) ,134-147.
[2]S.X.Chen and P.Zhong,ANOVA for longitudinal data with missing values,The Annals of Statistics,38 (2010) ,3630-3659.
[3]P.J.Diggle,P.J.Heagerty,K.Y.Liang and S.L.Zeger,Analysis of longitudinal data, 2nd ed.,Oxford University Press,New York,2002.
[4]G.M.Fitzmaurice,M.Davidian,G.Verbeke and G.Molenberghs,Longitudinal Data analysis,Boca Raton,Florida:Chapman&Hall/CRC,2008.
[5]A.N.Glynn and K.M.Quinn,An introduction to the augmented inverse propensity weighted estimator,Political Analysis,18 (2010) ,36-56.
[6]X.Guo,T.Wang,W.Xu and L.Zhu,Dimension reduction with missing response at random,Computational Statistics&Data Analysis,69 (2014) ,228-242.
[7]X.Guo,C.Niu,Y.Yang and W.Xu,Empirical likelihood for single index model with missing covariates at random,Statistics,49 (2015) ,588-601.
[8]N.M.Laird,Missing data in longitudinal studies,Statics in Medicine,7 (1988) ,305-315.
[9]T.Lai,D.Small and J.Liu,Statistical inference in dynamic panel data models,Journal of Statistical Planning andⅠnference,138 (2008) ,2763-2776.
[10]T.L.Lai and D.Small,Marginal regression analysis of longitudinal data with timedependent covariates:A generalized method-of-moments approach,Journal of the Royal Statistical Society Series B,69 (2007) ,79-99.
[11]D.H.Leung,D.S.Small,J.Qin and M.Zhu,Shrinkage empirical likelihood estimator in longitudinal analysis with time-dependent covariates–an application to modeling the health of filipino children,Biometrics,69 (2013) ,624-632..
[12]K.Y.Liang and S.L.Zeger,Longitudinal data analysis using generalized linear models, Biometrika,73 (1986) ,13-22.
[13]R.J.A.Little and D.B.Rubin,Statistical Analysis with Missing Data,2nd ed.,Wiley, 2002.
[14]C.Niu,X.Guo,W.Xu and L.Zhu,Empirical likelihood inference in linear regression with nonignorable missing response,Computational Statistics&Data Analysis, 79 (2014) ,91-112.
[15]R.Okui,Instrumental variable estimation in the presence of many moment conditions, Journal of Econometrics,165 (2011) ,70-86.
[16]A.B.Owen,Empirical Likelihood,Chapman and Hall-CRC,New York,2001.
[17]M.S.Pepe and G.L.Anderson,A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data,Communications in Statistics,Simulations and Computation,23 (1994) ,939-951.
[18]J.Qin and J.Lawless,Empirical likelihood and general estimating equations,The Annals of Statistics,22 (1994) ,300-325.
[19]J.Qin,B.Zhang and D.H.Y.Leung,Empirical likelihood in missing data problems, Journal of the American Statistical Association,104 (2009) ,1492-503.
[20]J.M.Robins,A.Rotnitzky and L.P.Zhao,Analysis of semiparametric regression models for repeated outcomes in the presence of missing data,Journal of the American Statistical Association,90 (1995) ,106-121.
[21]J.H.Sepanski,R.Knickerbocker and R.J.Carroll,A semiparametric correction for attenuation,Journal of the American Statistical Association,89 (1994) ,1366-1373.
[22]C.Y.Tang and Y.Qin,An efficient empirical likelihood approach for estimating equations with missing data,Biometrika,99 (2012) ,1001-1007.
[23]D.Wang and S.X.Chen,Empirical likelihood for estimating equations with missing values,The Annals of Statistics,37 (2009) ,490-517.
[24]L.Wang and A.Qu,Consistent model selection and datadriven smooth tests for longitudinal data in the estimating equations approach,Journal of the Royal Statistical Society Series B,71 (2009) ,177-190.
[25]S.Wang,L.Qian and R.J.Carroll,Generalized empirical likelihood methods for analyzing longitudinal data,Biometrika,97 (2010) ,79-93.
[26]Q.Wang and J.N.K.Rao,Empirical likelihood-based inference under imputation for missing response data,The Annals of Statistics,30 (2002) ,896-924.
[27]Y.Zhou,A.T.K.Wan and X.J.Wang,Estimating equations inference with missing data,Journal of the American Statistical Association,103 (2008) ,1187-1199.
This supplement contains necessary regularity conditions and proofs of the lemmas and Theorems 2-4.The proof of Theorem 1 is similar to that of Theorem 1 in [22]Therefore we focus on the proofs of Theorems 2-4.
In the following conditions,lemmas and theorems,we are interested in these estimating equations:
respectively,where
Before stating the lemmas,we impose the following regularity conditions.For simplicity,we denote all the estimating functions as a generic term g (Y,X;β) ,which includes S (Y,X;β) ,SM(Y,X;β) and SA(Y,X;β) .Let f (x) be the probabilitydensity function ofwhich therefore includes
Condition 2 The true parameter value β0is the unique solution to=0,and all the estimating functions denoted by g (y,x;β0) have bounded qth order partial derivatives with respect to x,
Condition 4 The matrices Γ1and Γ2are positive definite andhas full column rank.
Condition 5 The functions W,K and Koare mth order kernels when p>4.are bandwidths for (6) , (7) and (8) respectively.
For simplicity,we use C to denote a generic positive constant independent of n in the proofs.We present the proofs of Theorems 2-4 as consequences of the following lemmas.
hence,we have the estimator of γ.Let c0and γ0be the true parameters of c and γ,respectively.Ifis truly unbiased,we haveotherwise.Let
Lemma 1 Suppose Conditions 1-5 are satisfied,then aswe have for
Following Lemma 2 in[22],we have
Therefore (A.5) is proved by noting that
Lemma 2 Under Conditions 1-5,as
in distribution and
Proof Since
Now we prove (A.10) ,
Following Lemma 2 in[22],we have
(A.10) can be concluded from (A.11) to (A.13) .This completes the proof of Lemma 2.
Lemma 3 Under Conditions 1-5,as n→∞and nl→∞,
in distribution and
where Γ2is given in (A.2) .
Proof Following the approach of Lemma 1 in[22],we can easily show that
in distribution and
where ΓMis defined in (A.3) .Note that
The lemma can then be proved by directly following the approach of Lemma 1 in [22].
Given a particular choice of γ,we denote a profile log-EL of
Differentiating (A.20) with respect to (β,λ) ,we obtain the empirical likelihood equation
which leads to the maximum EL estimatesof L2(β) in (A.19) .
Lemma 4 Suppose Conditions 1-5 are satisfied,then aswith the proper choice ofdepending on the data by cross-validation,which therefore gives less weights to those estimating functions that are likely to be biased,and with probability tending to 1,the likelihood equation (A.20) has a solutionwithin the open balland L2(β) attains its local maximum at
Proof As a consequence of Lemma 3 and the proof of Lemma 1 in[18],we can establish this lemma for the existence of a local maximizer of L2(β) .
Based on the above conditions and lemmas,we are dedicated to prove Theorems 2-4.
Proof of Theorem 2
Proof The results can follow directly from the proof of Theorem 1 in[22].First, we differentiate lγ (β,λ) with respect to (β,λ) in (A.20) ,let
Inverting a Taylor’s expansion on (A.22) near (β0,0)τ,we show that
in distribution.This completes the proof of Theorem 2.
Proofs of Theorems 3 and 4
Proof By expanding the logarithm function in (A.20) at 1 for β and β0respectively,we establish the following expansion of the empirical likelihood ratio:
and therefore
Theorem 3 follows from (A.25) by applying Lemma 3.Noting thatand
Similar to the proof of (A.26) ,we have
(edited by Mengxin He)
?This work was supported by the NNSF of China (No.11271347) and the Fundamental Research Funds for the Central Universities.
?Manuscript September 25,2015;Revised March 31,2016
?.E-mail:zwp@ustc.edu.cn
Annals of Applied Mathematics2016年2期