Selected Papers from the International Forum on Statistics

2011-09-08 09:50:30

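Statistics & Information Forum, 2011, Issue 1

Keywords: census; model

On July 10, 2010, the School of Statistics of Renmin University of China and partner institutions jointly held the Fourth International Forum on Statistics together with the Fifth Chinese Academy of Sciences International Forum on Frontiers of Statistical Science. To promote and strengthen international exchange in statistics, to foster collaborative research among statisticians worldwide, particularly between China and the United States, and to build an academic exchange platform in line with international practice, Renmin University of China has held an international statistics forum every two years since 2004. Four have been held so far.

The invited speakers included Peter Bickel (member of the US National Academy of Sciences, University of California, Berkeley); Lawrence D. Brown (member of the US National Academy of Sciences, University of Pennsylvania); Stephen E. Fienberg (member of the US National Academy of Sciences, winner of the 1982 COPSS Presidents' Award, Carnegie Mellon University); Peter G. Hall (Fellow of the Royal Society, winner of the 1989 COPSS Presidents' Award, University of Melbourne, Australia); Zhi-Ming Ma (member of the Chinese Academy of Sciences); Lawrence Shepp (member of the US National Academy of Sciences, University of Pennsylvania); David O. Siegmund (member of the US National Academy of Sciences, Stanford University); Michael S. Waterman (member of the US National Academy of Sciences, University of Southern California); Wing-Hung Wong (member of the US National Academy of Sciences, winner of the 1993 COPSS Presidents' Award, Stanford University); Jianqing Fan (winner of the 2000 COPSS Presidents' Award, Princeton University); Jun Liu (winner of the 2002 COPSS Presidents' Award, Harvard University); Xihong Lin (winner of the 2006 COPSS Presidents' Award, Harvard University); and Tony Cai (winner of the 2008 COPSS Presidents' Award, University of Pennsylvania). In all, more than 200 academicians, COPSS Presidents' Award winners and experts with a background in statistics, from China and abroad, attended and gave keynote talks. Over three days, more than 350 experts and scholars from China, the United States, Australia and other countries presented and discussed more than 200 pieces of statistical research. This work showed the frontier of statistical theory and the breadth and depth of applied research, and highlighted the wide scope for applying statistics in socio-economic fields, finance and insurance, and biomedicine.

At the request of many participants and readers, and with the support of the editorial office of Statistics & Information Forum, we are publishing this selection of papers from the International Forum on Statistics. The organizing committee compiled and translated the talk abstracts of nine academicians as the first part of this special issue. Some invited overseas speakers submitted extended abstracts of their talks, 20 in total, and from all the extended abstracts submitted by invited speakers from mainland China we selected 30 for publication.

Column editor: Yuan Wei

Academician Reports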

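Peter J. Bickel

Department of Statistics, University of California, Berkeley

Some Examples of Applications of Statistical Inference in Genomics

We discuss two problems arising from our participation in the analysis working group of ENCODE (the Encyclopedia of DNA Elements), an international consortium dedicated to annotating the functional elements of the human genome. Our approaches to these problems have been implemented and used by the consortium and by others. The two problems are: assessing when two genomic features are independent of each other; and assessing the reliability of the peak callers used on signal tracks, using biological replicates to find the point at which the signal turns into noise. (Chinese translation by Xu Wangli)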

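Lawrence D. Brown

The Wharton School, University of Pennsylvania

Valid Statistical Inference after Model Selection

Conventional statistical inference requires that a specific model for how the data were generated be assumed before the data are analyzed. In applications, however, a preferred model is often chosen by one of many selection algorithms, and statistical tests and confidence intervals are then computed for it. Such practices are misguided. The parameters being estimated depend on the model, and the sampling distribution under the selected model can have many unexpected properties. These properties differ greatly from those obtained under the conventional assumptions, so confidence intervals and tests do not perform as intended. This is especially true when the selection process itself is varied and not well understood. We study the commonly used Gaussian linear model. Besides the problems latent in post-model-selection inference, we present a procedure for valid inference on post-model-selection parameters that does not depend on knowledge of the selection procedure. We also present the performance of this procedure for certain special linear model settings, and its asymptotic properties when the parameter is high-dimensional.

(Chinese translation by Xu Wangli)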
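The coverage failure described in this abstract can be illustrated with a small simulation. The sketch below is a toy case, not the procedure presented in the talk. It assumes an orthogonal design with `k` standardized coefficient estimates whose true values are all zero, and a hypothetical selection rule that keeps the most significant one. In this setting the naive 95% interval covers the truth with probability 0.95^10, about 0.60, while a widened constant `K` (derived here only for this toy case) restores 95% coverage whatever the selection rule.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n_rep, k = 20000, 10

# z[r, j]: standardized estimate of coefficient j in replication r.
# Assumption for illustration: every true coefficient is zero.
z = rng.standard_normal((n_rep, k))

# Hypothetical selection rule: keep the coefficient with the largest |z|.
selected = np.abs(z).max(axis=1)

# Naive 95% interval for the selected coefficient (estimate +/- 1.96)
# covers the true value 0 only when the selected |z| is below 1.96.
naive_coverage = np.mean(selected < 1.96)

# A constant that is valid for any selection rule in this orthogonal toy case:
# solve (2 * Phi(K) - 1) ** k = 0.95 for K.
K = NormalDist().inv_cdf((1 + 0.95 ** (1 / k)) / 2)
adjusted_coverage = np.mean(selected < K)

print(f"naive coverage of the selected coefficient: {naive_coverage:.3f}")
print(f"widened constant K = {K:.2f}, coverage: {adjusted_coverage:.3f}")
```

The widened constant trades interval length for validity; how conservative it is in non-orthogonal designs is not something this sketch addresses.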

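Stephen E. Fienberg

Department of Statistics, Carnegie Mellon University

Longitudinal Mixed Membership Models for Survey Data on Disability

By combining features of a cross-sectional grade of membership model and a longitudinal multivariate latent trajectory model, we develop a new model for analyzing longitudinal data. These models assume the existence of a small number of typical or extreme classes of individuals and model how they change over time. Each individual is treated as a convex weighted combination of the extreme classes, and so belongs to all of them to different degrees. In this way we can describe the salient general tendencies (the extreme cases) while accounting for individual variability. We give a fully Bayesian specification, with estimation based on Markov chain Monte Carlo sampling. We apply the method to the National Long Term Care Survey (NLTCS), a longitudinal survey with six waves that assesses the state and characteristics of disability among US citizens aged 65 and over. A simple extension of the method can answer questions about how disability changes from one generation to the next. (Chinese translation by Xu Wangli)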

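Peter Hall

Department of Mathematics and Statistics, University of Melbourne, Australia; Department of Statistics, University of California

Clustering High-Dimensional Data Using Evidence of Multimodality

Most "nonparametric" methods for clustering multivariate data are based on grouping and classification methods that use a distance or dissimilarity measure. Hastie et al. discuss and compare different methods. K-means clustering, for example, compares data vectors by squared Euclidean distance and forms clusters so as to minimize the distance to cluster centres. However, when the dimension is very high relative to the sample size, many components may carry information that cannot be distinguished from noise. In that case, treating every component equally through Euclidean distance brings in many noise components, and these mask the important clustering information that is actually available in a much smaller number of other components. These considerations motivate us to address variable or feature selection before applying a clustering method. However, most variable selectors for high-dimensional data assume that a response variable Y is measured together with the explanatory variables X, for example the Lasso of Tibshirani (1996), the nonnegative garrote of Breiman (2001), the Dantzig selector of Candes and Tao (2007), the sure independence screening of Fan and Lv (2008), and the related methods of Donoho and Huo (2001), Fan and Li (2001), Donoho and Elad (2003) and Tropp (2005).

The talk considers variable selection for ultra-high-dimensional, small-sample data in which no response is observed and only the explanatory variables are available. Tests for multimodality include the bandwidth test of Silverman (1981) and the dip test of Hartigan and Hartigan (1985), which in the one-dimensional case (the case relevant to our work) is equivalent to the excess mass method of Müller and Sawitzki (1991). The bandwidth test deserves study, not only because it is popular but, more importantly, because it is particularly susceptible to outliers in "marginal datasets" (that is, datasets formed from a single vector component), so it tends to select many spurious components. For these reasons we prefer the dip test. Clustering based on nonparametric tests for multimodality has many advantages: compared with other methods it is easy to interpret, and it is not affected by goodness-of-fit problems in parametric models. First, it is competitive with other nonparametric clustering methods; second, it is more powerful than model-based methods. Frequentist approaches based on mixture models and related methods are nevertheless more common, including the extensive work of McLachlan et al. (2002, 2004), McLachlan and Chang (2004) and Datta and Datta (2006). (Chinese translation by Tian Maozai)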

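Zhi-Ming Ma

Academy of Mathematics and Systems Science, Chinese Academy of Sciences

A Problem in Probability and Statistics Arising from the Internet and Information Retrieval

The first half of the talk introduces some important developments in web search engines, especially PageRank, with emphasis on results from the research team I lead: the limit of PageRank, comparison of different irreducible Markov chains, N-step PageRank and others. To address some weaknesses of PageRank, the talk also describes in detail another search algorithm proposed by our team, BrowseRank, including its principle, computation and data analysis. The second half introduces the main probability and statistics problems that arise in the design and analysis of algorithms for Internet information retrieval, that is, search engines, including browsing processes and two-layer statistical learning. It focuses on the team's results: a new type of Markov skeleton process, a page-importance ranking algorithm based on that process, and two-layer statistical learning with its application to web retrieval. Finally, some problems for future research and likely challenges are raised.

Because the Internet is so widely used, vast numbers of online searches take place every day. Improving the efficiency of search engine algorithms is a very important problem and a very active research area. Randomness is clearly present when people browse web pages and search for information. If the design of a search algorithm takes this randomness into account and describes it reasonably, the algorithm can be improved considerably. How to describe this randomness, and how to bring the ideas and methods of probability and statistics to bear on it, is therefore a meaningful research direction. Besides studying the classical PageRank algorithm with existing probabilistic and statistical methods, and more importantly, we analyzed online browsing and retrieval behaviour in depth, introduced ideas and methods from probability and statistics, and proposed a new search algorithm, BrowseRank, which has attracted attention. We also gave a stochastic model of online browsing and retrieval behaviour, a new Markov skeleton process, and based on it proposed an algorithm for ranking the importance of web pages. In addition, we brought the ideas of statistical learning into the study of search algorithms and proposed the two-layer statistical learning method. Our conclusion is that the theory and methods of probability and statistics will play an increasingly important role in research on search engine algorithms and that, conversely, research on web information retrieval provides probability and statistics with more and more interesting and challenging problems. (Chinese translation by Zhang Jingxiao)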
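As background for the classical PageRank algorithm mentioned above, here is a minimal power-iteration sketch. It is the textbook random-surfer formulation, not the team's N-step PageRank or BrowseRank. The damping factor 0.85, the uniform jump from dangling pages, and the four-page `links` matrix are assumptions chosen for illustration.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Stationary distribution of the random-surfer Markov chain.

    adj[i, j] = 1 if page i links to page j.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    # Row-stochastic link matrix; pages without out-links jump uniformly.
    P = np.where(out_deg[:, None] > 0,
                 adj / np.maximum(out_deg, 1)[:, None],
                 1.0 / n)
    # With probability 1 - damping the surfer teleports to a uniformly random page,
    # which makes the chain irreducible and aperiodic.
    G = damping * P + (1 - damping) / n
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = r @ G
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Hypothetical four-page web: pages 0, 1 and 3 all link to page 2.
links = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 0],
                  [1, 0, 0, 0],
                  [0, 0, 1, 0]])
ranks = pagerank(links)
print(np.round(ranks, 4))
```

The ranking depends only on the link graph, which is the weakness that behaviour-based alternatives such as BrowseRank are said above to address.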

Larry Shepp

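The Wharton School, University of Pennsylvania

Statistical Thinking We Should Avoid

The talk has two parts. The first is not very controversial: anyone who tries to show that two time series are related by showing that the correlation coefficient of their partial sums is nonzero is either fooling people or incompetent. Doing so, like the conclusion that "carbon consumption causes global warming", is pure abuse of statistics. The second part is more controversial: I want to comment on a current trend in statistics, which I call "the blind use of regression to train data-mining algorithms". In this part I may be lecturing the experts.

Inferring dependence from a nonzero correlation coefficient is a piece of statistical thinking we should not hold. Spurious correlation is an example: for two independent zero-mean random sequences {X_t} and {Y_t}, the partial sums of each will still appear correlated with high probability. Intuitively, once the partial sums are away from zero they tend to stay on one side of zero for a long time, because the probability of crossing from positive to negative values (or the reverse) is smaller than that of staying where they are, so the two partial-sum sequences easily show positive or negative correlation. The talk explains the mathematical details through the limiting distribution of this correlation coefficient.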
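The spurious-correlation effect is easy to reproduce numerically. The sketch below assumes Gaussian white noise for both sequences and an arbitrary cut-off of 0.3 for a "large" correlation. It compares the sample correlation of the raw independent sequences with that of their partial sums. It illustrates the phenomenon only and says nothing about the limiting distribution derived in the talk.

```python
import numpy as np

def raw_correlation(n_steps, rng):
    """Sample correlation of two independent zero-mean noise sequences."""
    x = rng.standard_normal(n_steps)
    y = rng.standard_normal(n_steps)
    return np.corrcoef(x, y)[0, 1]

def partial_sum_correlation(n_steps, rng):
    """Sample correlation of the partial sums of two independent sequences."""
    x = rng.standard_normal(n_steps)
    y = rng.standard_normal(n_steps)
    return np.corrcoef(np.cumsum(x), np.cumsum(y))[0, 1]

rng = np.random.default_rng(0)
n_steps, n_rep = 1000, 500
noise_corr = np.array([raw_correlation(n_steps, rng) for _ in range(n_rep)])
walk_corr = np.array([partial_sum_correlation(n_steps, rng) for _ in range(n_rep)])

# Fraction of replications in which the correlation looks "large".
frac_noise = np.mean(np.abs(noise_corr) > 0.3)
frac_walk = np.mean(np.abs(walk_corr) > 0.3)
print(f"|r| > 0.3 for raw sequences: {frac_noise:.2f}")
print(f"|r| > 0.3 for partial sums:  {frac_walk:.2f}")
```

The raw sequences essentially never show a large correlation at this length, whereas the partial sums do so in a substantial share of replications even though the underlying sequences are independent.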

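Many people use one favourite regression method for every data-analysis problem. We also see many statisticians who set out to solve a problem but gradually sink into the mire of their own methods and forget entirely what the problem was. Some insist on L1 regression (the LASSO), others on L2 regression (ridge regression), although these choices have little to do with the problem to be solved. Thinking about statistics in this way isolates it from real problems. With a little "physics" (common sense or practical experience), our problem might well have a much better solution. In the 1960s John Tukey and his followers introduced exploratory data analysis into statistics, perhaps as a rebellion against the purely mathematical modelling that prevailed at the time. His starting point was right, but I think he went a little too far down this road, and the idea of "letting the data speak" even went astray. Some problems seem to call for a purely data-driven method, and data mining is exactly that (only the data speak), but I think Tukey would not have wanted us to be misled by what the data appear to show. I believe the solution to a specific problem should depend, more or less, on the problem itself, whereas neural networks and training theory depend only on the data. Clearly, isolating statistics hinders cross-fertilization with other disciplines such as mathematics, physics, chemistry and biology (data mining seems to deal only with computer science).

To illustrate the limitations of statistical methods, I take the simplest example: deciding whether the animal in a picture is a cat or a dog. A three-year-old can do it, but for computers and statistical methods it is hard at every step. In Iran I once challenged a group of top pattern-recognition experts with this cat-and-dog problem. They got the error rate down to about 10%, which looks good but is far worse than a three-year-old. My view is that we should put more effort into extracting useful features of cats and dogs (rather than busily training neural network models). If people looked for better ideas based on the specific problem, statistics would become more and more interesting, but in fact people keep looking for one regression method that solves every problem at once.

I end with a classic case from Bell Labs 20 years ago. Automatic recognition of ZIP codes (reading the digits from scanned images of envelopes) is a classic problem, and the well-known statistician Trevor Hastie kept lowering the error rate on it with regression methods (to about 2% to 3%). He later invited an engineer, Patrice Simaud, to give us a talk, and this engineer presented a method that brought the error rate down to almost 0%. How did he do it? The answer is again "physics" (considering the physical background of the problem and applying various transformations to the images). If we use neural networks blindly, this error rate will be hard to improve, yet to this day even Patrice Simaud himself still firmly believes that neural networks are the best method for character recognition. In other words, we are still doing statistics detached from real problems. (Chinese translation by Xie Yihui)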

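David O. Siegmund

Department of Statistics, Stanford University

Detecting Simultaneous Change-Points in Aligned Sequences

We discuss the detection of local signals that occur at the same location in multiple noisy sequences, with particular attention to relatively weak signals that occur over short segments. The signals are modelled as temporary shifts in the means of independent normal (Gaussian) observations. To detect these shifts, we give statistics that pool data across sequences, together with their p-values. Compared with earlier methods that analyze each DNA sequence separately, the new approach achieves better power and a better interpretation of the data. The models were motivated by the problem of detecting recurrent DNA copy-number changes in multiple samples, and they can also be applied to problems of DNA copy-number variation. Finally, we give an example in which distributed sensors detect local environmental change online. (Chinese translation by Wang Yu)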
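One simple way to pool evidence across aligned sequences is to standardize each sequence's segment sum, square it, and add the squares over sequences before scanning over candidate segments. The sketch below implements that sum-of-squares scan as an illustration. It is an assumed stand-in, not necessarily the statistic studied in the talk, and it does not compute p-values. The simulated data (20 sequences, a shift of 1.5 in 10 of them over positions 100 to 109) are hypothetical.

```python
import numpy as np

def scan_sum_chisq(data, max_len):
    """Scan all segments of length 1..max_len.

    data has shape (n_sequences, length) and unit noise variance is assumed.
    Returns (statistic, start, end) for the best segment [start, end).
    """
    n_seq, length = data.shape
    # csum[:, t] holds the sum of the first t observations of each sequence.
    csum = np.concatenate([np.zeros((n_seq, 1)), np.cumsum(data, axis=1)], axis=1)
    best = (-np.inf, 0, 0)
    for w in range(1, max_len + 1):
        seg = csum[:, w:] - csum[:, :-w]        # segment sums for every start
        stat = (seg ** 2 / w).sum(axis=0)       # pooled squared z-scores
        s = int(np.argmax(stat))
        if stat[s] > best[0]:
            best = (float(stat[s]), s, s + w)
    return best

rng = np.random.default_rng(3)
data = rng.standard_normal((20, 300))
data[:10, 100:110] += 1.5     # weak shared shift in half of the sequences
stat, start, end = scan_sum_chisq(data, max_len=20)
print(f"best segment [{start}, {end}) with pooled statistic {stat:.1f}")
```

A shift of this size is hard to see in any single sequence, which is the motivation for pooling given in the abstract.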

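Michael S. Waterman

Center for Biological Sciences, University of Southern California

Studying DNA Sequences with the Eulerian Path Approach

The talk first reviews the history of DNA sequence analysis and progress in DNA sequence assembly, then discusses in detail the challenges that DNA sequencing research faces in the new era, and finally focuses on the advantages and difficulties of applying the Eulerian path approach to DNA sequences.

(Chinese translation by Lü Xiaoling)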
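The Eulerian path idea can be sketched in a few lines: break the reads into k-mers, treat each k-mer as an edge between its prefix and suffix, and spell out a path that uses every edge once. The toy example below assumes error-free reads, a genome without repeated (k-1)-mers, and that an Eulerian path exists. The reads and the function names are hypothetical.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Each distinct k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start where out-degree exceeds in-degree by one, if such a node exists.
    start = next((u for u in nodes if len(graph.get(u, [])) - in_deg[u] == 1),
                 next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())   # follow an unused edge
        else:
            path.append(stack.pop())           # dead end: emit the node
    return path[::-1]

def assemble(reads, k):
    path = eulerian_path(de_bruijn_edges(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])

reads = ["ATGGCG", "GGCGTG", "CGTGCA"]   # hypothetical overlapping reads
print(assemble(reads, k=4))
```

With repeats or sequencing errors the graph is no longer a simple path, which is where the difficulties mentioned in the abstract begin.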

Wing Hung Wong

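Department of Statistics, Stanford University

Optional Pólya Tree and Bayesian Inference

We propose constructing distributions on the space of probability measures by a generalized Pólya tree approach, and we stress the complexity and importance of the relationship among the measure space, variable selection and the design of statistical inference in heterogeneous settings. We discuss optional stopping and optional selection of the splitting variable, and the advantages of the random measures constructed with these new mechanisms. The main conclusions are that the resulting random measures are absolutely continuous, with densities that are piecewise smooth on a partition of the space; that they have large support in the total variation topology; and that the posterior distribution is again an optional Pólya tree, which makes the parameters convenient to compute. The theory applies to probability spaces of both high-dimensional discrete and continuous multivariate distributions. (Chinese translation by Wang Xing)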

Papers from Overseas Speakers

Aarti Singh

Machine Learning Department, Carnegie Mellon University

Detecting Weak but Hierarchically-Structured Patterns in Networks

The ability to detect weak distributed activation patterns in networks is critical to several applications, such as identifying the onset of anomalous activity or incipient congestion in the Internet, or faint traces of a biochemical spread by a sensor network. This is a challenging problem since weak distributed patterns can be invisible in per-node statistics as well as in a global network-wide aggregate. Most prior work considers situations in which the activation of each node is statistically independent, but this is unrealistic in many problems. In this paper, we consider structured patterns arising from statistical dependencies in the activation process. Our contributions are threefold. First, we propose a sparsifying transform that succinctly represents structured activation patterns that conform to a hierarchical dependency graph. Second, we establish that the proposed transform facilitates detection of very weak activation patterns that cannot be detected with existing methods. Third, we show that the structure of the hierarchical graph governing the activation process, and hence the network transform, can be learnt from few (logarithmic in network size) independent snapshots of network activity.


Formally, consider the problem of detecting a weak noisy binary pattern that is observed at $p$ network nodes:

$$y_i = \mu x_i + \varepsilon_i, \qquad i = 1, \dots, p.$$

Here $y_i$ denotes the observation at node $i$, $x = [x_1, \dots, x_p] \in \{0,1\}^p$ is the $p$-dimensional unknown binary activation pattern, $\mu > 0$ denotes an unknown signal strength, and the noise $\varepsilon_i \sim N(0, \sigma^2)$. The condition $x_i = 0$, $i = 1, \dots, p$, is the normal operating condition (no signal present). If $x_i > 0$ for one or more $i$, then a signal or activation is present in the network. We consider patterns that are supported over hierarchically-structured groups of nodes. The hierarchical structure could arise due to the physical network topology and/or due to dependencies between the network nodes. We show that such structured activation patterns can be sparsified further by a transformation that is adapted to the network dependency structure. The transform concentrates the energy in the unknown pattern $x$ in a few large basis coefficients, thus facilitating detection.
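To make the idea of concentrating energy in a few coefficients concrete, the sketch below uses an ordinary balanced Haar basis on a binary tree as a stand-in for the data-adapted transform of the paper. It assumes the additive model $y_i = \mu x_i + \varepsilon_i$ and a hypothetical pattern that activates one subtree of 64 leaves out of 256. The function name `haar_basis` and all numerical settings are illustrative assumptions.

```python
import numpy as np

def haar_basis(p):
    """Orthonormal Haar basis of R^p for p a power of two; rows are basis vectors."""
    rows = [np.full(p, 1.0 / np.sqrt(p))]
    size = p
    while size >= 2:
        half = size // 2
        for start in range(0, p, size):
            b = np.zeros(p)
            b[start:start + half] = 1.0          # left child of a tree node
            b[start + half:start + size] = -1.0  # right child of the same node
            rows.append(b / np.sqrt(size))
        size = half
    return np.array(rows)

p, mu, sigma = 256, 1.0, 1.0
x = np.zeros(p)
x[64:128] = 1.0                      # one subtree of 64 leaves is active
B = haar_basis(p)

canonical_nonzeros = int(np.count_nonzero(x))
transform_nonzeros = int(np.sum(np.abs(B @ x) > 1e-9))
print(f"nonzeros: canonical {canonical_nonzeros}, transform {transform_nonzeros}")

# Noisy observations: compare the largest entry in each domain.
rng = np.random.default_rng(0)
y = mu * x + sigma * rng.standard_normal(p)
print(f"max |y_i| = {np.abs(y).max():.2f}, max |b_i^T y| = {np.abs(B @ y).max():.2f}")
```

Because the basis is orthonormal, the noise level is the same in both domains, while the signal energy of 64 active nodes is packed into three coefficients.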

Sparsifying transform for hierarchically-structured patterns - We model the hierarchical dependencies governing the activation process by the following multi-scale latent Ising model defined over a variable vector $z$.

We only observe $x = \{z_i\}_{i \in L}$, the $p$-dimensional vector of network observations. In the model, $V_\ell$ denotes the vertices at level $\ell$ in a binary tree graph $T^*$, and $\gamma_\ell > 0$ characterizes the strength of pairwise interaction between a vertex $i$ at level $\ell$ and its parent $\pi(i)$. This model implies that the probability of a pattern is higher if the variables agree with their parents. The proposed transform is derived from a hierarchical clustering using covariances $\{r_{ij}\}$ between pairs of node variables. It has been argued elsewhere that the latent tree dependency graph $T^*$ can be recovered by the hierarchical clustering. Furthermore, the unbalanced Haar basis built on $T^*$ leads to a sparse representation of binary patterns drawn from the multi-scale Ising model, as described below.

Theorem 1. Consider a pattern $x$ drawn at random from a latent Ising model on a tree graph with uniform degree $d$ and depth $L = \log_d p$. Let … $\ell_0$ and $\gamma_\ell = \infty$ for $\ell$ …

where $C, c > 0$ are constants.

(b) Detection performance for patterns from a […] with $p = 1296$ leaf nodes, degree $d = 6$ and depth $L = 4$. The standard deviation of the noise is $\sigma = 0.1$, and the target false alarm probability is 0.05.

The theorem states that the transform-domain sparsity scales as $p^{1-\beta}$ (and is therefore determined by the rate at which the interaction strength increases with level), while the canonical-domain sparsity scales as $p^{1-\alpha}$ (and is therefore determined by the smallest interaction strength between a variable and its parent). Since $\beta > \alpha$, the proposed transform enhances the sparsity of canonically sparse patterns that have a hierarchical structure.

Threshold of Detectability - We consider a simple test that uses the maximum of the absolute values of the empirical transform coefficients, $\max_i |b_i^T y|$, as the test statistic, and the following theorem provides an upper bound on the detection threshold. Contrasting this with the detectability threshold of earlier canonical-domain methods ($\mu >$ …), we see that a polynomial improvement is attained if the activation pattern is sparser in the transform domain.

Theorem 2. With probability $> 1 - 2\delta$ over the draw of the activation pattern, the test statistic $\max_i |b_i^T y|$ drives the probability of false alarm and miss (conditioned on the draw of the pattern) to zero asymptotically as $p \to \infty$ if …

Learning hierarchical clusters from data - Another theorem implies that only $O(\log p)$ snapshots of network measurements are needed to learn the covariances, and hence the proposed transform, from i.i.d. network snapshots.

Bo Lu, Chih-Lin Li, Amy Ferketich, Maria Grassi

Division of Biostatistics, College of Public Health

Causal Inference in Repeated Cross-Sectional Observational Studies

Inferring causal relationships has been widely studied for cross-sectional designs, with either binary or multi-level treatment. Discussion is scarce for the case where the cross-sectional study is repeated over time with different participants. Such a design has a time component, but it differs from a longitudinal observational study, in which the same subject is observed multiple times. In health studies, it is quite common for a promising or successful program to be repeated over time with essentially the same protocol. In repeated cross-sectional observational studies, estimating the treatment effect is more challenging than in a single cross-sectional study for three reasons. First, due to the observational nature, participants usually differ in some characteristics not only between treatment arms but between time points as well. Second, during the course of the study, other interventions (such as a policy change) may interfere with the intervention under investigation. Third, the treatment effect may change over time.

Our research is motivated by an Italian smoking cessation study repeated every year from 2001 to 2006, which aimed to compare the efficacy of medication plus counseling with that of counseling alone. In early 2005, a law banning smoking in public places was enacted in Italy. The main research question is to assess the impact of the smoking ban on the effectiveness of the smoking cessation program. The data are therefore naturally divided into two periods, pre-ban and post-ban. Every participant in the study can be classified into one of four treatment/time groups: pre-ban treated, pre-ban control, post-ban treated and post-ban control.

We set up a potential outcome framework following Rubin's Causal Model. For each participant, there is a vector of four potential outcomes, corresponding to the four treatment/time combinations. To make valid causal inference, the following assumptions are needed: (1) Stable Unit Treatment Value Assumption (SUTVA): the distribution of the potential outcomes for one unit is independent of the treatment assignment of other units. It rules out interference between units and ensures the consistency of the treatment. (2) Strong Ignorability of Treatment Assignment Assumption for Multi-level Treatment: the distribution of the potential outcomes is independent of treatment assignment given the observed covariates, for the whole vector of potential treatment assignments. (3) Temporal Stability Assumption: the treatment protocol and the treatment effect are stable over time in the absence of other interventions.

We propose a four-group matching design to balance the covariate distributions across both the treatment/control groups and the pre/post-ban periods. A robust rank-based difference-in-differences analysis is performed on the matched sets to estimate the interaction between the ban and the smoking cessation program. As far as we know, there is no optimal matching algorithm for quadruple matching (a matched set of four subjects, one from each group). We propose a suboptimal quadruple matching algorithm based on the optimal nonbipartite pair matching algorithm and compare it with the more intuitive greedy matching algorithm. Our extensive simulation study shows that the suboptimal matching beats the greedy matching most of the time. Since the treatment assignment option is a vector, we use a multinomial logit model to estimate the propensity score. As a result, there is a vector of four propensity scores for each participant and, ideally, similarity between participants implies that they have the same propensity score on the whole vector. Therefore, we use a composite distance measure in matching, and the balance in covariates is improved, as the post-matching group means are more similar.
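Once matched quadruples are available, the interaction between ban and program can be estimated from within-set difference-in-differences contrasts. The sketch below assumes the matching has already been done, uses simulated continuous outcomes in place of the study's data, and takes the Hodges-Lehmann estimator (the median of Walsh averages) as one possible robust rank-based summary. The abstract does not specify which rank-based method the authors used, and the names `did_contrasts` and `hodges_lehmann` are hypothetical.

```python
import numpy as np

def did_contrasts(sets):
    """Within-set difference-in-differences.

    Columns of sets: pre-ban treated, pre-ban control,
    post-ban treated, post-ban control.
    """
    sets = np.asarray(sets, dtype=float)
    return (sets[:, 2] - sets[:, 3]) - (sets[:, 0] - sets[:, 1])

def hodges_lehmann(d):
    """Median of all Walsh averages (d_i + d_j) / 2 with i <= j."""
    d = np.asarray(d, dtype=float)
    i, j = np.triu_indices(len(d))
    return float(np.median((d[i] + d[j]) / 2.0))

# Simulated matched sets: treatment effect 1 before the ban and 3 after it,
# so the true interaction is 2. All values are hypothetical.
rng = np.random.default_rng(42)
n_sets, interaction = 400, 2.0
shared = rng.normal(size=(n_sets, 1))     # covariate effect shared within a set
effects = np.array([1.0, 0.0, 1.0 + interaction, 0.0])
outcomes = shared + effects + rng.normal(size=(n_sets, 4))

estimate = hodges_lehmann(did_contrasts(outcomes))
print(f"estimated ban-by-program interaction: {estimate:.2f}")
```

The shared term cancels inside each contrast, which is the reason for matching on covariates before differencing.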

We also relax the temporal stability assumption a bit by assuming that the treatment effect could change over time monotonically. We then split the pre-ban data into two time periods to try to estimate the change in treatment effect before it interacts with the ban. A six-group matching design and a difference-in-difference-in-differences estimator are used.

The major limitation of the study is that, when hidden bias is present, the strong ignorability assumption no longer holds. For repeated observational studies, the hidden bias may occur in the pre-ban period, the post-ban period, or both. Sensitivity analysis can be performed to address the robustness of the current findings, but two sensitivity parameters have to be introduced, one for each time period.

Fang Yao

Department of Statistics, University of Toronto

Additive Modeling of Functional Gradients

We consider the problem of estimating functional derivatives and gradients in the framework of a functional regression setting where one observes functional predictors and scalar responses. Derivatives are then defined as functional directional derivatives, which indicate how changes in the predictor function in a specified functional direction are associated with corresponding changes in the scalar response. Aiming at a model-free approach, navigating the curse of dimensionality requires the imposition of suitable structural constraints. Accordingly, we develop functional derivative estimation within an additive regression framework. Here the additive components of functional derivatives correspond to derivatives of nonparametric one-dimensional regression functions with the functional principal components of predictor processes as arguments. This approach requires nothing more than estimating derivatives of one-dimensional nonparametric regressions, and thus is computationally very straightforward to implement, while it also provides substantial flexibility and fast computation and is consistent. We demonstrate the consistent estimation and interpretation of the resulting functional derivatives and functional gradient fields in a study of the dependence of lifetime fertility of flies on early life reproductive trajectories.

Feifang Hu

Department of Statistics & Department of Health Evaluation Sciences, University of Virginia

Response Adaptive Randomized Clinical Trial: Overview and Future

1. Response adaptive designs. The design of clinical trials has now become standardized in drug development: a statistician computes the requisite sample size for a given power, and generates a randomization sequence for the packaging of double-masked medication. Response adaptive designs in clinical trials use sequentially accruing outcome data to dynamically update the probability of assignment to one of two or more treatments. The preliminary idea is to skew these probabilities to favor the treatment that has been the most effective thus far in the trial, thus making the randomization strategy more attractive to physicians and their patients. Response-adaptive design has a long history in the biostatistical literature.

Clinical trials are usually very complex, with multiple objectives. These objectives include, but are not limited to: (i) minimizing the total sample size; (ii) minimizing the total number of expected treatment failures in binary response trials; (iii) minimizing (or maximizing) expected total response in continuous response trials; (iv) minimizing expected total hazard in survival trials; and (v) minimizing the total cost of the trial if cost per patient differs among treatments. Response-adaptive designs have been proposed to achieve different objectives.

The preliminary ideas of response-adaptive designs can be traced back to Thompson (1933) and Robbins (1952). Zelen (1969) proposed the play-the-winner rule to assign more patients to the better treatment. The idea of incorporating randomization in the context of response-adaptive treatment allocation designs is the randomized play-the-winner rule proposed by Wei and Durham (1978). Since then, a large number of families of urn models have been proposed and studied in the literature. These urn models include: generalized Friedman's urn models; the ternary urn; the drop-the-loser rule; and the generalized drop-the-loser rule. The details are described in Chapter 4 of Hu and Rosenberger (2006). These urn models are usually intuitively attractive, but not based on any optimal criterion.
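The randomized play-the-winner rule can be simulated with a simple urn. In the sketch below the urn starts with one ball per treatment, a ball is drawn with replacement to assign each patient, and one ball is added for the same treatment after a success or for the other treatment after a failure. Responses are assumed to be immediate, and the success probabilities 0.3 and 0.7 are hypothetical.

```python
import numpy as np

def randomized_play_the_winner(p_success, n_patients, rng, initial=1, added=1):
    """Simulate one two-arm trial; returns the number of patients on each arm."""
    urn = np.array([initial, initial], dtype=float)
    assignments = np.zeros(2, dtype=int)
    for _ in range(n_patients):
        # Draw a ball (with replacement) to choose the arm.
        arm = int(rng.random() < urn[1] / urn.sum())
        assignments[arm] += 1
        success = rng.random() < p_success[arm]
        # Success rewards the same arm, failure rewards the other arm.
        urn[arm if success else 1 - arm] += added
    return assignments

rng = np.random.default_rng(7)
p_success = (0.3, 0.7)            # hypothetical: arm 1 is the better treatment
shares = []
for _ in range(50):
    counts = randomized_play_the_winner(p_success, 500, rng)
    shares.append(counts[1] / counts.sum())
mean_share = float(np.mean(shares))
print(f"average share of patients on the better arm: {mean_share:.2f}")
```

The allocation drifts toward the better arm without targeting any particular optimal proportion, which matches the remark above that urn models are intuitive but not derived from an optimality criterion.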

Recently Hu and Rosenberger (2006) proposed the three-step method to achieve different objectives in clinical trials: (1) formulate the objectives mathematically and then find the optimal allocation; (2) use sequential estimation, substituting estimates from the data accrued thus far into the optimal allocation; and (3) find an appropriate randomization procedure that will result in optimal allocation. The urn models are not applicable here. Hu and Zhang (2004) developed a response-adaptive randomization procedure, called the doubly adaptive biased coin design, that is appropriate for these types of problems, based on ideas originated by Eisele (1994). Hu, Zhang and He (2009) further proposed a family of efficient randomized adaptive designs. Hu and Rosenberger (2003) give a template by which response-adaptive randomization procedures can be directly compared theoretically. Further, Hu, Rosenberger and Zhang (2006) have derived a Rao-Cramér lower bound on the variance of the procedure, given a target allocation function. Hu, Zhang and He's procedure attains the lower bound, and can target any allocation. When Neyman allocation is targeted and Hu, Zhang and He's procedure is used, we have a most powerful procedure. Some simulation studies can be found in Hu and Rosenberger (2006).

2. Implementation and conclusions. Logistically, there is little difficulty in today's information age in implementing response-adaptive randomization using a centralized computer system. Investigators enter responses as they become available, and each time a patient is randomized, the randomization probability is updated according to the data accrued thus far. Delayed responses offer no inherent difficulty. Randomization procedures are simply updated when they become available. Generally a moderate delay in accruing responses will not affect the large-sample properties of the trial (Chapter 7, Hu and Rosenberger, 2006). The concept behind response-adaptive randomization, giving patients a better chance of being assigned to the better treatment, is attractive, and can aid recruitment. The main features of response-adaptive randomization can be summarized as follows: (1) Response-adaptive randomization procedures can reduce treatment failures with no loss of power. (2) Response-adaptive randomization procedures can increase the potential for patient recruitment. (3) The three-step method allows us to minimize the total sample size or other costs associated with the clinical trial. (4) Centralized computer randomization with real-time data updates makes such trials much easier to implement.

Harrison Zhou

Department of Statistics, Yale University

Some recent work inspired by Le Cam’s Theory

The key idea of Le Cam's theory is to approximate a complicated statistical problem by a more tractable one. In my talk, I will describe some of my recent work which was directly inspired by Le Cam's theory. The talk will attempt to shed light on how Le Cam's theory can be applied to high-dimensional models.

Firstly, I will discuss how Le Cam's theory has inspired me and my coauthors to propose new statistical procedures for models such as robust nonparametric regression (where the noise distribution is unknown and possibly heavy-tailed) and generalized nonparametric regression in exponential families (Poisson regression, binomial regression, Gamma regression and so on). We took a unified approach of using a transformation to convert each of these problems into a standard homoskedastic Gaussian regression problem, to which any good nonparametric Gaussian regression procedure can be applied. Here is the example of density estimation. Suppose that $\{X_1, \dots, X_n\}$ is a random sample from a distribution with density function $f$. We assume that $f$ is compactly supported on an interval, say the unit interval $[0,1]$. Divide the interval into $T$ equal-length subintervals and let $Q_i$ be the number of observations in the $i$-th subinterval $I_i$. The counts $\{Q_i\}$ can be treated as observations for a nonparametric regression directly, but this then becomes a heteroscedastic problem, since the variance of each count changes with its mean. Instead, we first apply the root transform to the counts (the square root of the count plus a constant) and treat the results as new observations. The constant is chosen to stabilize the variance and at the same time match the mean. We then apply a nonparametric regression procedure to the transformed counts, square the fit (the unroot step), and normalize to get an estimator of $f$. After the density estimation problem is transferred into a regression problem, any nonparametric regression method can be applied. The general ideas for the root-unroot transform approach can be more formally explained as follows. We show that the resulting density estimator enjoys a high degree of adaptivity. A numerical example and a practical data example are discussed to illustrate and explain the use of this density estimation procedure.
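The binning, root, smooth, unroot and normalize steps can be sketched as follows. The constant 1/4 inside the square root and the moving-average smoother are assumptions made for illustration: the abstract only says that a constant is chosen to stabilize the variance, and that any nonparametric regression method can be used in place of the smoother. The Beta(2, 5) sample is hypothetical test data.

```python
import numpy as np

def root_unroot_density(sample, n_bins, smooth_width=5):
    """Density estimate on [0, 1] via binning, root transform and smoothing.

    smooth_width is assumed odd so that the smoothed vector keeps n_bins entries.
    """
    sample = np.asarray(sample, dtype=float)
    counts, edges = np.histogram(sample, bins=n_bins, range=(0.0, 1.0))
    # Root transform: roughly constant variance whatever the bin mean.
    rooted = np.sqrt(counts + 0.25)
    # Stand-in regression step: moving average with edge padding.
    kernel = np.ones(smooth_width) / smooth_width
    padded = np.pad(rooted, smooth_width // 2, mode="edge")
    smoothed = np.convolve(padded, kernel, mode="valid")
    # Unroot (square) and normalize so the estimate integrates to one.
    unrooted = smoothed ** 2
    bin_width = 1.0 / n_bins
    density = unrooted / (unrooted.sum() * bin_width)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, density

rng = np.random.default_rng(5)
sample = rng.beta(2.0, 5.0, size=5000)     # true mode at 0.2
centers, density = root_unroot_density(sample, n_bins=50)
print(f"estimated mode near x = {centers[np.argmax(density)]:.2f}")
```

Swapping the moving average for a wavelet or local polynomial smoother changes only the middle step, which is the point of reducing density estimation to regression.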

Secondly, I will discuss minimax rates of convergence for estimating the Toeplitz covariance matrix under the spectral norm. Golubev, Nussbaum and Zhou (2010) showed that the experiments given by observations $y(1), \dots, y(n)$, a stationary centered Gaussian sequence with spectral density $f$, and … with $f \in \Sigma$ are asymptotically equivalent. A minimax lower bound is derived by constructing a more informative and tractable model, as in Golubev, Nussbaum and Zhou (2010), for which it is easier to derive a minimax lower bound.

Finally, I will discuss minimax rates of convergence for estimating an infinite-dimensional parameter in functional regression for general exponential families. An estimator that achieves the minimax upper bound is constructed by maximum likelihood on finite-dimensional approximations with parameter dimension that grows with sample size. Le Cam's distance provides the key technical tool for bounding the error of the approximations.

Hélène Massam

Department of Statistics, York University

Conjugate prior distributions for covariance matrices

1. Introduction. Research on covariance estimation was first motivated by the problem of estimating the mean of a multivariate normal $N_r(\mu, \dots)$. […] Estimation of the covariance or precision matrix of a multivariate normal distribution is a central problem in the analysis of high-dimensional data: it is one of the most common tasks in statistical analysis but also a difficult one, due to the large number of parameters in a covariance matrix and the fact that it is constrained to belong to a cone. The Bayesian approach to covariance estimation implies the choice of a prior distribution for the covariance matrix. In this short survey, we will concentrate on conjugate prior distributions for covariance matrices.

The two main ideas in the development of prior distributions for covariance matrices are that of shrinkage (shrinkage of the eigenvalues, or shrinkage of the covariance matrix towards a given structure) and that of regularization, i.e. the reduction of the number of parameters to estimate. The three main parametrizations of covariance matrices are the polar decomposition $ODO^t$, where $O$ is an orthogonal matrix and $D$ the diagonal matrix of eigenvalues; the variance-correlation decomposition $DRD^t$; and the Cholesky decomposition $TDT^t$, where $T$ is lower (or upper) triangular with entries that are functions of the regression coefficients and diagonal elements equal to 1, and $D$ is diagonal with positive entries. In the last two decades, there has been a string of papers where conjugate prior distributions are developed with the aim of shrinking or regularizing the estimates of covariance matrices: all these distributions are based on the Cholesky decomposition of positive definite matrices and various modifications of the Wishart distribution. We will give below a brief outline of this development.
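The three parametrizations named above can be computed numerically for any positive definite matrix. The sketch below uses NumPy on a randomly generated 4 × 4 covariance matrix. The matrix and the helper name `modified_cholesky` are illustrative assumptions, and the lower-triangular convention is chosen arbitrarily.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return unit lower-triangular T and diagonal D with sigma = T D T^t."""
    L = np.linalg.cholesky(sigma)      # sigma = L L^t with L lower triangular
    d = np.diag(L) ** 2                # positive diagonal entries of D
    T = L / np.diag(L)                 # divide column j by L[j, j]
    return T, np.diag(d)

rng = np.random.default_rng(11)
A = rng.standard_normal((4, 4))
sigma = A @ A.T + 4 * np.eye(4)        # a positive definite example matrix

# Polar (spectral) decomposition: sigma = O D O^t.
eigvals, O = np.linalg.eigh(sigma)

# Variance-correlation decomposition: sigma = S R S with S = diag(std devs).
sd = np.sqrt(np.diag(sigma))
R = sigma / np.outer(sd, sd)

# Cholesky-type decomposition: sigma = T D T^t with unit diagonal in T.
T, D = modified_cholesky(sigma)

print(np.allclose(O @ np.diag(eigvals) @ O.T, sigma))
print(np.allclose(np.diag(sd) @ R @ np.diag(sd), sigma))
print(np.allclose(T @ D @ T.T, sigma))
```

In the Cholesky form the entries of `T` and `D` are unconstrained apart from positivity of the diagonal of `D`, which is one reason priors are often placed on them rather than directly on the constrained matrix.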

2. A few basic facts. We will assume that we consider an $r$-dimensional random vector which follows a normal $N(0, \rho)$ distribution, where $\rho$ is an $r \times r$ positive definite matrix, i.e. an element of the cone $P^+$ of $r \times r$ positive definite matrices. The Wishart distribution has density …

… the decomposition of the inverse Wishart …

3.Conjugate priors based on the Cholesky de2 composition.Brown,Le and Zidek are the first to generalize these conjugate distribution for the ele2 ments of the Cholesky decomposition ofρby re2 moving the constraint that the parametersδin(1) and(2)be the same and replacing them by two distinctδ1andδ2thus allowing the possibility of shrinkage with the two parametersδ1,δ2.Splitting V={1,…,r}into a partition ofkintervals and u2 sing the notation[i]=[Ni,Ni+1],=∪ji[j],N1=1<…

As one of their main results,Consonni and Veronese obtained the same result but they do so throughtheCholeskyUDUtdecompositionof,that is theL TLtof forLlower traingular. In other words,the variables in Consonni and Ve2 ronese’s article are considered in the order reverse to that taken in Brown,Le and Zidek’s article and ;i:is replaced by.Though reversing the order is of no consequence here in the case of a sat2 urated model,we will see below that the order makes a difference in the case of models Markov with respect to a homogeneous graph.It is also important to note that the prior for the canonical parameterinduced from(5)-(7)is conjugate but not in the sense of Diaconis and Ylvisaker (1979).The conjugate in the sense of Diaconis and Ylvisaker is,of course,the Wishart.

For models Markov with respect to a decomposable graph $G$, Dawid and Lauritzen developed a conjugate prior for the covariance parameter of the normal $N(0,\Sigma)$. For such models, the vertices of $G$ index the components of the multivariate normal random variable, and the absence of an edge $(i,j)$ in $G$ implies the conditional independence of the corresponding variables given all the others. Generally a graph can be described by its cliques, and a decomposable graph is characterized by the fact that its set of cliques can always be given a perfect ordering $C_1,\dots,C_k$. In this case, the covariance parameter is not the entire covariance matrix $\Sigma$ but rather the collection of submatrices $\Sigma_{C_i}$, $i=1,\dots,k$. The prior developed in Dawid and Lauritzen's article is conjugate in the sense of Diaconis and Ylvisaker. It is called the hyper inverse Wishart distribution and its density is the Markov ratio of marginal Wishart distributions on $\Sigma_{C_i}$, $i=1,\dots,k$ and $\Sigma_{S_i}$, $i=2,\dots,k$, where $S_i$, $i=2,\dots,k$ are the minimal separators of the graph $G$. It is easy to show that if $\Sigma$ follows the hyper inverse Wishart, then the elements of its Cholesky decomposition $\Sigma=LDL^t$, with an order given by a perfect order of the cliques, are distributed as, for $i=2,\dots,k$

where here $[i]$ and take a strictly different meaning; more precisely, if we write $H_i=$

Letac and Massam did for the decomposable case the analogue of what Brown, Le and Zidek did for the saturated case. They replaced $\delta$ in (8)-(9) by 1 and $\delta_i$, $i=2,\dots,k$, thus also allowing for shrinkage in covariance estimation. The prior induced on by (8)-(10) is what is called the $W_{P_G}$ in "Wishart distributions for decomposable graphs", since this distribution is defined on the cone $P_G=\{y\in P^+ \text{ such that } y_{ij}=0 \text{ whenever } (i,j)\notin E\}$, the precision parameter space for models Markov with respect to $G$. In Letac and Massam (2007), another family of distributions was defined on the cone $Q_G=\{\text{incomplete matrices } x \text{ with missing } x_{ij},\ (i,j)\notin E, \text{ and } x_A>0,\ A\subseteq V \text{ complete}\}$. This distribution is a generalization, with several shape parameters $\delta_i$, of the hyper-Wishart, which was identified by Dawid and Lauritzen as the distribution of the maximum likelihood estimate of the covariance parameter in $Q_G$ for models Markov with respect to $G$. Unlike the inverse $W_{P_G}$, this distribution did not find a use as a prior in Letac and Massam's article.

However, Khare and Rajaratnam considered marginal independence models Markov with respect to a decomposable graph $G$, i.e. models such that the absence of an edge $(i,j)$ in $G$ implies the marginal independence of the corresponding variables. For these models the covariance parameter $\Sigma$ belongs to $Q_G$. In their article, a conjugate prior was constructed directly on the Cholesky decomposition $\Sigma=LDL^t$ of the covariance. Its density mimics the Gaussian distribution and is proportional to

where $U$ is a scale matrix parameter and $\alpha_i$, $i=1,\dots,r$ are the shrinkage shape parameters. This distribution is clearly conjugate for $\Sigma$ since it was built to be. Unlike the normalizing constant of the inverse $W_{P_G}$ in Letac and Massam's article (2007), the normalizing constant of (12) cannot be expressed analytically. Also, sampling from the posterior cannot be done exactly. However, in the particular case where the graph is homogeneous (i.e. decomposable and not containing a three-link chain as an induced subgraph), this constant can be expressed analytically and sampling can be done exactly, as in the case of the inverse $W_{P_G}$. Moreover, in the homogeneous case, distribution (12) is identical to an inverse $W_{Q_G}$.

In fact, in the homogeneous case, both the marginal independence model and the conditional independence model (i.e. both when $\Sigma\in P_G$ and when $\Sigma^{-1}\in P_G$) are Markov equivalent to a Gaussian model Markov with respect to a directed acyclic graph (abbreviated DAG). In the conditional independence case, the DAG is the Hasse tree of the homogeneous graph $G$ completed by the arrows implied by transitivity. The directions of the arrows are obtained by taking as the first point the root of the Hasse tree and by following the branches towards the leaves. In the marginal independence case, the DAG has the same skeleton as in the conditional independence case but the arrows are reversed. We thus see that for Markov models, as opposed to saturated models, the direction implied by the Cholesky decomposition of $\Sigma$ matters. To complete this short history of conjugate priors for the covariance, it should be noted that, for a star-shaped graph, which is a particular type of homogeneous graph, Sun and Sun (2005) gave prior distributions that belong to the inverse $W_{P_G}$ family.

Jan Hannig

Department of Statistics and Operations Research, University of North Carolina at Chapel Hill

Comparison between Fiducial and Objective Bayesian Inference

In the case of a one-parameter family of distributions, Fisher gave the following definition for a fiducial density $r(\theta)$ of the parameter based on a single observation $x_0$, for the case where the distribution function $F(x|\theta)$ is a function of $\theta$ decreasing from 1 to 0:

$$r(\theta)=-\frac{\partial F(x_0|\theta)}{\partial\theta}. \qquad (1)$$

While Fisher provided more complex examples, until recently there was no formal definition of the fiducial distribution for a general model.

Generalized fiducial inference begins with expressing the relationship between the data, $X$, and the parameters, $\xi$, as

$$X=G(\xi,U), \qquad (2)$$

where $G(\cdot,\cdot)$ is termed the structural equation, and $U$ is the random component of the structural equation, a random variable or vector whose distribution is completely known.
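
To make the role of the structural equation concrete, here is a minimal sketch for the simplest textbook case: a single observation from $N(\mu,1)$ written as $X=\mu+U$ with $U\sim N(0,1)$. In this special case the equation can be solved for $\mu$ exactly, and the resulting fiducial distribution is $N(x_0,1)$. The function names, the observed value `x0` and the number of draws are assumptions chosen for illustration; the general construction is considerably more involved.

```python
import numpy as np

def structural_equation(mu, u):
    """G(mu, U) = mu + U: the data as a function of the parameter and known-law noise."""
    return mu + u

def fiducial_sample_location(x0, n_draws, rng):
    """Solve x0 = mu + U* for mu, using fresh independent copies U* of the noise."""
    u_star = rng.standard_normal(n_draws)
    return x0 - u_star

rng = np.random.default_rng(0)
x0 = 1.3                                   # hypothetical observed value
draws = fiducial_sample_location(x0, 200_000, rng)

# Equal-tailed 95% fiducial interval for mu, read off the simulated draws.
lower, upper = np.quantile(draws, [0.025, 0.975])
print(f"fiducial mean {draws.mean():.3f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

Here the interval coincides, up to simulation error, with the classical interval $x_0\pm1.96$, which is why this example is often used as a sanity check before turning to models where no such closed form exists.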

Suppose that a data set was generated using (2) and the value of $X$ has been observed as $x_0$. Let $X_1,\dots,X_n$ be i.i.d. absolutely continuous random variables with distribution function $F(x,\xi)$ and density $f(x,\xi)$, where $\xi$ is a $p$-dimensional parameter. Assume that the partial derivatives of $F(x,\xi)$ with respect to $\xi$ are continuous and the Jacobian

for all distinct $x_1,\dots,x_p$.

Define the usual pseudo-inverse $F^{-1}(\xi,u)=\inf\{x: F(x,\xi)\ge u\}$ and use the structural equation $X_i=F^{-1}(\xi,U_i)$, $i=1,\dots,n$, with $U_i$ i.i.d. $U(0,1)$. Then Hannig (2009a,b) shows that the generalized fiducial distribution simplifies to a distribution with density

We use (3) as the definition of the generalized fiducial distribution and compare the repeated-sampling frequentist performance of confidence sets based on (3) with that of Bayesian posteriors computed using the reference prior of Berger et al. (2009). The models considered are $U(\theta,\theta^2)$, the triangle distribution and the three-parameter Weibull distribution. The results show that in some cases the generalized.

Jianqing Fan, Shaojun Guo and Ning Hao

Department of Operations Research & Financial Engineering, Princeton University

Refitted Cross-validation in Ultrahigh Dimensional Regression

Variance estimation is a fundamental problem in statistical modeling. It is prominently featured in statistical inference on regression coefficients. It also provides a benchmark of forecasting error when an oracle actually knows the regression function, and such a benchmark is very important for forecasters to gauge their forecasting performance relative to the oracle. For conventional linear models, the residual variance estimator usually performs well. However, in ultrahigh dimensional linear regressions where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection for ultrahigh dimensional linear regressions make this problem accessible.

To estimate the variance of the noise in ultrahigh dimensional linear regression, consider the following naive two-stage procedure. In the first stage, a model selection tool is applied to select a model of moderate size which includes all important variables. In the terminology of Fan and Lv (2008), the selected model has a sure screening property. In the second stage, the variance is estimated by the ordinary least squares method based on the variables selected in the first stage. The naive method does not work well unless model consistency can be achieved in stage one (this is usually extremely difficult in practice for ultrahigh dimensional problems). One of the main reasons is the high spurious correlation between the unobserved realized noise and some of the predictors. As a result, the realized noises are actually predicted when extra irrelevant variables are selected (they are indeed recruited to predict the realized noise), leading to a serious underestimate of the noise level. This phenomenon can be easily illustrated in the null model.

To attenuate the influence of spurious variables entered into the selected model and improve the estimation accuracy, we propose a two-stage refitted procedure, called refitted cross-validation (RCV). The procedure is as follows. We assume the sample size $n$ is even for simplicity and split the sample into two groups randomly. In the first stage, an ultrahigh dimensional variable selection method like correlation learning (SIS), or LASSO or SCAD with a small regularization parameter, is applied to these two datasets separately, which yields two small sets of important variables. In the second stage, the ordinary least squares (OLS) method is used to re-estimate the coefficient $\beta$ and variance $\sigma^2$. Different from the naive procedure, we apply OLS to the first half of the data with the variables selected by the second half of the data, and vice versa. Taking the average of these two estimators, we get our final estimator, denoted by $\hat\sigma^2_{RCV}$. The refitting step in the second stage is fundamental to reducing the influence of the spurious variables selected in the first stage. The RCV method requires only that the model selection procedure in stage one has a sure screening property. Under some mild regularity conditions, we show that

$$\sqrt{n}\,(\hat\sigma^2_{RCV}-\sigma^2)\xrightarrow{D}N(0,\,E[\epsilon^4]-\sigma^4),$$

where $\epsilon$ is the random error and $\sigma^2=E[\epsilon^2]$ is the variance parameter of interest. This statement reveals that the RCV estimator of variance has an oracle property, i.e. it has the same asymptotic distribution as the oracle estimator which assumes the true model is known.
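
The two-stage procedures described above can be sketched in a few lines of NumPy. The sketch below is only an illustration under stated assumptions: marginal correlation screening stands in for the selection step, an intercept is included in each OLS fit, and the function names, the model size `k` and the simulated null model (response unrelated to every predictor, true variance 1) are all hypothetical choices rather than the authors' implementation.

```python
import numpy as np

def sis_select(X, y, k):
    """Correlation screening: indices of the k predictors most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / np.linalg.norm(Xc, axis=0)
    return np.argsort(score)[-k:]

def ols_variance(X, y, selected):
    """Residual variance from OLS of y on an intercept and the selected columns."""
    Z = np.column_stack([np.ones(len(y)), X[:, selected]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return resid @ resid / (len(y) - Z.shape[1])

def naive_two_stage(X, y, k):
    """Select and refit on the same data (the naive procedure)."""
    return ols_variance(X, y, sis_select(X, y, k))

def refitted_cross_validation(X, y, k, rng):
    """Select on one half, refit on the other half, and average the two estimates."""
    perm = rng.permutation(len(y))
    half_a, half_b = perm[: len(y) // 2], perm[len(y) // 2:]
    selected_a = sis_select(X[half_a], y[half_a], k)
    selected_b = sis_select(X[half_b], y[half_b], k)
    var_a = ols_variance(X[half_a], y[half_a], selected_b)
    var_b = ols_variance(X[half_b], y[half_b], selected_a)
    return 0.5 * (var_a + var_b)

# Null model: y is pure noise with variance 1, unrelated to all 2000 predictors.
rng = np.random.default_rng(0)
n, p, k = 100, 2000, 5
naive, rcv = [], []
for _ in range(20):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)
    naive.append(naive_two_stage(X, y, k))
    rcv.append(refitted_cross_validation(X, y, k, rng))
naive_mean, rcv_mean = np.mean(naive), np.mean(rcv)
print(f"naive: {naive_mean:.2f}, RCV: {rcv_mean:.2f}, truth: 1.00")
```

Because the variables used in each refit were chosen on the other half of the data, they have no particular reason to correlate with the noise in the half being fitted, which is the mechanism the text credits for removing the downward bias.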

We also study one-step methods via penalized least squares estimators such as SCAD and LASSO. The variance is estimated by the residual variance based on a penalized least squares estimator. It is shown that there is a bias term of order $O(s\log p/n)$ inherent in the LASSO-based estimator when the regularization parameter is optimally tuned. Moreover, we show that the SCAD estimator for the variance possesses the oracle property as well.

We compare the aforementioned methods by simulations. The results show that the RCV method is the best and is comparable with the oracle estimator. As an application, the methods are used to assess the forecasting errors of home price indices in the core based statistical areas in the US.

To summarize, variance estimation is important and challenging for ultrahigh dimensional sparse regression. One of the challenges is spurious correlation: covariates can have high correlations with the realized noise and hence are recruited to predict the noise. As a result, the naive two-stage estimator seriously underestimates the variance. Its performance is very unstable and depends largely on the model selection tool employed. The RCV method is proposed to attenuate the effect of spurious variables. It only requires the sure screening property in the model selection stage, which is a minimal requirement for any reasonable two-stage estimator. Both the asymptotic theory and the empirical results show that the RCV estimator is the best among all the estimators considered. It is accurate and stable, and insensitive to the model selection tool and the size of the selected model.

Jianqing Fan1, Jinchi Lv2

1. Department of Operations Research and Financial Engineering, Princeton University; 2. Information and Operations Management Department, University of Southern California

Non-Concave Penalized Likelihood with NP-Dimensionality

Penalized likelihood methods are fundamental to ultra-high dimensional variable selection. How high a dimensionality such methods can handle remains largely unknown. In this paper, we show that in the context of generalized linear models, such methods possess model selection consistency with oracle properties even for dimensionality of Non-Polynomial (NP) order of sample size, for a class of penalized likelihood approaches using folded-concave penalty functions, which were introduced to ameliorate the bias problems of convex penalty functions. This fills a long-standing gap in the literature, where the dimensionality is allowed to grow only slowly with the sample size. Our results are also applicable to penalized likelihood with the $L_1$-penalty, which is a convex function at the boundary of the class of folded-concave penalty functions under consideration. Coordinate optimization is implemented for finding the solution paths, and its performance is evaluated by a few simulation examples and a real data analysis.
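
As an illustration of what a folded-concave penalty looks like next to the $L_1$-penalty, the sketch below evaluates the SCAD penalty of Fan and Li (2001), a standard member of this class. SCAD is used here only as an example and the abstract does not single it out; the function names and the conventional default `a = 3.7` are assumptions of this sketch.

```python
import numpy as np

def l1_penalty(t, lam):
    """Convex L1 penalty: lam * |t|."""
    return lam * np.abs(np.asarray(t, dtype=float))

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic in between, constant beyond a * lam."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t
    quadratic = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    constant = lam ** 2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

grid = np.linspace(-6.0, 6.0, 13)
lam = 1.0
for t, p_scad, p_l1 in zip(grid, scad_penalty(grid, lam), l1_penalty(grid, lam)):
    print(f"t = {t:5.1f}   SCAD = {p_scad:5.3f}   L1 = {p_l1:5.3f}")
```

The two penalties agree for small coefficients, but SCAD levels off for large ones, so large coefficients are not shrunk further; this is the bias reduction that motivates folded-concave penalties.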

Jing Li

Division of Science and Mathematics, University of Minnesota-Morris

An Introduction to the Directional Dependence in the Copula Regression Setting

In this presentation, we define and study the concept of directional dependence in the regression setting by using copulas. Copulas are multivariate distribution functions that connect, or couple, one-dimensional marginal distributions and joint distributions. The directional dependence model is critical because it can be easily adapted to a wide range of applications, such as financial risk assessment and actuarial analysis. Directional dependence in the joint behavior of two variables is defined as the form of the regression functions being different. Copulas transform the marginal distributions into uniform distributions on the interval (0,1). The basic idea of a copula is to separate the dependence and the marginal distributions in a multivariate distribution. They eliminate the influence of marginal behavior and provide a clear look at the dependence structure. Another way of looking at directional dependence is through conditional copulas. Copulas are a new way of modeling the correlation structure between variables. In this research we developed ways to define and create facts to identify directional dependence between two variables, to obtain the empirical cumulative probability distribution function of each variable in the data set, and to search for a copula whose regression function has features similar to those of the smoothed data, involving the linear combination theorem of copulas, truncation invariant copulas, etc. The approach has been developed and applied to various data sets to analyse dependence structure in many fields.
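
The two ingredients mentioned above, transforming each margin to (0,1) with its empirical distribution function and then comparing the two regression functions, can be sketched as follows. This is only a rough illustration: the rank-based transform, the binned estimate of the regression functions, the simulated data and all function names are assumptions of this sketch, not the method of the presentation.

```python
import numpy as np

def pseudo_observations(x):
    """Rank-transform a sample to (0, 1) via rank / (n + 1); assumes no ties."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1)

def binned_regression(u, v, n_bins=10):
    """Crude estimate of E[V | U in bin] on equal-width bins of (0, 1)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    which = np.clip(np.digitize(u, edges) - 1, 0, n_bins - 1)
    return np.array([v[which == b].mean() for b in range(n_bins)])

# Simulated pair in which y depends on x, but not monotonically.
rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
y = x ** 2 + 0.1 * rng.standard_normal(5000)

u, v = pseudo_observations(x), pseudo_observations(y)
v_given_u = binned_regression(u, v)   # regression of V on U
u_given_v = binned_regression(v, u)   # regression of U on V

spread_v_given_u = v_given_u.max() - v_given_u.min()
spread_u_given_v = u_given_v.max() - u_given_v.min()
print(f"spread of E[V|U]: {spread_v_given_u:.2f}, spread of E[U|V]: {spread_u_given_v:.2f}")
```

In this example the regression of V on U varies strongly while the regression of U on V is nearly flat, so the two regression functions have different forms even though both margins are uniform.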

Joon Jin Song1, Jong-Min Kim2

1. Center for Statistical Research and Consulting & Department of Mathematical Sciences, University of Arkansas; 2. Statistics Discipline, Division of Science and Mathematics, University of Minnesota-Morris

Bayesian Analysis of Randomized Response Sum Score Variables

Direct surveying about sensitive topics is a difficult task because individuals are often unwilling to reveal such information freely for fear of social stigma. Randomized response (RR) is an interview technique that can protect respondents' privacy. In this technique, a probability mechanism using randomization devices is commonly involved in answering sensitive questions. In order to evaluate the survey to the most accurate extent, self-protection (SP) is introduced to describe the responses of participants who give the evasive answer without taking the result of the randomization device into account. Warner (1965) and Greenberg et al. (1969) proposed the initial randomized-response techniques, which can be an effective survey method for finding such estimates because they preserve individual anonymity. Recently, many variants of the randomized-response technique have been proposed by Barabesi (2008), Cruyff et al. (2008), Gjestvang and Singh (2006), and Singh and Chen (2009).
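
For readers unfamiliar with randomized response, the sketch below works through the classical estimator for Warner's (1965) design, in which each respondent answers the sensitive statement with probability p and its complement otherwise, so that the probability of a "yes" is pπ + (1−p)(1−π). It does not model self-protection or the regression models of this paper; the function name and the example counts are hypothetical.

```python
def warner_estimate(n_yes, n, p):
    """Estimate the sensitive proportion pi and its variance under Warner's design.

    n_yes : number of "yes" answers observed
    n     : number of respondents
    p     : probability that the device selects the sensitive statement (p != 0.5)
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes the design uninformative")
    yes_rate = n_yes / n
    pi_hat = (yes_rate - (1 - p)) / (2 * p - 1)
    var_hat = yes_rate * (1 - yes_rate) / (n * (2 * p - 1) ** 2)
    return pi_hat, var_hat

# Hypothetical survey: 1000 respondents, 380 "yes" answers, device probability 0.7.
pi_hat, var_hat = warner_estimate(n_yes=380, n=1000, p=0.7)
print(f"estimated proportion: {pi_hat:.3f}, standard error: {var_hat ** 0.5:.3f}")
```

The factor (2p − 1)² in the denominator shows the price of privacy: the closer p is to 0.5, the better respondents are protected and the larger the variance of the estimate.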

A primary focus of this paper is the implementation of Bayesian methods for two types of Poisson regression models, Poisson and zero-inflated Poisson (ZIP) RR regression models, for RR sum score variables under the SP assumption. For ZIP RR regression, a latent variable is introduced to facilitate computation. Appropriate priors are placed on the parameters in the models, and Markov Chain Monte Carlo (MCMC) techniques are used to obtain approximate samples from the posterior distributions. For the purpose of model choice, we use the Deviance Information Criterion (DIC, Spiegelhalter et al., 2002), which was developed to compare complex hierarchical models.

RR data from a 2004 Dutch survey on non-compliance with social security regulations are used to demonstrate the proposed models. We estimate parameters and calculate the DIC values for all models proposed in the paper. It is found that the Poisson models are preferred to the ZIP model. Among the proposed models, the Bayesian Poisson RR regression model outperforms the other models in terms of DIC. A natural extension of the model to more flexible distributions, such as the generalized Poisson or generalized ZIP, can be a promising future study.

Jun Shao

Department of Statistics, University of Wisconsin

A Theory for Testing Hypotheses under Covariate-Adaptive Randomization

The covariate-adaptive randomization method, such as minimization, generalized minimization, or the covariate-adaptive biased coin, was proposed for clinical trials a long time ago, but very little theoretical work has been done on statistical inference after covariate-adaptive randomization. Because of the unavailability of a valid test procedure associated with covariate-adaptive randomization, practical users often apply test procedures developed for simple randomization, which is controversial since procedures valid under simple randomization may not be valid under covariate-adaptive randomization.

In this article, we provide some theoretical results for testing hypotheses after covariate-adaptive randomization. We show that one way to obtain a valid test procedure under covariate-adaptive randomization is to construct a test valid under any fixed treatment allocation conditionally on covariates, including the one used in covariate-adaptive randomization.

We also show that the simple two-sample $t$-test without using any covariate is conservative under covariate-adaptive biased coin randomization in terms of its type I error, and that a valid bootstrap $t$-test can be constructed. The power of several tests under covariate-adaptive biased coin randomization is examined theoretically as well as empirically. Our study provides guidance for applications and sheds light on further research in this area.

Karl Rohe, Sourav Chatterjee, Bin Yu

Department of Statistics, UC Berkeley

Spectral Clustering and the High-dimensional Stochastic Block Model

Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are one example. Communities or clusters of highly connected actors form an essential feature in the structure of several empirical networks. However, searching for clusters is algorithmically difficult because it is computationally intractable to search over all possible clusterings. Spectral clustering is one popular and computationally feasible method to discover communities.

Stochastic models are useful because they force us to think clearly about the randomness in the data in a precise and possibly familiar way. It is natural and important for us to study the ability of clustering algorithms to estimate the true clusters in a network model. This paper studies the performance of spectral clustering, a nonparametric method, on a parametric task of estimating the blocks in the Stochastic Block Model.

The Stochastic Block Model is a social network model with well defined communities; each node is a member of one community and the probability of an edge between two nodes depends only on their group memberships. This paper provides the first asymptotic clustering results that allow the number of blocks in the Stochastic Block Model to grow with the number of nodes. Similar to the asymptotic results on regression techniques that allow the number of predictors to grow with the number of nodes, allowing the number of blocks to grow makes the problem one of high-dimensional learning. This asymptotic regime is motivated by the empirical work of, which showed that in a diverse set of large networks (tens of thousands to millions of nodes), the size of the "best" clusters is not very large, around 100 nodes.

Under the high-dimensional Stochastic Block Model, we bound the number of nodes "misclustered" by spectral clustering. In a simple example, our results show that the proportion of nodes that are misclustered converges to zero, even when the number of blocks grows proportionally to $n^{1/3}$, where $n$ is the number of nodes. In this simple example, the probability of an edge between two nodes in the same block is greater than the probability of an edge between two nodes in separate blocks. However, this simple example is surprising: because of the high dimensionality, the proportion of all edges that connect two nodes in the same block converges to zero. The vast majority of edges connect nodes in separate blocks.

In order to study spectral clustering under the Stochastic Block Model, we first show that under the more general latent space model, the eigenvectors of the normalized graph Laplacian asymptotically converge to the eigenvectors of a "population" normalized graph Laplacian. Aside from the implication for spectral clustering, this provides insight into a graph visualization technique. Our method of studying the eigenvectors of random matrices is original.
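
To fix ideas, the following sketch simulates a small Stochastic Block Model and runs a basic spectral clustering on it. It is a toy illustration under stated assumptions rather than the procedure analysed in the paper: it uses the convention $L=D^{-1/2}AD^{-1/2}$ for the normalized Laplacian, keeps the eigenvectors whose eigenvalues are largest in absolute value, and clusters their rows with a plain Lloyd-type k-means started from a farthest-point initialization. The block sizes, edge probabilities and function names are all hypothetical.

```python
import numpy as np
from itertools import permutations

def sample_sbm(block_sizes, p_in, p_out, rng):
    """Symmetric 0/1 adjacency matrix with edge probability p_in within blocks, p_out across."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)   # no self-loops
    return (upper | upper.T).astype(float), labels

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = sq_dist.argmin(axis=1)
        new_centers = np.array([
            points[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign

def spectral_clustering(adjacency, k):
    """Cluster the rows of the k leading eigenvectors of D^{-1/2} A D^{-1/2}."""
    degree = adjacency.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))     # guard against isolated nodes
    laplacian = inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    leading = np.argsort(np.abs(eigvals))[-k:]
    return kmeans(eigvecs[:, leading], k)

def misclustering_rate(truth, estimate, k):
    """Fraction of mislabelled nodes, minimized over relabellings of the clusters."""
    return min(np.mean(np.array(perm)[estimate] != truth) for perm in permutations(range(k)))

rng = np.random.default_rng(0)
adjacency, truth = sample_sbm([60, 60, 60], p_in=0.6, p_out=0.05, rng=rng)
estimate = spectral_clustering(adjacency, k=3)
rate = misclustering_rate(truth, estimate, k=3)
print(f"misclustered proportion: {rate:.3f}")
```

The minimization over label permutations is needed because cluster labels are arbitrary; it is feasible here only because the number of blocks is tiny, and it would have to be replaced by a matching algorithm in the growing-number-of-blocks regime discussed in the text.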

Lianming Wang1, Jianguo Sun2, Xingwei Tong3

1. Department of Statistics, University of South Carolina; 2. Department of Statistics, University of Missouri; 3. School of Mathematical Sciences, Beijing Normal University

The Additive Hazard Model for Informatively Interval-censored Failure Time Data

Interval-censored failure time data usually refer to data in which the failure time of interest is observed only to belong to an interval instead of being known exactly (Kalbfleisch and Prentice, 2002; Sun, 2006). Such data arise naturally in medical follow-up studies where, for example, the event of interest (failure) is not observed directly but only detected by some laboratory tests. Thus, the failure time is known only to lie between two monitoring times: the last monitoring time at which the event has not occurred and the first monitoring time at which the event has already occurred. Examples of interval-censored data can be found in Sun (2006), among others.

For regression analysis of interval-censored data, a few methods have been proposed, and Sun (2006) gives a relatively complete review of the available methods. However, many of them are quite complicated to implement because of the need to estimate the baseline cumulative hazard function. More importantly, most of them assume that the mechanism that generates the observed intervals for the failure time of interest is independent of the failure time, which may not be realistic. In this paper, we consider situations where the observed interval, or the monitoring times for subjects in a study that yield the observed intervals, may depend on the failure time of interest given covariates.

Let $T$ denote the failure time of interest and suppose that there exist two monitoring variables $U$ and $V$ characterizing the monitoring process; they are observable and may be related to the failure time $T$. Also suppose that there exists a possibly time-dependent $p$-dimensional covariate vector denoted by $Z$, which is completely observed. Define δ1=I(T

To model the covariate effects, we will assume that there exists an unobservable random process $b(t)$ with mean zero that characterizes the dependency between the monitoring times and the failure time, and that, given the covariate process and the process $b(t)$, the monitoring times $U$ and $V$ and the failure time $T$ are independent. The same idea has been used by, for example, Zhang, Sun and Sun (2005) for current status data. Furthermore, we assume that the hazard functions of $T$, $U$ and $V$ have the forms, respectively,

and

We remark that the models given above are quite general, as the law of the random process $b(t)$ is totally unspecified. Also it is easy to show that the model for $T$ is essentially an additive hazard model, since the survival function can be derived as

Suppose that the main goal is to make inference about covariate effects. For this, an estimating equation-based procedure is proposed and both finite-sample and asymptotic properties of the proposed estimates are established. In particular, the numerical studies suggest that the proposed inference procedure works well for practical situations, and an illustrative example is provided. A major advantage of the proposed approach is that it does not require estimation of all nuisance baseline hazard functions.

Mingjin Wang1, Qiwei Yao1,2

1. Guanghua School of Management, Peking University; 2. Department of Statistics, London School of Economics

Modelling High-Dimensional Volatilities: Factors, CUCs, and Fast Algorithms

where $\mathcal{F}_t=\sigma(y_t,y_{t-1},y_{t-2},\dots)$ represents the available information up to time $t$. The goal is to build a dynamic model for the $p\times p$ matrix process $\Sigma_y(t)$. Note that $\Sigma_y(t)$ is not directly observable. Various statistical models have been proposed to model the matrix volatility function $\Sigma_y(t)$, aiming to be practically effective and statistically efficient. When $p$ is large, dimension reduction techniques are pertinent in order to avoid overparametrization. We adopt in this paper a two-step modeling strategy which is particularly relevant when $p$ is large (such as in the order of several hundreds to a few thousands).

First, when $p$ is very large, it is plausible that the dynamic structure of $y_t$ is driven by a lower-dimensional process, i.e. conceptually we may view $y_t$ as consisting of two parts: a part with dynamic structure and a static part. This leads to the decomposition

$$y_t=Fx_t+e_t,$$

where $F$ is a $p\times d$ constant matrix, $x_t$ is a $d\times 1$ process with $E(x_t|\mathcal{F}_{t-1})=0$, and $e_t$ is a homogeneous process satisfying the condition

This is a factor model for time series: $x_t$ is a factor process, $F$ is a factor loading matrix, and $d$ is smaller or much smaller than $p$. With the additional condition $\mathrm{Cov}(x_t,e_t)=0$, it is easy to see that

In the second step, we model the conditional variance $\Sigma_x(t)$ via the CUC of Fan, Wang and Yao (2008). To this end, we assume that

where $A$ is a $d\times d$ constant matrix, and $\mathrm{Var}(z_t|\mathcal{F}_{t-1})$ is a diagonal matrix for any $t$. Therefore the components of $z_t$ are called conditionally uncorrelated components (CUC). With a proposed new algorithm, finding the CUCs is effectively equivalent to an eigenanalysis for a $d\times d$ non-negative definite matrix.

Pengsheng Ji1, Jiashun Jin2

1. Department of Statistical Science, Cornell University; 2. Department of Statistics, Carnegie Mellon University

UPS Delivers Optimal Phase Diagram in High Dimensional Variable Selection

We consider a linear regression model $Y=X\beta+z$, $z\sim N(0,I_n)$, $X=X_{n,p}$, where both $p$ and $n$ are large but $p>n$. The vector $\beta$ is unknown but is sparse in the sense that only a small proportion of its coordinates is nonzero, and we are interested in identifying these nonzero ones. We model the coordinates of $\beta$ as samples from a two-component mixture $(1-\epsilon)\nu_0+\epsilon\pi$, and the rows of $X$ as samples from $N(0,\frac{1}{n}\Omega)$, where $\nu_0$ is the point mass at 0, $\pi$ is a distribution, and $\Omega$ is a $p$ by $p$ correlation matrix which is unknown but is presumably sparse.

We propose a two-stage variable selection procedure which we call the UPS. This is a Screen and Clean procedure (Wasserman and Roeder [2009]), in which we screen with Univariate thresholding and clean with the Penalized MLE. In many situations, the UPS possesses two important properties: Sure Screening and Separable After Screening (SAS). These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. As a result, the UPS is effective both in theory and in computation.

We measure the performance of a variable selection procedure by the Hamming distance, and use an asymptotic framework where $p\to\infty$ and $(\epsilon,\pi,n,\Omega)$ depend on $p$. We find that in many situations, the UPS achieves the optimal rate of convergence. We also find that in the $(\epsilon_n,\pi_n)$ space, there is a three-phase diagram shared by many choices of $\Omega$. In the first phase, it is possible to recover all signals. In the second phase, exact recovery is impossible, but it is possible to recover most of the signals. In the third phase, successful variable selection is impossible. The UPS partitions the phase space in the same way that the optimal procedures do, and recovers most of the signals as long as successful variable selection is possible.

The lasso and the subset selection (also known as the $L_1$ and $L_0$ penalization methods, respectively) are well-known approaches to variable selection. However, somewhat surprisingly, there are regions in the phase space where neither the lasso nor the subset selection is rate optimal, even for very simple $\Omega$. The lasso is non-optimal because it is too loose in filtering out fake signals (i.e. noise that is highly correlated with a signal), and the subset selection is non-optimal because it tends to kill one or more signals when signals appear in pairs, triplets, etc.

Peter Radchenko, Gareth M. James

Marshall School of Business, University of Southern California

Variable Selection using Adaptive Non-linear Interaction Structures in High Dimensions

Recently considerable attention has focused on fitting the traditional linear regression model, $Y_i=\beta_0+\sum_{j=1}^{p}\beta_jX_{ij}+\epsilon_i$, when the number of predictors, $p$, is large relative to the number of observations, $n$. In this situation there are many methods that outperform ordinary least squares (Frank and Friedman, 1993). One common approach is to assume that the true number of regression coefficients, i.e. the number of nonzero $\beta_j$'s, is small, in which case estimation results can be improved by performing some form of variable selection. An important class of variable selection methods utilizes penalized regression. The most well known of these procedures is the Lasso (Tibshirani, 1996), which imposes an $L_1$ penalty on the coefficients. Numerous alternatives and extensions have been suggested.

Penalized regression methods have now been extensively studied in the linear situation. In this work we extend the model to the nonlinear and nonadditive framework, using the standard two-way interaction model,

While (1) is a well known model, fitting it involves estimating on the order of $p^2$ terms, most of which are two-variate functions. Thus fitting this model presents a considerable computational and statistical challenge for large $p$. There has been little research on fitting (1) in the large $p$ situation.

A simple approach to fit (1) would be to use a penalty function of the form

Minimizing the usual sum of squares plus the penalty (2) has the effect of shrinking most of the main effect and interaction terms to zero, in a similar manner to that of the Lasso in the linear setting. However, (2) has some significant drawbacks. First, it is inefficient, because it treats interactions and main effects similarly, when in fact the entry of an interaction into the model generally adds more predictors than the entry of a main effect, and is also harder to interpret. Second, for sufficiently large $p$ it is computationally prohibitive to naively estimate $p^2$ different terms.

Instead we introduce a novel convex penalty function,

We call the method resulting from using (3) and minimizing the penalized sum of squares "Variable selection using Adaptive Non-linear Interaction Structures in High dimensions", or VANISH for short. The penalty function motivates a computationally efficient block coordinate descent algorithm that handles models involving thousands of interaction terms. We also show that the VANISH algorithm resulting from (3) has several desirable properties. In particular, it turns out that $\lambda_1$ can be interpreted as the weight of the penalty for each additional predictor included in the model, while $\lambda_2$ corresponds to an additional penalty on the interaction terms for the reduction in interpretability of a non-additive model. In addition, penalty function (3) imposes the heredity constraint and has the advantage of producing a convex optimization criterion. One consequence of the algorithm is that interaction terms corresponding to predictors that have already been added enter the model more easily. Thus, it reduces the false positive rate among interaction terms.

Our theoretical results provide conditions under which VANISH selects the correct model with probability tending to one as $n$ and $p$ tend to infinity. Further, these conditions suggest that VANISH should outperform other methods when the true interaction structure is sufficiently sparse. We present a number of detailed simulation results for both linear and non-linear models. These simulations involve up to 5000 interaction terms and demonstrate that VANISH is computationally efficient for moderate-scale data sets and has better estimation accuracy than competing methods. VANISH is also competitive with purely additive methods even when the true structure is additive. Finally, we compare VANISH to other possible approaches on the Boston housing data. VANISH successfully eliminates noise interactions and main effects. It also produces superior results, in comparison to other methods, using randomly sampled training and test sets.

Yingying Fan

Information and Operations Management Department, University of Southern California

Variable Selection in Linear Mixed Effects Models

This paper is concerned with the selection and estimation of fixed and random effects in linear mixed effects models. We propose a class of nonconcave penalized profile likelihood methods for selecting and estimating significant fixed effects parameters simultaneously, in the setting where the number of predictors is allowed to grow exponentially with the sample size. To study the sampling properties of the proposed procedure, we establish a new theoretical framework that is distinct from the existing ones (Fan and Li, 2001). We show that the proposed procedure enjoys model selection consistency. We further propose a group variable selection strategy to simultaneously select and estimate the significant random effects. The resulting random effects estimator is compared with the oracle-assisted Bayes estimator. We prove that, with probability tending to one, the proposed procedure identifies all true random effects and, furthermore, that the resulting estimates are close to the oracle-assisted Bayes estimates for the selected random effects. In the random effects selection and estimation, the dimensionality is also allowed to increase exponentially with the sample size. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. We further illustrate the proposed procedures via a real data example.

Yu Zhang1, Jun S. Liu2

(1. Department of Statistics, The Pennsylvania State University; 2. Department of Statistics, Harvard University)

Fast and Accurate False Positive Control in Genome-wide Association Studies

Genome-wide association studies routinely test hundreds of thousands or millions of genetic markers simultaneously. The p-values of individual tests must be adjusted to reduce false positive findings, an issue known as the multiple comparison problem. With advanced high-throughput genotyping technology, the number of tests is expected to increase further. Rare genetic variants are now detectable and will be included in future genome-wide association studies. In addition to single-marker tests, multi-marker tests will also increase the number of comparisons substantially. Current practice uses either Bonferroni corrections or permutations to evaluate the genome-wide significance of associations. The Bonferroni method is overly conservative because of the strong dependence between genetic markers, which is particularly problematic when testing high-density markers and markers in overlapping windows (such as when testing rare variants). Bonferroni-corrected results will also significantly affect false discovery rate (FDR) procedures. The permutation test, on the other hand, is computationally too intensive for large studies involving millions of comparisons or many thousands of individuals. Advanced methods for adjusting multiple correlated comparisons are therefore needed for future genome-wide association studies.

We propose a new framework to evaluate the genome-wide significance of disease associations, which is statistically rigorous and at the same time flexible enough to be adapted to FDR procedures and conditional tests. Our approach is motivated by the Poisson heuristic. When testing independent hypotheses, the number $W_T$ of rejected true null hypotheses at a threshold $T$ follows approximately a Poisson distribution $\mathrm{Po}(\lambda)$, and the family-wise significance at $T$ is $\Pr(W_T > 0) \approx 1 - e^{-\lambda}$. When testing associations on markers, the tests are positively correlated due to linkage disequilibrium (LD, a genetic term for correlation). We therefore want to compensate for the LD effects so that the Poisson formula applies. Because LD decays over distance, the correlations between tests are mostly local, which makes our task much simpler. We define a clump of markers, centered at marker $i$, as all markers within its neighborhood, e.g., markers with indices $[i-a, i-a+1, \ldots, i, \ldots, i+b]$. The neighborhood size can be determined from data. We then define a new test statistic of the clump at marker $i$ and at threshold $T$ as

where $t_i$ denotes the original test statistic of disease association at marker $i$, and $I(e)$ denotes an indicator of event $e$. The new statistic $r_i(T)$ has the following properties: (1) $r_i(T)$ takes binary values, where $r_i(T) = 1$ if and only if marker $i$ has the largest test statistic within its neighborhood and is above $T$; and (2) the $r_i(T)$ defined at all markers are locally negatively correlated, i.e., there is at most one $r_i(T) = 1$ within a local region. Using the new statistic, we compute a total number of rejections $W_r(T)$ by summing $r_i(T)$ over all markers. If none of the markers is associated with the disease, $W_r(T)$ follows approximately a Poisson distribution $\mathrm{Po}(\lambda_r(T))$, where the Poisson mean $\lambda_r(T)$ is a function of $T$, and the family-wise significance of $T$ is approximately $1 - e^{-\lambda_r(T)}$.

Using the Poisson approximation, the task of adjusting p-values is reduced to estimating $\lambda_r(T)$. Given that the number of disease-associated markers in a genome-wide study is typically much smaller than the total number of markers tested, we can approximate $\lambda_r(T)$ by the total number of tests multiplied by the nominal p-value of an individual test of $r_i(T)$. The computation time of our method is therefore essentially unrelated to the number of markers; e.g., adjusting p-values for 5 million markers is almost as fast as adjusting for 5,000 markers. This is a critical distinction between existing methods and ours. There is no closed form for $\lambda_r(T)$, so we compute it numerically. We developed efficient importance sampling algorithms to estimate $\lambda_r(T)$, such that the computation time is unrelated to the sample size and the threshold $T$. Analytically, we use the Chen-Stein method to obtain an error bound for the Poisson approximation, which is typically very small (by several orders of magnitude) relative to the family-wise significance of $T$.
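The clump statistic and the Poisson heuristic can be illustrated with a minimal sketch. It assumes only what property (1) states, namely that $r_i(T) = 1$ exactly when marker $i$ exceeds $T$ and is the largest statistic in its neighborhood. The function names, the tie-breaking rule, the example statistics and the nominal p-value below are hypothetical, and the sketch does not reproduce the authors' importance sampling step.

```python
import math

def clump_indicators(t, T, a, b):
    """Compute r_i(T) for every marker i.

    Assumption taken from property (1): r_i(T) = 1 exactly when t[i] > T
    and t[i] is the largest statistic among markers i-a, ..., i+b.
    Ties go to the leftmost marker (our choice, not from the source).
    """
    n = len(t)
    r = []
    for i in range(n):
        lo, hi = max(0, i - a), min(n - 1, i + b)
        window = t[lo:hi + 1]
        is_peak = t[i] > T and window.index(max(window)) == i - lo
        r.append(1 if is_peak else 0)
    return r

def poisson_mean(n_tests, p_single):
    """Approximate lambda_r(T) as (number of tests) x (nominal p-value of one r_i(T))."""
    return n_tests * p_single

def familywise_significance(lam):
    """Poisson heuristic: Pr(W > 0) is approximately 1 - exp(-lambda)."""
    return 1.0 - math.exp(-lam)

# Hypothetical statistics for ten markers, threshold T = 4, clump i-2 .. i+2.
t = [0.3, 2.1, 4.8, 4.5, 1.0, 0.2, 5.2, 5.0, 0.7, 3.9]
r = clump_indicators(t, T=4.0, a=2, b=2)
W_r = sum(r)  # total number of clump-level rejections

# Hypothetical nominal p-value of a single clump test.
lam = poisson_mean(n_tests=1_000_000, p_single=5e-8)
print(r, W_r, familywise_significance(lam))
```

Neighboring markers that both exceed the threshold contribute a single rejection, which is what makes the Poisson count plausible under local correlation.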

We evaluated the accuracy of our method using 1,000,000 simulated high-density markers with strong correlations. Permutation p-values were used as the benchmark. We observed that our method tracked the permutation p-values accurately over the wide range of thresholds we tested. For comparison, we also computed the Bonferroni-adjusted p-values, which were overly conservative by a factor of at least 1.6. We further observed that the sample size has a substantial impact on the family-wise significance, because the assumed asymptotic distribution of the test statistics fails for small samples and rare alleles. We therefore developed a truncation technique to correct for the sample size effects, and the results after correction were much more accurate. In terms of computation time, our method took a few seconds or minutes to approximate the significance of a wide range of thresholds simultaneously, as opposed to the several hours or days required by existing methods and the permutation test. The time complexity depends mainly on the LD decay: stronger LD requires larger clumps in the definition of $r_i(T)$, and hence a longer time to compute $\lambda_r(T)$.

Our method can further be used to estimate the FDR, which is defined as $E(n_{FP}(T)/n_P(T))$ and in practice estimated by $E(n_{FP}(T))/n_P(T)$. Here, $n_{FP}(T)$ denotes the number of false positives and $n_P(T)$ denotes the number of positives (rejections). Using our method, $E(n_{FP}(T)) = \lambda_r(T)$, and $n_P(T)$ is the total number of significant associations observed in the data. To control the FDR in genome-wide association studies, we follow conventional step-down FDR procedures, but with one additional important step that removes the LD effect of each detected disease marker on its neighboring markers. From a biologist’s perspective, the FDR should measure the expected proportion of falsely discovered disease loci among all reported loci. Without removing LD effects, the conventional FDR applied to individual tests only measures the expected proportion of falsely discovered markers among all reported markers, which is very misleading when interpreted biologically.

Given that our method uses only local information to adjust p-values, it can be generalized to conditional tests. An interesting strategy is to use lower thresholds in regions that are more disease-susceptible, and higher thresholds elsewhere. A huge amount of data is currently available that is (indirectly) predictive of the disease loci, such as susceptible genes detected in previous studies, sequences under conservation and selection, regulatory factors, and gene annotations. We can incorporate all available auxiliary information to calculate “disease-potential” scores along the genome. We then use lower thresholds to detect associations in “high-potential” regions, and higher thresholds in “low-potential” regions. The key is to determine the local thresholds such that we gain power in association mapping but maintain a desired false positive level. To the best of our knowledge, our approach is currently the only method that can tackle this problem in the context of multiple correlated comparisons.

Zhigang Li1, Ian W. McKeague2

(1. Department of Community and Family Medicine, Dartmouth Medical School; 2. Department of Biostatistics, Columbia University)

Power under local alternatives for generalized estimating equations

Power and sample size formulae play an important role in the design of experimental and observational studies. There is an extensive literature on this topic, especially for hypothesis tests based on the method of generalized estimating equations (GEE), as introduced by Liang and Zeger (1986) for handling correlated longitudinal or clustered data. In this setting, however, all existing sample size formulae have been derived using the asymptotic behavior of test statistics under fixed alternatives. In particular, fixed alternatives were used by Liu and Liang (1997) [henceforth LL] to derive sample size formulae for quasi-score statistics, and by Shih (1997) [henceforth Shih] for Wald test statistics; see also Pan (2001), Liu, Shih and Gehan (2002), Jung and Ahn (2003), Jung and Ahn (2005), and Kim, Williamson and Lyles (2005).

We develop a potentially more accurate approach to power and sample size calculations in the GEE setting using local asymptotic theory (Le Cam, 1960). That is, rather than directly calculating the power of a test of $H_0: \Psi = \Psi_0$ vs. a fixed alternative $H_1: \Psi = \Psi_1$, where $\Psi$ is a $p$-vector representing the parameters of interest, we first calculate the asymptotic power of the test under a sequence of local alternatives $H_{1m}: \Psi = \Psi_0 + h/\sqrt{m}$, where $m$ is the sample size (the number of clusters) and $h$ is a fixed $p$-vector (the local parameter). The local asymptotic approach is considered standard in settings that do not involve correlated data (van der Vaart, 1998), but as far as we know it has not previously been attempted in the GEE setting. The justification for our approach is provided by a result showing that, in general GEE settings, the Wald and quasi-score test statistics are asymptotically noncentral chi-squared under the sequence of local alternatives. The proof of this result also provides a suitable small-sample approximation to the asymptotic power function involving weighted averages of the gradients of the terms in the estimating equation. Under marginal models, we then express these averages as expectations under the distribution of the covariates. This leads to explicit sample size formulae by inverting the small-sample approximation to the asymptotic power function at the local parameter value $h = \sqrt{m}(\Psi_1 - \Psi_0)$, and solving for $m$ in the usual way.
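The inversion step can be sketched for the simplest case of a scalar parameter ($p = 1$). A noncentral chi-squared variable with one degree of freedom and noncentrality $\delta^2$ has the distribution of $(Z + \delta)^2$ with $Z$ standard normal, so the power can be written with the normal distribution alone. The asymptotic variance `avar` of the estimator is an assumed input here, and the function names are hypothetical; this is a generic local-alternative calculation, not the authors' GEE-specific formula.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()

def local_power(h, avar, alpha=0.05):
    """Asymptotic power of a two-sided 1-df Wald test at local parameter h.

    The statistic is treated as noncentral chi-squared with 1 degree of
    freedom and noncentrality h**2 / avar, i.e. (Z + delta)**2 with
    delta = h / sqrt(avar). `avar` is an assumed asymptotic variance.
    """
    delta = h / sqrt(avar)
    z = _N.inv_cdf(1 - alpha / 2)
    return _N.cdf(delta - z) + _N.cdf(-delta - z)

def sample_size(psi0, psi1, avar, power=0.9, alpha=0.05):
    """Smallest m whose power at h = sqrt(m) * (psi1 - psi0) reaches `power`."""
    m = 1
    while local_power(sqrt(m) * (psi1 - psi0), avar, alpha) < power:
        m += 1
    return m

# Hypothetical design: null value 0, alternative 0.5, unit asymptotic variance.
print(sample_size(0.0, 0.5, avar=1.0))
```

At $h = 0$ the function returns the nominal level $\alpha$, which is a quick sanity check on the noncentral chi-squared representation.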

Our approach has several advantages over existing approaches. For the quasi-score test, all existing methods of power calculation depend on knowing the limiting value of the nuisance parameter estimator under $H_1$; such estimators are inconsistent under fixed alternatives, cf. Self and Mauritsen (1988). This can involve the additional burden of having to numerically find the root of a nonlinear equation; our approach, on the other hand, does not require this extra step because the nuisance parameter estimator is consistent under the local alternatives. LL restricted attention to the case of discrete covariates, but for continuous covariates their approach requires an initial (ad hoc) discretization of the covariates, and it is not clear how this approximation would affect their results, cf. Shieh (2000).

Detailed comparisons of our results with those of LL and Shih are made in the setting of marginal models with exchangeable correlation structure. Our results agree with both LL and Shih for continuous outcomes and cluster-level exposure. For studies with binary outcomes, however, our formulae differ from Shih’s. In two-sample comparison problems with binary outcomes, we find (using a simulation study) that Shih’s approach can overestimate the sample size by 10% to obtain 90% power, whereas our approach provides an accuracy of around 2%. For a cohort study of pairs of siblings in which one sibling is exposed and the other unexposed, and the outcome is binary, LL’s approach does not lead to an explicit sample size formula when the hypothesized null value is nonzero, whereas our approach produces a closed-form expression, along with better accuracy.

Domestic Papers

Gao Wei

School of Statistics, Xi’an University of Finance & Economics

Generalized Directed Acyclic Graph Models for the Identification of Nonlinear VAR Models

Graphical models have become an important tool for the statistical analysis of multivariate data sets. Recently, graphical models have been introduced to model dependence structures among multivariate time series. One kind of graphical model is used to study the relationships between the components of a multivariate time series, where each vertex is associated with a complete time series. Another kind is applied to identify structural vector autoregressive models, where each variable at a specific time is represented by a separate vertex. Up to now, graphical modeling of time series has been concerned mainly with linear time series. Few studies deal with nonlinear time series.

In this paper, a generalized directed acyclic graph (DAG) for identifying nonlinear vector autoregressive models is proposed. The vertices in the graph denote random variables at different times, and the edges denote causal dependence among the corresponding variables. The first step is to construct the sample conditional independence graph (CIG). A conditional mutual information statistic is presented to test the conditional independence between the variables. The statistic is estimated by the correlation integral method and its significance is determined by a permutation procedure. In addition, a statistic based on generalized and linear conditional mutual information is proposed to test the linearity of the relations between the variables. The significance of this test statistic is determined by a bootstrap method. The bootstrap time series, which preserves all the linear properties of the original series, is generated from surrogate data rather than from an estimated VAR model, which avoids the computational complexity and estimation errors of fitting a VAR model.

The next step is to determine which DAG representations are consistent with the undirected CIG. The arrow of time is attached to undirected edges from the past to the present. A generalized likelihood ratio statistic is proposed to determine the direction of the dependence between contemporaneous variables. A conditional bootstrap method based on nonparametric maximum likelihood estimation of the nonlinear function is used to approximate the distribution of the test statistic.

The method is applied to simulated time series that exhibit linear or nonlinear lag dependence and instantaneous dependence. It is shown that this method may lead to an exact characterization of the interactions in vector autoregressive models.

Hong Chang

School of Economics, Xiamen University

Analysis on the Survey of the Rural–Urban Education Difference

1. Introduction

Education is important to a nation’s economic development. According to The China Youth Daily, among 2952 respondents in a survey, 56.5% think that "more and more, education is not fair". Does the unfairness really exist? Where is the weak link? How can it be improved? To address these questions, a survey was carried out among the labor force under the age of 45. Respondents’ basic information, educational situation, education expenses and views on education expenses were collected through a questionnaire.

2. Data and methods

2.1 Data collection method. The survey data were shared data collected jointly by several collaborators. The survey targeted the employed population aged 18 to 45 in the nation’s large cities. 109 valid questionnaires were collected, including Beijing (11 samples, 10%), Shanghai (14 samples, 13%), Xiamen (28 samples, 26%), Nantong (10 samples, 9%), Wenzhou (10 samples, 9%), Binzhou (15 samples, 14%), Changshu (10 samples, 9%), and other cities (11 samples, 10%).

2.2 Data processing and display. This survey was conducted in the fourth quarter of 2009. Because inflation raises educational expenditure over the years, the educational expenditures of each year were adjusted by that year’s CPI to make the data comparable. Following the nation’s statistical yearbook, 1978 was taken as the base year. Because age was recorded in intervals in the questionnaire, the midpoint of each interval was used as the respondent’s age. Some special groups were handled separately: “under the age of 18” was counted as “age of 17” (in China the legal employment age is 16, and 17 is the midpoint of 16 and 18), and “over the age of 35” was counted as “age of 40” (the survey targeted people under the age of 45).
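The two preprocessing steps described above can be sketched as follows. The function names are hypothetical, and the CPI index value is a made-up placeholder rather than a figure from the statistical yearbook; only the base year (1978 = 100) and the two special age codings come from the text.

```python
def to_base_year_prices(expense, cpi_year, cpi_base=100.0):
    """Convert a nominal expense to base-year prices using CPI index values."""
    return expense * cpi_base / cpi_year

def interval_midpoint(low, high):
    """Age assigned to a respondent who reported the interval [low, high]."""
    return (low + high) / 2

# Special open-ended groups, coded as described in the text.
age_under_18 = interval_midpoint(16, 18)  # legal employment age 16 up to 18
age_over_35 = interval_midpoint(35, 45)   # survey targeted people under 45

# Hypothetical CPI index (1978 = 100); a placeholder, not yearbook data.
cpi_placeholder = 400.0
real_expense = to_base_year_prices(2000.0, cpi_placeholder)
print(age_under_18, age_over_35, real_expense)
```

Expressing every year's expenditure in 1978 prices is what allows expenses paid in different years to be compared on one scale.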

3. Statistical description and analysis

3.1 Methods and tools. A scatter diagram was used to display the relationship between personal education expenses and monthly income. Differences in education level and monthly income between urban and rural areas were tested with the Wilcoxon test. The Spearman test was applied to analyze the relationship between personal education expenses and monthly income. Respondents’ opinions on education expenses were also summarized.

3.2 Findings

3.2.1 Relationship between education level, education expenses and future income. The scatter diagram shows no significant linear relationship between income and education expense. The Spearman test gives a p-value of 0.098. Thus, at the 5% significance level (95% confidence), there is no statistically significant positive correlation between personal education expenses and future income. This supports the reading of the scatter diagram: people with higher education expenses do not necessarily have higher income.
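For readers unfamiliar with the Spearman coefficient, a minimal sketch of how it is computed is given below: both variables are replaced by their ranks (tied values share the average rank) and the ordinary correlation of the ranks is taken. The function names and the five data pairs are hypothetical and are not the survey data; the p-value calculation is omitted.

```python
from math import sqrt

def average_ranks(values):
    """Rank the values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)

# Hypothetical education expenses and monthly incomes for five respondents.
expense = [1.0, 2.0, 3.0, 4.0, 5.0]
income = [2.0, 1.0, 4.0, 3.0, 5.0]
print(spearman_rho(expense, income))
```

Because only ranks are used, the coefficient measures monotone association and is not affected by outliers in the income scale.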

3.2.2 The significant difference in education level between urban and rural areas leads to the difference in income level. The Wilcoxon test shows a significant difference in education level between urban and rural respondents (p = 0.0007). The null hypothesis is rejected: $me_x > me_y$, with $me_x = 5$ and $me_y = 4$. The survey result shows that the education level of the labor force from rural areas clearly falls behind that of the urban labor force. The Wilcoxon test also shows a significant difference in monthly income between urban and rural respondents (p = 0.009). The null hypothesis is rejected, and the labor force from urban areas has higher income than that from rural areas.

3.2.3 People are generally dissatisfied with the educational expenditure. 52% of respondents think that “education expense is too high”. Half of them think that “education expense is rational”. An important result is that both urban and rural respondents think that “the government’s educational expenditure input is too low and it should be increased”; this ratio is above 60%. These two sets of data show that the public is under obvious pressure in paying education expenses, and that the government should increase its educational investment.

4. Conclusions

4.1 Analysis of causes. Income is not directly proportional to educational level because of the gap between what the job market demands and what academic education provides. The data analysis shows that the rural population accounts for 2/3 of the country’s population, but its average education level is clearly lower than that of the urban population and generally reaches only the high school level. Unfair allocation of government educational expenditure between urban and rural areas leads to the low education level in rural areas.

4.2 Suggestions on narrowing the education gap between urban and rural areas. (1) Increase education investment and reduce families’ education burden; (2) direct more educational expenditure to rural areas to narrow the education gap between urban and rural areas; (3) reform the enrollment system and the tuition system; (4) introduce more incentives to improve education quality in rural areas.

He Li

China Research Institute on Science Popularization

Research on Population Science Literacy in Xiaoguan Communities, Chaoyang District, Beijing

To understand the state of science literacy among the population of the Xiaoguan communities, the subdistrict office launched a “population science literacy survey” in five communities of the area in May 2009. The 2007 China citizen science literacy questionnaire was used. A total of 2,200 questionnaires were issued and 2,100 were returned, a recovery rate of 95.45%; all 2,100 returned questionnaires were valid. The 2,200 questionnaires were allocated according to each community’s population density and economic condition, and the allocation ratio was also set by respondents’ occupation to ensure comparability.

Discussion

The survey also reveals the characteristics of science literacy among the Xiaoguan community population and indicates which aspects of science literacy matter most.

1. Understanding and knowledge of basic science are at a low level. The rate of correct understanding of scientific disciplines was not high (20～90%). The role of science, especially the relationship between science and society, is not clearly understood. Among the various forms of superstition, belief in fortune telling is the most common, at 20.90%. Second, there is an imbalance in scientific understanding, with significant gender differences: the proportion of male respondents who do not believe in any kind of superstition is higher than that of female respondents, which is cause for concern. Improving the female population’s understanding of the relationship between science and society should be a focus of future science popularization in the communities.

2. On understanding of scientific viewpoints, the proportions of both correct and incorrect answers were above the national average, while the proportion of “do not know” answers was below the national level. Improving understanding of scientific ideas and reducing the error rate is therefore also one of the tasks of science popularization.

3. Understanding of the scientific method is limited. Of the three test questions on the scientific method, the proportion of respondents who answered two correctly was below 50%, at approximately 28.30%.

4. Less than 10% of the community population participated in public affairs related to science and technology.

5. Across the four components of science literacy surveyed (scientific terms, scientific viewpoints, scientific methods, and the relationship between science and society), the level of the Xiaoguan subdistrict community population was slightly higher than the national average. However, the national level is itself low, with a large gap relative to developed countries, so there is much room to raise science literacy.

6. Statistical analysis shows that, across all test questions, the proportion of correct answers increased with the level of education, and respondents with an undergraduate education had a significantly higher proportion of correct answers than any other group.

Proposals

1. Enhance the transparency of government decision-making on science and technology. Major scientific and technological decisions that affect people’s livelihood should not be left only to government departments and scientists; the government should involve the public more and allow the community population to participate in public affairs related to science and technology. This process should also serve to popularize scientific and technological knowledge and to improve science literacy.

2. Strengthen community science education. The community is the “cell” of the city. With social development and reform, large numbers of people have changed from members of work “units” into “social persons”, and the social and service functions divested by enterprises, and those the government should bear, now need to be carried by the community. The direct purpose of community science education is to help the public understand science through non-interventionist, non-compulsory education, so that community residents can take part in activities, study and master scientific knowledge, and learn to distinguish science from superstition and civilization from ignorance.

In carrying out science education, the community needs to develop high-quality science volunteers who are active in every part of the community and carry out a variety of popular programs, such as anti-cult publicity, earthquake relief and other activities. Community science volunteers are the main force of science popularization in the community: in popularizing scientific knowledge, promoting scientific methods, disseminating scientific thought, and upholding the spirit of science against superstition and pseudoscience, they play an irreplaceable role in society.

Ping-Feng Xu1, Jianhua Guo2 and Xuming He3

1. School of Mathematics and Statistics, Northeast Normal University; 2. Key Laboratory for Applied Statistics of MOE and School of Mathematics and Statistics, Northeast Normal University; 3. Department of Statistics, University of Illinois at

An Improved Iterative Proportional Scaling Procedure for Gaussian Graphical Models

The maximum likelihood estimation of Gaussian graphical models is often carried out by the iterative proportional scaling (IPS) procedure. In this paper, we propose an improvement to the IPS procedure by using local computation and by sharing computations on a junction tree $T$. The proposed procedure, called IIPS for short, adjusts node by node the marginals of the cliques of the underlying graph contained in the nodes of $T$, and sends messages between two adjacent nodes of $T$ by an exchange operation for the proportional scaling step. We show, through complexity calculations and empirical examples, that the proposed IIPS procedure works more efficiently than the conventional IPS procedure for large Gaussian graphical models.

Ren Yanyan1, Huaxiaoan2, Wuhe1

1. School of Economics, Shandong University; 2. Qilu Bank

Local Government Investment, Limited Access and Economic Growth: Evidence from Cross-sectional Correlated Panel Data

As a form of institutional supply, the limited access barriers that local governments enact through their administrative power not only protect the local economy and employment but also offer important socioeconomic support for local officials’ performance competition, since officials receive a favorable assessment in that competition only when local economic and employment outcomes are outstanding (Li’an Zhou (2004)). Moreover, since the current Chinese public finance system bears significant federal characteristics, the limited access mode of local government institutional supply helps to avoid the sharing of rents. The reason is that if a province opens its territory early, its economic rents may be shared by other economic units that enter, whereas the local government, as the institutional supplier, wants to hold the rents exclusively. Limited access is therefore tightly correlated with natural government status (Douglass C. North (2006)). Investment, as a behavioral factor, is also an important means by which local government influences the economy. As a measure for maintaining limited access ability and thereby obtaining economic rents, local government investment not only directly influences economic efficiency and persistence, which we take as a performance index, but also guarantees the local limited access ability, which we take as an institutional index.

To study the relationships among the three, we build the following models separately:

Model (1) shows the influence of limited access and local government investment on local economic growth. To show the influence of limited access and economic growth on local government investment, we build model (2):

Finally, we use model (3) to study how local government investment and economic growth influence limited access:

Next, given the problem of cross-sectional correlation that panel data models often have, the first step is to test for the existence of cross-sectional correlation. One feasible test can be constructed from the Lagrange multiplier test proposed by Breusch and Pagan.

$H_0$: the off-diagonal elements of $\Sigma$ are zero;

$H_1$: the off-diagonal elements of $\Sigma$ are not all zero.

The test statistic $\lambda_{LM}$ is built from the correlation coefficients of the residuals between cross-sections $i$ and $j$. When $NT$ is large enough, $\lambda_{LM}$ follows a $\chi^2$ distribution. With $E(V_iV_j') = \sigma_{ij}I$, where $I$ is the $T \times T$ identity matrix, the variance-covariance matrix is:

To obtain the best linear unbiased estimators of the parameters, we use the feasible generalized least squares (FGLS) method, because generalized least squares (GLS) is infeasible when the variance-covariance matrix $V$ of the above model is unknown. The matrix form of models (1)-(3) is:

and $x_i = (x_{i1}, \ldots, x_{iT})'$ ($i = 1, \ldots, N$), $e = (1, \ldots, 1)'_{T \times 1}$,

Set: $Y = (Y_1, \ldots, Y_N)'$, $Y_i = (Y_{i1}, \ldots, Y_{iT})'$ ($i = 1, \ldots, N$)

Then $\hat\beta_{FGLS} = (X'V^{-1}X)^{-1}X'V^{-1}Y$, where the elements $\sigma_{ij}$ of $\Sigma$ in the matrix $V$ are estimated by $\hat\sigma_{ij} = e_i'e_j/T$ ($i = 1, \ldots, N$; $j = 1, \ldots, N$), and $e_i$ and $e_j$ are the residual series of cross-sections $i$ and $j$ after the parameters are estimated by the least squares dummy variable (LSDV) method. To obtain efficient estimators of the parameters, we first test whether the sample data of models (1)-(3) are cross-sectionally correlated. The test statistics $\lambda$ for models (1)-(3) are 1761.719, 1845.265 and 2241.158, while the approximate critical value of the $\chi^2$ distribution with $n(n-1)/2 = 91$ ($n = 14$) degrees of freedom is 61.754. This means that the cross-sectional data of models (1)-(3) are correlated with each other. Therefore, allowing for cross-sectional correlation, we use feasible generalized least squares (FGLS) to investigate the relationship between local government investment, limited access and economic growth, based on Chinese provincial data from 1994 to 2007. The results show that limited access has an inverse U-shaped effect on local economic growth, and that the current level of limited access significantly promotes local economic growth in most provinces; in addition, limited access has a U-shaped effect on local government investment, and the relationship between them is still on the descending side.
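The cross-sectional correlation test can be sketched in a few lines. The sketch assumes the standard form of the Breusch-Pagan LM statistic, $\lambda_{LM} = T \sum_{i<j} r_{ij}^2$ with $N(N-1)/2$ degrees of freedom, where $r_{ij}$ is the correlation of the residual series of cross-sections $i$ and $j$; the text above does not show the formula, so this form is an assumption. The function name and the residual values are hypothetical, and the correlation is computed without centering on the assumption that the residuals already have mean zero within each cross-section.

```python
from math import sqrt

def lm_cross_section(residuals):
    """Breusch-Pagan LM statistic and its degrees of freedom.

    residuals[i] holds the T residuals of cross-section i (assumed to have
    mean zero, as residuals from a model with cross-section dummies do).
    """
    n = len(residuals)
    T = len(residuals[0])

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    total = 0.0
    for i in range(1, n):
        for j in range(i):
            e_i, e_j = residuals[i], residuals[j]
            r_ij = dot(e_i, e_j) / sqrt(dot(e_i, e_i) * dot(e_j, e_j))
            total += r_ij ** 2
    return T * total, n * (n - 1) // 2

# Hypothetical residuals for three cross-sections over four periods.
resid = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
]
stat, df = lm_cross_section(resid)
print(stat, df)
```

The statistic is compared with a $\chi^2$ critical value at the returned degrees of freedom; a large value, as reported for models (1)-(3), rejects the hypothesis of no cross-sectional correlation.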

高建偉

華北電力大學(xué)經(jīng)濟(jì)與管理學(xué)院

《企業(yè)年金稅收優(yōu)惠政策精算分析》

實(shí)施稅收優(yōu)惠政策的目的是利用間接調(diào)控手段,通過市場的作用,達(dá)到鼓勵(lì)非強(qiáng)制的企業(yè)年金計(jì)劃的發(fā)展。迄今為止,我國企業(yè)年金稅收優(yōu)惠政策仍依據(jù)2000年國務(wù)院關(guān)于《完善城鎮(zhèn)社會(huì)保障體系的試點(diǎn)方案》,即企業(yè)年金對職工個(gè)人采用TTT模式,對企業(yè)采用ETT模式且繳費(fèi)免稅限額僅為4%;對投資和領(lǐng)取養(yǎng)老金兩個(gè)環(huán)節(jié)都沒有減免稅收的規(guī)定。國家尚未明確企業(yè)年金稅式支出與鼓勵(lì)發(fā)展企業(yè)年金之間的成本與效益的關(guān)系,由此造成企業(yè)年金稅收優(yōu)惠政策不完善,這將成為制約企業(yè)年金發(fā)展的瓶頸。

本研究利用精算理論,將稅收優(yōu)惠政策視為外生變量,對稅收優(yōu)惠進(jìn)行精算分析。鑒于稅收優(yōu)惠政策可分別針對企業(yè)年金繳費(fèi)、投資和領(lǐng)取養(yǎng)老金三個(gè)階段實(shí)施,而這三個(gè)階段對應(yīng)不同的稅種,每一個(gè)環(huán)節(jié)對應(yīng)的稅種也不完全相同,故針對三個(gè)階段可產(chǎn)生8種稅收優(yōu)惠類型,即EEE、EET、ETE、TEE、ETT、TTE、TET、TTT,其中T和E分別表示征稅(taxing)和免稅(exempting)。根據(jù)8種稅收優(yōu)惠類型,利用精算現(xiàn)值方法分別建立稅式支出精算現(xiàn)值模型,推導(dǎo)不同模式下稅式支出精算現(xiàn)值模型,進(jìn)而研究不同稅惠政策模式下稅收優(yōu)惠對稅式支出的影響。研究發(fā)現(xiàn):EEE模式下稅式支出最大,政府損失最大;TTT模式下稅式支出最小,政府損失最小。損失其次較小的是ETT、TET及TTE。由于稅式支出涉及三個(gè)階段,且涉及因素較多,如企業(yè)繳費(fèi)率、個(gè)人繳費(fèi)率、企業(yè)繳費(fèi)稅收優(yōu)惠比例、個(gè)人繳費(fèi)稅收優(yōu)惠比例以及投資收益稅率等,當(dāng)參數(shù)變化時(shí),稅式支出大小變化較大,因此EET與ETE、TET與TTE之間的稅式支出大小要根據(jù)具體參數(shù)大小方可確定。

國家通過稅收優(yōu)惠政策激勵(lì)企業(yè)發(fā)展企業(yè)年金。企業(yè)年金基金實(shí)現(xiàn)積累后,可提高職工退休后的待遇,但國家舉辦企業(yè)年金卻付出稅式支出的代價(jià),因此需明確不同稅收優(yōu)惠模式下的企業(yè)年金規(guī)模。當(dāng)稅收優(yōu)惠政策不同時(shí),企業(yè)年金基金規(guī)模的積累會(huì)受稅收影響而各不相同,主要受繳費(fèi)階段和投資階段稅率的影響,領(lǐng)取階段的個(gè)人所得稅征收與否對前期基金規(guī)模累積并無影響。本文利用精算理論,得到不同稅收優(yōu)惠模式下企業(yè)年金積累規(guī)模精算模型,發(fā)現(xiàn)EEE與EET、ETE與ETT、TET與TEE、TTE與TTT這四種模式下企業(yè)年金基金規(guī)模相同。通過比較得出,由于繳費(fèi)階段和投資階段都免稅,故EEE與EET模式下企業(yè)年金基金規(guī)模最大。ETE與ETT模式雖然在繳費(fèi)階段有優(yōu)惠,但投資收益稅沒能免除;而TET與TEE模式均是在繳費(fèi)階段無優(yōu)惠,但投資階段免投資收益稅。由于投資收益稅率通常是20%,比繳費(fèi)階段的企業(yè)和個(gè)人繳稅比例作用大的多,投資收益稅的征收對企業(yè)年金規(guī)模影響更大,故ETE與ETT模式下的企業(yè)年金基金規(guī)模小于TET與TEE模式。
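The claim that fund size depends only on the contribution and investment stages can be illustrated with a small sketch. Everything numeric here is an assumption for illustration (a 5% contribution-stage tax, a 6% gross return, 30 years of level contributions); only the 20% investment income tax rate comes from the text, and `fund_size` is a hypothetical helper, not the authors' actuarial model.

```python
from itertools import product

def fund_size(regime, years=30, contribution=1.0, contrib_tax=0.05,
              gross_return=0.06, invest_tax=0.20):
    """Accumulated fund under a three-letter regime such as 'EET'.

    Letters refer to the contribution, investment and payout stages;
    'T' means taxed, 'E' means exempt. The payout stage is ignored
    because it does not affect accumulation.
    """
    stage_contrib, stage_invest, _stage_payout = regime
    net_contribution = contribution * ((1 - contrib_tax) if stage_contrib == "T" else 1.0)
    net_return = gross_return * ((1 - invest_tax) if stage_invest == "T" else 1.0)
    fund = 0.0
    for _ in range(years):
        # grow last year's balance, then add this year's net contribution
        fund = fund * (1 + net_return) + net_contribution
    return fund

regimes = ["".join(p) for p in product("ET", repeat=3)]  # the 8 regime types
for regime in sorted(regimes, key=fund_size, reverse=True):
    print(regime, round(fund_size(regime), 2))
```

Under these assumed parameters the regimes fall into the four equal pairs named in the text, and the ETE/ETT pair ends below the TET/TEE pair because the investment tax compounds over the whole accumulation period.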

我國舉辦企業(yè)年金是為了完善社會(huì)保障體系,縮減基本養(yǎng)老保險(xiǎn)基金缺口的支付壓力。目前基本養(yǎng)老保險(xiǎn)中個(gè)人賬戶空賬問題嚴(yán)重,而發(fā)展企業(yè)年金可以做實(shí)個(gè)人賬戶,不排除未來基本養(yǎng)老保險(xiǎn)個(gè)人賬戶與企業(yè)年金個(gè)人賬戶并賬運(yùn)行的可能。因此,可將稅收優(yōu)惠政策與基本養(yǎng)老保險(xiǎn)基金缺口對接,研究稅收優(yōu)惠政策對縮減基本養(yǎng)老保險(xiǎn)基金缺口的貢獻(xiàn)率。對國家而言,通過稅收優(yōu)惠政策實(shí)現(xiàn)的企業(yè)年金基金凈增長規(guī)模為企業(yè)年金積累規(guī)模與稅式支出之差。將企業(yè)年金基金凈增長規(guī)模與基本養(yǎng)老基金缺口對接,利用精算理論,可得到國家實(shí)行稅收優(yōu)惠政策對縮減基本養(yǎng)老基金缺口的貢獻(xiàn)率指標(biāo)模型。根據(jù)該模型,通過實(shí)證測算8種不同稅收優(yōu)惠模式對解決基金缺口的貢獻(xiàn),得到EET的貢獻(xiàn)率最大,為我國采用何種稅收優(yōu)惠模式提供定量化分析依據(jù)。

高敏雪,許曉娟

中國人民大學(xué)統(tǒng)計(jì)學(xué)院

《中國對外貨物貿(mào)易數(shù)據(jù)的重估與分析——從常住原則到屬權(quán)原則的總量調(diào)整》

依據(jù)現(xiàn)有統(tǒng)計(jì),欲對中國對外貿(mào)易狀況有一個(gè)比較全面客觀的認(rèn)識(shí)和評價(jià),必須同時(shí)關(guān)注兩組數(shù)據(jù):第一組是海關(guān)公布的貨物進(jìn)出口及其差額,第二組是有關(guān)在華外資企業(yè)的購進(jìn)銷售數(shù)據(jù),尤其是其貨物進(jìn)出口數(shù)據(jù)。根據(jù)第一組數(shù)據(jù),中國對外貨物貿(mào)易近年一直呈現(xiàn)大幅度增長,并形成了較大數(shù)額的貿(mào)易順差,已經(jīng)是世界第二大貨物貿(mào)易國,同時(shí)是世界上最大的貨物貿(mào)易順差國。但如果將第二組數(shù)據(jù)放進(jìn)來,可以看到,中國對外貨物貿(mào)易總額中近60%是由在華外商及港澳臺(tái)商投資企業(yè)實(shí)現(xiàn)的,對進(jìn)出口順差也具有顯著貢獻(xiàn)。就是說,盡管中國在國際貨物貿(mào)易市場上已經(jīng)占有比較重要的地位,但很大程度上卻是由中國境內(nèi)的外商投資企業(yè)推動(dòng)的。

為什么會(huì)這樣?原因在于,現(xiàn)行國際貿(mào)易統(tǒng)計(jì)體系是建立在所謂“常住地”原則基礎(chǔ)上的,一國所記錄的對外貿(mào)易是該國常住單位與非常住單位(即國外)之間發(fā)生的“跨境”貿(mào)易往來。但在全球貿(mào)易投資一體化背景下,通過外國直接投資這種形式,出現(xiàn)外商投資企業(yè)這樣的具有雙重屬性的“怪胎”:常住地在東道國,其資本權(quán)卻屬于投資國。按照現(xiàn)行國際貿(mào)易統(tǒng)計(jì)規(guī)則,這些“外國商業(yè)存在”要作為東道國的常住單位對待,其與國外之間發(fā)生的商品買賣要作為東道國的對外貿(mào)易,與東道國當(dāng)?shù)匕l(fā)生的貿(mào)易活動(dòng)則被視為東道國國內(nèi)貿(mào)易活動(dòng),與對外貿(mào)易無關(guān)。由此帶來的問題是:無法有效地反映外商投資企業(yè)與其投資母國的內(nèi)在聯(lián)系,如果此類企業(yè)以及由此類企業(yè)形成的對外貿(mào)易規(guī)模比較顯著,無疑會(huì)導(dǎo)致所提供信息不能真實(shí)反映一國對外貿(mào)易的真實(shí)狀況,并混淆國別之間的真實(shí)經(jīng)濟(jì)關(guān)系。中國當(dāng)前遭遇的情況就是如此。

能否改變這種局面,糾正現(xiàn)行國際貿(mào)易統(tǒng)計(jì)規(guī)則給數(shù)據(jù)效力帶來的影響?國際上已經(jīng)開始探索,希望引入“屬權(quán)”原則構(gòu)建國際貿(mào)易統(tǒng)計(jì),克服、至少是彌補(bǔ)“常住”原則帶來的弊端。其基本思路是:將直接投資引起的外國商業(yè)存在從東道國居民中剝離出來作為東道國的“國外”,然后進(jìn)行國際貿(mào)易統(tǒng)計(jì)。沿此思路,本文以方法討論為基礎(chǔ),對中國對外貨物貿(mào)易規(guī)模及其差額進(jìn)行了重新估算。

以2007年中國現(xiàn)行對外貿(mào)易統(tǒng)計(jì)數(shù)據(jù)為重估起點(diǎn),具體調(diào)整過程包含三個(gè)步驟:首先,將當(dāng)前記錄的外商投資企業(yè)的進(jìn)出口貿(mào)易從中國跨境貿(mào)易中予以扣除;其次,將外商投資企業(yè)在中國市場上的購買和銷售加入到中國進(jìn)出口貿(mào)易之中;第三步,就中國在境外的企業(yè)對進(jìn)出口貿(mào)易的影響進(jìn)行調(diào)整。為了充分多層次提供信息,調(diào)整過程中將在華外商投資企業(yè)按照三個(gè)口徑處理:一是只考慮外商獨(dú)資企業(yè),將其視為中國的非居民;二是以股權(quán)比例50%以上為標(biāo)準(zhǔn)確認(rèn)外商(絕對)控股企業(yè),作為中國的非居民;三是將全部外商投資企業(yè)都考慮在內(nèi),全部作為中國的非居民。

現(xiàn)行對外貿(mào)易統(tǒng)計(jì),2007年當(dāng)年出口12 178億美元,進(jìn)口9 560億美元,貿(mào)易順差2 618億美元,該順差相當(dāng)于出口的21.5%、進(jìn)口的27.4%。
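The 2007 ratios quoted above follow directly from the export and import totals. The short check below uses only the figures in the text (in units of 100 million US dollars); the function name `balance_ratios` is an illustrative choice.

```python
def balance_ratios(exports, imports):
    """Return the trade balance and its share of exports and of imports (in %)."""
    surplus = exports - imports
    return surplus, round(100 * surplus / exports, 1), round(100 * surplus / imports, 1)

surplus, pct_of_exports, pct_of_imports = balance_ratios(12178, 9560)
print(surplus, pct_of_exports, pct_of_imports)  # 2618 21.5 27.4
```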

第一步調(diào)整,將在華外資企業(yè)的進(jìn)出口從跨境進(jìn)出口中扣除。由于外資企業(yè)規(guī)模較大、進(jìn)出口傾向較強(qiáng),將外資企業(yè)完成的跨境貨物貿(mào)易額從跨境貨物貿(mào)易總額中扣除,結(jié)果給中國進(jìn)出口貨物貿(mào)易統(tǒng)計(jì)數(shù)據(jù)帶來了嚴(yán)重影響:貿(mào)易規(guī)模顯著“縮水”,貿(mào)易差額也顯著下降。具體到2007年,即使按照最保守的方法調(diào)整(只扣除外商獨(dú)資企業(yè)跨境進(jìn)出口額),調(diào)整后結(jié)果也只相當(dāng)于原來規(guī)模的六成(出口為原來的60.7%,進(jìn)口為原來的58.6%),貿(mào)易順差也隨之縮水近四成(60.1%);如果將全部外資企業(yè)都考慮在內(nèi),調(diào)整帶來的下降將更加驚人:所余部分的貿(mào)易總額勉強(qiáng)超過40%,貿(mào)易順差的下降也超過了50%。

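The second adjustment builds on the first: after the imports and exports of foreign-invested enterprises have been deducted, their sales and purchases inside China are brought into the trade statistics. Purchases by foreign-invested enterprises on the Chinese market count as China's exports, and their sales on the Chinese market count as China's imports. Unlike the deduction in the first step, the second step adds, so the scale of trade rises; whether the adjusted result exceeds cross-border trade depends on how these enterprises' purchases and sales on the mainland market compare with their cross-border trade. For 2007, on the export side the adjusted scale is smaller than cross-border trade under all three calibers (because foreign-invested enterprises' purchases in China are smaller than their exports abroad). On the import side, the result under caliber 1 falls slightly, while the results under the other two calibers exceed cross-border imports; that is, taken as a whole, foreign-invested enterprises sell more on the mainland market than they import from abroad. Putting the two sides together, China's trade balance turns from a very marked surplus into an equally marked deficit. Even under the most conservative caliber 1, the balance moves from a surplus of 261.8 billion US dollars in cross-border trade and 179.3 billion US dollars after the first adjustment to a deficit of 29.1 billion US dollars.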

第三步調(diào)整考慮中國在境外直接投資形成的企業(yè)(簡稱中資企業(yè))對于中國進(jìn)出口貿(mào)易的影響。限于數(shù)據(jù)可得性,我們做了一些簡化和假定,只就跨境貿(mào)易中記錄的中資企業(yè)部分做了扣除。數(shù)據(jù)結(jié)果顯示,伴隨中國近年“走出去”戰(zhàn)略實(shí)施,通過中資企業(yè)實(shí)現(xiàn)的跨境貿(mào)易已經(jīng)具備一定規(guī)模,其中對進(jìn)口的拉動(dòng)作用要大于出口拉動(dòng),由此在一定程度上抵消了第二步調(diào)整形成的進(jìn)出口貿(mào)易逆差,甚至使第一個(gè)口徑下的調(diào)整結(jié)果由逆差轉(zhuǎn)為順差(從-291變?yōu)?03)。

限于數(shù)據(jù)可得性,本研究關(guān)于中國進(jìn)出口貿(mào)易統(tǒng)計(jì)的調(diào)整試算仍有待改進(jìn),但綜合各步驟的調(diào)整結(jié)果,我們有理由相信,中國對外進(jìn)出口貿(mào)易規(guī)模并非如傳統(tǒng)跨境貿(mào)易顯示的那樣大,進(jìn)口與出口相比也并非有那么大的順差。盡管不能用調(diào)整后的結(jié)果完全替代傳統(tǒng)統(tǒng)計(jì)數(shù)據(jù),但可以肯定,只有將直接投資與對外貿(mào)易這兩個(gè)現(xiàn)象銜接起來,才能從本質(zhì)上公允地評價(jià)中國對外貿(mào)易的整體實(shí)力以及在國際貿(mào)易市場上的地位。

中國對外貿(mào)易統(tǒng)計(jì)數(shù)據(jù)的調(diào)整(2007年)表單位:億美元

耿修林

南京大學(xué)商學(xué)院

《城鎮(zhèn)居民可支配收入對消費(fèi)結(jié)構(gòu)變動(dòng)影響的實(shí)證分析》

消費(fèi)作為社會(huì)經(jīng)濟(jì)活動(dòng)過程中的重要環(huán)節(jié),無論從微觀還是從宏觀角度看都具有重要的意義。

考察消費(fèi)支出格局,可能更有助于揭示城鎮(zhèn)居民生活方式的改變。為了考察在人均可支配收入不斷增長情況下的居民消費(fèi)支出變動(dòng)的統(tǒng)計(jì)特征,我們搜集了1985—2006年我國城鎮(zhèn)居民人均可支配收入和城鎮(zhèn)居民消費(fèi)支出數(shù)據(jù),并把消費(fèi)支出轉(zhuǎn)化成成分?jǐn)?shù)據(jù),然后訴諸于多重回歸分析,模擬了人均可支配收入的增長與消費(fèi)構(gòu)成之間的統(tǒng)計(jì)關(guān)系。依據(jù)成分?jǐn)?shù)據(jù)主要是考慮到可以避免消費(fèi)支出受到幣值變動(dòng)的影響;采用多重回歸分析的理由是:用城鎮(zhèn)居民可支配收入解釋城鎮(zhèn)居民消費(fèi)支出構(gòu)成的變動(dòng),由于被解釋變量消費(fèi)支出構(gòu)成存在嚴(yán)格的單位和約束,此時(shí)用可支配收入分別去解釋每項(xiàng)消費(fèi)支出所占比重,勢必會(huì)割裂消費(fèi)支出成分之間的關(guān)系。

為更好地說明城鎮(zhèn)居民消費(fèi)支出構(gòu)成的變化軌跡,我們對城鎮(zhèn)居民消費(fèi)支出成分?jǐn)?shù)據(jù)進(jìn)行了統(tǒng)計(jì)聚類分析。由SPSS輸出結(jié)果,1985-1992年城鎮(zhèn)居民消費(fèi)支出成分表現(xiàn)較為相似,1993-1999年屬于另外一組同質(zhì)類,而2000年之后幾年城鎮(zhèn)居民的消費(fèi)支出成分表現(xiàn)出較高的相似性。這說明,我國城鎮(zhèn)居民消費(fèi)結(jié)構(gòu)素質(zhì)上有了一個(gè)較為明顯的整體提升。根據(jù)居民消費(fèi)支出規(guī)律,居民消費(fèi)支出構(gòu)成有從生存到享受和發(fā)展的序進(jìn)過程,那么就我國城鎮(zhèn)居民人均可支配收入水平和消費(fèi)支出構(gòu)成的狀況,在目前階段應(yīng)該是消費(fèi)支出的格局越分散越好,基于這樣的考慮,我們計(jì)算了1985-2005年我國城鎮(zhèn)居民消費(fèi)支出成分的GINI-SIMPSON指數(shù),用以測量居民消費(fèi)支出的變動(dòng)度,如表1所示:

表1 1985-2005年城鎮(zhèn)居民消費(fèi)成分GINI-SIMPSON指數(shù)表
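For readers who want to reproduce the index, the GINI-SIMPSON index of a composition is one minus the sum of squared shares. The sketch below is a minimal illustration; the function name and the example shares are hypothetical and are not the values of Table 1.

```python
def gini_simpson(shares):
    """Gini-Simpson index: 1 - sum of squared shares.

    shares: expenditure amounts or proportions by category; they are
    normalised to sum to one before the index is computed.
    """
    total = sum(shares)
    return 1.0 - sum((s / total) ** 2 for s in shares)

# Hypothetical compositions: spending concentrated on one item vs spread evenly
print(gini_simpson([70, 10, 10, 10]))
print(gini_simpson([25, 25, 25, 25]))
```

A composition concentrated on one category gives a value near zero, and an even spread over k categories gives the maximum 1 - 1/k.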

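The larger the GINI-SIMPSON index, the more dispersed the distribution of consumption expenditure; the smaller the index, the more concentrated. The results in Table 1 likewise show that the quality of urban residents' consumption expenditure has improved steadily.

The following multivariate multiple regression model is set up: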

模型求解時(shí)只計(jì)算了1985-2005年的數(shù)據(jù),預(yù)留下2006年消費(fèi)支出以作為驗(yàn)證模型模擬的實(shí)際效果。2006年城鎮(zhèn)居民的實(shí)際消費(fèi)支出與根據(jù)式(2)估計(jì)的數(shù)據(jù)列在表2中。

表2 2006年城鎮(zhèn)居民消費(fèi)支出成分估計(jì)誤差表

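The root mean square error of the estimate is 2.3759 and the relative error is 18.72%; for practical purposes an error of this size is acceptable. According to equation (2), in statistical terms each one-unit increase in urban per-capita disposable income lowers the share of food expenditure by 0.0021 percentage points, clothing by 0.0005 percentage points and household equipment by 0.0006 percentage points, while it raises medical services by 0.0007 percentage points, transport and communications by 0.0013 percentage points, culture, education and recreation by 0.0007 percentage points, and housing by 0.0008 percentage points. The expenditure items most affected by rising per-capita disposable income are therefore food and transport and communications.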

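Drawing on the forecast of changes in urban residents' consumption structure by the Institute of Economic Research of the National Development and Reform Commission (2007), on the conclusions of the Policy Research Office of the Ministry of Commerce in its international comparison of consumption upgrading at per-capita GDP of 1,000-3,000 US dollars (2006), and on the simulation results from actual data, we set the ranges of urban residents' expenditure shares for the Eleventh Five-Year Plan period as follows: food 30%-35%, clothing 7%-12%, household equipment 4%-8%, transport and communications 10%-15%, medical care 6%-10%, culture, education and recreation 10%-15%, housing 10%-15%. Regression control analysis on this basis shows that reaching the target requires urban per-capita disposable income of about 19,224 yuan, which means that through the end of the Eleventh Five-Year Plan period urban per-capita disposable income must grow by at least 13% a year on average. Guiding and planning the upgrading of urban residents' consumption structure therefore requires attention to reasonable growth in urban residents' income.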

何新華

中國社會(huì)科學(xué)院世界經(jīng)濟(jì)與政治研究所

《我國GDP數(shù)據(jù)間的矛盾現(xiàn)象》

近年來,對我國國民經(jīng)濟(jì)核算數(shù)據(jù)真實(shí)性的質(zhì)疑聲雖不絕于耳,但爭論歸爭論,到目前為止國家統(tǒng)計(jì)局公布的相關(guān)數(shù)據(jù)還是最權(quán)威的數(shù)據(jù)來源。盡管消費(fèi)、投資、出口連續(xù)多年高于GDP增長速度的現(xiàn)象不時(shí)引起外界的懷疑,但因國家統(tǒng)計(jì)局公布的消費(fèi)、投資、出口的增長速度是以現(xiàn)價(jià)計(jì)算,而GDP增長速度是以不變價(jià)計(jì)算,其不可比性顯而易見,對其質(zhì)疑缺少其合理性。然而,通過對2004年經(jīng)濟(jì)普查前后《中國統(tǒng)計(jì)年鑒》中公布的數(shù)據(jù)進(jìn)行比對,以及對2006年以來《中國統(tǒng)計(jì)年鑒》公布的數(shù)據(jù)間存在的邏輯關(guān)系進(jìn)行粗略檢驗(yàn),發(fā)現(xiàn)需要引起關(guān)注的問題:

2004年經(jīng)濟(jì)普查后對現(xiàn)價(jià)支出法和生產(chǎn)法GDP分別進(jìn)行調(diào)整,結(jié)果改變了兩者間的相對關(guān)系。部分年度調(diào)整前支出法小于生產(chǎn)法核算,但調(diào)整后支出法反而高出生產(chǎn)法核算很多。與此同時(shí), 2004年經(jīng)濟(jì)普查前支出法與生產(chǎn)法核算數(shù)據(jù)間的差距要大大小于調(diào)整后兩者間的差距。由于2004年經(jīng)濟(jì)普查采集的是時(shí)點(diǎn)數(shù)據(jù),并且根據(jù)統(tǒng)計(jì)局的相關(guān)說明,對生產(chǎn)法GDP和支出法GDP歷史數(shù)據(jù)的修訂采用了相同的方法,即趨勢離差內(nèi)插法,因此上述現(xiàn)象的出現(xiàn)令人生疑:對GDP歷史數(shù)據(jù)的修訂可能并非嚴(yán)格按照統(tǒng)計(jì)局所公布的方法,而是摻雜了一些人為因素。特別是當(dāng)以調(diào)整前各年度的數(shù)據(jù)為100時(shí),1998-2004年調(diào)整后的存貨變動(dòng)依次為:43.3%,97.7%,-905.2%,211.2%,400%, 885.8%,673.2%。從數(shù)據(jù)調(diào)整前后存貨變動(dòng)呈現(xiàn)出的巨大變化可以看出,對支出法GDP的調(diào)整確實(shí)有不合理之處。

自2006年起,《中國統(tǒng)計(jì)年鑒》中開始公布支出法GDP各組成部分對GDP增長的貢獻(xiàn)率和拉動(dòng)。當(dāng)把最終消費(fèi)、資本形成總額和凈出口對GDP的拉動(dòng)之和與生產(chǎn)法GDP增長率相比較后,可以發(fā)現(xiàn)兩者完全相同。鑒于現(xiàn)價(jià)支出法與生產(chǎn)法GDP間有較大差距,上述現(xiàn)象導(dǎo)致了一系列難以解釋的現(xiàn)象:

(1)國家統(tǒng)計(jì)局默許了兩個(gè)GDP減縮指數(shù)的存在?GDP增長率采用不變價(jià)GDP計(jì)算,而不變價(jià)GDP等于現(xiàn)價(jià)GDP除以GDP減縮指數(shù)。由于現(xiàn)價(jià)支出法GDP與現(xiàn)價(jià)生產(chǎn)法GDP間的誤差在2%以上,2007年的支出法GDP甚至高出生產(chǎn)法GDP 515%,若以兩者計(jì)算的增長率相同,則顯然需要兩個(gè)不同的GDP減縮指數(shù)。而連續(xù)多年以兩者計(jì)算的增長率都相同,顯然不是偶然的巧合,而是故意為之。

(2)由于支出法GDP增長率與生產(chǎn)法GDP增長率相同,當(dāng)采用固定資產(chǎn)投資價(jià)格指數(shù)減縮固定資本形成后,出現(xiàn)了存貨變動(dòng)對GDP增長率在個(gè)別年度過高的拉動(dòng),如1993年為3.7個(gè)百分點(diǎn),2005年為-2.9個(gè)百分點(diǎn)。這一現(xiàn)象令人難以置信。

(3)同樣由于支出法GDP增長率與生產(chǎn)法GDP增長率相同,當(dāng)采用消費(fèi)者價(jià)格指數(shù)減縮居民消費(fèi)后,推導(dǎo)出:不變價(jià)最終消費(fèi)增長速度的低迷竟然在很大程度上是因?yàn)椴蛔儍r(jià)政府消費(fèi)增長速度的下降。

盡管由于資料來源、生產(chǎn)范圍、計(jì)算方法以及對某些具體問題的處理方法等的變化,經(jīng)濟(jì)普查年度GDP數(shù)據(jù)發(fā)生了較大幅度的變化,需要對以往年度GDP歷史數(shù)據(jù)進(jìn)行修訂,以使數(shù)據(jù)具有可比性,但作為權(quán)威的官方統(tǒng)計(jì)數(shù)據(jù),其采用的修訂方法不僅必須經(jīng)得起時(shí)間的考驗(yàn),也需要滿足最基本的常識(shí)性判斷,以切實(shí)維護(hù)統(tǒng)計(jì)數(shù)據(jù)的嚴(yán)肅性。

由于核算的角度不同,采用不同核算方法產(chǎn)生的GDP數(shù)據(jù)間不可避免地存在一定的統(tǒng)計(jì)誤差,沒有必要刻意回避這一現(xiàn)象。在支出法GDP核算與生產(chǎn)法GDP核算仍存在較大差距的情況下,單純?yōu)榱藵M足對某些數(shù)據(jù)的需求(如消費(fèi)、投資對GDP的拉動(dòng))而放棄基本統(tǒng)計(jì)原則的做法是絕對不可取的。

金勇進(jìn),陶然

Center for Applied Statistics and School of Statistics, Renmin University of China

《Theory of Post-Enumeration Surveys for Censuses and Its Practice in China》

作為一種全面調(diào)查,普查主要受到各種非抽樣誤差的影響,總的誤差效應(yīng)表現(xiàn)為普查準(zhǔn)確性,即普查結(jié)果與真實(shí)值的偏差。事后調(diào)查是普查數(shù)據(jù)質(zhì)量眾多評估方法之一,通過在普查的某些環(huán)節(jié)結(jié)束后開展獨(dú)立的抽樣調(diào)查,將抽查數(shù)據(jù)與普查數(shù)據(jù)對比,因此也被稱為事后抽查。其目的,一是要分析普查匯總結(jié)果的各種誤差影響因素,發(fā)揮對普查數(shù)據(jù)質(zhì)量的預(yù)防控制作用;二是實(shí)現(xiàn)對普查總體的推斷,用以評估普查匯總結(jié)果的準(zhǔn)確性。

Technical methods of post-enumeration surveys. To achieve the two purposes above, a post-enumeration survey can use two technical methods:

一種是以事后抽查為準(zhǔn)衡量普查對象的普查記錄,根據(jù)抽樣設(shè)計(jì)推斷普查總體凈誤差,本文稱為凈誤差推斷。假設(shè)事后抽查第i個(gè)樣本普查小區(qū)第j個(gè)普查對象的普查結(jié)果是Eij,認(rèn)為普查真實(shí)值只是理論存在,而將該普查對象的事后抽查調(diào)查結(jié)果Pij視為普查真實(shí)值的替代,用Eij-Pij表示該普查對象的調(diào)查誤差,將該樣本小區(qū)全部普查對象誤差匯總即可得到樣本小區(qū)的凈誤差Ei-Pi,再結(jié)合事后抽查設(shè)計(jì)得到普查總體凈誤差E-P。通過凈誤差推斷可實(shí)現(xiàn)普查和多種誤差的識(shí)別。若將普查結(jié)果Eij視為普查對象的計(jì)數(shù),凈誤差推斷可用來檢查普查涵蓋情況;若將Eij視為普查對象的指標(biāo)值,則又可用來檢查普查回答或計(jì)量誤差。由于凈誤差推斷建立在以事后抽查記錄為準(zhǔn)的匹配基礎(chǔ)上,因此其可以用來對普查對象的調(diào)查質(zhì)量進(jìn)行過程檢查,及時(shí)發(fā)現(xiàn)普查中存在的問題;或者將樣本普查小區(qū)內(nèi)普查對象的檢查結(jié)果根據(jù)屬性進(jìn)行分類,根據(jù)不同屬性的誤差發(fā)生情況控制普查的過程質(zhì)量;或者根據(jù)匯總推斷普查總體,對普查總體數(shù)據(jù)準(zhǔn)確性做出評估。

另一種方法是通過普查對象的事后抽查數(shù)據(jù)與普查對比,結(jié)合捕獲再捕獲模型原理推斷普查總體,本文稱為雙系統(tǒng)估計(jì)。此時(shí),仍假設(shè)樣本小區(qū)內(nèi)普查對象的普查結(jié)果Eij和事后抽查結(jié)果Pij對比,但不考慮真值的存在性,也不計(jì)算普查對象的匹配誤差,而是將Eij和Pij視為普查對象在普查和事后抽查兩個(gè)獨(dú)立系統(tǒng)內(nèi)的連接屬性,且要求所有普查對象在同一系統(tǒng)內(nèi)的連接屬性是相同的。為此,需要采取措施滿足上述條件,考慮將事后抽查在普查登記結(jié)束一定時(shí)間后開展,滿足兩次獨(dú)立假設(shè);將樣本單元進(jìn)行事后分層,使得每一事后層內(nèi)普查對象具有較高的同質(zhì)性,滿足可能連接的屬性;在每一事后層構(gòu)造子總體的雙系統(tǒng)估計(jì),最終將事后層估計(jì)結(jié)合抽樣設(shè)計(jì)合成普查總體估計(jì),以此計(jì)算與原有普查結(jié)果的偏差,用以評估普查的準(zhǔn)確性。由于雙系統(tǒng)估計(jì)不對事后抽查相比普查的準(zhǔn)確性程度做出假定,因此其對比結(jié)果不能用來衡量普查的過程誤差,但可以衡量事后層子總體的普查結(jié)果偏差,或最終總普查總體的普查結(jié)果偏差。
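The capture-recapture idea behind dual system estimation can be sketched as follows. This is a simplified illustration under the model's assumptions (independent systems, equal capture probability within a post-stratum); the function names and counts are hypothetical, and the sampling weights that a real design would apply when combining post-strata are omitted.

```python
def dual_system_estimate(census_count, pes_count, matched):
    """Estimate of the true population of one post-stratum.

    census_count: units counted by the census in the sample areas
    pes_count:    units counted by the post-enumeration survey
    matched:      units found in both systems
    """
    return census_count * pes_count / matched

def stratified_dse(strata):
    """Sum the dual system estimates over post-strata (design weights omitted)."""
    return sum(dual_system_estimate(*s) for s in strata)

# Hypothetical post-strata: (census count, survey count, matched count)
strata = [(900, 100, 90), (400, 50, 40)]
total = stratified_dse(strata)
census_total = sum(s[0] for s in strata)
print(total, total - census_total)  # estimated population and estimated net undercount
```

Estimating within post-strata and then summing is what lets the equal-probability assumption hold approximately inside each stratum, which is the reason the text gives for post-stratification.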

普查事后調(diào)查技術(shù)的適用性凈誤差推斷方法將普查對象的事后抽查結(jié)果視為真實(shí)值的替代,其實(shí)質(zhì)是假設(shè)普查對象僅在普查中發(fā)生調(diào)查誤差,而以事后抽查為準(zhǔn)。為滿足這一假設(shè),實(shí)踐中,在事后抽查中由經(jīng)驗(yàn)豐富的調(diào)查員,采用細(xì)致的方案在一定范圍內(nèi)開展。其次,凈誤差推斷不僅用于普查總體的凈誤差評估,也適用于普查總體的分類誤差控制。最后,如果用于推斷總體凈誤差,推斷將完全建立在抽樣設(shè)計(jì)的基礎(chǔ)上,簡便易行。

雙系統(tǒng)估計(jì)理論不對事后抽查做出真值假定,即認(rèn)為普查對象可能在兩次調(diào)查中同時(shí)發(fā)生調(diào)查誤差。這點(diǎn)相比凈誤差推斷,能對普查對象的調(diào)查誤差做出更為準(zhǔn)確的評估,適用于存在特定誤差現(xiàn)象的普查對象,例如人口普查中的人具有較大的流動(dòng)性,很容易被普查或是事后抽查遺漏,經(jīng)濟(jì)普查中的無證個(gè)體經(jīng)營戶也容易被忽略。其次,為了滿足兩次調(diào)查獨(dú)立的假設(shè),事后抽查通常在普查結(jié)束一定時(shí)間后開展,但這將導(dǎo)致事后調(diào)查不能對普查過程進(jìn)行質(zhì)量控制。最后,采用雙系統(tǒng)估計(jì)時(shí),對樣本單元進(jìn)行細(xì)致的事后分層才能滿足同一調(diào)查中等概率連接的假設(shè),需要較大的樣本量用以滿足每個(gè)事后層都有足夠的樣本量用于子總體的雙系統(tǒng)估計(jì),因此費(fèi)用較高,實(shí)施相對復(fù)雜。盡管雙系統(tǒng)估計(jì)在國外由最初的人口普查應(yīng)用擴(kuò)展到農(nóng)業(yè)和經(jīng)濟(jì)普查,但圍繞其前提假設(shè)的爭論一直不斷,各國也在尋求方法的改進(jìn)與突破。

對我國普查事后調(diào)查的實(shí)踐探討首先,需要完善清查摸底環(huán)節(jié)的事后抽查技術(shù)。我國三大周期性普查在普查登記前均采用清查摸底(單位清查)編制普查名錄,清查摸底的事后抽查關(guān)系到普查對象登記的準(zhǔn)確性,需要在清查摸底結(jié)束后較短的時(shí)間內(nèi)完成,發(fā)揮對普查對象名錄的補(bǔ)充與更新作用。將普查推斷通過普查與事后抽查的核對,能夠?qū)崿F(xiàn)普查對象的誤差過程質(zhì)量控制,誤差推斷建立在抽樣設(shè)計(jì)的基礎(chǔ)上,實(shí)施快速簡便。2000年人口普查和2006年農(nóng)業(yè)普查欠缺清查摸底環(huán)節(jié)的事后抽查質(zhì)量控制措施,相比之下,2008年經(jīng)濟(jì)普查采用事后抽查檢查單位清查質(zhì)量,但質(zhì)量檢查工作并沒有建立在完善的抽樣設(shè)計(jì)基礎(chǔ)上。為進(jìn)一步提高普查登記的數(shù)據(jù)質(zhì)量,在清查摸底環(huán)節(jié)需要采用凈誤差推斷的技術(shù)方法進(jìn)一步加強(qiáng)普查清查摸底(單位清查)階段的事后抽查。對于經(jīng)濟(jì)普查和農(nóng)業(yè)普查中的法人或產(chǎn)業(yè)單位清查,通過抽查結(jié)果對名錄庫及時(shí)更新;人口普查和農(nóng)業(yè)普查中的住戶單位,采用事后抽查及時(shí)更新地址碼庫。在普查登記過程中也可根據(jù)需要進(jìn)一步結(jié)合抽樣設(shè)計(jì)進(jìn)行凈誤差推斷和過程控制,對包括涵蓋誤差、計(jì)量誤差、處理誤差等在內(nèi)的普查登記誤差效應(yīng)進(jìn)行控制。

其次,普查登記工作結(jié)束后,適于開展以全國為總體的事后抽查雙系統(tǒng)估計(jì),充分發(fā)揮雙系統(tǒng)估計(jì)對普查遺漏的測量。中國2000年人口普查已嘗試采用雙系統(tǒng)模型對總?cè)丝跀?shù)準(zhǔn)確性進(jìn)行評估,但相比國外成熟做法,還需要從抽樣設(shè)計(jì)和估計(jì)方法上進(jìn)行完善,以滿足模型假設(shè)。隨著我國普查事后抽查制度的常態(tài)化,在農(nóng)業(yè)普查和經(jīng)濟(jì)普查中有必要結(jié)合普查對象的遺漏特點(diǎn)采用雙系統(tǒng)估計(jì)方法對普查數(shù)據(jù)準(zhǔn)確性進(jìn)行評估研究,有助于提高普查結(jié)果的數(shù)據(jù)質(zhì)量,但活動(dòng)單位和人口存在本質(zhì)的差異,在研究基于雙系統(tǒng)模型的農(nóng)業(yè)和經(jīng)濟(jì)普查結(jié)果涵蓋誤差分析時(shí),還需要根據(jù)具體普查內(nèi)容從理論和實(shí)踐上進(jìn)一步深入研究。如果采用事后分層滿足雙系統(tǒng)估計(jì)用于評價(jià)普查數(shù)據(jù)質(zhì)量的要求,對于人口普查,可選擇年齡、性別、民族等作為事后分層標(biāo)志;對于農(nóng)業(yè)普查,可選擇地域、地形、主要農(nóng)作物種類、人均耕地面積等作為事后分層標(biāo)志;對于經(jīng)濟(jì)普查,可選擇職工人數(shù)、企業(yè)所有制、產(chǎn)值規(guī)模等作為事后分層標(biāo)志。

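Finally, the design of the post-enumeration sample should be improved. A sound evaluation method has to rest on a carefully designed post-enumeration sampling plan. Judging from the most recent post-enumeration plans of China's three major censuses, there is room for improvement in stratification, sample size, sampling method and sample representativeness. For the post-enumeration check of the canvassing stage, the target of error inference can be narrowed to individual county-level administrative units, so that the check is completed quickly while still serving its purpose. For evaluating overall census quality with the whole country as the population, a complex sample design should take regional characteristics, population structure and level of economic development into account, in order to ensure the effectiveness of the post-enumeration survey.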

李靜萍

中國人民大學(xué)統(tǒng)計(jì)學(xué)院

《基于地租方法核算的城鎮(zhèn)土地出讓金》

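Land transfer fees have become an important source of funds supplementing local government revenue. The China Flow of Funds Accounts (2006) recorded land transfer fees for the first time, as a transaction in non-produced non-financial assets. The author argues instead that in China it is appropriate to record land transfer fees as rent on land. The main advantage is that this reveals the "implicit debt" of local governments, which is in essence both a liability of government to other sectors and a liability of the current government to future governments. This paper applies the rent approach to China's land transfer fees for 2003-2007 and shows how each institutional sector is affected once the fees enter the accounts as rent.①

① Revenue from the transfer of land-use rights has two measures, gross and net. Gross revenue is the one-off charge collected when the land is leased; it consists of land development costs and the land use fee for the term of use. The former covers land acquisition, resettlement and infrastructure directly serving the plot, and is a one-off compensation for development investment. The latter is the charge for using the land resource, the economic realization of land ownership, and is also called net transfer revenue. Unless otherwise stated, land transfer fees in this paper mean net revenue.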

土地出讓金對國民經(jīng)濟(jì)主體的影響及其估算方法土地出讓交易對國民經(jīng)濟(jì)活動(dòng)主體的影響可以從承租人和出租人兩個(gè)角度來觀察。對于土地的承租人(即使用者),所支付土地出讓金既有當(dāng)年支付地租的部分,也有預(yù)交地租而產(chǎn)生的債權(quán)部分,前者影響經(jīng)濟(jì)流量,后者影響經(jīng)濟(jì)存量。從理論上講,地租支出應(yīng)從承租人的可支配收入中扣除,而因隱性債權(quán)產(chǎn)生的應(yīng)收利息則增加承租人的可支配收入,由此可以綜合分析土地出讓交易對承租人收入的凈影響。另一方面,對于一次性交納出讓金而產(chǎn)生的隱性債權(quán)而言,從理論上說應(yīng)計(jì)為承租人的資產(chǎn),由此可以分析土地出讓交易對承租人資產(chǎn)存量的影響。

對于土地的出租人(即政府)而言,土地交易的影響剛好是反向的,即一方面應(yīng)當(dāng)分析出讓交易對政府收入的增長效應(yīng),另一方面應(yīng)當(dāng)分析政府的隱性債務(wù)負(fù)擔(dān)。具體估算步驟如下:

(1)分用途估算每年的應(yīng)計(jì)地租,公式為:

公式中的關(guān)鍵參數(shù)是貼現(xiàn)率r。由于中國沒有對應(yīng)于40～70年的利率,因此在估算時(shí),以2008年發(fā)行的第六期記賬式30年期國債的利率(4.5%)為基準(zhǔn)來確定貼現(xiàn)率。依據(jù)利率期限結(jié)構(gòu)理論并參照美國的經(jīng)驗(yàn),估計(jì)時(shí)采用兩個(gè)貼現(xiàn)率,分別為4.5%和5%。

(2)分用途估算隱性債務(wù)余額,公式為:

第一年的期初債務(wù)余額=當(dāng)年實(shí)收的土地出讓金,每年的應(yīng)付利息=期初債務(wù)余額×貼現(xiàn)率,每年的債務(wù)支付=每年的地租-每年的應(yīng)付利息,每年的期末債務(wù)余額=每年的期初債務(wù)余額-每年的債務(wù)支付

從第二年開始,每年的期初債務(wù)余額=上年末期末債務(wù)余額

(3)匯總,將前兩個(gè)步驟得到的每年分用途的土地出讓金的估計(jì)結(jié)果加總,即可得到包括各年地租和期初、期末隱性債務(wù)余額在內(nèi)的核算結(jié)果。
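The three estimation steps can be sketched as an amortization schedule. The displayed rent formula is not preserved in the text, so the sketch assumes a level annual rent, paid at year end, whose present value at discount rate r over the lease term equals the lump-sum fee; that annuity assumption, the function names and the fee of 100 are illustrative. The interest, debt payment and balance rules follow the text.

```python
def annual_rent(fee, rate, years):
    """Level annual rent whose present value equals the lump-sum fee (assumed form)."""
    return fee * rate / (1 - (1 + rate) ** -years)

def debt_schedule(fee, rate, years):
    """Rows of (year, opening debt, interest, debt payment, closing debt)."""
    rent = annual_rent(fee, rate, years)
    opening = fee  # first-year opening balance = fee actually received
    rows = []
    for year in range(1, years + 1):
        interest = opening * rate   # interest payable = opening balance x discount rate
        payment = rent - interest   # debt payment = rent - interest payable
        closing = opening - payment # closing balance = opening balance - debt payment
        rows.append((year, opening, interest, payment, closing))
        opening = closing           # next year's opening = this year's closing
    return rows

# Hypothetical 70-year residential lease with a fee of 100 and the 4.5% rate from the text
schedule = debt_schedule(100.0, 0.045, 70)
print(round(annual_rent(100.0, 0.045, 70), 4))
print(schedule[0])
print(schedule[-1])
```

Running the schedule separately by land use and by year of transfer, then summing, corresponds to step (3). Under the annuity assumption the implicit debt is fully repaid in the final year of the lease.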

土地出讓金對國民經(jīng)濟(jì)主體影響的估算結(jié)果

(1)對住戶的影響。對土地出讓金對住戶收入流量的影響進(jìn)行測算后可以看到,將土地出讓金分?jǐn)偟礁髂旰?與可支配收入相比,每年的地租支出規(guī)模并不大,同時(shí)住戶的虛擬利息收入與地租的規(guī)模相差無幾,因此實(shí)際的地租成本僅相當(dāng)于可支配收入的萬分之幾,而同期住宅用地的出讓金相當(dāng)于可支配收入的比重則達(dá)到2%～3%,這就意味著一次性征收土地出讓金的確在很大程度上增加了住戶的負(fù)擔(dān)。對土地出讓交易對住戶資產(chǎn)存量的影響進(jìn)行測算后可以看到,由一次性征收土地出讓金給住戶所帶來的隱性債權(quán)在規(guī)模上是相當(dāng)可觀的,而且其相當(dāng)于儲(chǔ)蓄余額的比重逐年提升。到2007年末,僅僅是2003-2007年間累積的債權(quán)余額就相當(dāng)于儲(chǔ)蓄余額的5%。

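(2) Effect on enterprises. Estimating the effect of land transfer fees on enterprises shows that the implicit claims created by collecting the fee in one lump sum are also very large, and that their ratio to enterprise deposits keeps rising. By the end of 2007, the claims accumulated in 2003-2007 alone were equivalent to 4% of enterprise deposits.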

(3)對政府的影響。作為國有土地的所有者,政府部門是所有土地出讓金的收入方,同時(shí)也負(fù)擔(dān)起土地出讓金產(chǎn)生的隱性債務(wù)。

對土地出讓金對政府部門的有關(guān)流量的影響進(jìn)行測算,可以看到,土地出讓產(chǎn)生的地租收入對于政府財(cái)政收入而言并非小數(shù)目,2007年前者相當(dāng)于后者的3%,這還僅僅是5年間土地出讓產(chǎn)生的地租收入,若將以前土地出讓所產(chǎn)生的地租收入也記在內(nèi),則地租收入對于地方政府財(cái)政收入的貢獻(xiàn)會(huì)更大。當(dāng)然,由于隱性債務(wù)產(chǎn)生的利息支出在很大程度上抵消了地租收入所帶來的收益,但是與住戶部門相比,凈地租收入對于政府部門而言重要性更為突出,占政府財(cái)政收入的比重并非萬分之幾,而是千分之幾。

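Estimating the effect of land transfer fees on the government sector's stocks shows that the implicit debt they create is a very heavy burden for government, and that the burden grows as land transfers expand. By the end of 2007, the implicit debt accumulated in 2003-2007 alone was equivalent to about 30% of outstanding national debt.①

① Because land transfer fees are entirely at the disposal of local governments, the effect of the implicit debt on government stock indicators should properly be studied with local government debt. China's budget system, however, means that local governments run no budget deficit, so there are no statistics on local government debt, and only the national debt indicator can be used here.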

這樣的測算結(jié)果使我們更加確信我國土地出讓金作為預(yù)收地租核算的必要性,否則難以揭示前屆地方政府對未來屆地方政府收入的侵占,這種侵占等價(jià)于給未來屆地方政府遺留的債務(wù)負(fù)擔(dān)。

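Conclusion. Accounting along the rent approach shows that land transfer fees have little net effect on the economic flows of each sector but a significant effect on economic stocks, especially for the government sector, so that the scale of government "implicit debt" is fully revealed. By contrast, if land transfer fees are treated as transactions in land-use-right assets, they appear for government only as an accumulation of financial assets, and the weakening of future governments' financing capacity caused by drawing future rent in advance is not shown. We therefore recommend adding to the existing accounting method a memorandum item that records land transfer fees from the rent perspective.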

劉樂平,王瑩,劉駿豪

天津財(cái)經(jīng)大學(xué)統(tǒng)計(jì)學(xué)系

《非壽險(xiǎn)損失準(zhǔn)備金的貝葉斯信度區(qū)間估計(jì)》

對于經(jīng)營非壽險(xiǎn)業(yè)務(wù)的保險(xiǎn)公司來說,損失準(zhǔn)備金是負(fù)債表上金額最大的負(fù)債項(xiàng)目,直接關(guān)系到保險(xiǎn)公司利潤指標(biāo)的高低和償付能力的大小。未決賠款準(zhǔn)備金是損失準(zhǔn)備金的主要組成部分,由于包含諸如IBNR、估計(jì)方法和預(yù)測模型等不確定性因素,所以一直是國內(nèi)外非壽險(xiǎn)精算理論和應(yīng)用研究的熱點(diǎn)和難點(diǎn)問題。

將比較完善的未決賠款準(zhǔn)備金估計(jì)方法和預(yù)測模型按數(shù)據(jù)基礎(chǔ)、假設(shè)條件和最終估計(jì)結(jié)果總結(jié)成表1:

表1 未決賠款準(zhǔn)備金的估計(jì)方法和預(yù)測模型表

對表1進(jìn)行分析,估計(jì)方法可以分成確定方法和隨機(jī)模型兩大類,而從最終估計(jì)結(jié)果的形式來看,可以發(fā)現(xiàn)一個(gè)明顯的變化趨勢,即從點(diǎn)估計(jì)到區(qū)間估計(jì)的變化。未決賠款準(zhǔn)備金估計(jì)的隨機(jī)模型的最終結(jié)果要比確定性方法更穩(wěn)健的原因之一也在于區(qū)間估計(jì)與點(diǎn)估計(jì)的差異。例如,用廣義線性模型對未決賠款準(zhǔn)備金估計(jì),在得到估計(jì)值的同時(shí)還可計(jì)算估計(jì)的均方誤差;未決賠款準(zhǔn)備金的貝葉斯估計(jì)不僅可以得到估計(jì)值,還可得到估計(jì)值95%的可信區(qū)間。

未決賠款分布的波動(dòng)性和不確定性使得提留的準(zhǔn)備金與實(shí)際值永遠(yuǎn)存在差異,本文考慮準(zhǔn)備金風(fēng)險(xiǎn)因素,基于貝葉斯信度理論,討論準(zhǔn)備金的貝葉斯信度區(qū)間估計(jì)的方法和步驟。

準(zhǔn)備金風(fēng)險(xiǎn)的測度傳統(tǒng)實(shí)務(wù)中,保險(xiǎn)公司一般都從貨幣形式上來考慮損失,這樣便出現(xiàn)了以鏈梯法為代表的一系列確定性準(zhǔn)備金估計(jì)方法,它們皆是研究總的損失金額在未來的發(fā)展情況。然而,在這個(gè)基礎(chǔ)上建立的準(zhǔn)備金模型沒有考慮案件發(fā)生的頻率、賠款單位數(shù)和案均賠款的影響。如下測度損失:

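The expression states that the loss amount (L) is a function of claim frequency (F), the number of exposure units (E) and the average claim size (S). Because the reserving problem is treated here with smoothing theory, L is defined as a loss index, and the month-to-month change in this index reflects the change in losses. On the basis of this loss measure (L), reserve risk can be measured.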

貝葉斯信度區(qū)間估計(jì) 本文討論的貝葉斯信度區(qū)間,既不同于經(jīng)典的置信區(qū)間,也有異于貝葉斯可信區(qū)間,而是借用非壽險(xiǎn)精算學(xué)經(jīng)驗(yàn)費(fèi)率厘定的信度模型理論,研究如何利用近期損失準(zhǔn)備金數(shù)據(jù)(稱為經(jīng)驗(yàn)數(shù)據(jù),據(jù)此得到的為經(jīng)驗(yàn)估計(jì)θo)和與之相關(guān)的類似險(xiǎn)種相關(guān)數(shù)據(jù)(稱為先驗(yàn)信息數(shù)據(jù),據(jù)此確定的為先驗(yàn)估計(jì)θo),加權(quán)平均分別得出損失準(zhǔn)備金貝葉斯信度區(qū)間后驗(yàn)估計(jì)的下限和上限:

得到損失準(zhǔn)備金貝葉斯信度區(qū)間估計(jì)為[θL, θU]。其中rL,rU∈[0,1]為準(zhǔn)備金下風(fēng)險(xiǎn)因子水平和準(zhǔn)備金上風(fēng)險(xiǎn)因子水平。

準(zhǔn)備金風(fēng)險(xiǎn)因子控制模型此控制模型基于的假設(shè)條件是:指數(shù)L的變化可以用來反映在損失準(zhǔn)備金中可能考慮到所有的變化。為了實(shí)施該辦法,必須獲得過去F、E、S的月度數(shù)據(jù),用這三者計(jì)算的L值便可以建立一個(gè)損失的歷史數(shù)據(jù)集,將此歷史數(shù)據(jù)做季節(jié)調(diào)整之后再計(jì)算出L的變動(dòng)比率。一個(gè)月、兩個(gè)月、三個(gè)月的L凈變動(dòng)比率等等不同時(shí)期的相同時(shí)間區(qū)間段可以排序。

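FDR-based control of the risk factor level. Consider m hypothesis tests {H1, H2, …, Hm} with corresponding p-values {P1, P2, …, Pm}. The B-H procedure proposed by Benjamini and Hochberg (1995) consists of the following steps:

Step 1: Sort {P1, P2, …, Pm} so that P(1) ≤ P(2) ≤ … ≤ P(m); the m hypotheses are reordered correspondingly as {H(1), H(2), …, H(m)};

Step 2: For a given FDR control level q, check whether P(k) ≤ (k/m)q;

Step 3: Starting from P(m), test downward step by step according to Step 2;

Step 4: If a k satisfying Step 2 exists, reject {H(1), H(2), …, H(k)} for the largest such k; otherwise reject none of {H(1), H(2), …, H(m)}. In the estimation, the upper and lower reserve risk factor levels are defined and controlled with q-values.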
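The four steps above can be sketched as a short function. It is a minimal illustration of the standard Benjamini-Hochberg step-up rule; the function name and the example p-values are hypothetical and are not taken from the paper's reserve data.

```python
def benjamini_hochberg(p_values, q):
    """Return the indices of the hypotheses rejected at FDR level q."""
    m = len(p_values)
    # Step 1: order the p-values, remembering the original positions
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    # Steps 2-3: start from P(m) and move down until P(k) <= (k/m) * q
    for rank in range(m, 0, -1):
        if p_values[order[rank - 1]] <= rank * q / m:
            k = rank
            break
    # Step 4: reject H(1), ..., H(k); reject nothing if no such k exists
    return sorted(order[:k])

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p, 0.05))  # [0, 1]
```

Scanning from the largest p-value downward means the first hit is the largest k, so every hypothesis with a smaller p-value is rejected with it.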

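The error-control procedures above all follow one pattern: the error-control level is fixed in advance, that is, the Type I error level is fixed; then, on the basis of the individual tests, the control procedure constructs the rejection region, and finally the test result is obtained. Storey (2003) proposed a different approach to hypothesis testing: fix the rejection region first from experience, then estimate the error rate. If the estimate is acceptable, the test is regarded as valid; if the error rate is too large, the rejection region can be adjusted until the error rate is held at a satisfactory level.

Storey used the pFDR to give the definition and algorithm of the q-value:

For an observed statistic T = t, $q\text{-value}(t) = \inf_{\Gamma:\, t\in\Gamma} \mathrm{pFDR}(\Gamma)$, the smallest pFDR over rejection regions Γ that contain t.

Liu Shiguo

Institute of World Economics and Politics, Chinese Academy of Social Sciences

《New Developments in Balance of Payments Statistics and Their Impact on China》

In December 2009 the International Monetary Fund (IMF) released the sixth edition of the Balance of Payments Manual, updating and improving the fifth edition released in 1993. The full title of the new edition is Balance of Payments and International Investment Position Manual (hereafter BPM6), while the full title of the fifth edition is Balance of Payments Manual (hereafter BPM5).

General advances of BPM6 over BPM5. Since BPM5 was released in 1993, the global economy and finance have continued to develop in depth, and many economies have changed their earlier economic policies. BPM6 reflects more of these changes. Relative to BPM5, the account layout of BPM6 has not changed structurally, whereas the changes in BPM5 relative to earlier editions were structural. BPM6 integrates the BOP accounts and the IIP accounts into one system. BPM6 has two first-level accounts, the "balance of payments" account and the "international investment position" account; BPM5 has only one first-level account, "balance of payments". In addition, BPM6 includes "additional analytical position data".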

在一級賬戶“國際收支賬戶”中,BPM6包含3個(gè)二級賬戶,即經(jīng)常賬戶、資本賬戶和金融賬戶。而BPM5僅包括2個(gè)二級賬戶,即經(jīng)常賬戶以及資本與金融賬戶。換言之,資本賬戶與金融賬戶在BPM5中為二級賬戶“資本與金融賬戶”下的三級賬戶。核算的對象,經(jīng)常賬戶為生產(chǎn)的資產(chǎn)(如固定資產(chǎn)、存貨和貴重物品),資本賬戶為非生產(chǎn)非金融資產(chǎn)(如自然資源、合約、租賃和許可),金融賬戶和國際投資頭寸賬戶為金融資產(chǎn)與負(fù)債。

在二級賬戶內(nèi)部,BPM6“經(jīng)常賬戶”下的“初次收入”項(xiàng)與“二次收入”項(xiàng),在BPM5中的名稱分別為“收入”項(xiàng)和“經(jīng)常轉(zhuǎn)移”項(xiàng);BPM6“資本賬戶”中的“非生產(chǎn)非金融資產(chǎn)的總獲得(貸方)/總支配(借方)”項(xiàng)和“資本轉(zhuǎn)移”項(xiàng),在BPM5中的先后位置是相反的,而且BPM6明確了非生產(chǎn)非金融資產(chǎn)獲得/支配的登錄為“總值”口徑;在“金融賬戶”中, BPM6比BPM5多了“金融衍生工具(不含儲(chǔ)備)和雇員股票期權(quán)”工具,相當(dāng)于從BPM5的“其他投資”中分離出來的。

在具體交易項(xiàng)的設(shè)置方面,BPM6相比BPM5也有許多變化。比如,在“貨物”項(xiàng)下,BPM6僅設(shè)有3個(gè)交易項(xiàng),而BPM5則有5個(gè)交易項(xiàng),后者的“加工貨物貿(mào)易”和“對貨物的修理”被改做“服務(wù)”項(xiàng)。在“服務(wù)”貿(mào)易項(xiàng)下,BPM6新設(shè)“對他人所持有形投入的制造性服務(wù)”和“維修服務(wù)”交易項(xiàng),而在BPM5中這2項(xiàng)設(shè)在“貨物”貿(mào)易項(xiàng)下。

BPM6的調(diào)整對中國國際收支統(tǒng)計(jì)數(shù)據(jù)影響的實(shí)證評估若中國采納BPM6作為官方BOP數(shù)據(jù)的編制依據(jù),將對中國的國際交易數(shù)據(jù)造成何種程度的影響?這里以貨物與服務(wù)交易為例進(jìn)行實(shí)證評估。由BPM5和BPM6相關(guān)交易項(xiàng)間的對應(yīng)關(guān)系, BPM6貨物貿(mào)易賬戶與服務(wù)貿(mào)易賬戶交易項(xiàng)可用BPM5相關(guān)項(xiàng)表示如下:

BPM6的“一般貨物貿(mào)易”=BPM5的(一般貨物貿(mào)易+承運(yùn)人在港口購買的貨物);

BPM6的“轉(zhuǎn)口貨物貿(mào)易凈出口”=BPM5的(承運(yùn)人在港口購買的貨物+待加工貨物+貨物修理);

BPM6的“對屬于他人的貨物投入進(jìn)行的制造新服務(wù)”=BPM5的“貨物加工”;

BPM6的“維修服務(wù)等”=BPM5的(“貨物修理”+“其他形式的交通之其他(部分)”)。

對上述對應(yīng)指標(biāo)的相互數(shù)量關(guān)系進(jìn)行如下假設(shè):

BPM6的轉(zhuǎn)口貨物貿(mào)易凈出口=BPM5的(轉(zhuǎn)口貿(mào)易+其他貿(mào)易相關(guān)服務(wù))×0.3;

BPM6的對屬于他人的貨物投入進(jìn)行的制造新服務(wù)=BPM5的(進(jìn)料加工貨物+中國出料加工貨物),貿(mào)易差額的核算也應(yīng)有該等式;

BPM6的維修服務(wù)等=BPM5的(貨物修理+其他模式的交通之其他×0.8)。

事實(shí)上,1993年至今,進(jìn)料加工在中國貨物加工中的占比,在出口中從45%下降到15%,在進(jìn)口中從43%降至25%。需要說明的是,上述有關(guān)調(diào)整系數(shù)的假設(shè)并沒有真實(shí)的憑據(jù),由于被調(diào)整項(xiàng)在貨物貿(mào)易或服務(wù)貿(mào)易中所占份額較小,因而這些假設(shè)對估算結(jié)果的影響十分有限。

基于中國海關(guān)數(shù)據(jù)的相關(guān)計(jì)算結(jié)果顯示,1997 -2008年期間,同BPM5口徑相比:(1)BPM6的中國貨物出口下降45%～56%,進(jìn)口下降33%～48%,貿(mào)易差額下降84%～192%,其中有七年甚至由盈余轉(zhuǎn)為赤字;(2)BPM6的中國服務(wù)出口上漲80%～138%,進(jìn)口上漲58%～81%,而貿(mào)易差額則全部由赤字轉(zhuǎn)為盈余,額度絕對值多數(shù)變得更大; (3)BPM6的中國貨物與服務(wù)出口下降32%～39%,進(jìn)口下降21%～27%,而貿(mào)易差額則以更大幅度下降,甚至有六年由盈余轉(zhuǎn)為赤字;(4)BPM6使中國的貨物出口或進(jìn)口下降約一半,貨物或服務(wù)貿(mào)易差額下降一倍多;服務(wù)貿(mào)易相應(yīng)指標(biāo)則大幅增加;貨物與服務(wù)貿(mào)易出口或進(jìn)口下降30%左右,貿(mào)易差額的降幅更大,一半以上的年份甚至由盈余轉(zhuǎn)為赤字。這些變化將大大有利于中國拓展對外經(jīng)濟(jì)與政治政策空間。

結(jié)論相對于BPM5,BPM6在賬戶結(jié)構(gòu)上的調(diào)整盡管并不具有革命性的意義,但在貨物貿(mào)易和服務(wù)貿(mào)易項(xiàng)目的調(diào)整,以及有關(guān)直接投資、跨國公司和外商附屬企業(yè)經(jīng)營活動(dòng)統(tǒng)計(jì)方面的許多重大進(jìn)展,為世界各國官方統(tǒng)計(jì)提供了很好的建議。這些建議將推動(dòng)中國官方國際收支統(tǒng)計(jì)工作的提高,相關(guān)數(shù)據(jù)能夠在更大范圍內(nèi)發(fā)揮其應(yīng)有的作用,因而值得歡迎和引入。基于中國1997-2008年對外貿(mào)易數(shù)據(jù),本文模擬BPM6貨物貿(mào)易和服務(wù)貿(mào)易項(xiàng)目調(diào)整的影響,發(fā)現(xiàn)這些影響是巨大的,甚至顛覆了人們關(guān)于中國對外貿(mào)易平衡的現(xiàn)有觀點(diǎn),有利于扭轉(zhuǎn)中國以往在對外經(jīng)濟(jì)與政治交往中的被動(dòng)不利局面。

劉應(yīng)安,尹吉平,夏業(yè)茂

南京林業(yè)大學(xué)信息科學(xué)技術(shù)學(xué)院

《具有數(shù)據(jù)缺失森林資源預(yù)測的研究》

森林資源預(yù)測在國內(nèi)外已經(jīng)有不少學(xué)者進(jìn)行探索和研究,運(yùn)用的方法主要有回歸分析、人工神經(jīng)網(wǎng)絡(luò)、灰色系統(tǒng)理論、馬爾可夫鏈、Kalman濾波等。

目前,一些研究者在應(yīng)用Kalman濾波理論開展森林資源預(yù)測的研究中,其主要工作是完全數(shù)據(jù)條件下的森林資源預(yù)測,如何對缺失數(shù)據(jù)的森林資源進(jìn)行預(yù)測的成果并不多見,特別是應(yīng)用Kalman濾波理論對具有缺失數(shù)據(jù)的森林資源預(yù)測的研究。

本文利用狀態(tài)空間模型進(jìn)行參數(shù)估計(jì)對帶有輸入向量的模擬觀測數(shù)據(jù)進(jìn)行了系統(tǒng)全面的研究,得到一系列好的結(jié)果。首先,對完全觀測數(shù)據(jù)下狀態(tài)空間模型進(jìn)行了研究。利用EM算法對真實(shí)參數(shù)進(jìn)行估計(jì)和隨機(jī)模擬,并對數(shù)據(jù)進(jìn)行平滑分析,平滑值序列與由模型實(shí)際產(chǎn)生的觀測序列非常吻合。其次,對缺失數(shù)據(jù)下的狀態(tài)空間模型進(jìn)行了研究。缺失數(shù)據(jù)的產(chǎn)生是以完全數(shù)據(jù)集為基礎(chǔ),對觀測向量的元素以一定比率進(jìn)行隨機(jī)刪失,利用EM算法對真實(shí)參數(shù)進(jìn)行估計(jì)和隨機(jī)模擬,用所得的數(shù)據(jù)進(jìn)行平滑和預(yù)測分析,平滑值序列和預(yù)測值序列與由模型實(shí)際產(chǎn)生的觀測序列都非常吻合。
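How a Kalman filter copes with missing observations can be shown with a scalar sketch. This is not the authors' model: it assumes a local-level state space model with known noise variances `q` and `r`, whereas the paper estimates the parameters by EM and also smooths. The function name and the data are hypothetical.

```python
def kalman_filter(observations, q=1.0, r=1.0, x0=0.0, p0=1e6):
    """Filter a local-level model; None marks a missing observation.

    State:       x_t = x_{t-1} + w_t,  Var(w_t) = q
    Observation: y_t = x_t + v_t,      Var(v_t) = r
    Returns the filtered means and variances.
    """
    x, p = x0, p0
    means, variances = [], []
    for y in observations:
        p = p + q                  # prediction step: uncertainty grows
        if y is not None:          # update step only when y_t is observed
            gain = p / (p + r)
            x = x + gain * (y - x)
            p = (1 - gain) * p
        means.append(x)
        variances.append(p)
    return means, variances

means, variances = kalman_filter([1.0, None, 1.2, None, None, 1.5])
print([round(v, 3) for v in variances])
```

When an observation is missing the update step is skipped, so the filtered mean is carried forward and its variance grows until the next observed value arrives.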

歐陽志剛,習(xí) 勤

華東交通大學(xué)經(jīng)濟(jì)管理學(xué)院

《協(xié)整平滑轉(zhuǎn)移回歸模型的估計(jì)方法》

現(xiàn)實(shí)中,經(jīng)濟(jì)變量之間的結(jié)構(gòu)關(guān)系可能是非線性的,協(xié)整平滑轉(zhuǎn)移回歸模型正是針對這一實(shí)際經(jīng)濟(jì)背景而對標(biāo)準(zhǔn)協(xié)整理論進(jìn)行的擴(kuò)展和改進(jìn)。由于實(shí)際經(jīng)濟(jì)中解釋變量往往具有內(nèi)生性,因此內(nèi)生性解釋變量會(huì)給協(xié)整平滑轉(zhuǎn)移回歸模型的估計(jì)帶來什么影響?應(yīng)如何校正?

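Cointegrating smooth transition regression model. Without loss of generality, consider the following cointegrating smooth transition regression model:

$$y_t = \alpha_0 + \alpha_1 g(x_{rt};\lambda,c) + \sum_{i=1}^{p}\beta_{1i}x_{it} + \sum_{i=1}^{p}\beta_{2i}x_{it}\,g(x_{rt};\lambda,c) + u_t \qquad (1)$$

Here $x_t = (x_{1t}, x_{2t}, \dots, x_{pt})'$ is a $p\times 1$ vector of I(1) explanatory variables, $r\in[1,p]$, and $u_t$ is a stationary random error with mean zero. $x_{rt}$ is the threshold variable, $\lambda$ is the smoothness parameter that determines the speed of regime transition, and $c$ is the threshold. $g(\cdot)$ is a continuous function of the threshold variable $x_{rt}$, and its value changes continuously as $x_{rt}$ changes. Let the parameter vector be $\theta = (\alpha_0, \alpha_1, \beta_{11}, \dots, \beta_{1p}, \beta_{21}, \dots, \beta_{2p}, \lambda, c)'$. The transition functions commonly used in cointegrating smooth transition regression are the logistic function and the exponential function. For the logistic function, as $x_{rt}\to-\infty$, $g(\cdot)\to 0$ and the cointegrating relation between $y$ and $x$ is described by the lower regime, with cointegrating vector $(\alpha_0, \beta_{11}, \dots, \beta_{1p})'$; as $x_{rt}\to+\infty$, $g(\cdot)\to 1$ and the relation is described by the upper regime, with cointegrating vector $(\alpha_0+\alpha_1, \beta_{11}+\beta_{21}, \dots, \beta_{1p}+\beta_{2p})'$. When $x_{rt}$ takes values on either side of $c$, $0 < g(\cdot) < 1$, and the relation moves continuously between the lower and upper regimes. It is the nonlinearity of the transition function $g(\cdot)$ that gives the cointegrating vector in model (1) its nonlinear transition behaviour, and this is how the model extends standard cointegration.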
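The role of the transition function can be illustrated with the logistic case. The sketch assumes the usual logistic form $g = 1/(1+e^{-\lambda(x-c)})$, which the text names but does not display; the function names and parameter values are hypothetical.

```python
import math

def logistic_transition(x, lam, c):
    """Logistic transition function with smoothness lam and threshold c."""
    return 1.0 / (1.0 + math.exp(-lam * (x - c)))

def effective_coefficients(x_r, alpha0, alpha1, beta1, beta2, lam, c):
    """Intercept and slopes in force when the threshold variable equals x_r."""
    g = logistic_transition(x_r, lam, c)
    intercept = alpha0 + alpha1 * g
    slopes = [b1 + b2 * g for b1, b2 in zip(beta1, beta2)]
    return intercept, slopes

# Hypothetical parameters: lower-regime slope 0.5, upper-regime slope 0.5 + 0.3
for x_r in (-50.0, 0.0, 50.0):
    print(x_r, effective_coefficients(x_r, 1.0, 2.0, [0.5], [0.3], lam=1.0, c=0.0))
```

Far below the threshold the coefficients approach the lower-regime vector, far above it they approach the upper-regime vector, and at the threshold itself g equals one half.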

協(xié)整平滑回歸模型的估計(jì)方法

(1)動(dòng)態(tài)非線性最小二乘估計(jì)(DNLS)。在解釋變量內(nèi)生條件下,利用非線性最小二乘法(NLS)對模型(1)進(jìn)行估計(jì),NLS估計(jì)量的極限分布依賴于未知參數(shù)。Saikkonen,Choi(2004)對此的改進(jìn)是,將NLS擴(kuò)展為DNLS,由此得到的參數(shù)估計(jì)量不依賴于未知參數(shù)。本文將Saikkonen,Choi的DNLS擴(kuò)展為FMOLS,估計(jì)模型(1)。為表述清晰,誤差項(xiàng)ut可表述為:

將(2)式代入模型(1),模型(1)就成為:

(2)完全修正的最小二乘估計(jì)(FMOLS)。FMOLS是通過對ut與vt的長期方差的修正,達(dá)到校正^θ極限分布依賴未知參數(shù)的目的。為表述清楚,本文定義平穩(wěn)隨機(jī)過程wt=(ut,v′t)′。平穩(wěn)過程的長期方差Ω定義并分解為:

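Let $\hat\Omega_{vu}$ and $\hat\Omega_{v}$ be consistent estimators of $\Omega_{vu}$ and $\Omega_{v}$, and define $u_t^{+} = u_t - \hat\Omega_{uv}\hat\Omega_{v}^{-1}v_t$, that is, $(u_t^{+}, v_t')' = \begin{pmatrix} 1 & -\hat\Omega_{uv}\hat\Omega_{v}^{-1} \\ 0 & I_p \end{pmatrix} w_t$, where $I_p$ is the $p\times p$ identity matrix. The long-run variance of $u_t^{+}$ is then $\Omega_u - \Omega_{uv}\Omega_{v}^{-1}\Omega_{vu}$. The long-run covariance matrix of $u_t^{+}$ and $v_t$ shows that the two processes are uncorrelated, and this is the core of how FMOLS corrects for the endogeneity of the explanatory variables. On the basis of this idea, model (1) is transformed into model (4):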

蒙特卡洛仿真實(shí)驗(yàn)結(jié)果比較

(1)當(dāng)解釋變量沒有內(nèi)生性時(shí),模型的非線性最小二乘估計(jì)結(jié)果(NLS)優(yōu)于DNLS。但當(dāng)解釋變量具有內(nèi)生性,DNLS對模型的估計(jì)結(jié)果顯著優(yōu)于NLS的估計(jì)結(jié)果。

(2)解釋變量具有內(nèi)生性時(shí),實(shí)踐中DNLS估計(jì)效果與超前滯后項(xiàng)k的選擇有關(guān)。研究者往往不知道現(xiàn)實(shí)中ut和vt的相關(guān)特征,這可能導(dǎo)致實(shí)證研究中錯(cuò)誤選擇超前滯后階數(shù)k,從而影響估計(jì)效果。

(3)在解釋變量內(nèi)生情形下,FMOLS估計(jì)效果顯著優(yōu)于NLS的估計(jì)效果。比較仿真實(shí)驗(yàn)中FMOLS與DNLS估計(jì)結(jié)果,可以發(fā)現(xiàn),在DNLS正確選擇的前提下,DNLS和FMOLS兩種估計(jì)方法沒有明顯體現(xiàn)出一種比另一種有顯著更好的估計(jì)效果。實(shí)證研究中如果無法準(zhǔn)確選擇超前滯后項(xiàng)階數(shù),FMOLS對具有內(nèi)生性解釋變量的協(xié)整平滑轉(zhuǎn)移回歸模型的估計(jì)效果要優(yōu)于DNLS的估計(jì)效果。

謝遠(yuǎn)濤1,彭非2

1.對外經(jīng)濟(jì)貿(mào)易大學(xué)保險(xiǎn)學(xué)院;2.中國人民大學(xué)統(tǒng)計(jì)學(xué)院

《中國對外貿(mào)易指數(shù)(1983-1991)的編制方法》

對外貿(mào)易指數(shù)反映了進(jìn)出口商品價(jià)值、價(jià)格和數(shù)量變動(dòng),是國際商品市場的晴雨表。黃國華(2002)把外貿(mào)指數(shù)的應(yīng)用歸納為宏觀、中觀和微觀三個(gè)層面。中國海關(guān)總署綜合統(tǒng)計(jì)司于2005年起正式編制出版《中國對外貿(mào)易指數(shù)》。2008年出版了中國對外貿(mào)易指數(shù)(1993-2004),但是舊核算體系下的指數(shù)一直空缺,本文介紹1982-1991年的貿(mào)易指數(shù)編制方法并簡單分析結(jié)果。

指數(shù)編制方法中國1980—1991年海關(guān)統(tǒng)計(jì)商品目錄是以聯(lián)合國《國際貿(mào)易標(biāo)準(zhǔn)分類》(SITC REV.2)編碼,以標(biāo)準(zhǔn)國際貿(mào)易分類(SITC)為抽樣框,類似美國,但1992年以后采用HS編碼來選樣。

為了直接采用海關(guān)統(tǒng)計(jì)提供的進(jìn)出口商品的金額和數(shù)量,貿(mào)易指數(shù)編算采用單位價(jià)值指數(shù),即以單位價(jià)值代替商品價(jià)值來計(jì)算。進(jìn)口采用到岸價(jià)格,出口采用離岸價(jià)格作計(jì)量單位,盡可能減小時(shí)間不同步的問題,幣值采用美元。拉氏指數(shù)和帕氏指數(shù)的計(jì)算結(jié)果會(huì)隨著時(shí)間的推移差異越來越大,考慮到每過一段時(shí)期就需要更換一次基期,而當(dāng)更換基期時(shí)會(huì)出現(xiàn)替代誤差問題,因此海關(guān)系統(tǒng)公布費(fèi)氏指數(shù)。費(fèi)氏指數(shù)相乘得到費(fèi)氏定基指數(shù),為“鏈型指數(shù)”(李紅艷,2003)。具體編制流程如下:

(1)對SITC5位編碼數(shù)據(jù)進(jìn)行樣本測試,篩選出一般、特殊和特類樣本。

(2)按照各年SITC5位編碼對照表,根據(jù)樣本的總貿(mào)易額占貿(mào)易統(tǒng)計(jì)原始數(shù)據(jù)總貿(mào)易額的比重,選取SITC5位編碼有效樣本。

(3)提取基期和比較期樣本集,對同比和環(huán)比指數(shù),如果個(gè)體價(jià)格指數(shù)落入?yún)^(qū)間[0.5,2.5]之外,則此編碼指數(shù)(價(jià)格指數(shù))不納入下一步計(jì)算;如果落入?yún)^(qū)間[0.5,0.75]或[1.5,2.5],預(yù)設(shè)“人工干預(yù)”。

(4)計(jì)算樣本的個(gè)體指數(shù),用總體價(jià)值加權(quán)逐級匯總,編制SITC3、SITC2編碼指數(shù)和SITC總指數(shù)。

(5)利用SITC分類與HS分類、BEC分類、SNA分類、國民經(jīng)濟(jì)行業(yè)分類之間的對應(yīng)關(guān)系轉(zhuǎn)換樣本編碼計(jì)算價(jià)格指數(shù),利用總體貿(mào)易統(tǒng)計(jì)數(shù)據(jù)加權(quán)計(jì)算相應(yīng)的價(jià)值指數(shù)與物量指數(shù),得到HS、BEC、SNA與行業(yè)分類指數(shù)。
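Two pieces of the compilation process lend themselves to a short sketch: the screening rule in step (3) and the chained Fisher index described before the list. The code is a simplified illustration, not the customs system's implementation; the function names and sample data are hypothetical, and unit values stand in for prices as in the text.

```python
import math

def screen_relative(price_relative):
    """Classify an individual price relative by the intervals in step (3)."""
    if price_relative < 0.5 or price_relative > 2.5:
        return "exclude"          # outside [0.5, 2.5]: dropped from the next step
    if price_relative <= 0.75 or price_relative >= 1.5:
        return "manual_review"    # within [0.5, 0.75] or [1.5, 2.5]: flagged
    return "accept"

def fisher_price_index(p0, q0, p1, q1):
    """Fisher index: geometric mean of the Laspeyres and Paasche indices."""
    laspeyres = sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0))
    paasche = sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1))
    return math.sqrt(laspeyres * paasche)

def chain(links):
    """Multiply period-on-period Fisher links into a fixed-base (chain) index."""
    level, series = 1.0, []
    for link in links:
        level *= link
        series.append(level)
    return series

# Hypothetical unit values and quantities for two commodities
link = fisher_price_index(p0=[1.0, 2.0], q0=[3.0, 4.0], p1=[2.0, 4.0], q1=[3.0, 4.0])
print(screen_relative(0.6), screen_relative(1.0), screen_relative(3.0))
print(link, chain([1.1, 0.9]))
```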

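Results of the index compilation. The overall foreign trade indices from the beginning of 1983 to the end of 1991 show that over the nine years the geometric mean of the overall export price index is 98.98% and that of the export value index is 113.53%. Leaving aside other factors in index calculation, export quantity grew by 14.7% a year on average. The geometric mean of the overall import price index is 101.28% and that of the import value index is 113.39%. Leaving aside other factors in index calculation, import quantity grew by 11.96% a year on average. China's import and export trade grew rapidly and steadily. Within these totals, for exports of primary products the geometric mean of the price index is 96.9%, the quantity index 108.5% and the value index 105.1%; for exports of manufactures the price index is 101.7%, the quantity index 116.1% and the value index 118.0%; for imports of primary products the price index is 99.9%, the quantity index 102.9% and the value index 102.7%; for imports of manufactures the price index is 101.7%, the quantity index 115.8% and the value index 117.8%. Together with the overall index series, this shows that the main source of growth in China's import and export trade was manufactures, and that the fall in the overall export price index was driven by the fall in the price index of primary products.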
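The average quantity growth rates quoted above follow from dividing the value index by the price index. The check below uses only the figures in the text; the function name is an illustrative choice.

```python
def implied_quantity_growth(value_index, price_index):
    """Average quantity growth (%) implied by value and price indices given in %."""
    return (value_index / price_index - 1) * 100

export_growth = implied_quantity_growth(113.53, 98.98)
import_growth = implied_quantity_growth(113.39, 101.28)
print(round(export_growth, 1), round(import_growth, 2))  # 14.7 11.96
```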

總體上看,年度指數(shù)比較平穩(wěn),價(jià)格增長率比較平緩;月度指數(shù)變化要激烈的多,極不光滑。價(jià)格指數(shù)變化比較平穩(wěn),但是物量指數(shù)波動(dòng)比較大。

總結(jié)和進(jìn)一步研究上述指數(shù),既有一定的穩(wěn)定性,又有一定的波動(dòng)性,可以利用年度指數(shù)、季度指數(shù)和月度指數(shù)之間的矛盾情況來判斷是否合理。

單位價(jià)值指數(shù)存在一定的偏倚,張梅琳(1992)認(rèn)為單位價(jià)值指數(shù)包含了太多的因素變動(dòng),權(quán)數(shù)逐期變化是單位價(jià)值指數(shù)誤差的另一大原因?，F(xiàn)在,聯(lián)合國和相當(dāng)一部分國家已經(jīng)認(rèn)識(shí)到單位價(jià)值指數(shù)的局限性,正在著手試編價(jià)格指數(shù)。

對今后的貿(mào)易指數(shù)編算方法和編算系統(tǒng)研究,建議改進(jìn)單位價(jià)值指數(shù)的計(jì)算方法,增加國別和地區(qū)指數(shù)計(jì)算功能,嘗試編制隨機(jī)價(jià)格指數(shù),充分利用先驗(yàn)信息和貝葉斯方法,并嘗試根據(jù)特征價(jià)格函數(shù)來計(jì)算價(jià)格指數(shù)(牟嫣,2008),并使用Hedonic質(zhì)量調(diào)整(徐強(qiáng),2009)。

彭國富1,張玲芝2,李寶新1

1.河北經(jīng)貿(mào)大學(xué)數(shù)學(xué)與統(tǒng)計(jì)學(xué)學(xué)院;2.河北經(jīng)貿(mào)大學(xué)計(jì)算機(jī)中心

《基于伯特蘭博弈模型的人民幣匯率合理性評估》

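This paper takes the game-theoretic mechanism of exchange rate determination as its basis and, within the analytical framework of the Bertrand game model, constructs a model for measuring the game-equilibrium RMB exchange rate and applies it empirically in order to assess the reasonableness of the RMB exchange rate.

When there are no trade barriers and governments do not intervene in the production and consumption of goods, international and domestic markets are perfectly competitive: firms maximise profits and consumers maximise utility. The Bertrand oligopoly model then implies that the firms of the two countries split the international market equally and that the two countries' trade balances are equal, so that the exchange rate is in equilibrium. For governments, however, this equilibrium is not necessarily satisfactory. To speed up domestic economic development, every government pursues a trade expansion strategy, raising the international competitiveness of its goods and capturing as large a share of the international market as it can. But an increase in one country's share of international trade necessarily comes with a decrease in another's. When one country adopts a trade expansion strategy, its trading partner therefore responds with a strategy of its own, and the two play an exchange rate game over the division of trade shares, governed by the laws of supply and demand and with the prices of tradables as the point of entry. Only when both countries' interests are maximised does the game settle into a stable equilibrium.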

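Suppose the world economy consists of two countries (home and foreign), two sectors (tradable and non-tradable), and two products (tradables and non-tradables), and that consumers and firms in both countries are rational. Let P1, Q1, c1, P2, Q2, c2, t1, t2 denote the equilibrium prices, equilibrium outputs, and unit production costs of the home tradable and non-tradable sectors, … the equilibrium prices, equilibrium outputs, and unit production costs of the foreign tradable and non-tradable sectors, where Q2,t = a2 − b2P2,t + … Within the Bertrand game framework, the following model for measuring the game-equilibrium exchange rate can be constructed: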

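The model shows that movements in the game-equilibrium exchange rate depend on the unit product costs of the tradable and non-tradable sectors in the two countries: a rise in unit product costs in the home tradable sector and the foreign non-tradable sector pushes the home equilibrium exchange rate up, while a rise in unit product costs in the foreign tradable sector and the home non-tradable sector pushes it down.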

依據(jù)我們所搜集的中、美、德、韓、日等5國的相關(guān)統(tǒng)計(jì)數(shù)據(jù),利用所構(gòu)建的博弈均衡匯率測算模型,可測算出1990-2007年人民幣博弈均衡匯率,可在此基礎(chǔ)上繪制評估表(表1)和評估圖(圖1),并對人民幣匯率的合理性進(jìn)行評估。

表1 博弈均衡匯率人民幣匯率評估表

圖1 人民幣名義匯率與博弈均衡匯率走勢分析圖

由表1、圖1可以看出,1990年至2007年人民幣名義匯率一直處于幣值低估狀態(tài),但自2005年起這種幣值低估的態(tài)勢開始扭轉(zhuǎn),并逐步趨近于博弈均衡匯率,為此,近期的人民幣已不宜升值。從圖1還可看出,基于資源配置帕累托最優(yōu)的博弈均衡匯率水平,自1995年,其走勢基本平穩(wěn),并略有上升,且與名義匯率走勢基本平行,說明我國近十年來的經(jīng)濟(jì)發(fā)展?fàn)顩r是比較理想的,資源配置也是較合理的。另外,2005年以后人民幣名義匯率趨近于博弈均衡匯率,也說明我國匯率制度改革取得了明顯成效。

施發(fā)啟

國家統(tǒng)計(jì)局國民經(jīng)濟(jì)核算司

《普查年度GDP數(shù)據(jù)與常規(guī)年度GDP數(shù)據(jù)銜接方法研究》

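Since the founding of the People's Republic of China, the National Bureau of Statistics has organised five population censuses, three industrial censuses, two basic-unit censuses, one agricultural census, one tertiary-industry census, and two economic censuses. The first and second national economic censuses, carried out in 2004 and 2008, covered every industry except agriculture and are the two most comprehensive large-scale censuses since 1949. The censuses provide a scientific basis for gauging the size and structural change of the economy accurately, for establishing the basic profile of economic activity in each field, for setting strategic goals and plans for national economic development, and for formulating social and economic policy; they also supply benchmark data for national accounts. To improve the comparability and accuracy of historical statistics, the relevant historical data must be revised using the census data as the benchmark. No revision method is yet internationally accepted, so the choice of method matters greatly: it affects not only the quality of the revised historical data but also the image of government statistics and the credibility of official figures.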

本文第一部分闡述了GDP歷史數(shù)據(jù)的修訂原則;第二部分介紹了GDP歷史數(shù)據(jù)的修訂方法,包括等差內(nèi)插法、等比內(nèi)插法、趨勢離差法、相關(guān)指標(biāo)加權(quán)平均法、等速內(nèi)插法和最小二乘內(nèi)插法,其中后兩種方法為作者首次提出;第三部分為實(shí)際應(yīng)用,利用實(shí)際數(shù)據(jù)對以上六種方法的測算結(jié)果進(jìn)行了比較,發(fā)現(xiàn)最小二乘內(nèi)插法效果最好,其次為等速內(nèi)插法,再次為趨勢離差法(等比內(nèi)插法為趨勢離差法的一個(gè)特例)和相關(guān)指標(biāo)加權(quán)平均法,最差的方法為等差內(nèi)插法。
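To make the two simplest of the six methods concrete, the sketch below shows one common reading of arithmetic (equal-difference) and geometric (equal-ratio) interpolation when a census benchmark replaces the final value of a routinely compiled series. The abstract does not give the author's exact formulas, so the function names and the way the discrepancy is spread over the years are assumptions made for illustration only.

```python
def arithmetic_revision(series, benchmark):
    """Spread the final-year discrepancy over the interval in equal absolute steps.

    series: routinely compiled values for years 0..T (year 0 is the previous benchmark).
    benchmark: census-based value that replaces series[-1].
    """
    T = len(series) - 1
    gap = benchmark - series[-1]
    return [y + gap * t / T for t, y in enumerate(series)]


def geometric_revision(series, benchmark):
    """Spread the final-year discrepancy over the interval in equal proportional steps."""
    T = len(series) - 1
    ratio = benchmark / series[-1]
    return [y * ratio ** (t / T) for t, y in enumerate(series)]
```

Both variants leave the starting year unchanged and hit the benchmark exactly in the final year; they differ only in whether the intermediate corrections grow by a constant amount or by a constant factor.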

石磊1,干文1,黃梅2

1.云南財(cái)經(jīng)大學(xué)統(tǒng)計(jì)與數(shù)學(xué)學(xué)院;2.云南民族大學(xué)經(jīng)濟(jì)與工商管理學(xué)院

《逐步局部影響分析及其應(yīng)用》

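In 1986 the statistician R. Dennis Cook pioneered local influence analysis, a method for identifying influential points. Its basic idea is to perturb some component of the model simultaneously for all observations and to measure the effect of the perturbation by the normal curvature of the likelihood displacement influence graph, which yields diagnostic statistics for identifying influential points in the data. Because the perturbation is simultaneous (joint), the joint effect of several influential points can be detected. In addition, for many complex models (for example, models that require iterative computation) the method is computationally simpler than case deletion. It has therefore been widely applied since it was proposed.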

在影響分析中,Masking-效應(yīng)的識(shí)別是比較困難但又非常重要的問題,因此得到了廣泛的討論和研究。Masking效應(yīng)是指當(dāng)出現(xiàn)多個(gè)影響點(diǎn),使用單點(diǎn)數(shù)據(jù)刪除法時(shí),某個(gè)強(qiáng)影響點(diǎn)會(huì)掩蓋其他影響點(diǎn)的影響大小,從而不能有效的識(shí)別其中的影響點(diǎn)。Lawrance(1988)注意到利用局部影響分析可以識(shí)別數(shù)據(jù)中Masking效應(yīng)。由于局部影響分析同時(shí)擾動(dòng)數(shù)據(jù)點(diǎn),因此能夠研究數(shù)據(jù)的聯(lián)合影響,進(jìn)而識(shí)別某些Masking效應(yīng)。但是,如果數(shù)據(jù)中的Masking效應(yīng)比較強(qiáng),特別是時(shí)間序列數(shù)據(jù),影響點(diǎn)或異常值常常成片出現(xiàn),此時(shí)局部影響分析方法對識(shí)別這種影響點(diǎn)是無效的?；谶@一原因,許多學(xué)者提出使用局部影響中法曲率的關(guān)鍵矩陣的前幾個(gè)特征向量來提取影響點(diǎn)的更多信息,或是定義一種考慮多個(gè)特征向量的綜合度量方式來識(shí)別影響點(diǎn)。然而,這些方法也不能有效識(shí)別時(shí)間序列中的成片影響點(diǎn),同時(shí)它們還面臨一個(gè)共同問題,即我們應(yīng)該選定多少個(gè)特征向量。除此以外,如果使用多個(gè)特征向量來識(shí)別影響點(diǎn),也無法比較不同特征向量在識(shí)別影響點(diǎn)時(shí)的差異。

逐步局部影響分析當(dāng)數(shù)據(jù)中存在多個(gè)影響點(diǎn)時(shí),如果其中某一點(diǎn)是強(qiáng)影響點(diǎn),那么這一點(diǎn)對應(yīng)的局部影響診斷統(tǒng)計(jì)量(是一個(gè)單位向量)的數(shù)值將會(huì)很大(指絕對值)。假如此時(shí)影響點(diǎn)之間又存在相互影響,這可能會(huì)減弱其他影響點(diǎn)的效應(yīng),當(dāng)這種現(xiàn)象嚴(yán)重時(shí)就出現(xiàn)了Masking-效應(yīng)。

為了在使用局部影響方法時(shí)最大程度地避免Masking-效應(yīng),我們提出逐步局部影響分析的新方法,基本思想和過程如下:

首先,對數(shù)據(jù)(或模型)進(jìn)行全面擾動(dòng),利用傳統(tǒng)的局部影響分析的方法識(shí)別數(shù)據(jù)中的強(qiáng)影響點(diǎn);其次,對這些識(shí)別出的影響點(diǎn)不做擾動(dòng),而對剩余的數(shù)據(jù)進(jìn)行子集擾動(dòng)(即部分?jǐn)?shù)據(jù)的擾動(dòng));最后,利用基于子集擾動(dòng)模式下進(jìn)行局部影響分析,并構(gòu)造相應(yīng)的診斷統(tǒng)計(jì)量以識(shí)別出剩余數(shù)據(jù)中可能存在的影響點(diǎn)。此過程一直迭代下去,直至沒有進(jìn)一步的影響點(diǎn)識(shí)別出來。

上述迭代過程中,最需要確定每一步判定影響點(diǎn)的基準(zhǔn)值。在第一步的全面擾動(dòng)下,由于使用的診斷統(tǒng)計(jì)量是標(biāo)準(zhǔn)化的特征向量,因此我們可以使用2/n作為基準(zhǔn)值。在其它迭代步驟里,由于擾動(dòng)的觀測值是變化的,建議采用可變基準(zhǔn)值或平均基準(zhǔn)值。平均基準(zhǔn)是固定和可變基準(zhǔn)的線性組合,其權(quán)重系數(shù)取決于前一步識(shí)別出來的影響點(diǎn)的比例,這一準(zhǔn)則充分考慮了識(shí)別出的影響點(diǎn)的個(gè)數(shù)的作用。由于給出的基準(zhǔn)僅僅是一種粗略的準(zhǔn)則而不是一個(gè)嚴(yán)格的判斷值,相比于用基準(zhǔn)點(diǎn)來識(shí)別影響點(diǎn),觀察法也是一個(gè)不錯(cuò)的選擇。在實(shí)際應(yīng)用中,同時(shí)使用基準(zhǔn)點(diǎn)和觀察法來識(shí)別影響點(diǎn)是非常有效的。

應(yīng)用及結(jié)論我們將本文提出的方法應(yīng)用于線性回歸模型、混合線性模型和時(shí)間序列ARMA模型,以識(shí)別數(shù)據(jù)中存在的多個(gè)或成片影響點(diǎn)。在線性模型和混合線性模型中通過3個(gè)實(shí)例進(jìn)行了分析,時(shí)間序列ARMA模型通過模型實(shí)例進(jìn)行分析。結(jié)果表明,我們提出的方法都能有效識(shí)別數(shù)據(jù)中的多個(gè)影響點(diǎn)。因此,這種逐步局部影響分析的方法可以應(yīng)用到其它更復(fù)雜的一般模型中識(shí)別可能出現(xiàn)的Masking效應(yīng),具有廣泛的應(yīng)用價(jià)值。

宋世斌1,馮麗智1,黃敏妹2

1.中山大學(xué)嶺南學(xué)院風(fēng)險(xiǎn)管理與保險(xiǎn)學(xué)系;2.中山大學(xué)統(tǒng)計(jì)科學(xué)系

《醫(yī)療保障體系的公共債務(wù):中國正在步美國的后塵》

利用社會(huì)保險(xiǎn)精算方法,預(yù)測了我國醫(yī)療保險(xiǎn)體系的長期收支狀況、醫(yī)療救助成本和政府的公共財(cái)政負(fù)擔(dān)。結(jié)果顯示,我國醫(yī)療保障體系的公共債務(wù)風(fēng)險(xiǎn)將步美國后塵。現(xiàn)收現(xiàn)付制下,未來基金運(yùn)行將會(huì)出現(xiàn)龐大的收支缺口,政府需要承擔(dān)巨大的醫(yī)療保障開支。必須對醫(yī)?；I資模式進(jìn)行改革,并嚴(yán)格控制醫(yī)療費(fèi)用上漲勢頭,才有可能實(shí)現(xiàn)醫(yī)保體系可持續(xù)發(fā)展。

本文分好、中、差三種情形來預(yù)測醫(yī)保體系的債務(wù)風(fēng)險(xiǎn)。測算方法可簡單的表示成:

t年基金現(xiàn)值=基金結(jié)余現(xiàn)值+政府當(dāng)年補(bǔ)貼現(xiàn)值+當(dāng)年繳費(fèi)現(xiàn)值-當(dāng)年支出現(xiàn)值

醫(yī)療費(fèi)用增長速度和工資增長速度直接影響醫(yī)保系統(tǒng)的基金收支水平,因此好、中、差三種情景的劃分主要依據(jù)兩者之間的差距設(shè)定。假定工資增長率2005—2025年從10%均勻遞減到6%,之后保持6%不變。醫(yī)療費(fèi)用增長率則根據(jù)醫(yī)療費(fèi)用增長相對于工資增長的彈性系數(shù)來確定。依據(jù)歷史數(shù)據(jù),假定2005年城鎮(zhèn)職工和居民醫(yī)療費(fèi)用增長率為13%,較好情形下,醫(yī)療費(fèi)用增長率2005—2035年從13%逐漸遞減到6.6%,之后保持6.6%不變;一般情形下,醫(yī)療費(fèi)用增長率2005—2035年從13%逐漸遞減到7.2%,之后保持7.2%不變;較差情形下,醫(yī)療費(fèi)用增長率2005—2035年從13%逐漸遞減到7.8%,之后保持7.8%不變。其它相關(guān)參數(shù)假設(shè)如表1。
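The growth assumptions and the fund identity above can be sketched as follows. The wage path follows the stated uniform decline from 10% to 6% over 2005-2025; the medical cost path is stated only as a gradual decline from 13%, so the linear shape used here is an assumption, as are the function names, the discount rate argument, and the convention that the opening balance is already expressed in present-value terms.

```python
def linear_path(year, start_year, end_year, start_rate, end_rate):
    """Rate that moves linearly from start_rate to end_rate and then stays flat."""
    if year <= start_year:
        return start_rate
    if year >= end_year:
        return end_rate
    share = (year - start_year) / (end_year - start_year)
    return start_rate + share * (end_rate - start_rate)


def wage_growth(year):
    """Wage growth: 10% in 2005, falling evenly to 6% in 2025, constant afterwards."""
    return linear_path(year, 2005, 2025, 0.10, 0.06)


def medical_growth(year, long_run=0.072):
    """Medical cost growth: 13% in 2005, falling to long_run by 2035.

    long_run is 0.066, 0.072 or 0.078 in the good, middle and poor scenarios.
    """
    return linear_path(year, 2005, 2035, 0.13, long_run)


def fund_present_values(opening_pv, subsidies, contributions, outlays, discount_rate):
    """Fund PV in year t = PV of balance carried forward + PV of that year's
    subsidy + PV of contributions - PV of outlays (all discounted to year 0)."""
    pv, path = opening_pv, []
    flows = zip(subsidies, contributions, outlays)
    for t, (subsidy, contribution, outlay) in enumerate(flows, start=1):
        factor = (1.0 + discount_rate) ** -t
        pv += (subsidy + contribution - outlay) * factor
        path.append(pv)
    return path
```

A negative entry in the returned path corresponds to an accumulated deficit, which under the pay-as-you-go arrangement described in the text would fall on public finances.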

未來我國醫(yī)療保障體系的醫(yī)療費(fèi)用支出占GDP的比例呈持續(xù)上漲趨勢(見圖1)。一般情形下,醫(yī)療費(fèi)用年均增長率為8.1%,高于未來GDP的增長速度,預(yù)計(jì)到2090年醫(yī)療費(fèi)用支出占GDP的25%,若計(jì)入醫(yī)療救助支出及個(gè)人自負(fù)的醫(yī)療費(fèi)用,這個(gè)比例將更大。

圖1 醫(yī)療支出占GDP的比重預(yù)測圖

作為社會(huì)保障制度的主導(dǎo)者,政府承擔(dān)了大量的公共財(cái)政補(bǔ)貼,如政府對退休職工、貧困居民、農(nóng)村人口等低收入者的醫(yī)保參保進(jìn)行補(bǔ)貼,進(jìn)行醫(yī)療救助等。同時(shí),人口老齡化進(jìn)程的加快以及醫(yī)療費(fèi)用的高速增長,公共財(cái)政將對醫(yī)?；鹗罩С嘧诌M(jìn)行托底。根據(jù)前面的測算思路和精算假設(shè),可預(yù)測出政府在醫(yī)療保障體系中所承擔(dān)的公共債務(wù)如圖2。

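As the population ages and medical costs rise, health insurance debt can be expected to take a steadily growing share of public expenditure: in the middle scenario it reaches 15% of fiscal expenditure in 2050 and 40% in 2090. China is thus following in the footsteps of the United States and faces a serious risk of public debt arising from health insurance. The current health security system will generate very large deficits, which will turn into public debt borne by the government. The problem is no less serious than in the United States, and only early planning and an active response can keep the health insurance system sustainable.

Figure 2. Projected share of the fiscal burden in budgetary expenditure

Tian Maozai

School of Statistics, Renmin University of China

《Locally Adaptive Quantile Regression and Its Applications》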

分位回歸的概念首先由Koenker and Bassett (1978)提出,并且提出了該模型的一個(gè)優(yōu)良性質(zhì)。由于分位回歸相對于均值回歸來說,它能夠全面刻畫出給定高維解釋變量條件下響應(yīng)變量的各分位點(diǎn)情況,這使得它對一些具體問題的處理有其獨(dú)特優(yōu)勢,因此近幾十年來發(fā)展迅猛,逐漸成為非參數(shù)統(tǒng)計(jì)領(lǐng)域中不可或缺的重要組成部分。這里要研究的是非參數(shù)模型。假定有獨(dú)立同分布的隨機(jī)變量{(Xi,

其中Xi,i=1,…,n為設(shè)計(jì)點(diǎn),殘差εi獨(dú)立同分布于某個(gè)未知分布,我們所要解決的問題是估計(jì)參數(shù)給定解釋變量后響應(yīng)變量的條件分位數(shù)θ:

where the check function is $\rho_\tau(u) = u\{\tau I(u \ge 0) - (1-\tau) I(u < 0)\}$, $K(\cdot)$ is a kernel function, and $h_i$ is a varying bandwidth function.
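As an illustration of how the check function and a kernel combine in a local constant fit, the sketch below minimises the kernel-weighted check loss at a single point. It is not the adaptive procedure of the paper: it uses one fixed bandwidth `h` rather than the varying bandwidth $h_i$, and the Epanechnikov kernel and all function names are choices made here. The minimiser of a weighted check loss over a constant is a weighted quantile, which is what the loop computes.

```python
def check_loss(u, tau):
    """rho_tau(u) = u * (tau * I(u >= 0) - (1 - tau) * I(u < 0))."""
    return u * (tau if u >= 0 else -(1.0 - tau))


def epanechnikov(t):
    """Epanechnikov kernel on (-1, 1)."""
    return 0.75 * (1.0 - t * t) if abs(t) < 1.0 else 0.0


def local_constant_quantile(x0, xs, ys, tau, h, kernel=epanechnikov):
    """Minimise sum_i rho_tau(y_i - theta) * K((x_i - x0) / h) over theta.

    The solution is the smallest y whose cumulative kernel weight reaches
    tau times the total weight (a weighted tau-quantile).
    """
    pairs = sorted((y, kernel((x - x0) / h)) for x, y in zip(xs, ys))
    total = sum(w for _, w in pairs)
    if total == 0.0:
        raise ValueError("no observations inside the window")
    running = 0.0
    for y, w in pairs:
        running += w
        if running >= tau * total:
            return y
    return pairs[-1][0]  # guard against rounding when tau is very close to 1
```

With equal weights the estimate reduces to an ordinary sample quantile, which is a quick sanity check on the weighting logic.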

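Local smoothing has been studied extensively. Many methods have been proposed for mean regression but, as far as we know, none of the existing work extends them to conditional quantile regression. In this paper we extend adaptive weights smoothing to conditional quantile curves and, using local constant approximation, establish a criterion that selects locally adaptive bandwidths automatically.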

定理1 令潛在的條件分位回歸函數(shù)θτ=Ai是不相連的,且有A1∪,…,∪AL∈Rd并且IAi(·)是Ai的示性函數(shù),令Δk3(Xi)ΑAi是由我們提出的自適應(yīng)算法選出的最大臨域,令=C2,對ρ有η2≥(2+ρ)log(n),有P{|^θ(k3)(xi) -^θ(k3)(xj)|<,對所有i≠j}>1-明了在已經(jīng)應(yīng)用了最大臨域的情況下,齊性的自適應(yīng)估計(jì)很大概率程度上是一個(gè)常數(shù)估計(jì)。

定理2 令潛在的條件分位回歸函數(shù)θτ=Ai是不相連的,且有A1∪,…,∪AL∈Rd并且理2表明臨域的非齊性性質(zhì),對于不同區(qū)域的自適應(yīng)估計(jì)是有較大不同的。

從模擬檢驗(yàn)中,能夠明確看出AQR自適應(yīng)方法相比LQR局部線性擬合方法有很大的優(yōu)勢。對比了均值絕對誤差,結(jié)果顯示自適應(yīng)方法比普通分位回歸、ICI準(zhǔn)則、非參數(shù)穩(wěn)健估計(jì)NQR以及局部線性分位回歸等方法的MAE要小得多。同時(shí),又比較了各種方法中窗寬的變化規(guī)律。在實(shí)際數(shù)據(jù)分析中,舉例說明了教授工資與教學(xué)年份的關(guān)系,也證明了局部自適應(yīng)的常數(shù)分位回歸的方法有其獨(dú)特的優(yōu)勢。

錢林義,汪榮明

華東師范大學(xué)金融與統(tǒng)計(jì)學(xué)院

《Regime Switch Levy模型驅(qū)動(dòng)下權(quán)益連結(jié)保險(xiǎn)的風(fēng)險(xiǎn)最小化對沖策略》

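Equity-linked insurance is a class of products that insurers introduced in the 1970s to compete with banks. These products combine protection with investment, and their returns are closely tied to the capital market. The unit-linked and universal life products launched in China around the turn of the century belong to this class. The pricing and risk management of equity-linked insurance have long been a focus of, and a difficulty for, both academics and practitioners. This paper studies risk-minimising hedging strategies for equity-linked insurance driven by a regime-switching Lévy model.

When there are more sources of risk than traded assets, the market is incomplete and no self-financing strategy can hedge a contingent claim. In an incomplete market there are infinitely many measures under which the discounted asset price process is a martingale, so a contingent claim has many possible prices, and an additional criterion is needed, such as risk minimisation or utility maximisation. The risk-minimisation approach looks for the hedging strategy, and hence the price, that minimises the conditional mean squared error of the cost process. The martingale measure that arises in this case is called the risk-minimising martingale measure.

The risk-minimisation approach was proposed by Föllmer and Sondermann (1986), who assumed that the discounted asset price process is a martingale and derived the risk-minimising strategy in discrete time. Föllmer and Schweizer (1991) then extended the result to continuous time, and Schweizer (1993) extended it further to semimartingales. Møller (1998) was the first to apply risk minimisation to the hedging of insurance products and derived the risk-minimising hedging strategy for equity-linked insurance in the Black-Scholes market model; Møller (2001) extended the result to payment streams. Riesner (2006) and Vandaele and Vanmaele (2008) studied risk-minimising hedging strategies for equity-linked insurance driven by a Lévy model. We generalise the model of Vandaele and Vanmaele (2008) to the case in which the parameters of the Lévy model depend on the state of the economy, with the state described by a Markov chain; this is the so-called regime-switching model, which has been used more and more widely in actuarial science in recent years. Using the Kunita-Watanabe decomposition and other tools of stochastic analysis together with the definition of risk minimisation, we derive the risk-minimising hedging strategies and the risk-minimising martingale measure for two types of equity-linked contract, pure endowment and term insurance.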

王曉軍

中國人民大學(xué)統(tǒng)計(jì)學(xué)院

《中國社會(huì)養(yǎng)老保險(xiǎn)收支風(fēng)險(xiǎn)評估》

中國歷經(jīng)了20多年的養(yǎng)老保險(xiǎn)改革,一直沿著擴(kuò)大覆蓋面、提高社會(huì)統(tǒng)籌層次、做實(shí)個(gè)人賬戶方向進(jìn)行,目標(biāo)是建立一個(gè)全國統(tǒng)籌、能夠?yàn)閺V大勞動(dòng)者提供基本退休生活保障、在財(cái)務(wù)上具有償付能力、可持續(xù)發(fā)展的養(yǎng)老保障制度。但是,當(dāng)前的制度仍然只能覆蓋大約1/4的全國勞動(dòng)者,面對日益嚴(yán)重的人口老化,未來制度將面臨嚴(yán)重的償付壓力。

對養(yǎng)老保險(xiǎn)制度償付能力的評價(jià),英、美等發(fā)達(dá)國家有專門的社會(huì)保障精算評估機(jī)構(gòu)和定期的精算報(bào)告制度,中國政府部門雖然已經(jīng)認(rèn)識(shí)到開展相關(guān)精算評估的重要性和迫切性,但并沒有建立相應(yīng)的定期評估制度,對養(yǎng)老保險(xiǎn)未來收支的預(yù)測主要來自于學(xué)者的相關(guān)研究。

在已有的相關(guān)研究中,對養(yǎng)老保險(xiǎn)未來收支的預(yù)測一般建立在簡化模型和簡化假設(shè)的基礎(chǔ)上,沒有考慮若干因素的未來變動(dòng),包括:擴(kuò)面及其分布,退休年齡分布,養(yǎng)老金調(diào)整指數(shù),長壽風(fēng)險(xiǎn)等,這些因素的未來變動(dòng)將會(huì)對養(yǎng)老保險(xiǎn)未來收支產(chǎn)生較大的影響。

在退休年齡上,一般的精算評估假設(shè)男性和女性分別在60歲和55歲,實(shí)際上,退休年齡是一個(gè)動(dòng)態(tài)的分布。當(dāng)前的數(shù)據(jù)顯示,我國的退休年齡主要集中在45歲到60歲的年齡段。在退休年齡的點(diǎn)假設(shè)下,會(huì)將45-59歲已退休的人群估計(jì)為繳費(fèi)人數(shù),必然高估了養(yǎng)老保險(xiǎn)的繳費(fèi)收入,低估了養(yǎng)老保險(xiǎn)的待遇支出,而且隨著時(shí)間的推移,這種高估和低估具有累積效應(yīng)。

在養(yǎng)老金待遇調(diào)整上,通常假設(shè)養(yǎng)老金待遇按通脹指數(shù)調(diào)整。實(shí)際上,近10年內(nèi)中國養(yǎng)老金待遇一直保持與工資同步增長,這將大大提高養(yǎng)老金的待遇支出,如果保持這種養(yǎng)老金待遇的調(diào)整水平,將使未來支出大大增加。

在死亡率假設(shè)上,已有的精算評估往往采用一個(gè)恒定的死亡率假設(shè),但隨著社會(huì)經(jīng)濟(jì)發(fā)展和醫(yī)療技術(shù)的進(jìn)步,人口死亡率不斷下降,人口壽命不斷延長,會(huì)使養(yǎng)老金領(lǐng)取時(shí)間延長,養(yǎng)老金待遇支出增加。

在個(gè)人賬戶的可繼承權(quán)上,中國實(shí)行社會(huì)統(tǒng)籌與個(gè)人賬戶相結(jié)合的養(yǎng)老保險(xiǎn)制度,并規(guī)定參保人死亡后個(gè)人賬戶余額可以繼承,而已有的評估結(jié)果大多忽略了對這部分支出的評估,從而部分低估了支出。

本文運(yùn)用養(yǎng)老保險(xiǎn)精算模型對我國養(yǎng)老保險(xiǎn)的未來收入與支出進(jìn)行了測算,對養(yǎng)老保險(xiǎn)制度財(cái)務(wù)可持續(xù)性進(jìn)行了分析,測算了退休年齡、養(yǎng)老金調(diào)整指數(shù)、預(yù)期壽命、個(gè)人賬戶的繼承權(quán)等變動(dòng)對養(yǎng)老保險(xiǎn)未來收支的影響,得出忽略這些因素的變動(dòng)將對養(yǎng)老保險(xiǎn)未來收支估計(jì)產(chǎn)生的定量影響等,最后給出了相關(guān)的政策建議。

謝志剛

上海財(cái)經(jīng)大學(xué)金融學(xué)院

《制定保險(xiǎn)公司償付能力標(biāo)準(zhǔn)的精算研究支持》

保險(xiǎn)公司的償付能力標(biāo)準(zhǔn),相當(dāng)于商業(yè)銀行的資本充足率標(biāo)準(zhǔn),對整個(gè)行業(yè)的經(jīng)營狀況都會(huì)產(chǎn)生重要影響。制定合理的償付能力(資本充足率)標(biāo)準(zhǔn),對于各公司和全行業(yè)的經(jīng)營都是至關(guān)重要的。

而合理的、或者說符合我國保險(xiǎn)業(yè)發(fā)展實(shí)際的償付能力標(biāo)準(zhǔn),究竟應(yīng)該怎么制定呢?

目標(biāo)問題及解決方向精算研究應(yīng)該對我國保險(xiǎn)業(yè)制定合理的償付能力標(biāo)準(zhǔn)提供支持,精算支持首先在于對保險(xiǎn)公司所面臨的各種風(fēng)險(xiǎn)進(jìn)行識(shí)別和評估,為合理標(biāo)準(zhǔn)的制定提供理論依據(jù)和測試方法。

我國保險(xiǎn)業(yè)目前實(shí)行《保險(xiǎn)公司償付能力管理規(guī)定》,該規(guī)定雖然沒有重新設(shè)定計(jì)算最低資本的方法,卻強(qiáng)調(diào)要“基于風(fēng)險(xiǎn)”來計(jì)算最低資本。因此,如何理解“基于風(fēng)險(xiǎn)”是研究的出發(fā)點(diǎn)。

首先,將保險(xiǎn)公司所面臨的風(fēng)險(xiǎn)分為可以量化的風(fēng)險(xiǎn)與不可或難以量化的風(fēng)險(xiǎn)兩大類。大量研究以及實(shí)踐經(jīng)驗(yàn)都表明:在保險(xiǎn)公司所面臨的各種風(fēng)險(xiǎn)中,可以量化的風(fēng)險(xiǎn)只是少數(shù),更多的則是不能或難以量化的風(fēng)險(xiǎn);保險(xiǎn)公司的資產(chǎn)風(fēng)險(xiǎn),由于監(jiān)管機(jī)構(gòu)對保險(xiǎn)資金投資渠道和規(guī)?？刂频恼咦兏^快,也由于我國資本市場和房地產(chǎn)市場的不夠成熟和完善,以及我國金融保險(xiǎn)業(yè)對資產(chǎn)的分類及價(jià)值核算規(guī)則都正處于變動(dòng)時(shí)期,關(guān)于資產(chǎn)風(fēng)險(xiǎn)的可量化程度,包括各類資產(chǎn)之間風(fēng)險(xiǎn)的相關(guān)性的量化程度遠(yuǎn)低于相對較成熟的歐美市場。

經(jīng)過從宏觀層面、行業(yè)層面、再到機(jī)構(gòu)層面的風(fēng)險(xiǎn)構(gòu)成分析,并將其與歐盟(現(xiàn)行的Solvency I和計(jì)劃的Solvency II)和美國保險(xiǎn)業(yè)(RBC)計(jì)算償付能力資本的模型及過程對比后可看到,所謂“基于風(fēng)險(xiǎn)計(jì)算償付能力資本”,不過是基于其中的一小部分可量化風(fēng)險(xiǎn)而已,若不仔細(xì)研究,很容易被“忽悠”。

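Consider next the theoretical basis of risk-based capital models. Whether European or American, the underlying theoretical model is a ruin probability model. For non-life insurance, the two representative published references are Campagne (1961) and De Mori (1965). Their derivations show, however, that the loss ratio used in the ruin probability model is not defined in the same way as today's earned loss ratio, that the assumptions about the expense ratio are heavily simplified, and that estimating the model parameters from sample data for different EU member-state markets produced widely differing results. Taking the average as the research conclusion was at least simple and clear: the minimum capital requirement is 25% of net premiums. The European Commission adopted this result, slightly adjusted, as the non-life solvency standard (EU Directive 73/239/EEC). China's current non-life standard is modelled on that standard and, by a striking coincidence, the requirement in Article 102 of China's Insurance Law that "the premiums retained by a property insurance company in a given year shall not exceed four times the sum of its paid-in capital and reserve funds" corresponds exactly to the 25% of net premiums proposed by Campagne. In short, both the minimum solvency standard for property insurers in the Insurance Law and the calculation standard set by the China Insurance Regulatory Commission need a proper theoretical model behind them, and Zhang Changlei's (2010) proposal to improve the Campagne model by using the combined ratio provides a more appropriate one.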

關(guān)于歐盟Solvency II標(biāo)準(zhǔn),其草案已獲歐洲議會(huì)批準(zhǔn),計(jì)劃從2012年開始實(shí)施,勢在必行。基于以下三條理由,筆者并不看好該標(biāo)準(zhǔn),判定它完全不適合我國保險(xiǎn)市場實(shí)際,甚至不一定適合歐盟市場的總體實(shí)際情況:第一、Solvency II沒有使復(fù)雜的事情變簡單,而是使簡單的事情變復(fù)雜了,尤其是允許使用內(nèi)部模型計(jì)算“風(fēng)險(xiǎn)資本”SCR,更是破壞了公司經(jīng)營業(yè)績的透明度和相互間的可比性。第二、Solvency II沒有減少保險(xiǎn)經(jīng)營成本,反而大大提高了經(jīng)營成本,因?yàn)樗环矫嬖黾淤Y本成本,另一方面增加人力成本,包括監(jiān)管成本。第三、Solvency II的研制過程在形式上非常透明,大量文本和研究報(bào)告都及時(shí)公開征求意見,但在一些關(guān)鍵技術(shù)問題上,比如風(fēng)險(xiǎn)相關(guān)性的協(xié)方差矩陣的來源和技術(shù)支持,卻語焉不詳。諸如此類的細(xì)節(jié),導(dǎo)致那些想跟隨采用歐盟標(biāo)準(zhǔn)的國家和地區(qū),只能學(xué)個(gè)形式,知其然而不知其所以然。

初步結(jié)論及建議精算的價(jià)值在于,一方面協(xié)助公司和行業(yè)對業(yè)務(wù)運(yùn)作過程中的風(fēng)險(xiǎn)構(gòu)成和形成機(jī)制建立一個(gè)整體的了解,另一方面在于對其中一些關(guān)鍵的可量化風(fēng)險(xiǎn)因素有可信的技術(shù)把握,并基于以上兩點(diǎn)而不是其中一點(diǎn)提供決策支持服務(wù)。對于我國保險(xiǎn)業(yè)制定合理的償付能力標(biāo)準(zhǔn),本研究的結(jié)論和建議包括:(1)研究過程要深入細(xì)致,標(biāo)準(zhǔn)必須要有理論依據(jù)和數(shù)據(jù)測試檢驗(yàn),但結(jié)果必須簡單明確。償付能力標(biāo)準(zhǔn)的結(jié)果表述決不能采用歐盟Solvency II的方式,也不宜采用美國NAIC的RBC方式。當(dāng)然,也不能走另一極端,直接在《保險(xiǎn)法》(第102-103條)中規(guī)定具體標(biāo)準(zhǔn)。(2)適合我國保險(xiǎn)業(yè)的償付能力標(biāo)準(zhǔn),應(yīng)采用“分險(xiǎn)種的簡單比例”方式,形式上與現(xiàn)行標(biāo)準(zhǔn)差別不大,基數(shù)可考慮凈保費(fèi)、準(zhǔn)備金、保額等項(xiàng)目,但比例參數(shù)必須逐一測算,而且必須有理論依據(jù)。(3)償付能力標(biāo)準(zhǔn)的制定,可以“基于”部分風(fēng)險(xiǎn),但必須明確究竟是基于什么具體風(fēng)險(xiǎn)?歐盟和美國的模型,從形式上看都是從細(xì)處往大處、從具體到整體的推算過程。針對我國的情況,則應(yīng)該先算大賬再算小賬,即先判斷保險(xiǎn)公司的“資產(chǎn)風(fēng)險(xiǎn)”、“保險(xiǎn)風(fēng)險(xiǎn)”以及“其它風(fēng)險(xiǎn)”各自應(yīng)在各類保險(xiǎn)公司的償付能力資本要求中究竟會(huì)占多大比例,然后再具體去測算每一風(fēng)險(xiǎn)因素的分布情況。(4)“基于風(fēng)險(xiǎn)”是一個(gè)循序漸進(jìn)的過程,歐盟從Solvency I到Solvency II的跨度如此之大,是不實(shí)際的,更不適合我國保險(xiǎn)業(yè)仿效。我國保險(xiǎn)業(yè)所選擇的標(biāo)準(zhǔn),應(yīng)該是介于SolvencyII中的“最低資本要求(MCR)”和“風(fēng)險(xiǎn)資本要求(SCR)”之間,而且在開始階段偏向于MCR,再逐步偏向SCR。針對各險(xiǎn)種制定的比例參數(shù),應(yīng)該2～3年調(diào)整一次。(5)中國精算師協(xié)會(huì)應(yīng)成立一系列工作組,專門跟蹤研究某些風(fēng)險(xiǎn)因子,收集數(shù)據(jù)測試參數(shù),持續(xù)提出更新參數(shù)的建議。

晏艷陽,宋美喆

湖南大學(xué)金融與統(tǒng)計(jì)學(xué)院

《我國能源利用效率影響因素分析》

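This paper uses index decomposition analysis as its framework to decompose energy efficiency and identify its main determinants, providing a basis for better energy use. Energy efficiency is measured by a value-type indicator in the economic sense, namely output per 10,000 tonnes of standard coal equivalent: e = G/E.

Here E is total energy consumption (10,000 tonnes of standard coal equivalent) and G is gross domestic product (100 million yuan). Let $e' = e^{-1} = E/G$, which can be interpreted as energy intensity. Because the two are reciprocals, the factors that affect e′ also affect e, and we apply the Laspeyres index decomposition to e′ to obtain the main determinants of energy efficiency. With $G_i$ the value added of sector i, $e'_i$ the energy intensity of sector i, and $p_i = G_i/G$, the decomposition is: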

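Here the first term is the effect of structural adjustment, the second is the effect of changes in energy intensity, and r is the decomposition residual. Many researchers in China and abroad attribute the second term to technological progress, so the determinants of energy intensity reduce to a structural adjustment effect and a technological progress effect; energy efficiency is the reciprocal of energy intensity. Industrial structure is represented by the share of tertiary-industry value added in GDP, denoted P, and structural adjustment is assumed to raise energy efficiency. Technological progress is generally represented by total factor productivity, and energy efficiency is assumed to rise with the level of technology.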
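The displayed decomposition did not survive in this text. A standard Laspeyres form that matches the description of its terms is $\Delta e' = \sum_i \Delta p_i\, e'^{0}_{i} + \sum_i p^{0}_{i}\, \Delta e'_i + r$, with $r = \sum_i \Delta p_i\, \Delta e'_i$ and superscript 0 marking the base period. The sketch below computes the three terms under that assumption; the function name and the list-based inputs are illustrative, not taken from the paper.

```python
def laspeyres_decomposition(p0, e0, p1, e1):
    """Split the change in aggregate energy intensity sum_i p_i * e_i into
    a structure effect, an intensity effect and a residual.

    p0, p1: sector shares of value added in the base and comparison periods.
    e0, e1: sector energy intensities in the base and comparison periods.
    """
    sectors = range(len(p0))
    structure = sum((p1[i] - p0[i]) * e0[i] for i in sectors)
    intensity = sum(p0[i] * (e1[i] - e0[i]) for i in sectors)
    residual = sum((p1[i] - p0[i]) * (e1[i] - e0[i]) for i in sectors)
    return structure, intensity, residual
```

The three returned terms always add up to the total change in aggregate intensity, which gives a simple check on any implementation.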

本文選取的考察期為1978—2008年,并以1978年為基期(1978=100)的CPI消除物價(jià)影響因素,數(shù)據(jù)來源于《中國統(tǒng)計(jì)年鑒2009》、《新中國五十年統(tǒng)計(jì)資料匯編》。

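We first compute total factor productivity A from the extended Cobb-Douglas (C-D) production function $G = A K^{\alpha} L^{\beta} E^{\gamma}$, where L is labour input, measured by year-end employment; total energy consumption E serves as energy input; gross domestic product G is the output variable; and K is capital input, measured by the capital stock, which is estimated following Goldsmith's perpetual inventory method. Taking logarithms of both sides, estimating by OLS under the assumption of constant returns to scale, and normalising the estimates gives α = 0.0707, β = 0.5658, and γ = 0.3635 (which sum to one), from which total factor productivity for 1978-2008 follows.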
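Given the reported elasticities, total factor productivity is the residual of the production function. A minimal sketch, assuming the inputs are already deflated and expressed in consistent units (the function name and argument order are choices made here):

```python
ALPHA, BETA, GAMMA = 0.0707, 0.5658, 0.3635  # capital, labour and energy elasticities from the text


def total_factor_productivity(G, K, L, E, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Solve G = A * K**alpha * L**beta * E**gamma for A."""
    return G / (K ** alpha * L ** beta * E ** gamma)
```

Because the elasticities sum to one, scaling output and all three inputs by the same factor leaves the computed A unchanged, which is the constant-returns property assumed in the estimation.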

大部分學(xué)者都采用固定系數(shù)的最小二乘回歸作為研究能源利用效率的影響因素模型,本文首先采用該模型進(jìn)行分析,結(jié)果顯示DW值為0.166 1,表明殘差序列存在很強(qiáng)的自相關(guān)性,變量間關(guān)系并不穩(wěn)定,為單位根過程。為進(jìn)一步驗(yàn)證固定系數(shù)模型的穩(wěn)定性,利用遞歸最小二乘進(jìn)行檢驗(yàn),結(jié)果顯示,遞歸殘差在1998年后波動(dòng)劇烈,超出了二倍標(biāo)準(zhǔn)差范圍,說明固定系數(shù)模型不穩(wěn)定,不能反映能源利用效率與其影響因素之間的動(dòng)態(tài)過程,因此考慮利用時(shí)變參數(shù)模型進(jìn)行建模。

建立能源利用效率影響因素的變參數(shù)模型,使用卡爾曼濾波算法,得到估計(jì)結(jié)果為:

可決系數(shù)R2為0.999 9,說明模型對樣本有很好的解釋效果,DW值為2.007,不存在殘差序列自相關(guān),與固定系數(shù)模型回歸的DW值相比,自相關(guān)現(xiàn)象有了很大的改善,說明變系數(shù)模型能更好地描述能源利用效率與其影響因素之間的關(guān)系。

通過觀察結(jié)構(gòu)系數(shù)和技術(shù)系數(shù)的變動(dòng)趨勢,我們得到如下重要結(jié)論與建議:(1)從1978—2008年間,我國能源利用效率由技術(shù)進(jìn)步效應(yīng)和結(jié)構(gòu)調(diào)整效應(yīng)共同決定,技術(shù)進(jìn)步和結(jié)構(gòu)調(diào)整有助于能源利用效率的提高。(2)從時(shí)間趨勢上看,技術(shù)進(jìn)步和結(jié)構(gòu)調(diào)整對能源利用效率的影響力是持續(xù)緩慢上升的,說明兩者對能源效率的影響力是持續(xù)存在的。(3)相對于技術(shù)進(jìn)步而言,結(jié)構(gòu)調(diào)整是能源利用效率更重要的影響因素,因此加快第三產(chǎn)業(yè)發(fā)展,充分利用比較優(yōu)勢,將資源更多的投入到第三產(chǎn)業(yè),實(shí)現(xiàn)產(chǎn)業(yè)重點(diǎn)轉(zhuǎn)移,對于提高能源利用效率起著決定性的作用。同時(shí),淘汰落后產(chǎn)能,鼓勵(lì)技術(shù)進(jìn)步和產(chǎn)業(yè)升級,將成為今后一個(gè)時(shí)期提高能源利用效率的重大主題。

楊廷干

上海金融學(xué)院公共經(jīng)濟(jì)管理學(xué)院

《金融統(tǒng)計(jì)學(xué)學(xué)科發(fā)展與專業(yè)建設(shè)》

統(tǒng)計(jì)科學(xué)、統(tǒng)計(jì)學(xué)科與統(tǒng)計(jì)專業(yè) 科學(xué)是基于問題而存在的。如何有效獲取數(shù)據(jù),如何探尋數(shù)據(jù)中蘊(yùn)含的統(tǒng)計(jì)規(guī)律,如何利用統(tǒng)計(jì)數(shù)據(jù)輔助決策,我們面臨大量的統(tǒng)計(jì)問題。無論自然技術(shù)領(lǐng)域的統(tǒng)計(jì)問題,還是社會(huì)經(jīng)濟(jì)領(lǐng)域的統(tǒng)計(jì)問題,都是客觀存在的?！皵?shù)據(jù)”不是“數(shù)”,統(tǒng)計(jì)學(xué)不同于數(shù)學(xué),統(tǒng)計(jì)學(xué)有自己獨(dú)立的研究對象。為規(guī)范學(xué)科建設(shè)、方便科技統(tǒng)計(jì)及開展大規(guī)模人才培養(yǎng)和學(xué)位管理的需要,大多數(shù)國家都有自己的學(xué)科專業(yè)分類標(biāo)準(zhǔn)。我國目前權(quán)威的有三個(gè):國家技術(shù)監(jiān)督局頒布實(shí)施的《學(xué)科分類與代碼》,國務(wù)院學(xué)位委員會(huì)、教育部頒布實(shí)施的《授予博士、碩士學(xué)位和培養(yǎng)研究生的學(xué)科、專業(yè)目錄》,以及教育部頒布實(shí)施的《普通高等學(xué)校本科專業(yè)目錄》。這三個(gè)學(xué)科專業(yè)分類標(biāo)準(zhǔn)對統(tǒng)計(jì)學(xué)學(xué)科屬性的認(rèn)定不完全一致。

對統(tǒng)計(jì)學(xué)學(xué)科屬性的不同認(rèn)識(shí),既反映了統(tǒng)計(jì)學(xué)學(xué)科屬性的復(fù)雜性,也反映了統(tǒng)計(jì)學(xué)學(xué)科發(fā)展的多樣性。一方面,統(tǒng)計(jì)問題廣泛存在于自然、社會(huì)和人類思維的各個(gè)領(lǐng)域。盡管統(tǒng)計(jì)問題都有“數(shù)據(jù)”共性,但“數(shù)據(jù)”總是有具體內(nèi)容附依其上的,各領(lǐng)域的統(tǒng)計(jì)問題都有自己的特殊性。另一方面,統(tǒng)計(jì)學(xué)從迥然有異的幾個(gè)不同領(lǐng)域發(fā)展起來,國勢學(xué)派和政治算術(shù)學(xué)派一開始就“以問題帶方法”的風(fēng)格研究“國家顯著事項(xiàng)”,數(shù)理統(tǒng)計(jì)學(xué)派則以概率論在社會(huì)經(jīng)濟(jì)領(lǐng)域的應(yīng)用發(fā)端,更多“以方法帶問題”。當(dāng)代統(tǒng)計(jì)學(xué)不同學(xué)派相互借鑒、互為補(bǔ)充,理論統(tǒng)計(jì)學(xué)按照自身的發(fā)展規(guī)律分化、綜合,應(yīng)用統(tǒng)計(jì)學(xué)則既從理論統(tǒng)計(jì)學(xué)汲取營養(yǎng),又從實(shí)質(zhì)性學(xué)科汲取營養(yǎng),既為理論統(tǒng)計(jì)學(xué)提出新課題,又不斷為理論統(tǒng)計(jì)學(xué)延伸觸角、開辟新的應(yīng)用領(lǐng)域。我們并不能把“學(xué)科存在與否”作為論斷“科學(xué)存在與否”的依據(jù),也不能把“專業(yè)存在與否”作為“學(xué)科存在與否”的依據(jù)。

一般來說,學(xué)科分類差異小于專業(yè)分類差異,可以理解為學(xué)科分類主要服務(wù)于科學(xué)研究,科學(xué)研究無國界,所以差異小;專業(yè)分類主要服務(wù)于人才培養(yǎng),人才培養(yǎng)直接服務(wù)于社會(huì)經(jīng)濟(jì)建設(shè),專業(yè)建設(shè)必須適合國情,所以專業(yè)設(shè)置在不同國家的差異要大一些。國外統(tǒng)計(jì)學(xué)學(xué)科研究的范圍十分廣泛,國際統(tǒng)計(jì)學(xué)會(huì)(ISI)的觀點(diǎn)具有一定代表性。他們認(rèn)為“統(tǒng)計(jì)學(xué)是一組科學(xué)”,覆蓋面包括統(tǒng)計(jì)學(xué)基本理論、經(jīng)濟(jì)統(tǒng)計(jì)學(xué)、社會(huì)統(tǒng)計(jì)學(xué)、自然科學(xué)類統(tǒng)計(jì)學(xué)、統(tǒng)計(jì)活動(dòng)理論、統(tǒng)計(jì)學(xué)史研究和其他。這種觀點(diǎn)基本上綜合了官方統(tǒng)計(jì)學(xué)、學(xué)院統(tǒng)計(jì)學(xué)的觀點(diǎn),能客觀反映國外統(tǒng)計(jì)研究的全貌,對我國構(gòu)造和完善統(tǒng)計(jì)學(xué)學(xué)科體系具有借鑒意義。

金融統(tǒng)計(jì)學(xué)學(xué)科建設(shè) 金融統(tǒng)計(jì)學(xué)是經(jīng)濟(jì)統(tǒng)計(jì)學(xué)的重要組成部分,它以經(jīng)濟(jì)、金融理論為基礎(chǔ),運(yùn)用統(tǒng)計(jì)技術(shù)解決金融領(lǐng)域各種各樣的理論和實(shí)踐問題,包括金融經(jīng)濟(jì)學(xué)理論檢驗(yàn)、金融政策模擬、風(fēng)險(xiǎn)管理、金融模型的建立與估計(jì)、資產(chǎn)定價(jià)、衍生產(chǎn)品定價(jià)、資產(chǎn)配置策略、波動(dòng)率估計(jì)等等。金融統(tǒng)計(jì)以其獨(dú)有的研究對象和研究方法,成為統(tǒng)計(jì)學(xué)中相對獨(dú)立、頗具特色和最為活躍的研究領(lǐng)域之一。

金融統(tǒng)計(jì)研究的使命,一是豐富統(tǒng)計(jì)方法論,不斷產(chǎn)生新的金融分析方法,做應(yīng)用方法的研究;二是得出新結(jié)論,通過數(shù)量分析提供咨詢建議,進(jìn)行決策論證,做方法應(yīng)用的研究。應(yīng)用方法研究與方法應(yīng)用研究并不矛盾,實(shí)際研究過程中往往是緊密結(jié)合在一起的?；仡櫧y(tǒng)計(jì)學(xué)說發(fā)展的歷史會(huì)有一個(gè)深刻的印象,這就是不少重要的統(tǒng)計(jì)方法,都是在具體經(jīng)濟(jì)問題的分析研究中經(jīng)過概括、提煉、抽象、升華而成的。生產(chǎn)總值指標(biāo)的產(chǎn)生,投入產(chǎn)出方法的提出,很多經(jīng)濟(jì)指數(shù)方法的創(chuàng)新,莫不如此。斯通、庫茲涅茨、列昂節(jié)夫因?yàn)樵趪窠?jīng)濟(jì)核算體系、國民收入核算和投入產(chǎn)出核算方面的杰出貢獻(xiàn)而先后獲諾貝爾經(jīng)濟(jì)學(xué)獎(jiǎng),但他們同時(shí)也是經(jīng)濟(jì)統(tǒng)計(jì)學(xué)派的杰出代表人物,庫茲涅茨還曾擔(dān)任美國統(tǒng)計(jì)學(xué)會(huì)的會(huì)長(有趣的是自然技術(shù)領(lǐng)域的應(yīng)用統(tǒng)計(jì)學(xué)亦有同樣的現(xiàn)象,有的生物統(tǒng)計(jì)學(xué)家在遺傳學(xué)、生物學(xué)領(lǐng)域有比統(tǒng)計(jì)學(xué)領(lǐng)域更大的知名度)?？梢婋x開具體的經(jīng)濟(jì)環(huán)境,不對經(jīng)濟(jì)現(xiàn)象進(jìn)行深入的定性分析,就方法論方法,統(tǒng)計(jì)方法創(chuàng)新就會(huì)走進(jìn)死胡同。

金融統(tǒng)計(jì)學(xué)專業(yè)建設(shè)與教學(xué)改革本專業(yè)培養(yǎng)具有深厚金融、經(jīng)濟(jì)學(xué)基礎(chǔ)知識(shí),掌握統(tǒng)計(jì)學(xué)基本理論和方法,并能熟練運(yùn)用計(jì)算機(jī)分析數(shù)據(jù),解決金融、保險(xiǎn)、商務(wù)統(tǒng)計(jì)、經(jīng)濟(jì)統(tǒng)計(jì)等實(shí)際問題的專門人才;能夠在金融監(jiān)管部門從事統(tǒng)計(jì)調(diào)查和數(shù)據(jù)分析工作,在商業(yè)銀行、證券及保險(xiǎn)等金融機(jī)構(gòu)從事金融產(chǎn)品設(shè)計(jì)、行業(yè)分析、風(fēng)險(xiǎn)控制及預(yù)測等工作,或在其他跨國企業(yè)從事質(zhì)量控制、數(shù)據(jù)挖掘、流程優(yōu)化等工作的“應(yīng)用型、復(fù)合型、創(chuàng)新型、國際化”高級專門人才。

金融統(tǒng)計(jì)學(xué)專業(yè)建設(shè)應(yīng)注重對教學(xué)內(nèi)容、課程設(shè)置的優(yōu)化組合。統(tǒng)計(jì)學(xué)作為橫斷性學(xué)科,要注重學(xué)科交叉,從學(xué)科的整體發(fā)展與綜合化出發(fā),合理構(gòu)建教學(xué)內(nèi)容與課程體系,注重其他學(xué)科知識(shí)對本學(xué)科的影響及在本學(xué)科領(lǐng)域中的應(yīng)用,在精選知識(shí)、交叉融合和課程整合重組上下工夫,在課程設(shè)置上至少應(yīng)當(dāng)包括四個(gè)方面的內(nèi)容:(1)統(tǒng)計(jì)理論與方法類課程;(2)金融理論與實(shí)務(wù)類課程;(3)統(tǒng)計(jì)軟件類課程;(4)金融統(tǒng)計(jì)實(shí)驗(yàn)類課程。

金融統(tǒng)計(jì)學(xué)專業(yè)建設(shè)要正確處理通才教育與定位培養(yǎng)的關(guān)系。一方面,長期以來我國過分專業(yè)化的高等教育導(dǎo)致人的片面發(fā)展,需要通過加強(qiáng)通識(shí)教育予以矯正。通識(shí)教育涉及的是基礎(chǔ)性、綜合性、有效性以及可遷移性比較強(qiáng)的知識(shí),加強(qiáng)通識(shí)教育,有利于增強(qiáng)學(xué)生的適應(yīng)能力、應(yīng)變能力和綜合素質(zhì)。另一方面,金融統(tǒng)計(jì)學(xué)作為應(yīng)用經(jīng)濟(jì)學(xué)科,培養(yǎng)從事金融數(shù)量分析的應(yīng)用型高級專門人才,專業(yè)定位應(yīng)該定位到“金融數(shù)量分析”上來。要以“金融數(shù)量分析”能力為主線,處理好知識(shí)、能力與素質(zhì)的關(guān)系。以差異化競爭策略,凸顯金融統(tǒng)計(jì)特色,增強(qiáng)學(xué)生的就業(yè)競爭力。

應(yīng)用性人才的培養(yǎng)要把實(shí)踐能力的培養(yǎng)擺在突出位置,改變傳統(tǒng)教育模式下實(shí)踐教學(xué)從屬理論教學(xué)的狀況,構(gòu)筑一個(gè)合理的實(shí)踐能力體系,并從整體上策劃每個(gè)實(shí)踐教學(xué)環(huán)節(jié)。不應(yīng)狹隘地理解實(shí)踐教學(xué),課程設(shè)計(jì)、社會(huì)調(diào)查、案例課都可以作為統(tǒng)計(jì)學(xué)課程的重要實(shí)踐形式。要把統(tǒng)計(jì)軟件應(yīng)用能力的培養(yǎng)和統(tǒng)計(jì)學(xué)專業(yè)課程的實(shí)踐教學(xué)有機(jī)結(jié)合在一起,廣泛開展案例教學(xué),在處理大量數(shù)據(jù)和討論實(shí)際問題的過程中,鍛煉提高學(xué)生發(fā)現(xiàn)問題、分析問題和解決問題的能力。

袁衛(wèi)1,惠爭勤2

1.中國人民大學(xué)統(tǒng)計(jì)學(xué)院;2.中國人民銀行營業(yè)管理部

《中國發(fā)展指數(shù)(RCDI)的編制與應(yīng)用》

2007年2月26日,RCDI指數(shù)由中國人民大學(xué)首次發(fā)布。與HDI指數(shù)相比,RCDI指數(shù)指標(biāo)體系更加完備,具體指標(biāo)選擇符合中國國情和實(shí)際研究需要,無量綱化方法更加科學(xué),對中國社會(huì)經(jīng)濟(jì)發(fā)展的實(shí)際問題具有更強(qiáng)的解釋能力。

RCDI指數(shù)的編制方法介紹在聯(lián)合國人類發(fā)展指數(shù)(HDI指數(shù))的成功經(jīng)驗(yàn)基礎(chǔ)上,為了達(dá)到評價(jià)結(jié)論直觀、通俗,容易讓人接受;指標(biāo)體系應(yīng)充分體現(xiàn)發(fā)展的全面性;評價(jià)方法要強(qiáng)調(diào)均衡發(fā)展的構(gòu)造理念,RCDI指數(shù)采用多指標(biāo)綜合評價(jià)方法。整個(gè)評價(jià)系統(tǒng)由四個(gè)子系統(tǒng)構(gòu)成:指標(biāo)體系、指標(biāo)權(quán)數(shù)體系、計(jì)算方法體系(無量綱化方法)、合成方法體系。每個(gè)體系之間沒有信息傳遞關(guān)系,方法都可擇優(yōu)選取,然后組合這些方法。

(1)指標(biāo)體系。為了充分體現(xiàn)發(fā)展的全面性,客觀評價(jià)社會(huì)經(jīng)濟(jì)綜合發(fā)展的實(shí)際情況,RCDI指數(shù)采用政府統(tǒng)計(jì)部門公開發(fā)表的地區(qū)社會(huì)經(jīng)濟(jì)指標(biāo),不僅對HDI三個(gè)分項(xiàng)指數(shù)的指標(biāo)進(jìn)行了適應(yīng)性改進(jìn)和擴(kuò)充,同時(shí),由于中國經(jīng)濟(jì)在高速發(fā)展的同時(shí)帶來了一系列日益嚴(yán)重的社會(huì)環(huán)境問題,凸顯出社會(huì)環(huán)境建設(shè)在整個(gè)社會(huì)經(jīng)濟(jì)發(fā)展中的核心地位,為此,引入了社會(huì)環(huán)境分項(xiàng)指數(shù)。整個(gè)指標(biāo)體系分為四個(gè)分項(xiàng)指數(shù),共有15項(xiàng)指標(biāo),具體為:健康指數(shù)有出生預(yù)期壽命、嬰兒死亡率、每萬人平均病床數(shù)3項(xiàng)指標(biāo);教育指數(shù)有人均受教育年限、大專以上程度人口比例2項(xiàng)指標(biāo);生活水平指數(shù)有農(nóng)村居民年人均收入、人均GDP、城鄉(xiāng)居民年人均消費(fèi)比、城市居民恩格爾系數(shù)4項(xiàng)指標(biāo);社會(huì)環(huán)境指數(shù)有城鎮(zhèn)失業(yè)登記率、第三產(chǎn)業(yè)產(chǎn)值占GDP比例、人均道路面積、單位地區(qū)生產(chǎn)總值能耗、省會(huì)城市API、單位產(chǎn)值污水耗氧量6項(xiàng)指標(biāo)。

(2)指標(biāo)權(quán)數(shù)體系。在多指標(biāo)綜合評價(jià)中,權(quán)數(shù)的確定直接影響著綜合評價(jià)的結(jié)果。在RCDI指數(shù)的權(quán)數(shù)結(jié)構(gòu)中,我們認(rèn)為健康、教育、生活水平和社會(huì)環(huán)境四個(gè)單項(xiàng)指標(biāo)對總指數(shù)的重要性應(yīng)當(dāng)是相等的,即采用等權(quán)設(shè)計(jì),以體現(xiàn)協(xié)調(diào)發(fā)展的理念。

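(3) Dimensionless transformation. In a multi-indicator composite evaluation system, indicators that differ in nature and in unit of measurement must be made dimensionless before they can be compared. The transformation function is generally required to be strictly monotone, to have a well-defined range, to give intuitive and clearly interpretable results, and to be as insensitive as possible to whether an indicator is expressed in positive or inverse form. To this end, the RCDI uses a new exponential efficacy function: $d = A e^{B (x - x_s)/(x_h - x_s)}$. Here d is the evaluation value (efficacy score) of a single indicator, x is the indicator's actual value, $x_s$ is the unacceptable (non-permitted) value, $x_h$ is the satisfactory (just-permitted) value, and A and B are positive parameters to be determined.

Compared with common efficacy functions, such as the linear (traditional), exponential, logarithmic, and power-function forms, the new exponential efficacy function has several advantages. Positive and inverse indicators share a single functional form, which remedies a weakness of the index scoring model. The function is convex, which handles well the requirement that the efficacy score of both positive and inverse indicators rise faster the closer the indicator comes to its satisfactory value. The transformation is unaffected by changes in the sample, which remedies another weakness of the index scoring model. For complementary indicators, the positive and inverse forms share a single functional form. Finally, indicator values may go beyond the satisfactory and unacceptable values, which makes historical comparison easier and remedies a weakness of the power efficacy function.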

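(4) Aggregation method. For aggregation, the RCDI abandons the weighted arithmetic mean, which averages in the manner of "everyone eating from the same big pot", in favour of a weighted geometric mean. This model emphasises consistency among an object's indicator scores, gives more prominence to indicators with low scores, responds more sensitively to changes in the scores, and separates the evaluated objects more clearly, so the composite evaluation has higher validity. The model is as follows: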

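Evaluation value of the j-th sub-index for each evaluated object: $d_j = \prod_i d_i^{w_i}$; overall index: $D = \prod_j d_j^{w_j}$, where $d_i$ is the evaluation value of the i-th indicator, $w_i$ is the weight of the i-th indicator, and $w_j$ is the weight of the j-th sub-index.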
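The efficacy function and the geometric aggregation can be sketched together. The values A = 60 and B = ln(100/60) used below are an assumed calibration chosen so that the unacceptable value scores 60 and the satisfactory value scores 100; the text only says that A and B are positive parameters to be determined. Function names and the normalisation of weights are likewise illustrative.

```python
from math import exp, log, prod


def efficacy(x, x_s, x_h, A=60.0, B=log(100.0 / 60.0)):
    """d = A * exp(B * (x - x_s) / (x_h - x_s)).

    Works for positive indicators (x_h > x_s) and inverse ones (x_h < x_s)
    without changing the formula.
    """
    return A * exp(B * (x - x_s) / (x_h - x_s))


def weighted_geometric_mean(scores, weights):
    """Product of scores raised to their normalised weights."""
    total = sum(weights)
    return prod(s ** (w / total) for s, w in zip(scores, weights))


def weighted_arithmetic_mean(scores, weights):
    """Conventional weighted average, shown for comparison."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```

With equal weights, four sub-index scores of 100, 100, 100 and 20 average 80 arithmetically but only about 67 geometrically, which illustrates how the geometric form penalises unbalanced development.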

RCDI指數(shù)的應(yīng)用進(jìn)展概述到目前為止,已經(jīng)發(fā)布了5期研究成果(詳見各年《中國指數(shù)發(fā)展報(bào)告》)。這些研究從單維度、多維度橫向及縱向等多種視角,對中國各地區(qū)多方面的社會(huì)經(jīng)濟(jì)發(fā)展及差異進(jìn)行了剖析,取得了一定成果。從應(yīng)用研究的進(jìn)展看,隨著RCDI指數(shù)數(shù)據(jù)的公布,基于空間和時(shí)間序列數(shù)據(jù)的面板數(shù)據(jù)分析,將成為研究的重點(diǎn)。

(1)RCDI指數(shù)面板數(shù)據(jù)的計(jì)量建模。單獨(dú)應(yīng)用時(shí)間序列數(shù)據(jù)或截面數(shù)據(jù)來檢驗(yàn)和分析的計(jì)量經(jīng)濟(jì)學(xué)方法都存在不同程度的缺陷,而面板數(shù)據(jù)建模卻具有許多純時(shí)間序列或截面數(shù)據(jù)沒有的優(yōu)點(diǎn),但其計(jì)量建模過程比較復(fù)雜,主要集中在兩個(gè)領(lǐng)域:一是非線性模型研究;二是動(dòng)態(tài)線性模型單位根和協(xié)整檢驗(yàn)的應(yīng)用研究。存在的問題集中在構(gòu)造具有漸近特性的統(tǒng)計(jì)量和檢驗(yàn)方法上,在有限樣本的情況下模型效果還需再進(jìn)一步探討。

(2)RCDI指數(shù)面板數(shù)據(jù)的聚類分析。2002年, Bonzo D.C.和Hermosila A.Y.等統(tǒng)計(jì)學(xué)家開創(chuàng)性地將多元統(tǒng)計(jì)方法引入到面板數(shù)據(jù)的分析中來,國內(nèi)學(xué)者朱建平在2007年對單指標(biāo)面板數(shù)據(jù)的聚類分析進(jìn)行了實(shí)證分析,鄭兵云在2008年構(gòu)造了多指標(biāo)面板數(shù)據(jù)的距離函數(shù)和離差平方和函數(shù),并說明了多指標(biāo)面板數(shù)據(jù)的聚類分析過程,這些都給我們進(jìn)行RCDI指數(shù)面板數(shù)據(jù)的聚類分析提供了有益的參考。

張波,畢濤

中國人民大學(xué)統(tǒng)計(jì)學(xué)院

《基于高頻數(shù)據(jù)的中國股票市場日內(nèi)序列相關(guān)、已實(shí)現(xiàn)波動(dòng)率及交易量動(dòng)態(tài)性研究》

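In finance, the serial correlation of asset returns is an important object of study because it helps to reveal the basic features of the trading process. Fama's (1970) weak-form market efficiency hypothesis holds that return series should be serially uncorrelated; otherwise the asset price process would be predictable to some degree, and rational traders could earn excess returns by forecasting prices. Many researchers in China and abroad have studied serial correlation in individual stock and index returns, for example Roll (1984), Atchison et al. (1987), Lo and MacKinlay (1988, 1989), Goodhart and Figliuoli (1991), and McInish and Wood (1991). Their empirical results differ, but all are based on low-frequency data such as daily, weekly, or even monthly returns. In recent years, as the quality and availability of high-frequency and ultra-high-frequency data have improved, many researchers have used such data to study serial correlation in asset returns. Besides studying serial correlation directly, some theoretical and empirical work has explored its relationship with volatility, trading volume, and other factors. Campbell, Grossman and Wang (1993) found a negative relationship between serial correlation and trading volume; Thomas and Patnaik (2003) found a link between serial correlation and liquidity in the Indian stock market; Bianco and Renò (2006), using S&P 500 futures data, found that serial correlation is positively related to total daily volatility and negatively related to unexpected volatility; and Bi, Zhang and Xu (2010) studied the dynamics of intraday serial correlation in Chinese stock returns and found a complex relationship between serial correlation and the different components of realised volatility.

This paper studies in depth the dynamics of intraday serial correlation in Chinese stock market returns and explores the relationships among intraday serial correlation, realised volatility, and trading volume. We extend existing methods, building on Bianco and Renò (2006, 2009) and Bi, Zhang and Xu (2010). We use the variance ratio test statistic to quantify the intraday serial correlation of Shanghai Composite Index returns and analyse its dynamics in detail. In addition, drawing on recent theory for volatility modelling with high-frequency data, we decompose realised volatility into a continuous component and a jump component and analyse how intraday serial correlation relates to each. Finally, we build a model of the relationship between intraday serial correlation and trading volume.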
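The two building blocks named above, a variance ratio as a measure of intraday serial correlation and a split of realised variance into continuous and jump parts, can be sketched in textbook form. The paper's exact statistics are not given in this abstract, so the overlapping-sum variance ratio and the bipower-variation jump estimate below are standard stand-ins, and all function names are illustrative.

```python
from math import pi


def sample_variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def variance_ratio(returns, q):
    """VR(q) = Var(q-period returns) / (q * Var(1-period returns)).

    Values above 1 point to positive serial correlation, values below 1 to
    negative serial correlation. Overlapping q-period sums are used.
    """
    aggregated = [sum(returns[i:i + q]) for i in range(len(returns) - q + 1)]
    return sample_variance(aggregated) / (q * sample_variance(returns))


def realized_variance(returns):
    """Sum of squared intraday returns."""
    return sum(r * r for r in returns)


def bipower_variation(returns):
    """(pi / 2) * sum of |r_i| * |r_(i-1)|, robust to rare jumps."""
    return (pi / 2.0) * sum(abs(a) * abs(b) for a, b in zip(returns[1:], returns[:-1]))


def jump_component(returns):
    """Jump part of realised variance, truncated at zero."""
    return max(realized_variance(returns) - bipower_variation(returns), 0.0)
```

Applied to one day of 5-minute index returns, `variance_ratio(day_returns, 2)` would summarise that day's first-order serial correlation, and `bipower_variation` and `jump_component` would give the continuous and jump parts against which it can be compared; `day_returns` is a hypothetical list of returns.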

基于上證綜指收益率序列數(shù)據(jù)的實(shí)證結(jié)果顯示,上證綜指收益率在5分鐘的間隔上表現(xiàn)出顯著的序列相關(guān)性,這就拒絕了上海證券交易所弱式市場有效的假說,即對于上交所而言,其上市股票具有一定的可預(yù)測性。上證綜指收益率序列存在正的序列相關(guān)性,正的序列相關(guān)性可由異步交易現(xiàn)象解釋,即構(gòu)成上證綜指的各個(gè)股票對市場信息的響應(yīng)速度不一樣,流動(dòng)性強(qiáng)的股票反應(yīng)較快,流動(dòng)性弱的股票反應(yīng)滯后,造成指數(shù)收益存在正相關(guān)性。另外,就序列相關(guān)性與不同因素之間的關(guān)系看,序列相關(guān)與已實(shí)現(xiàn)波動(dòng)率的跳成分之間存在負(fù)相關(guān)關(guān)系,這一結(jié)果顯示市場的極端不確定性將會(huì)減弱資產(chǎn)價(jià)格過程的可預(yù)測性;序列相關(guān)與已實(shí)現(xiàn)波動(dòng)率的連續(xù)部分也存在負(fù)相關(guān)關(guān)系,這與LeBaron效應(yīng)是一致的;而序列相關(guān)與交易量之間也存在顯著的負(fù)相關(guān)。我們的研究對于深入揭示我國股票市場收益率序列的日內(nèi)序列相關(guān)動(dòng)態(tài)性及序列相關(guān)與不同影響因素之間的關(guān)系具有重要的參考價(jià)值。

余超,張景肖

中國人民大學(xué)統(tǒng)計(jì)學(xué)院

《帶跳的Markov轉(zhuǎn)換隨機(jī)波動(dòng)率模型的貝葉斯估計(jì)》

波動(dòng)率是金融市場風(fēng)險(xiǎn)的重要測度,已有對金融高頻數(shù)據(jù)的研究表明,金融資產(chǎn)價(jià)格的樣本軌道并非連續(xù),而是存在著跳躍,這種跳躍行為與金融市場的重大事件息息相關(guān)。另一方面,對波動(dòng)持續(xù)性的研究發(fā)現(xiàn),這種外界沖擊對市場波動(dòng)性影響的高持續(xù)性往往是虛假的,波動(dòng)過程存在著結(jié)構(gòu)變化。

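Most studies of volatility consider either jumps in the asset price process or structural change in volatility, but not both. Empirical work has shown that the two occur in the same periods and are both related to external shocks to the market during those periods, so it is necessary to examine how they are connected. To this end, this paper proposes a new stochastic volatility model that captures both jumps in the price process and structural change in volatility within a single framework. The jump part of the model captures sharp price movements caused by shocks from outside the market, while the Markov-switching volatility process decomposes the spurious persistence induced by such shocks into the persistence of the volatility regime and the persistence of volatility within a regime, and so reflects the true persistence of volatility.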
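The abstract does not give the model equations. To make the description concrete, the sketch below simulates one plausible specification, which is an assumption and not the authors' model: a two-state Markov chain shifts the mean of log-variance, log-variance follows an AR(1) around the regime mean, and returns receive occasional normally distributed jumps. All parameter names and values are hypothetical.

```python
import numpy as np

def simulate_mssv_jump(n, mu=(-1.0, 1.0), phi=0.9, sigma_eta=0.2,
                       p_stay=(0.98, 0.95), jump_prob=0.02, jump_sd=3.0, seed=0):
    """Simulate returns y, log-variance h, regime s and jump indicators.

    Assumed specification (illustrative only):
      h_t = mu[s_t] + phi * (h_{t-1} - mu[s_{t-1}]) + sigma_eta * eta_t
      y_t = exp(h_t / 2) * eps_t + J_t * xi_t
    """
    rng = np.random.default_rng(seed)
    s = np.zeros(n, dtype=int)
    h = np.zeros(n)
    h[0] = mu[0]
    for t in range(1, n):
        stay = rng.random() < p_stay[s[t - 1]]        # regime persistence
        s[t] = s[t - 1] if stay else 1 - s[t - 1]
        h[t] = (mu[s[t]] + phi * (h[t - 1] - mu[s[t - 1]])
                + sigma_eta * rng.standard_normal())  # within-regime persistence
    jumps = rng.random(n) < jump_prob
    jump_size = np.where(jumps, jump_sd * rng.standard_normal(n), 0.0)
    y = np.exp(h / 2.0) * rng.standard_normal(n) + jump_size
    return y, h, s, jumps

y, h, s, jumps = simulate_mssv_jump(1000)
print(y.std(), s.mean(), jumps.sum())
```

In this sketch `p_stay` governs regime persistence and `phi` governs within-regime persistence, the two sources that the abstract separates.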

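For parameter estimation, the model contains many latent variables, including jumps, volatility states, and volatilities, so its likelihood function is not available in explicit form, which makes estimation difficult. Using the idea of data augmentation, we treat the latent variables as unknown parameters, analyze the model from a Bayesian perspective, and obtain Bayesian estimates of the parameters and latent variables by MCMC simulation with Gibbs sampling.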
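The full sampler for the proposed model is not described in the abstract. The sketch below shows the data-augmentation idea on a deliberately simplified toy problem, which is an assumption for illustration only: returns have known diffusion variance and known jump-size variance, the latent jump indicators are sampled as if they were parameters, and the jump probability is then updated from its Beta conditional. The function name, priors, and settings are hypothetical.

```python
import numpy as np

def gibbs_jump_intensity(y, sigma=1.0, jump_sd=5.0, n_iter=2000, burn_in=500,
                         a=1.0, b=1.0, seed=0):
    """Toy Gibbs sampler for y_t = sigma * eps_t + J_t * xi_t, J_t ~ Bernoulli(lam).

    sigma and jump_sd are treated as known; lam has a Beta(a, b) prior.
    Returns posterior draws of lam and posterior jump frequencies per observation.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    var0 = sigma ** 2                  # variance when there is no jump
    var1 = sigma ** 2 + jump_sd ** 2   # variance when a jump occurs
    # Log densities up to a common constant that cancels in the odds.
    log_f0 = -0.5 * np.log(var0) - 0.5 * y ** 2 / var0
    log_f1 = -0.5 * np.log(var1) - 0.5 * y ** 2 / var1
    lam = 0.5
    draws = np.empty(n_iter)
    jump_freq = np.zeros(n)
    for it in range(n_iter):
        # Step 1 (augmentation): draw each latent indicator J_t given lam.
        log_odds = np.log(lam) - np.log1p(-lam) + log_f1 - log_f0
        p_jump = 1.0 / (1.0 + np.exp(-log_odds))
        J = rng.random(n) < p_jump
        # Step 2: draw lam from its Beta conditional given the indicators.
        k = int(J.sum())
        lam = rng.beta(a + k, b + n - k)
        draws[it] = lam
        if it >= burn_in:
            jump_freq += J
    return draws[burn_in:], jump_freq / (n_iter - burn_in)

rng = np.random.default_rng(1)
is_jump = rng.random(1000) < 0.05
y = rng.standard_normal(1000) + is_jump * 5.0 * rng.standard_normal(1000)
lam_draws, jump_freq = gibbs_jump_intensity(y, n_iter=1000, burn_in=300)
print(lam_draws.mean())
```

A sampler for the full model would add further blocks for the volatility path, the regime path, and the remaining parameters, each drawn conditionally on the others in the same alternating fashion.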

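Finally, we apply the Markov-switching stochastic volatility model with jumps to the U.S. stock market. The results show that the market switches between different volatility states and that the spurious persistence of volatility can be explained by the persistence of the regime, so that within-regime persistence is greatly reduced. During economic downturns the market tends to be in a high-volatility state, and a high-volatility state in turn tends to foreshadow large jumps in the price process; after a large jump, market pressure is relieved and the market returns to a calmer low-volatility state. The empirical study further shows that the proposed model can identify jumps in the price process and the volatility structure simultaneously, and can effectively use the jump behavior of asset prices in analyzing the volatility structure of the market.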

Zou Guohua

Academy of Mathematics and Systems Science, Chinese Academy of Sciences

"Imputation for Missing Data under PPSWR Sampling"

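In survey sampling practice, nonresponse is ubiquitous. It arises for many reasons and can substantially affect survey results. Nonresponse is usually divided into unit nonresponse and item nonresponse; this paper considers the latter. Apart from efforts to raise the response rate, item nonresponse is often handled by imputation. Many imputation methods have been proposed in the literature, including ratio imputation, regression imputation, and hot-deck imputation, but they are usually developed under simple random sampling and a uniform response mechanism. The aim of this paper is to study imputation of missing data under PPSWR (probability proportional to size, with replacement), an unequal-probability sampling design, where the response mechanism may be non-uniform.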
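As background for the PPSWR design, the following sketch draws a with-replacement sample with selection probabilities proportional to an auxiliary size variable and computes the standard Hansen-Hurwitz estimate of the population mean under complete response. It does not implement the imputation method of the abstract. The function names and the simulated population are assumptions for illustration.

```python
import numpy as np

def ppswr_sample(x, n, rng):
    """Draw n units with replacement; unit i is chosen with probability x_i / sum(x)."""
    x = np.asarray(x, dtype=float)
    z = x / x.sum()
    idx = rng.choice(x.size, size=n, replace=True, p=z)
    return idx, z

def hansen_hurwitz_mean(y_sample, z_sample, N):
    """Hansen-Hurwitz estimate of the population mean when every sampled unit responds."""
    y_sample = np.asarray(y_sample, dtype=float)
    z_sample = np.asarray(z_sample, dtype=float)
    return np.mean(y_sample / z_sample) / N

# Hypothetical population of N = 200 units with size measure x.
rng = np.random.default_rng(0)
N = 200
x = rng.uniform(1.0, 10.0, N)
y = 2.0 * x + rng.standard_normal(N)
idx, z = ppswr_sample(x, 30, rng)
print(hansen_hurwitz_mean(y[idx], z[idx], N), y.mean())
```

When some sampled units do not respond, the ratios y_i / z_i are unavailable for those draws, which is the gap that imputation is meant to fill.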

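Imputation method and estimation of the target quantity

(1) Notation. Let $U$ denote a finite population of size $N$, $Y$ the variable of interest, and $X$ an auxiliary variable for which complete information is assumed to be available. Let $s_n$ be a sample of size $n$ drawn from $U$ by PPSWR sampling, $s_r$ the set of responding units, of size $r \ge 1$, and $s_{n-r}$ the set of nonresponding units, of size $n-r$. Let $p_i$ be the response probability of unit $i$, and $q_i = 1 - p_i$.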

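(2) The case of known $p_i$. For each unit $i \in s_{n-r}$, the imputed value is constructed as follows:

[formula not preserved in the source]

The corresponding estimator of the population mean $\bar{Y}$ is

[formula not preserved in the source]

It can be shown that $\hat{\bar{Y}}_{PPS}$ is design-unbiased under a non-uniform response mechanism, with variance

[formula not preserved in the source]

where $Z_i = X_i / X$, and $Y$ and $X$ are the population totals. The jackknife estimator of the variance is

[formula not preserved in the source]

It is also design-unbiased.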

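(3) The case of unknown $p_i$. In practice the response probabilities $p_i$ are generally unknown, and we consider the parametric model $p_i = g(x_i; \theta)$, where $g$ is a known smooth function and $\theta$ is an unknown parameter. A common example of such a model is the logistic regression model. If $\hat{\theta}$ is an estimate of $\theta$, then $p_i$ is estimated by $\hat{p}_i = g(x_i; \hat{\theta})$, and the corresponding estimator of $\bar{Y}$ is

[formula not preserved in the source]

with jackknife variance estimator

[formula not preserved in the source]

Under certain regularity conditions, we have

[formula not preserved in the source]

Special case: simple random sampling with uniform response. Here all the $p_i$ are equal and can be estimated by $r/n$. The corresponding estimator of $\bar{Y}$ is

[formula not preserved in the source]

It can be shown that both variance estimators are asymptotically design-unbiased.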
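For the case of unknown response probabilities, the sketch below fits a logistic response model $p_i = g(x_i; \theta)$ by Newton-Raphson and plugs the fitted probabilities into an inverse-probability-weighted Hansen-Hurwitz-type estimator. Since the formulas of the abstract are not available here, this weighting estimator is a standard stand-in chosen for illustration and is not the imputation estimator proposed by the author. All function names, the simulated population, and the per-draw response simulation are assumptions.

```python
import numpy as np

def fit_logistic(x, responded, n_iter=25):
    """Newton-Raphson fit of P(respond) = 1 / (1 + exp(-(t0 + t1 * x)))."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(responded, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    theta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (r - p)                          # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # observed information
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def response_prob(x, theta):
    """Logistic response probability g(x; theta)."""
    return 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * np.asarray(x, dtype=float))))

def weighted_mean_estimate(y_resp, z_resp, p_resp, n, N):
    """Sum over respondents of y / (z * p), divided by n * N."""
    return np.sum(y_resp / (z_resp * p_resp)) / (n * N)

# Hypothetical population, PPSWR sample, and logistic response mechanism.
rng = np.random.default_rng(1)
N, n = 5000, 1000
x = rng.uniform(1.0, 10.0, N)
y = 2.0 * x + rng.standard_normal(N)
z = x / x.sum()
idx = rng.choice(N, size=n, replace=True, p=z)
responded = rng.random(n) < response_prob(x[idx], np.array([-0.5, 0.3]))
theta_hat = fit_logistic(x[idx], responded)
p_hat = response_prob(x[idx], theta_hat)
est = weighted_mean_estimate(y[idx][responded], z[idx][responded], p_hat[responded], n, N)
print(theta_hat, est, y.mean())
```

The logistic model is fitted on the auxiliary variable only, which is observed for every sampled unit, so the fit does not require the missing values of the study variable.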

[Responsible editors: Du Yizhe (Chinese), Wang Nanfeng (English)]
