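Abstract: Machine learning is an important branch of genomic selection research, and deep learning has become a new research hotspot in machine learning in recent years. This paper introduces the principles and applications of machine learning and deep learning in genomic selection, reviews progress in deep learning estimation of genomic breeding values in terms of model framework, model parameters and feature selection, discusses some of the problems facing deep learning genomic selection, and offers an outlook on future work.
Keywords: genomic selection; research progress; machine learning; deep learning; principles and applications
CLC number: S813.1
Document code: A
Article ID: 0366-6964(2024)06-2281-12
Received: 2023-11-30
Funding: National Natural Science Foundation of China (32272843)
About the author: LI Jing (1999-), male, from Yulin, Shaanxi, master's student, mainly engaged in research on deep learning genomic selection, E-mail: lijing5467@126.com
*Corresponding authors: ZHU Bo, mainly engaged in research on molecular quantitative genetics of beef cattle, E-mail: zhubo@caas.cn; GUO Peng, mainly engaged in research on parallel genomic selection techniques, E-mail: super_guopeng@163.com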
Research Progress in Machine Learning Genomic Selection
LI Jing1,2, ZHANG Yuanxu1,2, WANG Zezhao2, CHEN Yan2, XU Lingyang2, ZHANG Lupei2,
GAO Xue2, GAO Huijiang2, LI Junya2, ZHU Bo2*, GUO Peng1*
(1. College of Computer and Information Engineering, Tianjin Agricultural University,
Tianjin 300384, China; 2. Institute of Animal Science, Chinese Academy of Agricultural
Sciences, Beijing 100193, China)
Abstract: Machine learning is an important branch of genomic selection research, and deep learning has become a new research hotspot in machine learning in recent years. This paper introduces the principles and applications of machine learning and deep learning in genomic selection, and reviews progress in deep learning estimation of genomic breeding values in terms of model framework, model parameters and feature selection. Some problems in deep learning genomic selection research are discussed, and future prospects are outlined.
Key words:genomic selection; research progress; machine learning; deep learning; principles and applications
*Corresponding authors:ZHU Bo,E-mail:zhubo@caas.cn; GUO Peng,E-mail:super_guopeng@163.com
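The introduction of genomic selection (GS) marked the beginning of the era of genome-wide breeding [1]. Compared with traditional breeding value estimation based on phenotypes and pedigrees, GS estimates breeding values early in the growth and development of selection candidates, with the advantages of high accuracy and a short generation interval [2], and it has become an important tool in animal and plant breeding. Methods for estimating genomic breeding values are either direct or indirect. Direct methods use the genetic information of the population to construct a relationship matrix and then estimate breeding values from that matrix; genomic best linear unbiased prediction (GBLUP) is the typical example. Indirect methods first estimate the effects of single nucleotide polymorphism (SNP) loci and then compute individual breeding values from the estimated effects and the SNP genotype codes; the Bayesian methods are typical indirect methods. Both GBLUP and the Bayesian methods are based on the linear mixed model, which handles nonlinear problems in breeding value estimation poorly and faces two further problems: 1) the curse of dimensionality, in which the number of markers exceeds the number of individuals and the model overfits; 2) a limited ability to capture correlations in the data. Machine learning (ML) can deal effectively with nonlinear problems [3], and different ML algorithms suit different types of data sets [4]. ML-based genomic selection has been studied extensively [5-8]. This paper reviews ML genomic selection from the perspectives of ensemble methods, kernel methods and deep learning.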
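The two routes to a breeding value described above can be illustrated with a minimal sketch. It assumes SNP genotypes coded 0/1/2 and, for the relationship matrix, one common construction (centring by twice the allele frequency and scaling by 2Σp(1-p)); the paper does not specify a construction, and the genotypes and marker effects here are made-up toy values.

```python
# Illustrative sketch only: toy genotypes and marker effects are hypothetical.

def gebv_from_marker_effects(genotypes, effects):
    """Indirect route: GEBV_i = sum_j x_ij * a_j for SNP codes x_ij in {0, 1, 2}."""
    return [sum(x * a for x, a in zip(row, effects)) for row in genotypes]


def genomic_relationship_matrix(genotypes):
    """Direct route, first step: G = Z Z' / (2 * sum_j p_j (1 - p_j)),
    where Z holds genotypes centred by twice the allele frequency p_j.
    Assumes at least one SNP is polymorphic (otherwise the scale is zero)."""
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    centred = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
            for zi in centred]


genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]  # 4 individuals x 3 SNPs
effects = [0.5, -0.2, 0.1]                                 # hypothetical SNP effects
gebv = gebv_from_marker_effects(genotypes, effects)
G = genomic_relationship_matrix(genotypes)
```

In GBLUP the matrix G would then enter the mixed model equations as the covariance structure of the breeding values; that step is omitted here.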
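1 Machine learning
1.1 Ensemble methods
Ensemble methods combine several homogeneous or heterogeneous weak learners into a strong learner with higher predictive ability; combining models reduces variance, compensates for the weaknesses of individual models and improves accuracy. The weak learners are usually random forests (RF), artificial neural networks (ANN) or other ML algorithms [9]. Common ensemble strategies are boosting, bagging and stacking.
1.1.1 Boosting
Boosting transforms the training set during training according to the learning outcome of the current base model, and the transformed training set is passed to the next base model. Base models are trained one after another in this way, giving a stepwise training process. Each step updates the weights of the individuals in the training set, and the learners and their weights are finally combined into a strong learner [10]. In a study of breeding value estimation for discrete traits in pigs, González-Recio and Forni [11] found that, for traits controlled by a small number of loci, RF and boosting could capture non-additive effects without additional covariates and performed slightly better than the Bayesian methods. The gradient boosting machine (GBM) is one form of boosting. Abdollahi-Arpanahi et al. [12] estimated breeding values for sire conception rate in a US Holstein bull data set with GBM, BayesB, GBLUP, RF, convolutional neural networks (CNN) and multilayer perceptrons (MLP); the accuracies ranked GBM (0.36) > BayesB (0.34) > GBLUP (0.33) > RF (0.32) > CNN (0.29) > MLP (0.26). Grinberg et al. [13] used GBM to estimate genomic estimated breeding values (GEBV) for 26 traits in 1008 haploid yeast strains; GBM gave the highest prediction accuracy compared with the least absolute shrinkage and selection operator (Lasso), ridge regression (RR), BLUP, support vector machines (SVM) and RF.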
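The sequential idea behind gradient boosting can be sketched as follows: each new base model is fitted to the residuals left by the models before it. This is a minimal illustration for squared error with single-SNP regression stumps, not the GBM configuration used in the cited studies; the data, learning rate and number of rounds are hypothetical.

```python
# Illustrative sketch: gradient boosting for squared error with one-SNP stumps.
# Assumes at least one SNP splits the individuals into two non-empty groups.

def fit_stump(X, residuals):
    """Pick the SNP and threshold whose two group means best fit the residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in (0.5, 1.5):          # splits between codes 0|1 and 1|2
            left = [r for x, r in zip(X, residuals) if x[j] <= threshold]
            right = [r for x, r in zip(X, residuals) if x[j] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    return best[1:]


def predict_stump(stump, x):
    j, threshold, lm, rm = stump
    return lm if x[j] <= threshold else rm


def gradient_boost(X, y, rounds=20, learning_rate=0.3):
    base = sum(y) / len(y)                    # start from the phenotypic mean
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        # Residuals are the negative gradient of the squared-error loss.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, X)]
    return base, stumps, preds


X = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 0], [2, 2]]   # toy SNP codes
y = [1.0, 2.1, 3.0, 2.4, 0.8, 3.5]                      # toy phenotypes
base, stumps, preds = gradient_boost(X, y)
mse_start = sum((yi - base) ** 2 for yi in y) / len(y)
mse_end = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)
```

The training error falls as stumps are added; in practice the number of rounds and the learning rate would be chosen by cross-validation to avoid overfitting.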
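1.1.2 Bagging
Unlike boosting, bagging trains the base models on resampled (bootstrap) versions of the training set and combines their predictions by voting or averaging, which reduces the variability of the final prediction and improves learning ability [14]. RF is an improvement on bagging [15]: at each split of a regression tree, RF considers only a random subset of p of the m predictors (p < m), which lowers the variance of the averaged regression-tree prediction. When p = m, RF is equivalent to bagging. Silveira et al. [16] applied boosting, BL, bagging, RF and regression trees (RT) to genomic breeding value estimation for 10 carcass traits in Piau × commercial pigs; the mean accuracies were 0.206 for bagging, 0.210 for RF, 0.158 for boosting, 0.086 for RT and 0.288 for BL. Gianola et al. [17] applied the bagging idea to GBLUP, and their results for wheat grain yield showed that bagging can effectively improve the performance of GBLUP.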
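A minimal sketch of the resample-and-average idea follows. The base learner here is a single-SNP least-squares line, a deliberately simple stand-in for the regression trees used in bagging and RF, and drawing one random SNP per model (p = 1 < m) only mimics RF's feature subsampling. All names and data are hypothetical.

```python
import random

# Illustrative sketch: bagging with a toy base learner (single-SNP regression).

def fit_single_snp(X, y, j):
    """Least-squares line y = a + b * x_j for one SNP; returns (a, b)."""
    n = len(y)
    mx = sum(row[j] for row in X) / n
    my = sum(y) / n
    sxx = sum((row[j] - mx) ** 2 for row in X)
    sxy = sum((row[j] - mx) * (yi - my) for row, yi in zip(X, y))
    b = sxy / sxx if sxx > 0 else 0.0        # monomorphic resample: slope 0
    return my - b * mx, b


def bagged_predict(X, y, x_new, n_models=50, seed=1):
    rng = random.Random(seed)
    n, m = len(y), len(X[0])
    predictions = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample with replacement
        j = rng.randrange(m)                          # random feature, p = 1 < m
        a, b = fit_single_snp([X[i] for i in idx], [y[i] for i in idx], j)
        predictions.append(a + b * x_new[j])
    return sum(predictions) / len(predictions)        # average the base-model outputs


X = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1], [0, 0, 2], [2, 2, 0]]
y = [1.0, 2.1, 3.0, 2.4, 0.8, 3.5]
pred = bagged_predict(X, y, [1, 2, 0])
```

Sampling all m features for every model would correspond to plain bagging, matching the statement that RF with p = m reduces to bagging.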
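1.1.3 Stacking
Feeding the outputs of trained models (base learners) into a next-level model (meta-learner) is one way to improve ML genomic selection models. Liang [18] proposed SELF (stacking ensemble learning framework), which stacks support vector regression (SVR), kernel ridge regression (KRR) and elastic net (EN), and evaluated it for genomic selection on nine traits from three data sets: German Holstein dairy cattle, loblolly pine and Chinese Simmental beef cattle. SELF outperformed SVR, KRR and EN used individually, improving prediction accuracy by 9.97%, 7.36% and 6.40% for carcass weight in beef cattle, stem height in loblolly pine and eye muscle area in beef cattle, respectively. To improve the performance of deep learning GS, Ma et al. [19] combined DeepGS with ridge regression best linear unbiased prediction (rrBLUP) into the ELBPSO model; results for genomic selection in wheat showed that a heterogeneous ensemble of deep learning and conventional methods can significantly improve model performance [18].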
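The meta-learner step of stacking can be sketched in a few lines. The example assumes that out-of-fold predictions from two base learners are already available (the lists below are hypothetical placeholders, not outputs of SVR, KRR or EN) and fits a linear meta-learner without intercept by least squares. It is not the SELF or ELBPSO implementation.

```python
# Illustrative sketch: fitting a linear meta-learner on base-learner predictions.

def fit_meta_weights(pred_a, pred_b, y):
    """Least-squares weights (w_a, w_b) minimising
    sum_i (y_i - w_a * pred_a_i - w_b * pred_b_i)^2 via the 2x2 normal equations."""
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * t for a, t in zip(pred_a, y))
    sby = sum(b * t for b, t in zip(pred_b, y))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear")
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


# Hypothetical out-of-fold predictions from two base learners.
pred_a = [1.0, 2.0, 3.0, 4.0]
pred_b = [2.0, 1.0, 4.0, 3.0]
# Toy phenotypes constructed as 0.25 * pred_a + 0.75 * pred_b.
y = [1.75, 1.25, 3.75, 3.25]
w_a, w_b = fit_meta_weights(pred_a, pred_b, y)
stacked = [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]
```

Using out-of-fold rather than in-sample base predictions keeps the meta-learner from simply rewarding whichever base model overfits most.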
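1.2 Kernel methods
Kernel methods achieve a nonlinear mapping by using a kernel function to compute the similarity of input data in a feature space. Kernel-based genomic selection methods include SVM and reproducing kernel Hilbert space (RKHS) regression.
1.2.1 SVM
SVM maps the input data space to a high-dimensional space through a kernel function and then performs classification or regression with a linear hyperplane. In GS, SVM thereby turns a problem that is nonlinear in the original marker space into a linear one in the feature space, which makes it more flexible than genomic breeding value estimation based on linear models. Zhao et al. [20] used SVM with different kernel functions for genomic selection in a pig data set; the results differed among kernels, and the ability to fit different genomic data through the choice of kernel is a notable advantage of SVM genomic selection.
1.2.2 RKHS
RKHS uses a covariance structure more general than that of GBLUP; pedigree- and phenotype-based BLUP, GBLUP and marker-assisted models (MAS) are all special cases of RKHS [21]. In RKHS the kernel function defines the covariance structure among individuals, and the kernel matrix expresses the Euclidean distances between points in the Hilbert space. Studies have shown good performance of RKHS in genomic selection [22-25]. González-Recio et al. [26] compared four models for genomic selection of mortality in broilers: E-BLUP, linear regression on SNPs based on the F∞-metric model (F∞-metric), RKHS and Bayesian regression (BR), using the posterior mean of the residual variance and the accuracy as criteria. RKHS had the lowest residual variance (17.07), against 24.38 for E-BLUP, 29.97 for F∞-metric and 20.74 for BR, and also the highest accuracy (0.20), against 0.10 for E-BLUP, 0.08 for F∞-metric and 0.16 for BR.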
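How a kernel turns Euclidean distances between genotype vectors into a covariance-like similarity matrix can be shown with a short sketch. A Gaussian kernel is assumed here because it is a common choice in RKHS regression; the bandwidth value and the division by the number of markers are illustrative choices, not taken from the cited studies.

```python
import math

# Illustrative sketch: Gaussian kernel matrix from SNP genotype vectors.

def gaussian_kernel_matrix(genotypes, bandwidth=1.0):
    """K[i][k] = exp(-bandwidth * d_ik^2 / m), where d_ik is the Euclidean
    distance between the SNP vectors of individuals i and k and m is the
    number of markers."""
    m = len(genotypes[0])

    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [[math.exp(-bandwidth * squared_distance(gi, gk) / m) for gk in genotypes]
            for gi in genotypes]


genotypes = [[0, 1, 2], [2, 1, 0], [0, 1, 2]]   # toy data; individuals 0 and 2 are identical
K = gaussian_kernel_matrix(genotypes)
```

Identical genotypes give a similarity of 1, and the similarity decays towards 0 as genotypes diverge; the bandwidth controls how fast, which is why it is usually tuned.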
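1.3 Other machine learning methods
Other ML methods used in genomic selection include EN, KcRR and KNN. EN combines the RR and LASSO models and has a tunable parameter: when alpha equals 1 it behaves like LASSO, and when alpha equals 0 it behaves like RR [27]. Wang et al. [28] compared EN with GBLUP and BayesB for genomic breeding value estimation in a Chinese Huaxi cattle data set; five-fold cross-validation showed that, with the EN parameter set to 0.001, the accuracy of EN for average daily gain, body weight, bone weight, striploin weight, tenderloin weight and limb length was 0.1% to 2.5% higher than that of GBLUP and BayesB. An et al. [29] proposed cosine kernel-based ridge regression (KcRR) and compared KcRR, GBLUP_kinship, SVR and BayesB for genomic selection in Chinese Simmental (Huaxi) beef cattle and loblolly pine; KcRR performed best. K-nearest neighbour (KNN) is a nonlinear ML method that uses the Euclidean distance between the SNP genotypes of individuals and estimates an individual's breeding value from the phenotypes of its K nearest neighbours. Karacaören [6] estimated breeding values for hairy syndrome in a dairy cattle data set with KNN, gradient boosting decision trees (GBDT), naive Bayes (NB) and weighted subspace random forest (wRF); NB and KNN could both estimate breeding values from a small number of informative SNPs, but with low accuracy. Table 1 compares genomic selection studies using further ML methods, including WhoGEM [30], KAML [31], DVR [32], NB [33] and KBMF [34].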
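The KNN procedure described above is simple enough to sketch directly: compute Euclidean distances between SNP vectors, take the K closest training individuals and average their phenotypes. The function name, the toy data and the use of an unweighted mean are assumptions for illustration.

```python
import math

# Illustrative sketch: K-nearest-neighbour prediction from SNP genotypes.

def knn_predict(train_genotypes, train_phenotypes, candidate, k=3):
    """Average the phenotypes of the k training individuals whose SNP vectors
    are closest (Euclidean distance) to the candidate."""
    def distance(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    ranked = sorted(
        (distance(g, candidate), y) for g, y in zip(train_genotypes, train_phenotypes)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


train_genotypes = [[0, 0, 1], [2, 2, 1], [0, 1, 1], [2, 1, 2]]
train_phenotypes = [1.0, 3.0, 1.2, 2.8]
prediction = knn_predict(train_genotypes, train_phenotypes, [0, 0, 1], k=2)
```

With k=2 the candidate's two nearest neighbours are the first and third training individuals, so the prediction is the mean of their phenotypes.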
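2 Deep learning
With its powerful self-learning ability, deep learning has shown excellent performance in genomic selection for complex traits [35]. Depending on the model used, deep learning includes deep neural networks (DNNs), convolutional neural networks (CNNs) and others [36].
2.1 Deep neural networks
DNNs consist of an input layer, hidden layers and an output layer and, depending on network structure, are further divided into MLP, stacked autoencoders (SAE) and deep belief networks (DBN). In a DNN each pair of adjacent layers has its own weights, the layers perform their computations in sequence, and the output layer produces the estimated breeding value.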
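DNNs have performed well in genomic selection. Shu et al. [37] built a two-layer back-propagation (BP) neural network for genomic selection of wheat yield and compared it with BLR, BLR-P and RRBLUP; the results were 0.6636 for the DNN, 0.6294 for BLR, 0.6573 for BLR-P and 0.6422 for RRBLUP. Montesinos-López et al. [38] used feedforward multilayer neural networks for genomic selection of agronomic traits in maize and wheat and obtained better results than GBLUP.
Khaki and Wang [39] built a model made up of two deep neural networks, each with 21 hidden layers of 50 nodes. It was applied to genomic selection of maize yield in different environments: one network predicts yield, the other predicts check yield, and the difference between the two outputs is taken as the model's prediction. Compared with a shallow neural network with a single hidden layer of 300 nodes, LASSO (L1 between 0.1 and 0.3) and an RT with a maximum depth of 10, the model was better on every evaluation metric.
SAE uses autoencoders (AE) as the basic module [40]: SNP data are encoded between the input layer and the hidden layer and decoded between the hidden layer and the output layer, and this process is repeated to stack several AE modules. For a single AE, the SNPs are fed to the first hidden layer h1 and an autoencoder is trained between the input layer and that hidden layer; h1 then serves as the input of the second AE, and so on, so that progressively deeper features are learned. Islam et al. [41] proposed DeepCGP on the basis of AE modules; the model uses AEs to reduce the dimensionality of rice genomic data and then estimates genomic breeding values from the reduced data. GEBVs were computed with RF after dimensionality reduction of two rice data sets. For the C7AIR data with 7K SNPs, the accuracy after 94.01% compression differed from that of the uncompressed data by less than 3% on average; for the HDRA data with 700K SNPs, the GEBV accuracy after 98.57% compression differed from that of the original data by less than 5% on average.
DBNs are composed of several restricted Boltzmann machines (RBM) [42]. Rachmatia et al. [43] applied a four-layer DBN regression model built from three RBMs, together with GBLUP and Bayesian methods, to genomic selection of four traits of maize in different environments: grain yield, female flowering, male flowering and the anthesis-silking interval. The DBN predicted the flowering traits better than the other models. For the anthesis-silking interval under well-watered conditions the DBN had the highest accuracy (0.559), followed by RKHS (0.547), BL (0.513) and BLUP (0.469); under severe drought the accuracies were 0.579 for DBN, 0.572 for RKHS, 0.517 for BL and 0.481 for BLUP.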
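2.2 Convolutional neural networks
In CNN-based genomic selection, CNNs extract features with convolutional and pooling layers and use fully connected layers to classify those features; activation functions between layers apply a nonlinear transformation to the feature maps to improve network performance. For genomic data, a convolution kernel takes the dot product with a window of the input and slides along the input, summarizing local information to obtain features [44]. Pooling layers follow convolutional layers; the pooling operator provides invariance to spatial shifts and lowers the computational complexity of the upper layers by removing some of the connections between convolutional layers. Pooling retains the most salient features of the input, reduces computation and helps prevent overfitting [45]. Neurons in a fully connected layer receive and process the inputs from the previous layer, with a linear regression-type combination between layers [46]. Incorporating different features has given rise to a variety of CNN models.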
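The convolution, activation and pooling steps described above can be traced on a short SNP vector. The sketch below is plain Python with a hand-picked kernel; real models learn many kernels and stack several such layers, and none of the values here come from the cited architectures.

```python
# Illustrative sketch: one 1D convolution, ReLU and max-pooling pass over SNP codes.

def conv1d(signal, kernel, stride=1):
    """Slide the kernel along the signal and take the dot product at each position."""
    width = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(width))
            for i in range(0, len(signal) - width + 1, stride)]


def relu(values):
    """Nonlinear activation applied to the feature map."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Keep the largest value in each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


snps = [0, 1, 2, 1, 0, 2, 2, 1]        # toy genotype codes along a chromosome
kernel = [1.0, 0.0, -1.0]              # hypothetical kernel weights
feature_map = conv1d(snps, kernel)     # [-2.0, 0.0, 2.0, -1.0, -2.0, 1.0]
features = max_pool1d(relu(feature_map))
```

Eight input values are reduced to three pooled features, which is the sense in which pooling lowers the computation required by the layers that follow.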
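DeepGS has an 8-32-1 structure, where 8 is the number of channels of the convolution kernel of size 18 and 32 and 1 are the numbers of neurons in the fully connected layers [19]. DNNGP comprises three convolutional blocks, each composed of a convolution kernel, a batch normalization (BN) layer and a dropout layer [47]. DualCNN (DLGWAS) combines two CNN streams, which addresses the feature loss of convolutions with different kernel sizes and strengthens the genotype feature signal [48]. The ResGS model of Gu [49] contains one residual unit, composed of a convolution kernel with 8 channels and a kernel size of 17, a BN layer and a ReLU activation layer. The ResGS model designed by Xie et al. [50] has 50 layers with several residual blocks and replaces max pooling with strided convolution to reduce information loss. SoyDNGP borrows from the VGG model: it contains 12 convolutional blocks and one fully connected layer, each block consisting of a convolutional layer, a BN layer and a ReLU activation layer, and the genomic data must be converted to three-channel data to fit the network [51]. Table 2 compares the performance of the CNN models.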
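3 Model training
Model training is an important part of machine learning, and its outcome directly affects predictive performance. Training deep learning genomic selection models involves parameter tuning, model regularization and feature selection.
3.1 Hyperparameter optimization
Parameters in deep learning are either trainable or non-trainable (hyperparameters) [52]. Trainable parameters are initialized with random values drawn from a uniform or normal distribution; forward propagation computes weighted sums layer by layer to produce a prediction, and a loss function built from the difference between predicted and observed values is used to adjust the weights and biases. The model repeats this process to find the parameters that minimize the loss. Hyperparameters are set by the practitioner from experience, which greatly limits the effectiveness of DL in practical applications [52]. Several hyperparameter optimization methods (Table 3) offer ways to search for the best hyperparameters when training ML and DL models.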
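The distinction between trainable parameters and hyperparameters can be made concrete with a minimal training loop. The sketch trains a single linear neuron by gradient descent on mean squared error: the weights and bias are learned, while the learning rate and number of epochs are hyperparameters fixed in advance. The data are toy values generated from a known linear rule, and all names are hypothetical.

```python
import random

# Illustrative sketch: random initialisation, forward pass, loss and weight update.

def train_linear_neuron(X, y, learning_rate=0.05, epochs=2000, seed=0):
    rng = random.Random(seed)
    n, m = len(y), len(X[0])
    weights = [rng.gauss(0.0, 0.1) for _ in range(m)]   # random normal initialisation
    bias = 0.0
    for _ in range(epochs):
        # Forward pass: weighted sum of the inputs plus bias.
        preds = [sum(w * x for w, x in zip(weights, row)) + bias for row in X]
        errors = [p - t for p, t in zip(preds, y)]
        # Gradients of the mean squared error with respect to weights and bias.
        grad_w = [2.0 / n * sum(e * row[j] for e, row in zip(errors, X)) for j in range(m)]
        grad_b = 2.0 / n * sum(errors)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    preds = [sum(w * x for w, x in zip(weights, row)) + bias for row in X]
    loss = sum((p - t) ** 2 for p, t in zip(preds, y)) / n
    return weights, bias, loss


X = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 0], [2, 2]]
y = [1.2, 1.5, 2.2, 1.9, 1.0, 2.4]      # generated as 1.0 + 0.5 * x0 + 0.2 * x1
weights, bias, loss = train_linear_neuron(X, y)
```

A hyperparameter search such as those in Table 3 would wrap a loop like this one, rerunning training for different learning rates or layer sizes and keeping the setting with the best validation performance.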
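3.2 Regularization
Regularization guards against overfitting during training and poor generalization during validation. Current regularization methods include L2 (weight decay), L1, dropout, parameter sharing, early stopping, data augmentation and BN. In GS analyses these methods help select the main features and analyse the genetic architecture of traits.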
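As a reminder of what the L2 and L1 penalties do, the penalized losses can be written out for a squared-error objective. The notation is an assumption introduced here: y_i is the observed value, ŷ_i(w) the prediction, w_j the weights and λ ≥ 0 the regularization strength.

```latex
\begin{aligned}
L_{\mathrm{L2}}(\mathbf{w}) &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i-\hat{y}_i(\mathbf{w})\bigr)^2+\lambda\sum_{j}w_j^{2},\\
L_{\mathrm{L1}}(\mathbf{w}) &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i-\hat{y}_i(\mathbf{w})\bigr)^2+\lambda\sum_{j}\lvert w_j\rvert .
\end{aligned}
```

The L2 term shrinks all weights towards zero, whereas the L1 term can set some weights exactly to zero, which is why L1 is often associated with selecting a subset of markers.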
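Bayesian regularized artificial neural networks (BRANN) bring the neural network framework into Bayesian models: a prior distribution is placed on the parameters, Bayes' theorem constrains the parameters to their posterior distribution to improve generalization, and repeated sampling from that distribution yields different weights and hence multiple predictions [62]. When the likelihood of a Bayesian method involves a complex model, approximations are usually used. Approximate Bayesian neural networks (ABNN) can be used to obtain model-averaged posterior predictions and use dropout and weight decay for regularization to improve estimation accuracy [63]. As for regularizing the weights, radial basis function neural networks (RBFNN) use radial basis functions as activation functions; compared with BP networks, they respond strongly when the data are close to a centre, which indirectly regularizes the model and speeds up training [64]. Probabilistic neural networks (PNN), based on RBFs and competitive neurons, predict better than conventional methods [65], as shown in Table 4.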
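4 Discussion
4.1 Data processing, comparison of deep learning with conventional methods, and computational efficiency in genomic selection
In genomic selection, the way input data and labels are constructed has a major effect on ML and DL. For encoding zygosity and SNP marker data, newer schemes such as one-hot encoding and frequency encoding have been introduced to avoid information loss [68-70]. For labels, continuous traits correspond to regression tasks and qualitative traits to classification tasks; in preprocessing, residuals can be used in place of phenotypes to estimate breeding values [50], and standardization is needed to account for environmental effects [71-72]. Sample size and the number of SNP markers also affect the accuracy of genomic breeding value estimation. Regarding sample size, estimates are strongly biased when there are fewer than 1000 samples [73-74]; as sample size grows, small neural networks outperform traditional ML models [50]; and large neural networks do better on large data sets [75]. Regarding the number of SNP markers, models overfit easily when p > n [76]. Genotyping errors, missing data, batch effects and biological variability can introduce noise into SNP data, so dimensionality reduction is needed to filter the data [77]. Several ML- and DL-based feature selection methods have been applied to GS [41,78-82].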
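One-hot encoding of SNP genotypes, mentioned above, can be sketched as follows. The sketch assumes additive codes 0, 1 and 2 and maps anything else (for example a missing call) to an all-zero vector; that treatment of missing values is an illustrative choice rather than the scheme of any cited study.

```python
# Illustrative sketch: one-hot encoding of SNP genotype codes.

ONE_HOT = {0: (1, 0, 0), 1: (0, 1, 0), 2: (0, 0, 1)}


def one_hot_encode(genotypes):
    """Map each genotype code (0, 1, 2) to a 3-element indicator vector;
    unknown or missing codes become [0, 0, 0]."""
    return [[list(ONE_HOT.get(code, (0, 0, 0))) for code in row] for row in genotypes]


encoded = one_hot_encode([[0, 2], [1, None]])
```

Unlike the 0/1/2 coding, this representation does not force the heterozygote to lie midway between the two homozygotes, which is one way it avoids discarding information.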
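In terms of computational efficiency, GPU-based ML and DL genomic selection runs faster than CPU-based genomic selection [48,83]. In a soybean genomic selection study, the GPU-based DLGWAS model ran in about 10 min, whereas the CPU-based model needed about 3 h [48]. DAIRRy-BLUP is a parallel, distributed-memory RR-BLUP that effectively shortens the running time of conventional BLUP on large data sets (1 million individuals, 360k SNPs), but it requires distributed hardware and is therefore more costly [84]. Parallel Bayesian genomic selection uses MPI message passing on multi-core, multi-processor or distributed systems and effectively shortens the running time of Bayesian breeding value estimation, but its efficiency is limited because the burn-in phase must run serially [85], so the speed-up is less pronounced than that of GPU-based parallel deep learning genomic selection. In addition, multi-task learning predicts several tasks jointly and shares computation among them, which shortens running time and improves the efficiency of GEBV prediction; multi-task learning performs better in multi-population genomic selection [13,86]. Parallel computing therefore has greater potential to speed up ML and DL genomic selection.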
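4.2 Analysis of deep learning genomic selection studies
Increasing the number of network layers is an important way to improve the accuracy of deep learning genomic selection, but too many layers aggravate overfitting, gradient instability and network degradation [87] and lower accuracy. DL models such as DNNGP include dropout layers, BN layers and early stopping to alleviate overfitting [88-89]; ResGS uses residual networks to alleviate gradient instability and network degradation [87]; SoyDNGP follows the VGG model in adopting a narrow, deep structure that extracts more features while keeping the number of parameters under control, avoiding excessive computation and an overly complex structure [90]. These DL models have different strengths in different settings: DNNGP shows good prediction accuracy in multi-omics breeding value estimation [47]; DLGWAS can estimate breeding values from genomic data containing missing values; and deep networks such as SoyDNGP and ResGS, whose deep structures carry many parameters, fit complex features better. On unbalanced data sets, in classification of qualitative traits and in the accuracy of predicted phenotypes, they achieve higher accuracy and lower error than shallower networks such as DeepGS and DNNGP. With regard to sensitivity to sample size, SoyDNGP performs well with more than 1000 samples. In the comparative experiments on DNNGP, SVR and LightGBM were more accurate than DNNGP when the training set contained fewer than 500 individuals, and the performance of DNNGP improved greatly as the amount of data increased [47]. ResGS performed very well on a rice data set with 413 samples [50].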
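Because the deep network models above assign marker effects in a sequential manner, they cannot fully exploit the information between SNPs [50]. Loci that are far apart influence each other little [91-92], and new network architectures are needed in GS to mine genetic information effectively. Unlike a CNN, a local convolutional neural network does not share the weights within each filter, so adjacent markers at different loci can be given different weights to reflect the relationships between loci [93-94]. Transformers comprise self-attention, feedforward networks and normalization layers [70,95]; the self-attention mechanism computes the attention between a given genetic marker and all related markers [96], which helps identify relationships among markers. Graph neural networks are deep learning models for graph-structured data [97]; in GS, a graph of individuals is built from the G matrix, the A matrix or another genomic similarity or relationship matrix, nodes with a sufficient level of relatedness are connected, and phenotype prediction becomes a node regression problem [98]. These network models have so far seen little use in GS, and further research will clarify their role in genomic selection.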
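The graph construction described for graph neural networks can be sketched in plain Python: individuals become nodes, and two nodes are connected when their relationship coefficient reaches a threshold. The threshold value, the toy relationship matrix and the single mean-aggregation step (a simplified stand-in for one message-passing layer) are all assumptions for illustration.

```python
# Illustrative sketch: building an individual graph from a relationship matrix.

def build_individual_graph(relationship, threshold=0.1):
    """Connect individuals i and k when relationship[i][k] >= threshold."""
    n = len(relationship)
    edges = {i: [] for i in range(n)}
    for i in range(n):
        for k in range(i + 1, n):
            if relationship[i][k] >= threshold:
                edges[i].append(k)
                edges[k].append(i)
    return edges


def aggregate_neighbours(edges, features):
    """One round of mean aggregation over each node and its neighbours."""
    aggregated = []
    for node in sorted(edges):
        group = [features[node]] + [features[k] for k in edges[node]]
        aggregated.append(sum(group) / len(group))
    return aggregated


relationship = [[1.0, 0.3, 0.0],
                [0.3, 1.0, 0.05],
                [0.0, 0.05, 1.0]]       # toy G-like matrix
edges = build_individual_graph(relationship)
smoothed = aggregate_neighbours(edges, [1.0, 3.0, 5.0])
```

Related individuals end up sharing information through the aggregation step, while the unrelated third individual keeps its own value; a node regression model would stack learned versions of this step.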
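5 Future prospects
At present, DNN and CNN models are the deep learning models most often applied in genomic selection. Using other deep learning models for genomic selection, uncovering the factors that link marker effects and improving the accuracy of genomic breeding values is one direction for deep learning genomic selection research. In addition, the following remain to be explored: implicit deep learning techniques to improve generalization; reinforcement learning for planning and learning strategies to improve multi-method ensembles and thus prediction accuracy; and unsupervised and semi-supervised deep ensemble methods for genomic selection. The particular structure of CNNs can handle high-dimensional data, which goes a long way towards solving the problem that some phenotypes are difficult to measure. Studies predicting crop yield from remote-sensing satellite imagery have already appeared, improving the timeliness and coverage of data, and future genomic selection research may involve more unconventional data. The application of these innovative methods promotes the development of genomic selection in animal and plant breeding practice. In short, although ML and DL are affected by data, parameters and model structure, so that their predictions are in some respects less satisfactory than those of conventional methods, their strong self-learning ability, together with continued model refinement and the accumulation of genomic data, is expected to deliver convincing performance.
References: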
[1]MEUWISSEN TH E,HAYES BJ,GODDARD ME.Prediction of total genetic value using genome-wide dense marker maps[J].Genetics,2001,157(4):1819-1829.
[2]WELLER JI,EZRA E,RON M.Invited review:A perspective on the future of genomic selection in dairy cattle[J].J Dairy Sci,2017,100(11):8633-8644.
[3]VARGAS R,MOSAVI A,RUIZ R.Deep learning:a review[J].Adv Intell Syst Comput,2017,5(2).
[4]ZÖLLER MA,HUBER MF.Benchmark and survey of automated machine learning frameworks[J].J Artif Intell Res,2021,70:409-472.
[5]ALVES AA C,ESPIGOLAN R,BRESOLIN T,et al.Genome-enabled prediction of reproductive traits in Nellore cattle using parametric models and machine learning methods[J].Anim Genet,2021,52(1):32-46.
[6]KARACAÖREN B.An evaluation of machine learning for genomic prediction of hairy syndrome in dairy cattle[J].Anim Sci Pap Rep,2022,40(1):45-58.
[7]WANG X,SHI SL,WANG GJ,et al.Using machine learning to improve the accuracy of genomic prediction of reproduction traits in pigs[J].J Anim Sci Biotechnol,2022,13(1):60.
[8]DING JQ,LI QH,ZHANG GM,et al.Comparing the accuracy of estimated breeding value by several algorithms on laying traits in broilers[J].Acta Veterinaria et Zootechnica Sinica,2022,53(5):1364-1372.(in Chinese)
[9]ZHOU ZH.Ensemble learning[M]∥ZHOU ZH.Machine Learning.Singapore:Springer,2021:181-210.
[10]FREUND Y,SCHAPIRE RE.A short introduction to boosting[J].J JSAI,1999,14(5):771-780.
[11]GONZÁLEZ-RECIO O,FORNI S.Genome-wide prediction of discrete traits using Bayesian regressions and machine learning[J].Genet Sel Evol,2011,43(1):7.
[12]ABDOLLAHI-ARPANAHI R,GIANOLA D,PEÑAGARICANO F.Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes[J].Genet Sel Evol,2020,52(1):12.
[13]GRINBERG NF,ORHOBOR OI,KING RD.An evaluation of machine-learning for predicting phenotype:studies in yeast,rice,and wheat[J].Mach Learn,2020,109(2):251-277.
[14]BREIMAN L.Bagging predictors[J].Mach Learn,1996,24(2):123-140.
[15]JAMES G,WITTEN D,HASTIE T,et al.An introduction to statistical learning:with Applications in R[M].New York:Springer,2013.
[16]SILVEIRA LS,LIMA LP,NASCIMENTO M,et al.Regression trees in genomic selection for carcass traits in pigs[J].Genet Mol Res,2020,19(1):gmr18498.
[17]GIANOLA D,WEIGEL KA,KRÄMER N,et al.Enhancing genome-enabled prediction by bagging genomic BLUP[J].PLoS One,2014,9(4):e91693.
[18]LIANG M.The algorithm research for genomic selection study based on machine learning[D].Beijing:Chinese Academy of Agricultural Sciences,2021.(in Chinese)
[19]MA WL,QIU ZX,SONG J,et al.DeepGS:Predicting phenotypes from genotypes using deep learning[J].BioRxiv,2017:241414.
[20]ZHAO W,LAI XS,LIU DY,et al.Applications of support vector machine in genomic prediction in pig and maize populations[J].Front Genet,2020,11:598318.
[21]DE LOS CAMPOS G,GIANOLA D,ROSA GJ M.Reproducing kernel Hilbert spaces regression:a general framework for genetic evaluation[J].J Anim Sci,2009,87(6):1883-1887.
[22]BLONDEL M,ONOGI A,IWATA H,et al.A ranking approach to genomic selection[J].PLoS One,2015,10(6):e0128570.
[23]GONZÁLEZ-RECIO O,ROSA GJ M,GIANOLA D.Machine learning methods and predictive ability metrics for genome-wide prediction of complex traits[J].Livest Sci,2014,166:217-231.
[24]GONZÁLEZ-CAMACHO JM,ORNELLA L,PÉREZ-RODRÍGUEZ P,et al.Applications of machine learning methods to genomic selection in breeding wheat for rust resistance[J].Plant Genome,2018,11(2):170104.
[25]XU Y,XU C,XU S.Prediction and association mapping of agronomic traits in maize using multiple omic data[J].Heredity(Edinb),2017,119(3):174-184.
[26]GONZÁLEZ-RECIO O,GIANOLA D,LONG NY,et al.Nonparametric methods for incorporating genomic information into genetic evaluations:an application to mortality in broilers[J].Genetics,2008,178(4):2305-2313.
[27]ZOU H,HASTIE T.Regularization and variable selection via the elastic net[J].J R Stat Soc Ser B,2005,67(2):301-320.
[28]WANG XQ,MIAO J,CHANG TP,et al.Evaluation of GBLUP,BayesB and elastic net for genomic prediction in Chinese Simmental beef cattle[J].PLoS One,2019,14(2):e0210442.
[29]AN BX,LIANG M,CHANG TP,et al.KCRR:a nonlinear machine learning with a modified genomic similarity matrix improved the genomic prediction efficiency[J].Brief Bioinform,2021,22(6):bbab132.
[30]GENTZBITTEL L,BEN C,MAZURIER M,et al.WhoGEM:An admixture-based prediction machine accurately predicts quantitative functional traits in plants[J].Genome Biol,2019,20(1):106.
[31]YIN LL,ZHANG HH,ZHOU X,et al.KAML:improving genomic prediction accuracy of complex traits using machine learning determined parameters[J].Genome Biol,2020,21(1):146.
[32]TODA Y,WAKATSUKI H,AOIKE T,et al.Predicting biomass of rice with intermediate traits:Modeling method combining crop growth models and genomic prediction models[J].PLoS One,2020,15(6):e0233951.
[33]VAN DER HEIDE EM M,VEERKAMP RF,VAN PELT ML,et al.Comparing regression,naive Bayes,and random forest methods in the prediction of individual survival to second lactation in Holstein cattle[J].J Dairy Sci,2019,102(10):9409-9421.
[34]GILLBERG J,MARTTINEN P,MAMITSUKA H,et al.Modelling G×E with historical weather information improves genomic prediction in new environments[J].Bioinformatics,2019,35(20):4045-4052.
[35]BELLOT P,DE LOS CAMPOS G,PÉREZ-ENCISO M.Can deep learning improve genomic prediction of complex human traits?[J].Genetics,2018,210(3):809-819.
[36]XIE B,ZHANG Q.Deep filtering with DNN,CNN and RNN[J].arXiv preprint arXiv:2112.12616v2,2009.
[37]SHU YJ,WU L,WANG D,et al.Application of artificial neural network in genomic selection for crop improvement[J].Acta Agronomica Sinica,2011,37(12):2179-2186.(in Chinese)
[38]MONTESINOS-LÓPEZ A,MONTESINOS-LÓPEZ OA,GIANOLA D,et al.Multi-environment genomic prediction of plant traits using deep learners with dense architecture[J].G3(Bethesda),2018,8(12):3813-3828.
[39]KHAKI S,WANG LZ.Crop yield prediction using deep neural networks[J].Front Plant Sci,2019,10:621.
[40]RANZATO MA,BOUREAU YL,LECUN Y.Sparse feature learning for deep belief networks[C]∥Proceedings of the 20th International Conference on Neural Information Processing Systems.Vancouver:Curran Associates Inc.,2007:1185-1192.
[41]ISLAM T,KIM CH,IWATA H,et al.DeepCGP:a deep learning method to compress genome-wide polymorphisms for predicting phenotype of rice[J].IEEE/ACM Trans Comput Biol Bioinform,2023,20(3):2078-2088.
[42]LOPES N,RIBEIRO B.Deep belief networks(DBNs)[M]∥LOPES N,RIBEIRO B.Machine Learning for Adaptive Many-Core Machines-A Practical Approach.Cham:Springer,2015:155-186.
[43]RACHMATIA H,KUSUMA WA,HASIBUAN LS.Prediction of maize phenotype based on whole-genome single nucleotide polymorphisms using deep belief networks[J].J Phys:Conf Ser,2017,835:012003.
[44]YAMASHITA R,NISHIO M,DO RK G,et al.Convolutional neural networks:an overview and application in radiology[J].Insights Imaging,2018,9(4):611-629.
[45]GHOLAMALINEZHAD H,KHOSRAVI H.Pooling methods in deep neural networks,a review[J].arXiv preprint arXiv:2009.07481,2020.
[46]MONTGOMERY DC,PECK EA,VINING GG.Introduction to linear regression analysis[M].6th ed.Hoboken:John Wiley & Sons,2021.
[47]WANG KL,ABID MA,RASHEED A,et al.DNNGP,a deep neural network-based method for genomic prediction using multi-omics data in plants[J].Mol Plant,2023,16(1):279-293.
[48]LIU Y,WANG DL,HE F,et al.Phenotype prediction and genome-wide association study using deep convolutional neural network of soybean[J].Front Genet,2019,10:1091.
[49]GU LL.Development of new algorithms for genetic value prediction of genomes based on artificial intelligence(AI)techniques[D].Xiamen:Jimei University,2021.(in Chinese)
[50]XIE ZC,XU XG,LI L,et al.Residual networks without pooling layers improve the accuracy of genomic predictions[J].Theor Appl Genet,2023.
[51]GAO PF,ZHAO HN,LUO Z,et al.SoyDNGP:a web-accessible deep learning framework for genomic prediction in soybean breeding[J].Brief Bioinform,2023,24(6):bbad349.
[52]YANG L,SHAMI A.On hyperparameter optimization of machine learning algorithms:Theory and practice[J].Neurocomputing,2020,415:295-316.
[53]LIANG M,AN BX,LI K,et al.Improving genomic prediction with machine learning incorporating TPE for hyperparameters optimization[J].Biology(Basel),2022,11(11):1647.
[54]HAN JJ,GONDRO C,REID K,et al.Heuristic hyperparameter optimization of deep learning models for genomic prediction[J].G3(Bethesda),2021,11(7):jkab032.
[55]STORN R,PRICE K.Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces[J].J Global Optim,1997,11(4):341-359.
[56]ALVES AA C,FERNANDES AF A,LOPES FB,et al.(Quasi)multitask support vector regression with heuristic hyperparameter optimization for whole-genome prediction of complex traits:a case study with carcass traits in broilers[J].G3(Bethesda),2023,13(8):jkad109.
[57]MIRJALILI S.Genetic algorithm[M]∥MIRJALILI S.Evolutionary Algorithms and Neural Networks:Theory and Applications.Cham:Springer,2019:43-55.
[58]WALDMANN P,PFEIFFER C,MÉSZÁROS G.Sparse convolutional neural networks for genome-wide prediction[J].Front Genet,2020,11:25.
[59]FALKNER S,KLEIN A,HUTTER F.BOHB:Robust and efficient hyperparameter optimization at scale[C]∥Proceedings of the 35th International Conference on Machine Learning.Stockholm:PMLR,2018:1436-1445.
[60]ZHOU GH,GAO J,ZUO DS,et al.MSXFGP:combining improved sparrow search algorithm with XGBoost for enhanced genomic prediction[J].BMC Bioinformatics,2023,24(1):384.
[61]XUE JK,SHEN B.A novel swarm intelligence optimization approach:sparrow search algorithm[J].Syst Sci Control Eng,2020,8(1):22-34.
[62]OKUT H,WU XL,ROSA GJ M,et al.Predicting expected progeny difference for marbling score in Angus cattle using artificial neural networks and Bayesian regression models[J].Genet Sel Evol,2013,45(1):34.
[63]WALDMANN P.Approximate Bayesian neural networks in genomic prediction[J].Genet Sel Evol,2018,50(1):70.
[64]GONZÁLEZ-CAMACHO JM,DE LOS CAMPOS G,PÉREZ P,et al.Genome-enabled prediction of genetic values using radial basis function neural networks[J].Theor Appl Genet,2012,125(4):759-771.
[65]GONZÁLEZ-CAMACHO JM,CROSSA J,PÉREZ-RODRÍGUEZ P,et al.Genome-enabled prediction using probabilistic neural network classifiers[J].BMC Genomics,2016,17:208.
[66]JUBAIR S,DOMARATZKI M.Ensemble supervised learning for genomic selection[C]∥2019 IEEE International Conference on Bioinformatics and Biomedicine(BIBM).San Diego:IEEE,2019:1993-2000.
[67]CROSSA J,JARQUÍN D,FRANCO J,et al.Genomic prediction of gene bank wheat landraces[J].G3(Bethesda),2016,6(7):1819-1834.
[68]DONG S,WANG P,ABBAS K.A survey on deep learning and its applications[J].Comput Sci Rev,2021,40:100379.
[69]LECUN Y,BENGIO Y,HINTON G.Deep learning[J].Nature,2015,521(7553):436-444.
[70]JUBAIR S,TUCKER JR,HENDERSON N,et al.GPTransformer:A transformer-based deep learning method for predicting Fusarium related traits in barley[J].Front Plant Sci,2021,12:761402.
[71]WASHBURN JD,CIMEN E,RAMSTEIN G,et al.Predicting phenotypes from genetic,environment,management,and historical data using CNNs[J].Theor Appl Genet,2021,134(12):3997-4011.
[72]JUBAIR S,DOMARATZKI M.Crop genomic selection with deep learning and environmental data:A survey[J].Front Artif Intell,2023,5:1040295.
[73]ALWOSHEEL A,VAN CRANENBURGH S,CHORUS CG.Is your dataset big enough?Sample size requirements when using artificial neural networks for discrete choice analysis[J].J Choice Model,2018,28:167-182.
[74]DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16x16 words:transformers for image recognition at scale[C]∥9th International Conference on Learning Representations.ICLR,2021.
[75]AZIZI S,MUSTAFA B,RYAN F,et al.Big self-supervised models advance medical image classification[C]∥Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision.Montreal:IEEE,2021:3458-3468.
[76]WANG JL.Fast and accurate population admixture inference from genotype data from a few microsatellites to millions of SNPs[J].Heredity,2022,129(2):79-92.
[77]NAYERI S,SARGOLZAEI M,TULPAN D.A review of traditional and machine learning methods applied to animal breeding[J].Anim Health Res Rev,2019,20(1):31-46.
[78]MONTESINOS-LÓPEZ OA,CRESPO-HERRERA L,PIERRE CS,et al.Do feature selection methods for selecting environmental covariables enhance genomic prediction accuracy?[J].Front Genet,2023,14:1209275.
[79]PILES M,BERGSMA R,GIANOLA D,et al.Feature selection stability and accuracy of prediction models for genomic prediction of residual feed intake in pigs using machine learning[J].Front Genet,2021,12:611506.
[80]PUDJIHARTONO N,FADASON T,KEMPA-LIEHR AW,et al.A review of feature selection methods for machine learning-based disease risk prediction[J].Front Bioinform,2022,2:927312.
[81]LI SS,YU J,KANG HM,et al.Genomic selection in Chinese Holsteins using regularized regression models for feature selection of whole genome sequencing data[J].Animals(Basel),2022,12(18):2419.
[82]ISLAM T,KIM CH,IWATA H,et al.A deep learning method to impute missing values and compress genome-wide polymorphism data in rice[C]∥Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies.BIOSTEC,2021:101-109.
[83]ERASLAN G,AVSEC Ž,GAGNEUR J,et al.Deep learning:new computational modelling techniques for genomics[J].Nat Rev Genet,2019,20(7):389-403.
[84]DE CONINCK A,FOSTIER J,MAENHOUT S,et al.DAIRRy-BLUP:A high-performance computing approach to genomic prediction[J].Genetics,2014,197(3):813-822.
[85]GUO P,ZHU B,NIU H,et al.Fast genomic prediction of breeding values using parallel Markov chain Monte Carlo with convergence diagnosis[J].BMC Bioinformatics,2018,19(1):3.
[86]CHEN LH,LI CX,MILLER S,et al.Multi-population genomic prediction using a multi-task Bayesian learning model[J].BMC Genetics,2014,15:53.
[87]HE KM,ZHANG XY,REN SQ,et al.Deep residual learning for image recognition[C]∥Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition.Las Vegas:IEEE,2016:770-778.
[88]SRIVASTAVA N,HINTON G,KRIZHEVSKY A,et al.Dropout:a simple way to prevent neural networks from overfitting[J].J Mach Learn Res,2014,15(1):1929-1958.
[89]IOFFE S,SZEGEDY C.Batch normalization:Accelerating deep network training by reducing internal covariate shift[C]∥Proceedings of the 32nd International Conference on International Conference on Machine Learning.Lille:JMLR.org.,2015:448-456.
[90]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[C]∥3rd International Conference on Learning Representations.San Diego:ICLR,2015.
[91]KIZILKAYA K,FERNANDO RL,GARRICK DJ.Genomic prediction of simulated multibreed and purebred performance using observed fifty thousand single nucleotide polymorphism genotypes[J].J Anim Sci,2010,88(2):544-551.
[92]ERBE M,HAYES BJ,MATUKUMALLI LK,et al.Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels[J].J Dairy Sci,2012,95(7):4114-4129.
[93]LEE HJ,LEE JH,GONDRO C,et al.deepGBLUP:joint deep learning networks and GBLUP framework for accurate genomic prediction of complex traits in Korean native cattle[J].Genet Sel Evol,2023,55(1):56.
[94]POOK T,FREUDENTHAL J,KORTE A,et al.Using local convolutional neural networks for genomic prediction[J].Front Genet,2020,11:561497.
[95]LU J,HOU W,XIONG LW,et al.GSCNN:A genomic selection convolutional neural network model based on SNP genotype and physical distance features and data augmentation strategy[J].BMC Genomics,2024.
[96]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]∥Proceedings of the 31st International Conference on Neural Information Processing Systems.Long Beach:Curran Associates Inc.,2017:6000-6010.
[97]WU ZH,PAN SR,CHEN FW,et al.A comprehensive survey on graph neural networks[J].IEEE Trans Neural Netw Learn Syst,2021,32(1):4-24.
[98]HAMILTON WL,YING Z,LESKOVEC J.Inductive representation learning on large graphs[C]∥Proceedings of the 31st International Conference on Neural Information Processing Systems.Long Beach:Curran Associates Inc.,2017:1025-1035.
(Edited by GUO Yunyan)