CLC number: TP399  Document code: A  Article ID: 1671-5489(2025)04-1117-05
Improve Dung Beetle Algorithm to Optimize Machine Learning Model
FEI Minxue1, HUANG Dongyan1, GUO Xiaoxin2,3 (1. College of Engineering and Technology, Jilin Agricultural University, Changchun 130118, China;
2. College of Computer Science and Technology, Jilin University, Changchun 130012, China;
3. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China)
Abstract: To address the low accuracy of the traditional support vector machine (SVM), we proposed an LDBO-SVM model. First, to solve the problem that the initial solutions of the original dung beetle optimization algorithm are unevenly distributed, the Logistic chaotic map was introduced into the algorithm to construct the LDBO algorithm. Second, the LDBO algorithm was used to optimize the penalty factor and kernel parameter of the traditional SVM, yielding the LDBO-SVM model. Finally, to verify its performance, the LDBO-SVM model was compared with SVMs improved by five other swarm intelligence optimization algorithms. The experimental results show that the LDBO-SVM model reaches an accuracy of 94.53% and can accurately predict student achievement, helping teachers improve their teaching plans.
Keywords: machine learning; support vector machine; dung beetle optimization algorithm; parameter optimization
Neural network models [1-2] currently attract wide attention and are applied in many fields. Although they show distinct advantages in these fields, on small datasets their performance is lower than that of machine learning models [3-5]. Machine learning models can achieve a good fit without large training datasets, so they have the advantage on small datasets. However, as application demands grow, the drawbacks of traditional machine learning models increasingly limit their performance, and the results often fall well short of expectations.
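To improve the performance of traditional machine learning models, many studies have integrated various optimization methods into them. Innan et al. [6] combined the strengths of two existing support vector machines (SVM) to build the QVK-SVM model, and experiments showed that it performs well. Shrivastava et al. [7], noting that the SVM is easily affected by noisy samples near the decision boundary, proposed a new loss function and integrated it into the SVM to build the Eagle-SVM model, which is robust. Rizwan et al. [8] addressed two problems of the traditional SVM, namely that it ignores minimization of the radius and that its configuration of the optimal hyperplane between classes is ineffective, and proposed a weighted-radius SVM (WR-SVM); simulation results show that this model achieves good classification accuracy. These studies show that improving traditional machine learning models can enhance their performance.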
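Swarm intelligence optimization algorithms [9-11] have achieved excellent results in optimization, especially in parameter optimization, including the parameter optimization of SVMs [12-14]. Based on this, this paper integrates an improved dung beetle optimizer (DBO) algorithm into the traditional SVM and applies the new model to student achievement prediction.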
1 Algorithm Design
1.1 Improved Dung Beetle Optimization Algorithm
1.1.1 Dung Beetle Optimization Algorithm
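The dung beetle optimization algorithm builds its mathematical model by simulating the behavior of dung beetles in nature. In nature, every behavior of a dung beetle revolves around the dung ball: rolling the ball, breeding with dung balls, foraging by small dung beetles, and stealing dung balls. A rolling beetle may meet obstacles, so ball rolling is further divided into two cases: rolling without obstacles and rolling with obstacles. Accordingly, the mathematical model of the algorithm consists of four parts. Obstacle-free ball rolling is modeled as
$$x_i(t+1)=x_i(t)+\alpha\times K\times x_i(t-1)+b\times\Delta x,\tag{1}$$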
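where $t$ is the current iteration number; $x_i(t)$ is the position of the $i$-th dung beetle at iteration $t$; $K$ is a constant deflection coefficient; $b$ is a constant; $\alpha$ is a natural coefficient indicating whether the beetle deviates from its original direction; and $\Delta x=|x_i(t)-X^{w}|$ simulates the change in light intensity, with $X^{w}$ the global worst position.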
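As a rough illustration, both rolling rules can be written per dimension in Python. This is a minimal sketch assuming the standard DBO forms $x_i(t+1)=x_i(t)+\alpha K x_i(t-1)+b\,\Delta x$ for obstacle-free rolling and $x_i(t+1)=x_i(t)+\tan\theta\,|x_i(t)-x_i(t-1)|$ for the obstacle (dance) case described in the next paragraph. The helper names `roll` and `dance` and the defaults $K=0.1$, $b=0.3$ are illustrative assumptions, not settings reported in this paper.

```python
import math

def roll(x_t, x_prev, x_worst, k=0.1, b=0.3, alpha=1):
    """Obstacle-free rolling, applied per dimension (hypothetical helper).

    x_t: current position x_i(t); x_prev: previous position x_i(t-1);
    x_worst: global worst position X^w; alpha is +1 or -1.
    k and b are assumed default values, not values from the paper.
    """
    return [xt + alpha * k * xp + b * abs(xt - xw)
            for xt, xp, xw in zip(x_t, x_prev, x_worst)]

def dance(x_t, x_prev, theta):
    """Obstacle case: move by tan(theta) * |x_i(t) - x_i(t-1)| in every dimension."""
    return [xt + math.tan(theta) * abs(xt - xp) for xt, xp in zip(x_t, x_prev)]

# Usage: one rolling step and one dance step for a 2-D beetle
print(roll([1.0, 2.0], [0.5, 1.0], [3.0, 3.0]))   # approximately [1.65, 2.4]
print(dance([1.0, 2.0], [0.0, 0.0], 0.0))          # theta = 0 leaves the position unchanged
```

The term $b\,|x_i(t)-X^{w}|$ pushes a beetle further when it is far from the worst position, which is how the "light intensity" idea turns into a step size.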
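When it meets an obstacle, the dung beetle dances on top of the dung ball to reorient itself. This behavior is modeled as
$$x_i(t+1)=x_i(t)+\tan\theta\times|x_i(t)-x_i(t-1)|,\tag{2}$$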
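where $\theta$ is the deflection angle. The breeding behavior is modeled as
$$B_i(t+1)=X^{*}_{\mathrm{part}}+b_1\times\bigl(B_i(t)-LB^{*}_{\mathrm{part}}\bigr)+b_2\times\bigl(B_i(t)-UB^{*}_{\mathrm{part}}\bigr),\tag{3}$$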
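where $B_i(t)$ is the position of the $i$-th brood ball; $b_1$ and $b_2$ are two independent vectors; $X^{*}_{\mathrm{part}}$ is the local best position; and $LB^{*}_{\mathrm{part}}$ and $UB^{*}_{\mathrm{part}}$ are the lower and upper bounds of the spawning region. The spawning region is restricted so that the beetles breed within a safe range, and it is computed as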
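$$LB^{*}_{\mathrm{part}}=\max\{X^{*}_{\mathrm{part}}\times(1-R),\,LB_{P}\},$$
$$UB^{*}_{\mathrm{part}}=\min\{X^{*}_{\mathrm{part}}\times(1+R),\,UB_{P}\},$$
where $LB_{P}$ and $UB_{P}$ are the lower and upper bounds of the actual optimization problem, and
$$R=1-t/T,$$
with $T$ the maximum number of iterations.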
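The spawning region and the brood-ball update can be sketched as follows, assuming the standard DBO form $B_i(t+1)=X^{*}_{\mathrm{part}}+b_1(B_i(t)-LB^{*}_{\mathrm{part}})+b_2(B_i(t)-UB^{*}_{\mathrm{part}})$. The helper names are hypothetical, and two details are assumptions the paper does not state: $b_1$ and $b_2$ are drawn uniformly from $[0,1)$ per dimension, and the new brood ball is clipped into the spawning region as one way to enforce the "safe range" described above.

```python
import random

def spawning_bounds(x_part, lb_p, ub_p, t, T):
    """Dynamic spawning region around the local best X*_part, with R = 1 - t/T."""
    R = 1 - t / T
    lo = [max(x * (1 - R), l) for x, l in zip(x_part, lb_p)]
    hi = [min(x * (1 + R), u) for x, u in zip(x_part, ub_p)]
    return lo, hi

def breed(ball, x_part, lb_p, ub_p, t, T, rng=random):
    """Brood-ball update B(t+1) = X*_part + b1*(B(t) - LB*) + b2*(B(t) - UB*)."""
    lo, hi = spawning_bounds(x_part, lb_p, ub_p, t, T)
    new = [x + rng.random() * (bi - l) + rng.random() * (bi - h)  # b1, b2 ~ U[0, 1) (assumed)
           for bi, x, l, h in zip(ball, x_part, lo, hi)]
    # Assumption: keep the brood ball inside the spawning region [lo, hi]
    return [min(max(v, l), h) for v, l, h in zip(new, lo, hi)]

# Usage: the region is wide at t = 0 and collapses onto X*_part at t = T
print(spawning_bounds([0.5, 0.5], [0.0, 0.0], [1.0, 1.0], 0, 10))  # ([0.0, 0.0], [1.0, 1.0])
print(breed([0.2, 0.8], [0.5, 0.5], [0.0, 0.0], [1.0, 1.0], 10, 10))  # [0.5, 0.5]
```

Because $R$ shrinks linearly from 1 to 0, breeding moves from broad exploration early in the run to purely local exploitation around $X^{*}_{\mathrm{part}}$ at the end.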
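Small dung beetles lack foraging experience, so an optimal foraging region is defined to guide them. Its lower and upper bounds are
$$LB^{*}_{\mathrm{all}}=\max\{X^{*}_{\mathrm{all}}\times(1-R),\,LB_{P}\},$$
$$UB^{*}_{\mathrm{all}}=\min\{X^{*}_{\mathrm{all}}\times(1+R),\,UB_{P}\},$$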
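where $LB^{*}_{\mathrm{all}}$ and $UB^{*}_{\mathrm{all}}$ are the lower and upper bounds of the optimal foraging region, and $X^{*}_{\mathrm{all}}$ is the global best foraging position. The position of a small dung beetle is updated as
$$x_i(t+1)=x_i(t)+C_1\times\bigl(x_i(t)-LB^{*}_{\mathrm{all}}\bigr)+C_2\times\bigl(x_i(t)-UB^{*}_{\mathrm{all}}\bigr),\tag{8}$$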
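where $C_1$ is a random number and $C_2$ is a random vector. The dung-ball stealing behavior is modeled as
$$x_i(t+1)=X^{*}_{\mathrm{all}}+\beta\times F\times\bigl(|x_i(t)-X^{*}_{\mathrm{part}}|+|x_i(t)-X^{*}_{\mathrm{all}}|\bigr),\tag{9}$$
where $F$ is a random vector following a normal distribution and $\beta$ is a constant.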
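The foraging and stealing updates can be sketched the same way. This sketch assumes the standard DBO choices, which the text above only partly specifies: $C_1$ is one standard-normal number, each component of $C_2$ is uniform on $[0,1)$, each component of $F$ is standard normal, and $\beta=0.5$. The small-beetle update is taken as $x_i(t+1)=x_i(t)+C_1(x_i(t)-LB^{*}_{\mathrm{all}})+C_2(x_i(t)-UB^{*}_{\mathrm{all}})$, with the foraging bounds built like the spawning bounds but centered on $X^{*}_{\mathrm{all}}$. The helper names are hypothetical.

```python
import random

def forage(x, x_all, lb_p, ub_p, t, T, rng=random):
    """Small dung beetle: search relative to the optimal foraging region around X*_all."""
    R = 1 - t / T
    lo = [max(g * (1 - R), l) for g, l in zip(x_all, lb_p)]  # LB*_all
    hi = [min(g * (1 + R), u) for g, u in zip(x_all, ub_p)]  # UB*_all
    c1 = rng.gauss(0.0, 1.0)                                  # C1: one normal random number (assumed)
    return [xi + c1 * (xi - l) + rng.random() * (xi - h)      # C2: one U[0, 1) value per dimension
            for xi, l, h in zip(x, lo, hi)]

def steal(x, x_part, x_all, beta=0.5, rng=random):
    """Thief: X*_all + beta * F * (|x - X*_part| + |x - X*_all|), F ~ N(0, 1) per dimension."""
    return [g + beta * rng.gauss(0.0, 1.0) * (abs(xi - p) + abs(xi - g))
            for xi, p, g in zip(x, x_part, x_all)]

# Usage: a thief already sitting on both best positions does not move
print(steal([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]))  # [1.0, 2.0]
```

A thief's step size grows with its distance from the two best positions, so thieves far from the best solutions make large jumps while those close to it refine it.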
1.1.2 LDBO Algorithm
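Although swarm intelligence optimization algorithms have strong search capability, their initial solutions are commonly too random and unevenly distributed. To address this, chaotic maps [15-16] have been introduced into the population initialization of swarm intelligence algorithms. DBO suffers from the same problem, so this paper introduces the Logistic chaotic map [17] into its population initialization:
$$X_{n+1}=\mu X_n(1-X_n),$$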
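where $\mu$ is the control parameter, taking values in $(0,4]$, and $X_n\in(0,1)$. The new algorithm is named LDBO. It replaces the population initialization of the original DBO with the Logistic chaotic map, which avoids excessive randomness in the initial population and also addresses the lack of diversity among the initial solutions.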
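A minimal sketch of chaotic initialization follows. It assumes that one Logistic sequence is iterated across all individuals and dimensions and that each value $X_n\in(0,1)$ is mapped linearly into the search range as $lb+X_n(ub-lb)$; the paper states neither this mapping, nor the seed $X_0=0.7$, nor $\mu=4$, so all three are assumptions. $\mu=4$ is a common choice because the Logistic map is fully chaotic there, whereas for $\mu<3$ the sequence settles to a fixed point and loses the intended diversity.

```python
def logistic_init(n, lb, ub, mu=4.0, x0=0.7):
    """Initialize n individuals in [lb, ub] with a Logistic chaotic sequence (hypothetical helper)."""
    z = x0
    population = []
    for _ in range(n):
        individual = []
        for l, u in zip(lb, ub):
            z = mu * z * (1.0 - z)              # Logistic map: X_{n+1} = mu * X_n * (1 - X_n)
            individual.append(l + z * (u - l))  # assumed linear mapping into [l, u]
        population.append(individual)
    return population

# Usage: 5 candidate (C, gamma) pairs in hypothetical ranges [0.01, 100]
for c, g in logistic_init(5, [0.01, 0.01], [100.0, 100.0]):
    print(f"C = {c:.4f}, gamma = {g:.4f}")
```

Seeds such as 0.5 (which maps to 1 and then to 0) or 0.75 (a fixed point when $\mu=4$) should be avoided, since they collapse the sequence.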
1.2 Support Vector Machine
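The SVM is a machine learning method that performs well on binary classification problems, especially on datasets with few samples [18-20]. Its core idea is to map low-dimensional features into a high-dimensional space through a kernel function, find the optimal separating hyperplane there, and use this hyperplane as the basis for subsequent classification; Figure 1 illustrates the optimal hyperplane. The SVM has two important internal parameters, the penalty factor and the kernel parameter, whose values directly affect its performance.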
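To show where these two parameters enter, the standard soft-margin SVM is written below. This assumes a radial basis function (RBF) kernel, which the paper does not name explicitly: $C$ plays the role of the penalty factor and $\gamma$ the kernel parameter. The bias is written $w_0$ to avoid a clash with the DBO constant $b$.

```latex
\min_{\mathbf{w},\,w_0,\,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2}+C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_i)+w_0\bigr)\ge 1-\xi_i,\quad \xi_i\ge 0,\quad i=1,\dots,n,
\qquad
K(\mathbf{x},\mathbf{z})=\phi(\mathbf{x})^{\top}\phi(\mathbf{z})=\exp\bigl(-\gamma\lVert\mathbf{x}-\mathbf{z}\rVert^{2}\bigr),\quad \gamma>0.
```

A larger $C$ penalizes margin violations more heavily (a tighter fit with a higher risk of overfitting), and a larger $\gamma$ makes the kernel more local; this is why the pair $(C,\gamma)$ is a natural search space for a swarm optimizer.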
1.3 Improved Support Vector Machine Model
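To improve the performance of the traditional SVM, this paper uses the newly constructed LDBO algorithm to optimize the penalty factor and kernel parameter of the SVM. The resulting model is named LDBO-SVM, and its pseudocode is given below.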
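Pseudocode of LDBO-SVM.
Initialize parameters;
Initialize the population with the Logistic chaotic map;
While (t < T)
  For i = 1:N
    If i ∈ ball-rolling beetles
      Update the beetle's position by Eq. (1) or Eq. (2);
    End if
    If i ∈ breeding beetles
      Update the brood-ball position by Eq. (3);
    End if
    If i ∈ small beetles
      Update the small beetle's position by Eq. (8);
    End if
    If i ∈ thief beetles
      Update the thief's position by Eq. (9);
    End if
  End for
  t = t + 1
End while
Output the best solution and its fitness value;
Assign the best solution to the SVM.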
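The pseudocode above can be turned into a compact, runnable Python sketch. It is not the authors' MATLAB implementation: the role split (about 20% rollers, 20% breeders, 25% small beetles, the rest thieves), the 0.9 probability of rolling versus dancing, $K=0.1$, $b=0.3$, $\beta=0.5$, and the fitness function are all assumptions. In LDBO-SVM the fitness would be an error measure of an SVM trained with the candidate $(C,\gamma)$; here a hypothetical smooth function `toy_svm_error` stands in so that the example runs without an SVM library.

```python
import math
import random

def ldbo(fitness, lb, ub, n=30, T=100, seed=0):
    """Simplified LDBO: Logistic chaotic initialization plus the four DBO update rules."""
    rng = random.Random(seed)

    def clip(v):                                         # keep a position inside [lb, ub]
        return [min(max(x, l), u) for x, l, u in zip(v, lb, ub)]

    z, pop = 0.7, []                                     # Logistic init (mu = 4), linear mapping
    for _ in range(n):
        ind = []
        for l, u in zip(lb, ub):
            z = 4.0 * z * (1.0 - z)
            ind.append(l + z * (u - l))
        pop.append(clip(ind))
    prev = [p[:] for p in pop]                           # x_i(t-1)
    fit = [fitness(p) for p in pop]
    k = min(range(n), key=fit.__getitem__)
    best, best_fit = pop[k][:], fit[k]                   # X*_all and its fitness
    history = [best_fit]
    n_roll, n_breed, n_small = n // 5, n // 5, n // 4    # assumed split; the rest are thieves

    for t in range(T):
        R = 1.0 - t / T
        part = pop[min(range(n), key=fit.__getitem__)]   # X*_part: best of current population
        worst = pop[max(range(n), key=fit.__getitem__)]  # X^w: worst of current population
        new_pop = []
        for i, x in enumerate(pop):
            if i < n_roll:                               # ball-rolling beetles
                if rng.random() < 0.9:                   # no obstacle: rolling (K=0.1, b=0.3)
                    a = rng.choice((1, -1))              # alpha
                    y = [xi + a * 0.1 * xp + 0.3 * abs(xi - xw)
                         for xi, xp, xw in zip(x, prev[i], worst)]
                else:                                    # obstacle: dance
                    th = rng.uniform(0.0, math.pi)
                    y = [xi + math.tan(th) * abs(xi - xp) for xi, xp in zip(x, prev[i])]
            elif i < n_roll + n_breed:                   # brood balls in the spawning region
                lo = [max(p * (1 - R), l) for p, l in zip(part, lb)]
                hi = [min(p * (1 + R), u) for p, u in zip(part, ub)]
                y = [p + rng.random() * (xi - l) + rng.random() * (xi - h)
                     for xi, p, l, h in zip(x, part, lo, hi)]
                y = [min(max(v, l), h) for v, l, h in zip(y, lo, hi)]
            elif i < n_roll + n_breed + n_small:         # small beetles in the foraging region
                lo = [max(g * (1 - R), l) for g, l in zip(best, lb)]
                hi = [min(g * (1 + R), u) for g, u in zip(best, ub)]
                c1 = rng.gauss(0.0, 1.0)
                y = [xi + c1 * (xi - l) + rng.random() * (xi - h)
                     for xi, l, h in zip(x, lo, hi)]
            else:                                        # thieves (beta = 0.5)
                y = [g + 0.5 * rng.gauss(0.0, 1.0) * (abs(xi - p) + abs(xi - g))
                     for xi, p, g in zip(x, part, best)]
            new_pop.append(clip(y))
        prev, pop = pop, new_pop
        fit = [fitness(p) for p in pop]
        k = min(range(n), key=fit.__getitem__)
        if fit[k] < best_fit:                            # keep the best solution found so far
            best, best_fit = pop[k][:], fit[k]
        history.append(best_fit)
    return best, best_fit, history

def toy_svm_error(params):
    """Hypothetical stand-in for SVM validation error, minimal at C = 10, gamma = 0.1."""
    c, gamma = params
    return (math.log10(c) - 1.0) ** 2 + (math.log10(gamma) + 1.0) ** 2

best, err, hist = ldbo(toy_svm_error, lb=[0.01, 0.01], ub=[100.0, 100.0])
print("best (C, gamma):", best, "toy error:", err)  # best would then be assigned to the SVM
```

Replacing `toy_svm_error` with a function that trains and validates an SVM for the given $(C,\gamma)$ turns this loop into the structure of the pseudocode; the final `best` corresponds to "assign the best solution to the SVM".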
2 Experiments and Analysis of Results
2.1 Dataset
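The experimental dataset was collected from several open-source platforms and contains 145 samples, each with 11 features. To avoid an overly narrow evaluation criterion, the features cover not only students' in-class information but also their extracurricular activities, so that the evaluation is comprehensive. Table 1 lists the features of the dataset. The samples are divided into two classes: fail and not fail.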
2.2 Experimental Setup
All experiments were conducted in the laboratory, and the models were implemented in MATLAB. To verify the effectiveness of the proposed model, it was compared with SVMs improved by five other swarm intelligence optimization algorithms: the dung beetle optimizer (DBO), the genetic algorithm (GA), particle swarm optimization (PSO), the whale optimization algorithm (WOA), and grey wolf optimization (GWO). All models were compared in terms of accuracy and training time.
2.3 Experimental Results
Figure 2 compares the accuracy of the models. As Figure 2 shows, the proposed LDBO-SVM model achieves the highest accuracy, 94.53%, indicating that it predicts student achievement reliably and that teachers can use its predictions as a basis for improving their teaching plans. The SVM optimized by the original DBO ranks second at 93.90%, 0.63 percentage points below LDBO-SVM, indicating that LDBO has stronger search capability and improves the traditional SVM more. WOA-SVM and GWO-SVM rank third and fourth, with accuracies of 93.82% and 93.79%, respectively. GA-SVM and PSO-SVM rank last, and GA-SVM has the lowest accuracy, only 93.19%. Overall, LDBO-SVM improves accuracy by 0.63 to 1.34 percentage points over the other models and can serve as an auxiliary tool for predicting student achievement. Table 2 compares the training times of the models.
Table 2 shows that GA-SVM has the longest training time, 3.87009 s, and DBO-SVM the shortest, only 3.54537 s. The proposed LDBO-SVM ranks second at 3.55180 s, only 0.00643 s slower than DBO-SVM, and it trains faster than all the remaining models. Although LDBO-SVM is not the fastest to train, it has the highest prediction accuracy, so its overall performance is the best.
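The percentage-point gaps quoted above can be reproduced directly from the reported figures. The short check below uses only numbers stated in the text; PSO-SVM is left out because its exact accuracy is not given.

```python
# Accuracies (%) and training times (s) as reported in the text; PSO-SVM omitted (value not stated)
accuracy = {"LDBO-SVM": 94.53, "DBO-SVM": 93.90, "WOA-SVM": 93.82,
            "GWO-SVM": 93.79, "GA-SVM": 93.19}
train_time = {"LDBO-SVM": 3.55180, "DBO-SVM": 3.54537, "GA-SVM": 3.87009}

# Percentage-point advantage of LDBO-SVM over each other model
gaps = {m: round(accuracy["LDBO-SVM"] - a, 2)
        for m, a in accuracy.items() if m != "LDBO-SVM"}
print(gaps)  # {'DBO-SVM': 0.63, 'WOA-SVM': 0.71, 'GWO-SVM': 0.74, 'GA-SVM': 1.34}

# Extra training time of LDBO-SVM relative to the fastest model
print(round(train_time["LDBO-SVM"] - train_time["DBO-SVM"], 5))  # 0.00643
```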
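In summary, to address the low accuracy of the traditional SVM, this paper proposes a new machine learning model, LDBO-SVM. A chaotic map is introduced into the population initialization of the original dung beetle optimization algorithm to construct the LDBO algorithm, and LDBO is integrated into the SVM to improve its performance, yielding the LDBO-SVM model. Experimental results show that the proposed model achieves the highest prediction accuracy among the compared models.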
References
[1] KHOEI T T, SLIMANE H O, KAABOUCH N. Deep Learning: Systematic Review, Models, Challenges, and Research Directions [J]. Neural Computing & Applications, 2023, 35(31): 23103-23124.
[2] WANG L N, ZHENG Y C, WEI H X, et al. Stretching Deep Architectures: A Deep Learning Method without Back-Propagation Optimization [J]. Electronics, 2023, 12(7): 1537-1-1537-21.
[3] VON RUEDEN L, MAYER S, BECKH K, et al. Informed Machine Learning: A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems [J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(1): 614-633.
[4] ZHANG Y D, GORRIZ J M, NAYAK D R. Optimization Algorithms and Machine Learning Techniques in Medical Image Analysis [J]. Mathematical Biosciences and Engineering, 2023, 20(3): 5917-5920.
[5] ZHOU C M, WANG Y, XUE Q, et al. Differentiation of Bone Metastasis in Elderly Patients with Lung Adenocarcinoma Using Multiple Machine Learning Algorithms [J]. Cancer Control, 2023, 30: 1-9.
[6] INNAN N, KHAN M A Z, PANDA B, et al. Enhancing Quantum Support Vector Machines through Variational Kernel Training [J]. Quantum Information Processing, 2023, 22: 374-1-374-18.
[7] SHRIVASTAVA S, SHUKLA S, KHARE N. Support Vector Machine with Eagle Loss Function [J]. Expert Systems with Applications, 2024, 238: 112168-1-112168-16.
[8] RIZWAN A, IQBAL N, AHMAD R, et al. WR-SVM Model Based on the Margin Radius Approach for Solving the Minimum Enclosing Ball Problem in Support Vector Machine Classification [J]. Applied Sciences-Basel, 2021, 11(10): 4657-1-4657-21.
[9] CHEN S H, ZHANG C Q, YI J P. Time-Optimal Trajectory Planning for Woodworking Manipulators Using an Improved PSO Algorithm [J]. Applied Sciences-Basel, 2023, 13(18): 10482-1-10482-22.
[10] HSIEH C H, ZHANG Q, XU Y, et al. CMAIS-WOA: An Improved WOA with Chaotic Mapping and Adaptive Iterative Strategy [J]. Discrete Dynamics in Nature and Society, 2023, 2023: 8160121-1-8160121-18.
[11] BANAIE-DEZFOULI M, NADIMI-SHAHRAKI M H, BEHESHTI Z. BE-GWO: Binary Extremum-Based Grey Wolf Optimizer for Discrete Optimization Problems [J]. Applied Soft Computing, 2023, 146: 110583-1-110583-18.
[12] HUANG W C, LIU H Y, ZHANG Y, et al. Railway Dangerous Goods Transportation System Risk Identification: Comparisons among SVM, PSO-SVM, GA-SVM and GS-SVM [J]. Applied Soft Computing, 2021, 109: 107541-1-107541-16.
[13] HUANG Q H, WANG C, YE Y, et al. Recognition of EEG Based on Improved Black Widow Algorithm Optimized SVM [J]. Biomedical Signal Processing and Control, 2023, 81: 104454-1-104454-11.
[14] LI J, LIU H, SUN S B, et al. Prediction of Complex Acute Appendicitis Based on HGS-MSVM [J]. IEEE Access, 2023, 11: 84336-84345.
[15] ADHIKARI S, KARFORMA S. An Efficient Image Encryption Method Using Henon-Logistic-Tent Chaotic Pseudo Random Number Sequence [J]. Wireless Personal Communications, 2023, 129(4): 2843-2859.
[16] HU A Q, GONG X X, GUO L. Joint Encryption Model Based on a Randomized Autoencoder Neural Network and Coupled Chaos Mapping [J]. Entropy, 2023, 25(8): 1153-1-1153-24.
[17] DONG Y M, YIN C H, XU C, et al. A Quantum Image Encryption Method for Dual Chaotic Systems Based on Quantum Logistic Mapping [J]. Physica Scripta, 2024, 99: 015103-1-015103-18.
[18] QIN Z F, LI Q Q. An Uncertain Support Vector Machine with Imprecise Observations [J]. Fuzzy Optimization and Decision Making, 2023, 22(4): 611-629.
[19] WANG F, XIE K L, HAN L, et al. Research on Support Vector Machine Optimization Based on Improved Quantum Genetic Algorithm [J]. Quantum Information Processing, 2023, 22(10): 380-1-380-27.
[20] LI J Y, CHAO S W. A Novel Twin-Support Vector Machines Method for Binary Classification to Imbalanced Data [J]. Journal of Intelligent & Fuzzy Systems, 2023, 44(4): 6901-6910.
(Editor in charge: HAN Xiao)