Multi-objective optimization model for unmanned aerial vehicle trajectories based on decomposition and trajectory search
LIU Junyan1, JIANG Feibo1*, PENG Yubo1, DONG Li2
(1. College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan 410081, China; 2. School of Computer Science, Hunan University of Technology and Business, Changsha, Hunan 410205, China) (*Corresponding author, e-mail: jiangfb@hunnu.edu.cn)
Traditional Deep Learning (DL)-based multi-objective solvers suffer from low model utilization and a tendency to fall into local optima. To address these problems, a multi-objective optimization model for UAV trajectories based on decomposition and trajectory search (DTMO-UT) was proposed. The model consists of an encoding part and a decoding part. First, the encoding part comprises a device encoder (Dencoder) and a weight encoder (Wencoder), which extract the state information of Internet of Things (IoT) devices and the features of the weight vectors; each weight vector represents one scalar optimization sub-problem obtained by decomposing the Multi-objective Optimization Problem (MOP), so solving all sub-problems solves the MOP. Because the Wencoder can encode every sub-problem, the utilization of the model is improved. Then the decoding part, which contains a trajectory decoder (Tdecoder), decodes the encoded features to generate Pareto-optimal solutions. Finally, to mitigate the tendency of the greedy strategy to fall into local optima, a trajectory search technique was designed for the Tdecoder: multiple candidate trajectories are generated and the one with the best scalar value is selected as the Pareto-optimal solution, which enhances the exploration ability of the Tdecoder during trajectory planning and yields a higher-quality Pareto set. Simulation results show that, compared with mainstream DL-based MOP solvers, the proposed model reduces the number of model parameters by 98.93% while improving the distribution of the MOP solutions by 0.076%, the spread by 0.014%, and the average overall performance by 1.23%, demonstrating strong practical trajectory-planning ability.
trajectory planning; deep learning; multi-objective optimization; decomposition; Pareto set
The rapid development of Internet of Things (IoT) technology has improved the Quality of Service (QoS) of many delay-sensitive applications, such as intelligent transportation [1], intelligent assisted healthcare [2], and emergency rescue [3]. In these applications, the collected information must be transmitted to a central server for analysis and decision-making as quickly as possible, since outdated information may cause serious control errors or safety accidents. Keeping information fresh is therefore crucial [4], and the metric that measures freshness is the Age of Information (AoI); optimizing the AoI of data-collection systems has attracted wide attention in recent years [5]. With their strong mobility and flexibility, Unmanned Aerial Vehicles (UAVs) are well suited to act as data collectors in such systems [6]. A UAV can follow a planned trajectory, collect the data of every IoT Device (IoTD) along the route, and then return to the data center, enabling efficient data collection. Compared with unmanned ground vehicles and base stations, however, a UAV is more constrained by its onboard storage and battery capacity, so the design of a UAV-assisted data-collection system usually has to account for the UAV's storage constraints and energy optimization. Energy consumption and AoI compete with each other, which makes them hard to minimize simultaneously [7]: reducing the system's energy consumption lengthens the AoI of the collected data, while shortening the AoI increases the total energy consumption. The optimization problem is therefore a typical Multi-objective Optimization Problem (MOP) [8]. In trajectory planning, multi-objective optimization involves several objectives that usually compete with one another; an MOP seeks to simultaneously maximize or minimize multiple conflicting objectives [9]. Unlike a classical single-objective problem, an MOP has no single optimal solution, so traditional single-objective optimizers [10] struggle to solve it [9]. Instead, a set of trade-off solutions is obtained by coordinating the optimization of the multiple objectives [7, 11]; each trade-off solution is called a Pareto-optimal solution, and the set of them is called the Pareto set. In the scenario studied in this paper, a Pareto-optimal solution is a UAV trajectory that is not Pareto-dominated: if no other trajectory is at least as good in both energy consumption and AoI and strictly better in at least one, the trajectory is Pareto-optimal.
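The Pareto-dominance relation described above can be made concrete with a short sketch (illustrative Python, not part of the proposed model): a trajectory's objective pair (energy, AoI) is Pareto-optimal when no other pair dominates it.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective (energy, AoI)
    and strictly better in at least one; both objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    # Keep the trajectories whose objective pairs no other pair dominates.
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]
```

For example, among the pairs (1, 5), (2, 2), (3, 3), (5, 1), the pair (3, 3) is dominated by (2, 2), and the other three form the Pareto front.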
Pareto sets can be computed exactly or approximately. Because the number of solutions in a Pareto set is large and the difficulty of finding each Pareto-optimal solution grows exponentially with problem size, exact methods are impractical for large-scale multi-objective problems [12]; approximate methods that find near-optimal solutions in reasonable time are more practical. Srinivas et al. [13] proposed the Non-dominated Sorting Genetic Algorithm (NSGA), which generates a large number of solutions and filters out the non-dominated ones as Pareto-optimal solutions. Deb et al. [14] introduced an elitist strategy into NSGA, producing the well-known second-generation NSGA (NSGA-II) and greatly improving efficiency. Bozorgchenani et al. [7] used NSGA to optimize energy consumption and processing delay in mobile edge computing systems, and Ghambari et al. [15] improved NSGA to optimize energy consumption and efficiency in UAV trajectory planning. NSGA-style methods obtain a Pareto set quickly and intuitively; however, they must handle multi-objective fitness assignment and maintain solution diversity, which demands considerable expert knowledge and makes them hard to apply [16]. Decomposition-based methods, which have lower complexity and are easier to implement, have therefore become the mainstream approach to solving MOPs [9]. Decomposition splits an MOP into several scalar optimization sub-problems; each sub-problem aggregates the multiple objective values into one scalar value with a weight vector, and the goal is to minimize that scalar. The optimal solution of each sub-problem is a Pareto-optimal solution [16], so solving all sub-problems yields the whole Pareto set. Among traditional decomposition-based methods, heuristic search is the mainstream. Zhang et al. [16] combined decomposition with an evolutionary algorithm and a neighborhood strategy for population optimization, proposing MOEA/D (Multi-objective Evolutionary Algorithm based on Decomposition). Zhou et al. [17] built on MOEA/D with an improved Gaussian mixture model to model the population and sample new individuals, updating the population with a greedy strategy to obtain the whole Pareto set. Hou et al. [18] used a hybrid crossover strategy to exploit the strengths of different crossover operators and combined it with a local search strategy tailored to the convergence behavior of the evolutionary process. Classical single-objective optimizers can also solve the sub-problems produced by decomposition; the best-known tool is Google's OR-Tools (Operations Research Tools) [19], which integrates and improves many classical optimizers such as Concorde [20] and Gurobi [21] and adaptively selects among them according to problem size and data characteristics. Compared with Pareto-dominance-based methods, decomposition-based methods have lower complexity and are easier to implement [16].
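The weighted-sum decomposition described above can be sketched as follows (illustrative Python; the uniform weight grid mirrors the 101-sub-problem setting used later in the experiments and is an assumption, not the authors' exact scheme):

```python
def scalarize(objectives, weight):
    # Aggregate the objective values (energy, AoI) into one scalar:
    # each weight vector defines one scalar optimization sub-problem.
    return sum(w * f for w, f in zip(weight, objectives))

# A uniform grid of weight vectors, one per sub-problem.
weights = [(i / 100, 1 - i / 100) for i in range(101)]

# Minimizing scalarize(.., w) for every w in `weights` yields one
# Pareto-optimal solution per sub-problem, i.e. the whole Pareto set.
```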
Among Deep Learning (DL)-based optimization methods, many studies address single-objective problems, e.g., Recurrent Neural Networks (RNN) [22], Pointer Networks (PN) [23], and the Attention Model (AM) [24]. Compared with heuristic search, DL-based methods run much faster, and their overall performance now approaches or even surpasses that of heuristics. For DL-based MOP solving, Li et al. [25] combined decomposition with a PN to solve each sub-problem, training multiple models to fit different weight vectors. Wu et al. [26] combined decomposition with the AM to solve all sub-problems and used a joint optimization strategy so that models with similar weight vectors solve sub-problems jointly. Zhang et al. [27] applied an evolutionary algorithm to further optimize the AM's solutions to all sub-problems and fine-tuned the model with the optimized results, also handling time windows in multi-objective optimization. Dong et al. [28] proposed a multi-objective antenna design method based on fully connected neural networks, optimized with a multi-objective particle swarm algorithm. Huang et al. [29] used several RNNs as sub-problem solvers to optimize the pollutant emissions and overall operating cost of integrated energy systems. However, these methods must train multiple models to fit sub-problems with different weight vectors, so model utilization is low [9]; moreover, when generating trajectories, DL methods usually output all visited nodes with a greedy strategy, which easily falls into local optima [30].
The main contributions of this paper are as follows:
1) The optimization of energy consumption and AoI in a UAV-assisted data-collection system is first modeled as an MOP, which is solved by finding its Pareto set. Decomposition converts the MOP into multiple scalar optimization sub-problems, each of which minimizes one scalar objective value by optimizing the UAV trajectories; solving all sub-problems yields the Pareto set, i.e., all optimal UAV trajectories under the trade-offs between energy consumption and AoI. A high-performance DTMO-UT model is proposed to solve all sub-problems and obtain a high-quality Pareto set.
2) In the encoder part of the DTMO-UT model, a Device encoder (Dencoder) and a Weight encoder (Wencoder) are proposed to extract the features of the IoTD state information and the weight vectors respectively, where the state information of an IoTD contains its position and the amount of data to be transmitted. Together, the Dencoder and Wencoder can encode every scalar optimization sub-problem, so a single model obtains the whole Pareto set, improving the parameter utilization of the model.
3) Trajectory search is added to the Trajectory decoder (Tdecoder). When solving a sub-problem, the Tdecoder first outputs multiple candidate trajectories whose initially visited IoTDs all differ, then selects the candidate with the best scalar value as the Pareto-optimal solution. Trajectory search strengthens the Tdecoder's global search ability and produces a higher-quality Pareto set.
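The trajectory search of contribution 3) can be sketched as follows (a toy Python sketch: `policy_score` stands in for the learned decoder policy and `scalar_value` for the weighted objective; both names are illustrative assumptions, not the paper's implementation):

```python
def greedy_rollout(start, nodes, policy_score):
    """Build one candidate trajectory: force a distinct first visit,
    then extend greedily with the (stub) policy score."""
    route = [start]
    remaining = [n for n in nodes if n != start]
    while remaining:
        nxt = max(remaining, key=lambda n: policy_score(route[-1], n))
        route.append(nxt)
        remaining.remove(nxt)
    return route

def trajectory_search(nodes, policy_score, scalar_value):
    # One candidate per distinct first IoTD; keep the best scalar value.
    candidates = [greedy_rollout(s, nodes, policy_score) for s in nodes]
    return min(candidates, key=scalar_value)
```

A purely greedy decoder would emit only the first candidate; evaluating every distinct starting IoTD and keeping the best scalar value is what mitigates local optima.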
Fig. 1 Multi-UAV-assisted data collection system
The transmission energy consumption (T stands for Transferring) is:
The system in this paper has two optimization objectives: one is to minimize the energy consumption, and the other is to minimize the AoI of the collected data. The system energy consumption is defined as:
The system AoI is defined as the maximum AoI over all UAVs:
The optimization objective of this paper is therefore:
Fig. 2 Training and validation workflow of the DTMO-UT model
To obtain a high-performance DTMO-UT model, the training stage generates a large amount of training data; each training sample contains randomly distributed IoTDs, different amounts of data to be transmitted, and one corresponding preference weight. For each sample, DTMO-UT samples many trajectories to compute the policy gradient and update the model, and training finishes when the maximum number of iterations is reached. In the validation stage, the Dencoder and Wencoder encode the IoTD state information and the weight vector to extract features, and the Tdecoder decodes them to generate solutions; trajectory search helps the Tdecoder escape local optima and find better solutions by generating multiple trajectories and keeping the best one. Once all scalar optimization sub-problems are solved, the Pareto set is obtained. The DTMO-UT model thus achieves high parameter utilization and a high-quality Pareto set. The mapping of all Pareto-optimal solutions into the multi-objective space is called the Pareto front, which also reflects the overall performance of a multi-objective optimization algorithm. Since the objectives studied here are energy consumption and AoI, each Pareto-optimal solution maps to one point, and the closer the Pareto front lies to the lower left, the higher the quality of the Pareto set. Figure 3 shows the process of obtaining the Pareto front.
Fig. 3 Process of obtaining the Pareto front with the DTMO-UT model
When handling the MOP, this paper adopts the decomposition method of MOEA/D to split the energy and AoI optimization problem into several scalar optimization sub-problems represented by a set of weight vectors. Unlike MOEA/D's heuristic search, however, the sub-problems are solved with the DL-based DTMO-UT model, which extracts the features of the IoTD information and the weight vectors and generates the corresponding Pareto-optimal solutions. Moreover, the DTMO-UT model accepts any real-valued weight vector as input, so it can solve an unlimited number of sub-problems, whereas MOEA/D can only handle a fixed number.
Fig. 4 Structure of the DTMO-UT model
The Dencoder mainly extracts the IoTD state information. The IoTDs in the scenario are stored as a graph containing node information and edge information. The Dencoder consists of three parts: a graph embedding layer, an LSA layer, and a feature merging layer.
2.2.1 Graph embedding layer
This paper proposes a Dencoder based on graph embedding and the LSA mechanism to encode the IoTD state information. Graph embedding effectively extracts the graph information of the IoTDs, so the graph embedding layer is implemented with a Structure2Vector (S2V) [31] architecture. The attention mechanism lets a model focus on important feature information, expressing the degree of attention through weighting [32]; the LSA mechanism lets the model spontaneously attend differently to the input information, extracting IoTD self-attention features so as to produce a set of Pareto-optimal UAV trajectories under the current weight vector. Finally, a feature merging layer merges all IoTD self-attention features into one global graph feature.
2.2.2 Linear self-attention layer
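As a sketch of what a linear self-attention layer computes (a minimal NumPy version of kernelized linear attention in the style of [35]; the ELU feature map and the weight shapes are assumptions, not the paper's exact layer):

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = ELU(x) + 1 keeps the kernel features strictly positive.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_self_attention(X, Wq, Wk, Wv):
    """Linear attention: phi(Q) (phi(K)^T V) avoids the n x n score
    matrix, so the cost grows linearly in the number of IoTDs n."""
    Q = elu_feature_map(X @ Wq)
    K = elu_feature_map(X @ Wk)
    V = X @ Wv
    KV = K.T @ V                  # (d_k, d_v), independent of n
    Z = Q @ K.sum(axis=0)         # per-query normalizer, shape (n,)
    return (Q @ KV) / Z[:, None]
```

By associativity, this produces exactly the same output as forming the full positive score matrix phi(Q) phi(K)^T and row-normalizing it, but without ever materializing it.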
2.2.3 Feature merging layer
After the IoTD self-attention features are obtained, they are merged. Kool et al. [24] merge LSA features by averaging; although averaging is very efficient, it easily causes an unrecoverable loss of features [37]. This paper therefore merges all IoTD self-attention features into a global graph feature with a learnable linear feature merging layer:
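A learnable linear merge of this kind can be contrasted with average pooling as follows (a NumPy sketch under assumed shapes; the flatten-then-project form is an illustrative assumption, not the paper's exact equation):

```python
import numpy as np

def merge_mean(H):
    # AM-style average pooling of per-IoTD features [24]: cheap, but
    # it discards which IoTD contributed what.
    return H.mean(axis=0)

def merge_linear(H, W):
    # Learnable merge: flatten all n*d features and project back to d,
    # so the layer can weight each IoTD's contribution differently.
    return H.reshape(-1) @ W

rng = np.random.default_rng(0)
n, d = 5, 8                              # 5 IoTDs, feature width 8
H = rng.normal(size=(n, d))              # per-IoTD self-attention features
W = rng.normal(size=(n * d, d)) / np.sqrt(n * d)
g = merge_linear(H, W)                   # global graph feature, shape (d,)
```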
2.4.1 Scenario state under trajectory search
2.4.2 Linear self-attention output layer
To train a high-performance DTMO-UT model on unlabeled data, this section introduces a DL training method for the MOP. The state, action, and reward are defined as follows.
State: at each step of trajectory planning, the input state is:
Action: the action of the DTMO-UT model is to let a UAV select an IoTD for data collection or return to the data center:
Reward: the reward is the negative of the weighted sum of the energy consumption and the AoI under the weight vector:
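The policy-gradient update implied by this state/action/reward setup can be sketched as a toy REINFORCE step for a categorical policy (the learning rate and baseline handling are illustrative assumptions, not the paper's training code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_step(logits, action, reward, baseline, lr=0.1):
    """One REINFORCE update: for a categorical policy,
    grad log pi(action) = onehot(action) - probs, scaled by the
    advantage (reward - baseline)."""
    probs = softmax(logits)
    grad_logp = -probs
    grad_logp[action] += 1.0
    return logits + lr * (reward - baseline) * grad_logp
```

Repeating the update with a positive advantage for one action raises its probability, which is how sampled trajectories with high reward (low weighted energy/AoI cost) are reinforced.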
This paper verifies the efficiency of the DTMO-UT model through a comparison of DL model parameters and a comparison of overall performance against mainstream methods. To ensure convergence, training uses a learning rate of 0.001, a learning-rate decay of 0.96, and 200 training iterations [9]. The experiments run on Windows 11 with the PyTorch 1.7.1 framework, an NVIDIA RTX 3060 GPU, an i7-11500 CPU, and 16 GB of RAM.
First, the DTMO-UT model is compared, in parameter count and in the number of solvable sub-problems, with MODRL/D-AM (Multiple Objective Deep Reinforcement Learning using Attention Model) [26], which is likewise based on decomposition and the AM; D (Decomposition) denotes the decomposition method commonly used in multi-objective optimization. MODRL/D-AM decomposes the multi-objective problem into 101 scalar optimization sub-problems [9,16,39]. The comparison results are shown in Table 1.
Table 1 Comparison of parameter counts and numbers of solvable sub-problems
In many decomposition-based DL methods, the number of sub-problems is 101 [25-27]. Because MODRL/D-AM must optimize a separate set of parameters for each sub-problem, its total parameter count far exceeds that of the DTMO-UT model: DTMO-UT uses 98.93% fewer parameters than MODRL/D-AM. Moreover, MODRL/D-AM supports only a finite set of weight vectors for the scalar sub-problems, whereas the Wencoder of the DTMO-UT model accepts any real-valued weight vector and can therefore solve a flexible number of sub-problems. Table 1 shows that the DTMO-UT model uses fewer parameters and supports a flexible number of weight vectors.
To examine the overall performance of the DTMO-UT model, this paper compares Pareto fronts. The Pareto front is the mapping of all Pareto-optimal solutions into the multi-objective solution space; since each solution has two metrics, energy consumption and AoI, it maps onto a two-dimensional plane. Besides the MODRL/D-AM model introduced in the previous subsection, the baselines are OR-Tools [19], NSGA-II [14], MOEA/D [16], and the DTMO-UT model without trajectory search (DTMO-UTN). The parameter settings are:
1) MOEA/D: 100 sub-problems, 10 neighboring weight vectors, population size 100, individual length 150, single-point crossover probability 0.9, mutation probability 0.01, 150 iterations.
2) NSGA-II: population size 2 000, individual length 150, crossover probability 0.9, mutation probability 0.01, 25 000 iterations.
3) OR-Tools: 101 sub-problems, each defined by weighted-sum aggregation.
4) MODRL/D-AM: 100 sub-problems, hidden-layer dimension 128, training batch size 200, total dataset size 500 000.
As Figure 5 shows, the Pareto fronts obtained by all methods are concave. The curves of DTMO-UT and MODRL/D-AM are close to each other because both models are based on the AM and trained with deep reinforcement learning, and both outperform NSGA-II, MOEA/D, and OR-Tools.
Fig. 5 Comparison of the Pareto fronts obtained by all methods in different scenarios
Table 2 Comparison of distribution and spread metrics
Table 2 shows that the DTMO-UT model outperforms the other methods in both distribution and spread because: 1) the decomposition method more easily yields a uniform solution set; and 2) the trajectory search of the DTMO-UT model improves the quality of each sub-problem's solution and hence the ability to reach extreme values. Among the heuristics, NSGA-II is not decomposition-based and tends to produce an uneven solution set, so MOEA/D obtains a better solution set than NSGA-II.
Table 3 shows that the DRL-based DTMO-UT, DTMO-UTN, and MODRL/D-AM outperform the heuristic-search-based NSGA-II, MOEA/D, and OR-Tools in both performance and running time on all test data.
In the running-time comparison, MODRL/D-AM is faster than DTMO-UTN and DTMO-UT, and all three are far faster than NSGA-II, MOEA/D, and OR-Tools, because: 1) Google has tuned the optimizers in OR-Tools, making it faster than NSGA-II and MOEA/D; 2) DL-based methods need no repeated iterations and benefit from parallel computation, so they run far faster than NSGA-II and MOEA/D; 3) MODRL/D-AM has only one encoder for the IoTD information and a simpler structure than DTMO-UT, so it runs faster; and 4) the trajectory search of DTMO-UT generates several candidate trajectories, so DTMO-UTN runs faster than DTMO-UT. However, as Tables 1 and 3 show, MODRL/D-AM needs far more storage and performs worse than DTMO-UT, while trajectory search costs DTMO-UT little efficiency and improves its performance.
In the performance comparison, DTMO-UT achieves the highest values on every HV metric, followed by DTMO-UTN and MODRL/D-AM, then OR-Tools, MOEA/D, and NSGA-II, because: 1) models trained with DRL generalize better; 2) self-attention models extract features better than heuristic search algorithms; and 3) the trajectory search of the DTMO-UT model mitigates local optima, yielding a higher-quality Pareto set, which raises the HV value and lowers the standard deviation over repeated tests.
Table 3 Comparison of the maximum, minimum, mean, and standard deviation of HV values and of algorithm running times
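The HV (hypervolume) indicator compared in Table 3 can be computed for a two-objective minimization front as follows (a standard textbook sketch, not the authors' implementation; `ref` is a reference point worse than every solution in both objectives):

```python
def hypervolume_2d(front, ref):
    """Area dominated by a 2-D minimization front, bounded by `ref`.
    Larger HV = front closer to the lower left = better Pareto set."""
    pts = sorted(front)                  # ascending in the first objective
    nd = []                              # non-dominated subset
    for p in pts:
        if not nd or p[1] < nd[-1][1]:
            nd.append(p)
    hv, prev = 0.0, ref[1]
    for e, a in nd:                      # sweep, adding rectangle strips
        hv += (ref[0] - e) * (prev - a)
        prev = a
    return hv
```

For the front {(1, 5), (2, 2)} with reference point (10, 10), the dominated area is 69.0; adding a dominated point such as (3, 3) leaves the HV unchanged.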
This paper studies an MOP that minimizes the energy consumption and AoI of a data-collection system by optimizing UAV trajectories, solving it by finding the Pareto set. Because traditional methods for finding Pareto sets suffer from low model utilization and easily fall into local optima, this paper proposes the DTMO-UT model on top of the decomposition method. The encoding part of the model encodes the IoTD information and the weight vectors to obtain the features of all sub-problems, so that one model solves all sub-problems and model utilization improves. The decoder part adopts trajectory search, which improves the model's exploration ability during trajectory generation and the quality of the Pareto set, mitigating local optima.
Nevertheless, the model still has limitations. The MOP studied here considers only the minimization of energy consumption and AoI, while real data-collection systems have many more metrics to optimize, such as the number of UAVs and the task completion time. Future work will consider more optimization metrics and select the most suitable optimization method according to the characteristics of those metrics.
[1] CUI Q, WANG Y, CHEN K-C, et al. Big data analytics and network calculus enabling intelligent management of autonomous vehicles in a smart city [J]. IEEE Internet of Things Journal, 2019, 6(2): 2021-2034.
[2] VERMA P,SOOD S K. Fog assisted-IoT enabled patient health monitoring in smart homes [J]. IEEE Internet of Things Journal, 2018, 5(3): 1789-1796.
[3] PATHAK N, DEB P K, MUKHERJEE A, et al. IoT-to-the-rescue: A survey of IoT solutions for COVID-19-like pandemics [J]. IEEE Internet of Things Journal, 2021, 8(17): 13145-13164.
[4] HU H, XIONG K, QU G, et al. AoI-minimal trajectory planning and data collection in UAV-assisted wireless powered IoT networks [J]. IEEE Internet of Things Journal, 2021, 8(2): 1211-1223.
[5] YATES R D, SUN Y, BROWN D R, et al. Age of information: An introduction and survey [J]. IEEE Journal on Selected Areas in Communications, 2021, 39(5): 1183-1210.
[6] CHEN Z, CHI K, ZHENG K, et al. Minimization of transmission completion time in UAV-enabled wireless powered communication networks [J]. IEEE Internet of Things Journal, 2020, 7(2): 1245-1259.
[7] BOZORGCHENANI A, MASHHADI F, TARCHI D, et al. Multi-objective computation sharing in energy and delay constrained mobile edge computing environments [J]. IEEE Transactions on Mobile Computing, 2021, 20(10): 2992-3005.
[8] LIAO Y, FRIDERIKOS V. Energy and age Pareto optimal trajectories in UAV-assisted wireless data collection [J]. IEEE Transactions on Vehicular Technology, 2022, 71(8): 9101-9106.
[9] LIN X, YANG Z, ZHANG Q. Pareto set learning for neural multi-objective combinatorial optimization [EB/OL]. (2022-03-29)[2022-08-10]. https://arxiv.org/pdf/2203.15386.pdf.
[10] HELSGAUN K. An effective implementation of the Lin–Kernighan traveling salesman heuristic [J]. European Journal of Operational Research, 2000, 126(1): 106-130.
[11] XIAO X W, XIAO D, LIN J G, et al. Overview on multi-objective optimization problem research [J]. Application Research of Computers, 2011, 28(3): 805-808, 827. (in Chinese)
[12] FLORIOS K, MAVROTAS G. Generation of the exact Pareto set in multi-objective traveling salesman and set covering problems [J]. Applied Mathematics and Computation, 2014, 237: 1-19.
[13] SRINIVAS N, DEB K. Muiltiobjective optimization using nondominated sorting in genetic algorithms [J]. Evolutionary Computation, 1994, 2(3): 221-248.
[14] DEB K, PRATAP A, AGARWAL S, et al. A fast and elitist multiobjective genetic algorithm: NSGA-II [J]. IEEE Transactions on Evolutionary Computation, 2002, 6(2): 182-197.
[15] GHAMBARI S, GOLABI M, LEPAGNOT J, et al. An enhanced NSGA-II for multiobjective UAV path planning in urban environments [C]// Proceedings of the 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence. Piscataway: IEEE, 2020: 106-111.
[16] ZHANG Q, LI H. MOEA/D: A multiobjective evolutionary algorithm based on decomposition [J]. IEEE Transactions on Evolutionary Computation, 2007, 11(6): 712-731.
[17] ZHOU A M, ZHANG Q F, ZHANG G X. Multi-objective evolutionary algorithm based on mixed Gaussian models [J]. Journal of Software, 2014, 25(5): 913-928. (in Chinese)
[18] HOU W, DONG H B, YIN G S. Enhanced multi-objective evolutionary algorithm based on decomposition [J]. Computer Science, 2014, 41(2): 114-118. (in Chinese)
[19] LAURENT P, VINCENT F. OR-Tools [EB/OL]. (2022-11-25)[2023-11-29]. https://developers.google.com/optimization/.
[20] COOK W. Concorde [EB/OL]. (2003-11-05)[2022-12-05]. https://www.math.uwaterloo.ca/tsp/concorde.html.
[21] GU Z H, ROTHBERG E, BIXBY R. Gurobi [EB/OL]. (2021-11-16)[2022-12-05]. https://www.gurobi.com.
[22] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks [C]// Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2. New York: ACM, 2014: 3104-3112.
[23] VINYALS O, FORTUNATO M, JAITLY N. Pointer networks [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 2. New York: ACM, 2015: 2692-2700.
[24] KOOL W, VAN HOOF H, WELLING M. Attention, learn to solve routing problems![EB/OL]. (2018-03-22)[2022-08-29]. https://arxiv.org/pdf/1803.08475.pdf
[25] LI K, ZHANG T, WANG R. Deep reinforcement learning for multiobjective optimization [J]. IEEE Transactions on Cybernetics, 2021, 51(6): 3103-3114.
[26] WU H, WANG J, ZHANG Z. MODRL/D-AM: multiobjective deep reinforcement learning algorithm using decomposition and attention model for multiobjective optimization [C]// Proceedings of the 11th International Symposium on Artificial Intelligence Algorithms and Applications. Singapore: Springer, 2020: 575-589.
[27] ZHANG Y, WANG J, ZHANG Z, et al. MODRL/D-EL: multiobjective deep reinforcement learning with evolutionary learning for multiobjective optimization [C]// Proceedings of the 2021 International Joint Conference on Neural Networks. Piscataway: IEEE, 2021: 1-8.
[28] DONG J, QIN W W, LI Y J, et al. Fast multi-objective antenna design based on improved back propagation neural network surrogate model [J]. Journal of Electronics & Information Technology, 2018, 40(11): 2712-2719. (in Chinese)
[29] HUANG B N, WANG Y, LI Y S, et al. Multi-objective optimal scheduling of integrated energy systems based on distributed neurodynamic optimization [J]. Acta Automatica Sinica, 2022, 48(7): 1718-1736. (in Chinese)
[30] KWON Y D, CHOO J, KIM B, et al. POMO: policy optimization with multiple optima for reinforcement learning [J]. Advances in Neural Information Processing Systems, 2020, 33: 21188-21198.
[31] DAI H, KHALIL E, ZHANG Y, et al. Learning combinatorial optimization algorithms over graphs [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6351-6361.
[32] YANG L, WANG S, CHEN X, et al. High-fidelity permeability and porosity prediction using deep learning with the self-attention mechanism [J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(7): 3429-3443.
[33] NAIR V, HINTON G E. Rectified linear units improve restricted Boltzmann machines[C]// Proceedings of the 27th International Conference on Machine Learning. New York: ACM, 2010: 807-814.
[34] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [J]. Advances in Neural Information Processing Systems, 2017, 30: 6000-6010.
[35] KATHAROPOULOS A, VYAS A, PAPPAS N, et al. Transformers are RNNs: fast autoregressive transformers with linear attention[C]// Proceedings of the 37th International Conference on Machine Learning. New York: ACM, 2020: 5156-5165.
[36] CLEVERT D-A, UNTERTHINER T, HOCHREITER S. Fast and accurate deep network learning by exponential linear units (ELUs)[EB/OL]. (2015-11-23)[2022-12-26]. https://arxiv.org/pdf/1511.07289.pdf.
[37] FANELLO S R, NOCETI N, CILIBERTO C, et al. Ask the image: supervised pooling to preserve feature locality [C]// Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 851-858.
[38] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8: 229-256.
[39] ISHIBUCHI H, SAKANE Y, TSUKAMOTO N, et al. Adaptation of scalarizing functions in MOEA/D: an adaptive scalarizing function-based multiobjective evolutionary algorithm [C]// Proceedings of the 2009 International Conference on Evolutionary Multi-Criterion Optimization. Berlin: Springer, 2009: 438-452.
[40] NAZARI M, OROOJLOOY A, TAKÁČ M, et al. Reinforcement learning for solving the vehicle routing problem [C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 9861-9871.
[41] RIQUELME N, VON LÜCKEN C, BARÁN B. Performance metrics in multi-objective optimization [C]// Proceedings of the 41st Latin American Computing Conference. Piscataway: IEEE, 2015: 1-11.
[42] ZITZLER E, THIELE L, LAUMANNS M, et al. Performance assessment of multiobjective optimizers: an analysis and review [J]. IEEE Transactions on Evolutionary Computation, 2003, 7(2): 117-132.
This work is partially supported by National Natural Science Foundation of China (41904127).
LIU Junyan, born in 1998 in Yueyang, Hunan, M. S. candidate. His research interests include deep learning and combinatorial optimization.
JIANG Feibo, born in 1982 in Changsha, Hunan, Ph. D., associate professor. His research interests include deep learning, reinforcement learning, and federated learning.
PENG Yubo, born in 1996 in Chongqing, M. S. candidate. His research interests include edge computing and federated learning.
DONG Li, born in 1982 in Changsha, Hunan, Ph. D., associate professor. Her research interests include deep learning and reinforcement learning.
CLC number: TP183; Document code: A
Article ID: 1001-9081(2023)12-3806-10
DOI: 10.11772/j.issn.1001-9081.2022121882
Received: 2022-12-22; revised: 2023-03-15; accepted: 2023-03-17.