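Research on resource allocation algorithm of centralized and distributed Q-learning in machine communication
YU Yunhe, SUN Jun
(College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China)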
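For the massive machine type communication (mMTC) scenario, two Q-learning-based resource allocation algorithms are proposed with the goal of maximizing system throughput while guaranteeing the quality of service (QoS) requirements of a subset of machine type communication devices (MTCDs): a centralized Q-learning algorithm (team-Q) and a distributed Q-learning algorithm (dis-Q). First, a clustering algorithm based on cosine similarity (CS) is used: taking into account the geographic locations and multi-level QoS requirements of the MTCDs, multi-dimensional vectors representing the MTCDs and the data aggregators (DAs) are constructed, and the MTCDs are grouped according to the CS values between these vectors. Then the team-Q and dis-Q learning algorithms are used to allocate resource blocks (RBs) and power to the MTCDs. In terms of throughput, the team-Q and dis-Q algorithms improve on the dynamic resource allocation algorithm and the greedy algorithm by 16% and 23% on average, respectively. In terms of complexity, dis-Q requires 25% or less of the complexity of team-Q, and its convergence speed is nearly 40% higher.
resource allocation; centralized Q-learning; distributed Q-learning; cosine similarity; multi-dimensional vector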
Machine type communication (MTC) allows smart objects to communicate with one another without human intervention, and the 3rd Generation Partnership Project (3GPP) expects MTC to play a key role in the development of the internet of things (IoT) [1-2]. As IoT spreads, the demand for communication between "things" is very high. Even 5G cannot guarantee that all the needs of future services will be met, so MTC will be a focus of research in beyond 5G (B5G) and 6G networks [3-4]. Cisco predicted that 3.9 billion MTC devices across industries would be connected to the network by 2022 [5]. The connection of massive numbers of machine type communication devices (MTCDs) not only makes spectrum resources scarce but also causes network congestion and places a heavy burden on the base station (BS).
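In mMTC networks with densely deployed MTCDs, grouping the MTCDs into smaller clusters is regarded as a technique that helps relieve MTC network congestion, raises the MTCD access success rate, and thereby increases throughput [6]. To this end, researchers have proposed a series of MTCD grouping and clustering algorithms. References [6-7] group devices according to their QoS requirements and geographic locations, respectively. Reference [8], aiming to prolong network lifetime, clusters MTCDs according to their residual energy and their distance to the BS. References [9-10] improve on the traditional k-means algorithm and cluster MTCDs with respect to MTCD energy efficiency and MTC network transmission delay requirements, respectively. However, some of these clustering strategies consider only one of the two factors, geographic location or QoS requirement, and do not fully exploit the correlation among MTCDs. As a result, interference within an MTCD cluster is poorly coordinated, which can reduce system throughput. Others cluster only for a specific optimization objective and are not generally applicable.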
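References [11-12] both consider resource allocation under user overload in scenarios where human to human (H2H) and machine to machine (M2M) communications coexist. However, reference [11] does not consider the transmission needs of delay-sensitive M2M services and therefore cannot meet their QoS requirements. Reference [12] uses a resource allocation algorithm based on the knapsack model to guarantee the QoS of delay-sensitive M2M services, but it compares the proposed algorithm only with the traditional algorithm that allocates resources to H2H terminals first, which is not enough to verify its superiority. Reference [13] proposes a dynamic resource allocation strategy for allocating resources among MTCDs. Although it considers overloaded MTCD requests, it does not allow resource reuse, so spectrum utilization is low, and because few MTCDs can access the network, system throughput also drops. Reference [14] addresses dynamic resource allocation in multiple-input multiple-output systems and proposes an algorithm that guarantees each user's minimum QoS requirement and achieves high system throughput. However, the method assumes equal power allocation among users, which is unrealistic and limits its applicability. Reference [15] discusses resource allocation for vehicular communication based on device to device (D2D) clustering under spectrum scarcity, maximizing the throughput of cellular users while guaranteeing normal communication for vehicular users. Reference [16] studies capacity-maximizing resource allocation in the mMTC scenario, but it uses the traditional particle swarm algorithm, which yields limited capacity gains, and it does not consider MTCD grouping. In MTC networks with limited power and scarce spectrum, traditional resource allocation methods struggle to meet the growing QoS requirements of MTCDs. Recent studies show that machine-learning-based resource allocation strategies already outperform traditional methods [17-18], and Q-learning, a well-known model-free reinforcement learning (RL) algorithm, has attracted attention.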
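Based on the above analysis, this paper proposes two Q-learning algorithms, team-Q and dis-Q, to solve the joint resource block and power allocation problem among the MTCDs in the network, while guaranteeing the minimum QoS requirements of the MTCDs that carry high signal-to-noise-ratio transmission tasks. The resource allocation scheme has two stages. The first stage designs a CS-based clustering scheme: borrowing the practice of computing user-to-user similarity in product recommendation systems, multi-dimensional vectors are constructed for the MTCDs and the DAs, and the MTCDs are grouped using the cosine similarity between these vectors. In the second stage, for the uplink resource block and power allocation problem in the grouped MTC network, two Q-learning-based allocation algorithms are proposed, team-Q learning and dis-Q learning, where dis-Q modifies the Q-table and the reward function of team-Q. Finally, simulations verify the effectiveness of the proposed algorithms in terms of complexity, convergence speed, and system throughput gain.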
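The system model studied in this paper is shown in Figure 1. Randomly distributed MTCDs are clustered into MTCD clusters, each containing one data aggregator (DA), and together these form the MTC network. In the MTC network, the MTCDs connect to the DA through sparse code multiple access [19]. The DA receives data from the MTCDs and forwards it to the BS, which turns the network into a two-tier architecture and eases the access burden on the BS. Different clusters are assumed to use orthogonal spectrum resources, while the MTCDs within a cluster share resource blocks in a non-orthogonal multiple access manner. Resource block reuse therefore causes multiple-access interference within the MTC network, and the receiver can use successive interference cancellation (SIC) to demodulate correctly.
Figure 1 System model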
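Therefore, with the goal of maximizing the throughput of the whole MTC network, the following optimization problem can be constructed from the Shannon channel capacity formula:
The above problem is a mixed integer nonlinear programming (MINLP) problem, which is generally NP-hard [17] and difficult to solve directly. This paper solves it with Q-learning algorithms.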
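To make the throughput objective concrete, the following is a minimal sketch of how per-device Shannon rates could be computed for MTCDs that share one resource block and are decoded with SIC. It is not the paper's formulation: the function name `sic_uplink_rates`, the decoding order (strongest received power first), and all numeric values are assumptions made for illustration.

```python
import math

def sic_uplink_rates(tx_powers, channel_gains, noise_power, bandwidth_hz):
    """Per-device Shannon rates on one shared resource block with SIC.

    Assumption: the receiver decodes devices in descending order of
    received power, so each device only sees interference from the
    devices that have not been decoded yet.
    """
    received = [p * g for p, g in zip(tx_powers, channel_gains)]
    order = sorted(range(len(received)), key=lambda i: received[i], reverse=True)
    rates = [0.0] * len(received)
    for position, i in enumerate(order):
        # Interference comes from devices decoded after device i.
        interference = sum(received[j] for j in order[position + 1:])
        sinr = received[i] / (interference + noise_power)
        rates[i] = bandwidth_hz * math.log2(1.0 + sinr)
    return rates

# Hypothetical normalized example: two devices, unit noise, unit bandwidth.
rates = sic_uplink_rates([3.0, 1.0], [1.0, 1.0], noise_power=1.0, bandwidth_hz=1.0)
total_throughput = sum(rates)
```

Under these assumptions the sum of the rates equals the capacity of the combined received power, which is a quick sanity check when experimenting with different power allocations.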
Algorithm 1 Cosine-similarity-based MTCD clustering algorithm
Initialization:
Loop:
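The grouping idea behind Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the vector layout (x position, y position, QoS level) and the rule "assign each MTCD to the DA with the highest cosine similarity" are hypothetical, as are the function names and sample values.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def group_by_cosine_similarity(mtcd_vectors, da_vectors):
    """Return, for each MTCD vector, the index of the most similar DA vector."""
    groups = []
    for vec in mtcd_vectors:
        scores = [cosine_similarity(vec, da) for da in da_vectors]
        groups.append(max(range(len(scores)), key=scores.__getitem__))
    return groups

# Hypothetical feature vectors: (x, y, QoS level), all normalized to [0, 1].
mtcd_vectors = [
    [1.0, 0.1, 0.2],
    [0.9, 0.2, 0.1],
    [0.1, 1.0, 0.9],
    [0.2, 0.8, 1.0],
]
da_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
]
groups = group_by_cosine_similarity(mtcd_vectors, da_vectors)
```

In this toy data the first two devices point in nearly the same direction as the first DA vector and the last two align with the second, so the grouping separates them accordingly.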
(1) Resource allocation strategy based on the team-Q learning algorithm
(2) Resource allocation strategy based on the dis-Q learning algorithm
Algorithm 2 dis-Q learning resource allocation algorithm
Initialization:
Iteration:
Update the Q-table according to Eq. (10);
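Eq. (10) is not reproduced in this text, so the sketch below assumes the standard tabular Q-learning rule, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max over a' of Q(s',a') - Q(s,a)), with epsilon-greedy exploration. It illustrates one independent learner of the kind a distributed scheme might run per MTCD. The class name `QAgent`, the action encoding as (resource block index, power level) pairs, and the hyperparameter values are all hypothetical, and the paper's modified reward function is not modeled.

```python
import random
from collections import defaultdict

class QAgent:
    """One independent tabular Q-learner (illustrative, not the paper's dis-Q)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)   # e.g. (rb_index, power_level) pairs
        self.alpha = alpha             # learning rate
        self.gamma = gamma             # discount factor
        self.epsilon = epsilon         # exploration probability
        self.q = defaultdict(float)    # Q-table keyed by (state, action)
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Apply the standard Q-learning update for one transition."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

# Hypothetical usage: two resource blocks, one power level, a single state "s".
agent = QAgent(actions=[(0, 1), (1, 1)], epsilon=0.0)
agent.update("s", (0, 1), reward=1.0, next_state="s")
agent.update("s", (0, 1), reward=1.0, next_state="s")
best_action = agent.choose("s")
```

A centralized variant would instead keep one table indexed by the joint action of all devices, which is what makes its table so much larger.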
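This section analyzes and verifies the performance of the proposed algorithms, including convergence, complexity, and system throughput. The simulations are run in MATLAB, and the simulation parameters are listed in Table 1 [12,20].
Table 1 Simulation parameters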
We first compare the convergence speed of the two Q-learning algorithms. As Figure 2 shows, both team-Q and dis-Q converge as the number of iterations increases, but measured in iterations, dis-Q converges nearly 40% faster than team-Q. This is because the Q-table in team-Q has a far larger dimension than the one in dis-Q: as the action space and the number of agents grow, the complexity of team-Q increases exponentially, so dis-Q ends up converging faster than team-Q.
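The scaling argument can be illustrated with a small calculation. The sketch assumes a generic setting in which a centralized learner stores one entry per joint action of all agents, while each distributed learner stores one entry per individual action. The paper's exact Q-table definitions are not reproduced here, so these counts need not match its Figure 3; the function names and the values of 4 actions and 1 state are hypothetical.

```python
def joint_q_table_entries(num_states, num_actions, num_agents):
    """Entries in one centralized table over joint actions (assumed layout)."""
    return num_states * num_actions ** num_agents

def independent_q_table_entries(num_states, num_actions, num_agents):
    """Total entries across one small table per agent (assumed layout)."""
    return num_agents * num_states * num_actions

# Hypothetical setting: 1 state, 4 actions per agent, 2 to 5 agents.
comparison = []
for num_agents in range(2, 6):
    joint = joint_q_table_entries(1, 4, num_agents)
    independent = independent_q_table_entries(1, 4, num_agents)
    comparison.append((num_agents, joint, independent, independent / joint))
```

Each tuple holds the agent count, the two table sizes, and their ratio. The joint table grows geometrically with the number of agents, while the distributed total grows only linearly.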
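Figure 2 Convergence analysis of the two Q-learning algorithms
Figure 3 Comparison of Q-table dimensions in the team-Q and dis-Q algorithms for different values of the varied parameter
Figure 4 Comparison of system throughput under different algorithms
Figure 5 Comparison of system throughput under different clustering algorithms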
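This paper studied resource allocation in the mMTC scenario with the goal of optimizing system throughput. First, a cosine-similarity-based clustering algorithm was proposed that groups the MTCDs according to their QoS requirements and their positions relative to the DAs. The algorithm fully exploits the correlation among MTCDs and better coordinates the interference within MTC clusters, which benefits system performance. In addition, the team-Q and dis-Q learning algorithms were proposed for resource allocation in the MTC network. Simulation results show that both Q-learning algorithms improve system throughput considerably over the comparison algorithms. The team-Q algorithm slightly outperforms dis-Q in system throughput, but dis-Q is clearly better in signaling overhead and convergence speed, which is more in line with the idea of "green communication".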
[1] CHEN S Y, MA R F, CHEN H H, et al. Machine-to-machine communications in ultra-dense networks—A survey[J]. IEEE Communications Surveys & Tutorials, 2017, 19(3): 1478-1503.
[2] QIAN Z H, WANG Y J. IoT technology and application[J]. Acta Electronica Sinica, 2012, 40(5): 1023-1029.
[3] Service-aware transport network: opportunities and challenges[J]. Proceedings of SPIE - The International Society for Optical Engineering, 2005.
[4] ZHOU Y Q, TIAN L, LIU L, et al. Fog computing enabled future mobile communication networks: a convergence of communication and computing[J]. IEEE Communications Magazine, 2019, 57(5): 20-27.
[5] Cisco visual networking index: global mobile data traffic forecast update 2014-2019[EB]. 2014.
[6] LIANG L, XU L, CAO B, et al. A cluster-based congestion-mitigating access scheme for massive M2M communications in internet of things[J]. IEEE Internet of Things Journal, 2018, 5(3): 2200-2211.
[7] GHAVIMI F, LU Y W, CHEN H H. Uplink scheduling and power allocation for M2M communications in SC-FDMA-based LTE-A networks with QoS guarantees[J]. IEEE Transactions on Vehicular Technology, 2017, 66(7): 6160-6170.
[8] GAO H, XU X D, HAN S J. Homogeneous clustering algorithm based on average residual energy for energy-efficient MTC networks[C]//Proceedings of 2018 24th Asia-Pacific Conference on Communications (APCC). Piscataway: IEEE Press, 2018: 28-33.
[9] HUSSAIN F, HUSSAIN R, ANPALAGAN A, et al. A new block-based reinforcement learning approach for distributed resource allocation in clustered IoT networks[J]. IEEE Transactions on Vehicular Technology, 2020, 69(3): 2891-2904.
[10] XU Y Q, FENG G, LIANG L, et al. MTC data aggregation for 5G network slicing[C]//Proceedings of 2017 23rd Asia-Pacific Conference on Communications (APCC). Piscataway: IEEE Press, 2017: 1-6.
[11] WANG X, QIU L. Admission control and resource allocation of H2H & M2M co-existence scenario[J]. Journal of University of Chinese Academy of Sciences, 2016, 33(3): 427-432.
[12] JIANG J S, ZHU X R. An uplink resource allocation algorithm under the scenario of coexistence of H2H & M2M based on knapsack model[J]. Acta Electronica Sinica, 2018, 46(5): 1259-1264.
[13] SALAM T, REHMAN W U, TAO X F. Cooperative data aggregation and dynamic resource allocation for massive machine type communication[J]. IEEE Access, 2018, 6: 4145-4158.
[14] GUO T, LI Y M, LEI P, et al. A resource allocation scheme based on user's QoS in MIMO relay system[J]. Telecommunications Science, 2015, 31(4): 121-126.
[15] ZHANG H B, XIANG Y, LIU K J, et al. V2X resource allocation scheme based on D2D communication[J]. Journal of Beijing University of Posts and Telecommunications, 2017, 40(5): 92-97.
[16] LIU J Y, QIN P, ZHAO X W, et al. Research on resource allocation of mMTC scenario based on capacity maximization[J]. Electric Power Information and Communication Technology, 2020, 18(12): 17-22.
[17] SHARMA S K, WANG X B. Toward massive machine type communications in ultra-dense cellular IoT networks: current issues and machine learning-assisted solutions[J]. IEEE Communications Surveys & Tutorials, 2020, 22(1): 426-471.
[18] HUSSAIN F, HASSAN S A, HUSSAIN R, et al. Machine learning for resource management in cellular and IoT networks: potentials, current solutions, and open challenges[J]. IEEE Communications Surveys & Tutorials, 2020, 22(2): 1251-1275.
[19] NIKOPOUR H, BALIGH H. Sparse code multiple access[C]//Proceedings of 2013 IEEE 24th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC). Piscataway: IEEE Press, 2013: 332-336.
[20] KAI C H, LI H, XU L, et al. Joint subcarrier assignment with power allocation for sum rate maximization of D2D communications in wireless cellular networks[J]. IEEE Transactions on Vehicular Technology, 2019, 68(5): 4748-4759.
Research on resource allocation algorithm of centralized and distributed Q-learning in machine communication
YU Yunhe, SUN Jun
College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Under the premise of ensuring the quality of service (QoS) requirements of a subset of machine type communication devices (MTCDs), the resource allocation problem was studied with the goal of maximizing system throughput in the massive machine type communication (mMTC) scenario. Two resource allocation algorithms based on Q-learning were proposed: a centralized Q-learning algorithm (team-Q) and a distributed Q-learning algorithm (dis-Q). Firstly, taking into account the geographic locations and multi-level QoS requirements of the MTCDs, a clustering algorithm based on cosine similarity (CS) was designed. In the clustering algorithm, multi-dimensional vectors that represent the MTCDs and the data aggregators (DAs) were constructed, and the MTCDs were grouped according to the CS values between these vectors. Then, in the MTC network, the team-Q learning algorithm and the dis-Q learning algorithm were used to allocate resource blocks and power to the MTCDs. In terms of throughput, the team-Q and dis-Q algorithms achieve average increases of 16% and 23% over the dynamic resource allocation algorithm and the greedy algorithm, respectively. In terms of complexity, the dis-Q algorithm requires 25% or less of the complexity of the team-Q algorithm, and its convergence speed is nearly 40% higher.
resource allocation, centralized Q-learning, distributed Q-learning, cosine similarity, multi-dimensional vector
TP929.5
A
10.11959/j.issn.1000-0801.2021244
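YU Yunhe (1995- ), male, is a master's student at the College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications. His main research interest is resource allocation in massive machine type communication networks.
SUN Jun (1980- ), female, is a master's supervisor at Nanjing University of Posts and Telecommunications. Her main research interest is wireless network resource management.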
Foundation Items: The National Natural Science Foundation of China (No.61771255), Open Project of Key Laboratory of Chinese Academy of Sciences (No.20190904)
2021-04-30;
2021-10-20
SUN Jun, sunjun@njupt.edu.cn