Recommendation rating prediction algorithm based on user interest concept lattice reduction
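ZHAO Xuejian1*, LI Hao2, TANG Haotian2
(1. Engineering Center of Post Big Data Technology and Application (Nanjing University of Posts and Telecommunications), Nanjing 210003; 2. School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210003) (* Corresponding author, e-mail: zhaoxj@njupt.edu.cn)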
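Data sparsity limits the performance of recommender systems, and reasonably filling the missing values in the rating matrix can effectively improve prediction accuracy. A Recommendation Rating Prediction algorithm based on Concept Lattice Reduction (RRP-CLR) is therefore proposed. The algorithm consists of a neighbor selection module, which generates a reduced nearest neighbor set, and a rating prediction module, which performs rating prediction and recommendation. The neighbor selection module converts the user rating matrix into a binary matrix and treats it as the user interest formal context; formal context reduction rules and rules for deleting redundant concepts from the concept lattice are proposed to make the generation of reduced nearest neighbors more efficient. The rating prediction module uses a newly proposed user similarity measure that removes the effect of rating differences caused by users' subjective factors on the similarity computation; in addition, when the number of items co-rated by two users is below a given threshold, the similarity is scaled appropriately so that it agrees better with the real situation. Experimental results show that, compared with the user-based collaborative filtering recommendation algorithm using the Pearson correlation coefficient (PC-UCF) and the recommendation rating prediction method based on user interest concept lattice (RRP-UICL), RRP-CLR achieves smaller Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), and thus better rating prediction accuracy and stability.
Keywords: recommender system; rating prediction; concept lattice; sparsity; reduced nearest neighbor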
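The spread of mobile Internet and Internet of Things technologies has transformed how data are generated and ushered in the era of big data. Faced with massive volumes of data, users struggle to search and analyze them efficiently to find the information that is valuable to them, a phenomenon known as information overload [1]. As one of the main techniques for addressing information overload, recommender systems have received wide attention from academia and industry in recent years [2-3]. The most commonly used method in current recommender systems is the Collaborative Filtering (CF) recommendation algorithm. Its main idea is to analyze the relationships between users and items from users' historical behavior data, such as their ratings of items, and to make recommendations accordingly [4-5].
In practice, however, rating datasets are high-dimensional and highly sparse, so traditional collaborative filtering recommendation algorithms achieve low recommendation accuracy [6], and data sparsity has become a bottleneck for the application of recommender systems. Researchers have studied this problem extensively. Studies show that effectively predicting and filling the missing values in the rating matrix helps improve the accuracy of recommendation algorithms [7].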
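This paper proposes a Recommendation Rating Prediction algorithm based on Concept Lattice Reduction (RRP-CLR). RRP-CLR consists of a neighbor selection module and a rating prediction module, which respectively generate the reduced nearest neighbor set and perform rating prediction and recommendation. The main contributions of this paper are as follows:
1) Based on the concept lattice and its reduction theory, the neighbors of the target user are effectively reduced to generate the reduced nearest neighbors, improving the accuracy of rating prediction. First, the user rating matrix is converted into a binary matrix and treated as the user interest formal context, and formal context reduction rules are proposed to make concept lattice construction more efficient; second, rules for deleting redundant concepts from the concept lattice are proposed to make the generation of the target user's reduced nearest neighbors more effective.
2) A new user similarity measure is proposed that removes the effect of rating differences caused by users' subjective factors on the similarity computation. When the number of items co-rated by two users is below a given threshold, the similarity is scaled appropriately so that it agrees better with the real situation.
3) Experiments verify the rating prediction accuracy of RRP-CLR.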
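To fill the missing values in the rating matrix effectively, researchers have carried out a series of exploratory studies. The results can be grouped into the following four categories:
1) Rating prediction based on collaborative filtering. Reference [8] fuses user-based and item-based collaborative filtering and predicts ratings from the rating information of similar users or similar items, which effectively alleviates data sparsity. Reference [9] combines social network information with user rating information and, within an item-based collaborative filtering framework, selectively predicts the missing values in the rating matrix to address its sparsity.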
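2) Rating prediction based on trust relationships and trust propagation mechanisms. Reference [10] combines expert trust and user interest similarity for rating prediction in order to reduce data sparsity. Reference [11] points out that incorporating social trust into matrix factorization can markedly improve rating prediction accuracy: social relations are first extracted from users' ratings of items through the Hellinger distance between users in the recommender system, and the predicted trust scores are then incorporated into a social matrix factorization model to predict ratings. Reference [12] proposes a simple, scalable text-driven latent factor model that captures the semantics of reviews, user preferences, and product characteristics by decomposing text into specific low-dimensional representations, and uses the differences between ordinary user/product ratings as supplementary information to calibrate parameter estimation, thereby predicting ratings accurately and addressing the cold-start and data sparsity problems. Reference [13] proposes a novel implicit trust recommendation approach that generates predicted item ratings by mining and exploiting users' implicit information in the recommender system. Specifically, the trust neighbor set of a user is first obtained from the trust diffusion features of users in the trust network; next, the trust ratings mined from the trust neighbor set are used to compute the trust similarity between users; finally, the filtered trust ratings and the user trust similarity are combined through a trust weighting method to obtain the prediction. Reference [14] proposes an improved collaborative filtering recommendation algorithm. Because traditional rating similarity computation relies too heavily on co-rated items, the algorithm introduces the Bhattacharyya similarity measure and a trust propagation mechanism to compute indirect trust values between users, and combines user similarity and user trust through a trust weighting method to generate the predicted ratings.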
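3) Rating prediction based on deep learning models. Reference [15] studies how the nonlinear, complex relationships between user factors and the true nature of item characteristics affect rating prediction, and proposes a new deep feedforward network that learns the relevant factors and their relationships. The network builds user profiles and item characteristics automatically without using any demographic information, and then uses these features to predict how acceptable an item is to a user, yielding better rating predictions. Reference [16] proposes a neural joint learning model for rating prediction recommendation based on reviews, product categories, and user co-purchase information. Its review extraction module learns user and product information from reviews, its heterogeneous information network extraction module extracts the association features of a given target user-product pair, and the two parts are concatenated to predict the rating. Reference [17] points out that existing User-based CF (UCF) recommendation algorithms tend to focus on how to find the nearest conceptual neighbors and how to generate recommendations from predictions, while neglecting how to aggregate the neighbors' ratings to predict the ratings of unrated items, and therefore proposes an advice aggregation algorithm based on individual asymmetric responses. The algorithm first uses linear regression to learn each user's response to negative/positive advice from neighbors, and then optimizes each user's model parameters by gradient descent, achieving more accurate rating prediction. Reference [18] proposes a heterogeneous fusion recommendation model that extracts fine-grained product attributes and user behavior information from review text, connects the learned latent factors of users and items, and performs spatial convolution on the graph, improving both rating prediction precision and recommendation accuracy. To mine the latent regularities of user behavior in depth, reference [19] proposes a variance model and a Pearson correlation coefficient model, both based on spatial dimensions and distance measurement. The two models obtain the interaction features of items and users by computing the distance between them in each feature dimension, use the variance and the Pearson correlation coefficient respectively to assess how much attention a user pays to each feature dimension, derive a weight vector for the interaction features from this, and predict ratings with a purpose-built multi-layer fully connected neural network.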
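The concept lattice originates from Formal Concept Analysis (FCA) theory and is its core data analysis tool; in essence, it describes the association between objects (samples) and attributes (features). FCA has two core components: the formal context and the formal concept.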
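To make the relation between a formal context and its formal concepts concrete, the following minimal Python sketch enumerates all concepts of a small context by brute force. The function names (`extent`, `intent`, `formal_concepts`) and the toy user/item data are illustrative assumptions, and the enumeration is exponential in the number of attributes, so it is only suitable for tiny examples; it is not the lattice construction procedure used by RRP-CLR.

```python
from itertools import combinations


def extent(context, attrs):
    """Objects that possess every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs of a context given as
    {object: set_of_attributes}. Brute force over attribute subsets."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for attrs in combinations(sorted(all_attrs), r):
            ext = extent(context, frozenset(attrs))
            # Closing the extent yields the matching intent,
            # so each pair is a genuine formal concept.
            concepts.add((ext, intent(context, ext, all_attrs)))
    return concepts


# Hypothetical toy context: users as objects, items as attributes.
toy = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"a", "b", "c"}}
for ext, itt in sorted(formal_concepts(toy), key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

Each printed pair is a set of users together with exactly the items they all share; ordering the pairs by extent inclusion gives the concept lattice.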
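Based on the concept lattice and its reduction theory, this paper proposes RRP-CLR, a user rating prediction and recommendation algorithm consisting of a neighbor selection module and a rating prediction module. The neighbor selection module converts the rating matrix into a binary matrix, then constructs and reduces the user interest concept lattice to filter the generalized nearest neighbors of the target user, which yields the reduced nearest neighbors. The rating prediction module computes the similarity between users and predicts the target user's ratings of items, enabling Top-N recommendation.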
Table 1 User rating matrix
Table 2 Binary matrix
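The conversion from a rating matrix to a binary matrix can be sketched as follows. The exact binarization rule is an assumption here: an entry becomes 1 when the user has rated the item with at least `threshold` (a hypothetical parameter), and 0 otherwise; missing ratings are represented by `None`.

```python
def binarize(ratings, threshold=1):
    """Convert a rating matrix (rows = users, columns = items) into a
    binary matrix. Assumed rule: 1 if a rating exists and is at least
    `threshold`, else 0. `None` marks a missing rating."""
    return [
        [1 if r is not None and r >= threshold else 0 for r in row]
        for row in ratings
    ]


# Hypothetical 2-user, 3-item example.
ratings = [
    [5, None, 3],
    [None, 2, None],
]
print(binarize(ratings))               # any existing rating counts as interest
print(binarize(ratings, threshold=3))  # only ratings of 3 or more count
```

With the default threshold the binary matrix simply records which items each user has rated; a higher threshold restricts it to items the user rated favorably.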
Fig. 1 Concept lattice
Fig. 2 Reduced concept lattice
In summary, the user similarity is computed as follows:
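As an illustration of the two ideas described for the similarity measure (removing per-user rating bias and shrinking the similarity when few items are co-rated), the sketch below uses a well-known generic scheme: a mean-centered Pearson-style correlation multiplied by the factor min(n, gamma)/gamma, where n is the number of co-rated items. This is an assumed stand-in, not the paper's own formula; `gamma` and the function name are hypothetical.

```python
from math import sqrt


def similarity(ru, rv, gamma=5):
    """Illustrative similarity between two users.

    ru, rv: dicts mapping item -> rating.
    Ratings are centered on each user's own mean to offset individual
    rating habits, and the result is scaled down when fewer than
    `gamma` items are co-rated (an assumed threshold)."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    mean_u = sum(ru.values()) / len(ru)
    mean_v = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mean_u) * (rv[i] - mean_v) for i in common)
    den = sqrt(sum((ru[i] - mean_u) ** 2 for i in common)) * sqrt(
        sum((rv[i] - mean_v) ** 2 for i in common)
    )
    if den == 0:
        return 0.0
    return (min(len(common), gamma) / gamma) * (num / den)


# Two users with the same preference order but different rating ranges.
u = {"a": 5, "b": 3, "c": 1}
v = {"a": 4, "b": 3, "c": 2}
print(similarity(u, v, gamma=3))  # full weight: three co-rated items meet the threshold
print(similarity(u, v, gamma=5))  # shrunk: only 3 of the 5 required co-rated items
```

The shrinkage keeps two users who happen to agree on a handful of items from being treated as near-perfect neighbors.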
Once the similarities between users have been computed, the predicted rating of the target user for an item can be obtained from Eq. (4):
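For orientation, a common form of neighbor-based prediction is the mean-centered weighted sum used in user-based collaborative filtering. The sketch below implements that generic form as an assumption; it should not be read as Eq. (4) itself. The data layout and the names `predict`, `ratings`, and `sims` are hypothetical.

```python
def predict(target, item, ratings, sims):
    """Predict the target user's rating of `item`.

    ratings: dict user -> dict item -> rating.
    sims: dict neighbor -> similarity to the target user
          (for example, the reduced nearest neighbors).
    Uses the target's mean rating plus the similarity-weighted average
    of the neighbors' deviations from their own means."""
    mean_t = sum(ratings[target].values()) / len(ratings[target])
    num = den = 0.0
    for v, s in sims.items():
        if item in ratings[v]:
            mean_v = sum(ratings[v].values()) / len(ratings[v])
            num += s * (ratings[v][item] - mean_v)
            den += abs(s)
    # Fall back to the target's mean when no neighbor rated the item.
    return mean_t if den == 0 else mean_t + num / den


ratings = {
    "t": {"a": 4, "b": 2},
    "v1": {"a": 5, "b": 3, "c": 4},
    "v2": {"a": 2, "c": 4},
}
sims = {"v1": 1.0, "v2": 0.5}
print(predict("t", "c", ratings, sims))
```

Centering on each user's mean means that a neighbor who rates everything high does not inflate the prediction.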
After rating prediction is complete, the N items with the highest predicted ratings are selected from all items for the target user as the Top-N recommendation.
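The selection step can be sketched in a few lines of Python. Excluding items the user has already rated is an assumption made here for illustration; the function name `top_n` and the sample scores are hypothetical.

```python
import heapq


def top_n(predicted, already_rated, n):
    """Return the n items with the highest predicted rating.

    predicted: dict item -> predicted rating.
    already_rated: set of items to skip (assumed behavior)."""
    candidates = {i: s for i, s in predicted.items() if i not in already_rated}
    return heapq.nlargest(n, candidates, key=candidates.get)


predicted = {"a": 4.5, "b": 3.0, "c": 4.8, "d": 4.9}
print(top_n(predicted, already_rated={"d"}, n=2))  # ['c', 'a']
```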
This paper validates the performance of RRP-CLR on the MovieLens dataset (https://grouplens.org/datasets/movielens/), which contains 100,000 ratings of 1,682 movies by 943 users. Ratings range from 1 to 5 and express how much a user likes a movie, with preference increasing from 1 to 5. The original MovieLens dataset contains too many available ratings and is too dense, whereas RRP-CLR mainly targets rating prediction under sparse data. Existing ratings were therefore removed at random during the experiments: dataset ML-1 was set to a sparsity of 98%, retaining 31,726 available ratings, and dataset ML-2 was set to a sparsity of 99%, retaining 15,889 available ratings.
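The random removal can be sketched as follows, assuming ratings are held as (user, item, rating) triples and sparsity is defined as the fraction of empty cells in the user-item matrix. The full matrix has 943 × 1,682 = 1,586,126 cells, so 98% sparsity corresponds to roughly 31,700 retained ratings, in line with the count reported for ML-1. The function name and the fixed `seed` are illustrative assumptions.

```python
import random


def sparsify(ratings, n_users, n_items, sparsity, seed=0):
    """Randomly keep just enough ratings to reach the target sparsity.

    ratings: list of (user, item, rating) triples.
    sparsity: desired fraction of empty cells in the n_users x n_items matrix."""
    keep = round((1 - sparsity) * n_users * n_items)
    keep = min(keep, len(ratings))  # cannot keep more than exist
    return random.Random(seed).sample(ratings, keep)


# Hypothetical fully dense 10 x 10 matrix reduced to 98% sparsity.
dense = [(u, i, 3) for u in range(10) for i in range(10)]
subset = sparsify(dense, n_users=10, n_items=10, sparsity=0.98)
print(len(subset))  # 2
```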
This paper evaluates RRP-CLR mainly with the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE), and compares it with the Recommendation Rating Prediction method based on User Interest Concept Lattice (RRP-UICL) proposed in reference [20] and with the UCF recommendation algorithm based on Pearson Coefficient (PC-UCF) [20].
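RMSE, also called the standard error, is the arithmetic square root of the mean squared error. It is introduced for the same reason as the standard deviation: the mean squared error is not in the same units as the data and therefore does not reflect dispersion intuitively, so its square root is taken to obtain RMSE. In this method, RMSE is the square root of the mean of the squared deviations between all predicted and actual user ratings. Like MAE, it reflects prediction precision: the smaller the RMSE, the higher the precision, and the larger the RMSE, the lower the precision. Unlike MAE, RMSE squares each deviation first, so large individual errors are amplified when the errors are widely dispersed. RMSE therefore reflects not only the precision of the predictions but also their dispersion. RMSE is computed as shown in Eq. (6):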
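Written out in their standard forms, the two metrics are as below. The notation is assumed here: $n$ is the number of predicted ratings, $p_i$ the $i$-th predicted rating, and $r_i$ the corresponding actual rating.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert p_i - r_i \rvert,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(p_i - r_i\right)^{2}}
```

For example, with absolute errors of 0.5, 0.5, and 2.0, MAE is $(0.5 + 0.5 + 2.0)/3 = 1.0$, while RMSE is $\sqrt{(0.25 + 0.25 + 4.0)/3} = \sqrt{1.5} \approx 1.22$; the single large error pulls RMSE above MAE, which is why RMSE is also sensitive to the dispersion of the errors.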
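Fig. 3 shows that, for the same dataset sparsity and number of recommended items, the MAE of RRP-CLR is smaller than that of PC-UCF and RRP-UICL. At a dataset sparsity of 98%, the average MAE values of RRP-CLR, RRP-UICL, and PC-UCF are 0.40, 0.46, and 0.75 respectively, so the MAE of RRP-CLR is 13.04% lower than that of RRP-UICL and 46.7% lower than that of PC-UCF. In addition, all three algorithms have a smaller MAE on ML-1 than on ML-2, which shows that dataset sparsity has a significant effect on MAE: as sparsity increases, MAE increases accordingly. When sparsity rises from 98% to 99%, the MAE of RRP-CLR increases by 8.07% on average, less than the 11.97% of RRP-UICL and the 39.36% of PC-UCF, indicating that the advantage of RRP-CLR grows as the dataset becomes sparser. In short, in sparse scenarios RRP-CLR uses the concept lattice and its reduction theory to reduce the neighbors of the target user effectively and generate the reduced nearest neighbors, and it achieves higher prediction precision than PC-UCF and RRP-UICL.
Fig. 3 Comparison of MAE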
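Fig. 4 shows that, at a dataset sparsity of 98%, the RMSE values of RRP-CLR, RRP-UICL, and PC-UCF are 0.61, 0.76, and 0.99 respectively. When sparsity rises from 98% to 99%, the RMSE of RRP-CLR increases by 13.89% on average, less than the 21.05% of RRP-UICL and the 39.93% of PC-UCF, indicating that the stability of RRP-CLR's predictions becomes more pronounced as the dataset becomes sparser. In short, as with MAE, for the same dataset sparsity and number of recommended items the RMSE of RRP-CLR is smaller than that of PC-UCF and RRP-UICL, and the advantage grows with sparsity. RRP-CLR therefore offers not only better prediction precision but also lower dispersion of predictions, that is, more stable results.
Fig. 4 Comparison of RMSE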
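To address the drop in recommendation accuracy caused by data sparsity, a concept-lattice-based algorithm for predicting and filling ratings is proposed. The algorithm converts the user rating matrix into a binary matrix, treats it as the user interest formal context, and introduces formal context reduction rules and rules for deleting redundant concepts from the concept lattice, which make lattice construction more efficient and the generation of the target user's reduced nearest neighbors more effective. The algorithm also adopts a new user similarity measure that removes the effect of rating differences caused by users' subjective factors on the similarity computation, and scales the similarity appropriately when the number of items co-rated by two users is below a given threshold, so that the similarity between users agrees better with the real situation. Experimental results show that the proposed algorithm achieves better prediction accuracy and stability when the dataset is markedly sparse.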
[1] LIU H F, JING L P, YU J. Survey of matrix factorization based recommendation methods by integrating social information[J]. Journal of Software, 2018, 29(2): 340-362. (in Chinese)
[2] LIU L, DU X, ZHU L, et al. Learning discrete hashing towards efficient fashion recommendation[J]. Data Science and Engineering, 2018, 3(4):307-322.
[3] GAO D, TONG Y, SHE J, et al. Top-k team recommendation and its variants in spatial crowdsourcing[J]. Data Science and Engineering, 2017, 2(2): 136-150.
[4] WANG C D, DENG Z H, LAI J H, et al. Serendipitous recommendation in e-commerce using innovator-based collaborative filtering[J]. IEEE Transactions on Cybernetics, 2019, 49(7): 2678-2692.
[5] SRIVASTAVA R, PALSHIKAR G K, CHAURASIA S, et al. What’s next? A recommendation system for industrial training[J]. Data Science and Engineering, 2018, 3(3): 232-247.
[6] ADOMAVICIUS G, TUZHILIN A. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(6): 734-749.
[7] BREESE J S, HECKERMAN D, KADIE C. Empirical analysis of predictive algorithms for collaborative filtering[C]// Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers Inc., 1998: 43-52.
[8] WANG J, DE VRIES A P, REINDERS M J T. Unifying user-based and item-based collaborative filtering approaches by similarity fusion[C]// Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2006: 501-508.
[9] GUO L J, LIANG J Y, ZHAO X W. Collaborative filtering recommendation algorithm incorporating social network information[J]. Pattern Recognition and Artificial Intelligence, 2016, 29(3): 281-288. (in Chinese)
[10] ZHANG J, LIU M, PENG W P, et al. Collaborative filtering recommendation algorithm based on fusion interest and score[J]. Journal of Chinese Computer Systems, 2017, 38(2): 357-362. (in Chinese)
[11] TAHERI S M, MAHYAR H, FIROUZI M, et al. Extracting implicit social relation for social recommendation techniques in user rating prediction[C]// Proceedings of the 26th International Conference on World Wide Web Companion. Republic and Canton of Geneva: International World Wide Web Conferences Steering Committee, 2017: 1343-1351.
[12] SONG K, GAO W, SHI F, et al. Recommendation vs sentiment analysis: a text-driven latent factor model for rating prediction with cold-start awareness[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2017: 2744-2750.
[13] LI Y, LIU J, REN J, et al. A novel implicit trust recommendation approach for rating prediction[J]. IEEE Access, 2020, 8:98305-98315.
[14] CHEN H, SUN H, CHENG M, et al. A recommendation approach for rating prediction based on user interest and trust value[J]. Computational Intelligence and Neuroscience, 2021, 2021: No.6677920.
[15] PURKAYSTHA B, DATTA T, ISLAM M S, et al. Rating prediction for recommendation: constructing user profiles and item characteristics using backpropagation[J]. Applied Soft Computing, 2019, 75:310-322.
[16] TANG J, ZHANG X, ZHANG M, et al. A neural joint model for rating prediction recommendation[J]. Journal of Computational Methods in Sciences and Engineering, 2020, 20(4):1127-1142.
[17] JI S, YANG W, GUO S, et al. Asymmetric response aggregation heuristics for rating prediction and recommendation[J]. Applied Intelligence, 2020, 50(5): 1416-1436.
[18] YANG Z, ZHANG M. TextOG: a recommendation model for rating prediction based on heterogeneous fusion of review data[J]. IEEE Access, 2020, 8: 159566-159573.
[19] ZHOU D, HAO S, ZHANG H, et al. Novel SDDM rating prediction models for recommendation systems[J]. IEEE Access, 2021, 9: 101197-101206.
[20] DUO L, YANG B. Recommendation rating prediction based on user interest concept lattice[J]. Journal of Chinese Computer Systems, 2020, 41(10): 2104-2108. (in Chinese)
Recommendation rating prediction algorithm based on user interest concept lattice reduction
ZHAO Xuejian1*, LI Hao2, TANG Haotian2
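(1. Engineering Center of Post Big Data Technology and Application (Nanjing University of Posts and Telecommunications), Nanjing 210003, China; 2. School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)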
The performance of recommendation systems is restricted by data sparsity, and the accuracy of prediction can be effectively improved by reasonably filling the missing values in the rating matrix. Therefore, a new algorithm named Recommendation Rating Prediction based on Concept Lattice Reduction (RRP-CLR) was proposed. The RRP-CLR algorithm was composed of a nearest neighbor selection module and a rating prediction module, which were respectively responsible for generating the reduced nearest neighbor set and realizing rating prediction and recommendation. In the nearest neighbor selection module, the user rating matrix was transformed into a binary matrix, which was regarded as the user interest formal context. Then formal context reduction rules and rules for deleting redundant concepts from the concept lattice were proposed to improve the efficiency of generating reduced nearest neighbors. In the rating prediction module, a new user similarity calculation method was proposed to eliminate the impact of rating deviations caused by users' subjective factors on similarity calculation. When the number of items co-rated by two users was less than a specific threshold, the similarity was scaled appropriately to make the similarity between users more consistent with the real situation. Experimental results show that compared with PC-UCF (User-based Collaborative Filtering recommendation algorithm based on Pearson Coefficient) and RRP-UICL (Recommendation Rating Prediction method based on User Interest Concept Lattice), the RRP-CLR algorithm has smaller Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), and better rating prediction accuracy and stability.
recommendation system; rating prediction; concept lattice; sparsity; reduced nearest neighbor
1001-9081(2023)11-3340-06
10.11772/j.issn.1001-9081.2022121839
2022-12-07;
2023-01-18;
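Foundation items: National Natural Science Foundation of China (61672299); China Postdoctoral Science Foundation (2018M640509).
ZHAO Xuejian (born 1982), male, from Linyi, Shandong, associate professor, Ph. D.; research interests: data mining, wireless sensor networks. LI Hao (born 1999), male, from Yangzhou, Jiangsu, M. S. candidate; research interests: data mining. TANG Haotian (born 2001), male, from Aba, Sichuan, M. S. candidate; research interests: data mining.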
TP391
A
2023-02-01.
This work is partially supported by National Natural Science Foundation of China (61672299), China Postdoctoral Science Foundation (2018M640509).
ZHAO Xuejian, born in 1982, Ph. D., associate professor. His research interests include data mining, wireless sensor network.
LI Hao, born in 1999, M. S. candidate. His research interests include data mining.
TANG Haotian, born in 2001, M. S. candidate. His research interests include data mining.