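LIU Yang-yang  LIU Fei
Abstract  To model the batch response of biochemical batch processes as a function of operating conditions, a sequential modeling strategy based on functional principal component analysis is proposed in combination with design of experiments. First, B-spline basis function smoothing converts the discrete batch response sequences into continuous response function curves. Next, functional principal component analysis yields the mean curve, the principal component functions, and the principal component scores of the response functions. Finally, a Kriging model between the principal component scores and the operating conditions is built to predict the scores for any operating condition within the experimental region, which gives a model of the batch response with respect to the operating conditions. To improve prediction accuracy, the model is updated iteratively by sequential design under an improved convergence condition. A simulated biochemical reaction network experiment verifies the effectiveness of the strategy, and the simulation results show that it offers good data visualization and model interpretability.
Keywords  functional principal component analysis; sequential design; batch response; design of experiments; Kriging model; biochemical batch process
CLC number TP274.2   Document code A   Article ID 1000-3932(2023)04-0439-08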
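In industrial practice, the mechanisms of many biochemical batch processes are poorly understood or the processes are too complex, which makes mechanistic modeling difficult and the resulting optimization hard to solve. Developing data-driven models is therefore a feasible alternative [1]. Response surface methodology (RSM) combined with design of experiments (DoE) is a data-driven approach that covers both modeling and optimization [2], and it is widely used in biochemical analysis and pharmaceutical research [3]. RSM, however, can only build a data-driven model between the operating conditions and the response at a single time instant, usually the end of the batch. A model of the entire batch response with respect to the operating conditions is more valuable. In addition, as automated experimentation platforms become common, parallel experiments can collect batch data within a short time, which has further stimulated research on batch response modeling.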
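Reference [4] generalized RSM to dynamic response surface methodology (DRSM): by introducing time-dependent shifted Legendre polynomials (SLP) into the coefficients of the response surface model, the coefficients that in RSM describe only a single time instant become time-varying coefficients that represent the whole batch. WANG Z and DONG Y et al. each proposed improvements [5,6] that address the local model oscillation caused by small errors in estimating high-order SLP terms, and extended the range of application of DRSM [7]. Reference [8] used an improved DRSM to model a pyridone cyclization reaction. Reference [9] proposed a batch response modeling workflow based on semiparametric models and applied it to the analysis of a chemoselective methyl ester hydrolysis. Gaussian processes [10] and machine learning [11,12] can also be considered for batch response modeling.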
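As an illustration of the time-varying coefficient idea described above, the following minimal sketch evaluates shifted Legendre polynomials on the normalized batch time τ ∈ [0, 1] and combines them into a coefficient β(τ) = Σ_r γ_r P*_r(τ). The function names and the weights `gammas` are hypothetical and are not taken from [4]; in DRSM the weights would be estimated from experimental data.

```python
def shifted_legendre(n, tau):
    """Shifted Legendre polynomial P*_n(tau) on [0, 1], using P*_n(tau) = P_n(2*tau - 1)."""
    x = 2.0 * tau - 1.0
    p_prev, p_curr = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        # Bonnet recurrence: (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return p_curr


def time_varying_coefficient(gammas, tau):
    """beta(tau) = sum_r gammas[r] * P*_r(tau); gammas are hypothetical fitted weights."""
    return sum(g * shifted_legendre(r, tau) for r, g in enumerate(gammas))


# Hypothetical weights: a constant term plus a linear-in-time term.
beta_mid = time_varying_coefficient([1.0, 2.0], 0.75)
```

Because the polynomials are orthogonal on [0, 1], adding a higher-order term changes the shape of β(τ) without altering the contribution of the lower-order terms, which is one reason polynomial bases of this kind are convenient for time-varying coefficients.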
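The methods above treat the batch response as a discrete data sequence from the production process. Here, the batch response is instead treated as a whole and represented as a continuous response function curve, that is, as functional data [13]. Functional principal component analysis (FPCA) is the main tool for studying functional data. FIDALEO M constructed experiments with a face-centered central composite design and used FPCA to build a functional model between the batch response of a stirred ball mill and its operating conditions, thereby determining the design space of the operating conditions [14]. In that work, FPCA applied to the batch responses gives the mean curve, the principal component functions, and the principal component scores, and RSM is used to build a second-order polynomial model that predicts the scores from the operating conditions. However, when the batch response is strongly nonlinear and the experimental region is complex, a more accurate and more flexible modeling method is needed. Moreover, if the model obtained from a single experimental design does not reach the expected accuracy, one must also consider how to improve it further.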
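The decomposition of batch responses into a mean curve, principal component functions, and scores can be sketched as follows. This is a simplified, grid-based approximation that assumes all batches are sampled on a common uniform time grid; it does not perform the B-spline smoothing used in the paper, and the function names are hypothetical.

```python
import numpy as np


def discrete_fpca(Y, t, n_components):
    """Grid-based FPCA sketch.

    Y: (n_batches, n_times) responses sampled on the common, uniform grid t.
    Returns the mean curve, the principal component functions, and the scores.
    """
    dt = t[1] - t[0]
    mean_curve = Y.mean(axis=0)
    centered = Y - mean_curve
    # Right singular vectors of the centered data are discretized eigenfunctions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Rescale so that each component has unit L2 norm: sum(phi**2) * dt == 1.
    phi = vt[:n_components] / np.sqrt(dt)
    # Scores are inner products <y_i - mean, phi_k>, approximated by a Riemann sum.
    scores = centered @ phi.T * dt
    return mean_curve, phi, scores


def reconstruct(mean_curve, phi, scores):
    """Rebuild curves from the mean curve plus score-weighted components."""
    return mean_curve + scores @ phi


# Synthetic batches built from two shapes, so two components capture them fully.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
Y = rng.normal(size=(8, 1)) * np.sin(np.pi * t) + rng.normal(size=(8, 1)) * t
mean_curve, phi, scores = discrete_fpca(Y, t, n_components=2)
Y_hat = reconstruct(mean_curve, phi, scores)
```

Once the mean curve and component functions are fixed, each batch is summarized by a handful of scores, so modeling the batch response reduces to predicting those scores from the operating conditions.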
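The authors therefore use a Kriging model, which offers higher accuracy, to predict the principal component scores, and combine it with sequential design under the maximum mean squared error criterion [15]: new experiments are run in regions where the current model predicts poorly, so as to improve the accuracy of the model. An improved curve-fitting metric and the mean squared error together form the convergence condition of the sequential design. A simulated experiment, in which FPCA is used sequentially to model product concentrations in a biochemical reaction network, verifies the effectiveness of the proposed method.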
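The combination of Kriging prediction and the maximum mean squared error criterion can be sketched as below. This is a minimal ordinary Kriging model with a Gaussian correlation function and a fixed correlation parameter `theta`; in practice `theta` is usually estimated (for example by maximum likelihood), and one such model would be fitted per principal component score. All function names, the candidate grid, and the numerical values are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def corr(A, B, theta):
    """Gaussian correlation exp(-theta * ||a - b||^2) between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-theta * d2)


def fit_kriging(X, z, theta=10.0, nugget=1e-10):
    """Ordinary Kriging with fixed theta (assumed, not estimated here)."""
    n = len(X)
    Rinv = np.linalg.inv(corr(X, X, theta) + nugget * np.eye(n))
    ones = np.ones(n)
    mu = ones @ Rinv @ z / (ones @ Rinv @ ones)   # generalized least-squares mean
    resid = z - mu
    sigma2 = resid @ Rinv @ resid / n             # process variance estimate
    return {"X": X, "Rinv": Rinv, "mu": mu, "sigma2": sigma2,
            "theta": theta, "resid": resid}


def predict(model, Xnew):
    """Return the Kriging mean and the predicted mean squared error at Xnew."""
    r = corr(Xnew, model["X"], model["theta"])    # shape (m, n)
    r_Rinv = r @ model["Rinv"]
    mean = model["mu"] + r_Rinv @ model["resid"]
    # MSE = sigma2 * [1 - r'R^-1 r + (1 - 1'R^-1 r)^2 / (1'R^-1 1)]
    gap = 1.0 - r_Rinv.sum(axis=1)
    mse = model["sigma2"] * (1.0 - (r_Rinv * r).sum(axis=1)
                             + gap ** 2 / model["Rinv"].sum())
    return mean, np.maximum(mse, 0.0)


def next_point(model, candidates):
    """Maximum-MSE criterion: pick the candidate where the model is least certain."""
    _, mse = predict(model, candidates)
    return candidates[np.argmax(mse)]


# Hypothetical scores observed at three operating conditions in [0, 1].
X = np.array([[0.0], [0.5], [1.0]])
z = np.array([1.0, -0.5, 0.3])
model = fit_kriging(X, z)
x_new = next_point(model, np.linspace(0.0, 1.0, 21).reshape(-1, 1))
```

The predicted mean squared error is close to zero at conditions that have already been tested and grows with distance from them, so the criterion places the next experiment in an unexplored part of the region. After the new batch is run, the model is refitted and the check is repeated until the convergence condition is met.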
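1 FPCA Modeling Based on the Kriging Model
1.1 Functional Principal Component Analysis
1.2 Predicting the Principal Component Scores
2 FPCA Sequential Modeling Algorithm
3 Biochemical Reaction Network Modeling Example
FPCA sequential modeling is applied to a simulated reaction network containing 10 species. The network has 8 independent reactions, of which reactions 1 and 4 are reversible; the kinetic equations and parameters are given in [6]. The relationships among the species are shown in Fig. 2, where numbers denote reactions and circles denote species. Blue indicates reactants, gray indicates intermediates, orange indicates by-products, and green indicates the target product.
In summary, combined with DoE, the FPCA sequential modeling algorithm predicts the batch concentration of the product under any operating condition within the experimental region of the biochemical reaction network, which verifies the effectiveness of the proposed modeling strategy.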
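4 Conclusion
Combined with DoE, the authors proposed a method that uses FPCA to build a process batch response model sequentially. A simulated experiment on modeling species concentrations in a biochemical reaction network verified the effectiveness of the method. The resulting model offers good data visualization and interpretability, predicts the batch response accurately for untested operating conditions within the experimental region, and can be used for online monitoring, control, and optimization of biochemical processes.
The operating conditions considered in this work do not vary with time. In future work, the authors will extend the method so that it can build process batch response models for time-varying operating conditions.
References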
[1] GEORGAKIS C. Design of dynamic experiments: A data-driven methodology for the optimization of time-varying processes[J]. Industrial & Engineering Chemistry Research, 2013, 52(35): 12369-12382.
[2] BAS D, BOYACI I H. Modeling and optimization Ⅰ: Usability of response surface methodology[J]. Journal of Food Engineering, 2007, 78(3): 836-845.
[3] HANRAHAN G, LU K. Application of factorial and response surface methodology in modern experimental design and optimization[J]. Critical Reviews in Analytical Chemistry, 2006, 36(3-4): 141-151.
[4] KLEBANOV N, GEORGAKIS C. Dynamic response surface models: A data-driven approach for the analysis of time-varying process outputs[J]. Industrial & Engineering Chemistry Research, 2016, 55(14): 4022-4034.
[5] WANG Z, GEORGAKIS C. New dynamic response surface methodology for modeling nonlinear processes over semi-infinite time horizons[J]. Industrial & Engineering Chemistry Research, 2017, 56(38): 10770-10782.
[6] DONG Y, GEORGAKIS C, MUSTAKIS J, et al. Constrained version of the dynamic response surface methodology for modeling pharmaceutical reactions[J]. Industrial & Engineering Chemistry Research, 2019, 58(30): 13611-13621.
[7] DONG Y, GEORGAKIS C, SANTOS-MARQUES J, et al. Dynamic response surface methodology using Lasso regression for organic pharmaceutical synthesis[J]. Frontiers of Chemical Science and Engineering, 2022, 16(2): 221-236.
[8] JURICA J A, MCMULLEN J P. Automation technologies to enable data-rich experimentation: Beyond design of experiments for process modeling in late-stage process development[J]. Organic Process Research & Development, 2021, 25(2): 282-291.
[9] WANG K, HAN L, MUSTAKIS J, et al. Kinetic and data-driven reaction analysis for pharmaceutical process development[J]. Industrial & Engineering Chemistry Research, 2019, 59(6): 2409-2421.
[10] TANG Q, LAU Y B, HU S, et al. Response surface methodology using Gaussian processes: Towards optimizing the trans-stilbene epoxidation over Co2+-NaX catalysts[J]. Chemical Engineering Journal, 2010, 156(2): 423-431.
[11] DOMAGALSKI N R, MACK B C, TABORA J E. Analysis of design of experiments with dynamic responses[J]. Organic Process Research & Development, 2015, 19(11): 1667-1682.
[12] WILSON Z T, SAHINIDIS N V. The ALAMO approach to machine learning[J]. Computers & Chemical Engineering, 2017, 106: 785-795.
[13] RAMSAY J O. When the data are functions[J]. Psychometrika, 1982, 47(4): 379-396.
[14] FIDALEO M. Functional data analysis and design of experiments as efficient tools to determine the dynamical design space of food and biotechnological batch processes[J]. Food and Bioprocess Technology, 2020, 13(6): 1035-1047.
[15] CROMBECQ K, LAERMANS E, DHAENE T. Efficient space-filling and non-collapsing sequential design strategies for simulation-based modeling[J]. European Journal of Operational Research, 2011, 214(3): 683-696.
[16] RAMSAY J O, SILVERMAN B W. Functional Data Analysis[M]. 2nd ed. New York: Springer New York, 2005.
[17] BETZ W, PAPAIOANNOU I, STRAUB D. Numerical methods for the discretization of random fields by means of the Karhunen-Loève expansion[J]. Computer Methods in Applied Mechanics and Engineering, 2014, 271: 109-129.
[18] RAMSAY J O, DALZELL C J. Some tools for functional data analysis[J]. Journal of the Royal Statistical Society: Series B (Methodological), 1991, 53(3): 539-561.
[19] SACKS J, WELCH W J, MITCHELL T J, et al. Design and analysis of computer experiments[J]. Statistical Science, 1989, 4(4): 409-423.
[20] MONTGOMERY D C. Design and analysis of experiments[M]. 9th ed. Arizona: John Wiley & Sons, 2017.
(Received: 2022-10-21; revised: 2023-01-10)
Sequential Modeling of Process Batch Response Based on
Functional Principal Component Analysis
LIU Yang-yang, LIU Fei
(MOE Key Laboratory of Advanced Control for Light Industry Processes, Jiangnan University)
Abstract   For modeling the batch response of biochemical batch processes with respect to operating conditions, a sequential modeling strategy based on functional principal component analysis (FPCA) is proposed in combination with design of experiments. First, B-spline basis function smoothing transforms each discrete batch response sequence into a continuous response function curve. Then, FPCA is applied to obtain the mean curve, principal component functions, and principal component scores of the response functions. Finally, a Kriging model between the principal component scores and the operating conditions is constructed to predict the scores for any operating condition in the experimental region, thereby establishing a model of the batch response with respect to the operating conditions. To improve the prediction accuracy, sequential design is used to update the model iteratively according to an improved convergence condition. A simulated biochemical reaction network experiment verifies the effectiveness of the proposed strategy, and the simulation results show that it provides good data visualization and model interpretability.
Key words   functional principal component analysis, sequential design, batch response, experiment design, Kriging model, biochemical batch process