Chenliang Liu, Yalin Wang, Chunhua Yang, and Weihua Gui
Dear Editor,
This letter proposes a multimodal data-driven reinforcement learning-based method for operational decision-making in industrial processes. Because feedstock properties and operating conditions fluctuate frequently in industrial processes, existing data-driven methods cannot effectively adjust the operational variables. In addition, multimodal data such as images, audio, and sensor data are still not fully used in industrial processes. To overcome the impact of feedstock fluctuations and to make effective use of the operational conditions captured by multimodal data, a new method named feedstock-guided multimodal actor-critic (FGM-AC) is proposed. The method incorporates the feedstock properties and multimodal data into the state space of a reinforcement learning (RL) framework to guide decision-making and to emulate the comprehensive perception of human operators. The effectiveness of the proposed method is verified via experiments conducted on actual industrial data. The results indicate its potential to provide accurate and dependable decision-making strategies.
The process industry plays a crucial role in the economic growth of modern society, encompassing steel, petroleum, chemicals, and other fields [1]. In process-industry production, the optimal decision-making of operating variables is crucial for enhancing product quality and yield. However, the decision-making process is often influenced by the experience levels of on-site workers, which can significantly impact the achievement of overall production goals [2], [3]. Moreover, because physical and chemical reactions occur in the production process, it is difficult to model the complex nonlinear relationships between operational variables and production metrics via mechanism analysis. Hence, optimizing operational variables remains a complex and daunting problem in process industries.
With the increasing availability of industrial data, data-driven decision-making methods that generate decision values for operating variables have become increasingly prevalent in industrial processes. The authors of [4] developed a supervised monitoring strategy to adjust the operational variables of the industrial grinding process based on changes in boundary conditions. However, the multimodal data collected from industrial processes, such as images, audio, and sensor data, may be incomplete due to uncontrollable factors. Latent factor analysis is effective in extracting inherent latent features from incomplete data. For instance, the authors of [5] propose a Kalman-filter-incorporated model for representation learning on incomplete temporal data. Reference [6] proposes a highly efficient model for representation learning on incomplete industrial data with temporal dynamics. The method in [7] extracts essential nonlinear features from incomplete temporal data with high computational efficiency.
In recent years, with the development of RL, the application of RL-based methods to industrial decision-making has been widely studied [8], [9]. A model-free RL algorithm therefore presents a promising solution for industrial processes. RL is an efficient approach to obtaining optimal decision-making policies in industrial processes, in which agents learn by interacting with environments that approach real-world complexity. It is noteworthy that in industrial processes, the optimal strategy for the operational variables is conventionally designed by engineers based on historical data and experience, resembling an expert system grounded in the knowledge of operators. Analogous to expert systems, RL has the potential to continuously enhance operational decision-making policies based on reward data that update the performance metrics function. This attribute makes the application of RL algorithms in industrial processes more reasonable.
The motivation of this letter is to develop an intelligent operational decision-making method that overcomes feedstock fluctuations and utilizes multimodal data in industrial processes. The main contributions of this letter are summarized as follows:
1) The multimodal data of the industrial process is utilized to enhance the adaptability of the operational decision-making strategy by fully simulating the overall perception of the operators at the industrial site.
2) To overcome the frequent fluctuations of feedstocks in the industrial processes, the feedstock conditions are introduced as the state space of the proposed algorithm to enhance its accuracy.
3) A dedicated reward function and state representation are designed to better handle the complexity and specific characteristics of multimodal data in the industrial process, which enhances the performance of the proposed RL framework.
Problem statement: The flotation process plays a significant role in the mineral processing of the process industry, which entails the separation of minerals from raw ores through physicochemical surface properties. The objective of the flotation process is to concentrate the valuable minerals from the raw ores by attaching the desired mineral particles to air bubbles. These air bubbles then ascend to the surface of the flotation cell and create a froth layer that contains the mineral concentrate. Then, the froth is collected and further processed.
To achieve effective flotation in the industrial process, it is necessary to adjust the operating variables in real time based on the working condition fluctuations. These operating variables include the slurry level, aeration, flotation agent, and agitation rate. In the current industrial process, the values of these operating variables are determined by operators based on their experience, with the aim of keeping the concentrate yield and grade within the target range. However, due to frequent changes in feedstock and operating conditions, manual selection of setpoints by operators is prone to errors, resulting in significant fluctuations in both concentrate output and concentrate grade. A potential solution to this issue is to replace setpoint selection based on manual experiential knowledge with intelligent decision-making strategies. Effectively implementing such strategies has the potential to significantly enhance the utilization value of raw ore and the overall efficiency of mineral processing.
Proposed operational decision-making method: In the industrial flotation process, the formulation of a rational program and the definition of states, rewards, and actions are fundamental to achieving an optimal global decision-making strategy. Inspired by the above analysis and the RL algorithm, a new method called the FGM-AC algorithm is proposed for effective operational decision-making in the flotation process. This operational decision-making strategy aims to obtain relatively optimal decision-making values of the operational variables to ensure that the concentration and grade of flotation froth remain within the desired range. Fig. 1 provides a visual framework of its application in the flotation process, which mainly includes the operational decision-making system and the production environment of the flotation process. As shown in Fig. 1, the multimodal data (green circle) derived from the industrial process and the feedstock conditions (gray circle) are input into the operational decision-making system. Then, the corresponding product quality and yield (blue circle) are fed back to the system to calculate the resulting rewards. Finally, the decision values of the operational variables are obtained using the proposed method.
Fig. 1. Operational decision-making framework based on the feedstock-guided multimodal actor-critic RL method.
Therefore, based on industrial process mechanisms and prior knowledge, the state space of the RL algorithm includes operational conditions, feedstock conditions, and the target grade of flotation froth. It is denoted as $s$, comprising the operational conditions $c$, the feedstock conditions $x$, and the target froth grade. In particular, industrial cameras and microphones are used to collect flotation froth images and audio from actual industrial sites to assist in operational decision-making.
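As a rough illustration of how such a state could be assembled, the sketch below concatenates the operational conditions, the feedstock conditions, and the target grade into one flat vector. The function name `build_state`, the placeholder values, and the assumption that froth images and audio have already been reduced to fixed-length feature vectors are all hypothetical; the letter does not specify the encoding.

```python
from typing import List, Sequence


def build_state(feedstock: Sequence[float],
                sensor: Sequence[float],
                image_features: Sequence[float],
                audio_features: Sequence[float],
                target_grade: float) -> List[float]:
    """Concatenate all inputs into one flat state vector s (hypothetical layout)."""
    # Operational conditions c: sensor readings plus image and audio features.
    operational = list(sensor) + list(image_features) + list(audio_features)
    # State s = [c, x, target grade].
    return operational + list(feedstock) + [float(target_grade)]


# Illustrative placeholder values only, not plant data.
s = build_state(feedstock=[0.31, 0.12],
                sensor=[25.0, 1.4],
                image_features=[0.7, 0.2, 0.1],
                audio_features=[0.05],
                target_grade=0.95)
print(len(s))  # 9
```

Keeping the target grade as the last entry makes it easy to change the production goal without touching the rest of the state.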
To achieve this, the reward function of the proposed FGM-AC algorithm is defined as
where $r$ is a nonpositive scalar function, $f_1(a|c,x)$ represents the concentration of flotation froth, and the remaining term is a penalty function.
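To make the structure of such a reward concrete, the sketch below shows one hypothetical nonpositive reward that combines a froth-concentration term with a grade penalty. The functional form, the normalization of concentration to a maximum of 1.0, the one-sided penalty, and the weight `penalty_weight` are assumptions for illustration and are not taken from the letter.

```python
def reward(concentration: float,
           grade: float,
           target_grade: float,
           max_concentration: float = 1.0,
           penalty_weight: float = 10.0) -> float:
    """Hypothetical nonpositive reward: concentration shortfall minus grade penalty."""
    # Shortfall is <= 0; it reaches 0 only at the maximum concentration.
    shortfall = min(concentration, max_concentration) - max_concentration
    # Penalty applies only when the froth grade falls below the target grade.
    penalty = penalty_weight * max(0.0, target_grade - grade)
    return shortfall - penalty


# Placeholder values, not plant data.
print(reward(concentration=0.8, grade=0.96, target_grade=0.95))
print(reward(concentration=0.8, grade=0.93, target_grade=0.95))
```

With this shape, the reward is zero only when the concentration is at its maximum and the grade specification is met, so any violation of the specification lowers the reward.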
The decision-making framework for operational variables, as presented in (2), can be transferred to (3) in the RL algorithm framework, which is given as
It is worth noting that, unlike other sequential decision processes, the decision-making of operational variables in this context is not sequential, since the variables are often interrelated and influenced by multiple factors. Therefore, the step size $T$ is selected to be one in each episode. An iterative approach is frequently employed to refine the decision-making policy, which can be characterized as a continuous process. Thus, the derivation of this policy is described as
where $\pi(a|s)$ is assumed to be a conditional Gaussian distribution.
Considering the high-dimensional and continuous nature of the state and action spaces involved in the optimal operational decision-making problem, an actor network is implemented as a neural network denoted as $\pi_\theta(a|s)$, where the parameter $\theta$ is used to approximate the Gaussian distribution. The actor network takes the state as input and produces the action as output.
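A minimal sketch of a Gaussian actor of this kind is given below, using only the Python standard library. It assumes a single hidden layer of 64 tanh units, a fixed standard deviation `sigma`, and random untrained weights; the class name `GaussianActor`, the helper functions, and these choices are hypothetical and are not the letter's implementation.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random weights for one layer; the last entry of each row is the bias."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def dense(weights: Matrix, x: Vector) -> Vector:
    """Affine map: dot(row[:-1], x) + row[-1] for every output unit."""
    return [sum(w * v for w, v in zip(row[:-1], x)) + row[-1] for row in weights]


class GaussianActor:
    """pi_theta(a|s) = N(mu_theta(s), sigma^2 I); the fixed sigma is an assumption."""

    def __init__(self, state_dim: int, action_dim: int,
                 hidden: int = 64, sigma: float = 0.1, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.w_hidden = init_layer(state_dim, hidden, self.rng)
        self.w_out = init_layer(hidden, action_dim, self.rng)
        self.sigma = sigma

    def mean(self, state: Vector) -> Vector:
        """Mean action mu_theta(s) from a tanh hidden layer and a linear output."""
        hidden = [math.tanh(h) for h in dense(self.w_hidden, state)]
        return dense(self.w_out, hidden)

    def sample(self, state: Vector) -> Vector:
        """Draw an action from the Gaussian centred on the mean action."""
        return [self.rng.gauss(m, self.sigma) for m in self.mean(state)]

    def log_prob(self, state: Vector, action: Vector) -> float:
        """Log-density of the action under the diagonal Gaussian policy."""
        const = math.log(self.sigma) + 0.5 * math.log(2.0 * math.pi)
        return sum(-0.5 * ((a - m) / self.sigma) ** 2 - const
                   for a, m in zip(action, self.mean(state)))


actor = GaussianActor(state_dim=9, action_dim=2)
state = [0.1] * 9             # placeholder state, not plant data
action = actor.sample(state)  # stochastic decision for two operational variables
print(len(action), actor.log_prob(state, action))
```

Sampling gives exploration during training, while `mean` can serve as the deterministic decision once the policy has been trained.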
In addition, the critic network $R_\varphi(a|s)$ with parameter $\varphi$ is used to estimate the reward generated by $\pi_\theta(a|s)$. The input of the critic network consists of the state and action. During the training process, the loss function of the critic network is defined as follows:
where $r(s,a)$ represents the actual reward of the production data. Then, $R_\varphi(s,a)$ can be replaced by $r(s,a)$ when the training accuracy is satisfied. Hence, the policy is updated as
Furthermore, integrating experience replay into the FGM-AC algorithm allows for repeated learning from experiential data, with benefits such as reduced costs, fewer trials and errors, and faster learning. In the experience replay method, a set of experiences consisting of the state, action, and immediate reward obtained during the interaction between the FGM-AC algorithm and the flotation production process is stored in the experience pool. By minimizing the loss function defined based on the criterion, the decision-making policy can be improved as
where $P$ denotes the experience replay pool. It should be noted that a batch gradient descent method is used to train the critic network. Subsequently, the loss function is reformulated as shown below:
Subsequently, the FGM-AC algorithm obtains the relatively optimal decision-making policy by iteratively updating the actor and critic networks according to (7) and (8) in an alternating manner. Finally, the optimal decision-making values of the operational variables are obtained from the actor network, denoted as
where $a^{*}$ represents the optimal decision-making values of the operational variables.
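The experience pool and the batch training of the critic described above can be sketched as follows. To stay short, a linear critic stands in for the critic network, the mean squared error between the predicted and actual reward is assumed as the loss, and the rewards are synthetic; the names `ReplayPool` and `LinearCritic` and all numeric settings are hypothetical.

```python
import random
from typing import List, Tuple

Experience = Tuple[List[float], List[float], float]  # (state, action, reward)


class ReplayPool:
    """Fixed-capacity pool P of (s, a, r) tuples; the oldest entry is dropped first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.data: List[Experience] = []
        self.rng = random.Random(seed)

    def add(self, state: List[float], action: List[float], reward: float) -> None:
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append((list(state), list(action), float(reward)))

    def sample(self, batch_size: int) -> List[Experience]:
        return self.rng.sample(self.data, min(batch_size, len(self.data)))


class LinearCritic:
    """R_phi(s, a) = phi . [s, a, 1]; a linear stand-in for the critic network."""

    def __init__(self, n_features: int) -> None:
        self.phi = [0.0] * (n_features + 1)

    def predict(self, state: List[float], action: List[float]) -> float:
        feats = list(state) + list(action) + [1.0]
        return sum(p * f for p, f in zip(self.phi, feats))

    def batch_loss(self, batch: List[Experience]) -> float:
        """Mean squared error between predicted and actual rewards."""
        return sum((self.predict(s, a) - r) ** 2 for s, a, r in batch) / len(batch)

    def update(self, batch: List[Experience], lr: float = 0.01) -> None:
        """One batch gradient descent step on the mean squared error."""
        grad = [0.0] * len(self.phi)
        for s, a, r in batch:
            feats = list(s) + list(a) + [1.0]
            err = self.predict(s, a) - r
            for i, f in enumerate(feats):
                grad[i] += 2.0 * err * f / len(batch)
        self.phi = [p - lr * g for p, g in zip(self.phi, grad)]


pool = ReplayPool(capacity=200)
gen = random.Random(1)
for _ in range(200):
    s, a = [gen.random()], [gen.random()]
    pool.add(s, a, -abs(s[0] - a[0]))  # synthetic nonpositive reward, not plant data

critic = LinearCritic(n_features=2)
loss_before = critic.batch_loss(pool.data)
for _ in range(500):
    critic.update(pool.sample(32), lr=0.05)
loss_after = critic.batch_loss(pool.data)
print(loss_before, loss_after)
```

In a full actor-critic loop, each critic step like this would alternate with an actor step that moves the policy toward actions the critic scores highly.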
Experiments and analysis: The proposed operational decision-making method based on FGM-AC is applied to an actual industrial flotation process. All experimental data sets are collected from the largest potassium chloride flotation plant of a mineral processing enterprise. A total of 223 data sets were collected, including the feed ore conditions, operational conditions, operational variables, and performance metrics. A detailed description of these variables is given in Table 1. The first 180 data sets were used for training, while the remaining 43 data sets were used for validation.
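The ordered split described above can be written in a few lines. The helper name `split_records` and the integer stand-ins for the records are hypothetical; only the counts (223 in total, 180 for training, 43 for validation) come from the text.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_records(records: Sequence[T], n_train: int = 180) -> Tuple[List[T], List[T]]:
    """Keep the original order: first n_train for training, the rest for validation."""
    return list(records[:n_train]), list(records[n_train:])


records = list(range(223))  # stand-in indices for the 223 collected data sets
train, val = split_records(records)
print(len(train), len(val))  # 180 43
```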
Table 1. Description of Data Sets in the Industrial Flotation Process
In the RL framework of the proposed FGM-AC algorithm, the state vector $s$ is composed of the feedstock conditions $x$, operational conditions $c$, and target flotation froth grade. The action vector $a$ is obtained from the proposed operational decision-making method based on the FGM-AC algorithm. The production goal of the industrial flotation process is to maximize the flotation froth concentration while meeting the flotation froth grade specifications. Hence, the reward function is designed as
Comparative experiments are designed to assess the effectiveness of the proposed method. Manual operations collected at industrial sites were used as a baseline for comparison. In addition, operational decision frameworks based on the deep Q-network (DQN) [10] and the standard actor-critic (AC) [11] are used as additional comparisons. For a fair comparison, all actor networks use three-layer neural networks comprising 64 hidden-layer neurons and are trained using a learning rate of 0.01.
The experimental results of the flotation froth performance metrics under the four comparison methods are presented in Table 2 and Fig. 2. Table 2 gives the minimum, maximum, and average values (in parentheses) of the performance metrics. Fig. 2 depicts the trajectories of the two performance metrics. It can be seen from Table 2 that the proficiency of on-site operators lies primarily in regulating the froth grade, while their control of froth concentration has no significant advantage. In contrast, the methods based on the RL framework, including DQN, AC, and FGM-AC, significantly improve froth concentration, which indirectly guarantees an increase in yield. Specifically, the proposed FGM-AC-based operational decision-making method increases the froth concentration by 8.51% and the froth grade by 1.43% compared to manual operation. Improving froth concentration while maintaining froth grade is usually difficult in actual industrial processes. Hence, this result also demonstrates the effectiveness of the proposed method in optimizing industrial processes.
Table 2. Comparison Results of Four Methods
Fig. 2. Comparison results of froth concentrate and grade.
Moreover, the trajectories of two operational variables are shown in Fig. 3. It is evident that the mixed mother liquor flow is set higher in the three operational decision-making methods based on the RL framework. Increasing the mixed mother liquor flow rate boosts the concentration of flotation froth, indirectly leading to an increase in froth production, which is consistent with the knowledge and experience of experts. Furthermore, the flotation pulp flow is kept relatively low compared to manual operation to prevent the loss of flotation froth.
Conclusion: This letter proposes a multimodal data-driven RL-based decision-making method for operational variables in industrial processes, which aims to mitigate the effect of feedstock conditions and exploit underutilized multimodal data. Specifically, a new FGM-AC algorithm is proposed to convert the operational variable decision-making problem into an RL problem. Compared to the existing algorithms, the proposed FGM-AC algorithm makes full use of the multimodal data of the industrial sites and has a more comprehensive perception ability. Finally, the experimental results using actual data of the industrial flotation process demonstrate its potential for guiding the production of industrial processes. Future work will focus on enhancing the security of online RL algorithms in industrial applications and extending this work to other industrial processes where multimodal data are available.
Fig. 3. Optimal operational variables of four methods.
Acknowledgment: This work was supported by the National Key Research and Development Program of China (2020YFB1713800), the National Natural Science Foundation of China (92267205), the Hunan Provincial Innovation Foundation for Postgraduate (CX2022 0267), and the Fundamental Research Funds for the Central Universities of Central South University (2022ZZTS0181).
IEEE/CAA Journal of Automatica Sinica, 2024, Issue 1