Wenhui Hu(胡文慧), Jilei Hou(侯吉磊),?, Zhengping Luo(羅正平), Yao Huang(黃耀), Dalong Chen(陳大龍),Bingjia Xiao(肖炳甲),2, Qiping Yuan(袁旗平), Yanmin Duan(段艷敏), Jiansheng Hu(胡建生),3,Guizhong Zuo(左桂忠), and Jiangang Li(李建剛)
1Institute of Plasma Physics,Hefei Institutes of Physical Science,Chinese Academy of Sciences,Hefei 230031,China
2University of Science&Technology of China,Hefei 230026,China
3Key Laboratory of Photovoltaic and Energy Conservation Materials,Chinese Academy of Sciences,Hefei 230031,China
Keywords: multifaceted asymmetric radiation from the edge(MARFE)movement prediction,random forest,machine learning,Experimental Advanced Superconducting Tokamak(EAST)
High density is essential for the goal of practical fusion power since the fusion reaction rate is proportional to the square of plasma density in tokamak.[1]However,the plasma density has an upper limit and a degradation of plasma confinement or disruption during high density operation.[2]During high density operation, a multifaceted asymmetric radiation from the edge(MARFE)is often encountered and causes plasma disruption.[3–6]It is a toroidally symmetric,poloidally localized, strongly radiating region of high density and low temperature, typically seen on the high-field side or near the X-point of a tokamak.[7]And the formation of the MARFE is accompanied by rapid growth in the global radiation fraction.After the appearance of MARFE, current shrinkage is observed and then disruption is generally triggered.[8–10]
For the sake of improving plasma confinement and avoiding density limit disruption,a scheme of suppressing MARFE by modulating parallel electron thermal flux to increase edge transport barrier is proposed in Ref.[11] based on the model that radiative MARFE instability is caused by edge fluctuations.A few years later, experiments on TEXTOR-94 show that with suppression of unstable MARFE by reduction of recycling,the empirical density limit is overcome and a stationary plasma with density of 1.6× Greenwald limit is obtained.[6,12]While on Tore Supra, high plasma density of 1.4×Greenwald limit is achieved with stabilized MARFE in a high radiation and high recycling regime.[13]Experimental observations on HT-7 also show that stabilized MARFE is good for confinement at high density operation.[14,15]It will be quite beneficial to suppress MARFE movement to maintain high density near or above Greenwald density and avoid disruption for not only EAST but also future reactors,such as ITER and DEMO.In order to avoid or suppress MARFE movement,the first step is to predict and identify MARFE before it moves.[16]On TEXTOR-94, CII and CIII radiation signals near the inner wall are used for MARFE detection,[6]while on Frascati Tokamak Upgrade(FTU)real-time detection of MARFE is realized by processing images of the charge coupled device (CCD) camera.[17,18]Similar to FTU, fast visible camera images are used for automatic recognition of MARFE on JET.[19–21]Apart from directly identifying MARFE with diagnostic signals, the threshold of MARFE onset and the causes of MARFE movement have been investigated extensively with theory,modeling,and experiments.In Ref.[22],the gradient of electron temperature is regarded as the driver of MARFE instability.From a study of Ref.[23], MARFE becomes unstable and moves when radiation rate reaches a critical value.A linear instability analysis shows that higher wall desorption rate leads to higher MARFE density threshold.[24]From a two-dimensional nonlinear fluid model study, the MARFE is presented when there is sufficient radiative cooling.[25]Experiments on JT-60 show higher neutral beam power helps increase MARFE threshold.[26]In EAST’s pellet injection discharges, the MARFE becomes unstable and moves when the symmetry of heat flow on the two sides of it is broken.[27]The experiment on J-TEXT shows that localized current shrinkage happens before MARFE causes disruption.[9]
The above results show that there is no unified physical model to use single or few plasma parameters to give warnings of MARFE movement.In recent years, machine learning (ML) techniques are popular to predict different instability events for whom the physics mechanism is complex and to whom multiple physics processes and physics parameters are related,such as thermal quench and current quench of major disruptions,[28,29]vertical displacement events (VDE),[30]and tearing modes prediction.[31,32]Among these frequentlyused ML methods,random forest(RF)is a kind of model that has been applied for different tasks such as tearing modes prediction and avoidance,[33]resistive wall mode stability prediction[34]and disruption prediction.[35–38]Besides, from comparison of various ML models the RF model works best for the task of fault-detection for magnetic probes in terms of speed and accuracy.[39]Similar to RF,another ensemble learning method named LightGBM is used for cross-machine disruption prediction on J-TEXT and HL-2A.[40]Above all,what makes RF model more attractive is its interpretable characteristics, i.e., it is able to decompose the final output into the contributions of each input feature named feature contribution.By comparing the feature contributions, one can extract the main causes of the final output[41]which helps guide the control system to take corresponding actions with regard to the prediction results.This paper reports an RF model to predict MARFE movement on EAST.The rest of the paper is organized as follows.Section 2 gives a brief introduction of the RF model.Section 3 is about training data collection and labeling,and the prediction results on test discharges.Section 4 is the discussion of the work and future outlook about application of the MARFE movement prediction model.
The RF is a typical example of ensemble learning with the bootstrap aggregating(bagging)method.The basic idea of ensemble learning is to combine a collection of base learners and aggregate their results as final output to reduce the model errors.[42]For RF models, the base learner is the decision tree.[43]Each regression or classification decision tree is parallelly trained with a subset of samples randomly bagged from the whole training data set.These independent decision trees compose an RF model and the output of the RF model is the average of these decision trees’ outputs.For example,an RF model consists ofTdecision tree learners,then the RF outputYRFis calculated as the mean value of single decision tree’s output
The growth of a classification tree is the recursive process of splitting a father-node into two child-nodes until the maximum depth is reached or the child-node does not need to be split anymore, i.e., it only contains samples with the same label.If a node does not have a child-node it becomes a leaf of the tree.Consider a dataset ofNsamples and a sample consisting ofFfeatures(parameters)that are represented asX= [x1,x2,...,xi,...,xF].Figure 1 shows a parent node containingSsamples is split into two children nodes.HereDP={X1,X2,...,XS}andDP=DL+DR.
Fig.1.A parent node is split into two child nodes,the left node and the right node according to the splitting condition of whether the value of the i-th feature xi is greater or less-equal than the split value xspliti.The set of instances in parent is also split into two child nodes DP=DL+DR.
The splitting includes two steps: (i) find the feature for splitting and (ii) find the splitting value of this feature.Both steps are conducted with two cycles:find the splitting value of a feature that maximizes Gini gain, then find the feature that has maximum Gini gain.Here the Gini index of a set that hasKkinds of labels is defined as
wherepkis the proportion of samples with thek-th label in the dataset.A dataset having lower Gini index corresponds to a purer dataset.When a set has only one kind of sample, its Gini index is 0.After splitting from a parent node to two child nodes,the Gini gain is calculated using
In essence,the decision tree withMleaves divides the feature space intoMregions/subspaces.The structure of a tree can be represented by all the nodes and all routines to the leaf of it.Each node contains a set of samplesDnodewith sizeSnodeinside it.The value of a node of thek-th class is represented by the proportion ofk-th samples in this node
whereNkis the number of samples ofk-th class in the node.In this paper for the task of MARFE-movement prediction,there are only two kinds of labels positive and negative, so thatK= 2 and the value of a node can be represented as
From the interpretation method of RF model in Ref.[44],the interpretability of RF prediction is extracted from every splitting of decision trees.Inside a decision tree,the value of a node is the vector that contains proportions of the samples of different classes as shown in Eq.(4).Inputting a test sampleXtestto one of the decision trees of the RF model,before it travels to the leaf and gets its final output it will pass a series of nodes.Each time it passes from a parent node to a child node by the splitting performed over some featuref,the value change between parent node and children node is attributed to that feature and is calculated as the feature’s contribution at this split
By this way the contributions of each feature to the final output are extracted for one decision tree.For the whole RF model,feature contributions are the mean value of all of the decision trees
Now the output of the RF model can be represented as the following equation:
whereYtis the value of the root node oft-th tree.This method of RF model interpretation is applied for understanding the MARFE movement prediction results in the following section.
Disruption is one of the macro-instabilities of tokamak.During major disruption, plasma current decreases to zero instantly with the speed larger than~2 MA/s.Following current quench is huge thermal and mechanical load to the first wall.Due to enormous damage to the device, disruption remains one of the biggest challenges for tokamak.Direct causes of disruption include high density limit, lowqlimit, error filed caused locked-modes and plasma elongation caused vertical instability.Density limit disruption usually happens in high density conditions.When increasing plasma density close to or above Greenwald density limit, enhancement of edge radiation might cause plasma cooling and current shrinkage,and consequently plasma disruption.The target of this predictor is to identify the MARFE movement causing density-limit disruption while avoiding false alarms for nondisruptive plasmas and non-density-limit disruptive plasmas,i.e., plasma disruptions due to other causes instead of density limit.In order to train a MARFE movement predictor,density-limit disruptive discharges caused by MARFE movement are used to collect MARFE-unstable samples and nondisruptive discharges are used to collect MARFE-stable samples.Here MARFE-unstable samples refer to the samples close to MARFE movement and MARFE-stable samples refer to the samples far away from MARFE movement.During the 2022’s first campaign of EAST,as listed in Table 1,there are 43 densitylimit disruptive discharges due to MARFE movement,2322 non-density-limit disruptive discharges and totally 4297 non-disruptive discharges.
Table 1.Number of density-limit disruptive and non-density-limit disruptive discharges and non-disruptive discharges of the EAST first campaign of 2022.
Table 2.Experiment parameters of density ramp-up experiment of the first EAST campaign of 2022.Including power range of LHW,NBI and ECH,plasma current and plasma configuration.
The density ramp-up experiments are carried out with various conditions and 43 density-limit disruptive discharges due to MARFE movement are picked out from them.The main experiment parameters and their range is listed in the Table 2.It has to be noted that for all these density limit disruptive discharges, besides normal gas puffing, their density ramps up under pellet injection at the high field side and supersonic molecular beam injection is added for some of them.Their auxiliary heating power mixture is shown in Fig.2, including neutral beam injection (NBI), lower hybrid wave heating(LHW)and electron cyclotron heating(ECH).These dots are samples collected with a time interval of 100 ms from density limit disruption in these 43 density-limit disruptive discharges.A discharge is also often called as shot.
Table 3.Plasma parameters used as input features of RF model.
Fig.2.Distribution of NBI,LHW and ECH powers for all samples from density ramp-up discharges.A dot represents a sample of plasma condition that is heated with mixture of[PLHW,PECH,PNBI].
As normally applied in machine learning, two-thirds of these 43 discharges are used for training and the other one third of them are used for testing.For density-limit disruptive shots, 28 of them are randomly selected for training and rest 15 shots are for testing.For non-density-limit disruptive and non-disruptive, after deleting the shots with signal missing,totally 5280 shots are left.~3520 shots are randomly selected for training and~1760 are randomly selected for testing.The plasma parameters(features)used to build the training samples are listed in Table 3.Here,Wmhd,βp,βN,κandq95are from equilibrium reconstruction by EFIT.dWmhd/dtis time derivative ofWmhdand dβp/dtis time derivative ofβp.IperrNormandzerrNormrepresent normalized plasma current control error and normalized vertical displacement control error, respectively.The difference between actual values and target values indicates their control errors.Iperror is normalized withIptargetandzerror is normalized with minor radius calculated by EFIT.
The RF is a supervised learning method.The RF models are trained by samples with labels.In tokamak plasma abnormal events prediction research,the most frequently used ways to assign labels to samples are using a step function to divide them into two classes according to a classification time constantτclassor using a sigmoid function to assign continuous labels to them.[45–47]For the former labeling method, if denoting the abnormal event time astevent,then samples between[tevent?τclass,tevent]are labeled as positive and samples beforetevent?τclassare labeled as negative.In studies of disruption prediction on HL-2A[48,49]the classification time is 100 ms and on J-TEXT it is 50 ms.[40]In resistive wall mode prediction on NSTX the classification time is also set to 100 ms.[34]Similarly, for MARFE movement prediction, a classification time ofτclass~100 ms regarded to MARFE movement timetMARFEmoveis applied for sample labeling.For example, in Fig.3,#115858 is a typical density-limit disruption discharge whose MARFE appears at~8.4 s and starts to move after~8.5 s which is marked astMARFEmoveand leads to disruption at 9.3 s.
Training samples are taken from both density-limit disruptive, non-disruptive discharges and other disruptive shots.Their labeling is shown in Table 4.Positive samples are taken from 100 ms beforetMARFEmoveof density limit disruptive shots with a sampling interval of 1 ms.One part of negative samples is from time of more than 100 ms beforetMARFEmovewith a sampling interval of 10 ms.Another part of negative samples is taken from the rest of shots including nondisruptive and non-density-limit disruptive shots.With a sampling frequency of 1 kHz,~2800 positive samples are collected from the MARFE-unstable phase of density-limit disruptive shots.At the same time,the negative samples are collected with a sampling frequency of 100 kHz.Because the total number of collected negative samples is much greater than the number of positive samples, in order to avoid imbalance of positive and negative samples, only~5600 negative samples are randomly selected from the whole collected dataset.They are composed of~2800 randomly selected negative samples collected from MARFE-stable phase of densitylimit shots and~2800 negative samples taken from the flattop phase of non-disruptive and non-density-limit disruptive shots.In other words,positive samples correspond to MARFE-stable phase and negative samples correspond to MARFE-unstable phase.
After building the training dataset, the RF model is built with an open-source library scikit-learn (https://scikitlearn.org/stable/modules/generated/sklearn.ensemble.Random-ForestClassifier.html).As bootstrap sampling method is applied for RF training, only part of samples are randomly selected from the whole dataset to build each decision tree and the rest left-out samples which are also named out-of-bag samples are used to estimate the generalization score of the tree.The first parameter that needs to be tuned is the number of decision trees that consist of the RF model.By scanning the number of trees,the mean out-of-bag score is recorded in Fig.4.By increasing the number of decision trees, the outof-bag score increases from 0.8 to 0.99.The out-of-bag score changes little after the number of trees reaches above 20 and it becomes stable when number of trees reaches 80.Therefore,with considering the computation resource,the number of decision tree is set to 100 for the RF models.Besides, in order to avoid overfitting,the minimum number of samples in a leaf that can be split is set to 10.In other words,a node with less than 10 samples will not be split anymore.
Table 4.Labeling of density-limit disruptive, non-disruptive and non-density-limit disruptive shots.Here positive labels are encoded as 1 and negative labels are encoded as 0.
Fig.3.(a) Time traces of XUV radiation signals for different measurement channels of discharge #115858.(b) XUV channels measurement coverage from top to bottom corresponding to the radiation signal of the same color.(c)The 1-D contour of radiation between time[8.4 s,9.3 s]reconstructed from XUV diagnostic.(d)CCD images of plasma at different time slices showing MARFE movements.
Fig.4.Out-of-bag score of RF models with increasing number of decision trees.
Fig.5.The structure of first three layers of one of the decision trees of the RF model.The pink line highlights a routine to a leaf that is classified as MARFE-stable with all the samples inside it are negative and the Gini index of it is zero.
Fig.6.Feature importance of the input parameters of RF model calculated with the mean impurity decrease method.The parameter that is used to split a dataset and results in larger impurity decrease after splitting is assigned with relatively higher feature importance.Standard deviation is used as their error bar.
Figure 6 is the feature importance of the RF model calculated through mean decrease impurity.As the number of test density limit disruption shots is only 15, to increase the generalizability of the results,the training and testing process is carried out repeatedly five times.At each round of training and testing, the training and testing shots are randomly selected from the shots listed in Table 1.The feature importances plotted in Fig.6 are mean values of five rounds’training and testing.It indicates that the most important feature for MARFE-move detection is GWfrac, which is reasonable because MARFE is closely related to density limit disruption.The relative important features are dβp/dt, dWmhd/dtandH98whileIperrNorm,zerrNormandκare relatively unimportant features.
After training,the RF model is tested with the test shots.For all testing shots,their parameters are taken the same way as the training shots including the time derivative parameters,except that all the input parameters for test shots are then interpolated to a sampling frequency of 1 kHz,so that a value of MARFE-risk can be outputted every 1 ms from the RF model,which is what should be like in real-time experiments.For a DL disruptive test shot if alarming conditions are satisfied and an alarm can be given out,then the alarm is regarded as a successful alarm (SA).While for non-disruptive and non-DL disruptive test shots the alarms for them are regarded as false alarm (FA).The SA rate is defined as ratio of number of SA shots and number of test DL disruptive shots.The FA rate is defined as ratio of number of FA shots and number of test non-disruptive and non-DL disruptive shots
whereNSAis the number of SA shots,NDL-disrupt is the number of density-limit disruptive test shots,Nnon-disrupt is the number of non-disruptive test shots andNnon-DL-disrupt is the number of non-DL disruptive test shots.
Figure 7 shows the SA rate and FA rate for different thresholds assigned to MARFE risk alarming.Similar to Fig.6, SA rate, mean SA time and FA rate plotted in Fig.7 are mean values of five rounds’training and testing.In Fig.7,with increasing alarm threshold, FA rate decreases while SA rate and mean SA time decrease as well.The SA time is the time when an alarm is given before MARFE movement.This is reasonable, because with a higher alarm threshold, alarm condition is harder to meet and SA and FA rates get lower together.For alarm threshold of 0.4, the SA rate is~95%and FA rate is~4%.While for alarm threshold of 0.6,the SA rate is~80%and FA rate is~1%.To get a balance between SA rate and FA rate, the threshold can be set to 0.5.Under such conditions,the SA rate is~87%and FA rate is~2%so that most MARFE instability can be detected while false alarm rate is as low as can be acceptable for EAST experiment.
Fig.7.SA rate and FA rate with different alarm thresholds.Mean alarm time for SA shots.The success alarm time is calculated as SA time=tMARFEmove ?talarm.Mean SA-time is the mean of SA time for all SA shots.Standard deviation is used as their error bar.
When setting alarm threshold to be 0.5,the mean SA time in Fig.7 is shown to be~550 ms and the accumulation fraction of detected MARFE-movement for DL-disruptive shots is shown in Fig.8.It shows that around 85% of unstable MARFE(MARFE movement)can be detected at 40 ms before it,which is at least enough for disruption prediction mitigation with massive gas injection or shattered pellet injection.But a better way to handle MARFE instability is to suppress it and avoid disruption.
Fig.8.Accumulated fraction of MARFE-movement detected for densitylimit disruptive shots for different tMARFEmove ?talarm when alarm threshold is set to be 0.5.
The#113323 is a density-limit disruptive shot that is not used for training.The plasma current, GWfrac, dβp/dtandH98of this shot are plotted in Figs.9(a)–9(d).GWfrac, dβp/dtandH98are the three relatively important features of the RF model as shown in Fig.6.In Fig.9(b), GWfracincreases as density ramps up above Greenwald density.Until GWfracreaches~1.1, the sudden drop of GWfracat 6.28 s implies that MARFE movement happens which causes density diagnostic signal to drop suddenly and leads to the drop of GWfracsignal.As a consequence, in Fig.9(a), plasma current signal shows that disruption happens~300 ms after MARFE movement.Applying the RF model to it, the output value MARFE-risk is plotted in Fig.9(e).If setting the alarming conditions to be reaching alarming threshold of 0.5 and lasting for a duration longer than 10 ms, an alarming would be given attalarmas noted in Fig.9(e).Feature contributions in Fig.9(f) show that it is the dβp/dt(orange), dWmhd/dt(red)and GWfrac(purple)that mainly contribute to increase of MARFE risk at around~6 s.A more clear view of MARFE risk enhancement and the three main contributions of it are shown in Figs.9(g) and 9(h) that are the zoom in Figs.9(e)and 9(f) between time [5.8 s, 6.3 s].Figures 9(c) and 9(d)show that before MARFE movement, dβp/dt, dWmhd/dtandH98have an obvious decrease,which corresponds to MARFErisk increase of Fig.9(e).
Fig.9.Time traces of parameters of an example of a successful alarm shot.(a) The plasma current Ip, (b) GWfrac signal, (c) dβp/dt signal, (d) H98 signal of shot #113323.(e) The output of RF model which is represented as MARFE risk identifying the probability of instability MARFE.(f) The feature contributions that consist of the MARFE risk.(g) Zoom in of MARFE risk signal between time[5.8 s,6.3 s].(h)Zoom in of feature contribution signals between time[5.8 s,6.3 s].
Fig.10.The#115863 is an early warning shot.Panels(a)–(e)are time traces of its discharge signals,the RF model evaluated MARFE-risk and its feature contributions.Panels(f)–(j)are zoom-in images of left images.
Figure 8 shows that there are a few early warning shots whose alarming time is more than 1 s before MARFE movement.For example, #115863 is an early warning shot that reaches the alarming threshold first att2= 7.02 s which is~1.56 s before MARFE movement and is therefore regarded as early warning.In Figs.10(i)and 10(e),together with rapid enhancement of dβp/dtand dWmhd/dtfeature contributions,its MARFE-risk starts to increase fromt1.While the sharp drop ofH98feature contribution helps MARFE-risk maintains a low level below the threshold untilt2when a sudden increase ofH98feature contribution causes MARFE-risk quickly gets to the alarming threshold leading to early warning.Figure 10(c)shows that the increase ofH98feature contribution at around 7.01 s is caused by fast reduction of confinement factorH98from 0.8 to 0.7.Figures 10(g)and 10(h)show that the decline ofH98is a consequence of increase of plasma density as shown in Fig.10(b).Figure 10(a)shows that it is the opening of super-sonic molecular beam injection (SMBI) that results in rise of plasma density.Here the opening of SMBI is preprogrammed for the purpose of density ramp up experiment.
It indicates that besides plasma density,confinement and equilibrium parameters,other parameters such as radiation and temperature signals should be applied for MARFE movement prediction, in order to avoid early warning caused by sudden change of plasma density or confinement.
Prediction of MARFE movement is the first step towards actively handling density limit disruption and maintaining a steady-state high density operation.This paper reports an RF model to predict movement of MARFE for high density discharges on EAST.The model is able to classify MARFE movement with an accuracy of 85%–90%for density limit disruption discharges,while the false alarm for non-density-limit disruption and non-disruption discharges is below 5%.Except for the GWfrac, relatively important parameters for MARFE movement prediction are dβp/dt, dWmhd/dtandH98.The results show that it is feasible to use a machine learning model to forecast MARFE movement before density limit disruption while avoiding false alarms for discharges without unstable MARFE, which is helpful to design the high-density operation scenario for future fusion reactor.For instance,when the unstable MARFE is predicted, we can increase the auxiliary heating power or reduce the edge gas puffing to mitigate or avoid it.
On the other hand, density limit disruption will happen easily if the unstable MARFE is difficult to mitigate or avoid.Then some external means will have to be used to mitigate the disruption.At present, a major gas injection system and shattered pellet injection system have been built for disruption mitigation on EAST.And some experiments for disruption mitigation have been performed with these systems.Future work will be implemented the MARFE movement predictor into EAST plasma control system for real-time MARFE instability detection in high-density experiments.
However, it should be noted that this model in the paper is only trained with limited density limit data (totally 43 shots) from EAST now.Thus, future work will be explored using more density limit discharges data of EAST to increase model’s successful alarm rate.Meanwhile, besides the 9 parameters used in this model, it is worth investigating more effective parameters to avoid false alarm and early warning.What’s more,it is of significant importance to use crossmachine density limit data and cross-machine unified parameters to train the MARFE movement predictor so that the model can be generalized to be applied to future devices.
Acknowledgments
This work is supported by the National MCF Energy R&D Program of China (Grant Nos.2018YFE0302100 and 2019YFE03010003), the National Natural Science Foundation of China (Grant Nos.12005264, 12105322, and 12075285), the National Magnetic Confinement Fusion Science Program of China (Grant No.2022YFE03100003), the Natural Science Foundation of Anhui Province of China(Grant No.2108085QA38), the Chinese Postdoctoral Science Found (Grant No.2021000278), and the Presidential Foundation of Hefei institutes of Physical Science (Grant No.YZJJ2021QN12).