Improved Marquardt Algorithm for Training Neural Networks for Chemical Process Modeling

  • Journal: Tsinghua Science and Technology (清华大学学报)
  • File size: 706 KB
  • Authors: WU Jianyu (吴建昱), HE Xiaorong (何小荣)
  • Affiliation: Department of Chemical Engineering, Tsinghua University
  • Updated: 2020-11-11
Paper Overview

TSINGHUA SCIENCE AND TECHNOLOGY, ISSN 1007-0214, 04/22, pp. 454-457, Volume 7, Number 5, October 2002

WU Jianyu (吴建昱), HE Xiaorong (何小荣)**
Department of Chemical Engineering, Tsinghua University, Beijing 100084, China

Received: 2001-04-02; revised: 2001-12-03
** To whom correspondence should be addressed. Tel: 86-10-62784572; E-mail: hexr@tsinghua.edu.cn

Abstract: Back-propagation (BP) artificial neural networks have been widely used to model chemical processes. BP networks are usually trained with the generalized delta rule (GDR) algorithm, but the application of such networks is limited by the slow convergence of that algorithm. This paper presents a new algorithm that incorporates the Marquardt algorithm into the BP algorithm for training feedforward BP neural networks. The new algorithm was tested on several case studies and used to model the Reid vapor pressure (RVP) of stabilizer gasoline. The new algorithm converges faster and is much more efficient than the GDR algorithm.

Key words: neural network; Marquardt algorithm; training

Introduction

Research on artificial neural networks (ANN) has made great progress during the past few years, and neural networks have been widely used in chemical processes. Among all kinds of networks, the back-propagation (BP) network is the most common choice because of its high capability for nonlinear mapping, learning, and classification. By adjusting the network weights according to samples, a BP network can simulate systems with complex nonlinear mapping relationships, such as chemical processes.

The most common method for training a BP network is the generalized delta rule (GDR) algorithm. It is one of the algorithms that decrease the objective function value by following the function gradient. Consider the problem

    min F(x).

The iterative equation of the GDR algorithm is

    x^{(k+1)} = x^{(k)} - \delta \nabla F(x^{(k)}).    (1)

The algorithm is simple and easy to program. However, it has some important drawbacks that limit the application of the network. First, the convergence speed of the first-order GDR is commonly very slow. Second, the training process is prone to stalling at a local minimum, and oscillation often appears, which makes the convergence process fail.

Many improved methods have been presented to accelerate and improve the convergence. The research falls roughly into two categories. The first category prevents oscillation by adjusting δ (as described in Eq. (1)) and the gradient direction dynamically[2]. The adjustment rules of this kind of method must be designed for particular conditions, so they sometimes work well but tend to fail under other conditions. The second category combines the GDR algorithm with other methods, such as the conjugate gradient method; switching to another algorithm during training can speed up the training process noticeably, but it is hard to control the switching between different algorithms efficiently.
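As a minimal illustration of the GDR update in Eq. (1), the following Python sketch applies one steepest-descent step to a toy objective. The function gdr_step, the example objective, and the step size delta are assumptions made for illustration, not code from the paper.

    import numpy as np

    def gdr_step(x, grad_F, delta=0.1):
        """One generalized-delta-rule (steepest-descent) step: x(k+1) = x(k) - delta * grad F(x(k))."""
        return x - delta * grad_F(x)

    # Example: minimize F(x) = ||x||^2, whose gradient is 2x.
    x = np.array([1.0, -2.0])
    for _ in range(100):
        x = gdr_step(x, lambda v: 2.0 * v)
    print(x)  # approaches the minimum at the origin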
In research on this problem, some algorithms have shown good application prospects by using information about the second partial derivatives[4]. But directly calculating the second partial derivatives precisely is so difficult for complex problems like ANN that a common approach is to approximate them by some other method, such as the Marquardt algorithm (abbreviated as the M algorithm in this paper).

In fact, training a BP-ANN is a typical nonlinear least-squares problem, and the M algorithm is one of the important methods for solving this kind of problem. It had not been applied to the training of ANN because a large intermediate matrix arises during the convergence process of the M algorithm, and it was difficult to apply the M algorithm to big networks with large numbers of weights under the hardware limitations of earlier computers. In addition, although much faster convergence is obtained, the algorithm is still derivative-based and tends to converge to a local minimum. Research on these methods has also been less active because in the past ten years great progress has been made on the genetic algorithm (GA) and simulated annealing (SA), both of which can skip local minima to some extent and converge in a direction close to the global optimum.

Recently, with the development of computers, bottlenecks such as the physical size of memory and the large program size have been overcome. Application of the M algorithm has become practical, and this paper presents a new method for training BP-ANN based on the M algorithm.

1 M Algorithm and Calculation of the Jacobian Matrix

Consider the common nonlinear least-squares problem

    min E(x) = F^T(x) F(x),    x = (x_1, x_2, ..., x_n)^T \in R^n.

The iterative equation of the M algorithm is given by[5, 6]

    x^{(k+1)} = x^{(k)} + \Delta x^{(k)},    (2)

where \Delta x^{(k)} is determined by

    ((J^{(k)})^T J^{(k)} + \lambda I) \Delta x^{(k)} = -(J^{(k)})^T F(x^{(k)}).    (3)

J^{(k)} is the Jacobian matrix of F(x) at x^{(k)}, and I is an identity matrix.

A three-layer network as shown in Fig. 1, with one hidden layer and only one output unit, satisfies

    min E = \sum_i (y^i - t^i)^2 = F^T F,
    F = (f_1, f_2, ..., f_{S_e})^T,
    f_i = y^i - t^i,    i = 1, 2, ..., S_e,

where t^i is the expected output value for sample i and y^i can be calculated with the following equations, which represent the forward transmitting process in the network:

    n_h^i(k) = \sum_j [w_h(k, j) \cdot x^i(j)] + b_h(k),
    o_h^i(k) = 1 / (1 + \exp[-n_h^i(k)]),
    n_o^i = \sum_k [w_o(k) \cdot o_h^i(k)] + b_o,
    y^i = 1 / (1 + \exp(-n_o^i)).

The elements of the Jacobian matrix for each iteration are given by

    \partial f_i / \partial w_o(k) = y^i (1 - y^i) \cdot o_h^i(k),
    \partial f_i / \partial b_o = y^i (1 - y^i),
    \partial f_i / \partial w_h(k, j) = y^i (1 - y^i) \cdot o_h^i(k) \cdot [1 - o_h^i(k)] \cdot w_o(k) \cdot x^i(j),
    \partial f_i / \partial b_h(k) = y^i (1 - y^i) \cdot o_h^i(k) \cdot [1 - o_h^i(k)] \cdot w_o(k).

Given initial weights, the network can then be trained with the M algorithm through Eqs. (2) and (3).

Fig. 1 Neural network model
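A compact sketch of one iteration of Eqs. (2) and (3) for the three-layer sigmoid network of Fig. 1, using the analytic Jacobian given above, might look as follows. This is not the authors' code; the function and variable names (marquardt_step, Wh, bh, wo, bo, lam) and the ordering of the parameters in the update vector are assumptions made for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(Wh, bh, wo, bo, X):
        """Forward pass: X is (S_e, R); Wh is (H, R); wo is (H,)."""
        Oh = sigmoid(X @ Wh.T + bh)      # hidden outputs o_h^i(k), shape (S_e, H)
        y = sigmoid(Oh @ wo + bo)        # network outputs y^i, shape (S_e,)
        return Oh, y

    def marquardt_step(Wh, bh, wo, bo, X, t, lam):
        """One Marquardt iteration; returns updated weights and the new sum-of-squares error."""
        Oh, y = forward(Wh, bh, wo, bo, X)
        F = y - t                        # residuals f_i = y^i - t^i
        dy = y * (1.0 - y)               # common factor y^i (1 - y^i)
        # Jacobian blocks; columns ordered as [wo, bo, Wh (row by row), bh]
        J_wo = dy[:, None] * Oh
        J_bo = dy[:, None]
        hid = J_wo * (1.0 - Oh) * wo[None, :]                      # y(1-y) o_h (1-o_h) w_o
        J_Wh = (hid[:, :, None] * X[:, None, :]).reshape(len(X), -1)
        J = np.hstack([J_wo, J_bo, J_Wh, hid])
        # Solve (J^T J + lambda I) dp = -J^T F, i.e. Eq. (3)
        dp = np.linalg.solve(J.T @ J + lam * np.eye(J.shape[1]), -J.T @ F)
        H = wo.size
        wo_new = wo + dp[:H]
        bo_new = bo + dp[H]
        Wh_new = Wh + dp[H + 1 : H + 1 + Wh.size].reshape(Wh.shape)
        bh_new = bh + dp[H + 1 + Wh.size :]
        _, y_new = forward(Wh_new, bh_new, wo_new, bo_new, X)
        return Wh_new, bh_new, wo_new, bo_new, float(np.sum((y_new - t) ** 2))

Here λ trades off between the Gauss-Newton direction (small λ) and the steepest-descent direction (large λ), which is the property exploited in Section 4.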
2 Case Studies and Result Analyses

2.1 Case 1

200 training and 70 checking samples are generated from the function y = 100 + 50 sin x. Both the M and GDR algorithms are used to train the network with the same initial weights. Figure 2 illustrates the error curves of the two algorithms; the error of the M algorithm clearly descends faster than that of the GDR. Table 1 compares the results of the two algorithms.

Fig. 2 Training error curves of GDR and Marquardt methods for Case 1

Table 1 Effect of Marquardt and GDR methods for Case 1

    Items                               Marquardt    GDR
    Number of iterations                ...          10 000
    Training time (s)                   ...          ...
    Ultimate value of error function    0.0765       ...

2.2 Case 2

200 training and 70 checking samples are generated randomly from a complicated nonlinear function of eight input variables that combines exponential, logarithmic, and trigonometric terms. Figure 3 illustrates the error curves of the network, whose topological structure is 8-10-1, when it is trained by the two algorithms. Table 2 shows the different training effects of the two algorithms.

Fig. 3 Training error curves of GDR and Marquardt methods for Case 2

Table 2 Convergence of Marquardt and GDR methods for Case 2

    Training error    CPU time for Marquardt (s)    CPU time for GDR (s)
    1.0               ...                           ...
    0.10              ...                           ...
    0.05              ...                           ...
    0.02              3.5                           161
    0.01              24.8                          > 330

3 Application to FCC Unit in Refinery

A model is presented to estimate the stabilizer gasoline Reid vapor pressure (RVP) for an FCC unit in a refinery[3]. Table 3 shows part of the sample data.

Table 3 Partial training samples for FCC unit

    Feed flow    Feed temp.    Bottom temp.    Reboiler vapor    Top temp.    Top pressure    Reflux flow    Temp.    RVP
    (t/h)        (°C)          (°C)            temp. (°C)        (°C)         (MPa)           (t/h)          (°C)     (kPa)
    80.0         140.0         165.0           170.0             54.0         9.0             24.0           33.0     41.0
    120.0        133.0         158.0           165.0             49.0         10.00           34.0           50.0     ...
    96.0         131.0         159.0           163.0             55.0         ...             25.0           60.0     ...
    90.0         130.0         156.0           161.0             53.0         10.50           24.0           36.0     64.0
    ...
    89.0         125.7         152.8           158.3             56.2         9.80            37.4           68.0     77.0
    ...          125.2         153.1           158.2             51.0         9.80            31.5           36.5     75.0
    ...          123.8         151.6           157.8             51.3         10.00           36.2           34.6     82.0

200 training and 76 checking samples are selected from the industrial data. The network, with a topological structure of 8-10-1, is trained by the M algorithm and the GDR algorithm with the same initial weights. Figure 4 illustrates the objective function error curves for the two algorithms. When the M algorithm is used, the error value descends to below 0.2 after 40 s. When the GDR is used, however, the error value is reduced to only 0.25 after 720 s and 10 000 iterations. The GDR error curve also shows a significant oscillation at the point where the error value is about 0.305, causing the error to rise to about 4 within two iterations before it gradually descends to the previous value. From this example we can conclude that the M algorithm gives better results and faster convergence than the GDR on the industrial data.

Fig. 4 Training error curves of GDR and Marquardt methods for the FCC unit model
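As a concrete, hedged illustration of the training runs compared above, the following sketch applies the marquardt_step function given after Section 1 to data generated as in Case 1 (y = 100 + 50 sin x). The sampling interval, the scaling of the targets into (0, 1) to suit the sigmoid output unit, the 1-10-1 topology, and the simple rule for adjusting λ are all assumptions for illustration, not details taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 2.0 * np.pi, size=200)        # 200 training samples (assumed range)
    t = (100.0 + 50.0 * np.sin(x) - 50.0) / 100.0      # scale y = 100 + 50 sin x into (0, 1)
    X = x.reshape(-1, 1)                                # one input unit

    H = 10                                              # assumed hidden-layer size
    Wh = rng.normal(scale=0.5, size=(H, 1))
    bh = np.zeros(H)
    wo = rng.normal(scale=0.5, size=H)
    bo = 0.0

    lam, err = 0.01, np.inf
    for _ in range(200):
        Wh_n, bh_n, wo_n, bo_n, err_n = marquardt_step(Wh, bh, wo, bo, X, t, lam)
        if err_n < err:   # successful step: accept it and trust the quadratic model more
            Wh, bh, wo, bo, err = Wh_n, bh_n, wo_n, bo_n, err_n
            lam *= 0.5
        else:             # failed step: keep the old weights and increase lambda
            lam *= 10.0
    print("final sum-of-squares error:", err)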
4 Improvement

λ in the M algorithm (as described in Eq. (3)) can be adjusted dynamically according to the results of the iterations. With this kind of adjustment, the M algorithm ensures that the error value is reduced at every iteration and that the convergence direction stays close to the Gauss-Newton direction with the highest probability. For Case 1, the network was trained with different initial weights by the M algorithm, and the error value in each run descended to 0.07-0.08. With the same network structure it is hard for the GDR to reach an error value as low as that of the M algorithm: once the error value has descended to about 3.0, the GDR reduces it by only about 0.0001 per iteration, which shows that any further decrease is almost impossible. Case 2 is similar: the error value of the M algorithm can descend to 0.001, while it is rather difficult for the GDR to descend to 0.01.

Though the M algorithm shows higher calculational efficiency, it sometimes leads the iteration process toward a local minimum with a high error value. As a result, λ is increased over several additional iterations to find a convergence direction that makes the error descend again, and the efficiency is greatly reduced. This defect is overcome by setting an upper limit on λ. When the actual value of λ exceeds the limit, the process of increasing λ is stopped and a new round of iteration begins. Although this increases the error value, the degree of increase is commonly so small that the error value rapidly goes down again after one or two further iterations. In effect, this improvement restarts the training from new values of the weights.

5 Conclusions

Using the M algorithm in the training process of ANN greatly improves the training speed and the calculational efficiency because the Jacobian matrix is used to approximate the second partial derivatives. Convergence reaches a super-linear rate, while the GDR algorithm achieves only a first-order rate. The examples show that the M algorithm converges 5-10 times faster than the GDR algorithm.

Nomenclature

    S_e        number of training samples
    y^i        calculated output value for sample i
    t^i        expected output value for sample i
    R          number of units in the input layer
    H          number of units in the middle (hidden) layer
    x^i(j)     input value of the j-th input unit corresponding to sample i
    w_h(k,j)   weight between the k-th hidden unit and the j-th input unit
    b_h(k)     threshold of the k-th hidden unit
    w_o(k)     weight between the k-th hidden unit and the output unit
    b_o        threshold of the output unit
    n_h^i(k)   summed input value of the k-th hidden unit corresponding to sample i
    o_h^i(k)   output value of the k-th hidden unit corresponding to sample i
    n_o^i      summed input value of the output unit corresponding to sample i

References

[1] Jiao Licheng. Theory of Neural Network Systems. Xi'an: Xidian University Press, 1992. (in Chinese)
[2] Chen Ming. Neural Network Models. Dalian: Dalian University of Technology Press, 1995. (in Chinese)
[3] Yao Xiaoli. Research on the Application of ANN to the Optimal Operation of Petrochemical Processes [Ph.D. Dissertation]. Beijing: Tsinghua University, 1993. (in Chinese)
[4] Hagan M T, Menhaj M B. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, 1994, 5(6): 989-993.
[5] ... Optimization Methods Used in ... . Beijing: Chemical Industry Press, 1992. (in Chinese)
[6] Chen Baolin. Algorithms and Theories of Optimization. Beijing: Tsinghua University Press, 1989. (in Chinese)
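The λ-capping idea described in Section 4 can be sketched as a small helper for the training loop shown earlier. The ceiling LAMBDA_MAX and the growth and reduction factors are assumed values chosen for illustration, not the paper's settings.

    # Hedged sketch of the Section 4 improvement: lambda grows after a rejected step,
    # but once it would exceed an assumed ceiling the step is accepted anyway and a
    # new round of iteration begins instead of inflating lambda further.
    LAMBDA_MAX = 1.0e6               # assumed upper limit on lambda

    def update_lambda(lam, err_new, err_old, grow=10.0, shrink=0.5):
        """Return (accept_step, new_lambda) for one Marquardt iteration."""
        if err_new < err_old:
            return True, lam * shrink    # good step: move back toward the Gauss-Newton direction
        if lam * grow > LAMBDA_MAX:
            return True, lam             # cap reached: accept the slightly worse step, start a new round
        return False, lam * grow         # otherwise retry the step with a larger lambda

    # Inside the loop sketched earlier:
    #   accept, lam = update_lambda(lam, err_n, err)
    #   if accept:
    #       Wh, bh, wo, bo, err = Wh_n, bh_n, wo_n, bo_n, err_n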
