
Neural Network Pruning Algorithm with Penalty OBS Process

  • Journal: Journal of China University of Mining & Technology (English Edition)
  • File size: 227 KB
  • Authors: MENG Jiang, WANG Yao-cai, LIU Tao
  • Affiliation: School of Information & Electrical Engineering
  • Updated: 2020-11-11
Paper Overview

Mar. 2005    J. China Univ. of Mining & Tech. (English Edition)    Vol. 15 No. 1

Neural Network Pruning Algorithm with Penalty OBS Process

MENG Jiang, WANG Yao-cai, LIU Tao
School of Information & Electrical Engineering, China University of Mining & Technology, Xuzhou, Jiangsu Province 221008, China

Abstract: Aimed at the great computing complexity of the optimal brain surgeon (OBS) process, a pruning algorithm with a penalty OBS process is presented. Compared with sensitive and regularized methods, the penalty OBS algorithm not only avoids the time-consuming defect and low pruning efficiency of the OBS process, but also keeps higher generalization and pruning accuracy than the Levenberg-Marquardt method.

Key words: generalization; neural network; pruning algorithm; penalty method; optimal brain surgeon
CLC number: TP 183

1 Introduction

Nonlinear mapping is a key factor for neural networks (NN) to be applied widely. Given an adequately complicated topology, an NN can approach the outputs of the training set at any precision, but with quite large variance; on the contrary, if a simpler structure is designed to show the main trend of the objective data, small variance will be achieved with large bias, which is called the bias/variance trade-off. The pruning methods introduced in this paper can partly solve the problem of the inconsistent relation between generalization and training precision.

The two kinds of ideas in neural network pruning are sensitive methods and regularized methods. The former deletes particular weights with minimal influence on the error function one by one, until the stop criteria are met. Based on the optimal brain damage (OBD) method [1], the optimal brain surgeon (OBS) method [2], a "static" pruning process, deletes unimportant weights after training, and the training-pruning process may repeat several times when it deals with quite complex structures. This repeated process brings not only a high pruning rate, but also great computing complexity. The latter uses a "dynamic" strategy, which appends a penalty term to the error function in order to restrict the complicated topology of the NN. Correspondingly, the regularized methods, i.e. weight decay [3], weight elimination [4], and the hybrid method [5], have not achieved as good performance as the sensitive methods, for a fixed penalty expression cannot suit various problem models.

To improve the pruning efficiency of NN, this paper integrates the advantages of both kinds of methods and develops a pruning algorithm with a penalty optimal brain surgeon process, so that the penalty OBS algorithm not only overcomes the time-consuming defect and low pruning efficiency of the OBS process, but also keeps generalization and pruning accuracy.

2 Summary of OBS Algorithm

Before being pruned by the OBS process, the neural network must satisfy two approximations as follows [6].

1) Extremal approximation. Assume that the network is pruned only after training has converged, which means the weights have reached a local/global minimum in the error surface.

2) Quadratic approximation. Assume that the error surface around the local/global minimum is nearly quadratic.

Based on both approximations above, the Taylor series expansion of the objective function

\xi_{av}(w+\Delta w) = \xi_{av}(w) + g^{T}(w)\Delta w + \frac{1}{2}\Delta w^{T} H(w)\Delta w + \cdots

can be simplified with the following equation:

\delta\xi_{av}(\Delta w) = \xi_{av}(w+\Delta w) - \xi_{av}(w) = \frac{1}{2}\Delta w^{T} H(w)\Delta w,   (1)

where: \Delta w is the increment of w (the network weight vector); g(w) is the gradient of the error function with respect to w; H(w) = \partial^{2}\xi_{av}/\partial w^{2} is the Hessian matrix of the error function.

The OBS goal is to minimize Eq. (1) subject to the following condition:

u_{q}^{T}\Delta w + w_{q} = 0,   (2)

where u_{q}, corresponding to the scalar weight w_{q}, is the unit vector whose q-th element alone equals 1. So the optimization model of the OBS process is

\min \frac{1}{2}\Delta w^{T} H \Delta w,  s.t.  u_{q}^{T}\Delta w + w_{q} = 0,   (3)

and the Lagrangian S can be constructed as

S = \frac{1}{2}\Delta w^{T} H(w)\Delta w - \lambda(u_{q}^{T}\Delta w + w_{q}),   (4)

where \lambda is the Lagrangian multiplier. The best increment of w can be obtained by taking the derivative of Eq. (4):

\Delta w = -\frac{w_{q}}{[H^{-1}]_{qq}} H^{-1} u_{q},   (5)

where q is the index of the minimal S_{i}, which is defined as the "saliency" of w_{i}. The index q should be the value that makes the influence of deleting w_{q} on the error function \xi_{av}(w) the smallest, which is expressed as

q = \arg\min_{i} S_{i},  S_{i} = \frac{w_{i}^{2}}{2[H^{-1}]_{ii}}  (i = 1, ..., N),   (6)

where: H^{-1} is the inverse of the Hessian matrix H; [H^{-1}]_{ii} denotes the (i, i) element of H^{-1}; N equals the length of the weight vector w.

According to the statements above, the most crucial part of the OBS process is the calculation of the inverse Hessian. Hassibi [7-8] gave an iterative formula to solve this problem, but Stahlberger et al. pointed out that the OBS algorithm is as complex as O(n^{2}p), where n is the number of network weights and p is the number of training samples. An efficient way to reduce OBS complexity is to express H^{-1} in a simple and economical way.
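As a minimal sketch of Eqs. (5)-(6) (not part of the original paper), the following Python fragment computes the OBS saliencies from a given inverse Hessian, selects the weight with the smallest saliency, and forms the corresponding weight update. The function name and the use of NumPy with a dense inverse Hessian are illustrative assumptions.

```python
import numpy as np

def obs_prune_step(w, H_inv):
    """One OBS pruning step following Eqs. (5)-(6).

    w     : 1-D array of network weights (length N)
    H_inv : N x N inverse Hessian of the error function
    Returns the index q of the deleted weight and the weight increment dw.
    """
    # Saliency S_i = w_i^2 / (2 [H^-1]_ii)  -- Eq. (6)
    diag = np.diag(H_inv)
    saliency = w ** 2 / (2.0 * diag)
    q = int(np.argmin(saliency))  # weight with the least effect on the error

    # Optimal increment dw = -(w_q / [H^-1]_qq) * H^-1 u_q  -- Eq. (5)
    dw = -(w[q] / H_inv[q, q]) * H_inv[:, q]
    return q, dw
```

After applying the returned increment, the q-th component of w + dw is exactly zero, i.e. the selected weight is deleted while the remaining weights are adjusted to compensate.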
3 Process of Penalty OBS Algorithm

Considering that the Hessian matrix in Eq. (5) already appears in the network training process when second-order methods, i.e. Levenberg-Marquardt (LM), are used, we propose a new pruning algorithm that appends the OBS deleting condition to the error function as a penalty term during neural network training.

In order to adapt the OBS deleting condition to the NN training course, the extremal approximation will not be satisfied, and Eq. (1) should be modified to the following equation:

\delta\xi_{av}(\Delta w) \approx g^{T}(w)\Delta w + \frac{1}{2}\Delta w^{T} H(w)\Delta w.   (7)

Thus, the quadratic approximation also faces the danger of invalidation. To keep the quadratic approximation valid, we introduce the trust region method, whose model is

\min q^{(k)}(\Delta w) = \xi_{av}(w_{k}) + g_{k}^{T}\Delta w + \frac{1}{2}\Delta w^{T} H_{k}\Delta w,  s.t.  ||\Delta w|| \le h_{k},   (8)

where: g_{k}, H_{k} are the gradient and Hessian at w_{k}, respectively; q^{(k)}(\Delta w) is a reasonable approximation to \delta\xi_{av}(\Delta w) within ||\Delta w|| \le h_{k}; ||\Delta w|| denotes any kind of norm of \Delta w. Given the special norm ||\cdot||_{2}, Eq. (8) can be transformed into the model of the LM method.

Considering the weight-deleting condition of OBS,

u_{q}^{T}\Delta w + \eta w_{q} = 0,   (9)

the trust region model of the penalty OBS algorithm can be deduced as

\min q^{(k)}(\Delta w) = \xi_{av}(w_{k}) + g_{k}^{T}\Delta w + \frac{1}{2}\Delta w^{T} H_{k}\Delta w,  s.t.  \Delta w^{T}\Delta w = h_{k}^{2},  u_{q}^{T}\Delta w + \eta w_{q} = 0.   (10)

Similarly to Eq. (4), the Lagrangian of the penalty OBS model can be constructed as

S = g_{k}^{T}\Delta w + \frac{1}{2}\Delta w^{T} H_{k}\Delta w + \frac{\mu}{2}(\Delta w^{T}\Delta w - h_{k}^{2}) - \lambda(u_{q}^{T}\Delta w + \eta w_{q}),   (11)

where: \mu, non-negative, is the Lagrangian multiplier for the trust region constraint; \lambda is the Lagrangian multiplier for the OBS deleting constraint; \eta is a decay coefficient, in view of the changing importance of a weight during the training process. By setting the derivative of Eq. (11) to zero, we have

\frac{\partial S}{\partial\Delta w} = g_{k} + H_{k}\Delta w + \mu\Delta w - \lambda u_{q} = 0.   (12)

Thus, the equivalent equation of the penalty OBS model can be obtained:

(H_{k} + \mu I)\Delta w = -g_{k} + \lambda u_{q}.   (13)

Assuming a certain \mu that keeps H_{k} + \mu I (marked as H_{\mu}) positive definite, the weight updating formula of the penalty OBS algorithm can be obtained from the simultaneous Equations (9) and (13):

\Delta w = Decay_{penaltyOBS} + Newton_{LM} = \frac{u_{q}^{T} H_{\mu}^{-1} g_{k} - \eta w_{q}}{[H_{\mu}^{-1}]_{qq}} H_{\mu}^{-1} u_{q} - H_{\mu}^{-1} g_{k},   (14)

where: Decay_{penaltyOBS} is the weight-decay term of the penalty OBS process; Newton_{LM} is the search step of the LM method; and the index q can be computed by minimizing S_{i} with the following equation:

S_{i} = \frac{(\eta w_{i} - u_{i}^{T} H_{\mu}^{-1} g_{k})^{2}}{2[H_{\mu}^{-1}]_{ii}} - \frac{1}{2} g_{k}^{T} H_{\mu}^{-1} g_{k} = \frac{\eta^{2} w_{i}^{2} - 2\eta w_{i} u_{i}^{T} H_{\mu}^{-1} g_{k} + (u_{i}^{T} H_{\mu}^{-1} g_{k})^{2}}{2[H_{\mu}^{-1}]_{ii}} - \frac{1}{2} g_{k}^{T} H_{\mu}^{-1} g_{k}.   (15)
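The following Python sketch (an illustration under stated assumptions, not the authors' code) evaluates Eqs. (13)-(15) for one candidate deletion: it forms H_mu = H_k + mu*I, computes the penalty OBS saliencies, and returns the combined decay-plus-LM increment of Eq. (14). NumPy, a dense Hessian, and the function/variable names are assumptions introduced here.

```python
import numpy as np

def penalty_obs_step(w, g, H, mu, eta):
    """One penalty OBS update following Eqs. (13)-(15).

    w   : current weights (length N)
    g   : gradient of the error function at w
    H   : Hessian (or Gauss-Newton approximation) at w
    mu  : trust-region / LM damping coefficient
    eta : decay coefficient of the OBS deleting condition
    """
    H_mu_inv = np.linalg.inv(H + mu * np.eye(len(w)))  # H_mu^{-1} = (H_k + mu*I)^{-1}
    Hg = H_mu_inv @ g                                   # H_mu^{-1} g_k
    diag = np.diag(H_mu_inv)

    # Saliency of each weight, Eq. (15); Hg[i] equals u_i^T H_mu^{-1} g_k
    saliency = (eta * w - Hg) ** 2 / (2.0 * diag) - 0.5 * g @ Hg
    q = int(np.argmin(saliency))

    # Combined decay + LM step, Eq. (14), using the multiplier from Eq. (13)
    lam = (Hg[q] - eta * w[q]) / H_mu_inv[q, q]
    dw = lam * H_mu_inv[:, q] - Hg
    return q, saliency, dw
```

By construction the returned increment satisfies the deleting condition of Eq. (9), i.e. u_q^T dw = -eta * w_q, so the selected weight is decayed by the factor eta while the other weights follow a damped Newton (LM) step.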
To sum up, the basic procedure of the penalty OBS algorithm is as follows:

1) Build a neural network for the real problem, and confirm the training and test data for the problem.

2) Initialize the parameters of the algorithm, \mu and \eta, and the iteration index k = 1.

3) Calculate the error function \xi_{av}(w_{k}), the gradient g_{k} and the inverse of H_{\mu} (= H_{k} + \mu I) based on the LM method.

4) Obtain the index q by minimizing S_{i}, the saliency of each weight, by Eq. (15).

5) Compute the weight increment \Delta w with Eq. (14) if S_{q} is much smaller than the others; otherwise, calculate it with the LM step, \Delta w = -(H_{k} + \mu I)^{-1} g_{k}.

6) Check the validity of the weight update (w_{new} = w_{k} + \Delta w) through the new error \xi_{new}, and adjust the coefficient \mu. If \xi_{new} < \xi_{av}(w_{k}), set w_{k+1} = w_{new}; otherwise, return to step 3) for recalculation with a new value of \mu.

7) Judge whether the stop criteria are reached. If not, let k = k + 1 and return to step 3) to continue, resetting \mu to its starting value once some decayed weight falls below the pre-set threshold.

According to the simulation results, the decaying factor \eta works well when varying from 0.15 to 0.30.
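The procedure above can be read as a damped (LM-style) training loop with an occasional pruning step. The Python sketch below strings steps 2)-7) together under explicit assumptions: error_fn, grad_fn and hess_fn are user-supplied callables introduced here, and the "much smaller than the others" test, the \mu adjustment factors and the stop criterion are placeholder heuristics, not values given in the paper.

```python
import numpy as np

def train_with_penalty_obs(error_fn, grad_fn, hess_fn, w, mu=1e-2, eta=0.2,
                           max_iter=200, tol=1e-6):
    """Sketch of the penalty OBS procedure (steps 2-7); heuristics are assumptions."""
    mu0 = mu
    for k in range(max_iter):
        err = error_fn(w)                                # step 3: error, gradient, Hessian
        g, H = grad_fn(w), hess_fn(w)
        H_mu_inv = np.linalg.inv(H + mu * np.eye(len(w)))
        Hg = H_mu_inv @ g

        # step 4: saliencies per Eq. (15)
        saliency = (eta * w - Hg) ** 2 / (2 * np.diag(H_mu_inv)) - 0.5 * g @ Hg
        q = int(np.argmin(saliency))

        # step 5: crude placeholder for "S_q much smaller than the others"
        others_min = np.min(np.delete(saliency, q))
        if saliency[q] < others_min - 0.1 * abs(others_min):
            lam = (Hg[q] - eta * w[q]) / H_mu_inv[q, q]
            dw = lam * H_mu_inv[:, q] - Hg               # penalty OBS step, Eq. (14)
        else:
            dw = -Hg                                     # plain LM step

        # step 6: accept only if the error decreases, otherwise retry with larger mu
        if error_fn(w + dw) < err:
            w = w + dw
            mu *= 0.5                                    # relax damping (heuristic)
        else:
            mu *= 2.0                                    # increase damping and recompute
            continue

        # step 7: reset mu once a decayed weight drops below the threshold; stop on small error
        if abs(w[q]) < tol:
            mu = mu0
        if err < tol:
            break
    return w
```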
4 Illustration and Discussion

The illustrative example is the famous sunspot benchmark data from the year 1700 to 1979, well known from the time series analysis community. First we initialize a fully connected feedforward neural network with the topology parameters shown in Fig. 1a. The training and test sets are the data from the years 1700 to 1920 and from 1921 to 1979, respectively (Fig. 1b).

Fig. 1 Feedforward neural network model and data source: (a) fully connected network (input layer, hidden layer, output layer); (b) sunspot benchmark data (1700-1979).

Using the penalty OBS and OBS algorithms to train and prune this network, the final structures are shown in Fig. 2a and Fig. 3a, respectively. Noting that the OBS process cannot be applied to an NN without training, LM training is implemented before pruning. The error curves in Fig. 2b and Fig. 3b indicate that the former is more stable than the latter, on account of the dynamic pruning with the penalty term. The generalization of penalty OBS is approximate to that of OBS within the same number of iterations.

Fig. 2 Training and test result of the penalty OBS algorithm: (a) network structure; (b) train and test error.

On the other hand, the pruning rate of penalty OBS is a little smaller than that of OBS. The penalty OBS process needs to be modified to keep good accuracy and a simple structure and to extract rules for decision support systems. For example, the choice of an appropriate weight decaying factor \eta is one of the key directions for future modifications.

Fig. 3 The pruning result of OBS after LM training: (a) network structure; (b) train and test error of the OBS algorithm.

5 Conclusions

1) The penalty OBS algorithm overcomes the shortcomings of time consumption and low efficiency in the OBS process, which needs extra calculations.

2) The calculation of the penalty OBS process with the LM method is equivalent to that of LM, but the result of the former is better than that of the latter in the combination of both learning accuracy and generalization ability.

3) The penalty expression in the penalty OBS process can estimate the saliencies of all weights based on real training data, which makes it suitable for different problem models.

References

[1] Le Cun Y, Denker J S, Solla S A. Optimal brain damage. Advances in Neural Information Processing Systems 2. San Mateo: Morgan Kaufmann, 1990.
[2] Hassibi B, Stork D G, Wolff G J. Optimal brain surgeon and general network pruning. IEEE International Conference on Neural Networks, 1993, 1: 293-299.
[3] Hinton G E. Connectionist learning procedures. Artificial Intelligence, 1989, 40: 185-224.
[4] Weigend A S, Rumelhart D E, Huberman B A. Generalization by weight-elimination with application to forecasting. Advances in Neural Information Processing Systems 3. San Mateo: Morgan Kaufmann, 1991.
[5] Setiono R. A penalty-function approach for pruning feedforward neural networks. Neural Computation, 1997, 9(1): 185-204.
[6] Haykin S. Neural Networks: A Comprehensive Foundation, 2nd edition, translated by Ye S W. Beijing: China Machine Press, 2004, 156-159. (In Chinese)
[7] Hassibi B, Stork D G. Second order derivatives for network pruning: Optimal brain surgeon. Advances in Neural Information Processing Systems 5. San Mateo: Morgan Kaufmann, 1993.
[8] Hassibi B. Optimal brain surgeon: Extensions and performance comparisons. Advances in Neural Information Processing Systems 6. San Mateo: Morgan Kaufmann, 1994.
