
Kernel Factor Analysis Algorithm with Varimax

  • Journal: Journal of Southwest Jiaotong University (English Edition)
  • File size: 658 KB
  • Authors: Xia Guoen, Jin Weidong, Zhang Gexiang
  • Affiliations: School of Economics and Management; School of Electrical Engineering
  • Updated: 2020-12-06
Paper Overview

Vol. 14 No. 4    Journal of Southwest Jiaotong University (English Edition)    Oct. 2006
Article ID: 1005-2429(2006)04-0394-06

Kernel Factor Analysis Algorithm with Varimax

Xia Guoen (夏国恩)*, Jin Weidong (金炜东), Zhang Gexiang (张葛祥)
1. School of Economics and Management, Southwest Jiaotong University, Chengdu 610031, China
2. School of Electrical Engineering, Southwest Jiaotong University, Chengdu 610031, China

Abstract: Kernel factor analysis (KFA) with varimax was proposed by using the Mercer kernel function, which can map the data in the original space to a high-dimensional feature space, and was compared with kernel principal component analysis (KPCA). The results show that the best error rate in handwritten digit recognition obtained by kernel factor analysis with varimax (4.2%) was superior to that of KPCA (4.4%). KFA with varimax can therefore recognize handwritten digits more accurately.

Key words: Kernel factor analysis; Kernel principal component analysis; Support vector machine; Varimax; Algorithm; Handwritten digit recognition

Introduction

Factor analysis (FA) is a powerful technique for extracting structure from high-dimensional data sets. The objective of FA is to find a linear transform from a set of hidden common factors to the data. Compared with principal component analysis (PCA), FA may admit several solutions to a problem, and it can yield more straightforward and interpretable results. However, in applications of conventional FA, linear models are used almost exclusively. The popularity of the method has led to nonlinear relationships being overlooked and to a lack of motivation for developing a nonlinear FA. In addition, subject-matter theory suggests that modelling the nonlinear relationships between real variables with fitted nonlinear models is much more desirable than the use of ad hoc linear models [1].

To overcome these difficulties, nonlinear FA has been studied and several methods have been proposed, which can be divided into three types: (1) a nonlinear functional form exists between the factors, while a linear one exists between the parameters [2-4]; (2) the factors and parameters are formulated by nonlinear functions [1,5]; (3) no particular parametric functional form is assumed [6-10], e.g. principal curves, blind source separation, neural networks, and semiparametric dynamic FA. All three types share two shortcomings: (1) the computation is complex and inaccurate for multidimensional variables; (2) the methods cannot impose particular distributional forms on the unobservable factors and errors.

To solve these problems, we propose a kernel factor analysis (KFA) algorithm with varimax. It maps the input space to a feature space of higher dimension by using a Mercer kernel function, and then performs the kernel factor analysis with varimax in that feature space. A comparison between KFA and KPCA [11] in a simulation experiment on handwritten digit recognition reveals that KFA with varimax recognizes handwritten digits more accurately and can be an effective tool for studying pattern recognition.

Received: 2005-09-31
Foundation item: The National Defence Foundation of China (No. NEWL51435Q220401)
Biography: Xia Guoen (1977-), PhD candidate. His research interests are in management information systems and decision support systems.
* Corresponding author. Tel: +86-13980432542; E-mail: gandf007711@163.com

1 Mercer Kernel

We assume that (X, μ) is a finite measure space and k ∈ L_∞(X²) is a symmetric real-valued kernel satisfying Mercer's theorem [12]. The associated integral operator

    T_k: L_2(X) → L_2(X),  (T_k f)(x) = ∫_X k(x, y) f(y) dμ(y)                (1)

is positive; namely, for all f ∈ L_2(X),

    ∫_{X²} k(x, y) f(x) f(y) dμ(x) dμ(y) ≥ 0.

Let ψ_j ∈ L_2(X) be the normalized eigenfunctions of T_k associated with the eigenvalues λ_j > 0. Then [12]:
(1) (λ_j)_j ∈ ℓ_1;
(2) ψ_j ∈ L_∞(X) and sup_j ||ψ_j||_{L_∞} < ∞;
(3) k(x, y) = Σ_j λ_j ψ_j(x) ψ_j(y).

According to Ref. [12], it follows that k(x, y) corresponds to a dot product in ℓ_2, i.e. k(x, y) = φ(x)·φ(y), where

    φ: X → ℓ_2,  x ↦ (√λ_j ψ_j(x)),  j = 1, 2, ..., N_F,  x ∈ X.              (2)

All of the above quantities are defined in Ref. [12]. In fact, the uniform convergence of the series implies that, given ε > 0, there exists n ∈ N such that, even if the range of φ is infinite-dimensional, k can be approximated within accuracy ε as a dot product in R^n between images of

    φ^n: x ↦ (√λ_1 ψ_1(x), √λ_2 ψ_2(x), ..., √λ_n ψ_n(x)).
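To make Eqs. (1) and (2) concrete, the short numpy sketch below (not part of the original paper) builds the Gram matrix of a second-order homogeneous polynomial kernel k(x, y) = (x·y)², checks numerically that it is positive semidefinite, and verifies that it coincides with the dot products of an explicit feature map φ. The sample data and the particular kernel are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))          # 50 sample points in R^2

def poly2_kernel(A, B):
    """Second-order homogeneous polynomial kernel k(x, y) = (x . y)^2."""
    return (A @ B.T) ** 2

def phi(A):
    """Explicit feature map for (x . y)^2 on R^2:
       phi(x) = (x1^2, sqrt(2) x1 x2, x2^2), so k(x, y) = phi(x) . phi(y)."""
    x1, x2 = A[:, 0], A[:, 1]
    return np.column_stack([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

K = poly2_kernel(X, X)                # Gram matrix K_ij = k(x_i, x_j)

# Mercer's condition on the sample: K is symmetric positive semidefinite.
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-9          # no (numerically) negative eigenvalues

# The kernel trick: dot products of the explicit images reproduce K exactly.
assert np.allclose(K, phi(X) @ phi(X).T)
print("K is PSD; smallest eigenvalue =", eigvals.min())
```

The same check works for any Mercer kernel; only the explicit map φ, which is finite-dimensional here, changes with the kernel.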
2 Theoretic Derivation of the Algorithm

In FA, estimation of the factor loadings is usually done by one of two methods: maximum likelihood estimation or principal factor analysis. The latter is adopted in this paper.

Let x_i ∈ R^n, i = 1, 2, ..., l, be the points in the input space. For the kernel version of FA, first define a nonlinear mapping of the centered input data into the feature space, φ: R^n → F, and use the covariance matrix

    C = (1/l) Σ_{j=1}^{l} φ(x_j) φ(x_j)^T                                      (3)

in F (if F is infinite-dimensional, φ(x_j)φ(x_j)^T is understood as the linear operator that maps X ∈ F to φ(x_j)(φ(x_j)·X)). We need to find the eigenvalues λ ≥ 0 and eigenvectors v ∈ F such that

    Cv = λv.                                                                    (4)

Note that all the eigenvectors corresponding to nonzero eigenvalues lie in the span of the transformed data in the high-dimensional space. An equivalent relation can therefore be written as

    λ(φ(x_k)·v) = φ(x_k)·Cv,  k = 1, ..., l,                                    (5)

and the relationship between α_1, ..., α_l and v is

    v = Σ_{i=1}^{l} α_i φ(x_i).                                                 (6)

Combining Eqs. (4), (5) and (6) yields

    λ Σ_{i=1}^{l} α_i (φ(x_k)·φ(x_i)) =
        (1/l) Σ_{i=1}^{l} α_i Σ_{j=1}^{l} (φ(x_k)·φ(x_j)) (φ(x_j)·φ(x_i)),  k = 1, ..., l.   (7)

Further, we define the l×l kernel matrix K by K_ij := φ(x_i)·φ(x_j), i, j = 1, ..., l. Then Eq. (7) reads

    lλKα = K²α.                                                                 (8)

Because K is a symmetric matrix and the eigenvectors lie in the span of the transformed data in the feature space, the solutions of Eq. (8) can be obtained from

    lλα = Kα.                                                                   (9)

In addition, to center the sample vectors in the feature space, i.e. to make Σ_{i=1}^{l} φ(x_i) = 0, the centered vectors in the feature space are defined as

    φ̃(x_i) := φ(x_i) − (1/l) Σ_{m=1}^{l} φ(x_m).                               (10)

The modified kernel matrix is then [11]

    K̃_ij = φ̃(x_i)·φ̃(x_j) = (K − 1_l K − K 1_l + 1_l K 1_l)_ij,                (11)

where (1_l)_ij := 1/l.

By this definition, K̃ is positive semidefinite and has nonnegative eigenvalues, so we diagonalize K̃. Let λ_1 ≤ λ_2 ≤ ... ≤ λ_l denote the eigenvalues and α^1, α^2, ..., α^l the corresponding eigenvectors. We normalize α^k by requiring that the corresponding vector v^k in F be normalized, i.e. v^k·v^k = 1. From Eqs. (6) and (7), this gives the normalization condition for α^k:

    1 = v^k·v^k = Σ_{i,j=1}^{l} α_i^k α_j^k (φ̃(x_i)·φ̃(x_j)) = α^k·K̃α^k = λ_k (α^k·α^k).

Suppose T ≥ 85% is the accumulative variance contribution rate of the common factors, based on the 0.85 rule [13], and m is the number of common factors. If the m largest eigenvalues satisfy

    (Σ_{i=1}^{m} λ_i) / (Σ_{i=1}^{l} λ_i) ≥ T,

we obtain the factor loading matrix

    A = [√λ_1 v^1, √λ_2 v^2, ..., √λ_m v^m].                                    (12)

According to Eqs. (3), (11) and (12), the decomposition formula of the KFA model is

    Σ = AA^T + Ψ,

where Ψ represents the covariance matrix of the specific factors and is estimated through Ψ = diag(ψ_11, ψ_22, ..., ψ_ll). Note that ψ_ii = 1/s^{ii}, where S = (s^{ij}) = C^{-1}, i, j = 1, 2, ..., l.

The factor loadings in KFA are not unique, and substantial loadings of a variable on more than one factor may be negative or positive, which makes the results of standard KFA difficult to interpret. To overcome this difficulty, we perform a rigid rotation of the axes of the factor space in order to identify a simpler data structure that is easier to explain. One well-known method is varimax rotation [14]. According to its fundamental principle, each factor ends up composed of a few large loadings, normally achieved by an iterative maximization of a quadratic function of the factor loadings. It is worth noting that the varimax rotation aims at a sparse response to the data, which has been acknowledged as an efficient form of coding [15].

By rotating A with varimax we obtain a new decomposition formula:

    Σ = (AΓ)(AΓ)^T + Ψ,

where AΓ is the new factor loading matrix.
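Ref. [14] defines the varimax criterion but the paper does not spell out the iteration. The sketch below is one common SVD-based formulation of Kaiser's varimax, maximizing the variance of the squared loadings over orthogonal rotations; the function name and the small loading matrix used for the demonstration are our own illustrative choices, not the authors' code or data.

```python
import numpy as np

def varimax(A, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a loading matrix A (p x m) by the varimax criterion.

    Returns the rotated loadings A @ G and the orthogonal rotation G.
    """
    p, m = A.shape
    G = np.eye(m)
    var_old = 0.0
    for _ in range(max_iter):
        L = A @ G
        # SVD step of the classic varimax iteration: increase the variance
        # of the squared loadings within each column.
        U, s, Vt = np.linalg.svd(
            A.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0))))
        G = U @ Vt
        var_new = s.sum()
        if var_new - var_old < tol * var_new:
            break
        var_old = var_new
    return A @ G, G

# Small made-up loading matrix; varimax drives it toward a simpler structure,
# with each row loading mainly on one factor.
A = np.array([[0.7, 0.5],
              [0.6, 0.6],
              [0.1, 0.8],
              [0.2, 0.7]])
A_rot, G = varimax(A)
print(np.round(A_rot, 2))
```

Because G is orthogonal, (AΓ)(AΓ)^T = AA^T, so the rotation changes the interpretability of the loadings without changing the reproduced covariance.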
3 Procedure of the Algorithm

Based on the theoretical derivation above, the procedure of the algorithm is as follows:

(1) Normalize x_i in the input space, where x_i ∈ R^n, i = 1, ..., l.
(2) Build the kernel matrix K by selecting appropriate kernel functions and parameters.
(3) By Eq. (9), calculate the eigenvalues of K̃/l, i.e. λ_1 ≤ λ_2 ≤ ... ≤ λ_l, and the corresponding eigenvectors α^1, α^2, ..., α^l.
(4) Determine the value of m according to the 0.85 rule for the accumulative variance contribution rate. Because most of the variance information is concentrated in a few common factors in the feature space, KFA initially extracts more data information as common factors are added; once enough common factors are retained and the accumulated variance is large enough to reach the limit, the amount of extracted information tends to an equilibrium value.
(5) Calculate the factor loading matrix A = [√λ_1 v^1, ..., √λ_m v^m] according to Eqs. (11) and (12).
(6) Obtain the new decomposition formula Σ = (AΓ)(AΓ)^T + Ψ by using the varimax rotation.
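The numpy sketch below walks through steps (1) to (6) on made-up data. The Gaussian RBF kernel, its width, and the representation of the loading matrix through the eigenvectors of the centered kernel matrix are illustrative assumptions rather than the authors' implementation; the compact varimax is the same algorithm as in the earlier sketch.

```python
import numpy as np

def varimax(A, gamma=1.0, max_iter=100, tol=1e-6):
    # Compact varimax (same SVD-based iteration as the earlier sketch).
    p, m = A.shape
    G, var_old = np.eye(m), 0.0
    for _ in range(max_iter):
        L = A @ G
        U, s, Vt = np.linalg.svd(
            A.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0))))
        G, var_new = U @ Vt, s.sum()
        if var_new - var_old < tol * var_new:
            break
        var_old = var_new
    return A @ G

def kfa_varimax(X, sigma=1.0, T=0.85):
    """Steps (1)-(6) of the procedure, applied to the l rows of X."""
    l = X.shape[0]
    # (1) Normalize the inputs (zero mean, unit variance per coordinate).
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # (2) Kernel matrix; a Gaussian RBF kernel is assumed here.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    # Center in the feature space, Eq. (11).
    one = np.full((l, l), 1.0 / l)
    Kc = K - one @ K - K @ one + one @ K @ one
    # (3) Eigenvalues of Kc / l and eigenvectors, reordered largest first.
    lam, U = np.linalg.eigh(Kc / l)
    lam, U = np.clip(lam[::-1], 0.0, None), U[:, ::-1]
    # (4) Number of common factors m by the 0.85 rule.
    m = int(np.searchsorted(np.cumsum(lam) / lam.sum(), T) + 1)
    # (5) Loadings in the sample representation: column k is sqrt(lambda_k)
    #     times the k-th unit eigenvector, one finite-sample reading of Eq. (12).
    A = U[:, :m] * np.sqrt(lam[:m])
    # (6) Varimax rotation of the loadings.
    return varimax(A), m

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
A_rot, m = kfa_varimax(X)
print("common factors m =", m, "| rotated loading matrix shape:", A_rot.shape)
```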
4 Simulation Examples

We demonstrate our method on an artificial data set and compare two different kernels: a linear kernel and a second-order polynomial kernel.

Figs. 1 and 2 show the results of KPCA and of KFA with varimax in a linear kernel space. In this case the data set consists of three clusters centered on a line, and the deviation of any point in a cluster from its center is determined by additive Gaussian noise. The contours in these figures are lines of equal correlation in the feature space. For KPCA, most of the variance is captured by the first and second principal components. The first principal component aligns itself with the direction of maximum variance of the global data set, and the second is then orthogonal to the first; this does not take into account the fact that the data are grouped into three clusters. With KFA with varimax, the data can be explained in terms of common variance directions within the data clusters: in the rotation of the factor axes, not only the variances within the individual clusters but also the global variance of the full data set are considered.

Fig. 1  KPCA performed in a linear kernel space: (a) variance 0.340; (b) variance 0.011; (c) variance 0.000
Fig. 2  KFA with varimax rotation performed in a linear kernel space: (a) variance 0.339; (b) variance 0.012; (c) variance 0.000

Figs. 3 and 4 show the results of KPCA and of KFA with varimax in a second-order polynomial kernel space. The results illustrate another advantage of KFA with varimax: it provides a means for choosing the number of basis vectors required, that is to say, the components with negative values can be automatically discarded. All of the components in KPCA are positive, and a decision must be made to discard the smaller components on the "scree slope".

Fig. 3  KPCA performed in a second-order polynomial kernel space: (a) variance 0.225; (b) variance 0.025; (c) variance 0.003; (d) variance 0.000
Fig. 4  KFA with varimax rotation performed in a second-order polynomial kernel space: (a) variance 0.128; (b) variance 0.123

5 Pattern Recognition Experiment

We extracted nonlinear factors by KFA with varimax from a handwritten character database: the US Postal Service (USPS) database of handwritten digits [16]. This database contains 9 298 256-dimensional samples, of which 2 007 samples make up the test set. For convenience of computation, we used a subset of 2 500 samples to train the matrix K. To assess the utility of the factors, we trained a soft-margin hyperplane classifier, the support vector machine (SVM) [17]. Its specific character here is that it uses the standard dot product as the kernel function: it simply tries to separate the training data by a hyperplane with a large margin.

Tables 1 and 2 list the test error rates on the USPS handwritten digit database for KPCA and for KFA with varimax, respectively. The data illustrate an advantage of using KFA with varimax: the performance of a linear classifier trained on nonlinear factors with varimax is better than that of one trained on KPCA components. Among all degrees, the optimal kernel degree is 4, which conforms to SVM results on the same data set [18]. Moreover, in terms of test error rate, our classifier with 512 common factors (4.2%) is competitive with the convolutional five-layer neural network (5.0%) [19] and the KPCA classifier (4.4%) [11]. It is much better than linear classifiers operating directly on the image data (a linear SVM achieves 8.9% [18]).

Table 1  Test error rates (%) on the USPS handwritten digit database by KPCA

  Number of components | degree 2 |  3  |  4  |  5  |  6  |  7
   32                  |   8.9    | 8.0 | 8.7 | 9.3 | 9.4 | 11.1
   64                  |   7.2    | 6.8 | 7.0 | 6.9 | 7.4 |  7.8
  128                  |   6.0    | 5.7 | 6.2 | 6.0 | 6.1 |  7.0
  256                  |   5.5    | 5.3 | 5.2 | 5.3 | 5.4 |  5.6
  512                  |   5.0    |     | 5.2 | 4.7 |     |

  Note: Linear support vector machines were trained on nonlinear principal components extracted by PCA with polynomial kernels of degree 2 to 7.

Table 2  Test error rates (%) on the USPS handwritten digit database by KFA with varimax

  Number of components | degree 2 |  3  |  4  |  5  |  6  |  7
   32                  |   8.8    |     |     |     |     |
   64                  |   7.0    | 6.7 | 6.8 | 6.7 | 7.0 |  7.5
  128                  |   5.8    | 5.6 | 6.1 | 5.9 | 6.0 |  6.7
  256                  |   5.0    | 5.1 | 5.2 |     |     |
  512                  |   4.9    | 4.7 | 4.8 |     |     |

  Note: Linear support vector machines were trained on nonlinear common factors extracted by KFA with varimax.
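The USPS data and the authors' exact experimental settings are not reproduced here. As a rough stand-in for the protocol behind Tables 1 and 2, the sketch below uses scikit-learn's small 8x8 digits set, extracts kernel factor scores (an RBF kernel is assumed for illustration, whereas the paper used polynomial kernels of degree 2 to 7), and trains a linear soft-margin SVM on the scores. Error rates from this sketch are not comparable to the values in the tables.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import KernelCenterer
from sklearn.svm import LinearSVC

# Stand-in data: scikit-learn's 8x8 digits (the paper used 16x16 USPS digits).
X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# Kernel matrices between samples; gamma is an illustrative choice.
K_tr = rbf_kernel(Xtr, Xtr, gamma=1e-3)
K_te = rbf_kernel(Xte, Xtr, gamma=1e-3)

# Center in feature space (Eq. (11)); KernelCenterer also handles test rows.
kc = KernelCenterer().fit(K_tr)
K_trc, K_tec = kc.transform(K_tr), kc.transform(K_te)

# Eigendecomposition of the centered training kernel; keep m factors (0.85 rule).
lam, U = np.linalg.eigh(K_trc)
lam, U = np.clip(lam[::-1], 0.0, None), U[:, ::-1]
m = int(np.searchsorted(np.cumsum(lam) / lam.sum(), 0.85) + 1)
alpha = U[:, :m] / np.sqrt(lam[:m])        # so that each v^k has unit norm in F

# Factor scores = projections onto v^1..v^m; a varimax rotation of this score
# matrix could be inserted here, as in the earlier sketches.
Z_tr, Z_te = K_trc @ alpha, K_tec @ alpha

# Linear soft-margin SVM on the extracted factors, as in Section 5.
clf = LinearSVC(C=10.0, max_iter=10000).fit(Z_tr, ytr)
print("test error rate: %.1f%%" % (100 * (1 - clf.score(Z_te, yte))))
```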
Newralis shown that when KFA is used in conjunction with aComputation, 1998( 10): 1299-1319.varimax rotation in the experiment, more humanly[12] Scholkopf B, Mika s, Burges C, et al. Input space ver-understandable structure is identified. This is becausesus feature space in kernel-based methods [J]. IEEEa varimax rotation has a great advantage for a simpleTransations on Neural Networks, 1999, 10(5): 1000-or sparse structure.1017.[13] TuHS, He P, Zhao L W. Application stistics[M].ReferencesChengdu: Southwest Jiaotong University Press, 1993:[1] Yalcin I, Amemiya Y. Nonlinear factor analysis as a sta.-190-191 (in Chinese).tstial method[J]. Saisical Science, 2001, 16(3): [14] KaiserHF. The varimax citerion for analytial rotation275- 294in factor analysis[J]. Psychometrika, 1958(23): 187-[2] Kenney D A, Judd C M. Estimating the nonlinear and in-teractive efects of latent variables [J]. Psychological[15] Field D. Wbat is the goal of sensory coding? [J]. Neu-Bulletin, 1984(96): 201- 210.ral Compuation, 1994(6): 555- 601.3] Joreskog K G. On the estimation of polychoric correla-[16] Smola A, Scholkopf B. Kernel machines[ DB/OL]. ht-tions and their asymptotic covariance matrix[J]. Psy-tp://www. kemnelmachinesorg/data, 2004-9-10.chometrika, 1994(59) : 381-389.[17] Cortes C, Vaprik V. Support vector networks[J]. Ma-[4] Wall M M, Amemiya Y. Generalized appended productchine Leamning, 1995, 20(3): 272 -297.indicator procedure for ftting polynomial structural models[18] Scolkopf B, Burges C, Vapnik V. Extracting support[J]. Educational and Beharioral Saistis, 2001(26): 1-29.data for a given task[A]. In: Fayyad U M, Uthurn-[5] Amemiya Y. On nonlinear factor analysis[A]. In: Pro-samy R, eds. First Intl. Conference on Knowledge Dis-ceedings of the ASA Social Statistics Section[ C]. Wash-covery and Data Mining[ C]. Menlo Park, CA: AAAIington, DC: American Statistical Association, 1993: 290Press, 1995: 262- -267.-294.[19] CunL, Boser B, Denker J S, et al. Backpropagation[6] Fodor I K. A survey of dimension reduction techniquesaplied to handwitten zip code recognitioa[J]. Neural[R]. Center for Applied Scientific Computing, LawrenceComputation, 1989, 1(4): 541-551.中国煤化工MHCNMHG
