Similar component analysis

  • Journal: Progress in Natural Science (English edition)
  • File size: 343 KB
  • Authors: ZHANG Hong, WANG Xin, LI Junwei, CAO Xianguang
  • Affiliations: Image Center, Astronautics School, BeiHang University; The 207th Institute of the Second Academy of China Aerospace Science Industry Corporation
  • Updated: 2020-11-22
  • Downloads:
Paper overview

PROGRESS IN NATURAL SCIENCE, Vol. 16, No. 12, December 2006
SHORT COMMUNICATIONS

Similar component analysis*

ZHANG Hong**, WANG Xin1, LI Junwei2 and CAO Xianguang1
(1. Image Center, Astronautics School, BeiHang University, Beijing 100083, China; 2. The 207th Institute of the Second Academy of China Aerospace Science Industry Corporation, Beijing 100854, China)
Received March 6, 2006; revised April 29, 2006

Abstract: A new unsupervised feature extraction method called similar component analysis (SCA) is proposed in this paper. The SCA method has a self-aggregation property: theoretically, the data objects move towards each other to form clusters through SCA, which can reveal the inherent pattern of similarity hidden in the dataset. The inputs of SCA are just the pairwise similarities of the dataset, which makes it easier to handle time series analysis despite the variable length of the time series. Our experimental results on many problems have verified the effectiveness of SCA in some engineering applications.

Keywords: clustering, feature extraction, similar component.

Data clustering [1] is an old problem in the pattern recognition and data mining community, since organizing observed data into groups or clusters is the first step to exploit coherent patterns and useful structures hidden in the dataset. Many data clustering methods have been proposed in the past several decades. However, some of them are only effective for specific cluster shapes. For example, k-means [2] is suitable for datasets having round clusters, while the Gaussian mixture model [3] is more effective at discovering elliptical clusters. Another fact that hinders the applications of the traditional clustering methods is that most of them are based on the dataset itself, which means that all the training data should be supplied as inputs. However, in some fields such as time series analysis, the data objects may have different dimensionalities, and it is hard for those traditional clustering methods to cluster this kind of data. For instance, in k-means the mean of the clusters is computed in each iteration step, but for a set of time series of different lengths we do not know what their mean is.

In recent years, a new clustering algorithm called spectral clustering [4,5] was proposed based on graph theory. The main idea of spectral clustering is first to embed the data points into some space in which the clusters are more "obvious", and then to perform a classical clustering algorithm, such as k-means. In such a way, impressively good results can be obtained on datasets with arbitrarily shaped clusters, where traditional clustering approaches would fail. However, the computation of the affinity matrix in spectral clustering is also dependent on the dataset itself.

Inspired by spectral clustering, a novel feature extraction method called similar component analysis (SCA) is proposed in this paper. The features extracted by SCA are called similar components. It is proved theoretically that if the clusters hidden in the dataset are non-overlapped, then the similar components make the data from the same class self-aggregate: the similar components of the data in the same class will have the same value, while the similar components of the data in different classes will have different values. Moreover, in this case we only need the pairwise similarities of the dataset as inputs, which makes our method easily generalized to some clustering problems that are difficult to solve with the traditional methods. For time series clustering, for example, there are many methods to measure the similarity of time series data no matter how long the time series are. Our experimental results verify the effectiveness of our proposed approach.

* Supported by the State Key Laboratory of Object Identity (Grant No. 51476010105HK0101)
** To whom correspondence should be addressed. E-mail: dmrzhang@buaa.edu.cn
1 Similar component analysis

1.1 Overview of the algorithm

Generally we can quantify the associations among data objects by a similarity metric, such as the Euclidean distance or the Mahalanobis distance. Our SCA algorithm starts with a similarity matrix S = (s_{ij}) with s_{ij} = s_{ji} >= 0, where s_{ij} measures the similarity between data i and data j. S is row-normalized such that \sum_j s_{ij} = 1. SCA then simply performs an eigenvalue decomposition of S:

    S v = \lambda v.   (1)

The k eigenvectors corresponding to the largest k eigenvalues (also called the k dominant eigenvectors) are called the first k similar components of the dataset. The procedure of SCA is as follows:

Step 1: Input the similarity metric d and the number of components k.
Step 2: Construct the similarity matrix S = (s_{ij}) = (d(i, j)).
Step 3: Normalize the rows of S so that \sum_j s_{ij} = 1.
Step 4: Do an eigenvalue decomposition of S.
Step 5: Output the k dominant eigenvectors as the first k similar components.
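The five steps above map directly onto a few lines of linear algebra. The following is a minimal NumPy sketch of the procedure, assuming the pairwise similarities have already been computed; the function name sca and the use of numpy.linalg.eig are illustrative choices, not details taken from the paper.

```python
import numpy as np

def sca(similarity, k):
    """Return the first k similar components of a dataset (Steps 3-5).

    similarity : (n, n) array of pairwise similarities, s_ij = s_ji >= 0
    k          : number of similar components to extract
    """
    S = np.asarray(similarity, dtype=float)
    # Step 3: row-normalize so that every row of S sums to one
    S = S / S.sum(axis=1, keepdims=True)
    # Step 4: eigenvalue decomposition of the row-normalized S (Eq. (1))
    eigvals, eigvecs = np.linalg.eig(S)
    # Step 5: keep the k eigenvectors with the largest eigenvalues
    order = np.argsort(-eigvals.real)[:k]
    return eigvecs[:, order].real        # (n, k) matrix of similar components
```

In the experiments of Section 2, the rows of this output are then clustered with k-means.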
1.2 Self-aggregation property of SCA: the ideal case

In this subsection the ideal case, in which there is no overlap among the clusters, is analyzed; the k-dimensional space spanned by the first k similar components is found to have an interesting self-aggregation property.

Proposition 1 (self-aggregation). When there are K non-overlapped clusters in the dataset, the first k similar components of the dataset are all piecewise constant, assuming the data objects within the same cluster are indexed consecutively. Moreover, in the space spanned by the first k similar components, all data of the same cluster self-aggregate to a single point.

Proof. The term "non-overlapped" refers to the case where s_{ij} = 0 if data i and data j belong to different clusters. Therefore S can be written as

    S = \mathrm{diag}(S_1, S_2, \ldots, S_K).   (2)

Here S is a block-diagonal matrix, because the data within the same cluster are assumed to have been indexed consecutively. Since the rows of S have been normalized, it follows from the Perron-Frobenius theorem [9] that 1 is the largest eigenvalue of S, and it can easily be verified that the column vector z_i = (0, \ldots, 0, e_i, 0, \ldots, 0)^T \in R^N, where i \in {1, 2, \ldots, K}, is an eigenvector of S corresponding to the eigenvalue \lambda_{max} = 1. Here e_i = (1, 1, \ldots, 1) is a row vector with dimensionality identical to the size of the i-th cluster. It can also be deduced that for any K real numbers (a_1, a_2, \ldots, a_K), z = \sum_i a_i z_i is again an eigenvector of S with eigenvalue \lambda_{max} = 1. Clearly, all the elements within the same cluster have the same value in z, which makes z a K-step function. That is to say, in the space spanned by the first K similar components, all the data objects belonging to the same cluster self-aggregate to the same point. Proposition 1 also explains why the features extracted by SCA are called similar components.
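Proposition 1 is easy to check numerically in the ideal case: for a block-diagonal similarity matrix of the form of Eq. (2), the rows of the first K similar components collapse to one point per cluster. The snippet below is a small self-contained check under those assumptions; the block sizes and similarity values are arbitrary illustrative choices.

```python
import numpy as np

# Ideal case of Proposition 1: two non-overlapping clusters,
# so S is block-diagonal as in Eq. (2).
S = np.zeros((6, 6))
S[:3, :3] = 0.5                               # within-cluster similarities, cluster 1
S[3:, 3:] = 0.8                               # within-cluster similarities, cluster 2
S = S / S.sum(axis=1, keepdims=True)          # row-normalize

vals, vecs = np.linalg.eig(S)
Z = vecs[:, np.argsort(-vals.real)[:2]].real  # first 2 similar components

# Every row of the same cluster maps to the same point in component space.
print(np.allclose(Z[:3], Z[0]), np.allclose(Z[3:], Z[3]))   # True True
```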
1.3 Connection to kernel PCA: extension to testing data

SCA has been introduced and analyzed in Sections 1.1 and 1.2. However, as a feature extraction method, SCA still cannot extract features from testing data. In this subsection, the connection between SCA and kernel principal component analysis is analyzed, and SCA is then extended to testing data [10].

It is well known that principal component analysis (PCA) is a statistical technique that can extract the most informative k-dimensional output vector y from an input vector x of dimension d (d >> k). Scholkopf et al. [10] generalized the traditional PCA to the nonlinear case and proposed the kernel principal component analysis (KPCA) method.

The KPCA method first maps the original dataset in R^d to some high-dimensional feature space F by a nonlinear mapping \phi, so that \Phi = [\phi(x_1), \ldots, \phi(x_N)] is the mapped data matrix. The main idea of KPCA is then to perform PCA in this mapped feature space. Assuming that the data objects have already been centered in the feature space (the non-centered case can be referred to Ref. [10]), the following equations compute the projections of the training and testing data on the k-th kernel principal component:

    p_i^{(k)} = \sum_j \alpha_j^{(k)} \, \phi(x_j) \cdot \phi(x_i) = \sum_j \alpha_j^{(k)} K_{ij},   (3)

    p^{(k)}(x) = \sum_j \alpha_j^{(k)} \, \phi(x_j) \cdot \phi(x).   (4)

Therefore, the projections of the training and testing data can be computed by an eigenvalue decomposition of K. The kernel matrix is an inner-product matrix, and the inner product of two data objects is usually used to measure the similarity between them. If we extend this idea and let the entries of the kernel matrix K be some arbitrary similarity metric that measures the pairwise similarity of the corresponding data objects, then the kernel matrix K becomes the similarity matrix of SCA. As analyzed in Proposition 1, the corresponding eigenvalues of the similar components are all 1. In this case, Eq. (3) becomes

    p_i^{(k)} = (K \alpha^{(k)})_i = 1 \cdot \alpha_i^{(k)} = \alpha_i^{(k)},   (5)

which is equivalent to SCA. Thus SCA is equivalent to non-centralized KPCA, and Eq. (4) can be used to compute the similar components of out-of-sample data.
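The practical value of the KPCA connection is the out-of-sample extension: once the dominant eigenvectors of the training similarity matrix are known, Eq. (4) projects new objects using only their similarities to the training objects. A hedged sketch follows; the function names and the decision to leave the test similarities unnormalized are assumptions made for illustration, not prescriptions from the paper.

```python
import numpy as np

def sca_fit(S_train, k):
    """Dominant eigenvectors (alpha) of the row-normalized training similarities."""
    S = S_train / S_train.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(S)
    return vecs[:, np.argsort(-vals.real)[:k]].real       # (n_train, k)

def sca_project(S_test, alpha):
    """Out-of-sample projection in the spirit of Eq. (4):
    p^(k)(x) = sum_j alpha_j^(k) * s(x_j, x)."""
    return np.asarray(S_test, dtype=float) @ alpha        # S_test: (n_test, n_train)
```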
1.4 Maximum within-cluster association in similar component space: the general case

The ideal case of SCA was analyzed in Section 1.2 and treated as a non-centralized KPCA in Section 1.3. In this subsection, SCA is analyzed in the general case from a KPCA viewpoint.

Proposition 2 (maximum within-cluster association). If we view the similarity matrix S as an inner-product matrix in some Hilbert space from the KPCA viewpoint, then the within-cluster association is maximized in the space spanned by the first K similar components of the dataset.

Proof. In the KPCA view, the similarity matrix can be treated as an inner-product matrix. Let \Phi denote the data matrix after the nonlinear mapping, so that the similarity matrix can be rewritten as S = \Phi^T \Phi, and let \Phi_i = [\phi(x)_1, \ldots, \phi(x)_{s_i}] represent the data from class i (with size s_i). Assume that the data are indexed consecutively, that is, \Phi = [\Phi_1, \ldots, \Phi_K], where K is the number of classes; it can easily be proved that the order of the data does not affect the final results. Then the within-class scatter matrix of class i is

    C_i = \frac{1}{s_i} \sum_{j=1}^{s_i} (\phi(x)_j - m_i)(\phi(x)_j - m_i)^T,   (6)

where m_i = \Phi_i e_i / s_i is the mean vector of class i, and e_i is a vector of dimension s_i with all elements equal to one. With some simple algebra we can get

    C_i = \frac{1}{s_i} \Phi_i \left( I_i - \frac{e_i e_i^T}{s_i} \right) \Phi_i^T,   (7)

where I_i is the identity matrix of order s_i. The total within-class scatter matrix can then be defined as C = \sum_i s_i C_i. It is well known that a common principle for data analysis is to minimize the trace of C, that is

    \min J = \mathrm{trace}(C) = \sum_i \left[ \mathrm{trace}(\Phi_i \Phi_i^T) - \frac{1}{s_i} e_i^T \Phi_i^T \Phi_i e_i \right].

In this way the data in the same class are distributed as tightly as possible [2]. Defining the block-diagonal matrix Q = \mathrm{diag}(e_1/\sqrt{s_1}, \ldots, e_K/\sqrt{s_K}), we obtain

    J = \mathrm{trace}(\Phi^T \Phi) - \mathrm{trace}(Q^T \Phi^T \Phi Q).   (8)

Now if the constraint on Q is relaxed to Q^T Q = I, then optimizing (8) becomes

    \max \ \mathrm{trace}(Q^T \Phi^T \Phi Q).   (9)

From the generalized Rayleigh-Ritz theorem [9], it can easily be derived that the solution of (9) is Q* = [q_1, q_2, \ldots, q_K] R, where q_1, q_2, \ldots, q_K are the K dominant eigenvectors of \Phi^T \Phi and R is an arbitrary orthogonal matrix. Recalling that S = \Phi^T \Phi, the conclusion can be drawn that the within-cluster association is maximized in the space spanned by the first K similar components of the dataset.

From Propositions 1 and 2 we can infer that the cluster structure hidden in the dataset becomes clear in the space spanned by the first K similar components. In the next section we provide a set of experiments to show the effectiveness of SCA.

2 Experimental results

2.1 2D synthetic data clustering

The synthetic dataset with 200 samples is depicted in Fig. 1. The traditional k-means, using the coordinates of the sample points as features, partitions the dataset into two clusters as shown in Fig. 1(a). The result is obviously not satisfying. As a comparison, our SCA method is also employed for this task using the Gaussian function

    s_{ij} = \exp\left( - \| x_i - x_j \|^2 / \sigma^2 \right)

as the similarity measure with some proper \sigma (determined experimentally), since Gaussian-function-based similarity has been successfully applied in many spectral-based clustering methods. After the similar components have been extracted, k-means is performed on them, and the result is shown in Fig. 1(b), from which we can clearly see that SCA outperforms k-means.

Fig. 1. 2D synthetic data clustering. (a) Clustering results by k-means; (b) clustering results by SCA + k-means.
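The experiments in this section all follow the same pipeline: build a Gaussian similarity matrix, extract the similar components, and run k-means on them. A compact sketch of that pipeline is given below; the value of sigma, the helper names, and the use of scikit-learn's KMeans are illustrative assumptions, since the paper only states that sigma was chosen experimentally.

```python
import numpy as np
from sklearn.cluster import KMeans

def gaussian_similarity(X, sigma):
    """s_ij = exp(-||x_i - x_j||^2 / sigma^2) for the rows of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / sigma ** 2)

def sca_kmeans(X, n_clusters, sigma=1.0):
    """Cluster the data in the space of its first n_clusters similar components."""
    S = gaussian_similarity(X, sigma)
    S = S / S.sum(axis=1, keepdims=True)                      # row-normalize
    vals, vecs = np.linalg.eig(S)
    Z = vecs[:, np.argsort(-vals.real)[:n_clusters]].real     # similar components
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
```

For the time series experiment of Section 2.2, the same pipeline applies, with the Gaussian function evaluated on precomputed Euclidean or BP distances instead of raw coordinates.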
2.2 Time series clustering

A real EEG (electroencephalogram) dataset, extracted from the 2nd Wadsworth BCI dataset in the BCI2003 competition, is used in our experiments [11]. According to Ref. [11], the data objects come from three classes: the EEG signals evoked by flashes containing targets, the EEG signals evoked by flashes adjacent to targets, and other EEG signals. All the data objects have an equal length of 144. We randomly choose 50 EEG signals from each class and use the Euclidean distance and the BP metric [12] to measure the pairwise distances of the time series. These distances are then transformed into pairwise similarities by a Gaussian function with some proper variance.

The clustering accuracies [5,8] achieved by hierarchical agglomerative clustering (HAC) [2] methods and by our approach (i.e. SCA + k-means) are compared, with the final cluster number set to 3 manually. The final results are given in Table 1, where "HACC", "HACS" and "HACA" denote the complete-linkage, single-linkage and average-linkage hierarchical agglomerative clustering methods, respectively. From Table 1 we can clearly see the advantage of SCA.

Table 1. Clustering results on EEG dataset
            HACC     HACS     HACA     SCA
Euclidean   0.4778   0.3556            0.5222
BP          0.456    0.3556   0.4222   0.5444

2.3 Color image segmentation

The SCA method is now applied to color image segmentation problems. The RGB (red, green, blue) values and the spatial coordinates of a pixel are used as its features, so a pixel is represented by a five-tuple (r, g, b, x, y). The similarity of two pixels is given by a Gaussian function. The segmentation results can be seen in Fig. 2, where Fig. 2(a) and 2(b) are the original images and Fig. 2(c) and 2(d) are the segmented images.

Fig. 2. Color image segmentation.

2.4 Face recognition

The ORL database [13] is selected as our experimental dataset. The recognition accuracy is tested with different numbers of training samples: t (t = 2, 3, 4) images of each subject were randomly selected for training, with the remaining 10 - t images of each subject used for testing. The Gaussian function is used to measure the similarity between two faces. The recognition results can be seen in Table 2, where the recognition accuracy achieved by the traditional PCA and kernel PCA methods is given for comparison.

Table 2. Recognition accuracy on ORL dataset
 t    PCA      KPCA     SCA
 2    0.7225   0.7713   0.7756
 3    0.7996   0.8511   0.8516
 4    0.8492   0.8990   0.8967

Since KPCA is a popular method for face recognition [14], we can see that the recognition results obtained by SCA approximate the KPCA recognition accuracy.

3 Conclusions and discussion

In this paper, a novel feature extraction method called similar component analysis (SCA) has been proposed. The SCA method has a self-aggregation property: theoretically, the data objects move towards each other to form clusters through SCA, which reveals the inherent pattern of similarity hidden in the dataset. The inputs of SCA are just the pairwise similarities of the dataset, which makes time series analysis easier despite the variable length of the time series. Several experiments have been presented, and the advantages of our method can be seen easily. Although we can apply the Gaussian function, there are still some unanswered questions, such as how to choose a proper similarity metric, which will be discussed in a subsequent paper.

References

1 Jain A. K. and Dubes R. C. Algorithms for Clustering Data, 1st ed. Englewood Cliffs, NJ: Prentice Hall, 1988, 1-200.
2 Duda R. O., Hart P. E. and Stork D. G. Pattern Classification, 2nd ed. New York: John Wiley & Sons, 2001, 512-540.
3 Bilmes J. A. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report, International Computer Science Institute, Berkeley, CA, 1998, 1-13.
4 Shi J. and Malik J. Normalized cuts and image segmentation. IEEE Trans. on PAMI, 2000, 22(8): 888-905.
5 Ng A. Y., Jordan M. I. and Weiss Y. On spectral clustering: analysis and an algorithm. In: Advances in Neural Information Processing Systems 14 (ed. Dietterich T. G., Becker S. and Ghahramani Z.), Cambridge, MA: MIT Press, 2002, 857-864.
6 Li C. A Bayesian approach to temporal data clustering using the hidden Markov model methodology. Ph.D. thesis, Vanderbilt University, 2000, 1-223.
7 ... In: Advances in Neural Information Processing Systems 9 (ed. Mozer M., Jordan M. and Petsche T.), Cambridge, MA: MIT Press, 1997, 648-654.
8 Zhong S. and Ghosh J. A unified framework for model-based clustering. Journal of Machine Learning Research, 2003, 4(12): 1001-1037.
9 Weisstein E. W. Perron-Frobenius theorem. http://mathworld.wolfram.com/Perron-FrobeniusTheorem.html [2006-4-27]
10 Scholkopf B., Smola A. and Muller K. Nonlinear component analysis as a kernel eigenvalue problem. Technical Report, Max-Planck-Institut, 1996, 1-18.
11 Lin Z. and Zhang C. Enhancing classification by perceptual characteristic for the P300 speller paradigm. In: Proceedings of the 2nd International IEEE EMBS Conference on Neural Engineering, Virginia, USA, March 16-19, 2005, 574-576.
12 Panuccio A., Bicego M. and Murino V. A hidden Markov model-based approach to sequential data clustering. In: Proceedings of the Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition, Windsor, Canada, August 6-9, 2002, 734-742.
13 Samaria F. and Harter A. Parameterisation of a stochastic model for human face identification. In: Proceedings of the Second IEEE Workshop on Applications of Computer Vision, Florida, USA, Dec. 19-23, 1994, 138-142.
14 Yang M. Kernel eigenfaces vs. kernel Fisherfaces: face recognition using kernel methods. In: Proc. IEEE International Conference on Automatic Face and Gesture Recognition, Washington, DC, USA, May 20-21, 2002, 215-220.
