Unsupervised Linear Discriminant Analysis

  • Journal: Journal of Shanghai Jiaotong University (Science)
  • File size: 392 KB
  • Authors: TANG Hong, FANG Tao, SHI Peng-fei, TANG Guo-an
  • Affiliations: Inst. of Image Processing & Pattern Recognition, Shanghai Jiaotong Univ.; Three Gorges Project Command of Armed Police Hydropower Eng. Troops
  • Updated: 2020-11-22
Paper Overview

Journal of Shanghai Jiaotong University (Science), Vol. E-11, No. 1, 2006, 40-42
Article ID: 1007-1172(2006)01-0040-03

Unsupervised Linear Discriminant Analysis

TANG Hong^1 (唐宏), FANG Tao^1 (方涛), SHI Peng-fei^1 (施鹏飞), TANG Guo-an^2 (唐国安)
(1. Inst. of Image Processing & Pattern Recognition, Shanghai Jiaotong Univ., Shanghai 200030, China; 2. Three Gorges Project Command of Armed Police Hydropower Eng. Troops, Hubei 443133, China)

Abstract: An algorithm for unsupervised linear discriminant analysis (ULDA) is presented. Optimal unsupervised discriminant vectors are obtained by maximizing the covariance of all samples while minimizing the covariance of local k-nearest-neighbor samples. Experimental results show that the algorithm is effective.

Key words: linear discriminant analysis (LDA); unsupervised learning; neighbor graph
CLC number: TP 181   Document code: A

Received date: 2005-07-06
Foundation item: The High Technique Program of China (No. 2001AA135091) and the National Natural Science Foundation of China (No. 60275021)
E-mail: tanghong-sjtu@sjtu.edu.cn

Introduction

As a supervised learning algorithm, linear discriminant analysis (LDA) is widely used in many fields, such as face recognition, numeral recognition and information retrieval. The objective of LDA is to find a projection matrix A that maximizes the ratio of the between-class scatter S_b to the within-class scatter S_w [1]. In contrast, this paper presents an algorithm for unsupervised linear discriminant analysis (ULDA), in which the projection matrix A is obtained by maximizing the covariance of all samples while minimizing the covariance of local k-nearest-neighbor samples.

1 The Algorithm

Given M samples X = {x_1, x_2, ..., x_M}, where each sample is an N-dimensional vector, the algorithmic procedure is formally stated as follows.

Step 1 (constructing the k-nearest-neighbor graph). For the i-th sample, its k nearest neighbors N_i^k are selected and an edge is placed between the sample and each of its neighbors. Each edge has weight 1/k, so the weight matrix of the k-nearest-neighbor graph is

W_l = \{ w_{ij} \mid w_{ij} = 1/k \ \text{if}\ j \in N_i^k;\ w_{ij} = 0 \ \text{if}\ j \notin N_i^k;\ i \in [1, M] \}

Step 2 (calculating eigenvalues and eigenvectors). The k-nearest-neighbor graph of all samples is used to compute the Laplacian matrix

\tilde{L} = I - \frac{1}{M} e e^T - 2 (I - W_l)

where I is the M × M identity matrix and e is the M × 1 column vector whose elements are all 1. The eigenvalues and eigenvectors of the matrix X \tilde{L} X^T are then calculated.

Step 3 (constructing the optimal projection matrix). The projection matrix A is formed from the several eigenvectors of X \tilde{L} X^T that correspond to the leading eigenvalues. (A code sketch of these three steps is given below.)
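The three steps amount to a few lines of linear algebra. The following is a minimal NumPy sketch, not the authors' code: the function name `ulda`, the Euclidean metric for selecting neighbors, and the symmetrization before the eigendecomposition (the k-nearest-neighbor relation, and hence W_l, need not be symmetric) are our assumptions.

```python
import numpy as np

def ulda(X, k, d):
    """Sketch of unsupervised LDA.

    X : (N, M) array with one N-dimensional sample per column.
    k : number of nearest neighbors (Step 1).
    d : number of leading discriminant vectors to keep (Step 3).
    Returns the (d, N) projection matrix A.
    """
    N, M = X.shape

    # Step 1: k-nearest-neighbor graph with edge weights 1/k
    # (Euclidean distances; the metric is not specified in the paper).
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)   # (M, M)
    W = np.zeros((M, M))
    for i in range(M):
        neighbors = np.argsort(dist[i])[1:k + 1]   # skip the sample itself
        W[i, neighbors] = 1.0 / k

    # Step 2: Laplacian L~ = L_t - 2*L~_w, with L_t = I - (1/M) e e^T
    # and L~_w = I - W_l.
    e = np.ones((M, 1))
    L_t = np.eye(M) - (e @ e.T) / M
    L_w = np.eye(M) - W
    L = L_t - 2.0 * L_w

    # Eigendecomposition of X L~ X^T (symmetrized, since W need not be).
    S = X @ L @ X.T
    eigvals, eigvecs = np.linalg.eigh((S + S.T) / 2.0)

    # Step 3: projection matrix from the d leading eigenvectors.
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:d]].T
```

Projected samples are then obtained as `A @ X`, a d × M array; the experiments in Section 3 vary d, the number of retained leading eigenvectors.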
2 Justification

Let L be the number of classes and l_i (i = 1, 2, ..., L) the number of samples in the i-th class. The Fisher criterion function is

J(A) = \frac{A S_b A^T}{A S_w A^T} \tag{1}

The between-class scatter S_b and the within-class scatter S_w are given by

S_b = \frac{1}{M} \sum_{i=1}^{L} l_i m_i m_i^T - m_0 m_0^T \tag{2}

S_w = \frac{1}{M} \Big( \sum_{j=1}^{M} x_j x_j^T - \sum_{i=1}^{L} l_i m_i m_i^T \Big) \tag{3}

where m_i and m_0 are the mean vectors of the i-th class samples and of all samples, respectively. Moreover, the total population scatter is S_t = S_b + S_w [2].

Let S = S_b - S_w, which can be rewritten as

S = \frac{1}{M} \Big( 2 \sum_{i=1}^{L} l_i m_i m_i^T - \sum_{j=1}^{M} x_j x_j^T - M m_0 m_0^T \Big) = \frac{1}{M} \big( X L_t X^T - 2 X L_w X^T \big) \tag{4}

where L_t = I - \frac{1}{M} e e^T, L_w = I - W and W = \mathrm{diag}(W_1, W_2, \ldots, W_L), in which W_i (i ∈ [1, L]) is the l_i × l_i square matrix

W_i = \begin{pmatrix} 0 & 1/l_i & \cdots & 1/l_i \\ 1/l_i & 0 & \cdots & 1/l_i \\ \vdots & \vdots & \ddots & \vdots \\ 1/l_i & 1/l_i & \cdots & 0 \end{pmatrix} \tag{5}

Theorem 1. Assume that for all x ∈ R, t(x) = b(x) + w(x) ≥ 0, t(x) ≥ b(x) ≥ 0 and t(x) ≥ w(x) ≥ 0. Let J_1(x) = b(x)/w(x) and J_2(x) = b(x) - w(x). Then J_1(x) attains its maximum at a point x_0 if J_2(x) attains its maximum at x_0.

Proof. If w(x) = 0, then J_2(x) = b(x) = t(x) and J_1(x) = +∞. If t(x) ≥ w(x) > 0, then for any given t(x), J_2(x) = b(x) - w(x) = t(x) - 2w(x) is a monotone decreasing function of w(x), and so is J_1(x) = (t(x) - w(x))/w(x) = t(x)/w(x) - 1. Therefore, when J_2(x) reaches its maximum, J_1(x) also reaches its maximum at the same point. The two cases together prove the theorem.

Let J̃(A) = \frac{A S A^T}{A A^T}. According to Theorem 1, J(A) and J̃(A) reach their maximum at the same point. Since both L_t and L_w are real symmetric matrices, both X L_t X^T and X L_w X^T are positive semi-definite. In addition, X L_t X^T is the covariance matrix of the total sample set X [3], and X L_w X^T is the within-class covariance matrix. Therefore, maximizing J̃(A) means maximizing the covariance of all samples while minimizing the within-class covariance.

In the unsupervised case the within-class matrix L_w cannot be obtained, but the local variance matrix L̃_w is easy to calculate from the neighbor relations among the samples. Let L̃ = L_t - 2L̃_w with L̃_w = I - W_l, where W_l is the weight matrix of the k-nearest-neighbor graph. The idea of ULDA is thus to replace L_w by L̃_w in the unsupervised case; the projection matrix A is obtained by maximizing

J_{ULDA}(A) = \frac{A X \tilde{L} X^T A^T}{A A^T} = \frac{A X (L_t - 2 \tilde{L}_w) X^T A^T}{A A^T} \tag{6}

It can be seen from Eq. (6) that the maximum of J_{ULDA}(A) is obtained by maximizing the covariance of all samples (the L_t term) while minimizing the covariance of local k-nearest-neighbor samples (the L̃_w term).

3 Experimental Results

The UCI handwritten digit characters are employed to test our algorithm. The data set consists of 5 620 characters covering the 10 handwritten digits. To reduce computation time, we obtain 16 × 16-pixel images by sub-sampling the original 32 × 32 images. Two typical supervised algorithms are selected for comparison with ours: Fisher linear discriminant analysis (LDA) and direct linear discriminant analysis (DLDA). To distinguish our algorithm from these two in notation, we term both LDA and DLDA supervised linear discriminant analysis (SLDA). In the SLDA experiment, 100 samples of each digit are used for training and the rest for testing with the nearest-neighbor classifier.

It used to be difficult to compare a supervised and an unsupervised learning algorithm fairly. To compare ULDA and SLDA on equal terms, we designed a pseudo-clustering experiment, which resembles the nearest-neighbor classifier. First, the training data of the SLDA experiment are used to extract discriminant vectors with the ULDA algorithm. These discriminant vectors form the projection matrix, which is used to reduce the dimensionality of the original data set. Finally, pseudo-clustering is performed on all projected samples: each training sample serves as a clustering center, and every sample is assigned to its nearest clustering center. The "recognition rate" of a cluster is the ratio of the number of samples whose class is the same as that of the clustering center to the total number of samples in the cluster. Unlike in SLDA, no information about the classes of the training samples is available to ULDA while the discriminant vectors are extracted; only after the pseudo-clustering is finished are those classes used to check whether each clustering center belongs to the same class as the other samples in its cluster. A code sketch of this evaluation follows.
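This protocol can be sketched as follows. The sketch reflects our reading of the text, not released code: we assume the projected training samples act as the cluster centers, every remaining projected sample joins its nearest center, and the reported figure is the fraction of samples whose class matches that of their center, i.e. the cluster-size-weighted mean of the per-cluster rates defined above. The function name `pseudo_clustering_rate` is hypothetical.

```python
import numpy as np

def pseudo_clustering_rate(A, X_train, y_train, X_test, y_test):
    """Sketch of the pseudo-clustering evaluation.

    A       : (d, N) projection matrix of leading ULDA discriminant vectors.
    X_train : (N, M_train) training samples (columns); they become the centers.
    y_train : (M_train,) class labels of the centers, used only for scoring.
    X_test  : (N, M_test) remaining samples, with labels y_test.
    """
    centers = (A @ X_train).T                  # (M_train, d)
    samples = (A @ X_test).T                   # (M_test, d)

    # Assign every projected sample to its nearest clustering center.
    dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)

    # Labels enter only after clustering: a sample counts as recognized
    # when the center it joined carries the same class label.
    return float(np.mean(y_train[nearest] == y_test))
```

Calling this with A truncated to its leading d rows corresponds to the "dimensionality" axis of the tables below.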
In Tab. 1, the dimensionality is the number of selected leading discriminant eigenvectors, i.e., those corresponding to the largest eigenvalues. The recognition rate of ULDA is clearly lower than those of LDA and DLDA when the dimensionality is below five, but the gap between ULDA and SLDA shrinks steadily as the dimensionality grows. At a dimensionality of nine, the recognition rate of ULDA is slightly higher than that of LDA.

Tab. 1 Recognition rates (%) of the algorithms using several leading discriminant vectors

Dimensionality      1      2      3      4      5      6      7      8      9
LDA             27.32  52.97  73.81  82.12  85.89  88.72  90.30  90.61  91.82
DLDA            31.34  51.19  72.49  81.69  85.52  87.17  89.94  91.58  92.21
ULDA            13.33  30.20  51.13  72.94      –  86.84  88.90  89.98  91.99

(One ULDA entry is illegible in the source scan.)

As we know, the rank of the between-class scatter S_b is lower than the number of classes [4]. Therefore, the number of discriminant vectors corresponding to nonzero eigenvalues is also lower than the number of classes. Unlike in SLDA, however, the number of discriminant vectors of ULDA can exceed the number of classes. Tab. 2 lists the recognition rate of ULDA using more than nine leading discriminant vectors. The recognition rate of ULDA keeps increasing with the dimensionality and surpasses the best SLDA rate in Tab. 1.

Tab. 2 Recognition rates (%) of ULDA using more than nine leading discriminant vectors

Dimensionality     10     11     12     13     14     15     16     17     18
ULDA            92.53  92.99  92.90  93.57  93.53  93.75  93.62  93.98  94.07

4 Conclusion

The discrimination of the first several vectors of ULDA is worse than that of SLDA. However, as the dimensionality increases, the difference between ULDA and SLDA becomes smaller and smaller. When the dimensionality is larger than or equal to the number of classes, the recognition rate of ULDA still increases with the dimensionality. ULDA can obtain nearly the same discriminant vectors as SLDA provided that the dimensionality is equal to or slightly larger than the number of classes.

The essential difference between ULDA and SLDA is that ULDA is an unsupervised learning algorithm: its optimal discriminant vectors are obtained by maximizing the covariance of all samples while minimizing the covariance of local k-nearest-neighbor samples. Although the first several leading discriminant vectors of ULDA discriminate worse than those of SLDA, the effectiveness of ULDA increases with the dimensionality.

References

[1] Yu Hua, Yang Jie. A direct LDA algorithm for high-dimensional data: with application to face recognition [J]. Pattern Recognition, 2001, 34(10): 2067-2070.
[2] Yang J, Frangi A F, Yang J Y, et al. KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(2): 230-244.
[3] He Xiao-fei, Yan Shui-cheng, Hu Yu-xiao, et al. Face recognition using Laplacianfaces [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(3): 328-340.
[4] Liang Zhi-zheng, Shi Peng-fei. Kernel direct discriminant analysis and its theoretical foundation [J]. Pattern Recognition, 2005, 38(3): 445-447.
