Clustering analysis of telecommunication customers

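Journal: The Journal of China Universities of Posts and Telecommunications (English edition)
File size: 309 KB
Authors: REN Hong, ZHENG Yan, WU Ye-rong
Affiliations: School of Electronic Engineering; School of Computer Science and Technology
Updated: 2020-11-22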

Paper summary

Available online at www.sciencedirect.com
The Journal of China Universities of Posts and Telecommunications, April 2009, 16(2): 114-116
www.sciencedirect.com/science/journal/10058885
www.buptjournal.cn/xben

REN Hong^1, ZHENG Yan^2 (corresponding author), WU Ye-rong^2

1. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

Corresponding author: ZHENG Yan, E-mail: yanzheng@bupt.edu.cn
Received date: 11-07-2008
DOI: 10.1016/S1005-8885(08)60214-9

Abstract

In this article, a clustering method based on genetic algorithm (GA) for telecommunication customer subdivision is presented. First, the features of telecommunication customers (such as the calling behavior and consuming behavior) are extracted. Second, the similarities between the multidimensional feature vectors of telecommunication customers are computed and mapped as the distance between samples on a two-dimensional plane. Finally, the distances are adjusted to approximate the similarities gradually by GA. One advantage of this method is the independent distribution of the sample space. The experiments demonstrate the feasibility of the proposed method.

1 Introduction

Customer subdivision aims to increase the profits of telecommunication companies, such as for decisions to retain old customers, find potential customers, and prewarn churn customers. For example, with the help of customer subdivision, the market can be segmented properly. And correspondent discount packages and individuation marketing strategies for certain customer categories can be decided. Furthermore, customer subdivision is popular in areas such as finance, insurance and investment. It is helpful for understanding different demands of different customer groups in terms of making business decisions [1].

Customer subdivision is to divide the customers into different groups by using clustering methods based on key performance indicators (KPI) such as the calling time, the roaming time, average revenue per user (ARPU), etc [2]. The deployed data mining technologies, including clustering, classification and association analysis, k means, naive bayes, and decision tree, are commonly used algorithms [3].

2 Clustering based on GA

Clustering is an unsupervised classification using essential attributes. It aims to maximize homogeneity within the same category and maximize heterogeneity among different categories. The traditional clustering methods include fuzzy c-means (FCM) and c-means. Their performance might be poor for high dimensional data. The efficiency of the clustering is determined by the distribution of data to a certain degree. For example, FCM method performs well with a super sphere feature space but performs badly with a randomly shaped sample distribution [4].

For the clustering samples distributed in the randomly shaped feature space, a new clustering algorithm based on GA is presented in this article. The primary advantages of the proposed method include the independence of the sample distribution in the high dimensional feature space and the optimization of the clustering result. First, by building the similarity matrix, the essential relations among the samples are extracted. Then, the distances of samples which initially are randomly distributed on a two-dimensional plane are adjusted by using GA globally to approximate the similarities between the samples. GA is a global search algorithm [5], whereas FCM and c-means clustering algorithms are for local search, which is prone to trap into local minimums [6-7], especially when dealing with a large number of samples. To overcome the drawback mentioned above, an improved clustering algorithm based on GA is proposed in this article.

2.1 Similarity matrix

A similarity matrix measuring the similarities among samples can be built by such methods as quantity product, the cosine, the max-min, and the arithmetic average methods [8-9].

For the purpose of building a similarity matrix, the samples must be normalized to the range of [0,1] in the preprocessing steps. Suppose that the sample space $X = \{x_1, x_2, \ldots, x_n\}$. $\forall x_i \in X$, the feature vector is $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$. $x_{ik}$ denotes the kth attribute of the ith sample.

$$\bar{x}_k = \frac{1}{n} \sum_{i=1}^{n} x_{ik} \tag{1}$$

$$\sigma_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2} \tag{2}$$

Thus, the initial sample can be normalized as follows:

$$x'_{ik} = \frac{x_{ik} - \bar{x}_k}{\sigma_k} \tag{3}$$

$$x''_{ik} = \frac{x'_{ik} - x'_{\min k}}{x'_{\max k} - x'_{\min k}} \tag{4}$$

where $x'_{\max k}$ and $x'_{\min k}$ are the maximum and minimum values of $x'_{1k}, x'_{2k}, \ldots, x'_{nk}$.

The similarity matrix $(r_{ij})_{n \times n}$ is an $n \times n$ symmetrical matrix with 1 in the diagonal line [10]:

$$(r_{ij})_{n \times n} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1n} \\ r_{21} & 1 & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & 1 \end{pmatrix} \tag{5}$$

where $r_{ij}$ denotes the similarity measurement between samples i and j, commonly it is a nonnegative value. The closer or more similar the samples i and j are, the greater the value $r_{ij}$ is. The cosine method is as follows:

$$r_{ij} = \frac{\sum_{k=1}^{m} x_{ik} x_{jk}}{\sqrt{\sum_{k=1}^{m} x_{ik}^2} \, \sqrt{\sum_{k=1}^{m} x_{jk}^2}} \tag{6}$$

2.2 GA

By building the similarity matrix, the high dimensional and ...

$$E = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{n} \left| d_{ij} - r_{ij} \right| \tag{7}$$

where $d_{ij}$ denotes the Euclidean distance between samples i and j on a two-dimensional plane. The coordinates of the samples i and j are $(a_i, b_i)$, $i = 1, 2, \ldots, n$ and $(a_j, b_j)$, $j = 1, 2, \ldots, n$ respectively, and $d_{ij}$ is defined as:

$$d_{ij} = \left( (a_i - a_j)^2 + (b_i - b_j)^2 \right)^{1/2} \tag{8}$$

The smaller the error value is, the more fit the samples are. The fitness function is defined as:

$$f = \frac{1}{E + \alpha} \tag{9}$$

where $\alpha = 1$ for not to make the value too high.

2.3 Clustering based on GA

The proposed clustering algorithm based on GA for telecommunication customer subdivision is described as follows:

Step 1 Assign each feature vector of the telecommunication customer a pair of the coordinate value $(a_i, b_i)$ randomly on a two dimensional plane, where $a_i, b_i \in [0,1]$, $i = 1, 2, \ldots, n$.

Step 2 Compute the similarities to form the matrix $(r_{ij})_{n \times n}$ using Eqs. (1)-(6).

Step 3 Construct the initial population. Each pair of the coordinates is viewed as a gene and coded to an 8-bit binary value. Then, all the n genes are linked into a chromosome (also called individual), the length of which is L=8n bits. By different orders, randomly create N chromosomes to form an initial population S.

Step 4 Compute the fitness. First, compute the error value of each individual using Eq. (7); then compute the fitness using Eq. (9).

Step 5 Select parental individuals according to the roulette wheel selection strategy. In addition, an elitist strategy is used, i.e. the best individual of each generation is always copied into the succeeding generation. First, choose the individual with the maximum value of fitness as a parental individual. Then compute the selection probability of each rest individual $p_i = f_i / \sum_j f_j$. The accumulative probability is $q_i = \sum_{j=1}^{i} p_j$. Generate a random number r between [0,1). Select the first individual if r
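Steps 2 and 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the cosine method for the similarities, and it assumes the error of Eq. (7) is the sum of |d_ij - r_ij| over all pairs divided by 2n, which is a reading of a formula that is damaged in this copy. The function names are hypothetical, and feature vectors are assumed to be nonzero.

```python
import math


def cosine_similarity_matrix(samples):
    """Cosine similarity r_ij between every pair of feature vectors.

    Assumes every vector has a nonzero norm (hypothetical precondition).
    """
    norms = [math.sqrt(sum(v * v for v in x)) for x in samples]
    n = len(samples)
    r = [[1.0] * n for _ in range(n)]  # 1 on the diagonal
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(samples[i], samples[j]))
            r[i][j] = r[j][i] = dot / (norms[i] * norms[j])  # symmetric
    return r


def layout_error(coords, r):
    """Error of a 2-D layout, ASSUMED form: sum |d_ij - r_ij| / (2n)."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])  # Euclidean distance d_ij
            total += abs(d - r[i][j])
    return total / (2 * n)


def fitness(error, alpha=1.0):
    """Fitness f = 1 / (E + alpha), with alpha = 1 as stated in the text."""
    return 1.0 / (error + alpha)


if __name__ == "__main__":
    # Made-up normalized feature vectors, for illustration only.
    samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    r = cosine_similarity_matrix(samples)
    coords = [(0.1, 0.2), (0.9, 0.8), (0.5, 0.5)]
    print(fitness(layout_error(coords, r)))
```

Because the fitness is bounded above by 1/alpha, a layout with a smaller error always receives a larger fitness, which is what the selection step relies on.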

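The encoding of Step 3 and the selection of Step 5 might be sketched as follows. Two points are assumptions, not statements from the paper: the 8-bit gene is split into 4 bits per coordinate and scaled to [0,1] by dividing by 15, and an individual is chosen when the random number r falls below its accumulative probability, which is the usual roulette wheel rule (the source text breaks off before stating it). The names `decode` and `select_parents` are hypothetical.

```python
import bisect
import random
from itertools import accumulate


def decode(chromosome):
    """Turn a bit string of length 8n into n coordinate pairs (a, b).

    ASSUMPTION: each 8-bit gene holds 4 bits for a and 4 bits for b.
    """
    coords = []
    for g in range(0, len(chromosome), 8):
        a = int(chromosome[g:g + 4], 2) / 15  # scale 0..15 to [0, 1]
        b = int(chromosome[g + 4:g + 8], 2) / 15
        coords.append((a, b))
    return coords


def select_parents(population, fitnesses, count, rng=random):
    """Elitist roulette wheel selection of `count` parents."""
    # Elitism: the fittest individual is always kept.
    best = max(range(len(population)), key=fitnesses.__getitem__)
    parents = [population[best]]
    rest = [i for i in range(len(population)) if i != best]
    if not rest:
        return parents
    total = sum(fitnesses[i] for i in rest)
    # Accumulative probabilities q_i over the remaining individuals.
    q = list(accumulate(fitnesses[i] / total for i in rest))
    while len(parents) < count:
        r = rng.random()  # uniform in [0, 1)
        # First index whose accumulative probability exceeds r; the min()
        # guards against floating-point rounding in the last q value.
        k = min(bisect.bisect_right(q, r), len(rest) - 1)
        parents.append(population[rest[k]])
    return parents


if __name__ == "__main__":
    print(decode("0000111111110000"))
    print(select_parents(["A", "B", "C"], [0.2, 0.9, 0.4], 4))
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is convenient when comparing runs.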