Logistic Regression for Prediction and Diagnosis of Bacterial Regrowth in Water Distribution System

  • Journal: Transactions of Tianjin University (English Edition)
  • Authors: DONG Lihua, ZHAO Xinhua, WU Qing
  • Affiliation: School of Environmental Science and Engineering; People's Liberation Army 93756

Trans. Tianjin Univ. 2009, 15: 371-374
DOI 10.1007/s12209-009-0065-7
© Tianjin University and Springer-Verlag 2009

DONG Lihua (1), ZHAO Xinhua (赵新华) (1), WU Qing (吴卿) (1), YANG Youan (杨幼安) (2)
(1. School of Environmental Science and Engineering, Tianjin University, Tianjin 300072, China;
2. People's Liberation Army 93756, Tianjin 300131, China)

Abstract: This paper focuses on the quantitative expression of bacterial regrowth in a water distribution system. Considering the public health risks of bacterial regrowth, the experiment was performed on the distribution system of a selected area. Physical, chemical, and microbiological parameters such as turbidity, temperature, residual chlorine and pH were measured over a three-month period, and correlation analysis was carried out. Combined with principal components analysis (PCA), a logistic regression model is developed to predict and diagnose bacterial regrowth and to locate the zones with high microbiological risk in the distribution system. The model gives the probability of bacterial regrowth, with the number of heterotrophic plate counts as the binary response variable and three new principal components variables as the explanatory variables. The veracity of the logistic regression model was 90%, which meets the precision requirement of the model.

Keywords: bacterial regrowth; water distribution system; heterotrophic plate counts; logistic regression; principal components analysis; odds ratio; veracity

Accepted date: 2009-05-31.
*Supported by National Natural Science Foundation of China (No. 50878140) and Project of Water Pollution Control and Repair (No.20082X07317-005).
DONG Lihua, born in 1980, female, doctorate student.
Correspondence to ZHAO Xinhua, E-mail: zxh@ju.edu.cn.

The deterioration of drinking water quality in the distribution system can be ascribed to many factors, among which bacterial regrowth receives attention because of its short-term risks to public health[1]. Bacterial quality is monitored by heterotrophic plate counts (N_HPC) and coliform counts[2]. Bacterial regrowth can degrade water quality and generate numerous problems[3]. Many studies have sought to identify the main factors that influence the regrowth of heterotrophic bacteria in the distribution system[4-8]. However, little information is available regarding the quantitative expression of bacterial regrowth in drinking water distribution systems[9]. The purpose of this paper is to study the correlation between N_HPC and other water quality parameters, such as pH and turbidity, which can be determined easily in a short time. Combined with principal components analysis (PCA), a logistic regression model is established to predict the probability of bacterial regrowth and to locate the zones with high microbiological risk.

1 Materials and methods

1.1 Experimental details

This experiment was performed on the distribution system of a university residence, which is fed by surface water originating from a water plant. The sampling locations are shown in Fig.1. In this experiment, free chlorine was used as disinfectant. Water samples were collected daily at eight locations from May to July in 2004. The eight sampling locations include a booster pump station (location 6), the inlet of the distribution system (locations 1 and 8), a private residence (location 3) and a factory (location 4).

Fig.1 Diagram of water sampling locations (figure not reproduced; legend: sampling location, inlet)

1.2 Parameters

(The beginning of the parameter list is illegible in the source copy.) ... concentration of free residual chlorine C_FCl, concentration of total residual chlorine C_TCl, concentration of iron C_Fe, concentration of total phosphorus C_TP, concentration of ammonia nitrogen C_NH3, concentration of nitrate nitrogen C_NO3-N, concentration of nitrite nitrogen C_NO2-N, concentration of chloroform C_CHCl3, concentration of total organic carbon C_TOC, concentration of assimilable organic carbon C_AOC, UV254, Escherichia coli N_coli, and N_HPC were measured at the selected sampling locations by the standard methods for water quality monitoring.

2 Logistic regression model

2.1 Logistic regression

Logistic regression is generally used to model discrete response variables, especially a binary response variable. Its basic principle is to determine the probability of an event by studying the correlation between the response and the explanatory variables[10]. When the predicted probability of an event is greater than or equal to 0.5, the observation is assigned to the first group, i.e., event; otherwise it is assigned to the second group, i.e., no event.

2.2 Response and explanatory variables

2.2.1 Selection of response and explanatory variables

N_HPC, treated as a binary variable (event: N_HPC ≥ 100, Y = 1; no event: N_HPC < 100, Y = 0), is chosen as the response variable[4]. By analysis, C_Fe, Tur, C_TCl, pH, C_FCl, UV254 and C_NH3, which have significant correlations with N_HPC, are taken as explanatory variables. No significant correlation is observed between N_HPC and the other water quality parameters, so they are not included in the model.

During the experiment, 215 groups of data were acquired in total, excluding abnormal values. One hundred and seventy-five of them are used to establish the model, and the remaining 40 are used to verify it.

2.2.2 Analysis of co-linearity between explanatory variables

Co-linearity between explanatory variables leads to inaccurate regression coefficients or to results that cannot be interpreted sensibly in domain terms, so the co-linearity has to be analysed. The correlation matrix of the explanatory variables is shown in Tab.1. As shown in Tab.1 (bold face), there are significant correlations between explanatory variables. In order to eliminate the co-linearity, PCA is adopted in this paper.

Tab.1 Correlation matrix
(Correlation matrix of the seven explanatory variables C_TCl, C_FCl, Tur, pH, UV254, C_NH3 and C_Fe. The numeric entries are not recoverable from the source copy.)

2.2.3 Principal component analysis

PCA can eliminate co-linearity and yield new explanatory variables that are uncorrelated[11]. Applying PCA to the original explanatory variables gives the results shown in Tab.2. Generally, the result is considered satisfactory when the cumulative proportion reaches 0.7. The first two components reach only 0.660 6, so Components 1, 2, and 3 are chosen as the principal components variables that account for the information of the original variables.

Tab.2 Eigenvalues of the correlation matrix
(Only the entries legible in the source copy are shown.)

Component   Eigenvalue     Difference     Proportion    Cumulative
1           3.004 280 31   1.384 516 92   0.429 2       0.429 2
2           1.619 763 39   0.576 162 28   0.231 4       0.660 6
3           1.043 601 11   0.488 084 25   (illegible)   0.809 (last digit illegible)
4           0.555 516 86   0.113 872 13   (illegible)   0.889 0
5           0.441 644 73   0.204 416 82   (illegible)   0.952 1

At the same time, we also obtain the factor matrix of the principal components, which is shown in Tab.3, where C_Fe, Tur, C_TCl, C_FCl etc. are standardized variables. From Tab.3, we can get the expressions of the three principal components variables, which are linear equations in C_TCl, C_FCl, Tur, C_Fe etc.

Tab.3 Factor matrix of principal components
(The labels of rows 1, 2, 5 and 7 are illegible in the source copy. From the variable list they correspond to C_TCl, C_FCl, C_NH3 and C_Fe, in an order that cannot be confirmed from this copy.)

Variable      Component 1   Component 2                        Component 3
(illegible)   0.499 865     -0.011 594                         0.259 688
(illegible)   0.378 319     -0.088 545                         0.455 419
pH            0.172 779     0.53? 061 (one digit illegible)    -0.443 010
Tur           -0.436 148    0.349 101                          0.387 462
(illegible)   0.361 230     0.331 552                          0.442 448
UV254         0.243 245     0.567 222                          -0.223 083
(illegible)   -0.444 159    0.364 157                          0.365 027

2.3 Establishment of logistic regression model

As stated above, N_HPC is the binary response variable: event (N_HPC ≥ 100) and no event (N_HPC < 100). With Components 1, 2, and 3 as explanatory variables, we derive the logistic regression model of bacterial regrowth with SAS software[10]. The resulting probability model includes only one explanatory variable (Component 1).

Based on the predicted probability, we can diagnose and locate the zones with high microbiological risk in the distribution system. If the predicted probability is equal to or more than 0.5, i.e. N_HPC ≥ 100, the distribution system has a bacterial risk. Otherwise, if the predicted probability is less than 0.5, i.e. N_HPC < 100, the water quality is safe for public health.

The test of the global null hypothesis (β = 0) is shown in Tab.4. It can be seen that Pr > ChiSq is less than 0.01; therefore, the logistic regression model is significant.

Tab.4 Testing global null hypothesis
(Only the entries legible in the source copy are shown.)

Test               ChiSq         DF            Pr > ChiSq
Likelihood ratio   (illegible)   (illegible)   0.000 1
Score              43.653 7      (illegible)   (illegible)
Wald               17.669 6      (illegible)   0.000 1

Regression coefficients and odds ratios (OR) of the logistic regression are shown in Tab.5. The odds are the ratio of the probability of Y = 1 to the probability of Y = 0, and the OR of a variable is the factor by which these odds change when that variable increases by one unit. We can see that C_Fe and Tur have positive regression coefficients and their ORs are more than 1, i.e. the greater C_Fe and Tur, the higher the probability that N_HPC ≥ 100. Therefore, they are the most likely factors causing bacterial regrowth in the water distribution system. C_TCl and C_FCl have negative regression coefficients, which reflects the inhibiting effect of the disinfectant on bacterial regrowth.

Tab.5 Regression coefficients and odds ratios
(Row labels other than Intercept, pH and UV254 are illegible in the source copy. According to the text, the two rows with positive estimates are Tur and C_Fe, and C_TCl and C_FCl are among the rows with negative estimates. The value 0.000 0 next to the intercept is reproduced as printed in the source copy.)

Variable      Estimation   OR
Intercept     -1.578       0.000 0
(illegible)   -0.657       0.518 2
(illegible)   -0.497       0.608 3
pH            -0.270       0.763 4
(illegible)   0.499 6      1.648 1
(illegible)   -0.473       0.623 1
UV254         -0.319       0.726 9
(illegible)   0.584        1.793 2

2.4 Verification and application of logistic regression model

The remaining 40 groups of data are used to verify the logistic regression model. Only 4 of the 40 predictions are inconsistent with the observed values, i.e. the veracity (share of correct predictions) of the logistic regression model reaches 36/40 = 90%, which meets the precision requirement of the model.

With the above logistic regression model, online monitoring can be achieved if the explanatory variables are known at every sampling location. In fact, all the explanatory variables included in the model can be measured by online instruments, so the model can be easily applied in practice.

3 Conclusions

In this paper, logistic regression theory and principal components analysis are introduced to establish the prediction model of bacterial regrowth. With N_HPC as the binary response variable and the new principal components variables as explanatory variables, the model can accurately give the probability of bacterial regrowth and diagnose the zones with high microbiological risk in the water distribution system.

References

[1] Yang Y L, Li X, Li G B et al. Study on key guidelines of biological stabilization in drinking water distribution system [J]. Water and Wastewater Engineering, 2005, 31(2): 12-16 (in Chinese).
[2] US Environmental Protection Agency. National Recommended Water Quality Criteria [S]. Office of Water, Office of Science and Technology, US Environmental Protection Agency, Washington, DC, 2006.
[3] LeChevallier M W, Olson B H, McFeters G A. Assessing and controlling bacterial regrowth in distribution systems (remainder of the entry is illegible in the source copy).
[4] (Authors and title partly illegible in the source copy) ... disinfectant residual [J]. JAWWA, 1999, 91(1): 55-64.
[5] Liu W, Wu H, Wang Z et al. Investigation of assimilable organic carbon (AOC) and bacterial regrowth in drinking water distribution system [J]. Water Research, 2002, 36(4): 891-898.
[6] Kerneis Alain, Nakache Frederique, Deguin Alain et al. The effects of water residence time on the biological quality in a distributing network [J]. Water Research, 1995, 29(7): 1719-1727.
[7] Chu C, Lu C, Lee C et al. Effects of chlorine level on the growth of biofilm in water pipes [J]. Environ Sci Health-Part A, 2003, 38(7): 1377-1388.
[8] Kayen Power, Laslo A Nagy. Relationships between levels of heterotrophic bacteria and water quality parameters in a drinking water distribution system [J]. Water Research, 2000, 34(5): 1495-1502.
[9] Wu Q, Zhao X H. Study on HPC and some physical and chemical parameters in drinking water [J]. Chin J Public Health, 2006, 22(3): 280-281 (in Chinese).
[10] Chen G, Chen J W. The function of prediction and diagnosis of Logistic regression and application [J]. Journal of Mathematical Medicine, 2007, 20(3): 280-281 (in Chinese).
[11] Johnson R A, Wichern D W. Applied Multivariate Statistical Analysis [M]. Tsinghua University Press, Beijing, 2008.
[12] Zhao S Y, Dong L H, Wu Q et al. Prediction of bacterial regrowth in water distribution system [J]. China Water and Wastewater, 2006, 22(5): 48-51 (in Chinese).
[13] Zhang M, Chen Y C. Optimization Design and Data Analysis of Environmental Examination [M]. Chemical Industry Press, Beijing, 2008 (in Chinese).
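Illustrative sketch for Sections 2.1, 2.3 and 2.4 (not part of the original paper). The classification rule, the link between a regression coefficient and its odds ratio, and the veracity figure can be checked with a few lines of Python. The function names are hypothetical, the coefficient and OR lists are the values as read from Tab. 5 in row order (intercept excluded), and treating the inputs as standardized variables is an assumption; the authors fitted their model in SAS.

```python
import math

# Values as read from Tab. 5, in row order, intercept excluded.
TAB5_COEFFICIENTS = [-0.657, -0.497, -0.270, 0.4996, -0.473, -0.319, 0.584]
TAB5_ODDS_RATIOS = [0.5182, 0.6083, 0.7634, 1.6481, 0.6231, 0.7269, 1.7932]


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase of a variable: exp(coefficient)."""
    return math.exp(coefficient)


def predicted_probability(intercept, coefficients, values):
    """P(Y = 1) from the logistic function of the linear predictor."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    # Two algebraically equal forms, chosen so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(probability, threshold=0.5):
    """1 (event, N_HPC >= 100) if probability >= threshold, else 0 (no event)."""
    return 1 if probability >= threshold else 0


def veracity(n_total, n_wrong):
    """Share of verification cases that were predicted correctly."""
    return (n_total - n_wrong) / n_total
```

Under these assumptions, every OR printed in Tab. 5 agrees with exp(coefficient) to about three decimal places, veracity(40, 4) gives 0.9, and predicted_probability(-1.578, TAB5_COEFFICIENTS, [0.0] * 7) is about 0.17, which classify() maps to "no event".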

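Illustrative sketch for Section 2.2.3 (not part of the original paper). The choice of three principal components can be retraced from the eigenvalues in Tab. 2. The sketch assumes that PCA was run on the correlation matrix of the seven standardized explanatory variables, so that the total variance equals 7; the eigenvalues are the five legible values from Tab. 2, and the function names are hypothetical.

```python
# The five eigenvalues legible in Tab. 2 (Components 1 to 5).
EIGENVALUES = [3.00428031, 1.61976339, 1.04360111, 0.55551686, 0.44164473]
# Assumption: seven standardized variables, so the eigenvalues of the
# correlation matrix sum to 7.
TOTAL_VARIANCE = 7.0


def cumulative_proportions(eigenvalues, total_variance):
    """Cumulative share of the total variance explained by the leading components."""
    shares = []
    running = 0.0
    for value in eigenvalues:
        running += value / total_variance
        shares.append(running)
    return shares


def components_needed(eigenvalues, total_variance, threshold=0.7):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    for count, share in enumerate(cumulative_proportions(eigenvalues, total_variance), start=1):
        if share >= threshold:
            return count
    return None  # threshold not reached with the eigenvalues supplied
```

With these inputs the cumulative shares after two and five components come out near 0.6606 and 0.9521, matching the legible entries of Tab. 2, and components_needed(EIGENVALUES, TOTAL_VARIANCE) returns 3, in line with the 0.7 rule stated in the text.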