A New Modeled Equation of the Water Quality Index for Examination of the Water Quality of Treatment Plants in Basra City (Iraq)

Monitoring the degree of pollution of water treatment plants (WTPs) or any other water source requires routine and continuous water quality measurements, which is a challenging task. For this reason, a statistical tool that condenses a large number of water quality values into a single value is very useful for decision makers in this field. This work aimed to evaluate the characteristics of the purified water supplied for drinking by the most important water purification stations in Basra governorate. Additionally, the study aimed to use the water quality index (WQI) to examine the water quality, and to obtain a new WQI equation for water treatment plants based on the weighted arithmetic index. In this context, 96 treated water samples were collected regularly from eight water purification plants in Basra city over a period of 12 months in 2021 to measure the main physicochemical properties. Results showed that the studied units supplied treated water with a WQI varying between 50 and 72, and only the water supplied by the Al-Asmaee unit could be classified as good for drinking. The newly developed WQI formula was valuable and applicable for examining the water quality of any water treatment unit in Basra city. Ridge linear regression (RLR) is recommended as the best statistical approach; the coefficient of determination (R²) obtained by the RLR method was 96.1369%. This method can help decision makers monitor different water resources when enough data are available.


Introduction
The continuity of life all over the world is linked to water availability. Seas, oceans, lakes, rivers, groundwater and rainwater are the basic natural sources of water, and they are mainly used for human, animal, agricultural, industrial, and manufacturing purposes and other needs (Yaseen et al., 2019; Yousif et al., 2022). The Shatt Al-Arab river is the dominant water source for all residents of Basra city, one of the main governorates in the south of Iraq. The water quality of this river suffers from several problems that make the water undesirable for drinking or irrigation. These problems are the reduced water quantities received from the Tigris and Euphrates rivers due to the decreasing level and flow of these two rivers, the increasing salinity caused by water from the Arabian Gulf, and the reception of large quantities of municipal, agricultural and industrial pollutants that are discharged daily from the surrounding areas without any prior treatment (Al-Imarah et al., 2017). Therefore, the majority of water treatment units are built near the Shatt Al-Arab river to draw water from it and then supply Basra province with suitable drinking water. However, most of these plants are old, and treat the water using conventional treatment methods or a combination of several single units. Hence, it is important to monitor and examine the behavior of the units regularly, based on the analysis of the quality of the produced water.
Analyses of water samples that use various physicochemical and biological characteristics to decide the degree of pollution or the suitability of water for human use are called water quality analyses (Bhutiani et al., 2018). However, instead of dealing with a large number of water quality data, it is recommended to analyze water quality by applying specific indices that reduce the data to a single value (Eassa and Mahmood, 2012). The water quality index (WQI) is a simple and effective way of rating mathematically how appropriate water is for drinking or other purposes, and it turns a large number of water quality data into a single value (UNEP GEMS, 2007; Alobaidy et al., 2010a; Akter et al., 2016; Kizar, 2018). This index was first interpreted and used by Horton. After that, many researchers applied the WQI for water examination (Al-Imarah et al., 2017; AL-Taay et al., 2018). It is a unitless number that is calculated in four main steps: choosing the important parameters according to the assessment target, converting the values to unitless numbers using a rating curve, assigning a weight to each parameter based on its importance, and applying the aggregation. Four WQI indices have been widely used in recent years: the National Sanitation Foundation, Canadian Council of Ministers of the Environment, Oregon, and Weighted Arithmetic indices. Locally (in Iraq), the Canadian WQI has been used extensively for evaluating the quality of Al-Hawizah marsh (Alobaidy et al., 2010b), Al-Hammar marsh (Al-Saboonchi et al., 2011), and different water treatment stations (Al-Imarah et al., 2017). However, the weighted arithmetic model still has limited applications. This model was applied to examine the quality of the Malin river in India (Bhutiani et al., 2018) and the Gopishettykere water body in India (Yogendra et al., 2008).
This work aimed to evaluate the efficiency of the main water purification units in Basra, as this topic has not been sufficiently studied and discussed in previous works. The objectives are to examine the characteristics and the WQI of treated water supplied for drinking by eight water purification units in Basra city, and to develop a new water quality equation according to the weighted arithmetic index (WAI) method. The contribution of this study is to improve the understanding of the performance of the main water treatment units in Basra with respect to national and international standards. Furthermore, the newly developed WQI formula is valuable and applicable for examining the quality of any source of water directly.

Experimental work
This study was carried out to examine the efficiency of eight water treatment units inside Basra city. Al-Basra is an Iraqi city situated on the banks of the Shatt Al-Arab river, at latitude 30.508102° N and longitude 47.783489° E. The selected purification units were Hay Al-Hussain, Al-Bradiah1, Al-Bradiah2, Al-Jubila1, Al-Jubila2, Al-Libani, Al-Asmaee, and Mhegran (Fig. 1). These units were selected because they are the main treatment plants designed to cover the drinking water requirements of Basra. The raw water entering each treatment unit was drawn from the Shatt Al-Arab river, and therefore these units are built close to the river banks. The raw and treated water samples were collected monthly from each treatment plant, at equal intervals and on the same day, for 12 months between 01/01/2021 and 30/12/2021. All the collected (raw and treated) water samples were kept in 1-liter polyethylene bottles, stored in a cooling box, and then moved to the laboratory for routine tests. The tested WQI parameters were pH, total hardness (TH), total alkalinity (T.ALK), electrical conductivity (EC), magnesium (Mg²⁺), calcium (Ca²⁺), sulphate (SO₄²⁻), and potassium (K⁺). Except for the pH, which was measured on site, all parameters were tested according to the APHA (2017) methods in the Marine Sciences Center. The treated samples were collected from each unit after treatment, so that they reflect the supplied drinking water.
All the results of the tested parameters were used to compute the WQI according to the WAI procedure (Cude, 2001) in order to assess the suitability for human use of the water supplied by the studied treatment units.

Calculating of WQI
The WQI applied in the current research was calculated using the WAI procedure, which was suggested by Horton in 1965 and improved by Brown in 1972 (Ajonina, 2020). The method uses an arithmetic mean to aggregate the values obtained by multiplying each water quality variable by a weighting factor. First, the rating scale (Vi) of each variable was determined by Eq. 1, followed by the unit weight (Ui) by Eq. 2. Then, the water quality index was calculated by Eq. 3. After computing the WQI, the outcomes were sorted into ranges between 0 and 100. These ranges specify the class of the water (Table 1), which relates the quality of water to the WQI ranges (Chaterjee and Raziuddin, 2002).
Where: Vi - the quality rating of the i-th water quality variable; Ui - the relative (unit) weight of the i-th variable; mi - the measured magnitude of the i-th WQI variable, obtained from the laboratory tests; ki - the ideal magnitude of the i-th WQI variable, obtained from the WHO standards (Table 2).
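Eqs. 1 to 3 are not reproduced in this text, so the following Python sketch only illustrates how a weighted arithmetic index of this kind can be computed. It assumes one commonly used simple form, Vi = 100 · mi / ki, Ui = 1 / ki and WQI = Σ(Ui · Vi) / Σ Ui, which may differ from the exact equations used in the study (for instance, pH is often rated against an ideal value of 7). The class boundaries are likewise an assumption based on ranges often quoted for this index, since Table 1 is not shown here. The function names and the sample values are hypothetical.

```python
# Sketch of a weighted arithmetic WQI under ASSUMED formulas (not the paper's
# verified Eqs. 1-3): Vi = 100 * mi / ki, Ui = 1 / ki,
# WQI = sum(Ui * Vi) / sum(Ui).

def wqi_weighted_arithmetic(measured, ideal):
    """measured: {variable: mi}; ideal: {variable: ki}. Returns a unitless WQI."""
    ratings = {v: 100.0 * measured[v] / ideal[v] for v in measured}  # Vi
    weights = {v: 1.0 / ideal[v] for v in measured}                  # Ui
    total_weight = sum(weights.values())
    return sum(weights[v] * ratings[v] for v in measured) / total_weight

# ASSUMED class boundaries (upper limit, label); Table 1 is not reproduced here.
WQI_CLASSES = [(25, "excellent"), (50, "good"), (75, "poor"), (100, "very poor")]

def classify_wqi(wqi):
    """Map a WQI value to a class label using the assumed boundaries."""
    for upper, label in WQI_CLASSES:
        if wqi <= upper:
            return label
    return "unsuitable for drinking"

# Hypothetical sample and limits, for illustration only.
sample = {"TH": 450.0, "Mg": 60.0, "SO4": 300.0}
limits = {"TH": 500.0, "Mg": 50.0, "SO4": 250.0}
index = wqi_weighted_arithmetic(sample, limits)
print(round(index, 1), classify_wqi(index))
```

Because Ui = 1 / ki in this assumed form, variables with stricter limits receive larger weights, so exceeding a tight limit raises the index more than exceeding a loose one.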

Analysis of data
Version 22 of IBM SPSS Statistics was used to check whether the data followed a normal or non-normal distribution, given the variability of the data. For this purpose, the Kolmogorov-Smirnov (K-S) and Shapiro-Wilk (S-W) tests were conducted at a significance level of 0.05. Then, the Pearson test was used to calculate the correlation coefficients of parametric (normally distributed) variables and the Spearman test was used for non-parametric variables, as determined by the normality test (Guimarães, 2017; Valentini et al., 2021a).
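The study performed these tests in SPSS. As an illustration only, the sketch below shows in plain Python how the Spearman coefficient differs from the Pearson coefficient: Spearman is the Pearson correlation of the ranks, so it captures any monotonic relationship and does not require normality. The function names are hypothetical, ties receive average ranks, and no significance test is included.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Monotonic but non-linear toy data (not from the study).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

For the toy data the Spearman coefficient is 1 because the relationship is perfectly monotonic, while the Pearson coefficient is slightly lower because the relationship is not linear.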

Ridge regression
This term refers to a statistical procedure that improves model accuracy and reduces variance by producing more stable estimates of the regression coefficients (Mao et al., 2019; Fernández del Castillo et al., 2022). Previous work confirmed that this regression method was able to limit validity shrinkage and keep the prediction level better than other methods (Walker, 2004). Ridge regression provides an R² that represents the percentage of variation accounted for by the complete and reduced models of interest, based on the biased ridge weights (Walker, 2004). The ridge weight (see Eq. 4) is computed on the basis that the ridge estimator reduces the effect of error, as this method provides predictions with a lower mean square error (MSE) than other typical estimation methods. Note that the lower MSE, the better prediction accuracy of the model, and the lower multi-collinearity among the predictors result from adding the optimum biasing value (K) to the variance/covariance matrix before computing the regression. In this context, the Newton-Raphson algorithm is used to solve for the value of K iteratively until the MSE is minimized.
Where: β* - the vector of standardized ridge regression weights; Rxx - the predictor inter-correlation matrix; Rxy - the predictor-criterion correlation vector; I - the p-dimensional identity matrix; K - a biasing parameter (typically 0 < K < 1).
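Eq. 4 is not reproduced in this text. The symbol definitions above match the standard standardized ridge estimator, β* = (Rxx + K·I)⁻¹ Rxy, and the NumPy sketch below assumes that form. The data are synthetic and deliberately collinear, the function name is hypothetical, and the Newton-Raphson search for the optimum K is not implemented; K is simply set by hand.

```python
import numpy as np

def ridge_weights(X, y, k):
    """Standardized ridge weights, assuming beta* = (Rxx + k*I)^-1 Rxy."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    # Correlation matrix of predictors and criterion, columns as variables.
    R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    Rxx, Rxy = R[:p, :p], R[:p, p]
    return np.linalg.solve(Rxx + k * np.eye(p), Rxy)

# Synthetic, deliberately collinear predictors (NOT the study's data).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=200)

# k = 0 reproduces ordinary standardized least squares; larger k shrinks
# the weights.
for k in (0.0, 0.05, 0.5):
    print(k, ridge_weights(X, y, k))
```

Printing the weights for an increasing sequence of K values in this way gives the numbers that a ridge trace plots.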

Ridge trace
The ridge trace is a plot that displays the ridge regression coefficients as a function of K. The lowest value of K (lowest bias) at which the regression coefficients stabilize is selected from the ridge trace graph. This is because the coefficient values normally vary widely for low K values and then stabilize. It is recommended to select the lowest K at which the regression coefficients reach stable values, as increasing K further shrinks the regression coefficients towards zero.

Factor of variance inflation
The factor of variance inflation (FVI) is employed under the hypothesis that, in non-empirical studies, the occurrence of multi-collinearity increases the variance of the regression, which inflates the standard errors of the regression beta coefficients. As a result, the FVI can be used to detect multi-collinearity within models.
Note that the acceptable value of the FVI depends on the researchers' judgment and is decided before assessing the multi-collinearity. In the current investigation, coefficient values of more than 2000 were considered multi-collinear (Walker, 2004).
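The text does not give a formula for the FVI, which is more commonly abbreviated VIF. A minimal sketch, assuming the usual ridge definition in which the factors are the diagonal of (Rxx + K·I)⁻¹ Rxx (Rxx + K·I)⁻¹, is shown below; with K = 0 this reduces to the diagonal of Rxx⁻¹, the ordinary variance inflation factors. The function name and the correlation matrix are hypothetical.

```python
import numpy as np

def ridge_vif(Rxx, k):
    """Variance inflation factors for biasing value k (assumed definition):
    diagonal of (Rxx + k*I)^-1 Rxx (Rxx + k*I)^-1."""
    Rxx = np.asarray(Rxx, dtype=float)
    inv = np.linalg.inv(Rxx + k * np.eye(Rxx.shape[0]))
    return np.diag(inv @ Rxx @ inv)

# Hypothetical correlation matrix of two strongly collinear predictors.
Rxx = np.array([[1.00, 0.99],
                [0.99, 1.00]])

for k in (0.0, 0.05, 0.5):
    print(k, ridge_vif(Rxx, k))
```

For an orthogonal system (identity correlation matrix, K = 0) every factor equals 1, and for collinear predictors the factors fall as K increases.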

Treated water characteristics
The characteristics of the treated water supplied by each treatment unit are shown in Table 2.
Findings showed that the mean pH values of drinking water for all treatment units were within the limits. All of the mean electrical conductivity values were higher than the standards, except those for Hay Al-Hussain, Al-Jubila2, and Al-Asmaee. No unacceptable alkalinity values were found in any treatment unit based on both the Iraqi and World Health Organization (WHO) standards. In terms of TH, all mean values exceeded the Iraqi and WHO thresholds, except for Al-Jubila2 and Al-Asmaee in the case of the Iraqi standards. Magnesium ion mean concentrations were higher than the thresholds, except for the water supplied by the Al-Asmaee plant. Potassium ion mean concentrations were within the limits, except for the water supplied by the Mhegran plant, which was slightly higher. Mean sulphate and calcium values were higher than the Iraqi standards, except for the drinking water of the Al-Asmaee, Hay Al-Hussain, and Al-Jubila2 treatment units. These results indicated that the purification units of Al-Asmaee, followed by Hay Al-Hussain and Al-Jubila2, were the best treatment plants in Basra city compared with the other units. The low efficiency of the treatment units in Basra city is attributed to the high loading of salt and other pollutants in the Shatt Al-Arab, the main source of raw water feeding the treatment plants, which affects their performance day by day. Note that most of the contaminants present in the Shatt Al-Arab come from the direct discharge of untreated municipal, industrial and agricultural waste into the river (Hamdan et al., 2018; Zaboon et al., 2022).

Calculated WQI
The WQI calculations were done, as mentioned previously, based on the weight of each of the 9 WQI variables. Grab samples were taken from each of the treatment units and analyzed monthly during 2021. The tested variables were pH, TH, T.ALK, EC, Mg²⁺, Ca²⁺, Cl⁻, SO₄²⁻, and K⁺. Hence, the generated data for the eight treatment units comprised 96 values (Table 3). Table 1 shows the standard WQI classes.
The WQI was a very useful tool that enabled quality comparisons between multiple water samples based on each sample's indicator magnitudes. The results demonstrated that the calculated WQI values for the selected treatment units in Basra ranged between 37.9 (in June) and 89.7 (in May), corresponding to Al-Asmaee and Al-Bradiah1, respectively (Table 3). This wide range of variation may be due to factors in the surrounding area that affect the raw water entering the treatment plant (Mohammed, 2013).
The average values of the monthly water quality indices showed that 12.5% of the supplied water (one of the eight units) was classified as good water, and 87.5% was classified as poor water (Table 4). The values ranged between 50 for Al-Asmaee and 72 for Mhegran. This deterioration and poor quality of the supplied water in Basra is attributed to the source of raw water (the Shatt Al-Arab river), which is affected by the discharge of noticeable amounts of pollutants from nearby areas, and by the increase of salt concentrations in the river due to the tidal ebb and flow from the Arabian Gulf. Furthermore, most of the purification plants in Basra province are traditional (removing only the insoluble components in water). Thus, advanced units (ion exchange or membrane filters) are required to reduce the elevated concentration of soluble salts and consequently provide clean water for safe use. Similar outcomes were reported by Vieira et al. (2019). Anthropic activities were also reported to affect the WQI of Mirim Lagoon in Brazil (Valentini et al., 2021b).
[Table 4: class of the mean WQI per treatment unit; poor for all units except Al-Asmaee, which is good.]

Normality test
Based on the Kolmogorov-Smirnov (K-S) and Shapiro-Wilk (S-W) normality tests, the p-value was less than 0.05 for most of the studied variables, indicating that the records were not normally distributed. For this reason, the correlation coefficients in the next section were calculated using the Spearman test.

Correlation matrix, linearity test, and outlier values
Table 5 presents the correlation matrix, including the coefficients and their significance values (p). The correlation outcomes were used to define the regression model; therefore, the variables were correlated with the calculated WQI records. Table 5 demonstrates that there is a positive significant correlation between the variables of pH, TH, T.ALK, EC, Mg²⁺, Ca²⁺, SO₄²⁻, and K⁺ and the calculated WQI. As a result, these factors are assessed in the regression model in the next step. According to the correlation intensity threshold of Helena et al. (2000), a parameter is considered strongly correlated if the absolute value of its coefficient is equal to or greater than 0.5. In this research, the variables of pH, TH, T.ALK, EC, Mg²⁺, Ca²⁺, SO₄²⁻, and K⁺ were strongly correlated with the calculated water quality index. However, although the coefficient for pH was sufficiently high, its correlation with the WQI was not significant, and pH was therefore discarded. Note that all the coefficients were positive, which indicates that an increase in the concentration of each parameter leads to an increase in the WQI.
Regarding the linearity examination, the distribution of the samples was assessed for both the records linked with the calculated WQI and the results of the index, as shown in Fig. 2. The results showed that there were three outliers in the data, and these values were deleted.

Ridge trace
The ridge trace is a simple way of observing the behavior of θj(K) as a function of K. The value of K is chosen as the lowest magnitude that achieves stable coefficients θj(K). Furthermore, at the chosen K, the residual sum of squares should be near its minimum. It is also important for the factors of variance inflation FVIj(K) to be reduced to a value lower than 10. This is because a factor of 1 corresponds to an orthogonal system, while a factor lower than 10 corresponds to a non-collinear or stable system. The current research found a K value of 0.05. Selecting the lowest magnitude of K is preferable, as the regression coefficients remain stable beyond this value. Note that this selection leads to less bias (see Figs. 3 and 4).

Statistical analysis
The ridge regression models were built according to the outcomes of the statistical analyses, which showed a significant correlation between the water variables and the calculated WQI. In this research, seven variables were used to develop the new WQI equation. The Spearman matrix showed that the studied parameters, excluding the pH, had high to medium positive correlations. Also, multi-collinearity was confirmed, as the independent parameters were correlated with each other.
The ridge regression analysis showed that the best fit had an R² of 96.14 with a standard error of 1.04683. Modelling the new WQI equation was therefore based on testing the hypotheses with the linearity test and examining the outlier values of the resulting data. Ridge regression was used as the best way of analyzing multiple regression outcomes that suffer from multi-collinearity. Hence, the generated equation has an R² value of 96.14% and a mean square error of 1.04683. Note that the difference between the calculated WQI and the new one (Fig. 5) was not statistically significant, which was the intended outcome of this study.

Conclusion
In the current research, the main water purification units located in vital regions of Basra city were selected to examine the characteristics and the WQI of the purified water. Eight WQI variables were selected to monitor the quality of the selected units, which allowed a worthwhile and low-cost evaluation.
This study concluded that the purification units of Al-Asmaee, Hay Al-Hussain, and Al-Jubila2 were the best treatment plants in Basra city compared with the other units. In addition, the studied units supplied treated water with a WQI varying between 50 and 72, and only the water supplied by the Al-Asmaee unit could be classified as good water for drinking.
Ridge regression was used to generate a new WQI equation, which matched the calculated WQI. This study suggests that water quality monitoring can be achieved successfully and beneficially with statistical methods that model the WQI equation from a few variables. Thus, the new WQI equation modeled in this study is effective for examining the quality of water supplied by any treatment unit in Basra city. According to the outcomes of the current research, the water treatment plants of Basra must be upgraded with membrane filters or ion exchange facilities to reduce the high salinity, which makes the water of Basra unsuitable for human use.