Agelos Papaioannou^{1,}*, George Rigas^{2}, George Karamanis^{3}, Eleni Dovriki^{4}
^{1} Department of Medical Laboratories, Section of Clinical Chemistry – Biochemistry, Education & Technological Institute of Larissa, Greece
^{2} Department of Animal Production, Education & Technological Institute of Larissa, Greece
^{3} Department of Biochemistry, Diagnostic Laboratories, General Hospital of Kavala, Greece
^{4} Department of Respiratory Medicine, Medical School, University of Thessaly, Larissa, Greece.
Abstract
Objectives: Multivariate statistical methods are not often used in medical studies, but there are already indications of their specific role as a tool of medical statistics.
Design and Methods: Two multivariate statistical methods were used for assessment and modeling of clinical laboratory data from 185 healthy individuals and 173 end-stage renal failure (ESRF) patients.
Results: Cluster Analysis (CA) and Factor Analysis (FA) were used to determine the structure of the clinical laboratory data. CA shows the linkage among the biochemical parameters studied. When the hierarchical dendrogram for all individuals is considered, specific patterns of the classified clinical parameters can be proposed: a general health indicator pattern (UA, ALT, ALP and TG); a major component excretion pattern (CREA, UREA, P and K); and a protein pattern (TP, ALB and AST). Moreover, FA yielded three, five, and five varifactors for All Individuals, Healthy Individuals, and Patient Individuals, respectively, which account for the data structure. It is worthy of note that the major groups of biochemical parameters identified by CA for the three studied groups also appear in the varifactor loadings of FA. Thus, the classification scheme obtained by CA is confirmed by FA.
Conclusions: This study provides models for assessment and modeling of clinical laboratory data, finding groups of similarity among clinical tests usually determined on healthy individuals and patients with an ESRF diagnosis, and contributing to data mining and cost minimization.
Keywords: Cluster Analysis, Factor Analysis, Healthy individuals, End-Stage Renal Failure patients, Clinical laboratory data.
Introduction
Chronic kidney disease (CKD) has reached epidemic proportions in many parts of the world, driven by a rise in the occurrence of obesity and diabetes mellitus. Patients with CKD have a high prevalence of coronary artery disease [1-5]. Kidney disease at all stages is associated with a substantial burden of illness. There are accumulating data suggesting that even mild levels of kidney dysfunction are associated with worse outcomes [6]. In the past two decades, improvements in the medical management of kidney disease have slowed the progression to kidney failure [7]. Optimal care before dialysis may thus ultimately have an impact on the survival of chronic dialysis patients [8-10].
In the past 40 years, the most commonly used marker of overall renal function in clinical practice has been plasma creatinine concentration. More precise renal function estimation can be obtained by using estimates of the glomerular filtration rate (GFR) [11-16].
In recent years, the use of biochemical markers has received increasing attention for purposes of risk assessment and clinical management in renal failure patients. Mathematical models have been used since 1976 in an attempt to predict the progression of chronic renal failure. These models have used the serum creatinine level as either a reciprocal or logarithmic plot against time. These studies indicate that predictive models using serum creatinine levels are of limited clinical use [17-23].
Patients with chronic kidney disease represent a major healthcare problem. In spite of modern treatment, these patients are at high risk of subsequent clinical events. This population is, however, very heterogeneous, and clinicians are therefore faced with the task of risk stratification to optimize individual patient care and avoid risk-filled therapies and procedures. Biochemical markers are easily available tools, providing clinicians with a window into the diverse pathophysiological mechanisms at work in disease in the individual patient. Assessment of these biomarkers in the diseased state, however, requires knowledge of each marker in the healthy state, taking into account variables that may affect levels, such as age and gender. Hence a matched control population is necessary when evaluating models with these markers [24-27].
Biochemical indicators are routinely monitored to enable timely assessment of strategies and management programmes in patient care. Moreover, an overview of the biochemical profiles of the patients can assist the clinician in making adjustments to clinical management practices. Thus, this study presents an example of intelligent analysis of clinical laboratory data, which used hierarchical cluster analysis (CA) and factor analysis (FA) for assessment and modelling of the input data. The aim of the present study was to offer a new research strategy to classify, interpret and model clinical results in order to reach better decision-making solutions concerning human health and disease prevention, and to contribute to cost minimization.
Methods and Materials
Experimental
We studied the distribution patterns of some analytes commonly assayed in clinical chemistry – biochemistry laboratories in healthy individuals and in end-stage renal failure (ESRF) patients. One hundred and eighty-five healthy individuals and 98 ESRF patients from the General Hospital of Kavala (GHK) were among those tested. All ESRF patients were undergoing hemodialysis in the above hospital (duration of hemodialysis: mean ± SD = 65.1 ± 55.7 months; median = 50.0 months).
The data used in this study were derived from the results of blood samples taken in the biochemical laboratory of GHK. The selection of normal subjects, the preanalytical conditions and the analysis of the blood samples are described in detail elsewhere [28-31]. The analyses of 18 biochemical parameters were performed with a Dimension RXL analyzer (Dade Behring, U.S.A.) at 37 °C according to the methods listed in Table 1, immediately after centrifugation.
Table 1. Methods used for the determination of the different quantities (37 °C).

| Quantity | Dimension RXL Method |
|---|---|
| Alanine aminotransferase (ALT) | IFCC with P5P |
| Albumin (ALB) | Bromocresol purple (BCP) |
| Aspartate aminotransferase (AST) | IFCC with P5P |
| Alkaline phosphatase (ALP) | AMP buffer |
| Calcium (Ca) | o-cresolphthalein complexone |
| Cholesterol (CHOL) | CHOD/PAP or CHOD/POD |
| Chloride (Cl) | IMT Indirect |
| Creatinine (CREA) | Jaffé |
| Glucose (GLU) | Hexokinase (HK/G6PDH) |
| High density lipoprotein – cholesterol (HDLC) | Direct enzymatic |
| Iron (Fe) | Ferene |
| Phosphorus (P) | Phosphomolybdate U.V. |
| Potassium (K) | IMT Indirect |
| Sodium (Na) | IMT Indirect |
| Total Proteins (TP) | Biuret |
| Triglycerides (TG) | CHOD/PAP or CHOD/POD |
| Uric acid (UA) | Uricase/PAP or Uricase/POD |
| Urea (UREA) | Urease/GLDH U.V. |
Before each determination, the analyzer was calibrated and internally controlled with the calibrators and quality controls of the corresponding manufacturers, according to the manufacturer’s instructions and the international literature [32, 33]. The reagents provided in the commercial kits were used in the analyzer, and the methods were adapted according to the manufacturer’s instructions. The water was free from metal ions and had a maximum resistivity of 18.2 MΩ·cm at 25 °C. Accuracy was checked (and achieved) through an external quality control program (Randox RIQAS).
Statistical Methods
Descriptive Statistics
Basic statistics and correlation calculations were carried out in order to give initial information about the clinical laboratory data. Unless otherwise indicated, the characteristics of the subjects were described as mean values and standard deviations. Tests for the significance of observed differences in means were performed using Student’s t-test, and tests for the significance of observed differences in variances were performed using Levene’s test. To evaluate the correlations between the levels of biomarkers within each studied group, Pearson correlation coefficients were calculated.
All data analysis was performed with the Statistical Package for Social Sciences (SPSS 15.0, SPSS Inc.) and Statistica 7.0 software for Windows.
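The screening steps described above can be sketched in Python as follows. This is only an illustration on synthetic data, not the SPSS/Statistica procedure used in the study: the group means and spreads are rough stand-ins, the 0.05 threshold for choosing between the pooled and separate-variances t-test is an assumption, and all variable names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for one analyte (e.g. urea, mg/dl) in the two groups.
healthy = rng.normal(loc=26.0, scale=5.6, size=185)
patients = rng.normal(loc=172.0, scale=36.0, size=98)

# Levene's test for equality of variances between the groups.
lev_stat, lev_p = stats.levene(healthy, patients)

# Pooled t-test if the variances look equal (assumed cut-off 0.05),
# otherwise the separate-variances (Welch) t-test.
equal_var = bool(lev_p >= 0.05)
t_stat, t_p = stats.ttest_ind(healthy, patients, equal_var=equal_var)

# Pearson correlation between two synthetic, related analytes in one group.
ast = rng.normal(loc=22.0, scale=8.0, size=185)
alt = 0.8 * ast + rng.normal(loc=0.0, scale=4.0, size=185)
r, r_p = stats.pearsonr(ast, alt)

print(f"Levene p = {lev_p:.3g}, equal variances assumed: {equal_var}")
print(f"t-test p = {t_p:.3g}")
print(f"Pearson r = {r:.3f} (p = {r_p:.3g})")
```

Running the same three tests for every analyte and keeping only those with p<0.01 would reproduce the kind of parameter screening reported in the Results.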
Chemometric Methods
Cluster Analysis (CA) and FA were used for multivariate statistical modelling of the input data [34-37].
Cluster analysis is a data reduction method that is used to classify entities with similar properties. The method divides a large number of objects into a smaller number of homogeneous groups on the basis of their correlation structure. The objective of cluster analysis is to identify the complex nature of multivariate relationships (by searching for natural groupings or types) among the data under investigation, so as to foster further hypothesis development about the phenomena being studied. Cluster analysis imposes a characteristic structure on the data for exploratory purposes. Cluster analysis was conducted on the biochemical data of healthy individuals and ESRF patients using Ward’s method. We used cluster analysis to link variables in the configuration of a tree with different branches; branches with linkages closer to each other indicate a stronger relationship among variables or clusters of variables. The dendrogram generated from tree clustering provides a useful graphical tool for determining the number of clusters that describe the underlying processes responsible for variation in the data. Specifically, we applied hierarchical CA to log-transformed, standardized data using Ward’s method with squared Euclidean distances.
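As a minimal sketch of this kind of variable clustering, the following Python example applies Ward linkage to log-transformed, z-standardized synthetic data. The six variable names, the latent structure and the choice of three clusters are all hypothetical. SciPy's `linkage` with `method="ward"` works from Euclidean distances between the rows it is given and merges the pair of clusters with the smallest increase in within-cluster variance; it is assumed here to be an adequate stand-in for the Ward/squared-Euclidean option of the statistical packages used in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
n_subjects = 200
names = ["V1", "V2", "V3", "V4", "V5", "V6"]  # hypothetical analytes

# Synthetic positive data: V1/V2, V3/V4 and V5/V6 each share a latent source.
latent = rng.normal(size=(n_subjects, 3))
noise = 0.3 * rng.normal(size=(n_subjects, 6))
raw = np.exp(latent[:, [0, 0, 1, 1, 2, 2]] + noise)

# Log-transform, then z-standardize each column (mean 0, variance 1).
logged = np.log(raw)
z = (logged - logged.mean(axis=0)) / logged.std(axis=0, ddof=1)

# Cluster the variables, not the subjects: each row of z.T is one variable.
tree = linkage(z.T, method="ward")

# Cut the tree into three clusters (an assumed number for this toy example).
labels = fcluster(tree, t=3, criterion="maxclust")

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(int(label), []).append(name)
print(clusters)
```

The `tree` array can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a dendrogram of the variables.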
Factor analysis is used to understand the correlation structure of collected data and to identify the most important factors contributing to the data structure. In factor analysis, the relationship among a number of observed quantitative variables is represented in terms of a few underlying, independent variables called varifactors, which may not be directly measured or even measurable. Factor analysis is also used to find associations between parameters so that the number of measured parameters can be reduced. Known associations are then used to predict unmeasured biochemical parameters. The initial step was the determination of the parameter correlation matrix, which accounts for the degree of mutually shared variability between individual pairs of biochemical parameters. The second step was the estimation of the eigenvalues and factor loadings for the correlation matrix. Each eigenvalue corresponded to an eigenfactor that identified the group of variables that were most highly correlated with each other. The first eigenfactor accounted for the greatest variation among the observed variables, while each subsequent eigenfactor was orthogonal to all preceding factors and provided an incrementally smaller contribution to the overall descriptive ability of the model. Because factors with low eigenvalues contribute little to the explanatory capability of the model, only the first few factors were needed to account for much of the parameter variability. In this study, the factor extraction was performed using the method of principal components. The most widely used methods for determining how many factors to retain and how many to ignore are the Kaiser criterion and the scree plot test. Under the Kaiser criterion, only factors with eigenvalues greater than 1 are retained, which means that each retained factor provides at least as much explanatory capability as one original variable. Once the correlation matrix and eigenvalues were obtained, factor loadings were used to measure the correlation between variables and factors.
Factor rotation was used to facilitate interpretation by providing simpler factor structure. The factors were rotated so that the observed axes were aligned with a dominant set of variables, which assisted in understanding how factors were related to the observed variables. Our study used the varimax rotation which is a standard rotation method.
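The extraction and rotation steps described above can be illustrated with a short NumPy sketch. It is not the implementation used in the study: the data are synthetic, the function names are hypothetical, the retention threshold defaults to the Kaiser criterion (eigenvalue greater than 1), and the rotation is a commonly used iterative SVD formulation of varimax.

```python
import numpy as np

def extract_factors(z, min_eigenvalue=1.0):
    """Principal-component extraction from the correlation matrix of z."""
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)       # returned in ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue               # Kaiser criterion by default
    # Loadings = eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)       # column sums of squares
        gradient = loadings.T @ (rotated ** 3 - rotated * col_ss / n_vars)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        if s.sum() - criterion < tol:             # stop when no further gain
            break
        criterion = s.sum()
    return loadings @ rotation

# Synthetic example: six standardized variables driven by three latent sources.
rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 3))
data = latent[:, [0, 0, 1, 1, 2, 2]] + 0.3 * rng.normal(size=(300, 6))
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

eigvals, loadings = extract_factors(z)
rotated = varimax(loadings)
print(np.round(eigvals, 3))
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged; only the distribution of the loadings across factors becomes simpler to read.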
Results
Statistical Screening of Data
First, Levene’s test for equality of variances and both pooled and separate-variances t-tests for equality of means were conducted to determine whether the variances and mean values of the studied groups differed for each of the 18 biochemical parameters. The application of CA and FA was restricted to the 11 parameters (UREA, CREA, TG, UA, K, P, AST, ALT, ALP, TP, ALB) that were significantly different (p<0.01) between the two groups.
Descriptive statistics for the above 11 biochemical parameters for the total number of cases, the two distinct groups (healthy individuals and patients from the hospital of Kavala), and the sex categories are presented in Table 2.
Table 2. Descriptive statistics for the tested clinical parameters for female, male and all individuals’ data sets (mean value ± standard deviation).

| Parameter | Cases | Female (mean ± SD) | Male (mean ± SD) | Total (mean ± SD) |
|---|---|---|---|---|
| ALB (g/dl) | Patient | 3.6±0.4 | 3.6±0.4 | 3.6±0.4 |
| | Healthy | 4.8±0.4 | 5.1±0.5 | 5.0±0.4 |
| ALP (U/l) | Patient | 119.7±86.5 | 96.6±40.8 | 104.4±60.7 |
| | Healthy | 54.4±15.8 | 77.7±25.7 | 63.7±23.2 |
| ALT (U/l) | Patient | 36.3±13.1 | 41.3±27.1 | 39.6±23.4 |
| | Healthy | 18.6±14.6 | 27.2±14.7 | 22.1±15.2 |
| AST (U/l) | Patient | 14.7±8.9 | 16.3±13.3 | 15.8±12.0 |
| | Healthy | 20.0±6.3 | 24.8±9.2 | 21.9±7.9 |
| CREA (mg/dl) | Patient | 8.8±1.6 | 9.6±2.4 | 9.3±2.2 |
| | Healthy | 0.9±0.1 | 1.0±0.1 | 1.0±0.1 |
| K (mmol/l) | Patient | 5.6±0.7 | 5.3±0.9 | 5.4±0.8 |
| | Healthy | 4.5±0.4 | 4.3±0.3 | 4.4±0.4 |
| P (mg/dl) | Patient | 5.4±1.9 | 5.5±1.5 | 5.5±1.6 |
| | Healthy | 3.9±0.5 | 3.9±0.7 | 3.9±0.6 |
| TG (mg/dl) | Patient | 200.6±142.6 | 159.1±77.9 | 173.0±105.4 |
| | Healthy | 75.9±35.2 | 91.6±44.5 | 82.2±39.7 |
| TP (g/dl) | Patient | 6.7±0.4 | 6.8±0.6 | 6.8±0.5 |
| | Healthy | 7.6±0.5 | 7.8±0.5 | 7.6±0.5 |
| UA (mg/dl) | Patient | 5.2±1.0 | 5.8±0.9 | 5.6±1.0 |
| | Healthy | 3.8±0.8 | 5.4±1.0 | 4.5±1.2 |
| UREA (mg/dl) | Patient | 164.9±35.9 | 174.9±36.0 | 171.6±36.1 |
| | Healthy | 24.1±5.0 | 28.4±5.6 | 25.8±5.6 |
The cross-correlation between the biochemical test parameters of the General Hospital of Kavala (GHK) healthy individuals showed that, although many of the coefficients were statistically significant according to the Pearson test, a meaningful interpretation (r > 0.6) could be offered only for the parameter pairs AST/ALT (r = 0.752, p<0.001) and TP/ALB (r = 0.790, p<0.001). The correlated pairs of parameters for the 98 ESRF patients from the GHK were also AST/ALT (r = 0.718, p<0.001) and TP/ALB (r = 0.686, p<0.001). These correlations were used to identify groups of highly correlated biochemical variables, and it is evident that simple correlation analysis did not reveal further specific links among the studied biochemical parameters.
Parameters Distribution Characteristics and Data Treatment
Most multivariate methods, such as CA and FA, assume that the variables conform to a normal distribution. Thus, before the multivariate statistical analyses, the normality of the distribution of each variable was checked with the Kolmogorov-Smirnov statistic, histograms and normality plots, and by analyzing kurtosis and skewness.
In the original data, HDLC and Na (more closely) and UA, TP and ALB (less closely) were almost normally distributed, whereas the other parameters (except Ca) were positively skewed, with kurtosis coefficients that differed significantly from zero and, for most of them (except UREA and CREA), were significantly greater than zero (95% confidence). After log-transformation of the parameters, all skewness and kurtosis values were significantly reduced (Fig. 1).
Figure 1. Skewness and Kurtosis coefficients of original (●) and log-transformed (▼) data.
For CA and FA, all parameters were also z-scale standardized (mean = 0; variance = 1) to minimize the effects of differences in measurement units and variance and to render the data dimensionless. Consequently, each column had zero mean and unit variance.
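The data treatment described above (log-transformation followed by z-scale standardization, with skewness and kurtosis as diagnostics) can be sketched as follows. The example uses a synthetic, positively skewed variable; the base-10 logarithm, the moment-based skewness and excess-kurtosis formulas and all names are assumptions made for illustration, since the text does not specify them.

```python
import numpy as np

def skewness(x):
    """Moment-based sample skewness (zero for a symmetric distribution)."""
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

def excess_kurtosis(x):
    """Moment-based excess kurtosis (zero for a normal distribution)."""
    d = x - x.mean()
    return (d ** 4).mean() / (d ** 2).mean() ** 2 - 3.0

def log_zscore(x):
    """Log-transform (base 10 assumed), then scale to mean 0 and variance 1."""
    logged = np.log10(x)
    return (logged - logged.mean()) / logged.std(ddof=1)

# Synthetic, positively skewed stand-in for an analyte such as TG.
rng = np.random.default_rng(2)
tg = rng.lognormal(mean=4.5, sigma=0.5, size=500)

z = log_zscore(tg)
print(f"skewness: original {skewness(tg):.2f}, log {skewness(np.log10(tg)):.2f}")
print(f"kurtosis: original {excess_kurtosis(tg):.2f}, "
      f"log {excess_kurtosis(np.log10(tg)):.2f}")
print(f"z-scores: mean {z.mean():.2e}, variance {z.var(ddof=1):.3f}")
```

Standardization is a linear rescaling, so the z-scores keep the skewness and kurtosis of the log-transformed values; it is the log step that reduces the asymmetry.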
Analysis by Multivariate Statistical Methods
Two different multivariate statistical methods were applied to the analysis of the clinical laboratory data. CA and FA were used in combination to model the 11-parameter data sets corresponding to (i) all individuals from Kavala’s hospital (KAI); (ii) healthy individuals from Kavala’s hospital (KHI); and (iii) patient individuals from Kavala’s hospital (KPI).
Structure of Clinical Laboratory Data
CA and FA are used in order to examine the structure of the biochemical data in the studied groups.
CA was performed on the data sets of the three groups (KAI, KHI, and KPI), each consisting of the 11 biochemical parameters (UREA, CREA, K, P, UA, ALT, TG, ALP, AST, TP, and ALB). The respective hierarchical dendrograms are shown in Fig. 2.
Figure 2. Hierarchical dendrograms for 11 biochemical parameters of (a) KAI, (b) KHI and (c) KPI data sets.
Examining each dendrogram, it could be concluded that the parameters are principally separated into two main clusters, each of them divided additionally into subclusters that are presented in Table 3.
Table 3. Subclusters with the parameters of each hierarchical dendrogram of cluster analysis for the KAI, KHI, and KPI data sets.

| Groups | Subcluster A | Subcluster B | Subcluster C | Subcluster D | Subcluster E |
|---|---|---|---|---|---|
| KAI | CREA, UREA, K, P | TG, UA, ALT, ALP | AST, TP, ALB | | |
| KHI | UREA, CREA, UA, ALP | K, P | TP, ALB | TG | AST, ALT |
| KPI | CREA, UREA, K, P | TP, ALB, TG | AST, ALT | UA | ALP |
Usually, the typical classification approach of clustering is accompanied by FA, which is a typical projection and modelling approach. In general, FA confirms the results obtained by CA.
FA was applied to the standardized, log-transformed data sets of KAI, KHI, and KPI (each consisting of the 11 biochemical parameters UREA, CREA, K, P, UA, ALT, TG, ALP, AST, TP and ALB) to identify the latent factors, to examine differences between healthy and patient individuals, and to determine the biochemical characteristics of each group. Before conducting the FA, the Kaiser–Meyer–Olkin (KMO) and Bartlett’s sphericity tests were performed on the parameter correlation matrix to examine the validity of the FA. The KMO results for the all (AI), healthy (HI), and patient (PI) individuals groups of GHK were 0.837, 0.622, and 0.558, respectively, and the Bartlett’s sphericity statistics were 2435.72, 672.35, and 283.40 (p < 0.05), indicating that FA may be useful in providing significant reductions in dimensionality.
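For readers who wish to reproduce these two adequacy checks, the following NumPy sketch computes the overall KMO measure and the Bartlett sphericity statistic from a standardized data matrix using their standard textbook definitions. It runs on synthetic data with hypothetical function names, and it is not the SPSS/Statistica implementation used in the study.

```python
import numpy as np

def bartlett_sphericity(z):
    """Bartlett's test statistic and degrees of freedom for H0: corr = identity."""
    n, p = z.shape
    corr = np.corrcoef(z, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof

def kmo(z):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(z, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations from the inverse of the correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = (corr[off_diag] ** 2).sum()
    q2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + q2)

# Synthetic example: six variables sharing one common source of variation.
rng = np.random.default_rng(4)
common = rng.normal(size=(300, 1))
data = common + rng.normal(size=(300, 6))
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

chi2, dof = bartlett_sphericity(z)
kmo_value = kmo(z)
print(f"Bartlett chi-square = {chi2:.1f} on {dof} degrees of freedom")
print(f"KMO = {kmo_value:.3f}")
```

The Bartlett statistic is compared with a chi-square distribution on p(p − 1)/2 degrees of freedom; KMO values closer to 1 indicate that the correlations are better suited to factoring.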
Based on the scree test plot of Fig. 3, only the varifactors (VFs) with eigenvalues greater than 0.935 were considered essential.
Figure 3. Scree plot diagram of FA for KAI (▼), KHI (□) and KPI (○) groups.
FA yielded three, five, and five VFs explaining 71.33%, 74.98% and 74.21% of the total variance in the respective data sets. Table 4 summarizes the FA results, comprising the loadings, eigenvalues and cumulative percentage of variance. In this study, loadings with an absolute value greater than 0.5 were considered significant and were highlighted.
Table 4. Loadings (L) of the 11 measured parameters of KAI, KHI, and KPI data sets (L greater than 0.5 were considered significant).

| Parameters | KAI VF1 | KAI VF2 | KAI VF3 | KHI VF1 | KHI VF2 | KHI VF3 | KHI VF4 | KHI VF5 | KPI VF1 | KPI VF2 | KPI VF3 | KPI VF4 | KPI VF5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UREA | 0.637 | 0.566 | 0.440 | 0.021 | 0.075 | 0.734 | 0.062 | 0.321 | 0.826 | 0.006 | 0.030 | 0.193 | 0.192 |
| CREA | 0.649 | 0.545 | 0.454 | 0.056 | 0.505 | 0.647 | 0.134 | 0.213 | 0.585 | 0.461 | 0.055 | 0.383 | 0.039 |
| TG | 0.301 | 0.345 | 0.470 | 0.112 | 0.054 | 0.358 | 0.569 | 0.397 | 0.033 | 0.493 | 0.024 | 0.477 | 0.121 |
| UA | 0.138 | 0.072 | 0.745 | 0.465 | 0.204 | 0.633 | 0.046 | 0.215 | 0.061 | 0.216 | 0.046 | 0.844 | 0.027 |
| K | 0.697 | 0.224 | 0.279 | 0.013 | 0.050 | 0.063 | 0.022 | 0.855 | 0.816 | 0.148 | 0.067 | 0.033 | 0.206 |
| P | 0.791 | 0.060 | 0.184 | 0.215 | 0.103 | 0.104 | 0.728 | 0.040 | 0.636 | 0.007 | 0.019 | 0.199 | 0.425 |
| AST | 0.611 | 0.426 | 0.359 | 0.928 | 0.067 | 0.018 | 0.001 | 0.055 | 0.057 | 0.083 | 0.905 | 0.034 | 0.046 |
| ALT | 0.022 | 0.280 | 0.850 | 0.895 | 0.032 | 0.165 | 0.160 | 0.080 | 0.011 | 0.063 | 0.889 | 0.084 | 0.095 |
| ALP | 0.358 | 0.103 | 0.554 | 0.356 | 0.107 | 0.331 | 0.636 | 0.218 | 0.078 | 0.028 | 0.047 | 0.104 | 0.910 |
| TP | 0.108 | 0.919 | 0.151 | 0.028 | 0.940 | 0.000 | 0.019 | 0.040 | 0.107 | 0.835 | 0.055 | 0.197 | 0.105 |
| ALB | 0.326 | 0.850 | 0.283 | 0.034 | 0.909 | 0.126 | 0.163 | 0.053 | 0.075 | 0.904 | 0.076 | 0.035 | 0.076 |
| Eigenvalue | 5.485 | 1.424 | 0.937 | 2.929 | 2.073 | 1.185 | 1.126 | 0.936 | 2.74 | 1.758 | 1.589 | 1.126 | 0.951 |
| Cumulative (%) of variance | 49.864 | 62.813 | 71.33 | 26.63 | 45.47 | 56.24 | 66.48 | 74.98 | 24.91 | 40.89 | 55.34 | 65.57 | 74.21 |
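The link between the eigenvalues and the cumulative percentages of variance in Table 4 can be checked with a few lines of Python. With 11 standardized parameters the total variance equals 11, so each varifactor explains 100 × eigenvalue / 11 percent of it. The function name is hypothetical; the eigenvalues are copied from Table 4.

```python
N_PARAMETERS = 11  # standardized variables, so the total variance equals 11

# Eigenvalues of the retained varifactors, as reported in Table 4.
eigenvalues = {
    "KAI": [5.485, 1.424, 0.937],
    "KHI": [2.929, 2.073, 1.185, 1.126, 0.936],
    "KPI": [2.74, 1.758, 1.589, 1.126, 0.951],
}

def cumulative_variance(values, n_parameters=N_PARAMETERS):
    """Cumulative percentage of total variance explained by successive factors."""
    total = 0.0
    cumulative = []
    for value in values:
        total += 100.0 * value / n_parameters
        cumulative.append(round(total, 2))
    return cumulative

for group, values in eigenvalues.items():
    print(group, cumulative_variance(values))
```

The computed percentages agree with the tabulated cumulative values to within rounding of the reported eigenvalues.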
Discussion
Two data sets (185 healthy individuals and 173 ESRF patients), each including 18 biochemical parameters, were analyzed. Only 11 of the 18 parameters were used for statistical analysis because they differed significantly between the healthy and patient groups.
Descriptive statistics and the Pearson correlation test gave basic information about the clinical laboratory data of the studied groups. As expected, ALB, AST, HDLC, Ca, TP and Na concentrations were higher, and ALP, ALT, CREA, K, P, TG, UA and UREA were lower, in the healthy individuals group. Moreover, the Pearson correlation test (r>0.6) showed a strong relationship between the parameter pairs AST – ALT and TP – ALB, for both the healthy individuals and the ESRF patients data sets. It was evident that simple correlation analysis did not indicate specific links among the studied biochemical parameters.
First, CA was applied to the data sets of the three studied groups (KAI, KHI and KPI), each consisting of 11 biochemical parameters. In each dendrogram produced, the parameters were separated into two main clusters with different classifications. The classification results give some important information about the relationships among the biochemical test parameters. It is obvious that all parameters (Fig. 2a) are divided into subpatterns, each of them related to a specific function. More specifically, the first cluster includes predominantly protein parameters (ALB and TP) and one enzyme parameter, AST. The second cluster is more heterogeneous and involves parameters related to metabolic excretion processes (UREA, CREA, and UA), two enzymes (ALT and ALP), and chemical cell and blood components (K, P, and TG).
This intelligent data analysis gives an idea of how the single clinical parameters should be compared and related to one another if the individual is treated as depending on all clinical values simultaneously, not separately. For instance, within the KPI group (Fig. 2c), the group of parameters (CREA, UREA, K, and P) is more strongly related to parameters such as TG, TP, and ALB than to UA and ALP or to AST and ALT. Therefore, specific patterns of the classified clinical parameters can be proposed (for the group of all individuals (KAI)):
1. General health indicator pattern (including UA, ALT, ALP, TG levels)
2. Major component excretion pattern (including CREA and UREA, as well as chemical content of phosphorus and potassium)
3. Protein pattern (with determination of ALB and TP and one enzyme AST).
The CA of the biochemical parameters gives not only information about the relationships among the various groups of biochemical tests of the healthy individuals or the ESRF patients but also ideas about optimizing the number of tests necessary to check the condition of a healthy individual or a patient. For a fast screening test it seems reasonable to use some representatives of the separate clusters in order to obtain information about the state of a certain case. The task medical doctors have to solve is to select biochemical parameters that are both easy to perform (and interpret) and informative.
Differences in the clustering of the studied biochemical variables (Fig. 2b and 2c) were evident from the comparison of the two studied groups (healthy individuals versus ESRF patients). For both groups five subclusters were formed (subcluster A includes CREA and UREA for both the KHI and KPI groups, with the parameters UA and ALP in KHI and K and P in KPI; subcluster B includes TP and ALB for both groups and TG in KPI; subcluster C includes the stable pair AST and ALT for both groups; subcluster D includes K and P in KHI and UA in KPI; and subcluster E includes TG in KHI and ALP in KPI).
Therefore, when the clinical parameters are grouped only for healthy individuals (Fig. 2b) or only for ESRF patients (Fig. 2c), several patterns are again formed, which correspond in principle to the idea of a major component excretion group (CREA and UREA), an enzyme group (AST and ALT) and a protein group (TP and ALB). This seems to be a classification rule for clinical parameters that is specific to healthy individuals or to ESRF patients. It seems obvious that any more general health stage indicator that might be introduced has to be specific to healthy individuals or to ESRF patients. This confirms the finding regarding the differences between the average values of the single biochemical parameters in KHI and KPI.
As a projection and modelling method, FA gives the opportunity to determine the structure of the data sets and to identify the latent factors responsible for the data structure. Very often it is combined with CA to check the classification obtained. The VFs in Table 4 indicate the latent factor structure and help in the interpretation of the data sets. The statistically significant factor loadings are marked (the significance is determined by the rule of Malinowski). The formation of three, five, and five latent factors, which account for the data structure, was established for the three groups KAI, KHI, and KPI, respectively.
When the KAI group was considered, three varifactors explained 71.3 % of the total variance of the system, which is an indication of the adequacy of the FA model. The first varifactor, with high factor loadings for UREA, CREA, K and P, could be conditionally named the “major component excretion” factor and corresponded completely to the cluster of the same name. It explained 49.9 % of the total variance. The next level of total variance explanation (nearly 13 %) is accomplished by the second varifactor, which indicated high correlation (factor loading values) for TP and ALB and weak correlation for AST and resembled one of the stable subclusters in all dendrograms. Therefore, it could again be conditionally named the “protein” factor. Finally, the third varifactor also explained a substantial part of the total variance (8.5 %) and revealed the relation between the clinical parameters ALT, ALP, UA and TG, which allowed its conditional designation as the “general health indicator” factor.
The application of FA to the KHI group (Table 4) resulted in five varifactors (VFs): VF1 contained the parameters AST and ALT with strong positive loadings and corresponded exactly to subcluster A of CA; VF2 included TP and ALB with strong positive loadings and corresponded exactly to subcluster B; VF3 included CREA, UREA and UA with strong positive loadings and ALP with a weak positive loading (L=0.331) and corresponded to subcluster C (the parameter TG, with a weak positive loading (L=0.358), could also be included); VF4 included the parameters ALP and P and resembled subcluster D (with ALP now included instead of K); and lastly VF5 contained the parameter K with a strong positive loading and the parameter TG with a weak negative loading (0.397), and thus could correspond to subcluster E of CA (Table 3).
The same method of analysis for the ESRF patients group (Table 4) revealed that the first latent factor, with high positive factor loadings for UREA, CREA, K and P, corresponded to subcluster A of CA and explained almost 25 % of the total variance. The next level of total variance explanation (almost 16 %) is accomplished by the second latent factor, which indicated strong correlation (factor loading values) for TP and ALB and weak correlation for TG. Therefore, it could correspond to subcluster B of CA. The third latent factor explained a substantial part of the total variance (14.4 %) and revealed the strong positive relation between the clinical parameters AST and ALT, which also corresponded to subcluster C of CA. The fourth latent factor explained over 10 % of the total variance and is due to the high positive loading of UA. The fifth latent factor explained a substantial part of the total variance (8.6 %) and revealed the strong positive loading of ALP. The last two latent factors again corresponded to subclusters D and E, respectively.
It is readily seen that the major groups of biochemical parameters identified by cluster analysis for the three studied groups (Fig. 2 and Table 3) also appear in the varifactor loadings presented in Table 4, apart from some minor differences between the classification results of CA and the FA modelling when the KHI data set is considered. This is not surprising, since the data pretreatment very often influences, to a small extent, the linkage among the variables. Thus, the classification scheme obtained by cluster analysis is confirmed by factor analysis. This confirmation is very important because both chemometric approaches have shown that the biochemical test values are linked in specific patterns, and that these patterns could be revealed and interpreted only by the use of multivariate statistics.
Conclusion
A combination of CA and FA is used to create models that could assess and model clinical laboratory data of healthy individuals and ESRF patients, based only on their routinely determined biochemical parameters. It is important to note that the above chemometric methods worked well together in modelling the biochemical parameter data sets of KAI, KHI and KPI.
Consequently, our main findings were:
 i. CA and FA were used to determine the structure of the clinical laboratory data. CA shows the linkage among the biochemical parameters studied. When the hierarchical dendrogram for all individuals is considered, specific patterns of the classified clinical parameters can be proposed: a general health indicator pattern (UA, ALT, ALP and TG); a major component excretion pattern (CREA, UREA, P and K); and a protein pattern (TP, ALB and AST). Moreover, FA yielded three, five, and five varifactors for KAI, KHI, and KPI, respectively, which account for the data structure. It is worthy of note that the major groups of biochemical parameters identified by CA for the three studied groups also appear in the varifactor loadings of FA. Thus, the classification scheme obtained by CA is confirmed by FA. This confirmation is an important hint that the clinical parameters tested are, indeed, related and form groups of similar indicative properties.
 ii. The FA yielded models for monitoring the biochemical profile of healthy individuals and ESRF patients and could also help in minimizing costs. In our case, the FA model for KPI yields only five VFs that could be used for the monitoring of biochemical parameters. The first parameter could be UREA; the second ALB; AST may be used as the third; UA is the fourth; and ALP could be the fifth biochemical parameter (one parameter from each of the five VFs). For healthy individuals, again only five parameters could be monitored: AST, TP, UREA, P and K. When one of these parameters gives “unusual” values for a healthy individual or an ESRF patient, the other parameters could then be determined. This means a reduction of about 72% in the number of biochemical parameters examined and 75% in costs (in Greece, the mean cost per biochemical test for the group of 11 biochemical parameters is about 5.55 euro).
References
[1] Ardissino G, Daccό V, Testa S, Bonaudo R, ClarisAppiani A, Taioli E, et al. Epidemiology of Chronic Renal Failure in Children: Data From the ItalKid Project. Pediatrics 2003;111:e382e387.
[2] Gupta R, Birnbaum Y, Uretsky BF. The renal patient with coronary artery disease: current concepts and dilemmas. J Am Coll Cardiol 2004;44:13431353.
[3] Yerkey MW,Kernis SJ, Franklin BA, Sandberg KR, McCullough PA. Renal dysfunction and acceleration of coronary disease. Heart 2004;90:961966.
[4] Coresh J, ByrdHolt D, Astor BC, Briggs JP, Eggers PE, Lacher DA, Hostetter TH. Chronic Kindey Disease Awareness, Prevalence, and Trends among U.S. Adults, 1999 to 2000. J Am Soc Nephrol 2005;16:180188.
[5] Asselbergs FW, Mozaffarian D, Katz R, Kestendaum B, Fried LF, Gottdiener JS, et al. Association of renal function with cardiac calcifications in older adults: the cardiovascular health study. Nephrol Dial Transplant 2009;24:834840.
[6] McCullough PA, Soman SS, Shah SS, Smith ST, Marks KR, Yee J, Borzak S. Risks associated with renal dysfunction in patients in the coronary care unit. J Am Coll Cardiol 2000;36:67984.
[7] Curtis B, Barrett BJ, Levin A. Identifying and slowing progressive chronic renal failure. Can Fam Physician 2001;47:25128.
[8] Ifudu O, Dawood M, Hornel P, Friedman EA. Excess morbidity in patients starting uremia therapy without prior care by a nephrologist. Am J Kidney Dis 1996;28:8415.
[9] Levin A, Lewis M, Mortiboy P, Faber S, Hare I, Porter EC, et al. Multidisciplinary predialysis programs: quantification and limitations of their impact on patient outcomes in two Canadian settings. Am J Kidney Dis 1997;29:53340.
[10] Stigant C, Stevens L, Levin A. Nephrology: Strategies for the care of adults with chronic kidney disease. JAMC 2003;168:155360.
[11] Cockcroft DW, Gault MH. Prediction of creatinine clearance from serum creatinine. Nephron 1976;16:3141.
[12] Larsson A, Malm J, Grubb A, Hansson LO. Calculation of glomerular filtration rate expressed in mL/min from plasma cystatin C values in mg/L. Scand J Clin Lab Invest 2004;64:2530.
[13] Poggio ED, Wang X, Greene T, Van Lente F, Hall PM. Performance of the Modification of Diet in Renal Disease and CockcroftGaul Equations in the Estimation of GFR in Health and in Chronic Kidney Disease. J Am Soc Nephrol 2005;16:459466.
[14] Selvin E, Köttgen A, Coresh J. Kidney function estimated from serum creatinine and cystatin C and peripheral arterial disease in NHANES 19992002. Eur Heart J. 2009. Doi:10.1093/eurheartj/ehp195.
[15] Tsinalis D, Thiel GT. An easy to calculate equation to estimate GFR based on inulin clearance. Nephrol Dial Transplant 2009. Doi: 10.1093/ndt/gfp193.
[16] Pöge U, Gerhardt T, Woitas RP. Estimation of glomerular rate by use of betatrace protein. Clinical Chemistry 2008;54:14031405.
[17] Pizzarelli F, Lauretani F, Bandinelli S, Windham GB, Corsi AA, Giannelli SV, et al. Predictivity of survival according to different equations for estimating renal function in communitydwelling elderly subjects. Nephrol Dial Transplant 2009;24:11971205.
[18] Shlipak MG, Katz R, Sarnak MJ, Fried LF, Newman AB, StehmanBreen C, et al. Cystatin C and prognosis for cardiovascular and kidney outcomes in elderly persons without chronic kidney disease. Ann Intern Med. 2006;145:237246.
[19] Stevens LA, Fares G, Fleming J, Martin D, Murthy K, Qiu J, et al. Low rates of testing and diagnostic codes usage in a commercial clinical laboratory: evidence for lack of physician awareness of chronic kidney disease. Am Soc Nephrol 2005;16:24392448.
[20] Soylu A, Kasap B, Demir K, Turkmen M, Kavukcu S. Predictive value of clinical laboratory variables for vesicoureteral reflux in children. Pediatr nephrol 2007;22:844848.
[21] Lee D, Levin A, Roger SD, McMahon P. Longitudinal analysis of performance of estimated glomerular filtration rate as renal function declines in chronic kidney disease. Nephrol Dial Transplant 2009;24:109116.
[22] Serra MA, Puchades MJ, Rodriguez F, Escudero F, del Olmo JA, Wassel AH, Rodrigo JM. Clinical value of increased serum creatinine concentration as predictor of shortterm outcome in decompensated cirrhosis. Scand J Gastroenterol 2004;11:11491153.
[23] Kronborg J, Solbu M, Njølstad I, Toft I, Eriksen BO, Jenssen T. Predictors of change in estimated GFR: a populationbased 7year followup from the Tromsø study. Nephrol Dial Transplant 2008;23:28182826.
[24] NCCLS. Procedures for the Handling and Processing of Blood Specimens. Document H18A Villanova, PA: NCCLS; 1990.
[25] NCCLS. Procedures for the Collection of Diagnostic Blood Specimens by Venipuncture. Approved Standard, H3A5, Wayne, PA: NCCLS 2003.
[26] NCCLS. Procedures for the Collection of Diagnostic Blood Specimens by Skin Puncture: Document H4A3 Wayne, PA: NCCLS 1991.
[27] Tietz, N W. "Specimen Collection and Processing; Sources of Biological Variation", Textbook of Clinical Chemistry, 2nd Edition, W. B. Saunders, Philadelphia, PA (1994).
[28] Papaioannou A, Simeonov V, Plageras P, Dovriki E, Spanos T. Multivariate statistical interpretation of laboratory clinical data, European Journal of Medicine, 2007;3:319334.
[29] IFFC: Approved recommendation on the theory of reference values. Part 4. Theory of reference values. Control of analytical variation in the production, transfer and application of reference values. Clin Chim Acta 1991;202:S5S12.
[30] NCCLS. How to Define, Determine, and Utilize Reference Intervals in the Clinical Laboratory.Document C28A, Villanova, PA: NCCLS 1994.
[31] Grossi E, Colombo R, Cavuto S, Franzini C. The REALAB Project: A New Method for the Formulation of Reference Intervals Based on Current Data. Clin Chem 2005;51:12321240.
[32] Kafka MT. Internal quality control, proficiency testing and the clinical relevance of laboratory testing. Arch Rathol Lab Med 1988;112:44953.
[33] NCCLS. Internal Quality Control Testing: Principles and Definitions. NCCLS, Document C24A Wayne, PA: NCCLS; 1991.
[34] Vogt W, Nagel D. Cluster Analysis in Diagnosis. Clin Chem 1992;38:182198.
[35] Clinton D, Button E, Norring C, Palmer R. Cluster analysis of key diagnostic variables from two independent samples of eatingdisorder patients: evidence for a consistent pattern. Psychological Medicine 2004;34:10351045.
[36] Ness R, Kip K, Hillier S, Soper D, Stamm C, Sweet R, Rice P, Richter H. A Cluster Analysis of Bacterial Vaginosisassociated Microflora and Pelvic Inflammatory Disease. Am J of Epidemiol 2005;162:585590.
[37] Toraldo DM, Nicolardi G, De Nuccio F, Lorenzo R, Ambrosino N. Pattern of variables describing desaturator COPD patients, as revealed by cluster analysis. Chest 2005;128:38283837.
Corresponding Author
Agelos Papaioannou
Associate Professor
Biochemistry – Clinical Chemistry
Department of Medical Laboratories, TEI of Thessaly, 41110 Larissa, Greece
Tel.: +30 2410 684448