K-Means Clustering


Attrition Rate Analysis

By building a K-Means Clustering model, an enterprise can group its employees and identify those likely to leave as early as possible, and then take targeted measures to reduce attrition.

 

Data Preparation and Correlation Analysis

Drag the "HR" data set node to the edit area, then add a Correlation Analysis node and connect it to the data set node.

ML73

Configure the Correlation Analysis node: set "Correlation" to "Pearson", and add the following columns to "Choose Correlated Columns": Years of entry, Staff level, Staff Projects, average monthly working hours, Whether or not mistakes, Whether to leave, Has it been promoted in the last five years, and satisfaction.

ML74

 

Click "Result." The correlation coefficient is -0.388 between Whether to leave and satisfaction, 0.349 between Staff level and Staff Projects, and 0.417 between Staff Projects and average monthly working hours. All three fall in the low-correlation range.

ML75
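For readers who want to see what the "Pearson" option computes, the following is a minimal sketch in plain Python. It is not the product's implementation, and the two short columns are made-up illustrative values, not rows from the "HR" data set.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration only (NOT taken from the HR data set).
satisfaction = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
left_company = [0, 0, 1, 1, 0, 1]  # 1 = left, 0 = stayed

r = pearson(satisfaction, left_company)
print(round(r, 3))  # negative: lower satisfaction goes with leaving
```

A negative coefficient, as in the tutorial's -0.388, means the two columns tend to move in opposite directions.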

 

The absolute value of the correlation coefficient r is interpreted as follows: |r| > 0.95, significantly correlated; 0.8 ≤ |r| ≤ 0.95, highly correlated; 0.5 ≤ |r| < 0.8, moderately correlated; 0.3 ≤ |r| < 0.5, low correlation; |r| < 0.3, extremely low correlation, treated as uncorrelated.
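The thresholds above can be written as a small lookup function. This is only a sketch of the rule as stated in this section; the function name and the label strings are chosen here for illustration and are not part of the product.

```python
def correlation_strength(r):
    """Map a correlation coefficient to the strength bands listed above."""
    a = abs(r)
    if a > 0.95:
        return "significantly correlated"
    if a >= 0.8:
        return "highly correlated"
    if a >= 0.5:
        return "moderately correlated"
    if a >= 0.3:
        return "low correlation"
    return "extremely low correlation (treated as uncorrelated)"

# The three coefficients reported in the correlation result of this tutorial.
for r in (-0.388, 0.349, 0.417):
    print(r, "->", correlation_strength(r))
```

All three coefficients from the result land in the same band, which matches the conclusion drawn from the correlation result.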

 

K-Means Clustering

o Configuration items

Add a K-Means Clustering node and connect it to the data set node. Set "Number of Centers" to "3", "Initial Cluster Center" to "Random Distance", "Random Seed" to "0", and "Distance Metric Algorithm" to "Euclidean Distance." By default, missing-value filling is selected and normalization is not. Add satisfaction and Staff level as the independent variables.

ML76
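To make the configuration items more concrete, here is a minimal K-Means sketch in plain Python that mirrors the settings above: three centers, randomly chosen initial centers, a fixed random seed of 0, and Euclidean distance. It is an assumption-laden illustration, not the algorithm the product actually runs; in particular, how "Random Distance" picks initial centers is not documented here, so the sketch simply samples three input points. The (satisfaction, Staff level) pairs are made up.

```python
import math
import random

def kmeans(points, k=3, seed=0, max_iter=100):
    """Lloyd's algorithm with random initial centers and Euclidean distance."""
    rng = random.Random(seed)  # fixed seed makes the run repeatable
    # ASSUMPTION: initial centers are k randomly sampled input points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

# Made-up (satisfaction, staff level) pairs, for illustration only.
points = [
    (0.10, 1), (0.15, 1), (0.20, 1),
    (0.50, 2), (0.55, 2), (0.60, 2),
    (0.85, 3), (0.90, 3), (0.95, 3),
]
centers, labels = kmeans(points, k=3, seed=0)
for center in centers:
    print([round(v, 3) for v in center])
print(labels)
```

Because normalization is not applied, a column with a larger numeric range (here the staff level) contributes more to the Euclidean distance than satisfaction does. The result of K-Means also depends on the initial centers, which is why the random seed is fixed.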

o Running

After the parameters are configured, the K-Means Clustering node is still in the not-run state.

ML77

Right-click "K-Means Clustering" and choose "Run" from the context menu. After the run succeeds, the node is displayed as follows:

ML78

o Result display

Click "Result." The result shows that employees are grouped into three clusters based on satisfaction and Staff level, as shown in the following figure.

ML142

The "Centers in K-Means Clustering" table lists the coordinates of the three cluster centers and the number of samples in each cluster. The "Cluster Members" table lists the data of each sample in a cluster and its distance to the cluster center.

 

Model Application

o Application of the model in the dashboard

Select the "K-Means Clustering" node and save it as a trained model. The saved model can then be applied to the data set bound to dashboard components in "Create Dashboard."

1. On the "Create Dashboard" page, select a data set to create a dashboard. Right-click the data set and choose "Apply Trained Model" from the context menu:

ML81

2. The "Specify Trained Model" dialog opens. It lists only the models that can be applied to the bound data set.

ML82

3. Select the saved trained model "K-Means Clustering" and click "OK." Two columns are generated: "Cluster", the cluster each record is assigned to, and "Distance", the distance between the record and its cluster center.

ML83

4. Create a dimension expression based on the model output, and rename the three clusters as follows.

ML84

5. Bind the data to the chart component as follows, and select the line chart.

ML85

6. Set a dynamic calculator on the y-axis field to summarize by percentage, remove the point markers, and adjust the size to get the following result. The figure shows that the monthly working hours of the non-qualified employees are concentrated on the left (lower) side. The confused employees work the longest hours, followed by the excellent employees.

ML86
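Step 3 above produces a "Cluster" and a "Distance" column when the trained model is applied. The sketch below shows one way such columns could be derived from stored cluster centers: each record is assigned to its nearest center by Euclidean distance. The center coordinates, the sample rows, and the zero-based cluster numbering are all hypothetical; the product's actual numbering and output format may differ.

```python
import math

def apply_model(centers, rows):
    """Score rows against fixed centers: nearest cluster and its distance."""
    scored = []
    for row in rows:
        distances = [math.dist(row, center) for center in centers]
        # Index of the smallest distance is the assigned cluster.
        cluster = min(range(len(centers)), key=distances.__getitem__)
        scored.append({
            "features": row,
            "Cluster": cluster,               # ASSUMPTION: zero-based numbering
            "Distance": distances[cluster],
        })
    return scored

# Hypothetical centers as (satisfaction, staff level); not from the tutorial.
centers = [(0.2, 1.0), (0.6, 2.0), (0.9, 3.0)]
# Hypothetical new records to score.
rows = [(0.25, 1), (0.9, 3), (0.5, 2)]

scored = apply_model(centers, rows)
for record in scored:
    print(record["features"], record["Cluster"], round(record["Distance"], 3))
```

A small "Distance" value means the record sits close to the typical member of its cluster; a large value marks a record that fits its cluster only loosely.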

 

o Score

For details, see Logistic Regression in "Experiment and Application of Advanced Analytics."