❖K-Means Clustering

K-Means clustering requires the number of clusters (N) to be specified in advance. The algorithm takes N samples at random as the initial centers, calculates the distance from each sample to every center, and assigns each sample to its nearest center. After all samples have been assigned, the center of each cluster is recalculated. These steps repeat until the centers no longer change. The kmeans function in R is used to carry out K-Means clustering.
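The iteration described above can be sketched in a few lines of R. This is only an illustrative sketch, not the product's implementation and not the algorithm used inside R's kmeans function; the function name simple_kmeans and its max_iter argument are hypothetical, and squared Euclidean distance is assumed.

```r
# Illustrative sketch of the K-Means iteration (hypothetical helper, not the product's code).
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Take k samples at random as the initial centers.
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  labels <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every sample to every center (n x k matrix).
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    # Assign each sample to its nearest center.
    labels <- max.col(-d, ties.method = "first")
    # Recalculate each center as the mean of the samples assigned to it.
    new_centers <- centers
    for (j in seq_len(k)) {
      members <- labels == j
      if (any(members)) new_centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
    # Stop once the centers no longer change.
    converged <- max(abs(new_centers - centers)) < 1e-12
    centers <- new_centers
    if (converged) break
  }
  list(cluster = labels, centers = centers, iterations = iter)
}

# Usage on two well-separated synthetic groups of 20 samples each.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
fit <- simple_kmeans(x, k = 2)
table(fit$cluster)
```

Because the initial centers are chosen at random, a single run can end in a poor local solution on harder data, which is why several random starts are normally tried.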

kmeans(data,centers=3,nstart=10), where the centers parameter sets the number of clusters and the nstart parameter sets the number of random sets of initial centers to try, that is, how many times kmeans is run; the best of these runs is kept. When the kmeans function is used here, the default value of nstart is 10 (the default of R's own kmeans function is 1).
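A minimal sketch of this call in plain R follows. The synthetic data matrix is an assumption made purely for illustration; cluster, centers, size and tot.withinss are components of the object returned by R's kmeans function.

```r
set.seed(42)  # kmeans chooses its initial centers at random

# Hypothetical sample data: three groups of 30 two-dimensional samples.
data <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 8, sd = 0.5), ncol = 2)
)

# Three clusters; ten random sets of initial centers, of which the best run is kept.
fit <- kmeans(data, centers = 3, nstart = 10)

fit$cluster       # cluster label of each sample
fit$centers       # one row per cluster center
fit$size          # number of samples in each cluster
fit$tot.withinss  # total within-cluster sum of squares of the best run
```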

[Cluster Dimensions] Clustering sample set. Drag the fields to be used as clustering dimensions from the "Available Columns" box to the "Cluster Dimensions" box.

[Setting K] Number of clusters. You can enter the number manually, or enter a maximum K value and let the system calculate the optimal K value according to the silhouette coefficient.
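How the system searches for the optimal K is not described here. One common approach, shown below as a sketch under that assumption, is to run kmeans for every K from 2 up to the maximum and keep the K with the highest average silhouette width. The helper name best_k_by_silhouette is hypothetical; silhouette comes from the cluster package, one of R's recommended packages.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper: choose K in 2..k_max by the average silhouette width.
best_k_by_silhouette <- function(x, k_max, nstart = 10) {
  stopifnot(k_max >= 2)  # the silhouette is not defined for a single cluster
  x <- as.matrix(x)
  d <- dist(x)           # pairwise Euclidean distances between samples
  ks <- 2:k_max
  avg_width <- sapply(ks, function(k) {
    fit <- kmeans(x, centers = k, nstart = nstart)
    mean(silhouette(fit$cluster, d)[, "sil_width"])
  })
  list(k = ks[which.max(avg_width)], avg_width = setNames(avg_width, ks))
}

# Usage on synthetic data with three well-separated groups.
set.seed(42)
x <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 8, sd = 0.5), ncol = 2)
)
res <- best_k_by_silhouette(x, k_max = 6)
res$k          # selected number of clusters
res$avg_width  # average silhouette width for each candidate K
```

An average silhouette width close to 1 indicates compact, well-separated clusters, so the candidate K with the largest value is taken as the optimum.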

[Output value] [Cluster Labels] Classification of each sample

[Output value] [Principal Components] Performs principal component analysis on the clustering dimensions and takes the two most important components.
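The two principal components can be reproduced in plain R with the prcomp function. The following is a sketch only: the sample data and its column names are hypothetical, and whether the product centers and scales the dimensions before the analysis is an assumption.

```r
set.seed(7)

# Hypothetical clustering dimensions: 50 samples with four numeric columns.
x <- matrix(rnorm(200), ncol = 4,
            dimnames = list(NULL, c("dim1", "dim2", "dim3", "dim4")))

# Principal component analysis; centering and scaling are assumed here.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Keep the two most important components (the columns PC1 and PC2).
components <- pca$x[, 1:2]

# Share of the total variance explained by each component, largest first.
explained <- pca$sdev^2 / sum(pca$sdev^2)

head(components)
round(explained[1:2], 3)
```

Reducing the clustering dimensions to two components makes it possible to draw the samples on a two-dimensional chart and color them by cluster label.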

•For example

Suppose the classification column is removed and the three kinds of flowers are to be clustered according to their four attributes.

Create a K-Means clustering analysis on the chart, as shown in the following figure.

ML113

The clustering results are shown in the following figure:

ML114
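A comparable experiment can be run directly in R. The sketch below assumes that the flower data resembles R's built-in iris data set (three species, four numeric attributes); the data set actually used in the figures is not stated here.

```r
# Assumption: the built-in iris data stands in for the three-flower data set.
features <- iris[, 1:4]  # drop the Species (classification) column

set.seed(123)
fit <- kmeans(features, centers = 3, nstart = 10)

# Cross-tabulate the known species against the cluster labels.
comparison <- table(species = iris$Species, cluster = fit$cluster)
comparison
```

Cluster numbers are arbitrary, so the table is read by checking which cluster holds most of the samples of each species rather than by matching label values.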