The classification and regression illustrations we just looked at are examples of supervised learning algorithms, in which we are trying to build a model that will predict labels for new data. Unsupervised learning involves models that describe data without reference to any known labels.
One common case of unsupervised learning is "clustering," in which data is automatically assigned to some number of discrete groups. For example, we might have some two-dimensional data like that shown in the following figure:
It is clear that each of these points is part of a distinct group. Given this input, a clustering model will use the intrinsic structure of the data to determine which points are related. Using the very fast and intuitive k-means algorithm, we find the clusters shown in the following figure:
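As a rough illustration of the clustering step described above, here is a minimal sketch using scikit-learn's `KMeans`. The data is synthetic (generated with `make_blobs`), and the choice of four clusters is an assumption made for this example, since the figure's actual data is not given in the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic two-dimensional data with four well-separated groups.
# (An assumption for illustration; not the data behind the figure.)
X, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.5, random_state=0)

# Fit k-means without using any labels. The number of clusters
# must be chosen in advance.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print(cluster_ids[:10])         # cluster index assigned to the first ten points
print(kmeans.cluster_centers_)  # one (x, y) center per cluster
```

Note that the cluster indices are arbitrary: the model groups points together but attaches no meaning to the group numbers.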
A simple regression task is one in which the labels are continuous quantities.
Consider the data shown in the following figure, which consists of a set of points each with a continuous label.
There are two features describing each data point. The color of each point represents the continuous label for that point.
The feature 1-feature 2 plane here is the same as in the two-dimensional plot from before; in this case, however, we have represented the labels by both color and three-dimensional axis position.
The two-dimensional projection:
A model fit to this data can then be used to predict labels for new points. Visually, we find the results shown in the following figure:
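The regression workflow described above can be sketched in a few lines of NumPy. This example assumes a linear model (a plane fit by ordinary least squares) and uses synthetic data with made-up coefficients. The text does not say which model or data produced the figures, so treat both as illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: two features per point, one continuous label.
# The coefficients 1.5, -2.0, and 0.5 are assumptions for illustration.
X = rng.uniform(-3, 3, size=(200, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 + rng.normal(scale=0.1, size=200)

# Fit a plane y ~ a*x1 + b*x2 + c by ordinary least squares.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict continuous labels for new, unlabeled points.
X_new = np.array([[0.0, 0.0], [1.0, -1.0]])
y_new = np.column_stack([X_new, np.ones(len(X_new))]) @ coef

print(coef)   # fitted (a, b, c), close to the values used to generate y
print(y_new)  # predicted labels for the two new points
```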
We have two features for each point, represented by the (x,y) positions of the points on the plane. In addition, we have one of two class labels for each point, here represented by the colors of the points. From these features and labels, we would like to create a model that will let us decide whether a new point should be labeled "blue" or "red."
The following figure shows a visual representation of what the trained model looks like for this data:
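To make the classification setup concrete, here is a minimal sketch that trains a two-class model on points in the plane and labels new points "blue" or "red". It assumes scikit-learn's `LogisticRegression` as the classifier and synthetic data from `make_blobs`. The text does not name the model used for the figure, so this is one possible choice rather than the method behind it.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two features per point and one of two class labels (0 or 1) per point.
# The cluster centers are assumptions chosen so the classes separate cleanly.
X, y = make_blobs(n_samples=100, centers=[(-2, -2), (2, 2)],
                  cluster_std=0.8, random_state=0)
colors = np.array(["blue", "red"])  # class 0 -> "blue", class 1 -> "red"

# Train the model on the labeled points.
model = LogisticRegression()
model.fit(X, y)

# Decide the label of new, previously unseen points.
X_new = np.array([[-2.5, -1.5], [1.8, 2.2]])
predicted = colors[model.predict(X_new)]
print(predicted)
```

The first new point lies near the class 0 center and the second near the class 1 center, so they come out as "blue" and "red" respectively.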