Unsupervised learning example – marketing segments
Suppose we are given a large (one million rows) dataset where each row/observation is a single person with basic demographic information (age, gender, and so on) as well as the number of items purchased, which represents how many items this person has bought from a particular store:
This is a sample of our marketing dataset where each row represents a single customer with three basic attributes about each person. Our goal will be to segment this dataset into types or clusters of people so that the company performing the analysis can understand the customer profiles much better.
Now, of course, We’ve only shown 8 out of one million rows, which can be daunting. Of course, we can perform basic descriptive statistics on this dataset and get averages, standard deviations, and so on of our numerical columns; however, what if we wished to segment these one million people into different types so that the marketing department can have a much better sense of the types of people who shop and create more appropriate advertisements for each segment?
Each type of customer would exhibit particular qualities that make that segment unique. For example, they may find that 20% of their customers fall into a category they like to call young and wealthy that are generally younger and purchase several items.
This type of analysis and the creation of these types can fall under a specific type of unsupervised learning called clustering. We will discuss this machine learning algorithm in further detail later on in this book, but for now, clustering will create a new feature that separates out the people into distinct types or clusters:
This shows our customer dataset after a clustering algorithm has been applied. Note the new column at the end called cluster that represents the types of people that the algorithm has identified. The idea is that the people who belong to similar clusters behave similarly in regards to the data (have similar ages, genders, purchase behaviors). Perhaps cluster six might be renamed as young buyers.
This example of clustering shows us why sometimes we aren’t concerned with predicting anything, but instead wish to understand our data on a deeper level by adding new and interesting features, or even removing irrelevant features.
Note that we are referring to every column as a feature because there is no response in unsupervised learning since there is no prediction occurring.
It’s all starting to make sense now, isn’t it? These features that we talk about repeatedly are what this book is primarily concerned with. Feature engineering involves the understanding and transforming of features in relation to both unsupervised and supervised learning.