Last updated: 2021-06-25 22:46:20
Cover
Copyright information
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introduction to Feature Engineering
Motivating example – AI-powered communications
Why feature engineering matters
What is feature engineering?
Understanding the basics of data and machine learning
Supervised learning
Unsupervised learning
Unsupervised learning example – marketing segments
Evaluation of machine learning algorithms and feature engineering procedures
Example of feature engineering procedures – can anyone really predict the weather?
Steps to evaluate a feature engineering procedure
Evaluating supervised learning algorithms
Evaluating unsupervised learning algorithms
Feature understanding – what’s in my dataset?
Feature improvement – cleaning datasets
Feature selection – say no to bad attributes
Feature construction – can we build it?
Feature transformation – enter math-man
Feature learning – using AI to better our AI
Summary
Feature Understanding – What's in My Dataset?
The structure or lack thereof of data
An example of unstructured data – server logs
Quantitative versus qualitative data
Salary ranges by job classification
The four levels of data
The nominal level
Mathematical operations allowed
The ordinal level
The interval level
Plotting two columns at the interval level
The ratio level
Recap of the levels of data
Feature Improvement – Cleaning Datasets
Identifying missing values in data
The Pima Indian Diabetes Prediction dataset
The exploratory data analysis (EDA)
Dealing with missing values in a dataset
Removing harmful rows of data
Imputing the missing values in data
Imputing values in a machine learning pipeline
Pipelines in machine learning
Standardization and normalization
Z-score standardization
The min-max scaling method
The row normalization method
Putting it all together
Feature Construction
Examining our dataset
Imputing categorical features
Custom imputers
Custom category imputer
Custom quantitative imputer
Encoding categorical variables
Encoding at the nominal level
Encoding at the ordinal level
Bucketing continuous features into categories
Creating our pipeline
Extending numerical features
Activity recognition from the Single Chest-Mounted Accelerometer dataset
Polynomial features
Parameters
Exploratory data analysis
Text-specific feature construction
Bag of words representation
CountVectorizer
CountVectorizer parameters
The Tf-idf vectorizer
Using text in machine learning pipelines
Feature Selection
Achieving better performance in feature engineering
A case study – a credit card defaulting dataset
Creating a baseline machine learning pipeline
The types of feature selection
Statistical-based feature selection
Using Pearson correlation to select features