Feature Engineering Made Easy

Sinan Ozdemir, Divya Susarla

Last updated: 2021-06-25 22:46:20

Cover

Copyright information

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the authors

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Introduction to Feature Engineering

Motivating example – AI-powered communications

Why feature engineering matters

What is feature engineering?

Understanding the basics of data and machine learning

Supervised learning

Unsupervised learning

Unsupervised learning example – marketing segments

Evaluation of machine learning algorithms and feature engineering procedures

Example of feature engineering procedures – can anyone really predict the weather?

Steps to evaluate a feature engineering procedure

Evaluating supervised learning algorithms

Evaluating unsupervised learning algorithms

Feature understanding – what’s in my dataset?

Feature improvement – cleaning datasets

Feature selection – say no to bad attributes

Feature construction – can we build it?

Feature transformation – enter math-man

Feature learning – using AI to better our AI

Summary

Feature Understanding – What's in My Dataset?

The structure, or lack thereof, of data

An example of unstructured data – server logs

Quantitative versus qualitative data

Salary ranges by job classification

The four levels of data

The nominal level

Mathematical operations allowed

The ordinal level

Mathematical operations allowed

The interval level

Mathematical operations allowed

Plotting two columns at the interval level

The ratio level

Mathematical operations allowed

Recap of the levels of data

Summary

Feature Improvement – Cleaning Datasets

Identifying missing values in data

The Pima Indian Diabetes Prediction dataset

The exploratory data analysis (EDA)

Dealing with missing values in a dataset

Removing harmful rows of data

Imputing the missing values in data

Imputing values in a machine learning pipeline

Pipelines in machine learning

Standardization and normalization

Z-score standardization

The min-max scaling method

The row normalization method

Putting it all together

Summary

Feature Construction

Examining our dataset

Imputing categorical features

Custom imputers

Custom category imputer

Custom quantitative imputer

Encoding categorical variables

Encoding at the nominal level

Encoding at the ordinal level

Bucketing continuous features into categories

Creating our pipeline

Extending numerical features

Activity recognition from the Single Chest-Mounted Accelerometer dataset

Polynomial features

Parameters

Exploratory data analysis

Text-specific feature construction

Bag of words representation

CountVectorizer

CountVectorizer parameters

The Tf-idf vectorizer

Using text in machine learning pipelines

Summary

Feature Selection

Achieving better performance in feature engineering

A case study – a credit card defaulting dataset

Creating a baseline machine learning pipeline

The types of feature selection

Statistical-based feature selection

Using Pearson correlation to select features