
What is feature engineering?

I just read this post on the BigML blog, excerpted below: http://blog.bigml.com/2013/02/21/everything-you-wanted-to-know-about-machine-learning-but-were-too-afraid-to-ask-part-two/
Feature Engineering is the Key
We’ll start out this time with a topic that is so important that it deserves an instructive example. In fact, Domingos calls it “easily the most important factor” in determining the success of a machine learning project, and I agree with him.
Suppose you have a dataset in which you have pairs of cities, coupled with a prediction of whether most people would consider the two cities to be comfortably drivable within a single day. You’ve got a nice database with the longitudes and latitudes for all of your cities, and so these are your input fields. Your dataset might look something like this (note that the values don’t correspond to actual city locations – they are random):
CITY 1 LAT.   CITY 1 LNG.   CITY 2 LAT.   CITY 2 LNG.   DRIVABLE?
123.24        46.71         121.33        47.34         Yes
123.24        56.91         121.33        55.23         Yes
123.24        46.71         121.33        55.34         No
123.24        46.71         130.99        47.34         No
And you expect to construct a model that can predict for any two cities whether the distance is drivable or not.
Probably not going to happen.
The problem here is that no single input field, or even any single pair of fields, is closely correlated with the objective. It is a combination of all four fields (the distance from one pair of geo-coordinates to the other), and a combination by a fairly complex formula, that is correlated with the objective. Machine learning algorithms are limited in the way they can combine input fields; if they weren’t, they could totally exhaust themselves trying everything.
But all is not lost! Even if the machine doesn’t have any knowledge about how longitudes and latitudes work, you do. So why don’t you do it? Apply the formula to each instance and get a dataset like this (again, random values):
DISTANCE (MI.)   DRIVABLE?
14               Yes
28               Yes
705              No
2432             No
Ah. Much more manageable. This is what we mean by feature engineering. It’s when you use your knowledge about the data to create fields that make machine learning algorithms work better.
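The post does not say which formula it has in mind for turning coordinates into a distance. One common choice is the haversine (great-circle) formula, sketched below in Python. The function names, the Earth-radius constant, and the row layout are assumptions made for illustration, and the result is a straight-line distance over the Earth's surface, not an actual driving distance.

```python
import math

# Assumption: mean Earth radius in miles (approximate).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def add_distance_feature(rows):
    """Replace four coordinate fields with one engineered distance field.

    Each row is a hypothetical tuple: (lat1, lng1, lat2, lng2, drivable).
    """
    return [(haversine_miles(lat1, lng1, lat2, lng2), drivable)
            for lat1, lng1, lat2, lng2, drivable in rows]


# One degree of longitude along the equator is roughly 69 miles.
print(round(haversine_miles(0.0, 0.0, 0.0, 1.0), 1))
```

The model then sees a single numeric field that relates directly to the objective, instead of four fields that only matter in combination.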
Domingos mentions that feature engineering is where “most of the effort in a machine learning project goes”. I couldn’t agree more. In my career, I would say an average of 70% of the project’s time goes into feature engineering, 20% goes towards figuring out what comprises a proper and comprehensive evaluation of the algorithm, and only 10% goes into algorithm selection and tuning.
How does one engineer a good feature? One good rule of thumb is to try to design features where the likelihood of a certain class goes up monotonically with the value of the field. So in our example above, “drivable = no” is more likely as distance increases, but that’s not true of longitude or latitude. You probably won’t be able to engineer a feature where this is strictly true, but it is a good feature even if it is somewhat close to that ideal.
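A rough way to check this rule of thumb is to sort the rows by the candidate feature, split them into a few groups, and see whether the share of the target class rises from group to group. The sketch below does that for the distance table above; the helper names and the choice of two bins are assumptions for illustration, not a method from the post.

```python
import math


def class_rate_by_bin(values, labels, target, n_bins=2):
    """Sort rows by feature value, split them into about n_bins equal groups,
    and return the fraction of rows in each group labelled `target`."""
    pairs = sorted(zip(values, labels))
    if not pairs:
        return []
    size = math.ceil(len(pairs) / n_bins)
    rates = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        hits = sum(1 for _, label in chunk if label == target)
        rates.append(hits / len(chunk))
    return rates


def is_non_decreasing(rates):
    """True if each group's rate is at least as high as the previous one."""
    return all(a <= b for a, b in zip(rates, rates[1:]))


# Distance table from the example above.
distances = [14, 28, 705, 2432]
drivable = ["Yes", "Yes", "No", "No"]

rates = class_rate_by_bin(distances, drivable, target="No")
print(rates, is_non_decreasing(rates))  # [0.0, 1.0] True
```

On real data the rates will rarely be perfectly ordered, so treat this as a quick sanity check and not as a pass/fail test.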
Typically, there isn’t a single data transformation that makes learning immediately easy (as there was in the above example), but at least as typically there are one or more things you can do to the data to make machine learning easier. There’s no formula for this, and a lot of it happens by itch and by twitch. BigML attempts to do some of the easy ones for you (automated date parsing is an example) but far more interesting transformations can happen with detailed knowledge of your specific data. Great things happen in machine learning when human and machine work together, combining a person’s knowledge of how to create relevant features from the data with the machine’s talent for optimization.
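As an illustration of one of the "easy" transformations mentioned above, a raw timestamp string can be expanded into several fields that a learner can use directly. This is only a sketch using Python's standard library; it is not BigML's implementation, and the function name, the default format string, and the choice of output fields are assumptions.

```python
from datetime import datetime


def expand_date(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a timestamp string and return simple derived fields."""
    moment = datetime.strptime(text, fmt)
    return {
        "year": moment.year,
        "month": moment.month,
        "day": moment.day,
        "hour": moment.hour,
        "weekday": moment.weekday(),        # Monday is 0, Sunday is 6
        "is_weekend": moment.weekday() >= 5,
    }


print(expand_date("2013-02-21 09:30:00"))
```

Fields such as the hour or the weekend flag often relate to the objective more directly than the raw string does, which is the same idea as the distance example.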
