
What is feature engineering?

Just read a post from http://blog.bigml.com/2013/02/21/everything-you-wanted-to-know-about-machine-learning-but-were-too-afraid-to-ask-part-two/
Feature Engineering is the Key
We’ll start out this time with a topic that is so important that it deserves an instructive example. In fact, Domingos calls it “easily the most important factor” in determining the success of a machine learning project, and I agree with him.
Suppose you have a dataset in which you have pairs of cities, coupled with a prediction of whether most people would consider the two cities to be comfortably drivable within a single day. You’ve got a nice database with the longitudes and latitudes for all of your cities, and so these are your input fields. Your dataset might look something like this (note that the values don’t correspond to actual city locations – they are random):
CITY 1 LAT.   CITY 1 LNG.   CITY 2 LAT.   CITY 2 LNG.   DRIVABLE?
123.24        46.71         121.33        47.34         Yes
123.24        56.91         121.33        55.23         Yes
123.24        46.71         121.33        55.34         No
123.24        46.71         130.99        47.34         No
And you expect to construct a model that can predict for any two cities whether the distance is drivable or not.
Probably not going to happen.
The problem here is that no single input field, or even any single pair of fields, is closely correlated with the objective. It is a combination of all four fields (the distance from one pair of geo-coordinates to the other), computed by a fairly complex formula, that is correlated with the objective. Machine learning algorithms are limited in the ways they can combine input fields; if they weren’t, they could exhaust themselves trying every possible combination.
But all is not lost! Even if the machine doesn’t have any knowledge about how longitudes and latitudes work, you do. So why don’t you do it? Apply the formula to each instance and get a dataset like this (again, random values):
DISTANCE (MI.)   DRIVABLE?
14               Yes
28               Yes
705              No
2432             No
Ah. Much more manageable. This is what we mean by feature engineering. It’s when you use your knowledge about the data to create fields that make machine learning algorithms work better.
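To make "apply the formula" concrete, here is a minimal sketch in Python. The post does not say which formula it has in mind; the haversine (great-circle) formula is my assumption, as are the field names (lat1, lng1, lat2, lng2, drivable) and the two example rows, which use approximate real coordinates rather than the random values in the table above. Keep in mind that straight-line distance is only a rough stand-in for driving distance.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def add_distance_feature(rows):
    """Replace the four coordinate fields with one engineered distance field."""
    return [
        {
            "distance_mi": haversine_miles(
                r["lat1"], r["lng1"], r["lat2"], r["lng2"]
            ),
            "drivable": r["drivable"],
        }
        for r in rows
    ]


# Hypothetical rows (not from the post): roughly Seattle to Portland,
# and roughly Seattle to Miami.
rows = [
    {"lat1": 47.61, "lng1": -122.33, "lat2": 45.52, "lng2": -122.68,
     "drivable": "Yes"},
    {"lat1": 47.61, "lng1": -122.33, "lat2": 25.76, "lng2": -80.19,
     "drivable": "No"},
]
engineered = add_distance_feature(rows)
```

The learner now sees one field that lines up with the objective instead of four that individually do not.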
Domingos mentions that feature engineering is where “most of the effort in a machine learning project goes”. I couldn’t agree more. In my career, I would say an average of 70% of the project’s time goes into feature engineering, 20% goes towards figuring out what comprises a proper and comprehensive evaluation of the algorithm, and only 10% goes into algorithm selection and tuning.
How does one engineer a good feature? One good rule of thumb is to try to design features where the likelihood of a certain class goes up monotonically with the value of the field. So in our example above, “drivable = no” is more likely as distance increases, but that’s not true of longitude or latitude. You probably won’t be able to engineer a feature where this is strictly true, but it is a good feature even if it is somewhat close to that ideal.
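One rough way to check this rule of thumb is to sort the rows by a candidate feature, cut them into bins, and see whether the rate of the target class climbs from bin to bin. The sketch below is only an illustration: the helper names are mine, the binning scheme is an arbitrary choice, and the four rows are the random values from the tables above, which is far too little data to conclude anything real.

```python
def class_rate_by_bin(values, labels, n_bins):
    """Sort by feature value, split into near-equal bins, and return the
    fraction of positive labels (1s) in each bin, lowest values first."""
    if n_bins < 1 or n_bins > len(values):
        raise ValueError("n_bins must be between 1 and the number of rows")
    pairs = sorted(zip(values, labels))
    size = len(pairs) // n_bins
    rates = []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any leftover rows.
        end = (i + 1) * size if i < n_bins - 1 else len(pairs)
        chunk = pairs[start:end]
        rates.append(sum(label for _, label in chunk) / len(chunk))
    return rates


def rises_monotonically(rates):
    """True if the rates never drop and end higher than they start."""
    never_drops = all(a <= b for a, b in zip(rates, rates[1:]))
    return never_drops and rates[-1] > rates[0]


# 1 means "drivable = no", matching the four rows in the tables above.
not_drivable = [0, 0, 1, 1]
distance_mi = [14, 28, 705, 2432]
city2_lng = [47.34, 55.23, 55.34, 47.34]

distance_rates = class_rate_by_bin(distance_mi, not_drivable, 2)
lng_rates = class_rate_by_bin(city2_lng, not_drivable, 2)
```

On these rows the "not drivable" rate for distance goes from 0.0 in the low bin to 1.0 in the high bin, while the rate for City 2 longitude is flat at 0.5 in both bins.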
Typically, there isn’t a single data transformation that makes learning immediately easy (as there was in the above example), but at least as typically there are one or more things you can do to the data to make machine learning easier. There’s no formula for this, and a lot of it happens by itch and by twitch. BigML attempts to do some of the easy ones for you (automated date parsing is an example) but far more interesting transformations can happen with detailed knowledge of your specific data. Great things happen in machine learning when human and machine work together, combining a person’s knowledge of how to create relevant features from the data with the machine’s talent for optimization.
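As an example of one of the "easy" transformations, date parsing turns an opaque timestamp into fields a model can actually split on. This is a generic sketch using Python's standard library, not BigML's implementation; the function name and the choice of output fields are my own assumptions.

```python
from datetime import datetime


def date_features(timestamp):
    """Expand an ISO 8601 timestamp string into simple numeric fields."""
    dt = datetime.fromisoformat(timestamp)
    return {
        "year": dt.year,
        "month": dt.month,
        "day_of_week": dt.weekday(),  # Monday is 0, Sunday is 6
        "hour": dt.hour,
        "is_weekend": dt.weekday() >= 5,
    }


features = date_features("2013-02-21T09:30:00")
```

Which of these fields is useful depends on the problem: day of week might matter for retail traffic, hour for server load, and none of them for something with no calendar rhythm at all. That is where knowledge of your specific data comes in.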
