In this guide we’ll discuss the statistical concept of normalization. On the surface it can seem like a scary topic, but in reality it’s a pretty straightforward concept once you’re familiar with it. The definition of normalization from Wikipedia isn't the most straightforward, but here it is:
"In the simplest cases, normalization of ratings means adjusting values measured on different scales to a notionally common scale, often prior to averaging. In more complicated cases, normalization may refer to more sophisticated adjustments where the intention is to bring the entire probability distributions of adjusted values into alignment. In the case of normalization of scores in educational assessment, there may be an intention to align distributions to a normal distribution. A different approach to normalization of probability distributions is quantile normalization, where the quantiles of the different measures are brought into alignment."
On its own that doesn’t make a whole lot of sense, and Wikipedia claims that's the simplest case. So instead, let's take a more straightforward, relatable case study. Imagine that you need to compare temperatures from cities around the world. If you measure New York City, the data will most likely be in Fahrenheit, while in Poland temperatures will be listed in Celsius. At its most basic, the normalization process takes the data from each city and converts the temperatures to a "neutral" or "standard" scale. I’d like to think that's a little more straightforward than the definition given by Wikipedia. I’d also like to mention that moving forward you may also hear the term standardization.
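The temperature example can be sketched in a few lines of Python. The city readings below are made up purely for illustration:

```python
# A minimal sketch of the temperature example: converting readings taken
# on different scales to a common ("neutral") scale before comparing them.
# The readings below are hypothetical, chosen only for illustration.

def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32) * 5 / 9

nyc_f = [68.0, 77.0, 59.0]     # New York City readings, in Fahrenheit
warsaw_c = [20.0, 25.0, 15.0]  # Warsaw readings, already in Celsius

nyc_c = [fahrenheit_to_celsius(f) for f in nyc_f]

# Both lists are now on the same scale and can be compared directly.
print(nyc_c)  # [20.0, 25.0, 15.0]
```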
In the machine learning world, normalization is a key component of data preprocessing, where the process is called feature scaling. In short, feature scaling is a data preprocessing technique used to normalize the range of independent variables or features of data. Some of the more common methods of feature scaling include:
Standardization (Z-score Normalization):
This replaces each value by how many standard deviations that element is from the mean. The features are redistributed so that their mean equals zero and their standard deviation equals one.
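The standardization described above can be sketched as follows; the sample values are made up for illustration:

```python
import numpy as np

# A minimal sketch of standardization (z-scores).
def standardize(x):
    """Replace each element by how many standard deviations it is from the mean."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

scores = [70, 80, 90]  # hypothetical exam scores
z = standardize(scores)

# After standardization the mean is (approximately) 0 and the
# standard deviation is (approximately) 1.
print(z.mean(), z.std())
```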
Mean Normalization:
This distribution will have all the elements between negative one and one, with a mean of zero.
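A common formulation of mean normalization subtracts the mean and divides by the range, which gives exactly the properties described above. A minimal sketch, with made-up values:

```python
import numpy as np

# A minimal sketch of mean normalization: x' = (x - mean(x)) / (max(x) - min(x)).
def mean_normalize(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.max() - x.min())

vals = mean_normalize([10, 20, 30, 40])  # hypothetical feature values

# Every element now lies in [-1, 1] and the mean is 0.
print(vals)  # [-0.5 -0.16666667  0.16666667  0.5]
```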
Scaling to Unit Length (Unit Vector):
Another option widely used in machine learning is to scale the components of a vector so that its magnitude (Euclidean norm) is one. For those of you who have taken physics or three-dimensional calculus, you have probably seen this method used quite a bit.
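Scaling to unit length just means dividing a vector by its Euclidean norm. A minimal sketch, using a classic 3-4-5 example:

```python
import numpy as np

# A minimal sketch of scaling to unit length: divide the vector by its norm.
def to_unit_vector(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

u = to_unit_vector([3.0, 4.0])

print(u)                  # [0.6 0.8]
print(np.linalg.norm(u))  # 1.0
```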
Min-Max Normalization (Rescaling):
Lastly, this is probably the simplest method; it rescales the features into the range zero to one.
The same rescaling method can be used with boundaries other than zero and one. Instead, rescaling can be done to an arbitrary range, set by values a and b.
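Both variants of min-max rescaling can be sketched in one small function, where `a` and `b` default to the standard zero-to-one range; the input values are made up for illustration:

```python
import numpy as np

# A minimal sketch of min-max rescaling, to [0, 1] by default
# or to an arbitrary range [a, b].
def min_max_scale(x, a=0.0, b=1.0):
    x = np.asarray(x, dtype=float)
    scaled = (x - x.min()) / (x.max() - x.min())  # now in [0, 1]
    return a + scaled * (b - a)                   # stretched into [a, b]

r01 = min_max_scale([10, 20, 30])
rab = min_max_scale([10, 20, 30], a=-1, b=1)

print(r01)  # [0.  0.5 1. ]
print(rab)  # [-1.  0.  1.]
```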
The overall purpose of normalizing data is to improve the performance of various machine learning algorithms. This is possible because feature scaling limits the range of values the algorithm needs to work over. Before we wrap this section up, I’d like to show you the difference between an original data frame, a min-max normalized data frame, and a standardized data frame. We won’t be going over the actual code, but I have attached it at the end of this guide for you to look at.
Our first example is our original data frame, which shows the exam scores and final grade for five students. I’d also like to mention that the full data frame consists of 300 students and 1,500 elements, so this is just a small sample of the data.
Now let’s take a look at the data after it’s been normalized.
So now all our data lies within the range of zero to one. Now let’s see what a standardized data frame looks like.
So here it is. If you remember, when you use standardization the mean of the data becomes zero and each element is set to how far it deviates from the mean. So there you have it: a few examples of how machine learning developers use feature scaling to help with the speed and efficiency of processing data.
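The comparison above can be sketched with pandas. This is not the attached code; the column names and scores below are made up for illustration:

```python
import pandas as pd

# A hypothetical sample of exam scores, standing in for the
# five-student sample shown above.
df = pd.DataFrame({
    "exam_1": [72, 85, 60, 90, 78],
    "exam_2": [68, 92, 55, 88, 75],
    "final_grade": [70, 89, 58, 91, 77],
})

# Min-max normalization: every column rescaled into the range [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())

# Standardization: every column ends up with mean 0 and
# (sample) standard deviation 1.
standardized = (df - df.mean()) / df.std()

print(min_max)
print(standardized)
```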
Feature Scaling: Wikipedia