Why data normalization in SVM

Data normalization is generally performed during the data pre-processing step. 

1. Why we need normalization

There are two major reasons why data normalization is essential for machine learning algorithms.

  • Data normalization can improve performance on common machine learning problems.

Many classifiers calculate the Euclidean distance between two points. If one of the features has a broad range of values, the distance will be governed by that particular feature. The ranges of all features should therefore be normalized so that each feature contributes approximately proportionately to the final distance (see the sketch at the end of this section).

  • Data normalization can speed up the convergence of the gradient descent algorithm.

Let's illustrate this with a screenshot from Andrew Ng's machine learning course [2]: with unscaled features, the contours of the cost function are elongated ellipses, so gradient descent zig-zags and converges slowly; after scaling, the contours are nearly circular and gradient descent takes a much more direct path to the minimum.

[Figure: cost-function contours before and after feature scaling, from Andrew Ng's course]
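To make the distance argument concrete, here is a minimal numpy sketch with made-up income and age values (the names and numbers are ours, purely illustrative):

    import numpy as np

    # Two samples: feature 1 is income in dollars, feature 2 is age in years
    # (made-up numbers, chosen only to show the scale mismatch).
    a = np.array([50000.0, 25.0])
    b = np.array([52000.0, 60.0])

    # Unscaled: the income feature dominates the Euclidean distance.
    print(np.linalg.norm(a - b))      # ~2000.3; the 35-year age gap barely registers

    # Rescale each feature to [0, 1] with (made-up) known min/max ranges.
    lo = np.array([20000.0, 18.0])
    hi = np.array([120000.0, 80.0])
    a_s, b_s = (a - lo) / (hi - lo), (b - lo) / (hi - lo)

    # Scaled: both features contribute in proportion to how different they are.
    print(np.linalg.norm(a_s - b_s))  # ~0.565, now driven mostly by the age gap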

2. How to normalize data

Three common methods are used to perform feature normalization in machine learning algorithms.

  • Rescaling

The simplest method is rescaling the range of features with a linear function. The common formulas are:

$$ x' = \frac{x - \min(x)}{\max(x) - \min(x)} \quad (1) $$

$$ x' = 2\,\frac{x - \min(x)}{\max(x) - \min(x)} - 1 \quad (2) $$

where $x$ is the original value and $x'$ is the normalized value.

Equation (1) rescales data into [0,1], and equation (2) rescales data into [-1,1].

Note: the parameters $\min(x)$ and $\max(x)$ should be computed on the training data only, and then applied unchanged to the training, validation, and testing data.
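As a minimal sketch of that note (the function names and data are ours), the min/max statistics are computed on the training split and then reused on the other splits; scikit-learn's MinMaxScaler follows the same fit/transform pattern:

    import numpy as np

    def fit_minmax(X_train):
        # Per-feature min and max, computed on the training data ONLY.
        return X_train.min(axis=0), X_train.max(axis=0)

    def transform_minmax(X, lo, hi):
        # Equation (1): rescale into [0, 1] with the training statistics.
        return (X - lo) / (hi - lo)

    X_train = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
    X_test  = np.array([[2.0, 500.0]])             # hypothetical held-out sample

    lo, hi    = fit_minmax(X_train)
    X_train_s = transform_minmax(X_train, lo, hi)
    X_test_s  = transform_minmax(X_test, lo, hi)   # reuse the training min/max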

There are also methods that normalize features with a non-linear function, such as:

logarithmic function: $x' = \log_{10}(x)$ (defined for strictly positive features)

inverse tangent function: $x' = \frac{2}{\pi}\arctan(x)$, which maps into (-1, 1)

sigmoid function: $x' = \frac{1}{1 + e^{-x}}$, which maps into (0, 1)
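A quick numpy sketch of the three transforms on made-up values:

    import numpy as np

    x = np.array([1.0, 10.0, 100.0, 1000.0])   # strictly positive values

    log_scaled  = np.log10(x)                  # compresses the range: 0, 1, 2, 3
    atan_scaled = 2 / np.pi * np.arctan(x)     # maps any real value into (-1, 1)
    sigm_scaled = 1 / (1 + np.exp(-x))         # maps any real value into (0, 1)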

  • Standardization

Feature standardization makes the values of each feature in the data have zero mean and unit variance. This method is widely used for normalization in many machine learning algorithms (e.g., support vector machines, logistic regression, and neural networks). The general formula is:

$$ x' = \frac{x - \bar{x}}{\sigma} $$

where $\bar{x}$ is the mean and $\sigma$ is the standard deviation of the feature $x$. As with rescaling, both statistics are computed on the training data only.
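scikit-learn's StandardScaler implements this formula; a short sketch (the data values are made up), again fitting the statistics on the training split only:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X_train = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
    X_test  = np.array([[2.0, 500.0]])

    scaler    = StandardScaler().fit(X_train)  # learns each column's mean and std
    X_train_s = scaler.transform(X_train)      # zero mean, unit variance per column
    X_test_s  = scaler.transform(X_test)       # same training statistics reused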

  • Scaling to unit length

Another option that is widely used in machine learning is to scale the components of a feature vector such that the complete vector has length one:

$$ x' = \frac{x}{\lVert x \rVert} $$

This is especially important if the dot product (or the Euclidean distance) is used as a similarity or distance measure in the following learning steps.
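A minimal numpy sketch, dividing each feature vector (here, each row) by its Euclidean norm:

    import numpy as np

    X = np.array([[3.0, 4.0],
                  [1.0, 0.0]])                         # two feature vectors

    norms  = np.linalg.norm(X, axis=1, keepdims=True)  # Euclidean length of each row
    X_unit = X / norms                                 # every row now has length one

    print(np.linalg.norm(X_unit, axis=1))              # -> [1. 1.]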

3. Some cases where you don't need data normalization

3.1 Using a similarity function instead of a distance function

You can propose a similarity function rather than a distance function and plug it into the kernel (technically, this function must generate positive semi-definite Gram matrices).
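As an illustrative sketch (the RBF kernel is one standard positive semi-definite similarity; gamma and the data here are made up), the resulting Gram matrix can then be fed to an SVM that accepts a precomputed kernel, e.g. scikit-learn's SVC(kernel='precomputed'):

    import numpy as np

    def rbf_similarity(A, B, gamma=1.0):
        # k(a, b) = exp(-gamma * ||a - b||^2), a positive semi-definite kernel.
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq_dists)

    X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
    K = rbf_similarity(X, X)   # 3x3 Gram matrix of pairwise similarities
    # K could now be passed to sklearn's SVC(kernel='precomputed'), for example.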

3.2 Random forest

Random forests never compare one feature with another in magnitude: each split considers a single feature at a time, so the ranges of the features don't matter.


References

[1] http://en.wikipedia.org/wiki/Feature_scaling

[2] http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=03.1-LinearRegressionII-FeatureScaling

[3] http://stats.stackexchange.com/questions/57010/is-it-essential-to-do-normalization-for-svm-and-random-forest
