
AdaBoost: Wikis



Source: http://www.thefullwiki.org/AdaBoost

From Wikipedia, the free encyclopedia

AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire. It can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that subsequent classifiers are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems, however, it can be less susceptible to overfitting than many learning algorithms.

AdaBoost calls a weak classifier repeatedly in a series of rounds t = 1,\ldots,T. For each call, a distribution of weights D_{t} is updated to indicate the importance of each example in the data set for the classification. On each round, the weights of incorrectly classified examples are increased (or, alternatively, the weights of correctly classified examples are decreased), so that the new classifier focuses more on those examples.


The algorithm for the binary classification task

Given: (x_{1},y_{1}),\ldots,(x_{m},y_{m}) where x_{i} \in X,\, y_{i} \in Y = \{-1, +1\}.

Initialize D_{1}(i) = \frac{1}{m}, \quad i=1,\ldots,m.

For t = 1,\ldots,T:

  • Find the classifier h_{t} : X \to \{-1,+1\} that minimizes the error with respect to the distribution D_{t}:
    h_{t} = \arg \min_{h_{j} \in \mathcal{H}} \epsilon_{j}, \quad \textrm{where } \epsilon_{j} = \sum_{i=1}^{m} D_{t}(i)\,[y_{i} \ne h_{j}(x_{i})]
  • If \epsilon_{t} \geq \frac{1}{2}, then stop.
  • Choose \alpha_{t} \in \mathbf{R}, typically \alpha_{t} = \frac{1}{2}\ln\frac{1-\epsilon_{t}}{\epsilon_{t}}, where \epsilon_{t} is the weighted error rate of classifier h_{t}.
  • Update:
    D_{t+1}(i) = \frac{ D_{t}(i) \exp(-\alpha_{t}\, y_{i}\, h_{t}(x_{i})) }{ Z_{t} }
    where Z_{t} is a normalization factor, chosen so that D_{t+1} is a probability distribution (i.e., it sums to one over all examples).

Output the final classifier:

H(x) = \textrm{sign}\left( \sum_{t=1}^{T} \alpha_{t}h_{t}(x)\right)
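
A minimal sketch of this training loop in Python, assuming one-dimensional decision stumps as the weak learners; the function names and the NumPy-based stump search are illustrative choices, not part of the original algorithm description:

# A minimal AdaBoost sketch for labels y in {-1, +1}, using one-dimensional
# decision stumps as the weak learners.  Function names (best_stump,
# adaboost_train, ...) are illustrative, not from any particular library.
import numpy as np

def best_stump(X, y, D):
    """Pick the stump (feature, threshold, polarity) with the smallest
    weighted error under the distribution D (exhaustive search)."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (+1, -1):
                pred = np.where(X[:, j] <= thr, polarity, -polarity)
                err = D[pred != y].sum()          # epsilon_j under D_t
                if err < best_err:
                    best, best_err = (j, thr, polarity), err
    return best, best_err

def stump_predict(stump, X):
    j, thr, polarity = stump
    return np.where(X[:, j] <= thr, polarity, -polarity)

def adaboost_train(X, y, T=20):
    m = len(y)
    D = np.full(m, 1.0 / m)                        # D_1(i) = 1/m
    ensemble = []                                  # list of (alpha_t, stump_t)
    for t in range(T):
        stump, eps = best_stump(X, y, D)
        if eps >= 0.5:                             # no better than chance: stop
            break
        eps = max(eps, 1e-12)                      # avoid division by zero
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        pred = stump_predict(stump, X)
        D = D * np.exp(-alpha * y * pred)          # up-weight mistakes
        D = D / D.sum()                            # normalize (the Z_t factor)
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
    return np.sign(score)

# Toy usage: four points on a line, separable at x = 2.5.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_train(X, y, T=5)
print(adaboost_predict(model, X))                  # expected: [-1. -1.  1.  1.]

On this toy data the first stump already separates the classes, so the loop mainly illustrates the weight bookkeeping; with harder data, later rounds concentrate on the examples earlier stumps get wrong.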

The equation to update the distribution D_{t} is constructed so that:

e^{- \alpha_{t} y_{i} h_{t}(x_{i})} \begin{cases} <1, & y_{i} = h_{t}(x_{i}) \\ >1, & y_{i} \ne h_{t}(x_{i}) \end{cases}

Thus, after selecting an optimal classifier h_{t} for the distribution D_{t}, the examples x_{i} that the classifier h_{t} identified correctly are weighted less and those that it identified incorrectly are weighted more. Therefore, when the algorithm is testing the classifiers on the distribution D_{t+1}, it will select a classifier that better identifies those examples that the previous classifier missed.
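
As a concrete check of this effect (a hand-computed example, not part of the original article): if \epsilon_{t} = 0.2, then \alpha_{t} = \frac{1}{2}\ln(0.8/0.2) \approx 0.693, so before normalization each correctly classified weight is multiplied by e^{-0.693} \approx 0.5 and each misclassified weight by e^{0.693} \approx 2.0. After dividing by Z_{t} = 0.8 \cdot 0.5 + 0.2 \cdot 2.0 = 0.8, the misclassified examples carry half of the total weight on round t+1, up from 20% on round t.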

Statistical Understanding of Boosting

Boosting can be seen as minimization of a convex loss function over a convex set of functions.[1] Specifically, the loss being minimized is the exponential loss

\sum_{i} e^{-y_{i} f(x_{i})}

and we are seeking a function of the form

f(x) = \sum_{t} \alpha_{t} h_{t}(x).
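
This view also explains the usual choice of \alpha_{t} above (a standard greedy argument, sketched here for completeness rather than quoted from the article). With uniform initial weights, the exponential loss of the final ensemble factors as \sum_{i} e^{-y_{i} f(x_{i})} = m \prod_{t=1}^{T} Z_{t}, so each round can greedily minimize its own normalization factor

Z_{t}(\alpha) = \sum_{i=1}^{m} D_{t}(i)\, e^{-\alpha y_{i} h_{t}(x_{i})} = (1-\epsilon_{t})\, e^{-\alpha} + \epsilon_{t}\, e^{\alpha}.

Setting \frac{dZ_{t}}{d\alpha} = -(1-\epsilon_{t})\, e^{-\alpha} + \epsilon_{t}\, e^{\alpha} = 0 gives e^{2\alpha} = \frac{1-\epsilon_{t}}{\epsilon_{t}}, i.e. \alpha_{t} = \frac{1}{2}\ln\frac{1-\epsilon_{t}}{\epsilon_{t}}.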


References

  1. T. Zhang, "Convex Risk Minimization", Annals of Statistics, 2004.

External links

  • icsiboost, an open source implementation of BoosTexter
  • NPatternRecognizer, a fast machine learning algorithm library written in C#. It contains support vector machines, neural networks, Bayes classifiers, boosting, k-nearest neighbor, decision trees, etc.


