Loss Functions
Parametric Approach
e.g. a linear classifier \(s = f(x, W)\).

- Hard cases: datasets that no linear decision boundary can separate.
Loss function
Given a dataset of examples \(\{(x_i, y_i)\}_{i = 1}^n\), the loss over the dataset is the average of the per-example losses:

\[
L = \frac{1}{n}\sum_{i = 1}^{n} L_i\big(f(x_i, W), y_i\big)
\]
Multiclass SVM Loss (Hinge loss)
\(y_i\) are integer class labels, and \(s_j\) is the score of class \(j\) for example \(x_i\) (s -> score). Then

\[
L_i = \sum_{j \neq y_i}\max \{0, s_j - s_{y_i} + 1\}
\]

The \(1\) here is the margin; its exact value is somewhat arbitrary, since rescaling \(W\) rescales all score differences.

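The hinge loss above can be sketched in numpy (the function name and the example scores are illustrative, not from the original notes):

```python
import numpy as np

def multiclass_svm_loss(scores, y, margin=1.0):
    """Multiclass SVM (hinge) loss for a single example.
    scores: 1-D array of class scores s = f(x_i, W); y: index of the correct class."""
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0  # the correct class contributes no loss to the sum
    return margins.sum()

# illustrative scores for three classes, correct class y = 0
print(multiclass_svm_loss(np.array([3.2, 5.1, -1.7]), y=0))  # ~2.9: only class 1 violates the margin
```

Note that once the correct class beats every other score by more than the margin, the loss is exactly zero and stops changing.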
Regularization
The model should stay simple so that it generalizes to test data (i.e., to prevent overfitting). Adding a penalty term gives the full objective:

\[
L = \frac{1}{n}\sum_{i = 1}^{n} L_i + \lambda R(W)
\]

\(\lambda\) is the regularization strength; \(R(W)\) pushes the optimizer toward simpler models (intuitively, it reduces the degree of the fitted curve).
- L2 regularization: \(R(W) = \Vert W \Vert_F^2 = \sum_k \sum_l W_{k,l}^2\)
- L1 regularization: \(R(W) = \Vert W \Vert_1 = \sum_k \sum_l |W_{k,l}|\) (may work better; it encourages sparse weights)
- Elastic net: \(R(W) = \beta \Vert W \Vert_F^2 + \Vert W \Vert_1\)
- Max norm regularization
- Dropout
- Fancier methods (batch normalization, stochastic depth, etc.)
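The L2 and L1 penalties above are one-liners in numpy; a quick sketch (the helper names and the example weight matrix are made up for illustration):

```python
import numpy as np

def l2_reg(W):
    """Squared Frobenius norm: the sum of squared weights."""
    return np.sum(W * W)

def l1_reg(W):
    """Sum of absolute weights; tends to drive entries to exactly zero."""
    return np.sum(np.abs(W))

W = np.array([[1.0, -2.0],
              [0.0,  3.0]])
print(l2_reg(W))  # 1 + 4 + 0 + 9 = 14.0
print(l1_reg(W))  # 1 + 2 + 0 + 3 = 6.0

# full objective: data loss plus the weighted penalty
data_loss, lam = 1.0, 0.1
total = data_loss + lam * l2_reg(W)
```

L2 spreads weight across many small entries, while L1's absolute-value penalty makes zero entries "cheap", which is why it is the choice when sparsity matters.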
Bias Variance Tradeoff
Let \(L(W) = E(\hat{y} - y)^2\). Then we have

\[
E(\hat{y} - y)^2 = \underbrace{\left(E[\hat{y}] - y\right)^2}_{\text{Bias}^2} + \underbrace{E\left[(\hat{y} - E[\hat{y}])^2\right]}_{\text{Variance}}
\]
Penalizing the variance (e.g., through regularization) can reduce overfitting, at the cost of some bias.
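The bias-variance decomposition of the squared error follows from one line of algebra, adding and subtracting \(E[\hat{y}]\):

\[
\begin{aligned}
E(\hat{y} - y)^2 &= E\left[\big((\hat{y} - E[\hat{y}]) + (E[\hat{y}] - y)\big)^2\right] \\
&= E\left[(\hat{y} - E[\hat{y}])^2\right] + \left(E[\hat{y}] - y\right)^2 + 2\left(E[\hat{y}] - y\right) E\left[\hat{y} - E[\hat{y}]\right] \\
&= \text{Variance} + \text{Bias}^2,
\end{aligned}
\]

where the cross term vanishes because \(E\left[\hat{y} - E[\hat{y}]\right] = 0\).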
Softmax Loss (Cross-entropy loss)
The softmax function normalizes the score vector \(s = f(x_i, W)\) into a probability distribution:

\[
P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}
\]

We want to minimize the negative log likelihood of the correct class, so the loss function would be

\[
L_i = -\log P(Y = y_i \mid X = x_i) = -\log \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}
\]
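A numerically stable sketch of this loss (the max-subtraction trick is standard; the function name and example scores are illustrative):

```python
import numpy as np

def softmax_loss(scores, y):
    """Cross-entropy loss -log P(Y = y) for one example.
    Subtracting the max score leaves the result unchanged (it cancels
    in the ratio) but keeps the exponentials from overflowing."""
    shifted = scores - np.max(scores)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[y]

scores = np.array([3.2, 5.1, -1.7])
print(softmax_loss(scores, y=0))  # ~2.04, since P(class 0) ~ 0.13
```

Working in log space (log-sum-exp) rather than exponentiating and then taking the log is what every deep learning framework does internally for this loss.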
Actually it is the Kullback-Leibler divergence (\(P, Q\) are two discrete probability distributions; here \(P\) is the one-hot target and \(Q\) the softmax output):

\[
D_{KL}(P \,\Vert\, Q) = \sum_{y} P(y) \log \frac{P(y)}{Q(y)}
\]

Since \(P\) is one-hot, its entropy \(H(P) = 0\), so the cross-entropy \(H(P) + D_{KL}(P \,\Vert\, Q)\) equals the KL divergence.
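A quick numpy check that, for a one-hot \(P\), the KL divergence reduces to \(-\log Q[y]\) (the distributions below are made up for illustration):

```python
import numpy as np

def kl_divergence(P, Q):
    """D_KL(P || Q) for discrete distributions, skipping zero-probability
    entries of P (their contribution is 0 by convention)."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

P = np.array([1.0, 0.0, 0.0])     # one-hot target: the correct class is 0
Q = np.array([0.13, 0.86, 0.01])  # predicted softmax probabilities
print(kl_divergence(P, Q))        # ~2.04, i.e. -log Q[0]
```

So minimizing the cross-entropy loss drives the predicted distribution \(Q\) toward the one-hot target \(P\).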