Lec 02 Image Classification

About the dataset

The dataset is CIFAR-10, which contains 60,000 small color images (32×32 pixels) in 10 categories: 50,000 for training and 10,000 for testing.

Two functions: train and predict

train memorizes the training data (images and their labels); for a nearest-neighbor classifier, no further fitting is done.

predict takes a new image and the number k, finds the nearest neighbors, and outputs the predicted label.
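
The two functions can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `KNearestNeighbor`, the toy three-pixel "images", and the choice of L1 distance with a majority vote are all assumptions made for the example.

```python
from collections import Counter


class KNearestNeighbor:
    """Hypothetical minimal k-NN classifier; images are flat lists of pixel values."""

    def train(self, images, labels):
        # "Training" only memorizes the data; no parameters are fitted.
        self.images = images
        self.labels = labels

    def predict(self, image, k=1):
        # L1 distance from the query to every stored training image (assumed metric).
        distances = [
            (sum(abs(a - b) for a, b in zip(image, stored)), label)
            for stored, label in zip(self.images, self.labels)
        ]
        # Keep the k closest images and take a majority vote over their labels.
        nearest = sorted(distances, key=lambda pair: pair[0])[:k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy data, made up for illustration: two "cat" images and two "ship" images.
clf = KNearestNeighbor()
clf.train([[0, 0, 0], [1, 0, 1], [9, 9, 9], [8, 9, 8]], ["cat", "cat", "ship", "ship"])
prediction = clf.predict([1, 1, 0], k=3)
```

Note that `train` is constant-time while `predict` scans the whole training set, which is the opposite of what is usually wanted in deployment.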

Comparing a new image with its k nearest neighbors

L1 distance (Manhattan): sum the absolute differences of all corresponding pixel values.
Issue: with a single nearest neighbor, the prediction hinges on one training image, so it is sensitive to noise.
Fix: use more neighbors (a larger k), optionally weighting their votes; averaging over several neighbors dampens noise, so decision boundaries become smoother.

L2 distance (Euclidean): square the difference in every dimension, sum the squares, and take the square root.
Because differences are squared, a large deviation at any single pixel is magnified, so L2 is more affected by outliers than L1.
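
A small numeric sketch of the two distances, using the standard definitions (L2 includes the square root). The function names and the four-pixel vectors are made up for illustration. The two test vectors have the same total absolute difference from the reference, but one concentrates it in a single pixel.

```python
import math


def l1_distance(a, b):
    # Sum of absolute per-pixel differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    # Square root of the sum of squared per-pixel differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


reference = [0, 0, 0, 0]
spread = [2, 2, 2, 2]  # small difference at every pixel
spike = [0, 0, 0, 8]   # one large difference at a single pixel

l1_spread, l1_spike = l1_distance(reference, spread), l1_distance(reference, spike)
l2_spread, l2_spike = l2_distance(reference, spread), l2_distance(reference, spike)
```

Here L1 rates both vectors as equally far from the reference (8 and 8), while L2 rates the single-pixel spike as twice as far (8.0 versus 4.0).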

k-NN is slow at query time because every prediction compares the query against all training images. It is also very sensitive to irrelevant or noisy features, and high dimensionality worsens both problems.

Choosing hyperparameters

Hyperparameters are set before learning; the question is how to choose them. We can try different values, but judging them by training accuracy alone can be misleading because we care about performance on unseen data.

So we split data into train, validation, and test: train on train, evaluate hyperparameters on validation to pick the best setting, then run once on test to report results.
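
The procedure might look like the sketch below for choosing k in k-NN. Everything here is an assumption for illustration: the helper names `knn_predict` and `accuracy`, the made-up 1-D dataset, the fixed interleaved split, and the candidate values of k.

```python
from collections import Counter


def knn_predict(train, query, k):
    # train is a list of (features, label) pairs; L1 distance, majority vote.
    ranked = sorted(train, key=lambda p: sum(abs(a - b) for a, b in zip(query, p[0])))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def accuracy(train, held_out, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)


# Made-up 1-D data: the label is 1 when the value is at least 10.
data = [([v], int(v >= 10)) for v in range(20)]

# Fixed interleaved split so the sketch is deterministic.
test = [ex for i, ex in enumerate(data) if i % 5 == 4]
val = [ex for i, ex in enumerate(data) if i % 5 == 2]
train = [ex for i, ex in enumerate(data) if i % 5 not in (2, 4)]

# Hyperparameters are compared on the validation split only.
candidates = (1, 3, 5)
val_scores = {k: accuracy(train, val, k) for k in candidates}
best_k = max(candidates, key=lambda k: val_scores[k])

# The test split is touched exactly once, after k is fixed.
test_acc = accuracy(train, test, best_k)
```

The point of the structure is that `test` never influences which k is chosen, so `test_acc` remains an honest estimate of performance on unseen data.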

Another option is cross-validation: split data into a series of folds; in each round choose one fold as validation and train on the others, then average the results. This is common on small datasets.
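
A minimal cross-validation sketch under the same kind of assumptions: the helper names, the made-up 1-D dataset, the interleaved fold assignment, and the choice of 5 folds are illustrative only.

```python
from collections import Counter


def knn_predict(train, query, k):
    # train is a list of (features, label) pairs; L1 distance, majority vote.
    ranked = sorted(train, key=lambda p: sum(abs(a - b) for a, b in zip(query, p[0])))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def cross_validate(data, k, num_folds):
    # Fold i takes every num_folds-th example starting at index i.
    folds = [data[i::num_folds] for i in range(num_folds)]
    scores = []
    for i, val_fold in enumerate(folds):
        # Train on all folds except the one held out for validation.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in val_fold)
        scores.append(hits / len(val_fold))
    # Average the per-fold validation accuracies.
    return sum(scores) / num_folds


# Made-up 1-D data: the label is 1 when the value is at least 10.
data = [([v], int(v >= 10)) for v in range(20)]
mean_acc = {k: cross_validate(data, k, num_folds=5) for k in (1, 3, 5)}
```

Each example serves as validation data exactly once, which is why the averaged score is less dependent on one lucky or unlucky split. The cost is `num_folds` times more evaluation work.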

Parametric model

A parametric model is \(f(x, W)\): it takes an input \(x\) (an image) and weights \(W\) and produces a score for each class. For example, \(f(x, W) = W x + b\). Suppose there are \(n\) pixels and \(m\) classes: \(x\) is \(n \times 1\), \(W\) is \(m \times n\) with one row per class storing that class's weights, the bias \(b\) is \(m \times 1\), and the output \(f(x, W)\) is an \(m \times 1\) score vector.
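
To make the shapes concrete, here is a small sketch of \(f(x, W) = W x + b\) in plain Python with \(n = 4\) pixels and \(m = 3\) classes. The function name `linear_scores` and all numeric values are made up for illustration; real weights would come from training.

```python
def linear_scores(W, x, b):
    # W: m rows of n weights, x: n pixel values, b: m biases -> m class scores.
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(W, b)
    ]


# Illustrative numbers only: n = 4 pixels, m = 3 classes.
x = [56, 231, 24, 2]
W = [
    [0.2, -0.5, 0.1, 2.0],   # weights for class 0
    [1.5, 1.3, 2.1, 0.0],    # weights for class 1
    [0.0, 0.25, 0.2, -0.3],  # weights for class 2
]
b = [1.1, 3.2, -1.2]

scores = linear_scores(W, x, b)
# The predicted class is the index of the highest score.
predicted_class = max(range(len(scores)), key=scores.__getitem__)
```

With these numbers the scores are approximately -96.8, 437.9, and 60.75, so class 1 is predicted. Note that `b` has one entry per class, matching the number of rows in `W`.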

In k-nearest neighbors, test time depends on all training samples. In a parametric model, the training phase stores knowledge in \(W\), so at test time only \(W\) is needed.

An image is a point in a high-dimensional space. A linear classifier draws hyperplanes to divide the space into regions assigned to different categories. However, some problems (e.g., judging whether a number is even or odd) are not linearly separable, so a linear classifier cannot solve them.
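
As a worked instance of the parity example, assume the simplest possible encoding: two binary inputs \(x_1, x_2\), where the class is "odd" when exactly one of them is 1 (this is XOR; the encoding is an assumption made for illustration). Suppose a linear score \(s(x) = w_1 x_1 + w_2 x_2 + b\) were positive for "odd" and negative for "even". The four input points would require:

\[
\begin{aligned}
s(0,0) &= b < 0, \\
s(1,1) &= w_1 + w_2 + b < 0, \\
s(1,0) &= w_1 + b > 0, \\
s(0,1) &= w_2 + b > 0.
\end{aligned}
\]

Adding the first two inequalities gives \(w_1 + w_2 + 2b < 0\), while adding the last two gives \(w_1 + w_2 + 2b > 0\). These contradict each other, so no choice of \(w_1, w_2, b\) separates the two classes with a single hyperplane.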