CS231n Notes - Lec2: K-Nearest Neighbors, Cross-Validation, and Linear Classification

Convolutional Neural Networks for Visual Recognition

Posted by Pekary on July 13, 2017


K-Nearest Neighbors — find the K training examples closest to the test example, then let them vote to decide the test example's class.

  • Training is fast but testing is slow, so it is almost never used in practice.

  • Either the L1 or the L2 distance can be used. L2 is stricter than L1 about the differences between two vectors (the L2 distance prefers many medium disagreements to one big one). In practice, pixel-wise distances are heavily influenced by the background rather than the object of interest, so raw distance alone is not enough.

  • It is best suited to low-dimensional data; in high-dimensional spaces the distances it relies on are no longer perceptually meaningful, so it is not used for image classification.

  • How to apply kNN in practice (Summary: Applying kNN in practice; a minimal sketch follows this list):

    If you wish to apply kNN in practice (hopefully not on images, or perhaps as only a baseline) proceed as follows:

    1. Preprocess your data: Normalize the features in your data (e.g. one pixel in images) to have zero mean and unit variance. We will cover this in more detail in later sections, and chose not to cover data normalization in this section because pixels in images are usually homogeneous and do not exhibit widely different distributions, alleviating the need for data normalization.
    2. If your data is very high-dimensional, consider using a dimensionality reduction technique such as PCA (wiki ref, CS229ref, blog ref) or even Random Projections.
    3. Split your training data randomly into train/val splits. As a rule of thumb, between 70-90% of your data usually goes to the train split. This setting depends on how many hyperparameters you have and how much of an influence you expect them to have. If there are many hyperparameters to estimate, you should err on the side of having larger validation set to estimate them effectively. If you are concerned about the size of your validation data, it is best to split the training data into folds and perform cross-validation. If you can afford the computational budget it is always safer to go with cross-validation (the more folds the better, but more expensive).
    4. Train and evaluate the kNN classifier on the validation data (for all folds, if doing cross-validation) for many choices of k (e.g. the more the better) and across different distance types (L1 and L2 are good candidates).
    5. If your kNN classifier is running too long, consider using an Approximate Nearest Neighbor library (e.g. FLANN) to accelerate the retrieval (at cost of some accuracy).
    6. Take note of the hyperparameters that gave the best results. There is a question of whether you should use the full training set with the best hyperparameters, since the optimal hyperparameters might change if you were to fold the validation data into your training set (since the size of the data would be larger). In practice it is cleaner to not use the validation data in the final classifier and consider it to be burned on estimating the hyperparameters. Evaluate the best model on the test set. Report the test set accuracy and declare the result to be the performance of the kNN classifier on your data.
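A minimal sketch of such a classifier, assuming the images have already been flattened into row vectors and the labels are small integer class indices; the class and variable names here are illustrative, not the course assignment code:

```python
import numpy as np

class KNearestNeighbor:
    """Minimal kNN classifier: memorize the training set, vote among the k closest points at test time."""

    def train(self, X, y):
        # kNN "training" is simply storing the data.
        self.Xtr = X
        self.ytr = y

    def predict(self, X, k=1, distance='L2'):
        y_pred = np.zeros(X.shape[0], dtype=self.ytr.dtype)
        for i in range(X.shape[0]):
            if distance == 'L1':
                dists = np.sum(np.abs(self.Xtr - X[i]), axis=1)
            else:  # squared L2 distances give the same ranking as L2
                dists = np.sum((self.Xtr - X[i]) ** 2, axis=1)
            closest = self.ytr[np.argsort(dists)[:k]]  # labels of the k nearest training examples
            y_pred[i] = np.bincount(closest).argmax()  # majority vote
        return y_pred
```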

Cross-Validation

  • In practice, cross-validation is computationally expensive.
  • It is therefore used mainly when the data available for a validation set would otherwise be too small; a sketch of k-fold cross-validation for choosing $k$ follows below.
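A sketch of 5-fold cross-validation for choosing $k$, reusing the hypothetical KNearestNeighbor class from the sketch above and assuming flattened training arrays Xtr, ytr:

```python
import numpy as np

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 20, 50, 100]

# Split the training data into folds.
X_folds = np.array_split(Xtr, num_folds)
y_folds = np.array_split(ytr, num_folds)

for k in k_choices:
    accuracies = []
    for i in range(num_folds):
        # Fold i serves as the validation fold; the remaining folds are the training data.
        X_val, y_val = X_folds[i], y_folds[i]
        X_train = np.concatenate(X_folds[:i] + X_folds[i + 1:])
        y_train = np.concatenate(y_folds[:i] + y_folds[i + 1:])
        knn = KNearestNeighbor()
        knn.train(X_train, y_train)
        accuracies.append(np.mean(knn.predict(X_val, k=k) == y_val))
    print('k = %d, mean cross-validation accuracy = %.3f' % (k, np.mean(accuracies)))
```

The $k$ (and distance type) with the best mean accuracy is then used for the final classifier, as described in step 6 above.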

Linear Classification

Model: $f(x_i, W, b) = Wx_i + b$, where $y_i$ is the ground-truth label of image $x_i$.

  • Three interpretations:
    1. Each class score is a weighted sum of all the pixel values of the image.
    2. Template matching: each row of $W$ corresponds to a template for one class; the score of an image for each class is the inner product between the image and that class's template, and the highest-scoring class is chosen.
    3. An efficient form of nearest neighbor: each class uses a single learned template instead of many training images, and the inner product replaces the L1 or L2 distance.
  • Limitations (examples): a single linear template per class cannot cover a horse whose head may face either left or right, or cars that come in many different colors. In neural networks, such variations can be handled by hidden-layer neurons.
  • The bias trick (absorbing $b$ into $W$) and data normalization; a sketch of the bias trick follows this list.
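A small numpy illustration of the bias trick, using hypothetical CIFAR-10-like shapes (10 classes, 3072-dimensional inputs):

```python
import numpy as np

# Without the trick: scores = W x + b, with W of shape (10, 3072) and b of shape (10,).
W = np.random.randn(10, 3072)
b = np.random.randn(10)
x = np.random.randn(3072)
scores = W.dot(x) + b

# Bias trick: append a constant 1 to x and fold b into W as one extra column,
# so the score function becomes a single matrix multiply.
W_ext = np.hstack([W, b[:, np.newaxis]])  # shape (10, 3073)
x_ext = np.append(x, 1.0)                 # shape (3073,)
assert np.allclose(scores, W_ext.dot(x_ext))
```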

Multiclass SVM Loss

The loss for the $i$-th image: $L_i = \sum \limits_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)$, where $s_j = w_j^T x_i$; the score of the correct class should exceed every incorrect class score by at least the margin $\Delta$ (a code sketch follows the list below).

  • hinge loss: $\max(0, -)$

  • squared hinge loss SVM (L2-SVM): $\max(0, -)^2$
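A sketch of this loss for a single example, directly transcribing the formula above; the function and variable names are illustrative:

```python
import numpy as np

def svm_loss_single(x, y, W, delta=1.0):
    """Multiclass SVM (hinge) loss L_i for one example.

    x: input vector of shape (D,); y: index of the correct class; W: weights of shape (C, D).
    """
    scores = W.dot(x)                                    # s_j = w_j^T x for every class j
    margins = np.maximum(0, scores - scores[y] + delta)  # hinge on each incorrect class
    margins[y] = 0                                       # the sum excludes j == y_i
    return np.sum(margins)
```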

Regularization

The loss above does not determine $W$ uniquely: if some $W$ classifies every example with zero loss, then any scaled version $\lambda W$ with $\lambda > 1$ also gives zero loss. To remove this ambiguity we add a regularization penalty $R(W)$, most commonly the L2 penalty $R(W) = \sum_k \sum_l W_{k,l}^2$. The multiclass SVM loss with regularization becomes $L = \frac{1}{N} \sum_i L_i + \lambda R(W)$.

Penalizing large weights improves generalization. For example, with input $x = [1, 1, 1, 1]$ and weight vectors $w_1 = [1, 0, 0, 0]$, $w_2 = [0.25, 0.25, 0.25, 0.25]$, both give the same dot product $w^T x = 1$, but $w_2$ incurs a much smaller L2 penalty. The L2 penalty therefore prefers smaller, more diffuse weight vectors, encouraging the classifier to take every input dimension into account rather than relying heavily on a few.
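The same example in code, showing that the two weight vectors produce identical scores while their L2 penalties differ:

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

print(w1.dot(x), w2.dot(x))            # 1.0 1.0  -> identical scores
print(np.sum(w1**2), np.sum(w2**2))    # 1.0 0.25 -> w2 has the smaller L2 penalty
```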

Softmax Classifier

  • Cross-entropy loss: $L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$, i.e. the cross-entropy $H(p, q) = -\sum_x p(x) \log q(x)$ between two distributions,

    where $p(x)$ is the true label distribution (a one-hot vector with all mass on $y_i$) and $q(x) = \frac{e^{f_{y_i}}}{\sum_j e^{f_j}}$ is the estimated probability of the correct class.

  • Related concepts: KL divergence (Kullback-Leibler divergence), maximum likelihood estimation (MLE), and maximum a posteriori estimation (MAP).

  • Practical note: the exponentials $e^{f_j}$ can be numerically very large. For numerical stability, multiply numerator and denominator by a constant $C$: $\frac{C e^{f_{y_i}}}{C \sum_j e^{f_j}} = \frac{e^{f_{y_i} + \log C}}{\sum_j e^{f_j + \log C}}$, where $\log C$ is usually set to $-\max_j f_j$ (see the sketch after this list).
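A sketch of this numerically stable cross-entropy loss for one example; the function and variable names are illustrative, not from the course assignments:

```python
import numpy as np

def softmax_cross_entropy_single(f, y):
    """Cross-entropy loss -log q(y) for one example, given class scores f and correct class index y."""
    f = f - np.max(f)                      # shift scores so the largest is 0 (log C = -max_j f_j)
    probs = np.exp(f) / np.sum(np.exp(f))  # softmax probabilities
    return -np.log(probs[y])               # negative log-probability of the correct class
```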

SVM vs. Softmax

Two common loss functions for linear classifiers.

  • The softmax classifier outputs class probabilities, but how peaked or diffuse they are depends on the regularization strength.
  • The SVM is satisfied once the score margins are met and ignores any further differences, whereas softmax always takes every score into account.