Implement the function compute_distances_two_loops, which uses a (very inefficient) double loop over all pairs of (test, train) examples to compute the distance matrix one element at a time.
If there are Ntr training examples and Nte test examples, this stage should produce an Nte x Ntr matrix in which each element (i, j) is the distance between the i-th test example and the j-th training example.
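For reference, each element holds the Euclidean (L2) distance named in the docstring below. Writing $x^{(i)}$ for the i-th test row and $t^{(j)}$ for the j-th training row, each with D features (this notation is introduced here for illustration only):

$$
\mathrm{dists}_{ij} = \sqrt{\sum_{d=1}^{D} \left( x^{(i)}_d - t^{(j)}_d \right)^2}, \qquad 1 \le i \le N_{te},\; 1 \le j \le N_{tr}
$$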
```python
# Class method; assumes `import numpy as np` at module level.
def compute_distances_two_loops(self, X):
    """
    Inputs:
    - X: A numpy array of shape (num_test, D) containing test data.

    Returns:
    - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
      is the Euclidean distance between the ith test point and the jth
      training point.
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # L2 distance: square root of the summed squared differences
            dists[i, j] = np.sqrt(np.sum(np.square(self.X_train[j, :] - X[i, :])))
    return dists
```
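To see the double loop in isolation, here is a minimal sketch that pulls the same logic out of the class into a free function. The name two_loop_distances and the tiny 2-D data set are hypothetical, chosen so the distances are easy to check by hand (for example, (0, 0) to (3, 4) is 5).

```python
import numpy as np


def two_loop_distances(X_test, X_train):
    """Hypothetical free-function version of compute_distances_two_loops.

    Returns a (num_test, num_train) array where entry (i, j) is the
    Euclidean distance between X_test[i] and X_train[j].
    """
    num_test = X_test.shape[0]
    num_train = X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # Squared coordinate differences, summed, then square-rooted
            diff = X_test[i] - X_train[j]
            dists[i, j] = np.sqrt(np.sum(diff ** 2))
    return dists


# Tiny example: 2 test points and 3 training points in 2-D
X_train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
X_test = np.array([[0.0, 0.0], [3.0, 0.0]])
dists = two_loop_distances(X_test, X_train)
print(dists.shape)  # (2, 3)
```

The result has one row per test point and one column per training point, matching the Nte x Ntr layout described above.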
```python
def predict_labels(self, dists, k=1):
    """
    Given a matrix of distances between test points and training points,
    predict a label for each test point.

    Inputs:
    - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
      gives the distance between the ith test point and the jth training point.

    Returns:
    - y_pred: A numpy array of shape (num_test,) containing predicted labels
      for the test data, where y_pred[i] is the predicted label for test
      point X[i].
    """
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in range(num_test):
        row = dists[i, :]
        # Indices that sort the distances (small to large)
        sorted_indices = np.argsort(row)
        # Pick the labels of the k nearest neighbors
        closest_y = [self.y_train[sorted_indices[index]] for index in range(k)]

        # Count the occurrences of each label among the k nearest neighbors
        counts = {}
        for element in closest_y:
            counts[element] = counts.get(element, 0) + 1

        # Majority vote; on a tie, choose the smaller label
        label = None
        for key, count in counts.items():
            if label is None or count > counts[label] or (count == counts[label] and key < label):
                label = key
        y_pred[i] = label
    return y_pred
```
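The counting dictionary above can be replaced by np.bincount when labels are small non-negative integers. The sketch below assumes that (it is not stated in the source), and the function name majority_vote and the toy arrays are hypothetical. Because np.argmax returns the first index of the maximum, ties resolve to the smaller label, the same rule predict_labels uses.

```python
import numpy as np


def majority_vote(dists, y_train, k=1):
    """Hypothetical compact k-NN vote.

    Assumes y_train holds non-negative integer class labels.
    Ties go to the smaller label because np.argmax returns the
    first occurrence of the maximum count.
    """
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test, dtype=y_train.dtype)
    for i in range(num_test):
        # Labels of the k training points closest to test point i
        nearest = y_train[np.argsort(dists[i])[:k]]
        # np.bincount(nearest)[c] = number of neighbours with label c
        y_pred[i] = np.argmax(np.bincount(nearest))
    return y_pred


y_train = np.array([2, 1, 1, 2, 0])
dists = np.array([
    [0.1, 0.2, 0.3, 0.4, 0.5],  # k=3 neighbours have labels 2, 1, 1
    [0.5, 0.9, 0.2, 0.8, 0.1],  # k=3 neighbours have labels 0, 1, 2 (a tie)
])
pred = majority_vote(dists, y_train, k=3)
print(pred)  # [1 0]
```

In the second row every label appears once, so the tie-break picks 0, the smallest label.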
Other Operations
Subsample the Data
Subsample the data so that the code in this exercise runs faster.
```python
# Pick out the first 5000 training samples
num_training = 5000
mask = list(range(num_training))
X_train = X_train[mask]
y_train = y_train[mask]

# Pick out the first 500 testing samples
num_test = 500
mask = list(range(num_test))
X_test = X_test[mask]
y_test = y_test[mask]

# Reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
print(X_train.shape, X_test.shape)
```
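To make the effect of the mask and the reshape concrete, here is a small sketch on synthetic data. It assumes images stored as (N, 32, 32, 3) arrays, a CIFAR-10-like layout that the source does not state, and it uses far fewer images than the exercise so it runs instantly. The -1 in np.reshape tells NumPy to infer that dimension, here 32 * 32 * 3 = 3072.

```python
import numpy as np

# Synthetic stand-in data (assumed layout): 60 RGB images of 32x32 pixels
rng = np.random.default_rng(0)
X_train = rng.random((60, 32, 32, 3))
y_train = rng.integers(0, 10, size=60)

num_training = 50
# Indexing with list(range(n)) keeps the first n rows, like X[:n]
mask = list(range(num_training))
X_sub = X_train[mask]
y_sub = y_train[mask]

# Flatten each image into one row of 32 * 32 * 3 = 3072 values
X_rows = np.reshape(X_sub, (X_sub.shape[0], -1))
print(X_rows.shape)  # (50, 3072)
```

Each row of X_rows is one image unrolled in C order, which is the (num_examples, D) layout that compute_distances_two_loops expects.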