
k-NN

Definition

The k-Nearest Neighbors (k-NN) algorithm is a simple supervised machine learning algorithm used for classification and regression tasks. Given an input data point, it finds the k closest points in the training set and predicts from them: the most common label among those neighbors for classification, or the average of their values for regression.
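The two prediction rules can be written compactly as follows. The notation is an assumption introduced here for illustration: $N_k(x)$ denotes the indices of the k training points closest to the input $x$ under the chosen distance, and $y_i$ is the label or value of training point $i$.

```latex
\hat{y}_{\text{class}}(x) = \arg\max_{c} \sum_{i \in N_k(x)} \mathbb{1}[y_i = c],
\qquad
\hat{y}_{\text{reg}}(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i
```

The first expression is a majority vote over the neighbors' labels; the second is the plain (unweighted) mean of their values.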

Practice

import math


def knn_predict(input_point, training_data, k, task_type):
    # Assumption: input_point is a feature vector and each training point
    # is a (features, target) pair.
    if not 1 <= k <= len(training_data):
        raise ValueError("k must be between 1 and len(training_data)")

    # Step 1: Calculate the distance from the input to every training point
    distances = []
    for point in training_data:
        distance = calculate_distance(input_point, point[0])
        distances.append((point, distance))

    # Step 2: Sort by distance, closest first
    sorted_distances = sorted(distances, key=distance_value)

    # Step 3: Select the k nearest neighbors
    nearest_neighbors = sorted_distances[:k]

    # Step 4: Perform prediction based on task type
    if task_type == "classification":
        return majority_vote(nearest_neighbors)
    elif task_type == "regression":
        return calculate_average(nearest_neighbors)
    raise ValueError(f"unknown task_type: {task_type!r}")


def calculate_distance(point1, point2):
    # Euclidean distance between two feature vectors of equal length;
    # another metric (e.g., Manhattan) could be substituted here
    return math.dist(point1, point2)


def distance_value(element):
    # element is a (point, distance) pair; return the distance (used for sorting)
    return element[1]


def majority_vote(neighbors):
    # Step 5a: Count occurrences of each class
    class_counts = {}
    for neighbor in neighbors:
        neighbor_class = neighbor[0][1]  # target of the (features, target) pair
        class_counts[neighbor_class] = class_counts.get(neighbor_class, 0) + 1

    # Step 5b: Find the class with the highest count
    # (on a tie, the class seen first, i.e. the one with the closest member, wins)
    return max(class_counts, key=class_counts.get)


def calculate_average(neighbors):
    # Step 5: Calculate the unweighted average of the neighbors' target values
    total_value = 0
    for neighbor in neighbors:
        total_value += neighbor[0][1]  # target of the (features, target) pair
    return total_value / len(neighbors)
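The comments above mention Euclidean and Manhattan distances and a weighted average without spelling them out. The following is a minimal, self-contained sketch of those pieces, not a definitive implementation. The function names, the `eps` guard, the inverse-distance weighting scheme, and the toy data are all assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def weighted_average(neighbors, eps=1e-9):
    """Inverse-distance weighted mean of (target, distance) pairs.

    eps is a hypothetical guard against division by zero when a neighbor
    coincides with the query point.
    """
    weights = [1.0 / (distance + eps) for _, distance in neighbors]
    weighted_sum = sum(w * target for w, (target, _) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


# Toy data (made up for illustration): (features, target) pairs.
training_data = [((1.0, 1.0), 10.0), ((2.0, 2.0), 20.0), ((5.0, 5.0), 50.0)]
query = (1.5, 1.5)

# Two nearest neighbors under Euclidean distance, as (target, distance) pairs.
nearest = sorted(
    ((target, euclidean_distance(query, features)) for features, target in training_data),
    key=lambda pair: pair[1],
)[:2]
prediction = weighted_average(nearest)
```

In this toy case the two nearest neighbors are equally far from the query, so the weighted prediction lands midway between their targets. With unequal distances, closer neighbors pull the prediction toward their own values, which is the main reason to prefer weighting over a plain mean.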