k-Nearest Neighbors Classification (k-NN)¶
k-NN classification algorithm infers the class for the new feature vector by computing majority vote of the k nearest observations from the training set.
Operation |
Computational methods |
Programming Interface |
|||
Mathematical formulation¶
Training¶
Let X={x1,…,xn} be the training set of p-dimensional feature vectors, let Y={y1,…,yn} be the set of class labels, where yi∈{0,…,c−1}, 1≤i≤n. Given X, Y and the number of nearest neighbors k, the problem is to build a model that allows distance computation between the feature vectors in training and inference sets at the inference stage.
Training method: brute-force¶
The training operation produces the model that stores all the feature vectors from the initial training set X.
Inference¶
Let X′={x′1,…,x′m} be the inference set of p-dimensional feature vectors. Given X′, the model produced at the training stage and the number of nearest neighbors k, the problem is to predict the label y′j for each x′j, 1≤j≤m, by performing the following steps:
Identify the set N(x′j)⊆X of the k feature vectors in the training set that are nearest to x′j with respect to the Euclidean distance.
Estimate the conditional probability for the l-th class as the fraction of vectors in N(x′j) whose labels yj are equal to l:
Predict the class that has the highest probability for the feature vector x′j:
Inference method: brute-force¶
Brute-force inference method determines the set N(x′j) of the nearest feature vectors by iterating over all the pairs (x′j,xi) in the implementation defined order, 1≤i≤n, 1≤j≤m. The final prediction is computed according to the equations (1) and (2).
Inference method: k-d tree¶
K-d tree inference method traverses the k-d tree to find feature vectors associated with a leaf node that are closest to x′j, 1≤j≤m. The set ˜n(x′j) of the currently-known nearest k-th neighbors is progressively updated during tree traversal. The search algorithm limits exploration of the nodes for which the distance between the x′j and respective part of the feature space is not less than the distance between x′j and the most distant feature vector from ˜n(x′j). Once tree traversal is finished, ˜n(x′j)≡N(x′j). The final prediction is computed according to the equations (1) and (2).
Usage example¶
Training¶
knn::model<> run_training(const table& data,
const table& labels) {
const std::int64_t class_count = 10;
const std::int64_t neighbor_count = 5;
const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};
const auto result = train(knn_desc, data, labels);
return result.get_model();
}
Inference¶
table run_inference(const knn::model<>& model,
const table& new_data) {
const std::int64_t class_count = 10;
const std::int64_t neighbor_count = 5;
const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};
const auto result = infer(knn_desc, model, new_data);
print_table("labels", result.get_labels());
}
Programming Interface¶
All types and functions in this section shall be declared in the
oneapi::dal::knn namespace and be available via inclusion of the
oneapi/dal/algo/knn.hpp header file.
Descriptor¶
template <typename Float = float,
typename Method = method::by_default,
typename Task = task::by_default>
class descriptor {
public:
explicit descriptor(std::int64_t class_count,
std::int64_t neighbor_count);
std::int64_t get_class_count() const;
descriptor& set_class_count(std::int64_t);
std::int64_t get_neighbor_count() const;
descriptor& set_neighbor_count(std::int64_t);
};
-
template<typename
Float= float, typenameMethod= method::by_default, typenameTask= task::by_default>
classdescriptor¶ - Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be
floatordouble.Method – Tag-type that specifies an implementation of algorithm. Can be
method::bruteforceormethod::kd_tree.Task – Tag-type that specifies type of the problem to solve. Can be
task::classification.
Constructors
-
descriptor(std::int64_t class_count, std::int64_t neighbor_count)¶ Creates a new instance of the class with the given
class_countandneighbor_countproperty values.
Properties
-
std::int64_t
class_count¶ The number of classes c.
- Getter & Setter
std::int64_t get_class_count() constdescriptor & set_class_count(std::int64_t)- Invariants
class_count > 1
-
std::int64_t
neighbor_count¶ The number of neighbors k.
- Getter & Setter
std::int64_t get_neighbor_count() constdescriptor & set_neighbor_count(std::int64_t)- Invariants
neighbor_count > 0
Method tags¶
namespace method {
struct bruteforce {};
struct kd_tree {};
using by_default = bruteforce;
} // namespace method
-
struct
bruteforce¶ Tag-type that denotes brute-force computational method.
-
using
by_default= bruteforce¶ Alias tag-type for brute-force computational method.
Task tags¶
namespace task {
struct classification {};
using by_default = classification;
} // namespace task
-
struct
classification¶ Tag-type that parameterizes entities used for solving classification problem.
-
using
by_default= classification¶ Alias tag-type for classification task.
Model¶
template <typename Task = task::by_default>
class model {
public:
model();
};
-
template<typename
Task= task::by_default>
classmodel¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be
task::classification.
Constructors
-
model()¶ Creates a new instance of the class with the default property values.
Training train(...)¶
Input¶
template <typename Task = task::by_default>
class train_input {
public:
train_input(const table& data = table{},
const table& labels = table{});
const table& get_data() const;
train_input& set_data(const table&);
const table& get_labels() const;
train_input& set_labels(const table&);
};
-
template<typename
Task= task::by_default>
classtrain_input¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be
task::classification.
Constructors
-
train_input(const table &data = table{}, const table &labels = table{})¶ Creates a new instance of the class with the given
dataandlabelsproperty values.
Properties
-
const table &
data= table{}¶ The training set X.
- Getter & Setter
const table & get_data() consttrain_input & set_data(const table &)
-
const table &
labels= table{}¶ Vector of labels y for the training set X.
- Getter & Setter
const table & get_labels() consttrain_input & set_labels(const table &)
Result¶
template <typename Task = task::by_default>
class train_result {
public:
train_result();
const model<Task>& get_model() const;
};
-
template<typename
Task= task::by_default>
classtrain_result¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be
task::classification.
Constructors
-
train_result()¶ Creates a new instance of the class with the default property values.
Properties
Operation¶
-
template<typename
Float, typenameMethod, typenameTask>
train_result<Task>train(const descriptor<Float, Method, Task> &desc, const train_input<Task> &input)¶ Runs the training operation for k-NN classifier. For more details see
oneapi::dal::train.- Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be
floatordouble.Method – Tag-type that specifies an implementation of algorithm. Can be
method::bruteforceormethod::kd_tree.Task – Tag-type that specifies type of the problem to solve. Can be
task::classification.
- Parameters
desc – Descriptor of the algorithm.
input – Input data for the training operation.
- Preconditions
Inference infer(...)¶
Input¶
template <typename Task = task::by_default>
class infer_input {
public:
infer_input(const model<Task>& m = model<Task>{},
const table& data = table{});
const model<Task>& get_model() const;
infer_input& set_model(const model&);
const table& get_data() const;
infer_input& set_data(const table&);
};
-
template<typename
Task= task::by_default>
classinfer_input¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be
task::classification.
Constructors
-
infer_input(const model<Task> &m = model<Task>{}, const table &data = table{})¶ Creates a new instance of the class with the given
modelanddataproperty values.
Properties
-
const model<Task> &
model= model<Task>{}¶ The trained k-NN model.
- Getter & Setter
const model< Task > & get_model() constinfer_input & set_model(const model &)
-
const table &
data= table{}¶ The dataset for inference X′.
- Getter & Setter
const table & get_data() constinfer_input & set_data(const table &)
Result¶
template <typename Task = task::by_default>
class infer_result {
public:
infer_result();
const table& get_labels() const;
};
-
template<typename
Task= task::by_default>
classinfer_result¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be
task::classification.
Constructors
-
infer_result()¶ Creates a new instance of the class with the default property values.
Properties
-
const table &
labels= table{}¶ The predicted labels.
- Getter & Setter
const table & get_labels() const
Operation¶
-
template<typename
Float, typenameMethod, typenameTask>
infer_result<Task>infer(const descriptor<Float, Method, Task> &desc, const infer_input<Task> &input)¶ Runs the inference operation for k-NN classifier. For more details see
oneapi::dal::infer.- Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be
floatordouble.Method – Tag-type that specifies an implementation of algorithm. Can be
method::bruteforceormethod::kd_tree.Task – Tag-type that specifies type of the problem to solve. Can be
task::classification.
- Parameters
desc – Descriptor of the algorithm.
input – Input data for the inference operation.
- Preconditions
input.data.has_data == true- Postconditions