k-Nearest Neighbors Classification (k-NN)¶

-NN classification algorithm infers the class for the new feature vector by computing majority vote of the nearest observations from the training set.

Operation	Computational methods		Programming Interface
Training	Brute-force	k-d tree	train(…)	train_input	train_result
Inference	Brute-force	k-d tree	infer(…)	infer_input	infer_result

Mathematical formulation¶

Training¶

Let be the training set of -dimensional feature vectors, let be the set of class labels, where , . Given , and the number of nearest neighbors , the problem is to build a model that allows distance computation between the feature vectors in training and inference sets at the inference stage.

Training method: brute-force¶

The training operation produces the model that stores all the feature vectors from the initial training set .

Training method: k-d tree¶

The training operation builds a - tree that partitions the training set (for more details, see k-d Tree).

Inference¶

Let be the inference set of -dimensional feature vectors. Given , the model produced at the training stage and the number of nearest neighbors , the problem is to predict the label for each , , by performing the following steps:

Identify the set of the feature vectors in the training set that are nearest to with respect to the Euclidean distance.
Estimate the conditional probability for the -th class as the fraction of vectors in whose labels are equal to :

(1)¶
Predict the class that has the highest probability for the feature vector :

(2)¶

Inference method: brute-force¶

Brute-force inference method determines the set of the nearest feature vectors by iterating over all the pairs in the implementation defined order, , . The final prediction is computed according to the equations (1) and (2).

Inference method: k-d tree¶

K-d tree inference method traverses the - tree to find feature vectors associated with a leaf node that are closest to , . The set of the currently-known nearest -th neighbors is progressively updated during tree traversal. The search algorithm limits exploration of the nodes for which the distance between the and respective part of the feature space is not less than the distance between and the most distant feature vector from . Once tree traversal is finished, . The final prediction is computed according to the equations (1) and (2).

Usage example¶

Training¶

knn::model<> run_training(const table& data,
                          const table& labels) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = train(knn_desc, data, labels);

   return result.get_model();
}

Inference¶

table run_inference(const knn::model<>& model,
                    const table& new_data) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = infer(knn_desc, model, new_data);

   print_table("labels", result.get_labels());
}

Programming Interface¶

All types and functions in this section shall be declared in the oneapi::dal::knn namespace and be available via inclusion of the oneapi/dal/algo/knn.hpp header file.

Descriptor¶

template <typename Float = float,
          typename Method = method::by_default,
          typename Task = task::by_default>
class descriptor {
public:
   explicit descriptor(std::int64_t class_count,
                       std::int64_t neighbor_count);

   std::int64_t get_class_count() const;
   descriptor& set_class_count(std::int64_t);

   std::int64_t get_neighbor_count() const;
   descriptor& set_neighbor_count(std::int64_t);
};

template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor¶

Template Parameters

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::bruteforce or method::kd_tree.
Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

descriptor(std::int64_t class_count, std::int64_t neighbor_count)¶: Creates a new instance of the class with the given class_count and neighbor_count property values.

Properties

std::int64_t class_count¶

The number of classes .

Getter & Setter: std::int64_t get_class_count() const

descriptor & set_class_count(std::int64_t)
Invariants: class_count > 1

std::int64_t neighbor_count¶

The number of neighbors .

Getter & Setter: std::int64_t get_neighbor_count() const

descriptor & set_neighbor_count(std::int64_t)
Invariants: neighbor_count > 0

Method tags¶

namespace method {
   struct bruteforce {};
   struct kd_tree {};
   using by_default = bruteforce;
} // namespace method

struct bruteforce¶: Tag-type that denotes brute-force computational method.

struct kd_tree¶: Tag-type that denotes k-d tree computational method.

using by_default = bruteforce ¶: Alias tag-type for brute-force computational method.

Task tags¶

namespace task {
   struct classification {};
   using by_default = classification;
} // namespace task

struct classification¶: Tag-type that parameterizes entities used for solving classification problem.

using by_default = classification ¶: Alias tag-type for classification task.

Model¶

template <typename Task = task::by_default>
class model {
public:
   model();
};

template<typename Task = task::by_default> class model¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

model()¶: Creates a new instance of the class with the default property values.

Training `train(...)`¶

Input¶

template <typename Task = task::by_default>
class train_input {
public:
   train_input(const table& data = table{},
               const table& labels = table{});

   const table& get_data() const;
   train_input& set_data(const table&);

   const table& get_labels() const;
   train_input& set_labels(const table&);
};

template<typename Task = task::by_default> class train_input¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

train_input(const table &data = table{}, const table &labels = table{})¶: Creates a new instance of the class with the given data and labels property values.

Properties

const table &data = table{}¶

The training set .

Getter & Setter: const table & get_data() const

train_input & set_data(const table &)

const table &labels = table{}¶

Vector of labels for the training set .

Getter & Setter: const table & get_labels() const

train_input & set_labels(const table &)

Result¶

template <typename Task = task::by_default>
class train_result {
public:
   train_result();

   const model<Task>& get_model() const;
};

template<typename Task = task::by_default> class train_result¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

train_result()¶: Creates a new instance of the class with the default property values.

Properties

const model<Task> &model = model<Task>{}¶

The trained -NN model.

Getter & Setter: const model< Task > & get_model() const

Operation¶

template<typename Float, typename Method, typename Task> train_result<Task> train(const descriptor<Float, Method, Task> &desc, const train_input<Task> &input)¶

Runs the training operation for -NN classifier. For more details see oneapi::dal::train.

Template Parameters

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::bruteforce or method::kd_tree.
Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Parameters

desc – Descriptor of the algorithm.
input – Input data for the training operation.

Preconditions: input.data.has_data == true

input.labels.has_data == true

input.data.row_count == input.labels.row_count

input.labels.column_count == 1

input.labels[i] >= 0

input.labels[i] < desc.class_count

Inference `infer(...)`¶

Input¶

template <typename Task = task::by_default>
class infer_input {
public:
   infer_input(const model<Task>& m = model<Task>{},
               const table& data = table{});

   const model<Task>& get_model() const;
   infer_input& set_model(const model&);

   const table& get_data() const;
   infer_input& set_data(const table&);
};

template<typename Task = task::by_default> class infer_input¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

infer_input(const model<Task> &m = model<Task>{}, const table &data = table{})¶: Creates a new instance of the class with the given model and data property values.

Properties

const model<Task> &model = model<Task>{}¶

The trained -NN model.

Getter & Setter: const model< Task > & get_model() const

infer_input & set_model(const model &)

const table &data = table{}¶

The dataset for inference .

Getter & Setter: const table & get_data() const

infer_input & set_data(const table &)

Result¶

template <typename Task = task::by_default>
class infer_result {
public:
   infer_result();

   const table& get_labels() const;
};

template<typename Task = task::by_default> class infer_result¶

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

infer_result()¶: Creates a new instance of the class with the default property values.

Properties

const table &labels = table{}¶

The predicted labels.

Getter & Setter: const table & get_labels() const

Operation¶

template<typename Float, typename Method, typename Task> infer_result<Task> infer(const descriptor<Float, Method, Task> &desc, const infer_input<Task> &input)¶

Runs the inference operation for -NN classifier. For more details see oneapi::dal::infer.

Template Parameters

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::bruteforce or method::kd_tree.
Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Parameters

desc – Descriptor of the algorithm.
input – Input data for the inference operation.

Preconditions: input.data.has_data == true
Postconditions: result.labels.row_count == input.data.row_count

result.labels.column_count == 1

result.labels[i] >= 0

result.labels[i] < desc.class_count

k-Nearest Neighbors Classification (k-NN)¶

Mathematical formulation¶

Training¶

Training method: brute-force¶

Training method: k-d tree¶

Inference¶

Inference method: brute-force¶

Inference method: k-d tree¶

Usage example¶

Training¶

Inference¶

Programming Interface¶

Descriptor¶

Method tags¶

Task tags¶

Model¶

Training train(...)¶

Input¶

Result¶

Operation¶

Inference infer(...)¶

Input¶

Result¶

Operation¶

Training `train(...)`¶

Inference `infer(...)`¶