k-Nearest Neighbors Classification (k-NN)#

k-NN classification algorithm infers the class for the new feature vector by computing majority vote of the k nearest observations from the training set.

Operation

Computational methods

Programming Interface

Training

Brute-force

k-d tree

train(…)

train_input

train_result

Inference

Brute-force

k-d tree

infer(…)

infer_input

infer_result

Mathematical formulation#

Training#

Let X={x1,,xn} be the training set of p-dimensional feature vectors, let Y={y1,,yn} be the set of class labels, where yi{0,,c1}, 1in. Given X, Y and the number of nearest neighbors k, the problem is to build a model that allows distance computation between the feature vectors in training and inference sets at the inference stage.

Training method: brute-force#

The training operation produces the model that stores all the feature vectors from the initial training set X.

Training method: k-d tree#

The training operation builds a k-d tree that partitions the training set X (for more details, see k-d Tree).

Inference#

Let X={x1,,xm} be the inference set of p-dimensional feature vectors. Given X, the model produced at the training stage and the number of nearest neighbors k, the problem is to predict the label yj for each xj, 1jm, by performing the following steps:

  1. Identify the set N(xj)X of the k feature vectors in the training set that are nearest to xj with respect to the Euclidean distance.

  2. Estimate the conditional probability for the l-th class as the fraction of vectors in N(xj) whose labels yj are equal to l:

    (1)#Pjl=1|N(xj)||{xrN(xj):yr=l}|,1jm,0l<c.
  3. Predict the class that has the highest probability for the feature vector xj:

    (2)#yj=argmax0l<cPjl,1jm.

Inference method: brute-force#

Brute-force inference method determines the set N(xj) of the nearest feature vectors by iterating over all the pairs (xj,xi) in the implementation defined order, 1in, 1jm. The final prediction is computed according to the equations (1) and (2).

Inference method: k-d tree#

K-d tree inference method traverses the k-d tree to find feature vectors associated with a leaf node that are closest to xj, 1jm. The set n~(xj) of the currently-known nearest k-th neighbors is progressively updated during tree traversal. The search algorithm limits exploration of the nodes for which the distance between the xj and respective part of the feature space is not less than the distance between xj and the most distant feature vector from n~(xj). Once tree traversal is finished, n~(xj)N(xj). The final prediction is computed according to the equations (1) and (2).

Usage example#

Training#

knn::model<> run_training(const table& data,
                          const table& labels) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = train(knn_desc, data, labels);

   return result.get_model();
}

Inference#

table run_inference(const knn::model<>& model,
                    const table& new_data) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = infer(knn_desc, model, new_data);

   print_table("labels", result.get_labels());
}

Programming Interface#

All types and functions in this section shall be declared in the oneapi::dal::knn namespace and be available via inclusion of the oneapi/dal/algo/knn.hpp header file.

Descriptor#

template <typename Float = float,
          typename Method = method::by_default,
          typename Task = task::by_default>
class descriptor {
public:
   explicit descriptor(std::int64_t class_count,
                       std::int64_t neighbor_count);

   std::int64_t get_class_count() const;
   descriptor& set_class_count(std::int64_t);

   std::int64_t get_neighbor_count() const;
   descriptor& set_neighbor_count(std::int64_t);
};
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor#
Template Parameters
  • Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

  • Method – Tag-type that specifies an implementation of algorithm. Can be method::bruteforce or method::kd_tree.

  • Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

descriptor(std::int64_t class_count, std::int64_t neighbor_count)#

Creates a new instance of the class with the given class_count and neighbor_count property values.

Properties

std::int64_t class_count#

The number of classes c.

Getter & Setter
std::int64_t get_class_count() const
descriptor & set_class_count(std::int64_t)
Invariants
std::int64_t neighbor_count#

The number of neighbors k.

Getter & Setter
std::int64_t get_neighbor_count() const
descriptor & set_neighbor_count(std::int64_t)
Invariants

Method tags#

namespace method {
   struct bruteforce {};
   struct kd_tree {};
   using by_default = bruteforce;
} // namespace method
struct bruteforce#

Tag-type that denotes brute-force computational method.

struct kd_tree#

Tag-type that denotes k-d tree computational method.

using by_default = bruteforce#

Alias tag-type for brute-force computational method.

Task tags#

namespace task {
   struct classification {};
   using by_default = classification;
} // namespace task
struct classification#

Tag-type that parameterizes entities used for solving classification problem.

using by_default = classification#

Alias tag-type for classification task.

Model#

template <typename Task = task::by_default>
class model {
public:
   model();
};
template<typename Task = task::by_default>
class model#
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

model()#

Creates a new instance of the class with the default property values.

Training train(...)#

Input#

template <typename Task = task::by_default>
class train_input {
public:
   train_input(const table& data = table{},
               const table& labels = table{});

   const table& get_data() const;
   train_input& set_data(const table&);

   const table& get_labels() const;
   train_input& set_labels(const table&);
};
template<typename Task = task::by_default>
class train_input#
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

train_input(const table &data = table{}, const table &labels = table{})#

Creates a new instance of the class with the given data and labels property values.

Properties

const table &data#

The training set X. Default value: table{}.

Getter & Setter
const table & get_data() const
train_input & set_data(const table &)
const table &labels#

Vector of labels y for the training set X. Default value: table{}.

Getter & Setter
const table & get_labels() const
train_input & set_labels(const table &)

Result#

template <typename Task = task::by_default>
class train_result {
public:
   train_result();

   const model<Task>& get_model() const;
};
template<typename Task = task::by_default>
class train_result#
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

train_result()#

Creates a new instance of the class with the default property values.

Public Methods

const model<Task> &get_model() const#

The trained k-NN model.

Operation#

template <typename Float, typename Method, typename Task>
train_result<Task> train(const descriptor<Float, Method, Task>& desc,
                         const train_input<Task>& input);
template<typename Float, typename Method, typename Task>
train_result<Task> train(const descriptor<Float, Method, Task> &desc, const train_input<Task> &input)#

Runs the training operation for k-NN classifier. For more details see oneapi::dal::train.

Template Parameters
  • Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

  • Method – Tag-type that specifies an implementation of algorithm. Can be method::bruteforce or method::kd_tree.

  • Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Parameters
  • desc – Descriptor of the algorithm.

  • input – Input data for the training operation.

Preconditions
input.data.has_data == true
input.labels.has_data == true
input.data.row_count == input.labels.row_count
input.labels.column_count == 1
input.labels[i] >= 0
input.labels[i] < desc.class_count

Inference infer(...)#

Input#

template <typename Task = task::by_default>
class infer_input {
public:
   infer_input(const model<Task>& m = model<Task>{},
               const table& data = table{});

   const model<Task>& get_model() const;
   infer_input& set_model(const model&);

   const table& get_data() const;
   infer_input& set_data(const table&);
};
template<typename Task = task::by_default>
class infer_input#
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

infer_input(const model<Task> &m = model<Task>{}, const table &data = table{})#

Creates a new instance of the class with the given model and data property values.

Properties

const table &data#

The dataset for inference X. Default value: table{}.

Getter & Setter
const table & get_data() const
infer_input & set_data(const table &)
const model<Task> &model#

The trained k-NN model. Default value: model<Task>{}.

Getter & Setter
const model< Task > & get_model() const
infer_input & set_model(const model &)

Result#

template <typename Task = task::by_default>
class infer_result {
public:
   infer_result();

   const table& get_labels() const;
};
template<typename Task = task::by_default>
class infer_result#
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

infer_result()#

Creates a new instance of the class with the default property values.

Public Methods

const table &get_labels() const#

The predicted labels.

Operation#

template <typename Float, typename Method, typename Task>
infer_result<Task> infer(const descriptor<Float, Method, Task>& desc,
                         const infer_input<Task>& input);
template<typename Float, typename Method, typename Task>
infer_result<Task> infer(const descriptor<Float, Method, Task> &desc, const infer_input<Task> &input)#

Runs the inference operation for k-NN classifier. For more details see oneapi::dal::infer.

Template Parameters
  • Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

  • Method – Tag-type that specifies an implementation of algorithm. Can be method::bruteforce or method::kd_tree.

  • Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Parameters
  • desc – Descriptor of the algorithm.

  • input – Input data for the inference operation.

Preconditions
input.data.has_data == true
Postconditions
result.labels.row_count == input.data.row_count
result.labels.column_count == 1
result.labels[i] >= 0
result.labels[i] < desc.class_count