k-Nearest Neighbors Classification (k-NN)#

$k$ -NN classification algorithm infers the class for the new feature vector by computing majority vote of the $k$ nearest observations from the training set.

Operation	Computational methods		Programming Interface
Training	Brute-force	k-d tree	train(…)	train_input	train_result
Inference	Brute-force	k-d tree	infer(…)	infer_input	infer_result

Mathematical formulation#

Training#

Let $X = {x_{1}, \dots, x_{n}}$ be the training set of $p$ -dimensional feature vectors, let $Y = {y_{1}, \dots, y_{n}}$ be the set of class labels, where $y_{i} \in {0, \dots, c - 1}$ , $1 \leq i \leq n$ . Given $X$ , $Y$ and the number of nearest neighbors $k$ , the problem is to build a model that allows distance computation between the feature vectors in training and inference sets at the inference stage.

Training method: brute-force#

The training operation produces the model that stores all the feature vectors from the initial training set $X$ .

Training method: k-d tree#

The training operation builds a $k$ - $d$ tree that partitions the training set $X$ (for more details, see k-d Tree).

Inference#

Let $X^{'} = {x_{1}^{'}, \dots, x_{m}^{'}}$ be the inference set of $p$ -dimensional feature vectors. Given $X^{'}$ , the model produced at the training stage and the number of nearest neighbors $k$ , the problem is to predict the label $y_{j}^{'}$ for each $x_{j}^{'}$ , $1 \leq j \leq m$ , by performing the following steps:

Identify the set $N (x_{j}^{'}) \subseteq X$ of the $k$ feature vectors in the training set that are nearest to $x_{j}^{'}$ with respect to the Euclidean distance.
Estimate the conditional probability for the $l$ -th class as the fraction of vectors in $N (x_{j}^{'})$ whose labels $y_{j}$ are equal to $l$ :

(1)# $P_{j l} = \frac{1}{| N (x_{j}^{'}) |} | {x_{r} \in N (x_{j}^{'}) : y_{r} = l} |, 1 \leq j \leq m, 0 \leq l < c .$
Predict the class that has the highest probability for the feature vector $x_{j}^{'}$ :

(2)# $y_{j}^{'} = \arg max_{0 \leq l < c} P_{j l}, 1 \leq j \leq m .$

Inference method: brute-force#

Brute-force inference method determines the set $N (x_{j}^{'})$ of the nearest feature vectors by iterating over all the pairs $(x_{j}^{'}, x_{i})$ in the implementation defined order, $1 \leq i \leq n$ , $1 \leq j \leq m$ . The final prediction is computed according to the equations (1) and (2).

Inference method: k-d tree#

K-d tree inference method traverses the $k$ - $d$ tree to find feature vectors associated with a leaf node that are closest to $x_{j}^{'}$ , $1 \leq j \leq m$ . The set $\tilde{n} (x_{j}^{'})$ of the currently-known nearest $k$ -th neighbors is progressively updated during tree traversal. The search algorithm limits exploration of the nodes for which the distance between the $x_{j}^{'}$ and respective part of the feature space is not less than the distance between $x_{j}^{'}$ and the most distant feature vector from $\tilde{n} (x_{j}^{'})$ . Once tree traversal is finished, $\tilde{n} (x_{j}^{'}) \equiv N (x_{j}^{'})$ . The final prediction is computed according to the equations (1) and (2).

Usage example#

Training#

knn::model<> run_training(const table& data,
                          const table& labels) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = train(knn_desc, data, labels);

   return result.get_model();
}

Inference#

table run_inference(const knn::model<>& model,
                    const table& new_data) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = infer(knn_desc, model, new_data);

   print_table("labels", result.get_labels());
}

Programming Interface#

All types and functions in this section shall be declared in the oneapi::dal::knn namespace and be available via inclusion of the oneapi/dal/algo/knn.hpp header file.

Descriptor#

template <typename Float = float,
          typename Method = method::by_default,
          typename Task = task::by_default>
class descriptor {
public:
   explicit descriptor(std::int64_t class_count,
                       std::int64_t neighbor_count);

   std::int64_t get_class_count() const;
   descriptor& set_class_count(std::int64_t);

   std::int64_t get_neighbor_count() const;
   descriptor& set_neighbor_count(std::int64_t);
};

template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor#

Template Parameters

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::bruteforce or method::kd_tree.
Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

descriptor(std::int64_t class_count, std::int64_t neighbor_count)#: Creates a new instance of the class with the given class_count and neighbor_count property values.

Properties

std::int64_t class_count#

The number of classes $c$ .

Getter & Setter: std::int64_t get_class_count() const

descriptor & set_class_count(std::int64_t)
Invariants: class_count > 1

std::int64_t neighbor_count#

The number of neighbors $k$ .

Getter & Setter: std::int64_t get_neighbor_count() const

descriptor & set_neighbor_count(std::int64_t)
Invariants: neighbor_count > 0

Method tags#

namespace method {
   struct bruteforce {};
   struct kd_tree {};
   using by_default = bruteforce;
} // namespace method

struct bruteforce#: Tag-type that denotes brute-force computational method.

struct kd_tree#: Tag-type that denotes k-d tree computational method.

using by_default = bruteforce #: Alias tag-type for brute-force computational method.

Task tags#

namespace task {
   struct classification {};
   using by_default = classification;
} // namespace task

struct classification#: Tag-type that parameterizes entities used for solving classification problem.

using by_default = classification #: Alias tag-type for classification task.

Model#

template <typename Task = task::by_default>
class model {
public:
   model();
};

template<typename Task = task::by_default> class model#

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

model()#: Creates a new instance of the class with the default property values.

Training train(...)#

Input#

template <typename Task = task::by_default>
class train_input {
public:
   train_input(const table& data = table{},
               const table& labels = table{});

   const table& get_data() const;
   train_input& set_data(const table&);

   const table& get_labels() const;
   train_input& set_labels(const table&);
};

template<typename Task = task::by_default> class train_input#

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

train_input(const table &data = table{}, const table &labels = table{})#: Creates a new instance of the class with the given data and labels property values.

Properties

const table &data#

The training set $X$ . Default value: table{}.

Getter & Setter: const table & get_data() const

train_input & set_data(const table &)

const table &labels#

Vector of labels $y$ for the training set $X$ . Default value: table{}.

Getter & Setter: const table & get_labels() const

train_input & set_labels(const table &)

Result#

template <typename Task = task::by_default>
class train_result {
public:
   train_result();

   const model<Task>& get_model() const;
};

template<typename Task = task::by_default> class train_result#

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

train_result()#: Creates a new instance of the class with the default property values.

Public Methods

const model<Task> &get_model() const#: The trained $k$ -NN model.

Operation#

template <typename Float, typename Method, typename Task>
train_result<Task> train(const descriptor<Float, Method, Task>& desc,
                         const train_input<Task>& input);

template<typename Float, typename Method, typename Task> train_result<Task> train(const descriptor<Float, Method, Task> &desc, const train_input<Task> &input)#

Runs the training operation for $k$ -NN classifier. For more details see oneapi::dal::train.

Template Parameters

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::bruteforce or method::kd_tree.
Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Parameters

desc – Descriptor of the algorithm.
input – Input data for the training operation.

Preconditions: input.data.has_data == true

input.labels.has_data == true

input.data.row_count == input.labels.row_count

input.labels.column_count == 1

input.labels[i] >= 0

input.labels[i] < desc.class_count

Inference infer(...)#

Input#

template <typename Task = task::by_default>
class infer_input {
public:
   infer_input(const model<Task>& m = model<Task>{},
               const table& data = table{});

   const model<Task>& get_model() const;
   infer_input& set_model(const model&);

   const table& get_data() const;
   infer_input& set_data(const table&);
};

template<typename Task = task::by_default> class infer_input#

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

infer_input(const model<Task> &m = model<Task>{}, const table &data = table{})#: Creates a new instance of the class with the given model and data property values.

Properties

const table &data#

The dataset for inference $X^{'}$ . Default value: table{}.

Getter & Setter: const table & get_data() const

infer_input & set_data(const table &)

const model<Task> &model#

The trained $k$ -NN model. Default value: model<Task>{}.

Getter & Setter: const model< Task > & get_model() const

infer_input & set_model(const model &)

Result#

template <typename Task = task::by_default>
class infer_result {
public:
   infer_result();

   const table& get_labels() const;
};

template<typename Task = task::by_default> class infer_result#

Template Parameters: Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Constructors

infer_result()#: Creates a new instance of the class with the default property values.

Public Methods

const table &get_labels() const#: The predicted labels.

Operation#

template <typename Float, typename Method, typename Task>
infer_result<Task> infer(const descriptor<Float, Method, Task>& desc,
                         const infer_input<Task>& input);

template<typename Float, typename Method, typename Task> infer_result<Task> infer(const descriptor<Float, Method, Task> &desc, const infer_input<Task> &input)#

Runs the inference operation for $k$ -NN classifier. For more details see oneapi::dal::infer.

Template Parameters

Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::bruteforce or method::kd_tree.
Task – Tag-type that specifies type of the problem to solve. Can be task::classification.

Parameters

desc – Descriptor of the algorithm.
input – Input data for the inference operation.

Preconditions: input.data.has_data == true
Postconditions: result.labels.row_count == input.data.row_count

result.labels.column_count == 1

result.labels[i] >= 0

result.labels[i] < desc.class_count

oneAPI Specification 1.2-provisional-rev-1 documentation

k-Nearest Neighbors Classification (k-NN)

Contents

k-Nearest Neighbors Classification (k-NN)#

Mathematical formulation#

Training#

Training method: brute-force#

Training method: k-d tree#

Inference#

Inference method: brute-force#

Inference method: k-d tree#

Usage example#

Training#

Inference#

Programming Interface#

Descriptor#

Method tags#

Task tags#

Model#

Training train(...)#

Input#

Result#

Operation#

Inference infer(...)#

Input#

Result#

Operation#