k-Nearest Neighbors Classification (k-NN)¶
Operation | Computational methods | Programming Interface
Mathematical formulation¶
Training¶
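Let $X = \{x_1, \ldots, x_n\}$ be the training set of $p$-dimensional feature vectors, and let $Y = \{y_1, \ldots, y_n\}$ be the set of class labels, where $y_i \in \{0, \ldots, c - 1\}$, $1 \le i \le n$. Given $X$, $Y$, and the number of nearest neighbors $k$, the problem is to build a model that allows distance computation between the feature vectors in the training and inference sets at the inference stage.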
Training method: brute-force¶
The training operation produces the model that stores all the feature vectors
from the initial training set.
Inference¶
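Let $X' = \{x'_1, \ldots, x'_m\}$ be the inference set of $p$-dimensional feature vectors. Given $X'$, the model produced at the training stage, and the number of nearest neighbors $k$, the problem is to predict the label $y'_j$ for each $x'_j$, $1 \le j \le m$, by performing the following steps:
1. Identify the set $N(x'_j)$ of the $k$ feature vectors in the training set that are nearest to $x'_j$ with respect to the Euclidean distance.
2. Estimate the conditional probability for the $l$-th class as the fraction of vectors in $N(x'_j)$ whose labels are equal to $l$:
$$P_{jl} = \frac{1}{|N(x'_j)|} \Big| \big\{ x_r \in N(x'_j) : y_r = l \big\} \Big|, \quad 1 \le j \le m, \; 0 \le l < c, \tag{1}$$
where $y_r$ is the label of the training vector $x_r$ and $c$ is the number of classes.
3. Predict the class that has the highest probability for the feature vector $x'_j$:
$$y'_j = \mathrm{arg}\max_{0 \le l < c} P_{jl}, \quad 1 \le j \le m. \tag{2}$$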
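The steps above can be sketched in plain C++ for a single inference vector. This is a minimal illustration using only the standard library, not the oneDAL implementation: the names `point`, `squared_distance`, `class_probabilities`, and `predict_label` are hypothetical, and ties between equally probable classes are resolved in favor of the lowest class index, which is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <numeric>
#include <vector>

using point = std::vector<double>;  // hypothetical alias for a feature vector

// Squared Euclidean distance. The square root is monotone, so it can be
// skipped when only the ordering of distances matters.
double squared_distance(const point& a, const point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Equation (1): fraction of the k nearest training vectors in each class.
std::vector<double> class_probabilities(const std::vector<point>& train_data,
                                        const std::vector<std::int64_t>& train_labels,
                                        const point& query,
                                        std::int64_t class_count,
                                        std::int64_t neighbor_count) {
    assert(!train_data.empty() && train_data.size() == train_labels.size());
    assert(class_count > 1 && neighbor_count > 0);
    const std::size_t k =
        std::min<std::size_t>(static_cast<std::size_t>(neighbor_count), train_data.size());

    // Order training indices so that the k nearest to the query come first.
    std::vector<std::size_t> order(train_data.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::partial_sort(order.begin(), order.begin() + k, order.end(),
                      [&](std::size_t i, std::size_t j) {
                          return squared_distance(train_data[i], query) <
                                 squared_distance(train_data[j], query);
                      });

    // Count labels among the k nearest neighbors, then normalize by k.
    std::vector<std::size_t> counts(static_cast<std::size_t>(class_count), 0);
    for (std::size_t r = 0; r < k; ++r) {
        const std::int64_t label = train_labels[order[r]];
        assert(label >= 0 && label < class_count);
        ++counts[static_cast<std::size_t>(label)];
    }
    std::vector<double> prob(counts.size());
    for (std::size_t l = 0; l < counts.size(); ++l) {
        prob[l] = static_cast<double>(counts[l]) / static_cast<double>(k);
    }
    return prob;
}

// Equation (2): the class with the highest estimated probability.
std::int64_t predict_label(const std::vector<double>& prob) {
    return std::distance(prob.begin(), std::max_element(prob.begin(), prob.end()));
}
```

Calling `class_probabilities` once per inference vector and passing the result to `predict_label` yields one predicted label per row of the inference set.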
Inference method: brute-force¶
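The brute-force inference method determines the set of nearest feature vectors for each inference vector by iterating over all pairs of inference and training feature vectors in an implementation-defined order. The final prediction is computed according to equations (1) and (2).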
Inference method: k-d tree¶
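The k-d tree inference method traverses the k-d tree to find the training feature vectors that are closest to each inference vector. The set of currently known nearest neighbors is progressively updated during the traversal, and the search skips nodes whose region of the feature space is no closer to the inference vector than the most distant current neighbor. The final prediction is computed according to equations (1) and (2).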
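To make the traversal and pruning idea concrete, here is a small standard-library sketch of a k-d tree nearest-neighbor search. It is an illustration under stated assumptions rather than the library's algorithm: the tree stores one training vector per node and splits at the median, cycling through dimensions, and every name in it (`kd_node`, `build`, `search`, `k_nearest`) is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

using point = std::vector<double>;  // hypothetical alias for a feature vector

struct kd_node {
    std::size_t index = 0;  // index of the training vector stored at this node
    std::size_t axis = 0;   // splitting dimension
    std::unique_ptr<kd_node> left, right;
};

double squared_distance(const point& a, const point& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
}

// Builds a tree over ids[first, last) by splitting at the median of one axis.
std::unique_ptr<kd_node> build(const std::vector<point>& data, std::vector<std::size_t>& ids,
                               std::size_t first, std::size_t last, std::size_t depth) {
    if (first >= last) return nullptr;
    const std::size_t axis = depth % data[0].size();
    const std::size_t mid = first + (last - first) / 2;
    std::nth_element(ids.begin() + first, ids.begin() + mid, ids.begin() + last,
                     [&](std::size_t a, std::size_t b) { return data[a][axis] < data[b][axis]; });
    auto node = std::make_unique<kd_node>();
    node->index = ids[mid];
    node->axis = axis;
    node->left = build(data, ids, first, mid, depth + 1);
    node->right = build(data, ids, mid + 1, last, depth + 1);
    return node;
}

// Max-heap of (squared distance, index): top() is the farthest current candidate.
using candidate_heap = std::priority_queue<std::pair<double, std::size_t>>;

void search(const kd_node* node, const std::vector<point>& data, const point& query,
            std::size_t k, candidate_heap& best) {
    if (node == nullptr) return;
    const double dist = squared_distance(data[node->index], query);
    if (best.size() < k) {
        best.emplace(dist, node->index);
    } else if (dist < best.top().first) {
        best.pop();
        best.emplace(dist, node->index);
    }
    const double delta = query[node->axis] - data[node->index][node->axis];
    const kd_node* near_side = delta < 0 ? node->left.get() : node->right.get();
    const kd_node* far_side = delta < 0 ? node->right.get() : node->left.get();
    search(near_side, data, query, k, best);
    // Visit the far side only if fewer than k candidates are known or the
    // splitting plane is closer than the farthest current candidate.
    if (best.size() < k || delta * delta < best.top().first) {
        search(far_side, data, query, k, best);
    }
}

// Returns the sorted indices of the k training vectors nearest to the query.
std::vector<std::size_t> k_nearest(const std::vector<point>& data, const point& query,
                                   std::size_t k) {
    assert(!data.empty() && k > 0);
    std::vector<std::size_t> ids(data.size());
    std::iota(ids.begin(), ids.end(), std::size_t{0});
    const auto root = build(data, ids, 0, ids.size(), 0);
    candidate_heap best;
    search(root.get(), data, query, k, best);
    std::vector<std::size_t> result;
    for (; !best.empty(); best.pop()) result.push_back(best.top().second);
    std::sort(result.begin(), result.end());
    return result;
}
```

The max-heap keeps the most distant current neighbor at the top, so the pruning test is a single comparison against `best.top()`. The indices returned by `k_nearest` can then be fed into the label counting of equations (1) and (2).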
Usage example¶
Training¶
knn::model<> run_training(const table& data,
                          const table& labels) {
    const std::int64_t class_count = 10;
    const std::int64_t neighbor_count = 5;
    const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};
    const auto result = train(knn_desc, data, labels);
    return result.get_model();
}
Inference¶
table run_inference(const knn::model<>& model,
                    const table& new_data) {
    const std::int64_t class_count = 10;
    const std::int64_t neighbor_count = 5;
    const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};
    const auto result = infer(knn_desc, model, new_data);
    print_table("labels", result.get_labels());
    // The declared return type is table, so return the predicted labels.
    return result.get_labels();
}
Programming Interface¶
All types and functions in this section shall be declared in the oneapi::dal::knn namespace and be available via inclusion of the oneapi/dal/algo/knn.hpp header file.
Descriptor¶
template <typename Float = float,
          typename Method = method::by_default,
          typename Task = task::by_default>
class descriptor {
public:
    explicit descriptor(std::int64_t class_count,
                        std::int64_t neighbor_count);
    std::int64_t get_class_count() const;
    descriptor& set_class_count(std::int64_t);
    std::int64_t get_neighbor_count() const;
    descriptor& set_neighbor_count(std::int64_t);
};
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor¶
- Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the algorithm. Can be method::bruteforce or method::kd_tree.
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification.
Constructors
descriptor(std::int64_t class_count, std::int64_t neighbor_count)¶
Creates a new instance of the class with the given class_count and neighbor_count property values.
Properties
std::int64_t neighbor_count¶
The number of neighbors $k$.
- Getter & Setter
std::int64_t get_neighbor_count() const
descriptor& set_neighbor_count(std::int64_t)
- Invariants
- neighbor_count > 0
std::int64_t class_count¶
The number of classes $c$.
- Getter & Setter
std::int64_t get_class_count() const
descriptor& set_class_count(std::int64_t)
- Invariants
- class_count > 1
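The getter and setter pairs and their invariants can be illustrated with a small stand-in class. This is a sketch only: `toy_descriptor` is a hypothetical name, and throwing `std::invalid_argument` on a violated invariant is an assumption of the example, since this section does not state how violations are reported.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the descriptor, showing chained setters that
// enforce the documented invariants (class_count > 1, neighbor_count > 0).
class toy_descriptor {
public:
    explicit toy_descriptor(std::int64_t class_count, std::int64_t neighbor_count) {
        set_class_count(class_count);
        set_neighbor_count(neighbor_count);
    }

    std::int64_t get_class_count() const { return class_count_; }
    toy_descriptor& set_class_count(std::int64_t value) {
        // Assumption: report invariant violations by throwing.
        if (value <= 1) throw std::invalid_argument("class_count must be > 1");
        class_count_ = value;
        return *this;  // returning *this allows setter chaining
    }

    std::int64_t get_neighbor_count() const { return neighbor_count_; }
    toy_descriptor& set_neighbor_count(std::int64_t value) {
        if (value <= 0) throw std::invalid_argument("neighbor_count must be > 0");
        neighbor_count_ = value;
        return *this;
    }

private:
    std::int64_t class_count_ = 2;
    std::int64_t neighbor_count_ = 1;
};
```

Because each setter returns a reference to the object, calls can be chained, for example `toy_descriptor{10, 5}.set_neighbor_count(3).set_class_count(4)`.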
Method tags¶
namespace method {
struct bruteforce {};
struct kd_tree {};
using by_default = bruteforce;
} // namespace method
struct bruteforce¶
Tag-type that denotes the brute-force computational method.
using by_default = bruteforce¶
Alias tag-type for the brute-force computational method.
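Tag types like these carry no data; they exist so that an implementation can be chosen at compile time. The sketch below shows one common way to do that, overload resolution on the tag. It mirrors the tag declarations from this section, but `toy_descriptor`, `method_name`, and `selected_method` are hypothetical names introduced for illustration, not part of the interface.

```cpp
#include <string>
#include <type_traits>

namespace method {
struct bruteforce {};
struct kd_tree {};
using by_default = bruteforce;
} // namespace method

// Hypothetical descriptor that only records the method tag in its type.
template <typename Method = method::by_default>
struct toy_descriptor {};

// One overload per tag; the compiler picks the match from the argument type.
inline std::string method_name(method::bruteforce) { return "bruteforce"; }
inline std::string method_name(method::kd_tree) { return "kd_tree"; }

// Dispatches on the Method template argument of the descriptor.
template <typename Method>
std::string selected_method(const toy_descriptor<Method>&) {
    return method_name(Method{});
}

// The default tag is an alias, so both names refer to the same type.
static_assert(std::is_same<method::by_default, method::bruteforce>::value,
              "by_default aliases bruteforce");
```

With this arrangement, `selected_method(toy_descriptor<>{})` resolves to the brute-force overload, while `selected_method(toy_descriptor<method::kd_tree>{})` resolves to the k-d tree overload, with no runtime branching.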
Task tags¶
namespace task {
struct classification {};
using by_default = classification;
} // namespace task
struct classification¶
Tag-type that parameterizes entities used for solving the classification problem.
using by_default = classification¶
Alias tag-type for the classification task.
Model¶
template <typename Task = task::by_default>
class model {
public:
    model();
};
template<typename Task = task::by_default>
class model¶
- Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification.
Constructors
model()¶
Creates a new instance of the class with the default property values.
Training train(...)¶
Input¶
template <typename Task = task::by_default>
class train_input {
public:
    train_input(const table& data = table{},
                const table& labels = table{});
    const table& get_data() const;
    train_input& set_data(const table&);
    const table& get_labels() const;
    train_input& set_labels(const table&);
};
template<typename Task = task::by_default>
class train_input¶
- Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification.
Constructors
train_input(const table &data = table{}, const table &labels = table{})¶
Creates a new instance of the class with the given data and labels property values.
Properties
Result¶
template <typename Task = task::by_default>
class train_result {
public:
    train_result();
    const model<Task>& get_model() const;
};
template<typename Task = task::by_default>
class train_result¶
- Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification.
Constructors
train_result()¶
Creates a new instance of the class with the default property values.
Public Methods
Operation¶
template <typename Float, typename Method, typename Task>
train_result<Task> train(const descriptor<Float, Method, Task>& desc,
                         const train_input<Task>& input);
template<typename Float, typename Method, typename Task>
train_result<Task> train(const descriptor<Float, Method, Task> &desc, const train_input<Task> &input)¶
Runs the training operation for the k-NN classifier. For more details, see oneapi::dal::train.
- Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the algorithm. Can be method::bruteforce or method::kd_tree.
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification.
- Parameters
desc – Descriptor of the algorithm.
input – Input data for the training operation.
- Preconditions
Inference infer(...)¶
Input¶
template <typename Task = task::by_default>
class infer_input {
public:
    infer_input(const model<Task>& m = model<Task>{},
                const table& data = table{});
    const model<Task>& get_model() const;
    infer_input& set_model(const model<Task>&);
    const table& get_data() const;
    infer_input& set_data(const table&);
};
template<typename Task = task::by_default>
class infer_input¶
- Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification.
Constructors
infer_input(const model<Task> &m = model<Task>{}, const table &data = table{})¶
Creates a new instance of the class with the given model and data property values.
Properties
Result¶
template <typename Task = task::by_default>
class infer_result {
public:
    infer_result();
    const table& get_labels() const;
};
template<typename Task = task::by_default>
class infer_result¶
- Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification.
Constructors
infer_result()¶
Creates a new instance of the class with the default property values.
Public Methods
Operation¶
template <typename Float, typename Method, typename Task>
infer_result<Task> infer(const descriptor<Float, Method, Task>& desc,
                         const infer_input<Task>& input);
template<typename Float, typename Method, typename Task>
infer_result<Task> infer(const descriptor<Float, Method, Task> &desc, const infer_input<Task> &input)¶
Runs the inference operation for the k-NN classifier. For more details, see oneapi::dal::infer.
- Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the algorithm. Can be method::bruteforce or method::kd_tree.
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification.
- Parameters
desc – Descriptor of the algorithm.
input – Input data for the inference operation.
- Preconditions
- input.data.has_data == true
- Postconditions