K-Means initialization¶
The K-Means initialization algorithm receives \(n\) feature vectors as input and chooses \(k\) initial centroids. After initialization, the K-Means algorithm uses these centroids as the starting point for partitioning the input data into \(k\) clusters.
Operation | Computational methods | Programming Interface
Computing | dense | compute(...), compute_input, compute_result
Mathematical formulation¶
Computing¶
Given the training set \(X = \{ x_1, \ldots, x_n \}\) of \(p\)-dimensional feature vectors and a positive integer \(k\), the problem is to find a set \(C = \{ c_1, \ldots, c_k \}\) of \(p\)-dimensional initial centroids.
Computing method: dense¶
The method chooses the first \(k\) feature vectors from the training set \(X\) as the initial centroids, that is, \(c_i = x_i\) for \(i = 1, \ldots, k\).
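The dense method is simple enough to sketch in plain C++ without the library. The block below is only an illustration under stated assumptions: the row-major layout, the names dense_matrix and init_dense, and the check that \(k\) does not exceed \(n\) are all hypothetical and are not part of the oneDAL interface.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper type: a row-major n x p matrix of feature vectors.
struct dense_matrix {
    std::size_t rows;          // n, the number of feature vectors
    std::size_t cols;          // p, the dimension of each feature vector
    std::vector<float> values; // values[i * cols + j] is feature j of vector i
};

// Sketch of the dense method: copy the first k rows of x into a k x p
// matrix of initial centroids (c_i = x_i for i = 1..k).
dense_matrix init_dense(const dense_matrix& x, std::size_t k) {
    // The storage must match the declared shape.
    assert(x.values.size() == x.rows * x.cols);

    // Assumption of this sketch: reject k == 0 and k > n.
    if (k == 0 || k > x.rows) {
        throw std::invalid_argument("k must satisfy 0 < k <= n");
    }

    dense_matrix centroids{ k, x.cols, {} };
    const auto count = static_cast<std::ptrdiff_t>(k * x.cols);
    centroids.values.assign(x.values.begin(), x.values.begin() + count);
    return centroids;
}
```

For a 3 x 2 input with rows (1, 2), (3, 4), (5, 6) and \(k = 2\), the sketch returns the rows (1, 2) and (3, 4).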
Usage example¶
Computing¶
table run_compute(const table& data) {
    // Configure the dense initialization method for 10 clusters.
    const auto kmeans_desc = kmeans_init::descriptor<float,
                                                     kmeans_init::method::dense>{}
                                 .set_cluster_count(10);

    // Run the computing operation to obtain the initial centroids.
    const auto result = compute(kmeans_desc, data);

    print_table("centroids", result.get_centroids());

    return result.get_centroids();
}
Programming Interface¶
All types and functions in this section shall be declared in the oneapi::dal::kmeans_init namespace and be available via inclusion of the oneapi/dal/algo/kmeans_init.hpp header file.
Descriptor¶
template <typename Float = float,
          typename Method = method::by_default,
          typename Task = task::by_default>
class descriptor {
public:
    explicit descriptor(std::int64_t cluster_count = 2);

    std::int64_t get_cluster_count() const;
    descriptor& set_cluster_count(std::int64_t);
};
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor¶
- Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the K-Means initialization algorithm.
Task – Tag-type that specifies the type of the problem to solve. Can be task::init.
Constructors
descriptor(std::int64_t cluster_count = 2)¶
Creates a new instance of the class with the given cluster_count.
Properties
std::int64_t cluster_count = 2¶
The number of clusters \(k\).
- Getter & Setter
std::int64_t get_cluster_count() const
descriptor & set_cluster_count(std::int64_t)
- Invariants
cluster_count > 0
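The getter, the chaining setter, and the invariant above can be illustrated with a small stand-alone class. This is a hedged sketch: toy_descriptor is a hypothetical stand-in, not the oneDAL descriptor, and throwing std::invalid_argument on a non-positive value is an assumption about how such an invariant might be enforced, not documented library behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in that mirrors the shape of the documented interface.
class toy_descriptor {
public:
    // Same default as the documented constructor: cluster_count = 2.
    explicit toy_descriptor(std::int64_t cluster_count = 2) {
        set_cluster_count(cluster_count);
    }

    std::int64_t get_cluster_count() const {
        // The invariant holds by construction; the assert documents it.
        assert(cluster_count_ > 0);
        return cluster_count_;
    }

    // Returns *this so that setters can be chained.
    toy_descriptor& set_cluster_count(std::int64_t value) {
        // Assumption of this sketch: reject values that break cluster_count > 0.
        if (value <= 0) {
            throw std::invalid_argument("cluster_count must be positive");
        }
        cluster_count_ = value;
        return *this;
    }

private:
    std::int64_t cluster_count_ = 2;
};
```

Returning a reference from the setter is what allows the chained form used in the usage example, where the descriptor is constructed and configured in a single expression.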
Computing compute(...)¶
Input¶
template <typename Task = task::by_default>
class compute_input {
public:
    compute_input(const table& data = table{});

    const table& get_data() const;
    compute_input& set_data(const table&);
};
template<typename Task = task::by_default>
class compute_input¶
- Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::init.
Constructors
compute_input(const table &data = table{})¶
Creates a new instance of the class with the given data.
Properties
const table & data = table{}¶
An \(n \times p\) table with the data to be clustered, where each row stores one feature vector.
- Getter & Setter
const table & get_data() const
compute_input & set_data(const table &)
Result¶
template <typename Task = task::by_default>
class compute_result {
public:
    compute_result();

    const table& get_centroids() const;
};
template<typename Task = task::by_default>
class compute_result¶
- Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::init.
Constructors
compute_result()¶
Creates a new instance of the class with the default property values.
Properties
const table & centroids = table{}¶
A \(k \times p\) table with the initial centroids. Each row of the table stores one centroid.
- Getter & Setter
const table & get_centroids() const
Operation¶
template<typename Float, typename Method, typename Task>
compute_result<Task> compute(const descriptor<Float, Method, Task> &desc, const compute_input<Task> &input)¶
Runs the computing operation for K-Means initialization. For more details, see oneapi::dal::compute.
- Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the K-Means initialization algorithm.
Task – Tag-type that specifies the type of the problem to solve. Can be task::init.
- Parameters
desc – The descriptor of the algorithm.
input – Input data for the computing operation.
- Preconditions
- Postconditions