K-Means initialization
The K-Means initialization algorithm receives \(n\) feature vectors as input and chooses \(k\) initial centroids. After initialization, the K-Means algorithm uses these centroids as its starting point to partition the input data into \(k\) clusters.
Mathematical formulation
Computing
Given the training set \(X = \{ x_1, \ldots, x_n \}\) of \(p\)-dimensional feature vectors and a positive integer \(k\), the problem is to find a set \(C = \{ c_1, \ldots, c_k \}\) of \(p\)-dimensional initial centroids.
Computing method: dense
The method chooses the first \(k\) feature vectors of the training set \(X\) as the initial centroids: \(c_i = x_i\) for \(i = 1, \ldots, k\).
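As an illustration with made-up values (\(n = 3\), \(p = 2\), \(k = 2\)), the dense method keeps the first two feature vectors and ignores the rest:

```latex
\[
X = \{ (1, 2),\ (3, 4),\ (5, 6) \},\quad k = 2
\;\Longrightarrow\;
C = \{ c_1, c_2 \} = \{ x_1, x_2 \} = \{ (1, 2),\ (3, 4) \}.
\]
```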
Usage example
Computing
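#include "oneapi/dal/algo/kmeans_init.hpp"
using namespace oneapi::dal;

// print_table is a helper assumed to be defined elsewhere; it is not part of this API.
table run_compute(const table& data) {
    // Request 10 initial centroids chosen by the dense method
    const auto kmeans_desc = kmeans_init::descriptor<float,
                                                     kmeans_init::method::dense>{}
        .set_cluster_count(10);

    const auto result = compute(kmeans_desc, data);

    print_table("centroids", result.get_centroids());
    return result.get_centroids();
}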
Programming Interface
All types and functions in this section shall be declared in the oneapi::dal::kmeans_init namespace and be available via inclusion of the oneapi/dal/algo/kmeans_init.hpp header file.
Descriptor
template <typename Float = float,
typename Method = method::by_default,
typename Task = task::by_default>
class descriptor {
public:
explicit descriptor(std::int64_t cluster_count = 2);
std::int64_t get_cluster_count() const;
descriptor& set_cluster_count(std::int64_t);
};
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor
- Template Parameters:
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the K-Means initialization algorithm.
Task – Tag-type that specifies the type of the problem to solve. Can be task::init.
Constructors
descriptor(std::int64_t cluster_count = 2)
Creates a new instance of the class with the given cluster_count.
Properties
std::int64_t cluster_count
The number of clusters \(k\). Default value: 2.
- Getter & Setter
std::int64_t get_cluster_count() const
descriptor & set_cluster_count(std::int64_t)
- Invariants
- cluster_count > 0
Computing compute(...)
Input
template <typename Task = task::by_default>
class compute_input {
public:
compute_input(const table& data = table{});
const table& get_data() const;
compute_input& set_data(const table&);
};
Result
template <typename Task = task::by_default>
class compute_result {
public:
compute_result();
const table& get_centroids() const;
};
template<typename Task = task::by_default>
class compute_result
- Template Parameters:
Task – Tag-type that specifies the type of the problem to solve. Can be task::init.
Constructors
compute_result()
Creates a new instance of the class with the default property values.
Public Methods
Operation
template <typename Float, typename Method, typename Task>
compute_result<Task> compute(const descriptor<Float, Method, Task>& desc,
const compute_input<Task>& input);
template<typename Float, typename Method, typename Task>
compute_result<Task> compute(const descriptor<Float, Method, Task> &desc, const compute_input<Task> &input)
Runs the computing operation for K-Means initialization. For more details, see oneapi::dal::compute.
- Template Parameters:
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the K-Means initialization algorithm.
Task – Tag-type that specifies the type of the problem to solve. Can be task::init.
- Parameters:
desc – The descriptor of the algorithm.
input – Input data for the computing operation.
- Preconditions
- Postconditions
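To show how the descriptor, input, result, and operation fit together, here is a self-contained, hypothetical miniature of the interface in plain C++ with no oneDAL dependency. It is not the oneDAL implementation. Everything lives in a made-up sketch namespace; table is an assumed row-major stand-in for oneapi::dal::table; init_centroids is a hypothetical helper; and throwing std::invalid_argument when cluster_count > 0 is violated or when \(k > n\) is an assumption, since this page does not say how violations are reported. The Method tag type selects the dense implementation through overload resolution.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical miniature of the kmeans_init interface; not oneDAL code.
namespace sketch {

namespace method {
struct dense {};  // tag type: "take the first k rows"
using by_default = dense;
}  // namespace method

namespace task {
struct init {};
using by_default = init;
}  // namespace task

// Assumed stand-in for oneapi::dal::table: a row-major matrix of floats.
struct table {
    std::int64_t row_count = 0;
    std::int64_t column_count = 0;
    std::vector<float> values;
};

template <typename Float = float, typename Method = method::by_default,
          typename Task = task::by_default>
class descriptor {
public:
    explicit descriptor(std::int64_t cluster_count = 2) {
        set_cluster_count(cluster_count);
    }
    std::int64_t get_cluster_count() const { return cluster_count_; }

    // Enforces the invariant cluster_count > 0 (throwing is an assumption).
    descriptor& set_cluster_count(std::int64_t value) {
        if (value <= 0) {
            throw std::invalid_argument("cluster_count must be positive");
        }
        cluster_count_ = value;
        return *this;
    }

private:
    std::int64_t cluster_count_ = 2;
};

template <typename Task = task::by_default>
struct compute_input {
    table data;
};

template <typename Task = task::by_default>
struct compute_result {
    table centroids;
};

// Dense method: copy the first k rows of the row-major data as centroids.
inline table init_centroids(const table& data, std::int64_t k, method::dense) {
    if (k > data.row_count) {
        throw std::invalid_argument("cluster_count exceeds the number of rows");
    }
    table centroids;
    centroids.row_count = k;
    centroids.column_count = data.column_count;
    centroids.values.assign(data.values.begin(),
                            data.values.begin() + k * data.column_count);
    return centroids;
}

// The Method tag object carries no data; it only selects an overload.
template <typename Float, typename Method, typename Task>
compute_result<Task> compute(const descriptor<Float, Method, Task>& desc,
                             const compute_input<Task>& input) {
    compute_result<Task> result;
    result.centroids =
        init_centroids(input.data, desc.get_cluster_count(), Method{});
    return result;
}

}  // namespace sketch
```

With this layout, supporting another method would only require a new tag type and another init_centroids overload; the compute template itself would stay unchanged.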