-
Ground truth clustering is not the optimum clustering
Authors:
Lucia Absalom Bautista,
Timotej Hrga,
Janez Povh,
Shudian Zhao
Abstract:
The clustering of data is one of the most important and challenging topics in data science. The minimum sum-of-squares clustering (MSSC) problem asks to cluster the data points into $k$ clusters such that the sum of squared distances between the data points and their cluster centers (centroids) is minimized. This problem is NP-hard, but there exist exact solvers that can solve such problems to optimality for small- or medium-sized instances.
In this paper, we use a branch-and-bound solver based on semidefinite programming relaxations called SOS-SDP to compute the optimum solutions of the MSSC problem for various $k$ and for multiple datasets, with real and artificial data, for which the data provider has provided a ground truth clustering.
Next, we use several extrinsic and intrinsic measures to evaluate how well the optimum clustering and the ground truth clustering match, and how well these clusterings perform with respect to the criteria underlying the intrinsic measures. Our calculations show that the ground truth clusterings are generally far from the optimum solution to the MSSC problem. Moreover, the intrinsic measures evaluated on the ground truth clusterings are generally significantly worse than those evaluated on the optimum clusterings. However, when the ground truth clustering is in the form of convex sets, e.g., ellipsoids, that are well separated from each other, the ground truth clustering comes very close to the optimum clustering.
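As a rough illustration of the MSSC objective described in this abstract, the following sketch computes the sum of squared distances to cluster centroids for a given labeling and compares two labelings of the same points. The function name `mssc_objective`, the toy points, and both labelings are hypothetical and are not taken from the paper; the sketch only evaluates the objective and does not reproduce the SOS-SDP solver.

```python
def mssc_objective(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    # Group points by their cluster label.
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        # The centroid is the coordinate-wise mean of the cluster members.
        centroid = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        total += sum((p[j] - centroid[j]) ** 2 for p in members for j in range(dim))
    return total


# Hypothetical toy data: four points on a line.
points = [(0.0,), (1.0,), (2.0,), (10.0,)]

# Hypothetical "ground truth" labeling versus an alternative labeling.
given_labels = [0, 0, 1, 1]
alternative_labels = [0, 0, 0, 1]

given_cost = mssc_objective(points, given_labels)
alternative_cost = mssc_objective(points, alternative_labels)
print(given_cost, alternative_cost)
```

On this toy data the alternative labeling has a lower objective value than the given one, which is the kind of gap between a supplied labeling and an MSSC-better clustering that the abstract reports.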
Submitted 22 May, 2023;
originally announced May 2023.
-
The precontraction group of the field of logarithmic transseries $\mathbb{T}_{\log}$
Authors:
José Leonardo Ángel Bautista
Abstract:
As a first step toward understanding the theory of the structure $\mathbb{T}_{\log}$ of logarithmic transseries as an ordered valued logarithmic field, we focus on the map $\chi$ induced by the logarithm of $\mathbb{T}_{\log}$ on its value group $\Gamma_{\log}$ and study the theory of the precontraction group $(\Gamma_{\log},\chi)$. In particular, we show that this theory is model complete and complete, and we characterize all definable subsets of the discrete set $\chi(\Gamma_{\log})$.
Submitted 28 March, 2019;
originally announced March 2019.
-
The Subalgebra Structure of the Cayley-Dickson Algebra of Dimension 32 (trigintaduonion)
Authors:
Raoul E. Cawagas,
Alexander S. Carrascal,
Lincoln A. Bautista,
John P. Sta. Maria,
Jackie D. Urrutia,
Bernadeth Nobles
Abstract:
The Cayley-Dickson algebras R (real numbers), C (complex numbers), H (quaternions), O (octonions), S (sedenions), and T (trigintaduonions) have attracted the attention of several mathematicians and physicists because of their important applications in both pure mathematics and theoretical physics. This paper deals with the determination of the subalgebra structure of the algebra T by analyzing the loop T_L of order 64 generated by its 32 basis elements. The analysis shows that T_L is a non-associative finite invertible loop (NAFIL) with 373 non-trivial subloops of orders 32, 16, 8, 4, and 2, all of which are normal. These subloops generate subalgebras of T of dimensions 16, 8, 4, 2, and 1, which form the elements of its structure.
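To make the loop T_L concrete, here is a minimal sketch of the Cayley-Dickson doubling construction in dimension 32. It assumes one common sign convention, $(a,b)(c,d) = (ac - \bar{d}b,\; da + b\bar{c})$, which may differ from the convention used in the paper, and the helper names (`conj`, `mul`, `basis`, `is_signed_basis`) are hypothetical. The sketch checks that every product of two basis elements is again plus or minus a basis element, so that the 64 signed basis elements are closed under multiplication, and that multiplication is not associative. It does not attempt to enumerate the subloops.

```python
def conj(x):
    """Cayley-Dickson conjugate: keep the real part, negate every other component."""
    return [x[0]] + [-v for v in x[1:]]


def mul(x, y):
    """Multiply two elements given as coefficient lists of equal length 2**n.

    Uses the doubling rule (a, b)(c, d) = (ac - conj(d) b, d a + b conj(c)).
    This sign convention is an assumption; other equivalent conventions exist.
    """
    n = len(x)
    if n == 1:
        return [x[0] * y[0]]
    h = n // 2
    a, b = x[:h], x[h:]
    c, d = y[:h], y[h:]
    left = [p - q for p, q in zip(mul(a, c), mul(conj(d), b))]
    right = [p + q for p, q in zip(mul(d, a), mul(b, conj(c)))]
    return left + right


def basis(i, dim=32):
    """Return the i-th basis element e_i as an integer coefficient list."""
    e = [0] * dim
    e[i] = 1
    return e


def is_signed_basis(v):
    """True if v equals +e_i or -e_i for some index i."""
    nonzero = [c for c in v if c != 0]
    return len(nonzero) == 1 and abs(nonzero[0]) == 1


DIM = 32
units = [basis(i, DIM) for i in range(DIM)]

# Closure: each product e_i e_j is +/- e_k, so the 64 signed units are closed.
closed = all(is_signed_basis(mul(u, v)) for u in units for v in units)

# Non-associativity: compare (e1 e2) e4 with e1 (e2 e4).
lhs = mul(mul(units[1], units[2]), units[4])
rhs = mul(units[1], mul(units[2], units[4]))
print(closed, lhs != rhs)
```

Integer coefficients keep every comparison exact, which matters because the checks rely on equality of coefficient lists.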
Submitted 1 September, 2009; v1 submitted 12 July, 2009;
originally announced July 2009.