Publishing House SB RAS:

Publishing House SB RAS:

Address of the Publishing House SB RAS:
Morskoy pr. 2, 630090 Novosibirsk, Russia



Advanced Search

Numerical Analysis and Applications

2019 year, number 2

Exact algorithms of searching for the largest size cluster in two integer 2-clustering problems

A.V. Kel'manov1,2, A.V. Panasenko1,2, V.I. Khandeev1,2
1Sobolev Institute of Mathematics, Novosibirsk, Russia, 630090
2Novosibirsk State University, Novosibirsk, Russia, 630090
Keywords: евклидово пространство, 2-кластеризация, наибольшее подмножество, NP-трудность, целочисленная задача, псевдополиномиальная разрешимость, Euclidean space, 2-clustering, largest subset, NP-hardness, exact algorithm, pseudo- polynomial-time solvability

Abstract

We consider two related discrete optimization problems of searching for a subset in a finite set of points in the Euclidean space. Both problems are induced by the versions of the fundamental problem in data analysis, namely, by selecting a subset of similar elements in a set of objects. In each problem, an input set and a positive real number are given, and it is required to find a cluster (i.e., a subset) of the largest size under constraints on the value of a quadratic clusterization function. The points in the input set which are outside the sought for subset are treated as the second (complementary) cluster. In the first problem, the function under the constraint is the sum over both clusters of the intracluster sums of the squared distances between the elements of the clusters and their centers. The center of the first (i.e., the sought) cluster is unknown and determined as the centroid, while the center of the second one is fixed at a given point in the Euclidean space (without loss of generality in the origin). In the second problem, the function under the constraint is the sum over both clusters of the weighted intracluster sums of the squared distances between the elements of the clusters and their centers. As in the first problem, the center of the first cluster is unknown and determined as the centroid, while the center of the second one is fixed in the origin. In this paper, we show that both problems are strongly NP-hard. Also, we present the exact algorithms for the cases of these problems in which the input points have integer components. If the space dimension is bounded by some constant, the algorithms are pseudopolynomial.