In this paper, a new algorithm for solving k center problem is proposed based on bouncing bubble algorithm which is initially introduced by the author for solving minimal enclosing ball problem. A O(kn) heuristic algorithm is proposed as a substitution of legendary farthest point algorithm. Although approximation of k center problem with factor less than 2 is known to be NP hard thus the farthest point clustering algorithm was considered to be the best approximation possible, the proposed algorithm produces solutions much better than those produced by FPC algorithm.
Abstract
In this paper, a new algorithm for solving k center problem is proposed based on bouncing bubble algorithm which is initially introduced by the author for solving minimal enclosing ball problem. A O(kn) heuristic algorithm is proposed as a substitution of legendary farthest point algorithm. Although approximation of k center problem with factor less than 2 is known to be NP hard thus the farthest point clustering algorithm was considered to be the best approximation possible, the proposed algorithm produces solutions much better than those produced by FPC algorithm.
1. Introduction
k-center problem is originated from facility location problem, which is to find a few candidate facility location to minimize cost of transportation or construction etc. However, it can be generalized and applied to machine learning, data mining etc.
This problem has different variants with different object functions or using varied metrics. In this paper, we discuss the minmax radius clustering problem, which is to find k centers with the maximal distance of a point to its center minimized. In this problem, the centers do not need to belong to the data set and the distance mentioned here is in Euclidean metrics. A formal definition of this problem is:
Given a point set [Abbildung in dieser Leseprobe nicht enthalten] so that is minimized.
There exist efficient exact and approximation solvers to solve the special case when k=1, which is called as minimal enclosing ball (MEB) problem [1][2][3][4][5][6][7]. While it's already proved that exact solution of k-center problem is NP complete[8][9]. The best known algorithms runs in time [Abbildung in dieser Leseprobe nicht enthalten][10][11]. And it's also proved even ϵ approximation to k-center problem is also proved to be NP hard when ϵ is small.
Because of the NP hardness, solution to k center problem when k is large is usually obtained with const approximation factor algorithm. One option is the Farthest Point Clustering algorithm (FPC), which has const approximation factor of 2 and was considered as the best approximation one can have[12].
The author proposed a simple approximate algorithm called bouncing bubble algorithm for solving MEB problem[7] and it's natural to extend it to solve k-center problem. The result of the study is an algorithm called Bouncing Bubbles Algorithm (BBA). The name is inherited from Bouncing Bubble algorithm and is also a salute to Scott Aaronson who used physical bubbles to solve Steiner Tree problem[13].
Although the approximate factor α=2 is proved to be the best possible unless NP=P, solutions much better than those produced by FPC are constantly observed in the experimental results while the reason remains unknown. In this paper, the detail of the algorithm and experimental results are presented for further investigation.
This paper is organized in this way: the background of the problem and related work will be introduced in section 2. Section 3 give a brief introduction on Bouncing Bubble algorithm proposed by the author for solving the MEB problem. Section 4 will present the Bouncing Bubbles algorithm. Section 5 shows experimental results and finally conclusions will be given in section 6.
2. Related Works
Finding an exact solution to k center problem is NP hard. And it's proved that even approximation to the solution with small approximation factor is NP hard. Denote α as approximation factor. Gonzalez proved α <1.732 for dimension 2 is NP hard and α <2 for dimension ≥ 3 is NP hard [12]. Feder et al improved the bound for dimension 2 to α <1.822. So that we have α <1.822 for dimension 2 and α <2 for dimension ≥ 3 is NP hard[14].
There are several algorithms providing 1+ϵ approximation, where ϵ is arbitrary. Aggarwal and Procopiuc [15] proposed a 1+ϵ approximate algorithm which solves clustering in 2 and 3 dimensions. It uses a dynamic programming approach to solve the problem and run in time [Abbildung in dieser Leseprobe nicht enthalten]. Another algorithm proposed by Bădoiu base on computing of core-set [16]. The algorithm runs in time [Abbildung in dieser Leseprobe nicht enthalten]. Kumar proposed an algorithm also based on core-set which runs in and claim that the running time is much less than the worst case and thus it's possible to solve some problem when k is small (say k<5). As an example, the following shows how the core-set based solution works.
Bădoiu 's Core set based 1+ϵ approximation
Core set was introduced by Bădoiu et al [16]. The basic idea of core set is that there exist a small subset of original data set that the 1+ϵ expansion of its MEB covers the original data set. The process to find the core set can be detailed as follows:
1. Pick a point x from P, search a point y in P, which has the largest distance from x;
2. Search a point z in P, which has the largest distance from y. Initial core set as {y,z} and set an initial ball B, with its centre as the midpoint of y and z, the radius as half of the distance between y and z;
3. If all points in P is within ball (1+ϵ) B, then we get a bounding 1+ϵ approximation. Otherwise, let p be the farthest point from the centre of B, add p into core set; Compute a new ball B to be MEB of the core set. Repeat step 3 until 1+ϵ approximation is obtained.
In step 3, SOCP (Second Order Cone Programming) is used to obtain MEB of core set.
In the same paper, Bădoiu showed a straightforward approach to extend the core-set method to k-center problem: exhaustively enumerate all possibilities when deciding which cluster a new joined point should belong to. This lead to an exponential time complexity of [Abbildung in dieser Leseprobe nicht enthalten].
Several approximations runs in polynomial time with α = 2 has been reported. Hochbaum and Shmoys proposed a algorithm with time complexity of O(n2log n) [12][17]. A greedy approximate algorithm was proposed independently by Gonzalez with computational complexity of O(nk)[12]. Feder and Green improved the time complexity to O(n log(k)) with box decomposition technique [14].
Since Feder's solution produce the same result as Gonzalez's but is relatively more complex, hereby we shows the idea with Gonzalez's farthest point clustering algorithm.
Gonzalez's farthest point clustering algorithm
1. Pick a point x from P, add x into S.
2. Search a point y in P, which has the largest distance from S; add y into S;
3. Repeat 2 until | S |=k.
In step 2, distance between y and S is the minimum distance from y to each point in S.
3. Preliminary
In the author's previous work, a serial of approximate algorithms to MEB problem called bouncing bubble are proposed. A bubble is basically a ball that the optimum covers at least half of it therefore the ball is always smaller than the optimum. Formal definition is as follows:
Denote B(C,r) as a ball with C as its center and r as its radius, denote B(C *,r*) as MEB. A ball B(C,r) is called a bubble if [Abbildung in dieser Leseprobe nicht enthalten].
The idea of bouncing bubble algorithm is that one can always construct a bigger bubble when a outside point is encountered until it is close enough to the optimum.
Denote C i as the center of a bubble Bi, ri as the radius of [Abbildung in dieser Leseprobe nicht enthalten] Denote s as [Abbildung in dieser Leseprobe nicht enthalten]. The construction of a bigger bubble follows the following formula
Abbildung in dieser Leseprobe nicht enthalten
This iteration moves the center of the bubble towards the point p and enlarge the ball at the same time.
An iteration over the whole data set can be described as follows:
Abbildung in dieser Leseprobe nicht enthalten
Where "Ritter's enclosing approach" refers to the approach proposed by Ritter[5] for a coarse approximate of minimal enclosing ball problem. It's similar with our iteration except that when it move the center of the ball towards the point p, the radius is increased by half of the distance so that the new ball will cover both the new point and the previous ball. Ritter's approach ensures the algorithm will end up with a ball covers all points in S. Ritter's iteration can be formulated as follows:
Frequently Asked Questions About k-Center Problem Algorithm
What is the k-center problem?
The k-center problem is a facility location problem that seeks to find the optimal placement of k centers to minimize the maximum distance from any data point to its nearest center. It can be applied in various fields such as machine learning and data mining.
What is the objective of the minmax radius clustering problem discussed in the text?
The minmax radius clustering problem aims to find k centers such that the maximum distance from any point to its nearest center is minimized. The centers don't necessarily have to be part of the dataset, and distances are measured using Euclidean metrics.
Why is finding an exact solution to the k-center problem challenging?
Finding the exact solution to the k-center problem is NP-hard, meaning there's no known polynomial-time algorithm to solve it exactly for large datasets. Even approximating the solution with a small error factor is NP-hard.
What is the Farthest Point Clustering (FPC) algorithm, and why is it significant?
The Farthest Point Clustering (FPC) algorithm is an approximation algorithm for the k-center problem with a constant approximation factor of 2. It was considered the best possible approximation algorithm for the k-center problem due to the proven NP-hardness of achieving a smaller approximation factor.
What is the Bouncing Bubbles Algorithm (BBA) proposed in the paper?
The Bouncing Bubbles Algorithm (BBA) is a heuristic algorithm proposed as an alternative to FPC. It is based on the bouncing bubble algorithm initially introduced by the author for solving the minimal enclosing ball problem. Experiments have shown that BBA often produces solutions better than FPC, although the reasons for this are not fully understood.
How does the BBA relate to the Minimal Enclosing Ball (MEB) problem?
The BBA is inspired by the author's previous work on solving the MEB problem using a bouncing bubble algorithm. The BBA extends this concept to address the k-center problem.
What is a "bubble" in the context of the BBA?
A "bubble" refers to a ball that is constructed during the algorithm's iterations. Initially, in the context of MEB, it's defined as a ball where the optimal MEB covers at least half of it. In this generalized context, any ball constructed according to the specific update formulas is considered a bubble.
What is the purpose of the core-set based approach mentioned in the related works section?
The core-set based approach attempts to find a small subset (core-set) of the original data such that a (1+ϵ)-expansion of the MEB of the core-set covers the entire original dataset. This allows for faster computation of an approximate solution.
What is Ritter's enclosing approach?
Ritter's enclosing approach is a method for obtaining a coarse approximation of the minimal enclosing ball. It iteratively moves the center of the ball towards an uncovered point and increases the radius to cover both the point and the previous ball.
What are some of the key related works mentioned in the text?
Key related works include algorithms by Gonzalez, Hochbaum and Shmoys, Feder et al., Aggarwal and Procopiuc, and Bădoiu. These works focus on either providing approximation algorithms with specific approximation factors or achieving (1+ϵ) approximation, where ϵ is an arbitrary small value.
- Citar trabajo
- Bo Tian (Autor), 2012, Bouncing Bubbles for K center problem, Múnich, GRIN Verlag, https://www.grin.com/document/205619