For new authors:
free, easy and fast
For registered authors
Scientific Essay, 2012, 19 Pages
Scientific Essay
In this paper, a new algorithm for solving k center problem is proposed based on bouncing bubble algorithm which is initially introduced by the author for solving minimal enclosing ball problem. A O(kn) heuristic algorithm is proposed as a substitution of legendary farthest point algorithm. Although approximation of k center problem with factor less than 2 is known to be NP hard thus the farthest point clustering algorithm was considered to be the best approximation possible, the proposed algorithm produces solutions much better than those produced by FPC algorithm.
k-center problem is originated from facility location problem, which is to find a few candidate facility location to minimize cost of transportation or construction etc. However, it can be generalized and applied to machine learning, data mining etc.
This problem has different variants with different object functions or using varied metrics. In this paper, we discuss the minmax radius clustering problem, which is to find k centers with the maximal distance of a point to its center minimized. In this problem, the centers do not need to belong to the data set and the distance mentioned here is in Euclidean metrics. A formal definition of this problem is:
Given a point set [Abbildung in dieser Leseprobe nicht enthalten] so that is minimized.
There exist efficient exact and approximation solvers to solve the special case when k=1, which is called as minimal enclosing ball (MEB) problem [1][2][3][4][5][6][7]. While it's already proved that exact solution of k-center problem is NP complete[8][9]. The best known algorithms runs in time [Abbildung in dieser Leseprobe nicht enthalten][10][11]. And it's also proved even ϵ approximation to k-center problem is also proved to be NP hard when ϵ is small.
Because of the NP hardness, solution to k center problem when k is large is usually obtained with const approximation factor algorithm. One option is the Farthest Point Clustering algorithm (FPC), which has const approximation factor of 2 and was considered as the best approximation one can have[12].
The author proposed a simple approximate algorithm called bouncing bubble algorithm for solving MEB problem[7] and it's natural to extend it to solve k-center problem. The result of the study is an algorithm called Bouncing Bubbles Algorithm (BBA). The name is inherited from Bouncing Bubble algorithm and is also a salute to Scott Aaronson who used physical bubbles to solve Steiner Tree problem[13].
Although the approximate factor α=2 is proved to be the best possible unless NP=P, solutions much better than those produced by FPC are constantly observed in the experimental results while the reason remains unknown. In this paper, the detail of the algorithm and experimental results are presented for further investigation.
This paper is organized in this way: the background of the problem and related work will be introduced in section 2. Section 3 give a brief introduction on Bouncing Bubble algorithm proposed by the author for solving the MEB problem. Section 4 will present the Bouncing Bubbles algorithm. Section 5 shows experimental results and finally conclusions will be given in section 6.
Finding an exact solution to k center problem is NP hard. And it's proved that even approximation to the solution with small approximation factor is NP hard. Denote α as approximation factor. Gonzalez proved α <1.732 for dimension 2 is NP hard and α <2 for dimension ≥ 3 is NP hard [12]. Feder et al improved the bound for dimension 2 to α <1.822. So that we have α <1.822 for dimension 2 and α <2 for dimension ≥ 3 is NP hard[14].
There are several algorithms providing 1+ϵ approximation, where ϵ is arbitrary. Aggarwal and Procopiuc [15] proposed a 1+ϵ approximate algorithm which solves clustering in 2 and 3 dimensions. It uses a dynamic programming approach to solve the problem and run in time [Abbildung in dieser Leseprobe nicht enthalten]. Another algorithm proposed by Bădoiu base on computing of core-set [16]. The algorithm runs in time [Abbildung in dieser Leseprobe nicht enthalten]. Kumar proposed an algorithm also based on core-set which runs in and claim that the running time is much less than the worst case and thus it's possible to solve some problem when k is small (say k<5). As an example, the following shows how the core-set based solution works.
Bădoiu 's Core set based 1+ϵ approximation
Core set was introduced by Bădoiu et al [16]. The basic idea of core set is that there exist a small subset of original data set that the 1+ϵ expansion of its MEB covers the original data set. The process to find the core set can be detailed as follows:
1. Pick a point x from P, search a point y in P, which has the largest distance from x;
2. Search a point z in P, which has the largest distance from y. Initial core set as {y,z} and set an initial ball B, with its centre as the midpoint of y and z, the radius as half of the distance between y and z;
3. If all points in P is within ball (1+ϵ) B, then we get a bounding 1+ϵ approximation. Otherwise, let p be the farthest point from the centre of B, add p into core set; Compute a new ball B to be MEB of the core set. Repeat step 3 until 1+ϵ approximation is obtained.
In step 3, SOCP (Second Order Cone Programming) is used to obtain MEB of core set.
In the same paper, Bădoiu showed a straightforward approach to extend the core-set method to k-center problem: exhaustively enumerate all possibilities when deciding which cluster a new joined point should belong to. This lead to an exponential time complexity of [Abbildung in dieser Leseprobe nicht enthalten].
Several approximations runs in polynomial time with α = 2 has been reported. Hochbaum and Shmoys proposed a algorithm with time complexity of O(n2log n) [12][17]. A greedy approximate algorithm was proposed independently by Gonzalez with computational complexity of O(nk)[12]. Feder and Green improved the time complexity to O(n log(k)) with box decomposition technique [14].
Since Feder's solution produce the same result as Gonzalez's but is relatively more complex, hereby we shows the idea with Gonzalez's farthest point clustering algorithm.
Gonzalez's farthest point clustering algorithm
1. Pick a point x from P, add x into S.
2. Search a point y in P, which has the largest distance from S; add y into S;
3. Repeat 2 until | S |=k.
In step 2, distance between y and S is the minimum distance from y to each point in S.
In the author's previous work, a serial of approximate algorithms to MEB problem called bouncing bubble are proposed. A bubble is basically a ball that the optimum covers at least half of it therefore the ball is always smaller than the optimum. Formal definition is as follows:
Denote B(C,r) as a ball with C as its center and r as its radius, denote B(C *,r*) as MEB. A ball B(C,r) is called a bubble if [Abbildung in dieser Leseprobe nicht enthalten].
The idea of bouncing bubble algorithm is that one can always construct a bigger bubble when a outside point is encountered until it is close enough to the optimum.
Denote C i as the center of a bubble Bi, ri as the radius of [Abbildung in dieser Leseprobe nicht enthalten] Denote s as [Abbildung in dieser Leseprobe nicht enthalten]. The construction of a bigger bubble follows the following formula
Abbildung in dieser Leseprobe nicht enthalten
This iteration moves the center of the bubble towards the point p and enlarge the ball at the same time.
An iteration over the whole data set can be described as follows:
Abbildung in dieser Leseprobe nicht enthalten
Where "Ritter's enclosing approach" refers to the approach proposed by Ritter[5] for a coarse approximate of minimal enclosing ball problem. It's similar with our iteration except that when it move the center of the ball towards the point p, the radius is increased by half of the distance so that the new ball will cover both the new point and the previous ball. Ritter's approach ensures the algorithm will end up with a ball covers all points in S. Ritter's iteration can be formulated as follows:
While "bubble" was defined as a ball that at least half of it can be covered by the optimum ball, in this paper, it's generalized as any ball that is constructed according to the formula 1,2,3. The name is inherited for the similarity between new algorithm and the original bouncing bubble algorithm.
[...]
GRIN Publishing, located in Munich, Germany, has specialized since its foundation in 1998 in the publication of academic ebooks and books. The publishing website GRIN.com offer students, graduates and university professors the ideal platform for the presentation of scientific papers, such as research projects, theses, dissertations, and academic essays to a wide audience.
Free Publication of your term paper, essay, interpretation, bachelor's thesis, master's thesis, dissertation or textbook - upload now!