Abstract

We analyze and exploit some scaling properties of the Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007). First we observe that a divide and conquer strategy, used on a large data set hierarchically reduces the complexity ${\cal O}(N 2)$ to
${\cal O}(N {(h+2)/(h+1)})$, for a dataset of size N and a depth h of the hierarchical strategy. For a dataset embedded in a ddimensional space, we show that this is obtained without notably damaging the precision except in dimension d=2. In fact, for d>2 the relative loss in precision scales like $N{(2d)/(h+1)d}$. Finally, under some conditions we observe that there is a value $s *$ of the penalty coefficient, a free parameter used to fix the number of clusters, which separates a fragmentation phase (for s<s*) from a coalescent one (for s>s*) of the underlying hidden cluster structure. At this precise point holds a selfsimilarity property which can be exploited by the hierarchical strategy to actually locate its position. From this observation, a strategy based on AP can be defined to find out how many clusters are present in a given dataset.

Bibtex

@ARTICLE{FURTLEHNER:2010:INRIA00420407:2,
author = {Furtlehner, C. and Sebag, M. and Xiangliang, Z.},
title = {{S}caling {A}nalysis of {A}ffinity {P}ropagation},
year = {2010},
volume = {81},
pages = {066102},
note = {14 pages, 11 figures},
journal = {{P}hysical {R}eview {E}: {S}tatistical, {N}onlinear, and {S}oft {M}atter {P}hysics},
url = {http://hal.inria.fr/inria00420407/en/},
affiliation = {{TAO}  {INRIA} {S}aclay  {I}le de {F}rance  {INRIA}  {CNRS} : {UMR}8623  {U}niversit{\'e} {P}aris {S}ud  {P}aris {XI}},
audience = {internationale},
language = {{A}nglais},
hal_id = {inria00420407},
keywords = {{C}lustering; {B}elief{P}ropagation; {E}xtrem value statistics; {S}cale invariance; {R}enormalization},
doi = {10.1103/PhysRevE.81.066102},
}
