Business Listings Data Sets

Below are links to the business-listings data sets used in the Local Algorithms for Interactive Clustering study.

Each data set corresponds to an over-clustering instance (a cluster intersecting several ground-truth clusters) studied in the "Clustering Business Listings" section of the study. Each over-clustering instance was split using different algorithms; the correctness of the resulting splits was then evaluated.

These data sets have been anonymized: each one is represented by an n x n similarity matrix with entries in [0,1], and a 1 x n ground-truth vector with entries in {1,2,...,k}. Here n is the number of elements in the data set and k is the number of ground-truth clusters that the over-clustered cluster intersects.
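The representation described above can be sanity-checked with a short Python sketch. The function name check_instance and the toy values are hypothetical and are not part of the data sets. The sketch assumes the matrix and the labels are already in memory as plain Python lists, and that every label in 1..k occurs at least once, which follows from k being the number of ground-truth clusters the instance intersects.

```python
def check_instance(similarity, labels):
    """Validate one anonymized instance and return (n, k).

    similarity: list of n rows, each a list of n floats in [0, 1]
    labels:     list of n integer ground-truth labels in {1, ..., k}
    (Hypothetical helper; the in-memory layout is an assumption.)
    """
    n = len(labels)
    # The matrix must be square and match the length of the label vector.
    if len(similarity) != n or any(len(row) != n for row in similarity):
        raise ValueError("similarity matrix must be n x n with n = len(labels)")
    # Every similarity must lie in the closed interval [0, 1].
    for row in similarity:
        for value in row:
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"similarity {value} is outside [0, 1]")
    # k is the number of distinct ground-truth clusters that appear.
    k = len(set(labels))
    if any(label < 1 or label > k for label in labels):
        raise ValueError("labels must lie in {1, ..., k}")
    return n, k


# Toy instance with made-up values: three elements, two ground-truth clusters.
toy_similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
toy_labels = [1, 1, 2]
print(check_instance(toy_similarity, toy_labels))  # prints (3, 2)
```

The check deliberately does not require the matrix to be symmetric or to have ones on the diagonal, since the description only guarantees entries in [0,1].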

There are 20 files specifying the similarity matrices and a single file specifying the ground-truth labels, which has 20 rows. The i-th row of the ground-truth labels file gives the ground-truth vector for the i-th similarity matrix.
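As a rough illustration of the row-to-matrix correspondence, the following Python sketch pairs the i-th label row with the i-th similarity matrix. The on-disk format is not specified here, so the sketch assumes the labels file is plain text with one whitespace- or comma-separated row per line; the names parse_label_rows and pair_instances are hypothetical, and the toy data uses two instances rather than 20.

```python
def parse_label_rows(text):
    """Parse one ground-truth vector per non-empty line.

    Assumes whitespace- or comma-separated integer labels (an assumption
    about the file format, which the description does not specify).
    Rows may have different lengths, since n varies between instances.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.replace(",", " ").split()
        if tokens:
            # int(float(...)) accepts both "2" and "2.0".
            rows.append([int(float(token)) for token in tokens])
    return rows


def pair_instances(matrices, label_rows):
    """Pair the i-th similarity matrix with the i-th label row."""
    if len(matrices) != len(label_rows):
        raise ValueError("need exactly one label row per similarity matrix")
    pairs = []
    for i, (matrix, labels) in enumerate(zip(matrices, label_rows), start=1):
        # Each label row must have one entry per element of its matrix.
        if len(matrix) != len(labels):
            raise ValueError(
                f"instance {i}: matrix has {len(matrix)} rows "
                f"but label row has {len(labels)} entries"
            )
        pairs.append((matrix, labels))
    return pairs


# Toy data with made-up values: two instances of sizes 3 and 2.
toy_label_text = "1 1 2\n1,2\n"
toy_matrices = [
    [[1.0, 0.9, 0.1], [0.9, 1.0, 0.3], [0.1, 0.3, 1.0]],
    [[1.0, 0.4], [0.4, 1.0]],
]
toy_pairs = pair_instances(toy_matrices, parse_label_rows(toy_label_text))
for matrix, labels in toy_pairs:
    print(len(matrix), labels)
```

For the real data, the list of matrices would hold the 20 loaded similarity matrices in file order, and the parsed labels file should then yield exactly 20 rows.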