December 1st, 2005, 10:48 am
(1) The ideal degree of fuzziness: the "fuzziness" of the clustering is determined by the exponent m. In older research you will often see the unstated assumption that m = 2. In practice you should perform a grid search over values of m greater than 1 (as m approaches 1 the memberships become nearly crisp, and as m grows they become nearly uniform), and you should also grid search over the hyperparameter c (the number of centers in the fuzzy c-means algorithm). To decide on the optimum in the grid search you must use some cluster validity index as your performance measure.

(2) In order to convert the c fuzzy clusters into k "crisp" clusters you can assign each point to the cluster for which the point's fuzzy membership is greatest. In cases where a point's fuzzy membership is equal for two or more clusters you must make an "arbitrary" decision, which depends on your application.

I don't understand alvinkam's comment regarding fuzzy clustering being on shaky grounds. There is a great deal of theoretical work on fuzzy c-means in the literature. I don't think the k-modes clustering algorithm is on any better grounds; after all, a metric has to be arbitrarily chosen. Which algorithm works better depends on your particular application. It is true that the simplicity of the k-modes algorithm makes it attractive, and if you are completely throwing away the fuzzy sets by converting to "crisp" sets then you might as well not bother with fuzzy logic and just use k-modes.
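
To make (1) and (2) concrete, here is a rough pure-Python sketch, not a definitive implementation. It runs the standard fuzzy c-means updates, grid searches over c and m, and then hardens the memberships by taking the largest one. Several things in it are my own assumptions rather than anything prescribed above: the Xie-Beni index as the validity measure (lower is better; other indices may rank the grid differently), the farthest-first seeding (chosen only to keep the run deterministic), the tie-break of "lowest cluster index wins", and all function names and the toy data.

```python
from itertools import product

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def memberships(points, centers, m):
    """Standard FCM membership update; each row sums to 1."""
    u = []
    for p in points:
        # Clamp distances so a point sitting exactly on a center does not divide by zero.
        inv = [max(sq_dist(p, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centers]
        total = sum(inv)
        u.append([x / total for x in inv])
    return u

def init_centers(points, c):
    """Farthest-first seeding (an assumption, used to keep the sketch deterministic)."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(sq_dist(p, v) for v in centers)))
    return centers

def fuzzy_c_means(points, c, m, iters=200, tol=1e-9):
    """Alternate membership and center updates until the centers stop moving."""
    centers = init_centers(points, c)
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]  # weights u_ij^m for cluster j
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(dim)))
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)

def xie_beni(points, centers, u, m):
    """Xie-Beni validity index (needs c >= 2): compactness / separation, lower is better."""
    compact = sum(u[i][j] ** m * sq_dist(p, v)
                  for i, p in enumerate(points) for j, v in enumerate(centers))
    sep = min(sq_dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:])
    return float("inf") if sep == 0 else compact / (len(points) * sep)

def grid_search(points, c_values, m_values):
    """Try every (c, m) pair and keep the one with the lowest validity index."""
    best = None
    for c, m in product(c_values, m_values):
        centers, u = fuzzy_c_means(points, c, m)
        score = xie_beni(points, centers, u, m)
        if best is None or score < best[0]:
            best = (score, c, m, centers, u)
    return best

def harden(u):
    """Crisp labels by largest membership; ties go to the lowest cluster index."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]

# Toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
score, best_c, best_m, centers, u = grid_search(points, c_values=(2, 3), m_values=(1.5, 2.0, 3.0))
labels = harden(u)
print(best_c, best_m, labels)
```

The tie-break in harden() is exactly the "arbitrary" decision mentioned in (2); in a real application you would replace it with whatever rule makes sense for your data. Also note that once harden() is applied the memberships are discarded, which is the point made above about whether fuzzy clustering is worth the trouble in that case.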