50052. K-means Algorithm

I'm a slow walker, but I never walk backwards.

Task Description

Given $N$ different strings with the same length $L$, categorize these strings into three groups using K-means algorithm.

A string with length $L$ can be represented by a vector with length $L$. The elements in the vector are the ASCII codes of the corresponding characters in the string. For example, string APPLE can be represented by the vector (65, 80, 80, 76, 69).

The distance of two strings is defined as the Manhattan distance of their vector representations. For example, the distance between CAT and DOG is the distance between vector (67, 65, 84) and (68, 79, 71), which is $\vert 67-68\vert + \vert 65-79\vert + \vert 84-71\vert = 28$.

We describe how the K-means algorithm works. Given $N$ strings ($N \geq 3$), our goal is to categorize those strings into three groups. First, we pick the first three strings to be three group leaders. Then we perform the following steps for $R$ rounds:

  1. First we assign each string to a leader that has the smallest distance to it. Those strings assigned to the same leader are in the same group. If a string has the same minimum distance to multiple leaders, it will be assigned to the leader that appears first lexicographically. For example DOC is lexicographically earlier (字典序較小) than DOG since C < G.

  2. Then we calculate the mean for each group. The mean is defined as the sum of all vectors in the group, divided by the number of the strings in the group. For simplicity we use integer division in this problem. For example, a group with three strings CAT, DOG and COW has a mean value $((67, 65, 84) + (68, 79, 71) + (67, 79, 87)) / 3 = (67, 74, 80)$.

  3. Then we assign the string with the smallest distance to the mean as the new group leader. Again, if there are multiple choices, we pick the string that is the earliest lexicographically as the new leader.

After $R$ rounds, output the three leaders in dictionary order.

All the strings have the same length $L$ and contain only capital alphabets.

Input Format

Input contains one test case. The first line contains three integers $N$, $L$ and $R$, indicating the number of the strings, the length of the strings, and the number of rounds need to be performed. For the following $N$ lines, each line contains a string with exactly $L$ capital letters. Note that the first three strings are the initial leaders and all the strings are different.

Technical limitation

  • $3 \leq N \leq 50$
  • $1 \leq L \leq 10$
  • $1 \leq R \leq 10$

Output Format

After performing the k-means algorithm for $R$ rounds, output the three leaders in dictionary order.

Sample Input 1

8 3 2
DOC
DOG
FOG
CAT
HAT
RAT
YOU
HIM

Sample Output 1

DOC
HAT
HIM

Explanation

At the beginning, the first three strings DOC, DOG and FOG are the three leaders.

  • Round 1:

    The distance between CAT and the three leaders are dist(CAT, DOC) = 32, dist(CAT, DOG) = 28 and dist(CAT, FOG) = 30.

    Therefore, CAT belongs to the second group, which has leader DOG. Similarly, we assign the rest of the strings to the leader with the smallest distance to it, and get the result

    | #Group | Leader | Members                 |
    |--------|--------|-------------------------|
    | 1      | DOC    | DOC                     |
    | 2      | DOG    | CAT, DOG                |
    | 3      | FOG    | FOG, HAT, HIM, RAT, YOU |

    Second, we calculate the mean of each group and get

    | #Group | Leader | Mean                                                                                            |
    |--------|--------|-------------------------------------------------------------------------------------------------|
    | 1      | DOC    | (68, 79, 67) / 1 = (68, 79, 67)                                                                 |
    | 2      | DOG    | [ (67, 65, 84) + (68, 79, 71) ] / 2 = (67, 72, 77)                                              |
    | 3      | FOG    | [ (70, 79, 71) + (72, 65, 84) + (72, 73, 77) + (82, 65, 84) + (89, 79, 85) ] / 5 = (77, 72, 80) |

    Finally, we assign a new leader for each group according to their distance to the mean value. The result is as follows:

    | #Group | Leader | New leader |
    |--------|--------|------------|
    | 1      | DOC    | DOC        |
    | 2      | DOG    | CAT        |
    | 3      | FOG    | HIM        |
  • Round 2:

    The second round is performed similarly.

    | #Group | Leader | Members       | Mean         | New leader |
    |--------|--------|---------------|--------------|------------|
    | 1      | DOC    | DOC, DOG, FOG | (68, 79, 69) | DOC        |
    | 2      | CAT    | CAT, HAT, RAT | (73, 65, 84) | HAT        |
    | 3      | HIM    | HIM, YOU      | (80, 76, 81) | HIM        |
  • After two rounds, the final three leaders are DOC, HAT and HIM, which is the answer of the sample input.

Discussion