Interview Questions 4: Union-find with specific canonical element

Preface:

We discuss the problem of finding maximum element in a connected component in logarithmic time based on Union-Find algorithm, an extra array to keep the largest element for each root element of the connected components is needed to accomplish the task. And it seems there is no constant extra memory approach exists to my knowledge. It would be highly appreciated if anyone could comment on it and give some constant extra memory approach.

Problem Definition:

Note this problem is from “Algorithms, Part I” on Coursera. The exact problem is described in details as follows:

Union-find with specific canonical element. Add a method getMax() to the union-find data type so that getMax(i) returns the largest element in the connected component containing i. The operations, union(), connected(), and getMax() should all take logarithmic time or better.

For example, if one of the connected components is {1,2,6,9}, then the getMax() method should return 9 for each of the four elements in the connected components.

Analysis:

Assume you have basic knowledge on Union-Find Algorithm with Weighted Quick Union, Path Compression techniques, if not, please review those stuff first. Here is the algorithm: For each root of the connected component we keep an extra array MaxElement, and MaxElement[i] contains the largest element among the connected component rooted at i. Each time we do the Union operation we need to update the MaxElement[i] for the new connected component with the new root which is constant time, to return the maximum value, we first find the root, and return MaxElement[root]. So basically, Union(p, q) takes log(N) and Find(p, q) takes log(N) and Root(p) takes log(N) and getMax(p) takes log(N), note here I define the methods Find(p, q) differently from the course material and the Find(p, q) here is defined as the method to check whether p and q are connected or not. I feel it more natural to define the methods in my own way since it is Union-Find algorithm, and Union() , Find() should be the basic operation names.

More discussion on Constant Extra Memory:

Update:  Thank  Alexei Averchenko for providing the following brilliant idea! Alexei’ comment is:

“Technically this should be possible: instead of storing its own index, the root could store the negative of the index of the maximal element. Because Java doesn’t have unsigned integers, it doesn’t change how big our structure can be. It should make the implementation slower, though, because finding the root now has height-of-the-tree number of branches, while previously you could unroll the loop to reduce their number significantly like this: x = id[id[id[id[x]]]]. I may try to implement this and compare the running time. ”

##############################################################################

I spent quite a bit time on thinking about whether there is an approach to use constant extra memory because the approach we just discussed is not very difficult to come up with by utilizing an extra array, the open question is that can we improve it even better without that explicit extra array? My conclusion is that, there is no such an approach. Here is my explanation:

One tempting approach is to use the root itself to store the maximum element, since it is just a connected component, it does not matter too much who is the root as long as the tree is kept balanced. So the first thing in my mind is to use swap, for example:

Before:
     3                           4
    / \            Union        /
   1   2                       0

After:
           3                        4
         / | \                    / | \
        1  4  2    ??? --->      1  3  2
           |                        |
           0                        0
        (A)                      (B)

Note the above is the normal Weighted Quick Union operation, so we can simply do two things:
(1) execute the regular weighted Quick Untion: to keep the tree balanced
(2) exchange or swap the two root 3 and 4, actually, after we got the balanced tree, we just compare the separate root 3 and 4 and put the larger one in the root

However, the above two steps trying to avoid the explicit extra array would make the Union operation Linear rather than logarithmic because essentially we do not have a convenient tree structure here. This is because we only have an array data, to convert from the (A) to (B), we have to linear scan all the elements whose parent is 4 and update it to 3. This approach is tempting at the first place because I thought it simply involves a swap operation. However, it is simple to swap if we have pointers and a complete tree structure.

Anyone who have some different ideas on this please feel free the comment and it would really be appreciated!

Conclusion:

We discuss how to modify Union-Find algorithm to get the largest element in each connected components with every operation taking logarithmic time or better. And it turns out an extra array has to be allocated to keep the largest value and no constant extra memory approach could be achievable (Again, please correct me if you have some constant memory approach and it would be highly appreciated if you can share that approach!!)

(End. Original articles, please associate the author information and original URL with any reprints)


(Please specify the source  烟客旅人 sigmainfy — http://www.sigmainfy.com for any reprints,  

and please do not use it for commercial purpose)

Written on September 14, 2014