Avoid Backwash Issue in Percolation Problem (Without 2 WQUUF)
This is a summary of all different solutions to the backwash problem within the percolation simulation including: 1) Associate each component root with an additional information (Without 2 WQUUF), I summarize two methods here, one involves modification of the given API while the other not; 2) using TWO Weighted Quick Union Union Find (WQUUF) objects. If you want to check the one without 2 WQUUF and no need to modify the given API which is what most of the people are looking for, you can directly jump to the the end of section 3 “Solutions to Resolve the issue” for the details.
Backwash Problem Definition:
In the context of Percolation, the backwash issue is that some site might be mistakenly judged as a full site (A full site is an open site that can be connected to an open site in the top row via a chain of neighboring (left, right, up, down) open sites.) if we directly adopt the dummy nodes suggested in the course material, i.e., a top virtual node connected to each site in the first first top row, another bottom virtual node connected to each site in the last bottom row. More concretely, it is just as the following pic shows (adopted from this post)
Solutions to Resolve the Backwash Issue:
1) The easiest way to tackle this problem is to use two different Weighted Quick Union Union Find objects. The only difference between them is that one has only top virtual site (let’s call it uf_A), the other is the normal object with two virtual sites top one and the bottom one (let’s call it uf_B) suggested in the course, uf_A has no backwash problem because we “block” the bottom virtual site by removing it. uf_B is for the purpose to efficiently determine if the system percolates or not as described in the course. So every time we open a site (i, j) by calling Open(i, j), within this method, we need to do union() twice: uf_A.union() and uf_B.union(). Obviously the bad point of the method is that: semantically we are saving twice the same information which doesn’t seem like a good pattern indeed. The good aspect might be it is the most straightforward and natural approach for people to think of.
2) And there turned out to be more elegant solutions without using 2 WQUUF if we can modify the API or we just wrote our own UF algorithm from scratch. The solution is from “Bobs Notes 1: Union-Find and Percolation (Version 2)“: Store two bits with each component root. One of these bits is true if the component contains a cell in the top row. The other bit is true if the component contains a cell in the bottom row. Updating the bits after a union takes constant time. Percolation occurs when the component containing a newly opened cell has both bits true, after the unions that result from the cell becoming open. Excellent! However, this one involves the modification of the original API. Based on this and some other discussion from other threads in the discussion forum, I have come up with the following approach which need not to modify the given API but adopt similar idea by associating each connected component root with the information of connection to top and/or bottom sites.
(3) Here we go for the approach based on Bob’s notes while involving no modification of the original given API:
In the original Bob’s notes, it says we have to modify the API to achieve this, but actually it does not have to be like that. We create ONE WQUUF object of size N * N, and allocate a separate array of size N * N to keep the status of each site: blocked, open, connect to top, connect to bottom. I use bit operation for the status so for each site, it could have combined status like Open and connect to top.
The most important operation is open(int i, int j): we need to union the newly opened site (let’s call it site ‘S’) S with the four adjacent neighbor sites if possible. For each possible neighbor site(Let’s call it ‘neighbor’), we first call find(neighbor) to get the root of that connected component, and retrieves the status of that root (Let’s call it ‘status’), next, we do Union(S, neighbor); we do the similar operation for at most 4 times, and we do a 5th find(S) to get the root of the newly (copyright @sigmainfy) generated connected component results from opening the site S, finally we update the status of the new root by combining the old status information into the new root in constant time. I leave the details of how to combine the information to update the the status of the new root to the readers, which would not be hard to think of.
For the isFull(int i, int j), we need to find the the root site in the connected component which contains site (i, j) and check the status of the root.
For the isOpen(int i, int j) we directly return the status.
For percolates(), there is a way to make it constant time even though we do not have virtual top or bottom sites: think about why?
So the most important operation open(int i, int j) will involve 4 union() and 5 find() API calls.
Conclusion and Summary of Backwash Problem:
I outline the tree methods here to resolve the backwash (copyright @sigmainfy) issues:
1) Using 2 WQUUF objects:
- open(): would involves 8 union() API calls at most but the time complexity is still 8 * log (N^2) = O(logN)
- isOpen(): O(1) by memorizing in a 2-d array
- isFull(): by call connected() API and thus is O(logN)
2) Using 1 QQUUF objects and store additional information for keeping track of each connected component root:
- open(): would involves 4 union() and 5 find() API calls at most but the time complexity is still 9 * log (N^2) = O(logN)
- isOpen(): O(1) by memorizing in a 2-d array
- isFull(): by call find() API and thus is O(logN)
A final remark: one key observation here is that to test a site is full or not, we only need to know the status of the root of that connected component containing the site, whether a site is full is the same question interpreted by asking whether the connected component is full or not: this make things simple and saves time, someone in the forum proposed to linear scan the whole connected component to update the status of each site which is not necessary and inefficient. each time we only update the status of the root site rather than each site in that component.