Home » » TreeSet.removeAll() and Comparators Not Consistent with Equals

TreeSet.removeAll() and Comparators Not Consistent with Equals


Sent to you by jeffye via Google Reader:


via LingPipe Blog by lingpipe on 3/6/09

Works on My Machine Seal of Approval

I thought it'd be convenient to upgrade com.aliasi.util's BoundedPriorityQueue<E> to have it implement java.util's SortedSet<E>. That way I could use more generic returns for things like the auto-completer. Turns out I built a tighter bubble-sort based sorted set for that in the end, but that's a different story.

So I replaced the extends AbstractCollection<E> with extends AbstractSet<E>. Everything's hunky-dory, and it's passing all of our unit tests.

Move on a few days, and I'm doing massive refactorings to remove compiler warnings, so I want to use Eclipse, which has even more aggressive compiler warnings than the JDK. I couldn't get it to work on 64-bit Windows, so I go home and fire up our antique ThinkPad. Eclipse only does JDK 1.5, so I plug that in. I do a little refactoring, then run unit tests. Failing in cluster.CompleteLinkClusterer. WTF? I shouldn't have changed anything recently that has an effect on clustering. I trace through the code by dumping out dendrograms as they get created (agglomerative clusterers are easy to trace). It's getting the distances wrong. The only way it could do that is if the priority queue of inter-cluster distances is messed up. Uh oh, that's my new BoundedPriorityQueue. How'd I mess it up?

Next, I back all the changes I've made with Eclipse and still it's failing. But hang on, it worked on my machine! So I switch the JDK back to 1.6 and voilà, it works again. How can it work in 1.6 and not 1.5? Unfortunately, I still don't know the answer. But I did, through more fine-grained tracing, figure out where it was going wrong. The culprit is the inconsistent behavior of TreeSet's removeAll() method, which is only javadoc-ed on the superclass, AbstractSet, which points to the source of the problem:

This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.

The problem here is that iterating over the tree set itself (what happens when it's smaller) uses a test with contains() and then calls the remove method on the iterator, whereas iterating over the argument collection calls remove() on the tree set. These differ with respect to whether they use the natural ordering (equals()) or whether they use the tree set's comparator method. Thus tree sets whose comparators are not compatible with natural equality run into problems.

Needless to say, there've been lots of "bugs" filed against this behavior, every one of which Sun dismisses as "obeying the letter of the spec", but I happen to agree with user "gouldina" on this one:

  • JDK Bug 4730113: TreeSet removeAll(), retainAll() don't use comparator; addAll() does
  • JDK Bug 6394757: (coll) AbstractSet.removeAll is surprisingly dependent on relative collection sizes

Here's the code from the 1.5 JDK (version 1.26):

 public boolean removeAll(Collection c) {     boolean modified = false;     if (size() > c.size()) {         for (Iterator i = c.iterator(); i.hasNext(); )             modified |= remove(i.next());     } else {         for (Iterator i = iterator(); i.hasNext(); ) {             if (c.contains(i.next())) {                 i.remove();                 modified = true;             }         }     }     return modified; } 

So I simply implemented my own removeAll() for bounded priority queues that used the iterator-based approach (first case above). That's how AbstractCollection implements it, and why the program worked before. Problem solved. Of course, it's ridiculously inefficient for large queues. I need a better data structure here.

The root cause (pun intended) is a trickier question. Both JDK 1.5 and 1.6 have the same definition of removeAll(), but it's a maze of twisty little passages, all different, out from there through all the indirection of the remove() and contains() through the superclasses and contained map's definitions. Something must be different in 1.5 and 1.6, because I'm getting different answers. But I have better things to do than trace down what it is.


Things you can do from here:



Popular Posts