(Junior) Varsity Interview: String Reverse

We’re hiring, so interviews have been on my mind.

I’m cross-posting what was mostly a response to reading Stephans Blog [sic], which was itself a riff on Joel on Software’s Guerilla Guide to Interviewing. Both ask us to look at the string reverse problem.

Apparently, if you are interviewing at a tech company, at some point
along the way you’re going to be asked to reverse a string. Or a
sequence of tokens. Really. No, really, I’m not kidding.

All of the answers I’ve seen are pure junior varsity when it comes to string processing.

Before reversing a “string”, we need to get clear on what we want.
At one level, a Java String object is just an array of chars. At
another level, it represents a sequence of unicode code points. The
tension arises from the fact that String(char[]) lets you
construct a string with a sequence of chars that don’t correspond to
valid unicode code points. This tension was ratcheted up in the 1.5 JDK
when they moved to Unicode 4.0.

In the 1.5 JDK, Sun changed the behavior of the StringBuffer.reverse() method in a way that was not backward compatible with 1.4. That is, there are StringBuffer instances for which reverse() in 1.4 returns a different value than in 1.5.

The 1.5 version of reverse() is sensitive to surrogate
pairs (unicode code points requiring more than 16 bits, and hence more
than one char, in UTF-16). In 1.5, both java.lang.StringBuilder and java.lang.StringBuffer use the implementation of reverse() from their common superclass, java.lang.AbstractStringBuilder.

Here’s the first paragraph of doc for java.lang.StringBuffer.reverse() from JDK 1.5:

“Causes this character sequence to be replaced by the reverse of
the sequence. If there are any surrogate pairs included in the
sequence, these are treated as single characters for the reverse
operation. Thus, the order of the high-low surrogates is never
reversed.”

And here’s the first paragraph of doc from java.lang.StringBuffer.reverse() from JDK 1.4:

“The character sequence contained in this string buffer is replaced by the reverse of the sequence.”

Following Stephan’s suggestion to use the built-in has either a good
or bad side effect. Moving from 1.4 to 1.5 either breaks backward
compatibility for the string as char sequence representation, or
appropriately handles unicode 5.0 in the string as sequence of code
points representation.

Extra credit 1: Recursion won’t work because it’ll blow out the
stack if we’re using Sun’s JDKs, which (at least so far) don’t perform
tail recursion optimization (a kind of last call optimization).

Extra credit 2: The exception thrown when trying to reverse a null
string should be a null pointer exception. That’s how Sun codes the JDK
itself (see, e.g., the java.lang.String.String(String) constructor). It’s a runtime exception because it’s a coding error to send reverse(String) a null string (assuming you want behavior like your call to StringBuffer.reverse()). It should be a null pointer exception, as that’ll lead you right to the problem while debugging.

Do I get a callback for a second interview?

1 Comments:

Stephan.Schmidt said...: Probably ;-)

"Extra credit 1: Recursion won’t work because it’ll blow out the
stack if we’re using Sun’s JDKs, which (at least so far) don’t perform
tail recursion optimization (a kind of last call optimization)."

See
http://stephan.reposita.org/archives/2007/11/12/string-reversing-part-ii-tail-recursion/

for a discussion on tail recursion.

Your considerations are too sophisticated for most candidates. Which is the common error commenters on that post make.

Peace
-stephan; 12:11 AM

1 Comments:

Popular Posts

IR、ML、NLP

Total Pageviews