ArrayIndexOutOfBoundsException from Java Path.normalize() for Empty Path

Overview

This post explains why and how Path.normalize() throws an ArrayIndexOutOfBoundsException for an empty path, by reading the UnixPath source code.

ArrayIndexOutOfBoundsException from Java Path.normalize() for Empty Path

The following simple one-line statement throws an ArrayIndexOutOfBoundsException:

import java.nio.file.Paths;

public class App {
	public static void main( String[] args ) {
		// normalize() on the empty path triggers the exception
		Paths.get("").normalize();
		System.out.println( "Hello World!" );
	}
}
/*
The following exception is thrown:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
	at sun.nio.fs.UnixPath.normalize(UnixPath.java:508)
	at net.tech-wonderland.app.App.main(App.java:11)
*/

Even stranger, this code runs without throwing on some Ubuntu systems but fails with the ArrayIndexOutOfBoundsException in other environments. To find out why, I looked at the UnixPath source code around line 508:

494        // first pass:
495        //   1. compute length of names
496        //   2. mark all occurences of "." to ignore
497        //   3. and look for any occurences of ".."
498        for (int i=0; i<count; i++) {
499            int begin = offsets[i];
500            int len;
501            if (i == (offsets.length-1)) {
502                len = path.length - begin;
503            } else {
504                len = offsets[i+1] - begin - 1;
505            }
506            size[i] = len;
507
508            if (path[begin] == '.') {
509                if (len == 1) {
510                    ignore[i] = true;  // ignore  "."
511                    remaining--;
512                }
513                else {
514                    if (path[begin+1] == '.')   // ".." found
515                        hasDotDot = true;
516                }
517            }
518        }

The stack trace points at path[begin] as the cause. From the context of the source code, offsets[i] is non-negative, so the index can only be out of bounds if path is an empty byte array. That matches our case: we passed an empty string, so the path is empty. Then why does execution reach line 508 at all? Here is why:
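To make the failure concrete, here is a minimal sketch that mimics the first-pass loop on plain arrays. It is not the JDK code: the class name FirstPassModel, the method firstPass, and the assumption that an empty path is stored as a zero-length byte array with a single offset of 0 are mine, inferred from the listing above.

```java
import java.nio.charset.StandardCharsets;

class FirstPassModel {

    // Simplified stand-in for the first pass of UnixPath.normalize():
    // returns how many names remain after ignoring "." entries.
    static int firstPass(byte[] path, int[] offsets, int count) {
        int remaining = count;
        for (int i = 0; i < count; i++) {
            int begin = offsets[i];
            int len = (i == offsets.length - 1)
                    ? path.length - begin
                    : offsets[i + 1] - begin - 1;
            // For an empty path, path.length == 0 and begin == 0,
            // so this read is out of bounds.
            if (path[begin] == '.' && len == 1) {
                remaining--;
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // "a/./b" has three names starting at offsets 0, 2 and 4.
        byte[] bytes = "a/./b".getBytes(StandardCharsets.US_ASCII);
        System.out.println(firstPass(bytes, new int[] {0, 2, 4}, 3)); // prints 2

        // Assumed layout of the empty path: no bytes, one name at offset 0.
        try {
            firstPass(new byte[0], new int[] {0}, 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("empty path: " + e);
        }
    }
}
```

With a name count of 1 and no bytes to read, the very first array access fails, which is consistent with the stack trace.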

483    public Path normalize() {
484        final int count = getNameCount();
485        if (count == 0)
486            return this;

The above are the first few lines of the normalize method. Intuitively, I expected getNameCount() to return zero for an empty path, because it has no names. It turns out, however, that getNameCount() returns 1 for an empty path, so the count == 0 shortcut is skipped. You can verify this with the following code:

import java.nio.file.Paths;

public class App {
	public static void main( String[] args ) {
		// print the name count of the empty path before normalizing it
		System.out.println("NameCount = " + Paths.get("").getNameCount());
		Paths.get("").normalize();
		System.out.println( "Hello World!" );
	}
}

/* output:
NameCount = 1
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
	at sun.nio.fs.UnixPath.normalize(UnixPath.java:508)
	at net.tech-wonderland.app.App.main(App.java:11)
*/
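If you have to run on a JDK that shows this behavior, one possible workaround is to skip normalize() for the empty path. This is a minimal sketch, assuming a hypothetical helper class PathUtil with a safeNormalize method (neither is part of the JDK):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathUtil {

    // Hypothetical helper: returns the path unchanged when it is empty,
    // otherwise delegates to Path.normalize().
    static Path safeNormalize(Path p) {
        if (p.toString().isEmpty()) {
            return p;
        }
        return p.normalize();
    }

    public static void main(String[] args) {
        // The empty path is returned as-is, so normalize() is never called on it.
        System.out.println("[" + safeNormalize(Paths.get("")) + "]");
        // Non-empty paths are normalized as usual.
        System.out.println(safeNormalize(Paths.get("a/./b/../c")));
    }
}
```

The check relies only on toString(), so it does not depend on what getNameCount() returns for the empty path.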

Considering that this does not happen in every environment, I have a few conclusions and suggestions about using Path.normalize() on an empty path. Some of them are not fully verified, so feel free to correct me and leave comments:

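  1. OpenJDK might have a bug here. Because getNameCount() returns 1 for an empty path, normalize() skips its count == 0 shortcut and then indexes into an empty byte array, which results in the ArrayIndexOutOfBoundsException. Note that the Path Javadoc describes an empty path as consisting solely of one name element that is empty, so returning 1 may be intended, in which case the missing empty-path check belongs in normalize() itself. Oracle JDK might not have this issue. It might only apply to OpenJDK, because the same code does not throw on the Oracle JDK machine described below.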
  2. Oracle's Javadoc for Paths.get advises against using it in library code: "Note that while this method is very convenient, using it will imply an assumed reference to the default FileSystem and limit the utility of the calling code. Hence it should not be used in library code intended for flexible reuse. A more flexible alternative is to use an existing Path instance as an anchor, such as:"
    Path dir = ...
    Path path = dir.resolve("file");
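The anchor pattern from the Javadoc can be sketched as follows. The class AnchorExample, the method fileIn, and the directory "some/dir" are hypothetical names chosen for illustration:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class AnchorExample {

    // Library-style method: the caller supplies the anchor directory,
    // so this code never refers to the default FileSystem itself.
    static Path fileIn(Path dir) {
        return dir.resolve("file");
    }

    public static void main(String[] args) {
        // Only the application entry point decides where the anchor comes from.
        Path dir = Paths.get("some", "dir"); // hypothetical directory
        System.out.println(fileIn(dir));
    }
}
```

Because fileIn only calls resolve on the Path it receives, it should also work with paths from a non-default file system.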

More details: as mentioned above, the same code runs cleanly on one Ubuntu machine but throws the ArrayIndexOutOfBoundsException on another.

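Here are the two environments. Both report Oracle Corporation as the vendor, but the Java home of the failing machine points to an OpenJDK installation (java-7-openjdk-amd64), which is why I suspect that OpenJDK has a bug in this Path API.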

$ (The machine on which the code does NOT throw ArrayIndexOutOfBoundsException)
$ mvn --version
Apache Maven 3.2.5 xxx
Maven home: /usr/maven/apache-maven-3.2.5
Java version: 1.7.0_76, vendor: Oracle Corporation
Java home: /usr/java/jdk1.7.0_76/jre
Default locale: xxx
OS name: xxx

$ (The machine on which the code throws ArrayIndexOutOfBoundsException)
$ mvn --version
Apache Maven 3.3.1 xxx
Maven home: /usr/local/apache-maven
Java version: 1.7.0_75, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-7-openjdk-amd64/jre
Default locale: xxx
OS name: xxx
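Since the vendor line in the mvn output is the same on both machines, it can help to print a few more system properties when comparing environments. A small sketch (the class name RuntimeInfo is hypothetical); java.vm.name typically mentions OpenJDK or HotSpot, though I have not checked this on every build:

```java
class RuntimeInfo {

    // Collects the standard system properties that identify the running JVM.
    static String describe() {
        return "java.version = " + System.getProperty("java.version") + "\n"
             + "java.vendor  = " + System.getProperty("java.vendor") + "\n"
             + "java.vm.name = " + System.getProperty("java.vm.name") + "\n"
             + "java.home    = " + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```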

Summary

This post explained why and how Path.normalize() throws an ArrayIndexOutOfBoundsException for an empty path, based on the UnixPath source code.

Written on March 30, 2015