Introduction
Today at lunch we had a chat about generics and their implementation in Java. Generics have been available since Java 5. To keep the byte code compatible between Java 1.4 and Java 5, generics are implemented using type erasure. That means that at compile time all generic type arguments are erased, and the resulting byte code instructions only work with raw types like:
List a = new ArrayList();
instead of
List<MyType> a = new ArrayList<MyType>();
This means that when we read elements from the list, the compiler has to insert a cast that checks at runtime that the returned value is compatible with the type of the variable it is assigned to:
MyType a = (MyType)list.get(0);
instead of
MyType a = list.get(0);
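A quick way to see erasure at runtime, as a minimal sketch: two lists with different type arguments turn out to be instances of the very same class, because the type argument is not part of the runtime object. The class name ErasureDemo is hypothetical and only used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {

    // Returns true if two differently parameterized lists share one runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        // After erasure both objects are plain java.util.ArrayList instances.
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
        System.out.println(new ArrayList<String>().getClass().getName()); // prints "java.util.ArrayList"
    }
}
```

Since the runtime object cannot tell the compiler what it holds, the cast on read is the only place where the element type gets checked.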
Java Class File Disassembler
To analyse how this cast is actually implemented in Java byte code, one can use the Java Class File Disassembler. This command line tool is called javap and ships with every Oracle JDK distribution.
To inspect the byte code of a given class file use the command:
javap -c <classfile>
For example:
javap -c org.henrikeichenhardt.javabytecode.Main
The output can get quite verbose, so it makes sense to boil the problem you want to look at down to a few lines of code: just enough to create and compile your scenario.
Under the hood
For our scenario the Java code would look like this:
package org.henrikeichenhardt.javabytecode;

import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(final String[] args) {
        final List<Main> list = new ArrayList<Main>();
        list.add(new Main());
        final Main value = list.get(0);
    }
}
We simply create a java.util.ArrayList with the element type Main and add one element to this list. Then we retrieve this element from the list again. These few lines of code result in the following javap output:
Compiled from "Main.java"
public class org.henrikeichenhardt.javabytecode.Main extends java.lang.Object{
public org.henrikeichenhardt.javabytecode.Main();
Signature: ()V
Code:
0: aload_0
1: invokespecial #8; //Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Signature: ([Ljava/lang/String;)V
Code:
0: new #16; //class java/util/ArrayList
3: dup
4: invokespecial #18; //Method java/util/ArrayList."<init>":()V
7: astore_1
8: aload_1
9: new #1; //class org/henrikeichenhardt/javabytecode/Main
12: dup
13: invokespecial #19; //Method "<init>":()V
16: invokeinterface #20, 2; //InterfaceMethod java/util/List.add:(Ljava/lang/Object;)Z
21: pop
22: aload_1
23: iconst_0
24: invokeinterface #26, 2; //InterfaceMethod java/util/List.get:(I)Ljava/lang/Object;
29: checkcast #1; //class org/henrikeichenhardt/javabytecode/Main
32: astore_2
33: return
}
The first block of information, beginning with
public org.henrikeichenhardt.javabytecode.Main();
is the byte code of the (implicit) default constructor of the Main class. We can ignore it for now. The second block is the byte code of the static main method:
public static void main(java.lang.String[]);
The first few instructions create the ArrayList instance and store a reference to it in local variable 1 (astore_1). Then an instance of Main is created and added to the list. Now let's focus on reading an element from the list:
final Main value = list.get(0);
The resulting Java byte code looks like this:
24: invokeinterface #26, 2; //InterfaceMethod java/util/List.get:(I)Ljava/lang/Object;
29: checkcast #1; //class org/henrikeichenhardt/javabytecode/Main
32: astore_2
Without going into too much detail, invokeinterface calls the method get of our list and pushes the result, typed as plain Object according to the method descriptor, onto the operand stack. Then the checkcast instruction is executed. Its only operand is an index into the constant pool that refers to the type to check against (#1, the class Main). The object reference on top of the stack, which is the result of list.get(0), is then checked against this type.
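The three instructions above correspond roughly to what you would get by erasing the generics by hand: use the raw List type and write the cast yourself. The following is a sketch under that assumption; the class ErasedMain and its method names are hypothetical. Running javap -c on it should show an invokeinterface call to List.get followed by checkcast in both methods, although constant pool indices and local variable slots will differ from the article's output.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasedMain {

    // Generic version, as in the article's Main.java.
    static ErasedMain genericVersion() {
        final List<ErasedMain> list = new ArrayList<ErasedMain>();
        list.add(new ErasedMain());
        // The compiler inserts the cast (checkcast) for this assignment.
        final ErasedMain value = list.get(0);
        return value;
    }

    // Hand-erased version: raw types plus the cast the compiler would insert.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static ErasedMain erasedVersion() {
        final List list = new ArrayList();
        list.add(new ErasedMain());
        // List.get returns Object; the explicit cast compiles to checkcast.
        final ErasedMain value = (ErasedMain) list.get(0);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(genericVersion() != null);
        System.out.println(erasedVersion() != null);
    }
}
```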
How does checkcast work?
The documentation of the checkcast instruction reveals how this check is done:
- If the stack reference is null, nothing happens.
- If the stack reference can be cast to the given type, nothing happens.
- Otherwise a ClassCastException is thrown.
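A small sketch of the three outcomes listed above, written as plain Java casts, which javac compiles to checkcast. The nested class Main is only a stand-in for the article's class, and the other names are hypothetical. The second method uses a raw reference to sneak a String into a List<Main> (heap pollution): List.add accepts any Object, so nothing fails on insertion, and the ClassCastException only appears at the inserted cast when the element is read.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckcastOutcomes {

    static class Main {} // stand-in for the article's Main class

    // Describes what happens when o is cast to Main.
    static String castToMain(Object o) {
        try {
            Main m = (Main) o; // compiles to a checkcast instruction
            return m == null ? "null passes" : "cast succeeds";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    // Heap pollution: a String ends up in a List<Main> via a raw reference.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String readPollutedList() {
        List<Main> list = new ArrayList<Main>();
        List raw = list;
        raw.add("not a Main"); // no check here, List.add takes Object
        try {
            Main value = list.get(0); // the inserted checkcast fails here
            return "no exception";
        } catch (ClassCastException e) {
            return "ClassCastException at read";
        }
    }

    public static void main(String[] args) {
        System.out.println(castToMain(null));        // prints "null passes"
        System.out.println(castToMain(new Main()));  // prints "cast succeeds"
        System.out.println(castToMain("a String"));  // prints "ClassCastException"
        System.out.println(readPollutedList());      // prints "ClassCastException at read"
    }
}
```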
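That means that checkcast is very similar to an instanceof test. The difference is that instanceof pushes an int result (0 for a null reference), whereas checkcast either leaves the operand stack unchanged or throws. If the checkcast test passes, execution continues normally; in our example the next instruction, astore_2, stores the reference from the top of the stack in local variable 2.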
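The handling of null is where the two instructions visibly differ. A minimal sketch, with hypothetical method names: the instanceof expression compiles to the instanceof instruction and yields false for null, while the cast compiles to checkcast and simply lets null through.

```java
public class InstanceofVsCheckcast {

    // instanceof yields false (int 0 on the stack) for a null reference.
    static boolean nullIsInstance() {
        Object o = null;
        return o instanceof String;
    }

    // checkcast leaves a null reference on the stack and does not throw.
    static boolean nullCastSucceeds() {
        Object o = null;
        String s = (String) o;
        return s == null;
    }

    public static void main(String[] args) {
        System.out.println(nullIsInstance());   // prints "false"
        System.out.println(nullCastSucceeds()); // prints "true"
    }
}
```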
So the problem boils down to this: how does checkcast know whether it can cast a value to a type? The following is quoted from [1]:
The following rules are used to determine whether an objectref that is not null can be cast to the resolved type: if S is the class of the object referred to by objectref and T is the resolved class, array, or interface type, checkcast determines whether objectref can be cast to type T as follows:
- If S is an ordinary (nonarray) class, then:
  - If T is a class type, then S must be the same class as T, or S must be a subclass of T;
  - If T is an interface type, then S must implement interface T.
- If S is an interface type, then:
  - If T is a class type, then T must be Object.
  - If T is an interface type, then T must be the same interface as S or a superinterface of S.
- If S is a class representing the array type SC[], that is, an array of components of type SC, then:
  - If T is a class type, then T must be Object.
  - If T is an interface type, then T must be one of the interfaces implemented by arrays (JLS §4.10.3).
  - If T is an array type TC[], that is, an array of components of type TC, then one of the following must be true:
    - TC and SC are the same primitive type.
    - TC and SC are reference types, and type SC can be cast to TC by recursive application of these rules.
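These rules can be explored from plain Java with Class.isAssignableFrom, which applies the same kind of subtype test. This is a stand-in chosen for illustration, not the checkcast instruction itself, and the helper name castable is hypothetical. Note that the "S is an interface type" case never arises for the runtime class of an actual object; it matters in the recursive check of array component types, and the sketch approximates it with two interface Class objects.

```java
import java.io.Serializable;
import java.util.Collection;
import java.util.List;

public class CastRules {

    // Assumption: t.isAssignableFrom(s) is used as a stand-in for the
    // question "could a non-null object of runtime class s pass checkcast t?"
    static boolean castable(Class<?> s, Class<?> t) {
        return t.isAssignableFrom(s);
    }

    public static void main(String[] args) {
        // Ordinary class S, class T: same class or subclass.
        System.out.println(castable(Integer.class, Number.class));      // true
        // Ordinary class S, interface T: S must implement T.
        System.out.println(castable(String.class, CharSequence.class)); // true
        // Interface S, interface T: same interface or superinterface.
        System.out.println(castable(List.class, Collection.class));     // true
        // Array S, class T: T must be Object.
        System.out.println(castable(int[].class, Object.class));        // true
        // Array S, interface T: one of the interfaces implemented by arrays.
        System.out.println(castable(int[].class, Cloneable.class));     // true
        System.out.println(castable(int[].class, Serializable.class));  // true
        // Reference component types: recursive application of the rules.
        System.out.println(castable(String[].class, Object[].class));   // true
        // Primitive component types must be identical.
        System.out.println(castable(int[].class, long[].class));        // false
        System.out.println(castable(int[].class, Object[].class));      // false
    }
}
```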
How efficiently these checks are executed, and whether any profiling or heuristics are applied, is up to the JVM implementation. In our example S is an ordinary class and T is a class type, and both are the same class (Main), which satisfies the first case above.
[1] Oracle, The Java Virtual Machine Specification: checkcast instruction