Sunday, 19 May 2013

NoSQL and the technology adoption lifecycle

Introduction


Last week I went to a talk titled "Considerations for using NoSQL technology on your next IT project" by Akmal Chaudhri. NoSQL is a vast field and the headline was rather vague so I was not sure what to expect. 

In the end I was really glad I went. Akmal is a great speaker with decades of experience in IT. He helped put the NoSQL space into perspective by comparing the "revolution" of object-oriented data stores in the early 90s and the trend towards XML-based databases in the early 2000s with today's hype around NoSQL databases. It is interesting to compare past trends and hypes which haven't quite taken off with the current NoSQL landscape.

In the early 90s and early 2000s object-oriented and XML-based databases seemed to be the next big thing, and analysts predicted strong growth for these technologies. In reality their adoption never became widespread and their market share remained in the single digits.

Technology Adoption Lifecycle

The technology adoption lifecycle classifies the adopters of a technology into five groups and can be visualized in Rogers' bell curve.


  • Innovators - Techies like Google and Amazon. They innovate based on a deep need.
  • Early adopters - Visionaries who like to try out new technologies and are generally curious. These are the opinion leaders.
  • Early majority - Pragmatists who are not really interested in new technologies per se. They are looking for the best tool to get the job done as fast as possible.
  • Late Majority - Conservatives who try to preserve the status quo and stick with known technologies because they are perceived to be safer even if it means more work.
  • Laggards - Skeptics who adopt a new technology only when there is no other choice.

Crossing the chasm


What really rang true for me was the picture of crossing the chasm. Crossing the Chasm is a book by Geoffrey Moore "that focuses on the specifics of marketing high tech products during the early start up period..." [wiki]

The chasm lies between the early adopters and the early majority (see picture above). It illustrates how difficult it is for new products and technologies to make the jump from the visionaries to the much wider adoption by the pragmatists, who are not really interested in new technologies but just want to get the job done. A lot of products which once were new and exciting lie deep down in the chasm.

Conclusion

It is not clear if NoSQL really is the next big thing. There is a lot of hype around this new technology, and NoSQL has yet to prove that it can jump over the chasm. From what I can see around me, NoSQL is in a state where the pragmatists are experimenting with the new technologies and developing proofs of concept to see if NoSQL can make their lives easier.



Tuesday, 14 May 2013

Continuous delivery using MongoDB

Introduction


Last weekend I organized, managed and supported our production release for the first time. Our current primary data store is SQL Server and the release follows roughly this order:
  • Shutdown application
  • Backup databases
  • Run data definition deployment scripts and data scripts
  • Deploy the application
  • Start-up application
  • Test application

From shutdown to start-up there are a good few hours of downtime for the application. That's why releases usually happen on the weekend. It would be nice, however, to be a bit more flexible, and the team would appreciate it if we could do deployments after hours during the week. To be able to do this, though, we would need to cut down the deployment time.

One way of doing this is to use a schemaless database like MongoDB. Since MongoDB doesn't enforce a schema, there are no data definition scripts to run on the database side. The schema is enforced at the application level, which means the application is responsible for writing data in a safe way and for providing read methods which can retrieve the stored data again. By deploying a new version of the application we therefore also roll out the new version of the schema implicitly.

Data migration in MongoDB

After deploying the new version of the application, existing data needs to be migrated. Let's take for example the common case of adding a new field to a collection. In an RDBMS we would add the column using a DDL statement and either set a default value or run a batch update to populate the column.

In MongoDB there is an incremental way of achieving the same thing. Currently I am reading the book "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence" by Pramod J. Sadalage and Martin Fowler, which introduces an interesting pattern to achieve incremental database deployment.

  


The application needs to make sure that during the transition phase it can read both the old and the new version of the document. However, once a document gets saved the application only writes the new version. That way the entire data set gets migrated over time. To implement this, the book suggests adding a schema_version field to each document. This schema version could correspond to the version of the application. Based on this field the application can decide whether a document has already been migrated. If a document of an older version is loaded, the application can run dedicated migration code to bring it up to the next version and save it, as sketched below.
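
Here is a minimal sketch of that pattern in Java, using the legacy MongoDB Java driver (com.mongodb). The collection name, field names and version numbers are made up for illustration; this is not the book's code, just one way the idea could look:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class CustomerRepository {

    // The schema version the currently deployed application writes
    private static final int CURRENT_SCHEMA_VERSION = 2;

    private final DBCollection customers;

    public CustomerRepository(final MongoClient mongo) {
        final DB db = mongo.getDB("shop");
        this.customers = db.getCollection("customers");
    }

    public DBObject findById(final Object id) {
        DBObject doc = customers.findOne(new BasicDBObject("_id", id));
        if (doc == null) {
            return null;
        }
        // Documents written before the pattern was introduced carry no version field
        final Object version = doc.get("schema_version");
        final int docVersion = version == null ? 1 : ((Number) version).intValue();

        if (docVersion < CURRENT_SCHEMA_VERSION) {
            doc = migrate(doc, docVersion);
            customers.save(doc); // write back the migrated document
        }
        return doc;
    }

    private DBObject migrate(final DBObject doc, final int fromVersion) {
        // Version 1 -> 2: add the hypothetical new "loyaltyPoints" field with a default
        if (fromVersion < 2) {
            doc.put("loyaltyPoints", 0);
        }
        doc.put("schema_version", CURRENT_SCHEMA_VERSION);
        return doc;
    }
}

Every read of an old document migrates it on the fly, so the data set converges to the new schema without any downtime for a bulk migration.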

Once all documents are migrated the migration code in the application can safely be removed in the next release.



How are Java generics implemented under the hood?

Introduction


Today at lunch we had a chat about generics and their implementation in Java. Generics have been available since Java 5. To stay byte code compatible between Java 1.4 and 5, generics are implemented using type erasure. That means that at compile time all generic type information is stripped out and the resulting byte code only contains raw types like:

List a = new ArrayList();
instead of
List<MyType> a = new ArrayList<MyType>();

This means that when we read elements from the list, a cast needs to be added to check at runtime that both sides of the assignment have a compatible type:

MyType a = (MyType)list.get(0);
instead of
MyType a = list.get(0);
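
As a quick illustration (my own small sketch, not part of the lunch discussion), two differently parameterized lists share exactly the same runtime class once the type parameters have been erased:

import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(final String[] args) {
        final List<String> strings = new ArrayList<String>();
        final List<Integer> integers = new ArrayList<Integer>();

        // After type erasure both lists have the same raw runtime class
        System.out.println(strings.getClass() == integers.getClass()); // true
        System.out.println(strings.getClass().getName());              // java.util.ArrayList
    }
}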

Java Class File Disassembler

To analyse how this cast is actually implemented in Java byte code one can use the Java Class File Disassembler. This command line tool ships with every Oracle JDK distribution and is called javap.

To inspect the byte code of a given class file use the command: javap -c <classfile>
For example:

javap -c org.henrikeichenhardt.javabytecode.Main

The output can get quite verbose, so it makes sense to boil the problem you want to look at down to a few lines of code - just enough so you can create and compile your scenario.

Under the hood

For our scenario the Java code would look like this:

package org.henrikeichenhardt.javabytecode;

import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(final String[] args) {
        final List<Main> list = new ArrayList<Main>();
        list.add(new Main());

        // reading the element back is where the compiler inserts the cast
        final Main value = list.get(0);
    }
}

We simply create a java.util.ArrayList parameterized with Main and add one element to this list. Then we retrieve this element from the list again. These few lines of code result in the following javap output:

Compiled from "Main.java"
public class org.henrikeichenhardt.javabytecode.Main extends java.lang.Object{
public org.henrikeichenhardt.javabytecode.Main();
  Signature: ()V
  Code:
   0:   aload_0
   1:   invokespecial   #8; //Method java/lang/Object."<init>":()V
   4:   return

public static void main(java.lang.String[]);
  Signature: ([Ljava/lang/String;)V
  Code:
   0:   new     #16; //class java/util/ArrayList
   3:   dup
   4:   invokespecial   #18; //Method java/util/ArrayList."<init>":()V
   7:   astore_1
   8:   aload_1
   9:   new     #1; //class org/henrikeichenhardt/javabytecode/Main
   12:  dup
   13:  invokespecial   #19; //Method "<init>":()V
   16:  invokeinterface #20,  2; //InterfaceMethod java/util/List.add:(Ljava/lang/Object;)Z
   21:  pop
   22:  aload_1
   23:  iconst_0
   24:  invokeinterface #26,  2; //InterfaceMethod java/util/List.get:(I)Ljava/lang/Object;
   29:  checkcast       #1; //class org/henrikeichenhardt/javabytecode/Main
   32:  astore_2
   33:  return

}


The first block of information beginning with
public org.henrikeichenhardt.javabytecode.Main();
is the byte code for the (implicit) default constructor of the Main class. We can ignore this for now. The second block is the byte code for the static main method
public static void main(java.lang.String[]);
In the first few instructions an instance of ArrayList is created and a reference to it is stored in a local variable. Then an instance of Main is created and added to the list. Now let's focus on reading an element from the list:

final Main value = list.get(0);

This resulting Java byte code looks like this:


   24:  invokeinterface #26,  2; //InterfaceMethod java/util/List.get:(I)Ljava/lang/Object;
   29:  checkcast       #1; //class org/henrikeichenhardt/javabytecode/Main
   32:  astore_2

Without going into too much detail, what basically happens is that invokeinterface calls the get method of our list and puts the result on top of the stack. Then the checkcast instruction is executed. It takes a single argument, which is a reference to the type to check against (Main). This type is then compared with the reference on top of the stack (where we previously put the result of list.get(0)).

How does checkcast work?


The documentation of the checkcast instruction reveals how this check is done:
  • If the stack reference is null nothing happens.
  • If the stack reference can be cast to the type provided nothing happens.
  • Otherwise a ClassCastException is thrown.
That means that checkcast is very similar to an instanceof test. If the checkcast test passes, execution continues normally and the top of the stack is stored in a local variable. If a value of the wrong type has slipped into the list through erased code, this is also the point where the failure surfaces, as the small sketch below shows.
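
A minimal sketch (again my own example) that triggers exactly this check: a raw view of the list lets an Integer slip in, and the checkcast the compiler inserted at the generic read fails with a ClassCastException:

import java.util.ArrayList;
import java.util.List;

public class CheckcastDemo {
    public static void main(final String[] args) {
        final List<String> strings = new ArrayList<String>();

        final List raw = strings;      // raw view: no compile-time type checking
        raw.add(Integer.valueOf(42));  // compiles with an unchecked warning

        // The compiler inserted a checkcast to String for this read;
        // it fails at runtime with a ClassCastException.
        final String s = strings.get(0);
        System.out.println(s);
    }
}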

So the problem boils down to: how does checkcast know whether it can cast a value to a type? The following is copied from [1]:

The following rules are used to determine whether an objectref that is not null can be cast to the resolved type: if S is the class of the object referred to by objectref and T is the resolved class, array, or interface type, checkcast determines whether objectref can be cast to type T as follows:
  • If S is an ordinary (nonarray) class, then:
    • If T is a class type, then S must be the same class as T, or S must be a subclass of T;
    • If T is an interface type, then S must implement interface T.
  • If S is an interface type, then:
    • If T is a class type, then T must be Object.
    • If T is an interface type, then T must be the same interface as S or a superinterface of S.
  • If S is a class representing the array type SC[], that is, an array of components of type SC, then:
    • If T is a class type, then T must be Object.
    • If T is an interface type, then T must be one of the interfaces implemented by arrays (JLS §4.10.3).
    • If T is an array type TC[], that is, an array of components of type TC, then one of the following must be true:
      • TC and SC are the same primitive type.
      • TC and SC are reference types, and type SC can be cast to TC by recursive application of these rules.
It is up to the JVM implementation how efficiently these checks are executed and whether any profiling or heuristics are applied. In our example S is an ordinary class and T is a class type, and both are the same class, which satisfies the first case above.

[1] Oracle checkcast