Saturday, 1 June 2013

Why shared mutable state is the root of all evil.

Introduction


A while back I went to a talk by Martin Odersky introducing Scala. Martin is the lead designer of Scala and a very smart guy. You can have a look at the slides from the talk here.

He described how the effect of Moore's law (that the number of transistors on a reasonable priced CPU doubles roughly every 2 years) doesn't lead to higher clock speeds anymore since the early 2000s. The number of transistors is still doubling but the extra transistors are used to include additional cores per CPU and to increase L2 and L3 cache sizes. Before the 2000s programmers got a free lunch meaning that the performance of their programs automatically improved with each new processor generation. This free lunch is now over and the performance of a single threaded program hasn't improved much over the last 10 years. To be able to create programs with smaller latency or higher throughput parallel programming mechanism are needed. 

What I found really profound was this pseudo equation which Martin introduced:

       non-determinism = parallel processing + mutable state

This equation basically means that both parallel processing and mutable state combined result in non-deterministic program behaviour. If you just do parallel processing and have only immutable state everything is fine and it is easy to reason about programs. On the other hand if you want to do parallel processing with mutable data you need to synchronize the access to the mutable variables which essentially renders these sections of the program single threaded. This  is not really new but I haven't seen this concepts expressed so elegantly. A non-deterministic program is broken.

Why are parallel programs without proper synchronization broken?

There are a few different things which can go wrong when no proper synchronization policy is enforced: Race conditions, visibility issues and out-of-order processing.

Race condition

A race condition means that the outcome of an algorithm is depend on its timing and lack of influences from other threads. Lets have a lock at the check-then-act race condition. In Java it would look something like this:

public class BrokenMap<E, V> extends HashMap<E ,V> {
    public T putIfAbsent(final E key, final T value) {
        if (!this.containsKey(key)) {
            return this.put(key, value);
        } else {
            return this.get(key);
        }
    )
}

In the putIfAbsent method we first check if the element is part of the map already. If this is not the case we add the element. Otherwise we just return the existing element. This code works perfectly fine in a single threaded environment. In a multithreaded environment however a second thread could have jumped in between the two calls to containsKey and put and put its value their first. Once thread one is continuing execution again it would override the already existing value of thread two in the map.

For the method putIfAbsent to be thread-safe it needs to guaranteed that the two calls to containsKey and put are always run uninterrupted. They need to be atomic. The easiest way to ensure this to put both calls into a synchronized block which essentially renders the program single threaded in this section.

Another form of race condition is read-modify-write. In Java it would look something like this:


public class BrokenCounter {
    private long count = 0;
    public long increment() {
        return ++count;
    }
}

The class BrokenCounter has a method increment. This class is not thread-safe because the call to increment the counter ++count is not just one operation. Under the hood it is actually 3 operations.
  1. Read the old value of count.
  2. Increment the value of count.
  3. Store the new value of count.
If a second thread interrupts the first thread somewhere between step 1 and 3 it would read the same value as thread one, increment it by one and store it back into the count variable. Once the first thread resumes it would also increment the old value and store it back into the count variable. The result of this would be a counter which has been only incremented by one even though it was called twice.

For the method increment to be thread-safe it needs to guarantee that calls to ++count will not be interrupted. The easiest way to achieve this is by putting the call into a synchronized block which essentially renders this section of the program single threaded.


Visibility

Visibility means essentially that an action in thread one is visible to a second thread. In a single threaded program when you write a new value to a variable it is guaranteed that all future reads to this variable will return the new value. In a multithreaded program such a guarantee doesn't exist. Lets have a look at the example program BrokenServer:

public class BrokenServer {

    private boolean stopped=false;
    
    public void run() {
        while(!stopped) {
            /* do something useful */
        }
    }
 
    public void cancel() {
        this.stopped = true;
    }
}


In this program a method run keeps processing stuff until it get interrupted by another thread calling the cancel method. It is a bid odd to imagine but the thread in the run method may actually never see the updated value of stopped. This is because the variable stopped may be cached in a local variable in the JVM or the value is resolved from the L1/L2 cache of the core rather than from main memory. To fix this we need to tell the JVM to include a memory barrier after writing the stopped variable and another one before reading the stopped variable. The memory barrier guarantees that all writes to the variable happen before all successive reads from the same variable (like you would expect in a single threaded program).

The easiest way to achieve this is to use a synchronized block around reading and writing the variable stopped which once again renders this part of the program single threaded.


Ordering


If you made it this far you must be truly interested in Java Concurrency. Lets have a look at one more case which is even more weird than the visibility issue above. The JVM and modern CPUs are allowed to execute code out of order if that helps with speeding up the code or improve memory access pattern.

Lets have a look at another piece of code:

public class BrokenOrder {
    int a = 0;
    int b = 0;
 
    public void assignment() {
        a = 1;
        b = 2;
    }
 
    public int getSum() {
        return a+b;
    } 
}
It is actually surprisingly difficult to reason about this very simple program. What would be the result of getSum if called in a multithreaded environment?

  • Returns 0 if getSum runs before assignment
  • Returns 3 if getSum runs after assignment
  • Returns 1 if getSum runs after a=2 assignment but before b=3 assignment
  • Returns 2 if getSum runs after b=2 assignment but before a=2 assignment

This last case can happen if the JVM / CPU decides to process the assignment method out of order. To prohibit this kind of non-deterministic behavior once again we need to use a memory barrier to prevent out-of-order processing. The easiest way to do this in Java is to use a synchronized block which renders the program single threaded in this section.

Conclusion


Shared mutable state and parallel processing doesn't go together at all. If you do not use proper synchronization you will create extremely difficult to find bugs and your programs are basically broken even if they appear to work just fine in most cases.

21 comments:

  1. Very well explained. I could able to understand why people again started moving to Functional Programming.

    ReplyDelete
  2. Nice explanation, but your last example is confusing because there is no b=3 or a=2 assignments.

    ---
    Returns 1 if getSum runs after a=2 assignment but before b=3 assignment
    Returns 2 if getSum runs after b=2 assignment but before a=2 assignment

    ReplyDelete
  3. Hi, Great.. Tutorial is just awesome..It is really helpful for a newbie like me.. I am a regular follower of your blog. Really very informative post you shared here. Kindly keep blogging. If anyone wants to become a Java developer learn from Java Training in Chennai. or learn thru Java Online Training India . Nowadays Java has tons of job opportunities on various vertical industry.

    ReplyDelete
  4. Your good knowledge and kindness in playing with all the pieces were very useful. I don’t know what I would have done if I had not encountered such a step like this.
    Best Devops Training in pune
    Data science training in Bangalore

    ReplyDelete
  5. Read all the information that i've given in above article. It'll give u the whole idea about it.
    Best Devops Training in pune
    Data science training in Bangalore

    ReplyDelete
  6. This comment has been removed by the author.

    ReplyDelete
  7. Really you have done great job,There are may person searching about that now they will find enough resources by your post
    Python Online training
    python Training in Chennai
    Python training in Bangalore

    ReplyDelete
  8. I appreciate your efforts because it conveys the message of what you are trying to say. It's a great skill to make even the person who doesn't know about the subject could able to understand the subject . Your blogs are understandable and also elaborately described. I hope to read more and more interesting articles from your blog. All the best.
    Selenium training in Chennai
    Selenium training in Bangalore
    Selenium training in Pune
    Selenium Online training

    ReplyDelete
  9. Good job in presenting the correct content with the clear explanation. The content looks real with valid information. Good Work

    DevOps is currently a popular model currently organizations all over the world moving towards to it. Your post gave a clear idea about knowing the DevOps model and its importance.

    Good to learn about DevOps at this time.


    devops training in chennai | devops training in chennai with placement | devops training in chennai omr | devops training in velachery | devops training in chennai tambaram | devops institutes in chennai | devops certification in chennai | trending technologies list 2018

    ReplyDelete
  10. Such a wonderful blog on Machine learning . Your blog have almost full information about Machine learning .Your content covered full topics of Machine learning that it cover from basic to higher level content of Machine learning . Requesting you to please keep updating the data about Machine learning in upcoming time if there is some addition.
    Thanks and Regards,
    Machine learning tuition in chennai
    Machine learning workshops in chennai
    Machine learning training with certification in chennai

    ReplyDelete
  11. Nice post. Thanks for sharing! I want people to know just how good this information is in your article. It’s interesting content and Great work.
    Thanks & Regards,
    VRIT Professionals,
    No.1 Leading Web Designing Training Institute In Chennai.

    And also those who are looking for
    Web Designing Training Institute in Chennai
    SEO Training Institute in Chennai
    Photoshop Training Institute in Chennai
    PHP & Mysql Training Institute in Chennai
    Android Training Institute in Chennai

    ReplyDelete