Scalable Computing model : Lock free protocol
By Peeyush Agrawal 2010MCS3469
Guided By Dr. Kolin Paul

Problem:
Programmers use mutexes in multi-threaded programs to prevent inconsistency in shared data. But mutexes (locks) can become a bottleneck: on a many-core system, threads compete for the lock, and the losers are kept out of the ready queue (blocked, waiting to be woken up by another thread when the lock is released).

Disadvantages of using mutex:
* Taking too few locks: It is common to forget to acquire a lock before writing and end up with multiple threads modifying the same data.
* Deadlock and live-lock: It is difficult for a programmer to foresee the situations in which the system can go into deadlock, and the programmer has no easy way to get out of a deadlock once it occurs.
* Lost wake-ups: It is easy to forget to signal the condition variable on which a thread is waiting.

Replacements of lock for multi-core system
Atomic operations: known as CAS (compare and swap), which is supported by modern processors. The whole routine below executes as a single atomic step.

    CAS (Value held in memory, Old value, New value)
    {
        Existing value = Value held in memory;
        if (Existing value == Old value) {
            Value held in memory = New value;
            return Old value;              // success
        } else
            return Value held in memory;   // fail
    }

Atomic operation ...
So the update routine becomes (compute() is a placeholder for whatever derives the new value):

    do {
        oldValue = address;               // read the current shared value
        newValue = compute(oldValue);     // derive the value to install
    } while (CAS(&address, oldValue, newValue) != oldValue);

Retry: CAS succeeds only when it returns oldValue. If the CAS fails, another thread changed the value in between, so the operation must be retried with a freshly read old value.

Example of atomic increment

    // sequenceNumber is assumed to be an atomic integer object
    while (true) {
        old_val = sequenceNumber.get();
        new_val = old_val + 1;
        if (sequenceNumber.compareAndSet(old_val, new_val))   // CAS
            break;
    }

Benchmark:
Written in C++ and tested on a 16-core Linux server and an 8-core Solaris system. The following benchmarks were tested:
1) Counting: Many threads increment a shared variable.
2) Linked List: Multiple threads perform insertions; a node is inserted only if its key is not already present.
    if (!found(head, node)) insert(head, node);
3) HashTable: Extension of the linked list.
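
The counting benchmark listed above can be sketched in C++ roughly as follows. This is a minimal illustration, not the code used for the reported measurements: the names cas_increment and run_counting_benchmark are hypothetical, and the use of std::atomic with relaxed memory ordering is an assumption made for this sketch.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Atomically add 1 to `counter` using an explicit CAS retry loop.
// (Hypothetical helper; the original benchmark code is not shown in the slides.)
void cas_increment(std::atomic<std::uint64_t>& counter) {
    std::uint64_t old_val = counter.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak writes the value it found into
    // old_val, so every retry starts from a freshly observed value.
    while (!counter.compare_exchange_weak(old_val, old_val + 1,
                                          std::memory_order_relaxed)) {
        // CAS failed (another thread got in first, or a spurious failure): retry.
    }
}

// Start `num_threads` threads that each increment one shared counter
// `increments_per_thread` times, then return the final counter value.
std::uint64_t run_counting_benchmark(unsigned num_threads,
                                     std::uint64_t increments_per_thread) {
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (std::uint64_t i = 0; i < increments_per_thread; ++i) {
                cas_increment(counter);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();   // wait until every thread has finished
    }
    return counter.load();
}
```

Because no increment is lost, run_counting_benchmark(4, 10000) returns 40000 regardless of how the threads interleave. Wrapping the call in a timer and replacing cas_increment with a mutex-protected increment would give the lock-based variant for comparison.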

Benchmark HashTable:
Contention: Many threads insert nodes into the same bucket.

Advantage of using atomic operations:
If there is contention, then while one thread is inserting at the tail, another thread (or many threads) can proceed to search/scan the existing nodes.

Disadvantage of atomic operations:
* If the critical section (the writes) is large, there will be many retries, which degrades performance.

Conclusion:
* Increasing the number of threads beyond the number of physical cores does not give much improvement.
* Having many buckets increases the probability of no contention, in which case atomic operations do not seem to be advantageous over a mutex.
* Acquiring and releasing locks does not by itself create overhead. There is overhead only when there is contention.

References:
* The Art of Multiprocessor Programming: Maurice Herlihy and Nir Shavit.
* A Pragmatic Implementation of Non-Blocking Linked-Lists: Timothy L. Harris.