Thursday, November 13, 2008

Why I chose Erlang for concurrent processing

A few weeks ago I was reading a blog article called Erlang, the next Java. While I agreed with the author's views, something in the article kept bugging me, and it eventually led me to write this one.

I didn't choose Erlang just because it's a functional language. I chose to learn Erlang because the language requires programmers to write concurrent code, and enforces it. Concurrent programming should minimize the number and size of critical regions and bottlenecks, and I think writing such programs is very difficult without help from the programming language itself.

Many programming languages claim, or plan, to support running parts of a program concurrently. For example, Ruby will incorporate a distributed storage mechanism called Roma and a task administration subsystem called Fairy. Another good example is Haskell, whose concurrency and parallelism are provided through language extensions such as Concurrent Haskell and Parallel Haskell.

I think, however, that introducing concurrency while still allowing programmers to write code that uses shared memory will cause a lot of problems. Joe Armstrong has already described his concerns about shared memory on his blog. I support his arguments; they are an important part of the reason I decided to learn Erlang.

I should add another problem programmers will face when dealing with code that relies on shared memory: rewriting it to remove the shared access, so that it runs efficiently in a concurrent environment, will be an incredibly difficult task.

Unfortunately, most existing languages already have a large body of code written on the assumption of shared memory areas. For example, C code with extern variables implicitly assumes that those variables are shared among the functions in all the source files linked together, and I doubt I could find any sizable body of C code that does not use an extern declaration. I have learned that even Common Lisp has the special declaration for variables, which lets multiple functions share the same object outside of their lexical scopes.

Another example of shared-memory concurrency is operating system (OS) threads. C/C++/Java threads inherently share the address space and environment of their parent OS process, and Python has Thread objects as well. While OS threads often ease the implementation of concurrent servers by reducing task-switching time, their data-sharing semantics are implicit and error-prone.

I understand and agree that sharing objects cannot be completely eliminated, given the real-world constraints on processing and communication timing between functions. I think, however, that programming languages should help programmers minimize code that uses shared memory areas, because those areas turn into critical regions.

Erlang imposes the restrictions necessary to avoid implicit data sharing between functions by:

  • prohibiting multiple assignments to the same variable within a function (single assignment);

  • enforcing and supporting message-passing programming between processes, by not providing any implicit data-sharing facility between functions;

  • providing fast task switching, by making lightweight processes, each running a function, the minimum units of concurrent execution; and

  • restricting data-sharing facilities to a minimum, such as the process dictionary, ETS, DETS, Mnesia, and the global naming service shared between connected Erlang nodes, and requiring that their use be written explicitly in the code.
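The restrictions above can be illustrated with a minimal sketch. The module counter_demo and all of its functions are my own hypothetical example, not taken from any library. A counter's state lives only in the argument of its receive loop; other processes can read or change it solely by sending messages, and the new count is passed to the next loop iteration instead of being reassigned. The last function shows that shared state through ETS is still available, but only when it is requested explicitly with ets:new/2.

```erlang
%% Hypothetical example module: message passing with no shared variables.
-module(counter_demo).
-export([start/0, increment/1, value/1, run/0, ets_demo/0]).

%% Spawn a counter process that starts at zero and return its pid.
start() ->
    spawn(fun() -> loop(0) end).

%% N is never reassigned (writing N = N + 1 would raise a badmatch);
%% each iteration passes the new value to the next call of loop/1 instead.
loop(N) ->
    receive
        {increment, From} ->
            From ! {self(), ok},
            loop(N + 1);
        {value, From} ->
            From ! {self(), N},
            loop(N);
        stop ->
            ok
    end.

%% Synchronous request: send a message, then wait for the reply
%% tagged with the counter's pid.
call(Pid, Request) ->
    Pid ! {Request, self()},
    receive
        {Pid, Reply} -> Reply
    end.

increment(Pid) -> call(Pid, increment).

value(Pid) -> call(Pid, value).

%% Three client processes each increment the counter once;
%% the calling process then reads the total and stops the counter.
run() ->
    Counter = start(),
    Parent = self(),
    Clients = [spawn(fun() ->
                         ok = increment(Counter),
                         Parent ! {done, self()}
                     end)
               || _ <- lists:seq(1, 3)],
    %% Each client reports done only after its increment was acknowledged.
    lists:foreach(fun(C) -> receive {done, C} -> ok end end, Clients),
    Total = value(Counter),
    Counter ! stop,
    Total.

%% Sharing through ETS is possible, but it must be requested explicitly.
ets_demo() ->
    Tab = ets:new(demo_table, [set, public]),
    true = ets:insert(Tab, {hits, 0}),
    Hits = ets:update_counter(Tab, hits, 1),
    true = ets:delete(Tab),
    Hits.
```

From the Erlang shell, counter_demo:run() should return 3 and counter_demo:ets_demo() should return 1. The increments are acknowledged synchronously so that the result does not depend on the relative ordering of messages sent by different processes.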

In short, I think adding concurrency features is not enough for concurrent programming; the programming language specification itself must prohibit non-concurrent habits and enforce the writing of concurrent programs. I believe concurrent programming in the coming multi-processor environments is only possible with such a hard-line attitude toward programmers. I feel that programmers, including myself, are very conservative and reluctant to change their sequential-programming habits.

In fact, I was one of those programmers who did not recognize the urgency of learning concurrent programming back in 2007.