Profiling and Optimization of AmbientTalk

Tom Van Cutsem, PROG
Christophe Scholliers, PROG

The goal of this apprenticeship proposal is to perform a thorough profiling and optimization of the AmbientTalk programming language. AmbientTalk is a dynamically typed language in the spirit of Ruby and Python, but specifically targeted at writing distributed applications in mobile (ad hoc) networks.

A proof-of-concept interpreter for AmbientTalk exists (written in Java), but it has virtually no optimizations and is therefore not competitive with similar languages such as Jython and JRuby.

The goals of this apprenticeship project are to:

benchmark AmbientTalk according to the Programming Languages shootout benchmarks. We require an implementation of at least 5 different benchmarks in AmbientTalk. This allows us to compare AmbientTalk's performance with similar languages.
profile the AmbientTalk interpreter, to get an overview of the 'hot spots' that need optimization.
optimize the 'hot spots' of the interpreter by means of any applicable technique (e.g. caching or compilation to bytecode).

More detail about each of these steps is provided below:

Step 1: Benchmarking

Implement the language shootout benchmarks in AmbientTalk.
Identify useful “competitor” languages (likely competitors are JRuby, Jython and Groovy).
Implement small “micro-benchmarks” that measure the run time of individual operations (e.g. a method call, an assignment, a variable lookup) in all languages under study.
Install competitors and AmbientTalk on the same test machine.
Run the benchmarks on all languages (requires reading about how to correctly measure benchmarks, taking into account startup time, JIT behavior, etc.)
Process results (requires some basic knowledge about statistics)
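As an illustration of the measurement and statistics steps above, here is a minimal sketch of a timing harness in Java. It assumes the workload can be wrapped in a `LongSupplier`; the class name `MicroBench`, the run counts and the toy workload are hypothetical and not part of the AmbientTalk code base. Warm-up runs are executed but not timed, so that JIT compilation has a chance to settle before measurement begins. A dedicated harness such as JMH handles many more pitfalls than this sketch does.

```java
import java.util.function.LongSupplier;

public class MicroBench {
    /** Summary statistics over the measured runs, in nanoseconds per run. */
    public static final class Stats {
        public final double mean;
        public final double stdDev;

        Stats(double mean, double stdDev) {
            this.mean = mean;
            this.stdDev = stdDev;
        }
    }

    // Results are written here so the JIT cannot discard the workload as dead code.
    static volatile long sink;

    /** Mean and sample standard deviation of the given samples. */
    public static Stats summarize(double[] samples) {
        double sum = 0;
        for (double s : samples) sum += s;
        double mean = sum / samples.length;

        double squares = 0;
        for (double s : samples) squares += (s - mean) * (s - mean);
        double stdDev = samples.length > 1
                ? Math.sqrt(squares / (samples.length - 1))
                : 0.0;
        return new Stats(mean, stdDev);
    }

    /** Runs the workload untimed for warmupRuns, then times measuredRuns. */
    public static Stats measure(LongSupplier workload, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            sink = workload.getAsLong();
        }
        double[] samples = new double[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            sink = workload.getAsLong();
            samples[i] = System.nanoTime() - start;
        }
        return summarize(samples);
    }

    public static void main(String[] args) {
        // Toy workload standing in for "run one benchmark program".
        Stats s = measure(() -> {
            long acc = 0;
            for (int i = 0; i < 1_000_000; i++) acc += i % 7;
            return acc;
        }, 20, 30);
        System.out.printf("mean = %.0f ns, std dev = %.0f ns%n", s.mean, s.stdDev);
    }
}
```

Startup time is deliberately excluded here because everything runs inside one JVM; measuring it would require launching a fresh process per run and timing from the outside.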

Step 2: Profiling

Identify useful tools (Java profilers, Eclipse Plug-ins) that allow us to profile the AmbientTalk interpreter.
Profile the interpreter using those tools.
Identify “hot-spots” (= repeatedly invoked code, meaning that optimizing this code pays off the most) and “bottlenecks” (those pieces of code that introduce the most performance penalties)
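The distinction between hot spots and bottlenecks can be made concrete with a small sketch of manual instrumentation. The class `HotSpotCounter` and its labels are hypothetical; a real Java profiler collects this kind of data without changes to the interpreter source. The sketch records, per label, how often a piece of code ran (a candidate hot spot) and how much wall-clock time it consumed in total (a candidate bottleneck).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class HotSpotCounter {
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> nanos = new ConcurrentHashMap<>();

    /** Runs body and attributes one call plus its elapsed time to label. */
    public void record(String label, Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            // Time is inclusive: nested record() calls are also counted in the outer label.
            calls.computeIfAbsent(label, k -> new LongAdder()).increment();
            nanos.computeIfAbsent(label, k -> new LongAdder()).add(System.nanoTime() - start);
        }
    }

    public long callCount(String label) {
        LongAdder adder = calls.get(label);
        return adder == null ? 0 : adder.sum();
    }

    public long totalNanos(String label) {
        LongAdder adder = nanos.get(label);
        return adder == null ? 0 : adder.sum();
    }

    /** Labels ordered by call count, most frequently invoked first. */
    public List<String> byCallCount() {
        List<String> labels = new ArrayList<>(calls.keySet());
        labels.sort((a, b) -> Long.compare(callCount(b), callCount(a)));
        return labels;
    }

    /** Labels ordered by total elapsed time, most expensive first. */
    public List<String> byTotalTime() {
        List<String> labels = new ArrayList<>(nanos.keySet());
        labels.sort((a, b) -> Long.compare(totalNanos(b), totalNanos(a)));
        return labels;
    }
}
```

The two orderings need not agree: a cheap operation invoked millions of times tops the first list, while a rare but slow operation may top the second.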

Step 3: Optimizing

Not all of the optimizations below need to be applied; we will identify the most relevant ones based on the results of Step 2. Possible optimizations include:

Remove as much reflection as possible from the interpreter code (by means of on-the-fly bytecode generation)
Implement a (bytecode)-compiler for AmbientTalk's ASTs.
Implement Polymorphic Inline Caches.
Implement lexical addressing.
Replace the one-thread-per-actor model with a thread pool.
Measure the effects of your optimizations using the setup of Step 1.
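To give an idea of one of the optimizations listed above, the following is a minimal sketch of a polymorphic inline cache in Java. It does not reflect how AmbientTalk represents objects or performs method lookup: `Shape`, `Obj` and `CallSite` are hypothetical stand-ins, and the cache size of 4 is an arbitrary assumption. Each call site remembers the lookup result for the few receiver shapes it has seen, so that repeated sends to objects of the same shape skip the full lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class InlineCacheSketch {
    /** Hypothetical stand-in for an object's layout: maps selectors to methods. */
    public static final class Shape {
        private final Map<String, Function<Object, Object>> methods = new HashMap<>();
        public int lookups = 0; // number of slow-path lookups performed on this shape

        public Shape define(String selector, Function<Object, Object> method) {
            methods.put(selector, method);
            return this;
        }

        Function<Object, Object> lookup(String selector) {
            lookups++;
            return methods.get(selector);
        }
    }

    /** Hypothetical object: a shape plus some state passed to its methods. */
    public static final class Obj {
        public final Shape shape;
        public final Object state;

        public Obj(Shape shape, Object state) {
            this.shape = shape;
            this.state = state;
        }
    }

    /** One cache per call site in the program being interpreted. */
    public static final class CallSite {
        private static final int MAX_ENTRIES = 4;
        private final String selector;
        private final List<Shape> shapes = new ArrayList<>();
        private final List<Function<Object, Object>> targets = new ArrayList<>();
        public int hits = 0;
        public int misses = 0;

        public CallSite(String selector) {
            this.selector = selector;
        }

        public Object invoke(Obj receiver) {
            // Fast path: identity comparison against the shapes seen so far.
            for (int i = 0; i < shapes.size(); i++) {
                if (shapes.get(i) == receiver.shape) {
                    hits++;
                    return targets.get(i).apply(receiver.state);
                }
            }
            // Slow path: full lookup, then remember the result if there is room.
            misses++;
            Function<Object, Object> method = receiver.shape.lookup(selector);
            if (method == null) {
                throw new IllegalArgumentException("no method: " + selector);
            }
            if (shapes.size() < MAX_ENTRIES) {
                shapes.add(receiver.shape);
                targets.add(method);
            }
            return method.apply(receiver.state);
        }
    }
}
```

Once a call site has seen more than `MAX_ENTRIES` shapes, this sketch simply falls back to a full lookup for the extra shapes. It also never invalidates entries, which a real implementation would need to do whenever methods can be added to or removed from an object at run time.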

Instead of a bytecode interpreter, see the following set of blog posts on how to speed up an interpreter written in a language like Java/C#:

Steps 1 and 2 will be performed during an apprenticeship at the PROG lab. Step 3 can be performed in the context of a Master's thesis, also at the PROG lab, immediately following the apprenticeship period.

For more information, contact us (tvcutsem at vub…) or come and visit us on Friday, May 8th in the afternoon at the PROG lab.