Parallelization of multimedia applications for embedded systems with multi-processor system on chip platforms

Supervisors:

The emerging trend towards multimedia applications on mobile terminals (cellphones, portable gaming devices, PDAs, …), combined with a decreasing time-to-market and a multitude of standards, has created the need for computing platforms that provide considerable (application-specific) computational performance at low cost and within a low energy budget. Hence, in recent years, the first Multi-Processor System-on-Chip (MPSoC) components have emerged (e.g. TI OMAP, ST Nomadik, Philips Nexperia). These platforms contain multiple heterogeneous, flexible processing elements (such as DSPs, accelerators and general-purpose CPUs), a memory hierarchy and I/O components, all linked by a flexible on-chip interconnect structure. These architectures meet the performance needs of multimedia applications while limiting power consumption. However, several issues remain to be solved before multimedia applications can be developed quickly and executed efficiently on such MPSoC platforms.

We focus on one of these issues: mapping a multimedia application, implemented as a sequential C program, onto the multiple processors of an MPSoC platform. In short, the multimedia application has to be parallelized in order to take advantage of the multiple processors. Over the last decade, IMEC has developed extensive expertise in this area, in particular in the form of the Multi-processor Parallelization Assistant (MPA), which helps application designers parallelize their sequential C programs. Nevertheless, inherent characteristics of the C programming language, such as pointers and macros, to name but a few, often force the developer to change the implementation fundamentally in order to avoid unsatisfactory parallelization results. Furthermore, this process can take an inordinate amount of time (weeks or even months), which has a very negative impact on the time-to-market. Looking more closely at what really happens, we observe that during development the inherently parallel multimedia application is first represented sequentially in C. Next, during mapping, this implementation has to be analyzed extensively, and often unsuccessfully, to recover the parallelism. We can therefore ask whether this process can be improved by implementing multimedia applications in parallel programming languages in the first place. Of course, the enormous body of existing legacy implementations in C will still have to be supported.
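To make the pointer problem concrete, the following minimal sketch mimics two C pointers with Python `memoryview` slices over byte buffers. All function names and buffer sizes are illustrative assumptions and are not part of MPA. When the two views overlap, the sequential loop carries a dependence from one iteration to the next, so a version in which every iteration reads its input independently (as parallel workers would) yields a different result. A tool that cannot prove the two pointers are disjoint has to assume this worst case.

```python
def make_views(overlap, n=4):
    """Return (backing, dst, src). With overlap, dst starts one byte after src."""
    if overlap:
        backing = bytearray(n + 1)
        view = memoryview(backing)
        return backing, view[1:], view[:n]
    backing = bytearray(n)
    return backing, memoryview(backing), memoryview(bytearray(n))


def increment_sequential(dst, src):
    """Mimics the C loop: for (i = 0; i < n; i++) dst[i] = src[i] + 1;"""
    for i in range(len(src)):
        dst[i] = (src[i] + 1) % 256


def increment_independent(dst, src):
    """Every iteration reads the original input, as independent workers would."""
    snapshot = bytes(src)  # all reads happen before any write
    for i in range(len(snapshot)):
        dst[i] = (snapshot[i] + 1) % 256


def run(kernel, overlap):
    """Apply a kernel to fresh buffers and return the written buffer as a list."""
    backing, dst, src = make_views(overlap)
    kernel(dst, src)
    return list(backing)


if __name__ == "__main__":
    for overlap in (False, True):
        print(overlap,
              run(increment_sequential, overlap),
              run(increment_independent, overlap))
```

With disjoint buffers both kernels agree; with overlapping buffers they do not, which is exactly the case a conservative analysis must rule out before distributing the loop over processors.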

We propose to investigate alternative programming models that make the development and mapping of multimedia applications on MPSoC platforms more efficient. The general idea is to consider the entire software stack, from the programming language, which is translated or compiled, down to the execution environment. This software stack can differ between programming languages. For instance, simplifying somewhat, C is compiled to machine code, while Java is compiled to bytecode that runs on a virtual machine. Recently, some approaches have considered the .NET CLI as a common intermediate representation both for C/C#-like languages and for languages of other programming paradigms, such as F# for typed functional programming. Functional programming is one of the primary known techniques for minimizing, tracking and isolating the use of mutable state, which is essential for various concurrency mechanisms as well as for emerging techniques such as software transactional memory. Moreover, F# can directly use multi-core programming libraries such as ParallelFX. F# also includes language and library support for asynchronous workflows, a technique for writing reactive programs and asynchronous message-passing agents in a natural and compositional style.
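The benefit of isolating mutable state can be illustrated with a small sketch. It is written in Python rather than F#, purely for illustration, and `block_cost` is a hypothetical stand-in for a per-block kernel, not code from any AVC encoder. Because the function reads only its argument and writes no shared data, the blocks can be processed in any order, or concurrently, without changing the result.

```python
from concurrent.futures import ThreadPoolExecutor


def block_cost(block):
    """Pure function: sum of absolute differences from the (floored) block mean."""
    mean = sum(block) // len(block)
    return sum(abs(pixel - mean) for pixel in block)


def costs_sequential(blocks):
    """Reference result: process the blocks one after another."""
    return [block_cost(block) for block in blocks]


def costs_parallel(blocks, workers=4):
    """Process the blocks on a thread pool; no locks are needed."""
    # Executor.map yields results in input order, whatever the execution order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(block_cost, blocks))


if __name__ == "__main__":
    blocks = [(10, 12, 14, 16), (0, 0, 8, 8), (5, 5, 5, 5)]
    print(costs_sequential(blocks))
    print(costs_parallel(blocks) == costs_sequential(blocks))
```

The sketch demonstrates the independence property only. It makes no claim about speedup, which depends on the runtime and the platform.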

Some questions that arise are: what is the overhead of using the .NET environment, and is the CLI a suitable intermediate representation for parallel programming languages?

We propose several loosely related master's thesis topics that address this general idea, and we welcome other topics that contribute to it. The topics below can be roughly described as (1) parallelization of AVC encoding with .NET’s F#, (2) parallelization of multimedia applications considering parallel programming languages in general, and (3) virtual machine support for the parallel execution of multimedia applications on MPSoC platforms. The first topic is more concrete than the other two, which have a stronger research flavour. The topics are subdivided into several tasks, some of which overlap since the topics are related.

  1. parallelizing multimedia applications:
    • study of the suitability of F# (.NET’s typed functional programming language) for implementing certain parts of AVC encoding
    • study of the generated bytecode with respect to code size, analyzability for parallelization (the degree to which parallelism is explicitly represented), etc.
    • comparison between the bytecode generated from the F# implementation and from a functionally equivalent C# implementation with respect to performance, analyzability for parallelization, etc.
  2. parallel programming languages:
    • general study of parallel programming languages and categorization of characteristics; the application domain is multimedia
    • study of intermediate representations (bytecode or otherwise) with respect to translation from the parallel programs, quality of the bytecode, conservation of parallelism (for analyzability with respect to parallelization), etc.
  3. support for parallelization at the level of the virtual machine:
    • study of existing intermediate representations (bytecode or otherwise) for parallel programming languages; assessment of .NET’s CLI or other approaches
    • design of parallel-friendly extensions in view of analyzability for parallelization; assessment of .NET’s CLI or other approaches
    • study of virtual machines with respect to the identification of parallel sections in the intermediate representations, the assignment of parallel sections to processors, and the management of synchronization between parallel sections
    • study of the usability and benefits of dedicated virtual machines on existing MPSoCs (many-core, such as CUDA, or multi-core, such as Intel quad-core processors), in relation to the above, for automatic parallelization
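
The virtual machine tasks in topic 3 (identifying parallel sections, assigning them to processors and managing synchronization) can be pictured with a deliberately simple sketch. It is not a proposed design: the section names, the dependency table and the round-robin policy are all hypothetical assumptions chosen for illustration. Sections whose dependencies are satisfied are grouped into a wave and spread over the processors; the boundary between two waves plays the role of a synchronization barrier.

```python
def schedule(sections, deps, num_procs):
    """Return a list of waves; each wave maps a section name to a processor index.

    Sections in one wave have no unmet dependencies and may run concurrently.
    The boundary between two waves acts as a synchronization barrier.
    """
    done = set()
    remaining = list(sections)
    waves = []
    while remaining:
        ready = [s for s in remaining if deps.get(s, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependency between sections")
        # Round-robin assignment of the ready sections to processors.
        waves.append({s: i % num_procs for i, s in enumerate(ready)})
        done.update(ready)
        remaining = [s for s in remaining if s not in done]
    return waves


if __name__ == "__main__":
    # Hypothetical sections of an encoder-like pipeline.
    sections = ["read", "me_top", "me_bottom", "entropy"]
    deps = {
        "me_top": {"read"},
        "me_bottom": {"read"},
        "entropy": {"me_top", "me_bottom"},
    }
    for wave in schedule(sections, deps, num_procs=2):
        print(wave)
```

A real virtual machine would derive the sections and their dependencies from the intermediate representation and would account for heterogeneous processing elements and communication cost, which this sketch ignores.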
