The Mitosis Compiler: Speculative Parallelization using Pre-Computation
Carlos García Quiñones¹, Carlos Madriles¹, Pedro Marcuello¹, Antonio González¹,² and Dean Tullsen³
¹ Intel Barcelona Research Center, Barcelona, Spain
² Department of Computer Architecture, Universitat Politècnica de Catalunya, Barcelona, Spain
³ Department of Computer Science and Engineering, UC San Diego, USA
Speculative parallelization can provide significant sources of additional thread-level parallelism, especially for irregular applications that are hard to parallelize by conventional approaches. In this paper, we present the Mitosis compiler, which partitions applications into speculative threads, with special emphasis on applications for which conventional parallelizing approaches fail.
The management of inter-thread data dependences is crucial for the performance of the system. The Mitosis framework uses a pure software approach to predict/compute the thread's input values. This software approach is based on the use of pre-computation slices (p-slices), which are built by the Mitosis compiler and added at the beginning of the speculative thread. P-slices must compute thread input values accurately but they do not need to guarantee correctness, since the underlying architecture can detect and recover from misspeculations. This allows the compiler to use aggressive/unsafe optimizations to significantly reduce their overhead. The most important optimizations included in the Mitosis compiler and presented in this paper are branch pruning, memory and register dependence speculation, and early thread squashing.
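As a rough illustration of the p-slice idea, the following sketch simulates, sequentially and in plain Python, a speculative thread whose input value is predicted by a branch-pruned slice and then validated. It is not code produced by the Mitosis compiler. The linked-list workload and the names `advance`, `p_slice`, `run_body`, and `run` are hypothetical, and real threads and hardware recovery are replaced by an ordinary identity check and re-execution.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def build_list(values):
    """Build a singly linked list from a Python list (helper for the demo)."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def advance(node):
    """Full traversal step of the original loop, including a rarely taken branch."""
    if node.value < 0 and node.next is not None:
        return node.next.next  # rare path: skip the following node
    return node.next


def run_body(node, max_iters=None):
    """Original loop: sum node values for up to max_iters iterations."""
    total, iters = 0, 0
    while node is not None and (max_iters is None or iters < max_iters):
        total += node.value
        node = advance(node)
        iters += 1
    return total, node


def p_slice(node, iters):
    """Hypothetical p-slice: predicts the speculative thread's start node.

    The rare branch in advance() is pruned, so the slice is cheaper than
    the original code but can mispredict when that branch is taken.
    """
    for _ in range(iters):
        if node is None:
            break
        node = node.next
    return node


def run(head, split):
    """Simulate one spawn: returns (total, speculation_was_correct)."""
    predicted = p_slice(head, split)          # speculative thread's input
    spec_total, _ = run_body(predicted)       # speculative work
    first_total, actual = run_body(head, split)  # non-speculative thread
    if actual is predicted:                   # validation at the join point
        return first_total + spec_total, True
    redo_total, _ = run_body(actual)          # squash and re-execute
    return first_total + redo_total, False
```

With this sketch, `run(build_list([1, 2, 3, 4, 5, 6]), 3)` validates the prediction and commits the speculative work, whereas a list containing a negative value in the first three iterations triggers the pruned branch, so the prediction is rejected and the tail is re-executed. In both cases the total matches the sequential `run_body` result, which is the property that lets a p-slice be inexact.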
The Mitosis compiler/architecture has been evaluated on a subset of the Olden benchmarks. These programs have proven very hard for a state-of-the-art compiler to parallelize automatically, mainly because they make heavy use of pointers and dynamically linked data structures. The results obtained by the Mitosis compiler show an average speedup of 2.2 when four thread units are assumed.