ESODYP: An Entirely Software and Dynamic Data Prefetcher based on a Markov Model
Jean Christophe Beyler and Philippe Clauss
Université de Strasbourg, LSIIT/ICPS, Illkirch-Graffenstaden, France
Many works have shown that data prefetching can be an efficient answer to the well-known memory bottleneck. Although several approaches ranging from dynamic hardware to static software mechanisms have been proposed [5, 6, 2, 3], to our knowledge no automatic, purely software and standalone dynamic data prefetching solution has yet been proposed. Yet such a portable approach would be quite valuable, given the number of data-intensive applications with a dynamically changing memory behaviour and the ever-growing variety of processor and memory architectures.
In this work, we propose an Entirely SOftware and DYnamic data Prefetcher (ESODYP) based on a Markov model. It runs in two main phases: a short training phase, during which a graph coding the sequences of occurring memory accesses is constructed, and an optimizing phase, during which predicted addresses are prefetched while the information in the graph is updated by continuously monitoring the program. Two schemes are studied: patterns built from the logical addresses themselves, and patterns built directly from memory strides.
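The paper does not fix a particular encoding for this graph; one plausible shape, shown below as a purely illustrative C sketch (the names, the sizes and the restriction to a first-order model are our assumptions, not details from the paper), is a node keyed by an observed access that keeps a small set of weighted successor edges, incremented during training and consulted during the optimizing phase.

    /* Purely illustrative node of a first-order Markov graph: each
     * observed key (an address or a stride, depending on the scheme)
     * keeps a small set of weighted successors. */
    #include <stdint.h>
    #include <stddef.h>

    #define MAX_SUCC 4                        /* successors kept per node */

    struct esodyp_node {
        uintptr_t key;                        /* address or stride        */
        uintptr_t succ_key[MAX_SUCC];         /* observed successors      */
        unsigned  succ_count[MAX_SUCC];       /* occurrence counters      */
        size_t    nsucc;
    };

    /* Training/monitoring: record that `next` followed this node's key. */
    void node_record(struct esodyp_node *n, uintptr_t next)
    {
        for (size_t i = 0; i < n->nsucc; i++)
            if (n->succ_key[i] == next) { n->succ_count[i]++; return; }
        if (n->nsucc < MAX_SUCC) {
            n->succ_key[n->nsucc]   = next;
            n->succ_count[n->nsucc] = 1;
            n->nsucc++;
        }
    }

    /* Optimizing phase: return the most frequent successor (0 if none). */
    uintptr_t node_predict(const struct esodyp_node *n)
    {
        uintptr_t best = 0;
        unsigned  best_count = 0;
        for (size_t i = 0; i < n->nsucc; i++)
            if (n->succ_count[i] > best_count) {
                best_count = n->succ_count[i];
                best       = n->succ_key[i];
            }
        return best;
    }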
In some programs, the memory accesses can be intensive while touching only a small amount of data. Knowing which accesses will come next enables an optimizer to prefetch correctly, thus reducing the number of cache misses. Since the number of distinct addresses accessed is relatively low, we can use them directly as keys for pattern creation and monitoring.
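As a sketch of this address-keyed scheme (the table size, the hash and the use of GCC's __builtin_prefetch hint are our choices, not details from the paper), a minimal first-order predictor could look like this:

    /* Address-keyed first-order predictor: a small table maps the last
     * seen address to the address that usually follows it. */
    #include <stdint.h>

    #define TABLE_SIZE 1024                   /* power of two, illustrative */

    static uintptr_t next_of[TABLE_SIZE];     /* predicted successor of a key */
    static uintptr_t last_addr;               /* previously accessed address  */

    static unsigned slot(uintptr_t a) { return (unsigned)(a >> 3) & (TABLE_SIZE - 1); }

    /* Hypothetical hook called on every monitored access to `addr`. */
    void on_access_addr(uintptr_t addr)
    {
        if (last_addr) {
            next_of[slot(last_addr)] = addr;  /* learn/refresh the pattern     */
            uintptr_t predicted = next_of[slot(addr)];
            if (predicted)                    /* prefetch what usually follows */
                __builtin_prefetch((void *)predicted, 0, 3);
        }
        last_addr = addr;
    }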
However, in other programs, the accessed data set is too large to model the patterns this way. Some methods rely on hash tables and LRU schemes to keep the memory usage of the optimizer low. Our technique instead studies memory strides rather than the data addresses themselves. This allows us to handle traversals performed over linked lists as well as certain array traversals in for loops. Large linked lists are an interesting target for this optimization: dynamic allocation makes static optimization difficult, and compilers rarely optimize such code because of their limited knowledge of the allocation and traversal behaviour of the lists. They leave such programs unoptimized since they cannot tell at compile time whether a transformation would speed up or slow down the program.
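To make the stride idea concrete, the following self-contained C sketch hand-inlines into a linked-list traversal the kind of prefetch ESODYP would issue at run time; the sum_list and main names are ours, and a real deployment performs the stride tracking inside the prefetcher rather than in the application code.

    /* Stride-based prefetching hand-inlined in a linked-list traversal.
     * When nodes are laid out with a recurring stride, prefetching
     * `current + stride` hides part of the miss latency. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct node { long value; struct node *next; };

    long sum_list(struct node *head)
    {
        long sum = 0;
        uintptr_t prev = 0, stride = 0;

        for (struct node *n = head; n; n = n->next) {
            uintptr_t cur = (uintptr_t)n;
            if (prev)
                stride = cur - prev;          /* last observed stride      */
            if (stride)                       /* guess: the stride repeats */
                __builtin_prefetch((void *)(cur + stride), 0, 3);
            sum += n->value;
            prev = cur;
        }
        return sum;
    }

    int main(void)
    {
        /* Successive same-sized malloc()s frequently produce a regular layout. */
        struct node *head = NULL, **tail = &head;
        for (long i = 0; i < 1000; i++) {
            struct node *n = malloc(sizeof *n);
            if (!n) return 1;
            n->value = i;
            n->next  = NULL;
            *tail    = n;
            tail     = &n->next;
        }
        printf("sum = %ld\n", sum_list(head));
        return 0;
    }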
A program's memory behaviour can be described as a succession of phases, each with its own behaviour. Our model continuously monitors the data accesses to check the relevance of the current graph, by verifying that the current memory behaviour is still reasonably similar to the one observed during the training phase. If it is not, the graph is destroyed and a new one is created; this can be seen as a flush of the model that allows a new model to be built. Of course, if phase changes were too frequent, the model would continuously destroy the graphs it creates. A threshold is therefore implemented so that small phase changes or irregularities go unnoticed, as long as the program returns to a phase corresponding to the one of the trained model.
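One simple way to realize this relevance check, sketched below with illustrative constants and an assumed model_predict/model_flush interface to the graph (both names are ours), is to count mispredictions over a fixed window and flush the model when a threshold is exceeded:

    /* Illustrative phase-change detection by misprediction counting.
     * The prediction made for the previous access is compared with the
     * address actually observed; too many misses within a window means
     * the graph no longer matches the current phase and is rebuilt. */
    #include <stdint.h>

    #define WINDOW     1024       /* accesses per evaluation window      */
    #define MAX_MISSES  512       /* tolerated mispredictions per window */

    extern uintptr_t model_predict(uintptr_t addr);   /* assumed graph API */
    extern void      model_flush(void);               /* destroy + retrain */

    static uintptr_t pending_prediction;
    static unsigned  window_accesses, window_misses;

    void check_phase(uintptr_t addr)
    {
        if (pending_prediction && pending_prediction != addr)
            window_misses++;                          /* the model was wrong    */

        pending_prediction = model_predict(addr);     /* guess the next access  */

        if (++window_accesses == WINDOW) {
            if (window_misses > MAX_MISSES)
                model_flush();                        /* phase changed: rebuild */
            window_accesses = window_misses = 0;
        }
    }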
Significant speed-ups are obtained on an Itanium-2 processor [1, 4] using ESODYP for several benchmark programs that could not have been optimized statically. In particular, we show that the induced software overhead remains a minor part of the execution time compared to the performance improvements, thanks to a careful reduction of the optimizer's computations and memory accesses.
This work also opens some interesting perspectives in the development of pure software dynamic optimizers.
References
[1] Reference Manual for Software Development and Optimization, chapter Optimal Use of lfetch, pages 71–72. Intel Corporation, May 2004.
[2] V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. ACM SIGPLAN Notices, 35(5):1–12, 2000.
[3] G. Desoli, N. Mateev, E. Duesterwald, P. Faraboschi, and J. A. Fisher. DELI: a new run-time control point. In 35th Annual ACM/IEEE International Symposium on Microarchitecture, pages 257–268, December 2002.
[4] Intel. Hyper-threading technology. http://www.intel.com/technology/hyperthread/.
[5] D. Joseph and D. Grunwald. Prefetching using Markov predictors. IEEE Transactions on Computers, 48(2):121–133, February 1999.
[6] J. Lu, H. Chen, R. Fu, W. Hsu, B. Othmer, P. Yew, and D. Chen. The performance of runtime data cache prefetching in a dynamic optimization system. In 36th Annual IEEE/ACM International Symposium on Microarchitecture, December 2003.