|
Shape Analysis for Dynamic Data Structures based on Coexistent Links Sets
A. Tineo, F. Corbera, A. Navarro, R. Asenjo and E.L. Zapata
Department of Computer Architecture, University of Málaga, Spain
Static knowledge of memory references in a program is a must for
compilers, if they are to provide
optimizations related to parallelism
in an automated basis. Such knowledge
is not easy to gather due to the
existence of aliases. Arrays, pointers
and pointer-based dynamic data
structures introduce aliases in
programs. Parallelizing compilers have
obtained a reasonable degree of
success when dealing with array
aliases and stack-directed
pointers. However, heap-directed
pointers and the structures they
dereference are a whole different
ground that still needs significant
work. The problem of characterizing
the dynamically allocated memory
locations in a program can be
approached in several ways. We believe
that, in order to provide accurate
information for real-life programs,
some sort of abstraction in the form
of a bounded graph must be
performed. The kind of analysis that
represents the heap as a storage shape
graph is known as shape analysis. Its
main goal is to capture the shape of
memory configurations that are
accessible through heap-directed
pointers in a program.
Information about the shape of dynamic data structures is useful for
parallelizing compiler
transformations over the input
program. Maybe the most obvious
application is data dependence
detection, needed for instruction
scheduling, data-cache optimization,
loop transformation and automatic
vectorization and
parallelization. Another interesting
application comes from the use of
the shape information for debugging
analysis of the program. The shape
abstraction can provide information
about incorrect pointer usage that
can lead to mistakes difficult to
track.
In the context of shape analysis, our research group has already
developed a powerful framework
over the past few years, that has
been successfully put to use for a
dependence detection algorithm
that handles complex traversals
and structures. The experience
gained with this framework has
allowed us to learn from its
strengths and weaknesses. We
present in this work a new
strategy for shape analysis based
on what we call coexistent links
sets (CLSs).
CLSs codify possibilities of existence and connectivity of memory
locations in a neat and compact
way. Each CLS defines a set of
(incoming and outgoing) links
that exist simultaneously over a
memory location at a possible
state of the memory
configuration . Our goal is to
provide a new shape analysis
framework based on CLSs that can
be configured for varying
degrees of precision and
cost. In this work we present
the rationale behind such
framework.
Our shape analysis philosophy is based on a graph abstraction of
memory configurations in a
program. Graphs are made of
connected nodes that represent
memory locations and links
between them. Our technique
works as a fixed-point
iterative data-flow algorithm
that transforms graphs through
abstract interpretation of the
program statements. Changes in
graphs conform to the rules
(abstract semantics) that
impose each kind of
statement.
In order to describe a graph-based shape analysis technique, we must
provide some basic
operations that define how
graphs are created and
changed along the
analysis. There are three
basic operations: (i)
summarization, to limit
memory configurations of
unknown size to a
bounded-size shape graph;
(ii) materialization, to
focus on singular memory
locations that have been
previously summarized and
(iii) graph union, to define
how graphs are formed at
join points in the control
flow graph (CFG). These
operations are defined in
terms of simple changes in
the CLSs of a graph.
The summarization operation is key in a shape analysis algorithm. The
way it is defined
determines to a great
extent the algorithm
behaviour and
precision. It would be
desirable to be able to
tweak this basic operation
for different data
structures. At the same
time, this operation must
be defined clearly in
order to formalize the
algorithm. We have taken a
complementary approach:
first we define a fixed
criterion about what nodes
must be kept separate and
what nodes can be merged;
and second, we fine-tune
summarizing of nodes with
properties. Properties are
associated to nodes and
can be easily switched on
and off for the
analysis. We provide a set
of basic properties in
this work. However new
properties can be added in
the future effortlessly if
the analysis of new codes
suggests so.
We also present CLS extensions to support cycles in dynamic data
structures. Cyclicity is
traditionally a big
problem for shape
analysis
techniques. Some authors
choose not to handle
cyclicity at all, while
others offer varying
degrees of support for
cyclic structures. We
aim to provide full
analysis of arbitrary
shapes of data
structures, i.e., we do
not limit the analysis
to support only
pre-defined shapes such
as tree, DAG or cycle.
We believe that the use of CLSs in the context of shape analysis
offers an appealing
mechanism for
heapbased dynamic data
structure
characterization. We
can achieve good
precision at the core
level with a compact
representation, while
at the same time
fine-tune further
precision-vs-cost
tradeoff with the use
of properties. CLSs
will provide better
means to handle
parallelism detection
involving dynamic data
structures for
large-scale programs.
|