CPC 2006 CPC 2006

Call for Participation
Current List of
Program (List of Talks)
Registration Form
Travel Information

Funded by the Vicerreitoría de Investigación of the University of A Coruña and the Ministry of Education and Science of Spain

Universidade de A Coruña Ministerio de
		      Educación y Ciencia

Shape Analysis for Dynamic Data Structures based on Coexistent Links Sets

A. Tineo, F. Corbera, A. Navarro, R. Asenjo and E.L. Zapata
Department of Computer Architecture, University of Málaga, Spain

Static knowledge of memory references in a program is a must for compilers, if they are to provide optimizations related to parallelism in an automated basis. Such knowledge is not easy to gather due to the existence of aliases. Arrays, pointers and pointer-based dynamic data structures introduce aliases in programs. Parallelizing compilers have obtained a reasonable degree of success when dealing with array aliases and stack-directed pointers. However, heap-directed pointers and the structures they dereference are a whole different ground that still needs significant work. The problem of characterizing the dynamically allocated memory locations in a program can be approached in several ways. We believe that, in order to provide accurate information for real-life programs, some sort of abstraction in the form of a bounded graph must be performed. The kind of analysis that represents the heap as a storage shape graph is known as shape analysis. Its main goal is to capture the shape of memory configurations that are accessible through heap-directed pointers in a program.

Information about the shape of dynamic data structures is useful for parallelizing compiler transformations over the input program. Maybe the most obvious application is data dependence detection, needed for instruction scheduling, data-cache optimization, loop transformation and automatic vectorization and parallelization. Another interesting application comes from the use of the shape information for debugging analysis of the program. The shape abstraction can provide information about incorrect pointer usage that can lead to mistakes difficult to track.

In the context of shape analysis, our research group has already developed a powerful framework over the past few years, that has been successfully put to use for a dependence detection algorithm that handles complex traversals and structures. The experience gained with this framework has allowed us to learn from its strengths and weaknesses. We present in this work a new strategy for shape analysis based on what we call coexistent links sets (CLSs).

CLSs codify possibilities of existence and connectivity of memory locations in a neat and compact way. Each CLS defines a set of (incoming and outgoing) links that exist simultaneously over a memory location at a possible state of the memory configuration . Our goal is to provide a new shape analysis framework based on CLSs that can be configured for varying degrees of precision and cost. In this work we present the rationale behind such framework.
Our shape analysis philosophy is based on a graph abstraction of memory configurations in a program. Graphs are made of connected nodes that represent memory locations and links between them. Our technique works as a fixed-point iterative data-flow algorithm that transforms graphs through abstract interpretation of the program statements. Changes in graphs conform to the rules (abstract semantics) that impose each kind of statement.
In order to describe a graph-based shape analysis technique, we must provide some basic operations that define how graphs are created and changed along the analysis. There are three basic operations: (i) summarization, to limit memory configurations of unknown size to a bounded-size shape graph; (ii) materialization, to focus on singular memory locations that have been previously summarized and (iii) graph union, to define how graphs are formed at join points in the control flow graph (CFG). These operations are defined in terms of simple changes in the CLSs of a graph.

The summarization operation is key in a shape analysis algorithm. The way it is defined determines to a great extent the algorithm behaviour and precision. It would be desirable to be able to tweak this basic operation for different data structures. At the same time, this operation must be defined clearly in order to formalize the algorithm. We have taken a complementary approach: first we define a fixed criterion about what nodes must be kept separate and what nodes can be merged; and second, we fine-tune summarizing of nodes with properties. Properties are associated to nodes and can be easily switched on and off for the analysis. We provide a set of basic properties in this work. However new properties can be added in the future effortlessly if the analysis of new codes suggests so.

We also present CLS extensions to support cycles in dynamic data structures. Cyclicity is traditionally a big problem for shape analysis techniques. Some authors choose not to handle cyclicity at all, while others offer varying degrees of support for cyclic structures. We aim to provide full analysis of arbitrary shapes of data structures, i.e., we do not limit the analysis to support only pre-defined shapes such as tree, DAG or cycle.
We believe that the use of CLSs in the context of shape analysis offers an appealing mechanism for heapbased dynamic data structure characterization. We can achieve good precision at the core level with a compact representation, while at the same time fine-tune further precision-vs-cost tradeoff with the use of properties. CLSs will provide better means to handle parallelism detection involving dynamic data structures for large-scale programs.

Back to the Workshop Program 

Please contact our webadmin with any comments or changes.