Portable Checkpointing of MPI Applications
Gabriel Rodríguez, María J. Martín, Patricia González, Juan Touriño and Ramón Doallo
CPPC (Controller/Precompiler for Portable Checkpointing) is a checkpoint-based fault tolerance tool aimed at running parallel applications on heterogeneous clusters or Grids. Beyond relying on portable code, protocols and data storage formats, CPPC performs compile-time analysis of the application to be checkpointed in order to avoid restart inconsistencies. This removes the need for traditional approaches such as process coordination or message logging, which typically introduce overheads that scale poorly. These solutions are implemented in the CPPC library, which the CPPC source-to-source preprocessor uses to automatically transform a parallel application into a fault-tolerant counterpart. Experimental results assessing the checkpointing tool are also provided.
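To make the idea of a portable data storage format concrete, the following is a minimal sketch in C, not CPPC's actual API or file format. The helper names (save_checkpoint, load_checkpoint, put_u64, get_u64) and the file layout are hypothetical. The sketch assumes every host represents double as IEEE 754 binary64, and it writes all values in big-endian byte order so that a file written on one architecture can be read on another.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: double is IEEE 754 binary64 on every participating host. */
_Static_assert(sizeof(double) == 8, "sketch assumes 64-bit doubles");

/* Write a 64-bit value in big-endian order, independent of host endianness. */
static int put_u64(FILE *f, uint64_t v) {
    unsigned char b[8];
    for (int i = 0; i < 8; i++)
        b[i] = (unsigned char)(v >> (56 - 8 * i));
    return fwrite(b, 1, 8, f) == 8 ? 0 : -1;
}

/* Read a 64-bit big-endian value back into host representation. */
static int get_u64(FILE *f, uint64_t *v) {
    unsigned char b[8];
    if (fread(b, 1, 8, f) != 8)
        return -1;
    *v = 0;
    for (int i = 0; i < 8; i++)
        *v = (*v << 8) | b[i];
    return 0;
}

/* Hypothetical helper: store an iteration counter and n doubles.
 * Returns 0 on success, -1 on any I/O error. */
int save_checkpoint(const char *path, uint64_t iter,
                    const double *data, uint64_t n) {
    assert(path != NULL && (data != NULL || n == 0));
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    int rc = put_u64(f, iter);
    if (rc == 0)
        rc = put_u64(f, n);
    for (uint64_t i = 0; i < n && rc == 0; i++) {
        uint64_t bits;
        memcpy(&bits, &data[i], sizeof bits); /* reinterpret without aliasing issues */
        rc = put_u64(f, bits);
    }
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: restore state written by save_checkpoint.
 * Returns -1 if the file is missing, truncated, or holds a different
 * element count than the caller expects. */
int load_checkpoint(const char *path, uint64_t *iter,
                    double *data, uint64_t n) {
    assert(path != NULL && iter != NULL && (data != NULL || n == 0));
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    uint64_t stored_n = 0;
    int rc = get_u64(f, iter);
    if (rc == 0)
        rc = get_u64(f, &stored_n);
    if (rc == 0 && stored_n != n)
        rc = -1;
    for (uint64_t i = 0; i < n && rc == 0; i++) {
        uint64_t bits;
        rc = get_u64(f, &bits);
        if (rc == 0)
            memcpy(&data[i], &bits, sizeof bits);
    }
    fclose(f);
    return rc;
}
```

In an MPI program, each process could call such a save routine at a point in the code where no messages are in flight, and call the load routine on restart before resuming the main loop. Choosing those points is the part the abstract attributes to compile-time analysis, which this sketch does not attempt.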