Checkpointing Tools from Tsinghua University.

Thckpt64

A sequential application checkpointing/recovery library for IA-64.

Installation and usage instructions are in the tarball.

Download: Thckpt


Parallel Checkpointing/Recovery

Process checkpointing and rollback recovery is a convenient and effective technique for fault tolerance. While checkpointing software has been developed for most platforms, little work has been done to port it to IA-64. IA-64 is a new and revolutionary architecture, but its structural complexity makes it difficult to implement process checkpointing and rollback recovery on it. Based on Thckpt, we have implemented two checkpointing tools for parallel programs: ChaRM4MPI and ChaRM4PVM, for MPI and PVM respectively.
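For readers new to the technique, the idea of checkpointing and rollback recovery can be sketched in a few lines of Python. This is only a conceptual, application-level illustration: it does not use or reflect the Thckpt, ChaRM4MPI or ChaRM4PVM interfaces, and every name in it (save_checkpoint, load_checkpoint, run, the fail_at parameter) is hypothetical. The program saves its state periodically and, after a fault, resumes from the last saved state instead of starting over.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically, so a crash never leaves a partial file."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename is atomic on the same filesystem: readers see old or new, never half.
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the last saved state, or default if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, interval, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing every `interval` steps.

    fail_at is a hypothetical knob that simulates a fault at a given step.
    """
    # Rollback recovery: resume from the last checkpoint if there is one.
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated fault")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

If a first call such as run(path, 10, 3, fail_at=7) is interrupted by the simulated fault, a second call run(path, 10, 3) with the same path restarts from the checkpoint taken at step 6 and still produces the full result. The work done after the last checkpoint is lost and recomputed, which is the usual trade-off when choosing the checkpoint interval.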

ChaRM4MPI

A checkpointing/recovery system for MPI applications on IA-64, using Thckpt64 as the underlying checkpointer.

Installation:

Usage:

Download: ChaRM4MPI

This page is only a brief guide; I am writing a full user manual for ChaRM4MPI.

ChaRM4PVM

A checkpointing and rollback recovery resource manager for PVM. This system provides cooperative checkpointing support for parallel computing with PVM, as well as task migration and resource management through a graphical interface. To run a PVM application under Charm, the programmer should compile the application source code and link it against the checkpointing library provided by Charm. Once launched from the graphical Charm Manager, the parallel PVM application can be checkpointed periodically and rolled back at any time. The task distribution can be observed and adjusted from the graphical interface. A task can be easily migrated from one node to another. The programmer may add or remove nodes without interrupting the computation.

Installation and usage instructions are in the tarball.

Download: ChaRM4PVM


Information

Please visit Gelato for more information about Linux and Itanium.