Optimizing job reliability through contention-free, distributed checkpoint scheduling. Academic Article