The cloud clinic held on Thursday, January 23, 2025, from 11:30am-12:30pm Pacific (2:30-3:30pm Eastern) was intended for NAIRR investigators and other research teams using cloud platforms for data science. It focused on checkpointing: storing partial results from a cloud compute task so that an interrupted task can be restarted roughly where it left off.

The emphasis was on cloud efficiency, terminology, use cases, and best practices, including GPU access, persistent (object) storage, distinguishing preemptible VM types (e.g. “one-time” versus “persistent” Spot requests on AWS), and useful details such as the user data option on AWS.
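
As a rough illustration of the checkpointing idea described above, here is a minimal Python sketch. It is not taken from the clinic slides: the function names, the JSON state format, and the toy workload (summing squares) are all assumptions made for illustration, and the checkpoint is written to a local file, whereas in practice it would be copied to persistent object storage so that it survives the loss of the VM.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so an interruption never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename within the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def run(path, n_steps, every=10, stop_after=None):
    """Sum the squares of 0..n_steps-1, checkpointing every `every` steps.

    `stop_after` is a hypothetical knob that simulates a preemption: the run
    halts before executing that step. Returns the total, or None if interrupted.
    """
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        if stop_after is not None and step >= stop_after:
            return None  # work done since the last checkpoint is lost
        state["total"] += step * step
        state["next_step"] = step + 1
        if state["next_step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

Calling `run` again with the same path after an interruption resumes from the last saved step rather than from zero. The checkpoint interval is the main design choice: saving more often loses less work on preemption but spends more time writing state.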

Slides: 2025-01-23-CloudBank-Clinic (PDF)

Email: help@cloudbank.org with any questions

Abstract: In today’s busy world, we can lose track of small details that have a big impact. Suppose you have a cloud budget of $10,000, but your computations could be scaled up beyond that limit to produce better results. What you need is access to immutable storage (easy), access to cheap preemptible cloud VM instances (easy), and a reliable method of checkpointing your progress (easy? hard?). This one-two-three punch means you can purchase $33,333 worth of cloud computing for a mere $10,000 and get better research results as a consequence. This cloud clinic will catch you up on the how-tos and other small details behind such a substantial gain in compute power. We use a convolutional neural network (CNN) as our example implementation of a compute-intensive research task.
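
The abstract does not state the discount behind its $33,333 figure. Working backward from its numbers, $10,000 / $33,333 is about 0.30, which corresponds to preemptible instances priced roughly 70% below on-demand. The short Python sketch below makes that arithmetic explicit; the function name is hypothetical, the 70% value is inferred rather than quoted, and actual preemptible discounts vary by provider, instance type, and time.

```python
def on_demand_equivalent(budget, discount):
    """On-demand value of the compute a budget buys at a fractional discount."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    return budget / (1 - discount)


# Inferred assumption: preemptible pricing about 70% below on-demand.
print(round(on_demand_equivalent(10_000, 0.70)))  # prints 33333
```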