Suspend vs. snapshot: pause a sandbox, or save it for reuse?

00 · The question we keep getting

A major reason to deploy agents in sandboxes is statefulness. A stateful sandbox preserves the filesystem and the in-memory state of the execution environment (agent memory, partial artifacts, a warmed-up interpreter), so work doesn't evaporate between turns.

Tensorlake sandboxes ship two related but distinct primitives for this. You can snapshot a running sandbox to create a point-in-time artifact of the whole thing, or you can suspend it, which captures its state in place and puts the sandbox to sleep. We get this question a lot: when do I use which?

TL;DR

Suspend is a pause button. Same sandbox ID, resumed in place. Snapshot is a save file. A durable artifact you can restore into new sandboxes — once, or a hundred times.

01 · The real split: pause vs. save

Suspend is about compute. You have one sandbox, it's idle, and you want to pause it in place without losing state or paying for compute. Later, you resume the same sandbox under the same sandbox ID and it picks up where it left off.

Snapshot is about checkpoints. You capture the full state of a running sandbox — filesystem, memory, running processes — into a reusable artifact that outlives the source. You can restore that artifact into a new sandbox: once, or a hundred times, today or next month.

“Suspend is a pause button. Snapshot is a save file. Everything downstream (the API surface, the pricing, which failures are recoverable) follows from this split.”
(Design principle, sandbox lifecycle RFC)

Aspect	Suspend	Snapshot
Identity	Same sandbox ID	New sandbox per restore
Artifact	None; state stays in place	Persistent, independent object
Lifetime	Tied to the sandbox	Outlives the source
Cost shape	No compute while paused	Storage per artifact
Fan-out	Single lineage (no branching)	N (restore many times)
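
To make the table concrete, here is a toy in-memory model of the two lifecycles. It is a sketch of the semantics only, not the Tensorlake SDK: `ToySandbox`, `ToySnapshot`, and all of their methods are hypothetical names invented for illustration.

```python
import copy
import itertools
from dataclasses import dataclass, field

# Toy model of the semantics in the table above. NOT the Tensorlake SDK:
# every class and method name here is hypothetical.
_ids = itertools.count(1)


@dataclass
class ToySnapshot:
    state: dict  # independent copy of the source's state

    def restore(self) -> "ToySandbox":
        # Each restore yields a NEW sandbox with its own ID and its own state copy.
        return ToySandbox(state=copy.deepcopy(self.state))


@dataclass
class ToySandbox:
    state: dict = field(default_factory=dict)
    sandbox_id: str = field(default_factory=lambda: f"sbx-{next(_ids)}")
    status: str = "running"

    def suspend(self) -> None:
        # No artifact is produced; state stays attached to this sandbox.
        self.status = "suspended"

    def resume(self) -> None:
        self.status = "running"

    def snapshot(self) -> ToySnapshot:
        # Produces an object that does not depend on the source sandbox.
        return ToySnapshot(state=copy.deepcopy(self.state))


sbx = ToySandbox(state={"step": 3})
original_id = sbx.sandbox_id
sbx.suspend()
sbx.resume()
same_identity = sbx.sandbox_id == original_id  # suspend/resume keeps the ID

snap = sbx.snapshot()
clones = [snap.restore() for _ in range(3)]
clone_ids = {c.sandbox_id for c in clones}  # one new ID per restore
```

Suspend and resume only flip a status on the same object, while every `restore()` mints a new ID with its own copy of the state. That is the Identity and Fan-out rows in miniature.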

02 · How sandbox providers ship it

Both operations are technically hard. Freezing a running sandbox — memory, process tree, open file descriptors — and bringing it back cleanly is non-trivial. Turning that frozen state into a durable, portable artifact is harder still. Not every provider has built both paths.

Here's what's shipping today (April 2026):

Provider	Suspend	Snapshot	Filesystem	Memory	Processes
Tensorlake	●	●	●	●	●
E2B	●	●	●	●	●
Modal	—	●	●	α	α
Vercel Sandbox	β	●	●	—	—
Daytona	●	—	●	—	—

Legend: ● supported · α alpha · β beta · — not available

A few things that stood out when putting this together:

Only some providers preserve running processes and memory. Others call the operation "snapshot" but only capture the filesystem — you won't notice until a restored sandbox comes back missing its in-flight processes.
Several providers ship one side of the split but not the other. Where the memory-preserving path exists, it's sometimes behind an alpha or beta flag.
Pause and snapshot are separate operations on Tensorlake and E2B. Vercel collapses them (snapshot auto-stops the source). Modal's stable path skips pause entirely. The API shape tells you which mental model the provider chose.
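
If you want to reason about this matrix in code (say, to pick a provider per workload), it can be written down as plain data. The values below are transcribed from the table above and nothing more; the `PROVIDERS` dict and the `providers_with` helper are hypothetical names for this sketch.

```python
# Capability matrix transcribed from the table above (April 2026).
# "ga" = supported, "alpha" / "beta" = pre-release, None = not available.
PROVIDERS = {
    "Tensorlake": {"suspend": "ga", "snapshot": "ga", "filesystem": "ga",
                   "memory": "ga", "processes": "ga"},
    "E2B": {"suspend": "ga", "snapshot": "ga", "filesystem": "ga",
            "memory": "ga", "processes": "ga"},
    "Modal": {"suspend": None, "snapshot": "ga", "filesystem": "ga",
              "memory": "alpha", "processes": "alpha"},
    "Vercel Sandbox": {"suspend": "beta", "snapshot": "ga", "filesystem": "ga",
                       "memory": None, "processes": None},
    "Daytona": {"suspend": "ga", "snapshot": None, "filesystem": "ga",
                "memory": None, "processes": None},
}


def providers_with(*features, stable_only=True):
    """Return providers offering every named feature, optionally excluding alpha/beta."""
    def ok(level):
        return level == "ga" if stable_only else level is not None

    return [name for name, caps in PROVIDERS.items()
            if all(ok(caps[f]) for f in features)]


# Snapshots that bring memory and processes back, on a stable path:
full_state_stable = providers_with("snapshot", "memory", "processes")
# Same question, counting alpha/beta paths too:
full_state_any = providers_with("snapshot", "memory", "processes", stable_only=False)
```

The gap between the two queries is the second observation above: the memory-preserving path sometimes exists only behind a pre-release flag.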

The rest of this post uses Tensorlake for code examples because it ships both paths as distinct operations, which makes the patterns below runnable as written. Examples are in Python, TypeScript, and CLI — the SDKs have full parity.

03 · When to suspend

Suspend when you have one ongoing task and the sandbox will idle between bursts of work.

Concretely: a coding agent waiting for a human reply, an overnight research loop between steps, a notebook you'll come back to tomorrow. You want the exact process tree, open files, and memory back. Re-initializing would be slow — or wrong, if the process holds unserializable in-memory state.

# A coding agent, paused between user turns
from tensorlake import Sandbox

sbx = Sandbox.create(name="agent-session-A7F2")
# ... agent does work, waits for a human reply ...

sbx.suspend()
# → SBX_01HK9Z · PAUSED · compute: $0.00/s

# Later, the user comes back:
sbx = Sandbox.attach("agent-session-A7F2")
sbx.resume()  # same PIDs, same memory, same fs

AUTO-SUSPEND

A timeout_secs on a named sandbox triggers auto-suspend (not terminate), so you get this pattern defensively without writing idle-detection logic. Ephemeral sandboxes (created without a name) can't be suspended at all — the absence of a name is the signal that the sandbox isn't meant to outlive its current task.
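
The rule in this note can be sketched as a small decision function. This is a toy model, not SDK code: `on_idle` is a hypothetical name, and the `terminate` branch for unnamed sandboxes is an assumption inferred from "can't be suspended at all", not something the text above states outright.

```python
from typing import Optional


def on_idle(name: Optional[str], idle_secs: float, timeout_secs: float) -> str:
    """Toy decision rule for an idle sandbox (hypothetical helper, not the SDK)."""
    if idle_secs < timeout_secs:
        return "keep_running"
    # Named sandboxes are suspended at the timeout, per the note above.
    if name:
        return "suspend"
    # ASSUMPTION: an unnamed (ephemeral) sandbox is terminated instead,
    # because it cannot be suspended.
    return "terminate"


named = on_idle("agent-session-A7F2", idle_secs=600, timeout_secs=300)
ephemeral = on_idle(None, idle_secs=600, timeout_secs=300)
```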

04 · When to snapshot

Snapshot when one state needs to seed many future sandboxes, or outlive the current one. Three clear cases:

Fan-out

RL rollouts from a shared starting point. Every worker needs the exact same post-setup state, in parallel.

Golden environments

A dev environment with tools, weights, and auth preloaded, cloned per user session.

Checkpoints

A durable recovery point before a step that might fail. Retry from that point any number of times without redoing setup.

# Warm a base, then fan out to N rollout workers
from tensorlake import Sandbox, Snapshot

src = Sandbox.create(image="python:3.12")
src.exec("pip install torch transformers && python setup.py")

snap = src.snapshot(tag="rollout-base-v3")
# → snap_A7F2 · 1.2 GiB · blake3:f4c9…
src.terminate()  # source is done; snapshot lives on

# Fan out: 100 workers, same warm state, all parallel
workers = [Snapshot.restore(snap.id) for _ in range(100)]

Note that the source sandbox is terminated as soon as the snapshot exists. The snapshot lives on independently, and that independence is the whole point: snapshots are objects, not sandbox states.
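
A list comprehension issues its restore calls one after another. If the restore call blocks until the new sandbox is ready (an assumption; nothing above says either way), a thread pool is one way to overlap those calls. In this sketch `restore_stub` is a hypothetical stand-in so the code runs without the SDK, and `fan_out` is an illustrative helper; swap in the real restore call.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

_counter = itertools.count(1)
_lock = threading.Lock()


def restore_stub(snapshot_id: str) -> str:
    """Hypothetical stand-in for a blocking restore; returns a fake sandbox ID."""
    with _lock:
        n = next(_counter)
    return f"{snapshot_id}-worker-{n}"


def fan_out(snapshot_id: str, n: int, restore=restore_stub, max_workers: int = 16) -> list:
    """Restore `n` sandboxes from one snapshot, overlapping the blocking calls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises the first failure, if any.
        return list(pool.map(lambda _: restore(snapshot_id), range(n)))


workers = fan_out("snap_A7F2", 100)
```

`max_workers` caps how many restores are in flight at once, which is worth tuning against whatever rate limits the provider applies.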

05 · When you need both

Long-running agents often want both primitives. The pattern:

1. Snapshot after expensive setup (install deps, download weights, warm caches). This is your durable recovery point.
2. Suspend between idle turns during normal operation. Cheap and fast.
3. If the sandbox fails catastrophically or you need to fork the session, restore from the snapshot into a fresh sandbox.

Snapshot is your insurance policy; suspend is your day-to-day cost control. They compose cleanly because they answer different questions.
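
The recovery half of this pattern (step 3) can be sketched as a retry loop that always starts from the checkpoint. This is a minimal sketch with stand-ins, not SDK code: `run_with_checkpoint`, `fake_restore`, and `flaky_step` are hypothetical names, and in practice `restore` would wrap a real snapshot restore.

```python
from typing import Callable, Optional, TypeVar

S = TypeVar("S")
R = TypeVar("R")


def run_with_checkpoint(restore: Callable[[], S],
                        step: Callable[[S], R],
                        max_attempts: int = 3) -> R:
    """Run `step` in a sandbox restored from a snapshot, retrying on failure."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        sandbox = restore()  # fresh sandbox from the durable recovery point
        try:
            return step(sandbox)
        except Exception as err:  # broad on purpose: any failure triggers a retry
            last_error = err
    raise RuntimeError(f"step failed after {max_attempts} attempts") from last_error


# Stand-ins so the sketch runs without the SDK.
attempts = []


def fake_restore() -> dict:
    return {"deps_installed": True}


def flaky_step(sandbox: dict) -> str:
    attempts.append(sandbox)
    if len(attempts) < 3:
        raise ValueError("simulated crash")
    return "done"


result = run_with_checkpoint(fake_restore, flaky_step)
```

The sketch leaves out cleanup of failed sandboxes and the suspend/resume calls between turns; both would sit around this loop rather than inside it.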

06 · Quick decision guide

One sandbox, one ongoing task, will idle → SUSPEND

One state, many descendants (now or later) → SNAPSHOT

Expensive setup you don't want to redo on failure → SNAPSHOT as a checkpoint

Long-running agent with idle gaps → BOTH: snapshot the warm base, suspend between turns
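
The guide above can also be written as a function. This is one reading of it, not an official rule set: `choose_primitive` and its three flags are hypothetical, and treating "idles plus expensive setup" as the "both" case is an interpretation of the last row.

```python
def choose_primitive(idles_between_work: bool,
                     many_descendants: bool,
                     expensive_setup: bool) -> str:
    """Map the decision guide to a recommendation (hypothetical helper)."""
    # Fan-out and checkpointing both call for a durable artifact.
    wants_snapshot = many_descendants or expensive_setup
    if idles_between_work and wants_snapshot:
        return "both"
    if wants_snapshot:
        return "snapshot"
    if idles_between_work:
        return "suspend"
    return "neither"


coding_agent = choose_primitive(idles_between_work=True, many_descendants=False, expensive_setup=False)
rl_rollouts = choose_primitive(idles_between_work=False, many_descendants=True, expensive_setup=False)
long_running = choose_primitive(idles_between_work=True, many_descendants=False, expensive_setup=True)
```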

If you find yourself reaching for snapshot every time the user goes to lunch, or for suspend when what you really want is a reproducible starting point, step back to the split: pause vs. save, compute vs. storage. Framed this way, the choice is usually obvious.

Written by Diptanu Gon Choudhury, CEO / Co-founder