Checkpointing Flax NNX Models with Orbax: Saving and Restoring State
Google for Developers · December 3, 2025 · 13 min · 181 views
Understanding NNX State Management
- Flax NNX provides a Pythonic, stateful approach to defining models: module instances hold their state variables (such as parameters or batch statistics) directly as attributes.
- State variables, such as `nnx.Param` for learnable parameters or `nnx.BatchStat` for batch normalization statistics, are instances of subclasses of `nnx.Variable`.
- Since Flax NNX version 0.11, module instances are native JAX PyTrees and can be passed directly to JAX functions, but `nnx.state` is still used to extract only the dynamic `nnx.Variable` objects for checkpointing.
Core NNX Functions for State Handling
- `nnx.split` separates a module instance into its static structure (a `graphdef`) and its dynamic state (an `nnx.State` PyTree), which is suitable for saving or for passing to JAX functions.
- `nnx.merge` reconstructs a module instance from its static structure and a state PyTree, typically one loaded from a checkpoint.
- `nnx.update` modifies an existing module instance in place by updating its variables with data from a state PyTree, rather than creating a new instance.
Orbax Checkpointing Workflow: Saving Models
- Orbax is the standard checkpointing library in the JAX ecosystem, designed to save and load state reliably, especially in complex distributed settings.
- The `CheckpointManager` is a wrapper that handles checkpoint logistics: saving at specific steps, tracking versions, automatically deleting old checkpoints, and restoring the latest one.
- To save an NNX model, first create a `CheckpointManager`, then use `nnx.split` to extract the `nnx.State` PyTree, and finally call `manager.save` with the training step and the state wrapped in an Orbax argument structure.
- Call `wait_until_finished` to ensure the save operation has completed, which matters when saves run asynchronously in the background.
Orbax Checkpointing Workflow: Restoring Models
- Restoring requires a template to guide Orbax. An abstract model is created with `nnx.eval_shape`, which yields shape and dtype information for every variable without allocating actual data.
- The abstract model is then split into its `graphdef` (static structure) and an `abstract_state` PyTree, which serves as the template for Orbax.
- `manager.restore` is called with the abstract state as the template and returns a restored state PyTree containing the loaded JAX arrays.
- The restored state and the `graphdef` are then passed to `nnx.merge` to reconstruct the model instance, ready for inference or continued training.