Pico

A Lightweight Framework for Studying Learning Dynamics

Pico demystifies how language models learn. Lightweight and efficient, it's your go-to toolkit for training and analyzing models across different scales.

Pico is built around two core libraries: pico-train for model training and pico-analyze for in-depth analysis, designed from the ground up to work together seamlessly.

Training Made Easy

pico-train makes training language models simple and efficient.

With pico-train, you can train language models of various sizes with minimal configuration. The framework handles the complexities of distributed training, gradient accumulation, and checkpoint management, allowing researchers to focus on experimenting with model architectures and training paradigms.
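To make concrete what the framework abstracts away, here is a minimal NumPy sketch of the gradient-accumulation bookkeeping itself (illustrative only; this is not pico-train's API). Averaging gradients over micro-batches yields the same update as one large batch while keeping per-step memory small:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # full batch of 8 examples
y = rng.normal(size=(8,))
w = np.zeros(3)

def grad(w, Xb, yb):
    # Gradient of mean-squared error for a linear model y ~ X @ w.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient computed in one shot.
g_full = grad(w, X, y)

# The same gradient accumulated over 4 micro-batches of 2 examples each.
accum = np.zeros_like(w)
for Xb, yb in zip(np.split(X, 4), np.split(y, 4)):
    accum += grad(w, Xb, yb) / 4     # average the micro-batch gradients

assert np.allclose(g_full, accum)    # identical update, smaller memory
w -= 0.1 * accum                     # one optimizer step per accumulation cycle
```

A trainer like pico-train performs this loop (plus distributed synchronization and checkpointing) behind a configuration flag, so research code stays focused on the model.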

Small-Scale Focus

Train and study models from 1M to 1B parameters, making experimentation with training paradigms practical and accessible.

Advanced Checkpointing

Access model activations, gradients, and other rich information throughout training for mechanistic interpretability research.

Easy Retraining

Simple, modular codebase designed for researchers to modify and retrain the entire model suite with custom training paradigms.

PyTorch Lightning

Built on PyTorch Lightning for efficient, scalable training with minimal boilerplate code.

Minimal Dependencies

Lightweight framework with only essential dependencies, making it easy to install and modify.

Research Ready

Designed with researchers in mind, providing tools and flexibility needed for academic exploration.

Learning Dynamics Revealed

pico-analyze provides comprehensive tooling to capture and analyze training metrics, enabling researchers to understand how models learn across different scales.

Out of the box, pico-analyze includes:

  • Convergence Rates

    Compute layer convergence rates across model sizes using automatically stored activation checkpoints.

  • Effective Rank

    Analyze dimensional utilization across layers to understand how models distribute complexity and identify potential bottlenecks.

  • Gradient Magnitude

    Track how gradient magnitudes evolve during training to understand optimization dynamics and identify potential training instabilities.

  • Model Sparsity

    Measure the percentage of near-zero weights in models to understand pruning potential and efficiency.
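Two of the metrics above have standard definitions that are easy to sketch: effective rank is the exponential of the entropy of a layer's normalized singular-value distribution, and sparsity is the fraction of near-zero weights. The NumPy sketch below illustrates those formulas; pico-analyze's exact implementations may differ:

```python
import numpy as np

def effective_rank(W: np.ndarray) -> float:
    """exp(entropy) of the normalized singular-value distribution of W."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()                          # normalize to a distribution
    entropy = -(p * np.log(p + 1e-12)).sum() # small eps guards log(0)
    return float(np.exp(entropy))

def sparsity(W: np.ndarray, eps: float = 1e-3) -> float:
    """Fraction of weights with magnitude below eps (i.e. near-zero)."""
    return float((np.abs(W) < eps).mean())

# A rank-1 matrix concentrates all variance in one direction (effective
# rank ~ 1), while the identity spreads it evenly across all dimensions.
rank1 = np.outer(np.ones(4), np.ones(4))
print(effective_rank(rank1))      # ~ 1.0
print(effective_rank(np.eye(4)))  # ~ 4.0
print(sparsity(np.diag([1.0, 0.0, 0.0, 0.0])))  # 0.9375 (15 of 16 weights)
```

Tracked across checkpoints, curves of these quantities show how each layer's capacity usage and weight distribution evolve over training.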

Using checkpoints from pico-train, the analysis framework pico-analyze lets you extract critical insights about model behavior throughout training. These insights can help identify optimization issues and guide architectural improvements.

Built with ❤ by the Pico team

Code and Artifacts are licensed under Apache License 2.0