💾 hf-mem – Estimate GPU Memory Before Downloading a Model
If you’ve ever downloaded a large Hugging Face model only to immediately hit a CUDA Out of Memory error, you already know the problem: model sizes are growing fast, but GPU memory isn’t.
What if you could know exactly how much GPU memory a model needs before downloading it?
That’s exactly what hf-mem does.
🚀 What is hf-mem?
hf-mem is a lightweight command-line tool that estimates the GPU memory required for inference of Hugging Face models — without downloading the model weights.
It works by:

- Fetching only safetensors metadata via HTTP range requests
- Parsing tensor shapes and data types
- Estimating VRAM usage based on parameter counts
- Presenting a clear, human-readable breakdown in the terminal
No weights. No waiting. No wasted bandwidth.
Project: https://github.com/alvarobartt/hf-mem
🧠 How It Works (High Level)
Instead of pulling gigabytes of model data, hf-mem:

1. Requests only the first ~100 KB of each safetensors file
2. Reads tensor metadata (shape + dtype)
3. Calculates expected GPU memory requirements for inference
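To make this concrete, here's a minimal sketch of the metadata-only approach — my illustration, not hf-mem's actual implementation; the single-file `model.safetensors` name and the use of `requests` are assumptions. A safetensors file begins with an 8-byte little-endian integer giving the length of a JSON header, and that header records every tensor's dtype and shape, which is all you need to count bytes:

```python
import json
import struct
import requests

# Bytes per element for common safetensors dtype strings.
DTYPE_BYTES = {
    "F64": 8, "F32": 4, "F16": 2, "BF16": 2,
    "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1,
}

def estimate_weight_bytes(model_id: str, filename: str = "model.safetensors") -> int:
    url = f"https://huggingface.co/{model_id}/resolve/main/{filename}"
    # Range request #1: the 8-byte prefix that stores the JSON header length.
    prefix = requests.get(url, headers={"Range": "bytes=0-7"}, timeout=30)
    prefix.raise_for_status()
    (header_len,) = struct.unpack("<Q", prefix.content)
    # Range request #2: the JSON header itself -- still no weight data.
    header_resp = requests.get(
        url, headers={"Range": f"bytes=8-{8 + header_len - 1}"}, timeout=30
    )
    header_resp.raise_for_status()
    header = json.loads(header_resp.content)

    total = 0
    for name, meta in header.items():
        if name == "__metadata__":  # optional file-level metadata entry
            continue
        n_elems = 1
        for dim in meta["shape"]:
            n_elems *= dim
        total += n_elems * DTYPE_BYTES[meta["dtype"]]
    return total
```

Sharded checkpoints publish their file list in `model.safetensors.index.json`, so the same two range requests per shard scale to arbitrarily large models. (Gated repos such as meta-llama additionally require an `Authorization: Bearer <token>` header.)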
Because it relies purely on metadata, the tool is fast, transfers kilobytes instead of gigabytes, and is accurate enough for model selection and capacity planning.
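The estimate itself is simple arithmetic: parameter count × bytes per element of the weight dtype. The helper below is a back-of-the-envelope sketch of that formula, not hf-mem's code:

```python
# Weights-only estimate: parameter count × bytes per element.
# Real deployments need headroom on top of this for the CUDA context,
# activations, and the KV cache.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def weights_gib(n_params: float, dtype: str = "bf16") -> float:
    return n_params * BYTES_PER_PARAM[dtype] / 1024**3

print(f"{weights_gib(7e9, 'fp16'):.1f} GiB")  # a 7B model in fp16 ≈ 13.0 GiB
```

This is also why dtype matters so much: the same 7B model needs about 26 GiB in fp32 but fits on a 24 GB card in fp16.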
🛠️ Usage (uv / uvx)
Run hf-mem directly with uvx, which fetches and runs the tool in an ephemeral environment without installing it:
```bash
uvx hf-mem --model-id meta-llama/Llama-2-7b-hf
```