💾 hf-mem – Estimate GPU Memory Before Downloading a Model

If you’ve ever downloaded a large Hugging Face model only to immediately hit a CUDA Out of Memory error, you already know the problem: model sizes are growing fast, but GPU memory isn’t.

What if you could estimate how much GPU memory a model needs before downloading it?

That’s exactly what hf-mem does.


🚀 What is hf-mem?

hf-mem is a lightweight command-line tool that estimates the GPU memory a Hugging Face model needs for inference, without downloading the model weights.

It works by:

- Fetching only safetensors metadata via HTTP range requests (sketched below)
- Parsing tensor shapes and data types
- Estimating VRAM usage based on parameter counts
- Presenting a clear, human-readable breakdown in the terminal

No weights. No waiting. No wasted bandwidth.

Project: https://github.com/alvarobartt/hf-mem
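
To make the approach concrete, here is a minimal sketch of the metadata-only fetch in Python. It is not hf-mem's actual implementation: it assumes a public, single-file checkpoint named model.safetensors and Hugging Face's standard resolve URL, and it relies on the safetensors spec, where the first 8 bytes of the file are a little-endian u64 giving the length of the JSON header that follows.

```python
import json
import struct

import requests

repo_id = "meta-llama/Llama-2-7b-hf"  # example repo; it is gated, so a token may be needed
url = f"https://huggingface.co/{repo_id}/resolve/main/model.safetensors"

# Fetch only the first ~100 KB, enough to cover the JSON header in most models.
resp = requests.get(url, headers={"Range": "bytes=0-102399"})
resp.raise_for_status()
raw = resp.content

# Per the safetensors spec, bytes 0-7 are a little-endian u64 header length.
header_len = struct.unpack("<Q", raw[:8])[0]
if header_len > len(raw) - 8:
    raise RuntimeError("Header larger than the fetched range; request more bytes")

# The header maps tensor names to {"dtype": ..., "shape": ..., "data_offsets": ...}.
header = json.loads(raw[8 : 8 + header_len])

for name, info in header.items():
    if name != "__metadata__":  # skip the optional free-form metadata entry
        print(name, info["dtype"], info["shape"])
```

Sharded checkpoints expose a model.safetensors.index.json listing the shard files instead, and gated repos (like the Llama family) additionally require an Authorization: Bearer token header, both of which a real tool has to account for.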


🧠 How It Works (High Level)

Instead of pulling gigabytes of model data, hf-mem:

1. Requests only the first ~100 KB of each safetensors file
2. Reads tensor metadata (shape + dtype)
3. Calculates expected GPU memory requirements for inference

Because it relies purely on metadata, the tool is fast, efficient, and accurate enough for model selection and capacity planning.
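
Step 3 is then straightforward arithmetic over the parsed header. Here is a hedged sketch, reusing the header dict from the snippet above and the standard safetensors dtype names; the real tool's breakdown may differ:

```python
import math

# Bytes per element for common safetensors dtypes.
DTYPE_BYTES = {
    "F64": 8, "F32": 4, "F16": 2, "BF16": 2,
    "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1,
}

def estimate_weight_bytes(header: dict) -> int:
    """Sum element count times dtype width for every tensor in the header."""
    total = 0
    for name, info in header.items():
        if name == "__metadata__":
            continue
        total += math.prod(info["shape"]) * DTYPE_BYTES[info["dtype"]]
    return total

# Example: a 7B-parameter model in F16 works out to roughly
# 7e9 * 2 bytes ≈ 14 GB of VRAM for the weights alone.
```

Keep in mind this counts weights only: actual inference also needs room for the KV cache, activations, and framework overhead, so treat a number like this as a lower bound.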


🛠️ Usage (uv / uvx)

Run hf-mem directly using uvx:

```bash
uvx hf-mem --model-id meta-llama/Llama-2-7b-hf
```
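
If you would rather keep the tool on your PATH than run it ephemerally, uv's tool installer should work too (assuming the package is published on PyPI under the same hf-mem name that uvx resolves):

```bash
uv tool install hf-mem
hf-mem --model-id meta-llama/Llama-2-7b-hf
```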