Understand the GPU tier system and which models your hardware can run.

    The Tier System

    CompressNode organizes GPUs into three tiers based on VRAM capacity. Your tier determines which compressed models you can run and affects the job pricing you receive.

    • Tier 1 (Entry) - 8GB VRAM (RTX 4060, RTX 3060, RTX 3070). Runs 7B-14B parameter models.
    • Tier 2 (Mid) - 12GB VRAM (RTX 4070, RTX 3080, RTX 4070 Ti). Runs up to 32B parameter models.
    • Tier 3 (High) - 16GB+ VRAM (RTX 4080, RTX 4090, A4000, A5000). Runs all models including 70B.
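    Not sure which tier your card falls into? You can check its name and total VRAM with nvidia-smi, which ships with the NVIDIA driver (standard NVIDIA tooling, not a CompressNode command):

        nvidia-smi --query-gpu=name,memory.total --format=csv

    A card reporting roughly 8 GB lands in Tier 1, roughly 12 GB in Tier 2, and 16 GB or more in Tier 3.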

    Tier 1 Models (8GB VRAM)

    Entry-level models that run on the most affordable consumer GPUs:

    • llama-3.1-8b-compressed - Llama 3.1 8B, general-purpose, ~5 GB VRAM
    • mistral-7b-compressed - Mistral 7B, efficient instruction-following, ~4.5 GB
    • gemma-2-9b-compressed - Gemma 2 9B, Google's compact model, ~5.5 GB
    • deepseek-r1-8b-compressed - DeepSeek R1 8B, chain-of-thought reasoning, ~5 GB
    • qwen-2.5-coder-7b-compressed - Qwen 2.5 Coder 7B, best small code model, ~4.5 GB
    • phi-4-compressed - Phi-4 14B, Microsoft's best small model, ~8 GB
    • qwen-2.5-vl-7b-compressed - Qwen 2.5 VL 7B, image + text understanding, ~5 GB

    Tier 2 Models (12GB VRAM)

    Models that deliver premium quality while still fitting on mid-tier GPUs:

    • qwen-2.5-14b-compressed - Qwen 2.5 14B, high-quality multilingual, ~9 GB
    • qwen-2.5-32b-compressed - Qwen 2.5 32B, premium mid-tier chat, ~12 GB
    • deepseek-r1-32b-compressed - DeepSeek R1 32B, near-GPT-4 reasoning, ~12 GB
    • qwen-2.5-coder-32b-compressed - Qwen 2.5 Coder 32B, GPT-4 level code, ~12 GB
    • gemma-2-27b-compressed - Gemma 2 27B, Google's strongest open model, ~11 GB

    Tier 3 Models (16GB+ VRAM)

    Maximum capability models for high-end hardware:

    • llama-3.1-70b-compressed - Llama 3.1 70B, maximum capability, ~14 GB VRAM

    Managing Models

    Use the CLI to manage models:

    • compressnode models - lists all models available for your tier
    • compressnode pull <id> - downloads a compressed model
    • --model <id> - passed when starting the daemon; loads that model into VRAM

    You can load multiple models simultaneously if your VRAM allows.
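    As a rough sketch, a first session on a Tier 1 card might look like the following. The model id comes from the Tier 1 list above; the daemon start command is an assumption, since this section only documents the --model flag:

        # See which models your tier can run
        compressnode models

        # Download one of the Tier 1 models
        compressnode pull qwen-2.5-coder-7b-compressed

        # Start the daemon with that model loaded into VRAM
        # ('start' is a placeholder here; launch the daemon however your install runs it)
        compressnode start --model qwen-2.5-coder-7b-compressed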