Models
    - gpt-oss-120b, Meta Llama 3.2, or Gemma (just depends on what I’m doing)

  Hardware
    - Apple M4 Max (128 GB RAM)
      paired with a GPD Win 4 running Ubuntu 24.04 over USB-C networking (rough link setup sketched below)

  Software
    - Claude Code
    - RA.Aid
    - llama.cpp

  For CUDA workloads, I use an NVIDIA RTX 2080 in an older System76 workstation.
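  In case it helps anyone copying this setup: the Ubuntu side of the USB-C link can be brought up roughly like this (a hedged sketch, not my exact config; the `thunderbolt0` interface name and the 10.55.0.x addresses are illustrative assumptions):

```sh
# Load the Thunderbolt/USB4 networking driver, which exposes the
# USB-C link as an Ethernet-style interface (often thunderbolt0).
sudo modprobe thunderbolt-net

# Static address on a private point-to-point subnet; the subnet is
# an arbitrary example, not my actual addressing.
sudo ip addr add 10.55.0.2/24 dev thunderbolt0
sudo ip link set thunderbolt0 up

# The Mac gets the matching address (e.g. 10.55.0.1) on its
# Thunderbolt Bridge interface via System Settings -> Network.
ping -c 3 10.55.0.1
```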

  Process

    I create a solid INSTRUCTIONS.md for Claude/RA.Aid that specifies the task and the production process, along with a task list the agent maintains. I use Claude agents with an Agent Organizer that helps determine which agents to use. The pipeline creates the architecture, PRD, and security design, writes the code, and then lints, tests, and does a code review.
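    For the curious, here’s roughly the shape mine takes (a hedged sketch: the headings and the example task are illustrative, not a copy of my actual file):

```markdown
# INSTRUCTIONS.md

## Task
Add rate limiting to the public API. (Placeholder task for illustration.)

## Production process
1. Agent Organizer selects the sub-agents for this task.
2. Architecture, PRD, and security design are drafted.
3. Code is written against the PRD.
4. Lint, test, and code review; repeat until all checks pass.

## Task list (maintained by the agent)
- [x] Draft PRD and security design
- [ ] Implement rate-limiting middleware
- [ ] Add tests
- [ ] Final code review
```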

What does the GPD Win 4 do in this scenario? Is there a step w/ Agent Organizer that decides if a task can go to a smaller model on the Win 4 vs a larger model on your Mac?

What sorts of token/s are you getting with each model?

Model performance summary:

| Model | Format | Speed | Hugging Face repo |
| --- | --- | --- | --- |
| openai/gpt-oss-120b | MLX (MXFP4) | ~66 tokens/sec | `lmstudio-community/gpt-oss-120b-MLX-8bit` |
| google/gemma-3-27b | MLX (4-bit) | ~27 tokens/sec | `mlx-community/gemma-3-27b-it-qat-4bit` |
| qwen/qwen3-coder-30b | MLX (8-bit) | ~78 tokens/sec | `Qwen/Qwen3-Coder-30B-A3B-Instruct` |
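If anyone wants to reproduce these numbers: the MLX repos above can be timed with the mlx-lm CLI, which prints prompt and generation speeds in tokens-per-sec after each run (a hedged sketch; the prompt and `--max-tokens` value are arbitrary, and throughput will vary with context length and machine):

```sh
# Install the MLX runtime (Apple Silicon only).
pip install mlx-lm

# One-off generation; mlx_lm.generate reports prompt and generation
# tokens-per-sec when it finishes.
mlx_lm.generate \
  --model mlx-community/gemma-3-27b-it-qat-4bit \
  --prompt "Write a binary search in Python." \
  --max-tokens 256
```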

Will reply back and add Meta Llama performance shortly.

What is the Agent Organizer you use?

It’s a Claude agent prompt. I don’t recall who originally shared it, so I can’t yet attribute the source, but I’ll track that down shortly and add proper attribution here.

Here’s the Claude agent markdown:

https://github.com/lst97/claude-code-sub-agents/blob/main/ag...

Edit: Updated from the old Pastebin link to the GitHub version. Attribution found: lst97 on GitHub.
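For anyone who hasn’t used Claude Code sub-agents: each one is a markdown file (typically under `.claude/agents/`) with YAML frontmatter for the agent’s name, description, and allowed tools, followed by its system prompt. A hedged sketch of the general shape, not the actual contents of the linked file:

```markdown
---
name: agent-organizer
description: Decides which specialist sub-agents should handle the current task.
tools: Read, Grep, Glob
---

You are the Agent Organizer. Given the task in INSTRUCTIONS.md,
choose which specialist agents (architect, security reviewer, coder,
tester, code reviewer) should run, in what order, and report the
plan back before any work starts.
```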

Funny how it looks like the Claude agent was itself written by Claude...