Company: AMD

Role: GPU Performance Software Development Engineer

Location: San Jose, CA (see posting for details)

We’re building Wave, a high-performance GPU programming language and compiler for machine learning workloads. Wave combines a Python-embedded DSL with an MLIR-based compiler stack to give engineers explicit control over GPU kernel performance.

We’re looking for a low-level GPU performance engineer to own end-to-end performance of Wave’s kernels.

You’ll work on

• Hand-tuned GPU kernels (GEMM, Attention, MoE, decoding)

• Instruction-level optimization: memory, registers, scheduling, wave/warp behavior

• HIP/CUDA, GPU intrinsics (MFMA/MMA), and inline assembly

• MLIR dialects, lowering pipelines, and performance-critical compiler passes

• Profiling via disassembly, counters, rocprof/Nsight

Wave codebase: https://github.com/iree-org/wave

Job posting: https://careers.amd.com/careers-home/jobs/77039?lang=en-us