
Sr. Engineer, Kernel Development and Optimization

Tenstorrent
Belgrade, Remote · Full-time · Posted 8 Jun 2026

About the role

Tenstorrent is leading the industry in cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors at all levels of seniority.

Tenstorrent is building next-generation AI compute. The Kernel Development and Optimization team develops the performance-critical kernels that unlock the full capability of our hardware across ML and HPC workloads.

This is a hybrid role based in Belgrade, Serbia.

We welcome candidates at various experience levels for this role. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.

Who You Are

- A strong C++ systems engineer with experience writing performance-critical or low-level software.
- Comfortable reasoning about concurrency, synchronization, latency hiding, and compute-versus-memory trade-offs.
- Data-driven, using profiling and benchmarking results to guide optimization decisions.
- Effective at debugging complex runtime or kernel-level issues in large codebases.
- A structured thinker who can break ambiguous performance problems down into measurable experiments.

What We Need

- Engineers who can design, implement, and optimize GPU-style kernels such as matrix multiplication, attention primitives, and data-movement operations.
- Clear ownership of performance, from identifying bottlenecks to delivering measurable throughput improvements.
- Contributions to host-side orchestration code and parallelization strategies.
