About the role
<p><span class="fontColorThemeDarkAlt"><strong>WHO WE ARE:</strong></span></p> <p>Aviatrix® is pioneering the Cloud Native Security Fabric, the architecture the Containment Era requires. The Cloud Native Security Fabric governs every workload communication path across every cloud, every VPC, every Kubernetes cluster, and every serverless function, from a single policy plane. One rule. Universal propagation. Enforced at the workload, not at a chokepoint. Trusted by more than 500 of the world's leading enterprises. For more information, visit <a href="http://aviatrix.ai" target="_blank">aviatrix.ai</a>.</p> <p><strong>ABOUT THE ROLE:</strong></p> <p>The Aviatrix SRE team is a small but highly skilled global group of Systems Engineers/SREs dedicated to ensuring the reliability, availability, and performance of Aviatrix’s critical systems and services. Our mission is to build and maintain a robust, resilient infrastructure that enables Aviatrix to deliver high-quality services with agility through automation, best practices, and a culture of operational excellence.</p> <p>As a Member of Technical Staff (MTS) Site Reliability Engineer, you’ll develop your foundational SRE skills while contributing to the reliability and performance of our systems. 
You’ll work under supervision to implement solutions, learn our infrastructure, and gain hands-on experience with production systems.</p> <h3><strong>KEY RESPONSIBILITIES</strong></h3> <ul> <li>Kubernetes: Learn to manage basic application deployments, assist with troubleshooting, and support monitoring tasks</li> <li>Infrastructure as Code: Implement IaC for straightforward provisioning tasks and configuration changes</li> <li>Automation &amp; Development: Contribute to existing automation tools and frameworks in Golang and Python</li> <li>Basic System Maintenance: Contribute to system reliability through routine maintenance tasks and monitoring</li> <li>Implementation Support: Implement well-defined solutions for moderately complex technical problems</li> <li>Reliability Engineering: Learn fundamentals of system reliability; contribute to maintaining uptime for well-defined services under guidance</li> <li>Automation Excellence: Execute basic automation scripts; contribute to existing automation frameworks with supervision</li> <li>Observability: Implement basic monitoring configurations; learn to read dashboards and interpret common metrics</li> <li>Incident Management: Participate in incident response with escalation support; document findings and learnings</li> <li>Performance Eng