Platform/Site Reliability Engineering Specialist
Department of Education
About the role
As a Platform/Site Reliability Engineering Specialist, you will design, develop, and evolve the cloud platforms, automation, and reliability systems that support the applications of Federal Student Aid (FSA). You will build the infrastructure, tools, and monitoring capabilities that enable teams to deliver secure, reliable, and high-performing services at scale. You'll collaborate with cross-functional partners to standardize cloud architectures, improve system reliability, and help move FSA toward a platform-driven, engineering-centric organization.
This position uniquely combines the mission of public service with the intricacies of major commercial cloud and SRE environments. You will lead the development of platforms, safety measures, and reliability practices that allow teams to deploy changes confidently and safely. If you have a passion for designing scalable infrastructure, optimizing system reliability, and empowering engineers to accelerate their work, this is the perfect opportunity for you.
Responsibilities
As a Platform/Site Reliability Engineering Specialist, you will be responsible for:
- Advising the IOG Director and the Chief of the Network Support Division, and acting as a network architect and engineer to design and implement solutions across cloud and on-premises environments; developing reusable platform services, container ecosystems, identity integrations, networking frameworks, and infrastructure components.
- Contributing to design and technical documentation, reviewing final deliverables, and ensuring compliance with the enterprise network operations engineering framework as a principal expert in platform engineering, cloud architecture, SRE practices, and infrastructure automation.
- Engaging with technology leaders, business partners, and contractors to ensure operational requirements are met, articulating technical concepts to non-technical stakeholders, and producing platform standards, design documents, and technical evaluations.
- Evaluating system security plans and protocols, overseeing project planning and updates, managing office support contractors, and addressing IT compliance issues; designing and maintaining CI/CD pipelines that support automated testing, deployment, change control, and compliance validation.
- Leading network engineering responses to CISA Binding Operational Directives (BODs) affecting data center operations, developing strategies and processes to enhance security, and implementing secure cloud configurations, IAM models, encryption, and zero-trust architectural patterns.