Lead Site Reliability Engineer (SRE)

Macpower Digital Assets Edge

Rockville · On-site · Full-time · Lead · Posted 3w ago

About the role

Key Focus Areas

  • Manage and optimize AWS Control Tower, organizational policies, and multi-account environments.
  • Oversee AWS backups, SSM patching, AMI deployments, and configuration pushes across multiple accounts.
  • Manage and maintain core AWS services including EC2, ECS, EKS, RDS, S3, SageMaker, CloudFront, and Lambda.
  • Implement S3, SFTP, and site externalization methods.
  • Develop Infrastructure as Code (IaC) using Terraform, CloudFormation, and Python.
  • Manage IAM policies, access controls, and permissions.

Core Responsibilities

  • Manage and maintain cloud infrastructure to ensure high availability, reliability, and performance.
  • Serve as the primary escalation point for all cloud infrastructure issues.
  • Monitor cloud resource performance and cost efficiency.
  • Lead major incident management and communicate timely updates to stakeholders.
  • Perform due diligence and impact analysis before implementing changes to cloud platforms.
  • Lead and mentor a team of cloud engineers to ensure performance and collaboration.
  • Manage daily operations and ensure alignment with organizational objectives.
  • Develop and implement incident management processes and conduct root cause analysis.
  • Identify and automate repetitive infrastructure tasks using IaC principles.
  • Continuously improve operational processes and standard operating procedures.
  • Implement and enforce security controls, ensuring compliance with standards such as GDPR and HIPAA.
  • Monitor cloud usage and conduct capacity planning to balance efficiency and scalability.
  • Develop and test disaster recovery and business continuity plans.
  • Collaborate with IT, business units, and vendors to deliver scalable cloud solutions.
  • Document cloud configurations, processes, and reports, ensuring accessibility and version control.

Technical Skills

  • Proficiency in AWS (EC2, ECS, EKS, RDS, S3, Lambda, SageMaker, CloudFront).
  • Experience with Azure and OCI cloud environments.
  • Infrastructure as Code and configuration management (Terraform, CloudFormation, Ansible, Puppet, Chef).
  • Scripting in Python and PowerShell.
  • Strong understanding of cloud architecture, monitoring, and automation tools.
  • System administration experience (Windows, Linux, VMware, Active Directory, Azure AD SSO).
  • Strong networking knowledge (DNS, DHCP, PKI, LAN/WAN).

Leadership and Behavioral Skills

  • Demonstrated experience in leading teams and managing cloud operations.
  • Strong communication and stakeholder management across technical and business functions.
  • Proactive problem-solver with excellent analytical and root cause analysis skills.
  • Self-motivated with a continuous improvement mindset.
  • Experienced in vendor management and contract negotiations.

Basic Qualifications

  • Bachelor's degree in Computer Science, Information Technology, Electrical Engineering, or equivalent.
  • Experience in cloud operations and team leadership in technical environments.

Preferred Certifications and Experience

  • AWS Certified Solutions Architect - Associate or Professional.
  • Microsoft Certified: Azure Solutions Architect Expert.
  • Familiarity with DevOps tools (CI/CD, Jenkins, Git).
  • Experience with ITIL or ITSM frameworks.

Skills

Ansible, AWS, Azure, Azure AD SSO, CD, Chef, CloudFormation, CloudFront, CI, DHCP, DNS, Docker, EC2, ECS, EKS, Git, IaC, IAM, ITIL, ITSM, Jenkins, LAN/WAN, Lambda, Linux, Microsoft Azure, OCI, Puppet, PKI, PowerShell, Python, RDS, SageMaker, S3, SFTP, SSM, Terraform, VMware, Windows
