Site Reliability Engineers

Morgan Stanley

Alpharetta · On-site · Full-time · Senior · 3w ago

About the role

Responsibilities

  • Maintain applications once they are live by measuring and monitoring availability, latency, and overall system health with a focus on business activities, and continuously evaluate cost and toil.
  • Engage in and improve the whole lifecycle of services from inception and design, through deployment, operation, capacity planning and launch reviews.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity; this includes automation for various other operational needs.
  • Troubleshoot infrastructure issues, review log files, update documentation, and maintain a knowledge base of resolutions.
  • Work closely with the Application Development team to understand the platform and create tools/utilities to help with production management.
  • Work with upstream data providers and downstream consumers, and reduce the number of escalations to development teams.
  • Develop scripts and assist with code changes along with operational tasks/activities.
  • Work closely with Application Development to ensure that the support team has excellent knowledge of the application set; own and maintain the support knowledge base and documents.
  • Use analytical skills to find trends in the environment and drive out problems.
  • Lead efforts to identify improvement areas to stabilize the plant.
  • Identify risks and work with a sense of urgency, working within a team or independently.
  • Test and tune network, hardware, and software configurations to maximize performance.
  • Interface with teams such as IT Dev managers and Infrastructure teams, and lead as a Subject Matter Expert (SME) for the application(s) supported.

Requirements

  • 5+ years of experience in a production environment with a solid software development background and understanding of performance tuning, end-to-end troubleshooting, networking fundamentals and appropriate attention to detail.
  • Ability to focus and resolve production issues in a highly demanding, high-pressure environment.
  • 5+ years' hands‑on experience in designing, developing, and implementing technical solutions, or significant experience in deep technical support.
  • Strong experience in scripting languages (Shell scripting, Python, Perl, etc.) and cloud-driven development.
  • Strong database skills with DB2, Sybase or Oracle.
  • Hands‑on experience with Autosys or other batch scheduling software.
  • Strong experience in Continuous Integration and Continuous Deployment.
  • Strong experience with on-demand environments for both virtual machines and containers.
  • Knowledge and hands‑on experience with monitoring tools like Splunk, IP Soft, Sockeye.
  • Practical experience in Agile Methodology (e.g. Scrum).
  • Knowledge or experience with automating deployments using Jenkins and Train.
  • Ability to diagnose technical problems, debug, optimize code, and automate routine tasks.
  • Hands‑on experience in application and database troubleshooting/issue resolution in a fast‑paced environment.
  • Excellent communication skills and the ability to think outside the box for process improvements.
  • Knowledge of cloud-based deployment, security, and networking concepts in Azure and AWS.
  • Hands‑on experience leveraging generative AI tools to enhance research, automate and improve productivity.
  • Knowledge or experience with algorithms, data structures, complexity analysis and software design.
  • Interest in designing, analyzing and troubleshooting large‑scale distributed systems.
  • Minimum BS degree in Computer Science, Engineering or a related field.

About the Company

We do it in a way that's differentiated and we've done that for 90 years.

Values

Putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back aren't just beliefs; they guide the decisions we make every day to do what's best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries.

Employee Benefits & Opportunities

Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work‑life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There's also ample opportunity to move about the business for those who show passion and grit in their work. To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser.

Equal Employment Opportunity

It is the policy of the Firm to ensure equal employment opportunity without discrimination or harassment on the basis of race, color, religion, creed, age, sex, sex stereotype, gender, gender identity or expression, transgender, sexual orientation, national origin, citizenship, disability, marital and civil partnership/union status, pregnancy, veteran or military service status, genetic information, or any other characteristic protected by law.

Internal Applicants

Internal mobility can be a way to grow your career and realize your professional potential. Typically, you must be in your position for at least 18 months and performing satisfactorily before applying for another job at the Firm. Internal applicants can find out more regarding career navigation, mobility guidelines and policy on our employee portal.

Benefits

  • Health insurance
  • Dental insurance
  • Vision insurance

Skills

AWS, Autosys, Azure, DB2, Jenkins, Oracle, Perl, Python, Scrum, Shell scripting, Splunk, Sybase, Train
