Walmart hiring for Software Engineer III (Service Reliability Engineer) jobs in Bellevue, WA, US
Position Summary... Walmart's Transactional System provides core transactional systems to enable segment and technology partners in creating wonderful omni experiences with speed and leverage. We are a highly motivated group of engineers, working in an agile group to solve sophisticated and high-impact problems. This role is part of the Cloud Powered Checkout (CPC) team and will build the next-generation multi-tenant, client-agnostic, highly scalable, omnichannel checkout solution to seamlessly enable a frictionless customer checkout experience across all sales channels globally. We process millions of orders daily through our high-performance checkout services running in Edge and Cloud. As a Site Reliability Engineer in the CPC team, you will work with L2, other dependent applications, the Platform team, DevOps and Engineering practitioners to proactively maintain mission-critical infrastructure, cloud platforms, microservices, tools, and processes that will ensure the highest levels of availability and reliability of CPC applications.
About Team: Our team works closely with our US stores and eCommerce business to better serve customers by empowering team members, stores, and merchants with technological innovation. From groceries and entertainment to sporting goods and crafts, Walmart U.S. offers an extensive selection that our customers value, whether they shop online at Walmart.com, through one of our mobile apps, or in-store. Focus areas include customers, stores and employees, in-store service, merchant tools, merchant data science, and search and personalization.
What you'll do:
- Incident triage, escalation and resolution: Triage site-impacting production issues by quantifying impact, severity and urgency, analyzing systems for quick remediation, engaging the right teams for recovery [Reduce MTTE - Mean Time to Engage], and focusing on immediate restoration [Reduce MTTR - Mean Time to Restore] of large-scale enterprise systems.
- Alert, Monitoring, Log analysis: Detect and analyze monitoring graphs and alerts to identify systems causing production impacts with various tools like Grafana, Prometheus, MMS, ServiceNow, JIRA, Dynatrace, Splunk, etc. [Reduce MTTD - Mean Time to Detect]. Enhance alerting solutions: Design and implement JavaScript for the integration of alerting tools with service API endpoints with various tools like ServiceNow, Spotlight, Splunk, and xMatters. Requires knowledge of: Monitoring and alerting tools; Monitoring metrics and key performance indicators (for example, availability, MTBF, MTTR); SLIs and SLOs (for example, request latency, availability, error rates, saturation); Distributed tracing; Alerting logic. Demonstrates awareness of the metrics used to monitor software or system performance. Monitors current performance data to ensure adherence to defined SLOs and SLIs for simple applications/systems. Demonstrates awareness of the different types of alerts generated by the monitoring tools. Demonstrates awareness of infrastructure and application metrics.
- Disaster Recovery Planning: Requires knowledge of: Disaster recovery procedures and processes; Enterprise disaster recovery systems. Works with business partners to identify and document critical applications. Interprets and follows procedures in contingency plans. Explains the contingency and disaster recovery plans for the assigned environment. Executes established procedures necessary to continue operations in an emergency. Participates in the design of a minimum operating environment for a computer-based facility.
- Performance and Optimization: Requires knowledge of: Unix/Linux performance optimization tuning; Java/NodeJS/Tomcat/Apache tuning and optimization; Chaos tools. Utilizes established criteria (for example, probability of failure, frequency of failure) to measure site reliability. Monitors site reliability conditions and new reliability requirements. Assists in the design and development of a reliability program plan for a specific site environment. Applies appropriate tools, services, or applications for reliability prediction and other site improvements. Researches and assesses various reliability models for different site environments.
- Work on Product Enrichment & Content Services projects at Walmart: Develop enterprise monitoring and use tooling software solutions such as Grafana, Splunk, etc., to improve visibility, proactively detect issues and restore system availability.
- Develop tools and support: Design and develop solutions for widespread internal communications for cloud application support, or workflows for infrastructure availability issues across various internal applications, using multiple programming languages like Java, JavaScript (React, Node JS), Python, and Shell, and technologies like Prometheus and database query languages. Design and develop a UI tool to display Item Content Quality data on a dashboard using AngularJS, ReactJS, HTML5, CSS3, etc.
- Create and maintain playbooks with steps to perform correct analysis of issues and engage the correct teams for CPC, dependent downstream services, and platform teams.
- Handle deployments. Streamline the deployment process and own it as a single team. Understand and explore post-validation and back-out steps to make the app more resilient.
- Coordinate with platform teams for non-app releases like VM upgrades, DB Maintenance, and other component environment related tasks.
- Participate in rotating on-call duties and work across different time zones with a multinational team.
- Responsible for timely root cause analysis [RCA] of production issues.
- Develop reusable tooling and processes to drive and improve customer experience and lower operational costs.
- Understand DevOps industry best practices.
- Help teams build highly observable and resilient systems.
- Collaborate with developers to capture requirements and understand pain points.
- Build reusable tools, libraries, and dashboards that can be used across DevOps/SRE teams.
What you'll bring:
- Bachelor's degree in Computer Science, Engineering or related discipline
- 3+ years of hands-on experience in SRE, Operations & Development with JavaScript, Java, RESTful services, Git, Maven, Jenkins, DevOps, containerization, Docker, Kubernetes, Azure, Google Cloud, Kafka, Azure Cosmos, Azure SQL, Mega cache, CI/CD, Prometheus, Grafana, Splunk, etc.
- Automation and Self-healing: Demonstrate knowledge of scripting and software development for automation and self-healing of multi-cloud environments. Help enhance existing solutions by developing automation with Docker, Kubernetes and working with DevOps and Engineering partners.
- Excellent end-to-end technical understanding of core infrastructure, cloud services, platforms, and microservices.
- Ability to triage effectively: detect and distinguish symptom from cause.
- Identify and drive continuous improvement efforts to reduce waste (eliminate, automate or streamline).
- Influence the design of system architecture and tactical solutions.
- Familiar with log-centric tooling. Produce time series data and reusable dashboards for use both during and after an event.
About Walmart Global Tech
Imagine working in an environment where one line of code can make life easier for hundreds of millions of people. That's what we do at Walmart Global Tech. We're a team of software engineers, data scientists, cybersecurity experts and service professionals within the world's leading retailer who make an epic impact and are at the forefront of the next retail disruption. People are why we innovate, and people power our innovations. We are people-led and tech-empowered. We train our team in the skillsets of the future and bring in experts like you to help us grow. We have roles for those chasing their first opportunity as well as those looking for the opportunity that will define their career. Here, you can kickstart a great career in tech, gain new skills and experience for virtually every industry, or leverage your expertise to innovate at scale, impact millions and reimagine the future of retail.
Flexible, hybrid work: We use a hybrid way of working that is primarily in office coupled with virtual when not onsite. Our campuses serve as a hub to enhance collaboration, bring us together for purpose and deliver on business needs. This approach helps us make quicker decisions, remove location barriers across our global team and be more flexible in our personal lives.
Benefits: Beyond our great compensation package, you can receive incentive awards for your performance. Other great perks include 401(k) match, stock purchase plan, paid maternity and parental leave, PTO, multiple health plans, and much more.
Equal Opportunity Employer: Walmart, Inc. is an Equal Opportunity Employer - By Choice. We believe we are best equipped to help our associates, customers and the communities we serve live better when we really know them. That means understanding, respecting and valuing diversity - unique styles, experiences, identities, ideas and opinions - while being inclusive of all people. The above information has been designed to indicate the general nature and level of work performed in the role. It is not designed to contain or be interpreted as a comprehensive inventory of all responsibilities and qualifications required of employees assigned to this job. The full Job Description can be made available as part of the hiring process.
At Walmart, we offer competitive pay as well as performance-based bonus awards and other great benefits for a happier mind, body, and wallet. Health benefits include medical, vision and dental coverage. Financial benefits include 401(k), stock purchase and company-paid life insurance. Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting. Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more. You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable. For information about PTO, see https://one.walmart.com/notices. Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms. For information about benefits and eligibility, see One.Walmart. The annual salary range for this position is $108,000.00-$216,000.00. Additional compensation includes annual or quarterly performance bonuses. Additional compensation for certain positions may also include: - Stock
Minimum Qualifications... Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications. Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 2 years' experience in software engineering or related area. Option 2: 4 years' experience in software engineering or related area.
Preferred Qualifications... Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications. Customer Service. We value candidates with a background in creating inclusive digital experiences, demonstrating knowledge in implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate would have knowledge of accessibility best practices and join us as we continue to create accessible products and services following Walmart's accessibility standards and guidelines for supporting an inclusive culture. Master's: Computer Science
Primary Location... 10500 Ne 8th St, Bellevue, WA 98004, United States of America