Sr. Problem Management Engineer – Engineering Service Management (Remote) - Military veterans preferred

2025-06-10
CrowdStrike, Inc.

Sunnyvale
California
94086
United States

Full time

R23434

About the Role:
We are seeking a Senior Engineering Problem Manager to lead the transformation of our Problem Management Engineering function. This strategic role will focus on embedding resilient, automated, and intelligent problem management practices into our engineering, operations, and platform ecosystems. You will be responsible for building technical integrations, leveraging AI/ML for advanced root cause analysis, and driving a culture of continuous learning and operational excellence.
You'll lead end-to-end delivery of initiatives that reduce incident recurrence, improve service stability, and create measurable business value, with a strong focus on automation, governance, and DevOps alignment.
What You'll Do:

  • Design and implement modern problem management workflows, tightly integrated into engineering and operations toolchains.

  • Lead the governance of key problem management deliverables including post-incident action tracking, known error records, and systemic remediation.

  • Drive continuous evolution of a structured retrospective process that promotes learning and resilience engineering.

  • Partner with platform, SRE, and observability teams to automate known error workarounds, temporary fixes, and proactive health checks.

  • Utilize AIOps and ML-driven tooling to correlate events, detect patterns, and identify root causes more effectively.

  • Work closely with business units and product teams to perform business impact analysis and prioritize problem resolution based on value and risk.

  • Integrate post-incident review outcomes into continuous improvement loops, product backlogs, and technical roadmaps.

  • Maintain and evolve the tooling ecosystem supporting problem management, including dashboards, knowledge repositories, and workflows.

  • Act as a coach and change agent to promote a culture of accountability, proactive risk reduction, and shared ownership of reliability.


Key Focus Areas:

  • Retrospective Process Management: Facilitate structured reviews and systemic RCA that drive long-term improvements.

  • Automation of Known Errors & Workarounds: Reduce manual overhead through scripts, workflows, and proactive detection.

  • AI-Augmented Root Cause Analysis: Integrate ML models and historical telemetry to improve diagnostic speed and accuracy.

  • Post-Incident Governance: Ensure action items are documented, assigned, and driven to closure with cross-functional visibility.

  • Business Impact Analysis: Collaborate with stakeholders to prioritize recurring problems based on cost, customer experience, and risk.

  • Toolchain Integration: Seamlessly embed problem management into DevOps tools (e.g., Jira, ServiceNow, PagerDuty, GitHub).


What You'll Need:

  • 8+ years of experience in Engineering Operations, DevOps, Service Management, or Platform/SRE Engineering.

  • Strong understanding of ITSM, particularly Problem, Incident, and Change Management.

  • Experience managing or building post-incident processes, RCAs, and follow-through governance models.

  • Proven ability to automate operational workflows and known error processes using scripting or platform tooling.

  • Proficiency with observability platforms and AIOps tools (e.g., Datadog, Splunk, New Relic, Moogsoft, or similar).

  • Exceptional collaboration and communication skills across technical and non-technical stakeholders.

  • Data-driven mindset with the ability to perform root cause trend analysis and report on service health metrics.

  • Experience working in DevOps, cloud-native, or agile environments.


Preferred Qualifications:

  • Experience with structured problem-solving methodologies (e.g., 5 Whys, Fishbone, Fault Tree).

  • Familiarity with knowledge management systems, runbooks, and self-healing infrastructure practices.

  • Background in software engineering, platform reliability, or infrastructure automation.

  • Certifications in ITIL, SRE, Agile, or SAFe frameworks.


Benefits of Working at CrowdStrike:

  • Remote-friendly and flexible work culture

  • Market leader in compensation and equity awards

  • Comprehensive physical and mental wellness programs

  • Competitive vacation and holidays to recharge

  • Paid parental and adoption leave

  • Professional development opportunities for all employees regardless of level or role

  • Employee Resource Groups, geographic neighbourhood groups and volunteer opportunities to build connections

  • Vibrant office culture with world-class amenities

  • Great Place to Work Certified™ across the globe



CrowdStrike is proud to be an equal opportunity employer. We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed. We support veterans and individuals with disabilities through our affirmative action program.

CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment. The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law. We base all employment decisions--including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs--on valid job requirements.

If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us at recruiting@crowdstrike.com for further assistance.

Find out more about your rights as an applicant.

CrowdStrike participates in the E-Verify program.

Equal employment opportunity, including veterans and individuals with disabilities.