Technical Support L2

Engineering, Tech

Remote Jobs Worldwide

October 24, 2025

Flexible Schedule, Full Time, Learning and Development Support, Offer PTO

Respond to production incidents within defined SLAs and provide rapid problem identification and initial resolution;
Use monitoring tools (Grafana, VictoriaMetrics/Prometheus) to identify and diagnose issues in a microservices architecture;
Use logging tools (OpenSearch/Kibana) for comprehensive log analysis and root cause investigation;
Monitor and respond to alerts from PagerDuty or Grafana OnCall, ensuring proper escalation and communication;
Escalate complex issues to appropriate specialized teams (DevOps, SystemRE, PlatformRE) with clear context and documentation;
Create, maintain, and update runbooks, troubleshooting guides, and incident documentation;
Provide clear, timely communication during incidents to stakeholders, development teams, and management;
Contribute to continuous improvement of incident response processes and tool utilization;
Participate in on-call rotations, ensuring timely response to critical incidents and proper handoff procedures;
Provide operational support and guidance to development teams regarding system reliability and performance.

1-3 years of experience in L2 support, Site Reliability Engineering, technical support, or a related role involving incident response;
Hands-on experience with Grafana dashboards, VictoriaMetrics, Prometheus, and metrics exporters for system health monitoring and performance analysis;
Proficiency with OpenSearch (Kibana web interface), log aggregation, search queries, and log analysis for troubleshooting and root cause investigation;
Experience with PagerDuty, Grafana OnCall, or similar alerting systems for incident response, escalation procedures, and on-call operations;
Strong analytical skills for identifying issues using provided monitoring tools, dashboards, and alerting systems;
Clear written and verbal communication skills for incident reporting, stakeholder updates, and creating/updating runbooks and troubleshooting guides;
Understanding of cloud concepts and familiarity with AWS services (EC2, EKS, RDS, S3) for context in incident response and escalation;
Systematic approach to problem-solving, ability to follow runbooks, and experience with incident response procedures;
Ability to quickly learn and effectively use monitoring, logging, and alerting tools provided by DevOps/SystemRE/PlatformRE teams.

Strong desire to learn new technologies and tools, with the ability to adapt to changing monitoring and alerting systems;
Ability to interpret metrics, logs, and system behavior to make informed decisions;
Attention to detail: ensuring accuracy in infrastructure changes, configurations, and deployment processes;
Effective communication: ability to explain technical concepts clearly to both technical and non-technical stakeholders.

GROWE TOGETHER: Our team is our main asset. We work together and support each other to achieve our common goals;
DRIVE RESULT OVER PROCESS: We set ambitious, clear, measurable goals in line with our strategy and drive Growe to success;
BE READY FOR CHANGE: We see challenges as opportunities to grow and evolve. We adapt today to win tomorrow.
