Our Company
We're Hitachi Digital Services, a global digital solutions and transformation business with a bold vision of our world's potential. We're people-centric and here to power good. Every day, we future-proof urban spaces, conserve natural resources, protect rainforests, and save lives. This is a world where innovation, technology, and deep expertise come together to take our company and customers from what's now to what's next. We make it happen through the power of acceleration.
Imagine the sheer breadth of talent it takes to bring a better tomorrow closer to today. We don't expect you to 'fit' every requirement – your life experience, character, perspective, and passion for achieving great things in the world are equally as important to us.
Job Description
L1 SRE Operations Engineer
The L1 SRE is the first line of defense in monitoring, triaging, and executing standardized operational tasks for all enterprise applications running on standard patterns and platforms such as Kubernetes, APIs, WAF, databases, API proxies (Gloo, Apigee), Kafka, and cloud (AWS/Azure/GCP). They will follow runbooks, leverage automation, and escalate appropriately to minimize downtime.
Responsibilities
Monitor system health, alerts, dashboards, and logs across cloud and on-prem infrastructure.
Isolate whether a functional issue lies with the application or with the platform.
Execute standardized runbooks for incident resolution, deployments, and routine tasks.
Perform initial triage of incidents and escalate to L2/L2+ as needed to mitigate the issue and get to a bypass.
Document new issues, gaps in runbooks, and automation opportunities.
Provide excellent communication to stakeholders during incidents.
Support onboarding of new applications into the operations framework.
Skills
Mandatory Skills (Must-Have)
- System & Infrastructure Monitoring
Example: When a Kubernetes pod crash-loop is flagged in Prometheus, L1 should validate it against runbooks, check pod logs, and escalate if restart attempts fail.
- Runbook Execution
Example: Use a provided runbook to restart a failed API proxy service; if the error persists beyond the documented steps, escalate to L2.
- Incident Triage & Communication
Example: For a database connection timeout, collect error logs, verify service reachability, and provide a detailed incident note to L2 before escalation.
- Kubernetes (cloud or on-prem) operations knowledge
Example: Run kubectl get pods -n <namespace> to verify that deployments are healthy.
- Scripting (Python, Bash, PowerShell)
Example: Modify a Bash script to include an additional log path in a health check.
- Networking & Security Awareness
Example: For an unreachable service, confirm DNS resolution and connectivity before escalating to L2.
- Documentation & Knowledge Capture
Example: After handling an alert for disk usage, note missing cleanup steps in the runbook and flag it for update.
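As an illustration of the networking example above (confirm DNS resolution and connectivity before escalating), here is a minimal Python sketch using only the standard library. The function name `check_service`, the result keys, and the timeout default are assumptions made for this illustration; they are not part of the posting or of any Hitachi runbook.

```python
import socket


def check_service(host: str, port: int, timeout: float = 3.0) -> dict:
    """Collect DNS and TCP reachability facts suitable for an incident note.

    Hypothetical helper for illustration only.
    """
    result = {
        "host": host,
        "port": port,
        "resolved": [],
        "dns_ok": False,
        "tcp_ok": False,
        "error": None,
    }

    # Step 1: DNS resolution. A failure here points at naming, not the service.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["resolved"] = sorted({info[4][0] for info in infos})
        result["dns_ok"] = True
    except socket.gaierror as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result

    # Step 2: TCP connectivity. A failure here means the name resolves but
    # the port is not accepting connections (or is filtered).
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"

    return result
```

A call such as `check_service("api.internal.example", 443)` (a made-up hostname) returns a dictionary that can be pasted into the escalation note, so L2 can see at a glance whether the failure is at the DNS layer or the connection layer.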
Preferred Skills (Nice-to-Have)
- Cloud Platform Familiarity (AWS, Azure, GCP)
Example: Use AWS Console to check EC2 instance health status when a service alert is triggered.
- Database Basics (SQL/NoSQL)
Example: Execute SELECT 1; to verify a database is reachable.
- Automation & Self-Service Mindset
Example: Flag that manual log collection during outages could be replaced with a script.
- Exposure to Incident Management Tools (xMatters, ServiceNow, Jira, etc.)
Example: Log incident details in ServiceNow with accurate categorization and timestamps.
- AI/Chatbot-Assisted Ops (emerging skill)
Example: Ask an AI ops assistant to summarize logs before escalation.
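The `SELECT 1;` reachability check from the database example above could be scripted roughly as follows. This is a sketch under stated assumptions: `database_reachable` is a hypothetical helper, and it accepts any callable that returns a Python DB-API connection. The standard-library `sqlite3` module stands in for a real database here; a production check would pass a connect function from the appropriate driver instead.

```python
import sqlite3
from typing import Callable, Tuple


def database_reachable(connect: Callable[[], object]) -> Tuple[bool, str]:
    """Run SELECT 1 through a DB-API connection and report the outcome.

    `connect` is any zero-argument callable returning a DB-API connection
    (hypothetical interface chosen for this illustration).
    """
    # A connection failure is reported separately from a query failure,
    # which helps L2 tell "cannot reach" apart from "reached but unhealthy".
    try:
        conn = connect()
    except Exception as exc:
        return False, f"connect failed: {exc}"

    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        row = cur.fetchone()
        ok = row is not None and row[0] == 1
        return ok, f"SELECT 1 returned {row!r}"
    except Exception as exc:
        return False, f"query failed: {exc}"
    finally:
        conn.close()
```

For example, `database_reachable(lambda: sqlite3.connect(":memory:"))` exercises the happy path against an in-memory SQLite database. The returned message string is meant to be copied into the incident record as-is.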
Qualifications
2–5 years in an IT operations, NOC, or SRE/DevOps engineer role.
Kubernetes 101, Linux 101, Networking 101
Understanding of cloud-ready applications
Understanding of observability tools (Prometheus, Grafana, ELK, Splunk, etc.).
Strong troubleshooting mindset and the ability to follow structured workflows, e.g., 5 Whys and Fishbone diagrams.
About Us
We're a global team of innovators. Together, we harness engineering excellence and passion to co-create meaningful solutions to complex challenges. We turn organizations into data-driven leaders that can make a positive impact on their industries and society. If you believe that innovation can bring a better tomorrow closer to today, this is the place for you.
Fostering innovation through diverse perspectives
Hitachi is a global company operating across a wide range of industries and regions. One of the things that sets Hitachi apart is the diversity of our business and people, which drives our innovation and growth.
We are committed to building an inclusive culture based on mutual respect and merit-based systems. We believe that when people feel valued, heard, and safe to express themselves, they do their best work.
How we look after you
We help take care of your today and tomorrow with industry-leading benefits, support, and services that look after your holistic health and wellbeing. We're also champions of life balance and offer flexible arrangements that work for you (role and location dependent). We're always looking for new ways of working that bring out our best, which leads to unexpected ideas. So here, you'll experience a sense of belonging, and discover autonomy, freedom, and ownership as you work alongside talented people you enjoy sharing knowledge with.
We're proud to say we're an equal opportunity employer and welcome all applicants for employment without attention to race, colour, religion, sex, sexual orientation, gender identity, national origin, veteran, age, disability status or any other protected characteristic. Should you need reasonable accommodations during the recruitment process, please let us know so that we can do our best to set you up for success.
Key Skills
Ranked by relevance
cloud
kubernetes
prometheus
grafana
splunk
bash
aws
ai
firewall
storage
datadog
python
kafka
linux
jira
elk
dns
- Posted: May 12, 2026
- Type: Full-time
- Level: Not Applicable
- Location: Toronto
- Company: Hitachi Digital Services
- Industries: IT Services, IT Consulting
- Categories: Engineering, Information Technology