Effective SRE
Objectives
This course will take you through all aspects of modern Site Reliability Engineering.
The principles and practices covered range from simple deployment to Continuous Delivery & Verification, with observability-based SLO Engineering and Operations Efficiency.
Benefits: Learn how to effectively apply SRE hard and soft skills in your work and architecture.
Target Audience: Software engineers, DevOps engineers, System engineers, ML Architects…
Pre-requisites:
- Software Engineering skills (OOP, Scripting, Infrastructure as Code, ...)
- System Engineering skills (OS, Network, Deployment, Security, Monitoring, …)
- Advantageous: Performance Analysis, Release Engineering, APM/Infra Monitoring
- Distributed/Reliable Architecture Design

The Program



Module 1: SRE fundamentals
- Effective SRE: Principles & Challenges Digital Highway Blueprint
- DevOps Culture
- Fundamentals of Cloud Native Apps: Containers and Microservices
Key learnings
- Introduction to Continuous Delivery, SLO Engineering & Operations Efficiency
- Foster continuous learning

Module 2: SLOs, SLIs & monitoring
- Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Monitoring: Past to Future
- Observability Monitoring and Automation
Key learnings
- Specify meaningful SLOs
- Find the SLIs that fit your SLOs
- Get hands-on experience with SLIs and SLOs

Module 3: Operations Efficiency
- Operations Efficiency: Automation and Emergency Response
- Efficiency and Performance Tracking
- All about Dashboards
- Introduction to Error Budgeting
Key learnings
- Effective Dashboards for optimal Operations
- Set alerts to ensure Reliability
- Formulation of Error Budgets and Error Budget Alerting

Module 4: AIOps
- Introduction to AIOps
- Microservices APM: real-time, AI-driven alerts
- Performance Analysis
Key learnings
- Judge achievement of SLOs
- Automate Emergency Response

Module 5: Best practices
- Effective SRE Best Practices
- Reliability Architecture Patterns
- Q&A Session of the Week & Exploring SRE across companies
Key learnings
- Structure your architecture with reliable design choices
- Anticipate potential problems

Module 6: CI/CD
- Deep dive: DevOps and Continuous Delivery (CD) Philosophy
- CI/CD Practices, Smart Automation: Gains in Efficiency and Reliability
- How to Build a Monitoring System for CD Pipelines
Key learnings
- CI/CD and Automation: DevOps mentality for Release Engineering (RE)

Module 7: Continuous verification
- Continuous Verification (CV) in CD Pipeline
- Automation of CV with AI
- How to implement CV & Rollback
Key learnings
- Learn how to implement CV in CD Pipelines
- Automate Rollback on CV Exceptions

Module 8: Elastic provisioning
- Distributed Scheduling and Reliability Perspective
- Elastic Provisioning: Change Management and Capacity Planning
- Provision & Manage any Infrastructure
Key learnings
- Danger-aware data-driven Provisioning
- Best Practices for Reliability Assurance
- Understanding downtime impact on Job Scheduling and automating Relaunching

Module 9: Security & Automation
- Security Layers for Effective SRE
- Automated release system with transparent reports for troubleshooting
Key learnings
- Structure your architecture to be security-aware at every step
- Streamline code reviews into development workflow

Module 10: On-call & SRE culture
- Being On-Call: Managing Operational Load, Leveraging Automatic Alerts and Dashboards
- Blameless Postmortem Culture
- The New On-Call Checklist
Key learnings
- Clear on-call scheduling, escalation paths, and incident management procedures


Interested in shaping a tailored training?
Contact us at info@digital-innovation-partner.ch

