What does a successful Automation Engineer do?
As a key Automation Engineer, you will be responsible for designing, developing, and implementing automation solutions that directly address operational toil, improve system reliability, and reduce risk. You will work closely with application support and SRE teams to identify high-impact automation opportunities and build robust, scalable solutions using a variety of tools and technologies.
What you will do:
Script & Tool Development: Design, develop, and maintain automation scripts and playbooks using languages like Python, Shell, and Ansible to automate routine system maintenance, deployment, and operational tasks.
CI/CD Pipeline Integration: Build and manage CI/CD pipelines (e.g., Jenkins, GitLab CI) to ensure that automation scripts are treated as first-class software assets, with rigorous testing, version control, and automated deployment.
Telemetry & Monitoring Integration: Integrate automation solutions with existing telemetry and monitoring platforms (Dynatrace, Splunk, Datadog, Prometheus, Grafana) to enable proactive, data-driven automation and to create automated incident response workflows.
Infrastructure Automation: Collaborate with infrastructure teams to automate the provisioning and configuration of servers, networks, and cloud resources, ensuring consistent and reproducible environments.
Toil Reduction & Process Improvement: Actively engage with operations and support teams to identify and quantify manual processes and then develop targeted automation solutions to reduce operational effort.
Configuration Drift Detection & Remediation: Implement automated solutions to detect and remediate configuration drift across cloud and on-prem environments, ensuring consistency and compliance with baseline configurations.
Documentation & Best Practices: Create clear, comprehensive documentation for all automation solutions. Promote and enforce software engineering best practices, including modular design, version control, and robust error handling.
What you will need to have:
Bachelor’s degree, preferably in Information Technology, Computer Science, Electrical/Computer Engineering, or a related field
5-12 years of overall experience.
Strong hands-on experience in DevOps, SRE, or Application/System Operations.
Strong programming skills in Python and Bash scripting.
Proficiency with configuration management tools, particularly Ansible.
Extensive experience with CI/CD tools and concepts (preferably GitLab, Azure DevOps, Harness).
Experience with API integration and developing RESTful APIs.
A proven ability to troubleshoot complex issues and solve problems systematically.
Experience working with monitoring and telemetry tools such as Splunk, Dynatrace, Moogsoft, Prometheus, and Grafana
Experience with synthetic monitoring and API monitoring using tools such as Dynatrace Synthetic Monitoring, Broadcom ASM, or Selenium-based synthetic monitoring
Strong understanding of observability concepts (metrics, logs, traces) and OpenTelemetry
Strong troubleshooting and problem-solving skills
Familiarity with network architecture (Apigee, firewalls, network diagrams)
Ability to multitask and adapt flexibly to changing business priorities.
Good report generation skills.
Experience with tools and techniques for event correlation, anomaly detection, and predictive analytics.
Knowledge of containerization technologies (Docker, Kubernetes).
Experience with database systems (SQL, NoSQL)
Experience with cloud platforms, preferably Microsoft Azure
Experience with LLM-based automation (e.g., Azure OpenAI) for intelligent scripting or incident response
Familiarity with ML frameworks (e.g., Scikit-learn, TensorFlow, PyTorch) and their application in IT operations
R-10358952