How will this role impact First Command?
The Site Reliability Engineering practice combines software and systems engineering to build tools that run and manage First Command’s enterprise solutions. Its responsibility is to bridge the gap between development and operations, expediting development work while retaining a high level of system resiliency. The SRE Engineer requires minimal management oversight. With a focus on operational excellence, the SRE Engineer builds and monitors dashboards that provide proactive alerting to themselves and the respective agile teams. They drive operational excellence by working alongside agile teams to provide reliable, automated deployment processes across all environments. The SRE Engineer works closely with developers and support staff to ensure the production environment delivers the proper end-user experience.
What will the employee do in this role?
- Maintain high solution uptime across Production, Stage, and QA environments while embracing rapid change and growth of solutions
- Support new services and solutions before they go live through activities such as system design consulting, building development frameworks and tooling, creating deployment pipelines, capacity planning, and launch reviews
- Develop and implement real-time observability solutions that provide visibility into solution health
- Implement, monitor, and maintain CI/CD frameworks and configuration management
- Organize, secure, and automate infrastructure deployments
- Brainstorm and champion operational improvements
- Document tribal knowledge
- Act as a key player and leader in an Agile environment, participating in daily huddles, sprint planning, retrospectives, etc.
- Attend Agile team and development group meetings
- Communicate and work alongside members of multiple teams in support of their day-to-day work items
- Lead troubleshooting efforts to determine root cause
- Lead Communities of Practice or other internal training opportunities within the development group
- Assist agile teams in creating deployment and release plans
- Gather and analyze metrics from our solutions to assist with performance tuning and fault finding
- Continue learning First Command business processes by engaging business partners
- Continue learning additional technologies, programming languages, industry best practices, and tools needed within First Command
- Serve as an escalation point and mentor for all agile team members on technical issues
- Perform business and technical knowledge transfer with peers
- Serve as a System Administrator for solutions and technologies where privileged access is required
- Lead postmortem review processes and drive to completion the work necessary to eliminate or reduce the chances of a recurring event
What skills and qualifications do you need?
Education
- Preferred – Bachelor’s degree
Work Experience
Required Knowledge, Skills and Abilities
- Required – Extensive experience with web-based technologies and Windows OS
- Required – Experience with cloud-based technologies and solutions
- Required – Solid knowledge of a scripting language
- Required – Solid knowledge of SQL
- Required – Solid knowledge of Git and a development IDE
- Required – Knowledge of DevOps tools and mindset
- Required – Ability to work alongside others and be a team player
- Required – Knowledge of 3 or more core First Command business processes
- Required – Ability to work across multiple teams to ensure operational efficiencies
- Required – Experience with Visio or a comparable diagramming tool
- Required – Proactive approach to identifying problems and areas of improvement
- Preferred – Experience with all aspects of Azure DevOps
#LI-NC1 #LI-hybrid