Service Stability Lead – Credit & Lending Platforms
A Service Stability Lead (SSL) ensures high availability and strong performance for our Credit & Lending systems. The SSL is responsible for delivering highly optimized, high-performing environments and for ensuring the reliability, scalability, and performance of our systems and infrastructure. Duties include designing and implementing automation for deployment, monitoring, and incident response, as well as collaborating with development teams to improve system architecture and reliability. SSLs also conduct post-mortems to analyze incidents and prevent recurrences, and they may be involved in capacity planning and performance optimization efforts. Strong coding and scripting skills, along with a deep understanding of system architecture and cloud technologies, are essential for this role. The SSL will be the primary interface to the business units for discussing service stability and roadmaps for improvement. We are committed to delivering exceptional services to our customers, and ensuring the stability and reliability of our systems is paramount. As we continue to grow, we are seeking a talented individual to join our team.
Position Responsibilities:
Design and Refine
- Works closely with business units, application teams, infrastructure areas and vendors to identify, review and evaluate the solution requirements.
- Investigates and proposes virtualization, consolidation and rationalization opportunities that are a strategic fit for the infrastructure or business.
- Proposes changes to the technical architecture and design solutions as applicable.
- Evaluates and aligns strategic fit solutions across infrastructure platforms and solutions specific to system hardware and software technologies.
- Understands, participates in, reviews and influences long-term capacity planning and technology investments.
Technical Consulting
- Provides client consulting and planning guidance as applicable for moderate to large highly complex projects/programs.
- Provides consultation and works closely with other functional infrastructure areas/departments on multiple initiatives to meet common organizational / business goals and objectives.
- Participates in and provides consulting to project teams on architecture, design development, integration opportunities and planning of highly complex systems, and ensures these are aligned with our established strategies, guiding principles, rationales and practices.
Planning & Execution
- Identifies and evaluates projects/programs/initiatives and design processes that enhance and rationalize existing and upcoming solutions.
- Maps requirements into standard service solutions, identifies opportunities to integrate with or reuse existing technology, and provides cost-effective solutions for moderate to large, highly complex projects/programs/initiatives.
- Calculates the potential cost of outages and plans for contingency.
- Reviews, identifies and manages requirements for moderate to complex solutions and performs cost-value, feasibility and risk analyses.
Risk Management
- Participates in reviewing, developing and updating architectural standards, guiding principles, rationales and strategies.
- Performs in-depth analyses of possible risks and their countermeasures.
- Evaluates, reviews and approves highly complex design solutions for business and infrastructure projects, programs or initiatives.
Position Qualifications:
- Bachelor's degree from an accredited university in Computer Science, Engineering or a technology-related field, OR equivalent through a combination of education and/or technology experience, OR 12 years of technology experience
- 8 years of technology experience
- 7 years of experience identifying technical solutions for complex business problems, identifying the benefits and risks of the solutions and providing recommendations
- 5 years of experience in SRE or related experience
- 3 years of experience deploying and administering a subset of the monitoring tools described below
Preferred Qualifications:
- Excellent communication and collaboration skills, as the role often involves working closely with development, operations, and other teams
- Proven ability to manage in a matrixed environment across multiple teams to achieve collective goals
- Proficiency with cloud platforms such as AWS, Azure, or Google Cloud
- Experience with containerization and orchestration technologies like Docker and Kubernetes
- Deep understanding of system architecture, networking, and distributed systems
- Familiarity with monitoring and logging tools such as Prometheus, Grafana, and the ELK stack
- Experience with configuration management tools like Ansible, Puppet, or Chef
Category C - Days in the office will either be designated days or will vary week to week from 2-5 days
8:00am - 5:00pm Monday - Friday
To Be Determined Based on Individual Experience