Description
Leidos is an industry and technology leader serving government and commercial customers with smarter, more efficient digital and mission innovations and our employees with endless opportunities. We are passionate about customer success and determined to understand and respond to our customers’ needs as if they were our own. United as a team, we are bound together by our conviction that ethics and integrity are core to how we operate.
We are seeking a highly skilled System Operations Manager to join our team and help oversee and optimize our CDC center’s hybrid cloud/on-prem infrastructure, contingent upon contract award.
Candidate MUST:
Be a US Citizen or Green Card holder with the ability to obtain a Public Trust security clearance and be located in the Atlanta metro area for hybrid onsite/telecommuting work.
If you thrive in a fast-paced environment and have a passion for technology, we’d love to meet you! The System Operations Manager is responsible for managing and optimizing the customer on-prem/cloud infrastructure and services. You will work closely with product management, development, and training teams and manage operations, security, content management, and system engineering staff to ensure the on-prem/cloud environments are secure, scalable, reliable, and cost-efficient. You’ll oversee the day-to-day operations of our on-prem/cloud services, including monitoring, provisioning, and performance optimization, while also implementing best practices for security and compliance.
Primary Responsibilities:
• Manage on-prem and cloud environments (Primary: Azure, Secondary: AWS, Google Cloud, etc.) to ensure performance, availability, and security. Manage on-prem / cloud infrastructure, including deployment, automation, monitoring, scaling, and troubleshooting. Monitor system performance, detect issues, and implement solutions to ensure maximum uptime and reliability.
• Manage the fulfillment and resolution of ITSM requests, incidents & problems, and changes within SLA requirements.
• Lead the Incident Response (IR) team and coordinate resolution efforts.
• Implement and maintain on-prem and cloud monitoring and automation tools.
• Manage CDM operations, vulnerability management, and other cyber functions associated with securing federal cloud environments.
• Take direction from Program Manager/Program Operations Manager and support the implementation of cloud service design and transition into service operations.
• Support the team training program on applying SAFe Agile management concepts to on-prem / cloud service operations.
• Optimize cloud resources for cost-efficiency and performance.
• Oversee on-prem / cloud security, ensuring compliance with industry standards and best practices.
• Develop and enforce on-prem / cloud management processes, workflows, and operational standards.
• Build and maintain disaster recovery plans and ensure data integrity.
• Provide mentorship and leadership to the system operations team.
• Maintain up-to-date knowledge of industry trends and emerging technologies.
Manage a team responsible for
- infrastructure systems, applications, and processes, and for ensuring that all issues are identified, tracked, and resolved in a timely manner. The environment includes Microsoft-based servers, databases, and workstations as well as VMware and Linux server instances and an extensive network infrastructure (LAN and WAN).
- Maintaining system backups and instances, and exploring opportunities to automate and optimize program systems.
- Maintaining a complex server-based enclave, including performing system scans and mitigating their findings, vulnerability management activities, and Active Directory configuration.
- Identifying and correcting hardware and software issues.
- Providing technical assistance to companion work groups in support of overlapping projects and maintaining good inter-departmental relations.
- Participating in the creation and ongoing maintenance of documentation ensuring that clear, concise, and accurate information is readily available to assist with incident resolution.
- Communicating with users and publishing status of any system outages, as needed.
- Executing the process for managing information assurance vulnerability alerts (IAVAs) and system security scanning for equipment suites in accordance with the System Security Plans for these systems to identify and remediate IAVAs rapidly and accurately.
- Initiating IAVA responses and system security scans, completing remediation, extending IAVA patches and security updates to designated sites, and executing monthly security scans.
- Planning and implementing IT enhancements and undertaking project work.
- Maintaining test lab, equipment, and inventory management of all resources necessary for test lab maintenance.
- Providing timely and professional support in response to calls and emails.
Basic Qualifications:
• Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).
• 12 years of experience in on-prem and cloud operations, infrastructure management, or a related field.
• Hands-on experience with cloud platforms (Azure preferred, AWS or GCP acceptable) and cloud management tools.
• Strong knowledge of cloud security best practices and regulatory compliance.
• Proficiency in automation tools (e.g., Terraform, Ansible, CloudFormation).
• Strong problem-solving skills and the ability to manage high-pressure situations.
• Experience with cost control, performance monitoring, and capacity planning in a cloud environment.
• Experience using Atlassian software (Jira & Confluence).
• Excellent communication and interpersonal skills.
- Must have at least two (2) years of experience with Linux server administration (RHEL and/or CentOS) in an enterprise environment (including both hardware installs and virtual machines).
- Must have at least seven (7) years of experience providing support for users and systems in Information Technology and/or information security environments.
- Must have at least two (2) years of experience with network switches, routers, and firewalls from Cisco, Juniper, or similar enterprise vendors.
- Must have experience in or familiarity with the following systems: Windows/Linux operating systems, VMware, vSphere and vSphere architecture, and vCenter.
- Must have experience with VDI architecture and environments (e.g., Citrix).
- Must have a solid understanding of advanced security protocols and standards, and information security principles and practices.
- Experience documenting and providing information for security accreditation and certification.
- Must have in-depth experience with software and security architectures.
- Must be committed to adopting and adhering to best practices including compliance with maintenance windows and change control procedures.
- Must be able to respond to system administration and maintenance problems while off duty, on an on-call basis.
- Must be able to travel up to 20% per year.
Preferred Qualifications:
• Experience with containerization technologies (Docker, Kubernetes).
• Experience using SAFe Agile concepts.
• Familiarity with DevOps principles and CI/CD pipelines.
• Experience in cost management and optimization in the cloud.
• Cloud certifications (AWS Certified Solutions Architect, Azure Solutions Architect, Google Cloud Professional Cloud Architect, etc.).
Original Posting:
March 20, 2025
For U.S. Positions: While subject to change based on business needs, Leidos reasonably anticipates that this job requisition will remain open for at least 3 days with an anticipated close date of no earlier than 3 days after the original posting date as listed above.
Pay Range:
$112,450.00 - $203,275.00
The Leidos pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.