Site Reliability Engineer (GCP)

Location : Hybrid, Northern Virginia
Job Type : Direct
Hours : Full Time
Travel : No
Relocation : No

Job Description :
 

Veritas Partners has an immediate need for a Site Reliability Engineer (GCP) to work in a full-time capacity for a well-established financial institution!


 


This is a hybrid in-office position.


 


The successful Site Reliability Engineer (GCP) will work on a newly developed Bank as a Service platform!


 


 We are in search of a dynamic SRE to partner with our Development teams to ensure the cloud-based infrastructure's integrity, performance, reliability, and cost-effectiveness.


 


Responsibilities:


 


       Provide L2/L3 support for production systems


 


       Work with the product and development teams to establish service level objectives and monitor to ensure the objectives are met


       Create dashboards to monitor performance and scalability on the GCP platform


 


       Design and code new software or modify existing software


 


       Support cloud environments in accordance with operational requirements


 


       Review and resolve incidents, escalations, and tasks


 


       Facilitate root cause analysis meetings in the event of a production-systems incident and improve run books.


       Monitor production components by running health checks, monitoring latency and memory utilization.


       Follow incident management best practices and perform root cause analysis (RCA).


 


       Participate in disaster recovery tests and operational acceptance tests


 


       Deploy and debug cloud initiatives as needed in accordance with best practices


 


       Automate build and infrastructure self-healing pipelines.


 


       Maintain and update the deployment playbook


 


       Remediate security vulnerabilities in the cloud infrastructure


 


       Work with the information security and development teams to implement secure cloud best practices.


 


       Troubleshoot and resolve issues in live production environments with minimal support, and implement strategies to prevent their recurrence.


       Support and monitor new and existing services, platforms, and application stacks.


 


       Engage in improving the lifecycle of services: deployment, operations, and refinement.


 


       Participate in periodic 24x7 on-call duties.


 


       Be accountable for resolving outages via a workaround or permanent fix


 


       Ensure all administration and reports are maintained and up to date, including contact information, technical diagrams, and post-major-incident reviews.


       Communicate with various stakeholders and send out IT communications.


 


       Ensure the effective implementation of the Incident, Change, and Problem Management processes and conduct the respective reporting procedures.


       Monitor incidents to ensure that the Service Level Agreement is met.


 


       Identify, initiate, schedule, and conduct incident reviews


 


 


Qualifications:


 


       Excellent verbal, written, and interpersonal communication skills to maintain relationships and partnerships.


       Strong leadership skills and understanding of developing and mentoring others.


 


       5+ years of experience in a DevOps Engineer role or related position


 


       2+ years of experience with GCP is a must. Experience with AWS or Azure preferred in addition to GCP experience.


       Experience with log monitoring tools


 


       Experience administering multiple observability or APM systems


 


       2+ years of Software Development work experience using Java or similar languages.


 


       Experience with Apache Kafka or similar event streaming platforms


 


       Experience with container orchestration


 


       Disaster recovery experience


 


       High-level proficiency in REST and microservice architectures


 


       Advanced understanding of how to develop, build, test, and deploy code using an integrated CI/CD Pipeline


       Hands-on experience automating CI/CD pipelines


 


       Cloud certification(s) in AWS, GCP, or Azure preferred


 


       Demonstrated ability to manage and complete projects from design phase to implementation phase


 


 


Technical Skills & Experience Required


 


       Cloud services providers: GCP


 


       Orchestration: GKE, Cloud Run, Cloud Functions


 


       CI/CD tools: GitHub CI, Jenkins, Cloud Build


 


       Infrastructure as Code: Terraform


 


       Monitoring tools: Google Cloud Monitoring, Grafana, Datadog, Google Cloud Trace


 


       Logging tools: Google Cloud Logging, Prometheus and/or Splunk


 


       Secrets management


 


       Languages: Python, Go, Java, JavaScript, TypeScript


 


       Kafka, KSQL


 


       Linux


 


       Scalable, highly available architecture


 


       Agile development


 


       SCM and project management tools: GitLab, Jira


 


 


Requirements:


 


       Hybrid/remote position with ability to work at our Sterling office.


 


       Expertise with GCP.


 


       Experience working with GitHub, Git-based tools, CI/CD tools similar to Jenkins, Artifactory, Terraform, CloudFormation and other modern tools.


       Experience with containerization and orchestration using Docker, Kubernetes, GKE, or equivalent tools.


www.hireveritas.com