Senior Site Reliability Engineer

4020-Site Reliability Engineering · Any City, Turkey
Department: 4020-Site Reliability Engineering
Employment Type: Full-Time

About Hazelcast


Hazelcast provides an in-memory computing platform that empowers Global 2000 enterprises to deliver innovative, low-latency, data-centric applications. Built for ultra-fast processing at extreme scale, the platform can be deployed on-premises or consumed as a cloud service. Our cloud-native in-memory data grid and event stream processing technologies are trusted by leading companies such as J.P. Morgan Chase, Charter Communications, Apple, Ellie Mae, UBS, and National Australia Bank to accelerate business-critical applications.

We're far from the typical Silicon Valley startup story. Although we're headquartered in San Mateo, CA, our origins trace back to Turkey, where most of our engineering team is based. Today, we are a highly distributed global family of 120+ employees, backed by VCs across the U.S., U.K., and Europe, including Bain Capital, C5 Capital, and Earlybird.


Overview

The Hazelcast SRE team is seeking a Senior Site Reliability Engineer to help transform our enterprise product into a managed solution. This individual must be self-motivated and comfortable working remotely as part of our global team. As a member of the SRE team, you will be responsible for a range of tasks: from the traditional SRE duties of support and automation, to defining upgrade strategies, to working closely with other engineering teams as a cloud subject-matter expert to shape the product's move to the cloud.

Responsibilities

  • Keep Hazelcast's cloud-based production systems running smoothly 24/7/365
  • Participate in an on-call rotation to respond to availability incidents, and work with support and engineering on customer incidents
  • Manage our infrastructure with Terraform and Kubernetes
  • Manage builds and releases across Dev, Test, and Production environments
  • Work closely with software developers to deploy and operate our systems
  • Help automate and streamline our operations and software delivery processes
  • Build and maintain tools for deployment, monitoring, and operations
  • Improve the reliability and performance of our products through root-cause analysis and by reviewing gaps in the design and implementation of our infrastructure

Requirements

  • 5+ years of experience in cloud infrastructure and operations
  • Experience working in multi-cloud environments (AWS, GCP, and/or Azure)
  • Good understanding of security best practices in a cloud environment
  • Experience setting up, configuring, and using monitoring, distributed logging, and metrics tools to spot problems (e.g., Prometheus, Grafana, Fluent Bit, Datadog)
  • Extensive experience with Kubernetes and Docker is a must
  • Experience with Terraform, ArgoCD, and Helm
  • Experience with at least one programming language, preferably Golang or Python
  • Experience with CI and with building CD pipelines (e.g., Jenkins, GitHub Actions)
  • Must have a good understanding of cloud networking patterns
  • Must have a good knowledge of high-availability (HA) architectures
  • Desire to learn and work with new technologies
  • Dependable and a good team player
  • Experience with Git
  • A love of automation
  • Fluent in English

Nice to Have

  • Experience working with software engineers in designing cloud-native applications or troubleshooting them
  • An urge to document all the things so you never have to learn the same thing twice
  • Background or experience working with distributed systems
  • Experience working in a remote Agile environment
  • Experience working with Linux systems


Location


We only accept candidates working remotely from Ukraine or Turkey. Candidates in Istanbul may also work from Hazelcast's office there.

