What does a site reliability engineer do? Ruth Grace talks about her responsibilities and compensation as a site reliability engineer in the Bay Area.
- What is Kafka? ("Sean, who works on Kafka here at Pinterest"... at 01:39) Is it specific to Pinterest (e.g., a project group) or a general thing (e.g., a programming language)? (4 votes)
- Kafka is a messaging system: apps send messages to Kafka, and Kafka then makes each message available to the other apps that are interested in it.
So it's part of the infrastructure. (6 votes)
My name is Ruth Grace Wong, I'm 24 years old, I'm a Site Reliability Engineer at Pinterest on the Core Site Reliability Engineering Team, and my salary is approximately $120,000. Pinterest makes a website and mobile apps, and they allow people to collect things that they find on the internet, ideas that they wanna put into their own lives, and save them all in one place as inspiration.
Core site reliability is responsible for the overall reliability of Pinterest. We're always trying to be proactive to help improve the experience for engineers. We've got about 400 engineers at Pinterest and our goal is to help them make their services more reliable, and we also have 150 million users of Pinterest, and so we want Pinterest to work well for them.
For site reliability engineering, we have two categories of responsibilities. There's proactive and there's reactive. So, reactive work would be looking at operations requests if somebody needs help with something, and then proactive would be improving the systems so that they're more reliable and easier for people to use.
I think there are two main skills that are good to have. The first one is learning to be okay with feeling like you aren't the expert, and you might not ever be the expert, but kind of diving in and doing your best anyways. And knowing how to code is also really good, because then you can automate what you're doing and improve the system. Problems are so complex that it's important to also persevere. Sometimes I'll get stuck on something and I'll try to work on something else and then come back to it. It's also really important to ask the people around you for help, because often there's that one senior engineer who knows all these details and they're not written down.
I guess the most frustrating thing, or difficult thing, about this job is that sometimes the problem that you're trying to fix is just so deep, so complicated, you try all these things, they don't work, and it turns out it's something that doesn't even make sense. Sean, who works on Kafka here at Pinterest, he once had this problem where certain machines would run fine and then other machines would not, and he figured out that it was because of the way they were named: certain numbers on the end of the machine name were not working because they were being converted to octal. And that's just an example of a problem that's so crazy that you would never be able to figure it out unless you met somebody that had figured it out before.
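The transcript doesn't say which tool or language did the octal conversion, so the following is only a minimal sketch of how that class of bug can arise. It assumes machine names with zero-padded numeric suffixes (the `kafka-0NN` names are hypothetical) and a parser that, like C's `strtol` with base 0, treats a leading zero as an octal prefix. The helper names are made up for illustration.

```python
import re


def host_suffix(hostname: str) -> str:
    """Return the trailing run of digits in a hostname, e.g. 'kafka-010' -> '010'."""
    match = re.search(r"(\d+)$", hostname)
    if match is None:
        raise ValueError(f"no numeric suffix in {hostname!r}")
    return match.group(1)


def parse_leading_zero_as_octal(digits: str) -> int:
    """Mimic parsers that read a leading zero as an octal prefix (assumed behavior)."""
    if len(digits) > 1 and digits.startswith("0"):
        # int(..., 8) raises ValueError if the string contains an 8 or a 9.
        return int(digits, 8)
    return int(digits, 10)


def compare(hostname: str):
    """Return (decimal value, octal-style value or None if parsing fails)."""
    digits = host_suffix(hostname)
    intended = int(digits, 10)
    try:
        actual = parse_leading_zero_as_octal(digits)
    except ValueError:
        actual = None
    return intended, actual


if __name__ == "__main__":
    # Hypothetical machine names, not taken from the talk.
    for name in ["kafka-007", "kafka-008", "kafka-010"]:
        print(name, compare(name))
```

Under these assumptions, suffixes `001` through `007` parse the same either way, which would explain why some machines run fine. A suffix like `008` is not valid octal and fails to parse, and `010` silently becomes 8 instead of 10.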