Staff Software Engineer, Stream Infrastructure

Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies - from the world’s largest enterprises to the most ambitious startups - use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.

About the team

The Stream Infrastructure team builds and operates Stripe’s real-time, event-driven platform that powers asynchronous communication between services and high-throughput streaming workloads across the company. We run globally distributed systems with high reliability and performance to meet Stripe’s scaling, availability, and product needs. The team operates dozens of Apache Kafka clusters with industry-leading reliability and efficiency, and we continually reduce operational toil by investing in automation and self-service tooling for upgrades, maintenance, and day-to-day operations. The team is distributed across Seattle, Toronto, and remote locations.

What you’ll do

You’ll help define and deliver the next generation of Stripe’s Kafka-first streaming infrastructure, driving industry-level innovation to meet extremely high availability targets at global scale. Partnering with infrastructure engineers, adjacent platform teams, and the product orgs that depend on Kafka every day, you’ll set a long-term technical direction that scales with Stripe’s growth while enabling reliable, efficient operations for years to come. You’ll work on the hardest problems in operating Kafka in production (availability, resilience, performance isolation, and automated recovery) so that teams across Stripe can confidently build event-driven systems on top of it.

Responsibilities

  • Design, build, and operate event-driven infrastructure with Apache Kafka at the center, alongside technologies like Temporal and AWS services
  • Partner with product and platform teams across Stripe to understand requirements, unblock Kafka adoption, and improve how streaming infrastructure is used end-to-end
  • Define and implement operational best practices (e.g., shuffle sharding, cellular architecture, load shedding, automated failover) to improve resilience and reliability at scale
  • Drive fleet-level automation and standardization (“pets” to “cattle”) through self-service workflows, safer rollouts, and self-healing systems that reduce manual operations
  • Lead initiatives that raise the bar on Kafka availability and durability (e.g., multi-region strategies, disaster recovery readiness, operational readiness reviews, incident learning)
  • Evaluate and productionize Kafka ecosystem capabilities (e.g., tiered storage, direct-to-S3) to improve cost-efficiency and scalability without compromising reliability
  • Here are some examples of the team’s recent work: 6 Nines and Tiered Storage in Production?

Who you are

We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.

Minimum requirements

  • This is a Staff-level role, which typically means 10+ years of experience building, operating, and evolving large-scale production systems
  • Experience as a technical lead for team(s) working on distributed systems, including scaling them in fast-moving environments
  • Hands-on experience with big data technologies such as Kafka, Pulsar, Flink, or Pinot
  • Comfortable operating with high autonomy and ownership
  • Growth mindset and a willingness to learn quickly, explore ambiguous problem spaces, and dive deep when needed
  • Strong written and verbal communication skills, including the ability to produce clear technical documentation

Preferred qualifications

  • Experience operating streaming technologies as a platform (e.g., Kafka, Pulsar, Flink, Pinot) for internal customers at scale
  • Experience building or operating control planes for managing large-scale infrastructure
