Build the foundations for understanding Stripe.
Stripe is the best software platform for running an internet business. We handle billions of dollars every year for hundreds of thousands of businesses around the world. One third of Americans bought something on Stripe in the last year.
With all this data, the Data Science team is looking for talented engineers to help us manage business critical data leveraged across the entire organization. If you are data curious, excited about designing data pipelines, and motivated by having impact on the business, we want to hear from you.
Every record in our data warehouse is vitally important for the businesses that use Stripe, so we’re looking for people with a strong background in data engineering and analytics to help us scale while maintaining correct and complete data. You’ll be working with a variety of internal teams across Engineering and Business to help them solve their data needs. Your work will provide teams with visibility into how Stripe’s products are being used and how we can better serve our customers. You will:
- Identify data needs for business and product teams, understand their specific requirements for metrics and analysis, and build efficient and scalable data pipelines to enable data-driven decisions across Stripe
- Design, develop, and own data pipelines and models that power internal analytics for product and business teams
- Help the Data Science team apply and generalize statistical and econometric models on large datasets
- Drive the collection of new data and the refinement of existing data sources, and develop relationships with production engineering teams to manage our data structures as the Stripe product evolves
- Develop strong subject matter expertise and manage the SLAs for these data pipelines
We’re looking for someone who has:
- 3+ years of experience in a data engineering or data science role, with a focus on building data pipelines or conducting data-intensive analysis
- A strong engineering background and an interest in data
- Prior experience writing and debugging data pipelines using a distributed data framework (Hadoop, Spark, Pig, etc.)
- An inquisitive nature and a drive to dig into data inconsistencies to pinpoint issues
- Knowledge of a scientific computing language (such as R or Python) and SQL
- The ability to communicate cross-functionally, derive requirements, and architect shared datasets
Some things you might work on: