At Xandr, we process over 130 terabytes of data and serve 45 billion ads each day. Our software is complex, massively scaled, and built with a wide variety of languages and tools, which gives rise to exciting and challenging technical problems. Data Science is at the core of what we do, and we’re looking for a Software Engineer to play a pivotal part in developing, improving, and scaling our platform.
About the team:
The Data Science Core Technology team builds platforms that are used to identify and analyze relevant data to match ad buyers and sellers and help them achieve their business goals. To do this, the data science platform engineers work closely with data scientists, implementing, testing and deploying complex platform tools and applications for our diverse client base. We work with Python, Java, Scala, Kafka and Spark to support teams that build machine learning algorithms that run in production and at scale.
About the job:
•Leads the data science and analytics teams in making effective use of the data science platform clusters, which comprise cloud resources (AWS EMR) and on-prem Hadoop, Kubernetes and database clusters (MySQL/Vertica/Postgres/Snowflake).
•Acts as an engineering leader for the Data Science and Analytics organizations to simplify the life of our team members. Helps the team extract product requirements and identify common workflows.
•Encourages software engineering best practices with a focus on Clean Code principles such as test-driven development, frequent release cycles and long-term maintainability.
•Provides full scientific Python support for data scientists and analysts (Python, pandas, NumPy, SciPy, Dask, PyPI, Conda, Docker, Linux).
•Collaborates with data scientists and analysts to support large-scale data exploration, feature engineering, and predictive model building and training with tools such as PyTorch, scikit-learn and TensorFlow/Keras.
•Helps set up and manage various computational job dispatching systems that support business-critical machine learning and reporting workloads.
•Troubleshoots and resolves client-side and server-side configuration and performance issues related to Hadoop, YARN, EMR, S3, Presto, Hive, Spark, Kafka and GPU computing.
•Supports the design and development of high-performance, distributed computing tasks using Big Data technologies such as Hadoop, NoSQL, text mining and other distributed environment technologies.
•Master of Science in Computer Science, Math or Scientific Computing preferred.
•Typically requires 6-8 years of experience handling Big Data pipelines and Machine Learning systems architectures.
•Technical professional with extensive experience.
•Completes highly complex work within discipline/specialty area.
•Leads the development of concepts/methods/techniques.
•High team impact.
•Has extensive technical knowledge of a broad, relevant part of the data science and analytics workloads.
•Defines and applies broader knowledge of discipline/specialty area standards to work assignments.
•Deep understanding of the majority of Xandr technologies/systems/procedures.
•Deepens technical knowledge through exposure and continuous learning.
•Identifies systemic problems/issues.
•Solves non-routine problems by independently applying judgment to established analysis and standard approaches.
•Builds roadmaps for the data science team that include creative, practical solutions to systemic and recurrent system-wide problems.
•High levels of leadership and decision making. Sets own priorities and aligns them with business objectives.
•Independently applies knowledge of technical practices and specialty area standards.
•Independently completes assignments; participates in diverse projects.
•Contributions to Enterprise technology:
•Leads large and complex technical initiatives.
•Works on the development of new technologies and/or the maintenance of existing ones.
•Contributes to milestone project completion.
•Develops and simplifies complex technical information.
•Provides training/guidance to others in work area, breaking down information in a systematic/logical manner.
•Cultivates good peer working relationships across teams and functional areas.