• Design, construct, install, test and maintain highly scalable data pipelines with state-of-the-art monitoring and logging practices.
• Investigate and resolve performance and stability issues in data processing systems, and advise on necessary infrastructure changes.
• Select and integrate the data tools and frameworks required to provide requested capabilities.
• Recommend ways to improve data reliability, efficiency and quality.
• Collaborate with Data Scientists, DevOps and Project Managers to meet project goals.
Minimum qualifications:
• BS or MS in Computer Science or Computer Engineering.
• Proficient understanding of distributed computing principles.
• Data warehouse experience with a major ETL tool; hands-on experience with at least one ETL tool is required.
• Familiarity with relational and NoSQL databases, e.g. MySQL, Hadoop/HBase, MongoDB, Redis, etc.
• 3+ years of experience working with big data using technologies such as Spark, Kafka, Flink, Hadoop, and NoSQL datastores.
• 2+ years of experience with distributed, high-throughput, low-latency architectures.
• 2+ years of experience deploying or managing data pipelines that support data-science-driven decisioning at scale.
Preferred qualifications:
• A successful track record of processing and extracting value from large, disconnected datasets.
• Knowledge of machine learning or statistics.
• Experience with AWS ecosystem tools and system-level DevOps tools.