1. Develop and operate the data warehouse system and ETL pipelines for data access, collection, processing, and storage, and support data analysis tasks
2. Manage deployment of the platform on public clouds with hundreds of instances across the globe
3. Work on the Big Data and Machine Learning platform built with Apache Spark and related technologies
4. Lay the foundation for the platform and propose solutions that ease software development, software monitoring, and related work
5. Handle hundreds of terabytes of data
Qualifications:
1. 2+ years of experience building and operating large-scale distributed systems or applications
2. Experience operating Spark or Hadoop clusters
3. Experience managing data storage using HDFS and Cassandra
4. Expertise in developing data structures and algorithms on top of Big Data platforms
5. Ability to operate effectively and independently in a dynamic, fluid environment
6. Ability to work in a fast-moving team environment and juggle many tasks and projects
7. Eagerness to change the world in a big way as a self-motivated learner and builder

Other:
1. A degree in Computer Science or a related technical discipline from a top school is a plus
2. Contributions to open source projects are a huge plus (please include your GitHub page)
3. Experience working with Scala is a plus
4. Experience with Hadoop, Hive, Flink, Storm, Presto, and related big data systems is a plus
5. Experience with public clouds such as AWS is a plus