Description and Responsibilities :
This is another 0-to-1 story. As an early member of the data team, you will help define the company's data-driven policies, strategies, and engineering requirements. At Paktor, the data and backend stacks run entirely on AWS, so data ingestion, automation, the data warehouse, and related systems are all built on AWS components. We process 50-100 GB of real-time and batch jobs, plus other data sources (RDBMS, APIs), for ETL/ELT on S3 and Redshift; this data platform helps our marketing and HQ data-science teams turn data into insights and make good decisions. Further responsibilities and details are listed below.
- Support big-data, batch, and real-time analytical solutions leveraging transformational technologies.
- Optimize data pipelines on AWS using Kinesis Firehose / Lambda / Kinesis Analytics / Data Pipeline, and optimize and resize Redshift clusters and their related scripts.
- Translate complex analytics requirements into detailed architecture, design, and high-performing software, such as machine learning and CI/CD for the recommendation pipeline.
- Collaborate with client-side and backend developers to formulate innovative solutions, run experiments, and implement related algorithms.
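To give a concrete flavor of the ETL glue code this role involves, below is a minimal sketch of a Kinesis Firehose data-transformation Lambda. The event and return shapes follow AWS's Firehose transformation format; the cleaning and enrichment logic (the `user_id` check, the `source` field) is purely hypothetical:

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation Lambda: decode each buffered record,
    clean/enrich the JSON payload, and re-encode it for delivery to S3."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Hypothetical cleaning step: drop events without a user id.
        if payload.get("user_id") is None:
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue

        payload["source"] = "firehose"  # hypothetical enrichment field
        data = base64.b64encode((json.dumps(payload) + "\n").encode()).decode()
        output.append({"recordId": record["recordId"], "result": "Ok", "data": data})

    return {"records": output}
```

Firehose batches the transformed records and delivers them to S3, where they become queryable via Athena or loadable into Redshift.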
Tech Stacks :
- Storage: S3 / Redshift / Aurora
- Real-time processing and messaging: Kinesis Firehose / SNS
- Data warehouse and visualization: Redshift / Klipfolio / Metabase
- ETL/ELT workflow: Lambda / SNS / Batch / Python
- Recommendation and ML: DynamoDB / EMR / Spark / SageMaker
- Metadata management: Athena (Presto) / Glue / Redshift Spectrum
- Continuous deployment: Elastic Beanstalk / CloudFormation
- Operations: PagerDuty / Zapier / CloudWatch
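As an illustration of how the storage and warehouse pieces above fit together, a typical ELT step in this kind of stack bulk-loads Firehose output from S3 into Redshift with a COPY statement. The table, bucket, and IAM role names below are hypothetical placeholders, not real resources:

```python
def build_copy_statement(table: str, s3_path: str, iam_role: str) -> str:
    """Build a Redshift COPY statement that bulk-loads gzipped JSON
    objects (as written by Firehose) from S3 into a warehouse table."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS JSON 'auto' GZIP;"
    )


# Hypothetical names; a real pipeline would read these from config.
sql = build_copy_statement(
    "events.app_events",
    "s3://example-data-lake/events/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
```

A scheduled Lambda or Batch job would then execute this statement against the cluster, which is far faster than row-by-row inserts for analytical loads.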
Reports to : CTO, Data Head