Chin-Hung (Wilson) Liu

Senior Data Engineer at Paktor x M17 Entertainment Group | AWS x GCP x Azure Big Data Specialist | Data Architect
I lead and architect large-scale data pipelines at Lomotif, Paktor x 17LIVE (GCP/AWS/Python/Scala), cooperating with the Singapore data science/machine learning team and the TW HQ data team, and I have Hadoop ecosystem experience (HDFS/HBase/Kafka) from JSpectrum (Hong Kong, Sydney).
● 10+ years of experience designing and developing Java/Scala/Python-based applications that support day-to-day operations.
● 7+ years working as a data team member or driving the analysis, design, and development of data pipeline applications and building automation tools.
● Extensive knowledge of the Spark and Hadoop ecosystems (Hadoop, HDFS, HBase, etc.).
● Strong knowledge of designing and developing AWS/GCP Big Data services.
● Extensive skill in developing generic distributed systems and stream processing, deploying machine learning pipelines, and continuously developing ML models.
KKCompany
National Taiwan University
Taiwan

Professional Background

  • Current Status
    Employed
    Open to new opportunities
  • Profession
    Data Engineer
    Back-end Engineer
  • Fields
    Software
  • Work Experience
    10 to 15 years (10 to 15 years relevant)
  • Management
    I've had experience in managing 1-5 people
  • Skills
    Big Data
    Data Engineering
    ETL
    AWS
    GCP
    Python
    BigQuery
    Data Warehouse
    Data Pipeline
    Java
    Azure
    SQL
    Spark
    Kafka
    Spark Streaming
    Scala
    Redshift
    HBase
    SQL Server
    AWS S3
    AWS Lambda
    MongoDB
    Hadoop
    Hadoop Distributed File System
    AWS SQS
    Azure Storage
    MySQL
    PostgreSQL
    Postman for API
    Snowflake
  • Languages
    English
    Professional
    Chinese
    Native or Bilingual
  • Highest level of education
    Master

Job search preferences

  • Desired job type
    Full-time
    Interested in remote work
  • Desired positions
    Backend Engineer, Data Engineer, MLOps Engineer
  • Desired work location
    Taiwan Province, Taiwan
    Singapore
    Hong Kong
  • Freelance
    I am not a freelancer

Work Experience


Principal Engineer, Data Engineering

KKCompany
Full-time
Apr 2023 - Present
Taipei City, Taiwan

Senior Data Engineer

Lomotif
Full-time
Jul 2021 - Present
Singapore
Description and Responsibilities: Lomotif is a leading short-video social platform in South America and India that holds PBs of videos in buckets and serves millions of users. The DataOps and AI teams take part in many challenging projects, e.g. Ncanto, XROAD services, Ray Serve, and scalable model-serving frameworks that support the recommendation and moderation pipelines, and also integrated Universal Music Group (UMG) music and the full catalog feed with 7digital. The DataOps team handles 10TB+ of data in day-to-day operation, moderates model training results, and designs SLIs/SLOs for EKS clusters. More responsibilities/details below.
- Optimized the music (UMG) pipeline queries and memory usage for Elasticsearch and PostgreSQL, cutting execution time by 90%, from 10+ hours to 40 minutes.
- Migrated services from Apache Spark and AWS Lake Formation to an AWS MWAA (Airflow) environment on EKS.
- Designed and delivered a distributed system for Ray Serve with the AI team.
- Designed and implemented a modern machine learning pipeline for the recommendation and moderation pipes.
- Designed the SLA and implemented an alert/log reporting system for the moderation pipeline; history logs capture application- and server-level information for further investigation.
- Supported other departments in gathering data on the appropriate platforms.
Tech Stacks:
- Streaming: Snowpipe / Kinesis / Firehose
- Monitoring: CloudWatch / Grafana
- Orchestration: AWS MWAA / Airflow
- Kubernetes: EKS
- Messaging: SQS / SNS
- ML: MLflow / Ray Serve / EMR / Lambda
- Storage: Snowflake / RDS (PostgreSQL) / ElastiCache (Redis) / Elasticsearch
- Bucket: AWS S3
Reports to: VP of Data Engineering
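For illustration only, a minimal Airflow DAG of the kind an AWS MWAA environment would schedule for such a moderation pipeline might look like the sketch below; the DAG name, task names, and callables are hypothetical placeholders, not Lomotif's production code.

```python
# Hypothetical sketch of an MWAA/Airflow moderation DAG; all names are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_new_videos(**context):
    """Pull newly ingested video metadata from the bucket/warehouse (placeholder)."""
    ...


def run_moderation_model(**context):
    """Call the model-serving endpoint (e.g. Ray Serve) for moderation scores (placeholder)."""
    ...


def publish_results(**context):
    """Write moderation results downstream and raise alerts if needed (placeholder)."""
    ...


with DAG(
    dag_id="moderation_pipeline",  # hypothetical DAG name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract_new_videos", python_callable=extract_new_videos)
    moderate = PythonOperator(task_id="run_moderation_model", python_callable=run_moderation_model)
    publish = PythonOperator(task_id="publish_results", python_callable=publish_results)

    extract >> moderate >> publish
```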

Senior Data Engineer

Oct 2020 - May 2021
8 mos
Description and Responsibilities: The engineering team's main responsibility was launching ScoutAsia by Nikkei and The Financial Times, bringing Nikkei content to SGX TitanOTC's platform. Titan users are able to access Nikkei news articles across 11 categories, including equities, stocks, indices, foreign exchange, and iron ore. The DPP (data team) processes hundreds of GB of article/market/financial/relationship and organization data in day-to-day operation on Azure and on-premise environments. More responsibilities/details below.
- Identified bottlenecks and solved problems, in particular optimizing the performance of SQL Server, NoSQL (Azure Cosmos DB), resource units, and message queues, reducing resource usage by roughly 50-75%.
- Identified and solved problems between the machine learning, backend, frontend, and DPP sides, and advised on the logical/physical design of the system.
- Displayed technical expertise in optimizing the databases and improving the data pipeline to achieve the objective.
- Brought industry standards to data management, through to delivery of data for the end objective.
- Built and recruited the new data engineering staff for the next-generation enterprise data pipeline.
Tech Stacks:
- Storage: Azure Cosmos DB / Gremlin / SQL Server / MySQL / Redis
- Storage (bucket): Azure Blob / AWS S3
- Streaming/batch/transform: Spark/Scala (90% codebase coverage)
- Messaging: Azure Service Bus, Queue Storage
- Search: Elasticsearch
- Algorithms: graph / concordance
Reports to: CTO
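As an illustration of the batch/transform layer (the real codebase was Spark/Scala), a minimal PySpark job of this shape might read raw articles from blob storage, clean them, and write a curated copy; the paths and column names below are hypothetical.

```python
# Illustrative PySpark batch transform; storage paths and schema are invented for the sketch.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("article-enrichment").getOrCreate()

# Read raw article JSON landed in blob/object storage.
articles = spark.read.json("wasbs://articles@example.blob.core.windows.net/raw/2021/05/")

# Basic cleansing and category normalization before the documents are indexed and searched.
cleaned = (
    articles
    .filter(F.col("body").isNotNull())
    .withColumn("published_at", F.to_timestamp("published_at"))
    .withColumn("category", F.lower(F.col("category")))
    .dropDuplicates(["article_id"])
)

# Persist a curated copy partitioned by category for downstream consumers.
cleaned.write.mode("overwrite").partitionBy("category").parquet(
    "wasbs://articles@example.blob.core.windows.net/curated/"
)
```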

Senior Data Engineer

Feb 2020 - Jul 2020
6 mos
Taipei City, Taiwan
Description and Responsibilities: The big challenge for the 17 Media data team is the fast-growing data volume (processing 5-10x TB daily), complex cooperation with stakeholders, cost optimization of the pipeline, refactoring high-latency systems, etc. As a senior data member, I built a data dictionary and explained/designed how the whole pipeline works across components, and in particular how to solve the bottlenecks. More responsibilities/details below.
- Led and architected a large-scale data pipeline supporting scientists and stakeholders.
- Optimized, ensured quality, and played a tough role in the data lake projects and data pipeline infrastructure.
- Defined and designed stage, dimension, production, and fact tables for the data warehouse (BigQuery).
- Coordinated with client, QA, and backend teams on QC lists and MongoDB change stream workers.
- Architected workflows with Dataflow, Cloud Functions, and GCS.
- Recruited (Jr./Sr.) data engineering members, set goals, and managed sprints.
Tech Stacks:
- Storage: GCS / BigQuery / Firebase / MongoDB / MySQL
- Real-time processing and messaging: Dataflow (Apache Beam) / BigQuery Streaming / MongoDB Change Streams / Fluentd / Firebase / Pub/Sub
- ETL/ELT workflow: Digdag / Embulk
- Data warehouse, visualization: BigQuery / Superset / Chartio / Data Studio
- Continuous deployment: Docker, CircleCI
Reports to: Data Head
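A minimal sketch of a MongoDB change stream worker streaming inserts into BigQuery, the kind of worker referred to above; the connection string, collection, table, and field names are hypothetical.

```python
# Hypothetical change-stream worker: tail a MongoDB collection and stream rows into BigQuery.
from pymongo import MongoClient
from google.cloud import bigquery

mongo = MongoClient("mongodb://localhost:27017")  # placeholder connection string
events = mongo["app"]["gift_events"]              # hypothetical source collection

bq = bigquery.Client()
table_id = "my-project.streaming.gift_events"     # hypothetical BigQuery table

# Watch for inserts only and forward each new document as a streaming insert.
with events.watch([{"$match": {"operationType": "insert"}}]) as stream:
    for change in stream:
        doc = change["fullDocument"]
        row = {
            "event_id": str(doc["_id"]),
            "user_id": doc.get("user_id"),
            "amount": doc.get("amount"),
            "created_at": doc["created_at"].isoformat() if doc.get("created_at") else None,
        }
        errors = bq.insert_rows_json(table_id, [row])
        if errors:
            print(f"BigQuery insert errors: {errors}")
```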

Data Engineer

Sep 2015 - Dec 2019
4 yrs 4 mos
Description and Responsibilities: This is another 0-to-1 story. As an early data member, we had to figure out the company's data-driven policy, strategies, and engineering requirements. At Paktor, the data and backend sides are 100% on AWS, so the whole data ingestion, automation, data warehouse, etc. rely on those components. We process 50-100x GB of real-time/batch jobs plus other data sources (RDBMS, APIs) for ETL/ELT on S3 and Redshift; the data platform helps our marketing and HQ scientist teams turn data into insights and make good decisions. More responsibilities/details below.
- Supported Big Data and batch/real-time analytical solutions leveraging transformational technologies.
- Optimized the data pipeline on AWS using Kinesis Firehose/Lambda/Kinesis Analytics/Data Pipeline, and optimized and resized Redshift clusters and related scripts.
- Translated complex analytics requirements into detailed architecture, design, and high-performing software, such as the machine learning and CI/CD of the recommendation pipeline.
- Collaborated with client-side and backend developers to formulate innovative solutions to experiment with and implement related algorithms.
Tech Stacks:
- Storage: S3 / Redshift / Aurora
- Real-time processing and messaging: Kinesis Firehose / SNS
- Data warehouse, visualization: Redshift / Klipfolio / Metabase
- ETL/ELT workflow: Lambda / SNS / Batch / Python
- Recommendation, ML: DynamoDB / EMR / Spark / SageMaker
- Metadata management: Athena (Presto) / Glue / Redshift Spectrum
- Continuous deployment: Elastic Beanstalk / CloudFormation
- Operations: PagerDuty / Zapier / CloudWatch
Reports to: CTO, Data Head
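A minimal sketch of a Lambda handler in such an ingestion path, consuming Kinesis records and forwarding normalized events to a Firehose delivery stream that lands in S3 for Redshift loading; the delivery stream name and event fields are hypothetical.

```python
# Hypothetical ingestion Lambda: Kinesis trigger -> normalize -> Firehose -> S3/Redshift.
import base64
import json

import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = "events-to-s3-redshift"  # hypothetical Firehose delivery stream name


def handler(event, context):
    """Triggered by a Kinesis stream; decodes records and forwards them to Firehose."""
    records = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        normalized = {
            "user_id": payload.get("user_id"),
            "event_type": payload.get("type"),
            "ts": payload.get("timestamp"),
        }
        records.append({"Data": (json.dumps(normalized) + "\n").encode("utf-8")})

    if records:
        # Firehose buffers the batch and delivers it to S3, from which Redshift COPY loads it.
        firehose.put_record_batch(DeliveryStreamName=DELIVERY_STREAM, Records=records)

    return {"forwarded": len(records)}
```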

System Analyst (Data Backend Engineer)

Jan 2014 - Aug 2015
1 yr 8 mos
Description and Responsibilities: JSPectrum is a leading passive location-based service company in Hong Kong with many interesting products, such as NetProbe, NetWhere, and NetAd. On Optus (the main project, in Sydney), the system analyst's main responsibility was designing and implementing data ingestion (real-time processing) and loading and managing data with major components of the Hadoop ecosystem. We met the challenge of processing 15,000 TPS, 60,000 inserts per second, and 300 GB of daily storage, so we optimized those components with Kafka consumers and HDFS storage, redesigned the HBase keys/columns to meet the requirements, and deployed NetAd as a whole in-house solution on Optus. More responsibilities/details below.
- Designed, implemented, and optimized the Hadoop ecosystem, MLP, and real-time processing on Optus in-house servers with our main products NetAd and NetWhere, focusing on the HBase schema, HDFS, balancing Kafka consumers, and further data ingestion issues.
- Collaborated with stakeholders and LBS team members on further requirements for HeapMap.
Tech Stacks:
- Storage: HDFS
- Real-time processing and messaging: Kafka streaming, log systems
- Data warehouse, visualization: HBase / NetWhere (dashboard)
- Hadoop ecosystem: Hadoop / HDFS / Zookeeper / Spark / Hive
- ETL/ELT workflow: MLP / Scala / Java
Reports to: CTO
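A minimal sketch of the Kafka-to-HBase ingestion pattern with a salted row key (the kind of key redesign mentioned above); the topic, table, and column families are hypothetical, and the real implementation was in Scala/Java.

```python
# Hypothetical Kafka consumer writing location events to HBase with a salted row key.
import hashlib
import json

import happybase
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "location-events",                    # hypothetical topic
    bootstrap_servers=["localhost:9092"],
    group_id="netad-ingest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

connection = happybase.Connection("localhost")
table = connection.table("location_events")  # hypothetical HBase table


def row_key(event):
    """Salted row key: spread sequential hot writes across regions, keep time order per subscriber."""
    sid = str(event["subscriber_id"])
    salt = int(hashlib.md5(sid.encode("utf-8")).hexdigest(), 16) % 16
    return f"{salt:02d}|{sid}|{event['timestamp']}".encode("utf-8")


for message in consumer:
    event = message.value
    table.put(row_key(event), {
        b"loc:lat": str(event["lat"]).encode("utf-8"),
        b"loc:lon": str(event["lon"]).encode("utf-8"),
        b"meta:cell_id": str(event.get("cell_id", "")).encode("utf-8"),
    })
```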

Senior Software Engineer

Oct 2012 - Dec 2013
1 yr 3 mos
Description and Responsibilities: TORO is a technology business that provides a mobile platform and its associated systems, services, and rules to help brands (with an initial focus on sports teams, smart cities, and streaming apps) become super-apps and generate additional revenue with minimum effort. Responsibilities below.
- Designed, implemented, and tested back-office modules for the NFC wallet platform, Trusted Service Managers (TSM), and distributed NFC services to end users and stakeholders.
- Implemented RESTful services and delivered endpoints for wallet managers, collaborating with frontend and backend teams on further business requirements.
Tech Stacks: MySQL / Spring / Hibernate / XML / Apache Camel / Java / POJO, etc.
Reports to: Head of Server Solutions

Software Engineer

Oct 2011 - Sep 2012
1 yr 0 mos
Description and Responsibilities: Digital River is a proactive partner providing API-based payments & risk, order management, and commerce services to leading enterprise brands. The big challenge at DR was integrating with the existing modules and working well with a huge code base (2+ million lines), under a strict process of requirements analysis, design, implementation, testing, and code review. More responsibilities below.
- Designed and implemented the custom bundle project, where bundles are customized by shoppers who pick products from groups and get special discounts; the main stakeholders/users were from Logitech and Microsoft.
- Analyzed and collected business requirements, identified use cases, collaborated with business analysts, and delivered related diagrams and documents.
Tech Stacks: Oracle / Tomcat / Spring / Struts / JDO / XML / JUnit / Java / J2EE, etc.
Reports to: Technical Development Manager

Technical Supervisor

Oct 2008 - Sep 2011
3 yrs 0 mos
Description and Responsibilities: Stark Technology (STI) is the largest domestic system integrator in Taiwan. We plan and deliver complete ICT solutions for a wide spectrum of industries by representing and reselling the world's leading products, using the most advanced technology and providing the best professional services. More responsibilities/projects below.
- Led and coached junior programmers through the development process of enterprise modules, and designed Fatwire CMS components such as Template, Page, and Cache.
- Designed and analyzed DMDB systems, and implemented functions to meet query/storage requirements.
- Optimized performance of online servers, including GC tuning.
Tech Stacks: Oracle / Sybase / Tomcat / Weblogic / Spring / Struts / Hibernate / Fatwire / Java / J2EE, etc.
Reports to: Technical Manager

Education

Master of Business Administration (MBA)
EMBA Programs, Business Administration, Accounting, Finance and International Business.
2010 - 2011
Master of Science (MS)
Computer Science, Data Mining, Expert Systems and Knowledge Base as major concentration.
2002 - 2005