Welcome to my portfolio, which showcases my technical expertise and experience in data engineering, serverless architectures, deep learning, and IoT systems. Here you'll find real-world applications of tools like Google Cloud Platform, AWS, BigQuery, and Python, demonstrating how I approach problem-solving. From designing complex data pipelines to improving IoT system architectures, this portfolio reflects my commitment to building efficient, scalable solutions and my drive to keep learning and adapting. Enjoy exploring!
In my current role, I'm responsible for managing and operating a government subsidy disbursement platform. Given the project's substantial budget and the size of its team, there was significant demand for data analysis, and the original outsourced dashboard vendor could not keep up with the client's needs. I therefore proposed building our own data analysis system within the team and took full responsibility for it.
Data Sources
Data Extraction and Storage
Front-end Product Numbers Monitoring
Cloud Storage
Google Analytics 4 Integration
Alert System
Future Planning
In the early stages of my career, I developed my skills on digital advertising projects, which primarily fell into three categories:
Across these advertising projects, I gained digital marketing experience in numerous industries, including securities finance, government websites, and large-scale 3C e-commerce, and handled traffic tracking for over 15 websites. While my current focus lies in the first category, I continue to collect data relevant to all of them. For instance, on the website I am currently responsible for, I manage the system that collects the data affecting traffic and conversion rates, such as keyword rankings and the measured effectiveness of marketing campaigns.
Digital marketers grapple daily with retrieving and consolidating data from diverse sources, often in unexpected formats and locations. To address this, we adopted Databricks as our primary tool, combining its data-processing capabilities with the concepts of the data lake and Delta Lake. A data lake provides centralized storage for data in varied formats, while Delta Lake adds data version control and efficient querying on top of it. This section walks through the data-integration workflow we built on Databricks and related technologies, helping marketers get more value from their data assets.
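As a minimal sketch of the upsert pattern Delta Lake enables on Databricks (all table, key, and column names below are hypothetical), a small helper can build the Delta `MERGE` statement that reconciles a raw staging table into a curated table; on Databricks the resulting string would be executed with `spark.sql()`:

```python
def build_delta_merge_sql(target: str, source: str, key: str, columns: list[str]) -> str:
    """Build a Delta Lake MERGE statement that upserts source rows into target."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns)
    insert_cols = ", ".join([key] + columns)
    insert_vals = ", ".join(f"s.{c}" for c in [key] + columns)
    return (
        f"MERGE INTO {target} t USING {source} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )

# Hypothetical marketing tables, keyed by campaign_id
sql = build_delta_merge_sql(
    "marketing.campaigns", "staging.campaigns_raw",
    "campaign_id", ["channel", "spend"],
)
```

Because `MERGE` is transactional on Delta tables, re-running the same load is idempotent, which is what makes the centralized lake safe to feed from many marketing sources.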
This solution architecture diagram gives an end-to-end view of the data workflow, from data acquisition and transformation through to the final applications and presentation. The key phases and their corresponding actions are:
In the evolving landscape of AI and machine learning, our solution seamlessly integrates MLflow and SHAP for comprehensive MLOps. From structured model training, logging, and versioning to insightful model interpretation, our approach enhances transparency. Leveraging SHAP values, we offer both global and local model insights. Further, through MLflow's capabilities, models are effectively deployed, monitored, and updated. This solution not only ensures optimal model performance but also prioritizes clarity and trust, making complex AI models more understandable and reliable.
Model Training and Logging:
Model Interpretation:
Model Refinement and Retraining:
Model Deployment:
Model Monitoring and Feedback:
Model Retraining and Updating:
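The interpretation step above rests on SHAP's additivity property: a prediction equals a base value plus the per-feature attributions. As a self-contained illustration (in practice we run the `shap` library against the trained model; the weights and inputs here are made up), the closed-form SHAP values for a linear model under feature independence are:

```python
def linear_shap_values(weights, x, background_means):
    """Exact SHAP values for a linear model assuming independent features:
    phi_i = w_i * (x_i - E[x_i])."""
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, background_means)]

def predict(weights, bias, x):
    """Linear model prediction f(x) = b + w . x."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

weights, bias = [2.0, -1.0], 0.5
x, means = [3.0, 4.0], [1.0, 2.0]
phi = linear_shap_values(weights, x, means)
base_value = predict(weights, bias, means)  # expected output over the background data
# Additivity: f(x) = base value + sum of attributions
assert abs(predict(weights, bias, x) - (base_value + sum(phi))) < 1e-9
```

Local explanations are the per-sample `phi` vectors; averaging their absolute values across many samples yields the global feature-importance view mentioned above.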
Situation:
As BigQuery continuously integrates robust natural language processing (NLP) capabilities, developers are presented with increasingly convenient tools. The recent inclusion of text-embedding features in BigQuery amplifies the diversity and potency of NLP applications. However, while numerous overviews touch upon these new features, the practical application and operational steps are often overlooked.
Task:
This article aims to fill that gap by offering a step-by-step guide on harnessing these advanced NLP tools within BigQuery. A highlight includes the integration of LLM Bard, a recently added feature allowing users to directly conduct AI operations within the data warehouse.
Action:
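As a sketch of the kind of statement involved (the model and table names are placeholders for a remote text-embedding model registered in BigQuery), `ML.GENERATE_TEXT_EMBEDDING` takes a subquery whose text column is aliased to `content`:

```python
def embedding_query(model: str, source_table: str, text_column: str) -> str:
    """Build a BigQuery text-embedding query; model and table names are placeholders."""
    return f"""
SELECT *
FROM ML.GENERATE_TEXT_EMBEDDING(
  MODEL `{model}`,
  (SELECT {text_column} AS content FROM `{source_table}`),
  STRUCT(TRUE AS flatten_json_output)
)
""".strip()

q = embedding_query("my_project.nlp.embedding_model",
                    "my_project.nlp.sms_messages", "message_text")
```

The returned embedding column can then be fed to downstream similarity searches or clustering without the data ever leaving the warehouse.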
Result:
The integration of LLM Bard into BigQuery presents a groundbreaking approach to conducting AI operations directly within the data warehouse. The sentiment analysis query, for example, deduced that a message advertising free entry to an FA Cup final was deemed "positive". The distinction between legitimate and spam messages was clearly identified, emphasizing that while legitimate messages have a clear purpose and relevance to the recipient, spam messages are unsolicited and sent in bulk.
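A hedged sketch of the sentiment query described above (model and table names are placeholders, and the exact options depend on the remote model registered in BigQuery): `ML.GENERATE_TEXT` takes a subquery whose `prompt` column carries the instruction concatenated with the message text:

```python
def sentiment_query(model: str, source_table: str, text_column: str) -> str:
    """Build a BigQuery generative-AI sentiment query; names are placeholders."""
    return f"""
SELECT ml_generate_text_result
FROM ML.GENERATE_TEXT(
  MODEL `{model}`,
  (SELECT CONCAT(
     'Classify the sentiment of this message as positive, negative, or neutral: ',
     {text_column}) AS prompt
   FROM `{source_table}`),
  STRUCT(0.2 AS temperature, 64 AS max_output_tokens)
)
""".strip()

q = sentiment_query("my_project.nlp.llm_model",
                    "my_project.nlp.sms_messages", "message_text")
```

A low temperature keeps the classification output stable across runs, which matters when the result feeds a dashboard rather than a chat interface.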
This module checks if the issuance of points on the Taiwan Cloud Market platform exceeds the budget. It queries the budget data in a BigQuery table, and if the remaining budget falls below a threshold, it sends an email to notify relevant personnel.
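The alerting logic can be sketched as follows; the table name is hypothetical, and in the deployed module the query runs through the BigQuery client library while the message goes out via the team's mail service:

```python
# Hypothetical budget table; in production this runs via google.cloud.bigquery
BUDGET_QUERY = """
SELECT total_budget - issued_points AS remaining
FROM `project.dataset.point_budget`
"""

def should_alert(remaining: float, threshold: float) -> bool:
    """Alert when the remaining point budget falls below the threshold."""
    return remaining < threshold

def build_alert_email(remaining: float, threshold: float) -> str:
    """Compose the notification body sent to the responsible personnel."""
    return (f"Warning: remaining point budget {remaining:,.0f} is below "
            f"the threshold {threshold:,.0f}. Please review disbursement.")
```

Keeping the comparison and the message-building as pure functions makes the module easy to unit-test independently of BigQuery and the mail gateway.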
System Architecture
In today's data-driven landscape, the need to harness data efficiently and ensure its quality has never been more critical. Dataplex, an innovative tool in the world of data management, stands at the forefront of this evolution. It promises not just enhanced data analytics but also an assurance of impeccable data quality.
Situation:
In the digital age, ensuring data quality has become pivotal for business success.
Task:
To implement a robust system that monitors and enhances data quality automatically.
Action:
Result:
This strategy not only safeguarded high-quality data but also increased transparency in data quality management. The visualizations in Looker combined with the real-time alerting capabilities of Cloud Logging provided a more immediate and clear view of data quality, ensuring smooth business operations.
More details can be found in my Medium post: 透過 Dataplex 優化資料分析和資料品質 (Optimizing Data Analytics and Data Quality with Dataplex).
This solution is designed for the effective management and analysis of Google Analytics 4 (GA4) log data. Given the substantial volume of GA4 logs, the pipeline combines Google Cloud Functions, Amazon S3, AWS Glue ETL, and Amazon Athena for efficient processing.
System Architecture
More details can be found in my Medium post: 基於事件觸發的AWS Glue:實作處理GA4日誌檔案 (Event-Triggered AWS Glue: Processing GA4 Log Files in Practice).
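One detail worth showing is how the Cloud Function can lay out objects in S3 so that Glue discovers date partitions automatically; the key layout below is an assumption for illustration, not the exact production scheme:

```python
from datetime import date

def ga4_export_s3_key(property_id: str, table_date: date) -> str:
    """Derive the S3 object key for one day's GA4 log export.
    The year=/month=/day= prefix is Hive-style partitioning, which
    AWS Glue crawlers recognize as table partitions."""
    return (f"ga4/{property_id}/year={table_date.year}"
            f"/month={table_date.month:02d}/day={table_date.day:02d}"
            f"/events_{table_date:%Y%m%d}.json.gz")

key = ga4_export_s3_key("123456", date(2023, 7, 5))
# -> "ga4/123456/year=2023/month=07/day=05/events_20230705.json.gz"
```

With this layout, Athena queries can prune by partition columns instead of scanning every object, which keeps per-query cost proportional to the date range requested.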
This project involves the utilization of Google Cloud Functions and Pub/Sub for the analysis and communication of stock data.
For a detailed walkthrough of the process, please refer to my article: 利用 Google Cloud Functions 和 Pub/Sub 串接實作 (Implementing a Pipeline with Google Cloud Functions and Pub/Sub).
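Two building blocks of such a pipeline can be sketched in isolation: decoding the base64 payload that Pub/Sub delivers to a background Cloud Function, and a toy moving-average computation standing in for the stock analysis step (the production analysis logic differs):

```python
import base64
import json

def decode_pubsub_message(event: dict) -> dict:
    """Decode the base64-encoded JSON payload Pub/Sub hands to a Cloud Function."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(payload)

def simple_moving_average(prices: list[float], window: int) -> list[float]:
    """Trailing moving average over closing prices (illustrative analysis step)."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]
```

The function subscribed to the topic would call `decode_pubsub_message` on the incoming event, run the analysis, and publish the result to a downstream topic for notification.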
This project involved consultation and suggestions for architectural modifications on a system designed to collect data from IoT devices in aquaculture settings. Though I didn't implement the changes, my role was instrumental in designing the improved system's architecture.
The original system structure worked as follows:
This project served as a Proof of Concept (PoC) for a digital twin system. In the next section, I will provide the proposed architectural changes.
The main modifications to the IoT-Based Aquaculture Data Collection System architecture are as follows:
Expected Benefits: