Feb 2021 - Present
Backend/Data Engineer
eCloudvalley Digital Technology Co., Ltd • Feb 2021 - Present
1. Project: Real-Time Dashboard Project Using OCR (POC)
"This project demonstrates the entire process from image capture to dashboard: labeling and cropping images captured from equipment screens, recognizing the text in the cropped images, storing the recognized information, and displaying it on a real-time dashboard."
- Redesigning the VGG Image Annotator for labeling equipment-screen-captured images and storing the boundary data of labels in a SQL Server database
- Using OpenCV to crop each incoming image based on the boundary data and recognizing the text in the cropped images with PaddleOCR
- Storing the recognized information and corresponding labels in SQL Server Database
- Using Power BI Desktop to display the information on a real-time dashboard
Result: Quickly demonstrated how to build the end-to-end process from image capture to real-time dashboard
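The cropping step can be illustrated with a minimal sketch. It assumes each label is stored as a rectangle with x, y, width, and height fields, as in the VGG Image Annotator's rect shape; the function names and the sample image are hypothetical, and plain nested lists stand in for the image so the example runs without OpenCV. On an OpenCV (NumPy) image the same crop would be the slice image[top:bottom, left:right].

```python
from typing import Dict, List, Sequence, Tuple


def rect_to_bounds(rect: Dict[str, int], height: int, width: int) -> Tuple[int, int, int, int]:
    """Convert a rect (x, y, width, height) to (top, bottom, left, right), clamped to the image."""
    left = max(0, rect["x"])
    top = max(0, rect["y"])
    right = min(width, rect["x"] + rect["width"])
    bottom = min(height, rect["y"] + rect["height"])
    return top, bottom, left, right


def crop(image: Sequence[Sequence[int]], rect: Dict[str, int]) -> List[List[int]]:
    """Return the pixels of `image` that fall inside `rect`."""
    height = len(image)
    width = len(image[0]) if height else 0
    top, bottom, left, right = rect_to_bounds(rect, height, width)
    return [list(row[left:right]) for row in image[top:bottom]]


# A 4x6 stand-in "image" whose pixel values encode their position (row * 10 + column).
image = [[r * 10 + c for c in range(6)] for r in range(4)]

# Hypothetical label record as it might be read back from the database.
label = {"name": "rect", "x": 2, "y": 1, "width": 3, "height": 2}

cropped = crop(image, label)
print(cropped)
```

Clamping the bounds keeps a label that extends past the image edge from producing an error; the crop is simply truncated at the border.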
2. Project: Real-Time Plastic Injection Data Dashboard Project
"This project sends plastic injection data to Azure IoT Hub and combines it with mold data for display on an interactive dashboard that provides up-to-date information."
- Using Azure IoT Hub and Azure Event Grid to trigger Azure Functions that process the data and store it in a MySQL database
- Using Grafana to design an interactive dashboard with real-time updates
- Hosting a static website on Azure Blob Storage and using Azure Functions and FastAPI as the backend to manage mold data
- Using Azure Logic Apps to send an email alert when abnormal data is detected
Result: Monitoring factory machine data remotely and in real time to improve productivity
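The abnormal-data check behind the email alert can be sketched as a simple range test. This is an illustration under stated assumptions, not the project's actual rule: the field names, the limits, and the idea that "abnormal" means "outside a fixed range" are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical limits per field: (minimum, maximum).
LIMITS: Dict[str, Tuple[float, float]] = {
    "mold_temperature_c": (40.0, 80.0),
    "injection_pressure_bar": (500.0, 1500.0),
}


def find_abnormal(
    reading: Dict[str, float],
    limits: Dict[str, Tuple[float, float]] = LIMITS,
) -> List[str]:
    """Return one message per field that is missing or outside its allowed range."""
    problems = []
    for field, (low, high) in limits.items():
        value = reading.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    return problems


# A non-empty result is what would trigger the alert step.
reading = {"mold_temperature_c": 95.0, "injection_pressure_bar": 900.0}
print(find_abnormal(reading))
```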
3. Project: Equipment Damage Classification and Prediction Project
"The goal of this project is to perform data analysis using collected equipment damage features and label data, train a classification model to predict damage severity labels, and forecast the trend of future damage severity."
- Using Pandas, NumPy and Pandas-Profiling to conduct exploratory data analysis (EDA)
- Using Seaborn heatmaps to visualize correlations and boxplots to visualize the distribution of damage severity by month, weekday, and hour
- Using the XGBoost algorithm to classify damage severity
- Using a recurrent neural network (RNN) for time series forecasting of the damage severity trend over the next six months
Result: The classification model achieved 99% accuracy on the test dataset, and the features that contribute most to the classification task were identified. The time series forecast will provide recommendations on which equipment should be maintained over the next six months.
4. Project: Customer Data Platform Cloud Architecture Solution Project
"This project provides the design of a scalable cloud architecture for the open-source Customer Data Platform (Apache Unomi) that can handle large volumes of traffic and ensure high availability and scalability of the service."
- Configuring Amazon API Gateway to distribute API requests to AWS Simple Queue Service (SQS) and AWS Elastic Load Balancer (ELB), providing different levels of traffic control
- Setting up AWS ELB in front of the Apache Unomi servers, deployed in an Auto Scaling group so that the infrastructure automatically adds or removes servers based on traffic levels or other conditions
- Setting up AWS SQS as a buffer for bursts of traffic, so that incoming requests are held in the queue and sent to the Apache Unomi servers in a controlled manner
- Using AWS OpenSearch for data storage and data visualization
Result: Delivered a system design report aimed at reducing the cost of AWS OpenSearch
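The buffering role of the queue can be shown with a small simulation. This is a sketch under assumptions rather than the deployed design: an in-memory deque stands in for SQS, and the function name, the tick-based model, and the numbers are hypothetical. Requests arrive in bursts, while the servers take at most a fixed number per tick.

```python
from collections import deque
from typing import Deque, List, Tuple


def drain_with_buffer(arrivals: List[int], capacity_per_tick: int) -> Tuple[List[int], int]:
    """Simulate a queue absorbing bursts.

    arrivals[i] is the number of requests arriving at tick i. The servers
    process at most capacity_per_tick requests per tick. Returns the number
    processed at each tick and the largest backlog left waiting in the queue.
    """
    if capacity_per_tick <= 0:
        raise ValueError("capacity_per_tick must be positive")

    queue: Deque[int] = deque()
    processed_per_tick: List[int] = []
    max_backlog = 0
    next_id = 0
    tick = 0

    # Keep ticking until all arrivals are enqueued and the queue is empty.
    while tick < len(arrivals) or queue:
        if tick < len(arrivals):
            for _ in range(arrivals[tick]):
                queue.append(next_id)
                next_id += 1
        batch = min(capacity_per_tick, len(queue))
        for _ in range(batch):
            queue.popleft()
        processed_per_tick.append(batch)
        max_backlog = max(max_backlog, len(queue))
        tick += 1

    return processed_per_tick, max_backlog


# A burst of 10 requests followed by a quiet period, with servers handling 4 per tick.
processed, backlog = drain_with_buffer([10, 0, 0, 2], capacity_per_tick=4)
print(processed, backlog)
```

The servers never see more than their per-tick capacity, and the burst is spread over later ticks instead of being dropped.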
5. Project: Data Streaming and Visualization Project
"This project builds a platform, deployed on Azure and AWS, to store streaming data for further visualization."
- Azure Platform
  - Using Azure Event Hubs/Azure IoT Hub to stream data into Azure Blob Storage
  - Using Azure Event Grid to trigger Azure Functions that analyze the data and store it in Azure SQL Server
  - Using Power BI for visualization
- AWS Platform
  - Using AWS Kinesis to stream data into AWS S3
  - Using AWS Lambda to analyze the data
  - Using AWS OpenSearch for visualization
Result: A platform that can stream 2 million units of sensor data into storage within 4 seconds