高振倫    Joe Kao

With a background in data mining and database systems, I have developed a positive attitude and a willingness to keep learning and improving. As an active, creative, and innovative team member, I regularly contribute new ideas through brainstorming. With broad project experience and strong problem-solving skills, I am determined to succeed and hope to contribute to every opportunity I take on.

Data Engineer
Hsinchu, TW
[email protected]

Skills


Big Data Engineering

  • Strong knowledge of NoSQL; skilled with ElasticSearch, the RESTful search API and distributed analytics engine.

  • Experienced in multilingual Natural Language Processing, with a focus on reliable data preprocessing and data cleaning.


Deep Learning

  • Leading team members in researching and building Machine Learning models, then designing prototypes for application systems.

  • Utilizing Deep Learning methods (CNN/RNN) to process heterogeneous documents and develop systems for market analysis and prediction.

Web Crawler Design

  • An efficient Python crawler designer who can rapidly analyze website elements, drawing on past experience with JavaScript, HTML5, CSS, PHP, and JSP.

  • Capable of putting test code into practice and deploying data collection and storage mechanisms on various databases such as MSSQL, MySQL, PostgreSQL, and MongoDB.


Automated Tool Development

  • Proficient with the interactive computing environment IPython Notebook, producing clear source code and visualizing results with plots and diagrams.

  • Developing helper programs with graphical user interfaces that support semi-automatic or manual operation.


DevOps

  • Building a Continuous Integration framework for team members, including Git-based version control and production deployment with Jenkins.

  • Setting up virtual environments for development integrity, providing modular code snippets for faster delivery, and writing scripts for automation.


System Administration

  • Extensive experience building and using Unix / Linux systems, especially CentOS and Ubuntu; familiar with configuring Hadoop development environments and Google Cloud Platform.

  • Excellent grade in CCNA; well trained in installing and troubleshooting network devices such as Cisco Catalyst 2960 and 3750 switches.


Experience

Data Engineer

Big Data Innovation Team, Information Technology Dept., Realtek
(Feb. 2018 – Present)


Responsibilities


  • Develop every component of the ETL process: (1) extract data with web crawlers and document parsers, (2) transform data with open-source tools such as pandas, and (3) load results into NoSQL / SQL databases according to project requirements.
  • Research deep learning solutions to different problems, primarily using Keras for semantic analysis of unstructured data and, secondarily, building pipelines that process heterogeneous documents with open-source methods.
  • Build and introduce Git flow and a Jenkins CI server for the affiliated team. Set up virtual environments for the development integrity of projects and provide team members with modular code snippets.
  • In most projects, take charge of version control, production deployment, system administration, backup and restore, and execution log analysis, and serve as an advisor on cutting-edge technology.

 Selected Projects

  • Data storage and search engine of Customer Visiting System
    1. Introduced a schema-flexible NoSQL database and established an ElasticSearch workflow for loading documents with inconsistent encodings and variable data structures.

    2. Built an ODBC connector to link Python-based programs with SQL Server, enabling functions written in high-level languages to access existing data tables.

    3. Assembled ElasticSearch query clauses to develop a Google-like search API featuring keyword highlighting, customized result sorting, and fuzzy search.

  • Semantic analysis on unstructured sales reports
    1. Set up the machine learning environment by installing the graphics card driver, cuDNN libraries, and JupyterLab for collaborative development.

    2. For Word2Vec feature extraction, cleaned and preprocessed data using multilingual Natural Language Processing tools such as NLTK stemming and Jieba word segmentation.

    3. Surveyed the pros and cons of CNN and RNN methods for the assigned task, eventually simplified it to sequential LSTM text classification, and completed a minimum viable product using Kaggle’s IMDB dataset, which achieved 77% precision.

    4. Wrote scripts to automate model training and prediction, and designed a human labeling GUI for generating ground truth.

  • Market analysis of business insight reports
    1. Guided team members in developing web crawlers to collect news from information technology media.

    2. Extracted figures and data tables from PDF files in 3 different ways: Tabula, PDFMiner, and open-source machine learning approaches.

    3. Developed an importer for human-labeled data that maps file names and page numbers to the corresponding figures and tables and builds a data archive for future analysis.

    4. Using existing SQL tables and labeled keywords as references, extracted product serial numbers from documents with regular expressions and further categorized the documents into product groups.

  • Data pipeline for heterogeneous man-made documents
    1. Developed an Excel file parser, implemented mostly with the Python libraries pandas, xlrd, and openpyxl, that flexibly fits templates with different headers and variable numbers of worksheets.

    2. Cooperated with other teams to set up a mechanism for monitoring mail sent by company employees, which watches directories for file changes and imports mail attachments into databases.

    3. Handled errors by checking data dependencies, missing values, data lengths and types defined in the SQL tables, and other exceptions.

    4. Designed the format of the automatic reply notice mail that informs end users of error messages and invalid cells.
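
The error-handling checks in the data pipeline project above (missing values, data types, and data lengths) could look roughly like the following Python sketch. It is an illustration only, not the project's code: the `ColumnSpec` class, the `validate_row` function, the column names, and the message wording are all hypothetical, and the real checks were driven by SQL table definitions that are not shown here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ColumnSpec:
    """Hypothetical description of one SQL column used for validation."""
    name: str
    py_type: type
    max_length: Optional[int] = None
    required: bool = True


def validate_row(row: Dict[str, Any], schema: List[ColumnSpec]) -> List[str]:
    """Return a list of error messages for one parsed spreadsheet row."""
    errors = []
    for col in schema:
        value = row.get(col.name)
        # Missing-value check: absent keys and empty cells count as missing.
        if value is None or value == "":
            if col.required:
                errors.append(f"{col.name}: missing value")
            continue
        # Type check against the type assumed for the SQL column.
        if not isinstance(value, col.py_type):
            errors.append(f"{col.name}: expected {col.py_type.__name__}")
            continue
        # Length check against the column's maximum length, if it has one.
        if col.max_length is not None and len(str(value)) > col.max_length:
            errors.append(f"{col.name}: longer than {col.max_length} characters")
    return errors


# Example with made-up columns and values.
schema = [ColumnSpec("serial", str, max_length=8), ColumnSpec("quantity", int)]
print(validate_row({"serial": "AB-123456789", "quantity": "3"}, schema))
```

Collecting every message per row, rather than stopping at the first failure, is one way to let a single notice mail report all invalid cells at once.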

 Side Projects

  • Vehicle access control system using License Plate Recognition
Served as an internship mentor, leading 2 project members to build a machine learning model for a License Plate Recognition system and to implement an access control framework. First, we collected a labeled license plate dataset and trained a CNN model to locate plates and an RNN model to recognize plate numbers. Then, we developed Flask APIs that receive video frames from webcams and calculate the edit distance between predicted plate numbers and those in the company database as a similarity measure. Finally, we controlled the entrance gate by checking whether the vehicle owner was an employee. The system’s prediction accuracy reached 90% in testing.
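
The plate-matching step can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names `edit_distance` and `best_match`, the `max_distance` tolerance, and the sample plate numbers are assumptions made for illustration, and the edit distance shown is the standard Levenshtein distance.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def best_match(predicted: str, registered: Iterable[str],
               max_distance: int = 1) -> Optional[str]:
    """Return the closest registered plate, or None if none is close enough.

    max_distance is a hypothetical tolerance for recognition errors.
    """
    best, best_dist = None, max_distance + 1
    for plate in registered:
        dist = edit_distance(predicted, plate)
        if dist < best_dist:
            best, best_dist = plate, dist
    return best


# Made-up plates: one character differs, so the first entry is accepted.
print(best_match("ABC1234", ["ABC1284", "XYZ9999"]))
```

Allowing a small nonzero distance tolerates a single misread character, at the cost of possibly accepting a similar plate that is not registered; the right threshold depends on how the gate is used.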


Associate Engineer

Digital Service Innovation Institute, Institute for Information Industry
(Mar. 2016 – Feb. 2018)


 Responsibilities
 

  • Maintain and refactor existing RESTful APIs, and develop new APIs in Python, which allows fast prototyping and flexible interfacing with different data sources.
  • Discover or hack website APIs and rapidly develop web crawlers for sites with large numbers of users such as Facebook, PChome, and Dcard. Deploy crawler programs for routine jobs or one-off tasks.
  • Develop automated tools from scratch, divide and conquer tasks involving large quantities of data, and improve the efficiency of analysts’ routine work.

 Selected Projects

  • IFEEL, a system observing groups and issues on social networks
    1. Improved the speed of data collection from Facebook: rewrote the crawler as a multiprocessing version and increased the amount of data retrieved per GET request from 50 posts to 1000 posts.

    2. Studied ElasticSearch query clauses and developed data aggregation and metric calculation APIs for different data sources.

    3. Researched YouTube URLs and completed a subsystem that displays trending videos for a given keyword or search clause.

  • Analysis tool for Nielsen Digital Ad Rating System
    1. Refactored functions of the international system and designed a separate analysis tool for localization requirements.

    2. Logged in and retrieved the session with a spoofed user agent, and kept it persistent by retaining the system headers.

    3. Found the URL patterns of the original API endpoints and deployed daily data collection.

  • Facebook Profile Finder 
    1. Designed an automated bot to map users’ email addresses or phone numbers to IDs.

    2. The program relied on a Facebook data leak bug that was fixed in April 2018.

    3. Supported manual reCAPTCHA input when Facebook activated bot detection.

 Side Projects

  • Food Safety Alert System
To collect food safety news and its attachments, I developed 23 crawlers for 22 Departments of Public Health in Taiwan and 1 Gmail account inbox. The crawled text was processed to alert schools to the use of illegal ingredients. Because the program structure and syntax of the 22 web crawlers were similar and reusable, I created a GitHub project and committed all my code for anyone who needs to crawl official Taiwanese government sites.


Master of Information Systems and Applications, National Tsing Hua University 

(Sep. 2012 – Jan. 2016)

Part-time Employee (Innovative DigiTech-Enabled Applications & Services Institute, Institute for Information Industry)

  • Implemented a mechanism to discover potential pages in Facebook posts in order to enrich existing data


System Administrator (National Tsing Hua University Publisher)

  • Developed a full-text search system to find keywords in the corresponding pages of published books
  • Implemented a URL redirection plan to keep site addresses printed in books from becoming invalid



Bachelor of Computer Science and Information Engineering, National Chung Cheng University
(Sep. 2008 – Jun. 2012)

Mobile Application - Merit Award (Chunghwa Telecom Innovation & Application Contest) 

  • Developed an Android application featuring a social media platform that lets users exchange name cards in both business and everyday settings


Director of Network Administration (Campus Network Association)

  • Led network administrator teams in maintaining network devices in campus dormitories and providing consulting services

Kaohsiung Senior High School 
(Sep. 2005 – Jun. 2008)

KSHS Journal – Executive Editor