
Austin Yang

Hello, I am Ming-Hao Yang.

I graduated from National Yunlin University of Science and Technology with an M.S. in Computer Science and Information Engineering.

My research interests are in deep learning, speech synthesis, speech recognition, and anomalous sound detection.

I am familiar with Linux, audio and computer vision applications, and the deep learning frameworks PyTorch, Keras, and TensorFlow.

Machine Learning Engineer

[email protected]

Work Experience

National Yunlin University of Science and Technology, Machine Learning Engineer, Feb 2020 ~ Present

I am responsible for solving industry problems such as anomaly detection, object detection, object classification, and anomalous sound detection.

At the same time, I serve as an instructor for the Ministry of Education's online pre-employment training course, which trains AI-related talent.

The course teaches college students across the country AI-related technologies for introducing AI into enterprises, and it also runs AI workshops that teach enterprises how to use AI to recognize environmental sounds.


Projects

  1. Respiration sound recognition
  2. Infant sound recognition
  3. Anomaly image recognition
  4. Anomalous sound detection

National Yunlin University of Science and Technology, Artificial Intelligence Engineer, Aug 2019 ~ Oct 2019

This project aimed to use students' online learning behaviors to predict whether they would pass the course at the end of the semester, and to use the AI model to give appropriate learning suggestions.

In the project, I was responsible for building the entire pipeline, including behavioral feature analysis, AI model development, hyperparameter tuning, and prediction.
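As a rough illustration of such a pipeline (the feature names, data file, and model choice below are hypothetical, not the project's actual code), a scikit-learn sketch might look like this:

    # Hypothetical sketch of a pass/fail prediction pipeline.
    # The CSV path and feature columns are placeholders.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Assumed table of per-student online learning behaviors plus a pass/fail label.
    df = pd.read_csv("learning_behaviors.csv")
    X = df.drop(columns=["passed"])  # e.g. login counts, video watch time, quiz scores
    y = df["passed"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Hyperparameter tuning via grid search, as described above.
    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
        cv=5,
    )
    search.fit(X_train, y_train)
    print("test accuracy:", search.score(X_test, y_test))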

Education

National Yunlin University of Science and Technology, Master’s Degree, Computer Science and Information Engineering, 2017 ~ 2019

National Yunlin University of Science and Technology, Bachelor’s Degree, Computer Science and Information Engineering, 2012 ~ 2017

Skills


Programming languages

  • Python
  • Ruby
  • C++


Machine learning

  • SVM
  • KNN
  • PCA
  • Random forest
  • Decision tree


Deep learning

  • Generative Adversarial Network
  • Autoencoder
  • Convolutional neural network
  • Fully connected neural network


Speech

  • Speech synthesis
  • Speaker recognition
  • Sound event classification


Computer vision

  • Object detection
  • Object classification
  • Image classification


Tools

  • git
  • GitLab
  • Docker
  • Shell script

Portfolio

ImageClassification https://github.com/fastyangmh/ImageClassification

This repository implements image classification based on deep learning.

It provides various state-of-the-art models such as ResNet, MobileNet, and EfficientNet, and users can also plug in a self-defined model.

The repository also supports hyperparameter tuning and k-fold cross-validation.

You can easily use this repository for image classification tasks: prepare the data and configuration according to the documentation, training then starts automatically, and you get the best AI model at the end.
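As a rough sketch of the underlying workflow (not the repository's actual interface; the data path, model choice, and hyperparameters below are assumptions), fine-tuning a pretrained ResNet in PyTorch might look like this:

    # Minimal PyTorch sketch: fine-tune a pretrained ResNet on an ImageFolder dataset.
    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    train_set = datasets.ImageFolder("data/train", transform=transform)  # assumed layout
    loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

    model = models.resnet18(weights="IMAGENET1K_V1")
    model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))  # new classifier head

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(5):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()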

SoundClassification https://github.com/fastyangmh/SoundClassification

This repository implements sound classification based on deep learning.

It provides various state-of-the-art models such as ResNet, MobileNet, and EfficientNet, users can plug in a self-defined model, and it also supports audio transforms based on SoX.

The repository also supports hyperparameter tuning and k-fold cross-validation.

You can easily use this repository for sound classification tasks: prepare the data and configuration according to the documentation, training then starts automatically, and you get the best AI model at the end.
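A minimal torchaudio sketch of the same idea, with a SoX effect chain on the input (the effect chain, file name, and front end are illustrative assumptions, not the repository's actual pipeline):

    # Apply SoX-based transforms, then compute a mel spectrogram as the CNN input.
    import torch
    import torchaudio

    waveform, sr = torchaudio.load("example.wav")  # assumed input file

    # SoX effect chain, e.g. resample to 16 kHz and pitch-shift for augmentation.
    effects = [["rate", "16000"], ["pitch", "100"]]
    waveform, sr = torchaudio.sox_effects.apply_effects_tensor(waveform, sr, effects)

    # Mel spectrogram, a common front end for sound classification.
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=sr, n_mels=64)(waveform)

    # Any image-style CNN can consume the (channel, mel, time) tensor from here.
    print(mel.shape)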

AudioGANomaly https://github.com/fastyangmh/AudioGANomaly

AudioGANomaly is based on the anomaly detection paper "GANomaly: Semi-Supervised Anomaly Detection via Adversarial Training". The architecture uses an encoder-decoder-encoder network to learn the distribution of normal data in a high-dimensional space, and combines it with an encoder-based classifier through adversarial learning.
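A schematic PyTorch sketch of the encoder-decoder-encoder idea (fully connected layers and sizes are placeholders; the actual GANomaly model is convolutional):

    import torch
    import torch.nn as nn

    class GANomalyGenerator(nn.Module):
        def __init__(self, in_dim=1024, latent_dim=64):
            super().__init__()
            self.encoder1 = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                          nn.Linear(256, latent_dim))
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                         nn.Linear(256, in_dim))
            self.encoder2 = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                          nn.Linear(256, latent_dim))

        def forward(self, x):
            z = self.encoder1(x)          # latent code of the input
            x_hat = self.decoder(z)       # reconstruction
            z_hat = self.encoder2(x_hat)  # latent code of the reconstruction
            return x_hat, z, z_hat

    # Anomaly score: distance between the two latent codes. Normal data, which
    # the model was trained on, reconstructs well, so z and z_hat stay close.
    x = torch.randn(8, 1024)
    x_hat, z, z_hat = GANomalyGenerator()(x)
    score = torch.mean((z - z_hat) ** 2, dim=1)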

AudioDenoiser https://github.com/fastyangmh/AudioDenoiser

AudioDenoiser is based on the speech enhancement paper "Real Time Speech Enhancement in the Waveform Domain". The architecture is similar to an autoencoder and U-Net, with an LSTM added in the middle layer to improve denoising performance.
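A schematic PyTorch sketch of this encoder-LSTM-decoder structure (channel counts, depth, and kernel sizes are placeholders, not the actual configuration):

    import torch
    import torch.nn as nn

    class Denoiser(nn.Module):
        def __init__(self, hidden=48):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv1d(1, hidden, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv1d(hidden, hidden, kernel_size=8, stride=4), nn.ReLU(),
            )
            self.lstm = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
            self.decoder = nn.Sequential(
                nn.ConvTranspose1d(hidden, hidden, kernel_size=8, stride=4), nn.ReLU(),
                nn.ConvTranspose1d(hidden, 1, kernel_size=8, stride=4),
            )

        def forward(self, noisy):
            h = self.encoder(noisy)                  # (batch, channels, frames)
            h, _ = self.lstm(h.transpose(1, 2))      # temporal modeling in the bottleneck
            return self.decoder(h.transpose(1, 2))   # estimated clean waveform

    # Note: output length differs slightly from input due to stride arithmetic;
    # the real model pads/trims to match.
    clean_estimate = Denoiser()(torch.randn(1, 1, 16000))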

Master's Thesis

Speech Synthesis based on Generative Adversarial Network

In recent years, thanks to mature hardware and big data, deep neural networks (DNNs) have made breakthroughs, with successful applications in many fields. One of the most groundbreaking architectures is the generative adversarial network (GAN), which provides an innovative way to train a generative model: the model is split into two sub-models, a generator and a discriminator. The generator produces samples, and the discriminator attempts to classify samples as real or fake. Unlike traditional speech synthesis techniques, this thesis explores speech synthesis based on generative adversarial networks: the GAN learns the feature distribution of the training data and thereby generates more natural speech.
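As a sketch of the adversarial training idea described above (the toy 1-D data and layer sizes are illustrative assumptions, not the thesis models):

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 128))  # generator
    D = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))   # discriminator
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    for step in range(1000):
        real = torch.randn(32, 128)    # stand-in for real feature frames
        fake = G(torch.randn(32, 16))  # samples from the generator

        # Discriminator step: label real samples 1, generated samples 0.
        loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

        # Generator step: fool the discriminator into labeling its samples as real.
        loss_g = bce(D(fake), torch.ones(32, 1))
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()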

This thesis covers both Chinese and English speech synthesis. The English models are trained on the CSTR VCTK corpus, producing three different speaker models for male and female speakers; the Chinese models use the COSPRO & Toolkit corpus, likewise producing three different male and female speaker models. The results show that the average Mean Opinion Score (MOS) for English reached 3.18 out of 5 (3.52 for male speakers, 2.83 for female speakers), while the average MOS for Chinese reached 1.91 (2.21 for male, 1.60 for female). In addition, in the speaker identification experiment, we found the following average pass rates for text-dependent synthesized speech across Chinese and English: the DNN reached 80.5% (72% for Chinese, 89% for English), and the Support Vector Machine (SVM) reached 86% (100% for Chinese, 72% for English). For text-independent synthesized speech, the pass rate varied with the length of the speech: at 0.5 seconds, the DNN averaged 36% (44% for Chinese, 28% for English) and the SVM 44.5%; at 3 seconds, the DNN averaged 75% (78% for Chinese, 72% for English) and the SVM 80.5% (72% for Chinese, 89% for English); at 5 seconds, the DNN averaged 89% (78% for Chinese, 100% for English) and the SVM 97% (94% for Chinese, 100% for English).

On the mean opinion score, English has more complete front-end linguistic rules for producing full text features, so the model can generate more natural speech; English synthesized speech therefore outperforms Chinese. In the speaker identification experiment, the English pass rate is worse than the Chinese one in this case because the English utterances are much shorter than the Chinese ones. In the text-independent case, we find that the longer the speech, the higher the pass rate; thus the security of a speaker recognition system can be improved by shortening the required utterance or by improving the model. Finally, since the system's discriminator learns to judge the authenticity of speech during training, it can be integrated into a speaker recognition system to effectively block synthesized-speech attacks.