Zong-Ci Lu (Serge)
[email protected]

Experience

AMD, 2022/05 - present
    Develop GEMM kernels for AMD GPUs in GCN assembly
    Study different strategies for GPU kernel fusion, such as GEMM + GEMM and GEMM + Softmax + GEMM

Appier, 2022//05
    Developed API for AIQUA service

Skymizer, 2020//03
    TensorFlow integration with DLA (Deep Learning Accelerator)
    Sped up float to int8 quantization using x86 SIMD instructions (10% to 50% improvement, depending on batch size)
    Developed a custom neural network visualization tool to help developers debug graph partitioning results
    Amended forward shape