Kishan Gondaliya

Experienced embedded software engineer working on embedded systems and deep learning to enable vision- and voice-based machine learning algorithms on low-power FPGAs and edge embedded devices. ~8 years of experience writing, debugging, and optimizing software/firmware for embedded devices.

+91 9409 24 93 94
[email protected] Ahmedabad, Gujarat, India

Skillset

Languages: C, Python, C++

Frameworks: TensorFlow (TFLite, TFmicro), Keras, Caffe, Darknet

Dev Tools: Anaconda, Git, Gerrit, Perforce, PyCharm, CVS, Jira, Confluence

HW Platform: Google Coral TPU, Lattice ECP5, U+, CrossLink-NX FPGA, Raspberry Pi, Intel Movidius, NVIDIA GPU

Cloud (GCP): Compute Engine, App Engine, Vision API, AutoML, Container Registry, Kubernetes Engine

Cloud (AWS): SageMaker, DeepLens, Lambda, Rekognition API, Rekognition Custom Labels

Other: Docker, OpenCV, Machine Learning, Deep Learning, Computer Vision, Convolutional Neural Networks (CNN), LSTM, Networking, Model Optimization, Quantization, Pruning, Linux Kernel, OpenWRT

Work Experience

AI & Embedded Systems Consultant

Self-Employed • February 2021 - Present

Working with companies to blend AI with embedded systems, specifically to enable AI on edge devices and across the surrounding device ecosystem.

Staff Engineer

Softnautics • September 2016 - February 2021

Architected a Dockerized ML training framework and led the team toward bug-free releases
Led the Machine Learning CoE team and successfully completed 9+ projects based on edge devices and cloud services
Worked on different DL model architectures and customized them for small-footprint edge FPGA devices with techniques like quantization and pruning
Worked on OpenWRT firmware customization for a mobility solution, including network utilization monitoring and control

Associate Engineer

Sibridge Technologies • May 2015 - August 2016

Worked as a developer on critical firmware for a 32-bit Tensilica-core-based audio processor
Implemented a multi-radio feature for mesh networks in the Linux kernel and improved HWMP to achieve a 7% throughput increase
Contributed to several projects as an individual contributor

Projects

Omnivision Camera driver for OpenQ2500 platform and DL model integration

OpenQ2500 is a wearable SoC designed mainly for small devices like trackers, smart watches, smart eyewear, etc.
Work involved camera driver development and fine-tuning the camera with parameters that can be changed from user space.
Later, a DL model was developed on the camera feed to identify multiple custom objects for the client's wearable application

Linux Driver for I2S on iMX8

Work involved developing an I2S driver to stream audio from/to the DSP core
Audio stream parameters were controlled over the I2C bus, which was also part of the driver work

Microchip WLSOM1 WiFi support

Driver porting, specifically backporting, was done for the SAMA5D27 target chip on Microchip's WLSOM1 for the OpenWRT operating system

802.11s mesh network for 802.11ac radios with multi-radio multi-channel support

The IEEE 802.11s mesh standard defines the Hybrid Wireless Mesh Protocol (HWMP) as the default routing protocol and the Airtime Link Metric (ALM) as the default metric for path selection.
The project involved enhancing the existing HWMP routing protocol to work more efficiently under different environmental conditions, taking important wireless parameters other than ALM into account in the link cost calculation for better path selection.
Added support for multiple Mesh Points on different channels, MIMC (Multi Mesh Interface Multi Channel), for better network connectivity and performance by avoiding the same-channel interference seen in SISC (Single Mesh Interface Single Channel).
Defined both command-line and GUI user interfaces for individual and central management of the mesh network.
All implementations are based on the Linux open source 802.11s code.
Development required an understanding of the mac80211, nl80211, and cfg80211 drivers as well as utilities like iw, iwconfig, ifconfig, and iwlist.
Integrated a power-saving mechanism for multi-radio support in the Linux kernel.

Audio processor firmware development for Tensilica-based DSP

This project was about the maintenance of voice processor firmware, which included bug fixing, feature enhancement, and functional testing.
The voice processor is based on a customized 32-bit Tensilica core running a single-threaded custom OS, with various IO peripherals like I2C, PDM, I2S/PCM, SLIMbus, etc.

Dockerized ML training framework

Containerized machine learning training framework with which users can create, train, debug, and freeze ML models
Architected the whole framework from scratch and created plug-and-use components
Generated various Docker images for the different training environments
Added a generic base code component along with a detector that can support any object detection or classification model architecture
Enabled automated data augmentation, splitting, and performance metrics generation

Neural Network compiler development

Development and enhancement of a neural network compiler tool, written in Python, for FPGA manufacturers
Optimized the tool code for a 2x simulation speedup
Implemented dynamic fixed-point calculations
Developed the part of the tool that debugs hardware over USB by reading and writing DRAM using bulk and control transfers
Developed a wrapper library on top of the UMDF driver for Windows and libusb for Linux

Shoulder Surfing detection

Manually annotated person-class images from the OID v6 dataset with front-looking and non-front-looking classes
Automated the class distribution and augmentation flow using Python scripts
Customized the SqueezeDet network architecture to fit into the small footprint of the Lattice iCE40 FPGA
Developed a C# Windows GUI that communicates with the FPGA over a UART COM port to display the CNN engine's input images and detection results

Intelligent parking slot allocation system

Used CNRPark-2 as the base dataset
Used the AWS Rekognition Custom Labels service at the POC stage
Automated a pipeline on AWS to trigger training when a new dataset is added to the S3 bucket
Trained two separate models because of the available datasets: the first to detect parking slots, the second to detect whether a slot is free or busy
Generated a dataset with augmentation operations, such as simulated weather conditions
Designed the final model to handle both functions and trained it with a custom dataset

Human Counting on low power FPGA

Developed an optimized human counting model for FPGAs such as Lattice ECP5, CrossLink-NX, CrossLink-NX Voice & Vision, and iCE40
Customized training code based on the SqueezeDet detector to accommodate architectures like VGG, MobileNet V1 & V2, ResNet, etc.
Quantization and model pruning

Keyphrase detection

Developed a CNN, running on a Lattice iCE40 FPGA, that recognizes a keyword from its audio spectrum
Added support in the NN compiler to generate a filter binary that converts audio data into image-like data
Audio data augmentation

Face Recognition

Developed a face recognition model compatible with the Lattice ECP5 FPGA
Cleaned VGGFace2 with the help of dlib to remove images that could confuse our network
Trained the model with the VGGFace2 dataset and custom-added images to output a 128-feature map that can be used to recognize a person's face

Analog gauge reader

Designed a system for reading industrial analog gauges
Generated and augmented a synthetic dataset for different gauges
Trained a model with Google AutoML and used the TFLite model with a Google Coral stick as a POC
Designed a custom VGG-type model for speed and performance optimization with quantization techniques

Gesture Recognition

Lattice iCE40 FPGA with IR transmitter-based solution
Configured camera for enhanced IR sensitivity in RTL to mimic IR sensor-based input
Generated a dataset by capturing actual images from the hardware itself for better accuracy and performance
Developed a C# Windows app
Customized the SqueezeDet network architecture to fit into the small footprint of the Lattice iCE40 FPGA

AWS DeepLens

Deployed models for face analytics, clothing style detection, logo detection, and scene detection
Developed Lambda functions for all the models to process inference output
Developed an ML IoT quiz based on a pre-trained MobileNet SSD object detection model and a Node-RED-based service

POC Projects (Deep Learning)

Age & gender detection (Targeted advertisement)
Driver distraction alert
Face mask detection
Social distancing alert
Facial expression recognition

Education

Charotar University of Science & Technology

B.Tech (Electronics & Communication) 2011 – 2015