Kishan Gondaliya

Experienced embedded software engineer working at the intersection of embedded systems and deep learning, enabling vision- and voice-based machine learning algorithms on low-power FPGAs and edge embedded devices. ~8 years of experience writing, debugging, and optimizing software/firmware for embedded devices.

+91 9409 24 93 94
[email protected]   Ahmedabad, Gujarat, India      

Skillset

Languages: C, Python, C++

Frameworks: TensorFlow (TFLite, TFLite Micro), Keras, Caffe, Darknet

Dev Tools: Anaconda, Git, Gerrit, Perforce, PyCharm, CVS, Jira, Confluence

HW Platform: Google Coral TPU, Lattice ECP5, U+, CrossLink-NX FPGA, Raspberry Pi, Intel Movidius, NVIDIA GPU

Cloud (GCP): Compute Engine, App Engine, Vision API, AutoML, Container Registry, Kubernetes Engine

Cloud (AWS): SageMaker, DeepLens, Lambda, Rekognition API, Rekognition Custom Labels

Other: Docker, OpenCV, Machine Learning, Deep Learning, Computer Vision, Convolutional Neural Networks (CNN), LSTM, Networking, Model Optimization, Quantization, Pruning, Linux Kernel, OpenWRT

Work Experience

AI & Embedded Systems Consultant

Self-Employed  •  February 2021 - Present

Working with companies to integrate AI into embedded systems, specifically enabling AI on edge devices and their surrounding device ecosystem.

Staff Engineer

Softnautics  •  September 2016 - February 2021

  • Architected a Dockerized ML training framework and led the team to deliver bug-free releases
  • Led the Machine Learning Center of Excellence (COE) team and successfully completed 9+ projects based on edge devices and cloud services
  • Worked on various DL model architectures and customized them for small-footprint edge FPGA devices using techniques such as quantization and pruning
  • Customized OpenWRT firmware for a mobility solution, including network utilization monitoring and control

Associate Engineer

Sibridge Technologies  •  May 2015 - August 2016

  • Worked as a developer on critical firmware for an audio processor based on a 32-bit Tensilica core
  • Implemented a multi-radio feature for mesh networks in the Linux kernel and improved HWMP, achieving a 7% throughput increase
  • Contributed to several projects as an individual contributor

Projects

OmniVision camera driver for OpenQ2500 platform and DL model integration

  • OpenQ2500 is a wearable SoC designed mainly for small devices such as trackers, smartwatches, smart eyewear, etc.
  • Work involved developing the camera driver and fine-tuning the camera through parameters adjustable from user space.
  • Later, using the camera feed, developed a DL model to identify multiple custom objects for the client's wearable application.

Linux Driver for I2S on i.MX8

  • Work involved developing an I2S driver to stream audio from/to the DSP core
  • Audio stream parameters were controlled over the I2C bus as part of the driver work

Microchip WLSOM1 Wi-Fi support

  • Ported (specifically, backported) drivers for Microchip's WLSOM1 with the SAMA5D27 target chip on the OpenWRT operating system

802.11s mesh network for 802.11ac radios with multi-radio multi-channel support

  • The IEEE 802.11s mesh standard defines the Hybrid Wireless Mesh Protocol (HWMP) as the default routing protocol and the Airtime Link Metric (ALM) as the default metric for path selection.

  • The project involved enhancing the existing HWMP routing protocol to work more efficiently under different environmental conditions, and considering important wireless parameters beyond ALM in the link cost calculation for better path selection.

  • Added support for multiple mesh points on different channels, MIMC (Multi Mesh Interface Multi Channel), for better network connectivity and performance by avoiding the same-channel interference of SISC (Single Mesh Interface Single Channel).

  • Defined both command-line and GUI user interfaces for individual and central management of the mesh network.

  • All implementation was done on the Linux-based open-source 802.11s code.

  • Development required understanding of the mac80211, cfg80211, and nl80211 kernel interfaces, as well as utilities such as iw, iwconfig, ifconfig, and iwlist.

  • Integrated a power-saving mechanism for multi-radio support in the Linux kernel.

Audio processor firmware development for Tensilica-based DSP

  • This project was about the maintenance of voice processor firmware, which included bug fixing, feature enhancement, and functional testing.
  • The voice processor is based on a customized 32-bit Tensilica core running a single-threaded custom OS, with various I/O peripherals such as I2C, PDM, I2S/PCM, SLIMbus, etc.

Dockerized ML training framework

  • Containerized machine learning training framework with which users can create, train, debug, and freeze ML models
  • Architected the whole framework from scratch with plug-and-play components
  • Generated various Docker images for the different training environments
  • Added a generic base code component along with a detector that can support any object detection or classification model architecture
  • Enabled automated data augmentation, dataset splitting, and performance metrics generation

Neural Network compiler development

  • Developed and enhanced a neural network compiler tool, written in Python, for FPGA manufacturers
  • Optimized the tool code for 2x faster simulation
  • Implemented dynamic fixed-point calculations
  • Developed the part of the tool that debugs hardware over USB by reading and writing DRAM using bulk and control transfers
  • Developed a wrapper library on top of the UMDF driver for Windows and libusb for Linux

Shoulder Surfing detection

  • Manually annotated person-class images from the Open Images Dataset (OID) v6 into front-looking and non-front-looking classes
  • Automated the class distribution and augmentation flow using Python scripts
  • Customized the SqueezeDet network architecture to fit into the small footprint of the Lattice iCE40 FPGA
  • Developed a C# Windows GUI that communicates with the FPGA over a UART COM port to display the CNN engine's input images and detection results

Intelligent parking slot allocation system

  • Used CNRPark-2 as the base dataset
  • Used the AWS Rekognition Custom Labels service at the POC stage
  • Automated a pipeline on AWS to trigger training whenever a new dataset is added to the S3 bucket
  • Because of the available datasets, trained 2 different models: the first to detect parking slots, the second to classify each slot as free or busy
  • Generated additional data with augmentation operations such as simulated weather conditions
  • Designed a final model combining both functions and trained it on the custom dataset

Human Counting on low power FPGA

  • Developed an optimized human counting model for FPGAs such as Lattice ECP5, CrossLink-NX, CrossLink-NX Voice & Vision, and iCE40
  • Customized training code based on the SqueezeDet detector that can accommodate architectures such as VGG, MobileNet V1 & V2, ResNet, etc.
  • Applied quantization and model pruning

Keyphrase detection

  • Developed a CNN, running on a Lattice iCE40 FPGA, that recognizes a keyword from its audio spectrum
  • Added support in the NN compiler to generate a filter binary that converts audio data into image-like data
  • Implemented audio data augmentation

Face Recognition

  • Developed face recognition model compatible with Lattice ECP5 FPGA

  • Cleaned the VGGFace2 dataset with the help of dlib to remove images that could confuse the network

  • Trained the model on the VGGFace2 dataset plus custom-added images to output a 128-dimensional feature embedding that can be used to recognize a person’s face

Analog gauge reader

  • Designed a system for reading industrial analog gauges
  • Synthetic dataset generation & augmentation for different gauges
  • Trained a model with Google AutoML and used the TFLite model with a Google Coral stick as a POC
  • Designed a custom VGG-type model for speed and performance, optimized with quantization techniques

Gesture Recognition

  • Solution based on a Lattice iCE40 FPGA with an IR transmitter
  • Configured the camera in RTL for enhanced IR sensitivity to mimic IR sensor-based input
  • Generated the dataset by capturing real images from the hardware itself for better accuracy and performance
  • Developed a C# Windows app
  • Customized the SqueezeDet network architecture to fit into the small footprint of the Lattice iCE40 FPGA

AWS DeepLens

  • Deployed models for face analytics, clothing style detection, logo detection, and scene detection
  • Developed Lambda functions for all the models to process inference output
  • Developed an ML IoT quiz based on a pre-trained MobileNet SSD object detection model and a Node-RED-based service

POC Projects (Deep Learning)

  • Age & gender detection (Targeted advertisement)
  • Driver distraction alert
  • Face mask detection
  • Social distancing alert
  • Facial expression recognition

Education

Charotar University of Science & Technology

B.Tech (Electronics & Communication)  2011 – 2015