Subham Sahu

Seeking a challenging environment that encourages learning and creativity, provides exposure to new ideas, and stimulates personal and professional growth alongside organizational growth.

[email protected]

+91- 9039347186

HSR Layout Sector 5, Bangalore, Karnataka, 560034

Technical Skills


Cloud Skills

MS Azure, Azure Data Factory, Data Lake, Azure DevOps, Azure Synapse Analytics, Azure Databricks, ETL/ELT, Blob Storage, Azure Functions, Logic Apps, Delta Lake, Kafka, Grafana, streaming data

Programming 

Python, Pandas, PySpark, Beautiful Soup, Spark SQL, Scala

Database

MS SQL Server, Azure SQL Database, T-SQL, MySQL, Snowflake, Cosmos DB, Hive, Delta tables, IBM DB2 for i


Professional Summary


  • 6.3+ years of experience on the Microsoft Azure cloud platform, including Azure Data Factory, Databricks, Data Lake, Azure Synapse Analytics, Functions, Logic Apps, SQL DB, Cosmos DB, PySpark, and Python, with domain knowledge in the aviation, oil & gas, energy, and pharmaceutical sectors.
  • Created several relational and non-relational data models and built prototype diagrams using draw.io.
  • Clean and transform complex data using data pipelines and notebooks in the Azure cloud, with Data Factory and Databricks (PySpark).
  • Perform root-cause analysis and resolve production and data issues.
  • Responsible for the design, development, modification, debugging, and maintenance of data pipelines.
  • Deliver technical accountability for team-specific work products within an application and provide technical support during solution design for new requirements.
  • Maintain existing projects alongside new development, using JIRA in an Agile methodology to improve productivity.
  • Utilize sound engineering practices to deliver functional, stable, and scalable solutions to new or existing problems.
  • Involved in requirement analysis, business discussions with clients, and the delivery process.
  • Excellent interpersonal skills with strong analytical and problem-solving abilities.

Work Experience

Publicis Sapient, April 2023 - Present

Senior Associate Data Engineering L2 (MS Azure Cloud, PySpark)

  • Working on the client's ORX RFP healthcare and insurance data.
  • Design the data model for data migration.
  • Design and implement data ingestion and transformation pipelines through Azure Data Factory, Databricks, and Kafka (streaming and batch).
  • Maintain technical documentation on the Confluence platform.
  • Manage projects using Agile methodology.

Ness Digital Engineering, May 2021 - April 2023

Senior Data Engineer (MS Azure Cloud, PySpark)

  • Worked on clinical and drug-trial data for major pharmaceutical organizations.
  • Designed the non-relational common data model on Cosmos DB.
  • Analyzed existing relational data for data-model creation.
  • Designed and implemented data ingestion and transformation pipelines in Azure Data Factory, Databricks, and Azure Synapse Analytics.
  • Maintained technical documentation on the Docusaurus and Confluence platforms.
  • Managed teams and projects using Agile methodology, with direct client interaction.

IHS Markit Ltd., Oct 2017 - Apr 2021

Data Engineer (MS Azure Cloud, PySpark, Python)

  • Extracted complex, bulk data for the aviation, gas, and energy sectors.
  • Investigated issues by reviewing and debugging pipelines, provided fixes and workarounds, and reviewed changes for operability to maintain existing data solutions.
  • Built data pipelines using Databricks (Azure Data Factory and Apache Spark).
  • Extracted information using Python and the Kofax RPA tool for fast crawling.
  • Oriented the team toward reducing manual effort, growing the products, and saving time during the ETL/ELT process.
  • Integrated third-party libraries such as shutil and pandas, and automated file transfers from local storage to Azure storage containers using Python (see the sketch below).
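
A minimal sketch of the local-to-Azure file transfer described above, assuming the azure-storage-blob Python SDK; the connection string, container, and file paths are placeholders rather than actual project values:

    # Stage a local extract with shutil, then upload it to an Azure storage container.
    import os
    import shutil
    from azure.storage.blob import BlobServiceClient

    # Copy the raw extract into an outbox folder before upload.
    os.makedirs("outbox", exist_ok=True)
    staged = shutil.copy("extracts/prices_2020.csv", "outbox/prices_2020.csv")

    service = BlobServiceClient.from_connection_string("<connection-string>")
    blob = service.get_blob_client(container="raw-data", blob="prices/prices_2020.csv")

    with open(staged, "rb") as fh:
        blob.upload_blob(fh, overwrite=True)  # overwrite makes re-runs idempotent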

Projects

Healthcare and Insurance Analytics

ORX RFP DMA Explorer Analytics:

We are building a data application and framework that holds healthcare and insurance analytics data and broadcasts visualizations through Power BI reports for business users. It stores large volumes of data at the lakehouse level and processes the data for ML applications.
  • Used technologies including MySQL, IBM DB2 for i, Data Lake, Spark SQL, GCP Kafka, Scala, Delta Lake and Delta tables, PySpark, and Python.
  • Pulled data from source relational databases such as MySQL and IBM DB2 and moved it to the data lake as Parquet (a minimal sketch follows this list).
  • Created the common data model using draw.io.
  • Created Azure Data Factory pipelines for the ETL/ELT process.
  • Implemented custom transformation and automation logic in Azure Databricks notebooks.
  • Used Azure Monitor and Grafana to monitor the ADF data pipelines and GCP Kafka jobs.
  • Implemented CI/CD for promoting pipelines and scripts between environments via a Repos branching strategy in Azure DevOps.
  • Managed sprint planning, backlog refinement, and retrospectives using JIRA in an Agile methodology.
  • Maintained documentation on the Confluence platform.
  • Generated claim, hospitalization, and expense reports using Power BI.
  • Published the reports by sharing the .pbix files on the product portal for business use.
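
A minimal sketch of the relational-source-to-data-lake step, assuming a PySpark JDBC read; the host, database, table, column, and storage account names are hypothetical:

    # Pull a table from MySQL over JDBC and land it in the data lake as Parquet.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mysql_to_lake").getOrCreate()

    claims = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://example-host:3306/claims_db")  # hypothetical host/db
        .option("dbtable", "claims")                                # hypothetical table
        .option("user", "etl_user")
        .option("password", "<secret>")  # read from a secret scope in practice
        .load()
    )

    # Land the raw data as Parquet in ADLS Gen2, partitioned for downstream reads.
    (claims.write.mode("overwrite")
           .partitionBy("claim_year")
           .parquet("abfss://raw@examplelake.dfs.core.windows.net/claims/"))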

Clinical and Drug Trial Analytics

Pharma, Healthcare, and Drug Trials:

We are building an application that holds pharma, healthcare, and drug-trial data and broadcasts visualizations through Power BI reports. It stores large volumes of data at the data-warehouse level and processes the data for ML applications.
  • Used technologies including MySQL, MS SQL Server, Azure Synapse, Data Lake, PySpark, and Cosmos DB.
  • Pulled data from relational databases such as MySQL and SQL Server and moved it to Cosmos DB after several transformations (a minimal sketch follows this list).
  • Created the common data model for the Cosmos DB SQL API using draw.io.
  • Created Azure Synapse Analytics pipelines for the ETL/ELT process.
  • Implemented custom transformation and automation logic in Synapse notebooks.
  • Used Azure Monitor and New Relic analytics to monitor the Synapse data pipelines.
  • Implemented CI/CD for promoting pipelines and scripts between environments via a Repos branching strategy in Azure DevOps.
  • Managed sprint planning, backlog refinement, and retrospectives using JIRA in an Agile methodology.
  • Maintained documentation on the Docusaurus and Confluence platforms.
  • Generated ingredient, excipient, numerator-device, dosage, artifact, sub-artifact, and product reports using Snowflake and Power BI.
  • Published the reports by sharing the .pbix files on the product portal for business use.
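
A minimal sketch of the write to Cosmos DB, assuming the Cosmos DB Spark 3 OLTP connector is available in the Synapse/Spark environment; the endpoint, key, database, container, and column names are placeholders:

    # Write transformed trial records to Cosmos DB (SQL API) from a Spark notebook.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("trials_to_cosmos").getOrCreate()

    trials = spark.read.parquet(
        "abfss://curated@examplelake.dfs.core.windows.net/drug_trials/"  # hypothetical path
    )

    # Cosmos DB items need a string "id" column; derive one from the trial key.
    trials = trials.withColumn("id", F.col("trial_id").cast("string"))

    (trials.write.format("cosmos.oltp")
           .option("spark.cosmos.accountEndpoint", "https://example.documents.azure.com:443/")
           .option("spark.cosmos.accountKey", "<secret>")  # use a linked service/Key Vault in practice
           .option("spark.cosmos.database", "pharma")      # hypothetical database
           .option("spark.cosmos.container", "trials")     # hypothetical container
           .mode("append")
           .save())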

Energy Analytics

Oil, Gas, Coal, and OMDC:

We are building products that contain information about oil and gas prices, tender bidding, and country-wise consumption and production data, along with other contributing factors.
  • Used technologies including Python, Azure SQL Database, Data Factory, Data Lake, Databricks, PySpark, T-SQL, pandas, and Power BI.
  • Crawled complex data from business sources, external resources, and websites using Python, and landed the files in Azure Blob Storage and the data lake (a minimal sketch follows this list).
  • Created Azure Data Factory pipelines, activities, linked services, integration runtimes, and triggers for the ETL/ELT process.
  • Wrote Azure Functions in Python to implement custom transformation and automation logic.
  • Used Azure Monitor and analytics to monitor the ADF pipelines; implemented CI/CD for promoting pipelines and scripts between environments via ARM templates in Azure DevOps.
  • Generated price, production, and consumption comparison reports using Power BI.
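
A minimal sketch of the website crawl, assuming requests and Beautiful Soup (both in the skills list); the URL, table layout, and column names are hypothetical:

    # Crawl a public prices page and normalize the table into a pandas DataFrame.
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    resp = requests.get("https://example.com/energy/prices", timeout=30)  # hypothetical URL
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")
    rows = []
    for tr in soup.select("table tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 3:
            rows.append(cells)

    df = pd.DataFrame(rows, columns=["country", "commodity", "price_usd"])
    df.to_csv("outbox/daily_prices.csv", index=False)  # staged for the blob-upload step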

Aviation, IHS Markit Ltd.

Cargo & Flight BI:

We maintain several pipelines that populate cargo, shipment, and booking data for multiple marts; daily, weekly, and monthly reports are generated on this basis.
  • Performed transformation, structuring, and cleansing of data using PySpark, Spark SQL, and Delta tables.
  • Built multiple data pipelines and job clusters using Azure Data Factory and Databricks.
  • Handled data on the basis of refresh date and SQP date for incremental loads (a minimal sketch follows this list).
  • Worked in an Agile methodology using JIRA.
  • Highly proficient in Spark SQL for developing complex joins and aggregations.
  • Hands-on experience with Synapse data warehousing using external tables over Parquet files in the data lake.
  • Technologies: Azure Data Factory, Azure Data Lake, Azure Databricks, Delta tables, Azure DevOps, and PySpark.
  • Cleaned cargo and flight data and moved it from MS SQL, Hive, traditional Hadoop systems, and SFTP to Azure Data Lake and Delta tables in Databricks, using Azure Data Factory pipelines and Databricks notebooks.
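
A minimal sketch of the refresh-date-driven incremental load into a Delta table, assuming the delta-spark package; the paths, key column, and watermark handling are placeholders:

    # Upsert only rows newer than the last refresh date into the mart's Delta table.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("cargo_incremental").getOrCreate()

    target = DeltaTable.forPath(
        spark, "abfss://marts@examplelake.dfs.core.windows.net/cargo_bookings/"
    )

    last_refresh = "2021-01-31"  # would normally come from a control table
    updates = (spark.read.parquet("abfss://staging@examplelake.dfs.core.windows.net/bookings/")
                    .where(F.col("refresh_date") > F.lit(last_refresh)))

    # Merge: update matched bookings, insert new ones.
    (target.alias("t")
           .merge(updates.alias("s"), "t.booking_id = s.booking_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())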

Certifications

  • Certification in Azure Data Fundamentals from Microsoft Azure.
  • Databricks Certified Data Engineer Associate certification from Databricks.
  • Databricks Certified Associate Developer for Apache Spark 3.0 certification from Databricks.
  • Databricks Lakehouse Fundamentals accreditation from Databricks.
  • Certification in Master Data Analysis with Python - Intro to Pandas from Udemy.

Rewards and Achievements

  • Received the Team Player award for the Pharma & Clinical project in Q3 2021.
  • Received the Best Performance award for the Energy Analytics project in Q2 2020.
  • Received a Peer award for optimization of the Parts Intelligence pipelines in Q3 2019.
  • Received a Team Player award for the Energy Analytics projects in Q4 2018.

Education

B.E. in Electronics Engineering - 73.46% (2016)
Institute of Engineering, Jiwaji University, Gwalior

Core Skills & Strengths

● Team Management          ● Leadership
● Passionate and Creative  ● Quick Learner
● Positive Thinking        ● Punctual
● Motivated                ● Flexible

Areas of Interest

● Interacting with people
● Learning new skills
● Cooking
● Chess