Oshin Agarwal


Site Reliability Engineer with experience in infrastructure deployment and programming. As an SRE at Directi I have: 

  • Set up and managed 4 AWS cloud infrastructures with 40+ servers.
  • Migrated legacy infrastructure of 50+ servers to a newer server cluster, reducing load by 50%.
  • Revamped the monitoring infrastructure using a NetData and PagerDuty pipeline, reducing deployment time.
  • Automated the Let's Encrypt certificate deployment pipeline for real-time cert generation for a domain parking system serving 1 million+ domains.
  • Automated the process of identifying and blocking high-load domains in domain parking to reduce load on the serving cluster.
  • Filtered spam traffic coming from China, thereby reducing load on routers and the load balancer.
  • Reduced baking time of Redis from 9+ hours to less than 15 mins.


   [email protected]

+91-9532912337
Mumbai, India

Experience

Site Reliability Engineer II at Media.net, 27 June 2018 to Present

  • Deploying, managing and scaling the push-notification project based in ADC.
  • Managing uptime of the domain parking system deployed in the DC and 3 other AWS projects, with a total of 70+ servers.
  • Revamping the monitoring and metrics pipeline.
  • Automating deployment of small serving clusters.

DevOps Engineer at Media.net, 27 July 2016 to Present

  • Migrating legacy infrastructure of 50+ servers.
  • Deploying, managing and scaling 3 AWS projects with 30+ servers.
  • Automating cert deployment and domain blocking.
  • Setting up central logging infrastructure.
  • Managing uptime of the live system.

Skills


  • Cloud Infrastructure: Expert in deploying, managing and migrating to/from AWS infrastructure.
  • Containerisation: Working knowledge of Docker containers.
  • Configuration Management: Expert in managing Puppet configurations; have worked with Terraform.
  • Monitoring: Proficient in monitoring systems using PagerDuty and Icinga.
  • Metric Collection: Collected system metrics using Netdata, collectd and custom Python scripts, and shipped them to Graphite.


  • Continuous Deployment: Skilled at setting up continuous deployment pipelines using Jenkins.
  • Programming Skills: Proficient in Python and shell scripting.
  • Other Skills: Apache Kafka, Elasticsearch, Redis, Dovecot, PowerDNS, NetScaler and BIG-IP.
  • Problem Solving: Proficient in using strace and other debugging tools.
  • Managed multiple projects independently.
  • Good at maintaining communication between multiple teams.

Education

CPI: 9.05
Bachelor of Technology, Jul 2012 - May 2016

B.Tech in Computer Science and Engineering, National Institute of Technology Allahabad

Per: 94.00%
ISC-XII, Apr 2010 - May 2012

RLPS, Jhansi-284002




Projects

Infrastructure deployment to send push notifications to web users

Deployed an AWS infrastructure capable of sending push notifications to 40k+ users at the same time. Handled the project end to end: AWS infrastructure deployment, VPC setup, configuration management, deployment pipeline, server setup, OpenVPN setup, monitoring and metric handling.

Infrastructure deployment to send email campaigns to iPhone users

Set up the infrastructure on AWS, including web servers, mail servers (Dovecot), database servers, FTP servers, an rsyslog server for log aggregation, a Nagios sentinel and cache servers, and managed security groups and ELBs. Deployed scripts for ad hoc tasks: backing up AMIs and MySQL, cert renewal, domain addition and ELB creation.

Automated deployment of a web server cluster for the serving architecture

Used Terraform and Puppet to automate the deployment of small independent serving clusters.

Cert Renewal Pipeline using Let's Encrypt

Set up infrastructure for dynamic generation and management of multiple certs using a Let's Encrypt pipeline.



Automatically blocking high-load domains with 0 revenue

Blocked troublesome domains in real time by exposing a Python API that handles high-load events from a Nagios event listener, uses ELK to read Apache logs, parses the revenue page HTML and blocks the domain at the load balancer.