Joel Calanche

Python Developer & Machine Learning Engineer

  Olhão, Portugal

Phone: +351-924701160

Email: [email protected]




SUMMARY


* 3 years of industry experience, including 2 years as a Data Scientist working with machine learning algorithms and natural language processing.

* Working experience and extensive knowledge of Python and libraries such as scikit-learn, TensorFlow, NumPy, pandas, Matplotlib, Seaborn, spaCy, NLTK, OpenCV, and PySpark.

* Applied machine learning and deep learning skills to successfully deliver a customer segmentation project.

* Worked with tools such as PyCharm, Visual Studio, Jupyter Notebook, Sublime Text, and Google Colab.

* Excellent communication skills and experience working in agile teams.


  https://www.linkedin.com/in/joelcalanche96/

    https://github.com/Joelcalanche


Project Portfolio: http://joelcalanche96portafolio.pythonanywhere.com/portfolio/

Machine Learning Skills


  • TensorFlow 2.0
  • Hyperparameter tuning
  • Boto3
  • Keras
  • Python 3
  • Scikit-Learn
  • Pandas
  • NumPy
  • Jupyter Notebook
  • Flask
  • Selenium
  • Docker
  • Streamlit
  • Django
  • Matplotlib
  • SQL
  • Time Series Forecasting

Data Engineer Skills


  • Kafka
  • Hadoop
  • Azure Data Lake
  • Amazon S3
  • Spark
  • Flink
  • Spark Streaming
  • Kinesis
  • Airflow
  • Dask
  • AWS SageMaker
  • TFX
  • MLOps pipeline design


Languages


  • Spanish — Native
  • English — Professional

DATABASES


  • SQL, PostgreSQL, MySQL, BigQuery


    Cloud: AWS, GCP



AI Projects & Applications


Data Scientist Model Builder at FULL VENUE
A company that uses artificial intelligence algorithms to segment customers based on their past behavior, helping the e-commerce, ticketing, advertising, and events industries analyze and optimize marketing campaigns.

Roles & Responsibilities:

* Actively involved in daily stand-up calls and assigned tasks.

* Merged data from multiple databases and sources using Google BigQuery and MySQL.

* Optimized and preprocessed raw data.

* Feature building and data validation

* Exploratory data analysis

* Performed feature selection on data using Python libraries such as NumPy, pandas, and Seaborn.

* Performed feature engineering on data using Python libraries such as NumPy, pandas, and Seaborn.

* Created clusters using the K-Means and DBSCAN algorithms.

* Built and trained supervised ML models, such as a random forest classifier and a dense neural network, using the scikit-learn and TensorFlow libraries.

* Analyzed model predictions and accuracy using classification reports, confusion matrices, and AUC score.

* Built machine learning pipelines using Vertex AI on Google Cloud Platform.

* Monitoring dashboards using Google Data Studio.

April 2022 - Present

Deep Learning (POC)

NLP project: "NLP APP" for Deep Search Labs

Proof of concept for Deep Search Labs: development of an application that analyzes BBC news using NLP techniques such as sentiment analysis, entity recognition, and a content-based recommendation system.

Roles & Responsibilities:

* Analyze requirements

* Creation of ETL pipelines using the Beautiful Soup and pandas libraries

* Automation using Airflow and Docker

* Text preprocessing, tokenization and lemmatization

* Model selection and construction using the NLTK and spaCy libraries

* Creation and deployment of a web application using Streamlit for the visualization of tasks

* Unit testing on custom datasets

December 2021 - January 2022

Deep Learning   

NLP project: "SKIMLIT" for Zero To Mastery Academy (Project to obtain certificate)
Developed an NLP model that classifies the sentences of an abstract by the role they play (e.g. objective, methods, results) so that researchers can skim through the literature (hence SkimLit) and dive deeper when necessary. Used feature engineering, hybrid models, and different forms of embeddings (multi-data models). Reached an F1 score of 0.79 on the test dataset.

  • Replicated the deep learning model behind the 2017 paper "PubMed 200k RCT: A Dataset for Sequential Sentence Classification in Medical Abstracts".
  • Used the PubMed 200k RCT dataset, which consists of ~200,000 labeled abstracts of randomized controlled trials (RCTs).
  • The goal of the dataset was to explore the ability of NLP models to classify sentences that appear in sequential order.
  • In other words, given the abstract of an RCT, determine what role each sentence plays in it.


September 2021 - October 2021

Deep Learning 

Computer vision project: "FOOD VISION" for Zero To Mastery Academy (Project to obtain certificate)
Developed a model based on convolutional neural networks with TensorFlow that classifies images into 101 different classes of food dishes, using EfficientNet (transfer learning) and TensorFlow's new mixed precision features. The training set contained 75,750 images (750 per class) and the test set 25,250 images (250 per class). The model reached an accuracy of 0.80 on the test dataset.

The goal was to beat DeepFood, a 2016 paper that used a convolutional neural network trained for 2-3 days to achieve 77.4% top-1 accuracy.


  • Performed image preprocessing and normalization.
  • Used only 10% of the data for the initial model selection.
  • Built different architectures based on convolutional neural networks.
  • Performed feature extraction.
  • Performed data augmentation.
  • Applied transfer learning, followed by fine-tuning.
  • Finally, scaled up to 100% of the data.
  • Compared the different models using scikit-learn's classification report function.


May 2021 - August 2021

Machine Learning  

End-to-end-bulldozer-price-regression for Zero To Mastery Academy  (Project to obtain certificate)

Developed a random forest regression model with scikit-learn and Python to predict the sale price of heavy machinery (bulldozers), based on time series data. The work covered exploratory data analysis, data cleaning, feature engineering, model selection, evaluation metrics, and feature importance. The data comes from the Kaggle Blue Book for Bulldozers competition. The model achieved an R-squared value of 0.87 on the test dataset.

March 2021 - April 2021

Machine Learning

 Heart disease detection (binary classification project)

Developed a logistic regression model with scikit-learn for binary classification of patients with heart disease, based on previous medical records (14 different medical features). The work covered EDA, model selection, feature importance, and metric evaluation: ROC curve and AUC score, confusion matrix, accuracy, recall, and F1 with cross-validation. The model achieved an F1 score of 0.88 on the test dataset.

January 2021 - February 2021

Machine Learning/Electrical Engineer at CORPOELEC

 

Electrical engineer at the state electric company. Carried out static and dynamic studies, developing models to simulate fault conditions in elements of the electrical system such as power switches, and performed overvoltage and stability studies. Also built time series forecasting models, using recurrent neural networks (LSTM), to forecast the future demand for electrical energy in the system.

Apr 2018 - Dec 2019

Education


Zero To Mastery Academy

TensorFlow Developer

2021 - 2021

Zero To Mastery Academy

Machine Learning, Data Science

2021 - 2021

Universidad Nacional Experimental Politécnica

BS in Electrical Engineering

2013 - 2019