Portugal, Olhao
Phone: +351-924701160
Email: [email protected]
SUMMARY
* 3 years of industry experience, with 2 years of experience as a Data Scientist using ML algorithms and Natural Language Processing.
* Working experience and extensive knowledge of Python with libraries such as Scikit-learn, TensorFlow, NumPy, Pandas, Matplotlib, Seaborn, spaCy, NLTK, OpenCV, and PySpark.
* Used machine learning and deep learning skills to successfully deliver a customer segmentation project.
* Worked with tools such as PyCharm, Visual Studio, Jupyter Notebook, Sublime Text, and Google Colab Notebook.
* Excellent communication skills and experience working in agile teams.
https://www.linkedin.com/in/joelcalanche96/
https://github.com/Joelcalanche
Project Portfolio: http://joelcalanche96portafolio.pythonanywhere.com/portfolio/
DATABASES
Cloud: AWS, GCP
Data Scientist Model Builder at FULL VENUE
A company that uses artificial intelligence algorithms to segment customers based on their previous behavior, helping the ecommerce, ticketing, advertising, and events industries analyze and optimize marketing campaigns.
Roles & Responsibilities:
* Actively involved in daily stand-up calls and assigned tasks.
* Merged data from multiple databases and sources using Google BigQuery and MySQL.
* Optimized and pre-processed raw data.
* Feature building and data validation.
* Exploratory data analysis.
* Performed feature selection on data using Python libraries such as NumPy, Pandas, and Seaborn.
* Performed feature engineering on data using Python libraries such as NumPy, Pandas, and Seaborn.
* Created clusters using the K-Means and DBSCAN algorithms.
* Built and trained supervised ML models such as a random forest classifier and a dense neural network, using the Scikit-learn and TensorFlow libraries.
* Analyzed model predictions and accuracy using classification reports, confusion matrices, and AUC score.
* Built machine learning pipelines using Vertex AI on Google Cloud Platform.
* Monitoring dashboards using Google Data Studio.
April 2022 - Present
NLP project: "NLP APP" for Deep Search Labs
Proof of concept for Deep Search Labs: an application that analyzes BBC news with NLP techniques such as sentiment analysis, entity recognition, and a content-based recommendation system.
Roles & Responsibilities:
* Analyze requirements
* Creation of ETL pipelines using the BeautifulSoup and Pandas libraries
* Automation using Airflow and Docker
* Text preprocessing, tokenization and lemmatization
* Model selection and construction using the NLTK and spaCy libraries
* Creation and deployment of a web application using Streamlit for the visualization of tasks
* Unit testing of the data on custom data sets
December 2021 - January 2022
NLP project: "SKIMLIT" for Zero To Mastery Academy (Project to obtain certificate)
Developed an NLP model that classifies the sentences of an abstract by the role they play (e.g. objective, methods, results) so that researchers can skim through the literature (hence SkimLit) and dive deeper when necessary. Used feature engineering, hybrid models, and different embedding forms (multi data models). An F1 score of 0.79 was reached on the test dataset.
September 2021 - October 2021
Computer vision project: "FOOD VISION" for Zero To Mastery Academy (Project to obtain certificate)
Developed a model based on convolutional neural networks with TensorFlow that distinguishes between 101 different classes of food dishes, using EfficientNet (transfer learning) and the new mixed-precision features of TensorFlow. The training set contained 75,750 images (750 per class) and the test set 25,250 images (250 per class). An accuracy of 0.80 was reached on the test dataset.
The goal was to beat DeepFood, a 2016 paper that used a convolutional neural network trained for 2-3 days to achieve 77.4% top-1 accuracy.
May 2021 - August 2021
Developed a predictive random forest regression model with Scikit-learn and Python to estimate the sale price of heavy machinery (bulldozers), based on time series data. The work covered exploratory data analysis, data cleaning, feature engineering, model selection, evaluation metrics, and feature importance. The data is from the Kaggle Bluebook for Bulldozers competition. An R-squared value of 0.87 was achieved on the test dataset.
March 2021 - April 2021
Developed a logistic regression model with Scikit-learn for binary classification of patients with heart disease, based on previous medical records (14 different medical features). The work covered EDA, model selection, feature importance, and metrics evaluation: ROC curve and AUC score, confusion matrix, accuracy, recall, and F1 with cross-validation. The model achieved an F1 score of 0.88 on the test dataset.
January 2021 - February 2021
Electrical engineer at the state electric company. Carried out static and dynamic studies, developing models to simulate fault conditions in elements of the electrical system such as power switches, as well as overvoltage and stability studies. Also created time series forecasting models, using recurrent neural networks (LSTM), to forecast the future demand for electrical energy in the system.
Apr 2018 - Dec 2019
2021 - 2021
2021 - 2021
2013 - 2019