Bacteremia Prediction Using Machine Learning (NTU Hospital, Department of Medical Research)

Around 2,000 features, 100,000+ records, and more than 10 datasets

Main tools:

Python

Scikit-Learn

Pandas

NumPy

Power BI

Excel

Classification models:

Logistic regression

XGBClassifier

CatBoost

Extra Trees

Random Forest

Gradient Boosting

LightGBM Classifier
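
As a rough illustration of how classifiers like those listed above can be compared, the sketch below scores several scikit-learn models with cross-validation. It is not the project's actual code: the data is synthetic (generated with `make_classification`), ROC AUC is an assumed metric, and the helper `compare_models` is a hypothetical name. XGBoost, CatBoost, and LightGBM classifiers follow the scikit-learn estimator interface, so they could be added to the same dictionary if those packages are installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (assumption: the hospital data is not public).
X, y = make_classification(
    n_samples=400,
    n_features=20,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=0,
)

# Candidate models; hyperparameters are arbitrary illustrative choices.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "extra_trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}


def compare_models(models, X, y, cv=3):
    """Return the mean cross-validated ROC AUC for each model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)
# Print the models from best to worst mean ROC AUC.
for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {auc:.3f}")
```

Using the same folds and the same metric for every model keeps the comparison fair; on real clinical data the choice of metric would depend on how false negatives and false positives are weighed.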

Complex multi-step data preparation and cleansing, covering arbitrary string columns as well as numeric and categorical columns.
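
A minimal sketch of what preparing mixed column types might look like with pandas and scikit-learn is shown below. The column names, the sample values, and the helper `parse_numeric_text` are all hypothetical, and the cleaning rules (for example, reading "<0.5" as 0.5) are assumptions made for illustration rather than the rules used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical records; names and values are invented for illustration.
raw = pd.DataFrame({
    "wbc": ["12.3", " 8.1 ", "<0.5", "N/A", "15"],       # free-text lab result
    "temperature": [38.5, 36.9, np.nan, 39.1, 37.2],      # numeric
    "triage_level": ["2", "3", "3", np.nan, "1"],         # categorical
})


def parse_numeric_text(series: pd.Series) -> pd.Series:
    """Pull the first number out of each string; unparseable values become NaN."""
    extracted = series.astype(str).str.extract(r"(-?\d+(?:\.\d+)?)", expand=False)
    return pd.to_numeric(extracted, errors="coerce")


# Step 1: turn the free-text column into a proper numeric column.
clean = raw.assign(wbc=parse_numeric_text(raw["wbc"]))

# Step 2: impute and scale numeric columns, impute and one-hot encode categoricals.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, ["wbc", "temperature"]),
        ("cat", categorical_steps, ["triage_level"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

features = preprocess.fit_transform(clean)
print(features.shape)
```

Keeping the imputers and encoders inside a `ColumnTransformer` means they are fitted only on training data when the whole object is placed in a model pipeline, which helps avoid leakage during cross-validation.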

More information: https://github.com/yarik89spb?tab=repositories

We carried out a two-year machine learning research project in the Department of Medical Research at National Taiwan University Hospital. Our primary objective was to use quantitative data collected in the emergency department to build a model that predicts a patient's likelihood of bacteremia. Secondary goals included predicting mortality.

The project covered the full pipeline, from data collection across various internal hospital sources through to model deployment. To ensure data quality, we structured and merged the source data and applied multi-step cleansing and transformation to handle complexities specific to the healthcare domain. We then fed the resulting dataset into multiple classification models, including ensemble models such as CatBoost and XGBoost.
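
The merging of separate hospital sources could be sketched roughly as follows with pandas. The tables, the key `visit_id`, the column names, and the function `build_dataset` are hypothetical stand-ins, not the hospital's actual schema: one table holds one row per emergency visit, another holds lab results in long format.

```python
import pandas as pd

# Hypothetical source tables; names and values are invented for illustration.
visits = pd.DataFrame({
    "visit_id": [1, 2, 3],
    "age": [71, 45, 63],
    "bacteremia": [1, 0, 0],
})
labs = pd.DataFrame({
    "visit_id": [1, 1, 2, 3, 3],
    "test": ["wbc", "crp", "wbc", "wbc", "crp"],
    "value": [15.2, 120.0, 7.4, 9.8, 35.0],
})


def build_dataset(visits: pd.DataFrame, labs: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format lab results to one row per visit and join them to the visits."""
    wide = labs.pivot_table(
        index="visit_id", columns="test", values="value", aggfunc="first"
    )
    wide = wide.add_prefix("lab_").reset_index()
    # validate raises an error if either side has duplicate visit_id values.
    return visits.merge(wide, on="visit_id", how="left", validate="one_to_one")


dataset = build_dataset(visits, labs)
print(dataset)
```

A left join keeps every visit even when a lab test is missing, leaving a NaN that a later imputation step can handle; the `validate` argument guards against silently duplicated rows.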

Throughout the project, we emphasized big data analytics to draw insight from the data and improve the accuracy of our predictive models. By merging disparate data sources and applying machine learning algorithms, we aimed to make clinical decision-making in the emergency department at National Taiwan University Hospital more efficient and more precise.

