a fully automated data integration pipeline that aggregated diverse data sets into an S3 data lake. Integrated a range of data sources, including near-real-time feeds from AWS Redshift and DocumentDB as well as batch imports of traditional CSV files. Used Databricks for large-scale data processing, leveraging Spark to efficiently transform and aggregate incoming data streams. Combined Databricks with AWS Lambda to enforce data consistency and quality, keeping data prepared for analytics and reporting. Ran extensive data profiling tasks with Databricks and Airflow, analyzing data patterns
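To make the transform-and-aggregate step concrete, here is a minimal, self-contained sketch of the kind of logic such a batch CSV import performs. It uses only the Python standard library rather than Spark, and all names (`aggregate_daily_totals`, the `account_id`/`amount` columns) are hypothetical illustrations, not the actual pipeline code; in the real pipeline an equivalent groupBy/sum would run as a Databricks Spark job over objects in S3.

```python
import csv
import io
from collections import defaultdict

def aggregate_daily_totals(csv_text):
    """Sum amounts per account from a raw CSV feed.

    A stand-in for the Spark groupBy/sum step; column names
    here are illustrative, not from the actual source data.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["account_id"]] += float(row["amount"])
    return dict(totals)

# Tiny synthetic feed to exercise the aggregation.
sample = "account_id,amount\nA1,10.5\nA2,3.0\nA1,4.5\n"
print(aggregate_daily_totals(sample))  # → {'A1': 15.0, 'A2': 3.0}
```

In the Spark version of this step, the same aggregation would be expressed as a `groupBy("account_id").sum("amount")` over a DataFrame read from S3, which scales the identical logic across a cluster.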