Projects Portfolio

Welcome to my professional portfolio. Here, I present a curated collection of my career highlights and detailed documentation of the diverse projects I’ve been involved in. This space not only reflects my professional journey but also showcases my personal ventures into new technologies, demonstrating how they continuously enrich my skills and contribute to my growth as a technology professional.

Corporate Experience

Title: Data Quality Pipeline Migration from Databricks to AWS
Company: H1-Insights
Period: Oct-2013 to Mar-2024
Role: Technical Lead, Project Manager, Data Scientist, Solutions Architect, Data Engineer
Key Responsibilities: Oversaw technical direction, managed the project lifecycle, led data science initiatives, and personally coded 75% of the pipeline and reporting system.
Technologies: AWS (EMR, Athena, Step Functions, S3), Python, SQL, PySpark, Jira, Slack

Title: Embeddings with OpenAI and ChromaDB (Proof of Concept)
Company: H1-Insights
Period: Jan-2024 to Mar-2024
Role: Data Scientist, Data Engineer, Project Manager
Key Responsibilities: Developed data models with the OpenAI API, engineered data pipelines using ChromaDB, analyzed data for accuracy and insights, and led the strategy, evaluation, and execution of the proof of concept.
Technologies: Python, OpenAI API, ChromaDB, LangChain

Title: Automated Data Quality Management and Reporting in Multi-Source ETL Pipeline
Company: H1-Insights
Period: Jan-2023 to Jun-2023
Role: Data Analyst, Python Developer, Project Manager
Key Responsibilities:

  • Designed and implemented automated data quality checks for 6000+ web scraping data sources.
  • Developed and maintained SQL and Python scripts for data validation and cleansing.
  • Oversaw the integration of AWS services (EMR, Athena, S3) for efficient data handling and storage.
  • Utilized NLP techniques for advanced data analysis and insights generation.
  • Coordinated cross-functional teams to align project objectives with business goals.
  • Prepared and presented detailed reports on data quality metrics and pipeline performance.

Technologies: Python, SQL, AWS (EMR, Athena, S3), Natural Language Processing (NLP)

Title: Automated Data Quality Management Pipeline on Databricks
Company: H1-Insights
Period: Nov-2022 to Apr-2023
Role:
Key Responsibilities:
Technologies: SQL, Python, NLP, Pandas, PySpark, Databricks Workflows, Databricks Notebooks, Git, Jira, Slack, Zoom

Title: Data Quality Management Report Development
Company: H1-Insights
Period: Jul-2022 to Mar-2024
Role:
Key Responsibilities:
Technologies:

Title: Record Linkage for Organizations Database
Company: H1-Insights
Period: Mar-2023 to Mar-2024
Role:
Key Responsibilities:
Technologies: SQL, Python, NLP, Pandas, PySpark, Databricks Workflows, Databricks Notebooks, Git, Jira, Slack, Zoom

Title: Medical Registries Data Modeling
Company: H1-Insights
Period: Dec-2021 to Jun-2022
Role:
Key Responsibilities:
Technologies: Jira

Title: PubMed ETL and Record Linkage
Company: H1-Insights
Period: Mar-2021 to Dec-2021
Role:
Key Responsibilities:
Technologies: SQL, Python, NLP, Pandas, Git, Jira, TF-IDF, Fuzzy-Matching

Title: Medical Affairs Research and Data Collection
Company: H1-Insights
Period: Feb-2021 to Jun-2021
Role: Research Data Analyst
Key Responsibilities:
Technologies: SQL, Python, Scrapy, Jira, Slack, Zoom