awk_aws_pipeline

Title: Apple WeatherKit REST API data -> AWS RDS PostgreSQL Database

- End-to-end automated data pipeline.

To see the full codebase for this project: Link to my GitHub account

Description:

A project to build a PostgreSQL database schema of four tables, followed by an automated data pipeline that loads key weather metrics by location and hour for 112 airport locations across the US. The data covers the 10 days prior, the current day, and 9 days into the future.
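
As a rough illustration of the time period described above, the sketch below builds the list of hourly timestamps for one location. It is not taken from the project code: the function name `hourly_window`, the two constants, and the choice of UTC midnight as the day boundary are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

DAYS_BACK = 10     # days of history before the current day (from the description)
DAYS_FORWARD = 9   # days of forecast after the current day (from the description)


def hourly_window(now):
    """Return hourly timestamps from 10 days before today through 9 days after.

    Hypothetical helper: assumes whole days bounded at midnight in the
    timezone of `now`.
    """
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = today - timedelta(days=DAYS_BACK)
    end = today + timedelta(days=DAYS_FORWARD + 1)  # exclusive upper bound
    hours = int((end - start) / timedelta(hours=1))
    return [start + timedelta(hours=h) for h in range(hours)]


window = hourly_window(datetime(2024, 1, 15, 13, 30, tzinfo=timezone.utc))
print(len(window))            # 20 days x 24 hours = 480 hourly slots
print(window[0], window[-1])  # first and last hour in the window
```

Under these assumptions each location has 480 hourly slots, so 112 locations come to 53,760 location-hour rows per refresh.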
Purpose:

The ultimate purpose of this project was to produce a ‘live’ database (refreshed every 4 hours) of observational weather data by location and by hour, in order to build interactive data visualizations of barometric pressure and other key weather metrics that inform the user of ‘current’ weather patterns.

Visualization 1 - 10 Day Historical & Forecasted Barometric Pressure: By Hour

Visualization 2 - Line & Bar Graphs of Historical and Forecasted Key Weather Metrics

Database Setup Process:
Data Pipeline Process:
Unit Testing:
Technologies:
  1. Python and various standard library modules.
  2. Apple WeatherKit REST API.
  3. AWS Cloud Platform including: S3, Lambda (including Layers), RDS, EventBridge, CloudWatch
  4. The pandas and NumPy third-party packages.
  5. SQLAlchemy and SQLAlchemy ORM.
  6. PostgreSQL database.
  7. Knowledge of data cleaning and tidying.
  8. Advanced SQL techniques including: CTEs, window functions, and CASE statements for data analysis and aggregation.
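
To give a feel for the SQL techniques named in the last item, here is a minimal sketch that combines a CTE, a window function, and a CASE statement to label the hour-over-hour barometric pressure trend. It is an illustration only, not the project's queries: the table `hourly_weather`, its columns, and the sample rows are made up, and Python's built-in `sqlite3` (SQLite 3.25 or later for window functions) stands in for PostgreSQL so the example runs without a database server.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hourly_weather (location TEXT, obs_hour TEXT, pressure_mb REAL)"
)

# Made-up sample rows for illustration only.
rows = [
    ("JFK", "2024-01-15 00:00", 1012.0),
    ("JFK", "2024-01-15 01:00", 1010.5),
    ("JFK", "2024-01-15 02:00", 1010.5),
    ("LAX", "2024-01-15 00:00", 1016.0),
    ("LAX", "2024-01-15 01:00", 1017.2),
]
conn.executemany("INSERT INTO hourly_weather VALUES (?, ?, ?)", rows)

query = """
WITH pressure_change AS (              -- CTE: per-hour pressure change
    SELECT location,
           obs_hour,
           pressure_mb - LAG(pressure_mb) OVER (   -- window function
               PARTITION BY location ORDER BY obs_hour
           ) AS delta_mb
    FROM hourly_weather
)
SELECT location,
       obs_hour,
       CASE                            -- CASE: turn the change into a label
           WHEN delta_mb IS NULL THEN 'n/a'
           WHEN delta_mb > 0 THEN 'rising'
           WHEN delta_mb < 0 THEN 'falling'
           ELSE 'steady'
       END AS trend
FROM pressure_change
ORDER BY location, obs_hour
"""

trends = conn.execute(query).fetchall()
for row in trends:
    print(row)
```

The same query shape should carry over to PostgreSQL, since `WITH`, `LAG ... OVER (PARTITION BY ...)`, and `CASE` are standard SQL, though the real column names and types would differ.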