Title: Codecademy Portfolio Project, Data Ingestion Pipeline.
To see the full codebase for this project:
Link to my GitHub account
Description:
A project that automates the ingestion of subscriber-cancellation data for an online learning company, ultimately producing a tidy, analytics-ready CSV file and SQLite database.
Features:
- Using Jupyter notebooks and the pandas Python library to explore, clean, and transform datasets (a sketch of the cleaning step follows this list).
- Automating the data cleaning and transformation in Python, validating the results with the built-in unittest module and recording errors with the logging module (see the second sketch below).
- Harnessing Python's built-in sqlite3 module to read data from a relational database and produce an analytics-ready data warehouse (see the third sketch below).
- Using a Bash script to automate file management and run the pipeline scripts.
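For illustration, here is a minimal sketch of the kind of pandas cleaning step described above. The file paths and column names (subscribers.csv, subscriber_id, signup_date, cancel_date) are hypothetical placeholders, not the project's actual schema:

    import pandas as pd

    # Load the raw export (file and column names are hypothetical).
    raw = pd.read_csv("data_dev/subscribers.csv")

    # Drop exact duplicate rows and rows missing a subscriber id.
    clean = raw.drop_duplicates().dropna(subset=["subscriber_id"])

    # Parse date columns so downstream tools see real datetimes;
    # unparseable values become NaT rather than raising an error.
    for col in ("signup_date", "cancel_date"):
        clean[col] = pd.to_datetime(clean[col], errors="coerce")

    # Write the tidy, analytics-ready file.
    clean.to_csv("data_prod/subscribers_clean.csv", index=False)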
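A similarly hedged sketch of how unittest and logging might be combined to validate the cleaned output; the test cases, file paths, and log location are assumptions for illustration, not the repository's actual tests:

    import logging
    import unittest

    import pandas as pd

    # Send test output to the logs folder described below.
    logging.basicConfig(filename="logs/tests.log", level=logging.INFO)
    logger = logging.getLogger(__name__)

    class TestCleanedData(unittest.TestCase):
        def setUp(self):
            # Hypothetical path to the cleaned output from the previous sketch.
            self.df = pd.read_csv("data_prod/subscribers_clean.csv")

        def test_no_duplicate_rows(self):
            self.assertEqual(len(self.df), len(self.df.drop_duplicates()))

        def test_subscriber_id_is_complete(self):
            self.assertFalse(self.df["subscriber_id"].isna().any())

    if __name__ == "__main__":
        logger.info("Starting validation tests")
        unittest.main()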
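And a sqlite3 sketch of promoting a cleaned table from the development database into the production one; the database file names and table name are assumed for illustration:

    import sqlite3

    import pandas as pd

    # Pull the cleaned table out of the development database
    # (database and table names are hypothetical).
    dev = sqlite3.connect("data_dev/dev.db")
    df = pd.read_sql_query("SELECT * FROM subscribers_clean", dev)
    dev.close()

    # Load it into the production analytics database, replacing
    # any earlier version of the table.
    prod = sqlite3.connect("data_prod/analytics.db")
    df.to_sql("subscribers_clean", prod, if_exists="replace", index=False)
    prod.close()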
Technologies:
- Python and various standard-library modules.
- The pandas and NumPy third-party packages.
- SQLite databases.
- Data cleaning and tidying techniques.
- The command line and Bash scripting.
Folder Structure:
Main level: includes the Python, testing, and Bash scripts, as well as the following folders:
- /logs - contains the logs for the testing and main production scripts.
- /data_dev - the repository for the main development database and CSV file.
- /data_prod - the final location for the SQLite analytics database and CSV file.
Running the Bash Script:
The entire pipeline can be run by executing the following script from the command line:
bash_script_runall.sh
Collaborators:
None; this project is based on lesson #13 from the Codecademy Data Engineering Career Path.
License:
N/A.