Python

End-to-End ML Pipeline with SageMaker Pipelines

Overview

SageMaker Pipelines lets you define a directed acyclic graph (DAG) of ML steps that SageMaker executes, tracks, and makes reproducible. This project builds a complete pipeline over a retail sales dataset: raw data in S3 goes in, predictions come out, and every intermediate artefact is versioned and auditable.

The pipeline has four steps:
PreprocessData → TrainModel → CreateInferenceModel → BatchInference

The Dataset

Walmart retail sales data with three source tables (features, sales, stores). The target is weekly sales per store, which makes this a regression problem. ...
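To make the DAG idea concrete, here is a minimal sketch in plain Python using the standard-library graphlib module (Python 3.9+), not the SageMaker SDK. It encodes the four steps listed above as a dependency map and asks for a valid execution order; the dictionary layout is an illustrative assumption, not how SageMaker represents a pipeline internally.

```python
from graphlib import TopologicalSorter

# Illustrative dependency map: each step name points to the set of steps it depends on.
# Step names come from the pipeline above; the structure itself is plain Python, not SageMaker.
PIPELINE_DAG = {
    "PreprocessData": set(),
    "TrainModel": {"PreprocessData"},
    "CreateInferenceModel": {"TrainModel"},
    "BatchInference": {"CreateInferenceModel"},
}


def execution_order(dag):
    """Return one valid execution order for a DAG; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(PIPELINE_DAG)))
```

In SageMaker Pipelines itself, ordering is derived from how steps consume each other's outputs (or from explicitly declared dependencies); the sketch only shows why an acyclic graph always has such an order.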

ML Workflow Orchestration with AWS Step Functions

Overview

AWS Step Functions lets you orchestrate multi-step workflows as state machines: each state can invoke an AWS service, branch on conditions, retry on failure, or run steps in parallel. This project uses Step Functions to wire SageMaker Processing, Training, and Batch Transform jobs together into a production-grade ML workflow that can be triggered on a schedule or by an event.

The key difference from SageMaker Pipelines: Step Functions is AWS-native orchestration for any combination of services (SageMaker + Lambda + Glue + SNS), whereas SageMaker Pipelines is ML-specific. For workflows that touch multiple AWS services, Step Functions is the right tool. ...
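Step Functions state machines are written in the Amazon States Language (JSON). The sketch below builds a skeleton definition in Python for a Processing → Training → Batch Transform chain, using the SageMaker service-integration ARNs with the .sync suffix so each state waits for its job to finish before moving on. The state names, retry values, and empty Parameters objects are assumptions; a real definition must fill each Parameters object with the full SageMaker job request.

```python
import json

# SageMaker service-integration ARNs; ".sync" makes Step Functions wait for job completion.
PROCESSING = "arn:aws:states:::sagemaker:createProcessingJob.sync"
TRAINING = "arn:aws:states:::sagemaker:createTrainingJob.sync"
TRANSFORM = "arn:aws:states:::sagemaker:createTransformJob.sync"


def task_state(resource, next_state=None):
    """Build a Task state with a simple retry policy (values are illustrative)."""
    state = {
        "Type": "Task",
        "Resource": resource,
        "Parameters": {},  # placeholder: the full job request (name, role, image, S3 paths, ...) goes here
        "Retry": [{
            "ErrorEquals": ["States.ALL"],
            "IntervalSeconds": 30,
            "MaxAttempts": 2,
            "BackoffRate": 2.0,
        }],
    }
    # Every Task state must either name its successor or end the execution.
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_definition():
    """Assemble a linear three-state workflow (hypothetical state names)."""
    return {
        "Comment": "Processing -> Training -> Batch Transform (skeleton)",
        "StartAt": "Preprocess",
        "States": {
            "Preprocess": task_state(PROCESSING, "Train"),
            "Train": task_state(TRAINING, "BatchTransform"),
            "BatchTransform": task_state(TRANSFORM),
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

Generating the definition in code rather than hand-editing JSON makes it easy to keep the retry policy consistent across states; the output can be pasted into the Step Functions console or passed to whatever deployment tool the project uses.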

SageMaker Jobs: Processing, Training, Inference & HPT

Overview

Before building automated pipelines, it helps to understand SageMaker’s individual building blocks. This project exercises all four core SageMaker job types using a Walmart retail sales dataset and SageMaker’s built-in XGBoost algorithm. Each job type covers a distinct phase of the ML workflow and runs on managed, ephemeral compute, so there are no servers to provision or maintain.

Job type: purpose
Processing Job: data prep, feature engineering, evaluation
Training Job: model fitting
Batch Transform Job: offline inference on large datasets
Hyperparameter Tuning Job: automated hyperparameter search

The Dataset

Three CSV tables from Walmart historical sales data (features, weekly sales, and store metadata), merged and engineered into a regression dataset that predicts weekly store sales. ...
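As a sketch of the merge step, assuming the column names commonly found in the public Walmart dataset (Store, Date, Weekly_Sales; the other columns below are illustrative), the standard-library code joins sales rows to features on (Store, Date) and to store metadata on Store. It then writes header-less CSV with the target in the first column, which is the layout SageMaker's built-in XGBoost expects for CSV training input.

```python
import csv
import io


def merge_tables(sales, features, stores):
    """Left-join sales rows (dicts) to features on (Store, Date) and to stores on Store."""
    feat_idx = {(r["Store"], r["Date"]): r for r in features}
    store_idx = {r["Store"]: r for r in stores}
    merged = []
    for row in sales:
        out = dict(row)
        # Columns present in both tables (e.g. a holiday flag) take the later table's value.
        out.update(feat_idx.get((row["Store"], row["Date"]), {}))
        out.update(store_idx.get(row["Store"], {}))
        merged.append(out)
    return merged


def to_xgboost_csv(rows, target, feature_cols):
    """Serialise rows as header-less CSV with the target first.

    Raises KeyError if a row lacks one of the requested columns,
    e.g. when no features row matched its (Store, Date) key.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for r in rows:
        writer.writerow([r[target]] + [r[c] for c in feature_cols])
    return buf.getvalue()
```

Non-numeric columns such as a store type code would still need encoding before training, since XGBoost expects numeric features.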

Deploying an ML Model with a SageMaker Real-Time Endpoint

Overview

Training a model is the easy part. Getting it into production, where it serves real-time predictions reliably, is where most ML projects stall. This project deploys a trained bank churn model as a SageMaker real-time endpoint using a custom Docker inference container, then adds a Lambda function that runs inference automatically whenever new data lands in S3.

The stack: FastAPI + uvicorn as the inference server, Docker for packaging, ECR for the registry, SageMaker for hosting, and Lambda + S3 for event-driven automation. ...
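A minimal sketch of the S3-triggered Lambda, assuming a hypothetical endpoint name (bank-churn-endpoint) and a container that accepts text/csv; the excerpt does not say which content type the FastAPI server reads, so adjust both. The event-parsing helper is kept separate from the boto3 calls so it can be tested locally, and boto3 is imported inside the handler because it ships with the AWS Lambda Python runtime.

```python
import json
import urllib.parse

ENDPOINT_NAME = "bank-churn-endpoint"  # hypothetical endpoint name


def s3_objects_from_event(event):
    """Extract (bucket, key) pairs from an S3 event notification.

    Object keys arrive URL-encoded (spaces become '+'), so they are decoded here.
    """
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    import boto3  # included in the Lambda Python runtime

    s3 = boto3.client("s3")
    runtime = boto3.client("sagemaker-runtime")
    results = []
    for bucket, key in s3_objects_from_event(event):
        # Read the new object and send its raw bytes to the endpoint.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="text/csv",  # assumption: the inference container accepts CSV
            Body=body,
        )
        results.append({"key": key, "prediction": response["Body"].read().decode("utf-8")})
    return {"statusCode": 200, "body": json.dumps(results)}
```

Where the predictions should be stored is not specified in the excerpt, so the sketch simply returns them; writing them back to a separate S3 prefix is a common choice.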

Makefile Explained

What is a Makefile?

A Makefile is a declarative build script used by the make tool to automate tasks like compiling code, running scripts, cleaning directories, or managing dependencies. It’s most useful when:

- You want to codify a repeatable workflow
- You want one-liner commands for multi-step processes
- You work across environments (e.g., Docker, EC2, CI/CD)

Basic Syntax and Anatomy

# Syntax:
# target: dependency1 dependency2 ...
# <TAB>recipe (command 1)
# <TAB>recipe (command 2)
# ...

# Example of a Makefile entry
train: data/processed.csv
	python src/train.py --input data/processed.csv --output model.pkl

Key Concepts:

target: The name of the file or alias you want to build (e.g., train).
dependencies (make calls them prerequisites): Files the target depends on, separated by spaces, not commas. In the example above, data/processed.csv is the dependency.
recipe: The command(s) make runs when the target does not exist yet or any dependency is newer than the target. In the example above, the recipe is python src/train.py --input data/processed.csv --output model.pkl.

Every recipe line must start with a tab. Indenting with spaces breaks it: GNU make stops with a "missing separator" error. ...
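make decides whether to run a recipe by comparing file modification times: the recipe runs when the target file is missing or any dependency is newer than it. A simplified Python approximation of that check for a single rule follows; it is a sketch of the idea, not how make is implemented, and it ignores phony targets and chained rules.

```python
import os


def needs_rebuild(target, prerequisites):
    """Simplified version of make's timestamp check for one rule.

    Returns True when the target file does not exist or when any prerequisite
    was modified more recently than the target. A missing prerequisite raises
    FileNotFoundError here, whereas make would look for a rule to build it.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

For the train rule, needs_rebuild("train", ["data/processed.csv"]) always returns True, because no file named train is ever created. That is why alias-style targets like this are usually declared with .PHONY: train, which tells make to run them unconditionally.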

How to Install Python Manually in Linux

This guide will discuss how to install Python manually on a Linux machine. For your convenience, we will also discuss how to uninstall a Python installed this way.

Steps to Follow to Install Python Manually

First of all, we need to update the package repositories and install the build dependencies.

Step 1: Update repositories and install dependencies

On Debian-based distributions, run the following (adapt the commands to the distro you are running):

sudo apt update
sudo apt install build-essential zlib1g-dev \
    libncurses5-dev libgdbm-dev libnss3-dev \
    libssl-dev libreadline-dev libffi-dev curl

Step 2: Download the stable release of Python from its official website

Go to https://www.python.org/downloads/source/ and download the XZ-compressed source tarball (.tar.xz). This file contains all the source code we need to build the Python version we want (I am downloading Python 3.10.5, so I get the file Python-3.10.5.tar.xz). ...
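Source tarballs on python.org follow a predictable URL pattern, so the download step can also be scripted. A minimal sketch, assuming a final (non-pre-release) version string such as 3.10.5; pre-releases are stored under their base version's directory, which this helper does not handle.

```python
import os
import urllib.request


def source_tarball_url(version):
    """Return the python.org URL of the XZ source tarball for a final release, e.g. '3.10.5'."""
    return f"https://www.python.org/ftp/python/{version}/Python-{version}.tar.xz"


def download_source(version, dest_dir="."):
    """Download the tarball into dest_dir and return the local file path (needs network access)."""
    url = source_tarball_url(version)
    path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, path)
    return path
```

Calling download_source("3.10.5") saves Python-3.10.5.tar.xz in the current directory, the same file you would get from the downloads page.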