End-to-End ML Pipeline with SageMaker Pipelines

Overview: SageMaker Pipelines lets you define a directed acyclic graph (DAG) of ML steps that SageMaker executes, tracks, and makes reproducible. This project builds a complete pipeline over a retail sales dataset: raw data in S3 goes in, predictions come out, with every intermediate artefact versioned and auditable. The four steps in the pipeline are PreprocessData → TrainModel → CreateInferenceModel → BatchInference. The dataset: Walmart retail sales data with three source tables (features, sales, stores). The target is weekly sales per store, which makes this a regression problem. ...

September 1, 2025 · 3 min · 495 words · Kiprono Elijah
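As a rough illustration of the four-step pipeline named in the entry above, the sketch below models the steps as a DAG and derives an execution order using only the Python standard library. It does not use the SageMaker SDK. The step names come from the excerpt, but the dependency edges assume a strictly linear chain, and `execution_order` is a hypothetical helper rather than anything from the original post.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
# Step names are from the excerpt; the edges are an ASSUMPTION (linear chain).
DEPENDENCIES = {
    "PreprocessData": set(),
    "TrainModel": {"PreprocessData"},
    "CreateInferenceModel": {"TrainModel"},
    "BatchInference": {"CreateInferenceModel"},
}


def execution_order(dependencies):
    """Return one valid execution order for the steps.

    Raises graphlib.CycleError if the dependencies do not form a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(DEPENDENCIES)))
```

Declaring dependencies rather than a fixed sequence is what lets an orchestrator reject cyclic definitions and, in non-linear graphs, run independent steps concurrently.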

ML Workflow Orchestration with AWS Step Functions

Overview: AWS Step Functions lets you orchestrate multi-step workflows as state machines: each state can invoke an AWS service, branch on conditions, retry on failure, or run steps in parallel. This project uses Step Functions to wire together SageMaker Processing, Training, and Batch Transform jobs into a production-grade ML workflow that can be triggered on a schedule or by an event. The key difference from SageMaker Pipelines: Step Functions is general-purpose orchestration for any combination of AWS services (SageMaker + Lambda + Glue + SNS), whereas SageMaker Pipelines is ML-specific. For workflows that touch multiple AWS services, Step Functions is usually the better fit. ...

August 1, 2025 · 3 min · 609 words · Kiprono Elijah
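To make the state-machine idea in the Step Functions entry above more concrete, here is a minimal sketch that builds an Amazon States Language definition as a Python dictionary. The state names (`Processing`, `Training`, `BatchTransform`), the retry settings, and the helper names are assumptions for illustration, and the `Parameters` blocks are left empty, so this is a skeleton rather than a deployable definition or the post's actual workflow.

```python
import json


def sagemaker_task(action, next_state=None):
    """Build a Task state that calls a SageMaker API and waits for completion.

    `action` is the SageMaker integration name, e.g. "createTrainingJob".
    """
    state = {
        "Type": "Task",
        # The ".sync" suffix asks Step Functions to wait for the job to finish.
        "Resource": f"arn:aws:states:::sagemaker:{action}.sync",
        # Placeholder: real job settings (roles, images, S3 paths) go here.
        "Parameters": {},
        # Illustrative retry policy; the values are assumptions.
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 30,
                "MaxAttempts": 2,
                "BackoffRate": 2.0,
            }
        ],
    }
    if next_state is not None:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_definition():
    """Chain Processing -> Training -> BatchTransform as three Task states."""
    return {
        "Comment": "Sketch of a SageMaker workflow (hypothetical state names)",
        "StartAt": "Processing",
        "States": {
            "Processing": sagemaker_task("createProcessingJob", "Training"),
            "Training": sagemaker_task("createTrainingJob", "BatchTransform"),
            "BatchTransform": sagemaker_task("createTransformJob"),
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

The printed JSON is the kind of document a state machine is created from. Each state carries its own retry policy, which is how the retry-on-failure behaviour mentioned in the excerpt is expressed.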

SageMaker Jobs: Processing, Training, Inference & HPT

Overview: Before building automated pipelines, it helps to understand SageMaker’s individual building blocks. This project exercises all four core SageMaker job types using a Walmart retail sales dataset and SageMaker’s built-in XGBoost algorithm. Each job type solves a distinct phase of the ML workflow and runs on managed, ephemeral compute, with no servers to provision or maintain.

Job Type                     Purpose
Processing Job               Data prep, feature engineering, evaluation
Training Job                 Model fitting
Batch Transform Job          Offline inference on large datasets
Hyperparameter Tuning Job    Automated hyperparameter search

The dataset: three CSV tables from Walmart historical sales data (features, weekly sales, and store metadata), merged and engineered into a regression dataset predicting weekly store sales. ...

July 1, 2025 · 3 min · 513 words · Kiprono Elijah

Deploying an ML Model with a SageMaker Real-Time Endpoint

Overview: Training a model is the easy part. Getting it into production, where it serves real-time predictions reliably, is where most ML projects stall. This project deploys a trained bank churn model as a SageMaker real-time endpoint using a custom Docker inference container, then adds a Lambda function to automate inference whenever new data lands in S3. The stack: FastAPI + uvicorn as the inference server, Docker for packaging, ECR for the registry, SageMaker for hosting, and Lambda + S3 for event-driven automation. ...

November 1, 2025 · 6 min · 1105 words · Kiprono Elijah
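The event-driven part of the deployment entry above (a Lambda function reacting to new data in S3) can be sketched as follows. This is a simplified illustration, not the post's code: `extract_s3_objects` and the `invoke` callable are hypothetical names, and the actual call to the SageMaker endpoint is deliberately left out and replaced by a callable you would supply.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload.

    Object keys arrive URL-encoded in S3 events, so they are decoded here.
    """
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            objects.append((bucket, unquote_plus(key)))
    return objects


def handler(event, context, invoke=None):
    """Lambda-style entry point.

    `invoke` is a HYPOTHETICAL callable(bucket, key) standing in for the real
    endpoint call, which is not shown. Without it, the handler only reports
    which objects it saw.
    """
    results = []
    for bucket, key in extract_s3_objects(event):
        if invoke is not None:
            results.append(invoke(bucket, key))
        else:
            results.append({"bucket": bucket, "key": key})
    return {"processed": len(results), "results": results}
```

Keeping the event parsing separate from the endpoint call makes the handler testable locally with a hand-written event dictionary, before any AWS resources exist.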