End-to-End ML Pipeline with SageMaker Pipelines

Overview

SageMaker Pipelines lets you define a directed acyclic graph (DAG) of ML steps that SageMaker executes, tracks, and makes reproducible. This project builds a complete pipeline over a retail sales dataset: raw data in S3 goes in, predictions come out, with every intermediate artefact versioned and auditable.

The four steps in the pipeline:

PreprocessData → TrainModel → CreateInferenceModel → BatchInference

The Dataset

Walmart retail sales data with three source tables (features, sales, stores). The target is weekly sales per store, which makes this a regression problem. ...
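To make the DAG idea concrete, here is a minimal sketch in plain Python that does not use the SageMaker SDK. The four step names come from the excerpt above. The dependency map and the `execution_order` helper are illustrative assumptions: the sketch supposes each step depends only on the step before it, which may not match how the actual pipeline wires its inputs and outputs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Step names are taken from the post. The dependency map is an assumption:
# each step is taken to consume only the output of the previous step.
dependencies = {
    "PreprocessData": set(),
    "TrainModel": {"PreprocessData"},
    "CreateInferenceModel": {"TrainModel"},
    "BatchInference": {"CreateInferenceModel"},
}


def execution_order(deps):
    """Return the steps in an order that respects every dependency.

    Raises graphlib.CycleError if the graph contains a cycle,
    meaning it is not a valid DAG.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(" -> ".join(order))
```

An orchestrator performs the same kind of ordering before launching jobs: a step becomes eligible to run only once everything it depends on has finished.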

September 1, 2025 · 3 min · 495 words · Kiprono Elijah

SageMaker Jobs: Processing, Training, Inference & HPT

Overview

Before building automated pipelines, it helps to understand SageMaker’s individual building blocks. This project exercises all four core SageMaker job types using a Walmart retail sales dataset and SageMaker’s built-in XGBoost algorithm. Each job type solves a distinct phase of the ML workflow and runs on managed, ephemeral compute, with no servers to provision or maintain.

Job Type                   Purpose
Processing Job             Data prep, feature engineering, evaluation
Training Job               Model fitting
Batch Transform Job        Offline inference on large datasets
Hyperparameter Tuning Job  Automated hyperparameter search

The Dataset

Three CSV tables from Walmart historical sales data (features, weekly sales, and store metadata) are merged and engineered into a regression dataset for predicting weekly store sales. ...

July 1, 2025 · 3 min · 513 words · Kiprono Elijah