AI Engineering and MLOps: Designing, Deploying, and Scaling AI Systems
Program Description
- This two-day technical program is designed for technical executives (CTOs, IT Directors, and Lead Architects) to bridge the gap between "Experimental AI" and "Enterprise-Grade Production." In the Malaysian corporate landscape, the challenge has shifted from building a model to maintaining its reliability, safety, and scalability.
- This program provides a deep dive into Machine Learning Operations (MLOps) and AI Engineering lifecycle management.
- Participants will learn to architect robust pipelines for both Traditional ML and Generative AI, ensuring structural compliance with PDPA 2.0 and the National AI Governance & Ethics (AIGE) guidelines while optimizing for high-concurrency and cost-efficiency.
While this outline serves as a foundational framework with use cases from multiple industries and functions, the final program is fully customized to your industry and internal workflows.
Participants work on real-world problems, not generic examples. We engage in a pre-workshop alignment to inject your specific organizational datasets, pain points, and proprietary use cases directly into the curriculum.
Learning Objectives
- Architect End-to-End MLOps Pipelines: Master the technical workflow from data versioning and automated testing to continuous deployment (CI/CD/CT).
- Master Infrastructure for Generative AI (LLMOps): Design scalable architectures for RAG (Retrieval-Augmented Generation), fine-tuning, and low-latency inference.
- Implement Model Observability & Drift Detection: Use technical monitoring frameworks to detect and correct data drift, model decay, and algorithmic bias in real time.
- Execute Scalable Deployment Strategies: Technically differentiate between containerization (Docker/Kubernetes), serverless inference, and edge computing for various industry needs.
- Operationalize AIGE & PDPA Compliance: Embed "Privacy-by-Design" and "Ethics-by-Design" directly into the engineering stack to meet Malaysian regulatory standards.
Program Details
- Duration: 2 Days
- Time: 9:00 AM – 5:00 PM
Content
Day 1: AI Systems Design & The Engineering Lifecycle
- Moving from “Notebooks” to “Microservices.” Understanding the core components of a production AI system: Feature Stores, Model Registries, and Metadata Tracking.
- Scenario (Banking): A technical lead audits a legacy credit scoring system and architects a transition to a real-time, event-driven feature store to enable instant loan approvals.
- Hands-on: “The Architecture Blueprint” – Designing a multi-tier AI system architecture that separates data ingestion, training, and inference layers for high availability.
- Expected Impact: Technical clarity on selecting the right tools (e.g., MLflow, Kubeflow) vs. cloud-native managed services (AWS/Azure/GCP).
- Ensuring “Reproducibility” in AI. Deep dive into Data Version Control (DVC) and automated data validation pipelines to prevent “Garbage In, Garbage Out.”
- Demo (Manufacturing): An automated “Data Guardrail” in a factory sensor pipeline that halts model retraining if it detects faulty calibration data from a specific assembly line.
- Hands-on: Setting up a Git-integrated Data Versioning (DVC) workflow to track changes in a large-scale e-commerce transaction dataset.
- Expected Impact: Elimination of the “It worked on my machine” problem; 100% auditability of data lineage.
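The core mechanism behind DVC-style data versioning is content-addressed hashing: the same bytes always produce the same fingerprint, so any change to a dataset is detectable and auditable. A minimal illustrative sketch (the file names and contents are hypothetical, and production use would rely on DVC itself rather than a hand-rolled check):

```python
import hashlib
from pathlib import Path

def dataset_fingerprint(path: str) -> str:
    """Content hash of a data file: identical data yields an identical
    hash, so every change to the dataset is detectable and auditable."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Two versions of a small (hypothetical) transactions file.
Path("transactions_v1.csv").write_text("order_id,amount\n1,120.50\n")
Path("transactions_v2.csv").write_text("order_id,amount\n1,120.50\n2,88.00\n")

v1 = dataset_fingerprint("transactions_v1.csv")
v2 = dataset_fingerprint("transactions_v2.csv")
assert v1 != v2  # any data change produces a new, trackable version
```

DVC applies this idea at scale: the hash (not the data itself) is committed to Git, giving the audit trail without bloating the repository.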
- Beyond standard DevOps. Implementing Continuous Training (CT) – where models automatically retrain and validate when performance drops or new data arrives.
- Scenario (Retail): A recommendation engine that automatically retrains itself when it detects a 5% drop in click-through rate during a sudden Malaysian festive sale (e.g., Shopee 11.11).
- Hands-on: Coding a GitHub Action or GitLab CI pipeline that triggers a model validation suite every time a code change or data update is committed.
- Expected Impact: Significant reduction in manual intervention; faster “Time-to-Production” for model improvements.
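The validation gate at the heart of such a CI pipeline can be sketched as a small Python check the pipeline runs on every commit; the metric names and thresholds below are illustrative assumptions, not fixed recommendations:

```python
# Hypothetical acceptance thresholds; in practice these come from the
# model's SLA and live in version-controlled config.
THRESHOLDS = {"min_accuracy": 0.90, "max_latency_ms": 50.0}

def validate_model(metrics: dict) -> list:
    """Return a list of failure messages; an empty list means the model
    may be promoted. In GitHub Actions / GitLab CI, a non-empty list
    would fail the job and block the merge."""
    failures = []
    if metrics["accuracy"] < THRESHOLDS["min_accuracy"]:
        failures.append(f"accuracy {metrics['accuracy']:.3f} is below "
                        f"{THRESHOLDS['min_accuracy']}")
    if metrics["latency_ms"] > THRESHOLDS["max_latency_ms"]:
        failures.append(f"latency {metrics['latency_ms']}ms exceeds "
                        f"{THRESHOLDS['max_latency_ms']}ms")
    return failures

assert validate_model({"accuracy": 0.94, "latency_ms": 32.0}) == []
assert len(validate_model({"accuracy": 0.85, "latency_ms": 70.0})) == 2
```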
- Implementing “Privacy-by-Design.” Technical methods for PII masking, k-anonymity, and the newly mandated Data Protection Impact Assessments (DPIA) within the AI pipeline.
- Scenario (HR/Operations): Building a secure talent analytics pipeline where NRICs and sensitive personal data are automatically encrypted and pseudonymized at the point of ingestion.
- Hands-on: Implementing an automated “PII Scanner” node in a Python data pipeline that flags and redacts sensitive Malaysian identifiers before data storage.
- Expected Impact: 100% compliance with Malaysian PDPA 2.0; structural protection against multi-million ringgit fines and data breaches.
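A minimal version of such a PII scanner can be built with regular expressions; the patterns below cover the Malaysian NRIC format (YYMMDD-PB-####) and email addresses as examples, and a production pipeline would extend them (phone numbers, passport numbers, named-entity detection):

```python
import re

# Malaysian NRIC format: YYMMDD-PB-#### (e.g. 901231-14-5678).
NRIC_RE = re.compile(r"\b\d{6}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def redact_pii(text: str) -> str:
    """Replace NRICs and email addresses with placeholder tokens
    before the record reaches storage."""
    text = NRIC_RE.sub("[NRIC_REDACTED]", text)
    text = EMAIL_RE.sub("[EMAIL_REDACTED]", text)
    return text

record = "Applicant 901231-14-5678 (ali@example.com) requests RM50,000."
clean = redact_pii(record)
assert "901231-14-5678" not in clean and "ali@example.com" not in clean
```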
Day 2: LLMOps, Observability, and Scaling
- The unique challenges of GenAI. Architecting for Retrieval-Augmented Generation (RAG), managing Vector Databases (Pinecone, Weaviate), and optimizing GPU utilization.
- Demo (Customer Experience): Architecting a “Sovereign RAG” system for a Malaysian telco that answers customer queries using internal PDFs while keeping data strictly within local cloud regions.
- Hands-on: Configuring an end-to-end RAG pipeline – linking a Vector Database to an LLM and setting up automated “Hallucination Checks” for output quality.
- Expected Impact: Transition from simple chatbots to reliable, context-aware enterprise AI agents.
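One lightweight hallucination check is lexical grounding: measure how much of a generated answer is actually supported by the retrieved context. The sketch below is a deliberately simple token-overlap heuristic (real systems typically layer on embedding similarity or an LLM-as-judge step), with hypothetical telco-style strings as sample data:

```python
import re

def grounding_score(answer: str, context: str) -> float:
    """Fraction of the answer's content words that also appear in the
    retrieved context; a low score suggests the LLM introduced claims
    the documents do not support (a possible hallucination)."""
    def tokenize(s):
        return set(re.findall(r"[a-z]{4,}", s.lower()))
    answer_terms = tokenize(answer)
    if not answer_terms:
        return 1.0
    return len(answer_terms & tokenize(context)) / len(answer_terms)

context = "The 100Mbps home fibre plan costs RM129 per month with no data cap."
grounded = "The plan costs RM129 per month and has no data cap."
ungrounded = "The plan includes free international roaming in Singapore."

assert grounding_score(grounded, context) > 0.8
assert grounding_score(ungrounded, context) < 0.5
```

Answers scoring below a chosen threshold can be routed to a fallback response or a human agent instead of being shown to the customer.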
- Choosing the right “Inference Strategy.” Comparing Batch vs. Real-time vs. Edge deployment. Mastering Docker and Kubernetes for AI containerization.
- Scenario (Logistics/E-commerce): Deploying a high-concurrency demand forecasting model that must handle 10,000+ requests per second during peak “12.12” traffic without latency degradation.
- Hands-on: Containerizing a Python-based model using Docker and simulating a high-load deployment to test auto-scaling triggers and load balancing.
- Expected Impact: Technical mastery over infrastructure cost-optimization and system resilience.
- The “Post-Deployment” crisis. Monitoring for Concept Drift (market changes) and Data Drift (input changes). Setting up automated alerts and “Circuit Breakers.”
- Demo (Finance/Risk): A monitoring dashboard that flags a “Drift Alert” when the profile of mortgage applicants shifts significantly due to a rise in interest rates by Bank Negara Malaysia.
- Hands-on: Setting up an automated monitoring loop in Python that calculates Population Stability Index (PSI) and triggers an email alert if the model’s confidence drops.
- Expected Impact: Proactive risk management; ensuring AI remains accurate and fair long after its initial launch.
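The PSI computation that drives such a monitoring loop can be sketched in a few lines of NumPy; the credit-score distributions below are synthetic, and the 0.25 alert threshold is the conventional rule of thumb rather than a universal constant:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline distribution
    (e.g. training data) and live inference traffic. Common rule of
    thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf      # cover out-of-range values
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)         # avoid log(0)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(650, 50, 10_000)   # credit scores at training time
live = rng.normal(620, 60, 10_000)       # applicant profile after a rate hike

drift = psi(baseline, live)
drift_alert = drift > 0.25               # would trigger the email alert
```

In production, this check would run on a schedule against a sliding window of recent inference inputs, with the alert wired to email or a paging system.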
- Implementing the National AI Governance & Ethics (AIGE) principles (Fairness, Accountability, Transparency). Designing “Emergency Shut-off” mechanisms for AI agents.
- The Framework: Prioritizing the “AI Backlog” based on Technical Debt, Scalability, and Regulatory Risk.
- Hands-on: Co-creating an “Enterprise AI Playbook” – defining technical KPIs for MLOps maturity and a phased 3-6 month roadmap for departmental AI scaling.
- Expected Impact: A clear, sustainable path toward transforming your organization into a technically-mature, AI-First enterprise.
List of Deliverables
- Master MLOps Code Repository: Reusable templates for CI/CD pipelines, Dockerfiles, and DVC configurations.
- Architecture Reference Blueprints: Design patterns for RAG, real-time inference, and hybrid cloud AI setups.
- AIGE Compliance Checklist: A technical audit guide for bias detection, explainability, and safety guardrails.
- 90-Day MLOps Roadmap: A phased execution plan tailored to the participant's specific functional role or business unit.
- LinkedIn & GitHub Showcase: A documented "AI Engineering Capstone" demonstrating production-ready deployment and monitoring.
Prerequisites
- Technical Knowledge: Intermediate Python proficiency and familiarity with basic machine learning concepts (e.g., training vs. testing splits). Some experience with Cloud/DevOps tooling (Docker, Git) is helpful.
- Essential Equipment: A laptop with VS Code, Docker, and access to a Cloud Sandbox (AWS/Azure/GCP setup will be provided).
- Mindset: A shift from "Building Models" to "Building Systems."
Who Should Attend
- CTOs, IT Directors, and Lead Architects
- Technical Project Managers and Engineering Leads
- Lead Data Scientists and Senior Software Engineers
- Solution Architects
- Compliance and Risk Officers
Training Methodology
- The "Production Lab": 70% of the program is hands-on coding, debugging, and architectural whiteboarding.
- Deep-Dive Technical Case Studies: Analyzing real-world production failures and successes in the Malaysian market.
- Strategic Co-Design: Group sessions to solve actual departmental "Deployment Bottlenecks" using advanced MLOps patterns.
100% HRDC-Claimable
This program is fully registered and compliant with HRDC (Human Resource Development Corporation) requirements under the SBL-Khas scheme, allowing Malaysian employers to offset the training costs against their levy.
Certification of Completion
Participants who successfully complete the program will be awarded a “Professional Certificate in AI Engineering & MLOps Leadership.”
Post-Workshop Consulting (Optional)
For organizations looking to bridge the gap between training and execution, we offer optional, paid consulting services. These engagements provide expertise and technical support for specific pilot development or full-scale operational integration of the data- and AI-driven use cases established during the program.
Contact us for In-House Training