Data Science Essentials

Program Description

While this outline serves as a foundational framework with use cases from multiple industries and functions, the final program is fully customized to your industry and internal workflows.

Participants work on real-world problems, not generic examples. We conduct a pre-workshop alignment session to build your specific organizational datasets, pain points, and proprietary use cases directly into the curriculum.

Learning Objectives

Program Details

Content

Day 1: Statistical Foundations & Predictive Modeling

  • Deconstructing the lifecycle of a DS project – from Exploratory Data Analysis (EDA) to Model Deployment. Understanding the transition from heuristics to stochastic modeling.
  • Scenario (Manufacturing): An executive team audits a predictive maintenance pipeline, identifying why a Random Forest model outperformed a standard regression in predicting tool failure.
  • Hands-on: Python-based EDA – using automated libraries to identify data leakage and multicollinearity in a manufacturing sensor dataset.
  • Expected Impact: Technical clarity on selecting the right algorithm for specific hardware and data constraints.
  • Deep dive into Gradient Boosting (XGBoost/LightGBM) for structured data and Clustering (K-Means/DBSCAN) for market segmentation.
  • Demo (Banking/Finance): Using Isolation Forests for real-time fraud detection in credit card transactions, identifying outliers that deviate from “normal” spending clusters.
  • Hands-on: Building a “Propensity to Buy” model using XGBoost – optimizing hyperparameters to handle imbalanced retail data.
  • Expected Impact: Capability to lead teams in building high-precision classifiers for commercial ROI.
  • Understanding Tensors, Backpropagation, and Activation Functions. When to move from Scikit-Learn to PyTorch or TensorFlow.
  • Scenario (E-commerce): Implementing a Convolutional Neural Network (CNN) for automated visual search and product categorization in a high-volume retail app.
  • Hands-on: Building a simple Multi-Layer Perceptron (MLP) to predict customer lifetime value (CLV) based on multi-dimensional behavioral data.
  • Expected Impact: Technical intuition on the cost-benefit analysis of DL vs. traditional ML models.
  • Implementing Differential Privacy, k-Anonymity, and Federated Learning to protect sensitive Malaysian citizen data.
  • Scenario (HR/Operations): Engineering a “Privacy-Preserving” talent analytics model that allows for trend analysis without exposing NRIC or PII.
  • Hands-on: Implementing data masking and hashing protocols within a Python data pipeline to support compliance with PDPA 2.0.
  • Expected Impact: Structural security and legal compliance embedded directly into the code base.
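The masking and hashing hands-on above can be sketched with the Python standard library alone. This is a minimal illustration, not the workshop's actual lab code: the NRIC format, field names, and hard-coded salt are assumptions for demonstration (a real pipeline would load the salt from a secrets manager).

```python
import hashlib
import hmac

# Assumption: in production the salt comes from a secrets manager, not source code.
SALT = b"rotate-me-per-environment"

def pseudonymize(nric: str) -> str:
    """Keyed one-way hash: records stay joinable across tables
    without ever exposing the raw NRIC."""
    return hmac.new(SALT, nric.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def mask(nric: str) -> str:
    """Redact all but the last four characters for human-facing reports."""
    return "*" * (len(nric) - 4) + nric[-4:]

record = {"nric": "900101-14-5678", "department": "Ops", "tenure_years": 4}
safe_record = {
    **record,
    "nric": pseudonymize(record["nric"]),   # stable join key for analytics
    "nric_display": mask(record["nric"]),   # e.g. "**********5678"
}
```

Applying the transformation at ingestion (for example, inside a pandas or Spark step) keeps raw identifiers out of the analytics layer entirely, which is the structural safeguard the "Expected Impact" bullet refers to.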

Day 2: Generative AI Engineering & MLOps

  • The Attention Mechanism and Transformer architecture. Fine-tuning vs. Prompt Engineering vs. RAG (Retrieval-Augmented Generation).
  • Scenario (Legal/Compliance): Architecting a technical summary tool that uses a Pre-trained Transformer to extract “Red Flag” clauses from complex Malaysian corporate contracts.
  • Hands-on: Using Hugging Face to deploy a local LLM and compare its performance on Shariah-compliant financial terminology.
  • Expected Impact: Technical mastery of the “New Stack” of Generative AI.
  • Understanding Semantic Search, Embeddings, and Vector Databases (Pinecone, Weaviate, Milvus). Building the “External Brain” for AI.
  • Demo (Customer Experience): Connecting a Customer Support Agent to a Vector Database containing 10,000+ technical SOPs to ground responses and curb hallucinations.
  • Hands-on: Engineering a RAG Pipeline – ingesting corporate PDFs, generating embeddings, and building a query-response loop with “Source Attribution.”
  • Expected Impact: Sharply reduced AI hallucinations; high-fidelity, source-attributed knowledge retrieval.
  • Moving beyond linear chains to autonomous agents that can use tools (APIs, SQL, Calculators). Understanding ReAct prompting logic.
  • Scenario (Supply Chain): An “Inventory Agent” that can autonomously query a SQL database, check port congestion APIs, and write a logistics re-routing plan.
  • Hands-on: Building a multi-agent “Research Squad” using LangChain or AutoGPT to monitor and report on Bursa Malaysia market shifts.
  • Expected Impact: Extreme operational scale; transitioning from “Chat” to “Autonomous Action.”
  • Model Versioning (DVC), Experiment Tracking (MLflow), and CI/CD for Machine Learning. Monitoring for Data Drift and Model Decay.
  • The Framework: Prioritizing the “Technical Backlog” based on Inference Latency, Compute Cost, and Business Criticality.
  • Hands-on: Setting up a Model Monitoring Dashboard to track accuracy degradation over time in a live sales forecasting environment.
  • Expected Impact: A clear, sustainable roadmap for moving models from “Lab” to “Production.”
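The retrieval half of the RAG hands-on can be sketched end to end in plain Python. Everything below is an illustrative stand-in under stated assumptions: the hashed bag-of-words `embed` substitutes for a real sentence-embedding model, and the `corpus` list substitutes for a vector database such as Pinecone, Weaviate, or Milvus; the file names and SOP snippets are invented examples.

```python
import hashlib
import math
import re

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashed bag-of-words embedding (a stand-in for a real
    sentence-embedding model): hash each token into a fixed-size
    vector, then L2-normalize."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Assumption: a tiny in-memory stand-in for a vector database of ingested SOP chunks.
corpus = [
    ("sop_returns.pdf", "Refunds for damaged goods are processed within five working days."),
    ("sop_shipping.pdf", "Orders above RM150 qualify for free shipping within Malaysia."),
    ("sop_warranty.pdf", "Warranty claims require the original receipt and serial number."),
]
index = [(source, chunk, embed(chunk)) for source, chunk in corpus]

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Top-k chunks with their source files, ready to be placed in the LLM prompt."""
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(source, chunk) for source, chunk, _ in ranked[:k]]
```

The returned (source, chunk) pairs are what make "Source Attribution" possible: the prompt sent to the LLM carries each chunk together with its file name, so every generated answer can cite where its facts came from.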

List of Deliverables

Prerequisites

Who Should Attend

Training Methodology

100% HRDC-Claimable

This program is fully registered and compliant with HRDC (Human Resource Development Corporation) requirements under the SBL-Khas scheme, allowing Malaysian employers to offset the training costs against their levy.

Certification of Completion

Participants who successfully complete the program will be awarded a “Professional Certificate in Data Science Essentials & AI Engineering.”

Post-Workshop Consulting (Optional)

For organizations looking to bridge the gap between training and execution, we offer optional, paid consulting services. These engagements provide expertise and technical support for specific pilot development or full-scale operational integration of the data- and AI-driven use cases established during the program.

Contact us for In-House Training
