Data Science Essentials

Program Description

While this outline serves as a foundational framework with use cases from multiple industries and functions, the final program is fully customized to your industry and internal workflows.

Participants work on real-world problems, not generic examples. We conduct a pre-workshop alignment session to build your specific organizational datasets, pain points, and proprietary use cases directly into the curriculum.

Learning Objectives

Program Details

Content

Day 1: Statistical Foundations & Predictive Modeling

  • Deconstructing the lifecycle of a DS project – from Exploratory Data Analysis (EDA) to Model Deployment. Understanding the transition from heuristics to stochastic modeling.
  • Scenario (Manufacturing): An executive team audits a predictive maintenance pipeline, identifying why a Random Forest model outperformed a standard regression in predicting tool failure.
  • Hands-on: Python-based EDA – using automated libraries to identify data leakage and multicollinearity in a manufacturing sensor dataset.
  • Expected Impact: Technical clarity on selecting the right algorithm for specific hardware and data constraints.
  • Deep dive into Gradient Boosting (XGBoost/LightGBM) for structured data and Clustering (K-Means/DBSCAN) for market segmentation.
  • Demo (Banking/Finance): Using Isolation Forests for real-time fraud detection in credit card transactions, identifying outliers that deviate from “normal” spending clusters.
  • Hands-on: Building a “Propensity to Buy” model using XGBoost – optimizing hyperparameters to handle imbalanced retail data.
  • Expected Impact: Capability to lead teams in building high-precision classifiers for commercial ROI.
  • Understanding Tensors, Backpropagation, and Activation Functions. When to move from Scikit-Learn to PyTorch or TensorFlow.
  • Scenario (E-commerce): Implementing a Convolutional Neural Network (CNN) for automated visual search and product categorization in a high-volume retail app.
  • Hands-on: Building a simple Multi-Layer Perceptron (MLP) to predict customer lifetime value (CLV) based on multi-dimensional behavioral data.
  • Expected Impact: Technical intuition on the cost-benefit analysis of DL vs. traditional ML models.
  • Implementing Differential Privacy, k-Anonymity, and Federated Learning to protect sensitive Malaysian citizen data.
  • Scenario (HR/Operations): Engineering a “Privacy-Preserving” talent analytics model that allows for trend analysis without exposing NRIC or PII.
  • Hands-on: Implementing data masking and hashing protocols within a Python data pipeline to support compliance with PDPA 2.0.
  • Expected Impact: Structural security and legal compliance embedded directly into the code base.
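The masking and hashing hands-on above can be sketched with the Python standard library alone. This is a minimal illustration, not the workshop's actual lab code: the NRIC format, field names, and hard-coded salt are assumptions for demonstration (a real pipeline would load the salt from a secrets manager).

```python
import hashlib
import hmac

# Assumption: in production the salt comes from a secrets manager, not source code.
SALT = b"rotate-me-per-environment"

def pseudonymize(nric: str) -> str:
    """Keyed one-way hash: records stay joinable across tables
    without ever exposing the raw NRIC."""
    return hmac.new(SALT, nric.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def mask(nric: str) -> str:
    """Redact all but the last four characters for human-facing reports."""
    return "*" * (len(nric) - 4) + nric[-4:]

record = {"nric": "900101-14-5678", "department": "Ops", "tenure_years": 4}
safe_record = {
    **record,
    "nric": pseudonymize(record["nric"]),   # stable join key for analytics
    "nric_display": mask(record["nric"]),   # e.g. "**********5678"
}
```

Applying the transformation at ingestion (for example, inside a pandas or Spark step) keeps raw identifiers out of the analytics layer entirely, which is the structural safeguard the "Expected Impact" bullet refers to.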

Day 2: Generative AI Engineering & MLOps

  • The Attention Mechanism and Transformer architecture. Fine-tuning vs. Prompt Engineering vs. RAG (Retrieval-Augmented Generation).
  • Scenario (Legal/Compliance): Architecting a technical summary tool that uses a Pre-trained Transformer to extract “Red Flag” clauses from complex Malaysian corporate contracts.
  • Hands-on: Using Hugging Face to deploy a local LLM and compare its performance on Shariah-compliant financial terminology.
  • Expected Impact: Technical mastery of the “New Stack” of Generative AI.
  • Understanding Semantic Search, Embeddings, and Vector Databases (Pinecone, Weaviate, Milvus). Building the “External Brain” for AI.
  • Demo (Customer Experience): Connecting a Customer Support Agent to a Vector Database containing 10,000+ technical SOPs to ground responses and curb hallucinations.
  • Hands-on: Engineering a RAG Pipeline – ingesting corporate PDFs, generating embeddings, and building a query-response loop with “Source Attribution.”
  • Expected Impact: Sharply reduced AI hallucinations; high-fidelity, source-attributed knowledge retrieval.
  • Moving beyond linear chains to autonomous agents that can use tools (APIs, SQL, Calculators). Understanding ReAct prompting logic.
  • Scenario (Supply Chain): An “Inventory Agent” that can autonomously query a SQL database, check port congestion APIs, and write a logistics re-routing plan.
  • Hands-on: Building a multi-agent “Research Squad” using LangChain or AutoGPT to monitor and report on Bursa Malaysia market shifts.
  • Expected Impact: Extreme operational scale; transitioning from “Chat” to “Autonomous Action.”
  • Model Versioning (DVC), Experiment Tracking (MLflow), and CI/CD for Machine Learning. Monitoring for Data Drift and Model Decay.
  • The Framework: Prioritizing the “Technical Backlog” based on Inference Latency, Compute Cost, and Business Criticality.
  • Hands-on: Setting up a Model Monitoring Dashboard to track accuracy degradation over time in a live sales forecasting environment.
  • Expected Impact: A clear, sustainable roadmap for moving models from “Lab” to “Production.”
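The retrieval half of the RAG hands-on can be sketched end to end in plain Python. Everything below is an illustrative stand-in under stated assumptions: the hashed bag-of-words `embed` substitutes for a real sentence-embedding model, and the `corpus` list substitutes for a vector database such as Pinecone, Weaviate, or Milvus; the file names and SOP snippets are invented examples.

```python
import hashlib
import math
import re

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashed bag-of-words embedding (a stand-in for a real
    sentence-embedding model): hash each token into a fixed-size
    vector, then L2-normalize."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Assumption: a tiny in-memory stand-in for a vector database of ingested SOP chunks.
corpus = [
    ("sop_returns.pdf", "Refunds for damaged goods are processed within five working days."),
    ("sop_shipping.pdf", "Orders above RM150 qualify for free shipping within Malaysia."),
    ("sop_warranty.pdf", "Warranty claims require the original receipt and serial number."),
]
index = [(source, chunk, embed(chunk)) for source, chunk in corpus]

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Top-k chunks with their source files, ready to be placed in the LLM prompt."""
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(source, chunk) for source, chunk, _ in ranked[:k]]
```

The returned (source, chunk) pairs are what make "Source Attribution" possible: the prompt sent to the LLM carries each chunk together with its file name, so every generated answer can cite where its facts came from.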

List of Deliverables

Prerequisites

Who Should Attend

Training Methodology

100% HRDC-Claimable

This program is fully registered and compliant with HRDC (Human Resource Development Corporation) requirements under the SBL-Khas scheme, allowing Malaysian employers to offset the training costs against their levy.

Certification of Completion

Participants who successfully complete the program will be awarded a “Professional Certificate in Data Science Essentials & AI Engineering.”

Post-Workshop Consulting (Optional)

For organizations looking to bridge the gap between training and execution, we offer optional, paid consulting services. These engagements provide expertise and technical support for specific pilot development or full-scale operational integration of the data- and AI-driven use cases established during the program.

Contact us for In-House Training
