Consultancy and Total Solutions Training Provider for Embedded Systems, Electronics and Electrical Engineering, Programming, Computing, Operations, ISO9000, ISO14000 and Management.

Bridging the Gap

Training Courses

Heterogeneous Programming with GPGPU

Course id: 0026


The digital age demands vast data processing and number-crunching capability. With the advent of General-Purpose Graphics Processing Units (GPGPUs), a standard modern computer can now meet some of that demand through the GPU's highly parallel computing structure. A GPGPU's number-crunching and data-processing capabilities are significantly greater than a CPU's because of its higher arithmetic throughput and memory bandwidth, making it ideal for accelerating a variety of data-parallel applications.

Using GPGPUs for computation is not as straightforward as using CPUs, however. The programmer needs to understand the general architecture and workings of the GPGPU, along with thread synchronization, memory management and other issues that are generally taken for granted on standard CPU-based platforms. Controlling a GPGPU also requires knowledge of a programming environment such as CUDA, STREAM or OpenCL.
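To give a feel for the extra steps a GPGPU program involves, here is a minimal CUDA sketch of element-wise vector addition. It is an illustration only, not course material: the names vectorAdd, runVectorAdd and CUDA_CHECK are hypothetical, and the block size of 256 threads is an arbitrary assumption. Note the explicit device allocation, the host-to-device copies, the per-thread index calculation and the synchronization before results are read back.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "CUDA error: %s\n",                 \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Each thread adds one pair of elements. The bounds check protects the
// last block, which may contain more threads than remaining elements.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Runs the kernel on n elements and returns the largest absolute
// difference between the GPU result and a CPU reference.
float runVectorAdd(int n)
{
    std::vector<float> ha(n), hb(n), hc(n);
    for (int i = 0; i < n; ++i) {
        ha[i] = static_cast<float>(i);
        hb[i] = 2.0f * static_cast<float>(i);
    }

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *da = nullptr, *db = nullptr, *dc = nullptr;

    // Device memory is separate from host memory and is managed explicitly.
    CUDA_CHECK(cudaMalloc((void **)&da, bytes));
    CUDA_CHECK(cudaMalloc((void **)&db, bytes));
    CUDA_CHECK(cudaMalloc((void **)&dc, bytes));
    CUDA_CHECK(cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;                          // assumed block size
    const int blocks = (n + threads - 1) / threads;   // round up
    vectorAdd<<<blocks, threads>>>(da, db, dc, n);
    CUDA_CHECK(cudaGetLastError());        // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize());   // waits for the kernel to finish

    CUDA_CHECK(cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(da));
    CUDA_CHECK(cudaFree(db));
    CUDA_CHECK(cudaFree(dc));

    float maxErr = 0.0f;
    for (int i = 0; i < n; ++i) {
        maxErr = std::fmax(maxErr, std::fabs(hc[i] - (ha[i] + hb[i])));
    }
    return maxErr;
}

int main()
{
    std::printf("max abs error: %g\n", runVectorAdd(1 << 20));
    return 0;
}
```

With the NVIDIA toolkit installed, a file like this would typically be built with nvcc (for example, nvcc vector_add.cu -o vector_add, where the file name is an assumption). The equivalent CPU code is a single loop, which is the contrast the paragraph above describes.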

Course highlight
This course covers the essentials of GPGPU programming and focuses on developing structured programs and algorithms for efficient execution with CUDA.

What you will learn

This course covers both the theory and the practice of the following main topics:
  • GPGPU structure
  • CUDA overview
  • Kernel creation
  • Memory structures
  • Multithreading
  • Optimization

Who should attend

Engineers and researchers who wish to fully utilize the power of modern GPUs for time-critical, computationally intensive applications.


Prerequisites

Participants must be familiar with the C programming language. An understanding of parallel programming is beneficial.

Course methodology

This course is presented workshop-style, with lectures interleaved with demonstrations and hands-on practicals.

Course duration

3 days.

Course structure

  • Introduction
    • History of parallel and vector computing
    • Evolution of GPU
    • Overview of NVIDIA
  • Overview of CUDA
    • API introduction
    • Differences from ANSI C
    • Hardware abstraction
    • Compilation
  • Hands-on practical 1: Hello CUDA World
  • Kernel
    • Creation
    • Execution
  • Hands-on practical 2: PI Computation Kernel
  • Memory
    • Structures
    • Performance
    • Coalesced memory access
  • Hands-on practical 3: Matrix Multiplication
  • Multithreading
    • Overview
    • Inter-thread communication
    • Implementation
  • Hands-on practical 4: Threaded Matrix Multiplication
  • Optimization
    • Shared memory
    • Coalesced accesses
    • Loop unrolling
  • Hands-on practical 5: Optimizing Neural Networks
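
As a rough illustration of how several outline topics fit together (shared memory, coalesced accesses, inter-thread synchronization and loop unrolling), here is a minimal sketch of a tiled matrix multiplication kernel. It is not the course's practical material: the names matMulTiled and runMatMul, the tile width of 16 and the restriction to square matrices whose size is a multiple of the tile width are all assumptions made to keep the example short.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16   // assumed tile width; each block has TILE x TILE threads

// Computes C = A * B for square, row-major n x n matrices.
// Assumption: n is a multiple of TILE, so no bounds checks are needed.
__global__ void matMulTiled(const float *A, const float *B, float *C, int n)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        // Threads with consecutive threadIdx.x read consecutive addresses,
        // so both global loads are coalesced.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();   // the whole tile must be loaded before it is used

        #pragma unroll
        for (int k = 0; k < TILE; ++k) {
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        }
        __syncthreads();   // all threads finish before the tile is overwritten
    }
    C[row * n + col] = sum;
}

// Multiplies two n x n test matrices on the GPU and returns the largest
// absolute difference from a CPU reference, or -1 on a CUDA error.
float runMatMul(int n)
{
    std::vector<float> hA(n * n), hB(n * n), hC(n * n);
    for (int i = 0; i < n * n; ++i) {
        hA[i] = static_cast<float>(i % 7);   // small integers keep sums exact
        hB[i] = static_cast<float>(i % 5);
    }

    const size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    if (cudaMalloc((void **)&dA, bytes) != cudaSuccess ||
        cudaMalloc((void **)&dB, bytes) != cudaSuccess ||
        cudaMalloc((void **)&dC, bytes) != cudaSuccess) {
        return -1.0f;
    }
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid(n / TILE, n / TILE);
    matMulTiled<<<grid, block>>>(dA, dB, dC, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (err != cudaSuccess) {
        return -1.0f;
    }

    float maxErr = 0.0f;
    for (int r = 0; r < n; ++r) {
        for (int c = 0; c < n; ++c) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k) {
                ref += hA[r * n + k] * hB[k * n + c];
            }
            maxErr = std::fmax(maxErr, std::fabs(hC[r * n + c] - ref));
        }
    }
    return maxErr;
}

int main()
{
    std::printf("max abs error: %g\n", runMatMul(64));
    return 0;
}
```

Each block stages one tile of A and one tile of B in shared memory, so every global value is loaded once per block rather than once per thread. That reuse is the usual motivation for the tiling technique.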

Course Schedule

No public course currently scheduled.

  ProvenPac Sdn. Bhd.
  C-4-3 Gembira Park,
  Jalan Riang, 58200
  Kuala Lumpur, Malaysia

  Tel: +603 03 5889 5889