Chapter 8
Q: What is a GPU?
A: A GPU (Graphics Processing Unit) is a special type of processor designed to handle many tasks at the same time, especially ones involving images and video. GPUs were first used for graphics in games and displays, but they are now also used for things like AI, scientific research, and big data, because they can do thousands of small tasks in parallel and take load off the CPU.
Q: Why are GPUs good for parallel operations?
A: GPUs are great at doing many things at once because they have hundreds or even thousands of tiny processing units (called cores). These cores are designed to work together on many parts of a task all at the same time, which makes them perfect for jobs that need lots of simple calculations done quickly.
Q: What is the main architectural difference between a CPU and a GPU regarding cores?
A: CPUs have a small number of powerful cores designed to do complex tasks one at a time very quickly. GPUs, on the other hand, have many more smaller cores that are designed to do simple tasks all at once, making them better for parallel processing.
Q: How do modern GPUs (like NVIDIA’s) achieve high performance with many cores?
A: Modern GPUs reach high performance by using thousands of cores that can run many threads (mini programs) at once. They're especially good at doing math on large sets of data, like in 3D graphics, AI models, or scientific simulations.
Q: What is the basic idea behind GPU evolution for increased parallelism?
A: The main idea is to increase the number of simpler cores that can work on the same instruction across different data at once. This follows a model called SIMD (Single Instruction, Multiple Data), where the same command is run in parallel across lots of data pieces.
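To make the SIMD idea concrete, here is a minimal sketch (the function and kernel names are hypothetical): a sequential C loop, followed by the equivalent data-parallel OpenCL C kernel, where the loop disappears and each work-item applies the same instruction to its own element. OpenCL itself is covered later in this chapter.

    /* Sequential version: one CPU core visits each element in turn. */
    void scale_cpu(const float *x, float *y, int n) {
        for (int i = 0; i < n; i++)
            y[i] = 2.0f * x[i];
    }

    /* SIMD-style version (OpenCL C): many work-items each run this
       same instruction on a different element at the same time. */
    __kernel void scale(__global const float *x, __global float *y) {
        int i = get_global_id(0);
        y[i] = 2.0f * x[i];
    }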
Q: Besides graphics, what other field uses GPUs extensively?
A: GPUs are widely used in High-Performance Computing (HPC), including fields like weather prediction, physics simulations, medical imaging, machine learning, and cryptocurrency mining—anywhere you need a lot of fast, repeated calculations.
GPU Components & Architecture
Q: List 6 main components of a GPU.
A: Main GPU components include:
- Graphics Processor – the main engine doing the work.
- Frame Buffer – memory that holds the image shown on screen.
- Video Memory (VRAM) – fast memory used for processing data.
- Graphics BIOS – firmware that helps start and control the GPU.
- Display Connectors – ports to connect monitors (like HDMI, DisplayPort).
- Computer Connectors – the interface (like PCIe) that links the GPU to the computer.
Q: What are the key components of GPU cores, and how are they structured for efficiency?
A: Each GPU core has small parts like an ALU (for math), registers (for storing data), and sometimes shared fetch/decode units. These are grouped into Compute Units (CUs), and each CU has Processing Elements (PEs) that carry out the actual tasks. Sharing some components across cores helps save space and power.
GPU Memory Hierarchy
Q: Name the five main memory regions accessible from a single work-item on a GPU.
A: The five memory regions are:
- Registers – fastest, private memory for each thread.
- Local Memory – shared by threads in a group, very fast.
- Texture Memory – read-only memory optimized for graphics.
- Constant Memory – read-only, good for fixed data.
- Global Memory – biggest memory, accessible by both CPU and GPU, but slower.
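In OpenCL C, most of these regions have a matching address-space qualifier. A hedged sketch (hypothetical kernel, for illustration only) of how they appear in code; texture memory is the exception, since it is accessed through image types and samplers rather than a pointer qualifier:

    __kernel void regions(__global float *data,      /* global memory: large but slower */
                          __constant float *coeffs,  /* constant memory: read-only */
                          __local float *scratch) {  /* local memory: shared by the work-group */
        float tmp = data[get_global_id(0)];          /* 'tmp' lives in a register */
        scratch[get_local_id(0)] = tmp * coeffs[0];
        barrier(CLK_LOCAL_MEM_FENCE);                /* wait for the whole group */
        data[get_global_id(0)] = scratch[get_local_id(0)];
    }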
Q: What are Registers in the GPU memory hierarchy?
A: Registers are super-fast memory spaces used by individual threads (work-items). Each thread has its own, and they are used for storing temporary results and data being worked on right now.
Q: What is Global Memory on a GPU, and what are its characteristics?
A: Global Memory is the main memory of the GPU. It can store large amounts of data and is shared across all threads. It's slower than registers or local memory, but it has high bandwidth and can be accessed by both the GPU and CPU.
Q: What is Constant Memory, and what are its special properties?
A: Constant Memory is a small memory area that can only be read, not changed. It’s good for data that stays the same, and many threads can quickly read from it.
Q: What is Local Memory in the GPU hierarchy?
A: Local Memory is a small, fast memory that threads in the same group share. It helps them work together by quickly sharing data. It's much faster than Global Memory but has limited space.
Q: When is Texture Memory beneficial?
A: Texture Memory is helpful when your program reads data elements that sit next to each other, like neighboring pixels in an image. Its special caching and filtering hardware can speed up these kinds of reads and improve performance.
GPU Programming Concepts (General & OpenCL)
Q: What is the "host" and "device" in GPU computing?
A: In GPU computing, the host is your main computer and CPU, and the device is the GPU. The host controls everything and sends work to the GPU, which then does the heavy lifting.
Q: What is a "Work-Item" in GPU computing?
A: A Work-Item is one small task or thread running on the GPU. Many work-items do the same job at the same time but on different data.
Q: What is a "Work-Group"?
A: A Work-Group is a group of threads (work-items) that run together on the GPU. They can share fast memory and help each other to finish parts of the job.
Q: What is OpenCL?
A: OpenCL (Open Computing Language) is a framework that lets you write code that runs on different kinds of hardware—like CPUs, GPUs, or FPGAs—for parallel computing. It helps you make better use of all your devices.
Q: What is an OpenCL Kernel?
A: An OpenCL Kernel is a function written by the programmer that runs on the GPU or other devices. It gets run many times in parallel—once by each work-item—to process different parts of the data.
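A minimal example kernel (hypothetical name) that adds two vectors; each work-item executes this same function body once, on its own element:

    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *c) {
        int i = get_global_id(0);  /* unique index of this work-item */
        c[i] = a[i] + b[i];        /* same code, different data per work-item */
    }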
Q: When writing OpenCL kernels, why is it important to specify memory address spaces like __global or __local?
A: You need to tell OpenCL where the data is stored so it can handle it correctly. For example, __global means the data is in the GPU's main memory, while __local means it's in fast, shared memory. Using the right address space makes your code run faster.
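As a hedged sketch of why this matters (hypothetical kernel), a common pattern is to stage data from slow __global memory into fast __local memory once, then let the whole work-group reuse it:

    __kernel void sum_neighbors(__global const float *in,
                                __global float *out,
                                __local float *tile) {
        int gid = get_global_id(0);
        int lid = get_local_id(0);
        tile[lid] = in[gid];              /* one global read per work-item */
        barrier(CLK_LOCAL_MEM_FENCE);     /* wait until the tile is fully loaded */
        int n = get_local_size(0);
        /* neighbor values now come from fast local memory */
        float left  = (lid > 0)     ? tile[lid - 1] : 0.0f;
        float right = (lid < n - 1) ? tile[lid + 1] : 0.0f;
        out[gid] = left + tile[lid] + right;
    }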
Q: What is a "heterogeneous system" in the context of OpenCL?
A: A heterogeneous system has more than one kind of processing unit—like CPUs and GPUs—working together. OpenCL allows them to cooperate by sharing data and tasks.
Q: What does NDRange define in the OpenCL execution model?
A: NDRange defines the total number of work-items (threads) that will run. It sets up a grid (1D, 2D, or 3D) of work-items that each process different parts of the problem.
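For example, a hypothetical host-side launch of a 2D NDRange, assuming a command queue and kernel already exist: 1024 x 1024 work-items in total, grouped into 16 x 16 work-groups:

    #include <CL/cl.h>

    void launch(cl_command_queue queue, cl_kernel kernel) {
        size_t global[2] = {1024, 1024};  /* total work-items in each dimension */
        size_t local[2]  = {16, 16};      /* work-items per work-group */
        clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
                               global, local, 0, NULL, NULL);
    }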
Q: In an OpenCL kernel, how does an individual work-item typically determine what data to process?
A: Each work-item gets a unique ID using a function like get_global_id(). That ID tells the thread which part of the data it's supposed to handle.
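For instance, in a hypothetical 2D image kernel, each work-item combines its two IDs into the index of the one pixel it owns:

    __kernel void invert(__global const uchar *in,
                         __global uchar *out, int width) {
        int x = get_global_id(0);   /* this work-item's column */
        int y = get_global_id(1);   /* this work-item's row */
        int idx = y * width + x;    /* flatten 2D coordinates into a 1D index */
        out[idx] = 255 - in[idx];
    }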
Q: What is the general role of "Host Code" in OpenCL?
A: Host code runs on the CPU and manages everything. It prepares the GPU code, sends data to the GPU, and tells the GPU when to start working.
Q: List 3 key steps the host code must perform to execute an OpenCL kernel.
A:
- Find the GPU and create an OpenCL environment (context).
- Compile the kernel code and prepare it to run.
- Move data to the GPU, set up the kernel's inputs, and tell the GPU to run it (see the sketch below).
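A condensed, hypothetical host-code sketch of these three steps (error checking omitted for brevity; names are illustrative; clCreateCommandQueue is the classic OpenCL 1.x call, replaced by clCreateCommandQueueWithProperties in OpenCL 2.0+):

    #include <CL/cl.h>

    int run_vadd(const float *a, const float *b, float *c, size_t n) {
        const char *src =
            "__kernel void vadd(__global const float *a,"
            "                   __global const float *b,"
            "                   __global float *c) {"
            "  int i = get_global_id(0);"
            "  c[i] = a[i] + b[i];"
            "}";

        /* Step 1: find the GPU and create a context (plus a command queue). */
        cl_platform_id platform; cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);

        /* Step 2: compile the kernel code. */
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "vadd", NULL);

        /* Step 3: move data over, set the kernel's inputs, and launch. */
        size_t bytes = n * sizeof(float);
        cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY, bytes, NULL, NULL);
        cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY, bytes, NULL, NULL);
        cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, NULL, NULL);
        clEnqueueWriteBuffer(q, da, CL_TRUE, 0, bytes, a, 0, NULL, NULL);
        clEnqueueWriteBuffer(q, db, CL_TRUE, 0, bytes, b, 0, NULL, NULL);
        clSetKernelArg(k, 0, sizeof(cl_mem), &da);
        clSetKernelArg(k, 1, sizeof(cl_mem), &db);
        clSetKernelArg(k, 2, sizeof(cl_mem), &dc);
        clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, dc, CL_TRUE, 0, bytes, c, 0, NULL, NULL);
        return 0;
    }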
Q: What is "occupancy" on a GPU, and why is it important?
A: Occupancy is a measure of how many threads are actively running on the GPU compared to the maximum it can handle. Higher occupancy usually means better performance, as it helps keep the GPU busy while some threads wait for memory access.
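For example (illustrative numbers only): if a compute unit can keep 2,048 threads resident but a kernel's heavy register usage limits it to 1,024 resident threads, occupancy is 1,024 / 2,048 = 50%, leaving less headroom for hiding memory latency.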
Specific OpenCL Functions
Q: Which OpenCL function is used to discover available devices?
A: clGetDeviceIDs()
is used to find all the OpenCL-compatible devices (like GPUs or CPUs) on your computer that match what you're looking for.
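A minimal usage sketch (hypothetical helper, no error checking): take the first platform, then ask it for one GPU device:

    #include <CL/cl.h>

    cl_device_id first_gpu(void) {
        cl_platform_id platform;
        cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);  /* first available platform */
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        return device;
    }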
Q: Which OpenCL function is used to get detailed information about a device?
A: clGetDeviceInfo()
lets you ask about a device’s details—like its name, how much memory it has, how many threads it can run, and more.
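For example, a hypothetical helper that queries a device's name and number of compute units:

    #include <CL/cl.h>
    #include <stdio.h>

    void print_device_info(cl_device_id device) {
        char name[256];
        cl_uint units;
        clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
        clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                        sizeof(units), &units, NULL);
        printf("%s: %u compute units\n", name, units);
    }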
Q: What is the purpose of clCreateContext()?
A: clCreateContext() sets up an OpenCL environment where devices, memory, and other resources are managed. You need this before doing any GPU work.
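A minimal sketch, assuming a device has already been found (hypothetical helper, error handling trimmed):

    #include <CL/cl.h>

    cl_context make_context(cl_device_id device) {
        cl_int err;
        cl_context ctx = clCreateContext(NULL, 1, &device,
                                         NULL, NULL, &err);  /* no error callback */
        return (err == CL_SUCCESS) ? ctx : NULL;
    }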
Q: What does clBuildProgram() do?
A: clBuildProgram() compiles your kernel code (written in OpenCL C) so the GPU can understand and run it. It can compile for one or multiple devices at the same time.
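A hedged sketch of typical usage (hypothetical helper): create a program from source, build it, and print the compiler's build log if the build fails:

    #include <CL/cl.h>
    #include <stdio.h>

    cl_program build(cl_context ctx, cl_device_id dev, const char *src) {
        cl_int err;
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
        if (clBuildProgram(prog, 1, &dev, NULL, NULL, NULL) != CL_SUCCESS) {
            char log[4096];
            clGetProgramBuildInfo(prog, dev, CL_PROGRAM_BUILD_LOG,
                                  sizeof(log), log, NULL);
            fprintf(stderr, "build failed:\n%s\n", log);
        }
        return prog;
    }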
Q: How is data typically transferred from host memory to device memory in OpenCL?
A: Data is sent from the CPU to the GPU using functions like clEnqueueWriteBuffer(). This command copies your data into the GPU’s memory so it can be used in the kernel.
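A small sketch (hypothetical helper): copy n floats from host memory into an existing device buffer, blocking until the copy finishes:

    #include <CL/cl.h>

    void upload(cl_command_queue queue, cl_mem buffer,
                const float *host, size_t n) {
        /* CL_TRUE = blocking: return only after the copy completes */
        clEnqueueWriteBuffer(queue, buffer, CL_TRUE, 0,
                             n * sizeof(float), host, 0, NULL, NULL);
    }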