GPUs are omnipresent, ranging from smartphones and tablets to server-class cards with up to 5,760 cores per chip.
CPUs have powerful arithmetic logic units (ALUs) and large caches. A serial task gets executed really fast on a CPU.
GPUs, in contrast, have much less powerful cores (roughly 1/3 of a CPU's clock speed, with smaller caches) but have far more of them,
so they can still deliver very high throughput thanks to SIMT (single instruction, multiple threads).
Threads are grouped for execution on GPUs.
A group of threads, known as a warp (usually 32 threads), executes the same instruction, so instruction fetch time from RAM to the processor cache is reduced.
The threads in a warp execute that same instruction on different data.
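The warp model can be sketched on the CPU in plain Java. All names here are invented for illustration; the point is that every lane in a warp runs the same instruction, and only the index (hence the data) differs:

```java
public class WarpSketch {
    static final int WARP_SIZE = 32;   // typical warp size on NVIDIA GPUs

    // One "warp": up to 32 lanes execute the same instruction on different data.
    static void warpAdd(float[] a, float[] b, float[] c, int warpStart) {
        for (int lane = 0; lane < WARP_SIZE && warpStart + lane < a.length; lane++) {
            int i = warpStart + lane;
            c[i] = a[i] + b[i];        // same instruction, different element per lane
        }
    }

    static float[] add(float[] a, float[] b) {
        float[] c = new float[a.length];
        for (int warpStart = 0; warpStart < a.length; warpStart += WARP_SIZE)
            warpAdd(a, b, c, warpStart);  // warps scheduled one after another here
        return c;
    }

    public static void main(String[] args) {
        float[] x = new float[40], y = new float[40];
        for (int i = 0; i < 40; i++) { x[i] = i; y[i] = 2 * i; }
        System.out.println(java.util.Arrays.toString(add(x, y)));
    }
}
```

On a real GPU the lanes of a warp run in lockstep in hardware rather than in a loop, which is why the instruction only has to be fetched once per warp.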
Types of GPUs
- Discrete GPUs that do not share anything with CPU RAM (the common case)
For processing, you need to transfer data from CPU RAM to GPU RAM over the PCIe bus
You also transfer the compiled code to GPU RAM and then wait for execution to finish
- Heterogeneous System Architecture (HSA) based SoCs
Share RAM with the CPU
No need to transfer data
JVM-based code can benefit, since data does not need to be serialized into a separate RAM
The GPU can access all Java objects directly
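The two memory models can be contrasted in a small Java sketch. The method names are made-up stand-ins (real code would use CUDA or OpenCL transfer calls); what matters is which copies exist on each path:

```java
import java.util.Arrays;

public class GpuMemoryModels {
    // Discrete GPU: data must cross PCIe into GPU RAM and back.
    // The two copyOf calls stand in for cudaMemcpy-style transfers.
    static float[] runDiscrete(float[] host) {
        float[] device = Arrays.copyOf(host, host.length); // 1. host -> device over PCIe
        for (int i = 0; i < device.length; i++)            // 2. kernel runs in GPU RAM
            device[i] *= 2f;
        return Arrays.copyOf(device, device.length);       // 3. device -> host over PCIe
    }

    // HSA SoC: the GPU addresses the same RAM as the JVM, so the kernel
    // works on the Java array in place -- no copies, no serialization.
    static void runHsa(float[] shared) {
        for (int i = 0; i < shared.length; i++) shared[i] *= 2f;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        System.out.println(Arrays.toString(runDiscrete(a))); // a itself is untouched
        runHsa(a);                                           // a is doubled in place
        System.out.println(Arrays.toString(a));
    }
}
```

For small workloads the PCIe copies on the discrete path can cost more than the computation itself, which is what makes the HSA path attractive for JVM code.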
Programming Environments & IDEs
CUDA (C/C++) for NVIDIA GPUs
OpenCL (C/C++) for all GPUs
JavaCL/ScalaCL as native wrappers; Aparapi; Rootbeer
JDK 9 lambdas (Project Sumatra) can execute on HSA-based GPUs
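The shape of code such a runtime targets is a side-effect-free lambda mapped over an index range via a parallel stream. This sketch runs on the CPU with a plain JDK; on HSA hardware a Sumatra-style runtime could offload the same shape to the GPU:

```java
import java.util.stream.IntStream;

public class LambdaOffload {
    // A data-parallel candidate for GPU offload: one pure lambda,
    // applied to every index independently.
    static int[] squares(int n) {
        return IntStream.range(0, n)
                        .parallel()          // here: CPU threads; on HSA: GPU threads
                        .map(i -> i * i)     // same lambda, different i per thread
                        .toArray();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(squares(5)));
    }
}
```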
GPUs in the world of Big Data
GPU-based ML packages
12 GPUs ≈ 60,000 cores
GPU-based databases like gpudb (Kinetica), SQream, and MapD
Aparapi-based APIs to develop Spark closures
Aparapi converts Java code to OpenCL and runs it on GPUs
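Aparapi's programming model: you subclass Kernel, put the per-thread body in run(), and Aparapi translates that method's bytecode to OpenCL at runtime. To keep this sketch self-contained (no Aparapi jar), a tiny stand-in Kernel class is defined here; the real com.aparapi.Kernel exposes a similar run()/getGlobalId()/execute() shape:

```java
public class AparapiSketch {
    // Minimal stand-in for Aparapi's Kernel so the sketch runs without the
    // library; the real class converts run()'s bytecode to OpenCL.
    static abstract class Kernel {
        private int globalId;
        protected int getGlobalId() { return globalId; }
        public abstract void run();
        public void execute(int range) {            // real Aparapi dispatches to the GPU
            for (globalId = 0; globalId < range; globalId++) run();
        }
    }

    static float[] multiply(float[] a, float[] b) {
        final float[] result = new float[a.length];
        Kernel kernel = new Kernel() {
            @Override public void run() {           // per-thread body
                int i = getGlobalId();
                result[i] = a[i] * b[i];
            }
        };
        kernel.execute(result.length);
        return result;
    }

    public static void main(String[] args) {
        float[] out = multiply(new float[]{1f, 2f, 3f}, new float[]{4f, 5f, 6f});
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

The same pattern is what Aparapi-based Spark APIs wrap: the body of a closure becomes the kernel's run() method, executed once per element of the partition.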
InfoObjects is working with a number of clients on GPU-based Spark implementations. Please contact us for more details.