GPUs are everywhere, from smartphones and tablets to server-class GPUs with up to 5760 cores per chip.

CPUs have powerful arithmetic logic units (ALUs) and large caches, so a serial task executes very quickly on a CPU.

GPUs, in contrast, have much less powerful cores (about 1/3 of the clock speed, and smaller caches)
but have a far larger number of them.
They can still deliver very high throughput thanks to SIMT (single instruction, multiple threads).
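
As a rough back-of-the-envelope illustration of this trade-off, the sketch below estimates peak throughput as cores × clock rate. The 5760-core count and the 1/3 clock ratio come from the text above; the 8-core CPU and its 3 GHz clock are assumed values chosen only for illustration. The model ignores caches, memory bandwidth, and how many operations a core really completes per cycle.

```python
def peak_ops_per_second(cores: int, clock_hz: float) -> float:
    """Idealized peak: every core completes one operation per clock cycle."""
    return cores * clock_hz

CPU_CORES = 8                        # assumption, for illustration only
CPU_CLOCK_HZ = 3.0e9                 # assumption, for illustration only
GPU_CORES = 5760                     # upper figure quoted in the text
GPU_CLOCK_HZ = CPU_CLOCK_HZ / 3      # "about 1/3 of the clock speed"

cpu_peak = peak_ops_per_second(CPU_CORES, CPU_CLOCK_HZ)
gpu_peak = peak_ops_per_second(GPU_CORES, GPU_CLOCK_HZ)
ratio = gpu_peak / cpu_peak

print(f"CPU peak: {cpu_peak:.2e} ops/s")
print(f"GPU peak: {gpu_peak:.2e} ops/s")
print(f"GPU/CPU ratio: {ratio:.0f}x")
```

Under these assumptions the slower cores are outweighed by their sheer number, but only for work that can actually be spread across thousands of threads.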

SIMT

Threads are grouped for execution on GPUs.
A group of threads is known as a warp (usually 32 threads). All threads in a warp execute the same instruction, so the instruction is fetched once for the whole group, which reduces instruction-fetch time from RAM to the processor cache.
Each thread executes that same instruction on a different set of data.
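
The lockstep idea can be sketched with a toy model in plain Python. This is not GPU code: `run_warp` and the lambda "instructions" are hypothetical names used only to show one instruction fetch being shared by every thread in a warp, each working on its own data element.

```python
WARP_SIZE = 32  # typical warp size mentioned in the text

def run_warp(instructions, lanes):
    """Apply each instruction to every lane (thread) of a warp in lockstep.

    Returns the final per-lane values and the number of instruction fetches.
    """
    fetches = 0
    for instr in instructions:
        fetches += 1                       # one fetch serves the whole warp
        lanes = [instr(x) for x in lanes]  # same instruction, different data per thread
    return lanes, fetches

# A two-instruction "program": multiply by 2, then add 1.
program = [lambda x: x * 2, lambda x: x + 1]
data = list(range(WARP_SIZE))              # one data element per thread

result, warp_fetches = run_warp(program, data)

# If every thread fetched its own instructions, the count would scale with the warp size.
per_thread_fetches = len(program) * WARP_SIZE

print(result[:4])
print(warp_fetches, per_thread_fetches)
```

In this model the warp needs 2 fetches where 32 independent threads would need 64.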

Types of GPUs

  • GPUs with their own RAM (the common case)
    Share nothing with the CPU RAM
    For processing, you need to transfer data from CPU RAM to GPU RAM over the PCIe bus
    Transfer the compiled code to GPU RAM as well, then wait for execution to finish

  • Heterogeneous System Architecture (HSA) based SoCs
    Share RAM with the CPU
    No need to transfer data
    JVM-based code can benefit, since data does not need to be serialized into another RAM
    The GPU can access all Java objects
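
The difference between the two types can be sketched with a simple cost model. With separate GPU RAM, the total time includes copying the input over PCIe and copying the results back; with shared RAM, those copy terms disappear. The function names and all numbers below (PCIe bandwidth, data sizes, kernel time) are assumptions for illustration, not measurements, and the model assumes the kernel itself runs equally fast in both cases.

```python
GIB = 1024 ** 3

def discrete_gpu_time(bytes_in, bytes_out, pcie_bytes_per_s, kernel_s):
    """Copy input to GPU RAM, run the kernel, then copy the results back."""
    copy_in_s = bytes_in / pcie_bytes_per_s
    copy_out_s = bytes_out / pcie_bytes_per_s
    return copy_in_s + kernel_s + copy_out_s

def shared_memory_time(kernel_s):
    """CPU and GPU share RAM, so only the kernel time remains."""
    return kernel_s

# Hypothetical workload: 4 GiB in, 1 GiB out, 8 GiB/s over PCIe, 0.25 s of compute.
t_discrete = discrete_gpu_time(
    bytes_in=4 * GIB, bytes_out=1 * GIB, pcie_bytes_per_s=8 * GIB, kernel_s=0.25
)
t_shared = shared_memory_time(kernel_s=0.25)

print(f"separate GPU RAM: {t_discrete:.3f} s")
print(f"shared RAM:       {t_shared:.3f} s")
```

With these made-up numbers, the transfers take longer than the computation itself, which is why small or short-lived jobs often do not pay off on a GPU with separate RAM.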

Programming Environments & IDEs
CUDA using C/C++ for NVIDIA GPUs
OpenCL using C/C++ (all GPUs)
JavaCL/ScalaCL as native wrappers, Aparapi, Rootbeer
JDK 1.9 lambdas were targeted to execute on HSA-based GPUs (OpenJDK Project Sumatra), but this did not ship in the JDK 9 release

GPUs in the world of Big Data
GPU-based ML packages
Analytic DBs such as GPUdb (Kinetica), SQream, and MapD
12 GPUs ≈ 60k cores
Deep Learning
Image Classification
Speech Recognition
Genomics, DNA
SparkCL
Aparapi-based APIs for developing Spark closures
Aparapi converts Java code to OpenCL and runs it on GPUs

InfoObjects is working with a bunch of clients for Spark implementations based on GPUs. Please contact us for more details.