Compute Landscape

The Big Data compute landscape has been more volatile than its storage counterpart. The reason is simple: speed. While MapReduce, the pioneering compute technology in Big Data, paved the way by doing what was previously impossible, it soon became inadequate for users' needs.

MapReduce made batch computation on terabytes and petabytes of data possible, but that strength became its weakness. Users wanted to do much more with Big Data than batch processing, including:

  • Real-time Analytics
  • Streaming Analytics
  • Query Processing
  • Graph Processing
  • Machine Learning

The market responded to these needs with different solutions. Twitter Storm became very popular for handling streams. Mahout became a well-known library for machine learning. Giraph emerged for graph processing. Real-time query processing was addressed by a variety of tools, most of them based more or less on Google's Dremel paper:

  • Cloudera Impala
  • Apache Drill
  • Hortonworks Stinger

Though there are well-funded companies behind these initiatives, they all remain open source projects. Apache Spark arrived to disrupt this landscape and provide a single solution for all of these needs, and it has become very popular. Like the other technologies it is still a work in progress, but it is already leaving every other technology behind.
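To give a flavor of what that unified model looks like in practice, below is a minimal Scala sketch of a Spark batch job, a simple word count expressed as RDD transformations. The same RDD abstraction also feeds Spark SQL, Spark Streaming, MLlib, and GraphX, which is what lets one engine cover these workloads. The input path and object name are illustrative assumptions, not taken from any particular project.

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountSketch {
      def main(args: Array[String]): Unit = {
        // Local-mode configuration, for illustration only
        val conf = new SparkConf()
          .setAppName("word-count-sketch")
          .setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Classic batch word count as a chain of RDD transformations;
        // "data/sample.txt" is a hypothetical input file
        val counts = sc.textFile("data/sample.txt")
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        // Print a few results, then shut down the context
        counts.take(10).foreach(println)
        sc.stop()
      }
    }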

We at InfoObjects have deep expertise in every aspect of the Big Data compute landscape. The landscape is changing fast, but we are keeping pace with it every step of the way. Please contact us at bigdata@infoobjects.com to discuss your compute needs and how we can help.