The Juggernaut of Public Cloud

For a long time, I was not convinced about the power of the public cloud. Naturally! Like many others, I thought it was a sideshow, one that would mostly cater to startups and some medium-sized companies. However, I discovered I could not have been further from the truth. The cloud, from the very …

2016 - The Year of Fast Data

A lot of technologies change so fast that the names given to them sometimes become misnomers. Big data is one such technology: it is no longer so much big as fast. Most enterprises do not have petabytes of data, but they do have data that moves very fast. In other words, out of volume, velocity, …

Streaming-first In-Memory Data Warehouse

Overview

Big Data has reached enough maturity that it is ready to disrupt the enterprise software industry. The first segment it is going to disrupt is enterprise data warehousing (EDW). EDW technologies came to the fore to separate analytical loads from transactional loads. Since memory was expensive until recently, transforming …

Integrating Enterprise Data with Big Data

EMC may not be successful in its big data strategy, but one thing it has definitely succeeded at is coining the term 'Data Lake'. As the big data movement evolves, it looks more and more like a lake. Gartner, in its most recent Hype Cycle, threw big data out, and that created some FUD …

Is data locality really a virtue?

Hadoop started with data locality as one of its primary features. Computation happens on the node where the data is stored, which reduces the amount of data that needs to be shuffled over the network. Since every commodity machine has some basic compute power, you do not need specialized hardware, and this brings the cost down to a fraction …
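To make the idea concrete, here is a toy sketch of locality-aware task placement in Python. It is not Hadoop's actual scheduler: `assign_tasks`, the block and node names, and the one-task-per-slot model are hypothetical simplifications. Each task prefers a node that already stores a replica of its input block and only falls back to another node (which then has to pull the block over the network) when no local slot is free.

```python
from typing import Dict, List, Tuple


def assign_tasks(block_locations: Dict[str, List[str]],
                 free_slots: Dict[str, int]) -> Tuple[Dict[str, str], int]:
    """Toy, hypothetical locality-aware placement (not Hadoop's scheduler).

    block_locations maps each input block to the nodes holding a replica;
    free_slots maps each node to how many more tasks it can run.
    Returns (block -> chosen node, number of remote reads).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    placement: Dict[str, str] = {}
    remote_reads = 0
    for block, replicas in sorted(block_locations.items()):
        # Prefer a node that already stores this block (data-local).
        node = next((n for n in replicas if slots.get(n, 0) > 0), None)
        if node is None:
            # No local slot: run anywhere with capacity and read remotely.
            node = next((n for n, s in sorted(slots.items()) if s > 0), None)
            if node is None:
                raise RuntimeError("no free slots left")
            remote_reads += 1
        slots[node] -= 1
        placement[block] = node
    return placement, remote_reads
```

With `b1` on `n1`/`n2`, `b2` only on `n1`, `b3` on `n3`, and one free slot per node, this single greedy pass puts `b1` on `n1`, which forces `b2` to run on `n2` and fetch its block from `n1` over the network; placing `b1` on `n2` would have kept every read local. Production schedulers use more elaborate policies than one greedy pass.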

Spark: JDBC Using DataFrames

From Spark 1.3 onwards, JdbcRDD is no longer recommended, as DataFrames support loading data over JDBC. Let us look at a simple example in this recipe. Using JdbcRDD with Spark is slightly confusing, so I thought I would put together a simple use case to explain the functionality. Most probably you'll use it with spark-submit, but I have put …
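A minimal sketch of the DataFrame route, written in PySpark for brevity (the recipe itself may well use Scala). It assumes the `read.format("jdbc")` reader API, which arrived in Spark 1.4 (Spark 1.3 exposed the same source through `sqlContext.load("jdbc", ...)`). The helper names `jdbc_read_options` and `load_jdbc_table` are hypothetical, and the connection details in the usage comment are placeholders.

```python
from typing import Dict, Optional


def jdbc_read_options(url: str, table: str,
                      user: Optional[str] = None,
                      password: Optional[str] = None,
                      driver: Optional[str] = None) -> Dict[str, str]:
    """Build the option map for Spark's JDBC data source (hypothetical helper).

    "url", "dbtable", "user", "password" and "driver" are standard Spark
    JDBC option names; the optional ones are only set when provided.
    """
    options = {"url": url, "dbtable": table}
    for key, value in (("user", user), ("password", password), ("driver", driver)):
        if value is not None:
            options[key] = value
    return options


def load_jdbc_table(spark, url, table, user=None, password=None, driver=None):
    """Load one database table as a DataFrame via the DataFrameReader API.

    `spark` is a SparkSession (Spark 2.x+) or a SQLContext (Spark 1.4+);
    both expose `.read`. Rows are read lazily, when an action such as
    show() or count() runs on the returned DataFrame.
    """
    options = jdbc_read_options(url, table, user, password, driver)
    return spark.read.format("jdbc").options(**options).load()


# Example usage (needs pyspark plus the database's JDBC driver on the
# classpath; URL, table and credentials below are placeholders):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("jdbc-demo").getOrCreate()
#   df = load_jdbc_table(spark, "jdbc:mysql://localhost:3306/testdb",
#                        "person", user="user", password="secret")
#   df.show()
```

Keeping the option map in its own function confines the Spark-specific call to a single line, and the map itself can be checked without a cluster or a database.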

