Spark: Calculating Correlation Using RDD of Vectors

Correlation is a relationship between two variables such that when one changes, the other also changes. Its strength is measured on a scale from 0 to 1: 0 means there is no correlation at all, while 1 means perfect correlation, i.e. if the first variable doubles, the second...
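As a rough illustration of the idea in this excerpt, here is a minimal sketch of the Pearson correlation coefficient in plain Python. It does not use Spark's API, and the function name `pearson` and the sample data are made up for this example. Pearson's coefficient also carries a sign (it ranges from -1 to 1), so the 0-to-1 scale corresponds to its absolute value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: the second variable is always double the first.
first = [1.0, 2.0, 3.0, 4.0]
second = [2.0 * v for v in first]
r = pearson(first, second)
print(round(r, 6))
```

When the second variable is an exact positive multiple of the first, as here, the coefficient comes out at 1 up to floating-point rounding; reversing the order of one series flips the sign.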
Spark: Connecting to Amazon EC2

The Apache Spark installation comes bundled with the spark-ec2 script, which makes it easy to create Spark instances on EC2. This recipe covers connecting to EC2 using this script. Log in to your Amazon AWS account. Click on Security Credentials under your account name in...
Spark: Connecting To A JDBC Data-Source Using Dataframes

Until now, JdbcRDD has been the standard way to connect Spark to a relational data source. From Spark 1.4 onwards, a built-in data source is available for connecting to a JDBC source using DataFrames. DataFrame: Spark introduced DataFrames in version 1.3 and enriched...