For Spark 1.3 onward, JdbcRDD is not recommended as DataFrames have support to load JDBC.

Let us look at a simple example in this recipe.

Using JdbcRDD with Spark is slightly confusing, so I thought about putting a simple use case to explain the functionality. Most probably you’ll use it with spark-submit but I have put it here in spark-shell to illustrate easily.

Database Preparation

We are going to load a person table in MySQL database. Here’s DDL:

CREATE TABLE `person` (
`person_id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(30) DEFAULT NULL,
`last_name` varchar(30) DEFAULT NULL,
`gender` char(1) DEFAULT NULL,
PRIMARY KEY (`person_id`)
)

The following is an example of some data which is loaded:

 

first_name last_name gender
Barack Obama M
Bill Clinton M
Hillary Clinton F

 

I am a fan of oversimplification when it comes to examples, as you can see.

JDBC Driver

We have to make the MySQL JDBC driver available to spark-shell. I am using:

mysql-connector-java-5.1.34-bin.jar

You can download it from http://dev.mysql.com/downloads/connector/j/ if it’s not already available.

Get Hands dirty on spark-shell

Load Spark shell with MySQL driver and launch it.

$spark-shell --driver-class-path /path-to-mysql-jar/mysql-connector-java-*.*.**-bin.jar --jars /path-to-mysql-jar/mysql-connector-java-*.*.**-bin.jar

Load DataFrame

scala> val personDF = sqlContext.load("jdbc",Map("url" -> "jdbc:mysql://localhost:3306/hadoopdb?user=hduser&password=******","dbtable" -> "person"))

Now query the results:

scala> personDF.foreach(println)

Write it to HDFS:

scala> personDF.saveAsTextFile("person")

InfoObjects Spark Sandbox comes fully-loaded with Hadoop, Spark and more than 10 other ecosystem technologies. Please email bigdata@infoobjects.com to request download access to the Sandbox.

Top