Please note that SchemaRDD in Spark 1.2 has been replaced by DataFrame in Spark 1.3.

SQLContext can be used to load data stored in JSON and Parquet formats, for example:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> import sqlContext._
scala> val parquetFile = sqlContext.parquetFile("hdfs://localhost:9000/user/hduser/people.parquet")
scala> val jsonFile = sqlContext.jsonFile("hdfs://localhost:9000/user/hduser/people.json")
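
Once loaded, both SchemaRDDs can be registered as temporary tables and queried with plain SQL. The sketch below assumes the people files contain first_name and last_name columns; the table names are just placeholders.

scala> parquetFile.registerTempTable("people_parquet")
scala> jsonFile.registerTempTable("people_json")
scala> sqlContext.sql("select first_name, last_name from people_parquet").collect().foreach(println)
scala> sqlContext.sql("select first_name, last_name from people_json").collect().foreach(println)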

That covers loading data from external storage, but what about saving it in the format you desire? SchemaRDD lets you do exactly that. Saving to Parquet was already supported via the saveAsParquetFile method, and now saving to JSON is possible as well via the toJSON method. Let's walk through a small example.

scala> case class Person(first_name:String, last_name:String, gender:String)
scala> val p = sc.textFile("hdfs://localhost:9000/user/hduser/person").map(_.split("\t")).map(e => Person(e(0),e(1),e(2)))
scala> p.registerTempTable("person")
scala> val jsonRDD = p.toJSON
scala> p.saveAsParquetFile("hdfs://localhost:9000/user/hduser/people.parquet")
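
Note that toJSON returns an RDD of JSON strings rather than writing anything out, so to persist the JSON you still need to save that RDD yourself. The sketch below, using an illustrative HDFS path, saves the JSON strings as a text file and reloads them with jsonFile to verify the schema.

scala> jsonRDD.saveAsTextFile("hdfs://localhost:9000/user/hduser/person.json")
scala> val reloaded = sqlContext.jsonFile("hdfs://localhost:9000/user/hduser/person.json")
scala> reloaded.printSchema()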