Reading an HBase table from Spark

In this post, we will show how to read an HBase table from Spark with the help of the Hortonworks Spark-HBase Connector (SHC).

We need to define a catalog for each HBase table. The catalog is in JSON format and defines the mapping between the HBase columns and the table schema.

E.g. we have an HBase table called 'video-creator' that we want to access through Spark.

We launch the Spark shell with the Hortonworks SHC package:

spark-shell --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/
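
Depending on the cluster setup, the HBase client configuration may also need to be available to the shell. A common way to provide it (the path below is an assumption; adjust it to your installation) is to pass hbase-site.xml with --files:

spark-shell --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml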

And import the following:

import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._

Now we can define the catalog, which describes the HBase table in JSON format:

def catalog =
  s"""{
     |"table":{"namespace":"default", "name":"video-creator"},
     |"rowkey":"key",
     |"columns":{
     |"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
     |"vidcrt":{"cf":"vidcrt", "col":"creator_id", "type":"string"}
     |}
     |}""".stripMargin

Here the table name is 'video-creator', the column family is 'vidcrt', and the column is 'creator_id'. The special 'rowkey' entry maps the HBase row key to a schema column named 'rowkey'.
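
As a side note, if the table does not exist yet, the same catalog can also be used to create and populate it from Spark. A minimal sketch with made-up sample rows (HBaseTableCatalog.newTable sets the number of regions for the new table):

// Hypothetical sample rows; field names must match the catalog schema
case class VideoCreator(rowkey: String, vidcrt: String)

val sampleDF = Seq(
  VideoCreator("row1", "creator-001"),
  VideoCreator("row2", "creator-002")
).toDF()  // spark.implicits._ is imported by default in spark-shell

sampleDF.write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()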

Then we define a function that builds a DataFrame on top of the HBase table:

def withCatalog(cat: String): DataFrame = {
  spark.sqlContext
    .read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
}

We create the DataFrame by passing the catalog to this function:

val df_video_creator = withCatalog(catalog)
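
The result can be queried like any other Spark DataFrame. A short sketch (column names follow the catalog above; the row key value 'row1' is just a made-up example):

// Inspect the schema derived from the catalog
df_video_creator.printSchema()

// Show the table contents
df_video_creator.show()

// Register a temp view and query it with Spark SQL
df_video_creator.createOrReplaceTempView("video_creator")
spark.sql("SELECT rowkey, vidcrt FROM video_creator WHERE rowkey = 'row1'").show()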

Running this in the Spark shell gives us a DataFrame over the HBase table. Here I am using Spark 2.1.1, HBase 1.1.2 and Scala 2.11.

