Wednesday, October 1, 2014

Import data from HDFS to HBase


There are two ways to import data directly from HDFS into HBase:

1. By running a MapReduce program in Eclipse.

1) Create a new Java project whose classpath (the Eclipse .classpath file) is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<classpath>
  <classpathentry kind="src" path="src"/>
  <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/hbase-0.94.5.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/lib/commons-cli-1.2.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/lib/commons-logging-1.1.1.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/lib/commons-configuration-1.6.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/lib/log4j-1.2.16.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/lib/zookeeper-3.4.5.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hadoop-1.0.4/hadoop-core-1.0.4.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hadoop-1.0.4/lib/commons-lang-2.4.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/lib/slf4j-log4j12-1.4.3.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/lib/slf4j-api-1.4.3.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/lib/protobuf-java-2.4.0a.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/lib/jackson-mapper-asl-1.8.8.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/lib/jackson-core-asl-1.8.8.jar"/>
  <classpathentry kind="lib" path="/home/hadoop/hbase-0.94.5/lib/commons-httpclient-3.1.jar"/>
  <classpathentry kind="output" path="bin"/>
</classpath>

2) Set the program argument to the location (URI) of the input file on HDFS, for example:

hdfs://master:54310/home/input/weatherData

or, with a fully qualified host name:

hdfs://cssec164.nda.ac.jp:54310/home/input/weatherData
             
3) Run the program. The specified files are then loaded into a table in the HBase cluster.
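
Because the program receives the input location as its only argument, it can help to validate that argument before the job starts. The following is a minimal sketch that uses only java.net.URI from the JDK. The class name InputUriCheck and the method parseHdfsUri are hypothetical and are not part of the original program.

```java
import java.net.URI;

public class InputUriCheck {

    /** Parses the argument and rejects anything that is not an hdfs:// URI with a path. */
    static URI parseHdfsUri(String arg) {
        // URI.create throws IllegalArgumentException on malformed input.
        URI uri = URI.create(arg.trim());
        if (!"hdfs".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Expected an hdfs:// URI but got: " + arg);
        }
        if (uri.getPath() == null || uri.getPath().isEmpty()) {
            throw new IllegalArgumentException("URI has no path: " + arg);
        }
        return uri;
    }

    public static void main(String[] args) {
        // Falls back to the example URI from this post when no argument is given.
        String arg = args.length > 0 ? args[0] : "hdfs://master:54310/home/input/weatherData";
        URI uri = parseHdfsUri(arg);
        System.out.println("NameNode host: " + uri.getHost());
        System.out.println("NameNode port: " + uri.getPort());
        System.out.println("Input path:    " + uri.getPath());
    }
}
```

A plain local path such as /home/input/weatherData is rejected by this check, which makes a missing hdfs:// prefix visible before the job is submitted.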

2. By running the MapReduce program from the command line.

1) After making sure that the program runs correctly in Eclipse, copy the class files from the ProjectName/bin directory to a location on the Linux machine.

2) Build a jar file from the class files. Before doing so, create a text file that will serve as the manifest of the jar file. This text file contains the name of the program's main class and the classpath, as follows (hbaseClassPath.txt):

Manifest-Version: 1.0
Main-Class: temperatureData.HBaseTemperatureImporter
Class-Path: /home/hadoop/hbase-0.94.5/hbase-0.94.5.jar /home/hadoop/hb
 ase-0.94.5/lib/commons-cli-1.2.jar /home/hadoop/hbase-0.94.5/lib/comm
 ons-logging-1.1.1.jar /home/hadoop/hbase-0.94.5/lib/commons-configura
 tion-1.6.jar /home/hadoop/hbase-0.94.5/lib/log4j-1.2.16.jar /home/had
 oop/hbase-0.94.5/lib/zookeeper-3.4.5.jar /home/hadoop/hadoop-1.0.4/ha
 doop-core-1.0.4.jar /home/hadoop/hadoop-1.0.4/lib/commons-lang-2.4.ja
 r /home/hadoop/hbase-0.94.5/lib/slf4j-log4j12-1.4.3.jar /home/hadoop/
 hbase-0.94.5/lib/slf4j-api-1.4.3.jar /home/hadoop/hbase-0.94.5/lib/pr
 otobuf-java-2.4.0a.jar /home/hadoop/hbase-0.94.5/lib/jackson-mapper-a
 sl-1.8.8.jar /home/hadoop/hbase-0.94.5/lib/jackson-core-asl-1.8.8.jar
  /home/hadoop/hbase-0.94.5/lib/commons-httpclient-3.1.jar

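The Class-Path value is wrapped at 70 characters per line here (the JAR specification allows at most 72 bytes per line), except for the last line, which may be shorter. Every continuation line must start with a single space. Because that leading space is dropped when the lines are joined, a continuation line that begins a new entry needs a second space as the separator, as in the last line above. The file must also end with a newline, otherwise the last line may not be parsed.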
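
As an alternative to wrapping the lines by hand, the manifest text can be generated with java.util.jar.Manifest from the JDK, which folds long values into continuation lines on its own. This is only a sketch: ManifestWriterSketch and buildManifest are illustrative names, and only three of the jar paths are listed for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestWriterSketch {

    /** Builds the manifest text; Manifest.write folds long values into continuation lines. */
    static String buildManifest(String mainClass, String... classPathEntries) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version has to be set for a well-formed main section.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        // Class-Path entries are separated by single spaces.
        attrs.put(Attributes.Name.CLASS_PATH, String.join(" ", classPathEntries));
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            manifest.write(out);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Paths taken from the example above; extend the list as needed.
        System.out.print(buildManifest(
                "temperatureData.HBaseTemperatureImporter",
                "/home/hadoop/hbase-0.94.5/hbase-0.94.5.jar",
                "/home/hadoop/hbase-0.94.5/lib/zookeeper-3.4.5.jar",
                "/home/hadoop/hadoop-1.0.4/hadoop-core-1.0.4.jar"));
    }
}
```

Redirecting the output into hbaseClassPath.txt should give a file that the jar tool accepts. The exact column at which lines are folded may differ from the hand-written example.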

Then build the jar file from the class files in the bin directory, using hbaseClassPath.txt as the manifest of the jar file:

jar -cvfm HBaseTemperatureImporter.jar hbaseClassPath.txt -C bin/ .

Execute the jar file to import the data from HDFS into HBase:

java -jar HBaseTemperatureImporter.jar hdfs://master:54310/home/input/weatherData
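
When a program is started with java -jar, the JVM takes the classpath from the Class-Path entry of the jar's manifest and ignores the -cp option and the CLASSPATH variable, so a wrong path in the manifest typically shows up as a NoClassDefFoundError at run time. A small check like the one below can list missing entries beforehand. The class ManifestClassPathCheck and its methods are hypothetical helpers, not part of the original program.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class ManifestClassPathCheck {

    /** Returns the Class-Path entries that do not exist on disk. */
    static List<String> missingEntries(String classPathValue, File baseDir) {
        List<String> missing = new ArrayList<>();
        if (classPathValue == null || classPathValue.trim().isEmpty()) {
            return missing;
        }
        for (String entry : classPathValue.trim().split("\\s+")) {
            File file = new File(entry);
            // Relative entries are resolved against the directory that holds the jar.
            if (!file.isAbsolute()) {
                file = new File(baseDir, entry);
            }
            if (!file.exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    /** Reads the Class-Path value from the jar's manifest, or null if there is none. */
    static String classPathOf(File jar) {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "HBaseTemperatureImporter.jar";
        File jar = new File(name).getAbsoluteFile();
        if (!jar.isFile()) {
            System.err.println("Jar not found: " + jar);
            return;
        }
        List<String> missing = missingEntries(classPathOf(jar), jar.getParentFile());
        if (missing.isEmpty()) {
            System.out.println("All Class-Path entries exist.");
        } else {
            for (String entry : missing) {
                System.out.println("Missing: " + entry);
            }
        }
    }
}
```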
