By | January 14, 2016

Install Rhadoop rhdfs and rmr2 on Hortonworks Sandbox

Big Data Analytics – R, rhdfs, rmr2, Rhive – Tutorial 2

Install rhdfs and rmr2 on Hortonworks Sandbox – Rhdfs rmr2 Installation and Tutorial

In the previous tutorial (Tutorial 1) we learned how to install R, so let's start the second part of the tutorial.

First we need to install the dependency packages required for rhdfs. Run the command below in the R console:

install.packages(c("rJava", "bitops", "RJSONIO", "digest", "functional", "Rcpp", "stringr", "plyr", "reshape2", "codetools", "dplyr", "R.methodsS3", "caTools", "Hmisc"))

I prefer to install these packages one by one; that way I can track down package-specific errors more easily.

Example –

install.packages("rJava")
install.packages("bitops")

and so on for the remaining packages.
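The one-by-one approach can also be scripted. The sketch below is only an illustration: it generates an R script with one install.packages() call per dependency, so that the console output for each package appears separately. The helper name gen_install_script, the file name install_deps.R, and the CRAN mirror URL are assumptions, not part of the original setup.

```shell
#!/bin/sh
# Dependency list from the tutorial (package names without stray spaces).
PACKAGES="rJava bitops RJSONIO digest functional Rcpp stringr plyr reshape2 codetools dplyr R.methodsS3 caTools Hmisc"

# Assumed CRAN mirror; replace it with the mirror you normally use.
CRAN_MIRROR="https://cloud.r-project.org"

# Print one install.packages() call per package (hypothetical helper).
gen_install_script() {
    for pkg in $PACKAGES; do
        printf 'install.packages("%s", repos = "%s")\n' "$pkg" "$CRAN_MIRROR"
    done
}

# Write the generated calls to a file; nothing is installed at this point.
gen_install_script > install_deps.R
```

Running `Rscript install_deps.R` afterwards would install the packages in the listed order, which makes a failing package easier to spot in the output.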

Once the packages above are installed, follow the steps below to install rhdfs.

Download the rhdfs package from GitHub to a directory on your local sandbox.

Download link of github-rhadoop

Step 1 – Download the rhdfs package

wget https://github.com/RevolutionAnalytics/rhdfs/blob/master/build/rhdfs_1.0.8.tar.gz?raw=true

After downloading, you will see a file named something like – rhdfs_1.0.8.tar.gz?raw=true

I just copied this to another file with a clean name (rhdfs.tar.gz).

Now we have the rhdfs package on our local Linux box.

Now run the following in the Linux shell:
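The copy step can be sketched in the shell as follows. This is a minimal illustration, assuming the download sits in the current directory; clean_name is a hypothetical helper that only shows where the odd file name comes from (the "?raw=true" query string of the URL).

```shell
#!/bin/sh
# Drop everything from the first "?" onward in a file name (hypothetical helper).
clean_name() {
    printf '%s\n' "${1%%\?*}"
}

downloaded='rhdfs_1.0.8.tar.gz?raw=true'

# Copy the download to the name used later in this tutorial, if it is present.
if [ -f "$downloaded" ]; then
    cp "$downloaded" rhdfs.tar.gz
fi

clean_name "$downloaded"   # prints rhdfs_1.0.8.tar.gz

# Alternative: let wget write to a chosen name at download time with -O, e.g.
#   wget -O rhdfs.tar.gz "<download URL>"
```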
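Before installing, it may be worth confirming that the file really is an R source package rather than, say, an HTML page saved under a .tar.gz name. The sketch below assumes the usual layout of an R source tarball (a single top-level directory containing a DESCRIPTION file); is_r_source_tarball is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed only if the archive can be listed and contains <pkg>/DESCRIPTION.
is_r_source_tarball() {
    # A file that is not a gzip-compressed tar archive fails here.
    listing=$(tar -tzf "$1" 2>/dev/null) || return 1
    printf '%s\n' "$listing" | grep '^[^/]*/DESCRIPTION$' > /dev/null
}

# Check the copied download if it exists in the current directory.
if [ -f rhdfs.tar.gz ]; then
    if is_r_source_tarball rhdfs.tar.gz; then
        echo "rhdfs.tar.gz looks like an R source package"
    else
        echo "rhdfs.tar.gz does not look like an R source package"
    fi
fi
```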

R CMD javareconf

If the command completes without errors, Java is configured correctly with R.




Installing the rhdfs package – type the two commands below in the R console:

Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
install.packages("/root/r_packages_download/rhdfs.tar.gz", repos=NULL, type="source")

Step 2 – Download the rmr2 package (I downloaded the rmr2 tar file into the sandbox)

wget https://github.com/RevolutionAnalytics/rmr2/releases/download/3.3.1/rmr2_3.3.1.tar.gz

Install the rmr2 package from the R console:

install.packages("/root/r_packages_download/rmr2_3.3.1.tar.gz", repos=NULL, type="source")

Done

So, here is a basic example to test the functionality.

You can log in to your sandbox using a PuTTY session, which is more user-friendly.

Use IP – 127.0.0.1 and SSH port 2222

Username – root and password – hadoop

Start the R console and type the code below

(Note – Ignore the warnings)

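# Point R at the Hadoop binaries (paths are specific to this sandbox version)
Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
Sys.setenv(HADOOP_HOME="/usr/hdp/current/hadoop-client/")
Sys.setenv(HADOOP_STREAMING="/usr/hdp/2.3.0.0-2557/hadoop-mapreduce/hadoop-streaming-2.7.1.2.3.0.0-2557.jar")
library(rJava)
library(rmr2)
library(rhdfs)
# Initialize the rhdfs connection to HDFS
hdfs.init()
# Write the numbers 1 to 10 to HDFS
input = to.dfs(1:10)
# Map-only job: pair each value with its double
driver = mapreduce(input = input, map = function(k, v) cbind(v, 2*v))
# Read the result back into R and show the values
tushar <- from.dfs(driver)
tushar$val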
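The three Sys.setenv() paths above are tied to one sandbox release, so a quick existence check from the Linux shell can save a failed job. The following is a minimal sketch, not an official diagnostic; check_path is a hypothetical helper, and the paths are simply copied from the R code.

```shell
#!/bin/sh
# Print OK/MISSING for one setting; return non-zero when the path is absent.
check_path() {
    # $1 = variable name, $2 = path to test
    if [ -e "$2" ]; then
        echo "OK       $1=$2"
    else
        echo "MISSING  $1=$2"
        return 1
    fi
}

# Paths copied from the Sys.setenv() calls in the tutorial.
missing=0
check_path HADOOP_CMD "/usr/bin/hadoop" || missing=1
check_path HADOOP_HOME "/usr/hdp/current/hadoop-client/" || missing=1
check_path HADOOP_STREAMING "/usr/hdp/2.3.0.0-2557/hadoop-mapreduce/hadoop-streaming-2.7.1.2.3.0.0-2557.jar" || missing=1

if [ "$missing" -eq 1 ]; then
    echo "At least one path is missing; adjust the Sys.setenv() values for your sandbox version."
fi
```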

 

Final output
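As a rough cross-check of the result: the map function returns cbind(v, 2*v), so for the input 1:10 the value matrix should pair each number with its double. The shell sketch below only prints those pairs for comparison; it does not run Hadoop, the helper name expected_pairs is hypothetical, and the row order is assumed to follow the input order.

```shell
#!/bin/sh
# Print each number from 1 to 10 next to its double, one pair per line.
expected_pairs() {
    i=1
    while [ "$i" -le 10 ]; do
        echo "$i $((2 * i))"
        i=$((i + 1))
    done
}

expected_pairs
```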

So enjoy rhdfs and rmr2 programming on the Hortonworks sandbox. Next we will look at RHive installation; the RHive package is used for reading data from Hive tables into R. If you hit any errors, please post them in the comment box below.

Keep visiting Toodey.com, and like my Facebook fan page for the latest updates. Thanks!

One thought on “Install Rhadoop rhdfs and rmr2 on Hortonworks Sandbox”

  1. anandajayam

    I get the error below when I run the R script:

    17/08/10 16:01:09 ERROR streaming.StreamJob: Job not successful. Error: Task failed task_1502353587113_0003_m_000000
    Job failed as tasks failed. failedMaps:1 failedReduces:0

    17/08/10 16:01:09 INFO streaming.StreamJob: killJob…
    17/08/10 16:01:09 INFO impl.YarnClientImpl: Killed application application_1502353587113_0003
    Streaming Command Failed!
    Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
    hadoop streaming failed with error code 1
    Calls: mapreduce -> mr
    Execution halted
    17/08/10 16:01:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    17/08/10 16:01:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    17/08/10 16:01:14 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted /tmp/file2e7d6ec8e6c7
    17/08/10 16:01:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    17/08/10 16:01:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    17/08/10 16:01:18 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted /tmp/file2e7d4ff82274
