Install RHadoop rhdfs and rmr2 on Hortonworks Sandbox
Big Data Analytics – R, rhdfs, rmr2, RHive – Tutorial 2
Install rhdfs and rmr2 on Hortonworks Sandbox – Rhdfs rmr2 Installation and Tutorial
In the previous tutorial (Tutorial 1) we learned how to install R, so let's start the second part of the tutorial.
First, install the dependency packages required by rhdfs and rmr2 by running the command below in the R console:
install.packages(c("rJava", "bitops", "RJSONIO", "digest", "functional", "Rcpp", "stringr", "plyr", "reshape2", "codetools", "dplyr", "R.methodsS3", "caTools", "Hmisc"))
I prefer to install these packages one by one; that way it is easier to track down errors related to a specific package:
install.packages("rJava")
install.packages("bitops")
and so on ...
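If you prefer a loop to typing each command, here is a minimal sketch of the same one-by-one approach. The helper name install_one_by_one and the CRAN mirror URL are my own assumptions, not part of rhdfs or rmr2. It installs each package separately and returns the names of any packages that still cannot be loaded afterwards.

```r
# Hypothetical helper (not part of rhdfs/rmr2): install packages one at a
# time and return the names of any that still cannot be loaded afterwards.
# The default CRAN mirror URL is an assumption; use any mirror you like.
install_one_by_one <- function(pkgs, repos = "https://cloud.r-project.org") {
  failed <- character(0)
  for (pkg in pkgs) {
    message("Installing ", pkg, " ...")
    # try() keeps the loop going if one install stops with an error;
    # many install failures only show up as warnings, hence the check below
    try(install.packages(pkg, repos = repos))
    # the real test: can R load the package namespace?
    if (!requireNamespace(pkg, quietly = TRUE)) {
      failed <- c(failed, pkg)
    }
  }
  failed
}
```

For example, install_one_by_one(c("rJava", "bitops", "RJSONIO")) returns character(0) when everything loaded; any names it does return are the packages whose error messages are worth reading.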
Once the packages above are installed, follow the steps below to install rhdfs.
Download the rhdfs package from GitHub to a directory in your sandbox.
Download link: github-rhadoop
Step 1 – Download the rhdfs package
After downloading, you will see a file name something like rhdfs_1.0.8.tar.gz?raw=true
I simply copied it to a file with a clean name, rhdfs.tar.gz.
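If you would rather do the copy from the R console, here is a small sketch using base R's file.copy() and untar(). The helper name prepare_tarball is hypothetical. Besides copying, it lists the archive and checks for a top-level DESCRIPTION file, which every R source package contains, so a bad download (for example an HTML page saved under a .tar.gz name) is caught before install.packages() runs.

```r
# Hypothetical helper: copy the GitHub download (e.g. "rhdfs_1.0.8.tar.gz?raw=true")
# to a clean file name and check that it really is an R source tarball.
prepare_tarball <- function(downloaded, target) {
  if (!file.exists(downloaded)) stop("file not found: ", downloaded)
  if (!file.copy(downloaded, target, overwrite = TRUE)) {
    stop("could not copy ", downloaded, " to ", target)
  }
  # list the archive contents without extracting anything
  contents <- untar(target, list = TRUE)
  # an R source package keeps its DESCRIPTION file at <pkg>/DESCRIPTION
  if (!any(grepl("^[^/]+/DESCRIPTION$", contents))) {
    stop(target, " does not look like an R source package")
  }
  invisible(target)
}
```

Assuming the download sits in /root/r_packages_download, you would call setwd("/root/r_packages_download") and then prepare_tarball("rhdfs_1.0.8.tar.gz?raw=true", "rhdfs.tar.gz").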
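Now we have the rhdfs package on our local Linux box. Next, run the following command in the Linux shell (not in the R console) so that R picks up the Java configuration:
R CMD javareconf
If it finishes without errors, Java is configured correctly for R.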
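To double-check from inside R that rJava can actually start a JVM, here is a short sketch. java_version_from_r() is a hypothetical helper built on rJava's .jinit() and .jcall(); it asks the JVM for its java.version system property.

```r
# Hypothetical helper: confirm that rJava can start a JVM and talk to it.
java_version_from_r <- function() {
  if (!requireNamespace("rJava", quietly = TRUE)) {
    stop("rJava is not installed; install it before rhdfs")
  }
  rJava::.jinit()  # start the JVM (or reuse one that is already running)
  # call the static Java method System.getProperty("java.version"),
  # which returns a String ("S")
  rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
}
```

If java_version_from_r() returns a version string, rJava is working; an error here most likely points back to the Java configuration step.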
Install the rhdfs package by typing the two commands below in the R console:
Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
install.packages("/root/r_packages_download/rhdfs.tar.gz", repos=NULL, type="source")
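rhdfs finds Hadoop through the HADOOP_CMD environment variable, so a mistyped path (such as /user/bin/hadoop) is worth catching before going further. A minimal sketch with a hypothetical helper, check_hadoop_cmd(), using only base R:

```r
# Hypothetical helper: verify that HADOOP_CMD points to an executable file.
check_hadoop_cmd <- function(path = Sys.getenv("HADOOP_CMD")) {
  if (!nzchar(path)) stop("HADOOP_CMD is not set")
  if (!file.exists(path)) stop("HADOOP_CMD points to a missing file: ", path)
  # file.access() with mode 1 tests execute permission; 0 means success
  if (file.access(path, 1) != 0) stop(path, " is not executable")
  invisible(path)
}
```

Run check_hadoop_cmd() right after Sys.setenv(); it stops with a message if the path is wrong and returns the path invisibly otherwise.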
Step 2 – Download the rmr2 package (I downloaded the rmr2 tarball into the sandbox)
Install the rmr2 package (replace the file name below with the name of the rmr2 tarball you downloaded):
install.packages("/root/r_packages_download/rmr2.tar.gz", repos=NULL, type="source")
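To confirm that both installs worked without loading the packages (rhdfs appears to expect HADOOP_CMD to be set when it loads, which is why it is set before installing), you can ask R for the installed versions. A minimal sketch with a hypothetical helper, installed_version(), which returns NA for any package that is not installed:

```r
# Hypothetical helper: installed version of each package, or NA if missing.
# system.file() and packageVersion() read the installed files only, so the
# packages are not loaded (no HADOOP_CMD or Hadoop connection needed).
installed_version <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}
```

For example, installed_version(c("rhdfs", "rmr2")) gives a named vector; an NA entry means that install did not succeed.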
Here is a basic example to test the functionality.
You can log in to your sandbox with a PuTTY session, just for convenience:
Host (IP): 127.0.0.1, SSH port: 2222
Username: root, password: hadoop
Start the R console (type R at the shell prompt) and enter the code below.
(Note: you can ignore the warnings.)
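Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
Sys.setenv(HADOOP_HOME="/usr/hdp/current/hadoop-client/")
# streaming jar of HDP 2.3.0.0-2557; adjust the version numbers to match your sandbox
Sys.setenv(HADOOP_STREAMING="/usr/hdp/2.3.0.0-2557/hadoop-mapreduce/hadoop-streaming-2.7.1.2.3.0.0-2557.jar")
library(rJava)
library(rmr2)
library(rhdfs)
hdfs.init()
# write the numbers 1..10 to HDFS
input = to.dfs(1:10)
# map-only job: each value v becomes the row (v, 2*v)
driver = mapreduce(input = input, map = function(k, v) cbind(v, 2*v))
# read the result back from HDFS and print the values
tushar <- from.dfs(driver)
tushar$val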
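rmr2 also has a local backend, selected with rmr.options(backend = "local"), which runs the same mapreduce() call inside the R session without Hadoop. Here is a small sketch, assuming rmr2 is installed as above; the helper names expected_val and run_local_check are hypothetical. expected_val() computes in plain R what the job should return (one row (v, 2*v) per input value), and run_local_check() runs the job locally, which helps separate bugs in the map function from Hadoop configuration problems.

```r
# Hypothetical helper: the expected job output, computed in plain R.
# Column 1 is the input value v, column 2 is 2*v.
expected_val <- function(v = 1:10) cbind(v, 2 * v)

# Hypothetical helper: run the same map-only job on rmr2's local backend.
run_local_check <- function(x = 1:10) {
  rmr2::rmr.options(backend = "local")
  # switch back to the Hadoop backend even if the job fails
  on.exit(rmr2::rmr.options(backend = "hadoop"))
  out <- rmr2::from.dfs(
    rmr2::mapreduce(input = rmr2::to.dfs(x),
                    map = function(k, v) cbind(v, 2 * v))
  )
  out$val
}
```

You would expect all.equal(unname(run_local_check()), unname(expected_val())) to be TRUE; unname() drops the column names, which differ between the two. This comparison assumes the rows come back in input order.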
So enjoy rhdfs and rmr2 programming on the Hortonworks sandbox. Next we will look at RHive installation; the RHive package is used for reading data from Hive tables into R. If you hit any errors, please post them in the comment box below.
Keep visiting Toodey.com, and like my Facebook fan page for the latest updates. Thanks!