Logstash webhdfs example with Kerberos enabled hadoop cluster

February 26, 2016

In the previous tutorial (Part 1), we discussed how to read messages from a log file and send them to a RabbitMQ queue.
In this tutorial we will cover the Logstash webhdfs output plugin, which stores logs in HDFS: we will read messages from the RabbitMQ queue and write them to a Hadoop directory.

Note – The webhdfs plugin is recommended for, and was tested with, Logstash 2.x or higher.

Step 1 – Installing logstash webhdfs plugin
Run the command below; it installs the Logstash webhdfs output plugin:
/opt/logstash/bin/plugin install logstash-output-webhdfs
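
To confirm the plugin is installed, the same plugin tool can list what Logstash has loaded:

/opt/logstash/bin/plugin list | grep webhdfs
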
Step 2 – If Hadoop Kerberos is not enabled

If Kerberos is not enabled on your Hadoop cluster, the configuration below is all you need; add it and restart the Logstash service.

input {
  rabbitmq {
    host => "127.0.0.1"
    queue => "tomcatq"
    durable => true
    key => "tomcatkey"
    exchange => "tomcatex"
    threads => 1
    prefetch_count => 50
    port => 5672
    user => "admin"
    password => "secret12"
  }
}
filter {
}
output {
  if [type] == "tomcat" {
    webhdfs {
      codec => json_lines
      host => "namenode"
      port => 14000
      path => "/user/logstash/tomcat_%{+YYYY-MM-dd}.log"  # (required)
      user => "hdfs"
      use_httpfs => true
      workers => 1
      retry_times => 5
    }
  }
}
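
Before restarting Logstash, it is worth sanity-checking the pipeline with the config test flag. The path below is just an example location for the config file; use wherever yours actually lives:

/opt/logstash/bin/logstash --configtest -f /etc/logstash/conf.d/webhdfs.conf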

The remaining steps are only for users running a Kerberos-enabled Hadoop cluster; in that case a few additional changes are needed.

Step 3 – Edit the webhdfs_helper.rb file at the path below

vi /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-2.0.2/lib/logstash/outputs/webhdfs_helper.rb
Find the prepare_client function in the file ("def prepare_client").
Change the line that constructs the client so the username is no longer passed as a parameter:
client = WebHDFS::Client.new(host, port)
Then add the two lines below, which enable the Kerberos client and point it at the user keytab:
client.kerberos = true

client.kerberos_keytab = "/etc/security/keytabs/hdfs.headless.keytab"
The function should now look like this:

# Setup a WebHDFS client
# @param host [String] The WebHDFS location
# @param port [Number] The port used to do the communication
# @param username [String] A valid HDFS user
# @return [WebHDFS] A configured client instance
def prepare_client(host, port, username)
  # client = WebHDFS::Client.new(host, port, username)
  client = WebHDFS::Client.new(host, port)
  client.kerberos = true
  client.kerberos_keytab = "/etc/security/keytabs/hdfs.headless.keytab"

  client.httpfs_mode = @use_httpfs
  client.open_timeout = @open_timeout
  client.read_timeout = @read_timeout
  client.retry_known_errors = @retry_known_errors
  client.retry_interval = @retry_interval if @retry_interval
  client.retry_times = @retry_times if @retry_times
  client
end
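
Also make sure the user running Logstash can read the keytab and that it holds the principal you expect; klist will show its entries (the path is the same one hard-coded above):

klist -kt /etc/security/keytabs/hdfs.headless.keytab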


Step 4 – Install gssapi gem for logstash to use Kerberos
export PATH=/opt/logstash/vendor/jruby/bin:$PATH
export GEM_HOME=/opt/logstash/vendor/bundle/jruby/1.9/
cd /opt/logstash/vendor/jruby
bin/gem install gssapi
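
With GEM_HOME still exported, a quick listing confirms the gem landed in Logstash's bundle:

bin/gem list gssapi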


Step 5 – Copy gssapi into the webhdfs lib folder

cp -r /opt/logstash/vendor/bundle/jruby/1.9/gems/gssapi-1.2.0/lib/* /opt/logstash/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/
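
This copy works around the bundled webhdfs gem not finding gssapi on its load path. A quick listing should now show the gssapi files inside the webhdfs lib folder (the gem versions in these paths are from this setup; yours may differ):

ls /opt/logstash/vendor/bundle/jruby/1.9/gems/webhdfs-0.8.0/lib/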


Step 6 – Restart the Logstash service and enjoy!

service logstash restart
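
If events do not show up, the Logstash log is the first place to look (for a 2.x package install this is usually /var/log/logstash/logstash.log), and you can check the target directory in HDFS directly:

tail -f /var/log/logstash/logstash.log
hdfs dfs -ls /user/logstash/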


While getting Logstash working against a Kerberos-enabled Hadoop cluster I ran into a lot of problems myself, so don't worry if something goes wrong; just post the errors in the comment box and I will help you out.
