Watch for changes in HDFS

Programmer
Sep 10, 2019

Want to build a “file watcher” for an HDFS cluster in a JVM application?

In this article I would like to share a short code sample that watches for changes happening in HDFS.

HDFS provides a feature called inotify. This provides a way to monitor for changes to files and/or directories in HDFS.

import java.net.URI
import java.util.concurrent.TimeUnit

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hdfs.client.HdfsAdmin
import org.apache.hadoop.hdfs.inotify.Event.{AppendEvent, CreateEvent, RenameEvent}

object HDFSTest extends App {
  // Replace namenode:port with the address of your NameNode.
  val admin = new HdfsAdmin(URI.create("hdfs://namenode:port"), new Configuration())
  val eventStream = admin.getInotifyEventStream()

  while (true) {
    // poll returns null when no events arrive within the timeout.
    val batch = eventStream.poll(2L, TimeUnit.SECONDS)
    if (batch != null) {
      batch.getEvents.foreach { event =>
        println(s"event type = ${event.getEventType}")
        event match {
          case create: CreateEvent =>
            println("CREATE: " + create.getPath)

          case rename: RenameEvent =>
            println("RENAME: " + rename.getSrcPath + " => " + rename.getDstPath)

          case append: AppendEvent =>
            println("APPEND: " + append.getPath)

          case other =>
            println("other: " + other)
        }
      }
    }
  }
}

The following line can be changed to replay events from the last-read transaction ID, by passing that ID as a Long argument to getInotifyEventStream.

val eventStream = admin.getInotifyEventStream()
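As a minimal sketch of that replay, the snippet below saves the transaction ID of each processed batch to a local file and passes it back to getInotifyEventStream on the next start. The TxidCheckpoint helper, the HDFSResumeTest object and the last-read-txid.txt file name are hypothetical names used only for illustration, and it assumes a Hadoop version (2.7 or later) where poll returns an EventBatch.

```scala
import java.net.URI
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}
import java.util.concurrent.TimeUnit

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hdfs.client.HdfsAdmin

// Hypothetical helper: keeps the last-read transaction ID in a local file.
object TxidCheckpoint {
  // Returns the stored ID, or None if the file is missing or unreadable.
  def load(file: Path): Option[Long] =
    if (Files.exists(file))
      scala.util.Try(
        new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim.toLong
      ).toOption
    else None

  // Overwrites the file with the given ID.
  def save(file: Path, txid: Long): Unit =
    Files.write(file, txid.toString.getBytes(StandardCharsets.UTF_8))
}

object HDFSResumeTest extends App {
  // Assumed checkpoint location; any durable store would do.
  val checkpoint = Paths.get("last-read-txid.txt")
  val admin = new HdfsAdmin(URI.create("hdfs://namenode:port"), new Configuration())

  // Resume after the saved transaction ID if there is one, else start from now.
  val eventStream = TxidCheckpoint.load(checkpoint) match {
    case Some(txid) => admin.getInotifyEventStream(txid)
    case None       => admin.getInotifyEventStream()
  }

  while (true) {
    val batch = eventStream.poll(2L, TimeUnit.SECONDS)
    if (batch != null) {
      batch.getEvents.foreach(event => println(s"event type = ${event.getEventType}"))
      // Record progress only after the whole batch has been handled.
      TxidCheckpoint.save(checkpoint, batch.getTxid)
    }
  }
}
```

Saving the ID after the batch is handled means a crash mid-batch replays that batch on restart, so event handling should tolerate duplicates.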

💡Note:
(1) To impersonate a Hadoop user, set the environment variable HADOOP_USER_NAME to the required username.
(2) The above code is expected to be run by a Hadoop user with admin (superuser) privileges.
