Watch for changes in HDFS
Sep 10, 2019
Want to be a “File Watcher” in an HDFS cluster using a JVM application?
In this article I give a quick code sample that watches for changes happening in HDFS.
HDFS provides a feature called inotify, which lets a client monitor changes to files and directories in HDFS.
import java.net.URI
import java.util.concurrent.TimeUnit

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hdfs.client.HdfsAdmin
import org.apache.hadoop.hdfs.inotify.Event.{AppendEvent, CreateEvent, RenameEvent}

object HDFSTest extends App {
  // Replace namenode:port with the address of your NameNode.
  val admin = new HdfsAdmin(URI.create("hdfs://namenode:port"), new Configuration())
  val eventStream = admin.getInotifyEventStream()

  while (true) {
    // poll returns null when no event batch arrives within the timeout.
    val batch = eventStream.poll(2L, TimeUnit.SECONDS)
    if (batch != null) {
      batch.getEvents.foreach { event =>
        println(s"event type = ${event.getEventType}")
        event match {
          case create: CreateEvent =>
            println("CREATE: " + create.getPath)
          case rename: RenameEvent =>
            println("RENAME: " + rename.getSrcPath + " => " + rename.getDstPath)
          case append: AppendEvent =>
            println("APPEND: " + append.getPath)
          case other =>
            println("other: " + other)
        }
      }
    }
  }
}
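As far as I know, the inotify stream reports events for the whole namespace and has no server-side path filter, so watching a single directory means filtering on the client. Below is a minimal sketch of such a filter. The names WatchedDir, isUnder, touchedPaths and concerns, and the directory /data/incoming in the usage note, are hypothetical and only serve the illustration.

```scala
import org.apache.hadoop.hdfs.inotify.Event
import org.apache.hadoop.hdfs.inotify.Event.{AppendEvent, CloseEvent, CreateEvent, RenameEvent, UnlinkEvent}

// Hypothetical helper, not part of the Hadoop API.
object WatchedDir {
  // True when `path` is `dir` itself or lies somewhere beneath it.
  def isUnder(dir: String, path: String): Boolean = {
    val base = if (dir.endsWith("/")) dir else dir + "/"
    path == dir.stripSuffix("/") || path.startsWith(base)
  }

  // Paths an event refers to; event types not listed here yield no paths.
  def touchedPaths(event: Event): Seq[String] = event match {
    case e: CreateEvent => Seq(e.getPath)
    case e: AppendEvent => Seq(e.getPath)
    case e: CloseEvent  => Seq(e.getPath)
    case e: UnlinkEvent => Seq(e.getPath)
    case e: RenameEvent => Seq(e.getSrcPath, e.getDstPath)
    case _              => Seq.empty
  }

  // True when at least one path of the event is inside the watched directory.
  def concerns(dir: String, event: Event): Boolean =
    touchedPaths(event).exists(isUnder(dir, _))
}
```

Inside the polling loop, the batch could then be narrowed with batch.getEvents.filter(WatchedDir.concerns("/data/incoming", _)) before printing. The trailing slash added in isUnder keeps a sibling such as /data/incoming2 from matching by accident.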
The following line, which opens the stream, can be changed to replay events from a last-read transaction ID by passing that ID as an argument:
val eventStream = admin.getInotifyEventStream()
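To make the replay idea concrete, here is a rough sketch that remembers the transaction ID of the last processed batch and passes it to getInotifyEventStream on restart. It assumes a Hadoop version in which poll returns an EventBatch with a getTxid method. The object name ResumeFromTxId, the helpers loadTxId and saveTxId, and the local checkpoint file last-read-txid are hypothetical; any durable store would do.

```scala
import java.net.URI
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path, Paths}
import java.util.concurrent.TimeUnit
import scala.util.Try

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hdfs.client.HdfsAdmin

object ResumeFromTxId {
  // Hypothetical checkpoint location on the local file system.
  val checkpoint: Path = Paths.get("last-read-txid")

  // Returns the stored transaction ID, or None if the file is missing or unreadable.
  def loadTxId(path: Path): Option[Long] =
    if (Files.exists(path))
      Try(new String(Files.readAllBytes(path), UTF_8).trim.toLong).toOption
    else None

  // Overwrites the checkpoint file with the given transaction ID.
  def saveTxId(path: Path, txId: Long): Unit =
    Files.write(path, txId.toString.getBytes(UTF_8))

  def main(args: Array[String]): Unit = {
    val admin = new HdfsAdmin(URI.create("hdfs://namenode:port"), new Configuration())

    // Resume after the stored ID if there is one, otherwise start from the current position.
    val eventStream = loadTxId(checkpoint) match {
      case Some(txId) => admin.getInotifyEventStream(txId)
      case None       => admin.getInotifyEventStream()
    }

    while (true) {
      val batch = eventStream.poll(2L, TimeUnit.SECONDS)
      if (batch != null) {
        batch.getEvents.foreach(event => println(s"event type = ${event.getEventType}"))
        // Record progress only after the whole batch has been handled.
        saveTxId(checkpoint, batch.getTxid)
      }
    }
  }
}
```

Saving the ID after processing means a crash in the middle of a batch replays that batch on restart, so the handling should tolerate duplicates. If the stored ID is older than the edits the NameNode still retains, poll may fail with a MissingEventsException, which this sketch does not handle.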
💡Note:
(1) To impersonate a Hadoop user, set the environment variable HADOOP_USER_NAME to the required username.
(2) The code above needs to be run by a Hadoop user with the admin (superuser) role.