Configure and run Docker Compose for your Docker images in 5 minutes


Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services.

Whenever one has to run more than one container and have them communicate with one another, Docker Compose comes to the rescue.

The core of Docker Compose can be understood with the example below.

Let’s say we want to build an application with three services:

  • myservice_1 — a JVM application that uses 2g of memory and needs to communicate with redis (using TCP) and myservice_2 (using…
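A minimal docker-compose.yml sketch of such a layout (the image names and Redis version are illustrative, and myservice_2’s own dependencies are elided, as above):

```yaml
services:
  myservice_1:
    image: example/myservice_1:latest   # hypothetical image name
    mem_limit: 2g                       # the 2g memory budget mentioned above
    depends_on:
      - redis
      - myservice_2
  myservice_2:
    image: example/myservice_2:latest   # hypothetical image name
  redis:
    image: redis:6
```

All three services share the default Compose network, so myservice_1 can reach Redis over TCP simply by its service name (redis:6379).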

A quick yet gentle introduction to getting Kafka running

In this article I am using Kafka 2.8.0 for both client and server. Hence one may notice some discrepancies around the use of ZooKeeper; this is due to KIP-500, which replaces ZooKeeper with a self-managed quorum.


The best way to introduce Kafka is by installing it. Currently I have a way to install a 3-node Kafka cluster on Kubernetes using Helm. In the future, I shall attempt to add steps for bare metal and/or other cloud vendors.

I have the following in config.yaml

replicaCount: 3
nodeSelector:
  node-type: 2-cores

With the above config…
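With a values file like the one above, a typical install looks like the following sketch (the article does not name the chart, so the Bitnami Kafka chart and the release name here are assumptions):

```shell
# add the chart repo, then install using the values file above
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-kafka bitnami/kafka -f config.yaml

# with replicaCount: 3, expect three broker pods
kubectl get pods -l app.kubernetes.io/instance=my-kafka
```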

Often one needs to turn a volume into a template and clone it for every new pod. ‘VolumeSnapshot’ is the Kubernetes feature for this.

In this article I have put together the steps required to

  • create a master template as PVC
  • create a volume snapshot
  • restore a volume snapshot as PVC

In order to use VolumeSnapshot, one needs to enable CSI in their Kubernetes cluster. For this article I have tested the steps on GKE, and it requires at least version 1.17.x of the Kubernetes control plane.
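The snapshot and restore steps above can be sketched as two manifests (names and the storage size are illustrative; on Kubernetes 1.17 the snapshot API is still v1beta1):

```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: master-template-snap
spec:
  volumeSnapshotClassName: csi-snapshot-class      # assumed snapshot class name
  source:
    persistentVolumeClaimName: master-template-pvc # the "master template" PVC
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc                                 # restored copy for a new pod
spec:
  dataSource:
    name: master-template-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```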

Note (1): I have biased this article…

Customize Apache Spark 3.1.1 to work with S3 / GCS

Apache Spark 3.1.1 can be built from source code along with
(1) AWS specific binaries to enable reading and writing to s3
(2) GCP specific binaries to enable reading and writing to gcs
(3) Azure — at the time of writing this article, hadoop-azure does not provide any OOTB shaded jar — TBD (add steps to generate one); until then, stay tuned 😊

Step 1: Building Spark from source

Here, we build Spark from source. This step can take 20+ minutes to run. …
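As a sketch, the distribution script shipped with the Spark sources can build a tarball with the cloud profiles enabled (the profile names below assume a Hadoop 3.2 build; the GCS connector jar still has to be added separately):

```shell
# run from the root of the Spark source checkout
# -Phadoop-cloud pulls in hadoop-aws (the s3a connector)
./dev/make-distribution.sh --name spark-cloud --tgz \
    -Phadoop-3.2 -Phadoop-cloud -Pkubernetes
```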

A developer’s guide to setting up Vault in kubernetes and using it with kv-store for secrets and userpass access.

In this brief write-up, I shall try to provide a quick way to get Vault up and running on a running GKE cluster.


Installation of Vault can be simplified using Helm; there is an official Helm chart for this.

Step 1: Add the helm chart. Please note helm 3.0 is recommended.

helm repo add hashicorp https://helm.releases.hashicorp.com

Step 2: Installing the chart.
The default installation should work fine for most people. In my case, I use GKE and I set a nodeSelector
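A sketch of such an override (the exact value layout depends on the chart version — older versions of the Vault chart take nodeSelector as a multi-line string; the node label is the one used earlier in this post):

```yaml
# values.yaml
server:
  nodeSelector: |
    node-type: 2-cores
```

Then install with `helm install vault hashicorp/vault -f values.yaml`.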


What is Closure?

A closure is the combination of a function and the lexical environment within which that function was declared.

The reason it is called a “closure” is that an expression containing free variables is called an “open” expression; by associating with it the bindings of its free variables, you close it.

The concept of closure comes from Lambda Calculus (also written as λ-calculus). λ-calculus is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. …
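A minimal Python sketch of this idea: increment() keeps a reference to the count binding from its enclosing lexical scope even after make_counter() has returned — that pairing of function and environment is the closure.

```python
def make_counter():
    count = 0  # free variable of increment(), closed over below

    def increment():
        # 'count' is not local here; the closure carries the binding
        # from make_counter's lexical environment.
        nonlocal count
        count += 1
        return count

    return increment

counter = make_counter()
print(counter())  # -> 1
print(counter())  # -> 2 (state survives between calls)
```

Each call to make_counter() closes over a fresh count binding, so independent counters do not share state.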

Deploying elasticsearch using kubernetes

3-Node Client, Data and Master deployment of ES

In this article, I would like to provide an example of using a StatefulSet to deploy an Elasticsearch cluster.

The configuration for this setup requires

  • A headless service (for intra-node communication)
  • A LoadBalancer service (for providing REST endpoint to outside world) using Client Nodes only.
  • A StatefulSet for Master node(s).
  • A StatefulSet for Data Nodes.
  • A StatefulSet for Client Nodes.

Also, one needs to note that Elasticsearch 7.x has made some major changes to elasticsearch.yml w.r.t. cluster configuration.
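For reference, a minimal 7.x-style elasticsearch.yml for the master nodes might look like this sketch (the cluster, service, and pod names are illustrative and must match the headless service and StatefulSet above):

```yaml
cluster.name: es-cluster
# 7.x: discovery.seed_hosts replaces discovery.zen.ping.unicast.hosts
discovery.seed_hosts: ["es-master-headless"]
# 7.x: required on first bootstrap to elect the initial master quorum
cluster.initial_master_nodes: ["es-master-0", "es-master-1", "es-master-2"]
```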

Let’s break this configuration into three steps based on the above description.

Each of these steps can be executed…

Understanding how storage works

Storage concepts from the k8s documentation

Kubernetes, a container orchestration engine, was built for stateless systems. These are generally the kinds of applications we commonly build.

Applying a Deployment configuration for applications handles this effectively. But there may be cases where one wants to preserve state in a pod.
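The usual building block for preserving state is a PersistentVolumeClaim, which a pod then mounts as a volume; a minimal sketch (the name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-state            # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce          # mountable read-write by a single node
  resources:
    requests:
      storage: 1Gi
```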

Configure Apache Spark with Kubernetes

Many people like to use k8s for its clustering and scaling capabilities. And many other people like to use Apache Spark for big data processing in a cluster.

In order to get the best of both worlds, a new experimental resource manager has been added to Apache Spark. Its scheduler looks as simple as spark-standalone, yet it provides resiliency at the executor level (rescheduling an executor onto another pod on failure).

Other alternatives from certain cloud providers exist, but these are not free and are at times loaded with features that are never used.
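A spark-submit against the Kubernetes resource manager looks like this sketch (the API server address and container image are placeholders you must fill in for your cluster):

```shell
spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
```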

HDFS deployed locally for development and testing.


The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It employs a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across highly scalable Hadoop clusters.

In this article we shall focus on a single-node cluster setup, running the NameNode and DataNode together.
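For a single-node setup, the two Hadoop config files reduce to a sketch like this (the port and replication factor shown are the common choices for local testing):

```xml
<!-- core-site.xml: point clients at the local NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: only one DataNode, so replication must be 1 -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```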

In an ideal world, we should be using the “Single Node Setup Instructions” and should be able to deploy an HDFS cluster from

Except that it doesn’t work well with Docker. I hope…

