Configure and run docker images with Docker Compose in 5 minutes
Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services.
Whenever one has to run more than one container and have them communicate with one another,
Docker Compose comes to the rescue.
The core of Docker Compose can be understood with the example below.
Let’s say we want to build an application with three services:
myservice_1 — a JVM application that uses 2g of memory and needs to communicate with
redis (over TCP) and …
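A minimal sketch of such a compose file, covering only the two services named so far; the image name, port, and the compose-spec `deploy` form of the memory limit are assumptions:

```yaml
version: "3.8"
services:
  myservice_1:
    image: myorg/myservice:latest    # hypothetical image name
    depends_on:
      - redis                        # start redis before this service
    deploy:
      resources:
        limits:
          memory: 2g                 # the 2g memory requirement from above
    environment:
      REDIS_HOST: redis              # the service name doubles as its DNS name
  redis:
    image: redis:6                   # reachable over TCP on 6379 inside the compose network
```

The key idea is that Compose puts both containers on one network, so `myservice_1` can reach Redis simply by the hostname `redis`.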
A quick yet gentle introduction to getting Kafka running
In this article I am using Kafka 2.8.0 for both client and server. Hence one may notice some discrepancies in the use of ZooKeeper; this is due to
KIP-500, which replaces ZooKeeper with a self-managed quorum.
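KIP-500's self-managed quorum ships as an early-access "KRaft" mode inside the 2.8.0 distribution itself. A hedged sketch of trying it locally (paths relative to the extracted Kafka 2.8.0 tarball):

```shell
# Generate a cluster id, format the storage directory for KRaft mode,
# then start a broker with no ZooKeeper at all.
KAFKA_CLUSTER_ID="$(./bin/kafka-storage.sh random-uuid)"
./bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c ./config/kraft/server.properties
./bin/kafka-server-start.sh ./config/kraft/server.properties
```

In 2.8.0 this mode is explicitly not production-ready, which is why the installation below still relies on ZooKeeper.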
The best way to introduce Kafka is by installing it. Currently I have a way to install a 3-node Kafka cluster using
Kubernetes. In the future, I shall attempt to add steps for bare metal and/or for other cloud vendors.
I have the following in config.yaml
With the above config…
Often one needs to templatize a volume and clone it for every new pod creation. ‘VolumeSnapshot’ is the Kubernetes feature for this.
In this article I have put together the steps required to
In order to use VolumeSnapshot, one needs to enable CSI in their Kubernetes cluster. For this article I have tested it on GKE, and it requires at least version 1.17.x of the Kubernetes control plane.
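As a hedged sketch (the names and snapshot class are assumptions; on 1.17.x clusters the API is typically served as v1beta1), a snapshot plus a clone restored from it might look like:

```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: my-template-snapshot
spec:
  volumeSnapshotClassName: csi-gce-pd-snapshot-class   # hypothetical class for GKE's CSI driver
  source:
    persistentVolumeClaimName: my-template-pvc         # the volume being templatized
---
# Cloning: a fresh PVC restored from the snapshot for each new pod
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-clone-pvc
spec:
  dataSource:
    name: my-template-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
```

The restore happens through the PVC's `dataSource` field, which is what makes the "clone per new pod" pattern possible.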
Note (1): I have biased this article…
Customize Apache Spark 3.1.1 to work with S3 / GCS
Apache Spark 3.1.1 can be built from source code along with
(1) AWS-specific binaries to enable reading and writing to S3,
(2) GCP-specific binaries to enable reading and writing to GCS, and
(3) Azure — at the time of writing this article,
hadoop-azure (https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-azure/3.2.1) does not provide any OOTB shaded jar. TBD (add steps to generate one); until then, stay tuned 😊
Here, we build Spark from source. This step can take 20+ mins to run. …
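A hedged sketch of that build, assuming the Hadoop 3.2 profiles; the GCS connector is not a Maven profile and its shaded jar is usually dropped into `jars/` separately afterwards:

```shell
# Fetch the sources and pin the release tag
git clone https://github.com/apache/spark.git && cd spark
git checkout v3.1.1

# -Phadoop-cloud pulls in the hadoop-aws (S3A) connector alongside Spark's jars;
# -Pkubernetes adds the k8s resource manager; -DskipTests keeps the build under control.
./dev/make-distribution.sh --name custom-cloud --tgz \
    -Phadoop-3.2 -Phadoop-cloud -Pkubernetes -DskipTests
```

The result is a `spark-3.1.1-bin-custom-cloud.tgz` distribution in the repository root.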
A developer’s guide to setting up Vault in kubernetes and using it with kv-store for secrets and userpass access.
In this brief write-up, I shall try to provide a quick way to get Vault up and running on a running GKE cluster.
Installation of Vault can be simplified using Helm. There is an official Helm chart for this at https://www.vaultproject.io/docs/platform/k8s/helm
Step 1: Add the Helm repository. Please note Helm 3.0 is recommended.
helm repo add hashicorp https://helm.releases.hashicorp.com
Step 2: Installing the chart.
The default installation might work fine for most people. In my case, I use GKE and I use
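Step 2 can then be as small as a single `helm install`; the release name and the dev-mode override below are assumptions for local experimentation:

```shell
# Install the chart under the release name "vault"
helm install vault hashicorp/vault

# For throwaway local experiments, dev mode runs unsealed with an in-memory store:
# helm install vault hashicorp/vault --set "server.dev.enabled=true"

# Verify the server pod(s) came up
kubectl get pods -l app.kubernetes.io/name=vault
```

A freshly installed (non-dev) server still needs to be initialized and unsealed before use.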
A closure is the combination of a function and the lexical environment within which that function was declared.
The reason it is called a “closure” is that an expression containing free variables is called an “open” expression, and by associating to it the bindings of its free variables, you close it.
The concept of closure comes from Lambda Calculus (also written as λ-calculus). λ-calculus is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. …
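A small JavaScript sketch of the definition above (the function and variable names are mine): the inner function keeps the binding of its free variable `count` alive after the outer function has returned.

```javascript
// makeCounter returns a function that "closes over" the local
// variable `count`: the binding survives makeCounter's return.
function makeCounter() {
  let count = 0;            // a free variable from the inner function's view
  return function () {      // the closure: the function plus its environment
    count += 1;
    return count;
  };
}

const next = makeCounter();
console.log(next());  // 1
console.log(next());  // 2

const other = makeCounter(); // a fresh environment, independent of `next`
console.log(other()); // 1
```

Each call to `makeCounter` creates a new lexical environment, which is why `next` and `other` count independently.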
Deploying Elasticsearch using Kubernetes
In this article, I would like to provide an example of using
StatefulSet to deploy an elasticsearch cluster.
The configuration for this setup requires
(1) a StatefulSet for Master node(s),
(2) a StatefulSet for Data nodes, and
(3) a StatefulSet for Client nodes.
Also, one needs to note that Elasticsearch 7.x has made some major changes to
elasticsearch.yml w.r.t. cluster configuration.
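Concretely, 7.x replaces the old discovery.zen.* bootstrap settings with discovery.seed_hosts and cluster.initial_master_nodes. A hedged fragment (host and node names are assumptions for a 3-master StatefulSet):

```yaml
# elasticsearch.yml, 7.x style
cluster.name: my-es-cluster
node.name: es-master-0
# 7.x: replaces discovery.zen.ping.unicast.hosts; StatefulSet pod DNS names work well here
discovery.seed_hosts: ["es-master-0.es-master", "es-master-1.es-master", "es-master-2.es-master"]
# 7.x: required once, to bootstrap a brand-new cluster's first master election
cluster.initial_master_nodes: ["es-master-0", "es-master-1", "es-master-2"]
```

Getting these two settings right is most of the work when moving a 6.x manifest to 7.x.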
Let’s break this configuration into three steps based on the description above.
Each of these steps can be executed…
Understanding how storage works
Kubernetes, a container orchestration engine, was built for stateless systems. These are generally the kinds of applications we commonly build.
A Deployment configuration for applications does handle this effectively. But there may be cases where one wants to preserve state in a pod.
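One way to preserve state is to mount a PersistentVolumeClaim into the pod, so the data outlives container restarts. A minimal hedged sketch (names, image, and sizes are assumptions):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-state
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: stateful-app
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep infinity"]
      volumeMounts:
        - name: state
          mountPath: /var/lib/app    # anything written here survives container restarts
  volumes:
    - name: state
      persistentVolumeClaim:
        claimName: app-state
```

For replicated stateful workloads, a StatefulSet with `volumeClaimTemplates` generates one such claim per replica.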
Configure Apache Spark with Kubernetes
Many people like to use k8s for its clustering and scaling capabilities. And many other people like to use Apache Spark for big data processing in a cluster.
In order to get the best of both worlds, a new experimental
resource manager has been added to
apache-spark (https://github.com/apache/spark/tree/master/resource-managers/kubernetes). Its scheduler looks as simple as spark-standalone’s, yet it provides resiliency at the executor level (rescheduling an executor onto another pod on failure).
Other alternatives from certain cloud providers exist. But these are not free and are at times loaded with features that are never used.
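A hedged sketch of submitting a job against that resource manager; the API-server address, registry, and image name are assumptions:

```shell
# --master k8s://... points spark-submit at the Kubernetes API server;
# the driver and executors then run as pods in the cluster.
./bin/spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=<registry>/spark:3.1.1 \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
```

The `local://` scheme tells Spark the jar is already baked into the container image rather than uploaded from the client.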
HDFS as a DFS deployed locally for development and testing.
The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It employs a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across highly scalable Hadoop clusters.
In this article we shall focus on a NameNode + DataNode single-node cluster setup.
In an ideal world, we should be using the “Single Node Setup Instructions” and should be able to deploy an HDFS cluster from https://hadoop.apache.org/docs/r2.7.7/hadoop-project-dist/hadoop-common/SingleCluster.html.
Except that it doesn’t work well with Docker. I hope…
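For reference, the pseudo-distributed configuration from that guide boils down to two small fragments:

```xml
<!-- core-site.xml: single-node setup -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>  <!-- the NameNode RPC endpoint -->
  </property>
</configuration>

<!-- hdfs-site.xml: one DataNode means a replication factor of 1 -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

It is the hard-coded `localhost` assumptions like this one that need rethinking once containers and their own network namespaces enter the picture.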