Setting up a Hadoop cluster on Windows using Docker and WSL2
I wanted to set up a Hadoop cluster as a playground on my Windows 10 laptop. I thought that using Docker with the new WSL2 (Windows Subsystem for Linux 2), included in Windows 10 version 2004, could be a solution. Indeed, Docker can use WSL2 to run Linux natively on Windows. I basically followed the tutorial How to set up a Hadoop cluster in Docker, which is designed for a Linux host machine running Docker (not Windows).
1. Install Docker on Windows
I’m currently using Docker Desktop version 2.3.0.3 from the stable channel, but any version that supports WSL2 should work. The corresponding Docker Engine version is 19.03.8 and the docker-compose version is 1.25.5.
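To compare your installation with these versions, a small helper along these lines prints both. The function name is my own, and it assumes a POSIX shell where the `docker` and `docker-compose` commands are available (for example a WSL2 distribution with Docker Desktop's WSL integration enabled).

```shell
# Hypothetical helper (name is illustrative): print the Docker Engine
# version reported by the daemon and the docker-compose version.
show_docker_versions() {
  docker version --format 'Engine: {{.Server.Version}}'
  printf 'docker-compose: '
  docker-compose version --short
}
```

Running `show_docker_versions` with the setup described above would be expected to report 19.03.8 and 1.25.5.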
You can confirm that Docker is running properly by launching a web server:
docker run -d -p 80:80 --name myserver nginx
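Once the container is started, opening http://localhost in a browser should show the nginx welcome page. The sketch below does the same check from a POSIX shell and then removes the test container so that port 80 is free again. It assumes `curl` is installed, and the function name is hypothetical.

```shell
# Hypothetical helper: check that the nginx test container answers on
# port 80, then stop and remove the container named "myserver".
check_and_remove_test_server() {
  curl -sI http://localhost | head -n 1   # e.g. an "HTTP/1.1 200 OK" status line
  docker stop myserver
  docker rm myserver
}
```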
2. Setting up a Hadoop cluster using Docker
Use git to download the Hadoop Docker files from the Big Data Europe repository:
git clone git@github.com:big-data-europe/docker-hadoop.git
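Cloning with the `git@github.com:` URL requires an SSH key registered with GitHub. If you don't have one, the HTTPS URL of the same repository works too. A minimal sketch for a POSIX shell, which also moves into the cloned directory where `docker-compose.yml` lives; the helper name is only illustrative.

```shell
# Hypothetical helper: clone the repository over HTTPS (no SSH key needed)
# and move into it, so that later docker-compose commands find the compose file.
clone_docker_hadoop() {
  git clone https://github.com/big-data-europe/docker-hadoop.git &&
    cd docker-hadoop
}
```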
From inside the cloned docker-hadoop directory, deploy the Docker cluster using the command:
docker-compose up -d
You can check that the containers are running using:
docker ps
You can also double-check in the Docker Dashboard.
The current status of the cluster can also be checked on the NameNode web UI at http://localhost:9870.
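The NameNode can take a little while to start after `docker-compose up -d`, so the page may not load immediately. A minimal polling sketch for a POSIX shell, assuming `curl` is available on the host; the function name, the 5-second interval and the default of 30 attempts are my own choices, not part of the Big Data Europe setup.

```shell
# Hypothetical helper: poll the NameNode web UI until it answers.
# Arguments (both optional): URL to poll, maximum number of attempts.
wait_for_namenode() {
  local url="${1:-http://localhost:9870}" max="${2:-30}" tries=0
  until curl -sf -o /dev/null "$url"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max" ]; then
      echo "NameNode UI not reachable at $url" >&2
      return 1
    fi
    sleep 5   # wait before the next attempt
  done
  echo "NameNode UI is up at $url"
}
```

Calling `wait_for_namenode` with no arguments polls http://localhost:9870 for up to about two and a half minutes.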
3. Testing the Hadoop cluster
We will test the Hadoop cluster by running the WordCount example.
- Open a terminal session on the namenode
docker exec -it namenode bash
This will open a session on the namenode for the root user.
- Create some simple text files to be used by the wordcount program
cd /tmp
mkdir input
echo "Hello World" >input/f1.txt
echo "Hello Docker" >input/f2.txt
- Create an HDFS directory named input
hadoop fs -mkdir -p input
- Copy the input files into the HDFS input directory (HDFS stores their blocks on the datanodes)
hdfs dfs -put ./input/* input
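To make sure the files actually landed in HDFS, you can list the directory from the same namenode session. Because the path is relative, HDFS resolves it against the home directory of the current user, which is `/user/root` here since the session runs as root. A small sketch (function name hypothetical):

```shell
# Hypothetical helper, to run inside the namenode container: list the HDFS
# input directory and print the content of the uploaded files.
check_hdfs_input() {
  hdfs dfs -ls input        # relative path: /user/<current user>/input
  hdfs dfs -cat 'input/*'   # quoted so that HDFS, not the local shell, expands the glob
}
```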
- Download the word count program from this link on the Windows host (e.g. into the parent directory of the Hadoop cluster directory)
- Run the command below in a terminal on the Windows host to identify the namenode container ID:
docker container ls
- Use the command below on the Windows host to copy the word count program into the namenode container (replace afb235f8629c with your own container ID, or simply use the container name namenode):
docker cp ../hadoop-mapreduce-examples-2.7.1-sources.jar afb235f8629c:/tmp
- Run the word count program in the namenode session (from /tmp, where the jar was copied):
hadoop jar hadoop-mapreduce-examples-2.7.1-sources.jar org.apache.hadoop.examples.WordCount input output
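The Hadoop distribution inside the container also ships its own compiled examples jar, so another option is to run the `wordcount` program from it without copying anything. This sketch assumes `HADOOP_HOME` is set in the namenode container and that the jar sits under `share/hadoop/mapreduce`, the standard layout of a Hadoop distribution; check the path in your image. It first deletes any previous `output` directory (and therefore the results of an earlier run), because MapReduce refuses to start a job whose output directory already exists. The function name is hypothetical.

```shell
# Hypothetical helper, to run inside the namenode container.
run_bundled_wordcount() {
  # Remove the output of a previous run; -f avoids an error if it is absent.
  hdfs dfs -rm -r -f output
  # Run the "wordcount" example from the jar bundled with Hadoop.
  hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount input output
}
```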
The program logs the progress of the MapReduce job in the terminal until it completes.
- Print the output of the word count program
hdfs dfs -cat output/part-r-00000
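With the two input files created earlier, the result is small enough to predict: `Hello` appears twice, `Docker` and `World` once each. To cross-check the HDFS output, the same counts can be computed locally with standard Unix tools. This is only an approximation of WordCount's tokenizer (it splits on whitespace), and the function name is hypothetical.

```shell
# Hypothetical helper: count words in local files with standard tools and print
# "word<TAB>count" lines, sorted in byte order like Hadoop's Text keys.
local_wordcount() {
  cat "$@" |
    tr -s '[:space:]' '\n' |   # one word per line
    grep -v '^$' |             # drop empty lines left by leading whitespace
    LC_ALL=C sort |            # byte order, so "Docker" < "Hello" < "World"
    uniq -c |                  # "<count> <word>"
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Run from /tmp in the namenode session, `local_wordcount input/*.txt` prints `Docker 1`, `Hello 2` and `World 1` (tab-separated, one per line), which should match the content of `output/part-r-00000`.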
- Shut down the Hadoop cluster by running the following on the Windows host, from the docker-hadoop directory:
docker-compose down
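`docker-compose down` removes the containers and the network but keeps named volumes. As far as I can tell, the Big Data Europe compose file stores the HDFS data in named volumes, so the `input` and `output` directories would still be there after the next `docker-compose up -d`. To start from a clean cluster instead, the `-v` flag also removes those volumes, which permanently deletes the HDFS data. A sketch for a POSIX shell, run from the docker-hadoop directory; the helper name is hypothetical.

```shell
# Hypothetical helper: stop the cluster and also delete its named volumes,
# wiping the HDFS data. Run from the docker-hadoop directory.
reset_hadoop_cluster() {
  docker-compose down -v
}
```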
That’s all!