Setting up a Hadoop cluster on Windows using Docker and WSL2

2 minute read

I wanted to set up a Hadoop cluster as a playground on my Windows 10 laptop. I thought that using Docker with the new WSL2 (Windows Subsystem for Linux version 2) included in Windows 10 version 2004 could be a solution: Docker can use WSL2 to run Linux natively on Windows. I basically followed the tutorial How to set up a Hadoop cluster in Docker, which is normally designed for a Linux host machine running Docker (and not Windows).

1. Install Docker on Windows

I’m currently using Docker Desktop version 2.3.0.3 from the stable channel, but any version that supports WSL2 should work. The corresponding engine version is 19.03.8 and the docker-compose version is 1.25.5:

Docker version
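The same information is available from a terminal if you prefer the command line:

docker version
docker-compose version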

You can confirm that Docker is running properly by launching a web server:

docker run -d -p 80:80 --name myserver nginx
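If the container started correctly, nginx should now answer on port 80. You can verify this and then remove the test container:

curl http://localhost
docker rm -f myserver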

2. Setting up the Hadoop cluster using Docker

Use git to download the Hadoop Docker files from the Big Data Europe repository:

git clone git@github.com:big-data-europe/docker-hadoop.git
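If you have no SSH key registered with GitHub, cloning over HTTPS works just as well:

git clone https://github.com/big-data-europe/docker-hadoop.git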

Move into the cloned repository and deploy the cluster using the commands:

cd docker-hadoop
docker-compose up -d

You can check that the containers are running using:

docker ps
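To keep the listing readable, you can restrict the output to names and statuses; every container of the cluster should report an Up status:

docker ps --format "table {{.Names}}\t{{.Status}}"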

You can also double-check with the Docker dashboard:

Docker Dashboard

The current status can also be checked on the namenode web interface at http://localhost:9870:

Hadoop Overview
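The same check works from a terminal too. A plain HTTP request to port 9870 should get an answer, and, assuming the stock Hadoop configuration of the image, the built-in JMX servlet returns the namenode status as JSON:

curl -s http://localhost:9870
curl -s "http://localhost:9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"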

3. Testing the Hadoop cluster

We will test the Hadoop cluster by running the Word Count example (a scripted version of all the steps is given after the list).

  • Open a terminal session on the namenode
    docker exec -it namenode bash
    

    This will open a session on the namenode for the root user.

  • Create some simple text files to be used by the wordcount program
    cd /tmp
    mkdir input
    echo "Hello World" >input/f1.txt
    echo "Hello Docker" >input/f2.txt
    
  • Create an HDFS directory named input
    hadoop fs -mkdir -p input
    
  • Put the input files on HDFS, where they will be replicated across the datanodes
    hdfs dfs -put ./input/* input
    
  • On the host PC, download the word count program from this link (e.g. into the directory just above the docker-hadoop directory)

  • Run the command below in a terminal on the Windows host to identify the namenode container id:
    docker container ls
    

    namenode id

  • Use the command below on the Windows host to copy the word count program into the namenode container:
    docker cp ../hadoop-mapreduce-examples-2.7.1-sources.jar afb235f8629c:/tmp
    
  • Run the word count program in the namenode:
    hadoop jar hadoop-mapreduce-examples-2.7.1-sources.jar org.apache.hadoop.examples.WordCount input output
    

    The program should display something like:

Hadoop Job

  • Print the output of the word count program
    hdfs dfs -cat output/part-r-00000
    

    Hadoop Output

  • Shut down the Hadoop cluster by running the following on the Windows host:
    docker-compose down
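
All the steps above can also be replayed from the host as a single script. This is a minimal sketch, not the original tutorial's method: it assumes a WSL2 (or Git Bash) shell on the host, a running cluster, a container reachable by the name namenode (docker cp accepts names as well as ids), and the example jar sitting one directory above docker-hadoop. The HDFS output directory is removed first so the job can be re-run:

#!/usr/bin/env bash
# Minimal sketch: run the word count example end to end from the host.
set -e

JAR=../hadoop-mapreduce-examples-2.7.1-sources.jar

# Create the input files and load them into HDFS inside the namenode
docker exec namenode bash -c '
  mkdir -p /tmp/input &&
  echo "Hello World"  > /tmp/input/f1.txt &&
  echo "Hello Docker" > /tmp/input/f2.txt &&
  hadoop fs -mkdir -p input &&
  hdfs dfs -put -f /tmp/input/* input'

# Copy the job jar into the container, addressing it by name
docker cp "$JAR" namenode:/tmp

# Remove any previous output, then launch the job
docker exec namenode bash -c '
  hdfs dfs -rm -r -f output;
  cd /tmp &&
  hadoop jar hadoop-mapreduce-examples-2.7.1-sources.jar \
    org.apache.hadoop.examples.WordCount input output'

# Print the result
docker exec namenode hdfs dfs -cat output/part-r-00000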
    

That’s all!