Would you like to learn how to install Apache Hadoop on Ubuntu Linux? In this tutorial, we are going to show you how to download and install Apache Hadoop on a computer running Ubuntu Linux.
• Ubuntu 18.04
• Ubuntu 19.04
• Ubuntu 19.10
• Apache Hadoop 3.1.3
• OpenJDK 11.0.4
Tutorial – Apache Hadoop Installation on Ubuntu Linux
Install the Java JDK package.
Use the following command to find the Java JDK installation directory.
This command output should show you the Java installation directory.
In our example, our Java JDK is installed under the folder: /usr/lib/jvm/java-11-openjdk-amd64
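One way to locate that directory, as a rough sketch: resolve the `java` binary found on the PATH to its real location and strip the trailing `/bin/java`. The helper name `java_home_from_bin` is made up for this example, and this is not necessarily the command the tutorial originally used.

```shell
# Hypothetical helper: turn the real path of the java binary into a JDK home
# directory by stripping the trailing "/bin/java".
java_home_from_bin() {
  printf '%s\n' "${1%/bin/java}"
}

# Resolve the java found on the PATH, following symlinks
# (for example /usr/bin/java -> /etc/alternatives/java -> the real binary).
if command -v java >/dev/null 2>&1; then
  java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

On a system like the one in this tutorial, the printed path should match the folder shown above.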
Now, you need to create an environment variable named JAVA_HOME.
Let’s create a file to automate the configuration of the required environment variables.
Here is the java.sh file content.
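A minimal sketch of such a file, using the JDK folder from our example. The location `/etc/profile.d/java.sh` is an assumption (the text only names the file `java.sh`); scripts placed in `/etc/profile.d` are read by login shells on Ubuntu.

```shell
# Assumed location: /etc/profile.d/java.sh
# Sets JAVA_HOME for every login shell; adjust the path to your JDK folder.
export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
```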
Reboot the computer.
Use the following command to verify if the JAVA_HOME variable was created.
Here is the command output:
Use the following command to test the Java installation.
Here is the command output:
Create a local user account named hadoop.
Here is the command output.
Take note of the Hadoop user password.
Use the su command to become the hadoop user.
Generate an SSH key for the hadoop user account.
Here is the command output.
As the hadoop user, add the hadoop user's public key to the list of authorized SSH keys.
You will need to enter the Hadoop user password.
Here is the command output.
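The two key steps above can be sketched as a small helper. Note that this is a swapped-in approach: it appends the public key to `authorized_keys` directly, so unlike the step described above it does not ask for the account password. The function name `setup_local_ssh_key`, the RSA key type, and the 4096-bit size are assumptions made for this example.

```shell
# Hypothetical helper: create a passphrase-less RSA key pair in the given
# directory (unless one exists) and authorize it for logins to the same account.
setup_local_ssh_key() {
  ssh_dir="$1"
  mkdir -p "$ssh_dir" || return 1
  chmod 700 "$ssh_dir" || return 1
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -q -t rsa -b 4096 -N "" -f "$ssh_dir/id_rsa" || return 1
  fi
  # Append the public key only if it is not already listed.
  if ! grep -qxF -e "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" 2>/dev/null; then
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}
```

Run as the hadoop user, a call such as `setup_local_ssh_key "$HOME/.ssh"` should leave the account able to open an SSH session to localhost without a password prompt.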
As the hadoop user, try to log in to localhost over SSH.
Log off from the hadoop user account and return to the root account.
Download the Hadoop package from the official website.
Install the Hadoop software on your Linux server.
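The download and install steps might look roughly like this. The version comes from the list at the top of this tutorial; the install folder `/usr/local/hadoop`, the use of the Apache archive server, and the helper name `install_hadoop` are assumptions for this sketch.

```shell
# Version used in this tutorial; archive.apache.org keeps older releases.
HADOOP_VERSION="3.1.3"
HADOOP_TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${HADOOP_TARBALL}"
# Assumption: install folder; pick another one if you prefer.
INSTALL_DIR="/usr/local/hadoop"

# Hypothetical helper, meant to be run as root. It downloads the package to
# /tmp, unpacks it, moves it to INSTALL_DIR and hands it to the hadoop user.
# The body runs in a subshell so the caller's working directory is unchanged.
install_hadoop() (
  cd /tmp &&
  wget "$HADOOP_URL" &&
  tar -xzf "$HADOOP_TARBALL" &&
  mv "hadoop-${HADOOP_VERSION}" "$INSTALL_DIR" &&
  chown -R hadoop:hadoop "$INSTALL_DIR"
)
```

Nothing is downloaded until `install_hadoop` is called.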
Now, you need to create the environment variables required by Apache Hadoop.
Let’s create a file to automate the configuration of the required environment variables.
Here is the hadoop.sh file content.
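A minimal sketch of what this file could contain. Both the location `/etc/profile.d/hadoop.sh` and the install folder `/usr/local/hadoop` are assumptions; change them to match your system.

```shell
# Assumed location: /etc/profile.d/hadoop.sh
# Assumption: Hadoop is unpacked under /usr/local/hadoop.
export HADOOP_HOME="/usr/local/hadoop"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_YARN_HOME="$HADOOP_HOME"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
# Make the hadoop and hdfs commands and the start/stop scripts available.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```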
You also need to set the JAVA_HOME environment variable in the hadoop-env.sh file.
Edit the hadoop-env.sh file.
Add the following line at the end of this file.
Reboot the computer.
Use the following command to verify if the Apache Hadoop environment variables were created.
Here is the command output:
Verify the Apache Hadoop version installed.
Here is the command output.
The Apache Hadoop software installation is now complete.
Tutorial – Apache Hadoop Configuration Example
In our example, we are going to configure a single-node Apache Hadoop cluster.
Edit the core-site.xml file.
Here is the original file, before our configuration.
Here is the new file with our configuration.
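As an illustration, a single-node core-site.xml usually only needs to point the default filesystem at the local NameNode. The sketch below writes such a file; the helper name `write_core_site` and port 9000 are assumptions, not values taken from this tutorial.

```shell
# Hypothetical helper: write a minimal core-site.xml into the given
# configuration directory, replacing any existing file.
write_core_site() {
  cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Default filesystem: the NameNode on this machine (port is an assumption). -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}
```

For example, `write_core_site /usr/local/hadoop/etc/hadoop`, if Hadoop is installed under that folder.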
Edit the hdfs-site.xml file.
Here is the original file, before our configuration.
Here is the new file with our configuration.
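For a single node, hdfs-site.xml typically sets the replication factor to 1 and tells HDFS where the NameNode and DataNode keep their data. The sketch below writes such a file; the helper name `write_hdfs_site` and whatever data folder you pass in are assumptions for this example.

```shell
# Hypothetical helper: write a minimal hdfs-site.xml.
#   $1 = Hadoop configuration directory
#   $2 = absolute path of the base folder holding namenode/ and datanode/
write_hdfs_site() {
  conf_dir="$1"
  data_dir="$2"
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- One copy of each block is enough on a single node. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://$data_dir/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$data_dir/datanode</value>
  </property>
</configuration>
EOF
}
```

For example, `write_hdfs_site /usr/local/hadoop/etc/hadoop /usr/local/hadoop/hadoopdata/hdfs`; the namenode and datanode folders must exist under the second path and be owned by the hadoop user.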
Edit the mapred-site.xml file.
Here is the original file, before our configuration.
Here is the new file with our configuration.
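A minimal mapred-site.xml for this kind of setup usually just tells MapReduce to run its jobs on YARN. The helper name `write_mapred_site` below is an assumption for this sketch.

```shell
# Hypothetical helper: write a minimal mapred-site.xml into the given
# configuration directory, replacing any existing file.
write_mapred_site() {
  cat > "$1/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Run MapReduce jobs on YARN instead of the local job runner. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}
```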
Edit the yarn-site.xml file.
Here is the original file, before our configuration.
Here is the new file with our configuration.
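For yarn-site.xml, a common minimal setting enables the shuffle service that MapReduce jobs need on each NodeManager. The helper name `write_yarn_site` below is an assumption for this sketch.

```shell
# Hypothetical helper: write a minimal yarn-site.xml into the given
# configuration directory, replacing any existing file.
write_yarn_site() {
  cat > "$1/yarn-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Auxiliary service used by MapReduce to shuffle map output to reducers. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}
```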
Create the required directories named namenode and datanode.
Use the following command to format the namenode.
Use the following commands to start your Apache Hadoop cluster.
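The format and start steps can be sketched as two small helpers, to be run as the hadoop user with the Hadoop `bin` and `sbin` folders on the PATH. The function names `format_namenode` and `start_cluster` are made up for this example; `hdfs`, `start-dfs.sh` and `start-yarn.sh` ship with Hadoop, and `jps` ships with the JDK.

```shell
# One-time step: initialise the NameNode metadata directory.
# Formatting an existing NameNode would discard its HDFS metadata.
format_namenode() {
  hdfs namenode -format
}

# Start the HDFS daemons, then the YARN daemons, then list the running
# Java processes so you can check that the daemons are up.
start_cluster() {
  start-dfs.sh && start-yarn.sh && jps
}
```

Call `format_namenode` once on a fresh installation, then `start_cluster` whenever the cluster needs to be brought up.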
Open a web browser and enter the IP address of your Apache Hadoop server followed by :9870.
In our example, the following URL was entered in the browser:
• http://192.168.15.10:9870
The Apache Hadoop web interface should be displayed.
Congratulations! You have finished the Apache Hadoop installation on Ubuntu Linux.