LVM integration with Hadoop for making shared storage elastic

Manali Jain
8 min read · Oct 30, 2020


🔅Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

HADOOP -

  • Big Data is a problem, and Hadoop is one solution to it.
  • Hadoop is an open-source framework written in Java.
  • Hadoop is built to store and process massive amounts of data.
  • It addresses the Volume and Velocity aspects of Big Data.
  • Hadoop uses a cluster of machines to handle these humongous amounts of data.

LVM (LOGICAL VOLUME MANAGEMENT)

  • LVM is an advanced, more flexible form of partitioning.
  • Partitions can be static or dynamic; LVM is the dynamic kind.
  • LVM can increase or decrease the size of an allocated volume at runtime.
  • LVM builds Logical Volumes on top of Volume Groups, which pool space from Physical Volumes.

ELASTICITY-

  • Elasticity means a volume can be increased or decreased whenever it is needed.
  • Companies rely on this concept because real-world applications need storage that can grow and shrink on demand.

Elasticity is the concept that lets us increase or decrease the storage of a Hadoop DataNode on the fly. In the real world, the storage a DataNode shares cannot stay static, so LVM is used to make it dynamic.

ADD TWO HARD DISKS-

  • Do this process only when the virtual machine is shut down.
  • Right-click on the VM name and click on Settings.
  • Click on “Add” and choose “Hard Disk”.
  • Select “NVMe” as the disk type.
  • Click on “Create a new virtual disk”.
  • Select the size of the hard disk according to your requirement.
  • Repeat the same process for the second hard disk, or for any number of hard disks your setup needs.
  • Here you can see “Hard Disk 3” and “Hard Disk 4”, each of size 1 GB.
  • The “ fdisk -l ” command lists the extra hard disks that we have just added.
fdisk -l
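
As a quick check, the commands below list the block devices; the NVMe device names used in this sketch (nvme0n3, nvme0n4) follow this particular setup and may differ on your machine.

# list all disks and partitions; the two new 1 GB disks should appear here
fdisk -l

# a compact tree view of the same devices
lsblk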

___________________________

CREATE PHYSICAL VOLUME

  • We need to convert the hard disks into physical volume format.
  • “ pvcreate /dev/nvme0n3 ” is the command to create a physical volume.
  • Create a physical volume for both hard disks with the same command; the exact commands are shown in the images below.
  • “ pvdisplay /dev/nvme0n3 ” is the command to check the details of a physical volume. Notice that “Allocatable” is “NO” for now, because the physical volume is not yet attached to any volume group.
PV Create
PV Create
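
A consolidated sketch of this step, assuming the new disks appear as /dev/nvme0n3 and /dev/nvme0n4 as in this setup:

# turn both new disks into LVM physical volumes
pvcreate /dev/nvme0n3
pvcreate /dev/nvme0n4

# verify; "Allocatable" stays NO until the PV joins a volume group
pvdisplay /dev/nvme0n3
pvdisplay /dev/nvme0n4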

__________________________

CREATE VOLUME GROUP

  • After creating the physical volumes, the next step is to create a Volume Group.
  • A volume group accumulates all the storage and makes it act like one hard disk. The user cannot tell whether a single hard disk or several sit underneath, because LVM handles the storing and fetching of data internally.
  • “ vgcreate volumegroup /dev/nvme0n3 /dev/nvme0n4 ” is the command to create a volume group named “volumegroup”.
  • Now “ pvdisplay /dev/nvme0n3 ” and “ pvdisplay /dev/nvme0n4 ” show “Allocatable” as “YES”, because the physical volumes are attached to the volume group.
VG Create
  • “ vgdisplay volumegroup ” is the command used to see all the details about the volume group.
VG Display
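
The same step as a copy-pasteable sketch, using “volumegroup” as the volume group name (any name works):

# pool both physical volumes into one volume group named "volumegroup"
vgcreate volumegroup /dev/nvme0n3 /dev/nvme0n4

# the PVs now show Allocatable = YES, and the VG reports the combined size
pvdisplay /dev/nvme0n3
vgdisplay volumegroup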

CREATE LOGICAL VOLUME-

  • After creating the Volume Group we can create partitions according to our needs.
  • “ lvcreate --size 1.5G --name mylv volumegroup ” is the command to create the logical volume (the size and name flags take double hyphens).
LV Create
  • “ lvdisplay volumegroup/mylv ” is the command that shows all the details of this logical volume.
  • Two different volume groups can each contain a logical volume with the same name.
lv display
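
A copy-pasteable sketch of the same step, carving a 1.5 GB logical volume named mylv out of the volume group:

# create a 1.5 GB logical volume named "mylv" inside "volumegroup"
lvcreate --size 1.5G --name mylv volumegroup

# inspect this one logical volume
lvdisplay volumegroup/mylv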

SHOW ALL THE LOGICAL VOLUMES OF MY OS-

  • “ lvdisplay ” (run without any arguments) is the command used to see the information about every logical volume on the OS.
LV display

FORMAT THE LOGICAL PARTITION-

  • The logical volume has been created; now format it before using it.
  • “ mkfs.ext4 /dev/volumegroup/mylv ” is the command used to format the partition. In this case, it is formatting the logical volume.
format
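
For reference, the exact command; LVM exposes the logical volume at /dev/&lt;volume group&gt;/&lt;lv name&gt;:

# put an ext4 filesystem on the new logical volume (this erases anything on it)
mkfs.ext4 /dev/volumegroup/mylv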

MOUNT -

  • After formatting, the partition should be mounted to a drive or directory.
  • Mount is temporary in nature.
  • First, create a directory so that the partition will be mounted to it.
  • “ mkdir /newpart ” is the command used to create a directory.
  • “ mount /dev/volumegroup/mylv /newpart ” is the command used to mount the partition on the directory.
mount
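
A minimal sketch of the mount step, with a quick check that the space is visible:

# create a mount point and mount the logical volume on it
mkdir /newpart
mount /dev/volumegroup/mylv /newpart

# confirm the mounted size
df -h /newpart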

REAL PLACE OF PARTITION-

  • A partition is a device, and LVM stores its device file in the “/dev/mapper” directory.
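
You can see the device-mapper entry (and the /dev/volumegroup/mylv symlink that points to it) like this:

# the logical volume shows up as volumegroup-mylv under /dev/mapper
ls -l /dev/mapper/
ls -l /dev/volumegroup/mylv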

EXTEND THE LOGICAL VOLUME -

  • “ lvextend --size +500M /dev/volumegroup/mylv ” is the command used to increase the size of the logical volume by 500 MB (the “+” extends the current size instead of setting it).
LV extend
  • Here you can see that the LV size increased from 1.5 GB to 1.99 GB.
LV Display
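
A sketch of the extend step; the “+” grows the logical volume on top of its current size:

# grow the logical volume by 500 MB (from 1.5 GB to ~2 GB here)
lvextend --size +500M /dev/volumegroup/mylv

# LV Size in the output now reads ~2 GB
lvdisplay volumegroup/mylv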

RESIZE THE FORMATTED HARDDISK-

  • If we used the “ mkfs.ext4 ” command here, it would format the whole logical volume, and we do not want to lose the existing data.
  • “ resize2fs /dev/volumegroup/mylv ” is the command that extends the existing filesystem into the newly added space, leaving the already formatted part (and its data) untouched.
resize
  • The highlighted part shows that the extra space is now formatted and usable, since the “ df -h ” command only reports space that is formatted and usable.
details
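
The corresponding sketch; resize2fs grows the ext4 filesystem online into the space just added to the logical volume:

# grow the ext4 filesystem to fill the extended logical volume (no data loss)
resize2fs /dev/volumegroup/mylv

# df -h now reports the larger usable size on /newpart
df -h /newpart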

Now that the LVM setup is complete, we need to integrate it with Hadoop.

  • A master node and a slave node are needed.
  • Check whether Hadoop and Java are installed on both systems.
  • “ java -version ” command will show the version of java.
java version
  • “ hadoop version ” command will show the version of Hadoop.
Hadoop version
  • All the Hadoop configuration files are present in this path.
all Hadoop files
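
A quick check to run on both the master and the slave (the exact versions shown in the screenshots are not assumed here):

# confirm Java and Hadoop are installed on this node
java -version
hadoop version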

MASTER NODE -

Open the following configuration files for editing (the vi editor is used here); an illustrative sketch of both files follows the screenshots below.

vi editor
  • hdfs-site.xml
hdfs-site.xml
  • core-site.xml
core-site.xml
  • Run the “jps” command to check if the namenode is running or not.
jps
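
Since the screenshots are not reproduced here, the following is only an illustrative Hadoop 1.x style configuration for the master; the config directory, the /nn metadata path, and the 9001 port are assumptions, not values taken from the article.

# illustrative only: the conf directory varies with the installation
cd /etc/hadoop        # or $HADOOP_HOME/conf

# hdfs-site.xml on the master: where the NameNode keeps its metadata
cat > hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <!-- /nn is an assumed example directory -->
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
EOF

# core-site.xml on the master: the HDFS address clients and DataNodes use
cat > core-site.xml <<'EOF'
<configuration>
  <property>
    <!-- the bind address and port are assumptions -->
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
EOF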

SLAVE NODE -

  • hdfs-site.xml
  • Here the path given is “ /newpart ”, the directory mounted on the 2 GB logical volume created through LVM (see the sketch after this list).
hdfs-site.xml
  • core-site.xml
core-site.xml
  • Run the “jps” command to check whether datanode is running or not.
jps
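
Likewise, an illustrative sketch of the slave's two files; dfs.data.dir points at the LVM-backed /newpart mount, while MASTER_IP and the 9001 port are placeholders/assumptions that must match the master's core-site.xml.

# illustrative only: run on the slave, in its Hadoop conf directory
cd /etc/hadoop        # or $HADOOP_HOME/conf

# hdfs-site.xml on the slave: the DataNode stores blocks on the LVM mount
cat > hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/newpart</value>
  </property>
</configuration>
EOF

# core-site.xml on the slave: point at the master's NameNode address
cat > core-site.xml <<'EOF'
<configuration>
  <property>
    <!-- replace MASTER_IP with the real master IP; the port is an assumption -->
    <name>fs.default.name</name>
    <value>hdfs://MASTER_IP:9001</value>
  </property>
</configuration>
EOF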

MASTER NODE -

  • Format the NameNode with the command “ hadoop namenode -format ”, as the main duty of the NameNode is to keep the metadata.
format namenode
  • Start the NameNode by using the command “ hadoop-daemon.sh start namenode ”.
start namenode
  • Run “jps” command.
jps
  • The “ hadoop dfsadmin -report ” command is used to check whether any DataNode is connected to the NameNode. The DataNode has not been started yet, so it is showing zero nodes.
NameNode CLI
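
The master-side steps in order, as a single sketch:

# one-time: format the NameNode metadata directory
hadoop namenode -format

# start the NameNode daemon and confirm it is running
hadoop-daemon.sh start namenode
jps

# no DataNode is up yet, so the report shows zero live nodes
hadoop dfsadmin -report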

SLAVE NODE-

  • Start the datanode by the command “ hadoop-daemon.sh start datanode ”.
Start Datanode
  • Run the “jps” command.
jps
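
The slave-side counterpart as a sketch:

# start the DataNode daemon on the slave and confirm it is running
hadoop-daemon.sh start datanode
jps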

SLAVE NODE -

  • Now the “ hadoop dfsadmin -report ” command shows that the DataNode is connected and is sharing the space we allocated to the /newpart directory.
Namenode CLI
  • The configured capacity of the cluster as seen in the master node's browser UI.
Namenode Browser

_____________________________

EXTEND THE LOGICAL VOLUME WITH HADOOP RUNNING-

  • In the image below the logical volume is extended and the filesystem is resized.
  • The “ df -h ” command shows that the size has increased to 2.5 GB.
LV Extended
  • On the master, the shared storage reported for the cluster has increased.
Namenode CLI
  • In the NameNode's browser UI, the configured capacity has increased.
Namenode Browser
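
Putting it together, growing the DataNode's storage on the fly looks like this sketch (the 500M increment is just an example); the cluster picks up the larger capacity without the DataNode being stopped or reformatted:

# grow the logical volume backing /newpart
lvextend --size +500M /dev/volumegroup/mylv

# grow the ext4 filesystem into the new space, online
resize2fs /dev/volumegroup/mylv

# verify locally, then verify what the cluster sees
df -h /newpart
hadoop dfsadmin -report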

Thank You For Giving Your Precious Time To My Article!!

Have a Great Day!!
