The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. In this journal entry, we will try to answer the question “How to find the size of a directory using Hadoop?”
Let’s look at how to list the contents of a directory in Hadoop and find its size.
List the Files
hadoop fs -ls /path/to/directory
This command lists the files and subdirectories under /path/to/directory in the Hadoop file system, along with details such as permissions, owner, group, size, and modification time.
Size of the Directory
hadoop fs -du -s -h /path/to/directory
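This command displays the total size of the directory given in the path. The -s option prints a single aggregate summary instead of one line per file, and -h prints sizes in human-readable units (for example, 64.0m instead of 67108864). If you want a breakdown by subdirectory, use the command below.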
hadoop fs -du -s -h /path/to/directory/*
This command shows the size of each file and subdirectory directly under the directory, that is, how the total size breaks down.
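If the goal is to spot the largest subdirectories, one possible approach is to drop -h so that sizes are printed as raw bytes and then sort on the first column. The sketch below assumes the size is the first whitespace-separated field of each line of du output; top_by_size is a hypothetical helper name, not part of Hadoop.

```shell
# Hypothetical helper (not a Hadoop command): reads du-style lines on stdin,
# sorts them by the first column (size in bytes) from largest to smallest,
# and keeps the first N lines (default 10).
top_by_size() {
  sort -k1,1nr | head -n "${1:-10}"
}

# Usage sketch, assuming a working Hadoop client on the PATH:
#   hadoop fs -du /path/to/directory | top_by_size 5
```

Sorting works reliably only on raw byte counts, which is why -h is left out here; human-readable values such as 1.2g and 900m do not sort correctly as plain numbers.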