Hadoop noob here.
I've searched for tutorials on getting started with Hadoop and Python without much success. I don't need to do any work with mappers and reducers yet; for now it's more of an access issue.
As part of a Hadoop cluster, there are a bunch of .dat files on HDFS.
In order to access those files from my client (local computer) using Python,
what do I need to have on my computer?
How do I query for filenames on HDFS?
Any links would be helpful too.
You should have login access to a node in the cluster. Let the cluster administrator pick the node, set up the account, and tell you how to access the node securely. If you are the administrator, let me know whether the cluster is local or remote; if remote, whether it is hosted on your computer, inside a corporation, or on a third-party cloud (and if so, whose), and I can provide more relevant information.
To query file names in HDFS, log in to a cluster node and run hadoop fs -ls [path]. The path is optional; if it is not provided, the files in your home directory are listed. If -R is given as an option, all files under the path are listed recursively. There are additional options for this command; for more information about it and the other Hadoop file system shell commands, see http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html.
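If you would rather invoke that command from Python on the node, here is a minimal sketch using the standard library's subprocess module. It assumes the hadoop binary is on your PATH, and /user/me/data is a hypothetical path; the parsing is a rough sketch that keeps the last whitespace-separated token of each listing line (so it would mishandle paths containing spaces):

    import subprocess

    def hdfs_ls(path=''):
        """List file paths under an HDFS path by shelling out to 'hadoop fs -ls'."""
        cmd = ['hadoop', 'fs', '-ls']
        if path:
            cmd.append(path)
        out = subprocess.check_output(cmd, universal_newlines=True)
        # Each listing line ends with the file path; skip the 'Found N items' header.
        return [line.rsplit(None, 1)[-1]
                for line in out.splitlines()
                if line and not line.startswith('Found')]

    print(hdfs_ls('/user/me/data'))  # hypothetical HDFS directory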
An easy way to query HDFS file names in Python is esutil.hdfs.ls(hdfs_url='', recurse=False, full=False), which executes hadoop fs -ls hdfs_url in a subprocess. The module also has functions for a number of other Hadoop file system shell commands (see the source at http://code.google.com/p/esutil/source/browse/trunk/esutil/hdfs.py). esutil can be installed with pip install esutil. It is on PyPI at https://pypi.python.org/pypi/esutil, its documentation is at http://code.google.com/p/esutil/, and its GitHub site is https://github.com/esheldon/esutil.
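For example, a quick sketch based on the signature above (assuming ls returns the listed paths as a list of strings, and again using /user/me/data as a hypothetical HDFS directory):

    import esutil

    # List the contents of a hypothetical HDFS directory.
    files = esutil.hdfs.ls('/user/me/data')

    # Keep only the .dat files mentioned in the question.
    dat_files = [f for f in files if f.endswith('.dat')]
    print(dat_files)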