Get a list of file names from HDFS using Python

Raaj

Hadoop noob here.

I've searched for tutorials on getting started with Hadoop and Python without much success. I do not need to do any work with mappers and reducers yet; this is more of an access issue.

As part of a Hadoop cluster, there are a bunch of .dat files on HDFS.

In order to access those files on my client (local computer) using Python,

what do I need to have on my computer?

How do I query for file names on HDFS?

Any links would be helpful too.

user4322779

You should have login access to a node in the cluster. Let the cluster administrator pick the node, set up the account, and tell you how to access the node securely. If you are the administrator, let me know whether the cluster is local or remote. If it is remote, is it hosted on your computer, inside a corporation, or on a third-party cloud (and if so, whose)? With that I can provide more relevant information.

To query file names in HDFS, log in to a cluster node and run hadoop fs -ls [path]. The path is optional; if it is not provided, the files in your home directory are listed. If -R is given as an option, all the files under path are listed recursively. There are additional options for this command. For more information about this and other Hadoop file system shell commands, see http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html.
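
If you would rather call the command from Python yourself, a minimal sketch follows. It assumes the hadoop command is on the PATH of the machine where the script runs (for example, the cluster node you log in to) and that the listing uses the usual eight-column layout with the path last. The function names parse_ls_output and list_hdfs are made up for this illustration and are not part of any library.

```python
import subprocess


def parse_ls_output(text):
    """Extract the paths from text printed by `hadoop fs -ls`.

    Assumes each entry has 8 whitespace-separated fields
    (permissions, replication, owner, group, size, date, time, path).
    """
    paths = []
    for line in text.splitlines():
        # Split at most 7 times so a path containing spaces stays whole.
        fields = line.split(None, 7)
        # Header lines such as "Found 2 items" have fewer fields; skip them.
        if len(fields) == 8:
            paths.append(fields[7])
    return paths


def list_hdfs(path="", recursive=False):
    """Run `hadoop fs -ls [-R] [path]` and return the listed paths.

    Hypothetical helper: requires the `hadoop` command on PATH.
    """
    cmd = ["hadoop", "fs", "-ls"]
    if recursive:
        cmd.append("-R")
    if path:
        cmd.append(path)
    result = subprocess.run(
        cmd,
        check=True,               # raise CalledProcessError if hadoop fails
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str
    )
    return parse_ls_output(result.stdout)


# Example (the directory /data is only a placeholder):
# dat_files = [p for p in list_hdfs("/data") if p.endswith(".dat")]
```

Keeping the parsing separate from the subprocess call means the parsing can be checked against a saved listing without a cluster at hand.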

An easy way to query HDFS file names in Python is to use esutil.hdfs.ls(hdfs_url='', recurse=False, full=False), which executes hadoop fs -ls hdfs_url in a subprocess. The same module has functions for a number of other Hadoop file system shell commands (see the source at http://code.google.com/p/esutil/source/browse/trunk/esutil/hdfs.py). esutil can be installed with pip install esutil. It is on PyPI at https://pypi.python.org/pypi/esutil, its documentation is at http://code.google.com/p/esutil/, and its GitHub site is https://github.com/esheldon/esutil.
