Creating file lists in the terminal and joining two files in a Python 3 script

kutlus

I have a directory called 'dir' that contains nested subdirectories. I write the names of the files in all subdirectories to a CSV file with the following command in a Linux terminal:

dir$ find . -type f -printf '%f\n' > old_names.csv

I then use detox to rename the files, and afterwards I make a second list with the same command:

dir$ find . -type f -printf '%f\n' > new_names.csv

I would like to join these two lists into a new list with two columns, something like this:

To do that, I read both CSV files into pandas data frames and join them on the index in a Python 3 script:

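 import os
 import pandas as pd

 # somepath is the directory holding the two CSV files.
 # header=None: the find output has no header row, so name the column explicitly.
 df_old=pd.read_csv(os.path.join(somepath,'old_names.csv'),header=None,names=['old_name'])
 df_new=pd.read_csv(os.path.join(somepath,'new_names.csv'),header=None,names=['new_name'])
 # join() matches rows on the index, i.e. it pairs the lines by position
 df_names=df_new.join(df_old)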

The problem is that I get wrong file pairs, something like this:

When I open new_names.csv, I see that the files are listed in a different order than in old_names.csv, so joining on the index produces wrong pairs. How can I solve this problem?

Michael Homer

The find command just outputs files in the order the filesystem returns its directory entries, without any sorting or processing. Depending on the filesystem you're using and other factors, renaming even a single file could change the iteration order, and renaming all of them is quite likely to do so. Without a tightly controlled environment, there's no particular reason that two find runs should give the same order.

For example, many modern filesystems store names in a hash table and iterate in the order the entries appear there. A name that changes only slightly may land much earlier or later in the table than the original, or even trigger re-hashing of the entire directory so that everything moves. There's no realistic way to put the pieces back together in that case.

It's possible that sorting the filenames might help, if they each have a unique unchanged prefix, but that's the only realistic sort of post-processing you could do and carry on with two separate files from two find runs. I don't recommend even trying that.
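To illustrate why sorting is fragile, here is a minimal Python sketch. The rename() function is a hypothetical stand-in for a detox-like rule (spaces become underscores), and the two filenames are made up for the example. It sorts the old and new lists independently, as two separate find runs followed by a sort would, and compares the resulting pairs with the true mapping.

```python
# Hypothetical rename rule standing in for detox: spaces become underscores.
def rename(name):
    return name.replace(" ", "_")

# Made-up filenames; only the first one is changed by rename().
old_names = ["a b.txt", "a-c.txt"]
new_names = [rename(n) for n in old_names]

# Sort each list on its own, then pair the entries by position.
pairs = list(zip(sorted(old_names), sorted(new_names)))

# The true old -> new mapping, for comparison.
expected = [(n, rename(n)) for n in sorted(old_names)]

print(pairs)
print(pairs == expected)
```

Here the space sorts before '-' but the underscore sorts after it, so the two sorted lists line up differently and the pairs are wrong even with only two files.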

However, detox does have a -v option that prints out the changes it is making (and -n to print out what it would do without doing it). You could use that output to produce your CSV file, either in the shell or directly from Python using subprocess.run.

detox -v ... | sed -e 's/ -> /,/' > names.csv

would produce a CSV file at least as good as one from your find commands, with the old and new names automatically matched up. To get only the basenames (as %f did), you'll need to postprocess the output, which you can do in Python if necessary, or in the shell.
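The Python route could look roughly like the sketch below. It assumes that detox prints each rename as "old -> new" (the format the sed command above relies on) and that -r makes it recurse into subdirectories; the function names parse_renames, write_pairs and detox_pairs and the column names are made up for this example. Lines without " -> " are skipped, and a filename that itself contains " -> " would be split in the wrong place.

```python
import csv
import os
import subprocess

def parse_renames(lines):
    """Turn 'old -> new' lines into (old_basename, new_basename) pairs."""
    pairs = []
    for line in lines:
        old, sep, new = line.rstrip("\n").partition(" -> ")
        if not sep:
            continue  # not a rename line, skip it
        pairs.append((os.path.basename(old), os.path.basename(new)))
    return pairs

def write_pairs(pairs, csv_path):
    """Write the pairs as a two-column CSV file with a header row."""
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["old_name", "new_name"])  # hypothetical column names
        writer.writerows(pairs)

def detox_pairs(directory):
    # Assumption: detox is installed; -r recurses, -v prints each rename,
    # -n makes it a dry run so nothing is actually renamed.
    result = subprocess.run(
        ["detox", "-r", "-v", "-n", directory],
        capture_output=True, text=True, check=True,
    )
    return parse_renames(result.stdout.splitlines())
```

Calling write_pairs(detox_pairs("dir"), "names.csv") would then produce the two-column list in one step. Writing through the csv module also quotes names that contain commas, which the plain sed substitution does not.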

Edited at 2021-07-18