I have a directory tree called 'dir'. I am writing the list of files from all its subdirectories to a CSV file with the following command in a Linux terminal:
dir$ find . -type f -printf '%f\n' > old_names.csv
I am using detox to change the filenames, and then I make a new list using:
dir$ find . -type f -printf '%f\n' > new_names.csv
I would like to join these two lists together into a new list with two columns, something like this:
To do that, I read both CSV files into pandas DataFrames and join them on the index, as follows, in a Python 3 script:
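import os
import pandas as pd

# Each CSV holds one filename per line with no header row
df_old = pd.read_csv(os.path.join(somepath, 'old_names.csv'), header=None, names=['old'])
df_new = pd.read_csv(os.path.join(somepath, 'new_names.csv'), header=None, names=['new'])
# join() aligns the two frames on their index (row 0 with row 0, and so on)
df_names = df_new.join(df_old)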
The problem is that I get wrong file pairs, something like this:
When I open new_names.csv, I see that its file list is written in a different order than old_names.csv, so joining on the index produces wrong pairs. How can I solve this problem?
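A minimal reproduction of the mismatch, using made-up filenames rather than the real lists: DataFrame.join() without an `on` argument aligns rows by index label, so with the default 0, 1, 2 index it pairs rows purely by position, whatever names they contain.

```python
import pandas as pd

# Hypothetical data: the same three files listed in two different orders,
# as two separate `find` runs might produce them.
df_old = pd.DataFrame({"old": ["My Report.txt", "a b.txt", "notes.txt"]})
df_new = pd.DataFrame({"new": ["notes.txt", "My_Report.txt", "a_b.txt"]})

# Index-based join: row 0 of df_new is paired with row 0 of df_old, etc.
df_names = df_new.join(df_old)
print(df_names)
```

Every row ends up pairing a new name with an unrelated old name, because nothing in the join looks at the names themselves.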
The find command just outputs names in the order the filesystem returns its directory entries, without any sorting or processing. Depending on the filesystem you're using and other factors, renaming even a single file could change the iteration order, and renaming all of them is quite likely to. Without a tightly controlled environment, there's no particular reason for two find runs to give the same order.
For example, many modern filesystems store names in a hash table and iterate in the order the entries appear there. A name with even a tiny change may land much earlier or later in the table than the original, or the change may trigger a re-hash of the entire directory so that everything moves. There's no realistic way to put the pieces back together in that case.
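To illustrate the hash-table point, here is a deliberately simplified toy model, not how any particular filesystem is implemented: each name goes into bucket `crc32(name) % n_buckets`, and iteration walks the buckets in order. The function name `bucket_order` and the space-to-underscore renaming rule are assumptions made for this sketch.

```python
import zlib


def bucket_order(names, n_buckets=8):
    """Return names in the order a toy hash-table directory would list them.

    Toy model only: the bucket is crc32(name) % n_buckets, buckets are visited
    in index order, and names within a bucket keep their insertion order."""
    buckets = [[] for _ in range(n_buckets)]
    for name in names:
        buckets[zlib.crc32(name.encode()) % n_buckets].append(name)
    return [name for bucket in buckets for name in bucket]


old = ["My Report.txt", "a b.txt", "notes.txt", "x y z.pdf"]
# Assumed renaming rule for illustration: spaces become underscores.
new = [name.replace(" ", "_") for name in old]

print(bucket_order(old))
print(bucket_order(new))
```

Because a renamed entry gets a new hash, its position in the second listing is unrelated to its position in the first, so the two printed orders need not line up.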
It's possible that sorting the filenames might help, if each one has a unique, unchanged prefix, but that's the only realistic kind of post-processing you could do while still working from two separate files produced by two find runs. I don't recommend even trying that.
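A small example of why sorting is fragile, with hypothetical names that share a prefix: assume the rename turns a space into an underscore (as detox's default rules typically do). Plain string sorting compares code points, and ' ' (32) < '-' (45) < '_' (95), so the substitution moves the renamed file past its neighbour.

```python
# Hypothetical names; assumed rename: "a b.txt" -> "a_b.txt".
old = ["a b.txt", "a-c.txt"]
new = ["a_b.txt", "a-c.txt"]

old_sorted = sorted(old)  # ['a b.txt', 'a-c.txt']
new_sorted = sorted(new)  # ['a-c.txt', 'a_b.txt']

# Pairing the sorted lists by position matches the wrong files.
pairs = list(zip(old_sorted, new_sorted))
print(pairs)
```

Both resulting pairs are wrong, even though each list contains exactly the right names.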
However, detox does have a -v option that prints the changes it makes (and -n to print what it would do without renaming anything). You could use that to produce your CSV file, either in the shell or directly from Python using subprocess.run.
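A minimal sketch of the subprocess.run route. It assumes detox -v writes one `old -> new` line per rename to stdout (the same format the sed command relies on); the helper names `run_detox`, `parse_detox_lines` and `write_pairs_csv` are hypothetical, and the demo parses made-up output so it runs without detox installed.

```python
import csv
import io
import subprocess


def run_detox(args):
    """Run `detox -v` with extra command-line arguments; return its stdout.

    Hypothetical helper: pass whatever you would give detox on the command
    line, and include "-n" for a dry run that renames nothing."""
    result = subprocess.run(["detox", "-v", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_detox_lines(text):
    """Split 'old -> new' lines into (old, new) tuples.

    Assumes the 'old -> new' output format. Lines without ' -> ' are skipped;
    a name that itself contains ' -> ' would be split at the wrong place."""
    pairs = []
    for line in text.splitlines():
        if " -> " in line:
            old, new = line.split(" -> ", 1)
            pairs.append((old, new))
    return pairs


def write_pairs_csv(pairs, fh):
    """Write an 'old,new' header, then one row per (old, new) pair.

    csv.writer quotes names that contain commas, which the sed version does not."""
    writer = csv.writer(fh)
    writer.writerow(["old", "new"])
    writer.writerows(pairs)


# Offline demo with made-up detox output instead of calling run_detox():
sample = "./My Report.txt -> ./My_Report.txt\n./a b.txt -> ./a_b.txt\n"
buf = io.StringIO()
write_pairs_csv(parse_detox_lines(sample), buf)
print(buf.getvalue())
```

To write a real file, open it with `newline=""` (as the csv module documentation recommends) and pass `parse_detox_lines(run_detox(...))` to `write_pairs_csv`, filling in the same arguments you would give detox in the shell.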
detox -v ... | sed -e 's/ -> /,/' > names.csv
would produce a CSV file at least as good as either of your find outputs, with the old and new names automatically matched up. For bare basenames (like the ones %f gave you), you'll need to post-process the paths, which you can do in Python if necessary, or in the shell.
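For the basename post-processing in Python, a short sketch: os.path.basename strips the directory part of each path, much as find's %f does. The helper name `basename_pairs` and the sample paths are made up for illustration.

```python
import os


def basename_pairs(pairs):
    """Reduce (old_path, new_path) pairs to bare filenames, like find's %f."""
    return [(os.path.basename(old), os.path.basename(new)) for old, new in pairs]


# Hypothetical paths, in the form a detox -v line might give them.
pairs = [("./sub dir/My Report.txt", "./sub_dir/My_Report.txt")]
print(basename_pairs(pairs))
```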