Efficiently remove file(s) from large .tgz

Aksel Willgert

Assume I have a gzip-compressed tarball compressedArchive.tgz (100+ files, totaling 5+ GB).

What would be the fastest way to remove all entries matching a given filename pattern, for example prefix*.jpg, and then store the remainder in a gzipped tarball again?

Replacing the old archive or creating a new one is not important, whichever is fastest.

Stéphane Chazelas

With GNU tar, you can do:

pigz -d < file.tgz |
  tar --delete --wildcards -f - '*/prefix*.jpg' |
  pigz > newfile.tgz

With bsdtar:

pigz -d < file.tgz |
  bsdtar -cf - --exclude='*/prefix*.jpg' @- |
  pigz > newfile.tgz

(pigz is a multi-threaded implementation of gzip.)
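
To check the result, one could count how many members of the new archive still match the pattern. This is only a minimal sketch: count_matches is a hypothetical helper name, it uses gzip -dc so that it also runs where pigz is not installed (pigz -d can be substituted), and it assumes member names contain no newline characters. The matching is done by the shell's case statement, where * also matches /.

```shell
# count_matches ARCHIVE PATTERN  (hypothetical helper, not part of tar)
# Prints the number of archive members whose name matches the shell PATTERN.
count_matches() {
  gzip -dc < "$1" |
    tar -tf - |
    while IFS= read -r name; do
      # $2 is left unquoted on purpose so it is treated as a pattern
      case $name in
        $2) printf '%s\n' "$name" ;;
      esac
    done |
    wc -l
}
```

For example, count_matches newfile.tgz '*/prefix*.jpg' should print 0 after a successful removal.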

You could overwrite the file in place like this:

{ pigz -d < file.tgz |
    tar --delete --wildcards -f - '*/prefix*.jpg' |
    pigz &&
    perl -e 'truncate STDOUT, tell STDOUT'
} 1<> file.tgz

But that's quite risky, especially if the result ends up being less compressed than the original file (in which case, the second pigz may end up overwriting areas of the file which the first one has not read yet).
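
If the extra disk space is available, a less risky variant is to write to a temporary file next to the archive and rename it over the original only once tar has succeeded. The following is a sketch under stated assumptions rather than a definitive implementation: strip_tgz is a hypothetical helper name, GNU tar is required for --delete, and gzip stands in for pigz so that it runs where pigz is not installed. The exit status of a pipeline only reflects its last command, so tar's status is passed out through file descriptor 3.

```shell
# strip_tgz ARCHIVE PATTERN  (hypothetical helper; requires GNU tar)
# Removes members matching PATTERN, replacing ARCHIVE only on success.
strip_tgz() {
  archive=$1
  pattern=$2
  # Temporary file in the same directory, so that mv is a plain rename
  tmp=$(mktemp "$archive.XXXXXX") || return 1

  # Capture tar's exit status via fd 3; the data itself goes to $tmp
  tar_status=$(
    {
      gzip -dc < "$archive" |
        { tar --delete --wildcards -f - "$pattern"; echo "$?" >&3; } |
        gzip > "$tmp"
    } 3>&1
  )

  # Replace the original only if tar succeeded and the output is valid gzip
  if [ "$tar_status" = 0 ] && gzip -t "$tmp"; then
    mv -- "$tmp" "$archive"
  else
    rm -f -- "$tmp"
    return 1
  fi
}
```

Usage would be strip_tgz file.tgz '*/prefix*.jpg'. Note that GNU tar reports an error when no member matches the pattern, in which case this sketch leaves the original archive untouched.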
