Trouble trimming whitespace when piping grep to awk

user1764386

I am trying to write a simple wrapper for grep in order to put its output in a more readable format. This includes putting the matched string (which occurs after the second colon) on a new line, and trimming any leading whitespace/tabs from the matched string.

So instead of doing the following:

$ grep -rnIH --color=always "grape" .

./apple.config:1:   Did you know that grapes are tasty?

I would like to be able to get this:

$ grep -rnIH --color=always "grape" . | other-command

./apple.config:1:   
Did you know that grapes are tasty?

I have tried several ways of doing this, including sed, awk on its own, shell substitution, and Perl. One important thing to keep in mind is that I want to trim leading space from $3, but $3 may not contain the entire matched string (for example, if the matched string contains a URL with ":" characters).

So far I have gotten to the point that I have the following.

$ grep -rnIH --color=always "grape" . | \
      awk -F ":" '{gsub(/^[ \t]+/, "", $3); out=""; for(i=4;i<=NF;i++){out=out$i}; print $1":"$2"\n"$3out}'

./apple.config:1:   
    Did you know that grapes are tasty?

The gsub is intended to trim whitespace/tabs from the start of whatever occurs right after the second colon. Then the for loop is intended to build a variable made up of anything else in the matched string that may have gotten split by the field separator ":".

I greatly appreciate any help in getting the leading whitespace to be trimmed properly.

user1764386

I ended up using a combination of grep, awk, and sed to solve my problem and produce the desired output format. I wanted to keep the coloured output that grep provides when the "--color=always" option is used, which initially steered me away from using awk to perform the file contents matching.

The tricky bit was that the coloured grep output put colour codes in unexpected locations. A line that appeared to begin with whitespace in fact began with a colour code, so a pattern anchored at the start of the line never matched the whitespace. The second tricky part was ensuring that matched strings containing the awk field separator (":" in my case) were reproduced properly.
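To make the first point concrete, here is a small check on a hand-built sample line. The sample is an assumption: it imitates the shape coloured GNU grep output usually has with default colours (each colour change written as ESC[...m followed by ESC[K), and field3_starts_with is a helper name made up for this illustration. Your own grep output may differ.

```shell
# ESC byte, produced with printf so this script contains only printable text.
esc=$(printf '\033')

# Hypothetical sample shaped like coloured GNU grep output (assumed default
# colours). Real output depends on the grep version and GREP_COLORS.
sample="${esc}[35m${esc}[K./apple.config${esc}[m${esc}[K${esc}[36m${esc}[K:${esc}[m${esc}[K${esc}[32m${esc}[K1${esc}[m${esc}[K${esc}[36m${esc}[K:${esc}[m${esc}[K   Did you know that ${esc}[01;31m${esc}[Kgrape${esc}[m${esc}[Ks are tasty?"

# Report what kind of character field 3 starts with when splitting on ":".
field3_starts_with() {
    printf '%s\n' "$1" | awk -F ':' -v esc="$esc" '{
        c = substr($3, 1, 1)
        if (c == esc)                   print "escape"
        else if (c == " " || c == "\t") print "whitespace"
        else                            print "other"
    }'
}

field3_starts_with "$sample"
field3_starts_with "./apple.config:1:   Did you know that grapes are tasty?"
```

For the coloured sample this prints "escape", and for the plain line it prints "whitespace", which is why gsub(/^[ \t]+/, "", $3) only works on uncoloured input.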

I made the following bash wrapper function finds() in order to recursively search file contents in a directory quickly.

#--------------------------------------------------------------#
# Search for files whose contents contain a given string.      #
#                                                              #
# Param1: Substring to recursively search for in file contents.#
# Param2: Directory in which to search for files. [optional].  #
# Return: 0 on success, 1 on failure.                          #
#--------------------------------------------------------------#
finds() {
    # Error if:
    # - Zero or more than two arguments were provided.
    # - The first argument contains an empty string.
    if [[ ( $# -eq 0  ) || ( $# -gt 2  ) || ( -z "$1" ) ]]
    then
        echo "About: Search for files whose contents contain a given string."
        echo "Usage: $FUNCNAME string [path-to-dir]"
        echo "* string     : string to recursively search for in file contents"
        echo "* path-to-dir: directory in which to search files. [OPTIONAL]"

        return 1 # Failure
    fi

    # (r)ecursively search, show line (n)umbers.
    # (I)gnore binaries, s(H)ow filenames.
    grep_flags="-rnIH"

    if [ $# -eq 1 ]; then # No directory given; search from current directory.
        rootdir="."
    else # Search from specified directory.
        rootdir="$2"
    fi

    # The default color code, with brackets
    # escaped by backslashes.
    def_color="\[m\[K"

    # Quote "$rootdir" so directory names containing spaces still work.
    grep $grep_flags --color=always "$1" "$rootdir" |
    awk '
    BEGIN {
        FS = ":"
    }
    {
        print $1":"$2
        out = $3
        for(i=4; i<=NF; i++) {
            out=out":"$i
        }
        print out
    }' |
    sed -e "s/$def_color\s*/$def_color/"

    return 0 # Success
}

The pipeline has three stages:

  1. grep recursively looks for matching strings in the contents of the files under the specified directory.
  2. awk prints "filename:linenumber", then builds a variable holding the remaining fields, joined by the field separator character ":". This recombines the rest of the matched string in case it was divided by the initial split (e.g. URLs containing "http://").
  3. sed trims leading whitespace/tabs from the output lines. It matches the default color code followed by any amount of whitespace and replaces it with the color code alone, without the trailing whitespace.
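The awk and sed stages can also be tried on their own, without running grep. The sketch below is a variation and not the code from finds(): instead of a hard-coded def_color it lets sed keep any run of leading ESC[...m / ESC[...K sequences and drop the whitespace after it. The function names, the sample line, and the assumption that the colour codes have that shape are mine.

```shell
esc=$(printf '\033')

# Stage 2 as described above: "filename:line" on one line, then the rest of
# the match with its colons restored on the next line.
split_match() {
    awk 'BEGIN { FS = ":" }
    {
        print $1 ":" $2
        out = $3
        for (i = 4; i <= NF; i++) out = out ":" $i
        print out
    }'
}

# Variation on stage 3: keep any leading colour sequences (group 1) and
# remove the whitespace that follows them. Also trims uncoloured lines.
trim_after_colour() {
    sed -E "s/^((${esc}\[[0-9;]*[mK])*)[[:space:]]+/\1/"
}

# For inspection only: remove the colour sequences entirely.
strip_colour() {
    sed -E "s/${esc}\[[0-9;]*[mK]//g"
}

# Hypothetical coloured line (assumed GNU grep default colour layout).
sample="${esc}[35m${esc}[K./apple.config${esc}[m${esc}[K${esc}[36m${esc}[K:${esc}[m${esc}[K${esc}[32m${esc}[K1${esc}[m${esc}[K${esc}[36m${esc}[K:${esc}[m${esc}[K   Did you know that ${esc}[01;31m${esc}[Kgrape${esc}[m${esc}[Ks are tasty?"

printf '%s\n' "$sample" | split_match | trim_after_colour | strip_colour
printf '%s\n' './cheese1.txt:8:      cheese at http://cheeseisgood.com' |
    split_match | trim_after_colour
```

Dropping strip_colour from the first pipeline keeps the colours in the terminal. The second pipeline shows that a URL split on ":" is put back together before the whitespace is trimmed.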

Setting the correct value of def_color

I am unable to display the correct value of def_color in the above codebox (the \[m\[K shown above in the code is not correct). To get the correct ANSI escape sequence to use for this variable:

  1. Redirect the output of grep --color=always to a text file.

  2. Copy and paste the highlighted sequence below as the value of def_color in the finds() function above.

  3. Add a "\" escape character before each bracket.

Code to write colored grep output to a text file:

$ cd orange_test/
$ cat orange1.txt
I like to eat oranges.
$ grep -r --color=always "orange" . > ./grep_out.txt

[Image: grep_out.txt, with the escape sequence highlighted]
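If copying the raw escape character out of a text file is awkward, the value can also be built with printf. This is only a sketch under an assumption: that the sequence in your grep_out.txt is ESC [ m ESC [ K, which is what the \[m\[K fragment above suggests. Running cat -v on the file shows each ESC byte as ^[, so you can compare. The helper name trim_with_def_color is made up here, and [[:space:]] is used in place of \s.

```shell
esc=$(printf '\033')

# Assumed sequence: ESC [ m ESC [ K, with a "\" before each bracket as in
# step 3. Confirm it against your own grep_out.txt before relying on it.
def_color="${esc}\[m${esc}\[K"

# cat -v renders the ESC byte as ^[ so the sequence becomes readable.
printf '%s\n' "${esc}[m${esc}[K   I like to eat cheese." | cat -v

# The same kind of substitution finds() performs with def_color.
trim_with_def_color() {
    sed -e "s/${def_color}[[:space:]]*/${def_color}/"
}

printf '%s\n' "${esc}[m${esc}[K   I like to eat cheese." |
    trim_with_def_color | cat -v
```

The first command shows the line with its leading spaces still in place after the ^[[m^[[K prefix. The second shows the same prefix immediately followed by the text.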

Using the function

The following shows the output produced by the function. Note that you can also specify a directory path in the second parameter.

cheese_test/cheese1.txt

I like to eat cheese.

    Do you all like cheese?

   I like
when the cheese is
on my pizza.

you can find out more about
      cheese at http://cheeseisgood.com

cheesestick

[Image: grep_out2, output of the function]
