I am trying to remove a three-letter code from the end of a sequence with sed (i.e. replace it with nothing), but it does not work when I combine multiple regex patterns. Here are some example sequences:
GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG
GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA
GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA
When I apply each regex individually with sed, it works:
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG" | sed 's/TAG$//'
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA" | sed 's/TAA$//'
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA" | sed 's/TGA$//'
However, when I combine them into a single regex, it doesn't work:
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG" |
sed 's/(TAG$|TAA$|TGA$)//'
Could somebody point out what I am doing wrong?
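You need to switch sed to extended regular expressions. In the default basic syntax, unescaped ( ) and | are literal characters, so your pattern only matches that exact text. The $ anchor can also be written once, after the group (GNU sed):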
sed -r 's/(TAG|TAA|TGA)$//'
Or, on OS X (BSD sed), use -E (recent GNU sed accepts -E as well):
sed -E 's/(TAG|TAA|TGA)$//'
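A minimal sketch applying the extended-regex version to all three sequences from the question at once, assuming a sed that accepts -E (BSD/macOS sed, or GNU sed 4.2 and later). The function name strip_end_codon is hypothetical, introduced only for this example:

```shell
#!/bin/sh
# Hypothetical helper: reads sequences on stdin, one per line, and removes
# a trailing TAG, TAA or TGA. -E turns on extended regex, so ( ) and |
# act as grouping and alternation.
strip_end_codon() {
    sed -E 's/(TAG|TAA|TGA)$//'
}

# Each line should come out as ...CTATTCAT
printf '%s\n' \
    GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG \
    GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA \
    GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA |
    strip_end_codon
```

Because of the $ anchor, the earlier occurrences of TAG inside the sequence (for example in CCTAG) are left alone; only the final three letters are removed.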
Or stay with basic regex and escape the grouping and alternation operators. This relies on GNU sed's \| extension, so it does not work with the BSD sed on OS X:
sed 's/\(TAG\|TAA\|TGA\)$//'
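A sketch contrasting the GNU basic-regex form above with one possible strictly POSIX alternative (an assumption on my part, not from the original answer) for a sed that has neither -E/-r nor \|. The POSIX variant tries each suffix in turn and uses sed's t command, which branches to the end of the script when the previous substitution succeeded, so at most one codon is removed. Both function names are hypothetical:

```shell
#!/bin/sh
# GNU basic regex: \( \) group and \| alternates (a GNU extension).
strip_end_codon_gnu_bre() {
    sed 's/\(TAG\|TAA\|TGA\)$//'
}

# POSIX-only variant: each -e adds one script line. A bare "t" jumps to
# the end of the script if the preceding s/// matched, so a line ending
# in ...TAATAG loses only the final TAG, matching the single-regex version.
strip_end_codon_posix() {
    sed -e 's/TAG$//' -e t -e 's/TAA$//' -e t -e 's/TGA$//'
}

echo GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA | strip_end_codon_gnu_bre
echo GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA | strip_end_codon_posix
```

Without the t commands, three chained substitutions could strip two codons from a line such as ACGTAATAG, which the alternation-based versions never do.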