s/// returns out of place newline


I'm trying to use Perl to reorder the content of an md5 file. For each line, I want the filename without the path then the hash. The best command I've come up with is:

$ perl -pe 's|^([[:alnum:]]+).*?([^/]+)$|$2 $1|' DCIM.md5 

The input file (DCIM.md5) is produced by md5sum on Linux. It looks like this:

e26ff03dc1bac80226e200c0c63d17a2  ./Path1/IMG_20150201_160548.jpg
01f92572e4c6f2ea42bd904497e4f939  ./Path 2/IMG_20150204_190528.jpg
afce027c977944188b4f97c5dd1bd101  ./Path3/Path 4/IMG_20151011_193008.jpg

  1. The hash is matched by the first group ([[:alnum:]]+) in the regular expression.
  2. Then the spaces and the path to the file are matched by .*?.
  3. Then the filename is matched by ([^/]+).
  4. The expression is anchored with ^ (apparently unnecessary here) and $. Without the $, the expression does not output what I expect.
  5. I use | rather than / as the delimiter to avoid escaping the slashes in file paths.

That command returns:

IMG_20150201_160548.jpg
 e26ff03dc1bac80226e200c0c63d17a2IMG_20150204_190528.jpg
 01f92572e4c6f2ea42bd904497e4f939IMG_20151011_193008.jpg
 afce027c977944188b4f97c5dd1bd101IMG_20151011_195133.jpg

The matching is correct, the output sequence is correct (filename without path then hash) but the spacing is not: there's a newline after the filename. I expect it after the hash, like this:

IMG_20150201_160548.jpg e26ff03dc1bac80226e200c0c63d17a2
IMG_20150204_190528.jpg 01f92572e4c6f2ea42bd904497e4f939
IMG_20151011_193008.jpg afce027c977944188b4f97c5dd1bd101

It seems to me that my command outputs the newline character, but I don't know how to change this behavior. Or possibly the problem comes from the shell, not the command?

Finally, some version information:

$ perl -version
This is perl 5, version 22, subversion 1 (v5.22.1) built for i686-linux-gnu-thread-multi-64int
(with 69 registered patches, see perl -V for more detail)

 


[^/]+ matches newlines, so the newline at the end of each input line becomes part of $2, which is placed first in your transformed $_. And since there's no newline in $1, there's no newline at the end of $_, so the next line's output runs straight on after the hash.

Solution: Read up on the -l option from perlrun. In particular:

-l[octnum] enables automatic line-ending processing. It has two separate effects. First, it automatically chomps $/ (the input record separator) when used with -n or -p. Second, it assigns $\ (the output record separator) to have the value of octnum so that any print statements will have that separator added back on. If octnum is omitted, sets $\ to the current value of $/ .
