selecting ranges from a file with the perl .. operator
Posted on December 9th, 2004 by jud
Mike Schilli showed me a cool perl operator today that I haven’t ever used. Often I want to get the lines from a file that fall between two specific lines. For example, given a file like:
blah
blah
START
THIS IS VERY IMPORTANT
END
blah
blah
I would like to easily extract everything between (and including) START and END. Using a perl one-liner and the .. operator, this is very easy.
$ perl -ne'print $_ if /START/ .. /END/'
START
THIS IS VERY IMPORTANT
END
The .. operator (and its cousin, the ... operator) is a flip-flop operator. In scalar context, it is false until the first expression becomes true, and then it stays true until the second expression becomes true, including the evaluation where that happens. In this case, nothing is printed until the current line matches /START/, and printing then continues through the line that matches /END/. I’m using the implicit variable $_ (the current line) to make this more succinct.
See perldoc perlop for more information.
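For anyone who prefers a script to a one-liner, here is a minimal sketch of the same idea. The helper name extract_range and the in-memory sample lines are made up for illustration; the flip-flop expression itself is the one from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines from the first /START/ match
# through the next /END/ match, inclusive.
sub extract_range {
    my @kept;
    for (@_) {
        # False until /START/ matches, then true up to and including
        # the line where /END/ matches.
        push @kept, $_ if /START/ .. /END/;
    }
    return @kept;
}

# Sample data standing in for the file shown earlier.
my @file = ("blah", "blah", "START", "THIS IS VERY IMPORTANT", "END", "blah", "blah");
print "$_\n" for extract_range(@file);
```

One thing to keep in mind: the flip-flop remembers its state between evaluations, so if the input ends before /END/ is seen, a later call to this helper would start out "inside" the range. The two operators also differ slightly: .. tests the second expression on the same line that turned it on, while ... waits until the next line.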
Here’s a more complex example that excludes the boundary lines by making use of a hackish perlism:
When true, the .. operator returns a sequence number (starting with 1). However, on the last item (when the second expression becomes true), perl returns not simply a number like 24 but a string like 24E0. This is 24 in scientific notation (24 * 10^0 = 24), so it evaluates to the correct value when treated as a number, but it can be identified with a regular expression in string context. So in this case, we skip the last line by adding the condition ($i !~ /E0$/) to our expression, and we skip the first line by adding the condition ($i > 1).
$ perl -ne'print $_ if ($i = /START/ .. /END/) and ($i > 1) and ($i !~ /E0$/)'
THIS IS VERY IMPORTANT
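To see the sequence numbers for yourself, the following sketch records what the flip-flop returns for each line and then uses those values to drop the boundary lines. The helper names flip_flop_values and inner_lines and the sample data are assumptions for illustration, not part of the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: pairs each line with the value the flip-flop returned.
sub flip_flop_values {
    my @pairs;
    for (@_) {
        my $i = /START/ .. /END/;   # scalar assignment, so .. acts as a flip-flop
        push @pairs, [$_, "$i"];
    }
    return @pairs;
}

# Hypothetical helper: the lines strictly between START and END.
sub inner_lines {
    my @kept;
    for (@_) {
        my $i = /START/ .. /END/;
        next unless $i;             # empty string: outside the range
        next if $i == 1;            # first line of the range (START)
        next if $i =~ /E0$/;        # last line of the range (END)
        push @kept, $_;
    }
    return @kept;
}

my @file = ("blah", "START", "THIS IS VERY IMPORTANT", "END", "blah");
printf "%-24s '%s'\n", @$_ for flip_flop_values(@file);
print "$_\n" for inner_lines(@file);
```

Outside the range the operator returns an empty string, which is why the sketch checks for truth before comparing numerically; under warnings, comparing an empty string with > or == would complain.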
Just to confuse matters, the .. operator builds a range when called in a list context, e.g.
@alphabet = ( 'A' .. 'Z');
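A few more list-context ranges, as a quick sketch (the variable names are just for illustration). Numeric endpoints count up by one, and string endpoints use perl's string increment:

```perl
use strict;
use warnings;

my @alphabet = ('A' .. 'Z');     # 26 uppercase letters
my @digits   = (1 .. 5);         # 1 2 3 4 5
my @ids      = ('aa' .. 'ad');   # aa ab ac ad

print scalar(@alphabet), "\n";
print "@digits\n";
print "@ids\n";
```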
Finally, remember to use a regular expression that is as specific as necessary, like /^START$/ instead of /START/, so you don’t match strings like RESTART.
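Here is a small sketch of the difference, using made-up sample lines and two hypothetical helpers, loose_range and strict_range. The unanchored patterns start the range at RESTART, while the anchored ones wait for a line that is exactly START.

```perl
use strict;
use warnings;

# Hypothetical helper: unanchored patterns, so RESTART also opens the range.
sub loose_range {
    my @kept;
    for (@_) { push @kept, $_ if /START/ .. /END/ }
    return @kept;
}

# Hypothetical helper: anchored patterns match only whole-line markers.
sub strict_range {
    my @kept;
    for (@_) { push @kept, $_ if /^START$/ .. /^END$/ }
    return @kept;
}

my @lines = ("RESTART", "START", "payload", "END", "WEEKEND");
print "loose:  @{[ loose_range(@lines) ]}\n";
print "strict: @{[ strict_range(@lines) ]}\n";
```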