perl6 Regex match conjunction &&


The Perl6 regex match conjunction && returns True only if all parts of the conjunction match the same substring, rather than each part matching somewhere in the whole string:

> my $a="123abc456def";
123abc456def
> so $a ~~ m/ 23 && ef /
False

It is False because "23" in the conjunction matched the substring "23" in $a, but that substring does not match "ef" in the conjunction. This is a little counter-intuitive because it is easier to interpret $a~~m/23&&ef/ as "$a matches 23 and $a matches ef" than as "$a has a substring that matches 23 and this substring also matches ef".

If I have n regexes and I want to check that each of them matches somewhere in the same whole string, rather than all of them matching the same substring of it, what is the best way to write the perl6 expression?

In the example, I really mean to do

so (($a ~~ /23/) && ($a ~~ /ef/)) 

If the number of regexes is large, then the above is harder to write except with a loop:

so (gather {for @myRegexes { take $a ~~ / $_ /; } }).all 

Is there a simpler way?

With alternations, it is much easier to read as "$a matches 23 or $a matches ef" rather than "the part of $a that matches 23 or matches ef":

> so $a ~~ m/ 23 || ef /
True

Thanks!



A solution focusing on simplicity, not speed

Ignoring regexes for a moment, the generic P6 construct for shortening "foo op bar and foo op baz" is foo op bar & baz, provided op is pure in the sense that it's OK to run multiple calls to it in parallel.

(The main language's & operator is a Junction operator. Junctions are conjunctions with two key characteristics; one is their syntactic brevity/simplicity/clarity; the other is their parallel processing semantics.)
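To make the foo op bar & baz shape concrete before bringing regexes back in, here is a minimal sketch using a numeric comparison as the op. The variable $n and the values are made up for illustration; nothing here is specific to matching.

```raku
my $n = 5;

# $n > 3 & 4 reads as "$n > 3 and $n > 4"; `so` collapses the junction to a Bool.
say so $n > 3 & 4;   # True
say so $n > 3 & 7;   # False, because 5 > 7 fails
```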

Applying this to the ~~ op in your regex match:

my $a="123abc456def";
say so $a ~~ / 23 / & / ef /

The above is often suitable provided the bar & baz & ... fits nicely in a single line.

An alternative that still uses junctional logic but skips the infix operator between operands and scales better to larger lists of patterns to match is something like:

my @keywords = <12 de>;
say so all ( $a.match: / $_ / for @keywords ) ;

(with thanks to @lisprogtor for spotting and patiently explaining the bug in my original code for this bit.)
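If the list of patterns is built elsewhere, the same junctional idea can be wrapped in a small routine. The following is only a sketch: matches-all is a hypothetical name, not something from the core language, and it assumes the patterns are Regex objects (or plain strings, which .match treats literally). Passing each pattern to .match through a named parameter also sidesteps any question of what $_ refers to inside a smartmatch, since ~~ aliases its left-hand side to $_.

```raku
# Hypothetical helper: True only if every pattern matches somewhere in $str.
sub matches-all(Str $str, *@patterns --> Bool) {
    # .match returns a Match (truthy) or Nil (falsy); all(...) requires every one to be truthy.
    so all(@patterns.map(-> $p { $str.match($p) }));
}

my $a = "123abc456def";
say matches-all($a, /23/, /ef/);    # True
say matches-all($a, /23/, /xyz/);   # False
```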

Solutions focusing on speed, not simplicity

There will be many ways to optimize for speed. I'll provide just one.

If all or most of your patterns are just strings rather than regexes, then use the .contains method rather than regexes for the strings:

say so all ( $a.contains: $_ for <23 ef> ) ; 
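When the list mixes plain strings with a few real regexes, one possible arrangement is to pick the method per pattern. This is a sketch under that assumption, with contains-all as a made-up name; whether it is actually faster for a given workload would need measuring.

```raku
# Hypothetical helper: strings go through .contains, Regex objects through .match.
sub contains-all(Str $str, *@patterns --> Bool) {
    so all(@patterns.map(-> $p {
        $p ~~ Regex ?? $str.match($p) !! $str.contains($p)
    }));
}

my $a = "123abc456def";
say contains-all($a, "23", "ef");           # True
say contains-all($a, "23", / \d+ def /);    # True
say contains-all($a, "23", "xyz");          # False
```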


it is easier to interpret $a~~m/23&&ef/ as "$a matches 23 and $a matches ef"

Yes and no.

Yes, in the sense that there's ambiguity to "matches a and b"; and that your guess is one of several reasonable ones for anyone exploring regexes in general; and, in particular, that your guess is evidently the one you currently find most appropriate aka "easiest".

No, if our iofo's were to match.

(I just invented "iofo". I'm using it to mean "in our friendly opinion", a version of ioho that is not only genuinely intended humbly but also with open arms, conjuring an opinion that I/we imagine might one day be happily shared by some readers.)

Iofo we find it easier to read $a~~m/23&&ef/ as "$a matches 23 and ef" rather than "$a matches 23 and $a matches ef". But of course, "$a matches 23 and ef" remains ambiguous.

For the reading you suggest we have junctions, as explained above:

say so $a ~~ / 23 / & / ef / 

Just as with && inside a single match, iofo it's appropriate to read the above in English as "$a matches 23 and ef", but this time it's short for "$a matches 23 and $a matches ef", just as you wanted.

In the meantime, use of && inside a single match corresponds to the other useful conjunctional meaning, which is to say it refers to matching the regex atom on its left and the regex atom on its right to the same sub-string.
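A small sketch of that same-substring reading, reusing the string from the question. The first pattern is an illustrative one I made up: it asks for a substring that is both "two digits" and "any character followed by 3".

```raku
my $a = "123abc456def";

# "23" satisfies both branches at once, so the conjunction succeeds.
say so $a ~~ / \d\d && .3 /;    # True

# No single substring is both "23" and "ef", so this fails, as in the question.
say so $a ~~ / 23 && ef /;      # False
```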

Iofo this is a highly intuitive approach once one becomes aware of, and then used to, these two possible interpretations of a conjunction.

