A problem with regex as usual and an unsupported feature
The regex uses a lookahead (?=) and the rule uses the $to option, as to=com:
/^https:\/\/(?:[a-z]{2}\.)?[a-z]{7,14}\.com\/r(?=[a-z]*[0-9A-Z])[0-9A-Za-z]{10,16}\/[A-Za-z]{5}$/$script,3p,match-case,to=com
To properly convert it to a Rust regex, since we are making a new rule anyway (and I mean properly for Rust, where you don’t even need to escape / as \/ the way JS does):
/^https://(?:[a-z]{2}\.)?[a-z]{7,14}\.com/r(?:[a-z]*[0-9A-Z])[0-9A-Za-z]{9,16}/[A-Za-z]{5}$/$script,3p,match-case
Note the difference from {10,16} in the original rule: the lookahead doesn’t consume any characters, but the non-capturing group that replaces it does, so the count that follows drops to {9,16}.
But this rule will work as expected.
Also, sometimes the trailing $/$ and match-case cause issues, but not in this case. To be honest, I don’t know why uBlock doesn’t write simpler regex rules. What is the point of the $ anchor in cases like this, where it is almost impossible to find an address that continues past it? But it is what it is. It is Rust regex’s fault too: Rust has its differences, like capturing groups not even working the same way, so Rust regex can be annoying.
Note: I forgot to mention, the Regex rule blocks: https://pn.jactantsplodgy.com/rnJ9Lqh4qmL1/QOaaJ https://oj.bromisescapose.com/rnmSOUjlOk5UHQ/mjllA
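To sanity-check the conversion, here is a minimal Python sketch that runs both patterns against the two URLs above. Python’s re module is only a stand-in (it supports lookahead, and as far as I can tell the converted pattern behaves the same there as in Rust for these inputs). The names ORIGINAL, CONVERTED and EDGE_CASE_URL are made up for illustration, and the third URL is invented to show where the two patterns differ.

```python
import re

# Original rule body (JS syntax, with lookahead); "\/" is just an escaped "/".
ORIGINAL = re.compile(
    r"^https:\/\/(?:[a-z]{2}\.)?[a-z]{7,14}\.com\/r"
    r"(?=[a-z]*[0-9A-Z])[0-9A-Za-z]{10,16}\/[A-Za-z]{5}$"
)

# Converted rule body: the lookahead became a consuming non-capturing group.
CONVERTED = re.compile(
    r"^https://(?:[a-z]{2}\.)?[a-z]{7,14}\.com/r"
    r"(?:[a-z]*[0-9A-Z])[0-9A-Za-z]{9,16}/[A-Za-z]{5}$"
)

# The two URLs the rule is meant to block.
BLOCKED_URLS = [
    "https://pn.jactantsplodgy.com/rnJ9Lqh4qmL1/QOaaJ",
    "https://oj.bromisescapose.com/rnmSOUjlOk5UHQ/mjllA",
]

# Invented URL: 10-character segment whose only uppercase letter comes last.
EDGE_CASE_URL = "https://pn.jactantsplodgy.com/rabcdefghiJ/QOaaJ"


def matches(pattern: re.Pattern, url: str) -> bool:
    """Return True if the pattern matches the URL."""
    return pattern.search(url) is not None


for url in BLOCKED_URLS + [EDGE_CASE_URL]:
    print(matches(ORIGINAL, url), matches(CONVERTED, url), url)
```

Both patterns match the two real URLs. The invented third URL only matches the original: once the lookahead becomes a consuming group, at least 9 more characters have to follow the first digit or uppercase letter. So the converted rule is a close approximation of the original rather than an exact equivalent.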
Maybe there could be a discussion about this. I understand the use of lookahead and lookbehind, but Rust’s regex doesn’t support them and you need a crate like fancy_regex for that, which would be problematic if it affects performance, since even fancy_regex itself says matching can take exponential time in some cases.
But uBlock supports JS regex, which supports all of that, so it is hard to request help and ‘changes’ when the problem is Rust and Brave’s decision to base their adblocker on Rust.
Sometimes I think rules could be simpler, though. Sometimes they contain unnecessary stuff, when a rule only matches one element at a time anyway and doesn’t match across multiple lines or anything.
I believe Brave should test how fancy_regex performs and go with it. uBlock could make some changes, but sometimes that stuff is needed for easier matching, so Brave has to adapt to it or convert the regex rules, not expect uBlock to do it, since it works just fine on their side.
In the end this rule wouldn’t work anyway because of $to, which Brave doesn’t support either, so there’s that as well. I believe the Brave team should adapt rules where necessary, not just hardcode jactantsplodgy and bromisescapose and then hit the same issue when the domains change.
The rule took its current form to avoid false positives like https://github.com/uBlockOrigin/uAssets/issues/8280 . I’m aware that Rust doesn’t support lookahead, and neither does uBlock Origin Lite, as per the MV3 specification. This is why we put a separate rule in for uBOL - even though it can’t catch everything, it is still far better than nothing. The issue here is the same. I think /^https:\/\/[a-z]{2}\.[a-z]{7,14}\.com\/r[0-9A-Za-z]{10,16}\/[A-Za-z]{5}$/$script,3p will be safe, though imperfect, so I will add this to the filter for uBOL. Maybe Brave should consider including ubol-filters.txt, but for the time being it can add the same rule(s) to its built-in filters.
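For anyone comparing the two rules, here is a small Python sketch of how the lookahead-free uBOL pattern behaves next to the lookahead one. Python’s re is just a convenient stand-in, the names WITH_LOOKAHEAD, UBOL_RULE and hit are made up, and the last two test URLs are invented variations of a real one.

```python
import re

# Rule body with lookahead and an optional two-letter subdomain.
WITH_LOOKAHEAD = re.compile(
    r"^https:\/\/(?:[a-z]{2}\.)?[a-z]{7,14}\.com\/r"
    r"(?=[a-z]*[0-9A-Z])[0-9A-Za-z]{10,16}\/[A-Za-z]{5}$"
)

# Lookahead-free rule body proposed for uBOL; the subdomain is mandatory here.
UBOL_RULE = re.compile(
    r"^https:\/\/[a-z]{2}\.[a-z]{7,14}\.com\/r[0-9A-Za-z]{10,16}\/[A-Za-z]{5}$"
)

REAL_URLS = [
    "https://pn.jactantsplodgy.com/rnJ9Lqh4qmL1/QOaaJ",
    "https://oj.bromisescapose.com/rnmSOUjlOk5UHQ/mjllA",
]

# Invented variations, used only to show where the two rules disagree.
NO_SUBDOMAIN = "https://jactantsplodgy.com/rnJ9Lqh4qmL1/QOaaJ"
ALL_LOWERCASE = "https://pn.jactantsplodgy.com/rabcdefghij/QOaaJ"


def hit(pattern: re.Pattern, url: str) -> bool:
    """Return True if the pattern matches the URL."""
    return pattern.search(url) is not None


for url in REAL_URLS + [NO_SUBDOMAIN, ALL_LOWERCASE]:
    print(hit(WITH_LOOKAHEAD, url), hit(UBOL_RULE, url), url)
```

Both rules match the two real URLs. The uBOL rule misses the variation without a subdomain, and it matches the all-lowercase path segment that the lookahead rule rejects.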
Well, that sounds terrible. It means that eventually, when only MV3 exists, all the nice regex rules uBlock has won’t work, so fancy_regex will be useless even if implemented, which might be a reason not to support it even today.
I still asked about it; at worst I can only get a ‘no’ for an answer.
What I do to fix and adapt regex rules is go to regex101 and fix them there with its Rust implementation. I had to start over on this computer, but I have already adapted 4 regex rules to Brave as I encountered the issues.
I have done it for more than a year and haven’t noticed any problems on websites, but then again, if I don’t see something breaking, I won’t really care.
There wasn’t really a good option at the time - fancy_regex only just had its first release by the time we started using adblock-rust, whereas regex was already well-established. It might be time to take a look at it.
I do suspect there will be some performance-related pushback, but as you’ve mentioned it is likely a net benefit.
I’ll try to investigate once the drama with YouTube has settled down a bit!
So it might happen, since fancy_regex says that if your regex, or parts of it, does not use any special features, the matching is delegated to the regex crate, and the slower path is only used for the fancy features. So it will affect few rules, and might not even be noticeable for normal people.
If it takes 100ms more to block scripts that are going to create a popup or something, then it is worth the trade.
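The delegation idea can be pictured with a small Python sketch. This is not how fancy_regex is implemented: uses_fancy_features and pick_engine are hypothetical helpers, and the check is a rough heuristic that only looks for lookaround groups and numeric backreferences and treats character classes naively.

```python
# Group openers for lookahead and lookbehind.
LOOKAROUND_PREFIXES = ("(?=", "(?!", "(?<=", "(?<!")


def uses_fancy_features(pattern: str) -> bool:
    """Heuristically detect lookaround or numeric backreferences in a pattern."""
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # A backslash followed by 1-9 outside a class is a backreference.
            if not in_class and i + 1 < len(pattern) and pattern[i + 1] in "123456789":
                return True
            i += 2  # skip the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif pattern.startswith(LOOKAROUND_PREFIXES, i):
            return True
        i += 1
    return False


def pick_engine(pattern: str) -> str:
    """Name the engine a pattern would be routed to in this sketch."""
    return "fancy" if uses_fancy_features(pattern) else "plain"


LOOKAHEAD_RULE = (
    r"^https://(?:[a-z]{2}\.)?[a-z]{7,14}\.com/r"
    r"(?=[a-z]*[0-9A-Z])[0-9A-Za-z]{10,16}/[A-Za-z]{5}$"
)
CONVERTED_RULE = (
    r"^https://(?:[a-z]{2}\.)?[a-z]{7,14}\.com/r"
    r"(?:[a-z]*[0-9A-Z])[0-9A-Za-z]{9,16}/[A-Za-z]{5}$"
)

print(pick_engine(LOOKAHEAD_RULE))
print(pick_engine(CONVERTED_RULE))
```

In this sketch only the rule with the lookahead would take the slower path; the converted rule, like most plain regex rules, would stay on the fast one.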
So I believe it will be good for Brave and will be added eventually!