h/t to @official-techsupport for finding and helping fix this bug.
When given certain pathological input, `sanitize` would time out
(notably only on posts, rather than comments, perhaps due to the
longer maximum length of input). For example, using as input the
result of:
    with open("test.txt", "w") as f:
        for i in range(26):
            f.write(f":{chr(ord('a') + i)}: ")
        f.write('x' * 20_000)
We believe this is due to some combination of the greedy quantifiers
and the negative lookahead before the match, which together let the
engine backtrack excessively. The regex was rewritten to have
(in theory) much closer-to-linear performance.
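To illustrate the failure mode only (a reduced stand-in, not the actual
sanitize pattern or the generated test file): a pattern whose greedy
quantifiers overlap gives the backtracking engine exponentially many
ways to re-split a long run of characters before it can report failure,
while an unambiguous rewrite fails after a single bounded scan.

    import re

    # Deliberately reduced stand-in: an opening ':' followed by a long
    # run of word characters and no closing ':'.
    bad_input = ":" + "x" * 20_000 + " "

    # Ambiguous: \s* can match nothing, so (\w+\s*)+ can carve the
    # 20k-char run into chunks in exponentially many ways while hunting
    # for a closing ':' that never comes.
    ambiguous = re.compile(r":(\w+\s*)+:")
    # ambiguous.search(bad_input)  # do not run: will not return

    # Unambiguous: each character can only be consumed one way, so a
    # failed attempt is abandoned after one linear scan.
    unambiguous = re.compile(r":(\w+):")
    print(unambiguous.search(bad_input))  # None, returns immediately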
This bug was discovered when lottery.check_if_end_lottery_task was
failing with a stack trace through end_lottery_session < badge_grant
< send_repeatable_notifications < sanitize L208. In particular, when
`flask cron` (helpers/cron.py) executes, it does not set g.v, whereas
this code previously assumed `g.v` was always present (as either None
or a User) and never checked whether the attribute existed at all.
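A minimal sketch of the defensive lookup, assuming only Flask itself;
the helper name and call site below are hypothetical:

    from flask import g

    def current_viewer():
        # g.v is only populated during normal request handling; the
        # `flask cron` entry point never sets it, so look it up with a
        # default instead of touching g.v directly.
        return getattr(g, "v", None)

    # at the old g.v call sites:
    # v = current_viewer()
    # if v is not None: ...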
Despite being very fun, this fixes the recently discovered bug where
placing '#' or '!' within the 'pat:' suffix of a patted emoji causes
the enclosing <span> not to receive the proper CSS `display` or
`position`, leaving the hand sized relative to the comment's bounding
box rather than the emoji's box.
This should be backward compatible. The only posts it won't fix are
existing ones with the giant hands. The main example being:
https://rdrama.net/h/slackernews/post/76302/
Originally prompted by https://rdrama.net/post/18459/-/1984609, which
noticed that streamable.com/e/ links submitted as posts would have
another e/ added to them. This happened in spite of logic in posts.py
(api_is_repost and submit_post) designed specifically to counteract it.
The proximal cause was a copypasta'd url.replace(...) chain which
introduced the mistake before the streamable-specific logic had a
chance to avoid making it.
Solution: remove the streamable replacement from the chained statement
and create `helpers.normalize_url(url)` to get rid of the copypasta.
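A rough sketch of the new helper's shape; `normalize_url` is the name
from this change, but the replacement pairs below are placeholders
rather than the site's actual list:

    def normalize_url(url):
        # One home for the generic rewrites that used to be duplicated
        # in a chained url.replace(...) statement.
        for old, new in (
            ("https://m.youtube.com/", "https://youtube.com/"),        # placeholder
            ("https://mobile.twitter.com/", "https://twitter.com/"),   # placeholder
        ):
            url = url.replace(old, new)
        # The streamable.com -> streamable.com/e/ rewrite deliberately
        # does NOT live here, so an /e/ link is never doubled up again.
        return url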
Per https://rdrama.net/post/70341/-/1976650, added more gTLDs that
site users actually want.
Also hard-wrapped the `TLDS` and `allowed_tags` tuple-lists at a
100-character ruler for my sanity.
Comments of the style "@TLSM / @TLSM2" would mistakenly be `sanitize`d
so that both mentions got the link for "@TLSM", the latter instance
left with a dangling 2 after the link. This appears to be purely a
text-formatting issue; the NOTIF_USERS handling in alerts.py had no
such problem. The root cause appears to be partly an optimization and
partly the use of str.replace without a count limit.
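A minimal illustration of that pitfall (not the actual
mention-handling code):

    import re

    text = "@TLSM / @TLSM2"
    link = '<a href="/@TLSM">@TLSM</a>'

    # str.replace rewrites every occurrence of the substring, including
    # the prefix of "@TLSM2", which is what leaves the dangling 2:
    print(text.replace("@TLSM", link))
    # <a href="/@TLSM">@TLSM</a> / <a href="/@TLSM">@TLSM</a>2

    # A substitution that refuses to end inside a longer handle only
    # touches the exact mention:
    print(re.sub(r"@TLSM(?!\w)", link, text))
    # <a href="/@TLSM">@TLSM</a> / @TLSM2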