ScannerBot


Operator: 0xDeadbeef

Time filed: 01:48, Thursday, May 5, 2022 (UTC)

Function overview: Removes tracker tags in Twitter links.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: gist

Links to relevant discussions (where appropriate):

Edit period(s): One time run

Estimated number of pages affected: Probably 10000+

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: Finds twitter.com URLs and removes the query parameters named s or t.
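The function details above can be illustrated with a minimal sketch. This is not the operator's gist: the function name `strip_twitter_tracking` and the use of `urllib.parse` are assumptions made for illustration, and the sketch only handles a single URL string, not wikitext.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter names taken from the function details; everything else is illustrative.
TRACKING_PARAMS = {"s", "t"}


def strip_twitter_tracking(url: str) -> str:
    """Return url without the s and t query parameters if it points at twitter.com."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != "twitter.com" and not host.endswith(".twitter.com"):
        return url  # not a Twitter link, leave untouched

    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key not in TRACKING_PARAMS]
    if len(kept) == len(pairs):
        return url  # nothing to remove, avoid re-encoding the query needlessly

    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_twitter_tracking("https://twitter.com/user/status/123?s=20&t=abc"))
```

Parsing the query string rather than matching on keywords means a path segment or parameter value that merely contains "s=" is left alone. One caveat of this approach: `urlencode` may normalise the percent-encoding of the parameters that are kept.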

Discussion

Comments before change

 Comment: If a bot account is needed, I will probably use ScannerBot. 0xDEADBEEF (T C) 01:51, 5 May 2022 (UTC)

  • Note: This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT 10:53, 5 May 2022 (UTC) AnomieBOT (talk · contribs) has made few or no other edits outside this topic.
  • Note: This bot has edited its own BRFA page. Bot policy states that the bot account is only for edits on approved tasks or trials approved by BAG; the operator must log into their normal account to make any non-bot edits. AnomieBOT 11:40, 5 May 2022 (UTC)
  • I'm not entirely sure how much I want to be commenting with my BAG hat on, but based on previous tasks that were approved I am not convinced that as a bot task this is fully formed yet. Based on the supposed list of URLs where this tracking is located, the scanner isn't working right either, because there are a few false positives that I know exist out there that are not on the list. If 0xDeadbeef wants to use JWB on their main account they are welcome to and do not require BAG approval. On that note, though, I have moved this BRFA to the bot's page to make it officially a BRFA. Primefac (talk) 14:41, 7 May 2022 (UTC)
    And, on a minor note, this has prompted me to run Task 17 again... Primefac (talk) 14:49, 7 May 2022 (UTC)
    I didn't have a method for determining that they are actually parameters of a URL. I tested with a Python script that just matched on keywords within the source. I didn't know that there were previous tasks. I will take a look at those and perhaps amend the regex to match more parameters. 0xDEADBEEF (T C) 02:30, 8 May 2022 (UTC)
    \??(?:&?(?:fbclid|yclid|tracking_referrer|referrer(?:_access_token)?|gs_l|dclid|_ga|_gl|fb_(?:source|ref)|ref_)=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?(?:fbclid|yclid|tracking_referrer|referrer(?:_access_token)?|gs_l|dclid|_ga|_gl|fb_(?:source|ref)|ref_)=[^&\s\]\|]*)+&|(?<=&)(?:&?(?:fbclid|yclid|tracking_referrer|referrer(?:_access_token)?|gs_l|dclid|_ga|_gl|fb_(?:source|ref)|ref_)=[^&\s\]\|]*)+& 0xDEADBEEF (T C) 02:40, 8 May 2022 (UTC)
    "Based on the supposed list of URLs where this tracking is located, the scanner isn't working right either": For the record, I didn't know that CirrusSearch allowed regex searching, so I used pywikibot. Now I will probably use insource:/.../ to generate a list of articles to fix with JWB. 0xDEADBEEF (T C) 04:06, 8 May 2022 (UTC)
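As a rough illustration of the concern raised in this thread (keyword matching versus checking that s or t really is a URL parameter), here is a small Python sketch. It is not the script discussed above: `URL_RE`, `has_tracking_param`, and `find_tracked_urls` are hypothetical names, and the URL pattern is a simplification that assumes bare or bracketed external links in wikitext.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Simplified pattern (an assumption): a URL ends at whitespace or at common
# wikitext delimiters such as brackets, pipes, and braces.
URL_RE = re.compile(r"https?://[^\s\[\]<>|{}\"]+")


def has_tracking_param(url, names=frozenset({"s", "t"})):
    """True only if url is on twitter.com and carries a query key in names."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != "twitter.com" and not host.endswith(".twitter.com"):
        return False
    keys = [key for key, _ in parse_qsl(parts.query, keep_blank_values=True)]
    return any(key in names for key in keys)


def find_tracked_urls(wikitext):
    """List the Twitter URLs in wikitext whose query string has a tracked key."""
    return [m.group(0) for m in URL_RE.finditer(wikitext)
            if has_tracking_param(m.group(0))]


if __name__ == "__main__":
    sample = (
        "See [https://twitter.com/a/status/1?s=20 tweet] and "
        "{{cite web|url=https://twitter.com/search?q=s%3Dt|title=x}} and "
        "https://example.com/?s=1"
    )
    print(find_tracked_urls(sample))
```

In the sample, only the first link is reported: the second has "s=" inside an encoded parameter value rather than as a key, and the third is not a Twitter URL. A plain keyword match would likely flag all three.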