The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Operator: ST47

Time filed: 07:33, Sunday, December 1, 2019 (UTC)

Function overview: Block IP addresses belonging to open proxies, public VPN services, and web hosts.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Combination of Python and Perl

Source code available: No, to protect sources and testing methods. Mostly uses pywikibot to interact with the wiki.

Links to relevant discussions (where appropriate): WP:NOP authorizes open proxies to be blocked, Wikipedia:Bots/Requests for approval/ProcseeBot demonstrates that a bot may be used for this task.

Edit period(s): Continuously

Estimated number of pages affected: Initially a higher rate, due to the large number of proxies that are not currently blocked. Once it reaches steady state, probably about 100 logged actions per day.

Namespace(s): None

Exclusion compliant (Yes/No): No, not applicable

Adminbot (Yes/No): Yes

Function details: WP:NOP states that open proxies may be blocked at any time. Open proxies allow editors to evade blocks, avoid detection, or appear to be multiple users when they are actually one person. Waiting until these proxies are abused is not practical, as multiple easily searchable websites advertise thousands of such proxies. The number of blocks to be made is high enough that automation is required.

There is already a bot approved for this task. However, it is beneficial to have multiple independent tools. There are a large number of data sources out there, and I've already been able to find a number of proxies that I can access but which haven't been blocked. I'm sure others can find things that I can't. Since this involves trying to test the proxies, operators in different geographical areas or homed to different ISPs may have different views of the internet: one ISP may "blackhole" a network that hosts a lot of malicious proxies while another ISP may not. One bot might also have an outage of some sort while another is still running normally.

This bot makes three types of blocks:

  1. Open (HTTP/HTTPS/SOCKS) proxies, tested by accessing Wikipedia through the proxy
  2. Open VPN servers, verified by at least two data sources
  3. IP ranges used for web hosting, cloud services, etc., manually reviewed and curated

In the first two cases, it uses open sources to build a list of IP addresses, which it scans through. For proxy types that can be automatically tested, the bot attempts to access the Userinfo API. VPN testing cannot easily be automated, so instead the bot performs checks against several independent sources to determine if the IP address is a VPN node or not. The original proxy list is the first source, and the bot requires positive responses from at least two more sources before blocking. Once the bot has either successfully accessed API:Userinfo through the proxy or verified the proxy against enough independent data sources, it adds the IP to a list to be blocked.
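The two verification paths described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the bot's code (which is not published): the function names, the `MIN_EXTRA_SOURCES` constant, the User-Agent string, and the choice of `urllib` are all hypothetical. Note that `urllib` only speaks to HTTP/HTTPS proxies, so testing SOCKS proxies would need a different client.

```python
import http.client
import json
import urllib.request

# MediaWiki userinfo query. For a logged-out client, the reported "name"
# is the IP address the wiki sees.
API_URL = "https://en.wikipedia.org/w/api.php?action=query&meta=userinfo&format=json"

# From the request: the original list plus at least two more sources (VPN case).
MIN_EXTRA_SOURCES = 2


def userinfo_name(body):
    """Return the user name in a userinfo reply, or None if the reply is not one."""
    try:
        return json.loads(body)["query"]["userinfo"]["name"]
    except (ValueError, KeyError, TypeError):
        return None


def reached_wiki_through(proxy_url, timeout=10.0):
    """Try the userinfo query through an HTTP(S) proxy; True if a valid reply came back."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    # Hypothetical User-Agent, for illustration only.
    opener.addheaders = [("User-Agent", "proxy-check-sketch/0.1 (hypothetical)")]
    try:
        with opener.open(API_URL, timeout=timeout) as response:
            return userinfo_name(response.read()) is not None
    except (OSError, http.client.HTTPException):
        return False


def should_block(kind, reached_wiki=False, extra_source_hits=0):
    """VPN nodes need corroboration; testable proxies need a successful API call."""
    if kind == "vpn":
        return extra_source_hits >= MIN_EXTRA_SOURCES
    return reached_wiki
```

Keeping the reply parsing (`userinfo_name`) separate from the network call makes the decision logic easy to check without touching any real proxy.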

The block duration varies based on the number of previous proxy blocks of that IP address. Currently, the first block is set for 14 days, ramping up to 2 years after enough blocks. The block disables account creation and leaves anon-only unselected, i.e., it is a hardblock. The block message uses ((blocked proxy)) with a comment providing enough information for me to investigate why the IP was blocked in case there is any question. If the IP is already blocked, including by a local rangeblock, the bot skips it. The bot does not check global blocks. If the proxy is an IPv6 address, the bot blocks the /64 range.
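A minimal sketch of the block settings described above, under some loudly stated assumptions: the intermediate steps (2 months, 1 year) are taken from the durations reported in the trial results of this request, the point at which the 2-year cap is reached is a guess, and the function names are hypothetical. The parameter names (`user`, `expiry`, `nocreate`, `anononly`, `reason`) are those of the MediaWiki `action=block` API; the CSRF token is omitted.

```python
import ipaddress

# Index = number of previous proxy blocks. The first three values match the
# trial results; reaching "2 years" at the fourth block is an assumption.
DURATIONS = ["2 weeks", "2 months", "1 year", "2 years"]


def block_duration(previous_proxy_blocks):
    """Escalate the duration with the number of earlier proxy blocks."""
    index = min(max(previous_proxy_blocks, 0), len(DURATIONS) - 1)
    return DURATIONS[index]


def block_target(ip):
    """IPv4 proxies are blocked individually; IPv6 proxies as their /64."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        return str(ipaddress.ip_network(f"{addr}/64", strict=False))
    return str(addr)


def block_params(ip, previous_proxy_blocks, note=""):
    """Build action=block parameters for a hardblock (token not included)."""
    return {
        "action": "block",
        "user": block_target(ip),
        "expiry": block_duration(previous_proxy_blocks),
        "nocreate": 1,  # account creation blocked
        # "anononly" is deliberately absent: leaving it out makes this a hardblock.
        "reason": "{{blocked proxy}} " + note,
    }
```

For example, `block_params("2001:db8:1:2:3:4:5:6", 0)` targets `2001:db8:1:2::/64` for 2 weeks.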

In the case of web hosting IP ranges, these are ranges that I identify through whois data and other sources as belonging to web hosting or other similar types of companies. I generally find an address that is being abused, usually by a spambot, and then investigate and block the entire address space assigned to web hosting at that company. In this case, I manually review the list of ranges to be blocked, and the bot blocks them for a fixed time period, generally using the ((colocationwebhost)) template, and a description of the IP ownership in the block comment. This task will never be fully automated, as it is based on me finding and deciding to block a given set of IP ranges. However, I'm bundling it with this bot request because it also entails making a large number of blocks at once.

Further, this bot may modify blocks that it initially issued, either extending them if they are near expiration and the proxy is still active, or removing them if the proxy is confirmed to be inactive. (The removal would only be after several checks in a row, over a period of time, confirm that the proxy isn't active, and this isn't currently implemented.) ST47 (talk) 07:33, 1 December 2019 (UTC)
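The maintenance rule above might look something like the following sketch. The request gives no numbers, so the renewal window (3 days) and the number of consecutive negative checks (5) are purely illustrative assumptions, as is the function name; the request also notes that the removal path is not implemented.

```python
def maintenance_action(days_until_expiry, recent_checks,
                       renew_window_days=3, inactive_checks_needed=5):
    """Decide what to do with a block the bot placed earlier.

    recent_checks is a list of booleans, oldest first; True means the
    proxy responded on that check. Both thresholds are hypothetical.
    """
    if not recent_checks:
        return "keep"  # no evidence either way
    last = recent_checks[-inactive_checks_needed:]
    if len(last) == inactive_checks_needed and not any(last):
        return "unblock"  # several checks in a row found nothing
    if recent_checks[-1] and days_until_expiry <= renew_window_days:
        return "extend"  # still active and close to expiring
    return "keep"
```

Requiring a full run of negative checks before unblocking means a single failed probe (for example, during a brief outage of the proxy) never lifts a block on its own.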


Discussion

To which degree does this task overlap with that of User:ProcseeBot? Also, summoning their botop slakr here. Jo-Jo Eumerus (talk) 11:35, 1 December 2019 (UTC)
  • @Jo-Jo Eumerus: Same objective, but I am finding a large number of proxies that are not currently blocked, and that need to be. Possibly because I'm using different data sources or different testing methods. ST47 (talk) 18:32, 1 December 2019 (UTC)
What mechanism do you have in place to prevent wheel warring, especially with human admins? — xaosflux Talk 14:17, 1 December 2019 (UTC)
  • @Xaosflux: I will add a check that it will not block any IP address that has ever been unblocked by a human admin. Instead it will log those to a file or to userspace. ST47 (talk) 18:32, 1 December 2019 (UTC)
Does this bot support IPv6? SQLQuery me! 16:12, 1 December 2019 (UTC)
Also, as far as the major compute services (amazon, azure, google) - how are you identifying these? SQLQuery me! 16:14, 1 December 2019 (UTC)
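The safeguard ST47 describes here could be sketched as below. It assumes the block log for each address has already been fetched as a list of entries with `action` and `user` keys (the shape MediaWiki's log events use); the function names and the bot account name `ExampleProxyBot` are hypothetical.

```python
def ever_unblocked_by_human(log_entries, bot_accounts):
    """True if any unblock in the address's block log was made by a non-bot account."""
    return any(
        entry.get("action") == "unblock" and entry.get("user") not in bot_accounts
        for entry in log_entries
    )


def partition_candidates(candidates, block_logs, bot_accounts):
    """Split candidates into those safe to block and those left for human review."""
    to_block, to_review = [], []
    for ip in candidates:
        if ever_unblocked_by_human(block_logs.get(ip, []), bot_accounts):
            to_review.append(ip)  # log to a file or userspace instead of blocking
        else:
            to_block.append(ip)
    return to_block, to_review


if __name__ == "__main__":
    bots = {"ProcseeBot", "ExampleProxyBot"}  # second name is a placeholder
    logs = {
        "192.0.2.10": [{"action": "block", "user": "ProcseeBot"},
                       {"action": "unblock", "user": "SomeAdmin"}],
        "192.0.2.11": [{"action": "block", "user": "ProcseeBot"}],
    }
    print(partition_candidates(["192.0.2.10", "192.0.2.11"], logs, bots))
```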
  • @SQL: It does support IPv6, and blocks the /64 of detected IPv6 proxies. There is no special handling for the major compute services. Most of them are already blocked; if a blocked IP (including via a range block) shows up on one of the proxy lists, I currently don't even bother testing it, since there is no point when it's already blocked. (For the ancillary task of blocking web hosting types of IP ranges, that would be through manual review of the whois information. Basically, if I CU a spambot and find that it's on some minor cloud services company's IP range, I run it through ISP rangefinder, review the netblock names, and block everything that looks webhost-y. The only automation for those is to save me from clicking the block button 100 times.) ST47 (talk) 18:32, 1 December 2019 (UTC)
    ST47, It is often not the case that the major providers are already blocked. Amazon and Azure get new ranges all the time, see: User:SQL/Non-blocked compute hosts. I've always used [1] to detect Azure, and [2] to detect Amazon. Google is a bit more complicated. SQLQuery me! 18:39, 1 December 2019 (UTC)
@SQL: What would you suggest doing with this information? Refrain from blocking proxy IPs within that range, or block the entire range? ST47 (talk) 19:14, 1 December 2019 (UTC)
ST47, I normally completely block those ranges by hand. It can be a bit tedious / tiresome. They're webhosts, and very commonly used by spammers / UPE. SQLQuery me! 21:02, 1 December 2019 (UTC)
@SQL: Right, I guess my point is that if that range isn't blocked yet, it's still good to directly block any proxies within that range. (Hopefully, that range won't be unblocked for more than a couple of days, and once the range does get blocked, the bots will stop scanning it for proxies, and the short initial block will eventually expire, leaving the long-term rangeblock.) Improving the automation around detecting (and hopefully eventually blocking and unblocking) hosting ranges is important, but isn't this bot request's objective. ST47 (talk) 21:26, 1 December 2019 (UTC)
Fair enough, I thought that this would fall under point 3, IP ranges used for web hosting, cloud services, etc, manually reviewed and curated. SQLQuery me! 22:10, 1 December 2019 (UTC)
That's intended to cover cases when I checkuser a spambot, find some huge webhosting range, run it through ISP rangefinder and decide to block the whole darn thing. Or for another example, based on this experimental product, I found these guys. There are over 1,000 individually blocked proxy IPs in that AS. Some of the ranges in ISP Rangefinder I wasn't sure about, and left unblocked. But still, the 81 ranges that I did block, I think should be done with a bot account - it's assisted rather than automatic, but doing it with my normal account just floods my block log. ST47 (talk) 22:35, 1 December 2019 (UTC)
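The pre-check mentioned earlier in this thread (skipping a listed IP when an existing block or range block already covers it) can be illustrated with Python's standard `ipaddress` module. This is a sketch only: it assumes the currently blocked targets are available as a plain list of addresses and CIDR ranges, and the function name is hypothetical.

```python
import ipaddress


def covering_block(ip, blocked_targets):
    """Return the first blocked address or range that covers ip, or None.

    blocked_targets holds strings such as "192.0.2.7" or "192.0.2.0/24".
    """
    addr = ipaddress.ip_address(ip)
    for target in blocked_targets:
        # A bare address parses as a single-host network (/32 or /128).
        network = ipaddress.ip_network(target, strict=False)
        if network.version == addr.version and addr in network:
            return target
    return None


if __name__ == "__main__":
    blocked = ["198.51.100.0/24", "192.0.2.0/24", "2001:db8::/64"]
    for candidate in ["192.0.2.7", "203.0.113.5", "2001:db8::1"]:
        print(candidate, "->", covering_block(candidate, blocked))
```

A candidate for which this returns a range would simply be dropped from the scan list, which is also why a later range block makes the bot stop probing individual proxies inside that range.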

((BAGAssistanceNeeded)) Folks, this is a pretty important task, and the operator has provided good answers to all questions/concerns above. Any particular reason why it hasn't been approved for trial? -FASTILY 03:33, 21 March 2020 (UTC)

Various reasons, I suspect, but likely because we were busy with what we feel are more important things and/or felt that the scope/details of the task were too far above their comfort level. Personally, I left it be because two other BAGs were (I thought) going to deal with it, but clearly they have decided to wait. Primefac (talk) 22:17, 22 March 2020 (UTC)

Approved for trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. I'm going to grant adminbot for a week, but I would like to see at most 10 blocks. After this has been completed, please post the full results here with a note at WP:AN so that input about the blocks/length/etc can be evaluated. Primefac (talk) 22:17, 22 March 2020 (UTC)

@Primefac: Can I suggest not granting "bot" rights, so that the blocks can be seen in recent changes? DannyS712 (talk) 22:25, 22 March 2020 (UTC)

Trial Results

| IP Address Blocked | Type of Proxy | Basis for Block | Previous Block Count | Block Duration | WHOIS Link |
|---|---|---|---|---|---|
| 108.60.113.218 | Public OpenVPN Server | Advertised on an open proxy list, confirmed port open, confirmed VPN server listening, listed on proxy blacklists | 2, both for the same reason (although on a different port) | 1 year | WiLine Networks Inc. (commercial ISP) |
| 119.193.22.213 | Public OpenVPN Server | Advertised on an open proxy list, confirmed ports (TCP and UDP) open, confirmed VPN server listening, listed on proxy blacklists | 0 | 2 weeks | Korea Telecom (residential ISP) |
| 24.189.33.84 | Public OpenVPN Server | Advertised on an open proxy list, confirmed ports (TCP and UDP) open, confirmed VPN server listening, listed on proxy blacklists | 2, both for the same reason | 1 year | Optimum Online (residential ISP?) |
| 121.109.129.46 | Public OpenVPN Server | Advertised on an open proxy list, confirmed ports (TCP and UDP) open, confirmed VPN server listening, listed on proxy blacklists | 2, both for the same reason | 1 year | KDDI Corporation (Japanese company, unknown connection type) |
| 201.113.47.213 | Public OpenVPN Server | Advertised on an open proxy list, confirmed port open, confirmed VPN server listening, listed on proxy blacklists | 0 | 2 weeks | Uninet S.A. de C.V. (residential ISP?) |
| 5.17.89.13 | Open SOCKS Proxy Server | Advertised on an open proxy list, confirmed able to reach Wikipedia through the proxy, listed on proxy blacklists | 1, same reason | 2 months | Z-Telecom Network (unknown ISP) |
| 159.192.253.187 | Open SOCKS Proxy Server | Advertised on an open proxy list, confirmed able to reach Wikipedia through the proxy, listed on proxy blacklists | 1, by ProcseeBot for the same reason | 2 months | CAT Telecom (unknown ISP) |
| 103.206.225.64 | Open SOCKS Proxy Server | Advertised on an open proxy list, confirmed able to reach Wikipedia through the proxy, listed on proxy blacklists | 0 | 2 weeks | Acme Diginet Corporation (unknown type of company) |
| 190.104.204.242 | Open SOCKS Proxy Server | Advertised on an open proxy list, confirmed able to reach Wikipedia through the proxy, listed on proxy blacklists | 1, by ProcseeBot for the same reason | 2 months | Nestle Argentina (business, likely compromised server?) |
| 103.117.110.254 | Open SOCKS Proxy Server | Advertised on an open proxy list, confirmed able to reach Wikipedia through the proxy, listed on proxy blacklists | 2, same reason | 1 year | I Link Internet Service (residential ISP?) |

Trial complete. @Primefac: I have run the bot for 10 actions, and listed the results above. As it turned out, most (but not all) of the IPs had been blocked before, so you can see the escalating block duration: a first detection results in a block for 2 weeks, whereas the IP addresses that have been hosting a proxy for a longer period of time also get longer blocks. Two of the addresses are also blocked globally, likely due to Jon Kolbert's work dealing with spambots. Let me know if you have any questions, and I'll also post to WP:AN as you asked. ST47 (talk) 02:54, 23 March 2020 (UTC)

Tag added for the bot. I'll leave this open for about a week to see if there's any input. Primefac (talk) 00:24, 25 March 2020 (UTC)
Approved. Primefac (talk) 12:15, 3 April 2020 (UTC)
The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.