Discuss this story

The links are in the "Finding More" section; the primary one is https://openai-openai-detector.hf.space/ Bri (talk) 18:21, 20 February 2023 (UTC)
Thanks! I didn't see a "Finding More" section, but I could just be dumb. ― Blaze WolfTalkBlaze Wolf#6545 18:23, 20 February 2023 (UTC)
ChatGPT creates plausible-sounding bullshit. In cases where it has a lot of very similar sources to draw from, such as mostly-empty space-filler articles about an upcoming racing video game (for which it would have about a thousand examples) it can generate something low on nonsense. For something more unique, the bullshit quota is higher. In all cases, though, you can't tell what's bullshit without checking it line by line, because it's all plausible-sounding. Similarly, the sources will always be nonsense, because it isn't generating text based on specific sources, it's generating plausible-sounding reference text bullshit, with no connection to anything. --PresN 19:29, 20 February 2023 (UTC)
Yes, I'm not trying to argue that we should be using ChatGPT (because frankly no one should), simply that it isn't 100% bad all of the time. ― Blaze WolfTalkBlaze Wolf#6545 19:31, 20 February 2023 (UTC)
In fact, I have encountered situations where it likes to hallucinate no matter what I tell it (I asked it a few things regarding Splatoon, and it kept thinking the special gauge was the amount of ink the weapon had, which is not true whatsoever). ― Blaze WolfTalkBlaze Wolf#6545 19:33, 20 February 2023 (UTC)
One of the data sources for ChatGPT is Wikipedia, so if you ask it to write about something already in Wikipedia, there’s a likelihood that it will select correct information for its output. — rsjaffe 🗣️ 22:24, 20 February 2023 (UTC)
WP:Randy in Boise can also make good contributions most of the time, but the few times he's wrong still make him a net negative. AI seems to be a long way from getting past this level of ability. Daß Wölf 20:24, 24 February 2023 (UTC)

"I test for articles typically using https://openai-openai-detector.hf.space/" - this and various other currently available "ChatGPT detectors" (including OpenAI's own) are highly unreliable. https://openai-openai-detector.hf.space/ actually already says on the tin that it is a detector for GPT-2 (released in 2019 and very different from ChatGPT). Given the article's focus on the dangers of misinformation, it's a bit sad and ironic that the Signpost is itself providing such dubious recommendations here without any caveats.

Regards, HaeB (talk) 11:12, 21 February 2023 (UTC)

The article glosses over a lot of the issues regarding detection. It was just a brief intro. I emphasized in the article that I was using a very insensitive method of finding LLM-generated text. There were a couple of reasons I went about things as described there (and to note: I no longer rely solely on the GPT-2 detector). 1) at the time I started, other detectors available were very opaque as to how they were constructed; 2) the nature of the output, even though the models are different, has many similar characteristics, so a GPT-2 detector would have some sensitivity and specificity; 3) I intentionally minimized false positives, as those irritate article contributors, by doing a vigorous pre-screen of the text. As to point two, note that at least one of the recommended detectors (https://gptzero.me/) is not based on the GPT model, but rather on the text output characteristics. As to point three, I used the authors' feedback as an indicator of the false positive rate: getting no complaints after a lot of tags is a decent indicator that the false positive rate is low. — rsjaffe 🗣️ 18:53, 21 February 2023 (UTC)
Good to hear that you are proceeding diligently when patrolling new articles (and to be clear, this is very important work and it's good to call attention to this issue). But the part with the tool recommendations did not include any caveats about false positives, and should not have been published in this form.
"the nature of the output, even though the models are different, has many similar characteristics, so a GPT-2 detector would have some sensitivity and specificity" - what research is this claim based on? (I mean, of course any detection method has "some sensitivity and specificity", the question is whether they are good enough.)
"is not based on the GPT model, but rather on the text output characteristics" - it seems that there is some fundamental confusion here between the model that is doing the detection and the model whose output is being detected (and/or the features of its output). https://openai-openai-detector.hf.space/ is also not using "the GPT model" (there are many actually) to detect GPT-2 output, but RoBERTa instead.
Regards, HaeB (talk) 06:57, 24 February 2023 (UTC)