Propose your own Turing tests, and optionally, also AIs failing them!

I feel like it's more a passing of attrition than a stunning DeepBlue vs Kasparov moment, but chat programs have been fooling people for decades, and the simple "random dude on a teletype guesses other party is lisa or ELIZA" goalpost has gotten more and more elaborate to compensate.

Philip Nunez, Wednesday, 15 May 2024 16:07 (two months ago) link

I don't disagree, but I think it's still the case that there aren't any AI chatbots that fool most people most of the time. Some researchers have claimed that ChatGPT passes the test, but it seems to be quite easy to get ChatGPT to produce goofy hallucinatory and counterfactual results, so I remain skeptical.

It's probably my personal goalpost moving, but until "imitation game" results are as predictable and consistent as chess results, I'll be reluctant to say the classic test has been satisfied.

Brad C., Wednesday, 15 May 2024 16:34 (two months ago) link

For image generators, my go-to test prompt has been "photo of carrots growing in a field". So far, every time the generator has hallucinated bizarre renditions of plants where carrots grow above ground.

Ari (whenuweremine), Wednesday, 15 May 2024 16:46 (two months ago) link

carrots and lobsters -- the AI's eternal weakness!

https://cdn.imgchest.com/files/l7lxcxp8m37.png

Philip Nunez, Wednesday, 15 May 2024 19:53 (two months ago) link

I think the best turing test ideas probably shouldn't be shared on an internet forum thread, where they have the chance of being scraped for data.

MarkoP, Wednesday, 15 May 2024 20:06 (two months ago) link

i don't think a few posts on a low traffic message board will have much effect. and tests should be described in general terms; even if you give a specific example, an llm wouldn't be able to extrapolate to other cases.

if you ask for information about a fantastical scenario, an llm will never express surprise - e.g. "i am holding in my hand a life size model of the statue of liberty. how tall is it?" - it helpfully tells you how tall the statue of liberty is, says nothing about how you are able to hold it in your hand.

ledge, Thursday, 16 May 2024 09:04 (two months ago) link

image ais - or the free ones i've used anyway - are currently very bad at counting. "show me a picture of five elephants. just five! no more, no less" - gives you a picture of seven elephants.

ledge, Thursday, 16 May 2024 09:05 (two months ago) link

i've asked chat gpt to come up with original ideas. so far it's come up with a device to record dreams, the metaverse but with holograms, a wearable device to boost empathy (a quick google finds an MIT project on this), an app to reduce food waste, an app to help you grow plants, "a platform that connects travelers with local artisans and craftsmen around the world" - this also exists.

ledge, Thursday, 16 May 2024 09:17 (two months ago) link

no, free ai at the moment is good for prompting and good at being prompted, but it isn't currently very good at coming up with original ideas at all.

A while ago I asked ChatGPT "Write a song for Bryan Ferry to sing over the music of Autechre". What you tend to get is a rather child-like sing-song about Bryan Ferry singing with Autechre.

I tried again just now and to be honest it was a little better, but still pretty uninspired and sophomoric:

Through the circuits we wander,
In a world of endless wonder.
Where time dissolves in electric seas,
Beneath the stars of zeroes and ones, we freeze.

In the twilight of the digital dawn,
We'll dance until the code is gone.
Bryan Ferry's croon meets Autechre's beat,
In this cybernetic symphony, complete.

your mom goes to limgrave (dog latin), Thursday, 16 May 2024 09:34 (two months ago) link

rofl!

ledge, Thursday, 16 May 2024 09:37 (two months ago) link

it's better for brainstorming than producing finished work. the other day for a work assignment i was trying to come up with examples of "irritating things people do in the summer".
It churned out maybe 15 ideas, two of which were good enough to use.
I try not to use ChatGPT for anything, but I was under deadline and it definitely helped me save at least 30 minutes of brainstorming and/or searching around on Google and trawling through Quora/Reddit posts for examples (which, ultimately, is what it's doing)

your mom goes to limgrave (dog latin), Thursday, 16 May 2024 09:42 (two months ago) link

I asked it "what colour is an orange?" ten times in a row. it gave a sensible answer without complaining or mentioning the repetition.

ledge, Thursday, 16 May 2024 10:57 (two months ago) link

In this cybernetic symphony, complete

AI’s favourite film is A.I.

your dog is fed and no one cares (flamboyant goon tie included), Thursday, 16 May 2024 11:02 (two months ago) link

Ask it to come up with some crossword clues and answers... not good with counting letters in a word...

m0stly clean (Slowsquatch), Thursday, 16 May 2024 11:45 (two months ago) link

give me an anagram of "brian eno"

An anagram of "Brian Eno" is "Bonnie Era."

ledge, Thursday, 16 May 2024 13:46 (two months ago) link
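
The anagram claim above is easy to check mechanically. A minimal sketch in Python, assuming spaces, punctuation and letter case don't count (the function name `is_anagram` is made up for illustration): two phrases are anagrams when they use exactly the same letters the same number of times.

```python
from collections import Counter


def is_anagram(a: str, b: str) -> bool:
    """Return True if a and b use exactly the same letters (case and spacing ignored)."""
    def letters(s: str) -> Counter:
        # Keep alphabetic characters only, lowercased, and count each one.
        return Counter(ch for ch in s.lower() if ch.isalpha())

    return letters(a) == letters(b)


print(is_anagram("brian eno", "Bonnie Era"))  # the chatbot's answer
print(is_anagram("brian eno", "one brain"))   # a real anagram
```

"Bonnie Era" has nine letters, including two e's, against eight letters in "brian eno", so the first check fails.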

when this website has been around since at least the early 2000s, there's no excuse https://wordsmith.org/anagram/

your mom goes to limgrave (dog latin), Thursday, 16 May 2024 13:51 (two months ago) link

perhaps, idk, you could link it to an anagram generator so instead of only copying shit it's seen before, it could create actual anagrams. but you couldn't do that for every trivial task - indeed you could always come up with another trivial task, which the only way to successfully complete would be to UNDERSTAND THE ACTUAL FUCKING QUESTION

replace every other letter, starting with the first letter, in the word 'exponentially' with an 'x'

Sure, replacing every other letter, starting with the first letter 'e', with an 'x' in the word 'exponentially' would result in:
"x x p o n x n t x a l l y"

ledge, Thursday, 16 May 2024 14:01 (two months ago) link
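
For comparison, the letter-replacement task above takes a couple of lines of code. A rough sketch in Python, reading "every other letter, starting with the first" as positions 1, 3, 5 and so on (the function name `replace_every_other` is invented for the example):

```python
def replace_every_other(word: str, replacement: str = "x") -> str:
    """Replace the 1st, 3rd, 5th, ... letters of word with replacement."""
    return "".join(
        replacement if i % 2 == 0 else ch  # even index = odd position
        for i, ch in enumerate(word)
    )


print(replace_every_other("exponentially"))  # xxxoxextxaxlx
```

Under that reading the answer is "xxxoxextxaxlx", which is not what the chatbot produced.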

I haven't played with it in a while, but chatgpt was fascinatingly bad at attempting to write blank verse - this is a bit under a year ago:

https://pastebin.com/0y7jdAs8

woof, Thursday, 16 May 2024 14:17 (two months ago) link

I tried it with maybe Claude or the Google one lately and it blew out in just the same way. I find it fascinating that it's got these statistical clusters that it almost can't escape, like that's its ontology - a poem is a rhymed object.

'make a list of random words' is also a fascinating test - it's profoundly non-random and seems (for semi-obvious reasons) drawn to the kind of words that are 'interesting', or would turn up in a 'post the most random word' thread on a forum. This was basically 'list 200 random words':


Giraffe scamper elastic drizzle twist sapphire plummet parchment sing broccoli wander lighthouse chirp kangaroo evaporate avalanche button juxtapose oregano swivel nebula whisper ukulele float flabbergasted oscillate xylophone sparkle turquoise recite sycamore zipper leap quartz trust bucket kaleidoscope weave ninja resonate harpoon meander meadow scuttle zephyr collide translucent illuminate fracture trot mosaic glisten orbit fracture resonate cobalt bellow hammock shiver elastic quiver obsidian wobble twirl kite shimmer ponder telescope shuffle octopus blink synchronize amble citrus zigzag whirlpool vault harmonica quench elastic glow lighthouse chatter kangaroo slurp fractal wriggle sapphire gallop blizzard intertwine wrench yawn nebula embrace zipper squawk meadow twinkle xylophone hibernate quartz scribble turbine chortle kangaroo elope parchment dive oregano contemplate lighthouse baffle whirlpool ambulate turquoise skip nebula ripple wrench slumber giraffe convolute parchment frolic mosaic ponder bucket navigate mosaic daydream sycamore chuckle telescope pirouette hammock snicker citrus glide zephyr mesmerize lighthouse murmur ukulele skedaddle oregano froth zipper undulate sapphire rustle giraffe interlace avalanche shimmer bucket tiptoe broccoli daydream kangaroo jumble xylophone.

note the repeats too, eg xylophone

woof, Thursday, 16 May 2024 14:30 (two months ago) link
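
The repeats in a list like that can be counted rather than eyeballed. A small sketch in Python, assuming words are separated by whitespace and case doesn't matter (`repeated_words` is a hypothetical helper name, and the sample string is a short stand-in, not the full list):

```python
import re
from collections import Counter


def repeated_words(text: str) -> dict[str, int]:
    """Return each word that occurs more than once, with its count."""
    words = re.findall(r"[a-z]+", text.lower())  # letters only, lowercased
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n > 1}


sample = "Giraffe scamper xylophone sparkle giraffe xylophone."
print(repeated_words(sample))
```

Pasting the full 200-word list in as `sample` would show which words the model kept falling back on.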

If you asked a human to name 200 random words I doubt it would be particularly different from that list though, randomness is not something people are actually good at, so in that sense AI is approximating human thinking pretty well.

silverfish, Thursday, 16 May 2024 14:35 (two months ago) link

but to get back to something mentioned a bit earlier, crossword clues are an example of something that takes a lot of creative thinking and still seems a way off for an LLM. solving an average crossword puzzle is something I doubt any version of AI can currently do other than through brute force

silverfish, Thursday, 16 May 2024 14:41 (two months ago) link

if you correct chatgpt on a mistake, it tends to give a serviceable apology and try again. if you don't correct it and just leave it be, it says nothing

but what if they gave the AI an anxious attachment personality type and if you leave the chat open for more than 15 minutes without acknowledging its answer, it gets all antsy and starts asking you if it was up to standard, then emailing "second thoughts" and asking for feedback that night

your mom goes to limgrave (dog latin), Thursday, 16 May 2024 14:56 (two months ago) link

Deliberately simulating fatigue could be a diabolical way to induce more in-game purchases of virtual coffee for your AI personal assistant!

In that vein, I think all the pieces are in place for a fully autonomous bot to search, apply for, interview, and perform a gig/job meant for a human, but I think this has yet to be demonstrated? People have been "outsourcing" their own work to programs for a while, but there's always been some human tailoring and intervention involved at some stage of the scam. Maybe an extended version of this test is for the package to set up its own bank account for payment (likely with forged credentials), effectively attaining true recognized worker-hood, if not personhood.

Turing Taskrabbit Test?

Philip Nunez, Saturday, 18 May 2024 14:35 (two months ago) link

four weeks pass...

Apple co-founder Steve Wozniak suggested the coffee test, whereby a robot would be challenged to enter your home, find the kitchen and brew a cup of coffee. The programme should be able to walk into any kitchen, find the ingredients required and then perform the task of making a coffee.

I might and did fail this test.

Philip Nunez, Saturday, 15 June 2024 15:23 (one month ago) link

Beating a reverse Turing test of sorts:
https://petapixel.com/2024/06/12/photographer-disqualified-from-ai-image-contest-after-winning-with-real-photo/

Philip Nunez, Thursday, 20 June 2024 18:23 (one month ago) link

the songwriting turing test

https://suno.com/song/6d8d4b98-1d13-4224-9cf5-5b0c0108df33

| (Latham Green), Thursday, 20 June 2024 19:26 (one month ago) link

