ChatGPT Answers Programming Questions Incorrectly 52% of the Time: Study

ForgottenFlux@lemmy.world · 6 months ago

NotMyOldRedditName@lemmy.world · edit-2 6 months ago

My experience with an AI coding tool today.

Me: Can you optimize this method.

AI: Okay, here’s an optimized method.

Me seeing the AI completely removed a critical conditional check.

Me: Hey, you completely removed this check with variable xyz

AI: oops you’re right, here you go I fixed it.

It did this 3 times on 3 different optimization requests.

It was 0 for 3

Although there were some good suggestions once you got past the blatant first error.

piecat@lemmy.world · 6 months ago

My favorite is when I ask for something and it gets stuck in a loop, pasting the same comment over and over

efstajas@lemmy.world · 6 months ago

Yeah it’s wrong a lot but as a developer, damn it’s useful. I use Gemini for asking questions and Copilot in my IDE personally, and it’s really good at doing mundane text editing bullshit quickly and writing boilerplate, which is a massive time saver. Gemini has at least pointed me in the right direction with quite obscure issues or helped pinpoint the cause of hidden bugs many times. I treat it like an intelligent rubber duck rather than expecting it to just solve everything for me outright.

Jimmyeatsausage@lemmy.world · 6 months ago

Same here. It’s good for writing your basic unit tests, and the explain feature is useful for getting your head wrapped around complex syntax, especially given how bad searching for useful documentation has gotten on Google and DDG.

InternetPerson@lemmings.world · 6 months ago

That’s a good way to use it. Like every technological evolution it comes with risks and downsides. But if you are aware of that and know how to use it, it can be a useful tool.
And as always, it only gets better over time. One day we will probably rely more heavily on such AI tools, so it’s a good idea to adapt quickly.

zelifcam@lemmy.world · edit-2 6 months ago

“Major new Technology still in Infancy Needs Improvements”

– headline every fucking day

aname@lemmy.one · 6 months ago

“Corporation using immature technology in production because it’s cool”

More news at eleven

capital@lemmy.world · 6 months ago

This is scary because up to now, all software released worked exactly as intended so we need to be extra special careful here.

otp@sh.itjust.works · 6 months ago

Yes, and we never have and never will put lives in the hands of software developers before!

/s…for this comment and the above one, for anyone who needs it

TropicalDingdong@lemmy.world · edit-2 6 months ago

“Will this technology save us from ourselves, or are we just jerking off?”

BeatTakeshi@lemmy.world · edit-2 6 months ago

Who would have thought that an artificial intelligence trained on human intelligence would be just as dumb

capital@lemmy.world · edit-2 6 months ago

Hm. This is what I got.

I think about 90% of the screenshots we see of LLMs failing hilariously are doctored. Lemmy users really want to believe it’s that bad though.

Edit:

otp@sh.itjust.works · 6 months ago

I’ve had lots of great experiences with ChatGPT, and I’ve also had it hallucinate things.

I saw someone post an image of a simplified riddle, where ChatGPT tried to solve it as if it were the entire riddle, but it added extra restrictions and gave a confusing response. I tried it for myself and got an even better answer.

Prompt (no prior context except saying I have a riddle for it):

A man and a goat are on one side of the river. They have a boat. How can they go across?

Response:

The man takes the goat across the river first, then he returns alone and takes the boat across again. Finally, he brings the goat’s friend, Mr. Cabbage, across the river.

I wish I was witty enough to make this up.

capital@lemmy.world · 6 months ago

I reproduced that one and so I believe that one is true.

I looked up the whole riddle and see how it got confused.

It happened on 3.5 but not 4.

otp@sh.itjust.works · 6 months ago

Interesting! What did 4 say?

capital@lemmy.world · 6 months ago

Evidently I didn’t save the conversation but I went ahead and entered the exact prompt above into GPT-4. It responded with:

The man can take the goat across the river in the boat. After reaching the other side, he can leave the goat and return alone to the starting side if needed. This solution assumes the boat is capable of carrying at least the man and the goat at the same time. If there are no further constraints like a need to transport additional items or animals, this straightforward approach should work just fine!

otp@sh.itjust.works · 6 months ago

Thanks for sharing!

AIhasUse@lemmy.world · 6 months ago

Yesterday, someone posted a doctored one on here saying everyone eats it up even if you use a ridiculous font in your poorly doctored photo. People who want to believe are quite easy to fool.

gravitas_deficiency@sh.itjust.works · 6 months ago

Holy fuck did it just pass the Turing test?

Subverb@lemmy.world · 6 months ago

ChatGPT and GitHub Copilot are great tools, but they’re like a chainsaw: if you apply them incorrectly or become too casual and careless with them, they will kick back at you and fuck your day up.

shotgun_crab@lemmy.world · 6 months ago

I always thought of it as a tool to write boilerplate faster, so no surprises for me

disconnectikacio@lemmy.world · 6 months ago

Yes, there are mistakes, but if you point it in the right direction, it can give you correct answers

agelord@lemmy.world · 6 months ago

In my experience, if you have the necessary skills to point it in the right direction, you don’t need to use it in the first place

andallthat@lemmy.world · edit-2 6 months ago

it’s just a convenience, not a magic wand. Sure relying on AI blindly and exclusively is a horrible idea (that lots of people peddle and quite a few suckers buy), but there’s room for a supervised and careful use of AI, same as we started using google instead of manpages and (grudgingly, for the older of us) tolerated the addition of syntax highlighting and even some code completion to all but the most basic text editors.

pearsaltchocolatebar@discuss.online · 6 months ago

AI is a tool, not a solution.

interdimensionalmeme@lemmy.ml · 6 months ago

Yesterday, I wrote all of this working JavaScript code (https://github.com/igorlogius/gather-from-tabs/discussions/8), and I don’t know a lick of JavaScript. I know other languages, but that was barely needed. I just gave it plain language instructions and reported the errors until it worked.

aidan@lemmy.world · 6 months ago

It can, it also sometimes can’t unless you ask it “could it be x answer”

corroded@lemmy.world · 6 months ago

I will resort to ChatGPT for coding help every so often. I’m a fairly experienced programmer, so my questions usually tend to be somewhat complex. I’ve found that it’s extremely useful for those problems that fall into the category of “I could solve this myself in 2 hours, or I could ask AI to solve it for me in seconds.” Usually, I’ll get a working solution, but almost every single time, it’s not a good solution. It provides a great starting-off point to write my own code.

Some of the issues I’ve found (speaking as a C++ developer) are: Variables not declared “const,” extremely inefficient use of data structures, ignoring modern language features, ignoring parallelism, using an improper data type, etc.
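To make a few of those points concrete, here is a minimal sketch of the "works, but not good" pattern next to a cleaned-up version. It is not taken from any actual ChatGPT output; every function name here is hypothetical and chosen only for illustration. It covers const-correctness, data type choice, and data structure choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <numeric>
#include <string>
#include <unordered_set>
#include <vector>

// --- "Works, but not good" (hypothetical examples) ---

// Copies the whole vector on every call and accumulates into an int,
// which can overflow for large inputs.
int sum_naive(std::vector<int> values) {
    int total = 0;
    for (std::size_t i = 0; i < values.size(); i++) {
        total += values[i];
    }
    return total;
}

// Compares every pair of strings: O(n^2) comparisons.
bool has_duplicate_naive(std::vector<std::string> words) {
    for (std::size_t i = 0; i < words.size(); i++) {
        for (std::size_t j = i + 1; j < words.size(); j++) {
            if (words[i] == words[j]) return true;
        }
    }
    return false;
}

// --- Cleaned up ---

// Takes a const reference (no copy), uses a standard algorithm, and
// accumulates into a 64-bit integer so two large ints cannot overflow it.
std::int64_t sum_clean(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), std::int64_t{0});
}

// Uses a hash set so each word is looked up in expected constant time.
bool has_duplicate_clean(const std::vector<std::string>& words) {
    std::unordered_set<std::string> seen;
    seen.reserve(words.size());
    for (const auto& word : words) {
        // insert() returns {iterator, inserted}; inserted == false means
        // the word was already present.
        if (!seen.insert(word).second) return true;
    }
    return false;
}

// Small self-check showing the intended behavior of both versions.
void check_examples() {
    const std::vector<int> small{1, 2, 3};
    assert(sum_naive(small) == 6);
    assert(sum_clean(small) == 6);

    const int big = std::numeric_limits<int>::max();
    assert(sum_clean({big, big}) == 2 * static_cast<std::int64_t>(big));

    const std::vector<std::string> words{"goat", "cabbage", "goat"};
    assert(has_duplicate_naive(words));
    assert(has_duplicate_clean(words));
    assert(!has_duplicate_clean({"goat", "cabbage"}));
}
```

Both versions give the same answers on small inputs, which is the point: the difference only shows up as overflow, needless copies, or slow runs on real data. Parallelism is left out of the sketch; in C++17, `std::reduce` with an execution policy is one option, though it tends to need toolchain-specific setup.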

ChatGPT is great for generating ideas, but it’s going to be a while before it can actually replace a human developer. Producing code that works isn’t hard; producing code that’s good requires experience.

aesthelete@lemmy.world · edit-2 6 months ago

Sounds low

interdimensionalmeme@lemmy.ml · 6 months ago

Yes, and even if it was only right 1% of the time it would still be amazing

Also hallucinations are not a universally bad thing.

reksas@sopuli.xyz · 6 months ago

I just use it to get ideas about how to do something, or ask it to write short functions for stuff I wouldn’t know that well. I tried using it to create a graphical UI for a script, but that was a constant struggle to keep it on track. It managed to create something that kind of worked, but it was like trying to hold 2 magnets of opposing polarity together, and I had to constantly reset the conversation after it got “corrupted”.

It’s a useful tool if you don’t rely on it, use it correctly and don’t trust it too much.

foremanguy@lemmy.ml · 6 months ago

We have to wait a bit to have a useful assistant (but maybe something like Copilot or other more code-focused AIs are better)

NounsAndWords@lemmy.world · 6 months ago

GPT-2 came out a little more than 5 years ago, it answered 0% of questions accurately and couldn’t string a sentence together.

GPT-3 came out a little less than 4 years ago and was kind of a neat party trick, but I’m pretty sure answered ~0% of programming questions correctly.

GPT-4 came out a little less than 2 years ago and can answer 48% of programming questions accurately.

I’m not talking about morality, or creativity, or good/bad for humanity, but if you don’t see a trajectory here, I don’t know what to tell you.

14th_cylon@lemm.ee · 6 months ago

Seeing the trajectory is not the ultimate answer to anything.

otp@sh.itjust.works · 6 months ago

I appreciate the XKCD comic, but I think you’re exaggerating that other commenter’s intent.

The tech has been improving, and there’s no obvious reason to assume that we’ve reached the peak already. Nor is the other commenter saying we went from 0 to 1 and so now we’re going to see something 400x as good.

stufkes@lemmy.world · 6 months ago

I think the one argument for the assumption that we’re near peak already is the entire issue of AI learning from AI input. I think numberphile discussed a maths paper that said that to achieve the accuracy that we want, there is simply not enough data to train it on.

That’s of course not to say that we can’t find alternative approaches

14th_cylon@lemm.ee · edit-2 6 months ago

I appreciate the XKCD comic, but I think you’re exaggerating that other commenter’s intent.

i don’t think so. the other commenter clearly rejects the critique (1) and implies that the existence of an upward trajectory means it will one day overcome the problem (2).

while (1) is a well-documented fact right now, (2) is just wishful thinking.

hence the comic, because “the trajectory” doesn’t really mean anything.

otp@sh.itjust.works · 6 months ago

In general, “The technology is young and will get better with time” is not just a reasonable argument, but almost a consistent pattern. Note that XKCD’s example is about events, not technology. The comic would be relevant if someone were talking about events happening, or something like sales, but not about technology.

Here, I’m not saying that you’re necessarily right or they’re necessarily wrong, just that the comic you shared is not a good fit.

14th_cylon@lemm.ee · 6 months ago

In general, “The technology is young and will get better with time” is not just a reasonable argument, but almost a consistent pattern. Note that XKCD’s example is about events, not technology.

yeah, no.

try to compare horse speed with the Ford Model T and blindly extrapolate that into the future. look at Moore’s law. technology does not just grow upwards if you give it enough time, most of it has some kind of limit.

and it is not out of the realm of possibility that LLMs, having already stolen all of human knowledge from the internet, having found it is not enough and spewing out bullshit as a result of that monumental theft, have already reached it.

that may not be the case for every machine learning tool developed for some specific purpose, but the blind assumption that it will just grow indefinitely, because “there is a trend”, is overly optimistic.
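The point about limits can be sketched numerically. This is a toy model, not a claim about how LLM accuracy actually behaves: it assumes a logistic (S-shaped) curve with a made-up ceiling, rate, and midpoint, fits an exponential through two early points, and extrapolates. All names and parameter values are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Logistic (S-shaped) curve: grows almost exponentially early on, then
// flattens out as it approaches `ceiling`.
double logistic(double t, double ceiling, double rate, double midpoint) {
    return ceiling / (1.0 + std::exp(-rate * (t - midpoint)));
}

// Fit y = y0 * exp(growth * (t - t0)) through two observed points and
// extrapolate that exponential to time t.
double extrapolate_exponential(double t0, double y0,
                               double t1, double y1,
                               double t) {
    assert(y0 > 0.0 && y1 > 0.0 && t1 != t0);
    const double growth = std::log(y1 / y0) / (t1 - t0);
    return y0 * std::exp(growth * (t - t0));
}
```

With a ceiling of 100, a rate of 1 and a midpoint of 10, sampling the curve at t = 2 and t = 4 and extrapolating to t = 20 predicts a value in the millions, while the curve itself stays just under 100. The early data fits both shapes almost equally well, so the trend alone cannot tell you which one you are on.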

otp@sh.itjust.works · 6 months ago

I don’t think continuing further would be fruitful. I imagine your stance is heavily influenced by your opposition to, or dislike of, AI/LLMs

14th_cylon@lemm.ee · 6 months ago

oh sure. when someone says “you can’t just blindly extrapolate a curve”, there must be some conspiracy behind it, it absolutely cannot be because you can’t just blindly extrapolate a curve 😂

31337@sh.itjust.works · 6 months ago

We’re close to peak using current NN architectures and methods. All this started with the discovery of transformer architecture in 2017. Advances in architecture and methods have been fairly small and incremental since then. The advancements in performance have mostly just come from throwing more data and compute at the models, and diminishing returns have been observed. GPT-3 cost something like $15 million to train. GPT-4 is a little better and cost something like $100 million to train. If the next model costs $1 billion to train, it will likely be a little better.

NounsAndWords@lemmy.world · 6 months ago

Perhaps there is some line between assuming infinite growth and declaring that this technology that is not quite good enough right now will therefore never be good enough?

Blindly assuming no further technological advancements seems as foolish to me as assuming perpetual exponential growth. Ironically, our ability to extrapolate from limited information is a huge part of human intelligence that AI hasn’t solved yet.

14th_cylon@lemm.ee · 6 months ago

will therefore never be good enough?

no one said that. but someone did try to reject the fact it is demonstrably bad right now, because “there is a trajectory”.

systemglitch@lemmy.world · 6 months ago

That comes off as disingenuous in this instance.

Snot Flickerman@lemmy.blahaj.zone · edit-2 6 months ago

https://www.reuters.com/technology/openai-ceo-altman-says-davos-future-ai-depends-energy-breakthrough-2024-01-16/

Speaking at a Bloomberg event on the sidelines of the World Economic Forum’s annual meeting in Davos, Altman said the silver lining is that more climate-friendly sources of energy, particularly nuclear fusion or cheaper solar power and storage, are the way forward for AI.

“There’s no way to get there without a breakthrough,” he said. “It motivates us to go invest more in fusion.”

It’s a good trajectory, but when you have people running these companies saying that we need “energy breakthroughs” to power something that gives more accurate answers in the face of a world that’s already experiencing serious issues arising from climate change…

It just seems foolhardy if we have to burn the planet down to get to 80% accuracy.

I’m glad Altman is at least promoting nuclear, but at the same time, he has his fingers deep in a nuclear energy company, so it’s not like this isn’t something he might be pushing because it benefits him directly. He’s not promoting nuclear because he cares about humanity, he’s promoting nuclear because he has a deep investment in nuclear energy. That seems like just one more capitalist trying to corner the market for themselves.

Eheran@lemmy.world · 6 months ago

The study is using 3.5, not version 4.

phoneymouse@lemmy.world · 6 months ago

4 produces inaccurate programming answers too

Eheran@lemmy.world · 6 months ago

Obviously. But it is FAR better yet again.

phoneymouse@lemmy.world · 6 months ago

Not really. I ask it questions all the time and it makes shit up.

Eheran@lemmy.world · 6 months ago

Yes. But it is better than 3.5 without any doubt.

egeres@lemmy.world · 6 months ago

Lemmy seems to be very near-sighted when it comes to the exponential curve of AI progress. I think that’s because the community is very anti-corp

Knock_Knock_Lemmy_In@lemmy.world · 6 months ago

In what year do you estimate AI will have 90% accuracy?

NounsAndWords@lemmy.world · 6 months ago

No clue? Somewhere between a few years (assuming some unexpected breakthrough) and many decades? The consensus from experts (of which I am not one) seems to be somewhere in the 2030s/40s for AGI. I’m guessing accuracy probably will be more on a topic by topic basis, LLMs might never even get there, or only related to things they’ve been heavily trained on. If predictive text doesn’t do it then I would be betting on whatever Yann LeCun is working on.

exanime@lemmy.today · 6 months ago

You have no idea how many times I mentioned this observation from my own experience and people attacked me like I called their baby ugly

ChatGPT in its current form is good help, but nowhere near ready to actually replace anyone

UnderpantsWeevil@lemmy.world · 6 months ago

A lot of firms are trying to outsource their dev work overseas to communities of non-English speakers, and then handing the result off to a tiny support team.

ChatGPT lets the cheap low skill workers churn out miles of spaghetti code in short order, creating the illusion of efficiency for people who don’t know (or care) what they’re buying.

Petter1@lemm.ee · 6 months ago

I guess it depends on the programming language… With python, I got very fast great results. But python is all about quick and dirty 😂

anlumo@lemmy.world · 6 months ago

In Rust, it’s not great. It can’t do proper memory management in the language, which is pretty essential.

Petter1@lemm.ee · 6 months ago

Well, if you use free chatGPT you only have knowledge until 2022, maybe that’s the reason

tsonfeir@lemmy.world · 6 months ago

If you ask the wrong questions you get the wrong results. If you don’t check the response for accuracy, you get invalid answers.

It’s just a tool. Don’t use it wrong because you’re lazy.