The research from Purdue University, first spotted by news outlet Futurism, was presented earlier this month at the Computer-Human Interaction Conference in Hawaii and looked at 517 programming questions on Stack Overflow that were then fed to ChatGPT.
“Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose,” the new study explained. “Nonetheless, our user study participants still preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style.”
Disturbingly, programmers in the study didn’t always catch the mistakes being produced by the AI chatbot.
“However, they also overlooked the misinformation in the ChatGPT answers 39% of the time,” according to the study. “This implies the need to counter misinformation in ChatGPT answers to programming questions and raise awareness of the risks associated with seemingly correct answers.”
My experience with an AI coding tool today.
Me: Can you optimize this method?
AI: Okay, here’s an optimized method.
Me, noticing the AI had completely removed a critical conditional check.
Me: Hey, you completely removed this check with variable xyz.
AI: Oops, you’re right, here you go, I fixed it.
It did this 3 times on 3 different optimization requests.
It was 0 for 3.
There were some good suggestions in there, though, once you got past the blatant first error.
My favorite is when I ask for something and it gets stuck in a loop, pasting the same comment over and over.
Yeah, it’s wrong a lot, but as a developer, damn, it’s useful. Personally, I use Gemini for asking questions and Copilot in my IDE, and it’s really good at doing mundane text-editing bullshit quickly and writing boilerplate, which is a massive time saver. Gemini has at least pointed me in the right direction with quite obscure issues or helped pinpoint the cause of hidden bugs many times. I treat it like an intelligent rubber duck rather than expecting it to just solve everything for me outright.
Same here. It’s good for writing your basic unit tests, and the explain feature is useful for getting your head wrapped around complex syntax, especially given how bad searching for useful documentation has gotten on Google and DDG.
That’s a good way to use it. Like every technological evolution it comes with risks and downsides. But if you are aware of that and know how to use it, it can be a useful tool.
And as always, it only gets better over time. One day we will probably rely more heavily on such AI tools, so it’s a good idea to adapt quickly.
“Major new Technology still in Infancy Needs Improvements”
– headline every fucking day
“Corporation using immature technology in production because it’s cool”
More news at eleven
This is scary because up to now, all software released worked exactly as intended so we need to be extra special careful here.
Yes, and we never have and never will put lives in the hands of software developers before!
/s…for this comment and the above one, for anyone who needs it
“Will this technology save us from ourselves, or are we just jerking off?”
Who would have thought that an artificial intelligence trained on human intelligence would be just as dumb?
Hm. This is what I got.
I think about 90% of the screenshots we see of LLMs failing hilariously are doctored. Lemmy users really want to believe it’s that bad, though.
Edit:
I’ve had lots of great experiences with ChatGPT, and I’ve also had it hallucinate things.
I saw someone post an image of a simplified riddle, where ChatGPT tried to solve it as if it were the entire riddle, but it added extra restrictions and gave a confusing response. I tried it for myself and got an even better answer.
Prompt (no prior context except saying I have a riddle for it):
A man and a goat are on one side of the river. They have a boat. How can they go across?
Response:
The man takes the goat across the river first, then he returns alone and takes the boat across again. Finally, he brings the goat’s friend, Mr. Cabbage, across the river.
I wish I was witty enough to make this up.
I reproduced that one and so I believe that one is true.
I looked up the whole riddle and saw how it got confused.
It happened on 3.5 but not 4.
Interesting! What did 4 say?
Evidently I didn’t save the conversation but I went ahead and entered the exact prompt above into GPT-4. It responded with:
The man can take the goat across the river in the boat. After reaching the other side, he can leave the goat and return alone to the starting side if needed. This solution assumes the boat is capable of carrying at least the man and the goat at the same time. If there are no further constraints like a need to transport additional items or animals, this straightforward approach should work just fine!
Thanks for sharing!
Yesterday, someone posted a doctored one on here saying everyone eats it up even if you use a ridiculous font in your poorly doctored photo. People who want to believe are quite easy to fool.
Holy fuck did it just pass the Turing test?
ChatGPT and GitHub Copilot are great tools, but they’re like a chainsaw: if you apply them incorrectly or become too casual and careless with them, they will kick back at you and fuck your day up.
I always thought of it as a tool to write boilerplate faster, so no surprises for me.
Yes, there are mistakes, but if you point it in the right direction, it can give you correct answers.
In my experience, if you have the necessary skills to point it in the right direction, you don’t need to use it in the first place.
It’s just a convenience, not a magic wand. Sure, relying on AI blindly and exclusively is a horrible idea (one that lots of people peddle and quite a few suckers buy), but there’s room for supervised and careful use of AI, the same way we started using Google instead of manpages and (grudgingly, for the older of us) tolerated the addition of syntax highlighting and even some code completion to all but the most basic text editors.
AI is a tool, not a solution.
Yesterday, I wrote all of this working JavaScript code (https://github.com/igorlogius/gather-from-tabs/discussions/8), and I don’t know a lick of JavaScript. I know other languages, but that knowledge was barely needed. I just gave it plain-language instructions and reported the errors until it worked.
It can, but sometimes it also can’t unless you ask it “could it be answer X?”
I will resort to ChatGPT for coding help every so often. I’m a fairly experienced programmer, so my questions usually tend to be somewhat complex. I’ve found that it’s extremely useful for those problems that fall into the category of “I could solve this myself in 2 hours, or I could ask AI to solve it for me in seconds.” Usually, I’ll get a working solution, but almost every single time, it’s not a good solution. It provides a great starting-off point to write my own code.
Some of the issues I’ve found (speaking as a C++ developer) are: Variables not declared “const,” extremely inefficient use of data structures, ignoring modern language features, ignoring parallelism, using an improper data type, etc.
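To make a few of those issues concrete, here is a hypothetical C++ sketch (not the commenter’s code; the word-intersection task and the names `common_words_draft` and `common_words` are made up for illustration). The draft works but copies both vectors by value, declares nothing `const`, and does a nested linear search that costs O(n*m). The reviewed version keeps the same behavior while taking `const` references, building a `const std::unordered_set` once for average O(1) lookups, and calling `reserve()` up front.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical "first draft" in the style described: both arguments are taken
// by value (full copies), nothing is const, and the nested loops make this
// O(n * m). It returns every element of `a` that also appears in `b`.
std::vector<std::string> common_words_draft(std::vector<std::string> a,
                                            std::vector<std::string> b) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < a.size(); i++) {
        for (std::size_t j = 0; j < b.size(); j++) {
            if (a[i] == b[j]) {
                out.push_back(a[i]);
                break;  // stop at the first match, as the reviewed version does
            }
        }
    }
    return out;
}

// Reviewed version with identical output: const references avoid copies, the
// hash set is built once and marked const, reserve() avoids repeated
// reallocations, and the result keeps the original order of `a`.
std::vector<std::string> common_words(const std::vector<std::string>& a,
                                      const std::vector<std::string>& b) {
    const std::unordered_set<std::string> lookup(b.begin(), b.end());
    std::vector<std::string> out;
    out.reserve(a.size());
    for (const auto& word : a) {
        if (lookup.count(word) != 0) {
            out.push_back(word);
        }
    }
    return out;
}
```

Both versions “work,” which is the point being made: producing code that runs and producing good code are different bars. Whether the hash set actually wins depends on input sizes; for a handful of short strings the nested loop may be just as fast, so a profiler should have the final say.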
ChatGPT is great for generating ideas, but it’s going to be a while before it can actually replace a human developer. Producing code that works isn’t hard; producing code that’s good requires experience.
Sounds low
Yes, and even if it was only right 1% of the time it would still be amazing
Also hallucinations are not a universally bad thing.
I just use it to get ideas about how to do something, or ask it to write short functions for stuff I wouldn’t know that well. I tried using it to create a graphical UI for a script, but it was a constant struggle to keep it on track. It managed to create something that kind of worked, but it was like trying to hold 2 magnets of opposing polarity together, and I had to constantly reset the conversation after it got “corrupted”.
It’s a useful tool if you don’t rely on it, use it correctly, and don’t trust it too much.
We’ll have to wait a bit for a useful assistant (but maybe something like Copilot or more code-focused AIs are better).
GPT-2 came out a little more than 5 years ago, it answered 0% of questions accurately and couldn’t string a sentence together.
GPT-3 came out a little less than 4 years ago and was kind of a neat party trick, but I’m pretty sure answered ~0% of programming questions correctly.
GPT-4 came out a little less than 2 years ago and can answer 48% of programming questions accurately.
I’m not talking about morality, or creativity, or good/bad for humanity, but if you don’t see a trajectory here, I don’t know what to tell you.
Seeing the trajectory is not the ultimate answer to anything.
I appreciate the XKCD comic, but I think you’re exaggerating that other commenter’s intent.
The tech has been improving, and there’s no obvious reason to assume that we’ve reached the peak already. Nor is the other commenter saying we went from 0 to 1 and so now we’re going to see something 400x as good.
I think the one argument for the assumption that we’re near the peak already is the entire issue of AI learning from AI input. I think Numberphile discussed a maths paper that said that to achieve the accuracy we want, there is simply not enough data to train it on.
That’s of course not to say that we can’t find alternative approaches.
I appreciate the XKCD comic, but I think you’re exaggerating that other commenter’s intent.
i don’t think so. the other commenter clearly rejects the criticism (1) and implies that the existence of an upward trajectory means it will one day overcome the problem (2).
while (1) is a well-documented fact right now, (2) is just wishful thinking.
hence the comic, because “the trajectory” doesn’t really mean anything.
In general, “The technology is young and will get better with time” is not just a reasonable argument, but almost a consistent pattern. Note that XKCD’s example is about events, not technology. The comic would be relevant if someone were talking about events happening, or something like sales, but not about technology.
Here, I’m not saying that you’re necessarily right or they’re necessarily wrong, just that the comic you shared is not a good fit.
In general, “The technology is young and will get better with time” is not just a reasonable argument, but almost a consistent pattern. Note that XKCD’s example is about events, not technology.
yeah, no.
try to compare horse speed with the Ford Model T and blindly extrapolate that into the future. look at Moore’s law. technology does not just grow upwards if you give it enough time; most of it has some kind of limit.
and it is not out of the realm of possibility that LLMs, having already stolen all of human knowledge from the internet, having found it is not enough, and spewing out bullshit as a result of that monumental theft, have already reached it.
that may not be the case for every machine learning tool developed for some specific purpose, but blind assumption it will just grow indiscriminately, because “there is a trend”, is overly optimistic.
I don’t think continuing further would be fruitful. I imagine your stance is heavily influenced by your opposition to, or dislike of, AI/LLMs.
oh sure. when someone says “you can’t just blindly extrapolate a curve”, there must be some conspiracy behind it, it absolutely cannot be because you can’t just blindly extrapolate a curve 😂
We’re close to the peak using current NN architectures and methods. All this started with the introduction of the transformer architecture in 2017. Advances in architecture and methods have been fairly small and incremental since then. The advancements in performance have mostly come from throwing more data and compute at the models, and diminishing returns have been observed. GPT-3 cost something like $15 million to train. GPT-4 is a little better and cost something like $100 million to train. If the next model costs $1 billion to train, it will likely be a little better.
Perhaps there is some line between assuming infinite growth and declaring that this technology that is not quite good enough right now will therefore never be good enough?
Blindly assuming no further technological advancements seems just as foolish to me as assuming perpetual exponential growth. Ironically, our ability to extrapolate from limited information is a huge part of human intelligence that AI hasn’t solved yet.
will therefore never be good enough?
no one said that. but someone did try to reject the fact that it is demonstrably bad right now, because “there is a trajectory”.
That comes off as disingenuous in this instance.
Speaking at a Bloomberg event on the sidelines of the World Economic Forum’s annual meeting in Davos, Altman said the silver lining is that more climate-friendly sources of energy, particularly nuclear fusion or cheaper solar power and storage, are the way forward for AI.
“There’s no way to get there without a breakthrough,” he said. “It motivates us to go invest more in fusion.”
It’s a good trajectory, but when you have people running these companies saying that we need “energy breakthroughs” to power something that gives more accurate answers in the face of a world that’s already experiencing serious issues arising from climate change…
It just seems foolhardy if we have to burn the planet down to get to 80% accuracy.
I’m glad Altman is at least promoting nuclear, but at the same time, he has his fingers deep in a nuclear energy company, so it’s not like this isn’t something he might be pushing because it benefits him directly. He’s not promoting nuclear because he cares about humanity; he’s promoting nuclear because he has a deep investment in nuclear energy. That seems like just one more capitalist trying to corner the market for themselves.
The study is using 3.5, not version 4.
4 produces inaccurate programming answers too
Obviously. But it is FAR better yet again.
Not really. I ask it questions all the time and it makes shit up.
Yes. But it is better than 3.5 without any doubt.
Lemmy seems to be very near-sighted when it comes to the exponential curve of AI progress. I think this is because the community is very anti-corp.
In what year do you estimate AI will have 90% accuracy?
No clue? Somewhere between a few years (assuming some unexpected breakthrough) and many decades? The consensus from experts (and I am not one) seems to be somewhere in the 2030s/40s for AGI. I’m guessing accuracy will probably be more on a topic-by-topic basis: LLMs might never even get there, or only for things they’ve been heavily trained on. If predictive text doesn’t do it, then I would be betting on whatever Yann LeCun is working on.
You have no idea how many times I’ve mentioned this observation from my own experience and people attacked me like I called their baby ugly.
ChatGPT in its current form is a good help, but nowhere near ready to actually replace anyone.
A lot of firms are trying to outsource their dev work overseas to communities of non-English speakers, and then handing the result off to a tiny support team.
ChatGPT lets the cheap low skill workers churn out miles of spaghetti code in short order, creating the illusion of efficiency for people who don’t know (or care) what they’re buying.
I guess it depends on the programming language… With python, I got very fast great results. But python is all about quick and dirty 😂
In Rust, it’s not great. It can’t do proper memory management in the language, which is pretty essential.
Well, if you use the free ChatGPT, its knowledge only goes up to 2022; maybe that’s the reason.
If you ask the wrong questions you get the wrong results. If you don’t check the response for accuracy, you get invalid answers.
It’s just a tool. Don’t use it wrong because you’re lazy.