Why are we training AIs on reddit posts instead of Research Papers? We could be saving the world!

@Melatonin@lemmy.dbzer0.com · 7M

Why are we training AIs on reddit posts instead of Research Papers? We could be saving the world!

@RangerJosie@lemmy.world · 7M

Saving the world isn’t profitable in the short term.

Vulture capitalists don’t care about the future. They care about the immediate. Short term profitability. And nothing else.

@Strayce@lemmy.sdf.org · edit-2 7M

They are. T&F recently cut a deal with Microsoft. Without the authors’ consent, of course.

I’m fairly sure a few others have too, but that’s the only article I could find quickly.

HobbitFoot · 7M

Because they are looking for conversations.

@CanadaPlus@lemmy.sdf.org · 7M

They’re trained on both, and the kitchen sink.

@ImplyingImplications@lemmy.ca · 7M

Because AI needs a lot of training data to reliably generate something appropriate. It’s easier to get millions of reddit posts than millions of research papers.

Even then, LLMs simply generate text but have no idea what the text means. It just knows those words have a high probability of matching the expected response. It doesn’t check that what was generated is factual.
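To make the "high probability" point concrete, here is a toy sketch. It is a plain bigram counter, which is vastly simpler than a real LLM, and the corpus and the function names (`train_bigrams`, `next_word_probs`) are made up for illustration. It only shows that a model can rank likely next words from counts alone, with no step anywhere that checks whether the output is true.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the raw follow-up counts for `word` into probabilities."""
    following = counts[word]
    total = sum(following.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in following.items()}


# Hypothetical training text; a real model would see billions of words.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
probs = next_word_probs(model, "the")

# The "model" picks whichever word most often followed "the" in training.
best_guess = max(probs, key=probs.get)
print(best_guess, probs)
```

Nothing in there knows what a cat is. It just reproduces the statistics of whatever text it was fed.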

@Melatonin@lemmy.dbzer0.com · 7M

How do you know that’s not what YOU’RE doing when you converse?

@ulkesh@beehaw.org · 7M

Because we have brains that are capable of critical thinking. It makes no sense to compare the human brain to the infancy and current inanity of LLMs.

originalucifer · 7M

money. there’s no money in saving the world. lots of money in not saving the world.

greed will be humanity’s downfall

Scott · 7M

Brain damage is cheaper than professionals

@ryathal@sh.itjust.works · 7M

Both are happening. Samples of casual writing are more valuable to use to generate an article than research papers though.

FaceDeer · 7M

Yeah. Scientific papers may teach an AI about science, but Reddit posts teach AI how to interact with people and “talk” to them. Both are valuable.

geekwithsoul · 7M

Hopefully not too pedantic, but no one is “teaching” AI anything. They’re just feeding it data in the hopes that it can learn probabilities for certain types of output. It “understands” neither the Reddit post nor the scientific paper.

@Zexks@lemmy.world · 7M

Describe how you ‘learned’ to speak. How do you know which word comes next? Until you can describe this process in a way that doesn’t make it exclusively ‘human’ or ‘biological’, it’s no different. The only thing they can’t do is adjust their weights dynamically. But that’s a limitation we gave them, not one intrinsic to the system.

geekwithsoul · 7M

I inherited brain structures that are natural language processors, as well as the ability to understand and repeat any language sounds. Over time, my brain focused in on only the language sounds I heard the most and, through trial and repetition, learned how to understand and make those sounds.

AI - as it currently exists - is essentially a babbling infant with none of the structures necessary to do anything more than repeat sounds back without understanding any of them. Anyone who tells you different is selling you something.

@hoshikarakitaridia@lemmy.world · edit-2 7M

This might be a wild take but people always make AI out to be way more primitive than it is.

Yes, in its most basic form an LLM can be described as an auto-complete for conversations. But let’s be real: the amount of different optimizations and adjustments made before and after the fact is pretty complex, and the way the AI works is pretty close already to a brain. Hell, that’s where we started out: emulating a brain. And you can look into this, the base for AI is usually neural networks, which learn to give specific parts of an input a specific amount of weight when generating the output. And when the output is not what we want, the AI slowly adjusts those weights to get closer.

Our brain works the same in its most basic form. We use electric signals and we think in associative patterns. When an electric signal enters one node, this node is connected via stronger or lighter bridges to different nodes, forming our associations. Those bridges are exactly what we emulate when we use nodes with weighted connectors in artificial neural networks.
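For anyone who wants to see the "slowly adjusts those weights" part in code, here is a minimal sketch with a single artificial node, assuming plain gradient descent on squared error. The data, the learning rate and the name `train_neuron` are made up for illustration, and real networks stack many of these nodes with nonlinearities in between.

```python
def train_neuron(samples, lr=0.1, epochs=200):
    """Fit one weighted connection (w) plus a bias (b) by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = w * x + b        # output of the node for this input
            error = pred - target   # how far off the output was
            w -= lr * error * x     # nudge the weight against the error
            b -= lr * error         # nudge the bias the same way
    return w, b


# Hypothetical training pairs generated from target = 2 * x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_neuron(samples)
print(w, b)  # both values end up close to the 2 and 1 used to build the data
```

Each pass moves the weight a little in whatever direction shrinks the error, which is the "get closer" step from above.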

Quality-wise, AI output is pretty good right now, but integrity- and security-wise it’s pretty bad (hallucinations, not following prompts, etc.). Still, saying it is performing at the level of a three-year-old simultaneously undersells and oversells how AI performs. We should be aware that just because it’s AI doesn’t mean it’s good, but it also doesn’t mean it’s bad either. It just means there’s a feature (which is hopefully optional) and then we can decide if it’s helpful or not.

I do music production and I need cover art. As a student, I can’t afford commissioning good artworks every now and then, so AI is the way to go and it’s been nailing it.

As a software developer, I’ve come to appreciate that after about 2y of bad code completion AIs, there’s finally one that is a net positive for me.

AI is just like anything else, it’s a tool that brings change. How that change manifests depends on us as a collective. Let’s punish bad AI, dangerous AI or similar (copilot, Tesla self driving, etc.) and let’s promote good AI (Gmail text completion, chatgpt, code completion, image generators) and let’s also realize that the best things we can get out of AI will not hit the ceiling of human products for a while. But if it costs too much, or you need quick pointers, at least you know where to start.

geekwithsoul · 7M

This shows so many gross misconceptions and with such utter conviction, I’m not even sure where to start. And as you seem to have decided you like to get free stuff that is the result of AI trained off the work of others without them receiving any compensation, nothing I say will likely change your opinion because you have an emotional stake in not acknowledging the problems of AI.

@tiddy@sh.itjust.works · 7M

Papers are, most importantly, documentation of exactly what procedure was performed and how. Adding a vagueness filter over that is only going to decrease its value infinitely.

Real question is why are we using generative ai at all (gets money out of idiot rich people)

@Even_Adder@lemmy.dbzer0.com · 7M

They’re trained on technical material too.

@callouscomic@lemm.ee · 7M

Most research papers are likely as valid as an average reddit post.

Getting published is a circlejerk, and rarely are they properly tested, or does anyone actually read them.

@atimehoodie@lemmy.ml · 7M

Who’s going to peer review that?

lattrommi · 7M

I think I read this post wrong.

I was thinking the sentence “We could be saving the world!” meant ‘we’ as in humans only.

No need to be training AI. No need to do anything with AI at all. Humans simply start saving the world. Our Research Papers can train on Reddit. We cannot be training, we are saving the world. Let the Research Papers run a train on Reddit AI. Humanity Saves World.

No cynical replies please.

@realcaseyrollins@thelemmy.club · 7M

Those research papers are expensive to procure ethically, I’d imagine

@slacktoid@lemmy.ml · edit-2 7M

AuroraGPT. They are trying to do it.

It’s because the number of people who can read, understand, and then create the necessary dataset to train and test the LLM is very, very, very small for research papers, whereas the data for pop culture is easier to source.