https://www.nytimes.com/2023/04/18/technology/reddit-ai-openai-google.html
Reddit Wants to Get Paid for Helping to Teach Big A.I. Systems
==============================================================
The internet site has long been a forum for discussion on a huge variety
of topics, and companies like Google and OpenAI have been using it in
their A.I. projects.
By Mike Isaac
Mike Isaac, based in San Francisco, writes about social media and the
technology industry.
April 18, 2023. Updated 2:43 p.m. ET
Reddit has long been a hot spot for conversation on the internet. About
57 million people visit the site every day to chat about topics as
varied as makeup, video games and pointers for power washing driveways.
In recent years, Reddit’s array of chats has also been a free teaching
aid for companies like Google, OpenAI and Microsoft. Those companies are
using Reddit’s conversations in the development of giant artificial
intelligence systems that many in Silicon Valley think are on their way
to becoming the tech industry’s next big thing.
Now Reddit wants to be paid for it. The company said on Tuesday that it
planned to begin charging companies for access to its application
programming interface, or A.P.I., the method through which outside
entities can download and process the social network’s vast selection of
person-to-person conversations.
“The Reddit corpus of data is really valuable,” Steve Huffman, founder
and chief executive of Reddit, said in an interview. “But we don’t need
to give all of that value to some of the largest companies in the world
for free.”
The move is one of the first significant examples of a social network’s
charging for access to the conversations it hosts for the purpose of
developing A.I. systems like ChatGPT, OpenAI’s popular program. Those
new A.I. systems could one day lead to big businesses, but they aren’t
likely to help companies like Reddit very much. In fact, they could be
used to create competitors — automated duplicates to Reddit’s
conversations.
Reddit is also acting as it prepares for a possible initial public
offering on Wall Street this year. The company, which was founded in
2005, makes most of its money through advertising and e-commerce
transactions on its platform. Reddit said it was still ironing out the
details of what it would charge for A.P.I. access and would announce
prices in the coming weeks.
Reddit’s conversation forums have become valuable commodities as large
language models, or L.L.M.s, have become an essential part of creating
new A.I. technology.
A New Generation of Chatbots
----------------------------
**A brave new world.** A new crop of chatbots powered by
artificial intelligence has ignited a scramble to determine whether the
technology [could upend the economics of the internet](https://www.nytimes.com/2023/03/08/technology/chatbots-disrupt-internet-industry.html?action=click&pgtype=Article&state=default&module=styln-artificial-intelligence&variant=show&region=MAIN_CONTENT_1&block=storyline_levelup_swipe_recirc),
turning today’s powerhouses into has-beens and creating the industry’s
next giants. Here are the bots to know:
**ChatGPT.** ChatGPT, the artificial intelligence language model
from a research lab, OpenAI, has been making headlines since November
for its ability to respond to complex questions, write poetry, generate
code, [plan vacations](https://www.nytimes.com/2023/03/16/travel/chatgpt-artificial-intelligence-travel-vacation.html?action=click&pgtype=Article&state=default&module=styln-artificial-intelligence&variant=show&region=MAIN_CONTENT_1&block=storyline_levelup_swipe_recirc)
and translate languages. GPT-4, the latest version introduced in mid-March,
[can even respond to images](https://www.nytimes.com/2023/03/14/technology/openai-new-gpt4.html?action=click&pgtype=Article&state=default&module=styln-artificial-intelligence&variant=show&region=MAIN_CONTENT_1&block=storyline_levelup_swipe_recirc)
(and ace the Uniform Bar Exam).
**Bing.** Two months after ChatGPT’s debut, Microsoft, OpenAI’s
primary investor and partner, [added a similar chatbot](https://www.nytimes.com/2023/02/07/technology/microsoft-ai-chatgpt-bing.html?action=click&pgtype=Article&state=default&module=styln-artificial-intelligence&variant=show&region=MAIN_CONTENT_1&block=storyline_levelup_swipe_recirc),
capable of having open-ended text conversations on virtually any topic,
to its Bing internet search engine. But it was the bot’s occasionally
inaccurate, misleading and [weird responses](https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html?action=click&pgtype=Article&state=default&module=styln-artificial-intelligence&variant=show&region=MAIN_CONTENT_1&block=storyline_levelup_swipe_recirc)
that drew much of the attention after its release.
**Bard.** Google’s chatbot, called Bard, [was released in March](https://www.nytimes.com/2023/03/21/technology/google-bard-chatbot.html?action=click&pgtype=Article&state=default&module=styln-artificial-intelligence&variant=show&region=MAIN_CONTENT_1&block=storyline_levelup_swipe_recirc)
to a limited number of users in the United States and Britain. Originally
conceived as a creative tool designed to draft emails and poems, it can
generate ideas, write blog posts and [answer questions with facts or opinions](https://www.nytimes.com/2023/03/21/technology/google-bard-guide-test.html?action=click&pgtype=Article&state=default&module=styln-artificial-intelligence&variant=show&region=MAIN_CONTENT_1&block=storyline_levelup_swipe_recirc).
**Ernie.** The search giant Baidu unveiled China’s first major
rival to ChatGPT in March. The debut of Ernie, short for Enhanced
Representation through Knowledge Integration, [turned out to be a flop](https://www.nytimes.com/2023/03/16/world/asia/china-baidu-chatgpt-ernie.html?action=click&pgtype=Article&state=default&module=styln-artificial-intelligence&variant=show&region=MAIN_CONTENT_1&block=storyline_levelup_swipe_recirc)
after a promised “live” demonstration of the bot was revealed to have been
recorded.
L.L.M.s are essentially sophisticated algorithms developed by companies
like Google and OpenAI, which is a close partner of Microsoft. To the
algorithms, the Reddit conversations are data, and they are among the
vast pool of material being fed into the L.L.M.s to develop them.
The underlying algorithm that helped to build Bard, Google’s
conversational A.I. service, is partly trained on Reddit data. OpenAI’s
ChatGPT cites Reddit data as one of the sources of information it has
been trained on.
Other companies are also beginning to see value in the conversations and
images they host. Shutterstock, the image hosting service, also sold
image data to OpenAI to help create DALL-E, the A.I. program that creates
vivid graphical imagery with only a text-based prompt required.
Last month, Elon Musk, the owner of Twitter, said he was cracking down
on the use of Twitter’s A.P.I., which thousands of companies and
independent developers use to track the millions of conversations across
the network. Though he did not cite L.L.M.s as a reason for the change,
the new fees could go well into the tens or even hundreds of thousands of
dollars.
To keep improving their models, artificial intelligence makers need two
significant things: an enormous amount of computing power and an
enormous amount of data. Some of the biggest A.I. developers have plenty
of computing power but still look outside their own networks for the
data needed to improve their algorithms. That has included sources like
Wikipedia, millions of digitized books, academic articles and Reddit.
Representatives from Google, OpenAI and Microsoft did not immediately
respond to a request for comment.
Reddit has long had a symbiotic relationship with the search engines of
companies like Google and Microsoft. The search engines “crawl” Reddit’s
web pages in order to index information and make it available for search
results. That crawling, or “scraping,” isn’t always welcomed by every
site on the internet. But Reddit has benefited by appearing higher in
search results.
The dynamic is different with L.L.M.s — they gobble as much data as they
can to create new A.I. systems like the chatbots.
Reddit believes its data is particularly valuable because it is
continuously updated. That newness and relevance, Mr. Huffman said, is
what large language modeling algorithms need to produce the best
results.
“More than any other place on the internet, Reddit is a home for
authentic conversation,” Mr. Huffman said. “There’s a lot of stuff on
the site that you’d only ever say in therapy, or A.A., or never at all.”
Mr. Huffman said Reddit’s A.P.I. would still be free to developers who
wanted to build applications that helped people use Reddit. They could
use the tools to build a bot that automatically tracks whether users’
comments adhere to rules for posting, for instance. Researchers who want
to study Reddit data for academic or noncommercial purposes will
continue to have free access to it.
Reddit also hopes to incorporate more so-called machine learning into
how the site itself operates. It could be used, for instance, to
identify the use of A.I.-generated text on Reddit, and add a label that
notifies users that the comment came from a bot.
The company also promised to improve software tools that can be used by
moderators — the users who volunteer their time to keep the site’s
forums operating smoothly and improve conversations between users. And
third-party bots that help moderators monitor the forums will continue
to be supported.
But for the A.I. makers, it’s time to pay up.
“Crawling Reddit, generating value and not returning any of that value
to our users is something we have a problem with,” Mr. Huffman said.
“It’s a good time for us to tighten things up.”
“We think that’s fair,” he added.