Large Language Models (LLMs)
Essentially: "VERY fancy auto-complete"
Caveman with Rock : Orchestral Symphony :: Autocomplete : LLM
Prerequisite: Have computing infrastructure and a research budget equivalent to a medium-sized country's.
Step 1: Absorb all human text and turn it into numbers (see the first sketch after this list)
Step 2: Run complex mathematical formulas on the numbers to learn how human text "works" (see the second sketch after this list)
Step 3: "Train" or "teach" computer programs how to understand the relationships between the numbers (words)
Step 4: Have low-paid humans oversee the computer "training" and "grade" the computers' progress
Step 5: Turn questions into numbers, analyze those numbers with respect to all the other numbers, predict the "answer" numbers.
Step 6: Take the final "answer" numbers and turn them back into language.
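A minimal sketch of Steps 1 and 6, assuming a toy word-level vocabulary (real LLMs use learned subword tokenizers such as BPE; every name and string here is invented for illustration):

    # Step 1: turn text into numbers; Step 6: turn numbers back into text.
    # Toy word-level tokenizer -- real LLMs use subword schemes (e.g. BPE).
    corpus = "the cat sat on the mat"

    # Assign each unique word an integer ID.
    vocab = {word: i for i, word in enumerate(sorted(set(corpus.split())))}
    inverse = {i: word for word, i in vocab.items()}

    def encode(text):
        """Text -> list of token IDs (the numbers the model actually sees)."""
        return [vocab[word] for word in text.split()]

    def decode(ids):
        """Token IDs -> text."""
        return " ".join(inverse[i] for i in ids)

    ids = encode("the cat sat")
    print(ids)          # [4, 0, 3]
    print(decode(ids))  # "the cat sat"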
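And a sketch of Steps 2, 3, and 5 in the same toy setting: "training" here is just counting which token follows which (a bigram model), and "predicting" is picking the most frequent follower. A real LLM fits billions of neural-network weights by gradient descent instead, but the "predict the next token" framing is the same:

    from collections import Counter, defaultdict

    # Steps 2-3 ("training"): learn which token tends to follow which.
    corpus = "the cat sat on the mat the cat ran".split()
    follows = defaultdict(Counter)
    for current, nxt in zip(corpus, corpus[1:]):
        follows[current][nxt] += 1  # count: `nxt` followed `current`

    def predict_next(word):
        """Step 5: predict the most likely next token after `word`."""
        return follows[word].most_common(1)[0][0]

    # Autocomplete a prompt, one predicted token at a time.
    prompt = ["the"]
    for _ in range(3):
        prompt.append(predict_next(prompt[-1]))
    print(" ".join(prompt))  # "the cat sat on" -- (very un-fancy) auto-complete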
Wait a minute: ALL human text?
LLM Training Corpora:
- Common Crawl (public Internet scrape)
- Wikipedia
- Reddit (GPT-2)
- PubMed
- GitHub
- Project Gutenberg
- arXiv
- "Books1" or "Books2" or "Books3"? (GPT-3 and The Pile)
-- Source: https://stanford-cs324.github.io/winter2022/lectures/data/
Wait, "low-paid humans"?
https://time.com/6247678/openai-chatgpt-kenya-workers/