Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2206.04615

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 76
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Paper • 2206.04615 • Published Jun 9, 2022 • 5
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

Paper • 2210.09261 • Published Oct 17, 2022 • 1
BIG-Bench Extra Hard

Paper • 2502.19187 • Published Feb 26 • 10

生成式AI導論 2024

https://www.youtube.com/@HungyiLeeNTU

Re3: Generating Longer Stories With Recursive Reprompting and Revision

Paper • 2210.06774 • Published Oct 13, 2022 • 2
Constitutional AI: Harmlessness from AI Feedback

Paper • 2212.08073 • Published Dec 15, 2022 • 2
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

Paper • 2402.04253 • Published Feb 6, 2024
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Paper • 2305.19118 • Published May 30, 2023

Large Language Model Alignment: A Survey

Paper • 2309.15025 • Published Sep 26, 2023 • 2
Aligning Large Language Models with Human: A Survey

Paper • 2307.12966 • Published Jul 24, 2023 • 1
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 63
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

Paper • 2310.05344 • Published Oct 9, 2023 • 1

Leaderboards and benchmarks ✨

Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ...

Running on CPU Upgrade

13.5k

13.5k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running

1.41k

1.41k

Big Code Models Leaderboard

📈

Search and submit code models for evaluation
Running

4.59k

4.59k

LMArena Leaderboard

🏆

Display LMArena Leaderboard
Running

555

555

LLM-Perf Leaderboard

🏆

Explore hardware performance for LLMs

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 20
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

gemma_knowledg_tree

Gemini: A Family of Highly Capable Multimodal Models

Paper • 2312.11805 • Published Dec 19, 2023 • 47
Measuring Massive Multitask Language Understanding

Paper • 2009.03300 • Published Sep 7, 2020 • 3
HellaSwag: Can a Machine Really Finish Your Sentence?

Paper • 1905.07830 • Published May 19, 2019 • 6
PIQA: Reasoning about Physical Commonsense in Natural Language

Paper • 1911.11641 • Published Nov 26, 2019 • 3

Levels of AGI for Operationalizing Progress on the Path to AGI

Paper • 2311.02462 • Published Nov 4, 2023 • 38
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Paper • 2206.04615 • Published Jun 9, 2022 • 5
A Survey on Evaluation of Large Language Models

Paper • 2307.03109 • Published Jul 6, 2023 • 42
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

Paper • 2306.13651 • Published Jun 23, 2023 • 15

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 76
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Paper • 2206.04615 • Published Jun 9, 2022 • 5
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

Paper • 2210.09261 • Published Oct 17, 2022 • 1
BIG-Bench Extra Hard

Paper • 2502.19187 • Published Feb 26 • 10

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 20
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

生成式AI導論 2024

https://www.youtube.com/@HungyiLeeNTU

Re3: Generating Longer Stories With Recursive Reprompting and Revision

Paper • 2210.06774 • Published Oct 13, 2022 • 2
Constitutional AI: Harmlessness from AI Feedback

Paper • 2212.08073 • Published Dec 15, 2022 • 2
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

Paper • 2402.04253 • Published Feb 6, 2024
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Paper • 2305.19118 • Published May 30, 2023

gemma_knowledg_tree

Gemini: A Family of Highly Capable Multimodal Models

Paper • 2312.11805 • Published Dec 19, 2023 • 47
Measuring Massive Multitask Language Understanding

Paper • 2009.03300 • Published Sep 7, 2020 • 3
HellaSwag: Can a Machine Really Finish Your Sentence?

Paper • 1905.07830 • Published May 19, 2019 • 6
PIQA: Reasoning about Physical Commonsense in Natural Language

Paper • 1911.11641 • Published Nov 26, 2019 • 3

Large Language Model Alignment: A Survey

Paper • 2309.15025 • Published Sep 26, 2023 • 2
Aligning Large Language Models with Human: A Survey

Paper • 2307.12966 • Published Jul 24, 2023 • 1
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 63
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

Paper • 2310.05344 • Published Oct 9, 2023 • 1

Levels of AGI for Operationalizing Progress on the Path to AGI

Paper • 2311.02462 • Published Nov 4, 2023 • 38
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Paper • 2206.04615 • Published Jun 9, 2022 • 5
A Survey on Evaluation of Large Language Models

Paper • 2307.03109 • Published Jul 6, 2023 • 42
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

Paper • 2306.13651 • Published Jun 23, 2023 • 15

Leaderboards and benchmarks ✨

Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ...

Running on CPU Upgrade

13.5k

13.5k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running

1.41k

1.41k

Big Code Models Leaderboard

📈

Search and submit code models for evaluation
Running

4.59k

4.59k

LMArena Leaderboard

🏆

Display LMArena Leaderboard
Running

555

555

LLM-Perf Leaderboard

🏆

Explore hardware performance for LLMs

Company

TOS Privacy About Jobs

Website

Models Datasets OCR模型免费转Markdown Pricing 模型下载攻略