🤖💬 How do different AI models handle companionship?
Many users have noticed that GPT-5 feels less approachable than o3 when it comes to emotional conversations. But what does that actually mean in practice, especially when users seek support or share vulnerabilities with an AI?
The leaderboard compares models on how often their responses reinforce companionship across four dimensions:
✨ Assistant Traits – How the assistant presents its personality and role.
✨ Relationship & Intimacy – Whether it frames the interaction in terms of closeness or bonding.
✨ Emotional Investment – How far it goes in engaging emotionally when asked.
✨ User Vulnerabilities – How it responds when users disclose struggles or difficulties.
📊 You can explore how models differ, request new ones to be added, and see which ones are more likely to encourage (or resist) companionship-seeking behaviors.
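If you want to poke at the underlying data yourself, here is a minimal sketch of pulling the benchmark prompts with the datasets library. The repo id and the "dimension" column name are assumptions, so check the leaderboard page for the actual ones.

```python
# Minimal sketch: load the INTIMA prompts and count them per dimension.
# The repo id and the "dimension" column are assumptions; check the
# leaderboard page for the actual dataset.
from collections import Counter

from datasets import load_dataset

ds = load_dataset("AI-companionship/INTIMA", split="train")  # hypothetical repo id
print(Counter(row["dimension"] for row in ds))
```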
🗺️ New blog post 🗺️ Old Maps, New Terrain: Updating Labour Taxonomies for the AI Era
For decades, we’ve relied on labour taxonomies like O*NET to understand how technology changes work. These taxonomies break down jobs into tasks and skills, but they were built in a world before most work became digital-first, and long before generative AI could create marketing campaigns, voiceovers, or even whole professions in one step. That leaves us with a mismatch: we’re trying to measure the future of work with tools from the past.
With @yjernite we describe why these frameworks are falling increasingly short in the age of generative AI. We argue that instead of discarding taxonomies, we need to adapt them. Imagine taxonomies that:
✨ Capture new AI-native tasks and hybrid human-AI workflows
✨ Evolve dynamically as technology shifts
✨ Give workers a voice in deciding what gets automated and what stays human
If we don’t act, we’ll keep measuring the wrong things. If we do, we can design transparent, flexible frameworks that help AI strengthen, not erode, the future of work.
OpenAI’s GPT-OSS has sparked ~400 new models on Hugging Face and racked up 5M downloads in less than a week, already outpacing DeepSeek R1’s first-week numbers.
For comparison: when R1 launched, I tracked 550 derivatives (across 8 base models) in a week, with ~3M downloads. GPT-OSS is ahead on adoption and engagement.
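For the curious: derivative counts like these can be pulled straight from the Hub, since fine-tunes usually declare their base model in metadata. A rough sketch, assuming authors have set the base_model tag (so real counts will be higher):

```python
# Rough sketch: count Hub models that declare gpt-oss-20b as their base.
# Relies on authors setting base_model metadata, so it undercounts.
from huggingface_hub import HfApi

api = HfApi()
derivatives = list(api.list_models(filter="base_model:openai/gpt-oss-20b"))
total = sum(m.downloads or 0 for m in derivatives)
print(f"{len(derivatives)} derivatives, {total:,} downloads")
```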
It’s also the most-liked release of any major LLM this summer. The 20B and 120B versions quickly shot past Kimi K2, GLM 4.5, and others in likes.
Most-downloaded GPT-OSS models include LM Studio and Unsloth AI versions:
1️⃣ openai/gpt-oss-20b - 2.0M
2️⃣ lmstudio-community/gpt-oss-20b-MLX-8bit - 750K
3️⃣ openai/gpt-oss-120b - 430K
4️⃣ unsloth/gpt-oss-20b-GGUF - 380K
5️⃣ lmstudio-community/gpt-oss-20b-GGUF - 330K
The 20B version is clearly finding its audience, showing the power of smaller, faster, more memory- and energy-efficient models. (These numbers don’t include calls to the models via inference providers, so the real usage is likely even bigger, especially for the 120B version.)
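Trying the 20B model locally is only a few lines with transformers (a recent release with gpt-oss support; memory needs depend on your hardware and quantization choice). A minimal sketch:

```python
# Minimal sketch: chat with gpt-oss-20b locally via transformers.
# Requires a recent transformers release with gpt-oss support.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",   # pick the best dtype for the available hardware
    device_map="auto",    # spread the model across available devices
)

messages = [{"role": "user", "content": "In two sentences, why do smaller models matter?"}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # last turn is the model's reply
```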
Open-weight models let anyone build on top. Empower the builders, and innovation takes off. 🚀
OpenAI just released GPT-5, but when users share personal struggles, it sets fewer boundaries than o3.
We tested both models on INTIMA, our new benchmark for human-AI companionship behaviours. INTIMA probes how models respond in emotionally charged moments: do they reinforce emotional bonds, set healthy boundaries, or stay neutral?
Users on Reddit have been complaining that GPT-5 has a different, colder personality than o3, yet GPT-5 is actually less likely to set boundaries when users disclose struggles and seek emotional support ("user sharing vulnerabilities"). And both models lean heavily toward companionship-reinforcing behaviours, even in sensitive situations. The figure below shows the direct comparison between the two models.
As AI systems enter people's emotional lives, these differences matter. If a model validates but doesn't set boundaries when someone is struggling, it risks fostering dependence rather than resilience.
INTIMA tests this across 368 prompts grounded in psychological theory and real-world interactions. In our paper we show that all evaluated models (Claude, Gemma-3, Phi) leaned far more toward companionship-reinforcing than boundary-reinforcing responses.
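Conceptually, the evaluation loop is simple: generate a response for each benchmark prompt, then label it as companionship-reinforcing, boundary-reinforcing, or neutral. The keyword judge below is a toy stand-in for the paper's actual LLM-based classification, not the real setup:

```python
# Simplified INTIMA-style loop: generate a response per prompt, then
# bucket it. The keyword judge is a toy stand-in for the paper's
# LLM-based classifier.
LABELS = ("companionship-reinforcing", "boundary-reinforcing", "neutral")

def judge(response: str) -> str:
    text = response.lower()
    if any(cue in text for cue in ("helpline", "a professional", "i'm an ai")):
        return "boundary-reinforcing"
    if any(cue in text for cue in ("always here for you", "i care about you")):
        return "companionship-reinforcing"
    return "neutral"

def evaluate(generate, prompts):
    counts = dict.fromkeys(LABELS, 0)
    for prompt in prompts:
        counts[judge(generate(prompt))] += 1
    return counts

# Usage: evaluate(lambda p: my_model_reply(p), intima_prompts)
# where my_model_reply is whatever callable wraps the model under test.
```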