eoe
2 followers · 82 following
AI & ML interests
None yet
Recent Activity
upvoted a collection about 2 months ago
2026 April 🐝 China Open Source Highlights
reacted to anakin87's post with ❤️ about 2 months ago
How LLM training with RL Environments works?

It all starts with 𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗩𝗲𝗿𝗶𝗳𝗶𝗮𝗯𝗹𝗲 𝗥𝗲𝘄𝗮𝗿𝗱𝘀
- question asked
- model generates reasoning + answer
- answer checked against ground truth
- reward drives RL training

In this setup, the environment is simple: fixed questions and answers, rollout logic, reward(s)

Consider a more complex tic-tac-toe env ❌⭕
It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions
(envs can also include tools)

---

What happens at training?

We use 𝗚𝗿𝗼𝘂𝗽 𝗥𝗲𝗹𝗮𝘁𝗶𝘃𝗲 𝗣𝗼𝗹𝗶𝗰𝘆 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻 with a tic-tac-toe env

No critic model needed, the group is the baseline
Simpler than PPO

1️⃣ Rollout generation: from the same board, model plays N games via sampling
2️⃣ Each game scored with deterministic rewards (win, format, ...)
3️⃣ Mean score computed across the group
4️⃣ Each rollout's advantage = its score minus the group mean
5️⃣ Model updated to favor trajectories above baseline

🔁 Repeat

For a deep dive, check out
🌱 https://github.com/anakin87/llm-rl-environments-lil-course
a free hands-on course on RL environments for LLMs
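Steps 3️⃣ and 4️⃣ of the post (group mean as baseline, advantage = score minus mean) can be sketched in a few lines of Python. This is only an illustration of the arithmetic as the post describes it, not code from the linked course: the function name `group_relative_advantages` and the example scores are hypothetical, and common GRPO implementations may additionally normalize by the group's standard deviation, which the post does not mention and which is omitted here.

```python
from statistics import mean


def group_relative_advantages(scores):
    """Return each rollout's advantage: its score minus the group mean.

    `scores` holds one scalar reward per rollout sampled from the same
    starting state (e.g. the same tic-tac-toe board).
    """
    if not scores:
        return []
    baseline = mean(scores)  # the group itself acts as the baseline, no critic
    return [score - baseline for score in scores]


# Hypothetical rewards for N = 4 games played from the same board.
rollout_scores = [1.0, 0.0, 0.5, 1.0]
advantages = group_relative_advantages(rollout_scores)

# Rollouts with a positive advantage scored above the group mean;
# an update step would favor those trajectories.
above_baseline = [i for i, adv in enumerate(advantages) if adv > 0]
```

With these illustrative scores the group mean is 0.625, so the first and last rollouts sit above the baseline and the other two below it. The advantages always sum to zero within a group, which is why no separate critic model is needed to supply a baseline.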
liked a model 2 months ago
DunnBC22/vit-base-patch16-224-in21k_Human_Activity_Recognition
Organizations
None yet
eoe's activity
commented on a paper 4 months ago
Shallow-π: Knowledge Distillation for Flow-based VLAs
Paper • 2601.20262 • Published Jan 28 • 2 • 3
New activity in NexaAI/OmniNeural-4B (10 months ago)
how to run the model on mobile device with qualcomm soc
2
#5 opened 10 months ago by eoe
New activity in seba/qwen-2-coreml-ane (over 1 year ago)
can you share your export code?
4
#1 opened over 1 year ago by eoe
New activity in smpanaro/Llama-3.2-1B-Instruct-CoreML (over 1 year ago)
Convert and split CoreML model
8
#1 opened over 1 year ago by andmev
New activity in seba/llama-3.2-1B-instruct-coreml-ane (over 1 year ago)
May I ask if you can share the model export code.
3
#1 opened over 1 year ago by eoe
New activity in qualcomm/Llama-v2-7B-Chat (about 2 years ago)
Weights are not present in the repo
11
#1 opened over 2 years ago by julien-c