eoe

AI & ML interests

None yet

Recent Activity

upvoted a collection about 2 months ago

2026 April 🐝 China Open Source Highlights

reacted to anakin87's post with ❤️ about 2 months ago

How does LLM training with RL environments work?

It all starts with 𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗩𝗲𝗿𝗶𝗳𝗶𝗮𝗯𝗹𝗲 𝗥𝗲𝘄𝗮𝗿𝗱𝘀
- question asked
- model generates reasoning + answer
- answer checked against ground truth
- reward drives RL training

In this setup, the environment is simple: fixed questions and answers, rollout logic, reward(s)

Consider a more complex tic-tac-toe env ❌⭕
It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions
(envs can also include tools)

---

What happens at training?

We use 𝗚𝗿𝗼𝘂𝗽 𝗥𝗲𝗹𝗮𝘁𝗶𝘃𝗲 𝗣𝗼𝗹𝗶𝗰𝘆 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻 with a tic-tac-toe env
No critic model needed, the group is the baseline
Simpler than PPO

1️⃣ Rollout generation: from the same board, model plays N games via sampling
2️⃣ Each game scored with deterministic rewards (win, format, ...)
3️⃣ Mean score computed across the group
4️⃣ Each rollout's advantage = its score minus the group mean
5️⃣ Model updated to favor trajectories above baseline
🔁 Repeat

For a deep dive, check out
🌱 https://github.com/anakin87/llm-rl-environments-lil-course
a free hands-on course on RL environments for LLMs
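
The scoring and baseline steps described in the post can be sketched in a few lines of Python. This is only an illustration, not code from the linked course: the function names `verifiable_reward` and `group_relative_advantages`, the exact-match check, and the sample answers are all assumptions made for the example. It follows the post's description literally (advantage = score minus group mean); some GRPO implementations additionally divide by the group's standard deviation, which is not shown here.

```python
from statistics import mean


def verifiable_reward(answer: str, ground_truth: str) -> float:
    """Hypothetical deterministic reward: 1.0 on an exact match, else 0.0."""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0


def group_relative_advantages(scores: list[float]) -> list[float]:
    """Advantage of each rollout = its score minus the mean score of its group."""
    if not scores:
        raise ValueError("need at least one rollout score")
    baseline = mean(scores)  # the group itself acts as the baseline (no critic)
    return [score - baseline for score in scores]


# Assumed example: N = 4 sampled answers to the same question, ground truth "4".
rollout_answers = ["4", "5", "4", "3"]
scores = [verifiable_reward(a, "4") for a in rollout_answers]
advantages = group_relative_advantages(scores)
print(scores)
print(advantages)
```

Rollouts with a positive advantage scored above the group mean and would be reinforced by the update; those with a negative advantage would be discouraged. If every rollout in a group gets the same score, all advantages are zero and that group contributes no learning signal.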

liked a model 2 months ago

DunnBC22/vit-base-patch16-224-in21k_Human_Activity_Recognition

Organizations

None yet

commented on a paper 4 months ago

Shallow-π: Knowledge Distillation for Flow-based VLAs

Paper • 2601.20262 • Published Jan 28

New activity in NexaAI/OmniNeural-4B 10 months ago

how to run the model on mobile device with qualcomm soc

#5 opened 10 months ago by

New activity in seba/qwen-2-coreml-ane over 1 year ago

can you share your export code?

#1 opened over 1 year ago by

New activity in smpanaro/Llama-3.2-1B-Instruct-CoreML over 1 year ago

Convert and split CoreML model

#1 opened over 1 year ago by

New activity in seba/llama-3.2-1B-instruct-coreml-ane over 1 year ago

May I ask if you can share the model export code.

#1 opened over 1 year ago by

New activity in qualcomm/Llama-v2-7B-Chat about 2 years ago

Weights are not present in the repo

#1 opened over 2 years ago by