Fascinating read! Staying bullish on search with RL; it might just help us rein in hallucinations for good. I really like their approach:

1) <think>reason over the prompt/context and what the model already knows</think>
2) <search>self-search when it doesn't know (iteratively), with no external tool</search>
3) <information>cite sources to support the claim(s)</information>
4) <answer>final answer</answer>

Their RL training was done cost-efficiently too, see code: https://github.com/TsinghuaC3I/SSRL
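The four tags above can be sketched as a small rollout parser. This is a minimal illustration, not code from the SSRL repo: the helper name and the example rollout string are mine, only the tag names come from the post.

```python
import re

def parse_rollout(text: str) -> dict:
    """Extract tagged spans from a self-search rollout string.

    Tags follow the post's format: <think>, <search>, <information>, <answer>.
    <think>/<search> may repeat across iterations, so every tag collects a list.
    """
    spans = {}
    for tag in ("think", "search", "information", "answer"):
        matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        spans[tag] = [m.strip() for m in matches]
    return spans

# Illustrative rollout (made up for this example, not from the paper):
rollout = (
    "<think>The question asks for the capital of Australia.</think>"
    "<search>capital of Australia</search>"
    "<information>Canberra is the capital of Australia.</information>"
    "<answer>Canberra</answer>"
)

parsed = parse_rollout(rollout)
print(parsed["answer"])  # → ['Canberra']
```

In the actual training loop you'd typically score the <answer> span for the RL reward and check the other tags for format compliance; see the linked repo for how they do it.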