Ankush Singal
Andyrasika
AI & ML interests
None yet
Recent Activity
Updated a collection 5 days ago: Reinforcement Learning
Liked a model 5 days ago: openbmb/MiniCPM-V-4
Updated a collection 16 days ago: Reasoning-Model
Organizations
Agents
Prompt-collection
Fine-Tuning
Fine-Tuning
- Direct Judgement Preference Optimization
  Paper • 2409.14664 • Published
- Adaptive Caching for Faster Video Generation with Diffusion Transformers
  Paper • 2411.02397 • Published • 24
- RoRA-VLM: Robust Retrieval-Augmented Vision Language Models
  Paper • 2410.08876 • Published
- Efficient Streaming Language Models with Attention Sinks
  Paper • 2309.17453 • Published • 14
RAG articles
This collection is meant for RAG articles.
1. Let your LLM generate a few tokens: https://www.arxiv.org/abs/2412.11536
- GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models
  Paper • 2406.14550 • Published • 4
- Mixture-of-Agents Enhances Large Language Model Capabilities
  Paper • 2406.04692 • Published • 60
- Meta Prompting for AGI Systems
  Paper • 2311.11482 • Published • 3
- Symbolic Learning Enables Self-Evolving Agents
  Paper • 2406.18532 • Published • 12
Time series
This collection is for time series articles
Reinforcement Learning
This collection is for papers in Reinforcement Learning
Stable Diffusion
Papers related to stable diffusion
- An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion
  Paper • 2408.03178 • Published • 41
- VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
  Paper • 2408.17131 • Published • 11
- LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
  Paper • 2412.09262 • Published • 1
- SegDT: A Diffusion Transformer-Based Segmentation Model for Medical Imaging
  Paper • 2507.15595 • Published • 4
Synthetic Datasets
Reasoning-Model
Embedding
computation
this is for Mixture of XXX
Ankush Collection
Transformer Articles
- DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
  Paper • 2309.14327 • Published • 22
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone
  Paper • 2407.08083 • Published • 33
- Memory^3: Language Modeling with Explicit Memory
  Paper • 2407.01178 • Published • 4
- Teaching Transformers Causal Reasoning through Axiomatic Training
  Paper • 2407.07612 • Published • 2
multimodal
This collection is for multimodal papers
- Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
  Paper • 2407.10387 • Published • 8
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
  Paper • 2411.04996 • Published • 52
- Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
  Paper • 2501.04001 • Published • 47
- Scaling RL to Long Videos
  Paper • 2507.07966 • Published • 157
Audio
This collection is dedicated to Audio Transformers
Transformers
This collection is for Transformer Articles
- INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
  Paper • 2307.03712 • Published • 1
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
  Paper • 2408.04093 • Published • 4
- Arcee's MergeKit: A Toolkit for Merging Large Language Models
  Paper • 2403.13257 • Published • 20
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 53
cool models
List of cool models
- alibaba-damo/mgp-str-base
  Image-to-Text • 0.1B • Updated • 2.03k • 64
- DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
  Paper • 2408.08152 • Published • 60
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
  Paper • 2409.02889 • Published • 55
Evaluations