🦉 CodeSearch-ModernBERT-Owl-Plus: High-Performance Sentence-BERT for Code Search

CodeSearch-ModernBERT-Owl-Plus is a high-performance code search model fine-tuned in a Sentence-BERT architecture, based on the pretrained CodeModernBERT-Owl v1.0.

This model is optimized for function-level code search driven by natural language queries, and reaches an NDCG@10 of 0.8918 on the MTEB CodeSearchNetRetrieval task.


🛠 Features

  • ✅ Fine-tuned in Sentence-BERT format from CodeModernBERT-Owl
  • ✅ Supports multiple languages (Python, Java, JavaScript, etc.)
  • ✅ Specialized encoder for high-accuracy code search
  • ✅ Ideal for multi-stage (dual encoder) retrieval setups
  • ✅ Generates rich semantic embeddings for code and queries

📊 Evaluation on MTEB Benchmark

🏆 Main Scores in MTEB

This model achieved the following main scores (based on NDCG@10):

  • CodeSearchNetRetrieval: main_score = 0.8918
  • COIR-CodeSearchNetRetrieval: main_score = 0.8013

🧪 CodeSearchNetRetrieval (MTEB)

Metric          Score
MRR@10          0.8704
NDCG@10         0.8918
MAP@10          0.8704
Recall@10       0.9563
Precision@10    0.0956

The correct function appears in the top 10 results for about 96% of queries (Recall@10 = 0.9563) and is usually ranked near the top (MRR@10 = 0.8704).


🧪 COIR-CodeSearchNetRetrieval (MTEB)

Metric          Score
MRR@10          0.7751
NDCG@10         0.8013
MAP@10          0.7751
Recall@10       0.8826
Precision@10    0.0883

Performance remains consistent on the COIR version of the task (NDCG@10 = 0.8013, Recall@10 = 0.8826), which suggests the model generalizes beyond a single evaluation setup.
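
For readers unfamiliar with these metrics, here is a minimal sketch of how they are computed for a single query. It assumes each query has exactly one relevant function. That assumption is not stated in this card; it is inferred from the reported numbers, where MRR@10 equals MAP@10 and Precision@10 is one tenth of Recall@10. The function name `metrics_at_k` is hypothetical, and the benchmark scores are averages of such per-query values, not the output of this code.

```python
import math

def metrics_at_k(ranked_ids, relevant_id, k=10):
    """Per-query ranking metrics, assuming exactly one relevant document."""
    top_k = ranked_ids[:k]
    if relevant_id not in top_k:
        # The relevant document was missed entirely within the top k.
        return {"mrr": 0.0, "ndcg": 0.0, "map": 0.0, "recall": 0.0, "precision": 0.0}
    rank = top_k.index(relevant_id) + 1  # 1-based position of the hit
    reciprocal_rank = 1.0 / rank
    return {
        "mrr": reciprocal_rank,
        # With one relevant document the ideal DCG is 1, so NDCG equals DCG.
        "ndcg": 1.0 / math.log2(rank + 1),
        # Average precision reduces to the reciprocal rank in this setting.
        "map": reciprocal_rank,
        "recall": 1.0,
        # One hit among k returned results.
        "precision": 1.0 / k,
    }
```

Under this assumption Precision@10 can never exceed 0.1, so the values of 0.0956 and 0.0883 in the tables are close to the ceiling rather than low.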


📥 Usage Example

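from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Owl-Plus")
# Encode a natural language query and a code snippet with the same encoder
embeddings = model.encode(["binary search function", "def binary_search(arr, target): ..."])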
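
The embeddings become useful for search once they are compared. Below is a minimal sketch of ranking code snippets against a query by cosine similarity. The helper names `cosine` and `search` are hypothetical and not part of this model or of sentence-transformers; the only model method relied on is `encode`, as used above. The similarity is written in plain Python to keep the sketch dependency-free, and cosine similarity itself is an assumption here, since this card does not state which similarity function the model was trained with.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_u * norm_v)

def search(model, query, snippets, top_k=3):
    """Return the top_k (score, snippet) pairs for a natural language query."""
    # Any object with an encode(list_of_str) method works here.
    query_vec = model.encode([query])[0]
    snippet_vecs = model.encode(snippets)
    scored = [(cosine(query_vec, vec), snippet)
              for vec, snippet in zip(snippet_vecs, snippets)]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Passing the `model` loaded above together with a query such as "binary search function" and a list of function bodies returns the best-matching snippets with their scores. For larger corpora, encoding the snippets once and reusing the vectors avoids recomputing them per query, and `sentence_transformers.util.cos_sim` can replace the hand-written `cosine`.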

📝 Conclusion

  • ✅ An optimized Sentence-BERT model based on CodeModernBERT-Owl
  • ✅ Achieves MRR@10 > 0.87 on MTEB CodeSearchNetRetrieval
  • ✅ Ready for integration in production-level code search systems

📜 License

📄 Apache-2.0

📧 Contact

For questions or inquiries, feel free to reach out: 📧 shun0212114@outlook.jp

Model size: 0.2B params (Safetensors, tensor type F32)