---
base_model:
- mistralai/Pixtral-12B-2409
- TheDrummer/UnslopNemo-12B-v3
base_model_relation: merge
library_name: transformers
tags:
- mergekit
- merge
- multimodal
- mistral
- pixtral
language:
- en
- fr
- de
- es
- it
- pt
- ru
- zh
- ja
license: other
pipeline_tag: image-text-to-text
---

# Razorback 12B v0.2

<img src="https://huggingface.co/nintwentydo/Razorback-12B-v0.1/resolve/main/razorback.jpg" style="width: 100%; max-width:700px" />

A more robust attempt at merging TheDrummer's UnslopNemo v3 into Pixtral 12B.

It has been very stable in my testing so far, but needs more testing to pin down which samplers it does and doesn't like.

It seems to be the best of both worlds: less sloppy, more engaging output, with decent intelligence and visual understanding intact.

## Merging Approach
First, I loaded up Pixtral 12B Base and Mistral Nemo Base to compare their parameter differences.
Looking at the L2 norm / relative difference of each parameter, I was able to isolate which parts of Pixtral 12B deviate significantly from Mistral Nemo.
This matters because, while the language model architecture is the same between the two, a lot of vision understanding has been trained into Pixtral's language model, and it can break very easily.
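
To illustrate the idea, the comparison could look something like the sketch below. The repo IDs, the transformers-format Pixtral base conversion, and the key matching are assumptions, not the exact script used.

```python
# Rough sketch: measure how far each of Pixtral's language-model parameters
# has drifted from Mistral Nemo. Repo IDs / paths are illustrative.
import torch
from transformers import AutoModelForCausalLM, LlavaForConditionalGeneration

nemo = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Base-2407", torch_dtype=torch.bfloat16
)
pixtral = LlavaForConditionalGeneration.from_pretrained(
    "path/to/pixtral-12b-base",  # placeholder: a transformers-format Pixtral 12B Base
    torch_dtype=torch.bfloat16,
)

nemo_params = dict(nemo.named_parameters())
rel_diffs = {}
# language_model is Pixtral's Nemo-derived text stack; parameter names should
# line up with the standalone Nemo checkpoint (prefixes can vary by version).
for name, param in pixtral.language_model.named_parameters():
    if name in nemo_params:
        base = nemo_params[name].float()
        # Relative difference: L2 norm of the delta over the L2 norm of the base.
        rel_diffs[name] = ((param.float() - base).norm() / base.norm()).item()

# The largest deviations mark where vision training changed the LM the most.
for name, diff in sorted(rel_diffs.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{name}: {diff:.4f}")
```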

Then I calculated merging weights for each parameter using an exponential falloff: the smaller the difference, the higher the weight.
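
Continuing the sketch above, that weighting can be as simple as the following. The falloff constant `K` is an example value, not the one actually used.

```python
# Rough sketch: map each parameter's relative difference to a donor weight
# with an exponential falloff. K is illustrative, not the real constant.
import math

K = 20.0  # steepness: larger K protects vision-heavy parameters more aggressively

def merge_weight(rel_diff: float) -> float:
    # Near-identical parameters (tiny diff) get a weight close to 1.0;
    # heavily vision-trained parameters (large diff) fall off toward 0.0.
    return math.exp(-K * rel_diff)

weights = {name: merge_weight(diff) for name, diff in rel_diffs.items()}
```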

I then applied this recipe to Pixtral Instruct (Pixtral-12B-2409) and TheDrummer's UnslopNemo-12B-v3. The goal was to infuse as much Drummer goodness as possible without breaking vision input. And it looks like it's worked!
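
The merge itself then reduces to a per-parameter linear interpolation, as in the sketch below. Again illustrative: `weights` comes from the falloff above, and parameters without a weight are left as pure Pixtral.

```python
# Rough sketch: per-parameter linear interpolation between Pixtral Instruct
# and UnslopNemo, using the falloff weights computed above.
import torch
from transformers import AutoModelForCausalLM, LlavaForConditionalGeneration

pixtral_it = LlavaForConditionalGeneration.from_pretrained(
    "unsloth/Pixtral-12B-2409", torch_dtype=torch.bfloat16
)
unslop = AutoModelForCausalLM.from_pretrained(
    "TheDrummer/UnslopNemo-12B-v3", torch_dtype=torch.bfloat16
)

donor = dict(unslop.named_parameters())
with torch.no_grad():
    for name, param in pixtral_it.language_model.named_parameters():
        w = weights.get(name, 0.0)  # unmatched params stay pure Pixtral
        if name in donor and w > 0.0:
            # param = (1 - w) * param + w * donor[name]
            param.lerp_(donor[name].to(param.dtype), w)

pixtral_it.save_pretrained("Razorback-12B-v0.2")
```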

## Usage
Needs more testing to identify the best sampling params, but so far a temperature of ~0.7 with min_p 0.03 has been rock solid.

Use the included chat template (Mistral). No ChatML support yet.
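
For reference, a minimal transformers inference example with those settings might look like this. The repo ID and message format are assumptions, and `min_p` plus image chat templates need a reasonably recent transformers release.

```python
# Rough sketch: image + text inference with the suggested sampling settings.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "nintwentydo/Razorback-12B-v0.2"  # assumed repo id for this model
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/some-image.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(
    **inputs, max_new_tokens=256, do_sample=True,
    temperature=0.7, min_p=0.03,
)
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```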

## Credits
- Mistral for [mistralai/Pixtral-12B-2409](https://huggingface.co/mistralai/Pixtral-12B-2409)
- Unsloth for the [unsloth/Pixtral-12B-2409](https://huggingface.co/unsloth/Pixtral-12B-2409) transformers conversion
- TheDrummer for [TheDrummer/UnslopNemo-12B-v3](https://huggingface.co/TheDrummer/UnslopNemo-12B-v3)