Upload tokenizer.json
#16
opened by jonatanklosko
Generated with:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("openai/whisper-large-v3")
assert tokenizer.is_fast
tokenizer.save_pretrained("...")
As discussed with @ArthurZ on the PR, the fast tokenizer can always be loaded from the slow one: https://github.com/huggingface/transformers/pull/27338/files#r1384935617
So there's no issue with not having the tokenizer.json. Happy to merge this PR to improve clarity for the Hub weights, however.
@sanchit-gandhi yeah, the thing is that the Rust huggingface/tokenizers library can only load tokenizer.json. In the Elixir ecosystem we have bindings to huggingface/tokenizers, so we rely solely on fast tokenizers :)
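For context, a minimal sketch of the constraint described above: the Rust-backed `tokenizers` library loads tokenizers only via `Tokenizer.from_file`, which consumes a tokenizer.json file (there is no path from slow-tokenizer vocab files). The tiny WordLevel vocabulary below is an illustrative assumption, unrelated to Whisper's actual tokenizer; it just shows the save/load round trip through tokenizer.json.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Toy vocabulary for illustration only -- not Whisper's real vocab.
vocab = {"hello": 0, "world": 1, "[UNK]": 2}
tok = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()

# Serialize to the single-file format the Rust library (and any
# bindings to it, e.g. Elixir's) can load back.
tok.save("tokenizer.json")

loaded = Tokenizer.from_file("tokenizer.json")
print(loaded.encode("hello world").ids)  # [0, 1]
```

Bindings that wrap huggingface/tokenizers get exactly this entry point, which is why the file uploaded in this PR is what makes the model usable outside Python.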
patrickvonplaten changed pull request status to merged