broken config.json
Just a heads-up, the config.json file is broken (i.e. syntactically invalid, maybe hand-edited the "Infinity" in)?
It can be loaded and loaded as inf
in Python and works with the Transformers library.
>>> json.load(open("./config.json"))
{'architectures': ['NemotronHForCausalLM'],
...
'time_step_limit': [0.0, inf],
...
'vocab_size': 131072}
I know it's technically not RFC-8259 compatible, but it should be considered fine in this case.
https://datatracker.ietf.org/doc/html/rfc8259
Numeric values that cannot be represented in the grammar below (such
as Infinity and NaN) are not permitted.
I know it's technically not RFC-8259 compatible, but it should be considered fine in this case.
That means we need to manually patch it each time we want to quantize it. That's an extremely shitty attitude, but then, it's up to nvidia how they represent themselves and their models. And as result of this arrogant attitude, we won't quantize it.