metadata
license: mit
language:
  - en
pipeline_tag: text-classification
tags:
  - pytorch
  - reward_model
  - transformers
  - RLHF

Model Card for Model ID

This model is part of the Chai reward-model series. It uses the GPT-2 architecture with a classification head and is trained to predict whether a user will accept the completion generated by the base model.
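As a rough illustration of how a classification head's output could be read as an acceptance score, the sketch below turns two raw logits into a probability with a softmax. The two-logit layout, the convention that index 1 means "accepted", and the function name `acceptance_probability` are illustrative assumptions, not details confirmed by this card.

```python
import math


def acceptance_probability(logits):
    """Softmax over [reject_logit, accept_logit]; returns P(accept).

    Assumption: the head emits two logits and index 1 is "accepted".
    """
    if len(logits) != 2:
        raise ValueError("expected exactly two logits")
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


# Hypothetical logits for one candidate completion.
score = acceptance_probability([0.0, 0.0])
print(score)  # equal logits give 0.5
```

A score like this could be used to rank several candidate completions and keep the one most likely to be accepted, though the card does not state how the model is deployed.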

Its training dataset, retry_and_continue_50m_reward_model, consists purely of user-generated content, in which a user can decline a generated response by pressing the retry button or by ending the conversation.
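The labelling idea described above can be sketched as follows. This is a minimal illustration, not the actual data pipeline: the event names, the `Example` record, and the choice to treat both a retry and an ended conversation as a declined response are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical names for what the user did after seeing a response.
CONTINUE = "continue"  # user replied and the conversation went on
RETRY = "retry"        # user pressed the retry button
END = "end"            # user ended the conversation


@dataclass
class Example:
    context: str
    response: str
    label: int  # 1 = accepted, 0 = declined


def label_response(context, response, next_event):
    """Map the user's next action to a binary acceptance label."""
    if next_event not in (CONTINUE, RETRY, END):
        raise ValueError(f"unknown event: {next_event!r}")
    # Assumption: only a continued conversation counts as acceptance.
    return Example(context, response, 1 if next_event == CONTINUE else 0)


example = label_response("User: hi", "Bot: hello there", RETRY)
print(example.label)  # 0, because the user asked for a retry
```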

Model Details