SteveWCG/roberta-sentence-classifier

@@ -1,40 +1,54 @@
 ---
 library_name: transformers
 license: mit
-base_model: roberta-base
-tags:
-- generated_from_trainer
 metrics:
 - accuracy
 model-index:
 - name: roberta-sentence-classifier
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # roberta-sentence-classifier
-This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.6266
-- Accuracy: 0.7990
-- Macro F1: 0.7614
-- Micro F1: 0.7990
-- Qwk: 0.6588
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
@@ -66,3 +80,15 @@ The following hyperparameters were used during training:
 - Pytorch 2.9.0+cu126
 - Datasets 4.0.0
 - Tokenizers 0.22.1

 ---
+base_model: roberta-base
 library_name: transformers
 license: mit
+pipeline_tag: text-classification
 metrics:
 - accuracy
+tags:
+- generated_from_trainer
 model-index:
 - name: roberta-sentence-classifier
   results: []
 ---
 # roberta-sentence-classifier
+This model, a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base), was introduced in the paper **[Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction](https://huggingface.co/papers/2606.28186)**.
+It serves as the sentence-level cognitive episode tagger in the **Epi2Diff** (Episode to Difficulty) framework. It maps Large Reasoning Model (LRM) reasoning traces into cognitively grounded episode sequences to support interpretable modeling of human item difficulty.
+- **Repository:** [c-steve-wang/Epi2Diff](https://github.com/c-steve-wang/Epi2Diff)
+- **Paper:** [Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction](https://huggingface.co/papers/2606.28186)
 ## Model description
+The model classifies sentence-level reasoning units into 8 problem-solving cognitive episode states:
+- `Read`
+- `Analyze`
+- `Plan`
+- `Implement`
+- `Explore`
+- `Verify`
+- `Monitor`
+- `Answer`
+The resulting episode sequences are then used by the Epi2Diff framework to extract compact episode-dynamic process features for downstream item difficulty prediction.
 ## Intended uses & limitations
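+Use this model to tag the sentences of a raw reasoning trace with functional problem-solving states, for example to evaluate reasoning behaviors, run interpretability studies, or support downstream educational measurement tasks. The model classifies one sentence-level unit at a time, so a trace must be split into sentences before tagging.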
 ## Training and evaluation data
+The model was fine-tuned on annotated reasoning trace sentences derived from datasets such as SAT Math, SAT Reading & Writing, Cambridge, and USMLE.
+It achieves the following results on the evaluation set:
+- Loss: 0.6266
+- Accuracy: 0.7990
+- Macro F1: 0.7614
+- Micro F1: 0.7990
+- QWK (quadratic weighted kappa): 0.6588
 ## Training procedure
 - Pytorch 2.9.0+cu126
 - Datasets 4.0.0
 - Tokenizers 0.22.1
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{wang2026epi2diff,
+  title = {Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction},
+  author = {Wang, Chenguang and Li, Ming and Zeng, Xinyue and Li, Zhuochun and Jiao, Hong and Zhou, Tianyi and Zhou, Dawei},
+  year = {2026}
+}
+```
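
The card describes tagging sentence-level reasoning units but gives no usage snippet. The following is a minimal sketch of that workflow, not the authors' pipeline. It rests on several assumptions: the model ID is `SteveWCG/roberta-sentence-classifier` (taken from the repository name), the checkpoint loads with the standard `transformers` text-classification pipeline, and its labels map to the eight episode names (they may instead come back as `LABEL_0` … `LABEL_7`, depending on the saved config). The regex sentence splitter and the helper names `split_sentences`, `tag_trace`, `episode_proportions`, and `load_classifier` are hypothetical and are not part of Epi2Diff.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# The eight episode states listed in the model card.
EPISODES = ["Read", "Analyze", "Plan", "Implement",
            "Explore", "Verify", "Monitor", "Answer"]


def split_sentences(trace: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", trace.strip())
    return [p for p in parts if p]


def tag_trace(trace: str, classify: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Pair each sentence of a trace with the label returned by `classify`."""
    return [(sentence, classify(sentence)) for sentence in split_sentences(trace)]


def episode_proportions(labels: List[str]) -> Dict[str, float]:
    """Share of sentences per observed label; an illustrative summary only,
    not the episode-dynamic features defined in the paper."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}


def load_classifier(model_id: str = "SteveWCG/roberta-sentence-classifier") -> Callable[[str], str]:
    """Wrap the Hugging Face pipeline as a sentence -> label function.
    The model ID is assumed from the repository name; loading downloads the weights."""
    from transformers import pipeline  # imported lazily so the helpers work without it

    pipe = pipeline("text-classification", model=model_id)
    return lambda sentence: pipe(sentence)[0]["label"]


if __name__ == "__main__":
    classify = load_classifier()
    trace = "The question asks for x. I will isolate x first. So x = 4."
    tagged = tag_trace(trace, classify)
    for sentence, label in tagged:
        print(f"{label:10s} {sentence}")
    print(episode_proportions([label for _, label in tagged]))
```

Passing the classifier in as a plain callable keeps the splitting and summary logic testable without downloading the checkpoint, and makes it easy to swap in a better sentence segmenter.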