davanstrien (HF Staff) committed verified commit e0d2d6d · 1 Parent(s): 8b15dba

Update README.md

Files changed (1): README.md (+86 −3)
README.md CHANGED
@@ -36,11 +36,94 @@ The model was trained on:
 
 ## Usage
 
- ```python
- from transformers import pipeline
-
- generator = pipeline("text-generation", model="davanstrien/SmolLM2-360M-tldr-sft-2025-02-12_15-13", device="cuda")
- output = generator(input_text, max_new_tokens=128, return_full_text=False)[0]
- ```
+ Using the chat template at inference time is recommended. Additionally, you should prepend either `<MODEL_CARD>` or `<DATASET_CARD>` to the start of the card you want to summarize. The training data used the body of the model or dataset card, i.e., the part after the YAML front matter, so you will likely get better results by passing only this part of the card.
+
+ So far, I have found that a low temperature of `0.4` gives better results.
+
+ Example:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from huggingface_hub import ModelCard
+
+ # Load an example card to summarize (ModelCard.text is the body after the YAML)
+ card = ModelCard.load("davanstrien/Smol-Hub-tldr")
+
+ # Load tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained("davanstrien/Smol-Hub-tldr")
+ model = AutoModelForCausalLM.from_pretrained("davanstrien/Smol-Hub-tldr")
+
+ # Format input according to the chat template, prepending <MODEL_CARD>
+ messages = [{"role": "user", "content": f"<MODEL_CARD>{card.text}"}]
+ # Encode with the chat template
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ )
+
+ # Generate with stop tokens
+ outputs = model.generate(
+     inputs,
+     max_new_tokens=60,
+     pad_token_id=tokenizer.pad_token_id,
+     eos_token_id=tokenizer.eos_token_id,
+     temperature=0.4,
+     do_sample=True,
+ )
+
+ # Decode only the newly generated tokens
+ input_length = inputs.shape[1]
+ response = tokenizer.decode(outputs[0][input_length:], skip_special_tokens=False)
+
+ # Extract just the summary part
+ summary = response.split("<CARD_SUMMARY>")[-1].split("</CARD_SUMMARY>")[0]
+ print(summary)
+ >>> "The Smol-Hub-tldr model is a fine-tuned version of SmolLM2-360M designed to generate concise, one-sentence summaries of model and dataset cards from the Hugging Face Hub."
+ ```
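The same flow should work for dataset cards. The sketch below is a variant of the example above, not part of this card's own snippets; it assumes the `<DATASET_CARD>` prefix is handled symmetrically to `<MODEL_CARD>`, reuses `tokenizer` and `model` from above, and uses a placeholder dataset ID:

```python
from huggingface_hub import DatasetCard

# Placeholder dataset ID; substitute the dataset card you want to summarize.
# DatasetCard.text, like ModelCard.text, is the card body after the YAML block.
card = DatasetCard.load("username/some-dataset")

# Assumption: <DATASET_CARD> is the prefix the model expects for dataset cards.
messages = [{"role": "user", "content": f"<DATASET_CARD>{card.text}"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(
    inputs,
    max_new_tokens=60,
    temperature=0.4,
    do_sample=True,
)

# Decode the new tokens and pull out the summary between the tags
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=False)
summary = response.split("<CARD_SUMMARY>")[-1].split("</CARD_SUMMARY>")[0]
print(summary)
```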
+
+ The model should currently close its summary with a `</CARD_SUMMARY>` token (still cooking on this...), so you can also use this as a stopping criterion when using `pipeline` inference.
+
+ ```python
+ from transformers import pipeline, StoppingCriteria, StoppingCriteriaList
+ import torch
+
+ class StopOnTokens(StoppingCriteria):
+     def __init__(self, tokenizer, stop_token_ids):
+         self.stop_token_ids = stop_token_ids
+         self.tokenizer = tokenizer
+
+     def __call__(
+         self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs
+     ) -> bool:
+         # Stop as soon as the most recently generated token is a stop token
+         for stop_id in self.stop_token_ids:
+             if input_ids[0][-1] == stop_id:
+                 return True
+         return False
+
+ # Initialize pipeline
+ pipe = pipeline("text-generation", "davanstrien/Smol-Hub-tldr")
+ tokenizer = pipe.tokenizer
+
+ # Get the token IDs for stopping
+ stop_token_ids = [
+     tokenizer.encode("</CARD_SUMMARY>", add_special_tokens=False)[-1],
+     tokenizer.eos_token_id,
+ ]
+
+ # Create stopping criteria
+ stopping_criteria = StoppingCriteriaList([StopOnTokens(tokenizer, stop_token_ids)])
+
+ # Same chat-formatted input as in the example above (reuses `card`)
+ messages = [{"role": "user", "content": f"<MODEL_CARD>{card.text}"}]
+
+ # Generate with stopping criteria; low temperature (0.4) as recommended above
+ response = pipe(
+     messages,
+     max_new_tokens=50,
+     do_sample=True,
+     temperature=0.4,
+     stopping_criteria=stopping_criteria,
+     return_full_text=False,
+ )
+
+ # Clean up the response by stripping the closing tag
+ summary = response[0]["generated_text"].split("</CARD_SUMMARY>")[0].strip()
+ print(summary)
+ >>> "This model is a fine-tuned version of SmolLM2-360M for generating concise, one-sentence summaries of model and dataset cards from the Hugging Face Hub."
+ ```
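As an alternative to a custom `StoppingCriteria`, recent versions of `transformers` allow `generate` to take a list of token IDs for `eos_token_id`. A minimal sketch under that assumption, reusing `tokenizer`, `model`, and `inputs` from the first example:

```python
# Treat </CARD_SUMMARY> as an additional end-of-sequence token so that
# generation stops on its own once the summary is closed.
stop_ids = [
    tokenizer.encode("</CARD_SUMMARY>", add_special_tokens=False)[-1],
    tokenizer.eos_token_id,
]

outputs = model.generate(
    inputs,
    max_new_tokens=60,
    temperature=0.4,
    do_sample=True,
    eos_token_id=stop_ids,  # a list of stop token IDs is accepted here
    pad_token_id=tokenizer.pad_token_id,
)
```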
 
 ## Framework Versions