---
license: mit
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
tags:
- table
---
# Model Card for TAMA

Recent advances in table understanding have focused on instruction-tuning large language models (LLMs) for table-related tasks. However, existing research has overlooked the impact of hyperparameter choices and lacks a comprehensive evaluation of both the out-of-domain table understanding ability and the general capabilities of these table LLMs. In this paper, we evaluate these abilities in existing table LLMs and find significant declines in both out-of-domain table understanding and general capabilities compared to their base models.

Through systematic analysis, we show that hyperparameters, such as the learning rate, can significantly influence both table-specific and general capabilities. Contrary to previous table instruction-tuning work, we demonstrate that smaller learning rates and fewer training instances can enhance table understanding while preserving general capabilities. Based on our findings, we introduce TAMA, a TAble LLM instruction-tuned from LLaMA 3.1 8B Instruct, which achieves performance on par with or surpassing GPT-3.5 and GPT-4 on table tasks while maintaining strong out-of-domain generalization and general capabilities. Our findings highlight the potential for reduced data annotation costs and more efficient model development through careful hyperparameter selection.

## Model Details

### Model Description

- **Model type:** Text generation.
- **Language(s) (NLP):** English.
- **License:** [License for Llama models](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)
- **Finetuned from model:** [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)

### Model Sources

- **Repository:** [GitHub](https://github.com/MichiganNLP/TAMA)
- **Paper:** [Rethinking Table Instruction Tuning](https://arxiv.org/abs/2501.14693)

## Uses

TAMA is intended for use in table understanding tasks and to facilitate future research.

## How to Get Started with the Model

Use the code below to get started with the model.
Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the `generate()` function.

Make sure to update your transformers installation via `pip install --upgrade transformers`.

```python
import transformers
import torch

model_id = "MichiganNLP/tama-5e-7"

pipeline = transformers.pipeline(
    "text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto"
)

pipeline("Hey how are you doing today?")
```
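If you prefer the Auto classes with `generate()`, mentioned above, a minimal sketch of that route is shown below. The prompt string and `max_new_tokens` value are illustrative choices, not settings prescribed by this model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MichiganNLP/tama-5e-7"

# Load the tokenizer and model in bfloat16, mirroring the pipeline example above.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Tokenize a prompt and generate a continuation; max_new_tokens is an illustrative value.
inputs = tokenizer("Hey how are you doing today?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```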

You may replace the prompt with table-specific instructions. We recommend using the following prompt structure:

```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that
appropriately completes the request.

### Instruction:
{instruction}

### Input:
{table_content}

### Question:
{question}

### Response:
```
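For instance, a table question-answering call that fills this template might look like the sketch below; the table, question, and decoding settings are made-up illustrations rather than recommendations from this model card.

```python
import transformers
import torch

model_id = "MichiganNLP/tama-5e-7"

# Same text-generation pipeline as in the quick-start example above.
pipeline = transformers.pipeline(
    "text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto"
)

# Toy instruction, table, and question, used only to illustrate the prompt structure.
instruction = "Answer the question based on the provided table."
table_content = (
    "| Year | Team | Wins |\n"
    "| 2021 | A    | 10   |\n"
    "| 2022 | B    | 12   |"
)
question = "Which team had more wins?"

# Fill the recommended prompt structure.
prompt = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that\nappropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n"
    f"### Input:\n{table_content}\n\n"
    f"### Question:\n{question}\n\n"
    "### Response:\n"
)

# Greedy decoding with a small token budget; adjust as needed.
outputs = pipeline(prompt, max_new_tokens=128, do_sample=False, return_full_text=False)
print(outputs[0]["generated_text"])
```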

## Training Details

### Training Data

Coming soon.

### Training Procedure

We utilize the [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) library for model training and inference. Example YAML configuration files are provided [here](https://github.com/MichiganNLP/TAMA/blob/main/yamls/train.yaml).

The training command is:

```bash
llamafactory-cli train yamls/train.yaml
```

#### Training Hyperparameters

- **Training regime:** bf16
- **Training epochs:** 2.0
- **Learning rate scheduler:** linear
- **Cutoff length:** 2048
- **Learning rate:** 5e-7

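
For reference, a LLaMA-Factory training configuration reflecting these hyperparameters could look roughly like the sketch below. This is an illustration built from common LLaMA-Factory fields, not the released `yamls/train.yaml`; the dataset name, batch-size settings, and output directory are placeholders.

```yaml
### model
model_name_or_path: meta-llama/Llama-3.1-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full

### dataset (placeholder entry; see the released yamls/train.yaml for the actual dataset)
dataset: table_instructions
template: llama3
cutoff_len: 2048

### training (learning rate, epochs, scheduler, and precision follow the list above)
learning_rate: 5.0e-7
num_train_epochs: 2.0
lr_scheduler_type: linear
bf16: true
per_device_train_batch_size: 1   # placeholder, not specified in the model card
gradient_accumulation_steps: 8   # placeholder, not specified in the model card

### output
output_dir: saves/tama-5e-7      # placeholder path
```
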
## Evaluation

### Results

<table>
  <tr>
    <th>Models</th>
    <th>FeTaQA</th>
    <th>HiTab</th>
    <th>TabFact</th>
    <th>FEVEROUS</th>
    <th>WikiTQ</th>
    <th>WikiSQL</th>
    <th>HybridQA</th>
    <th>TAT-QA</th>
    <th>AIT-QA</th>
    <th>TABMWP</th>
    <th>InfoTabs</th>
    <th>KVRET</th>
    <th>ToTTo</th>
    <th>TableGPT<sub>subset</sub></th>
    <th>TableBench</th>
  </tr>
  <tr>
    <th>Metrics</th>
    <th>BLEU</th>
    <th>Acc</th>
    <th>Acc</th>
    <th>Acc</th>
    <th>Acc</th>
    <th>Acc</th>
    <th>Acc</th>
    <th>Acc</th>
    <th>Acc</th>
    <th>Acc</th>
    <th>Acc</th>
    <th>Micro F1</th>
    <th>BLEU</th>
    <th>Acc</th>
    <th>ROUGE-L</th>
  </tr>
  <tr>
    <td>GPT-3.5</td>
    <td><u>26.49</u></td>
    <td>43.62</td>
    <td>67.41</td>
    <td>60.79</td>
    <td><u>53.13</u></td>
    <td>41.91</td>
    <td>40.22</td>
    <td>31.38</td>
    <td>84.13</td>
    <td>46.30</td>
    <td>56.00</td>
    <td><u>54.56</u></td>
    <td><u>16.81</u></td>
    <td>54.80</td>
    <td>27.75</td>
  </tr>
  <tr>
    <td>GPT-4</td>
    <td>21.70</td>
    <td><u>48.40</u></td>
    <td><b>74.40</b></td>
    <td><u>71.60</u></td>
    <td><b>68.40</b></td>
    <td><u>47.60</u></td>
    <td><u>58.60</u></td>
    <td><b>55.81</b></td>
    <td><u>88.57</u></td>
    <td><b>67.10</b></td>
    <td><u>58.60</u></td>
    <td><b>56.46</b></td>
    <td>12.21</td>
    <td><b>80.20</b></td>
    <td><b>40.38</b></td>
  </tr>
  <tr>
    <td>Base (Llama 3.1 8B Instruct)</td>
    <td>15.33</td>
    <td>32.83</td>
    <td>58.44</td>
    <td>66.37</td>
    <td>43.46</td>
    <td>20.43</td>
    <td>32.83</td>
    <td>26.70</td>
    <td>82.54</td>
    <td>39.97</td>
    <td>48.39</td>
    <td>50.80</td>
    <td>13.24</td>
    <td>53.60</td>
    <td>23.47</td>
  </tr>
  <tr>
    <td>TAMA</td>
    <td><b>35.37</b></td>
    <td><b>63.51</b></td>
    <td><u>73.82</u></td>
    <td><b>77.39</b></td>
    <td>52.88</td>
    <td><b>68.31</b></td>
    <td><b>60.86</b></td>
    <td><u>48.47</u></td>
    <td><b>89.21</b></td>
    <td><u>65.09</u></td>
    <td><b>64.54</b></td>
    <td>43.94</td>
    <td><b>37.94</b></td>
    <td><u>53.60</u></td>
    <td><u>28.60</u></td>
  </tr>
</table>

**Note that these results correspond to the [tama-1e-6](https://huggingface.co/MichiganNLP/tama-1e-6) checkpoint. We release the tama-5e-7 checkpoint to facilitate future research.**

We bold a number if it is the best among the four models and underline it if it is the second best.

Please refer to our [paper](https://arxiv.org/abs/2501.14693) for additional details.


#### Metrics

Please refer to our [paper](https://arxiv.org/abs/2501.14693) for additional details.

#### Summary

Notably, as an 8B model, TAMA demonstrates strong table understanding ability, outperforming GPT-3.5 on most of the table understanding benchmarks and even achieving performance on par with or better than GPT-4.

## Technical Specifications

### Model Architecture and Objective

We base our model on the [Llama-3.1-8B-Instruct model](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
We instruction-tune the model on a set of 2,600 table instructions.

### Compute Infrastructure

#### Hardware

We conduct our experiments on A40 and A100 GPUs.

#### Software

We leverage [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) for model training.

## Citation

```bibtex
@misc{deng2025rethinking,
    title={Rethinking Table Instruction Tuning},
    author={Naihao Deng and Rada Mihalcea},
    year={2025},
    url={https://openreview.net/forum?id=GLmqHCwbOJ}
}
```

## Model Card Authors

Naihao Deng

## Model Card Contact

Naihao Deng