---
license: mit
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
tags:
- table
---

<div style="display: flex; align-items: center;">
<img src="https://huggingface.co/datasets/MichiganNLP/blog-images/resolve/main/tama.png" alt="TAMA Logo" width="80px" style="margin-right: 12px;">
<h1 style="margin: 0;">Model Card for TAMA-vA</h1>
</div>

<!-- Provide a quick summary of what the model is/does. -->
						
Recent advances in table understanding have focused on instruction-tuning large language models (LLMs) for table-related tasks. However, existing research has overlooked the impact of hyperparameter choices and lacks a comprehensive evaluation of both the out-of-domain table understanding ability and the general capabilities of these table LLMs. In our paper, we evaluate these abilities in existing table LLMs and find significant declines in both out-of-domain table understanding and general capabilities compared to their base models.

Through systematic analysis, we show that hyperparameters, such as the learning rate, can significantly influence both table-specific and general capabilities. Contrary to previous table instruction-tuning work, we demonstrate that smaller learning rates and fewer training instances can enhance table understanding while preserving general capabilities. Based on our findings, we introduce TAMA, a TAble LLM instruction-tuned from Llama 3.1 8B Instruct, which achieves performance on par with or surpassing GPT-3.5 and GPT-4 on table tasks, while maintaining strong out-of-domain generalization and general capabilities. Our findings highlight the potential for reduced data annotation costs and more efficient model development through careful hyperparameter selection.
					
						
## 🚀 Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Model type:** Text generation.
- **Language(s) (NLP):** English.
- **License:** [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)
- **Finetuned from model:** [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
					
						
### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [GitHub](https://github.com/MichiganNLP/TAMA)
- **Paper:** [arXiv](https://arxiv.org/abs/2501.14693)

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

TAMA is intended for use in table understanding tasks and to facilitate future research.
					
						
## 🔨 How to Get Started with the Model

Use the code below to get started with the model.
Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the `generate()` function.

Make sure to update your transformers installation via `pip install --upgrade transformers`.

```python
import transformers
import torch

model_id = "MichiganNLP/TAMA-vA"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline("Hey how are you doing today?")
```
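
If you prefer the Auto classes mentioned above, here is a minimal sketch using standard `transformers` APIs; the generation settings are illustrative choices, not values prescribed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MichiganNLP/TAMA-vA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Tokenize, generate, and decode; max_new_tokens is an illustrative setting.
inputs = tokenizer("Hey how are you doing today?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```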
					
						
You may replace the prompt with table-specific instructions. We recommend using the following prompt structure:

```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{table_content}

### Question:
{question}

### Response:
```
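
For example, a table question-answering prompt can be assembled with plain string formatting and passed to the `pipeline` defined above. This is an illustrative sketch: the toy table, its serialization, and the instruction wording are our own choices, not prescribed by TAMA.

```python
# Illustrative only: fill the recommended template with a toy table.
prompt_template = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{table_content}\n\n"
    "### Question:\n{question}\n\n"
    "### Response:\n"
)

# A toy table; any reasonable text serialization of your table can go here.
table_content = "| Year | Champion |\n| 2021 | Alice |\n| 2022 | Bob |"

prompt = prompt_template.format(
    instruction="Answer the question based on the given table.",
    table_content=table_content,
    question="Who was the champion in 2022?",
)

pipeline(prompt, max_new_tokens=64)
```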
					
						
## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The model is instruction-tuned on [TAMA Instruct](https://huggingface.co/datasets/MichiganNLP/TAMA_Instruct).
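
To inspect the data, you can load it with the `datasets` library; a quick sketch (the `train` split name is an assumption, so check the dataset card for the actual splits):

```python
from datasets import load_dataset

# Load the TAMA instruction-tuning data; the split name is assumed.
ds = load_dataset("MichiganNLP/TAMA_Instruct", split="train")
print(ds[0])
```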
					
						
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

We utilize the [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) library for model training and inference. Example YAML configuration files are provided [here](https://github.com/MichiganNLP/TAMA/blob/main/yamls/train.yaml).

The training command is:
```
llamafactory-cli train yamls/train.yaml
```
					
						
#### Training Hyperparameters

- **Training regime:** bf16
- **Training epochs:** 2.0
- **Learning rate scheduler:** linear
- **Cutoff length:** 2048
- **Learning rate:** 1e-6
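
Expressed in LLaMA Factory's YAML format, the hyperparameters above correspond roughly to the sketch below. The dataset name, template, and output directory are placeholders; see the repository's `train.yaml` for the exact configuration.

```yaml
### model
model_name_or_path: meta-llama/Llama-3.1-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full

### dataset (name and template are placeholders)
dataset: tama_instruct
template: llama3
cutoff_len: 2048

### train (values from the hyperparameter list above)
learning_rate: 1.0e-6
num_train_epochs: 2.0
lr_scheduler_type: linear
bf16: true

### output (placeholder)
output_dir: saves/tama-va
```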
					
						
## 📝 Evaluation

### Results

<!-- This should link to a Dataset Card if possible. -->
					
						
<table>
<tr><th>Models</th><th>FeTaQA</th><th>HiTab</th><th>TabFact</th><th>FEVEROUS</th><th>WikiTQ</th><th>WikiSQL</th><th>HybridQA</th><th>TATQA</th><th>AIT-QA</th><th>TABMWP</th><th>InfoTabs</th><th>KVRET</th><th>ToTTo</th><th>TableGPT<sub>subset</sub></th><th>TableBench</th></tr>
<tr><th>Metrics</th><th>BLEU</th><th>Acc</th><th>Acc</th><th>Acc</th><th>Acc</th><th>Acc</th><th>Acc</th><th>Acc</th><th>Acc</th><th>Acc</th><th>Acc</th><th>Micro F1</th><th>BLEU</th><th>Acc</th><th>ROUGE-L</th></tr>
<tr><td>GPT-3.5</td><td><u>26.49</u></td><td>43.62</td><td>67.41</td><td>60.79</td><td><u>53.13</u></td><td>41.91</td><td>40.22</td><td>31.38</td><td>84.13</td><td>46.30</td><td>56.00</td><td><u>54.56</u></td><td><u>16.81</u></td><td>54.80</td><td>27.75</td></tr>
<tr><td>GPT-4</td><td>21.70</td><td><u>48.40</u></td><td><b>74.40</b></td><td><u>71.60</u></td><td><b>68.40</b></td><td><u>47.60</u></td><td><u>58.60</u></td><td><b>55.81</b></td><td><u>88.57</u></td><td><b>67.10</b></td><td><u>58.60</u></td><td><b>56.46</b></td><td>12.21</td><td><b>80.20</b></td><td><b>40.38</b></td></tr>
<tr><td>Base (Llama 3.1 8B Instruct)</td><td>15.33</td><td>32.83</td><td>58.44</td><td>66.37</td><td>43.46</td><td>20.43</td><td>32.83</td><td>26.70</td><td>82.54</td><td>39.97</td><td>48.39</td><td>50.80</td><td>13.24</td><td>53.60</td><td>23.47</td></tr>
<tr><td>TAMA</td><td><b>35.37</b></td><td><b>63.51</b></td><td><u>73.82</u></td><td><b>77.39</b></td><td>52.88</td><td><b>68.31</b></td><td><b>60.86</b></td><td><u>48.47</u></td><td><b>89.21</b></td><td><u>65.09</u></td><td><b>64.54</b></td><td>43.94</td><td><b>37.94</b></td><td><u>53.60</u></td><td><u>28.60</u></td></tr>
</table>
					
						
Bold numbers mark the best result among the four models; underlined numbers mark the second best.

Please refer to our [paper](https://arxiv.org/abs/2501.14693) for additional details.
					
						
#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

Please refer to our [paper](https://arxiv.org/abs/2501.14693) for additional details.

#### Summary

Notably, as an 8B model, TAMA demonstrates strong table understanding ability, outperforming GPT-3.5 on most of the table understanding benchmarks and even achieving performance on par with or better than GPT-4.
					
						
## Technical Specifications

### Model Architecture and Objective

We base our model on [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
We instruction-tune the model on a set of 2,600 table instructions.

### Compute Infrastructure

#### Hardware

We conduct our experiments on A40 and A100 GPUs.

#### Software

We leverage [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) for model training.
					
						
## Citation

```
@misc{deng2025rethinking,
  title={Rethinking Table Instruction Tuning},
  author={Naihao Deng and Rada Mihalcea},
  year={2025},
  url={https://openreview.net/forum?id=GLmqHCwbOJ}
}
```
					
						
## Model Card Authors

Naihao Deng

## Model Card Contact

Naihao Deng