license: llama3
base_model: meta-llama/Meta-Llama-3-8B-Instruct

Llama3-KALE-LM-Chem 8B

Benchmarks

Open Benchmarks

| Models | ChemBench | MMLU | MMLU-Chem | SciQ | IE(Acc) | IE(LS) |
|---|---:|---:|---:|---:|---:|---:|
| GPT-3.5 | 47.15 | 69.75 | 53.32 | 89.6 | 52.98 | 68.28 |
| GPT-4 | 53.72 | 78.67 | 63.70 | 94.10 | 54.20 | 69.74 |
| Llama3-8B-Instruct | 46.02 | 68.3 | 51.10 | 93.30 | 45.83 | 61.22 |
| LlaSMol | 28.47 | 54.47 | 33.24 | 72.30 | 2.16 | 3.23 |
| ChemDFM | 44.44 | 58.11 | 45.60 | 86.70 | 7.61 | 11.49 |
| ChemLLM-7B-Chat | 34.16 | 61.79 | 48.39 | 94.00 | 29.66 | 39.17 |
| ChemLLM-7B-Chat-1.5-SFT | 42.75 | 63.56 | 49.63 | 95.10 | 14.96 | 19.61 |
| KALE-LM | 52.40 | 68.74 | 53.83 | 91.50 | 67.50 | 78.37 |
| KALE-LM-INSTRUCT | 57.01 | 68.09 | 54.83 | 91.60 | 57.53 | 64.16 |

In-House Benchmarks

| Models | NC | PP | M2C | C2M | PP | Retro | YP | TP | SP | Average |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| GPT-3.5 | 46.93 | 56.98 | 85.28 | 38.25 | 43.67 | 42.33 | 30.33 | 42.57 | 38 | 47.15 |
| GPT-4 | 54.82 | 65.02 | 92.64 | 52.88 | 62.67 | 52.67 | 42.33 | 24.75 | 35.67 | 53.72 |
| Llama3-8B-Instruct | 51.31 | 27.79 | 90.30 | 40.88 | 34.00 | 30.00 | 45.33 | 60.89 | 33.67 | 46.02 |
| LlaSMol | 27.78 | 29.34 | 31.44 | 23.38 | 25.67 | 24.00 | 37.33 | 34.65 | 22.67 | 28.47 |
| ChemDFM | 36.92 | 55.57 | 83.95 | 42.00 | 40.00 | 37.33 | 39.00 | 33.17 | 32.00 | 44.44 |
| ChemLLM-7B-Chat | 41.05 | 29.76 | 85.28 | 26.12 | 26.00 | 24.00 | 20.00 | 24.26 | 31.00 | 34.16 |
| ChemLLM-7B-Chat-1.5-SFT | 50.06 | 49.51 | 85.28 | 38.75 | 38.00 | 26.67 | 28.33 | 31.68 | 33.67 | 42.44 |
| OURMODEL | 63.58 | 58.39 | 92.98 | 44.50 | 48.67 | 38.33 | 46.33 | 44.55 | 34.33 | 52.41 |
| OURMODELINSTRUCT | 61.33 | 43.44 | 90.30 | 53.62 | 72.67 | 53.67 | 46.00 | 47.03 | 45.00 | 57.01 |
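
The Average column above appears to be the unweighted mean of the nine per-task scores. The following is a minimal sketch that checks this for two rows copied from the table. The helper name `macro_average` is hypothetical and is not part of any released evaluation code.

```python
from statistics import mean

# Per-task scores copied from the table above, in column order:
# NC, PP, M2C, C2M, PP, Retro, YP, TP, SP
task_scores = {
    "GPT-3.5": [46.93, 56.98, 85.28, 38.25, 43.67, 42.33, 30.33, 42.57, 38.00],
    "OURMODELINSTRUCT": [61.33, 43.44, 90.30, 53.62, 72.67, 53.67, 46.00, 47.03, 45.00],
}


def macro_average(scores):
    """Unweighted mean over tasks, rounded to two decimals (assumed convention)."""
    return round(mean(scores), 2)


for model, scores in task_scores.items():
    print(f"{model}: {macro_average(scores)}")
```

Under this assumption the computed values agree with the Average entries for both rows (47.15 and 57.01), which suggests each task is weighted equally regardless of how many examples it contains.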