Text Classification · Transformers · Safetensors · English · HHEMv2Config · custom_code

ofermend committed on commit 4e5fd87 · parent: c9ccea3

updated README

Files changed (1): README.md (+49 −21)
README.md CHANGED
@@ -9,31 +9,52 @@ pipeline_tag: text-classification

  <img src="https://huggingface.co/vectara/hallucination_evaluation_model/resolve/main/candle.png" width="50" height="50" style="display: inline;"> In Loving memory of Simon Mark Hughes...

- Click [here](https://huggingface.co/spaces/vectara/hhem-2.1-open-demo) for HHEM-2.1-Open demo app.

- <iframe src="https://vectara-hhem-2-1-open-demo.hf.space/" title="Demo for HHEM-2.1-Open"></iframe>

- With a performance superior than GPT-3.5-Turbo and GPT-4 but a footprint of less than 600MB RAM,
- HHEM-2.1-Open is the lastest open source version of Vectara's HHEM series models for detecting hallucinations in LLMs. They are particularly useful in the context of building retrieval-augmented-generation (RAG) applications where a set of facts is summarized by an LLM, and HHEM can be used to measure the extent to which this summary is factually consistent with the facts.

- If you are interested to learn more about RAG or experiment with Vectara, you can [sign up](https://console.vectara.com/signup/?utm_source=huggingface&utm_medium=space&utm_term=hhem-model&utm_content=console&utm_campaign=) for a Vectara account.

  ## Hallucination Detection 101
- By "hallucinated" or "factually inconsistent", we mean that a text (hypothesis, to be judged) is not supported by another text (evidence/premise, given). You **always need two** pieces of text to determine whether a text is hallucinated or not. When applied to RAG (retrieval augmented generation), the LLM is provided with several pieces of text (often called facts or context) retrieved from some dataset, and a hallucination would indicate that the summary (hypothesis) is not supported by those facts (evidence).

- A common type of hallucination in RAG is **factual but hallucinated**.
  For example, given the premise _"The capital of France is Berlin"_, the hypothesis _"The capital of France is Paris"_ is hallucinated -- although it is true in the world knowledge. This happens when LLMs do not generate content based on the textual data provided to them as part of the RAG retrieval process, but rather generate content based on their pre-trained knowledge.

  Additionally, hallucination detection is "asymmetric" or is not commutative. For example, the hypothesis _"I visited Iowa"_ is considered hallucinated given the premise _"I visited the United States"_, but the reverse is consistent.

- ## Using HHEM-2.1-Open

- > HHEM-2.1 has some breaking change from HHEM-1.0. Your code that works with HHEM-1 (November 2023) will not work anymore. While we are working on backward compatibility, please follow the new usage instructions below.

  Here we provide several ways to use HHEM-2.1-Open in the `transformers` library.

- > You may run into a warning message that "Token indices sequence length is longer than the specified maximum sequence length". Please ignore it which is inherited from the foundation, T5-base.

  ### Using with `AutoModel`
 
@@ -101,10 +122,20 @@ print(simple_scores)

  Of course, with `pipeline`, you can also get the most likely label, or the label with the highest score, by setting `top_k=1`.

  ## HHEM-2.1-Open vs. HHEM-1.0

- The major difference between HHEM-2.1-Open and the original HHEM-1.0 is that HHEM-2.1-Open has an unlimited context length, while HHEM-1.0 is capped at 512 tokens. The longer context length allows HHEM-2.1-Open to provide more accurate hallucination detection for RAG which often needs more than 512 tokens.

  The tables below compare the two models on the [AggreFact](https://arxiv.org/pdf/2205.12854) and [RAGTruth](https://arxiv.org/abs/2401.00396) benchmarks, as well as GPT-3.5-Turbo and GPT-4. In particular, on AggreFact, we focus on its SOTA subset (denoted as `AggreFact-SOTA`) which contains summaries generated by Google's T5, Meta's BART, and Google's Pegasus, which are the three latest models in the AggreFact benchmark. The results on RAGTruth's summarization (denoted as `RAGTruth-Summ`) and QA (denoted as `RAGTruth-QA`) subsets are reported separately. The GPT-3.5-Turbo and GPT-4 versions are 01-25 and 06-13 respectively. The zero-shot results of the two GPT models were obtained using the prompt template in [this paper](https://arxiv.org/pdf/2303.15621).
 
@@ -148,23 +179,20 @@ Table 4: Percentage points of HHEM-2.1-Open's balanced accuracies over GPT-3.5-T

  Another advantage of HHEM-2.1-Open is its efficiency. HHEM-2.1-Open can be run on consumer-grade hardware, occupying less than 600MB RAM space at 32-bit precision and elapsing around 1.5 second for a 2k-token input on a modern x86 CPU.

- ## HHEM-2.1: The more powerful, proprietary counterpart of HHEM-2.1-Open
-
- As you may have already sensed from the name, HHEM-2.1-Open is the open source version of the premium HHEM-2.1. HHEM-2.1 (without the `-Open`) is offered exclusively via Vectara's RAG-as-a-service platform. The major difference between HHEM-2.1 and HHEM-2.1-Open is that HHEM-2.1 is cross-lingual on three languages: English, German, and French, while HHEM-2.1-Open is English-only. "Cross-lingual" means any combination of the three languages, e.g., documents in German, query in English, results in French.

- ### Why RAG in Vectara?

  Vectara provides a Trusted Generative AI platform. The platform allows organizations to rapidly create an AI assistant experience which is grounded in the data, documents, and knowledge that they have. Vectara's serverless RAG-as-a-Service also solves critical problems required for enterprise adoption, namely: reduces hallucination, provides explainability / provenance, enforces access control, allows for real-time updatability of the knowledge, and mitigates intellectual property / bias concerns from large language models.

- To start benefiting from HHEM-2.1, you can [sign up](https://console.vectara.com/signup/?utm_source=huggingface&utm_medium=space&utm_term=hhem-model&utm_content=console&utm_campaign=) for a Vectara account, and you will get the HHEM-2.1 score returned with every query automatically.

  Here are some additional resources:
  1. Vectara [API documentation](https://docs.vectara.com/docs).
- 2. Quick start using Forrest's [`vektara` package](https://vektara.readthedocs.io/en/latest/crash_course.html).
- 3. Learn more about Vectara's [Boomerang embedding model](https://vectara.com/blog/introducing-boomerang-vectaras-new-and-improved-retrieval-model/), [Slingshot reranker](https://vectara.com/blog/deep-dive-into-vectara-multilingual-reranker-v1-state-of-the-art-reranker-across-100-languages/), and [Mockingbird LLM](https://vectara.com/blog/mockingbird-a-rag-and-structured-output-focused-llm/)

- ## LLM Hallucination Leaderboard
- If you want to stay up to date with results of the latest tests using this model to evaluate the top LLM models, we have a [public leaderboard](https://huggingface.co/spaces/vectara/leaderboard) that is periodically updated, and results are also available on the [GitHub repository](https://github.com/vectara/hallucination-leaderboard).

  # Cite this model
 
 
  <img src="https://huggingface.co/vectara/hallucination_evaluation_model/resolve/main/candle.png" width="50" height="50" style="display: inline;"> In Loving memory of Simon Mark Hughes...

+ ## Quickstart: try the HHEM-2.1-Open Live Demo

+ 👉 **[Launch Interactive Demo](https://huggingface.co/spaces/vectara/hhem-2.1-open-demo)** - No setup required, runs in your browser

+ <iframe src="https://vectara-hhem-2-1-open-demo.hf.space/" title="Demo for HHEM-2.1-Open"></iframe>

+ 💡 **Quick test**: Try entering "The capital of France is Berlin" as the premise and "The capital of France is Paris" as the hypothesis to see HHEM detect this factual but hallucinated case.

+ HHEM-2.1-Open is the latest open source version of [Vectara](https://vectara.com)'s HHEM series models for detecting hallucinations in LLMs. These models are particularly useful in the context of building retrieval-augmented-generation (RAG) applications or agentic workflows, where a set of facts is summarized by an LLM, and HHEM can be used to measure the extent to which this summary is factually consistent with the facts.

  ## Hallucination Detection 101
+ By "hallucinated" or "factually inconsistent", we mean that a text (hypothesis, to be judged) is not supported by another text (evidence/premise, given). You **always need two** pieces of text to determine whether a text is hallucinated or not. When applied to RAG or AI agents, the LLM is provided with several pieces of text (often called facts or context) retrieved from some dataset, and a hallucination would indicate that the summary (hypothesis) is not supported by those facts (evidence).

+ A common type of RAG hallucination is **factual but hallucinated**.
  For example, given the premise _"The capital of France is Berlin"_, the hypothesis _"The capital of France is Paris"_ is hallucinated -- although it is true in the world knowledge. This happens when LLMs do not generate content based on the textual data provided to them as part of the RAG retrieval process, but rather generate content based on their pre-trained knowledge.

  Additionally, hallucination detection is "asymmetric" or is not commutative. For example, the hypothesis _"I visited Iowa"_ is considered hallucinated given the premise _"I visited the United States"_, but the reverse is consistent.

+ 💡 **Using HHEM in production?** We'd love to hear about your use case! Connect with us on [LinkedIn](https://linkedin.com/company/vectara) or [Twitter](https://twitter.com/vectara).
+
+ ## AI Engineers: Stay Updated on Hallucination Mitigation
+
+ Hallucination mitigation is a growing field that includes both hallucination detection and correction.
+ Specifically:
+
+ - **HHEM** is a specialized model for **detecting** hallucinations in LLM outputs - it identifies when generated text is not supported by the context provided in RAG.
+ - **VHC (Vectara Hallucination Corrector)** can **correct** hallucinations - it fixes inaccurate generated content (for RAG or agents), ensuring it is consistent with the context. [Learn more about VHC](https://www.vectara.com/blog/vectaras-hallucination-corrector).
+
+ Together, these tools provide a comprehensive approach to hallucination mitigation, helping you create better and more reliable AI applications.
+
+ **Want to learn more?** Here are some resources on hallucination mitigation and AI safety:
+
+ 1. 📧 **[Join our newsletter](https://21542831.hs-sites.com/vectara-community-newsletter-sign-up)** for updates on hallucination mitigation.
+
+ 2. 🚀 **[Sign up for Vectara](https://console.vectara.com/signup/?utm_source=huggingface&utm_medium=space&utm_term=hhem-model&utm_content=console&utm_campaign=)** - Access the Vectara Hallucination Corrector (VHC) and gain hands-on experience using RAG and agent workflows with VHC.
+
+ 3. Join our community:
+    - 🔗 **LinkedIn**: Follow [@Vectara](https://linkedin.com/company/vectara) for AI safety insights and industry updates
+    - 🐦 **X/Twitter**: [@vectara](https://twitter.com/vectara) for real-time updates and research discussions
+    - ⭐ **GitHub**: [Star our hallucination leaderboard](https://github.com/vectara/hallucination-leaderboard), which tracks hallucination rates across leading LLMs.
+
+ ## Using HHEM-2.1-Open

  Here we provide several ways to use HHEM-2.1-Open in the `transformers` library.

+ > You may run into a warning that "Token indices sequence length is longer than the specified maximum sequence length". Please ignore it; the warning is inherited from the foundation model, T5-base.

  ### Using with `AutoModel`
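The `AutoModel` code itself sits in README lines 61-121, which are unchanged and therefore elided from this hunk. For orientation, a minimal sketch of the call pattern: the model id, `trust_remote_code`, and the custom `predict` entry point follow this model card's usage section, while the `to_labels` helper and its 0.5 cutoff are purely illustrative assumptions.

```python
from typing import List, Tuple


def hhem_scores(pairs: List[Tuple[str, str]]) -> List[float]:
    """Score (premise, hypothesis) pairs with HHEM-2.1-Open.

    Requires `transformers` and network access to fetch the model;
    the custom `predict` head comes with the card's remote code.
    """
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        "vectara/hallucination_evaluation_model", trust_remote_code=True
    )
    return [float(s) for s in model.predict(pairs)]


def to_labels(scores: List[float], threshold: float = 0.5) -> List[str]:
    """Map consistency scores in [0, 1] to labels.

    The 0.5 cutoff is an illustrative assumption, not a calibrated value.
    """
    return ["consistent" if s >= threshold else "hallucinated" for s in scores]


# Example (downloads the model; shown here for shape only):
# pairs = [("The capital of France is Berlin.", "The capital of France is Paris.")]
# print(to_labels(hhem_scores(pairs)))
```

Because HHEM scoring is asymmetric, swapping the two texts in a pair can change the score, so keep the (premise, hypothesis) order consistent.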
 
 
  Of course, with `pipeline`, you can also get the most likely label, or the label with the highest score, by setting `top_k=1`.

+ ## HHEM-2.3 and the LLM Hallucination Leaderboard
+
+ **See how LLMs compare**: HHEM-2.3, our latest commercial hallucination detection model, powers our [live leaderboard](https://huggingface.co/spaces/vectara/leaderboard), which continuously benchmarks leading LLMs for hallucination rates. Watch comparisons of the latest models including GPT-5, Claude 4, Grok-4, Gemini 2.5, Llama 4, Mistral, DeepSeek, and many others.
+
+ **HHEM-2.3 advantages over the open source version**:
+ - Enhanced accuracy and performance
+ - Cross-lingual support (English, German, French, Portuguese, Spanish, Arabic, Chinese-Simplified, Korean, Russian, Japanese, and Hindi)
+ - Available exclusively via Vectara's platform
+
+ 🔍 **Track AI safety across the industry**: The leaderboard updates regularly as new models are released, giving the community insight into which LLMs are most reliable for factual tasks. All data and methodology are [open source](https://github.com/vectara/hallucination-leaderboard).

  ## HHEM-2.1-Open vs. HHEM-1.0

+ HHEM-1.0 is the first open version of HHEM. The major difference between HHEM-2.1-Open and the original HHEM-1.0 is that HHEM-2.1-Open has an unlimited context length, while HHEM-1.0 is capped at 512 tokens. The longer context length allows HHEM-2.1-Open to provide more accurate hallucination detection for RAG, which often needs more than 512 tokens.

  The tables below compare the two models on the [AggreFact](https://arxiv.org/pdf/2205.12854) and [RAGTruth](https://arxiv.org/abs/2401.00396) benchmarks, as well as GPT-3.5-Turbo and GPT-4. In particular, on AggreFact, we focus on its SOTA subset (denoted as `AggreFact-SOTA`) which contains summaries generated by Google's T5, Meta's BART, and Google's Pegasus, which are the three latest models in the AggreFact benchmark. The results on RAGTruth's summarization (denoted as `RAGTruth-Summ`) and QA (denoted as `RAGTruth-QA`) subsets are reported separately. The GPT-3.5-Turbo and GPT-4 versions are 01-25 and 06-13 respectively. The zero-shot results of the two GPT models were obtained using the prompt template in [this paper](https://arxiv.org/pdf/2303.15621).
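The `top_k=1` behavior mentioned in this hunk can also be reproduced client-side. Here is a small sketch, assuming the list-of-dicts shape a `transformers` text-classification pipeline returns when all labels are requested; the label strings below are placeholders, not necessarily the exact labels HHEM-2.1-Open emits.

```python
def top_label(scored_labels):
    """Pick the winning entry from text-classification pipeline-style output.

    scored_labels: list of {'label': str, 'score': float} dicts, i.e. the
    per-text result a `transformers` pipeline yields with all labels returned.
    """
    return max(scored_labels, key=lambda d: d["score"])


# Hypothetical output for one (premise, hypothesis) pair:
example = [
    {"label": "hallucinated", "score": 0.23},
    {"label": "consistent", "score": 0.77},
]
print(top_label(example)["label"])  # -> consistent
```

Passing `top_k=1` to the pipeline gives you this winning entry directly, so the helper is only needed when you want both the full distribution and the argmax.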
 
 
  Another advantage of HHEM-2.1-Open is its efficiency. HHEM-2.1-Open can be run on consumer-grade hardware, occupying less than 600MB of RAM at 32-bit precision and taking around 1.5 seconds for a 2k-token input on a modern x86 CPU.
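As a back-of-envelope check of the quoted footprint: at 32-bit precision each parameter occupies 4 bytes, so a 600MB weight budget corresponds to roughly 157M parameters. A tiny sketch of that arithmetic (the function name is ours, not from the card):

```python
def params_in_budget(ram_mb: float, bytes_per_param: int = 4) -> int:
    """How many parameters fit in a given RAM budget.

    Default 4 bytes/parameter corresponds to 32-bit precision.
    """
    return int(ram_mb * 1024**2 // bytes_per_param)


print(params_in_budget(600))  # -> 157286400, i.e. ~157M parameters
```

Halving the precision to 16-bit doubles the parameter count that fits in the same budget, which is why lower-precision inference shrinks memory footprints roughly linearly.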
+ ## Hallucination detection with Vectara

+ Vectara provides a Trusted Generative AI platform. The platform allows organizations to rapidly create an AI agent experience grounded in the data, documents, and knowledge that they have. Vectara solves critical problems required for enterprise adoption of RAG and agentic AI applications: it reduces hallucination, provides explainability / provenance, enforces access control, allows for real-time updates to the knowledge, and mitigates intellectual property / bias concerns from large language models.

+ HHEM-2.3 is fully integrated into Vectara, and its score is automatically returned with every query API call.

+ To start benefiting from HHEM-2.3, you can [sign up](https://console.vectara.com/signup/?utm_source=huggingface&utm_medium=space&utm_term=hhem-model&utm_content=console&utm_campaign=) for a Vectara account, and you will get the HHEM-2.3 score returned with every query automatically.

  Here are some additional resources:
  1. Vectara [API documentation](https://docs.vectara.com/docs).
+ 2. Vectara [Python SDK](https://github.com/vectara/python-sdk).
+ 3. Quick start using Forrest's [`vektara` package](https://vektara.readthedocs.io/en/latest/crash_course.html).
+ 4. Learn more about Vectara's [Boomerang embedding model](https://vectara.com/blog/introducing-boomerang-vectaras-new-and-improved-retrieval-model/), [Slingshot reranker](https://vectara.com/blog/deep-dive-into-vectara-multilingual-reranker-v1-state-of-the-art-reranker-across-100-languages/), and [Mockingbird LLM](https://vectara.com/blog/mockingbird-a-rag-and-structured-output-focused-llm/).

  # Cite this model
 