---
license: llama3
inference:
  parameters:
    num_beams: 3
    num_beam_groups: 3
    num_return_sequences: 1
    repetition_penalty: 3.0
    diversity_penalty: 3.01
    no_repeat_ngram_size: 2
    temperature: 0.8
    max_length: 64
widget:
- text: >-
    paraphraser: Learn to build generative AI applications with an expert AWS instructor with the 2-day Developing Generative AI Applications on AWS course.
  example_title: AWS course
- text: >-
    paraphraser: In healthcare, Generative AI can help generate synthetic medical data to train machine learning models, develop new drug candidates, and design clinical trials.
  example_title: Generative AI
- text: >-
    paraphraser: By leveraging prior model training through transfer learning, fine-tuning
    can reduce the amount of expensive computing power and labeled data needed
    to obtain large models tailored to niche use cases and business needs.
  example_title: Fine Tuning
extra_gated_fields:
  geo: ip_location
---


# Text Rewriter Paraphraser

This repository contains a fine-tuned text-rewriting model based on T5-Base (223M parameters).

Developed by: https://exnrt.com

## Key Features:

* **Fine-tuned on t5-base:** Leverages the power of a pre-trained text-to-text transfer model for effective paraphrasing.
* **Large Dataset (430k examples):** Trained on a dataset combining three open-source corpora, cleaned using various techniques for optimal performance.
* **High Quality Paraphrases:** Generates paraphrases that significantly alter sentence structure while maintaining accuracy and factual correctness.
* **Non-AI Detectable:** Aims to produce paraphrases that appear natural and indistinguishable from human-written text.

**Model Performance:**

* Train Loss: 1.0645
* Validation Loss: 0.8761

## Getting Started:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Replace 'YOUR_TOKEN' with your actual Hugging Face access token
tokenizer = AutoTokenizer.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser", token='YOUR_TOKEN')
model = AutoModelForSeq2SeqLM.from_pretrained("Ateeqq/Text-Rewriter-Paraphraser", token='YOUR_TOKEN')

# Prepend the task prefix "paraphraser: " to the input, as in the widget examples above
text = "paraphraser: " + "Data science is a field that deals with extracting knowledge and insights from data."

inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_length=64)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
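
The hosted widget on this page uses the diverse beam search settings listed in the card's `inference` config. A minimal sketch of applying the same settings locally, with the parameter values copied from the front matter (`num_return_sequences` is raised to 3 here purely to show one candidate per beam group; the card itself uses 1):

```python
# Diverse beam search with the generation settings from this card's
# `inference` config. With num_beam_groups > 1, group beam search is
# deterministic (no sampling), so `temperature` has no effect and is omitted.
outputs = model.generate(
    **inputs,
    num_beams=3,
    num_beam_groups=3,
    num_return_sequences=3,   # card uses 1; 3 shows one candidate per group
    repetition_penalty=3.0,
    diversity_penalty=3.01,
    no_repeat_ngram_size=2,
    max_length=64,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```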

**Disclaimer:**

* **Limited Use:** The model is released under the same non-exclusive, non-transferable license as Llama 3. This means you can't freely redistribute the model or sell it.
* **Commercial Use Allowed:** You can use the model for commercial purposes, but only under the terms of the license agreement.
* **Attribution Required:** You must abide by the agreement's terms regarding attribution. Use the paraphrased text responsibly and ethically, with proper attribution of the original source.

**Further Development:**

See the repository's Discussions tab for ongoing development and areas for future improvement.