---
language: ur
thumbnail: https://raw.githubusercontent.com/urduhack/urduhack/master/docs/_static/urduhack.png
tags:
- roberta-urdu-small
- urdu
- transformers
license: mit
---
## roberta-urdu-small

[License: MIT](https://github.com/urduhack/urduhack/blob/master/LICENSE)

### Overview

- **Language model:** roberta-urdu-small
- **Model size:** 125M
- **Language:** Urdu
- **Training data:** News articles from Urdu news sources in Pakistan
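To load the checkpoint directly rather than through a pipeline, and to check the model size listed above (assuming "125M" refers to the parameter count), one option is to use `AutoTokenizer` and `AutoModelForMaskedLM` and sum the parameter sizes. This is a minimal sketch that assumes `transformers` and PyTorch are installed and the Hugging Face Hub is reachable; the helper names `load_model` and `count_parameters` are hypothetical and not part of any library.

```python
from typing import Any, Tuple

MODEL_ID = "urduhack/roberta-urdu-small"


def count_parameters(model: Any, trainable_only: bool = False) -> int:
    """Sum the element counts of all parameters of ``model``.

    Works with any object whose ``parameters()`` yields tensor-like objects
    exposing ``numel()`` and ``requires_grad``, e.g. a PyTorch ``nn.Module``.
    """
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


def load_model(model_id: str = MODEL_ID) -> Tuple[Any, Any]:
    """Load the tokenizer and masked-LM model (downloads weights on first use)."""
    # Imported here so that count_parameters works without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model
```

For example, `tokenizer, model = load_model()` followed by `count_parameters(model)`. The count includes the masked-LM prediction head, so it may differ slightly from a figure quoted for the encoder alone.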
### About roberta-urdu-small

roberta-urdu-small is a RoBERTa-style masked language model for Urdu. It can be used with the Transformers `fill-mask` pipeline:

```python
from transformers import pipeline

# Build a fill-mask pipeline backed by roberta-urdu-small and its tokenizer.
fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small")
```
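Calling the fill-mask pipeline on a text with one mask token returns a list of candidates, each a dictionary with the keys `score`, `token`, `token_str` and `sequence`. The sketch below wraps that call and keeps only the highest-scoring tokens. It assumes `transformers` is installed, builds the input from the tokenizer's own `mask_token` instead of hard-coding one, and the helper names `fill` and `top_tokens` are hypothetical.

```python
from typing import Dict, List, Sequence

MODEL_ID = "urduhack/roberta-urdu-small"


def top_tokens(predictions: Sequence[Dict], k: int = 3) -> List[str]:
    """Return the k highest-scoring predicted tokens from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    # token_str may carry a leading space from byte-level BPE decoding.
    return [p["token_str"].strip() for p in ranked[:k]]


def fill(prefix: str, suffix: str, k: int = 3) -> List[str]:
    """Predict the word between prefix and suffix with roberta-urdu-small."""
    from transformers import pipeline  # lazy import; downloads the model on first use

    fill_mask = pipeline("fill-mask", model=MODEL_ID, tokenizer=MODEL_ID)
    # Use the tokenizer's mask token rather than assuming a literal string.
    text = f"{prefix} {fill_mask.tokenizer.mask_token} {suffix}"
    return top_tokens(fill_mask(text, top_k=k), k)
```

For example, `fill("پاکستان کا دارالحکومت", "ہے")` asks the model to complete "The capital of Pakistan is ___". Keeping the ranking in a separate function makes it easy to inspect the raw scores as well.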
## Training procedure

roberta-urdu-small was trained on an Urdu news corpus. Before training, the data was normalized with Urduhack's normalization module to remove characters from other languages, such as Arabic.
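Urdu is written in the Arabic script, and several letters and digits have separate Arabic and Urdu code points, so text scraped from the web often mixes them. The sketch below illustrates one way to approach this kind of normalization: rewriting Arabic-specific code points to the forms used in Urdu. It is not Urduhack's implementation; the function `to_urdu_code_points` and the table `ARABIC_TO_URDU` are hypothetical and cover only a small illustrative subset of characters.

```python
# Hypothetical, illustrative subset: Arabic code points -> Urdu counterparts.
ARABIC_TO_URDU = {
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH (Urdu kaf)
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH (Urdu ye)
    "\u0649": "\u06cc",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
}
# Arabic-Indic digits U+0660..U+0669 -> Extended Arabic-Indic digits U+06F0..U+06F9.
ARABIC_TO_URDU.update({chr(0x0660 + i): chr(0x06F0 + i) for i in range(10)})

# Precompute a translation table for str.translate.
_TABLE = str.maketrans(ARABIC_TO_URDU)


def to_urdu_code_points(text: str) -> str:
    """Replace the Arabic code points in ARABIC_TO_URDU with their Urdu forms."""
    return text.translate(_TABLE)
```

Characters not in the table, including Latin text, pass through unchanged; a full normalizer would need a much larger mapping and rules for letters such as heh, whose Urdu form depends on context.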
### About Urduhack

Urduhack is a natural language processing (NLP) library for the Urdu language.

GitHub: https://github.com/urduhack/urduhack