# 📺 T5 YouTube Summarizer

This is a fine-tuned [`t5-base`](https://huggingface.co/t5-base) model for abstractive summarization of YouTube video transcripts. It was fine-tuned on a custom dataset of video transcripts paired with manually written summaries.
---

## ✨ Model Details

- **Base Model**: [`t5-base`](https://huggingface.co/t5-base)
- **Task**: Abstractive Summarization
- **Training Data**: YouTube video transcripts and human-written summaries
- **Max Input Length**: 512 tokens
- **Max Output Length**: 256 tokens
- **Fine-tuning Epochs**: 10
- **Tokenizer**: `T5Tokenizer` (pretrained)
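
The training script itself is not included in this card; the snippet below is only a minimal sketch of how a fine-tune with the settings listed above could be set up with the 🤗 `Seq2SeqTrainer` API. The dataset column names (`transcript`, `summary`), batch size, and learning rate are illustrative assumptions, not values taken from the actual training run.

```python
from transformers import (
    T5ForConditionalGeneration,
    T5Tokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

MAX_INPUT_LENGTH = 512    # matches "Max Input Length" above
MAX_TARGET_LENGTH = 256   # matches "Max Output Length" above

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def preprocess(batch):
    # Prefix each transcript with the standard T5 summarization prompt.
    inputs = tokenizer(
        ["summarize: " + t for t in batch["transcript"]],
        max_length=MAX_INPUT_LENGTH,
        truncation=True,
    )
    # Tokenize the human-written summaries as targets
    # (text_target= requires a reasonably recent transformers version).
    labels = tokenizer(
        text_target=batch["summary"],
        max_length=MAX_TARGET_LENGTH,
        truncation=True,
    )
    inputs["labels"] = labels["input_ids"]
    return inputs

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-youtube-summarizer",
    num_train_epochs=10,             # matches "Fine-tuning Epochs" above
    per_device_train_batch_size=8,   # placeholder, not documented in this card
    learning_rate=3e-4,              # placeholder, not documented in this card
    predict_with_generate=True,
)

# `train_dataset` is a placeholder for the (unreleased) transcript/summary dataset,
# e.g. a datasets.Dataset with "transcript" and "summary" columns:
# tokenized_train = train_dataset.map(preprocess, batched=True)
# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=tokenized_train,
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
# )
# trainer.train()
```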
---

## 🧠 Intended Use

This model is designed to generate short, informative summaries from long transcripts of educational or conceptual YouTube videos. It can be used for:

- Quick understanding of long videos
- Automated content summaries for blogs, platforms, or note-taking tools
- Enhancing accessibility for long-form spoken content

---
## 🚀 How to Use

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned model and tokenizer from the Hub
model = T5ForConditionalGeneration.from_pretrained("your-username/t5-youtube-summarizer")
tokenizer = T5Tokenizer.from_pretrained("your-username/t5-youtube-summarizer")

# Define input text (a video transcript)
text = "The video talks about coordinate covalent bonds, giving examples from..."

# Preprocess with the "summarize:" prefix and truncate to the 512-token input limit
inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)

# Generate a summary with beam search
summary_ids = model.generate(
    inputs,
    max_length=256,
    min_length=80,
    num_beams=5,
    length_penalty=2.0,
    no_repeat_ngram_size=3,
    early_stopping=True
)

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```
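
Full video transcripts are often much longer than the 512-token input limit, so everything past the truncation point is silently dropped in the snippet above. One common workaround, sketched below using the `model` and `tokenizer` already loaded, is to split the transcript into overlapping chunks, summarize each chunk, and join the partial summaries. The chunk size, overlap, and per-chunk generation settings here are illustrative assumptions, not part of this model's documented usage.

```python
def summarize_long_transcript(text, chunk_tokens=480, overlap=50):
    """Naive chunked summarization for transcripts longer than 512 tokens.

    Splits the transcript on token boundaries with a small overlap,
    summarizes each chunk independently, and joins the partial summaries.
    chunk_tokens and overlap are illustrative defaults, not values from this card.
    """
    token_ids = tokenizer.encode(text)  # full token sequence (may exceed 512)
    summaries = []
    step = chunk_tokens - overlap
    for start in range(0, len(token_ids), step):
        # Decode the chunk back to text so the "summarize:" prefix can be added
        chunk = tokenizer.decode(token_ids[start:start + chunk_tokens], skip_special_tokens=True)
        inputs = tokenizer.encode(
            "summarize: " + chunk, return_tensors="pt", max_length=512, truncation=True
        )
        summary_ids = model.generate(
            inputs,
            max_length=256,
            min_length=30,   # lower than 80 above, since chunks are shorter
            num_beams=5,
            no_repeat_ngram_size=3,
            early_stopping=True,
        )
        summaries.append(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
    return " ".join(summaries)

# Example usage with a full transcript string:
# print(summarize_long_transcript(full_transcript))
```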