DeepAxion committed on
Commit f8a32b7 · verified · 1 Parent(s): b1f1815

Update README.md

Files changed (1)
  1. README.md +29 -8
README.md CHANGED
@@ -2,13 +2,14 @@
 library_name: transformers
 license: mit
 datasets:
-- ajaykarthick/imdb-movie-reviews # Assuming this is the dataset you used on HF
+- ajaykarthick/imdb-movie-reviews
 language:
 - en
 metrics:
 - accuracy
 - f1
 - recall
+- precision
 base_model:
 - distilbert/distilbert-base-uncased-finetuned-sst-2-english
 ---
@@ -77,22 +78,25 @@ from transformers import AutoModelForSequenceClassification, AutoTokenizer
 import torch

 # Load the model and tokenizer from the Hugging Face Hub
-model_name = "DeepAxion/distilbert-imdb-sentiment" # REPLACE with your actual model ID
-tokenizer = AutoTokenizer.from_pretrained(model_name)
+model_name = "DeepAxion/distilbert-imdb-sentiment"
 model = AutoModelForSequenceClassification.from_pretrained(model_name)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# put the model in eval mode
+model.eval()

 # Example Inference
 text = "This movie totally blew me away, absolutely brilliant acting and a fantastic plot!"

 inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

+# disable gradient tracking for inference
 with torch.no_grad():
     outputs = model(**inputs)
     logits = outputs.logits
     probabilities = torch.softmax(logits, dim=-1)
     prediction = torch.argmax(probabilities, dim=-1).item()

-sentiment_labels = {0: "Negative", 1: "Positive"} # Assuming 0: Negative, 1: Positive
+sentiment_labels = {0: "Negative", 1: "Positive"}

 print(f"Input Text: \"{text}\"")
 print(f"Predicted Sentiment: {sentiment_labels[prediction]}")
@@ -107,10 +111,10 @@ The model was fine-tuned on the IMDb Large Movie Review Dataset. This dataset co

 Dataset Card: https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews (or the official IMDb dataset link if different)

-## Preprocessing
+### Preprocessing
 Text was tokenized using the DistilBertTokenizerFast associated with the base model. Input sequences were truncated to a maximum length of 512 tokens and padded to the longest sequence in the batch. Labels were mapped to 0 for negative and 1 for positive.

-## Training Hyperparameters
+### Training Hyperparameters
 - Training regime: Mixed precision (fp16) was likely used for faster training and reduced memory footprint. (Confirm this if you know your specific training setup)

 - Optimizer: AdamW
@@ -125,7 +129,24 @@ Text was tokenized using the DistilBertTokenizerFast associated with the base mo

 - Framework: PyTorch

-## Speeds, Sizes, Times
+### Speeds, Sizes, Times
 Training Time: [E.g., Approximately 1-2 hours on a single Colab T4 GPU] (Estimate based on your experience)

-Model Size: The model.safetensors file is approximately 255 MB.
+Model Size: The model.safetensors file is approximately 255 MB.
+
+## Metrics
+The primary evaluation metrics used were:
+
+- Accuracy: The proportion of correctly classified samples.
+- F1-Score (weighted/macro): A measure combining precision and recall, useful for balanced assessment.
+- Recall: The proportion of actual positive/negative samples that were correctly identified.
+- Precision: The proportion of samples predicted positive/negative that were actually positive/negative.
+
+### Results
+- Accuracy: 94%
+- Recall: 94%
+- Precision: 94%
+- F1: 93%
+
+## Summary
+The fine-tuned DistilBERT model demonstrates strong performance on the IMDb sentiment classification task, achieving high accuracy, F1-score, and recall on the test set.
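
The preprocessing the card describes (DistilBertTokenizerFast, truncation to 512 tokens, padding to the longest sequence in the batch, 0/1 label mapping) can be reproduced roughly as follows. This is a minimal sketch: the example reviews and the `label2id` mapping are illustrative, not taken from the actual training script.

```python
from transformers import DistilBertTokenizerFast

# Tokenizer associated with the base model named in the card's metadata
tokenizer = DistilBertTokenizerFast.from_pretrained(
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
)

# Label mapping described in the Preprocessing section
label2id = {"negative": 0, "positive": 1}

# Illustrative reviews standing in for the IMDb data
reviews = ["Worst film I have seen in years.", "A genuinely moving, well-acted drama."]
labels = [label2id["negative"], label2id["positive"]]

# Truncate to a maximum of 512 tokens and pad to the longest sequence in the batch
encodings = tokenizer(
    reviews, truncation=True, max_length=512, padding="longest", return_tensors="pt"
)
print(encodings["input_ids"].shape, labels)
```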
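
The Training Hyperparameters section (AdamW, a likely fp16 mixed-precision regime, PyTorch) maps onto a standard Hugging Face `Trainer` setup. The sketch below is a hedged illustration only: the batch size, learning rate, epoch count, and the tiny in-memory dataset are placeholders rather than values from the card, and `fp16=True` assumes a CUDA GPU.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

base = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Tiny in-memory stand-in for the IMDb reviews, just to keep the sketch self-contained
raw = Dataset.from_dict({
    "text": ["Dreadful pacing and wooden acting.", "An absolute delight from start to finish."],
    "label": [0, 1],
})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512), batched=True
)

args = TrainingArguments(
    output_dir="distilbert-imdb-sentiment",
    per_device_train_batch_size=16,  # illustrative, not from the card
    learning_rate=2e-5,              # illustrative, not from the card
    num_train_epochs=2,              # illustrative, not from the card
    optim="adamw_torch",             # AdamW optimizer, as listed in the card
    fp16=True,                       # mixed-precision regime the card mentions; requires a GPU
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorWithPadding(tokenizer),  # pad each batch to its longest sequence
)
trainer.train()
```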
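
The figures in the Results subsection can be computed with scikit-learn once the model's predictions on the test split are available. A small sketch follows, using placeholder labels and weighted averaging (one of the averaging schemes the card mentions for F1).

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder ground-truth labels and predictions; in practice these come from
# running the model over the IMDb test split (0 = Negative, 1 = Positive)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)

print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1:        {f1:.2%}")
```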