---
license: apache-2.0
datasets:
- nhull/tripadvisor-split-dataset-v2
language:
- en
pipeline_tag: text-classification
tags:
- sentiment-analysis
- logistic-regression
- text-classification
- hotel-reviews
- tripadvisor
- nlp
---

# Logistic Regression Sentiment Analysis Model

This model is a **Logistic Regression** classifier trained on the **TripAdvisor sentiment analysis dataset**. It takes a hotel review as text input and predicts its sentiment as a rating from 1 to 5 stars.

## Model Details

- **Model Type**: Logistic Regression
- **Task**: Sentiment Analysis
- **Input**: A hotel review (text)
- **Output**: Sentiment rating (1-5 stars)
- **Training Dataset**: [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2)

## Intended Use

This model classifies hotel reviews by sentiment, assigning each review a star rating between 1 and 5.

---

The model returns a sentiment rating between 1 and 5 stars, where:

- 1: Very bad
- 2: Bad
- 3: Neutral
- 4: Good
- 5: Very good
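The setup above can be sketched with scikit-learn. The card does not state the feature extraction used, so the TF-IDF vectorizer and the toy reviews below are assumptions for illustration only:

```python
# Minimal sketch of a comparable TF-IDF + LogisticRegression pipeline.
# The toy reviews and TF-IDF features are hypothetical stand-ins; the
# real model was trained on nhull/tripadvisor-split-dataset-v2.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "Terrible stay, dirty room and rude staff",
    "Not great, the bathroom was broken",
    "Average hotel, nothing special",
    "Nice location and friendly staff",
    "Absolutely wonderful, best hotel ever",
]
labels = [1, 2, 3, 4, 5]  # star ratings

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(reviews, labels)

# Predict a star rating for a new review
pred = int(model.predict(["The staff were friendly and the room was clean"])[0])
print(pred)  # a rating between 1 and 5
```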

---

### Dataset

The dataset used for training, validation, and testing is [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2). It consists of:

- **Training Set**: 30,400 reviews
- **Validation Set**: 1,600 reviews
- **Test Set**: 8,000 reviews

All splits are balanced across five sentiment labels.

---

### Test Performance

On average, the model's predictions are 0.44 stars higher than the true ratings.

- **Test Accuracy**: 61.05%

- **Classification Report** (Test Set):

| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 1.0   | 0.70      | 0.73   | 0.71     | 1600    |
| 2.0   | 0.52      | 0.50   | 0.51     | 1600    |
| 3.0   | 0.57      | 0.54   | 0.55     | 1600    |
| 4.0   | 0.55      | 0.54   | 0.55     | 1600    |
| 5.0   | 0.71      | 0.74   | 0.72     | 1600    |
| **Accuracy** | -   | -      | **0.61**  | 8000    |
| **Macro avg** | 0.61 | 0.61   | 0.61     | 8000    |
| **Weighted avg** | 0.61 | 0.61 | 0.61     | 8000    |
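The 0.44-star bias reported above is the mean signed error between predicted and true ratings. A minimal sketch of that computation, using small hypothetical arrays in place of the real test-set outputs:

```python
import numpy as np

# Hypothetical predictions vs. true labels; with the real test-set
# outputs, this mean signed error would come out to ~0.44.
y_true = np.array([1, 2, 3, 4, 5, 3])
y_pred = np.array([2, 2, 4, 4, 5, 4])

bias = float(np.mean(y_pred - y_true))  # positive => predicts too high
print(bias)
```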

---

## Limitations

- The model performs well on extreme ratings (1 and 5 stars) but struggles with intermediate ratings (2, 3, and 4 stars).
- The model was trained on the **TripAdvisor** dataset and may not generalize well to reviews from other sources or domains.
- The model does not handle aspects like sarcasm or humor well, and shorter reviews may lead to less accurate predictions.