Model Card for Model ID
Model Details
This model classifies a news headline as coming from either NBC News or Fox News.
Model Description
- Developed by: Jack Bader, Kaiyuan Wang, Pairan Xu
- Task: Binary classification (NBC News vs. Fox News)
- Preprocessing: TF-IDF vectorization applied to the headline text, with:
  - stop_words = "english"
  - max_features = 1000
- Model type: Random Forest
- Framework: scikit-learn
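The card does not include its training code. The following is a minimal sketch, assuming the model was fit on TF-IDF features produced with the settings listed above. The headlines and the 0/1 label encoding are hypothetical placeholders, and `random_state=0` is added only to make the sketch reproducible.

```python
# Minimal training sketch; the card's actual training code is not shown.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical headlines and labels (assumed encoding: 0 = NBC News, 1 = Fox News)
train_titles = [
    "City council approves new transit budget",
    "Heavy snow delays flights across the Midwest",
    "Governor signs education reform into law",
    "Tech company unveils latest smartphone",
]
train_labels = [0, 0, 1, 1]

# Preprocessing settings taken from the card
vectorizer = TfidfVectorizer(stop_words="english", max_features=1000)
X_train = vectorizer.fit_transform(train_titles)  # sparse matrix: one row per headline

# random_state is an added assumption, used only to make the sketch reproducible
model = RandomForestClassifier(random_state=0)
model.fit(X_train, train_labels)

print(X_train.shape)
print(model.predict(X_train))
```

In a real run, the fitted `vectorizer` and `model` would be the two objects saved and later downloaded as separate `.pkl` files.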
Metrics
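- Accuracy (the evaluation code below prints scikit-learn's classification_report, which also includes per-class precision, recall, and F1-score)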
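Accuracy is the fraction of headlines whose predicted outlet matches the true one. A standard-library sketch of that definition follows (the function name `accuracy` is illustrative). For plain label lists it gives the same value as `sklearn.metrics.accuracy_score` with its default `normalize=True`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one label")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```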
How to Evaluate the Model
import pandas as pd
import joblib
from huggingface_hub import hf_hub_download
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
Mount Google Drive (Colab only; the test CSV below is read from Drive)
from google.colab import drive
drive.mount('/content/drive')
Load test set
test_df = pd.read_csv("/content/drive/MyDrive/test_data_random_subset.csv")
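The evaluation steps below read the `title` and `labels` columns. A small check like the hypothetical `check_test_columns` helper fails early with a readable error if the CSV uses different column names. The two-row CSV is a made-up stand-in for `test_data_random_subset.csv`.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ("title", "labels")  # column names read later in this card


def check_test_columns(df: pd.DataFrame) -> None:
    """Hypothetical helper: raise if the test set lacks a column the evaluation code reads."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"test set is missing column(s): {missing}")


# Hypothetical two-row CSV standing in for the real test file
sample_csv = io.StringIO("title,labels\nExample headline one,0\nExample headline two,1\n")
sample_df = pd.read_csv(sample_csv)
check_test_columns(sample_df)
print(sample_df.shape)  # (2, 2)
```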
Log in with your Hugging Face access token (enter your own token when prompted)
!huggingface-cli login
Download the model file (hf_hub_download returns the local path of the cached file)
model_path = hf_hub_download(repo_id="CIS5190FinalProj/GBTrees", filename="gb_trees_model.pkl")
Download the vectorizer file
vectorizer_path = hf_hub_download(repo_id="CIS5190FinalProj/GBTrees", filename="tfidf_vectorizer.pkl")
Load the model
pipeline = joblib.load(model_path)
Load the fitted vectorizer
tfidf_vectorizer = joblib.load(vectorizer_path)
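`hf_hub_download` returns the path of a locally cached file, and `joblib.load` unpickles the object stored there. The round trip below sketches that relationship offline, using a temporary directory in place of the Hub cache and a toy vectorizer (hypothetical headlines) in place of the downloaded one. Because joblib files are pickles, only load them from sources you trust, ideally with a scikit-learn version compatible with the one that saved them.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit a tiny vectorizer on hypothetical headlines
vectorizer = TfidfVectorizer(stop_words="english", max_features=1000)
vectorizer.fit(["Example headline about markets", "Another headline about weather"])

with tempfile.TemporaryDirectory() as tmp_dir:
    # This path stands in for the one hf_hub_download would return
    path = os.path.join(tmp_dir, "tfidf_vectorizer.pkl")
    joblib.dump(vectorizer, path)
    restored = joblib.load(path)

# The restored object keeps its fitted vocabulary, so transform() works immediately
print(restored.vocabulary_ == vectorizer.vocabulary_)  # True
```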
Extract the headlines from the test set
X_test = test_df['title']
Transform the headlines into TF-IDF feature vectors with the fitted vectorizer
X_test_transformed = tfidf_vectorizer.transform(X_test)
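The vectorizer must be the one fitted during training, applied with `transform` rather than `fit_transform`, so that test headlines are mapped onto the training vocabulary and IDF weights. The toy example below (hypothetical headlines) shows that the column count stays equal to the training vocabulary size, and that a headline made only of unseen words becomes an all-zero row.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_titles = ["Stocks rally as markets open", "Storm brings heavy rain to coast"]
test_titles = ["Markets slide before storm", "Completely unseen words here"]

vectorizer = TfidfVectorizer(stop_words="english", max_features=1000)
vectorizer.fit(train_titles)  # learns vocabulary and IDF weights from training data only

# transform() reuses the training vocabulary; words never seen in training are ignored
X_test = vectorizer.transform(test_titles)

print(X_test.shape[1] == len(vectorizer.vocabulary_))  # True
print(X_test[1].nnz)  # 0: the second headline shares no terms with the training vocabulary
```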
Make predictions using the pipeline
y_pred = pipeline.predict(X_test_transformed)
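If the vectorizer and the model come from different training runs, their feature counts may not line up. The hypothetical helper `predict_headlines` below checks the transformed width against the fitted estimator's `n_features_in_` attribute before calling `predict`. scikit-learn also raises on such a mismatch; the explicit check only makes the message clearer. The toy model and vectorizer, trained on made-up headlines, stand in for the downloaded files.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer


def predict_headlines(model, vectorizer, headlines):
    """Hypothetical helper: vectorize headlines, check the feature count, then predict."""
    X = vectorizer.transform(headlines)
    if X.shape[1] != model.n_features_in_:
        raise ValueError(
            f"vectorizer produces {X.shape[1]} features but the model expects {model.n_features_in_}"
        )
    return model.predict(X)


# Toy stand-ins for the downloaded model and vectorizer (hypothetical data and labels)
titles = ["Local team wins championship", "Lawmakers debate new tax plan"]
labels = [0, 1]
vectorizer = TfidfVectorizer(stop_words="english", max_features=1000).fit(titles)
model = RandomForestClassifier(random_state=0).fit(vectorizer.transform(titles), labels)

print(predict_headlines(model, vectorizer, ["Team debates tax plan"]))
```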
Extract the 'labels' column as the target
y_test = test_df['labels']
Print classification report
print(classification_report(y_test, y_pred))
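`classification_report` prints a text table by default. Passing `output_dict=True` returns the same numbers as a nested dictionary, which is convenient for pulling out the accuracy figure listed under Metrics. The labels below are hypothetical, and the 0/1 encoding is an assumption.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

report = classification_report(y_true, y_pred, output_dict=True)
print(report["accuracy"])  # 0.6
print(report["1"]["precision"])  # per-class metrics are keyed by the label as a string
```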