
Model Card for RandomForest


Model Details

This model classifies a news headline by its source: NBC News or Fox News.

Model Description

  • Developed by: Jack Bader, Kaiyuan Wang, Pairan Xu
  • Task: Binary classification (NBC News vs. Fox News)
  • Preprocessing: TF-IDF vectorization applied to the headline text, with:
      • stop_words = "english"
      • max_features = 1000
  • Model type: Random Forest
  • Framework: Scikit-learn
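The configuration listed above can be sketched as follows. This is a minimal illustration, not the authors' training script: the headlines, the label strings, and the Random Forest hyperparameters (`n_estimators`, `random_state`) are assumptions made for the example. Only the two `TfidfVectorizer` settings come from this card.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy headlines and labels, for illustration only.
headlines = [
    "Senate passes the budget bill after long debate",
    "Storm brings heavy rain to the coast",
    "Stocks rally as the markets rebound",
    "Governor signs the new education law",
]
labels = ["NBC", "FoxNews", "NBC", "FoxNews"]

# Vectorizer settings as listed in this model card.
vectorizer = TfidfVectorizer(stop_words="english", max_features=1000)
X_train = vectorizer.fit_transform(headlines)

# Assumed hyperparameters; the card does not state the ones actually used.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, labels)

# New text must go through the same fitted vectorizer before prediction.
new_headlines = ["Markets rebound after the storm", "Senate debates education law"]
predictions = model.predict(vectorizer.transform(new_headlines))
print(list(predictions))
```

With `max_features = 1000`, the vectorizer keeps at most the 1000 most frequent terms, so every headline becomes a vector of at most 1000 TF-IDF weights.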

Metrics

  • Accuracy Score
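For reference, accuracy is the fraction of headlines whose predicted source matches the true source. The sketch below computes it in plain Python. The label strings are hypothetical, and the function is only an illustration of the metric, which `sklearn.metrics.accuracy_score` computes with its default settings.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: three of the four predictions are correct.
y_true = ["NBC", "FoxNews", "NBC", "FoxNews"]
y_pred = ["NBC", "FoxNews", "FoxNews", "FoxNews"]
print(accuracy(y_true, y_pred))  # 0.75
```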

How to Use

import pandas as pd
import joblib
from huggingface_hub import hf_hub_download
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report

Mount Google Drive (Colab only)

from google.colab import drive
drive.mount('/content/drive')

Load test set

test_df = pd.read_csv("/content/drive/MyDrive/test_data_random_subset.csv")

Log in with a Hugging Face access token (enter your own token when prompted)

!huggingface-cli login
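As an alternative to the interactive prompt, the token can be read from an environment variable so that it never appears in a notebook or README. The helper below is a hypothetical sketch: `read_hf_token` is not part of any library, and it assumes the token has been stored in `HF_TOKEN` beforehand.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the access token stored in the given environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your access token"
        )
    return token


# Example use (requires huggingface_hub and a real token in the environment):
# from huggingface_hub import login
# login(token=read_hf_token())
```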

Download the model (hf_hub_download returns the local path of the cached file)

model_path = hf_hub_download(repo_id="CIS5190FinalProj/GBTrees", filename="gb_trees_model.pkl")

Download the vectorizer

vectorizer_path = hf_hub_download(repo_id="CIS5190FinalProj/GBTrees", filename="tfidf_vectorizer.pkl")

Load the model

pipeline = joblib.load(model_path)

Load the vectorizer

tfidf_vectorizer = joblib.load(vectorizer_path)

Extract the headlines from the test set

X_test = test_df['title']

Transform the headlines into numerical TF-IDF features

X_test_transformed = tfidf_vectorizer.transform(X_test)

Make predictions using the pipeline

y_pred = pipeline.predict(X_test_transformed)

Extract the 'labels' column as the target

y_test = test_df['labels']

Print classification report

print(classification_report(y_test, y_pred))
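To score headlines that are not in a CSV file, the transform and predict steps above can be wrapped in a small helper. `predict_headlines` is a hypothetical name introduced here for illustration. It assumes only that the vectorizer has a `transform` method and the model has a `predict` method, as the objects loaded above do.

```python
def predict_headlines(headlines, vectorizer, model):
    """Vectorize raw headline strings and return (headline, label) pairs."""
    headlines = list(headlines)
    # Reuse the already fitted vectorizer; do not refit it on new text.
    features = vectorizer.transform(headlines)
    predicted = model.predict(features)
    return list(zip(headlines, predicted))
```

With the objects from the steps above, a call would look like `predict_headlines(["Some new headline"], tfidf_vectorizer, pipeline)`.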