Model Card for Sentence Type Classification

This model is fine-tuned to classify Korean financial sentences into four categories: Predictive, Inferential, Factual, and Conversational. It is built on jhgan/ko-sroberta-multitask, a RoBERTa-based model specialized for Korean NLP tasks.

Model Details

Model Description

  • Developed by: Kwon Cho
  • Shared by: kwoncho
  • Model type: RoBERTa-based transformer (fine-tuned for sequence classification)
  • Language(s): Korean (한국어)
  • License: Apache 2.0 (from base model)
  • Finetuned from model: jhgan/ko-sroberta-multitask

This model was fine-tuned for multi-class classification using supervised learning with Hugging Face Transformers and PyTorch.
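The training script and hyperparameters are not published with this card. The sketch below shows how such a fine-tuning run could look with the Trainer API, assuming a CSV dataset with "text" and "label" (0–3) columns; the file names, column names, and hyperparameters are all illustrative.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Hypothetical data files with "text" and "label" (0-3) columns; the actual
# training data layout used for this model is not published.
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "valid.csv"})

tokenizer = AutoTokenizer.from_pretrained("jhgan/ko-sroberta-multitask")
model = AutoModelForSequenceClassification.from_pretrained(
    "jhgan/ko-sroberta-multitask", num_labels=4
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# Illustrative hyperparameters only.
args = TrainingArguments(output_dir="sentence_type_classification",
                         num_train_epochs=3,
                         per_device_train_batch_size=32)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["validation"])
trainer.train()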

Model Sources

  • Repository: [More Information Needed]
  • Demo: [More Information Needed]

Uses

Direct Use

The model can be used to classify financial sentences (in Korean) into one of the following categories:

  • Predictive (예측형)
  • Inferential (추론형)
  • Factual (사실형)
  • Conversational (대화형)
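A quick way to try the model on these categories is the text-classification pipeline. Note that the label strings it returns come from the id2label mapping in the model's config.json, so whether they appear as category names or as generic LABEL_n ids depends on how the config was saved.

from transformers import pipeline

# Load the fine-tuned model through the text-classification pipeline.
classifier = pipeline("text-classification", model="kwoncho/sentence_type_classification")

# Example sentence: "The company announced its third-quarter results."
print(classifier("그 회사는 3분기 실적을 발표했다."))
# -> [{'label': ..., 'score': ...}]; label strings come from the model's id2label config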

Training Data

  • Dataset name: 문장 유형(추론, 예측 등) 판단 데이터 (sentence-type judgment data: inferential, predictive, etc.)
  • Source: AIHub

The dataset labels Korean financial sentences with one of the following four types:

  • 예측형 (Predictive)
  • 추론형 (Inferential)
  • 사실형 (Factual)
  • 대화형 (Conversational)
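How these four types map onto the model's numeric class ids is recorded in the model's config.json rather than in this card. Assuming the uploaded config populates id2label, the mapping can be inspected directly:

from transformers import AutoConfig

# Print the id -> label mapping shipped with the model (e.g. which id means 사실형).
config = AutoConfig.from_pretrained("kwoncho/sentence_type_classification")
print(config.id2label)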

Out-of-Scope Use

  • Not suitable for general-purpose Korean sentence classification outside financial or economic contexts.
  • May not perform well on informal or highly colloquial text.

Bias, Risks, and Limitations

  • The model may carry biases present in the training dataset.
  • Misclassifications could have downstream implications if used for investment recommendations or financial analysis without verification.

Recommendations

Use this model in conjunction with human oversight, especially for high-stakes or production-level applications.
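For high-stakes use, one common oversight pattern is to gate predictions on model confidence and route uncertain cases to a reviewer. A minimal sketch follows; the classify_with_review helper and the 0.7 threshold are hypothetical illustrations, not part of this model.

import torch
import torch.nn.functional as F

# Hypothetical review gate: return None for low-confidence predictions so a
# human can check them. The 0.7 threshold is illustrative, not a validated value.
def classify_with_review(model, tokenizer, text, threshold=0.7):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        probs = F.softmax(model(**inputs).logits, dim=-1)
    confidence, label_id = probs.max(dim=-1)
    if confidence.item() < threshold:
        return None  # defer to human review
    return label_id.item()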

How to Get Started with the Model

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classification model
tokenizer = AutoTokenizer.from_pretrained("kwoncho/sentence_type_classification")
model = AutoModelForSequenceClassification.from_pretrained("kwoncho/sentence_type_classification")

# Example sentence: "This stock is likely to decline in the short term."
text = "해당 종목은 단기적으로 하락할 가능성이 있습니다."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
predicted_class_id = outputs.logits.argmax(dim=-1).item()
Model size: 111M parameters
Tensor type: F32 (Safetensors)