Azzindani committed
Commit fe1cb3b · verified · 1 Parent(s): 19f06e3

Upload README.md with huggingface_hub

Files changed (1): README.md (+134 -12)
README.md CHANGED
@@ -1,21 +1,143 @@
  ---
- base_model: unsloth/deepseek-r1-0528-qwen3-8b-unsloth-bnb-4bit
  tags:
- - text-generation-inference
- - transformers
  - unsloth
- - qwen3
  license: apache-2.0
- language:
- - en
  ---

- # Uploaded finetuned model

- - **Developed by:** Azzindani
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/deepseek-r1-0528-qwen3-8b-unsloth-bnb-4bit

- This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  ---
+ library_name: transformers
  tags:
  - unsloth
  license: apache-2.0
+ datasets:
+ - Azzindani/Indonesian_Legal_QA
+ base_model:
+ - deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
  ---

+ 📧 [GitHub](https://github.com/azzindani)
+ 🔗 [LinkedIn](https://www.linkedin.com/in/azzindan1/)

+ # 🧠 DeepSeek-R1-0528-Qwen3-8B Instruct Fine-Tuned with GRPO on an Indonesian Legal QA Dataset

+ Welcome! This repository hosts a **fine-tuned version of [DeepSeek-R1-0528-Qwen3-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B)**, trained with **Group Relative Policy Optimization (GRPO)** on a custom **Indonesian Legal Q&A dataset**. The goal is to enhance the model's reasoning and **structured thinking** capabilities for legal question-answering tasks.
+ You can try the demo [here](https://www.kaggle.com/code/azzindani/indonesian-legal-assistant-preview).
+
+ ---
+
+ ## 🚀 Model Summary
+
+ * **Purpose**: Research and Development
+ * **Base Model**: DeepSeek-R1-0528-Qwen3-8B
+ * **Language**: Bahasa Indonesia 🇮🇩
+ * **Domain**: Legal / Law (Q&A format)
+ * **Goal**: Improve structured legal reasoning in the Indonesian legal context
+
+ ---
+
+ ## 🏋️ Training Summary
+
+ * **Fine-tuning Method**: Group Relative Policy Optimization (GRPO) combined with Knowledge Distillation
+ * **Pipeline**: Cloud-to-cloud training
+ * **Dataset**: **Indonesian Legal Questions and Answers** ([pertanyaan hukum](https://huggingface.co/datasets/Azzindani/Indonesian_Legal_QA)); a loading sketch follows this list
+ * **Compute**: NVIDIA RTX 6000 Ada
+ * **Provider**: [vast.ai](https://cloud.vast.ai/)
+ * **Training Steps**: 2000
+ * **Number of Generations**: 16
+ * **Cost**: 50 USD
+ * **Distilled Knowledge**: [DeepSeek_0528_8B_Legal_Distill](https://huggingface.co/datasets/Azzindani/DeepSeek_0528_8B_Legal_Distill)
+
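+ The snippet below is a minimal, illustrative sketch of pulling that training data from the Hugging Face Hub with the `datasets` library; the `train` split and the record layout are assumptions, so check the dataset card for the actual schema.
+
+ ```python
+ from datasets import load_dataset
+
+ # Pull the Indonesian legal Q&A dataset referenced above from the Hugging Face Hub
+ legal_qa = load_dataset('Azzindani/Indonesian_Legal_QA')
+
+ # Inspect the splits and columns before relying on a particular schema
+ print(legal_qa)
+ print(legal_qa['train'][0])  # 'train' split assumed; adjust if the dataset card differs
+ ```
+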
+ ---
+
+ ## 🧩 What is GRPO?
+
+ **Group Relative Policy Optimization (GRPO)** is a reinforcement learning fine-tuning technique that:
+
+ * Samples a group of candidate answers for each prompt (here, a legal question)
+ * Scores every answer with a reward and compares it to the group average (sketched below), rather than to a separately trained value model
+ * Optimizes the policy toward answers that beat their group baseline, promoting **structured and relative improvements**, not just raw accuracy
+
+ This method leads to:
+
+ * **Better structured answers**
+ * **Improved logical flow**
+ * **Greater consistency** in domain-specific reasoning (e.g., answering legal queries with relevant laws and regulations)
+
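+ To make the "relative to the group" idea concrete, here is a minimal, self-contained sketch of computing group-normalized advantages from per-answer rewards. It illustrates the idea only; it is not the training code or the reward functions used for this model.
+
+ ```python
+ from statistics import mean, pstdev
+
+ def group_relative_advantages(rewards: list[float]) -> list[float]:
+     # All rewards belong to answers sampled from the same prompt (one GRPO group).
+     # Each answer's advantage measures how much better or worse it is than its peers.
+     mu = mean(rewards)
+     sigma = pstdev(rewards) or 1e-6  # guard against identical rewards in a group
+     return [(r - mu) / sigma for r in rewards]
+
+ # Example: four sampled answers to one legal question, scored by a reward function
+ print(group_relative_advantages([0.2, 0.9, 0.4, 0.9]))
+ # Answers above the group average get positive advantages and are reinforced;
+ # answers below it get negative advantages and are discouraged.
+ ```
+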
+ ---
+
+ ## 🧠 Structured Thinking Enabled
+
+ The fine-tuned model is trained to think in **steps** using GRPO:
+
+ 1. **Understand the legal context**
+ 2. **Identify the relevant law**
+ 3. **Apply reasoning with facts**
+ 4. **Summarize the legal conclusion clearly**
+
+ This mimics how **law students or practitioners approach** legal cases, making the model suitable for:
+
+ * Law education
+ * Legal chatbot assistants
+ * Indonesian legal exam prep
+
+ ---
+
+ ## 💻 How to Use
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+
+ # Load the tokenizer and model
+ model_name = 'Azzindani/Deepseek_ID_Legal_Preview'
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map = 'auto', torch_dtype = torch.float16)
+
+ # System prompt (in Indonesian): "You are an AI assistant expert in Indonesian law ...
+ # think and answer in Indonesian, wrap your reasoning in <think>...</think>, keep the
+ # final answer clear, concise, and professional, cite the relevant Indonesian regulations,
+ # and always recommend consulting a legal professional for final decisions."
+ SYSTEM_PROMPT = '''
+ Anda adalah asisten AI yang ahli di bidang hukum Indonesia. Anda dapat membantu konsultasi hukum, menjawab pertanyaan, dan memberikan analisis berdasarkan peraturan perundang-undangan yang relevan.
+ Untuk setiap respons, Anda harus berfikir dan menjawab dengan Bahasa Indonesia, serta gunakan format:
+ <think>
+ ...
+ </think>
+ Tuliskan jawaban akhir secara jelas, ringkas, profesional, dan berempati jika diperlukan. Gunakan bahasa hukum yang mudah dipahami. Sertakan referensi hukum Indonesia yang relevan. Selalu rekomendasikan konsultasi dengan ahli hukum untuk keputusan final.
+ '''
+
+ # Example question (in Indonesian): "Is there a law governing the division of shares/profit
+ # when founding a company, and what share does the person who had the idea receive if they
+ # contribute no capital and only run the business?"
+ # Another question you could try: 'Apa dasar hukum pemecatan PNS di Indonesia?'
+ # ("What is the legal basis for dismissing a civil servant in Indonesia?")
+ prompt = '''
+ Adakah hukumnya yang mengatur pembagian persentase/laba dalam mendirikan suatu perusahaan?
+ Dan berapa persenkah yang didapat oleh si pemilik ide untuk mendirikan perusahaan,
+ jika dia tidak menyetor modal sedikit pun atau hanya menjalankan saja?
+ '''
+
+ # Build the chat and apply the model's chat template
+ conversation = [
+     {'role' : 'system', 'content' : SYSTEM_PROMPT},
+     {'role' : 'user', 'content' : prompt}
+ ]
+
+ input_ids = tokenizer.apply_chat_template(
+     conversation,
+     tokenize = True,
+     add_generation_prompt = True,
+     return_tensors = 'pt'
+ ).to(model.device)
+
+ # Stream tokens to stdout as they are generated
+ streamer = TextStreamer(tokenizer, skip_prompt = True, skip_special_tokens = True)
+
+ result = model.generate(
+     input_ids,
+     streamer = streamer,
+     max_new_tokens = 2048,
+     do_sample = True,
+     temperature = 0.7,
+     min_p = 0.1,
+     top_p = 1.0,
+     top_k = 20
+ )
+
+ # The streamer already prints the answer; decode it again if you need it as a string
+ response = tokenizer.decode(result[0][input_ids.shape[-1]:], skip_special_tokens = True)
+ ```
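+
+ Because the system prompt asks the model to wrap its reasoning in `<think> ... </think>`, the decoded `response` from the snippet above can be split into the reasoning trace and the final answer. Below is a minimal sketch; the helper name is hypothetical, and it assumes the model followed the tag format (depending on the chat template, only the closing `</think>` may appear in the decoded text).
+
+ ```python
+ def split_reasoning(generated_text: str) -> tuple[str, str]:
+     # The closing tag is the reliable separator; the opening <think> may already
+     # be part of the generation prompt, so it might not appear in the output.
+     if '</think>' not in generated_text:
+         return '', generated_text.strip()
+     reasoning, _, answer = generated_text.partition('</think>')
+     return reasoning.replace('<think>', '').strip(), answer.strip()
+
+ reasoning, answer = split_reasoning(response)
+ print(answer)
+ ```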
+
+ ---
+
+ ## 🤝 Acknowledgements
+
+ * [DeepSeek team](https://huggingface.co/deepseek-ai)
+ * [GRPO research paper](https://arxiv.org/abs/2402.03300)
+
+ ---