shreyasmeher committed
Commit e845c90 · verified · 1 Parent(s): 8b78db9

Update README.md

Files changed (1):
  1. README.md +26 -28
README.md CHANGED
@@ -48,24 +48,24 @@ inference:
  **ConflLlama** is a large language model fine-tuned to classify conflict events from text descriptions. This repository contains the GGUF quantized models (q4_k_m, q8_0, and BF16) based on **Llama-3.1 8B**, which have been adapted for the specialized domain of political violence research.
 
  This model was developed as part of the research paper:
- **Meher, S., & Brandt, P. T. (2025). ConflLlama: Domain-specific adaptation of large language models for conflict event classification. [cite_start]*Research & Politics*, July-September 2025. [https://doi.org/10.1177/20531680251356282](https://doi.org/10.1177/20531680251356282)** [cite: 3, 6, 8]
 
  -----
 
  ### Key Contributions
 
- [cite_start]The ConflLlama project demonstrates how efficient fine-tuning of large language models can significantly advance the automated classification of political events[cite: 10]. The key contributions are:
 
- * [cite_start]**State-of-the-Art Performance**: Achieves a macro-averaged AUC of 0.791 and a weighted F1-score of 0.753, representing a 37.6% improvement over the base model[cite: 15].
- * [cite_start]**Efficient Domain Adaptation**: Utilizes Quantized Low-Rank Adaptation (QLORA) to fine-tune the Llama-3.1 8B model, making it accessible for researchers with consumer-grade hardware[cite: 51, 52, 73].
- * [cite_start]**Enhanced Classification**: Delivers accuracy gains of up to 1463% in challenging and rare event categories like "Unarmed Assault"[cite: 15, 166].
- * [cite_start]**Robust Multi-Label Classification**: Effectively handles complex events with multiple concurrent attack types, achieving a Subset Accuracy of 0.724[cite: 24, 180].
 
  -----
 
  ### Model Performance
 
- [cite_start]ConflLlama variants substantially outperform the base Llama-3.1 model in zero-shot classification[cite: 123]. The fine-tuned models show significant gains across all major metrics, demonstrating the effectiveness of domain-specific adaptation.
 
  | Model | Accuracy | Macro F1 | Weighted F1 | AUC |
  | :--- | :--- | :--- | :--- | :--- |
@@ -73,24 +73,22 @@ This model was developed as part of the research paper:
  | ConflLlama-Q4 | 0.729 | 0.286 | 0.718 | 0.749 |
  | Base Llama-3.1 | 0.346 | 0.012 | 0.369 | 0.575 |
 
- [cite_start]*Performance metrics are derived from Figures 2 and 3 in the research paper*[cite: 159, 175].
-
  The most significant improvements were observed in historically difficult-to-classify categories:
 
- * [cite_start]**Unarmed Assault**: 1463% improvement (F1-score from 0.035 to 0.553)[cite: 166, 190].
- * [cite_start]**Hostage Taking (Barricade)**: 692% improvement (F1-score from 0.045 to 0.353)[cite: 167, 190].
- * [cite_start]**Hijacking**: 527% improvement (F1-score from 0.100 to 0.629)[cite: 167, 190].
- * [cite_start]**Armed Assault**: 83.5% improvement (F1-score from 0.374 to 0.687)[cite: 171].
- * [cite_start]**Bombing/Explosion**: 65.4% improvement (F1-score from 0.549 to 0.908)[cite: 170].
 
  -----
 
  ### Model Architecture and Training
 
  * **Base Model**: `unsloth/llama-3-8b-bnb-4bit`
- * [cite_start]**Framework**: QLoRA (Quantized Low-Rank Adaptation) [cite: 51]
- * [cite_start]**Hardware**: NVIDIA A100-SXM4-40GB GPU on the Delta Supercomputer at NCSA[cite: 250].
- * [cite_start]**Optimizations**: 4-bit quantization, gradient checkpointing, and other memory-saving techniques were used to ensure the model could be trained and run on consumer-grade hardware (under 6 GB of VRAM)[cite: 273].
  * **LoRA Configuration**:
  * Rank (`r`): 8
  * Alpha (`lora_alpha`): 16
@@ -102,9 +100,9 @@ The most significant improvements were observed in historically difficult-to-classify categories:
 
  ### Training Data
 
- * [cite_start]**Dataset**: [Global Terrorism Database (GTD)](https://www.start.umd.edu/gtd/)[cite: 62]. [cite_start]The GTD contains systematic data on over 200,000 terrorist incidents[cite: 75].
- * [cite_start]**Time Period**: The training dataset consists of 171,514 events that occurred before January 1, 2017. The test set includes 38,192 events from 2017 onwards[cite: 90].
- * [cite_start]**Preprocessing**: The pipeline filters data by date, cleans text summaries, and combines primary, secondary, and tertiary attack types into a single multi-label field[cite: 89, 104, 105].
 
  <p align="center">
  <img src="images/preprocessing.png" alt="Data Preprocessing Pipeline" width="800"/>
@@ -114,7 +112,7 @@ The most significant improvements were observed in historically difficult-to-classify categories:
 
  ### Intended Use
 
- [cite_start]This model is designed for academic and research purposes within the fields of political science, conflict studies, and security analysis[cite: 16].
 
  1. **Classification of terrorist events** based on narrative descriptions.
  2. **Research** into patterns of political violence and terrorism.
@@ -122,16 +120,16 @@ The most significant improvements were observed in historically difficult-to-classify categories:
 
  ### Limitations
 
- 1. [cite_start]**Temporal Scope**: The model is trained on events prior to 2017 and may not fully capture novel or evolving attack patterns that have emerged since[cite: 89].
- 2. [cite_start]**Task-Specific Focus**: The model is specialized for **attack type classification** and is not designed for identifying perpetrators, locations, or targets[cite: 74].
  3. **Data Dependency**: Performance is dependent on the quality and detail of the input event descriptions.
- 4. [cite_start]**Semantic Ambiguity**: The model may occasionally struggle to distinguish between semantically close categories, such as 'Armed Assault' and 'Assassination,' when tactical details overlap[cite: 194, 195].
 
  ### Ethical Considerations
 
  1. The model is trained on sensitive data related to real-world terrorism and should be used responsibly.
  2. It is intended for research and analysis, **not for operational security decisions** or prognostications.
- 3. Outputs should be interpreted with an understanding of the data's context and the model's limitations. [cite_start]Over-classification can lead to resource misallocation in real-world scenarios[cite: 289].
 
  -----
 
@@ -169,10 +167,10 @@ If you use this model or the findings from the paper in your research, please cite:
 
  ### Acknowledgments
 
- * [cite_start]This research was supported by **NSF award 2311142**[cite: 250].
- * [cite_start]This work utilized the **Delta** system at the **NCSA (University of Illinois)** through ACCESS allocation **CIS220162**[cite: 250].
  * Thanks to the **Unsloth** team for their optimization framework and base model.
  * Thanks to **Hugging Face** for the model hosting and `transformers` infrastructure.
- * [cite_start]Thanks to the **Global Terrorism Database** team at the University of Maryland[cite: 258].
 
  <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>
 
  **ConflLlama** is a large language model fine-tuned to classify conflict events from text descriptions. This repository contains the GGUF quantized models (q4_k_m, q8_0, and BF16) based on **Llama-3.1 8B**, which have been adapted for the specialized domain of political violence research.
 
  This model was developed as part of the research paper:
+ **Meher, S., & Brandt, P. T. (2025). ConflLlama: Domain-specific adaptation of large language models for conflict event classification. *Research & Politics*, July-September 2025. [https://doi.org/10.1177/20531680251356282](https://doi.org/10.1177/20531680251356282)**
 
  -----
 
  ### Key Contributions
 
+ The ConflLlama project demonstrates how efficient fine-tuning of large language models can significantly advance the automated classification of political events. The key contributions are:
 
+ * **State-of-the-Art Performance**: Achieves a macro-averaged AUC of 0.791 and a weighted F1-score of 0.758, representing a 37.6% improvement over the base model.
+ * **Efficient Domain Adaptation**: Utilizes Quantized Low-Rank Adaptation (QLoRA) to fine-tune the Llama-3.1 8B model, making it accessible for researchers with consumer-grade hardware.
+ * **Enhanced Classification**: Delivers accuracy gains of up to 1463% in challenging and rare event categories like "Unarmed Assault".
+ * **Robust Multi-Label Classification**: Effectively handles complex events with multiple concurrent attack types, achieving a Subset Accuracy of 0.724.
 
  -----
 
  ### Model Performance
 
+ ConflLlama variants substantially outperform the base Llama-3.1 model in zero-shot classification. The fine-tuned models show significant gains across all major metrics, demonstrating the effectiveness of domain-specific adaptation.
 
  | Model | Accuracy | Macro F1 | Weighted F1 | AUC |
  | :--- | :--- | :--- | :--- | :--- |
  | ConflLlama-Q4 | 0.729 | 0.286 | 0.718 | 0.749 |
  | Base Llama-3.1 | 0.346 | 0.012 | 0.369 | 0.575 |
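
The metrics in the table can be reproduced from model predictions with scikit-learn. The sketch below uses toy labels (not the paper's evaluation data) purely to show how accuracy, macro F1, and weighted F1 are computed:

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy gold labels and predictions over GTD attack-type categories
# (illustrative only -- not the paper's evaluation data).
y_true = ["Bombing/Explosion", "Armed Assault", "Unarmed Assault", "Hijacking"]
y_pred = ["Bombing/Explosion", "Armed Assault", "Armed Assault", "Hijacking"]

accuracy = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")        # every class weighted equally
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # classes weighted by support

print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f} weighted_f1={weighted_f1:.3f}")
```

The gap between macro and weighted F1 in the table reflects class imbalance: macro F1 is dragged down by rare categories, while weighted F1 tracks the common ones.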
 
  The most significant improvements were observed in historically difficult-to-classify categories:
 
+ * **Unarmed Assault**: 1464% improvement (F1-score from 0.035 to 0.553).
+ * **Hostage Taking (Barricade)**: 692% improvement (F1-score from 0.045 to 0.353).
+ * **Hijacking**: 527% improvement (F1-score from 0.100 to 0.629).
+ * **Armed Assault**: 84% improvement (F1-score from 0.374 to 0.687).
+ * **Bombing/Explosion**: 65% improvement (F1-score from 0.549 to 0.908).
 
  -----
 
  ### Model Architecture and Training
 
  * **Base Model**: `unsloth/llama-3-8b-bnb-4bit`
+ * **Framework**: QLoRA (Quantized Low-Rank Adaptation)
+ * **Hardware**: NVIDIA A100-SXM4-40GB GPU on the Delta Supercomputer at NCSA.
+ * **Optimizations**: 4-bit quantization, gradient checkpointing, and other memory-saving techniques were used to ensure the model could be trained and run on consumer-grade hardware (under 6 GB of VRAM).
  * **LoRA Configuration**:
  * Rank (`r`): 8
  * Alpha (`lora_alpha`): 16
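
The adapter settings above map directly onto a `peft` `LoraConfig`. A minimal sketch — the target modules and dropout value are common QLoRA choices assumed for illustration, not taken from the paper:

```python
from peft import LoraConfig

# Rank and alpha come from the model card; target_modules and
# lora_dropout are assumed defaults, shown for illustration only.
lora_config = LoraConfig(
    r=8,               # LoRA rank
    lora_alpha=16,     # scaling factor
    lora_dropout=0.0,  # assumed; not specified above
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```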
 
  ### Training Data
 
+ * **Dataset**: [Global Terrorism Database (GTD)](https://www.start.umd.edu/gtd/). The GTD contains systematic data on over 200,000 terrorist incidents.
+ * **Time Period**: The training dataset consists of 171,514 events that occurred before January 1, 2017. The test set includes 38,192 events from 2017 onwards.
+ * **Preprocessing**: The pipeline filters data by date, cleans text summaries, and combines primary, secondary, and tertiary attack types into a single multi-label field.
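
The multi-label step in the preprocessing above can be sketched in a few lines; the field names (`attacktype1_txt`, etc.) follow the public GTD codebook, and the exact cleaning logic here is illustrative:

```python
def combine_attack_types(primary, secondary=None, tertiary=None):
    """Merge the GTD's primary/secondary/tertiary attack-type fields
    into one multi-label list, skipping empty fields and duplicates."""
    labels = []
    for attack in (primary, secondary, tertiary):
        if attack and attack not in labels:
            labels.append(attack)
    return labels

event = {
    "attacktype1_txt": "Bombing/Explosion",
    "attacktype2_txt": "Armed Assault",
    "attacktype3_txt": None,
}
print(combine_attack_types(event["attacktype1_txt"],
                           event["attacktype2_txt"],
                           event["attacktype3_txt"]))
# -> ['Bombing/Explosion', 'Armed Assault']
```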
 
  <p align="center">
  <img src="images/preprocessing.png" alt="Data Preprocessing Pipeline" width="800"/>
 
 
  ### Intended Use
 
+ This model is designed for academic and research purposes within the fields of political science, conflict studies, and security analysis.
 
  1. **Classification of terrorist events** based on narrative descriptions.
  2. **Research** into patterns of political violence and terrorism.
 
 
  ### Limitations
 
+ 1. **Temporal Scope**: The model is trained on events prior to 2017 and may not fully capture novel or evolving attack patterns that have emerged since.
+ 2. **Task-Specific Focus**: The model is specialized for **attack type classification** and is not designed for identifying perpetrators, locations, or targets.
  3. **Data Dependency**: Performance is dependent on the quality and detail of the input event descriptions.
+ 4. **Semantic Ambiguity**: The model may occasionally struggle to distinguish between semantically close categories, such as 'Armed Assault' and 'Assassination,' when tactical details overlap.
 
  ### Ethical Considerations
 
  1. The model is trained on sensitive data related to real-world terrorism and should be used responsibly.
  2. It is intended for research and analysis, **not for operational security decisions** or prognostications.
+ 3. Outputs should be interpreted with an understanding of the data's context and the model's limitations. Over-classification can lead to resource misallocation in real-world scenarios.
 
  -----
 
 
 
  ### Acknowledgments
 
+ * This research was supported by **NSF award 2311142**.
+ * This work utilized the **Delta** system at the **NCSA (University of Illinois)** through ACCESS allocation **CIS220162**.
  * Thanks to the **Unsloth** team for their optimization framework and base model.
  * Thanks to **Hugging Face** for the model hosting and `transformers` infrastructure.
+ * Thanks to the **Global Terrorism Database** team at the University of Maryland.
 
  <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>
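
For classification, the GGUF builds listed at the top of this card expect a short instruction-style prompt around the event description. A minimal prompt-builder sketch — the instruction wording is illustrative, not the paper's exact template:

```python
def build_prompt(description: str) -> str:
    """Assemble a classification prompt for a GTD-style event summary.
    The instruction wording here is illustrative, not the exact
    template used to fine-tune ConflLlama."""
    return (
        "Classify the attack type(s) of the following event description.\n"
        f"Description: {description.strip()}\n"
        "Attack type(s):"
    )

print(build_prompt("Assailants detonated an explosive device near a convoy."))
```

The resulting string can be passed to any GGUF runtime, for example `llama-cpp-python`'s `Llama(model_path=...)` interface or the `llama.cpp` CLI, with a low temperature for deterministic labels.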