Improve Metadata and add Paper/Github links (#1)
- Improve Metadata and add Paper/Github links (4e8d1f3edd9e2007995a9711acc15026ff468964)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,14 +1,19 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - OpenGVLab/InternVL2_5-8B
-
+language:
+- en
+license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
+datasets:
+- ayeshaishaq/DriveLMMo1
 ---
 
 **DriveLMM-o1: A Large Multimodal Model for Autonomous Driving Reasoning**
 
+[Paper](https://arxiv.org/abs/2503.10621)
+
 DriveLMM-o1 is a fine-tuned large multimodal model designed for autonomous driving. Built on InternVL2.5-8B with LoRA-based adaptation, it leverages stitched multiview images to produce step-by-step reasoning. This structured approach enhances both final decision accuracy and interpretability in complex driving tasks like perception, prediction, and planning.
 
 **Key Features:**
@@ -57,6 +62,8 @@ tokenizer = AutoTokenizer.from_pretrained(
 
 For detailed usage instructions and additional configurations, please refer to the [OpenGVLab/InternVL2_5-8B](https://huggingface.co/OpenGVLab/InternVL2_5-8B) repository.
 
+Code: [https://github.com/Vision-CAIR/DriveLMM](https://github.com/Vision-CAIR/DriveLMM)
+
 
 **Limitations:**
-While DriveLMM-o1 demonstrates strong performance in autonomous driving tasks, it is fine-tuned for domain-specific reasoning. Users may need to further fine-tune or adapt the model for different driving environments.
+While DriveLMM-o1 demonstrates strong performance in autonomous driving tasks, it is fine-tuned for domain-specific reasoning. Users may need to further fine-tune or adapt the model for different driving environments.
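For context, the second hunk header shows that the model card's usage snippet loads the checkpoint with `AutoTokenizer.from_pretrained(`. Below is a minimal loading sketch in that spirit, assuming the published checkpoint follows the standard InternVL2.5 transformers convention; the repository id used here is a placeholder that this diff does not confirm, and image preprocessing plus generation follow the base model card.

```python
# Minimal loading sketch, not the card's verbatim snippet.
# The repo id below is an assumption; substitute the actual DriveLMM-o1
# checkpoint id. InternVL2.5-based checkpoints ship custom modeling code,
# hence trust_remote_code=True.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "ayeshaishaq/DriveLMM-o1"  # hypothetical id, adjust as needed

# Load the multimodal model in bfloat16 on GPU for inference.
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()

# Load the matching tokenizer (slow tokenizer, as in the InternVL examples).
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
    use_fast=False,
)

# Preparing pixel_values from the stitched multiview image and running the
# chat-style generation follow the base model card:
# https://huggingface.co/OpenGVLab/InternVL2_5-8B
```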