license: other
language:
  - ja
base_model:
  - tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3
  - Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: visual-question-answering

Llama-3.1-70B-Instruct-multimodal-JP-Graph - Built with Llama

Llama-3.1-70B-Instruct-multimodal-JP-Graph is a Japanese large vision-language model. It combines the Llama-3.1-Swallow-70B language model with the image encoder of Qwen2-VL-7B.

How to use

1. Install LLaVA-NeXT

  • First, install LLaVA-NeXT by following the instructions in its repository (https://github.com/LLaVA-VL/LLaVA-NeXT):
git clone https://github.com/LLaVA-VL/LLaVA-NeXT
cd LLaVA-NeXT
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"

2. Install dependencies

pip install flash-attn==2.6.3
pip install transformers==4.45.2
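
Before moving on, it can help to confirm that the environment actually resolved to the pinned versions. The following is a minimal sketch, not part of the original instructions: the helper name `check_pins` is hypothetical, and it assumes the packages are registered under the distribution names `flash-attn` and `transformers`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the two pip commands above.
EXPECTED = {"flash-attn": "2.6.3", "transformers": "4.45.2"}


def check_pins(expected=EXPECTED, get_version=version):
    """Return {package: (installed_or_None, expected)} for every mismatch.

    `check_pins` is a hypothetical helper for illustration; `get_version`
    is injectable so the check can be exercised without the real packages.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            have = get_version(name)
        except PackageNotFoundError:
            have = None  # package is not installed in this environment
        if have != want:
            mismatches[name] = (have, want)
    return mismatches
```

Calling `check_pins()` inside the `llava` conda environment returns an empty dict when both pins match, and otherwise lists each package with its installed and expected version.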

3. Modify LLaVA-NeXT

  • Modify the LLaVA-NeXT code as follows.
    • Create the LLaVA-NeXT/llava/model/multimodal_encoder/qwen2_vl directory and copy the contents of the attached qwen2_vl directory into it.
    • Overwrite LLaVA-NeXT/llava/model/multimodal_encoder/builder.py with the attached "builder.py".
    • Copy the attached "qwen2vl_encoder.py" into LLaVA-NeXT/llava/model/multimodal_encoder/.
    • Overwrite LLaVA-NeXT/llava/model/language_model/llava_llama.py with the attached "llava_llama.py".
    • Overwrite LLaVA-NeXT/llava/model/llava_arch.py with the attached "llava_arch.py".
    • Overwrite LLaVA-NeXT/llava/conversation.py with the attached "conversation.py".
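
The copy and overwrite steps above can also be scripted. This is a rough sketch under stated assumptions: the attached files are assumed to have been downloaded into one flat local directory (here called `attached_dir`, a hypothetical name), and `apply_modifications` is an illustrative helper, not something shipped with this model or with LLaVA-NeXT.

```python
import shutil
from pathlib import Path

# Attached file name -> destination relative to LLaVA-NeXT/llava,
# following the bullet list above.
FILE_TARGETS = {
    "builder.py": "model/multimodal_encoder/builder.py",
    "qwen2vl_encoder.py": "model/multimodal_encoder/qwen2vl_encoder.py",
    "llava_llama.py": "model/language_model/llava_llama.py",
    "llava_arch.py": "model/llava_arch.py",
    "conversation.py": "conversation.py",
}


def apply_modifications(attached_dir, repo_dir):
    """Copy the attached files into a LLaVA-NeXT checkout.

    Assumption: `attached_dir` holds the qwen2_vl directory and the five
    .py files side by side. Existing files in the checkout are overwritten.
    """
    attached = Path(attached_dir)
    llava = Path(repo_dir) / "llava"

    # Create multimodal_encoder/qwen2_vl and copy the attached contents into it.
    shutil.copytree(
        attached / "qwen2_vl",
        llava / "model" / "multimodal_encoder" / "qwen2_vl",
        dirs_exist_ok=True,
    )

    copied = []
    for name, rel in FILE_TARGETS.items():
        dest = llava / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(attached / name, dest)  # overwrites if dest exists
        copied.append(dest)
    return copied
```

For example, `apply_modifications("attached", "LLaVA-NeXT")` would apply all six steps at once, provided the files were saved under `attached/`. Running `git status` in the checkout afterwards is an easy way to review what changed.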

4. Inference

The following script loads the model and runs inference.