The model card currently states "Gemma 3 models are multimodal, handling text and image input and generating text output".
This appears overly broad, because not all Gemma 3 model sizes support image input (the smaller 270M and 1B variants are text-only).