---
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- mistral
- inferentia2
- neuron
---
					
						
# Neuronx model for [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)

This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
You can find detailed information about the base model on its [Model Card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).

This model has been exported to the `neuron` format using the specific `input_shapes` and `compiler_args` parameters detailed in the sections below.

Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.

					
						
## Usage on Amazon SageMaker

_coming soon_

					
						
## Usage with 🤗 `optimum-neuron`

```python
>>> from optimum.neuron import pipeline

>>> p = pipeline('text-generation', 'aws-neuron/Mistral-7B-Instruct-v0.1-neuron-1x2048-2-cores')
>>> p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
[{'generated_text': 'My favorite place on earth is the ocean. It is where I feel most
at peace. I love to travel and see new places. I have a'}]
```

This repository contains tags specific to versions of `neuronx`. When using 🤗 `optimum-neuron`, load the repo revision that matches the version of `neuronx` you have installed, so that the right serialized checkpoints are loaded.

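As a minimal sketch of picking a revision, the snippet below lists the tags of this repository and loads the one that matches a given `neuronx` version. It assumes that tags are named exactly after the `neuronx` version (for example `2.16.0`, a hypothetical value); check the actual tag names in the repository before relying on this. The helper names `pick_revision`, `list_tags` and `load_model` are illustrative and not part of `optimum-neuron`.

```python
MODEL_ID = "aws-neuron/Mistral-7B-Instruct-v0.1-neuron-1x2048-2-cores"


def pick_revision(available_tags, neuronx_version):
    """Return the tag to load for the installed neuronx version.

    Assumption: tags are named exactly after the neuronx version.
    """
    if neuronx_version in available_tags:
        return neuronx_version
    raise ValueError(
        f"No tag matching neuronx {neuronx_version!r}; available: {sorted(available_tags)}"
    )


def list_tags(repo_id=MODEL_ID):
    """List the tag names of a Hub repository (performs a network call)."""
    from huggingface_hub import list_repo_refs  # imported lazily

    return [tag.name for tag in list_repo_refs(repo_id).tags]


def load_model(revision):
    """Load the serialized checkpoints for one revision (needs a Neuron device)."""
    from optimum.neuron import NeuronModelForCausalLM  # imported lazily

    return NeuronModelForCausalLM.from_pretrained(MODEL_ID, revision=revision)
```

On an Inferentia2 instance this could be used as `load_model(pick_revision(list_tags(), "2.16.0"))`, with the version string replaced by the one actually installed.
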
					
						
## Arguments passed during export

**input_shapes**

```json
{
  "batch_size": 1,
  "sequence_length": 2048
}
```
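Because these shapes are fixed at export time, a request has to fit within them. The sketch below assumes that `sequence_length` bounds the prompt and the generated tokens combined, which is how the `optimum-neuron` documentation describes it; the helper `max_new_tokens_budget` is a hypothetical name used only for illustration.

```python
INPUT_SHAPES = {"batch_size": 1, "sequence_length": 2048}


def max_new_tokens_budget(prompt_tokens, requested,
                          sequence_length=INPUT_SHAPES["sequence_length"]):
    """Clamp a requested number of new tokens to the room left after the prompt.

    Assumption: prompt tokens + generated tokens <= sequence_length.
    """
    if prompt_tokens >= sequence_length:
        raise ValueError(
            f"Prompt has {prompt_tokens} tokens, but the model was exported "
            f"for sequences of at most {sequence_length} tokens"
        )
    return min(requested, sequence_length - prompt_tokens)
```

The prompt length would typically come from the tokenizer, for example `len(tokenizer(prompt)["input_ids"])`, and the result can be passed as `max_new_tokens`.
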
					
						

**compiler_args**

```json
{
  "auto_cast_type": "bf16",
  "num_cores": 2
}
```
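
For reference, an export with the arguments listed above could look roughly like the following sketch. It follows the export pattern shown in the `optimum-neuron` documentation (`export=True` plus the input shapes and compiler arguments as keyword arguments); the exact command used to produce this repository is not stated here, so treat the helper names `export_kwargs` and `export_model` as illustrative.

```python
INPUT_SHAPES = {"batch_size": 1, "sequence_length": 2048}
COMPILER_ARGS = {"auto_cast_type": "bf16", "num_cores": 2}


def export_kwargs():
    """Merge the input shapes and compiler arguments into from_pretrained kwargs."""
    return {"export": True, **INPUT_SHAPES, **COMPILER_ARGS}


def export_model(model_id="mistralai/Mistral-7B-Instruct-v0.1"):
    """Compile the base model for Neuron (slow; requires the Neuron SDK)."""
    from optimum.neuron import NeuronModelForCausalLM  # imported lazily

    return NeuronModelForCausalLM.from_pretrained(model_id, **export_kwargs())
```

Loading the precompiled checkpoints from this repository does not require `export=True`; the export path only matters when different shapes or compiler arguments are needed.
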