---
license: apache-2.0
base_model:
- black-forest-labs/FLUX.1-dev
base_model_relation: quantized
pipeline_tag: text-to-image
---

# Elastic model: Fastest self-serving models. FLUX.1-dev.

Elastic models are models produced by TheStage AI ANNA: Automated Neural Networks Accelerator. ANNA lets you control model size, latency, and quality with a simple slider movement. For each model, ANNA produces a series of optimized models:

* __XL__: Mathematically equivalent neural network, optimized with our DNN compiler.

* __L__: Near-lossless model, with less than 1% degradation on the corresponding benchmarks.

* __M__: Faster model, with accuracy degradation of less than 1.5%.

* __S__: The fastest model, with accuracy degradation of less than 2%.

__Goals of Elastic Models:__

* Provide the fastest models and service for self-hosting.
* Provide flexibility in cost vs quality selection for inference.
* Provide clear quality and latency benchmarks.
* Provide the interface of HF libraries (transformers and diffusers) with a single line of code.
* Provide models supported on a wide range of hardware, pre-compiled and requiring no JIT.

> It's important to note that the specific quality degradation can vary from model to model. For instance, an S model can show as little as 0.5% degradation.

-----

## Inference

Currently, our demo model only supports 1024x1024 outputs without batching. This will be updated in the near future.
To run inference with our models, you just need to replace the `diffusers` import with `elastic_models.diffusers`:

```python
import torch
from elastic_models.diffusers import FluxPipeline

model_name = 'black-forest-labs/FLUX.1-dev'
hf_token = ''
device = torch.device("cuda")

# Load the elastic FLUX.1-dev pipeline; `mode` selects the S/M/L/XL variant.
pipeline = FluxPipeline.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    token=hf_token,
    mode='S'
)
pipeline.to(device)

prompts = ["Kitten eating a banana"]
output = pipeline(prompt=prompts)

# Save each generated image, using the prompt as the file name.
for prompt, output_image in zip(prompts, output.images):
    output_image.save(prompt.replace(' ', '_') + '.png')
```
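
Continuing from the example above: since the pipeline mirrors the `diffusers` interface, the usual FLUX generation arguments are expected to pass through unchanged. The snippet below is a minimal sketch under that assumption (it is not an officially documented part of `elastic_models`); note that the demo currently only supports 1024x1024 outputs:

```python
# Sketch: standard diffusers-style generation arguments (assumption: the
# elastic pipeline forwards them to the underlying FluxPipeline unchanged).
generator = torch.Generator(device="cuda").manual_seed(42)  # reproducible seed

image = pipeline(
    prompt="Kitten eating a banana",
    height=1024, width=1024,     # the demo currently supports 1024x1024 only
    num_inference_steps=28,      # fewer steps -> faster, lower fidelity
    guidance_scale=3.5,          # FLUX.1-dev default guidance
    generator=generator,
).images[0]
image.save("kitten_seed42.png")
```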
					
						

### Installation

__System requirements:__
* GPUs: H100, L40s, B200
* CPU: AMD, Intel
* Python: 3.10-3.12

To work with our models, just run these lines in your terminal:

```shell
pip install thestage
pip install elastic_models[nvidia] \
  --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple \
  --extra-index-url https://pypi.nvidia.com \
  --extra-index-url https://pypi.org/simple

# or for Blackwell support
pip install elastic_models[blackwell] \
  --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple \
  --extra-index-url https://pypi.nvidia.com \
  --extra-index-url https://pypi.org/simple

pip install flash_attn==2.7.3 --no-build-isolation
pip uninstall apex
```
					
						

Then go to [app.thestage.ai](https://app.thestage.ai), log in, and generate an API token from your profile page. Set up the API token as follows:

```shell
thestage config set --api-token <YOUR_API_TOKEN>
```

Congrats, now you can use accelerated models!

----

## Benchmarks

Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for models using our algorithms.

### Quality benchmarks

For quality evaluation we used PSNR, SSIM, and CLIP score. PSNR and SSIM were computed against the outputs of the original model.

| Metric/Model  | S     | M     | L     | XL    | Original |
|---------------|-------|-------|-------|-------|----------|
| PSNR          | 30.22 | 30.24 | 30.38 | inf   | inf      |
| SSIM          | 0.72  | 0.72  | 0.76  | 1.0   | 1.0      |
| CLIP          | 12.49 | 12.51 | 12.69 | 12.41 | 12.41    |
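
For reference, the sketch below shows one way PSNR and SSIM can be reproduced for a single pair of images (original model output vs. elastic model output for the same prompt and seed). It assumes `scikit-image` and `Pillow` are installed and is illustrative only, not the exact evaluation script behind the table above:

```python
# Minimal sketch: compare an elastic-model output against the original model's
# output for the same prompt/seed (assumption: not the exact benchmark script).
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

original = np.array(Image.open("original_model_output.png").convert("RGB"))
elastic = np.array(Image.open("elastic_model_output.png").convert("RGB"))

psnr = peak_signal_noise_ratio(original, elastic, data_range=255)
ssim = structural_similarity(original, elastic, channel_axis=2, data_range=255)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.3f}")
```

PSNR is infinite when two images are identical, which is why the XL column matches the original exactly.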
					
						

### Latency benchmarks

Time in seconds to generate one 1024x1024 image:

| GPU/Model        | S    | M    | L    | XL   | Original |
|------------------|------|------|------|------|----------|
| H100             | 2.71 | 3.0  | 3.18 | 4.17 | 6.46     |
| L40s             | 8.5  | 9.29 | 9.29 | 13.2 | 16       |
| B200             | 1.89 | 2.04 | 2.12 | 2.23 | 4.4      |
| GeForce RTX 5090 | 5.53 | -    | -    | -    | -        |
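
For context, the snippet below is a minimal sketch of how per-image latency can be measured with the `pipeline` object from the inference example above; the warm-up pass and CUDA synchronization points are our assumptions rather than the exact benchmark setup used for the table:

```python
# Minimal latency-measurement sketch (assumptions: one warm-up pass, default
# sampler settings, CUDA-synchronized wall-clock timing).
import time
import torch

prompt = "Kitten eating a banana"
pipeline(prompt=prompt)  # warm-up pass (compilation / caching effects)

torch.cuda.synchronize()
start = time.perf_counter()
pipeline(prompt=prompt)  # timed 1024x1024 generation
torch.cuda.synchronize()
print(f"One image took {time.perf_counter() - start:.2f} s")
```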
					
						

## Links

* __Platform__: [app.thestage.ai](https://app.thestage.ai)
<!-- * __Elastic models Github__: [app.thestage.ai](app.thestage.ai) -->
* __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
* __Contact email__: [email protected]