Update README.md (#8)

- Update README.md (a087c76b076f44db0fa2c2e56b4a30ad3b67f1ab)

Co-authored-by: Vaibhav Srivastav

Files changed (1)

README.md CHANGED

@@ -67,22 +67,32 @@ Try out Bark yourself!
 </a>
-## 🤗 Transformers Usage
 You can run Bark locally with the 🤗 Transformers library from version 4.31.0 onwards.
-1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main:
 ```
-pip install git+https://github.com/huggingface/transformers.git
 ```
-2. Run the following Python code to generate speech samples:
 ```python
-from transformers import AutoProcessor, AutoModel
 processor = AutoProcessor.from_pretrained("suno/bark-small")
 model = AutoModel.from_pretrained("suno/bark-small")
@@ -95,7 +105,7 @@ inputs = processor(
 speech_values = model.generate(**inputs, do_sample=True)
 ```
-3. Listen to the speech samples either in an ipynb notebook:
 ```python
 from IPython.display import Audio
@@ -109,7 +119,7 @@ Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
 ```python
 import scipy
-sampling_rate = model.generation_config.sample_rate
 scipy.io.wavfile.write("bark_out.wav", rate=sampling_rate, data=speech_values.cpu().numpy().squeeze())
 ```

 </a>
 You can run Bark locally with the 🤗 Transformers library from version 4.31.0 onwards.
+1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) and scipy:
 ```
+pip install --upgrade pip
+pip install --upgrade transformers scipy
 ```
+2. Run inference via the `Text-to-Speech` (TTS) pipeline. You can run the Bark model through the TTS pipeline in just a few lines of code!
 ```python
+from transformers import pipeline
+import scipy
+synthesiser = pipeline("text-to-speech", "suno/bark-small")
+speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})
+scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=speech["audio"])
+```
+3. Run inference via the Transformers modelling code. For more fine-grained control, you can use the processor and `generate` to convert text into a mono 24 kHz speech waveform.
+```python
+from transformers import AutoProcessor, AutoModel
 processor = AutoProcessor.from_pretrained("suno/bark-small")
 model = AutoModel.from_pretrained("suno/bark-small")
 speech_values = model.generate(**inputs, do_sample=True)
 ```
+4. Listen to the speech samples either in an ipynb notebook:
 ```python
 from IPython.display import Audio
 ```python
 import scipy
+sampling_rate = model.config.sample_rate
 scipy.io.wavfile.write("bark_out.wav", rate=sampling_rate, data=speech_values.cpu().numpy().squeeze())
 ```
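
If `scipy` is not available, the saving step can also be sketched with Python's standard-library `wave` module. The example below is a minimal sketch, not part of the Bark README: the helper name `write_pcm16_wav`, the output file name, and the conversion to 16-bit PCM are assumptions made for illustration, and a synthetic tone stands in for the model output so the snippet runs without downloading any weights.

```python
import math
import struct
import wave


def write_pcm16_wav(path, samples, sampling_rate=24_000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; the 24 kHz default mirrors the
    mono 24 kHz waveform mentioned in the README text.
    """
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range so out-of-range values do not wrap around.
        clipped = max(-1.0, min(1.0, float(sample)))
        # Scale to the signed 16-bit range and pack as little-endian.
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for model output: one second of a 440 Hz tone at 24 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24_000) for n in range(24_000)]
write_pcm16_wav("bark_out_stdlib.wav", tone)
```

With real model output, the flattened waveform would be passed in place of `tone`, for example `speech_values.cpu().numpy().squeeze().tolist()`, assuming `speech_values` holds a single generated sample as in the snippet above.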