---
library_name: transformers
license: cc-by-sa-4.0
datasets:
- classla/ParlaSpeech-RS
- classla/ParlaSpeech-HR
- classla/Mici_Princ
language:
- sl
- hr
- sr
metrics:
- accuracy
base_model:
- facebook/w2v-bert-2.0
---
# Model Card
This model annotates primary stress in words, predicting a binary label for every 20 ms frame of audio.
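Concretely, an output frame index maps to time by multiplying by 0.02 s. A minimal, purely illustrative sketch of that mapping (the helper name is an assumption, not part of the model API):

```python
FRAME_S = 0.020  # each output label covers one 20 ms frame

def frame_to_time(frame_index: int) -> float:
    """Start time in seconds of a given output frame (illustrative helper)."""
    return frame_index * FRAME_S

print(frame_to_time(17))  # 0.34, e.g. the onset of a stressed region
```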
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This is the model card of a 🤗 transformers model that has been pushed to the Hub, released under the Creative Commons Attribution-ShareAlike 4.0 (cc-by-sa-4.0) license.
- **Developed by:** Peter Rupnik, Nikola Ljubešić, Ivan Porupski, Nejc Robida
- **Model type:** Audio frame classifier
- **Language(s) (NLP):** Croatian, Slovenian, Serbian, and the Chakavian variant of Croatian
- **License:** cc-by-sa-4.0
- **Finetuned from model [optional]:** [facebook/w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0)
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Paper [optional]:** Coming soon
### Direct Use
The model is intended for data-driven analyses of primary stress position. So far, it has been shown to work on four datasets in three languages.
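Since the model outputs stress intervals in seconds (see the example below), a typical analysis quantity is the relative position of the stress onset within the word. A minimal sketch, assuming you already have a predicted interval and the word's duration; the function name is illustrative, not part of this repository:

```python
def relative_stress_onset(interval: tuple[float, float], word_duration_s: float) -> float:
    """Stress onset as a fraction of word duration (0 = word start, 1 = word end)."""
    start_s, _end_s = interval
    return start_s / word_duration_s

# E.g. a stress interval of (0.34, 0.4) s in a word lasting 0.8 s:
print(relative_stress_onset((0.34, 0.4), 0.8))  # 0.425
```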
## Example use
```python
import numpy as np
import pandas as pd
import torch
from itertools import pairwise

from datasets import Audio, Dataset
from transformers import AutoFeatureExtractor, Wav2Vec2BertForAudioFrameClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "5roop/Wav2Vec2BertPrimaryStressAudioFrameClassifier"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2BertForAudioFrameClassification.from_pretrained(model_name).to(device)

# Path to the file containing the word to be annotated:
f = "wavs/word.wav"


def frames_to_intervals(frames: list[int]) -> list[tuple[float, float]] | None:
    """Convert a sequence of per-frame labels (one label per 20 ms) into
    (start_s, end_s) intervals of predicted primary stress."""
    results = []
    ndf = pd.DataFrame(
        data={
            "time_s": [0.020 * i for i in range(len(frames))],
            "frames": frames,
        }
    )
    ndf = ndf.dropna()
    # Indices where the predicted label changes (the first index is always included):
    indices_of_change = ndf.frames.diff()[ndf.frames.diff() != 0].index.values
    for si, ei in pairwise(indices_of_change):
        # Keep only regions whose majority label is 1 (stressed):
        if ndf.loc[si : ei - 1, "frames"].mode()[0] == 0:
            continue
        results.append(
            (round(ndf.loc[si, "time_s"], 3), round(ndf.loc[ei - 1, "time_s"], 3))
        )
    if not results:
        return None
    # Post-processing: if multiple regions were returned, keep only the longest one:
    if len(results) > 1:
        results = sorted(results, key=lambda t: t[1] - t[0], reverse=True)
    return results[0:1]


def evaluator(chunks):
    sampling_rate = chunks["audio"][0]["sampling_rate"]
    with torch.no_grad():
        inputs = feature_extractor(
            [i["array"] for i in chunks["audio"]],
            return_tensors="pt",
            sampling_rate=sampling_rate,
        ).to(device)
        logits = model(**inputs).logits
    y_pred_raw = np.array(logits.cpu())
    y_pred = y_pred_raw.argmax(axis=-1)
    primary_stress = [frames_to_intervals(i) for i in y_pred]
    return {
        "y_pred": y_pred,
        "y_pred_logits": y_pred_raw,
        "primary_stress": primary_stress,
    }


# Create a dataset with a single instance and map our evaluator function over it:
ds = Dataset.from_dict({"audio": [f]}).cast_column("audio", Audio(16000, mono=True))
ds = ds.map(evaluator, batched=True, batch_size=1)  # Adjust batch size to your hardware

print(ds["y_pred"][0])
# Outputs: [0, 0, 1, 1, 1, 1, 1, ...]
print(ds["y_pred_logits"][0])
# Outputs:
# [[ 0.89419061, -0.77746612],
#  [ 0.44213724, -0.34862748],
#  [-0.08605709,  0.13012762],
#  ...
print(ds["primary_stress"][0])
# Outputs: [[0.34, 0.4]]
```
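Note that `frames_to_intervals` deliberately keeps only the longest contiguous stressed region: each input clip is expected to contain a single word, and a word carries at most one primary stress, so shorter spurious regions are discarded.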
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
See the [Example use](#example-use) section above for a complete, runnable inference script.
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
As declared in the model metadata, the training data comes from [classla/ParlaSpeech-RS](https://huggingface.co/datasets/classla/ParlaSpeech-RS), [classla/ParlaSpeech-HR](https://huggingface.co/datasets/classla/ParlaSpeech-HR), and [classla/Mici_Princ](https://huggingface.co/datasets/classla/Mici_Princ).
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing [optional]
[More Information Needed]
#### Training Hyperparameters
- Learning rate: 1e-5
- Batch size: 32
- Number of epochs: 20
- Weight decay: 0.01
- Gradient accumulation steps: 1
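These hyperparameters fit a standard 🤗 `Trainer` setup. A hypothetical reconstruction of the configuration, assuming only the values listed above (the training script itself is not published in this card, and `output_dir` is illustrative):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction; only the hyperparameters listed above are
# taken from the card, everything else is an assumption.
training_args = TrainingArguments(
    output_dir="w2v-bert-primary-stress",  # illustrative name
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    num_train_epochs=20,
    weight_decay=0.01,
    gradient_accumulation_steps=1,
)
```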
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Summary
## Citation
Coming soon