---
# Metadata for Hugging Face repo card
library_name: transformers
pipeline_tag: feature-extraction
license: apache-2.0
tags:
- autoencoder
- pytorch
- reconstruction
- preprocessing
- normalizing-flow
- scaler
---

# Autoencoder Implementation for Hugging Face Transformers

A complete autoencoder implementation that integrates seamlessly with the Hugging Face Transformers ecosystem, providing all the standard functionality you expect from transformer models.

### Install-and-Use from the Hub (code repo)

If you want to use the implementation directly from the Hub code repository (without a packaged pip install), you can download the repo and add it to `sys.path`:

```python
from huggingface_hub import snapshot_download
import sys, torch

repo_dir = snapshot_download(
    "amaye15/autoencoder",
    repo_type="model",
    allow_patterns=["*.py", "config.json", "*.safetensors"],
)
sys.path.append(repo_dir)

from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderForReconstruction

# Load placeholder weights from the same repo (or your own trained weights)
model = AutoencoderForReconstruction.from_pretrained(
    "amaye15/autoencoder",
    trust_remote_code=True,
)

# Quick smoke test
x = torch.randn(8, 20)
outputs = model(input_values=x)
print("Reconstructed:", tuple(outputs.reconstructed.shape), "Latent:", tuple(outputs.last_hidden_state.shape))
```

## 🚀 Features

- **Full Hugging Face Integration**: Compatible with `AutoModel`, `AutoConfig`, and `AutoTokenizer` patterns
- **Standard Training Workflows**: Works with `Trainer`, `TrainingArguments`, and all HF training utilities
- **Model Hub Compatible**: Save and share models on Hugging Face Hub with `push_to_hub()`
- **Flexible Architecture**: Configurable encoder-decoder architecture with various activation functions
- **Multiple Loss Functions**: Support for MSE, BCE, L1, Huber, Smooth L1, KL Divergence, Cosine, Focal, Dice, Tversky, SSIM, and Perceptual loss
- **Multiple Autoencoder Types (7)**: Classic, Variational (VAE), Beta-VAE, Denoising, Sparse, Contractive, and Recurrent autoencoders
- **Extended Activation Functions**: 18+ activation functions including ReLU, GELU, Swish, Mish, ELU, and more
- **Learnable Preprocessing**: Neural Scaler, Normalizing Flow, MinMax Scaler (learnable), Robust Scaler (learnable), and Yeo-Johnson preprocessors (2D and 3D tensors)
- **Extensible Design**: Easy to extend for new autoencoder variants and custom loss functions
- **Production Ready**: Proper serialization, checkpointing, and inference support

## 🏗️ Architecture

The implementation consists of three main components:

### 1. AutoencoderConfig

Configuration class that inherits from `PretrainedConfig`:
- Defines model architecture parameters
- Handles validation and serialization
- Enables `AutoConfig.from_pretrained()` functionality

### 2. AutoencoderModel

Base model class that inherits from `PreTrainedModel`:
- Implements encoder-decoder architecture
- Provides latent space representation
- Returns structured outputs with `AutoencoderOutput`
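For example, the base model can be used on its own to obtain latent representations. A minimal sketch, assuming `AutoencoderModel` is exported from `modeling_autoencoder` alongside the classes used elsewhere in this README:

```python
import torch
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderModel  # assumed export, see note above

# Encode a batch and inspect the structured output
config = AutoencoderConfig(input_dim=784, hidden_dims=[512, 256], latent_dim=64)
model = AutoencoderModel(config)

x = torch.randn(16, 784)
outputs = model(input_values=x)
latent = outputs.last_hidden_state  # (16, 64) latent representation
recon = outputs.reconstructed       # (16, 784) reconstruction
```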
### 3. AutoencoderForReconstruction

Task-specific model for reconstruction:
- Adds reconstruction loss calculation
- Compatible with `Trainer` for easy training
- Returns `AutoencoderForReconstructionOutput` with loss

## 🔧 Quick Start

### Basic Usage

```python
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderForReconstruction
import torch

# Create configuration
config = AutoencoderConfig(
    input_dim=784,               # Input dimensionality (e.g., 28x28 images flattened)
    hidden_dims=[512, 256],      # Encoder hidden layers
    latent_dim=64,               # Latent space dimension
    activation="gelu",           # Activation function (18+ options available)
    reconstruction_loss="mse",   # Loss function (12+ options available)
    autoencoder_type="classic",  # Autoencoder type (7 types available)
    # Optional learnable preprocessing
    use_learnable_preprocessing=True,
    preprocessing_type="neural_scaler",  # or "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson"
)

# Create model
model = AutoencoderForReconstruction(config)

# Forward pass
input_data = torch.randn(32, 784)  # Batch of 32 samples
outputs = model(input_values=input_data)

print(f"Reconstruction loss: {outputs.loss}")
print(f"Latent shape: {outputs.last_hidden_state.shape}")
print(f"Reconstructed shape: {outputs.reconstructed.shape}")
```

### Training with Hugging Face Trainer

```python
import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

class AutoencoderDataset(Dataset):
    def __init__(self, data):
        self.data = torch.FloatTensor(data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return {
            "input_values": self.data[idx],
            "labels": self.data[idx],  # For autoencoder, input = target
        }

# Prepare data
train_dataset = AutoencoderDataset(your_training_data)
val_dataset = AutoencoderDataset(your_validation_data)

# Training arguments
training_args = TrainingArguments(
    output_dir="./autoencoder_output",
    num_train_epochs=10,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=1000,
    load_best_model_at_end=True,
)

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

# Train
trainer.train()

# Save model
model.save_pretrained("./my_autoencoder")
config.save_pretrained("./my_autoencoder")
```
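A trained model can also be shared on the Hub with the standard `push_to_hub()` API mentioned in the features above. A brief sketch; the repo id is a placeholder and you must be logged in (e.g., via `huggingface-cli login`):

```python
# Upload trained weights and configuration to the Hugging Face Hub
# ("your-username/my-autoencoder" is a placeholder repo id).
model.push_to_hub("your-username/my-autoencoder")
config.push_to_hub("your-username/my-autoencoder")
```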
### Using AutoModel Framework

```python
from register_autoencoder import register_autoencoder_models
from transformers import AutoConfig, AutoModel

# Register models with AutoModel framework
register_autoencoder_models()

# Now you can use standard HF patterns
config = AutoConfig.from_pretrained("./my_autoencoder")
model = AutoModel.from_pretrained("./my_autoencoder")

# Use the model
outputs = model(input_values=your_data)
```

## ⚙️ Configuration Options

The `AutoencoderConfig` class supports extensive customization:

```python
config = AutoencoderConfig(
    input_dim=784,                   # Input dimension
    hidden_dims=[512, 256, 128],     # Encoder hidden layers
    latent_dim=64,                   # Latent space dimension
    activation="gelu",               # Activation function (see full list below)
    dropout_rate=0.1,                # Dropout rate (0.0 to 1.0)
    use_batch_norm=True,             # Use batch normalization
    tie_weights=False,               # Tie encoder/decoder weights
    reconstruction_loss="mse",       # Loss function (see full list below)
    autoencoder_type="variational",  # Autoencoder type (see types below)
    beta=0.5,                        # Beta parameter for β-VAE
    temperature=1.0,                 # Temperature for Gumbel softmax
    noise_factor=0.1,                # Noise factor for denoising AE
    # Recurrent autoencoder parameters
    rnn_type="lstm",                 # RNN type: "lstm", "gru", "rnn"
    num_layers=2,                    # Number of RNN layers
    bidirectional=True,              # Bidirectional encoding
    sequence_length=None,            # Fixed sequence length (None for variable)
    teacher_forcing_ratio=0.5,       # Teacher forcing ratio during training
    # Learnable preprocessing parameters
    use_learnable_preprocessing=False,  # Enable learnable preprocessing
    preprocessing_type="none",          # "none", "neural_scaler", "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson"
    preprocessing_hidden_dim=64,        # Hidden dimension for preprocessing networks
    preprocessing_num_layers=2,         # Number of layers in preprocessing networks
    learn_inverse_preprocessing=True,   # Learn inverse transformation
    flow_coupling_layers=4,             # Number of coupling layers for flows
)
```

### 🎛️ Available Activation Functions

**Standard Activations:**
- `relu`, `leaky_relu`, `relu6`, `elu`, `prelu`
- `tanh`, `sigmoid`, `hardsigmoid`, `hardtanh`
- `gelu`, `swish`, `silu`, `hardswish`
- `mish`, `softplus`, `softsign`, `tanhshrink`, `threshold`

### 📊 Available Loss Functions

**Regression Losses:**
- `mse` - Mean Squared Error
- `l1` - L1/MAE Loss
- `huber` - Huber Loss
- `smooth_l1` - Smooth L1 Loss

**Classification/Probability Losses:**
- `bce` - Binary Cross Entropy
- `kl_div` - KL Divergence
- `focal` - Focal Loss

**Similarity Losses:**
- `cosine` - Cosine Similarity Loss
- `ssim` - Structural Similarity Loss
- `perceptual` - Perceptual Loss

**Segmentation Losses:**
- `dice` - Dice Loss
- `tversky` - Tversky Loss

### 🏗️ Available Autoencoder Types

**Classic Autoencoder (`classic`)**
- Standard encoder-decoder architecture
- Direct reconstruction loss minimization

**Variational Autoencoder (`variational`)**
- Probabilistic latent space with mean and variance
- KL divergence regularization
- Reparameterization trick for sampling

**Beta-VAE (`beta_vae`)**
- Variational autoencoder with adjustable β parameter
- Better disentanglement of latent factors

**Denoising Autoencoder (`denoising`)**
- Adds noise to input during training
- Learns robust representations
- Configurable noise factor

**Sparse Autoencoder (`sparse`)**
- Encourages sparse latent representations
- L1 regularization on latent activations
- Useful for feature selection

**Contractive Autoencoder (`contractive`)**
- Penalizes large gradients of the latent w.r.t. the input
- Learns smooth manifold representations
- Robust to small input perturbations

**Recurrent Autoencoder (`recurrent`)**
- LSTM/GRU/RNN encoder-decoder architecture
- Bidirectional encoding for better sequence representations
- Variable length sequence support with padding
- Teacher forcing during training for stable learning
- Sequence-to-sequence reconstruction
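Each type is selected through the `autoencoder_type` field of the config. A minimal sketch of a denoising setup, reusing only parameters documented above:

```python
# Denoising autoencoder: noise (scaled by noise_factor) is added to the
# inputs during training, and the clean inputs remain the reconstruction targets.
config = AutoencoderConfig(
    input_dim=784,
    hidden_dims=[512, 256],
    latent_dim=64,
    autoencoder_type="denoising",
    noise_factor=0.2,
    reconstruction_loss="mse",
)
model = AutoencoderForReconstruction(config)
```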
## 📊 Model Outputs

### AutoencoderOutput

The base model `AutoencoderModel` returns the following output:

```python
@dataclass
class AutoencoderOutput(ModelOutput):
    last_hidden_state: torch.FloatTensor = None     # Latent representation
    reconstructed: torch.FloatTensor = None         # Reconstructed input
    hidden_states: Tuple[torch.FloatTensor] = None  # Intermediate states
    attentions: Tuple[torch.FloatTensor] = None     # Not used
```

### AutoencoderForReconstructionOutput

```python
@dataclass
class AutoencoderForReconstructionOutput(ModelOutput):
    loss: torch.FloatTensor = None                   # Reconstruction loss
    reconstructed: torch.FloatTensor = None          # Reconstructed input
    last_hidden_state: torch.FloatTensor = None      # Latent representation
    hidden_states: Tuple[torch.FloatTensor] = None   # Intermediate states
```

## 🔬 Advanced Usage

### Custom Loss Functions

You can easily extend the model with custom loss functions:

```python
class CustomAutoencoder(AutoencoderForReconstruction):
    def _compute_reconstruction_loss(self, reconstructed, target):
        # Custom loss implementation
        return your_custom_loss(reconstructed, target)
```

### Recurrent Autoencoder for Sequences

Perfect for time series, text, and sequential data:

```python
config = AutoencoderConfig(
    input_dim=50,               # Feature dimension per timestep
    latent_dim=32,              # Compressed representation size
    autoencoder_type="recurrent",
    rnn_type="lstm",            # or "gru", "rnn"
    num_layers=2,               # Number of RNN layers
    bidirectional=True,         # Bidirectional encoding
    teacher_forcing_ratio=0.7,  # Teacher forcing during training
    sequence_length=None,       # Variable length sequences
)

# Usage with sequence data
model = AutoencoderForReconstruction(config)
sequence_data = torch.randn(batch_size, seq_len, input_dim)
outputs = model(input_values=sequence_data)
```

### Learnable Preprocessing

Deep learning-based data normalization that adapts to your data:

```python
# Neural Scaler - Learnable alternative to StandardScaler
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="neural_scaler",
    preprocessing_hidden_dim=64,
)

# Normalizing Flow - Invertible transformations
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="normalizing_flow",
    flow_coupling_layers=4,
)

# Works with all autoencoder types and sequence data
model = AutoencoderForReconstruction(config)
outputs = model(input_values=data)
print(f"Preprocessing loss: {outputs.preprocessing_loss}")
```

```python
# Learnable MinMax Scaler - scales to [0, 1] with learnable bounds
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="minmax_scaler",
)

# Learnable Robust Scaler - robust to outliers using median/IQR
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="robust_scaler",
)

# Learnable Yeo-Johnson - power transform for skewed distributions
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="yeo_johnson",
)
```
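Because reconstructions are returned directly on the output object, scoring samples by reconstruction error (e.g., for the anomaly-detection use case listed below) takes only a few lines. A minimal sketch, assuming a trained `model` and a 2D `data` tensor as in the examples above:

```python
import torch

# Per-sample mean squared reconstruction error; poorly reconstructed
# samples are candidate anomalies.
with torch.no_grad():
    outputs = model(input_values=data)
    errors = torch.mean((outputs.reconstructed - data) ** 2, dim=-1)

# Illustrative threshold choice (mean + 3 standard deviations).
threshold = errors.mean() + 3 * errors.std()
anomalies = errors > threshold
print(f"Flagged {int(anomalies.sum())} of {len(errors)} samples")
```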
### Variational Autoencoder Extension

The configuration supports variational autoencoders:

```python
config = AutoencoderConfig(
    autoencoder_type="variational",
    beta=0.5,  # β-VAE parameter
    # ... other parameters
)
```

### Integration with Datasets Library

```python
from datasets import Dataset

# Convert your data to HF Dataset
dataset = Dataset.from_dict({
    "input_values": your_data_list
})

# Use with Trainer
trainer = Trainer(
    model=model,
    train_dataset=dataset,
    # ... other arguments
)
```

## 📁 Project Structure

```
autoencoder/
├── __init__.py                   # Package initialization
├── configuration_autoencoder.py  # Configuration class
├── modeling_autoencoder.py       # Model implementations
├── register_autoencoder.py       # AutoModel registration
├── pyproject.toml                # Project metadata and dependencies
└── README.md                     # This file
```

## 🤝 Contributing

This implementation follows Hugging Face conventions and can be easily extended:

1. **Adding new architectures**: Extend `AutoencoderModel` or create new model classes
2. **Custom configurations**: Add parameters to `AutoencoderConfig`
3. **Task-specific heads**: Create new classes like `AutoencoderForReconstruction`
4. **Integration**: Register new models with the AutoModel framework

## 📚 References

- [Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers)
- [Custom Models Guide](https://huggingface.co/docs/transformers/custom_models)
- [AutoModel Documentation](https://huggingface.co/docs/transformers/model_doc/auto)

## 🎯 Use Cases

This autoencoder implementation is perfect for:

- **Dimensionality Reduction**: Compress high-dimensional data to lower dimensions
- **Anomaly Detection**: Identify outliers based on reconstruction error
- **Data Denoising**: Remove noise from corrupted data
- **Feature Learning**: Learn meaningful representations for downstream tasks
- **Data Generation**: Generate new samples similar to training data
- **Pretraining**: Initialize encoders for other tasks

## 🔍 Model Comparison

| Feature            | Standard PyTorch | This Implementation |
|--------------------|------------------|---------------------|
| HF Integration     | ❌               | ✅                  |
| AutoModel Support  | ❌               | ✅                  |
| Trainer Compatible | ❌               | ✅                  |
| Hub Integration    | ❌               | ✅                  |
| Config Management  | Manual           | ✅ Automatic        |
| Serialization      | Manual           | ✅ Built-in         |
| Checkpointing      | Manual           | ✅ Built-in         |

## 🚀 Performance Tips

1. **Batch Size**: Use larger batch sizes for better GPU utilization
2. **Learning Rate**: Start with 1e-3 and adjust based on convergence
3. **Architecture**: Gradually decrease hidden dimensions for better compression
4. **Regularization**: Use dropout and batch normalization for better generalization
5. **Loss Function**: Choose an appropriate loss based on your data type

## 📄 License

This implementation is provided as an example and follows the same license terms as Hugging Face Transformers.