---
# Metadata for Hugging Face repo card
library_name: transformers
pipeline_tag: feature-extraction
license: apache-2.0
tags:
- autoencoder
- pytorch
- reconstruction
- preprocessing
- normalizing-flow
- scaler
---

# Autoencoder Implementation for Hugging Face Transformers

A complete autoencoder implementation that integrates seamlessly with the Hugging Face Transformers ecosystem, providing all the standard functionality you expect from transformer models.

### Install-and-Use from the Hub (code repo)

If you want to use the implementation directly from the Hub code repository (without a packaged pip install), you can download the repo and add it to `sys.path`:

```python
from huggingface_hub import snapshot_download
import sys, torch

repo_dir = snapshot_download(
    "amaye15/autoencoder",
    repo_type="model",
    allow_patterns=["*.py", "config.json", "*.safetensors"],
)
sys.path.append(repo_dir)

from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderForReconstruction

# Load placeholder weights from the same repo (or your own trained weights)
model = AutoencoderForReconstruction.from_pretrained(
    "amaye15/autoencoder",
    trust_remote_code=True,
)

# Quick smoke test
x = torch.randn(8, 20)
outputs = model(input_values=x)
print("Reconstructed:", tuple(outputs.reconstructed.shape), "Latent:", tuple(outputs.last_hidden_state.shape))
```

## 🚀 Features

- **Full Hugging Face Integration**: Compatible with `AutoModel`, `AutoConfig`, and `AutoTokenizer` patterns
- **Standard Training Workflows**: Works with `Trainer`, `TrainingArguments`, and all HF training utilities
- **Model Hub Compatible**: Save and share models on Hugging Face Hub with `push_to_hub()`
- **Flexible Architecture**: Configurable encoder-decoder architecture with various activation functions
- **Multiple Loss Functions**: Support for MSE, BCE, L1, Huber, Smooth L1, KL Divergence, Cosine, Focal, Dice, Tversky, SSIM, and Perceptual loss
- **Multiple Autoencoder Types (7)**: Classic, Variational (VAE), Beta-VAE, Denoising, Sparse, Contractive, and Recurrent autoencoders
- **Extended Activation Functions**: 18+ activation functions including ReLU, GELU, Swish, Mish, ELU, and more
- **Learnable Preprocessing**: Neural Scaler, Normalizing Flow, MinMax Scaler (learnable), Robust Scaler (learnable), and Yeo-Johnson preprocessors (2D and 3D tensors)
- **Extensible Design**: Easy to extend for new autoencoder variants and custom loss functions
- **Production Ready**: Proper serialization, checkpointing, and inference support

## 🏗️ Architecture

The implementation consists of three main components:

### 1. AutoencoderConfig

Configuration class that inherits from `PretrainedConfig`:
- Defines model architecture parameters
- Handles validation and serialization
- Enables `AutoConfig.from_pretrained()` functionality

### 2. AutoencoderModel

Base model class that inherits from `PreTrainedModel`:
- Implements encoder-decoder architecture
- Provides latent space representation
- Returns structured outputs with `AutoencoderOutput`
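For example, the base model can be used on its own to obtain latent representations. A minimal sketch, assuming `AutoencoderModel` is exported from `modeling_autoencoder` alongside the classes used elsewhere in this README:

```python
import torch
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderModel  # assumed export, see note above

# Encode a batch and inspect the structured output
config = AutoencoderConfig(input_dim=784, hidden_dims=[512, 256], latent_dim=64)
model = AutoencoderModel(config)

x = torch.randn(16, 784)
outputs = model(input_values=x)
latent = outputs.last_hidden_state  # (16, 64) latent representation
recon = outputs.reconstructed       # (16, 784) reconstruction
```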
### 3. AutoencoderForReconstruction

Task-specific model for reconstruction:
- Adds reconstruction loss calculation
- Compatible with `Trainer` for easy training
- Returns `AutoencoderForReconstructionOutput` with loss

## 🔧 Quick Start

### Basic Usage

```python
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderForReconstruction
import torch

# Create configuration
config = AutoencoderConfig(
    input_dim=784,               # Input dimensionality (e.g., 28x28 images flattened)
    hidden_dims=[512, 256],      # Encoder hidden layers
    latent_dim=64,               # Latent space dimension
    activation="gelu",           # Activation function (18+ options available)
    reconstruction_loss="mse",   # Loss function (12+ options available)
    autoencoder_type="classic",  # Autoencoder type (7 types available)
    # Optional learnable preprocessing
    use_learnable_preprocessing=True,
    preprocessing_type="neural_scaler",  # or "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson"
)

# Create model
model = AutoencoderForReconstruction(config)

# Forward pass
input_data = torch.randn(32, 784)  # Batch of 32 samples
outputs = model(input_values=input_data)

print(f"Reconstruction loss: {outputs.loss}")
print(f"Latent shape: {outputs.last_hidden_state.shape}")
print(f"Reconstructed shape: {outputs.reconstructed.shape}")
```

### Training with Hugging Face Trainer

```python
import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

class AutoencoderDataset(Dataset):
    def __init__(self, data):
        self.data = torch.FloatTensor(data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return {
            "input_values": self.data[idx],
            "labels": self.data[idx],  # For autoencoder, input = target
        }

# Prepare data
train_dataset = AutoencoderDataset(your_training_data)
val_dataset = AutoencoderDataset(your_validation_data)

# Training arguments
training_args = TrainingArguments(
    output_dir="./autoencoder_output",
    num_train_epochs=10,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=1000,
    load_best_model_at_end=True,
)

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

# Train
trainer.train()

# Save model
model.save_pretrained("./my_autoencoder")
config.save_pretrained("./my_autoencoder")
```
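A trained model can also be shared on the Hub with the standard `push_to_hub()` API mentioned in the features above. A brief sketch; the repo id is a placeholder and you must be logged in (e.g., via `huggingface-cli login`):

```python
# Upload trained weights and configuration to the Hugging Face Hub
# ("your-username/my-autoencoder" is a placeholder repo id).
model.push_to_hub("your-username/my-autoencoder")
config.push_to_hub("your-username/my-autoencoder")
```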
### Using AutoModel Framework

```python
from register_autoencoder import register_autoencoder_models
from transformers import AutoConfig, AutoModel

# Register models with AutoModel framework
register_autoencoder_models()

# Now you can use standard HF patterns
config = AutoConfig.from_pretrained("./my_autoencoder")
model = AutoModel.from_pretrained("./my_autoencoder")

# Use the model
outputs = model(input_values=your_data)
```

## ⚙️ Configuration Options

The `AutoencoderConfig` class supports extensive customization:

```python
config = AutoencoderConfig(
    input_dim=784,                   # Input dimension
    hidden_dims=[512, 256, 128],     # Encoder hidden layers
    latent_dim=64,                   # Latent space dimension
    activation="gelu",               # Activation function (see full list below)
    dropout_rate=0.1,                # Dropout rate (0.0 to 1.0)
    use_batch_norm=True,             # Use batch normalization
    tie_weights=False,               # Tie encoder/decoder weights
    reconstruction_loss="mse",       # Loss function (see full list below)
    autoencoder_type="variational",  # Autoencoder type (see types below)
    beta=0.5,                        # Beta parameter for β-VAE
    temperature=1.0,                 # Temperature for Gumbel softmax
    noise_factor=0.1,                # Noise factor for denoising AE
    # Recurrent autoencoder parameters
    rnn_type="lstm",                 # RNN type: "lstm", "gru", "rnn"
    num_layers=2,                    # Number of RNN layers
    bidirectional=True,              # Bidirectional encoding
    sequence_length=None,            # Fixed sequence length (None for variable)
    teacher_forcing_ratio=0.5,       # Teacher forcing ratio during training
    # Learnable preprocessing parameters
    use_learnable_preprocessing=False,  # Enable learnable preprocessing
    preprocessing_type="none",          # "none", "neural_scaler", "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson"
    preprocessing_hidden_dim=64,        # Hidden dimension for preprocessing networks
    preprocessing_num_layers=2,         # Number of layers in preprocessing networks
    learn_inverse_preprocessing=True,   # Learn inverse transformation
    flow_coupling_layers=4,             # Number of coupling layers for flows
)
```

### 🎛️ Available Activation Functions

**Standard Activations:**
- `relu`, `leaky_relu`, `relu6`, `elu`, `prelu`
- `tanh`, `sigmoid`, `hardsigmoid`, `hardtanh`
- `gelu`, `swish`, `silu`, `hardswish`
- `mish`, `softplus`, `softsign`, `tanhshrink`, `threshold`

### 📊 Available Loss Functions

**Regression Losses:**
- `mse` - Mean Squared Error
- `l1` - L1/MAE Loss
- `huber` - Huber Loss
- `smooth_l1` - Smooth L1 Loss

**Classification/Probability Losses:**
- `bce` - Binary Cross Entropy
- `kl_div` - KL Divergence
- `focal` - Focal Loss

**Similarity Losses:**
- `cosine` - Cosine Similarity Loss
- `ssim` - Structural Similarity Loss
- `perceptual` - Perceptual Loss

**Segmentation Losses:**
- `dice` - Dice Loss
- `tversky` - Tversky Loss

### 🏗️ Available Autoencoder Types

**Classic Autoencoder (`classic`)**
- Standard encoder-decoder architecture
- Direct reconstruction loss minimization

**Variational Autoencoder (`variational`)**
- Probabilistic latent space with mean and variance
- KL divergence regularization
- Reparameterization trick for sampling

**Beta-VAE (`beta_vae`)**
- Variational autoencoder with adjustable β parameter
- Better disentanglement of latent factors

**Denoising Autoencoder (`denoising`)**
- Adds noise to input during training
- Learns robust representations
- Configurable noise factor

**Sparse Autoencoder (`sparse`)**
- Encourages sparse latent representations
- L1 regularization on latent activations
- Useful for feature selection

**Contractive Autoencoder (`contractive`)**
- Penalizes large gradients of the latent w.r.t. the input
- Learns smooth manifold representations
- Robust to small input perturbations

**Recurrent Autoencoder (`recurrent`)**
- LSTM/GRU/RNN encoder-decoder architecture
- Bidirectional encoding for better sequence representations
- Variable length sequence support with padding
- Teacher forcing during training for stable learning
- Sequence-to-sequence reconstruction
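Each type is selected through the `autoencoder_type` field of the config. A minimal sketch of a denoising setup, reusing only parameters documented above:

```python
# Denoising autoencoder: noise (scaled by noise_factor) is added to the
# inputs during training, and the clean inputs remain the reconstruction targets.
config = AutoencoderConfig(
    input_dim=784,
    hidden_dims=[512, 256],
    latent_dim=64,
    autoencoder_type="denoising",
    noise_factor=0.2,
    reconstruction_loss="mse",
)
model = AutoencoderForReconstruction(config)
```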
## 📊 Model Outputs

### AutoencoderOutput

The base model `AutoencoderModel` returns the following output:

```python
@dataclass
class AutoencoderOutput(ModelOutput):
    last_hidden_state: torch.FloatTensor = None     # Latent representation
    reconstructed: torch.FloatTensor = None         # Reconstructed input
    hidden_states: Tuple[torch.FloatTensor] = None  # Intermediate states
    attentions: Tuple[torch.FloatTensor] = None     # Not used
```

### AutoencoderForReconstructionOutput

```python
@dataclass
class AutoencoderForReconstructionOutput(ModelOutput):
    loss: torch.FloatTensor = None                   # Reconstruction loss
    reconstructed: torch.FloatTensor = None          # Reconstructed input
    last_hidden_state: torch.FloatTensor = None      # Latent representation
    hidden_states: Tuple[torch.FloatTensor] = None   # Intermediate states
```

## 🔬 Advanced Usage

### Custom Loss Functions

You can easily extend the model with custom loss functions:

```python
class CustomAutoencoder(AutoencoderForReconstruction):
    def _compute_reconstruction_loss(self, reconstructed, target):
        # Custom loss implementation
        return your_custom_loss(reconstructed, target)
```

### Recurrent Autoencoder for Sequences

Perfect for time series, text, and sequential data:

```python
config = AutoencoderConfig(
    input_dim=50,               # Feature dimension per timestep
    latent_dim=32,              # Compressed representation size
    autoencoder_type="recurrent",
    rnn_type="lstm",            # or "gru", "rnn"
    num_layers=2,               # Number of RNN layers
    bidirectional=True,         # Bidirectional encoding
    teacher_forcing_ratio=0.7,  # Teacher forcing during training
    sequence_length=None,       # Variable length sequences
)

# Usage with sequence data
model = AutoencoderForReconstruction(config)
sequence_data = torch.randn(batch_size, seq_len, input_dim)
outputs = model(input_values=sequence_data)
```

### Learnable Preprocessing

Deep learning-based data normalization that adapts to your data:

```python
# Neural Scaler - Learnable alternative to StandardScaler
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="neural_scaler",
    preprocessing_hidden_dim=64,
)

# Normalizing Flow - Invertible transformations
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="normalizing_flow",
    flow_coupling_layers=4,
)

# Works with all autoencoder types and sequence data
model = AutoencoderForReconstruction(config)
outputs = model(input_values=data)
print(f"Preprocessing loss: {outputs.preprocessing_loss}")
```

```python
# Learnable MinMax Scaler - scales to [0, 1] with learnable bounds
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="minmax_scaler",
)

# Learnable Robust Scaler - robust to outliers using median/IQR
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="robust_scaler",
)

# Learnable Yeo-Johnson - power transform for skewed distributions
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="yeo_johnson",
)
```
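Because reconstructions are returned directly on the output object, scoring samples by reconstruction error (e.g., for the anomaly-detection use case listed below) takes only a few lines. A minimal sketch, assuming a trained `model` and a 2D `data` tensor as in the examples above:

```python
import torch

# Per-sample mean squared reconstruction error; poorly reconstructed
# samples are candidate anomalies.
with torch.no_grad():
    outputs = model(input_values=data)
    errors = torch.mean((outputs.reconstructed - data) ** 2, dim=-1)

# Illustrative threshold choice (mean + 3 standard deviations).
threshold = errors.mean() + 3 * errors.std()
anomalies = errors > threshold
print(f"Flagged {int(anomalies.sum())} of {len(errors)} samples")
```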
### Variational Autoencoder Extension

The configuration supports variational autoencoders:

```python
config = AutoencoderConfig(
    autoencoder_type="variational",
    beta=0.5,  # β-VAE parameter
    # ... other parameters
)
```

### Integration with Datasets Library

```python
from datasets import Dataset

# Convert your data to HF Dataset
dataset = Dataset.from_dict({
    "input_values": your_data_list
})

# Use with Trainer
trainer = Trainer(
    model=model,
    train_dataset=dataset,
    # ... other arguments
)
```

## 📁 Project Structure

```
autoencoder/
├── __init__.py                   # Package initialization
├── configuration_autoencoder.py  # Configuration class
├── modeling_autoencoder.py       # Model implementations
├── register_autoencoder.py       # AutoModel registration
├── pyproject.toml                # Project metadata and dependencies
└── README.md                     # This file
```

## 🤝 Contributing

This implementation follows Hugging Face conventions and can be easily extended:

1. **Adding new architectures**: Extend `AutoencoderModel` or create new model classes
2. **Custom configurations**: Add parameters to `AutoencoderConfig`
3. **Task-specific heads**: Create new classes like `AutoencoderForReconstruction`
4. **Integration**: Register new models with the AutoModel framework

## 📚 References

- [Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers)
- [Custom Models Guide](https://huggingface.co/docs/transformers/custom_models)
- [AutoModel Documentation](https://huggingface.co/docs/transformers/model_doc/auto)

## 🎯 Use Cases

This autoencoder implementation is perfect for:

- **Dimensionality Reduction**: Compress high-dimensional data to lower dimensions
- **Anomaly Detection**: Identify outliers based on reconstruction error
- **Data Denoising**: Remove noise from corrupted data
- **Feature Learning**: Learn meaningful representations for downstream tasks
- **Data Generation**: Generate new samples similar to training data
- **Pretraining**: Initialize encoders for other tasks

## 🔍 Model Comparison

| Feature            | Standard PyTorch | This Implementation |
|--------------------|------------------|---------------------|
| HF Integration     | ❌               | ✅                  |
| AutoModel Support  | ❌               | ✅                  |
| Trainer Compatible | ❌               | ✅                  |
| Hub Integration    | ❌               | ✅                  |
| Config Management  | Manual           | ✅ Automatic        |
| Serialization      | Manual           | ✅ Built-in         |
| Checkpointing      | Manual           | ✅ Built-in         |

## 🚀 Performance Tips

1. **Batch Size**: Use larger batch sizes for better GPU utilization
2. **Learning Rate**: Start with 1e-3 and adjust based on convergence
3. **Architecture**: Gradually decrease hidden dimensions for better compression
4. **Regularization**: Use dropout and batch normalization for better generalization
5. **Loss Function**: Choose an appropriate loss based on your data type

## 📄 License

This implementation is provided as an example and follows the same license terms as Hugging Face Transformers.