Bucket Integrations

Many Python data libraries can read from and write to Storage Buckets using hf://buckets/ paths, which are backed by the huggingface_hub filesystem interface.

For the underlying access mechanisms — mounts, volume mounts, and fsspec — see Access Patterns.
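
Every example on this page addresses a file as hf://buckets/ followed by the owner, the bucket name, and the path inside the bucket. As a minimal sketch of that layout, the helper below assembles such a path from its parts. The name bucket_uri is hypothetical and is not part of huggingface_hub.

```python
def bucket_uri(owner: str, bucket: str, *parts: str) -> str:
    """Build an hf://buckets/ path from its components (hypothetical helper)."""
    owner, bucket = owner.strip("/"), bucket.strip("/")
    if not owner or not bucket:
        raise ValueError("owner and bucket are both required")
    # Drop empty segments and stray slashes so the result has single separators.
    tail = [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(["hf://buckets", owner, bucket, *tail])


data_uri = bucket_uri("username", "my-bucket", "data.parquet")
print(data_uri)  # hf://buckets/username/my-bucket/data.parquet
```

The resulting string can be passed wherever the examples below use a literal hf://buckets/ path.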

pandas

import pandas as pd

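# Read a Parquet file from the bucket into a DataFrame.
df = pd.read_parquet("hf://buckets/username/my-bucket/data.parquet")
# Write the DataFrame back to the bucket as a new Parquet file.
df.to_parquet("hf://buckets/username/my-bucket/output.parquet")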

Dask

import dask.dataframe as dd

df = dd.read_parquet("hf://buckets/username/my-bucket/data.parquet")

PyArrow

import pyarrow.parquet as pq

table = pq.read_table("hf://buckets/username/my-bucket/data.parquet")

PySpark

With pyspark_huggingface installed:

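import pyspark_huggingface  # makes the "huggingface" data source available
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.read.format("huggingface")
    .option("data_files", '["data.parquet"]')
    .load("buckets/username/my-bucket")
)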

See PySpark on the Hub for more.
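
In the example above, the data_files option is a string holding a JSON-style list of file names. Assuming the option is parsed as JSON (an inference from the form shown, not something stated on this page), the string can be built with json.dumps instead of hand-quoting it. The helper name data_files_option and the train/test file names are hypothetical.

```python
import json


def data_files_option(*names: str) -> str:
    """Encode file names as a JSON list string for .option("data_files", ...)."""
    if not names:
        raise ValueError("at least one file name is required")
    return json.dumps(list(names))


single = data_files_option("data.parquet")
several = data_files_option("train.parquet", "test.parquet")
print(single)   # ["data.parquet"]
print(several)  # ["train.parquet", "test.parquet"]
```

The result would then be passed as .option("data_files", data_files_option("data.parquet")) in place of the literal string.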

🤗 Datasets

from datasets import load_dataset

ds = load_dataset("buckets/username/my-bucket", data_files=["data.parquet"])

Filesystem operations

For direct file operations, huggingface_hub exposes a pre-instantiated filesystem object, hffs:

from huggingface_hub import hffs

# Write a text file to the bucket.
with hffs.open("buckets/username/my-bucket/hello.txt", "w") as f:
    f.write("Hello world!")

# Copy the file within the bucket, then delete the copy.
hffs.cp("buckets/username/my-bucket/hello.txt", "buckets/username/my-bucket/hello2.txt")
hffs.rm("buckets/username/my-bucket/hello2.txt")
# List the bucket contents, then match files by glob pattern.
files = hffs.ls("buckets/username/my-bucket")
text_files = hffs.glob("buckets/username/my-bucket/*.txt")
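
fsspec-style filesystems usually return ls() results as a list of dictionaries with at least "name", "type", and "size" keys. Assuming bucket listings follow that convention (this page does not confirm it), the sketch below separates files from directories. The helper split_listing is hypothetical, and the sample entries are hand-written for illustration rather than real output.

```python
def split_listing(entries):
    """Split an ls()-style detailed listing into (files, directories) by name."""
    files, directories = [], []
    for entry in entries:
        # Assumption: each entry is a dict with "name" and "type" keys.
        target = directories if entry["type"] == "directory" else files
        target.append(entry["name"])
    return sorted(files), sorted(directories)


# Illustrative entries only; not the output of a real bucket listing.
sample = [
    {"name": "buckets/username/my-bucket/hello.txt", "type": "file", "size": 12},
    {"name": "buckets/username/my-bucket/logs", "type": "directory", "size": 0},
]
files, directories = split_listing(sample)
print(files)
print(directories)
```

If the assumption holds, passing the result of hffs.ls("buckets/username/my-bucket") in place of sample would work the same way.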

Other languages

OpenDAL provides a similar filesystem interface for Rust, Java, Go, JavaScript, and more.

Coming soon

Support for more libraries is on the way — including Polars, DuckDB (native hf:// URL support), Daft, and webdataset.
