Version: devel

dlt.destinations.impl.filesystem.iceberg_adapter

PartitionSpec Objects

@dataclass(frozen=True)
class PartitionSpec()

get_transform

def get_transform() -> Transform[S, Any]

Get the PyIceberg Transform object for this partition.

Returns:

A PyIceberg Transform object

Raises:

  • ValueError - If the transform is not recognized
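The lookup-and-raise behaviour described above can be sketched without PyIceberg installed. This is a minimal illustration, not the module's implementation: `TransformSketch`, `_TRANSFORM_FACTORIES`, and the `param` field are hypothetical names, and plain strings stand in for PyIceberg Transform objects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical registry: strings stand in for PyIceberg Transform objects.
_TRANSFORM_FACTORIES: Dict[str, Callable[[Optional[int]], str]] = {
    "identity": lambda _param: "identity",
    "year": lambda _param: "year",
    "bucket": lambda param: f"bucket[{param}]",
}


@dataclass(frozen=True)
class TransformSketch:
    """Hypothetical stand-in for a partition spec; not the real PartitionSpec."""

    source_column: str
    transform: str = "identity"
    param: Optional[int] = None  # e.g. a bucket count (assumed field name)

    def get_transform(self) -> str:
        # Unknown transform names are rejected with ValueError.
        try:
            factory = _TRANSFORM_FACTORIES[self.transform]
        except KeyError:
            raise ValueError(f"Unknown partition transform: {self.transform!r}") from None
        return factory(self.param)
```

Because the dataclass is frozen, a spec cannot be mutated after creation, so the transform name validated here is the one that was originally supplied.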

iceberg_partition Objects

class iceberg_partition()

Helper class with factory methods for creating partition specs.

identity

@staticmethod
def identity(column_name: str) -> PartitionSpec

Create an identity partition on a column.

Arguments:

  • column_name - The name of the column to partition on

Returns:

A PartitionSpec for identity partitioning

year

@staticmethod
def year(column_name: str,
         partition_field_name: Optional[str] = None) -> PartitionSpec

Create a year partition on a timestamp/date column.

Arguments:

  • column_name - The name of the column to partition on
  • partition_field_name - Optional custom name for the partition field

Returns:

A PartitionSpec for year partitioning

month

@staticmethod
def month(column_name: str,
          partition_field_name: Optional[str] = None) -> PartitionSpec

Create a month partition on a timestamp/date column.

Arguments:

  • column_name - The name of the column to partition on
  • partition_field_name - Optional custom name for the partition field

Returns:

A PartitionSpec for month partitioning

day

@staticmethod
def day(column_name: str,
        partition_field_name: Optional[str] = None) -> PartitionSpec

Create a day partition on a timestamp/date column.

Arguments:

  • column_name - The name of the column to partition on
  • partition_field_name - Optional custom name for the partition field

Returns:

A PartitionSpec for day partitioning

hour

@staticmethod
def hour(column_name: str,
         partition_field_name: Optional[str] = None) -> PartitionSpec

Create an hour partition on a timestamp column.

Arguments:

  • column_name - The name of the column to partition on
  • partition_field_name - Optional custom name for the partition field

Returns:

A PartitionSpec for hour partitioning
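To make the year, month, day, and hour transforms concrete, the sketch below computes the partition values that the Iceberg table spec defines for them: whole units elapsed since 1970-01-01 UTC. The function name `temporal_partition_values` is hypothetical, and the code is a standalone illustration that assumes a timezone-aware input; it does not call dlt or PyIceberg.

```python
from datetime import datetime, timezone
from typing import Dict

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def temporal_partition_values(ts: datetime) -> Dict[str, int]:
    """Return year/month/day/hour partition values for a timezone-aware timestamp."""
    utc = ts.astimezone(timezone.utc)
    delta = utc - EPOCH
    return {
        "year": utc.year - 1970,                            # years since 1970
        "month": (utc.year - 1970) * 12 + (utc.month - 1),  # months since 1970-01
        "day": delta.days,                                  # days since 1970-01-01
        "hour": delta.days * 24 + delta.seconds // 3600,    # hours since the epoch
    }


values = temporal_partition_values(datetime(2023, 3, 15, 10, 0, tzinfo=timezone.utc))
print(values)  # {'year': 53, 'month': 638, 'day': 19431, 'hour': 466354}
```

Coarser transforms produce fewer, larger partitions: every row from 2023 shares the year value 53, while each hour of data gets its own hour value.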

bucket

@staticmethod
def bucket(num_buckets: int,
           column_name: str,
           partition_field_name: Optional[str] = None) -> PartitionSpec

Create a bucket partition on a column.

Arguments:

  • num_buckets - The number of buckets to create
  • column_name - The name of the column to partition on
  • partition_field_name - Optional custom name for the partition field

Returns:

A PartitionSpec for bucket partitioning
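Bucket partitioning assigns each value to one of `num_buckets` partitions by hashing it and taking the result modulo the bucket count. The sketch below shows that hash-then-modulo idea only. It deliberately swaps in CRC32 from the standard library as a stand-in hash, whereas Iceberg specifies a 32-bit Murmur3 hash, so the bucket numbers it prints will not match those of a real Iceberg table. The name `sketch_bucket` is hypothetical.

```python
import zlib


def sketch_bucket(value: str, num_buckets: int) -> int:
    """Map a string to a bucket index in the range [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be a positive integer")
    # Stand-in hash (CRC32 of the UTF-8 bytes); Iceberg itself uses 32-bit Murmur3.
    hashed = zlib.crc32(value.encode("utf-8"))
    # Clear the sign bit, then reduce to the requested number of buckets.
    return (hashed & 0x7FFFFFFF) % num_buckets


for user_id in ["alice", "bob", "carol"]:
    print(user_id, sketch_bucket(user_id, 16))
```

The same input always lands in the same bucket, which is what lets a reader prune files when filtering on an exact value of the bucketed column.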

truncate

@staticmethod
def truncate(width: int,
             column_name: str,
             partition_field_name: Optional[str] = None) -> PartitionSpec

Create a truncate partition on a string column.

Arguments:

  • width - The width to truncate to
  • column_name - The name of the column to partition on
  • partition_field_name - Optional custom name for the partition field

Returns:

A PartitionSpec for truncate partitioning
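For a string column, a truncate transform of width W keeps at most the first W characters of each value, so values sharing a prefix fall into the same partition. The following is a standalone sketch of that grouping; `truncate_string` and `group_by_prefix` are hypothetical helper names and do not exist in dlt.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def truncate_string(value: str, width: int) -> str:
    """Keep at most the first `width` characters of `value`."""
    if width <= 0:
        raise ValueError("width must be a positive integer")
    return value[:width]


def group_by_prefix(values: Iterable[str], width: int) -> Dict[str, List[str]]:
    """Group values by their truncated prefix, mimicking partition assignment."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        groups[truncate_string(value, width)].append(value)
    return dict(groups)


print(group_by_prefix(["apple", "apricot", "banana"], 2))
# {'ap': ['apple', 'apricot'], 'ba': ['banana']}
```

Values shorter than the width are left unchanged, so they form their own partition key.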

iceberg_adapter

def iceberg_adapter(
    data: Any,
    partition: Union[str, PartitionSpec, Sequence[Union[str, PartitionSpec]]] = None
) -> DltResource

Prepares data or a DltResource for loading into an Apache Iceberg table.

Takes raw data or an existing DltResource and configures it for Iceberg, primarily by defining partitioning strategies via the DltResource's hints.

Arguments:

  • data - The data to be transformed. This can be raw data (e.g., list of dicts) or an instance of DltResource. If raw data is provided, it will be encapsulated into a DltResource instance.
  • partition - Defines how the Iceberg table should be partitioned. Must be provided. It accepts:
    • A single column name (string): Defaults to an identity transform.
    • A PartitionSpec object: Allows for detailed partition configuration, including transformation types (year, month, day, hour, bucket, truncate). Use the iceberg_partition helper class to create these specs.
    • A sequence of the above: To define multiple partition columns.

Returns:

A DltResource instance configured with Iceberg-specific partitioning hints, ready for loading.

Raises:

  • ValueError - If partition is not specified or if an invalid partition transform is requested within a PartitionSpec.

Examples:

    import dlt
    from dlt.destinations.impl.filesystem.iceberg_adapter import (
        iceberg_adapter,
        iceberg_partition,
    )

    data = [{"id": 1, "event_time": "2023-03-15T10:00:00Z", "category": "A"}]
    resource = iceberg_adapter(
        data,
        partition=[
            "category",  # Identity partition on category
            iceberg_partition.year("event_time"),
        ],
    )

    # The resource's hints now contain the Iceberg partition specs:
    # resource.compute_table_schema().get('x-iceberg-partition')
    # [
    #     {'transform': 'identity', 'source_column': 'category'},
    #     {'transform': 'year', 'source_column': 'event_time'},
    # ]

    # Or in case of using an existing DltResource
    @dlt.resource
    def my_data():
        yield [{"value": "abc"}]

    iceberg_adapter(my_data, partition="value")
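The three accepted shapes of the `partition` argument (a column name, a spec object, or a sequence mixing both) can be reduced to one list of specs. The sketch below illustrates that normalization under stated assumptions: `SketchSpec` and `normalize_partition` are hypothetical names, and the code is not taken from dlt.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass(frozen=True)
class SketchSpec:
    """Hypothetical stand-in for PartitionSpec."""

    source_column: str
    transform: str = "identity"


PartitionArg = Union[str, SketchSpec, Sequence[Union[str, SketchSpec]]]


def normalize_partition(partition: Optional[PartitionArg]) -> List[SketchSpec]:
    """Turn any accepted `partition` shape into a list of specs."""
    if partition is None:
        raise ValueError("`partition` must be specified")
    # A bare string or spec is treated as a one-element sequence.
    if isinstance(partition, (str, SketchSpec)):
        items = [partition]
    else:
        items = list(partition)
    specs: List[SketchSpec] = []
    for item in items:
        if isinstance(item, str):
            specs.append(SketchSpec(item))  # plain names default to identity
        elif isinstance(item, SketchSpec):
            specs.append(item)
        else:
            raise TypeError(f"Unsupported partition entry: {item!r}")
    return specs


print(normalize_partition(["category", SketchSpec("event_time", "year")]))
```

Checking for `str` before iterating matters: a string is itself a sequence, so without that check `"value"` would be split into single-character column names.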

parse_partition_hints

def parse_partition_hints(
    table_schema: PreparedTableSchema) -> List[PartitionSpec]

Parse PARTITION_HINT from table schema into PartitionSpec list.

Arguments:

  • table_schema - dlt table schema containing partition hints

Returns:

List of PartitionSpec objects from hints, empty list if no hints found
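A rough picture of this parsing step, using plain dictionaries in place of a dlt table schema: the sketch assumes that the hint is stored under the `x-iceberg-partition` key shown in the `iceberg_adapter` example and that each entry carries `transform` and `source_column` fields. `HintSpec` and `parse_hints_sketch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumed hint key, taken from the iceberg_adapter example in this page.
PARTITION_HINT = "x-iceberg-partition"


@dataclass(frozen=True)
class HintSpec:
    """Hypothetical stand-in for PartitionSpec."""

    source_column: str
    transform: str = "identity"


def parse_hints_sketch(table_schema: Dict[str, Any]) -> List[HintSpec]:
    """Convert stored hint dictionaries into spec objects; empty list if none."""
    hints = table_schema.get(PARTITION_HINT) or []
    return [
        HintSpec(hint["source_column"], hint.get("transform", "identity"))
        for hint in hints
    ]


schema = {
    "name": "events",
    PARTITION_HINT: [
        {"transform": "identity", "source_column": "category"},
        {"transform": "year", "source_column": "event_time"},
    ],
}
print(parse_hints_sketch(schema))
print(parse_hints_sketch({"name": "no_hints"}))  # []
```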

create_identity_specs

def create_identity_specs(column_names: List[str]) -> List[PartitionSpec]

Create identity partition specs from column names.

Arguments:

  • column_names - List of column names to partition by identity

Returns:

List of PartitionSpec objects with identity transform

build_iceberg_partition_spec

def build_iceberg_partition_spec(
    arrow_schema: pa.Schema, spec_list: Sequence[PartitionSpec]
) -> tuple[IcebergPartitionSpec, IcebergSchema]

Turn a dlt PartitionSpec list into a PyIceberg PartitionSpec. Returns the PartitionSpec and the IcebergSchema derived from the Arrow schema.
