This page documents the process of binding transforms to source types and applying them to literal values. Transform binding creates type-specific TransformFunction instances from generic Transform specifications. Transform application converts source values to partition values using the bound functions.
For information about the overall transform system architecture, see Transform System Overview. For details on transform types and their parameters, see Transform Types. For predicate projection using transforms, see Predicate Projection.
Transform binding is the process of associating a Transform with a specific source type to create a TransformFunction. The Transform::Bind() method performs this operation, validating type compatibility and instantiating the appropriate concrete transform function.
The Transform::Bind() method is the primary entry point for transform binding.
It validates type compatibility using CanTransform() internally, then dispatches to the appropriate TransformFunction::Make() factory method based on the transform type. It returns either a bound TransformFunction or an error if the transform cannot be applied to the source type.
Sources: src/iceberg/transform.h145-152 src/iceberg/transform.cc94-132
Transform Binding Process from Transform to TransformFunction
Sources: src/iceberg/transform.cc94-132
The Transform::Bind() method uses a switch statement on transform_type_ to instantiate the appropriate concrete transform function:
| Transform Type | Factory Method | Required Parameter | Code Reference |
|---|---|---|---|
| kIdentity | IdentityTransform::Make() | None | src/iceberg/transform.cc99-100 |
| kBucket | BucketTransform::Make() | num_buckets (int32) | src/iceberg/transform.cc102-108 |
| kTruncate | TruncateTransform::Make() | width (int32) | src/iceberg/transform.cc110-116 |
| kYear | YearTransform::Make() | None | src/iceberg/transform.cc118-119 |
| kMonth | MonthTransform::Make() | None | src/iceberg/transform.cc120-121 |
| kDay | DayTransform::Make() | None | src/iceberg/transform.cc122-123 |
| kHour | HourTransform::Make() | None | src/iceberg/transform.cc124-125 |
| kVoid | VoidTransform::Make() | None | src/iceberg/transform.cc126-127 |
For parameterized transforms (kBucket, kTruncate), the Bind() method uses std::get_if<int32_t>(&param_) to extract the parameter from the std::variant<std::monostate, int32_t> param_ member variable and passes it to the factory method. If the parameter is missing, it returns an InvalidArgument error.
Sources: src/iceberg/transform.h227-229 src/iceberg/transform.cc98-132
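The parameter extraction described above can be illustrated with a small standalone sketch. It is not the library's code: TransformSketch, BindOutcome, and the error strings are hypothetical stand-ins, and the real Bind() returns a Result holding a TransformFunction. Only the std::variant<std::monostate, int32_t> member and the std::get_if<int32_t> lookup come from the text.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical subset of the transform kinds, for illustration only.
enum class TransformType { kIdentity, kBucket, kTruncate };

// Hypothetical stand-in for Result<...>: a flag plus a description.
struct BindOutcome {
  bool ok;
  std::string detail;  // what was bound on success, error text on failure
};

class TransformSketch {
 public:
  explicit TransformSketch(TransformType type) : type_(type) {}
  TransformSketch(TransformType type, int32_t param) : type_(type), param_(param) {}

  // Dispatch on the transform kind; parameterized kinds need an int32 parameter.
  BindOutcome Bind() const {
    switch (type_) {
      case TransformType::kIdentity:
        return {true, "identity"};
      case TransformType::kBucket:
        // get_if yields nullptr when the variant holds std::monostate.
        if (const int32_t* n = std::get_if<int32_t>(&param_)) {
          return {true, "bucket[" + std::to_string(*n) + "]"};
        }
        return {false, "InvalidArgument: bucket requires num_buckets"};
      case TransformType::kTruncate:
        if (const int32_t* w = std::get_if<int32_t>(&param_)) {
          return {true, "truncate[" + std::to_string(*w) + "]"};
        }
        return {false, "InvalidArgument: truncate requires width"};
    }
    return {false, "InvalidArgument: unsupported transform"};
  }

 private:
  TransformType type_;
  std::variant<std::monostate, int32_t> param_;  // monostate means "no parameter"
};
```

In this sketch a default-constructed variant holds std::monostate, so a bucket or truncate transform created without a parameter takes the error path instead of reading an uninitialized value.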
The CanTransform() method determines whether a transform can be applied to a given source type. This check occurs before binding and provides a fast way to validate compatibility without creating the transform function.
bool Transform::CanTransform(const Type& source_type) const
Sources: src/iceberg/transform.h154-157 src/iceberg/transform.cc134-201
Transform::CanTransform() Compatibility Rules
| Transform Type | Compatible TypeId Values | Implementation |
|---|---|---|
| kIdentity | All primitive types | src/iceberg/transform.cc136-140 |
| kBucket | kInt, kLong, kDecimal, kDate, kTime, kTimestamp, kTimestampTz, kString, kUuid, kFixed, kBinary | src/iceberg/transform.cc144-160 |
| kTruncate | kInt, kLong, kString, kBinary, kDecimal | src/iceberg/transform.cc161-171 |
| kYear, kMonth | kDate, kTimestamp, kTimestampTz | src/iceberg/transform.cc172-181 |
| kDay | kDate, kTimestamp, kTimestampTz | src/iceberg/transform.cc182-190 |
| kHour | kTimestamp, kTimestampTz (not kDate) | src/iceberg/transform.cc191-198 |
| kVoid, kUnknown | All types | src/iceberg/transform.cc141-143 |
The CanTransform() method performs these checks without instantiating a TransformFunction, making it suitable for fast validation during schema operations and partition spec building.
Sources: src/iceberg/transform.h154-157 src/iceberg/transform.cc134-201
The CanTransform() method uses nested switch statements to check compatibility:
Identity Transform: Accepts any primitive type, rejects nested types (struct, list, map).
Void and Unknown Transforms: Accept any type.
Bucket Transform: Accepts numeric types, temporal types, string types, UUID, fixed, and binary. Rejects float/double and nested types.
Truncate Transform: Accepts int, long, decimal, string, and binary. Rejects other types.
Temporal Transforms (Year, Month, Day): Accept date, timestamp, and timestamp_tz types only.
Hour Transform: Accepts timestamp and timestamp_tz types only (not date).
Sources: src/iceberg/transform.cc134-201
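The compatibility rules above can be written as a nested switch. The following is a simplified sketch rather than the library's implementation: the enums are redeclared locally, CanTransformSketch and IsPrimitive are hypothetical names, and the TypeId values not listed in the table (kBoolean, kFloat, kDouble, kStruct, kList, kMap) are assumed for illustration.

```cpp
#include <cstdint>

// Local, hypothetical copies of the enums; names beyond those in the table are assumed.
enum class TypeId : std::uint8_t {
  kBoolean, kInt, kLong, kFloat, kDouble, kDecimal, kDate, kTime, kTimestamp,
  kTimestampTz, kString, kUuid, kFixed, kBinary, kStruct, kList, kMap
};

enum class TransformType : std::uint8_t {
  kUnknown, kIdentity, kBucket, kTruncate, kYear, kMonth, kDay, kHour, kVoid
};

// Nested types are struct, list, and map; everything else counts as primitive.
constexpr bool IsPrimitive(TypeId id) {
  return id != TypeId::kStruct && id != TypeId::kList && id != TypeId::kMap;
}

constexpr bool CanTransformSketch(TransformType transform, TypeId source) {
  switch (transform) {
    case TransformType::kIdentity:
      return IsPrimitive(source);
    case TransformType::kVoid:
    case TransformType::kUnknown:
      return true;
    case TransformType::kBucket:
      switch (source) {
        case TypeId::kInt: case TypeId::kLong: case TypeId::kDecimal:
        case TypeId::kDate: case TypeId::kTime: case TypeId::kTimestamp:
        case TypeId::kTimestampTz: case TypeId::kString: case TypeId::kUuid:
        case TypeId::kFixed: case TypeId::kBinary:
          return true;
        default:
          return false;  // float, double, boolean, and nested types
      }
    case TransformType::kTruncate:
      switch (source) {
        case TypeId::kInt: case TypeId::kLong: case TypeId::kDecimal:
        case TypeId::kString: case TypeId::kBinary:
          return true;
        default:
          return false;
      }
    case TransformType::kYear:
    case TransformType::kMonth:
    case TransformType::kDay:
      return source == TypeId::kDate || source == TypeId::kTimestamp ||
             source == TypeId::kTimestampTz;
    case TransformType::kHour:
      // A date has no time-of-day component, so hour does not apply to it.
      return source == TypeId::kTimestamp || source == TypeId::kTimestampTz;
  }
  return false;
}
```

Because the check only inspects two enum values, it allocates nothing, which is consistent with its use for fast validation during schema operations.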
TransformFunction is the abstract base class for all concrete transform implementations. It provides the interface for applying transforms to literal values and querying result types.
Concrete Transform Function Implementations
Each concrete class implements:
- Result<Literal> Transform(const Literal& literal): Applies the transform
- std::shared_ptr<Type> ResultType() const: Returns output type
- static Result<std::unique_ptr<TransformFunction>> Make(...): Factory method

Sources: src/iceberg/transform.h244-275 src/iceberg/transform_function.h1-207
The TransformFunction base class defines the interface that all concrete transform implementations must follow:
Constructor (src/iceberg/transform.cc414-416):
Stores the transform type enum and source type for later queries.
Transform Method (src/iceberg/transform.h249-252):
Pure virtual method that applies the transform to a literal value. All implementations must return null for null input values per the Iceberg specification.
ResultType Method (src/iceberg/transform.h254-263):
Pure virtual method that returns the output type of the transform. This defines both the physical representation (which must conform to Iceberg spec) and the display representation for partition fields.
Accessor Methods:
- transform_type() (src/iceberg/transform.cc418): Returns the TransformType enum value
- source_type() (src/iceberg/transform.cc420-422): Returns the bound source type as const std::shared_ptr<Type>&

Sources: src/iceberg/transform.h244-275 src/iceberg/transform.cc414-422
Each concrete transform function implements the TransformFunction interface with type-specific logic.
The IdentityTransform class returns the input value unchanged, used for direct partitioning without transformation. This is the simplest transform and is commonly used when you want partition values to exactly match the source column values.
Result Type: Same as source type
Compatible Types: All primitive types (verified by Type::is_primitive())
Factory Method: IdentityTransform::Make(source_type) (src/iceberg/transform_function.h42-43)
Transform Implementation: Returns literal unchanged after null check
Transform Data Flow for IdentityTransform
Sources: src/iceberg/transform_function.h27-44
The BucketTransform class hashes input values into N buckets using a 32-bit Murmur3 hash function. This enables uniform distribution of values across partitions based on hash values.
Result Type: INT32 (bucket number from 0 to num_buckets-1)
Compatible Types: Int, Long, Decimal, Date, Time, Timestamp, TimestampTz, String, UUID, Fixed, Binary
Parameters: num_buckets (int32) - stored in private member num_buckets_
Factory Method: BucketTransform::Make(source_type, num_buckets) (src/iceberg/transform_function.h69-70)
Hash Algorithm: Murmur3 32-bit hash as specified in Iceberg spec Appendix B
The bucket value is computed as: (hash(value) & Integer.MAX_VALUE) % num_buckets
Bucket Transform Data Flow
Sources: src/iceberg/transform_function.h46-74 src/iceberg/test/bucket_util_test.cc96-99
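The bucket formula above, written with Java's Integer.MAX_VALUE, translates to C++ as shown in this minimal sketch. BucketIndex is a hypothetical helper name, and the 32-bit hash is taken as an input rather than computed, so the Murmur3 step itself is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Maps a signed 32-bit hash to a bucket in [0, num_buckets).
// Masking with INT32_MAX clears the sign bit, so the modulo operand is never negative.
inline int32_t BucketIndex(int32_t hash, int32_t num_buckets) {
  assert(num_buckets > 0);
  return (hash & std::numeric_limits<int32_t>::max()) % num_buckets;
}
```

Masking is used instead of taking an absolute value because the absolute value of the smallest 32-bit integer is not representable; with the mask that hash simply maps to bucket 0.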
The TruncateTransform class truncates values to a specified width. The behavior varies by source type to maintain semantic correctness.
Result Type: Same as source type
Compatible Types: Int, Long, Decimal, String, Binary
Parameters: width (int32) - stored in private member width_, accessible via width() (src/iceberg/transform_function.h90)
Factory Method: TruncateTransform::Make(source_type, width) (src/iceberg/transform_function.h96-97)
Behavior by Type:
- Integer types: value - (value % width) for positive values
- String: first width Unicode code points using StringUtils::TruncateCodePoints()
- Binary: first width bytes

Truncate Transform for Different Types
Sources: src/iceberg/transform_function.h76-101 src/iceberg/test/transform_test.cc392-430
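The per-type behavior can be sketched with three small standalone functions. These are illustrative helpers with hypothetical names, not the library's StringUtils::TruncateCodePoints() or its TruncateTransform. The integer version uses the floor-style formula from the Iceberg specification, v - (((v % W) + W) % W), which reduces to value - (value % width) for positive values and rounds negative values down. The string version assumes valid UTF-8 input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Rounds value down (toward negative infinity) to a multiple of width.
inline int64_t TruncateInteger(int64_t value, int32_t width) {
  assert(width > 0);
  const int64_t remainder = ((value % width) + width) % width;  // always in [0, width)
  return value - remainder;
}

// Keeps at most `width` Unicode code points of a UTF-8 string (assumes valid UTF-8).
inline std::string TruncateUtf8(const std::string& value, int32_t width) {
  assert(width > 0);
  std::size_t pos = 0;
  int32_t code_points = 0;
  while (pos < value.size()) {
    // Any byte that is not a continuation byte (10xxxxxx) starts a new code point.
    const bool starts_code_point =
        (static_cast<unsigned char>(value[pos]) & 0xC0) != 0x80;
    if (starts_code_point) {
      if (code_points == width) break;  // the next code point would exceed the width
      ++code_points;
    }
    ++pos;
  }
  return value.substr(0, pos);
}

// Keeps at most `width` bytes of a binary value.
inline std::vector<uint8_t> TruncateBinary(const std::vector<uint8_t>& value,
                                           int32_t width) {
  assert(width > 0);
  const std::size_t n = std::min(value.size(), static_cast<std::size_t>(width));
  return std::vector<uint8_t>(value.begin(),
                              value.begin() + static_cast<std::ptrdiff_t>(n));
}
```

Counting code points rather than bytes matters for multi-byte characters: truncating "h\xC3\xA9llo" (where \xC3\xA9 encodes a single accented letter) to width 2 keeps three bytes, not two.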
The YearTransform class extracts the year component from temporal values, represented as the number of years since 1970.
Result Type: INT32 (years from epoch)
Compatible Types: Date, Timestamp, TimestampTz
Factory Method: YearTransform::Make(source_type) (src/iceberg/transform_function.h119-120)
Implementation: Delegates to TemporalUtils::ExtractYear() (src/iceberg/util/temporal_util.cc153-170)
Example: 2021-06-01 → Literal::Int(51) (51 years since 1970)
Year Transform Processing
Sources: src/iceberg/transform_function.h103-121 src/iceberg/util/temporal_util.cc63-84 src/iceberg/test/transform_test.cc449-481
The MonthTransform class extracts the month component from temporal values, represented as the number of months since 1970-01.
Result Type: INT32 (months from epoch)
Compatible Types: Date, Timestamp, TimestampTz
Factory Method: MonthTransform::Make(source_type) (src/iceberg/transform_function.h139-140)
Implementation: Delegates to TemporalUtils::ExtractMonth() (src/iceberg/util/temporal_util.cc176-193)
The month calculation uses: (year - 1970) * 12 + (month - 1) where month is 1-indexed.
Example: 2021-06-01 → Literal::Int(617) (617 months since 1970-01)
Sources: src/iceberg/transform_function.h122-141 src/iceberg/util/temporal_util.cc54-108 src/iceberg/test/transform_test.cc500-514
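The year and month calculations can be reproduced with plain integer arithmetic. The sketch below is not TemporalUtils::ExtractYear() or ExtractMonth(); it swaps in a widely used days-to-civil-date integer algorithm so that it has no dependencies, and CivilDate, CivilFromDays, YearsSinceEpoch, and MonthsSinceEpoch are hypothetical names. The input is a date expressed as days since 1970-01-01.

```cpp
#include <cstdint>

struct CivilDate {
  int32_t year;
  int32_t month;  // 1-indexed: January is 1
  int32_t day;
};

// Converts days since 1970-01-01 to a proleptic Gregorian date.
// The algorithm works on 400-year eras with years that start on March 1.
constexpr CivilDate CivilFromDays(int32_t days_since_epoch) {
  const int64_t z = static_cast<int64_t>(days_since_epoch) + 719468;
  const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
  const int64_t doe = z - era * 146097;  // day of era, [0, 146096]
  const int64_t yoe =
      (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // year of era, [0, 399]
  const int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);  // day of March-based year
  const int64_t mp = (5 * doy + 2) / 153;  // March-based month, [0, 11]
  const int64_t day = doy - (153 * mp + 2) / 5 + 1;
  const int64_t month = mp < 10 ? mp + 3 : mp - 9;
  const int64_t year = yoe + era * 400 + (month <= 2 ? 1 : 0);
  return CivilDate{static_cast<int32_t>(year), static_cast<int32_t>(month),
                   static_cast<int32_t>(day)};
}

// Years since 1970, as produced by the year transform.
constexpr int32_t YearsSinceEpoch(int32_t days_since_epoch) {
  return CivilFromDays(days_since_epoch).year - 1970;
}

// Months since 1970-01: (year - 1970) * 12 + (month - 1).
constexpr int32_t MonthsSinceEpoch(int32_t days_since_epoch) {
  const CivilDate date = CivilFromDays(days_since_epoch);
  return (date.year - 1970) * 12 + (date.month - 1);
}
```

For 2021-06-01, which is day 18779 since the epoch, this gives 51 years and 51 * 12 + 5 = 617 months, matching the examples above. Dates before 1970 produce negative values, for example -1 for both functions on 1969-12-31.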
The DayTransform class extracts the day component from temporal values, represented as the number of days since 1970-01-01.
Result Type: DATE (for display), physically INT32 (for storage)
Compatible Types: Date, Timestamp, TimestampTz
Factory Method: DayTransform::Make(source_type) (src/iceberg/transform_function.h163-164)
Implementation: Delegates to TemporalUtils::ExtractDay() (src/iceberg/util/temporal_util.cc199-216)
Special Note: The ResultType() method returns DateType (src/iceberg/transform_function.h153-158) to provide a more human-readable display representation of partition fields, while the physical representation conforms to the Iceberg spec as INT32.
For Date sources, the value is returned unchanged. For Timestamp sources, the timestamp is converted to days using floor<days>(sys_time<microseconds>(...)).
Sources: src/iceberg/transform_function.h141-165 src/iceberg/util/temporal_util.cc42-129 src/iceberg/test/transform_test.cc533-564
The HourTransform class extracts the hour component from timestamp values, represented as the number of hours since the Unix epoch.
Result Type: INT32 (hours from epoch)
Compatible Types: Timestamp, TimestampTz (note: Date is not supported)
Factory Method: HourTransform::Make(source_type) (src/iceberg/transform_function.h183-184)
Implementation: Delegates to TemporalUtils::ExtractHour() (src/iceberg/util/temporal_util.cc222-238)
The hour calculation uses floor<hours>(sys_time<microseconds>(...)) to convert microseconds to hours.
Example: Timestamp 1622547800000000 (2021-06-01T11:43:20Z) → Literal::Int(450707) hours since epoch
Sources: src/iceberg/transform_function.h164-185 src/iceberg/util/temporal_util.cc46-145 src/iceberg/test/transform_test.cc583-593
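The floor-based conversions used by the day and hour transforms can be sketched with std::chrono duration arithmetic. This is a simplified illustration, not TemporalUtils::ExtractDay() or ExtractHour(): it floors a microsecond duration directly instead of going through sys_time, and Days, DaysSinceEpoch, and HoursSinceEpoch are hypothetical names. It assumes C++17 for std::chrono::floor.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// A day-length duration defined locally so the sketch does not need C++20's chrono::days.
using Days = std::chrono::duration<int32_t, std::ratio<86400>>;

// Days since 1970-01-01 for a timestamp in microseconds since the epoch.
inline int32_t DaysSinceEpoch(int64_t micros) {
  return std::chrono::floor<Days>(std::chrono::microseconds{micros}).count();
}

// Hours since 1970-01-01T00:00 for a timestamp in microseconds since the epoch.
inline int32_t HoursSinceEpoch(int64_t micros) {
  return static_cast<int32_t>(
      std::chrono::floor<std::chrono::hours>(std::chrono::microseconds{micros})
          .count());
}
```

Flooring rather than truncating toward zero is what keeps pre-epoch timestamps correct: one microsecond before the epoch belongs to hour -1 and day -1, not to hour 0 and day 0.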
The VoidTransform class always returns null, discarding the input value. This is useful for hidden partitioning fields or placeholders in partition specs.
Result Type: Same as source type (but always null)
Compatible Types: All types
Factory Method: VoidTransform::Make(source_type) (src/iceberg/transform_function.h202-203)
Transform Implementation: Returns Literal::Null(source_type()) for any input
Void Transform Behavior
Sources: src/iceberg/transform_function.h187-204 src/iceberg/test/transform_test.cc595-672
Transforms have two key order-related properties that affect sort ordering and query optimization.
bool Transform::PreservesOrder() const
Returns true if the transform maintains the ordering of input values. A transform preserves order if transform(a) <= transform(b) whenever a <= b.
| Transform Type | Preserves Order |
|---|---|
| Identity | ✓ Yes |
| Truncate | ✓ Yes |
| Year | ✓ Yes |
| Month | ✓ Yes |
| Day | ✓ Yes |
| Hour | ✓ Yes |
| Bucket | ✗ No |
| Void | ✗ No |
| Unknown | ✗ No |
Sources: src/iceberg/transform.h159-160 src/iceberg/transform.cc203-218
bool Transform::SatisfiesOrderOf(const Transform& other) const
Returns true if ordering by this transform's result satisfies the ordering requirements of the other transform's result. This is important for determining whether a sort order on one partition field satisfies queries on another.
Order Satisfaction Hierarchy for Temporal Transforms
Special Cases (src/iceberg/transform.cc220-247):
- Identity: returns other.PreservesOrder(), so it satisfies any order-preserving transform
- Truncate: truncate[W1] satisfies truncate[W2] if W1 >= W2 (wider truncation satisfies narrower)

Sources: src/iceberg/transform.h162-171 src/iceberg/transform.cc220-247
Example 1: Sorting by day(timestamp) satisfies queries that filter by month(timestamp) because all records in a given day belong to the same month.
Example 2: Sorting by truncate[10] satisfies queries on truncate[5] because values truncated to width 10 are also appropriately grouped for width 5.
Example 3: Sorting by bucket[16] does NOT satisfy queries on bucket[8] because the hash buckets don't have a consistent relationship.
Sources: src/iceberg/test/transform_test.cc754-847
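The order properties and the three examples can be modeled with a compact sketch. It is a standalone approximation, not the code in transform.cc: TransformSketch and Granularity are hypothetical, the temporal hierarchy (hour, then day, then month, then year) is assumed from Example 1, the truncate rule follows the W1 >= W2 condition stated above, and falling back to plain equality for the remaining transforms is an assumption.

```cpp
#include <cstdint>

enum class TransformType {
  kIdentity, kBucket, kTruncate, kYear, kMonth, kDay, kHour, kVoid, kUnknown
};

// Hypothetical value type: a transform kind plus its optional int32 parameter.
struct TransformSketch {
  TransformType type;
  int32_t param = 0;  // num_buckets or width; unused for other kinds
};

constexpr bool PreservesOrder(TransformType type) {
  switch (type) {
    case TransformType::kIdentity:
    case TransformType::kTruncate:
    case TransformType::kYear:
    case TransformType::kMonth:
    case TransformType::kDay:
    case TransformType::kHour:
      return true;
    default:
      return false;  // bucket, void, unknown
  }
}

// Rank of temporal transforms from coarse to fine; 0 means "not temporal".
constexpr int Granularity(TransformType type) {
  switch (type) {
    case TransformType::kYear: return 1;
    case TransformType::kMonth: return 2;
    case TransformType::kDay: return 3;
    case TransformType::kHour: return 4;
    default: return 0;
  }
}

constexpr bool SatisfiesOrderOf(const TransformSketch& self,
                                const TransformSketch& other) {
  switch (self.type) {
    case TransformType::kIdentity:
      // Ordering by the raw value satisfies any order-preserving transform.
      return PreservesOrder(other.type);
    case TransformType::kYear:
    case TransformType::kMonth:
    case TransformType::kDay:
    case TransformType::kHour:
      // A finer temporal granularity satisfies an equal or coarser one.
      return Granularity(other.type) != 0 &&
             Granularity(self.type) >= Granularity(other.type);
    case TransformType::kTruncate:
      return other.type == TransformType::kTruncate && self.param >= other.param;
    default:
      // Assumption: remaining transforms only satisfy an identical transform.
      return self.type == other.type && self.param == other.param;
  }
}
```

Under these assumptions day satisfies month but month does not satisfy day, truncate[10] satisfies truncate[5], and bucket[16] does not satisfy bucket[8].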
Transform application is the process of converting a source literal value to a partition value using a bound TransformFunction.
TransformFunction::Transform() Execution Flow
All concrete TransformFunction implementations follow the contract that null inputs produce null outputs with the appropriate result type.
Sources: src/iceberg/transform.h249-252 src/iceberg/test/transform_test.cc674-726
All transform functions must return null for null input values, as required by the Iceberg specification. This ensures consistent null semantics across all transform types.
Sources: src/iceberg/test/transform_test.cc674-726
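The null-in, null-out contract can be modeled generically. The sketch below uses std::optional as a stand-in for a nullable Literal; ApplyNullSafe and TruncateToTens are hypothetical names, and the real implementations return a typed null literal rather than an empty optional.

```cpp
#include <cstdint>
#include <optional>

// Applies fn to the value if present; an absent (null) input yields a null output
// without calling fn at all.
template <typename In, typename Fn>
auto ApplyNullSafe(const std::optional<In>& input, Fn&& fn)
    -> std::optional<decltype(fn(*input))> {
  if (!input.has_value()) {
    return std::nullopt;
  }
  return fn(*input);
}

// Example transform body: truncate a non-negative integer to a multiple of ten.
inline int64_t TruncateToTens(int64_t value) { return value - (value % 10); }
```

Handling the null case in one place before the type-specific logic runs means each transform body only has to deal with present values.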
The complete flow from creating a transform to applying it to a literal value consists of the following steps:
1. Create Transform (src/iceberg/transform.cc54-57)
2. Bind to Type (src/iceberg/transform.cc118-119)
3. Apply to Literal (src/iceberg/test/transform_test.cc441-443)
4. Query Result Type (src/iceberg/test/transform_test.cc165-168)
Sources: src/iceberg/test/transform_test.cc432-481 src/iceberg/transform.cc54-119
Using a parameterized transform (bucket with 4 buckets) follows the same steps, plus access to the parameter:
1. Create Transform (src/iceberg/transform.cc79-81)
2. Bind to Type (src/iceberg/transform.cc102-108)
3. Apply to Literal (src/iceberg/test/transform_test.cc297-300)
4. Access Parameters (src/iceberg/transform_function.h63)
Sources: src/iceberg/test/transform_test.cc289-375 src/iceberg/test/bucket_util_test.cc36-37 src/iceberg/transform.cc79-108
The truncate transform with a width parameter follows the same steps:
1. Create Transform (src/iceberg/transform.cc83-85)
2. Bind to Type (src/iceberg/transform.cc110-116)
3. Apply to Literal (src/iceberg/test/transform_test.cc384-386)
4. String Truncation Example (src/iceberg/test/transform_test.cc415-419)
Sources: src/iceberg/test/transform_test.cc377-430 src/iceberg/transform.cc83-116
Each transform function defines its output type through the ResultType() method. The result type determines both the physical representation and the display representation of partition fields:
| Transform | Source Type | Result Type | Notes |
|---|---|---|---|
| Identity | Any | Same as source | Unchanged |
| Bucket | Int/Long/String/etc | INT32 | Bucket number |
| Truncate | Int/Long | Same as source | Rounded value |
| Truncate | String | STRING | Truncated string |
| Truncate | Binary | BINARY | Truncated bytes |
| Truncate | Decimal | DECIMAL(P,S) | Same precision/scale |
| Year | Date/Timestamp | INT32 | Years from 1970 |
| Month | Date/Timestamp | INT32 | Months from 1970-01 |
| Day | Date/Timestamp | DATE | Days from 1970-01-01 |
| Hour | Timestamp | INT32 | Hours from 1970-01-01 00:00 |
| Void | Any | Same as source | Always null |
Sources: src/iceberg/test/transform_test.cc123-169
Note that DayTransform returns DateType as the result type for improved readability, but this is physically stored as INT32 according to the Iceberg specification. The display type provides a more human-readable representation of partition fields.
Sources: src/iceberg/transform_function.h150-155
Transform binding and application use the Result<T> type for error handling, returning errors in the following cases:
Type Incompatibility: Bind() returns an error when it is called with a source type that the transform does not support.
Missing Parameters: Bind() returns an InvalidArgument error when a parameterized transform (bucket or truncate) lacks its required parameter.
Invalid Source Values: Individual transform implementations may return errors for invalid values (though most handle edge cases gracefully).
Sources: src/iceberg/transform.cc94-132 src/iceberg/test/transform_test.cc171-196