GH-838: [FlightSQL][JDBC] Fix timestamp unit conversion and sub-millisecond precision in PreparedStatement parameter binding#1081
Open
dmitry-chirkov-dremio wants to merge 5 commits intoapache:mainfrom
Conversation
This comment has been minimized.
This comment has been minimized.
…aredStatement parameter binding
c43b2e8 to
6d63367
Compare
…ecision in PreparedStatement parameter binding Intercept raw java.sql.Timestamp objects at the JDBC layer before Avatica serializes them to epoch milliseconds (losing sub-ms nanos). Propagate the original Timestamp through ArrowFlightMetaImpl and AvaticaParameterBinder to TimestampAvaticaParameterConverter, which reconstructs full-precision epoch values from Timestamp.getTime() + Timestamp.getNanos(). Add unit tests for micro/nano/milli/second precision with raw Timestamp, and integration tests exercising the full setTimestamp() path through the mock FlightSQL server.
Author
jbonofre
approved these changes
Mar 21, 2026
Member
jbonofre
left a comment
There was a problem hiding this comment.
Good one ! Thanks @dmitry-chirkov-dremio !
I just have a suggestion for the convertFromMillis method.
...ain/java/org/apache/arrow/driver/jdbc/converter/impl/TimestampAvaticaParameterConverter.java
Outdated
Show resolved
Hide resolved
xborder
reviewed
Mar 21, 2026
Contributor
There was a problem hiding this comment.
overall LGTM. I'd probably just add a few tests for edge cases. Examples:
- Change the Timestamp object after setting the parameter. If we're keeping a reference to the object, is it possible to change it after the bind?
- Setting timestamp with null on a string with multiple parameters
- Test setObject
- Use setObject for a timestamp parameter, before/after setTimestamp
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rationale for this change
The FlightSQL JDBC driver's
PreparedStatement.setTimestamp()has two bugs (#838):Unit scaling bug:
TimestampAvaticaParameterConverterwrites raw epoch millisecond values directly into Arrow timestamp vectors regardless of the vector's time unit. This causes data corruption when the server uses SECOND, MICROSECOND, or NANOSECOND precision — the millisecond value is misinterpreted as the target unit (e.g., epoch millis treated as epoch micros → dates near 1970).Precision loss bug: The underlying Avatica library serializes
java.sql.Timestampinto along(epoch milliseconds) viaTypedValue.ofJdbc(), discarding the sub-millisecond nanoseconds fromTimestamp.getNanos()before they reach the Arrow converters. This means even with correct unit scaling, microsecond and nanosecond precision is permanently lost.What changes are included in this PR?
Fix 1 — Unit scaling (converter layer):
TimestampAvaticaParameterConverterto store theArrowType.Timestampand scale the epoch millisecond value to the target unit before writing to the vector:Fix 2 — Sub-millisecond precision preservation (JDBC interception layer):
ArrowFlightPreparedStatement: OverridessetTimestamp()to capture the rawjava.sql.Timestampobjects (with full nanosecond precision) before Avatica serializes them to millis. Also overridesclearParameters()to clean up.ArrowFlightMetaImpl: AddedgetRawTimestamps()to bridge the captured timestamps from the statement to the parameter binder.AvaticaParameterBinder: Updatedbind()andBinderVisitorto propagate the optional rawTimestampto converters.TimestampAvaticaParameterConverter: AddedbindParameter(vector, typedValue, index, rawTimestamp)overload andconvertFromTimestamp(Timestamp)that reconstructs full-precision epoch values fromTimestamp.getTime()(epoch seconds) +Timestamp.getNanos()(fractional second in nanos), avoiding the Avatica millis truncation.ListAvaticaParameterConverter,LargeListAvaticaParameterConverter,FixedSizeListAvaticaParameterConverter: Updated to passnullas the raw timestamp argument to the newBinderVisitorconstructor.Are these changes tested?
Yes.
TimestampAvaticaParameterConverterTest): Added tests for all 8 vector types (4 units × with/without timezone) for the millis-based scaling, plus 6 new tests for the rawTimestamppath covering micro precision, nano precision, milli truncation, second truncation, and null-rawTimestamp fallback.ArrowFlightPreparedStatementTest): AddedtestTimestampParameterWithMicrosecondPrecisionandtestTimestampParameterWithMillisecondPrecisionthat exercise the fullsetTimestamp()→ interception → binding → mock server validation path.Are there any user-facing changes?
This PR includes breaking changes to public APIs.
PreparedStatement#setTimestamp()now correctly converts timestamp values for non-millisecond precision columns. Previously, timestamps inserted via the FlightSQL JDBC driver into SECOND, MICROSECOND, or NANOSECOND columns were silently corrupted. Sub-millisecond precision (microseconds, nanoseconds) is now preserved when the target column supports it.This PR contains a "Critical Fix". The bug caused incorrect data to be produced — epoch milliseconds were written as-is into microsecond/nanosecond vectors, resulting in wildly wrong dates (e.g., 2024 → 1970). Additionally, sub-millisecond precision was silently discarded.
Closes #838