Azure Data Factory v2 - wrong year copying from parquet to SQL DB
I'm having a weird issue with Azure Data Factory v2. There's a Spark Job which is running and producing parquet files as output, an ADFv2 copy activity then takes the output parquet and copies the data into an Azure SQL Database. All is working fine except for dates! When the data lands in SQL the year is 1969 years out. So todays date (2018-11-22) would land as 3987-11-22.
I've tried changing the source and destination types between Date, DateTime, DateTimeOffset and String but with no success. At the moment I'm correcting the dates in the database but this is not really ideal.
I've opened the source parquet files using Parquet Viewer, Spark and Python (desktop) and they all correctly show the year as 2018
add a comment |
I'm having a weird issue with Azure Data Factory v2. There's a Spark Job which is running and producing parquet files as output, an ADFv2 copy activity then takes the output parquet and copies the data into an Azure SQL Database. All is working fine except for dates! When the data lands in SQL the year is 1969 years out. So todays date (2018-11-22) would land as 3987-11-22.
I've tried changing the source and destination types between Date, DateTime, DateTimeOffset and String but with no success. At the moment I'm correcting the dates in the database but this is not really ideal.
I've opened the source parquet files using Parquet Viewer, Spark and Python (desktop) and they all correctly show the year as 2018
add a comment |
I'm having a weird issue with Azure Data Factory v2. There's a Spark Job which is running and producing parquet files as output, an ADFv2 copy activity then takes the output parquet and copies the data into an Azure SQL Database. All is working fine except for dates! When the data lands in SQL the year is 1969 years out. So todays date (2018-11-22) would land as 3987-11-22.
I've tried changing the source and destination types between Date, DateTime, DateTimeOffset and String but with no success. At the moment I'm correcting the dates in the database but this is not really ideal.
I've opened the source parquet files using Parquet Viewer, Spark and Python (desktop) and they all correctly show the year as 2018
I'm having a weird issue with Azure Data Factory v2. There's a Spark Job which is running and producing parquet files as output, an ADFv2 copy activity then takes the output parquet and copies the data into an Azure SQL Database. All is working fine except for dates! When the data lands in SQL the year is 1969 years out. So todays date (2018-11-22) would land as 3987-11-22.
I've tried changing the source and destination types between Date, DateTime, DateTimeOffset and String but with no success. At the moment I'm correcting the dates in the database but this is not really ideal.
I've opened the source parquet files using Parquet Viewer, Spark and Python (desktop) and they all correctly show the year as 2018
asked Nov 22 '18 at 14:28
daz-fullerdaz-fuller
8021613
8021613
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
According to parquet date type definition,
https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
The date is stored as "the number of days from the Unix epoch, 1 January 1970"
And ADF is using .net type doing the transformation. According to .net type definition, Time values are measured in 100-nanosecond units called ticks. A particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.)
https://docs.microsoft.com/en-us/dotnet/api/system.datetime?view=netframework-4.7.2
Seems extra 1969 is added for this reason. But not sure whether is this a bug. What is your parquet data type? is it Date? and what is the sql data type?
Could you provide the copy activity run id? Or maybe some parquet sample data?
Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue
– daz-fuller
Nov 26 '18 at 22:21
1
Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.
– Qingquan Yang
Nov 29 '18 at 6:01
I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly
– daz-fuller
Nov 29 '18 at 7:04
add a comment |
Based on Parquet encoding definitions,no Date, DateTime, DateTimeOffset and String format exist,so you do not need to try with these formats.
Based on this Data type mapping for Parquet files in Azure Data Factory:

The DateTimeOffset format corresponds to Int96,I suggest you trying this transmission on the source of parquet file.
The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value
– daz-fuller
Nov 26 '18 at 21:48
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53433123%2fazure-data-factory-v2-wrong-year-copying-from-parquet-to-sql-db%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
According to parquet date type definition,
https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
The date is stored as "the number of days from the Unix epoch, 1 January 1970"
And ADF is using .net type doing the transformation. According to .net type definition, Time values are measured in 100-nanosecond units called ticks. A particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.)
https://docs.microsoft.com/en-us/dotnet/api/system.datetime?view=netframework-4.7.2
Seems extra 1969 is added for this reason. But not sure whether is this a bug. What is your parquet data type? is it Date? and what is the sql data type?
Could you provide the copy activity run id? Or maybe some parquet sample data?
Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue
– daz-fuller
Nov 26 '18 at 22:21
1
Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.
– Qingquan Yang
Nov 29 '18 at 6:01
I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly
– daz-fuller
Nov 29 '18 at 7:04
add a comment |
According to parquet date type definition,
https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
The date is stored as "the number of days from the Unix epoch, 1 January 1970"
And ADF is using .net type doing the transformation. According to .net type definition, Time values are measured in 100-nanosecond units called ticks. A particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.)
https://docs.microsoft.com/en-us/dotnet/api/system.datetime?view=netframework-4.7.2
Seems extra 1969 is added for this reason. But not sure whether is this a bug. What is your parquet data type? is it Date? and what is the sql data type?
Could you provide the copy activity run id? Or maybe some parquet sample data?
Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue
– daz-fuller
Nov 26 '18 at 22:21
1
Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.
– Qingquan Yang
Nov 29 '18 at 6:01
I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly
– daz-fuller
Nov 29 '18 at 7:04
add a comment |
According to parquet date type definition,
https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
The date is stored as "the number of days from the Unix epoch, 1 January 1970"
And ADF is using .net type doing the transformation. According to .net type definition, Time values are measured in 100-nanosecond units called ticks. A particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.)
https://docs.microsoft.com/en-us/dotnet/api/system.datetime?view=netframework-4.7.2
Seems extra 1969 is added for this reason. But not sure whether is this a bug. What is your parquet data type? is it Date? and what is the sql data type?
Could you provide the copy activity run id? Or maybe some parquet sample data?
According to parquet date type definition,
https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
The date is stored as "the number of days from the Unix epoch, 1 January 1970"
And ADF is using .net type doing the transformation. According to .net type definition, Time values are measured in 100-nanosecond units called ticks. A particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.)
https://docs.microsoft.com/en-us/dotnet/api/system.datetime?view=netframework-4.7.2
Seems extra 1969 is added for this reason. But not sure whether is this a bug. What is your parquet data type? is it Date? and what is the sql data type?
Could you provide the copy activity run id? Or maybe some parquet sample data?
edited Nov 24 '18 at 5:11
answered Nov 24 '18 at 5:04
Qingquan YangQingquan Yang
1562
1562
Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue
– daz-fuller
Nov 26 '18 at 22:21
1
Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.
– Qingquan Yang
Nov 29 '18 at 6:01
I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly
– daz-fuller
Nov 29 '18 at 7:04
add a comment |
Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue
– daz-fuller
Nov 26 '18 at 22:21
1
Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.
– Qingquan Yang
Nov 29 '18 at 6:01
I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly
– daz-fuller
Nov 29 '18 at 7:04
Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue
– daz-fuller
Nov 26 '18 at 22:21
Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue
– daz-fuller
Nov 26 '18 at 22:21
1
1
Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.
– Qingquan Yang
Nov 29 '18 at 6:01
Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.
– Qingquan Yang
Nov 29 '18 at 6:01
I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly
– daz-fuller
Nov 29 '18 at 7:04
I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly
– daz-fuller
Nov 29 '18 at 7:04
add a comment |
Based on Parquet encoding definitions,no Date, DateTime, DateTimeOffset and String format exist,so you do not need to try with these formats.
Based on this Data type mapping for Parquet files in Azure Data Factory:

The DateTimeOffset format corresponds to Int96,I suggest you trying this transmission on the source of parquet file.
The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value
– daz-fuller
Nov 26 '18 at 21:48
add a comment |
Based on Parquet encoding definitions,no Date, DateTime, DateTimeOffset and String format exist,so you do not need to try with these formats.
Based on this Data type mapping for Parquet files in Azure Data Factory:

The DateTimeOffset format corresponds to Int96,I suggest you trying this transmission on the source of parquet file.
The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value
– daz-fuller
Nov 26 '18 at 21:48
add a comment |
Based on Parquet encoding definitions,no Date, DateTime, DateTimeOffset and String format exist,so you do not need to try with these formats.
Based on this Data type mapping for Parquet files in Azure Data Factory:

The DateTimeOffset format corresponds to Int96,I suggest you trying this transmission on the source of parquet file.
Based on Parquet encoding definitions,no Date, DateTime, DateTimeOffset and String format exist,so you do not need to try with these formats.
Based on this Data type mapping for Parquet files in Azure Data Factory:

The DateTimeOffset format corresponds to Int96,I suggest you trying this transmission on the source of parquet file.
edited Nov 26 '18 at 8:47
answered Nov 26 '18 at 8:37
Jay GongJay Gong
8,6001512
8,6001512
The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value
– daz-fuller
Nov 26 '18 at 21:48
add a comment |
The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value
– daz-fuller
Nov 26 '18 at 21:48
The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value
– daz-fuller
Nov 26 '18 at 21:48
The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value
– daz-fuller
Nov 26 '18 at 21:48
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53433123%2fazure-data-factory-v2-wrong-year-copying-from-parquet-to-sql-db%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown