Azure Data Factory v2 - wrong year copying from parquet to SQL DB












2















I'm having a weird issue with Azure Data Factory v2. There's a Spark Job which is running and producing parquet files as output, an ADFv2 copy activity then takes the output parquet and copies the data into an Azure SQL Database. All is working fine except for dates! When the data lands in SQL the year is 1969 years out. So todays date (2018-11-22) would land as 3987-11-22.



I've tried changing the source and destination types between Date, DateTime, DateTimeOffset and String but with no success. At the moment I'm correcting the dates in the database but this is not really ideal.



I've opened the source parquet files using Parquet Viewer, Spark and Python (desktop) and they all correctly show the year as 2018










share|improve this question



























    2















    I'm having a weird issue with Azure Data Factory v2. There's a Spark Job which is running and producing parquet files as output, an ADFv2 copy activity then takes the output parquet and copies the data into an Azure SQL Database. All is working fine except for dates! When the data lands in SQL the year is 1969 years out. So todays date (2018-11-22) would land as 3987-11-22.



    I've tried changing the source and destination types between Date, DateTime, DateTimeOffset and String but with no success. At the moment I'm correcting the dates in the database but this is not really ideal.



    I've opened the source parquet files using Parquet Viewer, Spark and Python (desktop) and they all correctly show the year as 2018










    share|improve this question

























      2












      2








      2








      I'm having a weird issue with Azure Data Factory v2. There's a Spark Job which is running and producing parquet files as output, an ADFv2 copy activity then takes the output parquet and copies the data into an Azure SQL Database. All is working fine except for dates! When the data lands in SQL the year is 1969 years out. So todays date (2018-11-22) would land as 3987-11-22.



      I've tried changing the source and destination types between Date, DateTime, DateTimeOffset and String but with no success. At the moment I'm correcting the dates in the database but this is not really ideal.



      I've opened the source parquet files using Parquet Viewer, Spark and Python (desktop) and they all correctly show the year as 2018










      share|improve this question














      I'm having a weird issue with Azure Data Factory v2. There's a Spark Job which is running and producing parquet files as output, an ADFv2 copy activity then takes the output parquet and copies the data into an Azure SQL Database. All is working fine except for dates! When the data lands in SQL the year is 1969 years out. So todays date (2018-11-22) would land as 3987-11-22.



      I've tried changing the source and destination types between Date, DateTime, DateTimeOffset and String but with no success. At the moment I'm correcting the dates in the database but this is not really ideal.



      I've opened the source parquet files using Parquet Viewer, Spark and Python (desktop) and they all correctly show the year as 2018







      azure azure-sql-database parquet azure-data-factory azure-data-factory-2






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 22 '18 at 14:28









      daz-fullerdaz-fuller

      8021613




      8021613
























          2 Answers
          2






          active

          oldest

          votes


















          2














          According to parquet date type definition,
          https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
          The date is stored as "the number of days from the Unix epoch, 1 January 1970"



          And ADF is using .net type doing the transformation. According to .net type definition, Time values are measured in 100-nanosecond units called ticks. A particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.)
          https://docs.microsoft.com/en-us/dotnet/api/system.datetime?view=netframework-4.7.2



          Seems extra 1969 is added for this reason. But not sure whether is this a bug. What is your parquet data type? is it Date? and what is the sql data type?
          Could you provide the copy activity run id? Or maybe some parquet sample data?






          share|improve this answer


























          • Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue

            – daz-fuller
            Nov 26 '18 at 22:21








          • 1





            Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.

            – Qingquan Yang
            Nov 29 '18 at 6:01











          • I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly

            – daz-fuller
            Nov 29 '18 at 7:04



















          1














          Based on Parquet encoding definitions,no Date, DateTime, DateTimeOffset and String format exist,so you do not need to try with these formats.



          Based on this Data type mapping for Parquet files in Azure Data Factory:



          enter image description here



          The DateTimeOffset format corresponds to Int96,I suggest you trying this transmission on the source of parquet file.






          share|improve this answer


























          • The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value

            – daz-fuller
            Nov 26 '18 at 21:48











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53433123%2fazure-data-factory-v2-wrong-year-copying-from-parquet-to-sql-db%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          2 Answers
          2






          active

          oldest

          votes








          2 Answers
          2






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          2














          According to parquet date type definition,
          https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
          The date is stored as "the number of days from the Unix epoch, 1 January 1970"



          And ADF is using .net type doing the transformation. According to .net type definition, Time values are measured in 100-nanosecond units called ticks. A particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.)
          https://docs.microsoft.com/en-us/dotnet/api/system.datetime?view=netframework-4.7.2



          Seems extra 1969 is added for this reason. But not sure whether is this a bug. What is your parquet data type? is it Date? and what is the sql data type?
          Could you provide the copy activity run id? Or maybe some parquet sample data?






          share|improve this answer


























          • Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue

            – daz-fuller
            Nov 26 '18 at 22:21








          • 1





            Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.

            – Qingquan Yang
            Nov 29 '18 at 6:01











          • I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly

            – daz-fuller
            Nov 29 '18 at 7:04
















          2














          According to parquet date type definition,
          https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
          The date is stored as "the number of days from the Unix epoch, 1 January 1970"



          And ADF is using .net type doing the transformation. According to .net type definition, Time values are measured in 100-nanosecond units called ticks. A particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.)
          https://docs.microsoft.com/en-us/dotnet/api/system.datetime?view=netframework-4.7.2



          Seems extra 1969 is added for this reason. But not sure whether is this a bug. What is your parquet data type? is it Date? and what is the sql data type?
          Could you provide the copy activity run id? Or maybe some parquet sample data?






          share|improve this answer


























          • Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue

            – daz-fuller
            Nov 26 '18 at 22:21








          • 1





            Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.

            – Qingquan Yang
            Nov 29 '18 at 6:01











          • I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly

            – daz-fuller
            Nov 29 '18 at 7:04














          2












          2








          2







          According to parquet date type definition,
          https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
          The date is stored as "the number of days from the Unix epoch, 1 January 1970"



          And ADF is using .net type doing the transformation. According to .net type definition, Time values are measured in 100-nanosecond units called ticks. A particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.)
          https://docs.microsoft.com/en-us/dotnet/api/system.datetime?view=netframework-4.7.2



          Seems extra 1969 is added for this reason. But not sure whether is this a bug. What is your parquet data type? is it Date? and what is the sql data type?
          Could you provide the copy activity run id? Or maybe some parquet sample data?






          share|improve this answer















          According to parquet date type definition,
          https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
          The date is stored as "the number of days from the Unix epoch, 1 January 1970"



          And ADF is using .net type doing the transformation. According to .net type definition, Time values are measured in 100-nanosecond units called ticks. A particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.)
          https://docs.microsoft.com/en-us/dotnet/api/system.datetime?view=netframework-4.7.2



          Seems extra 1969 is added for this reason. But not sure whether is this a bug. What is your parquet data type? is it Date? and what is the sql data type?
          Could you provide the copy activity run id? Or maybe some parquet sample data?







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 24 '18 at 5:11

























          answered Nov 24 '18 at 5:04









          Qingquan YangQingquan Yang

          1562




          1562













          • Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue

            – daz-fuller
            Nov 26 '18 at 22:21








          • 1





            Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.

            – Qingquan Yang
            Nov 29 '18 at 6:01











          • I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly

            – daz-fuller
            Nov 29 '18 at 7:04



















          • Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue

            – daz-fuller
            Nov 26 '18 at 22:21








          • 1





            Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.

            – Qingquan Yang
            Nov 29 '18 at 6:01











          • I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly

            – daz-fuller
            Nov 29 '18 at 7:04

















          Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue

          – daz-fuller
          Nov 26 '18 at 22:21







          Thanks for that. The datatype in the DataFrame in Databricks is java.sql.Date and it is targeting a DateTime field in the SQL database. I've created a repo over at github which contains a dotnetcore console app using parquet.net which reads a sample file created in Databricks. It has a java.sql.Date and and java.sql.Timestamp field. The parquet file is in the repo and in the /media directory there's some screenshots from Python, Parquet Viewer, Databricks, ADF and the output of the console app. github.com/dazfuller/adf-parquet-date-issue

          – daz-fuller
          Nov 26 '18 at 22:21






          1




          1





          Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.

          – Qingquan Yang
          Nov 29 '18 at 6:01





          Hi @daz-fuller, this is a known issue and the fix is deploying now. This may take a couple of days. Please try later to confirm if the issue is fixed.

          – Qingquan Yang
          Nov 29 '18 at 6:01













          I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly

          – daz-fuller
          Nov 29 '18 at 7:04





          I just spotted it last night when the SQL fix I implemented overflowed the type :) Thanks for resolving this so quickly

          – daz-fuller
          Nov 29 '18 at 7:04













          1














          Based on Parquet encoding definitions,no Date, DateTime, DateTimeOffset and String format exist,so you do not need to try with these formats.



          Based on this Data type mapping for Parquet files in Azure Data Factory:



          enter image description here



          The DateTimeOffset format corresponds to Int96,I suggest you trying this transmission on the source of parquet file.






          share|improve this answer


























          • The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value

            – daz-fuller
            Nov 26 '18 at 21:48
















          1














          Based on Parquet encoding definitions,no Date, DateTime, DateTimeOffset and String format exist,so you do not need to try with these formats.



          Based on this Data type mapping for Parquet files in Azure Data Factory:



          enter image description here



          The DateTimeOffset format corresponds to Int96,I suggest you trying this transmission on the source of parquet file.






          share|improve this answer


























          • The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value

            – daz-fuller
            Nov 26 '18 at 21:48














          1












          1








          1







          Based on Parquet encoding definitions,no Date, DateTime, DateTimeOffset and String format exist,so you do not need to try with these formats.



          Based on this Data type mapping for Parquet files in Azure Data Factory:



          enter image description here



          The DateTimeOffset format corresponds to Int96,I suggest you trying this transmission on the source of parquet file.






          share|improve this answer















          Based on Parquet encoding definitions,no Date, DateTime, DateTimeOffset and String format exist,so you do not need to try with these formats.



          Based on this Data type mapping for Parquet files in Azure Data Factory:



          enter image description here



          The DateTimeOffset format corresponds to Int96,I suggest you trying this transmission on the source of parquet file.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 26 '18 at 8:47

























          answered Nov 26 '18 at 8:37









          Jay GongJay Gong

          8,6001512




          8,6001512













          • The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value

            – daz-fuller
            Nov 26 '18 at 21:48



















          • The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value

            – daz-fuller
            Nov 26 '18 at 21:48

















          The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value

          – daz-fuller
          Nov 26 '18 at 21:48





          The parquet in this example can't change and Int96 isn't an option for the dataset. Changing the dataset type to Timespan would give the closest approximation to the underlying value

          – daz-fuller
          Nov 26 '18 at 21:48


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53433123%2fazure-data-factory-v2-wrong-year-copying-from-parquet-to-sql-db%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown