How to get the last not-null value in an ordered column of a huge table?












3















I have to following input:



 id | value 
----+-------
1 | 136
2 | NULL
3 | 650
4 | NULL
5 | NULL
6 | NULL
7 | 954
8 | NULL
9 | 104
10 | NULL


I expect the following result:



 id | value 
----+-------
1 | 136
2 | 136
3 | 650
4 | 650
5 | 650
6 | 650
7 | 954
8 | 954
9 | 104
10 | 104


The trivial solution would be join the tables with a < relation, and then selecting the MAX value in a GROUP BY:



WITH tmp AS (
SELECT t2.id, MAX(t1.id) AS lastKnownId
FROM t t1, t t2
WHERE
t1.value IS NOT NULL
AND
t2.id >= t1.id
GROUP BY t2.id
)
SELECT
tmp.id, t.value
FROM t, tmp
WHERE t.id = tmp.lastKnownId;


However, the trivial execution of this code would create internally the square of the count of the rows of the input table ( O(n^2) ). I expected t-sql to optimize it out - on a block/record level, the task to do is very easy and linear, essentially a for loop ( O(n) ).



However, on my experiments, the latest MS SQL 2016 can't optimize this query correctly, making this query impossible to execute for a large input table.



Furthermore, the query has to run quickly, making a similarly easy (but very different) cursor-based solution infeasible.



Using some memory-backed temporary table could be a good compromise, but I am not sure if it can be run significantly quicker, considered that my example query using subqueries didn't work.



I am also thinking on to dig out some windowing function from the t-sql docs, what could be tricked to do what I want. For example, cumulative sum is doing some very similar, but I couldn't trick it to give the latest non-null element, and not the sum of the elements before.



The ideal solution would be a quick query without procedural code or temporary tables. Alternatively, also a solution with temporary tables is okay, but iterating the table procedurally is not.










share|improve this question





























    3















    I have to following input:



     id | value 
    ----+-------
    1 | 136
    2 | NULL
    3 | 650
    4 | NULL
    5 | NULL
    6 | NULL
    7 | 954
    8 | NULL
    9 | 104
    10 | NULL


    I expect the following result:



     id | value 
    ----+-------
    1 | 136
    2 | 136
    3 | 650
    4 | 650
    5 | 650
    6 | 650
    7 | 954
    8 | 954
    9 | 104
    10 | 104


    The trivial solution would be join the tables with a < relation, and then selecting the MAX value in a GROUP BY:



    WITH tmp AS (
    SELECT t2.id, MAX(t1.id) AS lastKnownId
    FROM t t1, t t2
    WHERE
    t1.value IS NOT NULL
    AND
    t2.id >= t1.id
    GROUP BY t2.id
    )
    SELECT
    tmp.id, t.value
    FROM t, tmp
    WHERE t.id = tmp.lastKnownId;


    However, the trivial execution of this code would create internally the square of the count of the rows of the input table ( O(n^2) ). I expected t-sql to optimize it out - on a block/record level, the task to do is very easy and linear, essentially a for loop ( O(n) ).



    However, on my experiments, the latest MS SQL 2016 can't optimize this query correctly, making this query impossible to execute for a large input table.



    Furthermore, the query has to run quickly, making a similarly easy (but very different) cursor-based solution infeasible.



    Using some memory-backed temporary table could be a good compromise, but I am not sure if it can be run significantly quicker, considered that my example query using subqueries didn't work.



    I am also thinking on to dig out some windowing function from the t-sql docs, what could be tricked to do what I want. For example, cumulative sum is doing some very similar, but I couldn't trick it to give the latest non-null element, and not the sum of the elements before.



    The ideal solution would be a quick query without procedural code or temporary tables. Alternatively, also a solution with temporary tables is okay, but iterating the table procedurally is not.










    share|improve this question



























      3












      3








      3








      I have to following input:



       id | value 
      ----+-------
      1 | 136
      2 | NULL
      3 | 650
      4 | NULL
      5 | NULL
      6 | NULL
      7 | 954
      8 | NULL
      9 | 104
      10 | NULL


      I expect the following result:



       id | value 
      ----+-------
      1 | 136
      2 | 136
      3 | 650
      4 | 650
      5 | 650
      6 | 650
      7 | 954
      8 | 954
      9 | 104
      10 | 104


      The trivial solution would be join the tables with a < relation, and then selecting the MAX value in a GROUP BY:



      WITH tmp AS (
      SELECT t2.id, MAX(t1.id) AS lastKnownId
      FROM t t1, t t2
      WHERE
      t1.value IS NOT NULL
      AND
      t2.id >= t1.id
      GROUP BY t2.id
      )
      SELECT
      tmp.id, t.value
      FROM t, tmp
      WHERE t.id = tmp.lastKnownId;


      However, the trivial execution of this code would create internally the square of the count of the rows of the input table ( O(n^2) ). I expected t-sql to optimize it out - on a block/record level, the task to do is very easy and linear, essentially a for loop ( O(n) ).



      However, on my experiments, the latest MS SQL 2016 can't optimize this query correctly, making this query impossible to execute for a large input table.



      Furthermore, the query has to run quickly, making a similarly easy (but very different) cursor-based solution infeasible.



      Using some memory-backed temporary table could be a good compromise, but I am not sure if it can be run significantly quicker, considered that my example query using subqueries didn't work.



      I am also thinking on to dig out some windowing function from the t-sql docs, what could be tricked to do what I want. For example, cumulative sum is doing some very similar, but I couldn't trick it to give the latest non-null element, and not the sum of the elements before.



      The ideal solution would be a quick query without procedural code or temporary tables. Alternatively, also a solution with temporary tables is okay, but iterating the table procedurally is not.










      share|improve this question
















      I have to following input:



       id | value 
      ----+-------
      1 | 136
      2 | NULL
      3 | 650
      4 | NULL
      5 | NULL
      6 | NULL
      7 | 954
      8 | NULL
      9 | 104
      10 | NULL


      I expect the following result:



       id | value 
      ----+-------
      1 | 136
      2 | 136
      3 | 650
      4 | 650
      5 | 650
      6 | 650
      7 | 954
      8 | 954
      9 | 104
      10 | 104


      The trivial solution would be join the tables with a < relation, and then selecting the MAX value in a GROUP BY:



      WITH tmp AS (
      SELECT t2.id, MAX(t1.id) AS lastKnownId
      FROM t t1, t t2
      WHERE
      t1.value IS NOT NULL
      AND
      t2.id >= t1.id
      GROUP BY t2.id
      )
      SELECT
      tmp.id, t.value
      FROM t, tmp
      WHERE t.id = tmp.lastKnownId;


      However, the trivial execution of this code would create internally the square of the count of the rows of the input table ( O(n^2) ). I expected t-sql to optimize it out - on a block/record level, the task to do is very easy and linear, essentially a for loop ( O(n) ).



      However, on my experiments, the latest MS SQL 2016 can't optimize this query correctly, making this query impossible to execute for a large input table.



      Furthermore, the query has to run quickly, making a similarly easy (but very different) cursor-based solution infeasible.



      Using some memory-backed temporary table could be a good compromise, but I am not sure if it can be run significantly quicker, considered that my example query using subqueries didn't work.



      I am also thinking on to dig out some windowing function from the t-sql docs, what could be tricked to do what I want. For example, cumulative sum is doing some very similar, but I couldn't trick it to give the latest non-null element, and not the sum of the elements before.



      The ideal solution would be a quick query without procedural code or temporary tables. Alternatively, also a solution with temporary tables is okay, but iterating the table procedurally is not.







      t-sql






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited 2 hours ago







      peterh

















      asked 2 hours ago









      peterhpeterh

      1,07441431




      1,07441431






















          1 Answer
          1






          active

          oldest

          votes


















          2














          One method, by using OVER() and MAX() and COUNT() based on this source could be:



          SELECT ID, MAX(value) OVER (PARTITION BY Value2) as value
          FROM
          (
          SELECT ID, value
          ,COUNT(value) OVER (ORDER BY ID) AS Value2
          FROM dbo.HugeTable
          ) a
          ORDER BY ID;


          Result



          Id  UpdatedValue
          1 136
          2 136
          3 650
          4 650
          5 650
          6 650
          7 954
          8 954
          9 104
          10 104




          Another method based on this source, closely related to the first example



          ;WITH CTE As 
          (
          SELECT value,
          Id,
          COUNT(value)
          OVER(ORDER BY Id) As Value2
          FROM dbo.HugeTable
          ),

          CTE2 AS (
          SELECT Id,
          value,
          First_Value(value)
          OVER( PARTITION BY Value2
          ORDER BY Id) As UpdatedValue
          FROM CTE
          )
          SELECT Id,UpdatedValue
          FROM CTE2;





          share|improve this answer


























            Your Answer








            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "182"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f233610%2fhow-to-get-the-last-not-null-value-in-an-ordered-column-of-a-huge-table%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            2














            One method, by using OVER() and MAX() and COUNT() based on this source could be:



            SELECT ID, MAX(value) OVER (PARTITION BY Value2) as value
            FROM
            (
            SELECT ID, value
            ,COUNT(value) OVER (ORDER BY ID) AS Value2
            FROM dbo.HugeTable
            ) a
            ORDER BY ID;


            Result



            Id  UpdatedValue
            1 136
            2 136
            3 650
            4 650
            5 650
            6 650
            7 954
            8 954
            9 104
            10 104




            Another method based on this source, closely related to the first example



            ;WITH CTE As 
            (
            SELECT value,
            Id,
            COUNT(value)
            OVER(ORDER BY Id) As Value2
            FROM dbo.HugeTable
            ),

            CTE2 AS (
            SELECT Id,
            value,
            First_Value(value)
            OVER( PARTITION BY Value2
            ORDER BY Id) As UpdatedValue
            FROM CTE
            )
            SELECT Id,UpdatedValue
            FROM CTE2;





            share|improve this answer






























              2














              One method, by using OVER() and MAX() and COUNT() based on this source could be:



              SELECT ID, MAX(value) OVER (PARTITION BY Value2) as value
              FROM
              (
              SELECT ID, value
              ,COUNT(value) OVER (ORDER BY ID) AS Value2
              FROM dbo.HugeTable
              ) a
              ORDER BY ID;


              Result



              Id  UpdatedValue
              1 136
              2 136
              3 650
              4 650
              5 650
              6 650
              7 954
              8 954
              9 104
              10 104




              Another method based on this source, closely related to the first example



              ;WITH CTE As 
              (
              SELECT value,
              Id,
              COUNT(value)
              OVER(ORDER BY Id) As Value2
              FROM dbo.HugeTable
              ),

              CTE2 AS (
              SELECT Id,
              value,
              First_Value(value)
              OVER( PARTITION BY Value2
              ORDER BY Id) As UpdatedValue
              FROM CTE
              )
              SELECT Id,UpdatedValue
              FROM CTE2;





              share|improve this answer




























                2












                2








                2







                One method, by using OVER() and MAX() and COUNT() based on this source could be:



                SELECT ID, MAX(value) OVER (PARTITION BY Value2) as value
                FROM
                (
                SELECT ID, value
                ,COUNT(value) OVER (ORDER BY ID) AS Value2
                FROM dbo.HugeTable
                ) a
                ORDER BY ID;


                Result



                Id  UpdatedValue
                1 136
                2 136
                3 650
                4 650
                5 650
                6 650
                7 954
                8 954
                9 104
                10 104




                Another method based on this source, closely related to the first example



                ;WITH CTE As 
                (
                SELECT value,
                Id,
                COUNT(value)
                OVER(ORDER BY Id) As Value2
                FROM dbo.HugeTable
                ),

                CTE2 AS (
                SELECT Id,
                value,
                First_Value(value)
                OVER( PARTITION BY Value2
                ORDER BY Id) As UpdatedValue
                FROM CTE
                )
                SELECT Id,UpdatedValue
                FROM CTE2;





                share|improve this answer















                One method, by using OVER() and MAX() and COUNT() based on this source could be:



                SELECT ID, MAX(value) OVER (PARTITION BY Value2) as value
                FROM
                (
                SELECT ID, value
                ,COUNT(value) OVER (ORDER BY ID) AS Value2
                FROM dbo.HugeTable
                ) a
                ORDER BY ID;


                Result



                Id  UpdatedValue
                1 136
                2 136
                3 650
                4 650
                5 650
                6 650
                7 954
                8 954
                9 104
                10 104




                Another method based on this source, closely related to the first example



                ;WITH CTE As 
                (
                SELECT value,
                Id,
                COUNT(value)
                OVER(ORDER BY Id) As Value2
                FROM dbo.HugeTable
                ),

                CTE2 AS (
                SELECT Id,
                value,
                First_Value(value)
                OVER( PARTITION BY Value2
                ORDER BY Id) As UpdatedValue
                FROM CTE
                )
                SELECT Id,UpdatedValue
                FROM CTE2;






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited 52 mins ago

























                answered 1 hour ago









                Randi VertongenRandi Vertongen

                4,121924




                4,121924






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Database Administrators Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f233610%2fhow-to-get-the-last-not-null-value-in-an-ordered-column-of-a-huge-table%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Costa Masnaga

                    Create new schema in PostgreSQL using DBeaver

                    Fotorealismo