How to get the last not-null value in an ordered column of a huge table?
I have to following input:
id | value
----+-------
1 | 136
2 | NULL
3 | 650
4 | NULL
5 | NULL
6 | NULL
7 | 954
8 | NULL
9 | 104
10 | NULL
I expect the following result:
id | value
----+-------
1 | 136
2 | 136
3 | 650
4 | 650
5 | 650
6 | 650
7 | 954
8 | 954
9 | 104
10 | 104
The trivial solution would be join the tables with a <
relation, and then selecting the MAX
value in a GROUP BY
:
WITH tmp AS (
SELECT t2.id, MAX(t1.id) AS lastKnownId
FROM t t1, t t2
WHERE
t1.value IS NOT NULL
AND
t2.id >= t1.id
GROUP BY t2.id
)
SELECT
tmp.id, t.value
FROM t, tmp
WHERE t.id = tmp.lastKnownId;
However, the trivial execution of this code would create internally the square of the count of the rows of the input table ( O(n^2) ). I expected t-sql to optimize it out - on a block/record level, the task to do is very easy and linear, essentially a for loop ( O(n) ).
However, on my experiments, the latest MS SQL 2016 can't optimize this query correctly, making this query impossible to execute for a large input table.
Furthermore, the query has to run quickly, making a similarly easy (but very different) cursor-based solution infeasible.
Using some memory-backed temporary table could be a good compromise, but I am not sure if it can be run significantly quicker, considered that my example query using subqueries didn't work.
I am also thinking on to dig out some windowing function from the t-sql docs, what could be tricked to do what I want. For example, cumulative sum is doing some very similar, but I couldn't trick it to give the latest non-null element, and not the sum of the elements before.
The ideal solution would be a quick query without procedural code or temporary tables. Alternatively, also a solution with temporary tables is okay, but iterating the table procedurally is not.
t-sql
add a comment |
I have to following input:
id | value
----+-------
1 | 136
2 | NULL
3 | 650
4 | NULL
5 | NULL
6 | NULL
7 | 954
8 | NULL
9 | 104
10 | NULL
I expect the following result:
id | value
----+-------
1 | 136
2 | 136
3 | 650
4 | 650
5 | 650
6 | 650
7 | 954
8 | 954
9 | 104
10 | 104
The trivial solution would be join the tables with a <
relation, and then selecting the MAX
value in a GROUP BY
:
WITH tmp AS (
SELECT t2.id, MAX(t1.id) AS lastKnownId
FROM t t1, t t2
WHERE
t1.value IS NOT NULL
AND
t2.id >= t1.id
GROUP BY t2.id
)
SELECT
tmp.id, t.value
FROM t, tmp
WHERE t.id = tmp.lastKnownId;
However, the trivial execution of this code would create internally the square of the count of the rows of the input table ( O(n^2) ). I expected t-sql to optimize it out - on a block/record level, the task to do is very easy and linear, essentially a for loop ( O(n) ).
However, on my experiments, the latest MS SQL 2016 can't optimize this query correctly, making this query impossible to execute for a large input table.
Furthermore, the query has to run quickly, making a similarly easy (but very different) cursor-based solution infeasible.
Using some memory-backed temporary table could be a good compromise, but I am not sure if it can be run significantly quicker, considered that my example query using subqueries didn't work.
I am also thinking on to dig out some windowing function from the t-sql docs, what could be tricked to do what I want. For example, cumulative sum is doing some very similar, but I couldn't trick it to give the latest non-null element, and not the sum of the elements before.
The ideal solution would be a quick query without procedural code or temporary tables. Alternatively, also a solution with temporary tables is okay, but iterating the table procedurally is not.
t-sql
add a comment |
I have to following input:
id | value
----+-------
1 | 136
2 | NULL
3 | 650
4 | NULL
5 | NULL
6 | NULL
7 | 954
8 | NULL
9 | 104
10 | NULL
I expect the following result:
id | value
----+-------
1 | 136
2 | 136
3 | 650
4 | 650
5 | 650
6 | 650
7 | 954
8 | 954
9 | 104
10 | 104
The trivial solution would be join the tables with a <
relation, and then selecting the MAX
value in a GROUP BY
:
WITH tmp AS (
SELECT t2.id, MAX(t1.id) AS lastKnownId
FROM t t1, t t2
WHERE
t1.value IS NOT NULL
AND
t2.id >= t1.id
GROUP BY t2.id
)
SELECT
tmp.id, t.value
FROM t, tmp
WHERE t.id = tmp.lastKnownId;
However, the trivial execution of this code would create internally the square of the count of the rows of the input table ( O(n^2) ). I expected t-sql to optimize it out - on a block/record level, the task to do is very easy and linear, essentially a for loop ( O(n) ).
However, on my experiments, the latest MS SQL 2016 can't optimize this query correctly, making this query impossible to execute for a large input table.
Furthermore, the query has to run quickly, making a similarly easy (but very different) cursor-based solution infeasible.
Using some memory-backed temporary table could be a good compromise, but I am not sure if it can be run significantly quicker, considered that my example query using subqueries didn't work.
I am also thinking on to dig out some windowing function from the t-sql docs, what could be tricked to do what I want. For example, cumulative sum is doing some very similar, but I couldn't trick it to give the latest non-null element, and not the sum of the elements before.
The ideal solution would be a quick query without procedural code or temporary tables. Alternatively, also a solution with temporary tables is okay, but iterating the table procedurally is not.
t-sql
I have to following input:
id | value
----+-------
1 | 136
2 | NULL
3 | 650
4 | NULL
5 | NULL
6 | NULL
7 | 954
8 | NULL
9 | 104
10 | NULL
I expect the following result:
id | value
----+-------
1 | 136
2 | 136
3 | 650
4 | 650
5 | 650
6 | 650
7 | 954
8 | 954
9 | 104
10 | 104
The trivial solution would be join the tables with a <
relation, and then selecting the MAX
value in a GROUP BY
:
WITH tmp AS (
SELECT t2.id, MAX(t1.id) AS lastKnownId
FROM t t1, t t2
WHERE
t1.value IS NOT NULL
AND
t2.id >= t1.id
GROUP BY t2.id
)
SELECT
tmp.id, t.value
FROM t, tmp
WHERE t.id = tmp.lastKnownId;
However, the trivial execution of this code would create internally the square of the count of the rows of the input table ( O(n^2) ). I expected t-sql to optimize it out - on a block/record level, the task to do is very easy and linear, essentially a for loop ( O(n) ).
However, on my experiments, the latest MS SQL 2016 can't optimize this query correctly, making this query impossible to execute for a large input table.
Furthermore, the query has to run quickly, making a similarly easy (but very different) cursor-based solution infeasible.
Using some memory-backed temporary table could be a good compromise, but I am not sure if it can be run significantly quicker, considered that my example query using subqueries didn't work.
I am also thinking on to dig out some windowing function from the t-sql docs, what could be tricked to do what I want. For example, cumulative sum is doing some very similar, but I couldn't trick it to give the latest non-null element, and not the sum of the elements before.
The ideal solution would be a quick query without procedural code or temporary tables. Alternatively, also a solution with temporary tables is okay, but iterating the table procedurally is not.
t-sql
t-sql
edited 2 hours ago
peterh
asked 2 hours ago
peterhpeterh
1,07441431
1,07441431
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
One method, by using OVER()
and MAX()
and COUNT()
based on this source could be:
SELECT ID, MAX(value) OVER (PARTITION BY Value2) as value
FROM
(
SELECT ID, value
,COUNT(value) OVER (ORDER BY ID) AS Value2
FROM dbo.HugeTable
) a
ORDER BY ID;
Result
Id UpdatedValue
1 136
2 136
3 650
4 650
5 650
6 650
7 954
8 954
9 104
10 104
Another method based on this source, closely related to the first example
;WITH CTE As
(
SELECT value,
Id,
COUNT(value)
OVER(ORDER BY Id) As Value2
FROM dbo.HugeTable
),
CTE2 AS (
SELECT Id,
value,
First_Value(value)
OVER( PARTITION BY Value2
ORDER BY Id) As UpdatedValue
FROM CTE
)
SELECT Id,UpdatedValue
FROM CTE2;
add a comment |
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f233610%2fhow-to-get-the-last-not-null-value-in-an-ordered-column-of-a-huge-table%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
One method, by using OVER()
and MAX()
and COUNT()
based on this source could be:
SELECT ID, MAX(value) OVER (PARTITION BY Value2) as value
FROM
(
SELECT ID, value
,COUNT(value) OVER (ORDER BY ID) AS Value2
FROM dbo.HugeTable
) a
ORDER BY ID;
Result
Id UpdatedValue
1 136
2 136
3 650
4 650
5 650
6 650
7 954
8 954
9 104
10 104
Another method based on this source, closely related to the first example
;WITH CTE As
(
SELECT value,
Id,
COUNT(value)
OVER(ORDER BY Id) As Value2
FROM dbo.HugeTable
),
CTE2 AS (
SELECT Id,
value,
First_Value(value)
OVER( PARTITION BY Value2
ORDER BY Id) As UpdatedValue
FROM CTE
)
SELECT Id,UpdatedValue
FROM CTE2;
add a comment |
One method, by using OVER()
and MAX()
and COUNT()
based on this source could be:
SELECT ID, MAX(value) OVER (PARTITION BY Value2) as value
FROM
(
SELECT ID, value
,COUNT(value) OVER (ORDER BY ID) AS Value2
FROM dbo.HugeTable
) a
ORDER BY ID;
Result
Id UpdatedValue
1 136
2 136
3 650
4 650
5 650
6 650
7 954
8 954
9 104
10 104
Another method based on this source, closely related to the first example
;WITH CTE As
(
SELECT value,
Id,
COUNT(value)
OVER(ORDER BY Id) As Value2
FROM dbo.HugeTable
),
CTE2 AS (
SELECT Id,
value,
First_Value(value)
OVER( PARTITION BY Value2
ORDER BY Id) As UpdatedValue
FROM CTE
)
SELECT Id,UpdatedValue
FROM CTE2;
add a comment |
One method, by using OVER()
and MAX()
and COUNT()
based on this source could be:
SELECT ID, MAX(value) OVER (PARTITION BY Value2) as value
FROM
(
SELECT ID, value
,COUNT(value) OVER (ORDER BY ID) AS Value2
FROM dbo.HugeTable
) a
ORDER BY ID;
Result
Id UpdatedValue
1 136
2 136
3 650
4 650
5 650
6 650
7 954
8 954
9 104
10 104
Another method based on this source, closely related to the first example
;WITH CTE As
(
SELECT value,
Id,
COUNT(value)
OVER(ORDER BY Id) As Value2
FROM dbo.HugeTable
),
CTE2 AS (
SELECT Id,
value,
First_Value(value)
OVER( PARTITION BY Value2
ORDER BY Id) As UpdatedValue
FROM CTE
)
SELECT Id,UpdatedValue
FROM CTE2;
One method, by using OVER()
and MAX()
and COUNT()
based on this source could be:
SELECT ID, MAX(value) OVER (PARTITION BY Value2) as value
FROM
(
SELECT ID, value
,COUNT(value) OVER (ORDER BY ID) AS Value2
FROM dbo.HugeTable
) a
ORDER BY ID;
Result
Id UpdatedValue
1 136
2 136
3 650
4 650
5 650
6 650
7 954
8 954
9 104
10 104
Another method based on this source, closely related to the first example
;WITH CTE As
(
SELECT value,
Id,
COUNT(value)
OVER(ORDER BY Id) As Value2
FROM dbo.HugeTable
),
CTE2 AS (
SELECT Id,
value,
First_Value(value)
OVER( PARTITION BY Value2
ORDER BY Id) As UpdatedValue
FROM CTE
)
SELECT Id,UpdatedValue
FROM CTE2;
edited 52 mins ago
answered 1 hour ago
Randi VertongenRandi Vertongen
4,121924
4,121924
add a comment |
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f233610%2fhow-to-get-the-last-not-null-value-in-an-ordered-column-of-a-huge-table%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown