Storing relational data in Apache Flink as State and querying by a property
I have a database with Tables T1(id, name, age) and T2(id, subject).
Flink receives all updates from the database as event stream using something like debezium. The tables are related to each other and required data can be extracted by joining T1 with T2 on id. Currently the whole state of the database is stored in Flink MapState with id as the key. Now the problem is that I need to select the row based on name from T1 without using id. It seems like I need an index on T1(name) for making it faster. Is there any way I can automatically index it, without having to manually create an index for each table. What is the recommended way for doing this?. I know about SQL streaming on tables, but I require support for updates to the tables. By the way, I use Flink with Scala. Any pointers/suggestions would be appreciated.
scala apache-flink flink-streaming
add a comment |
I have a database with Tables T1(id, name, age) and T2(id, subject).
Flink receives all updates from the database as event stream using something like debezium. The tables are related to each other and required data can be extracted by joining T1 with T2 on id. Currently the whole state of the database is stored in Flink MapState with id as the key. Now the problem is that I need to select the row based on name from T1 without using id. It seems like I need an index on T1(name) for making it faster. Is there any way I can automatically index it, without having to manually create an index for each table. What is the recommended way for doing this?. I know about SQL streaming on tables, but I require support for updates to the tables. By the way, I use Flink with Scala. Any pointers/suggestions would be appreciated.
scala apache-flink flink-streaming
add a comment |
I have a database with Tables T1(id, name, age) and T2(id, subject).
Flink receives all updates from the database as event stream using something like debezium. The tables are related to each other and required data can be extracted by joining T1 with T2 on id. Currently the whole state of the database is stored in Flink MapState with id as the key. Now the problem is that I need to select the row based on name from T1 without using id. It seems like I need an index on T1(name) for making it faster. Is there any way I can automatically index it, without having to manually create an index for each table. What is the recommended way for doing this?. I know about SQL streaming on tables, but I require support for updates to the tables. By the way, I use Flink with Scala. Any pointers/suggestions would be appreciated.
scala apache-flink flink-streaming
I have a database with Tables T1(id, name, age) and T2(id, subject).
Flink receives all updates from the database as event stream using something like debezium. The tables are related to each other and required data can be extracted by joining T1 with T2 on id. Currently the whole state of the database is stored in Flink MapState with id as the key. Now the problem is that I need to select the row based on name from T1 without using id. It seems like I need an index on T1(name) for making it faster. Is there any way I can automatically index it, without having to manually create an index for each table. What is the recommended way for doing this?. I know about SQL streaming on tables, but I require support for updates to the tables. By the way, I use Flink with Scala. Any pointers/suggestions would be appreciated.
scala apache-flink flink-streaming
scala apache-flink flink-streaming
asked Nov 22 '18 at 19:37
jvcjvc
1742832
1742832
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
My understanding is that you are connecting T1 and T2, and storing some representation (in MapState) of the data from these two streams in keyed state, keyed by id. It sounds like T1 and T2 are evolving over time, and you want to be able to interactively query the join at any time by specifying a name.
One idea would be to broadcast in the name(s) you want to select, and use a KeyedBroadcastProcessFunction to process them. In its processBroadcastElement method you could use ctx.applyToKeyedState to compute the results by extracting data from the MapState records (which would have to be held in this operator). I suspect you will want to use the names as the keys in these MapState records, so that you don't have to iterate over all of the entries in each map to find the items of interest.
You will find a somewhat similar example of this pattern in https://training.data-artisans.com/exercises/ongoingRides.html.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437185%2fstoring-relational-data-in-apache-flink-as-state-and-querying-by-a-property%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
My understanding is that you are connecting T1 and T2, and storing some representation (in MapState) of the data from these two streams in keyed state, keyed by id. It sounds like T1 and T2 are evolving over time, and you want to be able to interactively query the join at any time by specifying a name.
One idea would be to broadcast in the name(s) you want to select, and use a KeyedBroadcastProcessFunction to process them. In its processBroadcastElement method you could use ctx.applyToKeyedState to compute the results by extracting data from the MapState records (which would have to be held in this operator). I suspect you will want to use the names as the keys in these MapState records, so that you don't have to iterate over all of the entries in each map to find the items of interest.
You will find a somewhat similar example of this pattern in https://training.data-artisans.com/exercises/ongoingRides.html.
add a comment |
My understanding is that you are connecting T1 and T2, and storing some representation (in MapState) of the data from these two streams in keyed state, keyed by id. It sounds like T1 and T2 are evolving over time, and you want to be able to interactively query the join at any time by specifying a name.
One idea would be to broadcast in the name(s) you want to select, and use a KeyedBroadcastProcessFunction to process them. In its processBroadcastElement method you could use ctx.applyToKeyedState to compute the results by extracting data from the MapState records (which would have to be held in this operator). I suspect you will want to use the names as the keys in these MapState records, so that you don't have to iterate over all of the entries in each map to find the items of interest.
You will find a somewhat similar example of this pattern in https://training.data-artisans.com/exercises/ongoingRides.html.
add a comment |
My understanding is that you are connecting T1 and T2, and storing some representation (in MapState) of the data from these two streams in keyed state, keyed by id. It sounds like T1 and T2 are evolving over time, and you want to be able to interactively query the join at any time by specifying a name.
One idea would be to broadcast in the name(s) you want to select, and use a KeyedBroadcastProcessFunction to process them. In its processBroadcastElement method you could use ctx.applyToKeyedState to compute the results by extracting data from the MapState records (which would have to be held in this operator). I suspect you will want to use the names as the keys in these MapState records, so that you don't have to iterate over all of the entries in each map to find the items of interest.
You will find a somewhat similar example of this pattern in https://training.data-artisans.com/exercises/ongoingRides.html.
My understanding is that you are connecting T1 and T2, and storing some representation (in MapState) of the data from these two streams in keyed state, keyed by id. It sounds like T1 and T2 are evolving over time, and you want to be able to interactively query the join at any time by specifying a name.
One idea would be to broadcast in the name(s) you want to select, and use a KeyedBroadcastProcessFunction to process them. In its processBroadcastElement method you could use ctx.applyToKeyedState to compute the results by extracting data from the MapState records (which would have to be held in this operator). I suspect you will want to use the names as the keys in these MapState records, so that you don't have to iterate over all of the entries in each map to find the items of interest.
You will find a somewhat similar example of this pattern in https://training.data-artisans.com/exercises/ongoingRides.html.
edited Nov 22 '18 at 22:34
answered Nov 22 '18 at 22:11
David AndersonDavid Anderson
5,54421321
5,54421321
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437185%2fstoring-relational-data-in-apache-flink-as-state-and-querying-by-a-property%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown