Storing relational data in Apache Flink as State and querying by a property












0















I have a database with Tables T1(id, name, age) and T2(id, subject).
Flink receives all updates from the database as event stream using something like debezium. The tables are related to each other and required data can be extracted by joining T1 with T2 on id. Currently the whole state of the database is stored in Flink MapState with id as the key. Now the problem is that I need to select the row based on name from T1 without using id. It seems like I need an index on T1(name) for making it faster. Is there any way I can automatically index it, without having to manually create an index for each table. What is the recommended way for doing this?. I know about SQL streaming on tables, but I require support for updates to the tables. By the way, I use Flink with Scala. Any pointers/suggestions would be appreciated.










share|improve this question



























    0















    I have a database with Tables T1(id, name, age) and T2(id, subject).
    Flink receives all updates from the database as event stream using something like debezium. The tables are related to each other and required data can be extracted by joining T1 with T2 on id. Currently the whole state of the database is stored in Flink MapState with id as the key. Now the problem is that I need to select the row based on name from T1 without using id. It seems like I need an index on T1(name) for making it faster. Is there any way I can automatically index it, without having to manually create an index for each table. What is the recommended way for doing this?. I know about SQL streaming on tables, but I require support for updates to the tables. By the way, I use Flink with Scala. Any pointers/suggestions would be appreciated.










    share|improve this question

























      0












      0








      0








      I have a database with Tables T1(id, name, age) and T2(id, subject).
      Flink receives all updates from the database as event stream using something like debezium. The tables are related to each other and required data can be extracted by joining T1 with T2 on id. Currently the whole state of the database is stored in Flink MapState with id as the key. Now the problem is that I need to select the row based on name from T1 without using id. It seems like I need an index on T1(name) for making it faster. Is there any way I can automatically index it, without having to manually create an index for each table. What is the recommended way for doing this?. I know about SQL streaming on tables, but I require support for updates to the tables. By the way, I use Flink with Scala. Any pointers/suggestions would be appreciated.










      share|improve this question














      I have a database with Tables T1(id, name, age) and T2(id, subject).
      Flink receives all updates from the database as event stream using something like debezium. The tables are related to each other and required data can be extracted by joining T1 with T2 on id. Currently the whole state of the database is stored in Flink MapState with id as the key. Now the problem is that I need to select the row based on name from T1 without using id. It seems like I need an index on T1(name) for making it faster. Is there any way I can automatically index it, without having to manually create an index for each table. What is the recommended way for doing this?. I know about SQL streaming on tables, but I require support for updates to the tables. By the way, I use Flink with Scala. Any pointers/suggestions would be appreciated.







      scala apache-flink flink-streaming






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 22 '18 at 19:37









      jvcjvc

      1742832




      1742832
























          1 Answer
          1






          active

          oldest

          votes


















          1














          My understanding is that you are connecting T1 and T2, and storing some representation (in MapState) of the data from these two streams in keyed state, keyed by id. It sounds like T1 and T2 are evolving over time, and you want to be able to interactively query the join at any time by specifying a name.



          One idea would be to broadcast in the name(s) you want to select, and use a KeyedBroadcastProcessFunction to process them. In its processBroadcastElement method you could use ctx.applyToKeyedState to compute the results by extracting data from the MapState records (which would have to be held in this operator). I suspect you will want to use the names as the keys in these MapState records, so that you don't have to iterate over all of the entries in each map to find the items of interest.



          You will find a somewhat similar example of this pattern in https://training.data-artisans.com/exercises/ongoingRides.html.






          share|improve this answer

























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437185%2fstoring-relational-data-in-apache-flink-as-state-and-querying-by-a-property%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            1














            My understanding is that you are connecting T1 and T2, and storing some representation (in MapState) of the data from these two streams in keyed state, keyed by id. It sounds like T1 and T2 are evolving over time, and you want to be able to interactively query the join at any time by specifying a name.



            One idea would be to broadcast in the name(s) you want to select, and use a KeyedBroadcastProcessFunction to process them. In its processBroadcastElement method you could use ctx.applyToKeyedState to compute the results by extracting data from the MapState records (which would have to be held in this operator). I suspect you will want to use the names as the keys in these MapState records, so that you don't have to iterate over all of the entries in each map to find the items of interest.



            You will find a somewhat similar example of this pattern in https://training.data-artisans.com/exercises/ongoingRides.html.






            share|improve this answer






























              1














              My understanding is that you are connecting T1 and T2, and storing some representation (in MapState) of the data from these two streams in keyed state, keyed by id. It sounds like T1 and T2 are evolving over time, and you want to be able to interactively query the join at any time by specifying a name.



              One idea would be to broadcast in the name(s) you want to select, and use a KeyedBroadcastProcessFunction to process them. In its processBroadcastElement method you could use ctx.applyToKeyedState to compute the results by extracting data from the MapState records (which would have to be held in this operator). I suspect you will want to use the names as the keys in these MapState records, so that you don't have to iterate over all of the entries in each map to find the items of interest.



              You will find a somewhat similar example of this pattern in https://training.data-artisans.com/exercises/ongoingRides.html.






              share|improve this answer




























                1












                1








                1







                My understanding is that you are connecting T1 and T2, and storing some representation (in MapState) of the data from these two streams in keyed state, keyed by id. It sounds like T1 and T2 are evolving over time, and you want to be able to interactively query the join at any time by specifying a name.



                One idea would be to broadcast in the name(s) you want to select, and use a KeyedBroadcastProcessFunction to process them. In its processBroadcastElement method you could use ctx.applyToKeyedState to compute the results by extracting data from the MapState records (which would have to be held in this operator). I suspect you will want to use the names as the keys in these MapState records, so that you don't have to iterate over all of the entries in each map to find the items of interest.



                You will find a somewhat similar example of this pattern in https://training.data-artisans.com/exercises/ongoingRides.html.






                share|improve this answer















                My understanding is that you are connecting T1 and T2, and storing some representation (in MapState) of the data from these two streams in keyed state, keyed by id. It sounds like T1 and T2 are evolving over time, and you want to be able to interactively query the join at any time by specifying a name.



                One idea would be to broadcast in the name(s) you want to select, and use a KeyedBroadcastProcessFunction to process them. In its processBroadcastElement method you could use ctx.applyToKeyedState to compute the results by extracting data from the MapState records (which would have to be held in this operator). I suspect you will want to use the names as the keys in these MapState records, so that you don't have to iterate over all of the entries in each map to find the items of interest.



                You will find a somewhat similar example of this pattern in https://training.data-artisans.com/exercises/ongoingRides.html.







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Nov 22 '18 at 22:34

























                answered Nov 22 '18 at 22:11









                David AndersonDavid Anderson

                5,54421321




                5,54421321






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53437185%2fstoring-relational-data-in-apache-flink-as-state-and-querying-by-a-property%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Costa Masnaga

                    Fotorealismo

                    Sidney Franklin