How to fill an SQL database with multiple tables for the first time























I have a general question about how to fill a database for the first time. Currently, I work on "raw" datasets within R (dataframes I've built to explore and get insights quickly), but I now need to structure and load everything into a relational database.



For the DB design, everything is OK (conceptual, logical, and 3NF). The result is a fairly "complex" (it's all relative) data model with many junction tables and foreign keys between tables.



My question is: what is the easiest way for me to populate this DB now?



My approach would be to generate a .csv file for each table from my "raw" dataframes in R and then load them into the DB table by table. Is this a good way to do it, or is there an easier method? Another point: how do I avoid struggling with FK constraints while populating?
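Concretely, instead of round-tripping through CSV files I could also write the dataframes directly with the DBI package. A minimal sketch of what I have in mind (database credentials, table names, and dataframe names are placeholders; parent tables are loaded before the tables that reference them):

```r
library(DBI)
# RMariaDB is the maintained MySQL driver for DBI
con <- dbConnect(
  RMariaDB::MariaDB(),
  dbname = "mydb",              # hypothetical database name
  user = "me", password = "secret"
)

# Load parents first, then children, so FK values already exist
# when the referencing rows arrive.
dbWriteTable(con, "author", df_author, append = TRUE, row.names = FALSE)
dbWriteTable(con, "book",   df_book,   append = TRUE, row.names = FALSE)
dbWriteTable(con, "author_book", df_author_book,   # junction table last
             append = TRUE, row.names = FALSE)

dbDisconnect(con)
```

With `append = TRUE`, `dbWriteTable` inserts into the existing tables created by my DDL scripts instead of creating new ones.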



Thank you very much for your answers. I realize these are very "methodological" questions, but I can't find any related tutorial or thread.



Notes: I work with R (dplyr, etc.) and MySQL.










  • Tip: You may find more valuable info on the sister site, DBA Stack Exchange
    – Basil Bourque
    Nov 18 at 20:20















mysql r database data-modeling data-science






edited Nov 18 at 19:49

























asked Nov 16 at 8:06









Nethim

1 Answer
A serious relational database, such as Postgres, offers features for populating a large database.



Bulk loading



Look for commands that read external data into a table with a matching field structure. The data moves from a file in the OS's file system directly into the table, which is vastly faster than inserting individual rows with the usual SQL INSERT. Such commands are not standardized, so you must look for the proprietary command in your particular database engine.



In Postgres that would be the COPY command.
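Since the question mentions MySQL: the MySQL counterpart is LOAD DATA INFILE (or LOAD DATA LOCAL INFILE when the file lives on the client machine). A sketch issued from R via DBI, with the file path, table name, and connection details as placeholders:

```r
library(DBI)
con <- dbConnect(RMariaDB::MariaDB(), dbname = "mydb",   # hypothetical connection
                 user = "me", password = "secret")

# Bulk-load one CSV into one table; vastly faster than row-by-row INSERTs.
dbExecute(con, "
  LOAD DATA LOCAL INFILE '/tmp/book.csv'
  INTO TABLE book
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
  LINES TERMINATED BY '\\n'
  IGNORE 1 LINES
")
dbDisconnect(con)
```

The IGNORE 1 LINES clause skips the header row that R writes by default. Note that LOCAL loading must be permitted on both the server (the local_infile system variable) and the client, or the statement will be refused.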



Temporarily disabling referential integrity



Look for commands that defer enforcing the foreign key relationship rules until after the data is loaded.



In Postgres, use SET CONSTRAINTS … DEFERRED to skip constraint checks during each statement and instead wait until the end of the transaction.



Alternatively, if your database lacks such a feature, you could drop your constraints before the mass import and re-establish them afterward. But beware: this affects all other transactions on all other database connections. If you know the database has no other users, then this may be workable.
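In MySQL there are no deferrable constraints, but the foreign_key_checks session variable gives a lighter-weight version of the drop-and-recreate approach: it disables FK enforcement for the current connection only, so other connections are unaffected. A sketch (connection details are placeholders):

```r
library(DBI)
con <- dbConnect(RMariaDB::MariaDB(), dbname = "mydb",   # hypothetical connection
                 user = "me", password = "secret")

dbExecute(con, "SET FOREIGN_KEY_CHECKS = 0")  # this session only
# ... bulk-load all tables, in any order, here ...
dbExecute(con, "SET FOREIGN_KEY_CHECKS = 1")  # re-enable enforcement

dbDisconnect(con)
```

One caveat: rows inserted while checks were off are not re-validated when you turn them back on, so the consistency of the loaded data remains your responsibility.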



Other issues



For other issues to consider, see the Populating a Database page in the Postgres documentation (relevant whether you use Postgres or not).




  • Disable Autocommit

  • Use COPY (for mass import, mentioned above)

  • Remove Indexes

  • Remove Foreign Key Constraints (mentioned above)

  • Increase maintenance_work_mem (changing the memory allocation of your database engine)

  • Increase max_wal_size (changing the configuration of your database engine’s write-ahead log)

  • Disable WAL Archival and Streaming Replication (consider copying your database to the replica server(s) rather than letting replication move the mass data)

  • Run ANALYZE Afterwards (remind your database engine to survey the new state of the data, for use by its query planner)


Database migration



By the way, you will likely find a database migration tool helpful in creating the tables and columns, and possibly in loading the data. Consider tools such as Flyway or Liquibase.






  • Thank you very much for your very complete answer @Basil
    – Nethim
    Nov 18 at 22:10











edited Nov 18 at 20:18

























answered Nov 18 at 20:03









Basil Bourque
