How to fill a SQL database with multiple tables for the first time
I have a general question about how to fill a database for the first time. I currently work on "raw" datasets in R (dataframes I built to explore the data and get insights quickly), but I now need to structure everything and load it into a relational database.
The DB design is done (conceptual model, logical model, normalized to 3NF). The result is a fairly "complex" (it's all relative) data model with many junction tables and foreign keys between tables.
My question is: what is the easiest way to populate this DB?
My approach would be to generate a .csv file for each table from my "raw" dataframes in R, then load them into the DB table by table. Is that a good way to do it, or is there an easier method? And another point: how do I avoid struggling with the FK constraints while populating?
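To make it concrete, a direct (no intermediate .csv) variant of what I have in mind would look roughly like the sketch below. It assumes DBI with RMariaDB; the connection details, table names, and dataframe names are placeholders, not my real model:

    library(DBI)

    # Placeholder connection details; the schema (tables, FKs) is assumed
    # to exist already, created by my DDL script.
    con <- dbConnect(
      RMariaDB::MariaDB(),
      dbname = "mydb", host = "localhost",
      username = "me", password = "secret"
    )

    # Append into the existing tables: parents first, then children, then
    # junction tables, so every FK target row exists before it is referenced.
    dbWriteTable(con, "customers",   customers_df,   append = TRUE)
    dbWriteTable(con, "products",    products_df,    append = TRUE)
    dbWriteTable(con, "orders",      orders_df,      append = TRUE)  # FK -> customers
    dbWriteTable(con, "order_items", order_items_df, append = TRUE)  # junction table

    dbDisconnect(con)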
Thank you very much for any answers. I realize these are very "methodological" questions, but I couldn't find any related tutorial or thread.
Notes: I work with R (dplyr, etc.) and MySQL.
mysql r database data-modeling data-science
asked Nov 16 at 8:06 by Nethim, edited Nov 18 at 19:49
Tip: You may find more valuable info on the sister site, DBA Stack Exchange. – Basil Bourque, Nov 18 at 20:20
1 Answer
A serious relational database, such as Postgres, offers features for populating a large database.
Bulk loading
Look for commands that read external data straight into a table with a matching field structure. The data moves directly from a file in the OS's file system into the table. This is vastly faster than loading individual rows with the usual SQL INSERT. Such commands are not standardized, so you must look for the proprietary command of your particular database engine. In Postgres that would be the COPY command.
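For MySQL, which the question uses, the counterpart is LOAD DATA [LOCAL] INFILE. A minimal sketch issued from R, assuming an open DBI connection `con`, a placeholder CSV and table, and local_infile enabled on both client and server:

    library(DBI)

    # Bulk-load a CSV whose columns line up with the `customers` table.
    # IGNORE 1 LINES skips the header row that R writes by default.
    dbExecute(con, "
      LOAD DATA LOCAL INFILE 'customers.csv'
      INTO TABLE customers
      FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
      LINES TERMINATED BY '\\n'
      IGNORE 1 LINES
    ")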
Temporarily disabling referential integrity
Look for commands that defer enforcing the foreign key rules until after the data is loaded.
In Postgres, use SET CONSTRAINTS … DEFERRED so that constraints are checked at the end of the transaction rather than after each statement (this works on constraints declared DEFERRABLE).
Alternatively, if your database lacks such a feature, you could drop your constraints before the mass import and re-establish them afterwards. But beware: this affects all other transactions in all other database connections. If you know the database has no other users, this may be workable.
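In MySQL, which lacks deferrable constraints, foreign key checks can instead be switched off for the current session only, avoiding the drop-and-recreate approach while other connections keep full enforcement. A sketch, again assuming an open DBI connection `con` and placeholder dataframes:

    library(DBI)

    # SET FOREIGN_KEY_CHECKS is a session variable, so only this
    # connection skips FK enforcement during the load.
    dbExecute(con, "SET FOREIGN_KEY_CHECKS = 0")

    dbWriteTable(con, "orders",      orders_df,      append = TRUE)
    dbWriteTable(con, "order_items", order_items_df, append = TRUE)

    # Caution: re-enabling does NOT re-validate rows inserted while
    # checks were off, so the loaded data must still be consistent.
    dbExecute(con, "SET FOREIGN_KEY_CHECKS = 1")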
Other issues
For other issues to consider, see Populating a Database in the Postgres documentation (worth reading whether you use Postgres or not).
- Disable Autocommit (a sketch combining this and the last item follows the list)
- Use COPY (for mass import, mentioned above)
- Remove Indexes
- Remove Foreign Key Constraints (mentioned above)
- Increase maintenance_work_mem (changing the memory allocation of your database engine)
- Increase max_wal_size (changing the configuration of your database engine's write-ahead log)
- Disable WAL Archival and Streaming Replication (consider moving a copy of your database to the replica server(s) rather than letting replication move the mass data)
- Run ANALYZE Afterwards (remind your database engine to survey the new state of the data, for use by its query planner)
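A sketch of the first and last items from R, using MySQL syntax since that is what the question uses (ANALYZE TABLE is MySQL's spelling; in Postgres it is plain ANALYZE), with placeholder table and dataframe names:

    library(DBI)

    # One explicit transaction instead of one autocommit per statement.
    dbBegin(con)
    dbWriteTable(con, "customers", customers_df, append = TRUE)
    dbWriteTable(con, "products",  products_df,  append = TRUE)
    dbCommit(con)

    # Refresh the optimizer's statistics after the mass load.
    dbExecute(con, "ANALYZE TABLE customers, products")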
Database migration
By the way, you will likely find a database migration tool helpful in creating the tables and columns, and possibly in loading the data. Consider tools such as Flyway or Liquibase.
answered Nov 18 at 20:03 by Basil Bourque, edited Nov 18 at 20:18
Thank you very much for your very complete answer @Basil – Nethim, Nov 18 at 22:10