Accessing files newest to oldest in an efficient way
For a project I am working on, I am saving lots of JSON files locally. Each file has a time listed inside it, and I want to be able to access them newest to oldest without having to open each one, read the date, and sort. I was thinking of using a binary tree for this, but I can't think of a good way to implement it. Is there an npm module for this, or some other way I could get better results?
javascript node.js json performance filesystems
asked Nov 25 '18 at 21:58
John Becker
61
Can you be more specific? What do you mean by access? Loading them from the file system in date order? Have you tried naming the actual files with the date in the name?
– Matt Way
Nov 25 '18 at 22:54
Basically, I want to be able to get all files that fit a parameter into an array, like all files from time 1543192032682 to 1543191032682, but I don't want to have to open every single file and parse it to check the time (because I expect this to get to around 0.5–1 GB in size, and without an SSD that gets to be slow). I could name them based on time, but that causes a new problem, because they are all named by their id, which I also need to be able to look them up by.
– John Becker
Nov 26 '18 at 0:30
2 Answers
fs.stat would be useful in this case, and wouldn't require any modules from npm.
However, you will run into problems with synchronous loops here; you might want to use async/await instead (see this for more details).
fs.stat returns an object that includes things such as when the file was last modified and when it was created.
If you keep the JSON files in their own folder, I would use fs.readdir to list them; if you don't, you could still use fs.readdir to list all of the files in your current folder and then use the mime-type npm module to check whether each file is a JSON file or not.
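For illustration, a minimal sketch of that approach (the ./data folder name, the newest-first ordering, and sorting by mtimeMs are assumptions for the example, not from the question):

// Sketch: list JSON files in a folder and sort them newest to oldest
// by modification time. The "./data" folder name is an assumption.
const fs = require('fs').promises;
const path = require('path');

async function filesNewestToOldest(dir) {
  const names = await fs.readdir(dir);
  // Keep only .json files, and stat each one to get its timestamps.
  const stats = await Promise.all(
    names
      .filter((name) => path.extname(name) === '.json')
      .map(async (name) => {
        const { mtimeMs } = await fs.stat(path.join(dir, name));
        return { name, mtimeMs };
      })
  );
  // Sort newest first by last-modified time.
  return stats.sort((a, b) => b.mtimeMs - a.mtimeMs).map((s) => s.name);
}

filesNewestToOldest('./data').then(console.log).catch(console.error);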
answered Nov 26 '18 at 1:16
Sarah Cross
That would work, but the problem is that the files will be periodically appended to, changing the last-edited date. I want to be able to access files based on their internal collection date (this can't be the creation date from fs.stat, because the data might arrive days to weeks after it was collected, and I want to get it based on the collection date). I like the solution you have, and I'm sure someone else could use it, but it doesn't work in my case :(
– John Becker
Nov 26 '18 at 1:24
What you could do is store your own lookup file. This is a separate JSON file that contains the ids (filenames) and their associated internal date details. It would also give you the ability to add any additional data you might need to search or sort by. The downside of this method is that you need to ensure the lookup file and the actual files stay in sync: wherever you CRUD your data, you also need to make sure the lookup is updated. An alternative is to build a program that periodically does a long scan of all your files and rebuilds the lookup file. That way you don't have to edit the lookup on every change, but it does limit how up to date the lookup is.
All performance gains are usually a trade-off between memory/caching and complexity.
The only other important question is: have you actually tested the performance of your system for real bottlenecks? Are you sure you even need to optimise this?
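A minimal sketch of what the lookup could look like (the lookup.json name, its id-to-timestamp structure, and the synchronous fs calls are assumptions for the example):

// Sketch: lookup.json maps each file id to the timestamp stored inside
// that file, e.g. { "abc123": 1543191032682, "def456": 1543192032682 }.
// The file name and structure here are assumptions, not a fixed format.
const fs = require('fs');

const lookup = JSON.parse(fs.readFileSync('lookup.json', 'utf8'));

// Return the ids of all files whose internal time falls in [from, to],
// newest first, without opening any of the data files themselves.
function idsInRange(from, to) {
  return Object.entries(lookup)
    .filter(([, time]) => time >= from && time <= to)
    .sort(([, a], [, b]) => b - a)
    .map(([id]) => id);
}

// Whenever a data file is created or its internal time changes, update
// the lookup and persist it so the two stay in sync.
function upsert(id, time) {
  lookup[id] = time;
  fs.writeFileSync('lookup.json', JSON.stringify(lookup));
}

console.log(idsInRange(1543191032682, 1543192032682));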
answered Nov 26 '18 at 4:56
Matt Way
This would work perfectly. I haven't been able to test it because I don't have the data set yet, but I'm going to be pulling data from it about once a minute, and if the data set is around a gigabyte like I expect, then at 80–160 MB/s that would be a really long wait time.
– John Becker
Nov 27 '18 at 5:56
Cool. The question then becomes: how often is the dataset updated, and how is it updated? If it updates often and you need very up-to-date, accurate results, then you'll have to incorporate lookup updating into the actual data-update process. If the dataset is updated periodically (or you don't need very up-to-date data), then I would just start with a full lookup build process (see the sketch below).
– Matt Way
Nov 27 '18 at 6:49
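For completeness, a minimal sketch of the periodic full-scan rebuild described in the last comment (the ./data folder, the collectedAt field name, and the five-minute interval are all assumptions for the example):

// Sketch: periodically scan every data file once and rebuild the lookup.
// Normal queries then hit only the lookup, never the data files.
const fs = require('fs').promises;
const path = require('path');

async function rebuildLookup(dir) {
  const lookup = {};
  for (const name of await fs.readdir(dir)) {
    if (path.extname(name) !== '.json') continue;
    if (name === 'lookup.json') continue; // skip the index itself
    const data = JSON.parse(await fs.readFile(path.join(dir, name), 'utf8'));
    // "collectedAt" is a hypothetical field name for the internal timestamp.
    lookup[path.basename(name, '.json')] = data.collectedAt;
  }
  await fs.writeFile(path.join(dir, 'lookup.json'), JSON.stringify(lookup));
}

// Rebuild on an interval, e.g. every five minutes.
setInterval(() => rebuildLookup('./data').catch(console.error), 5 * 60 * 1000);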