GitHub repo tree generator
I have built a generator that, using the GitHub API, creates a dictionary containing a tree of all the resources in any GitHub repo. It uses the function git_tree, which takes one argument, a repo-string (repo-author/repo-name). For example, this repo only contains one file, a README. If we pass in the repo-string and format the end result, this is what we get:
>> json.dumps(git_tree("githubtraining/github-move"), indent=3)
{
   "github-move": {
      "files": [
         "README.md"
      ],
      "dirs": {}
   }
}
You can see that it returns a dict with a key/value pair of {repo_name: tree}. So the github-move item contains a list of all files in that directory, and a dict of nested directories. This repository has no other directories, so that dict is empty.
For reference, here is the tree of this repo (it was too long to put in the post). You can see each directory and subdirectory has its own files list and dirs dict.
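To make the shape of that result concrete, here is a minimal sketch that walks such a dictionary and lists every file path in it. The helper name `iter_paths` and the sample data are hypothetical; the sample is written by hand in the same `files`/`dirs` layout and is not fetched from GitHub.

```python
def iter_paths(name, node, prefix=""):
    """Yield every file path under `node`, a {"files": [...], "dirs": {...}} dict."""
    base = prefix + name + "/"
    # Files that live directly in this directory.
    for filename in node["files"]:
        yield base + filename
    # Recurse into each nested directory.
    for dirname, child in node["dirs"].items():
        yield from iter_paths(dirname, child, base)


# Hand-written sample in the same shape as the output shown above.
sample = {
    "demo-repo": {
        "files": ["README.md"],
        "dirs": {
            "src": {
                "files": ["main.py"],
                "dirs": {"utils": {"files": ["io.py"], "dirs": {}}},
            },
        },
    }
}

paths = []
for repo_name, root in sample.items():
    paths.extend(iter_paths(repo_name, root))
print(paths)
```

Each level contributes its own files first and then descends into its dirs, which mirrors how the nested dictionary is organised.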
Here's the code (repl.it online program for testing):
import requests
from pprint import pprint
from functools import reduce
import operator
import json
from itertools import chain, repeat, islice

class GitError(Exception): pass

def intersperse(delimiter, seq):
    return list(islice(chain.from_iterable(zip(repeat(delimiter), seq)), 1, None))

def _get_from_dict(dataDict, mapList):
    return reduce(operator.getitem, mapList, dataDict)

def _append_in_dict(dataDict, mapList, value):
    _get_from_dict(dataDict, mapList[:-1]).append(value)

def _get_sha(author, repo):
    try:
        return requests.get('https://api.github.com/repos/{}/{}/branches/master'.format(author, repo)).json()['commit']['commit']['tree']['sha']
    except KeyError as ex:
        raise GitError("Invalid author or repo name") from ex

def _get_git_tree(author, repo):
    return requests.get("https://api.github.com/repos/{}/{}/git/trees/{}?recursive=1".format(author, repo, _get_sha(author, repo))).json()["tree"]

def git_tree(repostring):
    author, repo = repostring.split("/")
    tree = {repo: {"files": [], "dirs": {}}}
    for token in _get_git_tree(author, repo):
        if token["type"] == "tree" and "/" not in token["path"]:
            tree[repo]["dirs"].update({token["path"]: {}})
            tree[repo]["dirs"][token["path"]].update({"files": [], "dirs": {}})
        elif token["type"] == "tree" and "/" in token["path"]:
            temp_dict = {}
            a = list(reversed(token["path"].split("/")))
            for k in a[:-1]:
                temp_dict = {k: {"files": [], "dirs": temp_dict}}
            tree[repo]["dirs"][a[-1]]["dirs"] = temp_dict
        elif token["type"] == "blob":
            path = token["path"].split("/")
            if len(path) == 1:
                tree[repo]["files"].append(path[0])
            else:
                dict_path = [repo, "dirs"] + intersperse("dirs", path[:-1]) + ["files", path[-1]]
                _append_in_dict(tree, dict_path, dict_path[-1])
    return tree

print(json.dumps(git_tree("githubtraining/caption-this"), indent=3))
(The json.dumps is just there for easy viewing; it can be omitted.)
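One possible way to reduce the branching in git_tree is to create directory nodes on demand while walking each path. This is only a sketch of an alternative, not the code from the post: `build_tree` and `_new_node` are hypothetical names, and the sample entries are written by hand with the `path` and `type` keys that the code above reads, so nothing is fetched from the network.

```python
def _new_node():
    """Return an empty directory node."""
    return {"files": [], "dirs": {}}


def build_tree(repo, entries):
    """Build {repo: {"files": [...], "dirs": {...}}} from flat path entries."""
    root = _new_node()
    for entry in entries:
        *parents, name = entry["path"].split("/")
        node = root
        for part in parents:
            # Create intermediate directories on demand, reusing existing ones.
            node = node["dirs"].setdefault(part, _new_node())
        if entry["type"] == "tree":
            node["dirs"].setdefault(name, _new_node())
        elif entry["type"] == "blob":
            node["files"].append(name)
    return {repo: root}


# Hand-written entries with sibling subdirectories under "src".
entries = [
    {"path": "README.md", "type": "blob"},
    {"path": "src", "type": "tree"},
    {"path": "src/app", "type": "tree"},
    {"path": "src/app/main.py", "type": "blob"},
    {"path": "src/lib", "type": "tree"},
    {"path": "src/lib/util.py", "type": "blob"},
]
tree = build_tree("demo-repo", entries)
```

Because existing nodes are reused through setdefault, sibling subdirectories such as src/app and src/lib are both retained. It may be worth checking how the version above behaves on such a layout, since its nested-tree branch assigns a fresh dirs dict to the top-level directory.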
My questions:
Is it too messy? I look back at this function and it looks a bit cluttered and all over the place. Is that the case, or am I just going crazy?
Do I have any unnecessary code in there?
Is there anything else you deem wrong with the program?
python json api git
edited 3 mins ago by 200_success
asked 12 hours ago by connectyourcharger
1 Answer
I see nothing wrong with the code itself, but you're in dire need of docstrings. Arcane functional one-liners like the one in intersperse are impenetrable unless they're well-documented. You'd also benefit from splitting that one into multiple lines.
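As an illustration of that advice, here is one way intersperse might look with a docstring and the one-liner split into named steps. The intermediate names `pairs` and `flattened` are my own; the behavior is intended to match the original.

```python
from itertools import chain, islice, repeat


def intersperse(delimiter, seq):
    """Return the items of `seq` as a list with `delimiter` between each pair.

    An empty `seq` gives an empty list.
    """
    # Pair every item with a leading delimiter: (d, a), (d, b), ...
    pairs = zip(repeat(delimiter), seq)
    # Flatten the pairs into one stream: d, a, d, b, ...
    flattened = chain.from_iterable(pairs)
    # Skip the first delimiter so the result starts with an item.
    return list(islice(flattened, 1, None))


print(intersperse("dirs", ["a", "b", "c"]))
```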
For strings like this:
return requests.get('https://api.github.com/repos/{}/{}/branches/master'.format(author, repo)).json()['commit']['commit']['tree']['sha']
consider rewriting your format call as an f-string; i.e.
f'https://api.github.com/repos/{author}/{repo}/branches/master'
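A small sketch comparing the two spellings side by side; it only builds the URL and makes no request. The helper names, the `API_ROOT` constant, and the `branch` parameter are hypothetical and not part of the original code. F-strings need Python 3.6 or later, while str.format also works on earlier 3.x releases.

```python
API_ROOT = "https://api.github.com"  # hypothetical constant, not in the original


def branch_url_fstring(author, repo, branch="master"):
    """Build the branch URL with an f-string (Python 3.6+)."""
    return f"{API_ROOT}/repos/{author}/{repo}/branches/{branch}"


def branch_url_format(author, repo, branch="master"):
    """Build the same URL with str.format, which also works before 3.6."""
    return "{}/repos/{}/{}/branches/{}".format(API_ROOT, author, repo, branch)


url = branch_url_fstring("githubtraining", "github-move")
print(url)
```

Pulling the URL into its own function also shortens the long request line, whichever formatting style is used.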
The only reason I use formats instead of f-strings is for compatibility for Python 3.x, not just > 3.5.
– connectyourcharger
8 hours ago
But great advice otherwise! Yeah, I probably should have some comments in there.
– connectyourcharger
8 hours ago
My philosophy is: if you've made it past the 2x hurdle, you're allowed to target modern 3x, consequences be damned. But ¯_(ツ)_/¯
– Reinderien
8 hours ago
answered 8 hours ago by Reinderien