GitHub repo tree generator

I have built a generator that uses the GitHub API to create a dictionary containing a tree of all the resources in any GitHub repo. It uses the function git_tree, which takes one argument, a repo-string (repo-author/repo-name). For example, the repo githubtraining/github-move contains only one file, a README. If we pass in the repo-string and format the end result, this is what we get:

>> json.dumps(git_tree("githubtraining/github-move"), indent=3)
{
   "github-move": {
      "files": [
         "README.md"
      ],
      "dirs": {}
   }
}

You can see that it returns a dict with a single key/value pair of {repo_name: tree}. The github-move item contains a list of all files in that directory ("files") and a dict of nested directories ("dirs"). This repository has no other directories, so that dict is empty.

For a fuller example, here is the tree of this repo (it was too long to put in the post). You can see that each directory and subdirectory has its own files list and dirs dict.
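To make that shape concrete, here is a minimal sketch (not part of the generator itself) that walks a dict of this form and lists every file path in it. The name iter_paths and the "docs" directory in the sample data are hypothetical, made up only for illustration.

```python
def iter_paths(name, node, prefix=""):
    """Yield every file path under a node shaped like {"files": [...], "dirs": {...}}."""
    base = prefix + name + "/"
    # Files that live directly in this directory.
    for filename in node["files"]:
        yield base + filename
    # Recurse into each subdirectory, extending the path prefix.
    for dirname, child in node["dirs"].items():
        for path in iter_paths(dirname, child, base):
            yield path


# Hypothetical sample in the same shape as the output of git_tree.
sample = {
    "github-move": {
        "files": ["README.md"],
        "dirs": {"docs": {"files": ["intro.md"], "dirs": {}}},
    }
}

for repo_name, root in sample.items():
    for path in iter_paths(repo_name, root):
        print(path)
```

With the sample above this prints github-move/README.md followed by github-move/docs/intro.md.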

Here's the code (repl.it online program for testing):

import requests
from pprint import pprint
from functools import reduce
import operator
import json
from itertools import chain, repeat, islice

class GitError(Exception): pass

def intersperse(delimiter, seq):
  return list(islice(chain.from_iterable(zip(repeat(delimiter), seq)), 1, None))

def _get_from_dict(dataDict, mapList):
  return reduce(operator.getitem, mapList, dataDict)

def _append_in_dict(dataDict, mapList, value):
  _get_from_dict(dataDict, mapList[:-1]).append(value)

def _get_sha(author, repo):
  try:
    return requests.get('https://api.github.com/repos/{}/{}/branches/master'.format(author, repo)).json()['commit']['commit']['tree']['sha']
  except KeyError as ex:
    raise GitError("Invalid author or repo name") from ex

def _get_git_tree(author, repo):
  return requests.get("https://api.github.com/repos/{}/{}/git/trees/{}?recursive=1".format(author, repo, _get_sha(author, repo))).json()["tree"]

def git_tree(repostring):
  author, repo = repostring.split("/")
  tree = {repo: {"files": [], "dirs": {}}}
  for token in _get_git_tree(author, repo):
    if token["type"] == "tree" and "/" not in token["path"]:
      tree[repo]["dirs"].update({token["path"]: {}})
      tree[repo]["dirs"][token["path"]].update({"files": [], "dirs": {}})
    elif token["type"] == "tree" and "/" in token["path"]:
      temp_dict = {}
      a = list(reversed(token["path"].split("/")))
      for k in a[:-1]:
        temp_dict = {k: {"files": [], "dirs": temp_dict}}
      tree[repo]["dirs"][a[-1]]["dirs"] = temp_dict
    elif token["type"] == "blob":
      path = token["path"].split("/")
      if len(path) == 1:
        tree[repo]["files"].append(path[0])
      else:
        dict_path = [repo, "dirs"] + intersperse("dirs", path[:-1]) + ["files", path[-1]]
        _append_in_dict(tree, dict_path, dict_path[-1])
  return tree

print(json.dumps(git_tree("githubtraining/caption-this"), indent=3))

(The json.dumps is just there for easy viewing; it can be omitted.)

My questions:

Is it too messy? I look back at this function and it looks a bit cluttered and all over the place. Is that the case, or am I just going crazy?

Do I have any unnecessary code in there?

Is there anything else you deem wrong with the program?

asked 12 hours ago by connectyourcharger
edited 3 mins ago by 200_success

python json api git


1 Answer

I see nothing wrong with the code itself, but you're in dire need of docstrings. Arcane functional one-liners like the one in intersperse are impenetrable unless they're well-documented. You'd also benefit from splitting that one into multiple lines.
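As a rough sketch of what that might look like for intersperse (the behaviour is meant to be unchanged; the docstring wording and the intermediate names pairs and flattened are my own, not from the original code):

```python
from itertools import chain, islice, repeat


def intersperse(delimiter, seq):
    """Return a list with `delimiter` placed between consecutive items of `seq`.

    For example, intersperse("dirs", ["a", "b"]) gives ["a", "dirs", "b"].
    """
    # Pair every item with the delimiter: (delimiter, item), (delimiter, item), ...
    pairs = zip(repeat(delimiter), seq)
    # Flatten the pairs into: delimiter, item, delimiter, item, ...
    flattened = chain.from_iterable(pairs)
    # Skip the leading delimiter so the result starts with the first item.
    return list(islice(flattened, 1, None))


print(intersperse("dirs", ["src", "utils", "io"]))
```

Each step now has a name and a comment, so a reader does not have to unpick the nesting to see why the slice starts at 1.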

For strings like this:

    return requests.get('https://api.github.com/repos/{}/{}/branches/master'.format(author, repo)).json()['commit']['commit']['tree']['sha']

consider rewriting your format call as an f-string; i.e.

f'https://api.github.com/repos/{author}/{repo}/branches/master'
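The same long line could also be split so that building the URL and digging into the response are separate, documented steps. This is only a sketch under a few assumptions: the helper names branch_url and tree_sha_from_branch are hypothetical, the branch parameter is my addition (the original hard-codes master), and the HTTP call is left out so the snippet runs offline against a placeholder payload.

```python
def branch_url(author, repo, branch="master"):
    """Build the GitHub API URL for one branch of a repository."""
    return f"https://api.github.com/repos/{author}/{repo}/branches/{branch}"


def tree_sha_from_branch(payload):
    """Pull the root tree SHA out of a decoded branch response (a dict)."""
    return payload["commit"]["commit"]["tree"]["sha"]


# Placeholder payload holding only the keys the original one-liner reads;
# the SHA value is made up for illustration.
fake_payload = {"commit": {"commit": {"tree": {"sha": "0123abcd"}}}}

print(branch_url("githubtraining", "github-move"))
print(tree_sha_from_branch(fake_payload))
```

In the real function, the payload would come from requests.get(branch_url(author, repo)).json(), with the existing KeyError handling wrapped around tree_sha_from_branch.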

answered 8 hours ago by Reinderien

The only reason I use format instead of f-strings is compatibility with all of Python 3.x, not just versions > 3.5.
– connectyourcharger
8 hours ago

But great advice otherwise! Yeah, I probably should have some comments in there.
– connectyourcharger
8 hours ago

My philosophy is: if you've made it past the 2x hurdle, you're allowed to target modern 3x, consequences be damned. But ¯_(ツ)_/¯
– Reinderien
8 hours ago
