GitHub repo tree generator












I have built a generator that uses the GitHub API to create a dictionary containing a tree of all the resources in any GitHub repo. It uses the function git_tree, which takes one argument, a repo string (repo-author/repo-name). For example, this repo contains only one file, a README. If we pass in the repo string and format the end result, this is what we get:



    >>> print(json.dumps(git_tree("githubtraining/github-move"), indent=3))
    {
       "github-move": {
          "files": [
             "README.md"
          ],
          "dirs": {}
       }
    }


You can see that it returns a dict with a key/value pair of {repo_name: tree}. The github-move item contains a list of all files in that directory and a dict of nested directories. This repository has no other directories, so that dict is empty.
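To make the shape of the returned value concrete, here is a small sketch that walks a dict of this form and lists every file path. The helper name `iter_paths` is hypothetical (it is not part of the posted code), and the sample dict is hand-written rather than fetched from GitHub.

```python
def iter_paths(node, prefix=""):
    """Yield the path of every file under a {"files": [...], "dirs": {...}} node."""
    for name in node["files"]:
        yield prefix + name
    for dirname, child in node["dirs"].items():
        # Recurse into each subdirectory, extending the path prefix.
        yield from iter_paths(child, prefix + dirname + "/")


# Hand-written sample in the same {repo_name: tree} shape (an assumption,
# not real API output).
sample = {
    "demo": {
        "files": ["README.md"],
        "dirs": {
            "src": {"files": ["main.py"], "dirs": {}},
        },
    }
}

for repo_name, root in sample.items():
    print(repo_name, list(iter_paths(root)))
```

Each level only needs its own files list and dirs dict, so the walk is a plain recursion with no special case for the repo root.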



For a larger example, here is the tree of this repo (it was too long to put in the post). You can see that each directory and subdirectory has its own files list and dirs dict.
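One way to get that per-directory shape from a flat list of entries is to create each directory node on demand. The sketch below is an illustration under stated assumptions, not the posted implementation: `build_tree` and `new_node` are hypothetical names, and the entries are hand-written dicts that only mimic the `path` and `type` fields of a tree listing.

```python
def new_node():
    """Return an empty directory node."""
    return {"files": [], "dirs": {}}


def build_tree(repo, entries):
    """Build {repo: node} from entries that have "path" and "type" keys."""
    root = new_node()
    for entry in entries:
        *parents, name = entry["path"].split("/")
        node = root
        for part in parents:
            # setdefault creates a missing directory without replacing siblings.
            node = node["dirs"].setdefault(part, new_node())
        if entry["type"] == "tree":
            node["dirs"].setdefault(name, new_node())
        elif entry["type"] == "blob":
            node["files"].append(name)
    return {repo: root}


# Hypothetical sample entries (not fetched from GitHub).
entries = [
    {"path": "README.md", "type": "blob"},
    {"path": "docs", "type": "tree"},
    {"path": "docs/api", "type": "tree"},
    {"path": "docs/api/index.md", "type": "blob"},
    {"path": "docs/guide", "type": "tree"},
]
tree = build_tree("demo", entries)
print(sorted(tree["demo"]["dirs"]["docs"]["dirs"]))
```

Because directories are created with `setdefault`, adding `docs/guide` leaves the existing `docs/api` node and its files in place.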



Here's the code (repl.it online program for testing):



    import requests
    from pprint import pprint
    from functools import reduce
    import operator
    import json
    from itertools import chain, repeat, islice

    class GitError(Exception): pass

    def intersperse(delimiter, seq):
        return list(islice(chain.from_iterable(zip(repeat(delimiter), seq)), 1, None))

    def _get_from_dict(dataDict, mapList):
        return reduce(operator.getitem, mapList, dataDict)

    def _append_in_dict(dataDict, mapList, value):
        _get_from_dict(dataDict, mapList[:-1]).append(value)

    def _get_sha(author, repo):
        try:
            return requests.get('https://api.github.com/repos/{}/{}/branches/master'.format(author, repo)).json()['commit']['commit']['tree']['sha']
        except KeyError as ex:
            raise GitError("Invalid author or repo name") from ex

    def _get_git_tree(author, repo):
        return requests.get("https://api.github.com/repos/{}/{}/git/trees/{}?recursive=1".format(author, repo, _get_sha(author, repo))).json()["tree"]

    def git_tree(repostring):
        author, repo = repostring.split("/")
        tree = {repo: {"files": [], "dirs": {}}}
        for token in _get_git_tree(author, repo):
            if token["type"] == "tree" and "/" not in token["path"]:
                tree[repo]["dirs"].update({token["path"]: {}})
                tree[repo]["dirs"][token["path"]].update({"files": [], "dirs": {}})
            elif token["type"] == "tree" and "/" in token["path"]:
                temp_dict = {}
                a = list(reversed(token["path"].split("/")))
                for k in a[:-1]:
                    temp_dict = {k: {"files": [], "dirs": temp_dict}}
                tree[repo]["dirs"][a[-1]]["dirs"] = temp_dict
            elif token["type"] == "blob":
                path = token["path"].split("/")
                if len(path) == 1:
                    tree[repo]["files"].append(path[0])
                else:
                    dict_path = [repo, "dirs"] + intersperse("dirs", path[:-1]) + ["files", path[-1]]
                    _append_in_dict(tree, dict_path, dict_path[-1])
        return tree

    print(json.dumps(git_tree("githubtraining/caption-this"), indent=3))


(The json.dumps is just there for easy viewing; it can be omitted.)



My questions:




  1. Is it too messy? I look back at this function and it looks a bit cluttered and all over the place. Is that the case, or am I just going crazy?


  2. Do I have any unnecessary code in there?


  3. Is there anything else you deem wrong with the program?











      python json api git






      edited 3 mins ago









      200_success

      asked 12 hours ago









      connectyourcharger

          1 Answer
          I see nothing wrong with the code itself, but you're in dire need of docstrings. Arcane functional one-liners like the one in intersperse are impenetrable unless they're well-documented. You'd also benefit from splitting that one into multiple lines.
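As an illustration of that suggestion, the one-liner could be split up and documented along these lines. This is one possible sketch that keeps the name `intersperse` and its behaviour; the docstring wording and the intermediate variable names are assumptions, not taken from the post.

```python
from itertools import chain, islice, repeat


def intersperse(delimiter, seq):
    """Return a list with `delimiter` placed between consecutive items of `seq`.

    Example: intersperse("dirs", ["a", "b", "c"])
    gives ["a", "dirs", "b", "dirs", "c"].
    """
    # Pair every item with the delimiter: (delimiter, item), (delimiter, item), ...
    pairs = zip(repeat(delimiter), seq)
    # Flatten the pairs into one stream: delimiter, item, delimiter, item, ...
    flattened = chain.from_iterable(pairs)
    # Skip the leading delimiter so the result starts with the first item.
    return list(islice(flattened, 1, None))


print(intersperse("dirs", ["a", "b", "c"]))
```

Naming the intermediate steps costs three extra lines and lets each comment describe exactly one transformation.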



          For strings like this:



              return requests.get('https://api.github.com/repos/{}/{}/branches/master'.format(author, repo)).json()['commit']['commit']['tree']['sha']


          consider rewriting your format call as an f-string; i.e.



          f'https://api.github.com/repos/{author}/{repo}/branches/master'
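For comparison, here is a small sketch showing that both spellings build the same URL. The helper names `branch_url_format` and `branch_url_fstring` are hypothetical. Note that f-strings need Python 3.6 or newer, while `str.format` also works on earlier 3.x releases.

```python
def branch_url_format(author, repo):
    """Build the branch URL with str.format (works on any Python 3.x)."""
    return "https://api.github.com/repos/{}/{}/branches/master".format(author, repo)


def branch_url_fstring(author, repo):
    """Build the same URL with an f-string (requires Python 3.6+)."""
    return f"https://api.github.com/repos/{author}/{repo}/branches/master"


print(branch_url_fstring("githubtraining", "github-move"))
```

Neither helper performs a request; they only build the string, which also makes the URL easy to check without network access.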






          answered 8 hours ago









          Reinderien

          • The only reason I use format instead of f-strings is compatibility with all Python 3.x versions, not just those above 3.5.
            – connectyourcharger
            8 hours ago












          • But great advice otherwise! Yeah, I probably should have some comments in there.
            – connectyourcharger
            8 hours ago










          • My philosophy is: if you've made it past the 2x hurdle, you're allowed to target modern 3x, consequences be damned. But ¯_(ツ)_/¯
            – Reinderien
            8 hours ago



















