How can I host a large list in my 8G DDR3 RAM?











I am new to Python and wondering how memory allocation works there.
It turns out that one way to measure the size of a stored variable is sys.getsizeof(x), which returns the number of bytes that x occupies in memory, right? Here is some example code:



import struct
import sys

x = struct.pack('<L', 0xffffffff)
print(len(x))
print(sys.getsizeof(x))


which gives:



4
37


The variable x that I have just created is a 4-byte string, and here the first question arises: why is the memory allocated to a 4-byte string 37 bytes? Isn't that too much extra space?



The story gets more complicated when I create a list of two 4-byte strings.
Below are a few more lines:



import struct
import sys

k = 2
rng = range(0, k)

x = [b''] * k

for i in rng:
    x[i] = struct.pack('<L', 0xffffffff)

print(len(x))
print(len(x[0]))
print(sys.getsizeof(x))
print(sys.getsizeof(x[0]))


from which I get:



2
4
80
37


Another question: why, when I store two 4-byte strings in a list, is the memory allocated to the list not equal to the sum of their individual sizes? That is, 37 + 37 != 80. What are those extra 6 bytes for?



Let's enlarge k to 10000; the previous code then gives:



10000
4
80064
37


Here the difference grows dramatically when comparing the individual sizes to the whole: 37 * 10000 = 370000 != 80064. It looks like each item in the list now occupies 80064/10000 = 8.0064 bytes. That sounds feasible, but it still does not explain the discrepancies shown above.



Finally, my main question: when I raise k to 0xffffffff and expect to get a list of size ~ 8 * 0xffffffff = 34359738360 bytes, I actually get a MemoryError exception. Is there any way to eliminate non-critical memory overhead so that my 8G DDR3 RAM can host this variable x?










  • Just because you have 8 GB installed doesn't mean Python has access to all of it. E.g., your OS and other processes need RAM as well.
    – cricket_007
    17 mins ago

















python python-3.x memory






asked 1 hour ago









PouJa

386
















1 Answer
> Why is the memory allocated to a 4-byte string 37 bytes? Isn't that too much extra space?




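All objects in Python carry some amount of "slop" (fixed overhead) on a per-object basis. For bytes, and probably for all immutable stdlib types, this overhead (here 33 bytes, which on 64-bit CPython covers the reference count, type pointer, length, cached hash, and a trailing null byte) is independent of the length of the object: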



from sys import getsizeof as gso
print(gso(b'x'*1000) - gso(b''))
# 1000


Note, this is not the same as:



print(gso([b'x']*1000) - gso(b''))
# 8031


In the former, you're making a single bytes object of 1000 x's.



In the latter, you're making a list of 1000 references to a bytes object. The important distinction is that in the latter you're (a) replicating something (more on this later) 1000 times, and (b) incorporating the size of the list container.



Let's talk about containers:



print(gso([b'x'*100,]) - gso([]))


Here we print the difference between the getsizeof of a one-element list (holding a 100-byte bytes object) and that of an empty list. We're effectively taring out the size of the container.



We might expect this to equal getsizeof(b'x' * 100).



It does not.



The result of print(gso([b'x'*100,]) - gso([])) is 8 bytes (on my machine), because the list contains just references/pointers to the underlying objects, and those 8 bytes are exactly one such pointer.
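
To make the reference point concrete, here is a minimal sketch that adds the sizes of the distinct element objects to the size of the list itself. The helper name deep_size is made up for this illustration (it is not a standard function) and only looks one level deep. Exact byte counts vary by Python version and platform, so the code prints them instead of assuming them.

```python
import sys

def deep_size(lst):
    """Size of the list plus the sizes of the distinct objects it refers to.

    Hypothetical helper for illustration; it only looks one level deep.
    """
    seen = set()
    total = sys.getsizeof(lst)
    for item in lst:
        if id(item) not in seen:      # count a shared object only once
            seen.add(id(item))
            total += sys.getsizeof(item)
    return total

# Three references to ONE bytes object.
shared = [b'\xff\xff\xff\xff'] * 3
# Three separate bytes objects with different contents.
distinct = [bytes([255, 255, 255, i]) for i in range(3)]

print(sys.getsizeof(shared), deep_size(shared))
print(sys.getsizeof(distinct), deep_size(distinct))
```

For the list built by multiplication, only one element object exists, so the deep size is the list plus a single bytes object. For the list of separate objects, every element adds its own per-object overhead.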




> That is, 37 + 37 != 80. What are those extra 6 bytes for?




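Let's do the same thing and look at the net size by subtracting the size of the container:



x = [b'\xff\xff\xff\xff', b'\xff\xff\xff\xff']

print(gso(x[0]) - gso(b'')) # 4
print(gso(x) - gso([])) # 16


In the first print, the 4 returned is analogous to the 1000 returned in the very first example: one per byte. (The length of x[0] is 4.)



In the second, it's 8 bytes per reference to an element. In other words, the 80 bytes reported in the question are the list's own overhead (64 bytes here) plus two 8-byte references; the 37-byte elements are not counted at all. The list's size has nothing to do with the contents of its elements: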



N = 1000
x = [b'x'] * N
y = [b'xxxx'] * N
print(gso(x) == gso(y))
# True


But while mutable containers don't seem to have a fixed "slop":



lst = []
for _ in range(100):
    lst.append('-')
    x = list(lst)

    slop = gso(x) - (8 * len(x))
    print({"len": len(x), "slop": slop})


Output:




{'len': 1, 'slop': 88}
{'len': 2, 'slop': 88}
{'len': 3, 'slop': 88}
{'len': 4, 'slop': 88}
{'len': 5, 'slop': 88}
{'len': 6, 'slop': 88}
{'len': 7, 'slop': 88}
{'len': 8, 'slop': 96}
{'len': 9, 'slop': 120}
{'len': 10, 'slop': 120}
{'len': 11, 'slop': 120}
{'len': 12, 'slop': 120}
{'len': 13, 'slop': 120}
{'len': 14, 'slop': 120}
{'len': 15, 'slop': 120}
{'len': 16, 'slop': 128}
{'len': 17, 'slop': 128}
{'len': 18, 'slop': 128}
{'len': 19, 'slop': 128}
{'len': 20, 'slop': 128}
{'len': 21, 'slop': 128}
{'len': 22, 'slop': 128}
{'len': 23, 'slop': 128}
{'len': 24, 'slop': 136}
...


...Immutable containers do:



lst = []
for _ in range(100):
    lst.append('-')
    x = tuple(lst)

    slop = gso(x) - (8 * len(x))
    print({"len": len(x), "slop": slop})



{'len': 1, 'slop': 48}
{'len': 2, 'slop': 48}
{'len': 3, 'slop': 48}
{'len': 4, 'slop': 48}
{'len': 5, 'slop': 48}
{'len': 6, 'slop': 48}
{'len': 7, 'slop': 48}
{'len': 8, 'slop': 48}
{'len': 9, 'slop': 48}
{'len': 10, 'slop': 48}
{'len': 11, 'slop': 48}
{'len': 12, 'slop': 48}
{'len': 13, 'slop': 48}
{'len': 14, 'slop': 48}
...



> Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?




First, recall that the getsizeof of a container does not reflect the entire amount of memory used by Python. The ~8 bytes per element is just the size of the pointer; each distinct element object consumes an additional 37 (or whatever) bytes on top of that.
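
As a rough back-of-the-envelope check, the sketch below estimates what a list of 0xffffffff distinct 4-byte bytes objects would need. It assumes the 8-byte references and 37-byte elements reported in the question; both figures depend on the interpreter build, and the allocator may round each object up further.

```python
# Rough estimate only; the per-item figures are taken from the question's
# output and will differ on other Python versions or 32-bit builds.
k = 0xffffffff                 # number of elements the question asks for
pointer_bytes = 8              # one reference per list slot
element_bytes = 37             # sys.getsizeof of each 4-byte bytes object

list_only = pointer_bytes * k
with_elements = (pointer_bytes + element_bytes) * k

GIB = 2 ** 30
print(f"list slots alone: {list_only / GIB:.1f} GiB")
print(f"slots + elements: {with_elements / GIB:.1f} GiB")
```

Under these assumptions the list slots alone come to about 32 GiB, and about 180 GiB once every element is a separate object, so neither fits in 8 GB of RAM.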



But the good news is that you probably don't need the entire list at the same time. If you're just building a list to iterate over, generate it one element at a time, with a for loop or a generator function.



Or generate it a chunk at a time, process it, and then continue, letting the garbage collector clean up the no-longer-used memory.
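
A minimal sketch of both ideas follows. The names packed_values and chunks are invented for this example, the packed value is the loop index (an assumption; the question packs a constant), and the sums and lengths stand in for whatever processing is actually needed. A small k is used so the example finishes quickly.

```python
import struct
from itertools import islice

def packed_values(k):
    """Yield k packed 4-byte values one at a time instead of storing them."""
    for i in range(k):
        yield struct.pack('<L', i)

def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

k = 10          # stand-in for a much larger count

# One element at a time: only the current value is alive.
total_bytes = sum(len(v) for v in packed_values(k))
print(total_bytes)

# A chunk at a time: at most `size` values are alive together.
chunk_lengths = [len(c) for c in chunks(packed_values(k), 4)]
print(chunk_lengths)
```

In both variants the peak memory depends on the chunk size rather than on k, which is what lets the same loop scale to counts that would never fit in a single list.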





One other interesting thing to point out:



N = 1000
x = [b'x' for _ in range(N)]
y = [b'x'] * N
print(x == y) # True
print(gso(x) == gso(y)) # False


(This is likely because the size of y is known a priori, while x has been resized as it grew and so carries extra over-allocated capacity.)






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














     

    draft saved


    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53348835%2fhow-can-i-host-a-large-list-in-my-8g-ddr3-ram%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    0
    down vote














    Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?




    All objects in Python have some amount of "slop" on a per-object basis. Note that in the case of bytes and probably all immutable stdlib types, this padding (here 33 bytes) is independent of the length of the object:



    from sys import getsizeof as gso
    print(gso(b'x'*1000) - gso(b''))
    # 1000


    Note, this is not the same as:



    print(gso([b'x']*1000) - gso(b''))
    # 8031


    In the former, you're making a bytes object of 100 x's.



    In the latter, you're making a list of 100 bytes objects. The important distinction is that in the latter, you're (a) replicating that (something. more on this later) 100 times, and incorporating the size of the list container.



    Lets talk about containers:



    print(gso([b'x'*100,]) - gso())


    Here we print the difference between the getsizeof of a one element list (of a 100 byte long byte object) and an empty list. We're effectively taring out the size of the container.



    We might expect that this is equal to getsizeof(b'x' * 100).



    It is not.



    The result of print(gso([b'x'*100,]) - gso()) is 8 bytes (on my machine) and is because the list contains just references/pointers to underlying objects and those 8 bytes are just that.




    That is 37 + 37 != 80. What are those extra 6 bytes for?




    Lets do the same thing and look at the net size, by subtracting the size of the container:



    x = [b'xffxffxffxff', b'xffxffxffxff']

    print(gso(x[0]) - gso(b'')) # 4
    print(gso(x) - gso()) # 16


    In the first, the 4 returned is just as the 1000 returned in the first, one per byte. (The length of x[0] is 4).



    In the second, it's 8 bytes per reference-to-sub list. It has nothing to do with the contents of those sublists:



    N = 1000
    x = [b'x'] * N
    y = [b'xxxx'] * N
    print(gso(x) == gso(y))
    # True


    But while mutable containers don't seem to have a fixed "slop":



    lst = 
    for _ in range(100):
    lst.append('-')
    x = list(lst)

    slop = gso(x) - (8 * len(x))
    print({"len": len(x), "slop": slop})


    Output:




    {'len': 1, 'slop': 88}
    {'len': 2, 'slop': 88}
    {'len': 3, 'slop': 88}
    {'len': 4, 'slop': 88}
    {'len': 5, 'slop': 88}
    {'len': 6, 'slop': 88}
    {'len': 7, 'slop': 88}
    {'len': 8, 'slop': 96}
    {'len': 9, 'slop': 120}
    {'len': 10, 'slop': 120}
    {'len': 11, 'slop': 120}
    {'len': 12, 'slop': 120}
    {'len': 13, 'slop': 120}
    {'len': 14, 'slop': 120}
    {'len': 15, 'slop': 120}
    {'len': 16, 'slop': 128}
    {'len': 17, 'slop': 128}
    {'len': 18, 'slop': 128}
    {'len': 19, 'slop': 128}
    {'len': 20, 'slop': 128}
    {'len': 21, 'slop': 128}
    {'len': 22, 'slop': 128}
    {'len': 23, 'slop': 128}
    {'len': 24, 'slop': 136}
    ...


    ...Immutable containers do:



    lst = 
    for _ in range(100):
    lst.append('-')
    x = tuple(lst)

    slop = gso(x) - (8 * len(x))
    print({"len": len(x), "slop": slop})



    {'len': 1, 'slop': 48}
    {'len': 2, 'slop': 48}
    {'len': 3, 'slop': 48}
    {'len': 4, 'slop': 48}
    {'len': 5, 'slop': 48}
    {'len': 6, 'slop': 48}
    {'len': 7, 'slop': 48}
    {'len': 8, 'slop': 48}
    {'len': 9, 'slop': 48}
    {'len': 10, 'slop': 48}
    {'len': 11, 'slop': 48}
    {'len': 12, 'slop': 48}
    {'len': 13, 'slop': 48}
    {'len': 14, 'slop': 48}
    ...



    Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?




    First, recall that the sizeof a container will not reflect the entire amount of memory used by Python. The ~8 bytes per element is the size of the pointer, each of those elements will consume an additional 37 (or whatever) bytes.



    But the good news is that it's unlikely you probably don't need the entire list at the same time. If you're just building a list to iterate over, then generate it one element at a time, with a for loop or generator function.



    Or generate it a chunk at a time, process it, and then continue, letting the garbage collector clean up the no-longer-used memory.





    One other interesting thing to point out



    N = 1000
    x = [b'x' for _ in range(N)]
    y = [b'x'] * N
    print(x == y) # True
    print(gso(x) == gso(y)) # False


    (This is likely due to the size of y being known a priori, while the size of x is not and has been resized as it grew).






    share|improve this answer



























      up vote
      0
      down vote














      Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?




      All objects in Python have some amount of "slop" on a per-object basis. Note that in the case of bytes and probably all immutable stdlib types, this padding (here 33 bytes) is independent of the length of the object:



      from sys import getsizeof as gso
      print(gso(b'x'*1000) - gso(b''))
      # 1000


      Note, this is not the same as:



      print(gso([b'x']*1000) - gso(b''))
      # 8031


      In the former, you're making a bytes object of 100 x's.



      In the latter, you're making a list of 100 bytes objects. The important distinction is that in the latter, you're (a) replicating that (something. more on this later) 100 times, and incorporating the size of the list container.



      Lets talk about containers:



      print(gso([b'x'*100,]) - gso())


      Here we print the difference between the getsizeof of a one element list (of a 100 byte long byte object) and an empty list. We're effectively taring out the size of the container.



      We might expect that this is equal to getsizeof(b'x' * 100).



      It is not.



      The result of print(gso([b'x'*100,]) - gso()) is 8 bytes (on my machine) and is because the list contains just references/pointers to underlying objects and those 8 bytes are just that.




      That is 37 + 37 != 80. What are those extra 6 bytes for?




      Lets do the same thing and look at the net size, by subtracting the size of the container:



      x = [b'xffxffxffxff', b'xffxffxffxff']

      print(gso(x[0]) - gso(b'')) # 4
      print(gso(x) - gso()) # 16


      In the first, the 4 returned is just as the 1000 returned in the first, one per byte. (The length of x[0] is 4).



      In the second, it's 8 bytes per reference-to-sub list. It has nothing to do with the contents of those sublists:



      N = 1000
      x = [b'x'] * N
      y = [b'xxxx'] * N
      print(gso(x) == gso(y))
      # True


      But while mutable containers don't seem to have a fixed "slop":



      lst = 
      for _ in range(100):
      lst.append('-')
      x = list(lst)

      slop = gso(x) - (8 * len(x))
      print({"len": len(x), "slop": slop})


      Output:




      {'len': 1, 'slop': 88}
      {'len': 2, 'slop': 88}
      {'len': 3, 'slop': 88}
      {'len': 4, 'slop': 88}
      {'len': 5, 'slop': 88}
      {'len': 6, 'slop': 88}
      {'len': 7, 'slop': 88}
      {'len': 8, 'slop': 96}
      {'len': 9, 'slop': 120}
      {'len': 10, 'slop': 120}
      {'len': 11, 'slop': 120}
      {'len': 12, 'slop': 120}
      {'len': 13, 'slop': 120}
      {'len': 14, 'slop': 120}
      {'len': 15, 'slop': 120}
      {'len': 16, 'slop': 128}
      {'len': 17, 'slop': 128}
      {'len': 18, 'slop': 128}
      {'len': 19, 'slop': 128}
      {'len': 20, 'slop': 128}
      {'len': 21, 'slop': 128}
      {'len': 22, 'slop': 128}
      {'len': 23, 'slop': 128}
      {'len': 24, 'slop': 136}
      ...


      ...Immutable containers do:



      lst = 
      for _ in range(100):
      lst.append('-')
      x = tuple(lst)

      slop = gso(x) - (8 * len(x))
      print({"len": len(x), "slop": slop})



      {'len': 1, 'slop': 48}
      {'len': 2, 'slop': 48}
      {'len': 3, 'slop': 48}
      {'len': 4, 'slop': 48}
      {'len': 5, 'slop': 48}
      {'len': 6, 'slop': 48}
      {'len': 7, 'slop': 48}
      {'len': 8, 'slop': 48}
      {'len': 9, 'slop': 48}
      {'len': 10, 'slop': 48}
      {'len': 11, 'slop': 48}
      {'len': 12, 'slop': 48}
      {'len': 13, 'slop': 48}
      {'len': 14, 'slop': 48}
      ...



      Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?




      First, recall that the sizeof a container will not reflect the entire amount of memory used by Python. The ~8 bytes per element is the size of the pointer, each of those elements will consume an additional 37 (or whatever) bytes.



      But the good news is that it's unlikely you probably don't need the entire list at the same time. If you're just building a list to iterate over, then generate it one element at a time, with a for loop or generator function.



      Or generate it a chunk at a time, process it, and then continue, letting the garbage collector clean up the no-longer-used memory.





      One other interesting thing to point out



      N = 1000
      x = [b'x' for _ in range(N)]
      y = [b'x'] * N
      print(x == y) # True
      print(gso(x) == gso(y)) # False


      (This is likely due to the size of y being known a priori, while the size of x is not and has been resized as it grew).






      share|improve this answer

























        up vote
        0
        down vote










        up vote
        0
        down vote










        Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?




        All objects in Python have some amount of "slop" on a per-object basis. Note that in the case of bytes and probably all immutable stdlib types, this padding (here 33 bytes) is independent of the length of the object:



        from sys import getsizeof as gso
        print(gso(b'x'*1000) - gso(b''))
        # 1000


        Note, this is not the same as:



        print(gso([b'x']*1000) - gso(b''))
        # 8031


        In the former, you're making a bytes object of 100 x's.



        In the latter, you're making a list of 100 bytes objects. The important distinction is that in the latter, you're (a) replicating that (something. more on this later) 100 times, and incorporating the size of the list container.



        Lets talk about containers:



        print(gso([b'x'*100,]) - gso())


        Here we print the difference between the getsizeof of a one element list (of a 100 byte long byte object) and an empty list. We're effectively taring out the size of the container.



        We might expect that this is equal to getsizeof(b'x' * 100).



        It is not.



        The result of print(gso([b'x'*100,]) - gso()) is 8 bytes (on my machine) and is because the list contains just references/pointers to underlying objects and those 8 bytes are just that.




        That is 37 + 37 != 80. What are those extra 6 bytes for?




        Lets do the same thing and look at the net size, by subtracting the size of the container:



        x = [b'xffxffxffxff', b'xffxffxffxff']

        print(gso(x[0]) - gso(b'')) # 4
        print(gso(x) - gso()) # 16


        In the first, the 4 returned is just as the 1000 returned in the first, one per byte. (The length of x[0] is 4).



        In the second, it's 8 bytes per reference-to-sub list. It has nothing to do with the contents of those sublists:



        N = 1000
        x = [b'x'] * N
        y = [b'xxxx'] * N
        print(gso(x) == gso(y))
        # True


        But while mutable containers don't seem to have a fixed "slop":



        lst = 
        for _ in range(100):
        lst.append('-')
        x = list(lst)

        slop = gso(x) - (8 * len(x))
        print({"len": len(x), "slop": slop})


        Output:




        {'len': 1, 'slop': 88}
        {'len': 2, 'slop': 88}
        {'len': 3, 'slop': 88}
        {'len': 4, 'slop': 88}
        {'len': 5, 'slop': 88}
        {'len': 6, 'slop': 88}
        {'len': 7, 'slop': 88}
        {'len': 8, 'slop': 96}
        {'len': 9, 'slop': 120}
        {'len': 10, 'slop': 120}
        {'len': 11, 'slop': 120}
        {'len': 12, 'slop': 120}
        {'len': 13, 'slop': 120}
        {'len': 14, 'slop': 120}
        {'len': 15, 'slop': 120}
        {'len': 16, 'slop': 128}
        {'len': 17, 'slop': 128}
        {'len': 18, 'slop': 128}
        {'len': 19, 'slop': 128}
        {'len': 20, 'slop': 128}
        {'len': 21, 'slop': 128}
        {'len': 22, 'slop': 128}
        {'len': 23, 'slop': 128}
        {'len': 24, 'slop': 136}
        ...


        ...Immutable containers do:



        lst = 
        for _ in range(100):
        lst.append('-')
        x = tuple(lst)

        slop = gso(x) - (8 * len(x))
        print({"len": len(x), "slop": slop})



        {'len': 1, 'slop': 48}
        {'len': 2, 'slop': 48}
        {'len': 3, 'slop': 48}
        {'len': 4, 'slop': 48}
        {'len': 5, 'slop': 48}
        {'len': 6, 'slop': 48}
        {'len': 7, 'slop': 48}
        {'len': 8, 'slop': 48}
        {'len': 9, 'slop': 48}
        {'len': 10, 'slop': 48}
        {'len': 11, 'slop': 48}
        {'len': 12, 'slop': 48}
        {'len': 13, 'slop': 48}
        {'len': 14, 'slop': 48}
        ...



        Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?




        First, recall that the sizeof a container will not reflect the entire amount of memory used by Python. The ~8 bytes per element is the size of the pointer, each of those elements will consume an additional 37 (or whatever) bytes.



        But the good news is that it's unlikely you probably don't need the entire list at the same time. If you're just building a list to iterate over, then generate it one element at a time, with a for loop or generator function.



        Or generate it a chunk at a time, process it, and then continue, letting the garbage collector clean up the no-longer-used memory.





        One other interesting thing to point out



        N = 1000
        x = [b'x' for _ in range(N)]
        y = [b'x'] * N
        print(x == y) # True
        print(gso(x) == gso(y)) # False


        (This is likely due to the size of y being known a priori, while the size of x is not and has been resized as it grew).






        share|improve this answer















        Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?




        All objects in Python have some amount of "slop" on a per-object basis. Note that in the case of bytes and probably all immutable stdlib types, this padding (here 33 bytes) is independent of the length of the object:



        from sys import getsizeof as gso
        print(gso(b'x'*1000) - gso(b''))
        # 1000


        Note, this is not the same as:



        print(gso([b'x']*1000) - gso(b''))
        # 8031


        In the former, you're making a bytes object of 100 x's.



        In the latter, you're making a list of 100 bytes objects. The important distinction is that in the latter, you're (a) replicating that (something. more on this later) 100 times, and incorporating the size of the list container.



        Lets talk about containers:



        print(gso([b'x'*100,]) - gso())


        Here we print the difference between the getsizeof of a one element list (of a 100 byte long byte object) and an empty list. We're effectively taring out the size of the container.



        We might expect that this is equal to getsizeof(b'x' * 100).



        It is not.



        The result of print(gso([b'x'*100,]) - gso()) is 8 bytes (on my machine) and is because the list contains just references/pointers to underlying objects and those 8 bytes are just that.




        That is 37 + 37 != 80. What are those extra 6 bytes for?




        Lets do the same thing and look at the net size, by subtracting the size of the container:



        x = [b'xffxffxffxff', b'xffxffxffxff']

        print(gso(x[0]) - gso(b'')) # 4
        print(gso(x) - gso()) # 16


        In the first, the 4 returned is just as the 1000 returned in the first, one per byte. (The length of x[0] is 4).



        In the second, it's 8 bytes per reference-to-sub list. It has nothing to do with the contents of those sublists:



        N = 1000
        x = [b'x'] * N
        y = [b'xxxx'] * N
        print(gso(x) == gso(y))
        # True


        But while mutable containers don't seem to have a fixed "slop":



        lst = []
        for _ in range(100):
            lst.append('-')
            x = list(lst)

            # whatever is left after 8 bytes per reference is overhead
            slop = gso(x) - (8 * len(x))
            print({"len": len(x), "slop": slop})


        Output:




        {'len': 1, 'slop': 88}
        {'len': 2, 'slop': 88}
        {'len': 3, 'slop': 88}
        {'len': 4, 'slop': 88}
        {'len': 5, 'slop': 88}
        {'len': 6, 'slop': 88}
        {'len': 7, 'slop': 88}
        {'len': 8, 'slop': 96}
        {'len': 9, 'slop': 120}
        {'len': 10, 'slop': 120}
        {'len': 11, 'slop': 120}
        {'len': 12, 'slop': 120}
        {'len': 13, 'slop': 120}
        {'len': 14, 'slop': 120}
        {'len': 15, 'slop': 120}
        {'len': 16, 'slop': 128}
        {'len': 17, 'slop': 128}
        {'len': 18, 'slop': 128}
        {'len': 19, 'slop': 128}
        {'len': 20, 'slop': 128}
        {'len': 21, 'slop': 128}
        {'len': 22, 'slop': 128}
        {'len': 23, 'slop': 128}
        {'len': 24, 'slop': 136}
        ...


        ...Immutable containers do:



        lst = []
        for _ in range(100):
            lst.append('-')
            x = tuple(lst)

            # whatever is left after 8 bytes per reference is overhead
            slop = gso(x) - (8 * len(x))
            print({"len": len(x), "slop": slop})



        {'len': 1, 'slop': 48}
        {'len': 2, 'slop': 48}
        {'len': 3, 'slop': 48}
        {'len': 4, 'slop': 48}
        {'len': 5, 'slop': 48}
        {'len': 6, 'slop': 48}
        {'len': 7, 'slop': 48}
        {'len': 8, 'slop': 48}
        {'len': 9, 'slop': 48}
        {'len': 10, 'slop': 48}
        {'len': 11, 'slop': 48}
        {'len': 12, 'slop': 48}
        {'len': 13, 'slop': 48}
        {'len': 14, 'slop': 48}
        ...



        Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?




        First, recall that the getsizeof of a container does not reflect the entire amount of memory used by Python. The ~8 bytes per element is only the size of the pointer; each distinct element it points to consumes an additional 37 (or whatever) bytes of its own.
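As a rough illustration of that accounting, the sketch below adds the size of each distinct element to the shallow size of the list. The helper total_size is hypothetical (it is not part of sys), it only handles one level of nesting, and the 8 and 37 in the final estimate are simply the figures reported in the question, which vary by Python version and platform:

```python
import struct
import sys

def total_size(items):
    """Shallow size of the list plus the size of each distinct element."""
    seen = set()
    total = sys.getsizeof(items)
    for item in items:
        if id(item) not in seen:  # count a shared object only once
            seen.add(id(item))
            total += sys.getsizeof(item)
    return total

k = 10000
x = [struct.pack('<L', i) for i in range(k)]  # k separate 4-byte objects

print(sys.getsizeof(x))   # the container alone
print(total_size(x))      # the container plus its elements
print(total_size(x) / k)  # approximate bytes per stored value

# Back-of-the-envelope estimate for the list in the question,
# using the question's own figures (8-byte pointer, 37-byte element).
print((8 + 37) * 0xffffffff)
```

With those figures the estimate comes out at roughly 193 billion bytes, far beyond 8 GB, so trimming container overhead alone would not make the full list fit.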



        But the good news is that you probably don't need the entire list in memory at the same time. If you're just building a list to iterate over, then generate it one element at a time, with a for loop or a generator function.



        Or generate it a chunk at a time, process it, and then continue, letting the garbage collector clean up the no-longer-used memory.
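Both approaches can be sketched as follows. This is only an outline under assumptions: packed_values and chunks are hypothetical helper names, and the tiny count of 10 with a chunk size of 4 stands in for the real workload:

```python
import struct
from itertools import islice

def packed_values(count):
    """Yield 4-byte little-endian values one at a time."""
    for i in range(count):
        yield struct.pack('<L', i)

def chunks(iterable, size):
    """Yield lists of at most `size` items taken from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

total = 0
for chunk in chunks(packed_values(10), 4):
    # Only one chunk is alive here; the previous one can be freed.
    total += sum(len(item) for item in chunk)

print(total)  # 40
```

Peak memory is then bounded by the chunk size rather than by the total number of values.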





        One other interesting thing to point out:



        N = 1000
        x = [b'x' for _ in range(N)]
        y = [b'x'] * N
        print(x == y) # True
        print(gso(x) == gso(y)) # False


        (This is likely due to the size of y being known a priori, while the size of x is not and has been resized as it grew).
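One way to probe that guess is to record the reported size after every append. The sketch below assumes CPython, whose lists reserve spare slots when they grow; the exact sizes and resize points differ between versions, so no expected output is shown:

```python
import sys

grown = []
sizes = []
for _ in range(64):
    grown.append(b'x')
    sizes.append(sys.getsizeof(grown))

# Lengths at which the reported size changed, i.e. where a resize happened.
resize_points = [i + 1 for i in range(1, len(sizes)) if sizes[i] != sizes[i - 1]]
print(resize_points)

# A list built in one step has no need for spare capacity.
prebuilt = [b'x'] * 64
print(sys.getsizeof(prebuilt), sys.getsizeof(grown))
```

If the size only changes at a handful of lengths, the list is being over-allocated as it grows, which is consistent with x and y comparing equal while reporting different sizes.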







        edited 5 mins ago

























        answered 10 mins ago









        jedwards
