How can I host a large list in my 8G DDR3 RAM?
up vote
2
down vote
favorite
I am new to python and just wondering how memory allocation works there.
It turns out that one way to measure the size of a variable stored is to use sys.getsizeof(x) and it will return the number of bytes that are occupied by x in the memory, right? The following is an example code:
import struct
import sys
x = struct.pack('<L', 0xffffffff)
print(len(x))
print(sys.getsizeof(x))
which gives:
4
37
The variable x that I have just created is a 4-byte string and the first question rises here. Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?
The story gets more complicated when I start to create a list of 2 * 4-byte strings.
Bellow you will find another few lines:
import struct
import sys
k = 2
rng = range(0, k)
x = [b''] * k
for i in rng:
x[i] = struct.pack('<L', 0xffffffff)
print(len(x))
print(len(x[0]))
print(sys.getsizeof(x))
print(sys.getsizeof(x[0]))
from which I get:
2
4
80
37
Another question is that why when I store two 4-byte strings in a list the total sum of the memory allocated to them is not equal to the sum of their solo sizes?! That is 37 + 37 != 80. What are those extra 6 bytes for?
Lets enlarge k to 10000, the previous code gives:
10000
4
80064
37
Here the difference rises dramatically when comparing the solo size to the whole: 37 * 10000 = 370000 != 80064. It looks like that each item in the list is now occupying 80064/10000 = 8.0064 bytes. Sounds feasible but I still cannot address previously shown conflicts.
After all, the main question of mine is that when I rise k to 0xffffffff and expect to get a list of size ~ 8 * 0xffffffff = 34359738360 I actually encounter an exception of MemoryError. Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?
python python-3.x memory
add a comment |
up vote
2
down vote
favorite
I am new to python and just wondering how memory allocation works there.
It turns out that one way to measure the size of a variable stored is to use sys.getsizeof(x) and it will return the number of bytes that are occupied by x in the memory, right? The following is an example code:
import struct
import sys
x = struct.pack('<L', 0xffffffff)
print(len(x))
print(sys.getsizeof(x))
which gives:
4
37
The variable x that I have just created is a 4-byte string and the first question rises here. Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?
The story gets more complicated when I start to create a list of 2 * 4-byte strings.
Bellow you will find another few lines:
import struct
import sys
k = 2
rng = range(0, k)
x = [b''] * k
for i in rng:
x[i] = struct.pack('<L', 0xffffffff)
print(len(x))
print(len(x[0]))
print(sys.getsizeof(x))
print(sys.getsizeof(x[0]))
from which I get:
2
4
80
37
Another question is that why when I store two 4-byte strings in a list the total sum of the memory allocated to them is not equal to the sum of their solo sizes?! That is 37 + 37 != 80. What are those extra 6 bytes for?
Lets enlarge k to 10000, the previous code gives:
10000
4
80064
37
Here the difference rises dramatically when comparing the solo size to the whole: 37 * 10000 = 370000 != 80064. It looks like that each item in the list is now occupying 80064/10000 = 8.0064 bytes. Sounds feasible but I still cannot address previously shown conflicts.
After all, the main question of mine is that when I rise k to 0xffffffff and expect to get a list of size ~ 8 * 0xffffffff = 34359738360 I actually encounter an exception of MemoryError. Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?
python python-3.x memory
Just because you have 8 GB installed, doesn't mean Python has access to all of it. E.g. Your OS and other processes need RAM as well
– cricket_007
17 mins ago
add a comment |
up vote
2
down vote
favorite
up vote
2
down vote
favorite
I am new to python and just wondering how memory allocation works there.
It turns out that one way to measure the size of a variable stored is to use sys.getsizeof(x) and it will return the number of bytes that are occupied by x in the memory, right? The following is an example code:
import struct
import sys
x = struct.pack('<L', 0xffffffff)
print(len(x))
print(sys.getsizeof(x))
which gives:
4
37
The variable x that I have just created is a 4-byte string and the first question rises here. Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?
The story gets more complicated when I start to create a list of 2 * 4-byte strings.
Bellow you will find another few lines:
import struct
import sys
k = 2
rng = range(0, k)
x = [b''] * k
for i in rng:
x[i] = struct.pack('<L', 0xffffffff)
print(len(x))
print(len(x[0]))
print(sys.getsizeof(x))
print(sys.getsizeof(x[0]))
from which I get:
2
4
80
37
Another question is that why when I store two 4-byte strings in a list the total sum of the memory allocated to them is not equal to the sum of their solo sizes?! That is 37 + 37 != 80. What are those extra 6 bytes for?
Lets enlarge k to 10000, the previous code gives:
10000
4
80064
37
Here the difference rises dramatically when comparing the solo size to the whole: 37 * 10000 = 370000 != 80064. It looks like that each item in the list is now occupying 80064/10000 = 8.0064 bytes. Sounds feasible but I still cannot address previously shown conflicts.
After all, the main question of mine is that when I rise k to 0xffffffff and expect to get a list of size ~ 8 * 0xffffffff = 34359738360 I actually encounter an exception of MemoryError. Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?
python python-3.x memory
I am new to python and just wondering how memory allocation works there.
It turns out that one way to measure the size of a variable stored is to use sys.getsizeof(x) and it will return the number of bytes that are occupied by x in the memory, right? The following is an example code:
import struct
import sys
x = struct.pack('<L', 0xffffffff)
print(len(x))
print(sys.getsizeof(x))
which gives:
4
37
The variable x that I have just created is a 4-byte string and the first question rises here. Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?
The story gets more complicated when I start to create a list of 2 * 4-byte strings.
Bellow you will find another few lines:
import struct
import sys
k = 2
rng = range(0, k)
x = [b''] * k
for i in rng:
x[i] = struct.pack('<L', 0xffffffff)
print(len(x))
print(len(x[0]))
print(sys.getsizeof(x))
print(sys.getsizeof(x[0]))
from which I get:
2
4
80
37
Another question is that why when I store two 4-byte strings in a list the total sum of the memory allocated to them is not equal to the sum of their solo sizes?! That is 37 + 37 != 80. What are those extra 6 bytes for?
Lets enlarge k to 10000, the previous code gives:
10000
4
80064
37
Here the difference rises dramatically when comparing the solo size to the whole: 37 * 10000 = 370000 != 80064. It looks like that each item in the list is now occupying 80064/10000 = 8.0064 bytes. Sounds feasible but I still cannot address previously shown conflicts.
After all, the main question of mine is that when I rise k to 0xffffffff and expect to get a list of size ~ 8 * 0xffffffff = 34359738360 I actually encounter an exception of MemoryError. Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?
python python-3.x memory
python python-3.x memory
asked 1 hour ago
PouJa
386
386
Just because you have 8 GB installed, doesn't mean Python has access to all of it. E.g. Your OS and other processes need RAM as well
– cricket_007
17 mins ago
add a comment |
Just because you have 8 GB installed, doesn't mean Python has access to all of it. E.g. Your OS and other processes need RAM as well
– cricket_007
17 mins ago
Just because you have 8 GB installed, doesn't mean Python has access to all of it. E.g. Your OS and other processes need RAM as well
– cricket_007
17 mins ago
Just because you have 8 GB installed, doesn't mean Python has access to all of it. E.g. Your OS and other processes need RAM as well
– cricket_007
17 mins ago
add a comment |
1 Answer
1
active
oldest
votes
up vote
0
down vote
Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?
All objects in Python have some amount of "slop" on a per-object basis. Note that in the case of bytes and probably all immutable stdlib types, this padding (here 33 bytes) is independent of the length of the object:
from sys import getsizeof as gso
print(gso(b'x'*1000) - gso(b''))
# 1000
Note, this is not the same as:
print(gso([b'x']*1000) - gso(b''))
# 8031
In the former, you're making a bytes object of 100 x's.
In the latter, you're making a list of 100 bytes objects. The important distinction is that in the latter, you're (a) replicating that (something. more on this later) 100 times, and incorporating the size of the list container.
Lets talk about containers:
print(gso([b'x'*100,]) - gso())
Here we print the difference between the getsizeof of a one element list (of a 100 byte long byte object) and an empty list. We're effectively taring out the size of the container.
We might expect that this is equal to getsizeof(b'x' * 100).
It is not.
The result of print(gso([b'x'*100,]) - gso()) is 8 bytes (on my machine) and is because the list contains just references/pointers to underlying objects and those 8 bytes are just that.
That is 37 + 37 != 80. What are those extra 6 bytes for?
Lets do the same thing and look at the net size, by subtracting the size of the container:
x = [b'xffxffxffxff', b'xffxffxffxff']
print(gso(x[0]) - gso(b'')) # 4
print(gso(x) - gso()) # 16
In the first, the 4 returned is just as the 1000 returned in the first, one per byte. (The length of x[0] is 4).
In the second, it's 8 bytes per reference-to-sub list. It has nothing to do with the contents of those sublists:
N = 1000
x = [b'x'] * N
y = [b'xxxx'] * N
print(gso(x) == gso(y))
# True
But while mutable containers don't seem to have a fixed "slop":
lst =
for _ in range(100):
lst.append('-')
x = list(lst)
slop = gso(x) - (8 * len(x))
print({"len": len(x), "slop": slop})
Output:
{'len': 1, 'slop': 88}
{'len': 2, 'slop': 88}
{'len': 3, 'slop': 88}
{'len': 4, 'slop': 88}
{'len': 5, 'slop': 88}
{'len': 6, 'slop': 88}
{'len': 7, 'slop': 88}
{'len': 8, 'slop': 96}
{'len': 9, 'slop': 120}
{'len': 10, 'slop': 120}
{'len': 11, 'slop': 120}
{'len': 12, 'slop': 120}
{'len': 13, 'slop': 120}
{'len': 14, 'slop': 120}
{'len': 15, 'slop': 120}
{'len': 16, 'slop': 128}
{'len': 17, 'slop': 128}
{'len': 18, 'slop': 128}
{'len': 19, 'slop': 128}
{'len': 20, 'slop': 128}
{'len': 21, 'slop': 128}
{'len': 22, 'slop': 128}
{'len': 23, 'slop': 128}
{'len': 24, 'slop': 136}
...
...Immutable containers do:
lst =
for _ in range(100):
lst.append('-')
x = tuple(lst)
slop = gso(x) - (8 * len(x))
print({"len": len(x), "slop": slop})
{'len': 1, 'slop': 48}
{'len': 2, 'slop': 48}
{'len': 3, 'slop': 48}
{'len': 4, 'slop': 48}
{'len': 5, 'slop': 48}
{'len': 6, 'slop': 48}
{'len': 7, 'slop': 48}
{'len': 8, 'slop': 48}
{'len': 9, 'slop': 48}
{'len': 10, 'slop': 48}
{'len': 11, 'slop': 48}
{'len': 12, 'slop': 48}
{'len': 13, 'slop': 48}
{'len': 14, 'slop': 48}
...
Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?
First, recall that the sizeof a container will not reflect the entire amount of memory used by Python. The ~8 bytes per element is the size of the pointer, each of those elements will consume an additional 37 (or whatever) bytes.
But the good news is that it's unlikely you probably don't need the entire list at the same time. If you're just building a list to iterate over, then generate it one element at a time, with a for loop or generator function.
Or generate it a chunk at a time, process it, and then continue, letting the garbage collector clean up the no-longer-used memory.
One other interesting thing to point out
N = 1000
x = [b'x' for _ in range(N)]
y = [b'x'] * N
print(x == y) # True
print(gso(x) == gso(y)) # False
(This is likely due to the size of y being known a priori, while the size of x is not and has been resized as it grew).
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?
All objects in Python have some amount of "slop" on a per-object basis. Note that in the case of bytes and probably all immutable stdlib types, this padding (here 33 bytes) is independent of the length of the object:
from sys import getsizeof as gso
print(gso(b'x'*1000) - gso(b''))
# 1000
Note, this is not the same as:
print(gso([b'x']*1000) - gso(b''))
# 8031
In the former, you're making a bytes object of 100 x's.
In the latter, you're making a list of 100 bytes objects. The important distinction is that in the latter, you're (a) replicating that (something. more on this later) 100 times, and incorporating the size of the list container.
Lets talk about containers:
print(gso([b'x'*100,]) - gso())
Here we print the difference between the getsizeof of a one element list (of a 100 byte long byte object) and an empty list. We're effectively taring out the size of the container.
We might expect that this is equal to getsizeof(b'x' * 100).
It is not.
The result of print(gso([b'x'*100,]) - gso()) is 8 bytes (on my machine) and is because the list contains just references/pointers to underlying objects and those 8 bytes are just that.
That is 37 + 37 != 80. What are those extra 6 bytes for?
Lets do the same thing and look at the net size, by subtracting the size of the container:
x = [b'xffxffxffxff', b'xffxffxffxff']
print(gso(x[0]) - gso(b'')) # 4
print(gso(x) - gso()) # 16
In the first, the 4 returned is just as the 1000 returned in the first, one per byte. (The length of x[0] is 4).
In the second, it's 8 bytes per reference-to-sub list. It has nothing to do with the contents of those sublists:
N = 1000
x = [b'x'] * N
y = [b'xxxx'] * N
print(gso(x) == gso(y))
# True
But while mutable containers don't seem to have a fixed "slop":
lst =
for _ in range(100):
lst.append('-')
x = list(lst)
slop = gso(x) - (8 * len(x))
print({"len": len(x), "slop": slop})
Output:
{'len': 1, 'slop': 88}
{'len': 2, 'slop': 88}
{'len': 3, 'slop': 88}
{'len': 4, 'slop': 88}
{'len': 5, 'slop': 88}
{'len': 6, 'slop': 88}
{'len': 7, 'slop': 88}
{'len': 8, 'slop': 96}
{'len': 9, 'slop': 120}
{'len': 10, 'slop': 120}
{'len': 11, 'slop': 120}
{'len': 12, 'slop': 120}
{'len': 13, 'slop': 120}
{'len': 14, 'slop': 120}
{'len': 15, 'slop': 120}
{'len': 16, 'slop': 128}
{'len': 17, 'slop': 128}
{'len': 18, 'slop': 128}
{'len': 19, 'slop': 128}
{'len': 20, 'slop': 128}
{'len': 21, 'slop': 128}
{'len': 22, 'slop': 128}
{'len': 23, 'slop': 128}
{'len': 24, 'slop': 136}
...
...Immutable containers do:
lst =
for _ in range(100):
lst.append('-')
x = tuple(lst)
slop = gso(x) - (8 * len(x))
print({"len": len(x), "slop": slop})
{'len': 1, 'slop': 48}
{'len': 2, 'slop': 48}
{'len': 3, 'slop': 48}
{'len': 4, 'slop': 48}
{'len': 5, 'slop': 48}
{'len': 6, 'slop': 48}
{'len': 7, 'slop': 48}
{'len': 8, 'slop': 48}
{'len': 9, 'slop': 48}
{'len': 10, 'slop': 48}
{'len': 11, 'slop': 48}
{'len': 12, 'slop': 48}
{'len': 13, 'slop': 48}
{'len': 14, 'slop': 48}
...
Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?
First, recall that the sizeof a container will not reflect the entire amount of memory used by Python. The ~8 bytes per element is the size of the pointer, each of those elements will consume an additional 37 (or whatever) bytes.
But the good news is that it's unlikely you probably don't need the entire list at the same time. If you're just building a list to iterate over, then generate it one element at a time, with a for loop or generator function.
Or generate it a chunk at a time, process it, and then continue, letting the garbage collector clean up the no-longer-used memory.
One other interesting thing to point out
N = 1000
x = [b'x' for _ in range(N)]
y = [b'x'] * N
print(x == y) # True
print(gso(x) == gso(y)) # False
(This is likely due to the size of y being known a priori, while the size of x is not and has been resized as it grew).
add a comment |
up vote
0
down vote
Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?
All objects in Python have some amount of "slop" on a per-object basis. Note that in the case of bytes and probably all immutable stdlib types, this padding (here 33 bytes) is independent of the length of the object:
from sys import getsizeof as gso
print(gso(b'x'*1000) - gso(b''))
# 1000
Note, this is not the same as:
print(gso([b'x']*1000) - gso(b''))
# 8031
In the former, you're making a bytes object of 100 x's.
In the latter, you're making a list of 100 bytes objects. The important distinction is that in the latter, you're (a) replicating that (something. more on this later) 100 times, and incorporating the size of the list container.
Lets talk about containers:
print(gso([b'x'*100,]) - gso())
Here we print the difference between the getsizeof of a one element list (of a 100 byte long byte object) and an empty list. We're effectively taring out the size of the container.
We might expect that this is equal to getsizeof(b'x' * 100).
It is not.
The result of print(gso([b'x'*100,]) - gso()) is 8 bytes (on my machine) and is because the list contains just references/pointers to underlying objects and those 8 bytes are just that.
That is 37 + 37 != 80. What are those extra 6 bytes for?
Lets do the same thing and look at the net size, by subtracting the size of the container:
x = [b'xffxffxffxff', b'xffxffxffxff']
print(gso(x[0]) - gso(b'')) # 4
print(gso(x) - gso()) # 16
In the first, the 4 returned is just as the 1000 returned in the first, one per byte. (The length of x[0] is 4).
In the second, it's 8 bytes per reference-to-sub list. It has nothing to do with the contents of those sublists:
N = 1000
x = [b'x'] * N
y = [b'xxxx'] * N
print(gso(x) == gso(y))
# True
But while mutable containers don't seem to have a fixed "slop":
lst =
for _ in range(100):
lst.append('-')
x = list(lst)
slop = gso(x) - (8 * len(x))
print({"len": len(x), "slop": slop})
Output:
{'len': 1, 'slop': 88}
{'len': 2, 'slop': 88}
{'len': 3, 'slop': 88}
{'len': 4, 'slop': 88}
{'len': 5, 'slop': 88}
{'len': 6, 'slop': 88}
{'len': 7, 'slop': 88}
{'len': 8, 'slop': 96}
{'len': 9, 'slop': 120}
{'len': 10, 'slop': 120}
{'len': 11, 'slop': 120}
{'len': 12, 'slop': 120}
{'len': 13, 'slop': 120}
{'len': 14, 'slop': 120}
{'len': 15, 'slop': 120}
{'len': 16, 'slop': 128}
{'len': 17, 'slop': 128}
{'len': 18, 'slop': 128}
{'len': 19, 'slop': 128}
{'len': 20, 'slop': 128}
{'len': 21, 'slop': 128}
{'len': 22, 'slop': 128}
{'len': 23, 'slop': 128}
{'len': 24, 'slop': 136}
...
...Immutable containers do:
lst =
for _ in range(100):
lst.append('-')
x = tuple(lst)
slop = gso(x) - (8 * len(x))
print({"len": len(x), "slop": slop})
{'len': 1, 'slop': 48}
{'len': 2, 'slop': 48}
{'len': 3, 'slop': 48}
{'len': 4, 'slop': 48}
{'len': 5, 'slop': 48}
{'len': 6, 'slop': 48}
{'len': 7, 'slop': 48}
{'len': 8, 'slop': 48}
{'len': 9, 'slop': 48}
{'len': 10, 'slop': 48}
{'len': 11, 'slop': 48}
{'len': 12, 'slop': 48}
{'len': 13, 'slop': 48}
{'len': 14, 'slop': 48}
...
Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?
First, recall that the sizeof a container will not reflect the entire amount of memory used by Python. The ~8 bytes per element is the size of the pointer, each of those elements will consume an additional 37 (or whatever) bytes.
But the good news is that it's unlikely you probably don't need the entire list at the same time. If you're just building a list to iterate over, then generate it one element at a time, with a for loop or generator function.
Or generate it a chunk at a time, process it, and then continue, letting the garbage collector clean up the no-longer-used memory.
One other interesting thing to point out
N = 1000
x = [b'x' for _ in range(N)]
y = [b'x'] * N
print(x == y) # True
print(gso(x) == gso(y)) # False
(This is likely due to the size of y being known a priori, while the size of x is not and has been resized as it grew).
add a comment |
up vote
0
down vote
up vote
0
down vote
Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?
All objects in Python have some amount of "slop" on a per-object basis. Note that in the case of bytes and probably all immutable stdlib types, this padding (here 33 bytes) is independent of the length of the object:
from sys import getsizeof as gso
print(gso(b'x'*1000) - gso(b''))
# 1000
Note, this is not the same as:
print(gso([b'x']*1000) - gso(b''))
# 8031
In the former, you're making a bytes object of 100 x's.
In the latter, you're making a list of 100 bytes objects. The important distinction is that in the latter, you're (a) replicating that (something. more on this later) 100 times, and incorporating the size of the list container.
Lets talk about containers:
print(gso([b'x'*100,]) - gso())
Here we print the difference between the getsizeof of a one element list (of a 100 byte long byte object) and an empty list. We're effectively taring out the size of the container.
We might expect that this is equal to getsizeof(b'x' * 100).
It is not.
The result of print(gso([b'x'*100,]) - gso()) is 8 bytes (on my machine) and is because the list contains just references/pointers to underlying objects and those 8 bytes are just that.
That is 37 + 37 != 80. What are those extra 6 bytes for?
Lets do the same thing and look at the net size, by subtracting the size of the container:
x = [b'xffxffxffxff', b'xffxffxffxff']
print(gso(x[0]) - gso(b'')) # 4
print(gso(x) - gso()) # 16
In the first, the 4 returned is just as the 1000 returned in the first, one per byte. (The length of x[0] is 4).
In the second, it's 8 bytes per reference-to-sub list. It has nothing to do with the contents of those sublists:
N = 1000
x = [b'x'] * N
y = [b'xxxx'] * N
print(gso(x) == gso(y))
# True
But while mutable containers don't seem to have a fixed "slop":
lst =
for _ in range(100):
lst.append('-')
x = list(lst)
slop = gso(x) - (8 * len(x))
print({"len": len(x), "slop": slop})
Output:
{'len': 1, 'slop': 88}
{'len': 2, 'slop': 88}
{'len': 3, 'slop': 88}
{'len': 4, 'slop': 88}
{'len': 5, 'slop': 88}
{'len': 6, 'slop': 88}
{'len': 7, 'slop': 88}
{'len': 8, 'slop': 96}
{'len': 9, 'slop': 120}
{'len': 10, 'slop': 120}
{'len': 11, 'slop': 120}
{'len': 12, 'slop': 120}
{'len': 13, 'slop': 120}
{'len': 14, 'slop': 120}
{'len': 15, 'slop': 120}
{'len': 16, 'slop': 128}
{'len': 17, 'slop': 128}
{'len': 18, 'slop': 128}
{'len': 19, 'slop': 128}
{'len': 20, 'slop': 128}
{'len': 21, 'slop': 128}
{'len': 22, 'slop': 128}
{'len': 23, 'slop': 128}
{'len': 24, 'slop': 136}
...
...Immutable containers do:
lst =
for _ in range(100):
lst.append('-')
x = tuple(lst)
slop = gso(x) - (8 * len(x))
print({"len": len(x), "slop": slop})
{'len': 1, 'slop': 48}
{'len': 2, 'slop': 48}
{'len': 3, 'slop': 48}
{'len': 4, 'slop': 48}
{'len': 5, 'slop': 48}
{'len': 6, 'slop': 48}
{'len': 7, 'slop': 48}
{'len': 8, 'slop': 48}
{'len': 9, 'slop': 48}
{'len': 10, 'slop': 48}
{'len': 11, 'slop': 48}
{'len': 12, 'slop': 48}
{'len': 13, 'slop': 48}
{'len': 14, 'slop': 48}
...
Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?
First, recall that the sizeof a container will not reflect the entire amount of memory used by Python. The ~8 bytes per element is the size of the pointer, each of those elements will consume an additional 37 (or whatever) bytes.
But the good news is that it's unlikely you probably don't need the entire list at the same time. If you're just building a list to iterate over, then generate it one element at a time, with a for loop or generator function.
Or generate it a chunk at a time, process it, and then continue, letting the garbage collector clean up the no-longer-used memory.
One other interesting thing to point out
N = 1000
x = [b'x' for _ in range(N)]
y = [b'x'] * N
print(x == y) # True
print(gso(x) == gso(y)) # False
(This is likely due to the size of y being known a priori, while the size of x is not and has been resized as it grew).
Why is the memory allocated to a 4-byte string is 37 bytes? Is not that too much extra space?
All objects in Python have some amount of "slop" on a per-object basis. Note that in the case of bytes and probably all immutable stdlib types, this padding (here 33 bytes) is independent of the length of the object:
from sys import getsizeof as gso
print(gso(b'x'*1000) - gso(b''))
# 1000
Note, this is not the same as:
print(gso([b'x']*1000) - gso(b''))
# 8031
In the former, you're making a bytes object of 100 x's.
In the latter, you're making a list of 100 bytes objects. The important distinction is that in the latter, you're (a) replicating that (something. more on this later) 100 times, and incorporating the size of the list container.
Lets talk about containers:
print(gso([b'x'*100,]) - gso())
Here we print the difference between the getsizeof of a one element list (of a 100 byte long byte object) and an empty list. We're effectively taring out the size of the container.
We might expect that this is equal to getsizeof(b'x' * 100).
It is not.
The result of print(gso([b'x'*100,]) - gso()) is 8 bytes (on my machine) and is because the list contains just references/pointers to underlying objects and those 8 bytes are just that.
That is 37 + 37 != 80. What are those extra 6 bytes for?
Lets do the same thing and look at the net size, by subtracting the size of the container:
x = [b'xffxffxffxff', b'xffxffxffxff']
print(gso(x[0]) - gso(b'')) # 4
print(gso(x) - gso()) # 16
In the first, the 4 returned is just as the 1000 returned in the first, one per byte. (The length of x[0] is 4).
In the second, it's 8 bytes per reference-to-sub list. It has nothing to do with the contents of those sublists:
N = 1000
x = [b'x'] * N
y = [b'xxxx'] * N
print(gso(x) == gso(y))
# True
But while mutable containers don't seem to have a fixed "slop":
lst =
for _ in range(100):
lst.append('-')
x = list(lst)
slop = gso(x) - (8 * len(x))
print({"len": len(x), "slop": slop})
Output:
{'len': 1, 'slop': 88}
{'len': 2, 'slop': 88}
{'len': 3, 'slop': 88}
{'len': 4, 'slop': 88}
{'len': 5, 'slop': 88}
{'len': 6, 'slop': 88}
{'len': 7, 'slop': 88}
{'len': 8, 'slop': 96}
{'len': 9, 'slop': 120}
{'len': 10, 'slop': 120}
{'len': 11, 'slop': 120}
{'len': 12, 'slop': 120}
{'len': 13, 'slop': 120}
{'len': 14, 'slop': 120}
{'len': 15, 'slop': 120}
{'len': 16, 'slop': 128}
{'len': 17, 'slop': 128}
{'len': 18, 'slop': 128}
{'len': 19, 'slop': 128}
{'len': 20, 'slop': 128}
{'len': 21, 'slop': 128}
{'len': 22, 'slop': 128}
{'len': 23, 'slop': 128}
{'len': 24, 'slop': 136}
...
...Immutable containers do:
lst =
for _ in range(100):
lst.append('-')
x = tuple(lst)
slop = gso(x) - (8 * len(x))
print({"len": len(x), "slop": slop})
{'len': 1, 'slop': 48}
{'len': 2, 'slop': 48}
{'len': 3, 'slop': 48}
{'len': 4, 'slop': 48}
{'len': 5, 'slop': 48}
{'len': 6, 'slop': 48}
{'len': 7, 'slop': 48}
{'len': 8, 'slop': 48}
{'len': 9, 'slop': 48}
{'len': 10, 'slop': 48}
{'len': 11, 'slop': 48}
{'len': 12, 'slop': 48}
{'len': 13, 'slop': 48}
{'len': 14, 'slop': 48}
...
Is there any way to eliminate non-critical memory spaces so that my 8G DDR3 RAM can host this variable x?
First, recall that the sizeof a container will not reflect the entire amount of memory used by Python. The ~8 bytes per element is the size of the pointer, each of those elements will consume an additional 37 (or whatever) bytes.
But the good news is that it's unlikely you probably don't need the entire list at the same time. If you're just building a list to iterate over, then generate it one element at a time, with a for loop or generator function.
Or generate it a chunk at a time, process it, and then continue, letting the garbage collector clean up the no-longer-used memory.
One other interesting thing to point out
N = 1000
x = [b'x' for _ in range(N)]
y = [b'x'] * N
print(x == y) # True
print(gso(x) == gso(y)) # False
(This is likely due to the size of y being known a priori, while the size of x is not and has been resized as it grew).
edited 5 mins ago
answered 10 mins ago
jedwards
20.7k3158
20.7k3158
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53348835%2fhow-can-i-host-a-large-list-in-my-8g-ddr3-ram%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Just because you have 8 GB installed, doesn't mean Python has access to all of it. E.g. Your OS and other processes need RAM as well
– cricket_007
17 mins ago