Separate scrap and production by area
I'm looking for feedback on things that can be streamlined in each section. All of the data it creates is needed, whether it ends up in a graph or not, so I can't remove anything; at most I can combine steps.
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

data = pd.read_csv("C:\Users\eb2547\Desktop\python code for work\2017 TB.csv")

# Scrap separation
noNan = data[['Quantity','Scrap Cause']].dropna()
scrap = noNan.groupby('Scrap Cause').sum()

# Hanging production
hangdata = data[(data['Transaction Description'].str.contains('INVM') == True) &
                (data['Location No'] == 'SF HANK STAGING')]
hangprod = hangdata['Quantity'].sum()

# Tapered production

# Wrapper production
wrapdatatobug = data[(data['Transaction Description'] == 'INVM-ISS') &
                     (data['Location No'].str.contains('BUG') == True)]
wrapprodtobug = wrapdatatobug['Quantity'].sum()
wrapdatatobox = data[(data['Transaction Description'] == 'OOREC') &
                     (data['Location No'].str.contains('BOX ASSEMBLY-WRAP') == True)]
wrapprodtobox = wrapdatatobox['Quantity'].sum()
wrapscrap = data[(data['Scrap Cause'].str.contains('W') == True)]
wrapscraptotal = wrapscrap['Quantity'].sum()
wrapprodtotal = wrapprodtobug + wrapprodtobox

# Cut Piece production
cpdata = data[(data['Transaction Description'] == 'OOREC') &
              (data['Location No'].str.contains('BOX ASSEMBLY-CP') == True)]
cpprod = cpdata['Quantity'].sum()
cpscrap = data[(data['Scrap Cause'].str.contains('C') == True)]
cpscraptotal = cpscrap['Quantity'].sum()

# Graphs of scrap data
wrapscrap2 = scrap[(scrap.index.str.contains('W') == True)]
cpscrap2 = scrap[(scrap.index.str.contains('C') == True)]
spinscrap = scrap[(scrap.index).str.contains('S') == True]

def Wrap_Scrap():
    fix, ax = plt.subplots(figsize=(10,4))
    sns.barplot(data=wrapscrap2,x=wrapscrap2.index,y='Quantity')

def CP_Scrap():
    fix, ax = plt.subplots(figsize=(10,4))
    sns.barplot(data=cpscrap2,x=cpscrap2.index,y='Quantity')

# Graph of production
prodoverview = pd.DataFrame([hangprod,wrapprodtotal,cpprod],
                            index=['Hanging','Wrapping','CP'],
                            columns=['Quantity'])

def Prod_Graph():
    fix, ax = plt.subplots(figsize=(10,4))
    sns.barplot(data=prodoverview,x=prodoverview.index,y='Quantity')
python python-3.x pandas matplotlib
asked Jan 23 at 1:23 by letto4135; edited Jan 23 at 15:42 by 200_success
Can you edit the post to explain what this code is supposed to do, please? It is hard to review code unless we know what problem it is supposed to solve. – Gareth Rees, Jan 24 at 10:10
@GarethRees It takes the data frame, a 95,000-row, 16-column Excel sheet whose scrap cause column has about 30-40 different reasons, and breaks them all down by grouping and summing them. The rest of it then breaks down different areas by specific scrap code and turns them into graphs, or just into variables for me to look at. – letto4135, Jan 24 at 13:23
Normally you'd do that kind of thing directly in Excel — you can group and sum rows by categories using a pivot table, and draw graphs using pivot charts. Can you edit the post to explain why this didn't work for you? – Gareth Rees, Jan 24 at 13:45
@GarethRees It did work; I just want to know if there are steps I can combine. I normally do this in Excel, and started working on this just to see whether I could do it with programming. It worked, and now I'm looking for improvements. I do a lot of very advanced work in Excel, to the point that I thought I might enjoy programming, so I started learning it, and I do enjoy it. – letto4135, Jan 24 at 14:51
1 Answer
efficiency
Overall, the code looks idiomatic and sensible; no major issues. But you want to tweak it for efficiency. Here are some things you might try.
Most importantly, instrument elapsed time, so you can tell whether a tweak is useful:

import time

t0 = time.time()
...  # the code you want to measure
elapsed = time.time() - t0
And definitely use the profiler, so you know where to focus your efforts.
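For instance, a minimal profiling sketch (build_report is a hypothetical wrapper around the script's work; any callable, or the script itself via python -m cProfile, works the same way):

import cProfile
import pstats

# Profile the whole run and report the ten most expensive calls by
# cumulative time. Equivalent from the shell:
#     python -m cProfile -s cumtime your_script.py
cProfile.run('build_report()', 'profile.out')
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(10)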
OK, with that in hand, it looks like you need to synthesize some boolean columns. There are several columns that you repeatedly inspect, e.g. Transaction Description. Perhaps you could add them to the spreadsheet, to avoid repeatedly processing unchanging data? Another approach would be to build up a list of rows and then do data = pd.DataFrame(rows). This would let you examine each Transaction Description once and derive a couple of columns. But frankly, I'm skeptical this would improve your timings at all; the code looks fine as is.
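One concrete reading of the "synthesize some boolean columns" idea (a sketch only, reusing the question's column names; the mask names and the na=False arguments are mine):

# Compute the repeatedly used boolean Series once, then combine them.
# na=False treats missing values as non-matches, mirroring the
# original `== True` comparisons.
is_oorec = data['Transaction Description'] == 'OOREC'
loc = data['Location No']

hangprod = data.loc[data['Transaction Description'].str.contains('INVM', na=False) &
                    (loc == 'SF HANK STAGING'), 'Quantity'].sum()
wrapprodtobug = data.loc[(data['Transaction Description'] == 'INVM-ISS') &
                         loc.str.contains('BUG', na=False), 'Quantity'].sum()
wrapprodtobox = data.loc[is_oorec &
                         loc.str.contains('BOX ASSEMBLY-WRAP', na=False), 'Quantity'].sum()
cpprod = data.loc[is_oorec &
                  loc.str.contains('BOX ASSEMBLY-CP', na=False), 'Quantity'].sum()

Whether that is measurably faster on 95,000 rows is exactly what the timing above would tell you.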
style
Do use a pep8 linter: flake8, and follow its advice.
In hard-to-read identifiers like spinscrap, introduce an underscore so it is clear to the Gentle Reader where word breaks are.
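For example, one possible renaming (assuming the scrap frame built in the question):

# snake_case makes the word breaks obvious at a glance.
spin_scrap = scrap[scrap.index.str.contains('S')]
wrap_scrap = scrap[scrap.index.str.contains('W')]
cp_scrap = scrap[scrap.index.str.contains('C')]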
answered Mar 31 at 22:15 by J_H