Extract product code using a regular expression in Python and apply it to a column [duplicate]
This question already has an answer here:
Pandas Extract Number from String
2 answers
I have a pd.DataFrame with multiple columns, and one column holds URLs scraped from the web, e.g.:
url = "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html"
I used a regular expression to extract the product code, as below:
re.findall(r'\d+', url)
However, when I try to apply this to the whole column of the DataFrame, I get an error:
regex = lambda x: x.re.findall(r'\d+')
df["new_column"] = df['url'].apply(regex)
AttributeError: 'str' object has no attribute 're'
python pandas
marked as duplicate by Vaishali
Nov 19 at 20:22
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
1
In pandas, use df['url'].str.extractall(r'(\d+)') instead. pandas.pydata.org/pandas-docs/stable/generated/…
– Frank
Nov 19 at 20:18
1
Use pandas str methods: df['url'].str.extract(r'(\d+)', expand=False)
– Vaishali
Nov 19 at 20:22
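The str.extract suggestion in the comments above can be sketched as follows. This is a minimal illustration, not the commenters' exact code: the column name product_code and the second URL are assumptions added to show what happens when a row contains no digits.

```python
import pandas as pd

# Sample data: the first URL is from the question; the second is a
# hypothetical URL added to show the no-match case.
df = pd.DataFrame({
    "url": [
        "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
        "http://www.example.com/no-digits-here.html",
    ]
})

# The pattern needs a capture group; expand=False returns a Series
# instead of a one-column DataFrame. Rows without a match become NaN.
df["product_code"] = df["url"].str.extract(r"(\d+)", expand=False)
```

Because this takes only the first run of digits in each string, it relies on the product code being the first number in the URL, as it is in the example from the question.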
edited Nov 19 at 20:22
Idlehands
asked Nov 19 at 20:16
Neha Sharma
1 Answer
Just use the same syntax in your lambda function that you used in your scalar example:
regex = lambda x: re.findall(r'\d+', x)
You probably want the zeroth element too, so you don't end up with a Series of lists:
regex = lambda x: re.findall(r'\d+', x)[0]
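One caveat with indexing [0] is that it raises IndexError for any string that contains no digits. A possible variant, sketched here with a hypothetical helper name first_number, uses re.search and returns None in that case:

```python
import re

def first_number(url):
    """Return the first run of digits in url, or None if there is none."""
    match = re.search(r"\d+", url)
    return match.group(0) if match else None

url = "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html"
code = first_number(url)
```

Assuming every entry in the column is a string, this could be applied the same way as the lambda, e.g. df["new_column"] = df["url"].apply(first_number).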
df['url'].str.extract(r'(\d+)', expand=False) did the trick.
– Neha Sharma
Nov 19 at 21:59
answered Nov 19 at 20:22
Robert