Extract a product code using a regular expression in Python and apply it to a column [duplicate]

This question already has an answer here:




  • Pandas Extract Number from String

    2 answers




I have a pd.DataFrame with multiple columns, one of which holds URLs scraped from the web, e.g.:



url = "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html"


I used a regular expression to extract the product code:



import re
re.findall(r'\d+', url)


However, when I try to apply the same approach to the url column of the full DataFrame (which has multiple columns), I get an error:



regex = lambda x: x.re.findall(r'\d+')
df["new_column"] = df['url'].apply(regex)



AttributeError: 'str' object has no attribute 're'

python pandas

edited Nov 19 at 20:22 by Idlehands
asked Nov 19 at 20:16 by Neha Sharma




marked as duplicate by Vaishali (pandas badge)
Nov 19 at 20:22


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

  • In pandas, use df['url'].str.extractall(r'(\d+)') instead. pandas.pydata.org/pandas-docs/stable/generated/…
    – Frank
    Nov 19 at 20:18








  • Use the pandas str methods: df['url'].str.extract(r'(\d+)', expand=False)
    – Vaishali
    Nov 19 at 20:22
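A minimal sketch of the two pandas string methods suggested in the comments, run on a small hypothetical DataFrame (the first URL is the one from the question; the second is an invented row without digits). str.extract keeps only the first match per row, while str.extractall keeps every match under a (row, match) MultiIndex; both require the pattern to contain a capture group.

```python
import pandas as pd

# Hypothetical sample data: the first URL comes from the question,
# the second is an invented row with no digits to show the missing-value case.
df = pd.DataFrame({
    "url": [
        "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
        "http://www.currys.co.uk/gbuk/index.html",
    ]
})

# str.extract needs at least one capture group; with a single group and
# expand=False it returns a Series holding the first match per row,
# with a missing value (NaN) where nothing matches.
df["new_column"] = df["url"].str.extract(r"(\d+)", expand=False)
print(df)  # first row: '10153572', second row: NaN

# str.extractall returns every match, indexed by (original row, match number);
# rows without any match are simply absent from the result.
all_codes = df["url"].str.extractall(r"(\d+)")
print(all_codes)
```

If every URL follows the /s/<digits>/ layout of the example (an assumption based on a single URL), a narrower pattern such as r"/s/(\d+)/" would avoid picking up unrelated digits elsewhere in a URL.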

1 Answer

Just use the same syntax in your lambda function that you used in your scalar example:



import re
regex = lambda x: re.findall(r'\d+', x)


You probably also want the zeroth element, so you don't end up with a Series of lists:



regex = lambda x: re.findall(r'\d+', x)[0]  # raises IndexError when x contains no digits
df["new_column"] = df['url'].apply(regex)
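If some URLs may contain no digits, re.findall(...)[0] raises IndexError, and a missing value (NaN) in the column makes re.findall raise TypeError. A minimal sketch of a more forgiving variant, assuming a missing result is acceptable for such rows; the helper first_number and the sample rows are hypothetical, and it swaps findall for re.search, which stops at the first match.

```python
import re

import pandas as pd

# Compile once; \d+ matches a run of one or more digits.
DIGITS = re.compile(r"\d+")


def first_number(url):
    """Return the first run of digits in url, or None when there is none.

    Hypothetical helper: unlike re.findall(...)[0], it does not raise
    IndexError on rows without digits, and it skips non-string values
    such as NaN instead of raising TypeError.
    """
    if not isinstance(url, str):
        return None
    match = DIGITS.search(url)
    return match.group(0) if match else None


df = pd.DataFrame({
    "url": [
        "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
        "http://www.currys.co.uk/gbuk/index.html",  # invented row with no digits
    ]
})
df["new_column"] = df["url"].apply(first_number)
print(df)  # first row: '10153572', second row: a missing value
```

The vectorised df['url'].str.extract(r'(\d+)', expand=False) handles the same cases without a Python-level helper; the apply version is mainly useful when the per-row logic grows beyond a single regex.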

answered Nov 19 at 20:22 by Robert

  • df['url'].str.extract(r'(\d+)', expand=False), this one does the trick
    – Neha Sharma
    Nov 19 at 21:59