Extract product code using a regular expression in Python and apply it to a column [duplicate]


This question already has an answer here:

Pandas Extract Number from String

2 answers

I have a pd.DataFrame with multiple columns, one of which contains URLs scraped from the web, e.g.:

url = "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html"

I have used a regular expression to extract the product code, as below:

re.findall(r'\d+', url)

However, when I try to apply it to the entire dataset (which has multiple columns), I get an error:

regex = lambda x: x.re.findall(r'\d+')

df["new_column"] = df['url'].apply(regex)

AttributeError: 'str' object has no attribute 're'
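The error comes from treating `re` as if it were an attribute of the string; `re` is a module whose functions take the string as an argument. A minimal sketch of the difference on the question's example URL, assuming the intended pattern was `\d+` (the backslash appears to have been lost from the original post):

```python
import re

url = "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html"

# A plain str has no `.re` attribute, which is what raises the AttributeError.
print(hasattr(url, "re"))       # False

# Call the module function and pass the string in instead.
# r"\d+" matches runs of digits.
print(re.findall(r"\d+", url))  # ['10153572']

# Without the backslash, "d+" matches runs of the letter "d".
print(re.findall("d+", url))    # ['d']
```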

edited Nov 19 at 20:22 by Idlehands

asked Nov 19 at 20:16 by Neha Sharma

marked as duplicate by Vaishali
Nov 19 at 20:22

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

In pandas, use df['url'].str.extractall(r'(\d+)') instead (extractall needs a capture group). pandas.pydata.org/pandas-docs/stable/generated/…
– Frank
Nov 19 at 20:18
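A sketch of how `str.extractall` behaves, using a small hypothetical DataFrame (the second URL is made up). The pattern needs a capture group, and the result has one row per match, indexed by the original row plus a level named `match`:

```python
import pandas as pd

# Hypothetical sample data: the question's URL plus a made-up second URL.
df = pd.DataFrame({"url": [
    "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
    "http://www.example.com/item/42/page/7",
]})

# The parentheses form the capture group that extractall requires.
matches = df["url"].str.extractall(r"(\d+)")
print(matches)

# Keep only the first match (match number 0) for each original row.
first = matches.xs(0, level="match")[0]
print(first.tolist())  # ['10153572', '42']
```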

Use pandas str methods: df['url'].str.extract(r'(\d+)', expand=False)
– Vaishali
Nov 19 at 20:22
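A minimal sketch of the `str.extract` approach, with a hypothetical second URL that contains no digits to show what happens when nothing matches:

```python
import pandas as pd

df = pd.DataFrame({"url": [
    "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
    "http://www.currys.co.uk/gbuk/basket.html",  # hypothetical URL, no digits
]})

# extract returns the first match of the capture group for each row;
# expand=False gives a Series rather than a one-column DataFrame.
result = df["url"].str.extract(r"(\d+)", expand=False)
df["new_column"] = result

# First row holds '10153572'; the row without digits holds NaN.
print(df)
```

With a single capture group and `expand=False`, the result is a Series that can be assigned straight to a new column.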

python pandas

1 Answer


Just use the same syntax in your lambda function that you used in your scalar example:

regex = lambda x: re.findall(r'\d+', x)

You probably want the zeroth element too, so you don't end up with a series of lists:

regex = lambda x: re.findall(r'\d+', x)[0]
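If you prefer `apply`, a slightly more defensive version of the lambda might look like the sketch below. `first_number` is a hypothetical helper name and the second URL is made up. Since `re.findall(...)[0]` raises an IndexError on rows without digits, this sketch uses `re.search` and returns None instead:

```python
import re

import pandas as pd


def first_number(text):
    """Return the first run of digits in `text`, or None if there is none."""
    # Non-string cells (e.g. NaN for a missing URL) cannot be searched.
    if not isinstance(text, str):
        return None
    # re.search stops at the first match; checking for None avoids the
    # IndexError that re.findall(...)[0] raises when nothing matches.
    match = re.search(r"\d+", text)
    return match.group(0) if match else None


df = pd.DataFrame({"url": [
    "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
    "http://www.currys.co.uk/gbuk/basket.html",  # hypothetical URL, no digits
]})

df["new_column"] = df["url"].apply(first_number)
print(df)
```

The `.str.extract` form gives the same kind of result without writing a helper at all.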

answered Nov 19 at 20:22 by Robert

df['url'].str.extract(r'(\d+)', expand=False): this one does the trick.
– Neha Sharma
Nov 19 at 21:59
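Note that `\d+` returns the first run of digits anywhere in the URL. If other numbers can appear before the product code, anchoring the pattern may be safer. The sketch below assumes (based only on the single example URL in the question) that the code is the path segment right after `/s/`; the second URL is hypothetical:

```python
import pandas as pd

# Hypothetical URLs; the second has an extra number before the product code.
urls = pd.Series([
    "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
    "http://www.currys.co.uk/gbuk/2018-deals/s/10153572/product_confirmation.html",
])

# An unanchored pattern grabs the first digit run anywhere in the URL.
loose = urls.str.extract(r"(\d+)", expand=False)

# Assumption: the product code is the path segment right after "/s/".
anchored = urls.str.extract(r"/s/(\d+)/", expand=False)

# Convert the extracted strings to numbers; unparseable values become NaN.
codes = pd.to_numeric(anchored, errors="coerce")

print(loose.tolist())  # ['10153572', '2018']
print(codes.tolist())  # [10153572, 10153572]
```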
