BeautifulSoup stripping whitespace
I am working on a basic horoscope parser from a website. Below is my code:
import requests
from bs4 import BeautifulSoup as bs
url = "https://www.astrospeak.com/horoscope/capricorn"
response = requests.request("GET", url)
soup = bs(response.text, 'html.parser')
locater = soup.select("#sunsignPredictionDiv > div.fullDIV > div.lineHght18 > div")
quote = locater[0].previousSibling
This leaves me with the following <class 'bs4.element.NavigableString'>:
"\n You are working towards yet another dream and as you pursue this vision there's no doubt in your mind that it will come to fruition. It's written in the stars! \n "
I am struggling to work out how to use BeautifulSoup's stripped_strings generator on the bs4.element.NavigableString. What I would like to end up with is just the string: You are working towards yet another dream and as you pursue this vision there's no doubt in your mind that it will come to fruition. It's written in the stars!
python python-3.x beautifulsoup
asked Nov 22 '18 at 5:02
Dan
Sounds like something simple like quote = locater[0].previousSibling.strip() should work...?
– Joachim Isaksson
Nov 22 '18 at 5:19
If you would like to post that as an answer, I will mark it as correct.
– Dan
Nov 22 '18 at 5:50
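A minimal offline sketch of the comment's suggestion. The HTML string below is a hypothetical stand-in for the astrospeak.com page (the real markup is not reproduced here and may differ or have changed); it only mimics the structure targeted by the CSS selector in the question. Because bs4's NavigableString is a subclass of Python's str, ordinary string methods such as .strip() work on it directly. The sketch uses previous_sibling, the snake_case spelling of the question's previousSibling.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup mimicking the selector path from the question;
# the live page's HTML may look different.
html = """
<div id="sunsignPredictionDiv">
<div class="fullDIV">
<div class="lineHght18">
 You are working towards yet another dream. It's written in the stars!
<div>Rest of the prediction</div>
</div>
</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
locater = soup.select("#sunsignPredictionDiv > div.fullDIV > div.lineHght18 > div")

# The text node just before the inner <div> still carries the surrounding newlines.
raw = locater[0].previous_sibling
print(type(raw))   # a NavigableString, which subclasses str
print(repr(raw))

# Plain str.strip() trims the leading and trailing whitespace.
quote = raw.strip()
print(quote)
```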
1 Answer
I know the answer in the comment pretty much solves your problem, but I hope to give you some background:
import requests
from bs4 import BeautifulSoup as bs
url = "https://www.astrospeak.com/horoscope/capricorn"
response = requests.get(url)
soup = bs(response.text, 'html.parser')
locater = soup.select("#sunsignPredictionDiv > div.fullDIV > div.lineHght18 > div")
quote = locater[0].previousSibling.strip()
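If you still want to use stripped_strings as the question intended, one option is to call it on a Tag rather than on the NavigableString: Tag.stripped_strings yields each text node beneath the tag with surrounding whitespace removed and skips strings that are entirely whitespace. The sketch below runs against a hypothetical HTML snippet (not the real page) and takes the parent of the matched <div>; note that this also yields the inner <div>'s text, so you pick out the piece you need.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, which may differ.
html = """
<div class="lineHght18">
  You are working towards yet another dream. It's written in the stars!
  <div>Second paragraph of the prediction</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
inner = soup.select_one("div.lineHght18 > div")

# stripped_strings is available on Tag objects, so call it on the parent <div>.
parent = inner.parent
pieces = list(parent.stripped_strings)
print(pieces)

# The first piece is the text that preceded the inner <div>.
quote = pieces[0]
print(quote)
```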
So essentially I simplified your syntax by using just requests.get, which is also documented in the requests docs, and added .strip().
strip() removes leading and trailing whitespace, which includes newlines (\n) and tabs (\t); these appear as escape sequences when you look at the string's raw form. Whitespace inside the string is left untouched. strip() can also take an argument to remove other leading and trailing characters.
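A short illustration of the behaviour described above, using plain Python strings (the sample text is made up for the example): strip() only trims the ends, and when given an argument it treats that argument as a set of characters to remove, not as a prefix or suffix.

```python
# str.strip() trims only the ends of a string; inner whitespace is kept.
text = "\n\t You are working  towards yet another dream \n "
trimmed = text.strip()
print(repr(trimmed))  # 'You are working  towards yet another dream'

# With an argument, strip() removes any combination of the given characters
# from both ends (the argument is a set of characters, not a prefix/suffix).
dashes = "--==Hello==--".strip("-=")
letters = "xyHelloyx".strip("xy")
print(dashes)   # Hello
print(letters)  # Hello
```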
There are also lstrip() and rstrip(), which do the same thing but only for leading (left) or trailing (right) characters respectively. For examples, and if you would like to read more, you can refer here.
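A quick sketch contrasting the one-sided variants on a string shaped like the one in the question (sample text made up for the example):

```python
raw = "\n You are working towards yet another dream \n "

left = raw.lstrip()   # leading whitespace removed, trailing kept
right = raw.rstrip()  # trailing whitespace removed, leading kept

print(repr(left))   # 'You are working towards yet another dream \n '
print(repr(right))  # '\n You are working towards yet another dream'

# Like strip(), both accept an optional set of characters to remove.
ldots = "...done...".lstrip(".")
rdots = "...done...".rstrip(".")
print(ldots)  # done...
print(rdots)  # ...done
```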
answered Nov 22 '18 at 6:05
BernardL
Thanks @BernardL, I have accepted your answer.
– Dan
Nov 22 '18 at 6:40
Just hope that the feedback helps you understand how it's being used.
– BernardL
Nov 22 '18 at 6:42
Absolutely @BernardL!
– Dan
Nov 25 '18 at 23:03