lxml/python reading xml with CDATA section
My XML contains a CDATA section. I want to keep the CDATA part when parsing, and then strip it later. Can someone help with the following?
The default parser does not preserve it:
$ from io import StringIO
$ from lxml import etree
$ xml = '<Subject> My Subject: 美海軍研究船勘查台海水文? 船<![CDATA[é]]>€ </Subject>'
$ tree = etree.parse(StringIO(xml))
$ tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文? 船é€ '
This post seems to suggest that a parser option strip_cdata=False may keep the cdata, but it has no effect:
$ parser=etree.XMLParser(strip_cdata=False)
$ tree = etree.parse(StringIO(xml), parser=parser)
$ tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文? 船é€ '
Using strip_cdata=True, which should be the default, yields the same:
$ parser=etree.XMLParser(strip_cdata=True)
$ tree = etree.parse(StringIO(xml), parser=parser)
$ tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文? 船é€ '
python python-3.x lxml elementtree cdata
asked Nov 23 '18 at 23:17 by Sudipta Basak
edited Nov 24 '18 at 1:02
If you add enough of the relevant XML, we might be able to test.
– usr2564301
Nov 23 '18 at 23:19
Is that example not enough? I can add more.
– Sudipta Basak
Nov 23 '18 at 23:27
Ah, sorry. It's hard to read, with those numbers before your actual code and data. If they are not an important part of your question, remove them.
– usr2564301
Nov 23 '18 at 23:28
1 Answer
CDATA sections are not preserved in the text property of an element, even if strip_cdata=False is used when the XML content is parsed, as you have noticed: .text always returns the plain character content. See https://lxml.de/api.html#cdata.
If the document was parsed with strip_cdata=False, the CDATA sections are preserved in these cases:
When serializing with tostring():
print(etree.tostring(tree.getroot(), encoding="UTF-8").decode())
When writing to a file:
tree.write("subject.xml", encoding="UTF-8")
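To make the difference concrete, here is a minimal sketch that parses a shortened version of the question's XML twice, once with strip_cdata=False and once with strip_cdata=True, and compares both the .text value and the serialized output. The helper name parse_and_serialize and the shortened sample string are illustrative assumptions, not part of the original post.

```python
from lxml import etree

# Shortened variant of the XML from the question (assumption: same structure,
# less text before the CDATA section).
XML = '<Subject> My Subject: 船<![CDATA[é]]>€ </Subject>'


def parse_and_serialize(xml, strip_cdata):
    """Hypothetical helper: parse with the given strip_cdata setting and
    return (element text, serialized XML as str)."""
    parser = etree.XMLParser(strip_cdata=strip_cdata)
    root = etree.fromstring(xml, parser)
    serialized = etree.tostring(root, encoding="UTF-8").decode()
    return root.text, serialized


kept_text, kept_xml = parse_and_serialize(XML, strip_cdata=False)
plain_text, plain_xml = parse_and_serialize(XML, strip_cdata=True)

# .text is expected to be identical in both cases: CDATA content is
# returned as ordinary character data.
print(kept_text == plain_text)

# Only the tree parsed with strip_cdata=False should still carry the
# CDATA markup when it is serialized.
print("<![CDATA[" in kept_xml)
print("<![CDATA[" in plain_xml)
```

So the parser option matters for what gets written back out, not for what .text returns.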
answered Nov 24 '18 at 7:02 by mzjn
Thanks for that. I read that part, but did not realise etree.tostring serialises.
– Sudipta Basak
Nov 24 '18 at 14:26