HtmlAgilityPack and dynamic content issue











I want to build a web scraper application using the WebBrowser control, HtmlAgilityPack, and XPath.

So far I have managed to build an XPath generator (I used the WebBrowser control for this), and it works fine, but sometimes I cannot grab content that is generated dynamically (via JavaScript or AJAX). I also found that the WebBrowser control (actually the IE engine) generates some extra tags such as "tbody", whereas HtmlAgilityPack, loaded with `htmlWeb.Load(webBrowser.DocumentStream);`, doesn't see them.

Another note: I found that the following code actually gets the current page source, but I couldn't work out how to pass it to HtmlAgilityPack:
`(mshtml.IHTMLDocument3)webBrowser.Document.DomDocument;`

Can you please help me with this?







c# html-agility-pack dynamic-content






edited Nov 19 at 9:55









Vinay Pandey

asked Apr 16 '12 at 6:17









Chyngyz Sydykov

  • Help with what? What is your specific question? You have to show some code to get real help.
    – L.B
    Apr 16 '12 at 6:22

  • Sorry guys, I found the solution here: `var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)webBrowser.Document.DomDocument; StringReader sr = new StringReader(documentAsIHtmlDocument3.documentElement.outerHTML); htmlDoc.Load(sr);` and it worked.
    – Chyngyz Sydykov
    Apr 16 '12 at 6:32

  • @user1322188: How did you retrieve the dynamic content of the page? Is HtmlAgilityPack used to retrieve the dynamic content?
    – SivaRajini
    Feb 18 '14 at 10:39


















3 Answers
Use one of the following calls on the HtmlAgilityPack document:

    htmlAgilityPackDocument.LoadHtml(this.browser.DocumentText);

or

    if (this.browser.Document.GetElementsByTagName("html")[0] != null)
        _htmlAgilityPackDocument.LoadHtml(this.browser.Document.GetElementsByTagName("html")[0].OuterHtml);





I just spent hours trying to get HtmlAgilityPack to pick up some dynamic AJAX content from a web page, going from one useless post to another until I found this one.

The answer is hidden in a comment under the question, so I thought I should spell it out.

This is the method I used initially, which didn't work:



    // Downloads the raw HTML over HTTP and parses it. No scripts are executed,
    // so content generated by JavaScript/AJAX is missing from the result.
    private void LoadTraditionalWay(string url)
    {
        WebRequest myWebRequest = WebRequest.Create(url);
        using (WebResponse myWebResponse = myWebRequest.GetResponse())
        using (Stream receiveStream = myWebResponse.GetResponseStream())
        using (TextReader reader = new StreamReader(receiveStream, Encoding.UTF8))
        {
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            doc.Load(reader);
        }
    }


WebRequest only downloads the raw HTML; it does not execute the JavaScript/AJAX calls that produce the missing content.

This is the solution that worked:



    private void LoadHtmlWithBrowser(string url)
    {
        webBrowser1.ScriptErrorsSuppressed = true;
        webBrowser1.Navigate(url);

        // Block until the WebBrowser control reports that the page has finished loading.
        waitTillLoad(this.webBrowser1);

        // Read the live DOM (after scripts have run) and hand its HTML to HtmlAgilityPack.
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)webBrowser1.Document.DomDocument;
        StringReader sr = new StringReader(documentAsIHtmlDocument3.documentElement.outerHTML);
        doc.Load(sr);
    }

    private void waitTillLoad(WebBrowser webBrControl)
    {
        WebBrowserReadyState loadStatus;
        int waittime = 100000;
        int counter = 0;

        // Phase 1: pump messages until navigation has started
        // (or give up after waittime iterations).
        while (true)
        {
            loadStatus = webBrControl.ReadyState;
            Application.DoEvents();
            if ((counter > waittime) || (loadStatus == WebBrowserReadyState.Uninitialized) || (loadStatus == WebBrowserReadyState.Loading) || (loadStatus == WebBrowserReadyState.Interactive))
            {
                break;
            }
            counter++;
        }

        // Phase 2: pump messages until the document is complete and the control is idle.
        // Note: this loop has no timeout, so it keeps spinning for as long as the page is loading.
        while (true)
        {
            loadStatus = webBrControl.ReadyState;
            Application.DoEvents();
            if (loadStatus == WebBrowserReadyState.Complete && !webBrControl.IsBusy)
            {
                break;
            }
        }
    }


The idea is to load the page in the WebBrowser control, which can execute the AJAX calls, wait until the page has fully loaded, and then use the Microsoft.mshtml library to hand the resulting HTML to HtmlAgilityPack for parsing.
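
The wait in the code above is bounded by an iteration count in its first loop and is unbounded in its second, so how long it really waits depends on how fast the loop spins. One possible alternative is a wait bounded by elapsed time. The sketch below is only an illustration under assumptions: `BrowserWait` and `WaitUntil` are hypothetical names (they are not part of WinForms or HtmlAgilityPack), and the caller is assumed to supply both the readiness check and the message-pumping action.

```csharp
using System;
using System.Diagnostics;

public static class BrowserWait
{
    // Hypothetical helper, not a library API.
    // Calls pumpMessages, then checks isReady, and repeats until isReady
    // returns true (result: true) or the timeout has elapsed (result: false).
    //
    // Possible use with the WebBrowser control from the answer (assumed names):
    //   BrowserWait.WaitUntil(
    //       () => webBrowser1.ReadyState == WebBrowserReadyState.Complete && !webBrowser1.IsBusy,
    //       TimeSpan.FromSeconds(30),
    //       Application.DoEvents);
    public static bool WaitUntil(Func<bool> isReady, TimeSpan timeout, Action pumpMessages)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (true)
        {
            pumpMessages();   // let the UI thread process pending events
            if (isReady())
            {
                return true;
            }
            if (clock.Elapsed >= timeout)
            {
                return false; // caller decides how to handle a page that never finished
            }
        }
    }
}
```

This only covers the "wait for completion" phase. Right after `Navigate`, the control may still report the previous page as complete, which is what the first loop in the original code guards against, so a similar check may still be needed before calling it.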



This was the only way I could get access to the dynamic data.

Hope it helps someone.







    answered Feb 22 '14 at 14:58









    Nick

    • Good work, Nick! Thanks for posting your solution -- it was very useful for me! What a chore! I'll add that MSHTML is named "Microsoft HTML Object Library" when adding the reference.
      – Daniel
      Jul 14 '14 at 23:46

    • Is the document to pass to HtmlAgilityPack now in 'sr', so that it just needs manipulating?
      – Phill Healey
      Jul 23 '14 at 8:47

    • What type is webBrowser1?
      – tweetypi
      Sep 21 '17 at 9:18

    • Just for reference: if you're not running in a WinForms (or any other STA) context, you will have to start the WebBrowser on an STA thread. Something like this: `var t = new Thread(MyThreadStartMethod); t.SetApartmentState(ApartmentState.STA); t.Start();`
      – Korli
      Dec 20 '17 at 9:48

    • I am having the same problem. I want to get the content of a table that is loaded dynamically with JS. The div created by JS has the id packageTabContainer, but I get null. I have tried the solution but didn't get the content. Here is the link I need to extract from: ikea.com/qa/en/catalog/products/60368726
      – Khan Engineer
      Jun 27 at 15:59














    Would Selenium do the trick. As far as I am aware it creates instances of browser engines.. sort of and should allow js to be executed and allow you to get the result of the manipulated DOM.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Aug 6 '15 at 17:39









    Lee Englestone













    up vote
    -4
    down vote













    Use the following method of the HTML Agility Pack document:

    htmlAgilityPackDocument.LoadHtml(this.browser.DocumentText);

    or

    if (this.browser.Document.GetElementsByTagName("html")[0] != null)
        htmlAgilityPackDocument.LoadHtml(this.browser.Document.GetElementsByTagName("html")[0].OuterHtml);
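
The second variant can be wrapped in a small guard, sketched below. `BrowserSnapshot` and `Capture` are hypothetical names, and the sketch assumes the WinForms WebBrowser control together with the HtmlAgilityPack package. As far as I know, `DocumentText` tends to return the markup as it was originally downloaded, so the `OuterHtml` route is the one more likely to include script-generated nodes.

```csharp
using System;
using System.Windows.Forms;
// Alias to disambiguate from System.Windows.Forms.HtmlDocument.
using HtmlDocument = HtmlAgilityPack.HtmlDocument;

static class BrowserSnapshot
{
    // Hypothetical helper: returns a parsed copy of the browser's current DOM,
    // or null if no document (or no <html> element) is available yet.
    public static HtmlDocument Capture(WebBrowser browser)
    {
        if (browser.Document == null) return null;

        HtmlElementCollection roots = browser.Document.GetElementsByTagName("html");
        if (roots.Count == 0) return null; // check Count instead of indexing blindly

        var doc = new HtmlDocument();
        doc.LoadHtml(roots[0].OuterHtml);
        return doc;
    }
}
```

A call such as `var doc = BrowserSnapshot.Capture(webBrowser);` would typically go in the control's DocumentCompleted handler, after which `doc?.DocumentNode.SelectNodes(...)` can run the generated XPath.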





        edited Mar 12 '13 at 9:11









        j0k











        answered Mar 12 '13 at 8:48









        dev






























