christojuan, Posted June 11, 2017 (edited)

Hi,

I'm fairly new to this and am struggling with what is likely a simple issue for anyone with experience. On this page, http://bazoogle3.com/testscrape2/, I am trying to scrape the data in the following positions. The same positions appear on multiple pages laid out like the example page:

1) "Visits", "This period", which is 136 in this example
2) "Page Views", "This period", which is 295 in this example
3) "Mobile Visits", "This period", which is 62 in this example

You can view the source of the page, but the relevant sections are:

For 1 and 2:

<table class="trafficSummary">
<tbody><tr>
<!-- Summary data -->
<td class="summaryData">
<p class="date">June 2017<span></span></p>
<table class="stat_table">
<tbody><tr class="at">
<td class="stat_label"> </td>
<td class="label">THIS PERIOD:</td>
<td class="label">MOST RECENT 12 MONTHS:</td>
</tr>
</tbody></table>
<table class="blue stat_table">
<tbody><tr>
<td class="stat_label"><h3>VISITS</h3></td>
<td class="num">136</td>
<td class="num">4,441</td>
</tr>
</tbody></table>
<hr>
<table class="green">
<tbody><tr>
<td class="stat_label"><h3>PAGE VIEWS</h3></td>
<td class="num">295</td>
<td class="num">8,923</td>
</tr>
</tbody></table>
<p> Visits represent the number of potential clients who visited your website or blog. Page Views are the total pages they viewed. <br><br><i>Current month not included in 12-month totals or graphs. Data current within 72 hours.</i> </p>
</td>
<!-- End summary data -->

For #3:

<table class="trafficSummary">
<tbody><tr>
<td class="summaryData">
<p class="date">June 2017<span></span></p>
<table class="stat_table">
<tbody><tr class="at">
<td class="stat_label"> </td>
<td class="label">THIS PERIOD:</td>
<td class="label">MOST RECENT 12 MONTHS:</td>
</tr>
</tbody></table>
<table class="blue stat_table">
<tbody><tr>
<td class="stat_label"><h3>MOBILE VISITS</h3></td>
<td class="num">62</td>
<td class="num">1,244</td>
</tr>
</tbody></table>
<hr>
<table class="green">
<tbody><tr>
<td class="stat_label"><h3>PERCENT</h3></td>
<td class="num">46%</td>
<td class="num">28%</td>
</tr>
</tbody></table>
<p> Mobile visits include visits from mobile phone and tablet devices. The percent displayed represents how many total visits were from mobile devices. <br><br><i>Current month not included in 12-month totals or graphs. Data current within 72 hours.</i> </p>
</td>
<td class="chart">
<h3>MOBILE VISITS</h3>
<div class="chart_img"> <img src="index_files/mobileVisitsChart-3779360-201706.png" alt="Visits / Page Views"> </div>
</td>
</tr>
</tbody></table>

For the first step, I am trying this code to scrape "Visits", "This period". I tried using the selector and then adding a wildcard:

navigate("http://bazoogle3.com/testscrape2/","Wait")
set(#var1,$scrape attribute(<class=w"*">,"innertext"),"Global")

But, as you can see in the debugger, it generates a lot more data than the targeted 136.

Can anyone provide some guidance, direction, or a solution to successfully scrape the 3 pieces of data that I seek? Note again that I am going to load multiple similar pages that have different data in each of those positions.

Note that I just purchased the Ex Browser plugin (but have not even opened it yet), so if there is a better solution using that, please don't hesitate to offer the associated guidance.

Thanks very much!
Chris
Varo, Posted June 11, 2017

Hi, for this kind of page, the easiest way is to use an XPath parser. You can use the Free Xpath Plugin by Dan: http://network.ubotstudio.com/forum/index.php/topic/19449-free-xpath-plugin/

Here is the code:

navigate("http://bazoogle3.com/testscrape2/","Wait")
set(#var1,$plugin function("XpathPlugin.dll", "$Generic Xpath Parser", $document text, "//td[@class=\'stat_label\']/h3[contains(text(),\'VISITS\')]/../../td[2]", "innertext", ""),"Global")
alert(#var1)

And here are the results: http://i.imgur.com/gjWkIua.png

Hope it helps.
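For anyone wondering why that one expression returns several numbers at once, here is a rough Python sketch of what it selects. It is not UBot code and not the plugin's implementation. It assumes a trimmed, well-formed stand-in for the page markup (`SAMPLE`) and a hypothetical helper `this_period_values`. The standard-library ElementTree does not support XPath's `contains()`, so the substring test is done in plain Python instead.

```python
import xml.etree.ElementTree as ET

# Assumption: a trimmed, well-formed stand-in for the markup quoted in the thread.
SAMPLE = """
<div>
  <table class="blue stat_table"><tbody><tr>
    <td class="stat_label"><h3>VISITS</h3></td>
    <td class="num">136</td>
    <td class="num">4,441</td>
  </tr></tbody></table>
  <table class="green"><tbody><tr>
    <td class="stat_label"><h3>PAGE VIEWS</h3></td>
    <td class="num">295</td>
    <td class="num">8,923</td>
  </tr></tbody></table>
  <table class="blue stat_table"><tbody><tr>
    <td class="stat_label"><h3>MOBILE VISITS</h3></td>
    <td class="num">62</td>
    <td class="num">1,244</td>
  </tr></tbody></table>
</div>
""".strip()


def this_period_values(root, needle):
    """Roughly mimic //td[@class='stat_label']/h3[contains(text(), needle)]/../../td[2]."""
    values = []
    for row in root.iter("tr"):
        heading = row.find("td[@class='stat_label']/h3")
        # Substring test, like XPath contains(): "MOBILE VISITS" also contains "VISITS".
        if heading is not None and needle in (heading.text or ""):
            # XPath td[2] is 1-based, so it is index 1 in the Python list.
            values.append(row.findall("td")[1].text.strip())
    return values


root = ET.fromstring(SAMPLE)
matches = this_period_values(root, "VISITS")
print(matches)  # ['136', '62']
```

Because the test is a substring match, every heading that includes the word "VISITS" qualifies, which is why the result holds more than one value. The real page may contain more such headings than this stand-in does.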
christojuan, Posted June 11, 2017

I was just playing with the XPATH Builder pro and struggling with the xpath to do that. This is perfect! I'll give it a try.

Thanks VERY much!
Chris
christojuan, Posted June 11, 2017

Hey, I just quickly ran it through XPath pro. Is it possible for you to show me how I would isolate each of those? E.g. if I just want the 136 (Visits), or just the 295 (Page Views), or just the 106 (search visits), etc.

https://www.screencast.com/t/zdV3g2RXD

Any additional help would be appreciated.

Thanks!
Chris
Varo, Posted June 12, 2017

Yes, you can isolate each of those. There are two ways to do it:

1. Based on sequence:
result no. 1: (//td[@class='stat_label']/h3[contains(text(),'VISITS')]/../../td[2])[1]
result no. 2: (//td[@class='stat_label']/h3[contains(text(),'VISITS')]/../../td[2])[2]
result no. 3: (//td[@class='stat_label']/h3[contains(text(),'VISITS')]/../../td[2])[3]

2. Based on the h3 text: use //td[@class='chart']/h3 as the starting point, then continue the XPath to the destination element.

http://i.imgur.com/3HsaAul.png
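The two approaches can be sketched in Python as follows. This is only an illustration under assumptions, not UBot code: `SAMPLE` is a trimmed, well-formed stand-in for the page, and `rows_with_heading`, `nth_match`, and `by_exact_heading` are hypothetical helper names. The heading-based variant here matches the label text of each stat row exactly, which is one possible way to anchor on the h3 text rather than the exact XPath from the post.

```python
import xml.etree.ElementTree as ET

# Assumption: a trimmed, well-formed stand-in for the markup quoted in the thread.
SAMPLE = """
<div>
  <table class="blue stat_table"><tbody><tr>
    <td class="stat_label"><h3>VISITS</h3></td>
    <td class="num">136</td>
    <td class="num">4,441</td>
  </tr></tbody></table>
  <table class="green"><tbody><tr>
    <td class="stat_label"><h3>PAGE VIEWS</h3></td>
    <td class="num">295</td>
    <td class="num">8,923</td>
  </tr></tbody></table>
  <table class="blue stat_table"><tbody><tr>
    <td class="stat_label"><h3>MOBILE VISITS</h3></td>
    <td class="num">62</td>
    <td class="num">1,244</td>
  </tr></tbody></table>
</div>
""".strip()


def rows_with_heading(root):
    """Yield (heading text, 'This period' value) for each stat row."""
    for row in root.iter("tr"):
        heading = row.find("td[@class='stat_label']/h3")
        if heading is not None:
            yield (heading.text or "").strip(), row.findall("td")[1].text.strip()


def nth_match(root, needle, n):
    """Approach 1: the n-th (1-based) row whose heading contains needle."""
    hits = [value for text, value in rows_with_heading(root) if needle in text]
    return hits[n - 1] if 0 < n <= len(hits) else None


def by_exact_heading(root, label):
    """Approach 2: the row whose heading text equals label exactly."""
    for text, value in rows_with_heading(root):
        if text == label:
            return value
    return None


root = ET.fromstring(SAMPLE)
first_visits = nth_match(root, "VISITS", 1)
page_views = by_exact_heading(root, "PAGE VIEWS")
mobile_visits = by_exact_heading(root, "MOBILE VISITS")
print(first_visits, page_views, mobile_visits)  # 136 295 62
```

Selecting by position is short but depends on the order of the matching rows, which may differ on the real page from this stand-in. Selecting by heading text keeps working if rows are added or reordered, as long as the labels stay the same.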
christojuan, Posted June 12, 2017 (edited)

You, my friend, are awesome. Before I saw your response, I went through Dan's XPath training, and your examples/solutions helped bring it all together. Thank you SO much for taking the time to help.

I sincerely appreciate it.