mdc101, posted January 18, 2012

Hi folks,

I have come up against an odd scraping issue. I am scraping the following HTML:

This question has been viewed <strong>988</strong> times; it has <strong>1</strong> monitor with <strong>11277</strong> topic followers and <strong>0 </strong><a href="/Does-a-vanity-URL-shortener-improve-SEO/alias">aliases</a> exist.

Here is the code:

if($exists(<innertext="Question Stats">)) {
    then {
        comment("Question viewed - times")
        set(#QuestionViewed, $page scrape("This question has been viewed ", " times; it has "), "Global")
        set(#QuestionViewed, $replace(#QuestionViewed, "<strong>", ""), "Global")
        set(#QuestionViewed, $replace(#QuestionViewed, "</strong>", ""), "Global")
        set(#QuestionViewed, $trim(#QuestionViewed), "Global")
        add list to list(%ListViewed, $list from text(#QuestionViewed, ""), "Don\'t Delete", "Global")
        comment("monitor amount")
        set(#QuestionMonitor, $page scrape(" times; it has ", " monitor with "), "Global")
        set(#QuestionMonitor, $replace(#QuestionMonitor, "<strong>", ""), "Global")
        set(#QuestionMonitor, $replace(#QuestionMonitor, "</strong>", ""), "Global")
        set(#QuestionMonitor, $trim(#QuestionMonitor), "Global")
        add list to list(%monitor, $list from text(#QuestionMonitor, ""), "Don\'t Delete", "Global")
        comment("Topic followers")
        set(#QuestionFollower, $page scrape(" monitor with ", " topic followers and"), "Global")
        set(#QuestionFollower, $replace(#QuestionFollower, "<strong>", ""), "Global")
        set(#QuestionFollower, $replace(#QuestionFollower, "</strong>", ""), "Global")
        set(#QuestionFollower, $trim(#QuestionFollower), "Global")
        add list to list(%topicFollowers, $list from text(#QuestionFollower, ""), "Don\'t Delete", "Global")
        comment("Following this question")
        set(#QuestionStatFollowingQuestion, $scrape attribute(<class="following_count">, "innertext"), "Global")
        set(#QuestionStatFollowingQuestion, $replace(#QuestionStatFollowingQuestion, " people are following this question.", ""), "Global")
        set(#QuestionStatFollowingQuestion, $replace(#QuestionStatFollowingQuestion, " person is following this question.", ""), "Global")
        set(#QuestionStatFollowingQuestion, $trim(#QuestionStatFollowingQuestion), "Global")
        add list to list(%QuestionFollowers, $list from text(#QuestionStatFollowingQuestion, ""), "Don\'t Delete", "Global")
    }
}

When watching in the debugger, the number originally scraped is wrapped in <strong> </strong> tags. I strip away the tags and trim the number, but I end up with the number repeated in the variable.

Example of variable output (should be a single number, not repeated): 974974

See the attached image of the debugger.

I am building up a data set, and we need to allow duplicates in the column but not within a cell. How do I validate this to ensure only one number is in the variable before I add it to the list? Is this a bug, or have I done something wrong?

Thanks for any suggestions.
JohnB, posted January 18, 2012

This is the code you want:

set(#views, $page scrape("This question has been viewed <strong>", "</strong>"), "Global")

John
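To illustrate why the tighter delimiters in John's suggestion help, here is a minimal Python sketch of a "scrape between two markers" routine. It is not UBot code: `scrape_between` and `SAMPLE_HTML` are hypothetical names introduced for this example, and whether UBot's `$page scrape` collects every match in the same way is an assumption that the thread does not confirm. With the start marker ending in `<strong>` and the end marker set to `</strong>`, the captured text is the bare number, so no tag stripping is needed afterwards.

```python
from typing import List

# Sample markup copied from the first post in the thread.
SAMPLE_HTML = (
    'This question has been viewed <strong>988</strong> times; it has '
    '<strong>1</strong> monitor with <strong>11277</strong> topic followers '
    'and <strong>0 </strong><a href="/Does-a-vanity-URL-shortener-improve-SEO/alias">'
    'aliases</a> exist.'
)


def scrape_between(text: str, start: str, end: str) -> List[str]:
    """Return every trimmed substring found between `start` and `end`.

    Hypothetical helper: a plain-Python stand-in for a page scrape,
    not the UBot implementation.
    """
    results = []
    pos = 0
    while True:
        begin = text.find(start, pos)
        if begin == -1:
            break
        begin += len(start)
        stop = text.find(end, begin)
        if stop == -1:
            break
        results.append(text[begin:stop].strip())
        pos = stop + len(end)
    return results


views = scrape_between(SAMPLE_HTML, "This question has been viewed <strong>", "</strong>")
print(views)
```

If the same sentence occurred twice in the page source, this routine would return two entries, and joining them without a separator would give a doubled value such as 988988. That is one possible explanation for the symptom described in the thread, offered here only as a guess; taking the first entry alone sidesteps it.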
mdc101, posted January 18, 2012

Thanks John. I am still getting duplicates in the variable. Could this be a bug?
mdc101, posted January 18, 2012

For example, 988988 will be written back into the single cell in the CSV instead of just 988.
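The validation asked about in the first post (making sure a cell holds exactly one number before it is added to the list) can be sketched as follows. This is a Python illustration, not UBot script, and the helper names `digit_runs` and `single_number` are hypothetical. The idea is to check the raw scraped snippet before the tags are stripped: while the `<strong>` tags are still present, two scraped copies remain distinguishable as two separate runs of digits.

```python
import re
from typing import List, Optional


def digit_runs(raw: str) -> List[str]:
    """Return every run of consecutive digits found in the raw snippet."""
    return re.findall(r"\d+", raw)


def single_number(raw: str) -> Optional[str]:
    """Return the number if the snippet holds exactly one digit run, else None.

    Hypothetical guard to call before writing a value to a CSV cell.
    """
    runs = digit_runs(raw)
    if len(runs) == 1:
        return runs[0]
    return None


for snippet in ("<strong>988</strong>", "<strong>988</strong><strong>988</strong>"):
    print(repr(snippet), "->", single_number(snippet))
```

This guard has limits. Once the tags are gone, a doubled value such as 988988 is a single digit run and cannot be told apart from a genuine count of that size, so the check only helps if it runs on the snippet before the `$replace` steps. It would also reject numbers written with thousands separators.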
JohnB, posted January 18, 2012

It's working fine at my end: http://screencast.com/t/bLwhMJj40

You can send the script if you like and I'll have a look at it.

John
mdc101, posted January 18, 2012

Hi John, I have sent you a private message with what I am doing.

Thanks,
Matt