webpro, posted February 26, 2013

Now I've got a tricky one for you guys. I'm scraping usernames, as that's the only way to go for now. I then append each username to a URL, so the username "Cutegirl" becomes http://www.site.com/users/cutegirl/

This works fine. The problem is that some usernames contain spaces. For instance, "Another Cute girl" ends up like this:
http://www.site.com/users/another%20cute%20girl
which, of course, returns a 404 or makes the site stall. How can I work around this? Or am I not scraping correctly in the first place?

Here's a portion of the HTML:
<div class="Page PageSearch PageResults"> <div class="headerBar" style="display: block; "><div class="paging"><span class="buttons"><span class="buttonReload icon-refresh font-blue size-ok" style="margin-right: 10px; " data-tooltips="Reload"></span><span class="icon-tri-left size-ok"></span><span class="icon-tri-right size-ok active"></span></span><span class="pageText">Page <span class="pageNav">1 / 90</span></span></div><div class="options"><div class="FilterBar"> <div class="Button small buttonSearch" title="Search"><< Search Criteria</div> <div class="popOrder Button small">Order: <span class="FilterButtonValue" id="dynOrder">Last Online</span></div> <div class="message"> </div> </div></div><div class="Clear"></div></div> <div class="results"><div class="MiniProfile " data-username="steve-london"> <div class="photo contentLeft"> <img src="http://wac.9B87.site.net/809B87/member-media/28955738/f45ce0d2-b9af-4ae0-8dd8-0cecf05a3744_f.jpg"> </div> <div class="contentRight"> <div class="buttons" style=""><div> <span class="Button small grey buttonRemove" style="display: none; ">Remove</span> <span class="Button small grey buttonCancel" style="display: none; ">Cancel</span> <span class="Button small green buttonAccept" style="display: none; ">Add to Contacts</span> <span class="Button small grey buttonReject" style="display: none; ">No Thanks </span> </div></div> <div class="name"> <span
class="icon-member buddy-offline size-ok gradient" data-tooltips="Last Online: one hour ago"></span> <span class="Strong">steve-london</span> <span> <span style="margin-left:10px;"> </span> </span> </div> <div class="clear"></div> <div class="details"> Male 32, Ontario <span class="distance"> (1054km) </span> </div> <div class="desc"> i am simple person with simple thinking </div> </div> <div class="Clear"></div> </div><div class="MiniProfile " data-username="llcoolj888"> <div class="photo contentLeft"> <img src="http://wac.9B87.site.net/809B87/member-media/27430474/35b50af0-0b77-42b3-a8d7-3db1e8d8167b_f.jpg"> <div class="gallery"> <span class="icon-public-gallery size-ok" data-position="n" data-tooltips="Public Photos: 5"></span> </div> </div> <div class="contentRight"> <div class="buttons" style=""><div> <span class="Button small grey buttonRemove" style="display: none; ">Remove</span> <span class="Button small grey buttonCancel" style="display: none; ">Cancel</span> <span class="Button small green buttonAccept" style="display: none; ">Add to Contacts</span> <span class="Button small grey buttonReject" style="display: none; ">No Thanks </span> </div></div> <div class="name"> <span class="icon-member buddy-online size-ok gradient" data-tooltips="Online"></span> <span class="Strong">llcoolj888</span> <span> <span style="margin-left:10px;"> <span class="icon-friendship size-ok" data-tooltips="You match each other's seeking criteria for Friendship!"></span> <span class="icon-relationship size-ok " data-tooltips="You match each other's seeking criteria for Relationships!"></span> <span class="icon-casual-relation size-ok" data-tooltips="You match each other's seeking criteria for Casual Dating!"></span> </span> </span> </div> <div class="clear"></div> <div class="details"> Male 30, Ontario <span class="distance"> (889km) </span> </div> <div class="desc"> im from israel, but now im stay in canada </div> </div> <div class="Clear"></div> </div><div> <div class="ad"> <div 
class="DoubleSectionBreak"></div> <div class="Ad ad468x60" data-adlocation="Search" data-adrotation="manual"><iframe src="http://site.site.net/adserver/adsensemanual/adexchangeca_468x60.htm?0.6284087207168341&googleurl=http://www.site.com/en/member/llcoolj888" width="468" height="60" frameborder="0" scrolling="no"></iframe></div> <div class="DoubleSectionBreak"></div> </div> </div><div class="MiniProfile " data-username="mrcharm83"> <div class="photo contentLeft"> <img src="http://wac.9B87.site.net/809B87/member-media/30599560/df044395-9b37-4d8f-8962-2cb72ed197d7_f.jpg"> <div class="gallery"> <span class="icon-public-gallery size-ok" data-position="n" data-tooltips="Public Photos: 4"></span> </div> </div> <div class="contentRight"> <div class="buttons" style=""><div> <span class="Button small grey buttonRemove" style="display: none; ">Remove</span> <span class="Button small grey buttonCancel" style="display: none; ">Cancel</span> <span class="Button small green buttonAccept" style="display: none; ">Add to Contacts</span> <span class="Button small grey buttonReject" style="display: none; ">No Thanks </span> </div></div> <div class="name"> <span class="icon-member buddy-online size-ok gradient" data-tooltips="Online"></span> <span class="Strong">Mrcharm83</span> <span> <span style="margin-left:10px;"> <span class="icon-friendship size-ok" data-tooltips="You match each other's seeking criteria for Friendship!"></span> <span class="icon-relationship size-ok " data-tooltips="You match each other's seeking criteria for Relationships!"></span> <span class="icon-casual-relation size-ok" data-tooltips="You match each other's seeking criteria for Casual Dating!"></span> </span> </span> </div> <div class="clear"></div> <div class="details"> Male 29, Saint-Roch-De-Richelieu <span class="distance"> (350km) </span> </div> <div class="desc"> Vit chaque jour comme si s'était le dernier! J'aime a peu près tout de la vie. 
Je suis quelqu'un de positif, qui a toujours le mot pour rire.Je suis facile à vivre, pas compliqué, toujours plein de bonnes intentions et j'estime avoir de bonnes valeurs. J'adore apprendre de nouvel... </div> </div> <div class="Clear"></div> </div><div class="MiniProfile " data-username="citchel"> <div class="photo contentLeft"> <img src="http://wac.9B87.site.net/809B87/member-media/24419044/bc73da3f-209e-4630-8c37-df1fa2fd4115_f.jpg"> <div class="gallery"> <span class="icon-public-gallery size-ok" data-position="n" data-tooltips="Public Photos: 3"></span> </div> </div> <div class="contentRight"> <div class="buttons" style=""><div> <span class="Button small grey buttonRemove" style="display: none; ">Remove</span> <span class="Button small grey buttonCancel" style="display: none; ">Cancel</span> <span class="Button small green buttonAccept" style="display: none; ">Add to Contacts</span> <span class="Button small grey buttonReject" style="display: none; ">No Thanks </span> </div></div> <div class="name"> <span class="icon-member buddy-online size-ok gradient" data-tooltips="Online"></span> <span class="Strong">citchel</span> <span> <span style="margin-left:10px;"> <span class="icon-friendship size-ok" data-tooltips="You match each other's seeking criteria for Friendship!"></span> <span class="icon-relationship size-ok " data-tooltips="You match each other's seeking criteria for Relationships!"></span> </span> </span> </div> <div class="clear"></div> <div class="details"> Male 33, Ontario <span class="distance"> (952km) </span> </div> <div class="desc"> A little about myself: I am an outgoing person, passionate for my family and friends. I love different cultures and make new friends. I enjoy cooking and tasting food and wines. Thanks

EDITED: OK, scraping the inner text of the <span class="Strong"> elements works well. I found that positions #2 and #3 held "Theme Profile" and "Personalised Something", which didn't belong in the list (they didn't appear when I kept scraping other pages). So I used
remove from list(%list, 2)
remove from list(%list, 2)
and got rid of them. Now I have a clean list of usernames without any spaces. If you think I'm banging my head against the wall and overdoing it, hahaha, let me know.
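For readers following along outside UBot, the approach in the edit above (take the inner text of each <span class="Strong"> and drop the entries that aren't usernames) can be sketched in Python with the standard-library html.parser. This is a minimal sketch, not UBot code: it assumes the markup looks like the snippet posted above, with each result in a <div class="MiniProfile ..."> carrying a data-username attribute. MiniProfileParser, clean_names, and NOT_USERNAMES are hypothetical names, and "Personalised Something" is only the OP's placeholder for the second stray entry.

```python
from html.parser import HTMLParser

# Hypothetical filter: the stray entries the OP saw at positions #2 and #3.
# "Personalised Something" is the OP's placeholder, not the exact text.
NOT_USERNAMES = {"Theme Profile", "Personalised Something"}


class MiniProfileParser(HTMLParser):
    """Collect usernames from search-results markup like the snippet above.

    Assumes each result is a <div class="MiniProfile ..."> with a
    data-username attribute, and the display name is the text of a
    <span class="Strong"> (which contains no nested spans).
    """

    def __init__(self):
        super().__init__()
        self.handles = []        # data-username attribute values
        self.display_names = []  # inner text of <span class="Strong">
        self._in_strong = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "MiniProfile" in classes and attrs.get("data-username"):
            self.handles.append(attrs["data-username"])
        elif tag == "span" and "Strong" in classes:
            self._in_strong = True
            self.display_names.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_strong = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_strong:
            self.display_names[-1] += data


def clean_names(names):
    """Strip whitespace and drop empty or known non-username entries."""
    stripped = (n.strip() for n in names)
    return [n for n in stripped if n and n not in NOT_USERNAMES]


SAMPLE = """
<div class="MiniProfile " data-username="steve-london">
  <div class="name"><span class="Strong">steve-london</span></div>
</div>
<div class="MiniProfile " data-username="mrcharm83">
  <div class="name"><span class="Strong">Mrcharm83</span></div>
</div>
"""

if __name__ == "__main__":
    parser = MiniProfileParser()
    parser.feed(SAMPLE)
    print(parser.handles)                     # ['steve-london', 'mrcharm83']
    print(clean_names(parser.display_names))  # ['steve-london', 'Mrcharm83']
```

Filtering by value rather than by position is a judgment call: if the stray entries ever move, removing index 2 twice would delete real usernames instead. Also note that in the snippet the data-username value ("mrcharm83") differs from the displayed name ("Mrcharm83"), so the attribute may be the closer match to the URL slug; that is an inference from this one snippet, not something the site documents.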
Kreatus (Ubot Ninja), posted February 26, 2013

Which exact part of the code you posted do you want to scrape?
magoo, posted February 26, 2013

I don't see "Another Cute girl" anywhere in the code above. Could you narrow down what we're looking for?
Pete, posted February 26, 2013 (edited February 26, 2013 by zap)

It would be a regex replace of \s+ with nothing:

navigate("http://www.ubotstudio.com/forum/index.php?/topic/13092-now-i-got-a-tricky-one-for-you-guys/", "Wait")
wait(2)
add list to list(%UserNames, $list from text($replace regular expression($page scrape("Unless", "place"), "\\s", $nothing), $new line), "Delete", "Global")
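The same idea as Pete's regex replace, sketched in Python with re.sub for anyone testing outside UBot. BASE_URL is the anonymized host from the first post and profile_url is a hypothetical helper; lowercasing the result is an assumption based on the OP's "cutegirl" example, and nothing in the thread confirms that the site actually serves /users/anothercutegirl/.

```python
import re

# Anonymized host from the first post; stands in for the real site.
BASE_URL = "http://www.site.com/users/"


def profile_url(username: str) -> str:
    """Build a profile URL by deleting all whitespace from the username."""
    # Replace every run of whitespace (\s+) with nothing.
    slug = re.sub(r"\s+", "", username)
    # Assumption: slugs are lowercase, as in the OP's ".../users/cutegirl/".
    return BASE_URL + slug.lower() + "/"


print(profile_url("Another Cute girl"))  # http://www.site.com/users/anothercutegirl/
```

One thing to watch: in Python's re (and most regex flavours) \s also matches newline characters, so the replace is safest applied to each name after splitting the scraped text into lines; run over a multi-line scrape first, it would merge every name into one string.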
Lucius, posted February 26, 2013

You need to encode your URLs. All spaces should be either percent-encoded (%20) or plus-encoded (+). So have something like: if the URL contains a space, then replace the space with %20.
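A sketch of the encoding side with Python's urllib.parse, for comparison. One caveat: "+" stands for a space only in query strings and form data (application/x-www-form-urlencoded); under the URI spec, a "+" inside a path such as /users/<name>/ is a literal plus sign, so %20 (what quote() produces) is the form to use there. Also worth noting: the OP's failing URL was already %20-encoded, so encoding alone may not be what the site expects. encode_profile_url is a hypothetical helper and BASE_URL is the anonymized host from the first post.

```python
from urllib.parse import quote, quote_plus

BASE_URL = "http://www.site.com/users/"  # anonymized host from the first post


def encode_profile_url(username: str) -> str:
    """Percent-encode the username as a single path segment."""
    # safe="" also encodes "/", so a name can't split into extra path segments.
    return BASE_URL + quote(username, safe="") + "/"


print(encode_profile_url("another cute girl"))
# http://www.site.com/users/another%20cute%20girl/
print(quote_plus("another cute girl"))  # another+cute+girl (query-string style)
```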
webpro (author), posted February 28, 2013

Thanks guys, I'll have a look at your ideas. It was working the way I worked it out (even though it's not perfect), but when I relaunched the bot to work with it again, I got an error: I couldn't see the script in node view anymore (the "fix the problem first" thing).