Learjet 27 Posted January 14, 2016
I scraped a table of companies and the information is now in a list, but some bold tags and image tags came along that I need to filter out. I just need the company name and nothing else (please see the attached image). I've searched the forum and can't find anything on how to filter the list. If you could point me in the right direction I would be very grateful! Thanks for your help!
Peace, LJ
Brutal 164 Posted January 14, 2016
LJ, the first thing to do is go back and try scraping the data differently. Show me the command/process you used to scrape the data into your list.
pash 504 Posted January 14, 2016
I'm not sure why you put it in the list. Not tested:
    set list position(%List,0)
    loop($list total(%List)) {
        set(#Data,$next list item(%List),"Global")
        if($comparison($find regular expression(#Data,"<[^>]*>"),"= Equals","")) {
            then {
                alert(#Data)
            }
            else {
            }
        }
    }
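For anyone who wants to see the `<[^>]*>` pattern from the post above at work outside UBot, here is a minimal Python sketch. It does something slightly different from the loop above: instead of alerting only on items that contain no tag, it strips the tags out of every item. The `strip_tags` name and the sample list are made up for illustration; the real scraped data may look different.

```python
import re

# Same tag pattern as in the post above: "<", anything except ">", then ">".
TAG = re.compile(r"<[^>]*>")

def strip_tags(items):
    """Remove HTML tags from each item and drop items that end up empty."""
    cleaned = (TAG.sub("", item).strip() for item in items)
    return [text for text in cleaned if text]

# Hypothetical sample of what the scraped list might contain.
scraped = [
    "<strong>Acme Tools</strong>",
    '<img src="logo.png">Widget Works',
    "Plain Company",
]
print(strip_tags(scraped))
```

Stripping rather than skipping keeps the bolded and image-prefixed companies in the list, which seems to be what the original question was after.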
Learjet 27 Posted January 14, 2016
Thanks for the response Brutal and Pash, here's what I used to scrape the list:
    navigate("http://www.theassemblyshow.com/index.php/attend/exhibitor-list","Wait")
    add list to list(%companies,$find regular expression($read file("http://www.theassemblyshow.com/index.php/attend/exhibitor-list"),"(?<=\">(<strong>|)).*(?=(</strong>|)</a></td>)"),"Delete","Global")
Thanks again!
pash 504 Posted January 14, 2016 (edited)
Try:
    navigate("http://www.theassemblyshow.com/index.php/attend/exhibitor-list","Wait")
    wait for browser event("Everything Loaded","")
    wait(1)
    set(#Html,$scrape attribute(<class="exhibitor">,"innerhtml"),"Global")
    set(#Html,$replace regular expression(#Html,"<img.*(<strong>|png\">)",""),"Global")
    clear list(%List)
    add list to list(%List,$find regular expression(#Html,"(?<=_blank\">).*?(?=<\\/)"),"Delete","Global")
Edited January 14, 2016 by pash
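The same "grab what sits after `_blank">`, then clean it up" idea can be sketched in Python. This is a variation, not a translation of the regexes above: it captures everything up to the closing `</a>` and then strips any leftover tags, and it drops duplicates much like the "Delete" option does. The sample HTML and the `company_names` name are assumptions made for illustration; the real page markup may differ.

```python
import re

# Text between target="_blank"> and the closing </a> (lazy, so one link at a time).
ANCHOR_BODY = re.compile(r'(?<=_blank">).*?(?=</a>)')
# Any remaining tag such as <strong>, </strong> or <img ...>.
TAG = re.compile(r"<[^>]*>")

def company_names(html):
    """Return the de-duplicated link texts, with inner tags removed."""
    names = []
    for body in ANCHOR_BODY.findall(html):
        name = TAG.sub("", body).strip()
        if name and name not in names:
            names.append(name)
    return names

# Hypothetical markup resembling an exhibitor table.
sample_html = """
<td><a href="http://a.example" target="_blank">Acme Tools</a></td>
<td><a href="http://b.example" target="_blank"><strong>Widget Works</strong></a></td>
<td><a href="http://c.example" target="_blank"><img src="star.png">Gear Co</a></td>
<td><a href="http://a.example" target="_blank">Acme Tools</a></td>
"""
print(company_names(sample_html))
```

Doing the tag removal after the extraction means the pattern does not need to know in advance whether a name is wrapped in `<strong>` or preceded by an image.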
pash 504 Posted January 14, 2016
Or this, but it's slow:
    navigate("http://www.theassemblyshow.com/index.php/attend/exhibitor-list","Wait")
    wait for browser event("Everything Loaded","")
    wait(1)
    clear list(%List)
    set(#Loop,0,"Global")
    set(#MaxLoop,$divide($list total($scrape attribute(<tagname="td">,"innertext")),2),"Global")
    loop(#MaxLoop) {
        add item to list(%List,$scrape attribute($element offset(<tagname="td">,#Loop),"innertext"),"Don\'t Delete","Global")
        set(#Loop,$add(#Loop,2),"Global")
    }
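The loop above steps through the `<td>` cells two at a time, which suggests each table row has two cells with the company name in the first. Assuming that layout (the second column and the sample values below are hypothetical), the counting logic looks roughly like this in Python:

```python
def first_column(cells, columns=2):
    """Return the first cell of each row from a flat, row-by-row list of cells."""
    names = []
    index = 0
    # Mirror the #Loop counter in the post: start at 0 and jump by `columns`.
    while index < len(cells):
        names.append(cells[index])
        index += columns
    return names

# Hypothetical inner text of every <td> on the page, in document order.
cells = ["Acme Tools", "Booth 101", "Widget Works", "Booth 202"]
print(first_column(cells))
print(cells[::2])  # the same selection written as a slice
```

Because this version reads the inner text of each cell, there are no tags left to filter; the cost in UBot is one scrape per cell, which is likely why it runs slowly.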
Learjet 27 Posted January 14, 2016
Pash, thanks so much. I see what you did and, more importantly, I understand why you did it! Can't thank you enough, and thanks for your patience while I'm learning :-)
Respectfully, LJ
Brutal 164 Posted January 14, 2016
    navigate("http://www.theassemblyshow.com/index.php/attend/exhibitor-list","Wait")
    set(#get_companies,$scrape attribute(<outerhtml=w"<td><a href=\"*\" target=\"_blank\">*</a></td>">,"innertext"),"Global")
    add list to list(%companies,$list from text(#get_companies,$new line),"Delete","Global")
    set(#get_companies,$nothing,"Global")
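The selector above appears to use `*` as a wildcard so that only the cells wrapping a link are matched, and then reads their inner text. A rough Python analogue of that matching step, using the standard `fnmatch` module for the wildcard and a tag-stripping regex in place of "innertext", might look like this. The sample cells and the `names_from_cells` name are made up for illustration.

```python
from fnmatch import fnmatchcase
import re

# Wildcard pattern copied from the selector in the post above.
PATTERN = '<td><a href="*" target="_blank">*</a></td>'
TAG = re.compile(r"<[^>]*>")

def names_from_cells(cells_html):
    """Keep only cells whose HTML matches the wildcard pattern, then strip the tags."""
    return [
        TAG.sub("", cell).strip()
        for cell in cells_html
        if fnmatchcase(cell, PATTERN)
    ]

# Hypothetical outer HTML of three table cells.
cells_html = [
    '<td><a href="http://a.example" target="_blank">Acme Tools</a></td>',
    "<td>Booth 101</td>",
    '<td><a href="http://b.example" target="_blank"><strong>Widget Works</strong></a></td>',
]
print(names_from_cells(cells_html))
```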
Brutal 164 Posted January 14, 2016
The beauty of UBot is that there are almost always multiple paths to your desired end result. I always try to use whichever one takes the least code, because my bots get pretty big and keeping the code short helps me greatly. The way to find those paths is to just start poking around. Once you achieve your goal and you're looking at your code, let it play through your mind to see whether any other commands or parameters might achieve the same thing, then give them a try.
Learjet 27 Posted January 14, 2016
Thanks Brutal, admittedly my knowledge is pretty limited right now. Moving from front-end development to programming is taking a bit of time, but I will get there :-) There's nothing front-end-wise that I can't do, and it got very boring. This, however, is very fun. I'm still having a blast; it's frustrating at times but still fun. I got pretty good at fixing and customizing PHP scripts, but starting from scratch is another matter. It's starting to come together slowly! Seriously, I can't thank you guys enough for your patience and willingness to share and help out! Many thanks!
Peace, LJ