Kreatus (Ubot Ninja) 422 Posted March 24, 2011
Hi guys, I need help with this. How can I isolate the company name in these samples:
<SPAN class=pp-place-title><SPAN>Company Name</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company XYZ Inc,</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company Name - Inc</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company Name, ABC</SPAN></SPAN>
Thanks in advance.
JohnB 255 Posted March 24, 2011
Try this: parse.ubot
John
Kreatus (Ubot Ninja) 422 Posted March 24, 2011 (Author)
Thanks John, but not like that. I mean with regex: how do I match only the company names using a regular expression?
JohnB 255 Posted March 24, 2011
lol... OK, give me a few. (I somehow missed the sub forum!)
JohnB 255 Posted March 24, 2011
Kreatus, what function are you looking to use? The reason I ask is that the process will be the same with or without regex. (You will most likely still need to use $replace or $replace regular expression.) And since the text within the tags is constant, it will simply be a literal match in regex.
John
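To illustrate the "literal match" idea for readers, here is a minimal sketch in Python rather than UBot script. It assumes the markup looks exactly like the four samples in the first post (an unquoted `class` attribute, one nested `SPAN`, no tags inside the name). The names `PLACE_TITLE` and `company_names` are made up for this example and are not part of any bot in this thread.

```python
import re

# Literal outer and inner SPAN tags, with one capture group for the text
# between them. Assumption: the page markup matches the samples posted
# in this thread; real pages may differ.
PLACE_TITLE = re.compile(
    r"<span\s+class=[\"']?pp-place-title[\"']?\s*>\s*<span>(.*?)</span>\s*</span>",
    re.IGNORECASE | re.DOTALL,
)


def company_names(html):
    """Return the text of every pp-place-title span found in html."""
    return [name.strip() for name in PLACE_TITLE.findall(html)]


samples = """
<SPAN class=pp-place-title><SPAN>Company Name</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company XYZ Inc,</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company Name - Inc</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company Name, ABC</SPAN></SPAN>
"""

for name in company_names(samples):
    print(name)
```

The capture group returns the inner text unchanged, so a trailing comma such as the one in "Company XYZ Inc," is kept. Whether to trim it depends on what counts as the company name for your purposes.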
Kreatus (Ubot Ninja) 422 Posted March 24, 2011 (Author)
I want to scrape the company title, which may vary on different pages. Here is a sample:
http://maps.google.com/maps/place?cid=7378256458188814955&q=beer&hl=en&ved=0CF8Q-QswAA&sa=X&ei=9HaLTauDA5TyvgPCufi0CQ
I am scraping it inside the socket command, and regex is my only hope, since I have made several attempts to scrape it with no luck. Outside the socket command I can scrape this easily, but inside the socket I can't. Regex will surely work on this; I just don't know the right pattern.
Kreatus (Ubot Ninja) 422 Posted March 24, 2011 (Author)
Check this: googlemap.ubot
My problem is scraping the address and company name inside the socket compartment.
UBotBuddy 331 Posted March 24, 2011
Kreatus, I think the problem is that "choose by attribute" is not working reliably. I am hoping for a fix for this soon. I will try to get an update for you.
JohnB 255 Posted March 24, 2011
Ahhh... no problem. Let me work something up for you.
John
Kreatus (Ubot Ninja) 422 Posted March 24, 2011 (Author)
UBotBuddy said: "Kreatus, I think the problem is that "choose by attribute" is not working reliably. I am hoping for a fix for this soon. I will try to get an update for you."
Yes, choose by attribute really is not working properly inside the socket. But regex is an alternative to use while waiting for the enhancement.
JohnB said: "Ahhh... no problem. Let me work something up for you. John"
Thanks John! Looking forward to it.
JohnB 255 Posted March 24, 2011
Wow, that's amazing. Clearly regex works in this bot. HOWEVER, it will not work for the title, and I think it may have something to do with the fact that the attribute only includes the simple span tags (it does not include <SPAN class=pp-place-title>). You aren't using the attributes in any of the other variables, which is why they work: you have specific text you are searching for with those. I can't even grab it by position. I'll keep on trying.
John
Kreatus (Ubot Ninja) 422 Posted March 24, 2011 (Author)
JohnB said: "Wow, that's amazing. Clearly regex works in this bot. HOWEVER, it will not work for the title, and I think it may have something to do with the fact that the attribute only includes the simple span tags (it does not include <SPAN class=pp-place-title>). You aren't using the attributes in any of the other variables, which is why they work: you have specific text you are searching for with those. I can't even grab it by position. I'll keep on trying. John"
Hi John, I put specific text in that one just to test whether it would grab the same exact text, but it didn't. I'm still trying too.
Kreatus (Ubot Ninja) 422 Posted March 24, 2011 (Author)
I found a way to get the company name. My only problem now is the address.
googlemap.ubot
Abs* 12 Posted March 29, 2011
Hi Kreatus,
I haven't got a clue how you managed to find the <title>*</title> to scrape the listing name. It would be great if you could share how you found this. I must have spent more than 30 minutes trying all sorts of variations without joy. The address, of course, was another area I couldn't get past.
There is a workaround, but I'm sure many will not like it, as you need to keep writing to the browser. I've amended your bot and added a scrape page command outside of the socket compartment, so that it works after a write to browser. I also show how this can be used to run through lists: I have added a small block of text which is line separated. You can place your gmap URLs inside it and the software will go through up to 5. Of course, you can change this.
It looks like we are very limited in what we can do inside sockets, but using write to browser and then scraping outside of the sockets works a treat. After the scraping has been done, you can continue to use sockets. You may have already thought of this, but I thought I would make notes just in case.
Also, the regex for the telephone isn't working for a few UK-based listings. Try the following to see what I mean:
http://maps.google.co.uk/maps/place?client=firefox-a&rls=org.mozilla:en-GB:official&channel=s&hl=en&biw=1280&bih=746&um=1&ie=UTF-8&q=solicitors+peterborough&fb=1&gl=uk&hq=solicitors&hnear=Peterborough&cid=15462276534218036938&ei=hsGRTeHTC5yqhAeMwuSYDw&sa=X&oi=local_result&ct=placepage-link&resnum=5&ved=0CFQQ4gkwBA
http://maps.google.co.uk/maps/place?client=firefox-a&rls=org.mozilla:en-GB:official&channel=s&hl=en&biw=1280&bih=746&um=1&ie=UTF-8&q=solicitors+peterborough&fb=1&gl=uk&hq=solicitors&hnear=Peterborough&cid=14080472938216458285&ei=hsGRTeHTC5yqhAeMwuSYDw&sa=X&oi=local_result&ct=placepage-link&resnum=2&ved=0CC8Q4gkwAQ
Bot attached: Gmaps.ubot
Thanks
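For anyone following along without the attached bots, the two ideas in this post (reading the listing name from the page's title tag, and walking a line-separated block of URLs with a cap of 5) can be sketched roughly as below. This is Python, not UBot script, and it is only an illustration: `page_title` and `first_urls` are hypothetical helper names, and the exact contents of the title on a real place page are not shown in this thread, so the result may need further trimming.

```python
import re

# Capture whatever sits between <title> and </title>.
# Assumption: the page has a single title element with no nested markup.
TITLE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_title(html):
    """Return the stripped text of the first title element, or None."""
    match = TITLE.search(html)
    return match.group(1).strip() if match else None


def first_urls(block, limit=5):
    """Split a line-separated block into URLs, skipping blank lines,
    and keep at most `limit` of them (5 mirrors the cap described above)."""
    urls = [line.strip() for line in block.splitlines() if line.strip()]
    return urls[:limit]


# Made-up HTML and placeholder URLs, for demonstration only.
html = "<html><head><TITLE> Example Listing </TITLE></head><body></body></html>"
print(page_title(html))

block = "http://example.com/a\n\nhttp://example.com/b\n"
print(first_urls(block))
```

Fetching each page is left out on purpose; in the bot, that part is handled by the socket and write to browser commands described above.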