How to isolate text within codes

Kreatus (Ubot Ninja) · March 24, 2011

Hi guys, I need help with this. How can I isolate the company name in these samples:

<SPAN class=pp-place-title><SPAN>Company Name</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company XYZ Inc,</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company Name - Inc</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company Name, ABC</SPAN></SPAN>

Thanks in advance

JohnB · March 24, 2011

Try this:

parse.ubot

John

Kreatus (Ubot Ninja) · March 24, 2011

Thanks John, but not like that. I need it in regex.

How do I match only the company names using regex?

JohnB · March 24, 2011

lol...ok...give me a few (I somehow missed the sub forum!)

JohnB · March 24, 2011

Kreatus, what function are you looking to use? The reason I ask is that the process will be the same with or without regex. (You will most likely still need to use $replace or $replace regular expression.) And since the text within the tags is constant, it will simply be a literal match in regex.
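To illustrate the "literal match" idea, here is a minimal sketch in Python rather than UBot, using the four sample lines from the first post. It assumes the tags always appear exactly as in those samples; the names `TITLE_RE` and `extract_company_names` are made up for this example and are not part of any bot in this thread.

```python
import re

# Sample lines copied from the first post in this thread.
samples = [
    "<SPAN class=pp-place-title><SPAN>Company Name</SPAN></SPAN>",
    "<SPAN class=pp-place-title><SPAN>Company XYZ Inc,</SPAN></SPAN>",
    "<SPAN class=pp-place-title><SPAN>Company Name - Inc</SPAN></SPAN>",
    "<SPAN class=pp-place-title><SPAN>Company Name, ABC</SPAN></SPAN>",
]

# Literal opening tags, a lazy capture group for the name, literal closing tags.
# IGNORECASE is an assumption: it lets the same pattern match lowercase <span> tags.
TITLE_RE = re.compile(
    r"<SPAN class=pp-place-title><SPAN>(.*?)</SPAN></SPAN>",
    re.IGNORECASE,
)


def extract_company_names(html):
    """Return every captured company name found in the given HTML text."""
    return [name.strip() for name in TITLE_RE.findall(html)]


names = extract_company_names("\n".join(samples))
print(names)
```

The capture group `(.*?)` is lazy, so it stops at the first closing `</SPAN></SPAN>` instead of running on to a later one on the same page. The same pattern should carry over to any regex function that supports capture groups or lookarounds, but that part is untested here.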

John

Kreatus (Ubot Ninja) · March 24, 2011

I want to scrape the company title, which may vary between pages.

Here are a couple of samples:

http://maps.google.com/maps/place?cid=7378256458188814955&q=beer&hl=en&ved=0CF8Q-QswAA&sa=X&ei=9HaLTauDA5TyvgPCufi0CQ

I am scraping it inside the socket command, and regex is my only hope of getting it: I have made several attempts to scrape it, with no luck. Outside the socket command I can scrape this easily, but inside the socket I can't. Regex will surely work for this, I just don't know the right pattern.

Kreatus (Ubot Ninja) · March 24, 2011

Check this googlemap.ubot. My problem is scraping the address and company name inside the socket compartment.

UBotBuddy · March 24, 2011

Kreatus,

The problem, I think, is that "choose by attribute" is not working reliably. I am hoping for a fix for this soon.

I will try to get an update for you.

JohnB · March 24, 2011

Ahhh...no problem...Let me work something up for you.

John

Kreatus (Ubot Ninja) · March 24, 2011

Yes, choose by attribute is really not working properly inside the socket. But regex is an alternative to use while waiting for the enhancement.

Thanks John! Looking forward..

JohnB · March 24, 2011

Wow, that's amazing...clearly regex works in this bot. HOWEVER, it will not work for the title, and I think it may have something to do with the fact that the attribute only includes the simple span tags (it does not include <SPAN class=pp-place-title>). You aren't using the attributes in any of the other variables, which is why they work. You have specific text you are searching for with those.

I can't even grab it by position.

I'll keep on trying.

John

Kreatus (Ubot Ninja) · March 24, 2011

Hi John, I put specific text there just to test whether it would grab that exact text, but it didn't.

Still trying too.

Kreatus (Ubot Ninja) · March 24, 2011

I found a way to get the company name. My only problem now is the address.

googlemap.ubot

Abs* · March 29, 2011

Hi Kreatus

I haven't got a clue how you managed to find the <title>*</title> to scrape the listing name. It would be great if you could share how you found this. I must have spent more than 30 minutes trying all sorts of variations without joy.
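For anyone following along, the <title> approach can be sketched roughly like this in Python. This is only a guess at the idea, not the pattern used in googlemap.ubot; the sample HTML and the function name `page_title` are made up for illustration.

```python
import re

# Capture whatever sits between the opening and closing title tags.
# DOTALL lets the title span several lines; IGNORECASE handles <TITLE>.
TITLE_TAG_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_title(html):
    """Return the stripped text of the first <title> element, or None."""
    match = TITLE_TAG_RE.search(html)
    return match.group(1).strip() if match else None


# Hypothetical page source; a real place page will look different.
sample_html = (
    "<html><head><TITLE>Company XYZ Inc - Google Maps</TITLE></head>"
    "<body></body></html>"
)

print(page_title(sample_html))
```

If the page title carries extra text besides the listing name (as the made-up sample does), that part would still need to be trimmed with a second replace.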

The address, of course, was another area I couldn't get past.

There is a workaround, but I'm sure many will not like it, as you need to keep writing to the browser.

I've amended your bot and added a scrape page command outside of the socket compartment so that it works after a write to browser. I also show how this can be used to run through lists: I have added a small block of text which is line separated. You can place your gmap URLs inside it and the software will go through up to 5. Of course you can change this.
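The list handling described above boils down to something like the following Python sketch. It is not taken from the attached bot: the function name `first_urls`, the `limit` parameter, and the example.com URLs are placeholders chosen for this illustration.

```python
def first_urls(block_text, limit=5):
    """Split a line-separated block into URLs and keep at most `limit` of them."""
    # Skip blank lines and trim stray whitespace around each entry.
    urls = [line.strip() for line in block_text.splitlines() if line.strip()]
    return urls[:limit]


# Placeholder block of text; in practice this would hold the gmap URLs.
block = """
http://example.com/place?cid=1
http://example.com/place?cid=2

http://example.com/place?cid=3
http://example.com/place?cid=4
http://example.com/place?cid=5
http://example.com/place?cid=6
"""

for url in first_urls(block):
    # Each URL would be loaded, written to the browser, and scraped here.
    print(url)
```

Changing `limit` changes how many URLs are processed per run.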

It looks like we are very limited in what we can do inside of sockets. But using write to browser and then scraping outside of the sockets will work a treat. After the scraping has been done, you can continue to use sockets.

You may have already thought of this, but I thought I would make notes just in case. Also, the regex for the telephone isn't working for a few UK-based listings. Try the following to see what I mean:

http://maps.google.co.uk/maps/place?client=firefox-a&rls=org.mozilla:en-GB:official&channel=s&hl=en&biw=1280&bih=746&um=1&ie=UTF-8&q=solicitors+peterborough&fb=1&gl=uk&hq=solicitors&hnear=Peterborough&cid=15462276534218036938&ei=hsGRTeHTC5yqhAeMwuSYDw&sa=X&oi=local_result&ct=placepage-link&resnum=5&ved=0CFQQ4gkwBA

http://maps.google.co.uk/maps/place?client=firefox-a&rls=org.mozilla:en-GB:official&channel=s&hl=en&biw=1280&bih=746&um=1&ie=UTF-8&q=solicitors+peterborough&fb=1&gl=uk&hq=solicitors&hnear=Peterborough&cid=14080472938216458285&ei=hsGRTeHTC5yqhAeMwuSYDw&sa=X&oi=local_result&ct=placepage-link&resnum=2&ved=0CC8Q4gkwAQ

Bot attached

thanks

Gmaps.ubot
