Regex to exclude/filter string

veeco · May 25, 2011

Hello guys,

I'm not an expert in regex (yet), and I've been searching for a way to solve this problem.

I have a page to scrape that contains this text:

"the Level0 means..."

<td>Invitation-Level0</td>

<td>Level0</td>

<td>Level1</td>

<td>Level0-Gold</td>

It has lots of data like the above.

I want to grab only the Level0 that sits alone inside a "<td>" cell (filtering out the Invitation and Gold variants) so I can count how many times Level0 is displayed.

So far I've been using the pattern "\>Level0\<", but I can't use it to save the scraped attribute because it returns the string ">Level0<". Hopefully there's a pattern that grabs just "Level0".

Any ideas are appreciated.

Thanks
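The usual way to keep the ">" and "<" out of the result is a lookbehind/lookahead pair, or a capture group around just the word. Below is a minimal sketch in Python's re module, assuming the sample cells from the post above. UBot's own regex engine is not shown in this thread, so whether it supports lookarounds the same way is an assumption to check.

```python
import re

# Sample text copied from the post above.
html = """
"the Level0 means..."
<td>Invitation-Level0</td>
<td>Level0</td>
<td>Level1</td>
<td>Level0-Gold</td>
"""

# (?<=>) and (?=<) require the > and < around the word
# but do not include them in the match.
lookaround = re.findall(r"(?<=>)Level0(?=<)", html)

# Alternative: match the delimiters, but capture only the inner text.
captured = re.findall(r">(Level0)<", html)

print(lookaround)  # ['Level0']
print(captured)    # ['Level0']
```

With a capture group, findall returns only the group's text, so both approaches yield "Level0" without the brackets.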

Kreatus (Ubot Ninja) · May 25, 2011

Hi, this is not regex, but it's how I do it when I want the bot to count the keywords on a page.

Check this file: count keywords.ubot

Note: I actually got this idea from botbuddy.

Kreatus (Ubot Ninja) · May 25, 2011

With regex you can use the pattern "\w+0" or the literal "Level0".

Here is the sample: count keywords.ubot
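Both suggested patterns count every occurrence of the word, not just the cells that contain nothing else. The following minimal Python sketch, using the sample text from the first post, illustrates this. The hyphen is not a word character, so "\w+0" still matches the "Level0" part of "Invitation-Level0" and "Level0-Gold".

```python
import re

cells = """
"the Level0 means..."
<td>Invitation-Level0</td>
<td>Level0</td>
<td>Level1</td>
<td>Level0-Gold</td>
"""

# A literal pattern counts every occurrence, wherever it appears.
literal = re.findall(r"Level0", cells)

# \w+0 matches a run of word characters ending in 0; "-" breaks the run,
# so the unwanted variants still contribute a plain "Level0" match.
word_zero = re.findall(r"\w+0", cells)

print(len(literal))  # 4
print(word_zero)     # ['Level0', 'Level0', 'Level0', 'Level0']
```

This is the counting behaviour, and it is why the filtering question comes up next.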

UBotBuddy · May 25, 2011

@Kreatus

Interesting implementation.

veeco · May 25, 2011

@Kreatus your idea for counting the keyword is great, but my problem is filtering it.

For example:

"the Level0 means..."

<td>Invitation-Level0</td>

<td>Level0</td>

<td>Level1</td>

<td>Level0-Gold1</td>

<td>Level0-Gold2</td>

<td>Level0-Gold3</td>

<td>Level0</td>

<td>Level0-Gold4</td>

<td>Level0-Gold5</td>

<td>Level0-Gold6</td>

Now I want to get only the table cells whose entire content is Level0, which means filtering out "Invitation-Level0", "Level0-Gold1", etc.

So if I scrape this text, I should get "Level0" only two times.

I hope this clarifies my problem.
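To count only cells that contain exactly "Level0", anchor the word to the opening and closing tags on both sides. Here's a minimal Python sketch against the sample above, assuming the cells are written exactly as "<td>Level0</td>". The second pattern is a hedged variant for cells that might carry attributes or padding. Python only allows fixed-width lookbehinds, so it uses a capture group instead.

```python
import re

page = """
"the Level0 means..."
<td>Invitation-Level0</td>
<td>Level0</td>
<td>Level1</td>
<td>Level0-Gold1</td>
<td>Level0-Gold2</td>
<td>Level0-Gold3</td>
<td>Level0</td>
<td>Level0-Gold4</td>
<td>Level0-Gold5</td>
<td>Level0-Gold6</td>
"""

# <td> must come immediately before and </td> immediately after;
# the lookarounds keep the tags out of the result.
exact = re.findall(r"(?<=<td>)Level0(?=</td>)", page)

# More forgiving: allows e.g. <td class="x"> Level0 </td>.
loose = re.findall(r"<td[^>]*>\s*(Level0)\s*</td>", page)

print(exact)       # ['Level0', 'Level0']
print(len(exact))  # 2
print(loose)       # ['Level0', 'Level0']
```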

veeco · May 25, 2011

I face another, similar scenario. This is the URL of the page to scrape:

http://www.ip-adress.com/proxy_list/?k=time&d=desc

If I scrape with the pattern "Elite", the IP from the link (which is 119.57.7.118:80:Elite) is also selected.

If I use the pattern [^:]Elite, I get the list of keywords I targeted, but each one comes back as ">Elite".

Basically, I would like to scrape each proxy's IP, type, and country.

I hope there's a solution for the Standard Edition. I saw a tutorial that uses the Professional Edition's "choose ancestor" on a cell. Someday I might get the Pro Edition, but for now I have to limit myself to the Standard Edition.
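The ">Elite" result happens because [^:] consumes one character. A lookbehind checks that character without including it. Below is a minimal Python sketch. The HTML is hypothetical, since the thread does not show the page's actual markup, and whether UBot Standard's regex accepts lookbehinds is an assumption to check.

```python
import re

# Hypothetical markup: the real page layout is not shown in the thread.
rows = """
<td><a href="/whois/119.57.7.118">119.57.7.118:80:Elite</a></td>
<td>Elite</td>
<td>Anonymous</td>
<td>Elite</td>
"""

# "Elite" alone also hits the link text ("...:80:Elite").
plain = re.findall(r"Elite", rows)

# [^:]Elite consumes the preceding character, so each result is ">Elite".
consumed = re.findall(r"[^:]Elite", rows)

# (?<=>) checks for ">" without consuming it; (?=<) requires the
# cell to end right after the word.
cells = re.findall(r"(?<=>)Elite(?=<)", rows)

print(len(plain))  # 3
print(consumed)    # ['>Elite', '>Elite']
print(cells)       # ['Elite', 'Elite']
```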

Pete · May 25, 2011

Quote: "Basically I would like to scrape each proxy's IP, type, and country."

Maybe this will help:

ProxieScraper.ubot

veeco · May 26, 2011

@zap, I got your idea, nice! But there's a small problem: some IPs have an unknown country, so the scraped country list ends up shorter than the IP list.
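One way to keep the columns aligned when a country is missing is to scrape each table row as a unit and split it into cells. This avoids scraping each column into its own list. The following is a minimal Python sketch under stated assumptions. The table markup and the column order (IP, type, country) are hypothetical, and the two non-original IPs are documentation-range placeholders. Whether this maps onto UBot Standard's commands isn't covered in the thread.

```python
import re

# Hypothetical table; the second row has an empty country cell.
table = """
<tr><td>119.57.7.118:80</td><td>Elite</td><td>China</td></tr>
<tr><td>203.0.113.5:8080</td><td>Anonymous</td><td></td></tr>
<tr><td>198.51.100.7:3128</td><td>Elite</td><td>Brazil</td></tr>
"""

ROW = re.compile(r"<tr>(.*?)</tr>", re.S)
CELL = re.compile(r"<td[^>]*>(.*?)</td>", re.S)

proxies = []
for row in ROW.findall(table):
    cells = [c.strip() for c in CELL.findall(row)]
    if len(cells) < 3:
        continue  # skip header or malformed rows
    ip, kind, country = cells[:3]
    # An empty country cell stays in its row instead of shortening a list.
    proxies.append((ip, kind, country or "unknown"))

for p in proxies:
    print(p)
```

Because each tuple comes from a single row, a blank country can't shift the other rows' values out of line.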

Frank · June 24, 2011

Guys, it can be so much easier: http://ubotstudio.com/forum/index.php?/topic/7162-using-regex-to-catch-text-between-sections/#entry36055

Frank

lazlink · August 8, 2011

That's great, thank you for sharing, dude.

I'm a newcomer here.
