UBot Underground


Hello Guys,

I'm not a regex expert (yet), and I've been searching for a way to solve this problem.

I have a page to scrape that contains text like this:

"the Level0 means..."

<td>Invitation-Level0</td>

<td>Level0</td>

<td>Level1</td>

<td>Level0-Gold</td>

 

There is a lot of data like the above.

I want to grab only the Level0 that sits alone inside a "<td>" cell (filtering out the Invitation and Gold variants) so I can count how many times Level0 is displayed.

 

So far I've used the pattern "\>Level0\<", but I can't use it to save the scraped attribute because it returns the string ">Level0<". Hopefully there's a pattern that grabs "Level0" only.
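
One possible approach is to use lookaround assertions, which check the characters around the match without including them in the result. The sketch below is in Python rather than UBot script, and it assumes the regex engine in use supports lookbehind and lookahead; the sample text is copied from the post above.

```python
import re

# Sample text copied from the post above.
html = """the Level0 means...
<td>Invitation-Level0</td>
<td>Level0</td>
<td>Level1</td>
<td>Level0-Gold</td>"""

# (?<=>) requires a ">" immediately before the match.
# (?=<)  requires a "<" immediately after the match.
# Neither character becomes part of the matched text.
pattern = re.compile(r"(?<=>)Level0(?=<)")

matches = pattern.findall(html)
print(matches)  # ['Level0']
```

Only the cell that contains exactly Level0 matches: "the Level0 means" has a space before it, "Invitation-Level0" has a hyphen before it, and "Level0-Gold" has a hyphen after it.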

 

Any ideas are appreciated.

 

Thanks


@Kreatus your idea to count the keyword is great, but my problem is filtering the keyword.

For example:

"the Level0 means..."

<td>Invitation-Level0</td>

<td>Level0</td>

<td>Level1</td>

<td>Level0-Gold1</td>

<td>Level0-Gold2</td>

<td>Level0-Gold3</td>

<td>Level0</td>

<td>Level0-Gold4</td>

<td>Level0-Gold5</td>

<td>Level0-Gold6</td>

 

Now I want to get only the keyword that appears as plain Level0 in a table cell, which means filtering out "Invitation-Level0", "Level0-Gold1", etc.

So if I scrape this text, I should get "Level0" exactly two times.

I hope this makes my problem clear.
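
The filtering and counting described here can be sketched with a capture group, which works even where lookbehind is not available. This is a minimal Python illustration and not UBot script; the optional `\s*` around the keyword is an assumption in case the real cells contain stray whitespace.

```python
import re

# Sample text copied from the example in this post.
cells = """the Level0 means...
<td>Invitation-Level0</td>
<td>Level0</td>
<td>Level1</td>
<td>Level0-Gold1</td>
<td>Level0-Gold2</td>
<td>Level0-Gold3</td>
<td>Level0</td>
<td>Level0-Gold4</td>
<td>Level0-Gold5</td>
<td>Level0-Gold6</td>"""

# Match the whole cell, but capture only the keyword in group 1.
# findall returns the captured group, so the tags are not in the result.
exact = re.findall(r"<td>\s*(Level0)\s*</td>", cells)

count = len(exact)
print(exact, count)  # ['Level0', 'Level0'] 2
```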


I'm facing another, similar scenario. This is the URL of the page to scrape:

http://www.ip-adress.com/proxy_list/?k=time&d=desc

 

If I scrape with the pattern "Elite", the IP from the link (which is 119.57.7.118:80:Elite) is also selected.

If I use the pattern [^:]Elite, I get the list of keywords I targeted, but each match comes back as ">Elite".
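
The extra ">" appears because a character class such as [^:] consumes the character it tests, so that character becomes part of the match. A lookaround or a capture group avoids this. The Python sketch below uses a hypothetical row; the real page's markup is not shown in this thread, so the `<td>` and `href` layout is an assumption.

```python
import re

# Hypothetical markup: a link whose target ends in ":Elite", then a type cell.
row = '<td><a href="119.57.7.118:80:Elite">119.57.7.118:80</a></td><td>Elite</td>'

# [^:] matches and consumes one character, so it ends up in the result.
consuming = re.findall(r"[^:]Elite", row)

# Lookarounds only test the neighbouring characters without consuming them.
lookaround = re.findall(r"(?<=>)Elite(?=<)", row)

# Alternative: keep [^:] but capture only the keyword in a group.
grouped = re.findall(r"[^:](Elite)", row)

print(consuming)   # ['>Elite']
print(lookaround)  # ['Elite']
print(grouped)     # ['Elite']
```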

 

Basically, I would like to scrape each proxy's IP, type, and country.

I hope there's a solution for the Standard Edition. I saw the tutorial where the Professional Edition uses "choose ancestor" from a cell. Someday I might get the Pro Edition, but for now I have to limit myself to the Standard Edition.


@zap, I got your idea, nice. But there's a small problem: some IPs have an unknown country, so the page scrape for country returns a shorter list than the one for IPs.
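
One way to keep the IP, type, and country lined up is to match a whole table row at a time instead of scraping each column into a separate list. The Python sketch below is only an illustration: the row markup, the 192.0.2.x addresses, and the country names are hypothetical, and it assumes an unknown country shows up as an empty cell. Whether the same pattern can be applied in the Standard Edition is not confirmed here.

```python
import re

# Hypothetical table markup; the second row has an empty country cell.
table = """
<tr><td>119.57.7.118:80</td><td>Elite</td><td>CountryA</td></tr>
<tr><td>192.0.2.10:8080</td><td>Anonymous</td><td></td></tr>
<tr><td>192.0.2.20:3128</td><td>Elite</td><td>CountryB</td></tr>
"""

# One match per row with three capture groups.
# [^<]* also matches an empty cell, so every row yields all three fields.
row_pattern = re.compile(
    r"<tr><td>([^<]*)</td><td>([^<]*)</td><td>([^<]*)</td></tr>"
)

# Replace an empty country with a placeholder so the columns stay aligned.
rows = [
    (ip, kind, country or "unknown")
    for ip, kind, country in row_pattern.findall(table)
]

for ip, kind, country in rows:
    print(ip, kind, country)
```

Because each tuple comes from a single row, a missing country cannot shift the later entries out of step with their IPs.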

