UBot Underground

Removing Special Character from Scraped Data



Hey guys,

 

I'm having trouble scraping the site name in Google Analytics. I managed to remove the [DEFAULT], but when the data is saved to a file, I see a weird character at the end, followed by a space.

 

What I'm using is this: \w.+\s

 

Any help would be appreciated.

 

Thanks!

  • 5 months later...

This thread is a little old, but I think there may be a few people learning regex who might benefit from this update.

 

Getting rid of non-ASCII characters can be a real pain, especially if you are trying to do it with the non-whitespace or non-word character classes.

Both of these work:


[^ \t-~]

My favorite is this one, which uses a hexadecimal range:

[\x80-\xFF]
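
For anyone who wants to try both patterns outside UBot, here is a rough Python sketch. The sample string and the strip_special helper are made up for illustration (they are not from the original post), and the assumption is that the stray character sits in the 0x80-0xFF range, like a no-break space.

```python
import re

# Pattern 1 from the post: anything outside the tab-to-tilde range.
NOT_PLAIN_ASCII = re.compile(r"[^ \t-~]")
# Pattern 2 from the post: code points 0x80 through 0xFF.
HIGH_RANGE = re.compile(r"[\x80-\xFF]")


def strip_special(text, pattern=HIGH_RANGE):
    """Hypothetical helper: delete every match, then trim edge whitespace."""
    return pattern.sub("", text).strip()


# Made-up scraped value: a name, a stray accented character,
# a no-break space, and a trailing regular space.
scraped = "My Site Name\u00c2\u00a0 "

print(strip_special(scraped))
print(strip_special(scraped, NOT_PLAIN_ASCII))
```

One difference to keep in mind: in a Unicode-aware engine such as Python's, the hex range stops at 0xFF, so something like a curly quote (U+2019) would survive it, while the negated class removes that too.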
 

Buddy
