UBot Underground

Need Help With Regex


Solved by HelloInsomnia

Hey guys, I can't figure out how to solve this problem I'm having.

I have a list of hundreds of URLs.

Example:

http://dailycaller.com/2017/11/02/trump-pick-for-top-agriculture-post-withdraws-name-following-russia-probe-revelations/
https://www.newsmax.com/politics/jeff-sessions-vladimir-putin-court-filings-donald-trump/2017/11/02/id/823762
https://www.wthr.com/article/ship-to-attempt-raising-russian-chopper-wreckage-in-arctic
http://www.motherjones.com/kevin-drum/2017/11/a-little-bit-of-pushback-on-the-jeff-sessions-story/
https://pjmedia.com/trending/nunes-dems-suddenly-interested-viewing-doj-dossier-docs-didnt-want-subpoenaed/
https://boingboing.net/2017/11/02/kgb-killed-jfk-celebrity-weig.html
https://www.rt.com/business/408536-rosneft-iran-energy-investments/
https://jingtravel.com/russia-mulls-easing-visa-requirements-as-chinese-tourist-numbers-grow/
http://forward.com/fast-forward/386807/billionaire-trump-backer-robert-mercer-sells-breitbart-stake-over-racism-cl/
https://finance.yahoo.com/news/hillary-clinton-defends-her-campaign-120530439.html
 
I want to scan them for foreign links and remove those URLs from the list.

Example:

.co.uk/
.in/
.ru/
  • Solution

You can use something like this: http(|s)\:\/\/(|www\.)(|[a-zA-Z0-9-]+\.)[a-zA-Z0-9-]+\.(com|net|org).*

 

Depending on where the links are, the code may need to be modified, but if it's just a list like that, it should work. You can add more TLDs near the end in the same format, like (com|net|org|us|ca) and so on.
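If it helps to see what each piece of that pattern does, here is a rough Python sketch (not UBot code) with the same idea written out in verbose mode. The names `build_pattern` and `ALLOWED_TLDS` are made up for this illustration, and the pattern is spelled slightly differently (`https?` instead of `http(|s)`) but is meant to match the same lines. Adding an entry to the tuple is the equivalent of extending (com|net|org).

```python
import re

# Hypothetical whitelist; add "us", "ca", etc. to keep more TLDs.
ALLOWED_TLDS = ("com", "net", "org")


def build_pattern(tlds):
    """Build a whitelist regex that mirrors the one in the post."""
    tld_group = "|".join(re.escape(t) for t in tlds)
    return re.compile(
        rf"""
        https?://                # http:// or https://
        (?:www\.)?               # optional www.
        (?:[a-zA-Z0-9-]+\.)?     # optional single subdomain
        [a-zA-Z0-9-]+            # the domain name itself
        \.(?:{tld_group})        # a dot plus one of the allowed TLDs
        .*                       # the rest of the line (path, query, ...)
        """,
        re.VERBOSE,
    )


# A few URLs taken from the thread, one per line.
urls = """https://www.rt.com/business/408536-rosneft-iran-energy-investments/
http://www.motherjones.ru/kevin-drum/2017/11/a-little-bit-of-pushback-on-the-jeff-sessions-story/
https://boingboing.net/2017/11/02/kgb-killed-jfk-celebrity-weig.html
https://jingtravel.co.uk/russia-mulls-easing-visa-requirements-as-chinese-tourist-numbers-grow/"""

# Only lines that match the whitelist are kept.
kept = [m.group(0) for m in build_pattern(ALLOWED_TLDS).finditer(urls)]
for url in kept:
    print(url)
```

With this sample, only the rt.com and boingboing.net lines are kept; the .ru and .co.uk lines don't match and are dropped.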

 

Here is some example code:

set(#links,"http://www.stuff.dailycaller.com/2017/11/02/trump-pick-for-top-agriculture-post-withdraws-name-following-russia-probe-revelations/
https://www.newsmax.com/politics/jeff-sessions-vladimir-putin-court-filings-donald-trump/2017/11/02/id/823762
https://www.wthr.com/article/ship-to-attempt-raising-russian-chopper-wreckage-in-arctic
http://www.motherjones.com/kevin-drum/2017/11/a-little-bit-of-pushback-on-the-jeff-sessions-story/
http://www.motherjones.ru/kevin-drum/2017/11/a-little-bit-of-pushback-on-the-jeff-sessions-story/
https://pjmedia.com/trending/nunes-dems-suddenly-interested-viewing-doj-dossier-docs-didnt-want-subpoenaed/
https://boingboing.net/2017/11/02/kgb-killed-jfk-celebrity-weig.html
https://www.rt.com/business/408536-rosneft-iran-energy-investments/
https://jingtravel.com/russia-mulls-easing-visa-requirements-as-chinese-tourist-numbers-grow/
https://jingtravel.co.uk/russia-mulls-easing-visa-requirements-as-chinese-tourist-numbers-grow/
http://forward.com/fast-forward/386807/billionaire-trump-backer-robert-mercer-sells-breitbart-stake-over-racism-cl/
https://finance.yahoo.com/news/hillary-clinton-defends-her-campaign-120530439.html","Global")
clear list(%links)
add list to list(%links,$find regular expression(#links,"http(|s)\\:\\/\\/(|www\\.)(|[a-zA-Z0-9-]+\\.)[a-zA-Z0-9-]+\\.(com|net|org).*"),"Delete","Global")
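One thing to watch with a pattern like this: it doesn't check what comes right after the TLD, so a host such as example.com.ru (a made-up example, not from the list above) would still match because of the ".com" in the middle. If that ever becomes a problem, a possible alternative is to parse out the hostname and compare only its last label. Below is a minimal Python sketch of that idea, not UBot code; `has_allowed_tld`, `filter_links` and `ALLOWED_TLDS` are hypothetical names, and it assumes every line is a full URL with a scheme.

```python
from urllib.parse import urlsplit

# Assumption: the same whitelist as the regex in the post.
ALLOWED_TLDS = {"com", "net", "org"}


def has_allowed_tld(url, allowed=ALLOWED_TLDS):
    """Return True if the last label of the URL's hostname is whitelisted."""
    host = urlsplit(url).hostname or ""  # hostname is None without a scheme
    return host.rsplit(".", 1)[-1].lower() in allowed


def filter_links(text, allowed=ALLOWED_TLDS):
    """Keep only the non-empty lines whose hostname ends in an allowed TLD."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and has_allowed_tld(line, allowed)]


sample = """https://pjmedia.com/trending/nunes-dems-suddenly-interested-viewing-doj-dossier-docs-didnt-want-subpoenaed/
http://www.motherjones.ru/kevin-drum/2017/11/a-little-bit-of-pushback-on-the-jeff-sessions-story/
https://jingtravel.co.uk/russia-mulls-easing-visa-requirements-as-chinese-tourist-numbers-grow/
https://finance.yahoo.com/news/hillary-clinton-defends-her-campaign-120530439.html"""

links = filter_links(sample)
for link in links:
    print(link)
```

Because only the final label of the hostname is compared, the number of subdomains doesn't matter, and a ".com" in the middle of a foreign hostname no longer counts as a match.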

Thanks! Works like a charm  :)

 

If I'd had to write that regex myself, it would have taken me 3 days to figure out lol.

 

Really appreciate it! 
