Mrcrix, posted May 21, 2012:
I'm making a bot that sends spun articles to different websites. The problem is that some websites allow 3 links within the text, while others allow just 2 or 1. So I thought of making a function that takes the complete text with all the links as one parameter and the total number of links I need as another. For instance, if I set 1 link as the maximum and send a text with 3 links in total, I would expect as the result the same text with the last 2 links removed and replaced with just their link text. The problem is that I can't set up the regex to do it... any suggestions?
Legend, posted May 21, 2012:
You might want to just create 3 separate lists (1 for 3-link sites, 1 for 2-link sites and 1 for 1-link sites) and just loop through them individually...
Mrcrix (author), posted May 21, 2012:
Sorry, but I didn't get what you meant. I don't think having lists of the websites by number of links allowed would help. The original spun text always has the maximum number of links, let's say 3; what I need is just to replace the last X links and get the final text with the remaining links.

I tried the regexes <a.*>(.*)<\/a> and <a.*>(.*)<\/a>{2}, but they match from the first occurrence until the end of the last occurrence. I would need to limit the match to only the 3rd link, then the 2nd, and get as the result the same text with only the first link left intact.

For example, from this text:

Just a simple text with a <a href="http://www.link.com">Link</a> and after some words another <a href="http://www.otherlink.com">Another Link</a> and after more word here comes another <a href="http://www.thirdlink.com">3rd link</a>

I should get back:

Just a simple text with a <a href="http://www.link.com">Link</a> and after some words another Another Link and after more word here comes another 3rd link

It shouldn't be too difficult to do; the problem is my limited knowledge of regex syntax.

P.S.: Using <a href=(.*?)</a> as the regex matches each link, but I cannot get it to match only the 2nd or only the 3rd...
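The "first occurrence until the end of the last occurrence" behaviour described above is what a greedy `.*` does. A minimal Python sketch (not uBot script; the variable names `text`, `greedy` and `lazy` are just for illustration) comparing the greedy pattern from the post with a non-greedy variant on the same sample text:

```python
import re

# Sample text from the post, with three links.
text = (
    'Just a simple text with a <a href="http://www.link.com">Link</a> '
    'and after some words another <a href="http://www.otherlink.com">Another Link</a> '
    'and after more word here comes another <a href="http://www.thirdlink.com">3rd link</a>'
)

# Greedy: the first ".*" runs as far right as it can, so a single match
# spans from the first "<a" to the final "</a>".
greedy = re.findall(r'<a.*>(.*)</a>', text)

# Non-greedy: "[^>]*" stays inside one opening tag and "(.*?)" stops at
# the nearest "</a>", so each link is matched separately.
lazy = re.findall(r'<a\b[^>]*>(.*?)</a>', text)

print(greedy)  # one match covering all three links
print(lazy)    # one entry per link
```

Matching each link separately still does not select "only the 2nd" or "only the 3rd" by itself; that usually takes a counter around the matches rather than a cleverer pattern.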
JohnB, posted May 22, 2012:
I think the bigger question is how you know which sites allow what number of links. I am assuming you are talking about links in the resource box, but I am not sure. If you are talking about the body of the article, I would probably set three variables: #bodywith3links, #bodywith2links and #bodywith1link. That would be just as easy as trying to do a replace using regex, especially if you are responding to an error message on the submission page (e.g. "Error: your article has too many links").
John
Mrcrix (author), posted May 22, 2012:
No, I was talking about the links in the body of the article. Knowing how many links each site allows is not the problem, because I'm making a different "define" for each website to send the content, and in each one I will state explicitly how many links I want in the body text.

Anyway, I solved the problem; I now have a function that replaces the X unwanted links. The solution I found is to find each link in the article and build a list of every complete link (e.g. <a href="http://url.com">MYLINK1</a>), then build another list with the words used for each link (e.g. MYLINK1). A loop then only needs to replace each complete link with its word. Done that way it would replace all the links, so inside the loop I've put an IF comparison: if the number of links I want is >= the loop counter, it does nothing; ELSE it replaces that complete link with only its word. When the loop is finished I return the result.

I'm sure it could be optimized further, but the important thing for now is that it works. The only problem I still have to fix is that it removes links starting from the first rather than the last, but for me that's OK anyway. Thank you for the suggestions!

Here's the function.
#ArticoloDaLimitare: the complete article body with all links.
#NumLinksVoluti: the total number of links that will be present at the end.
define $LimitaLinksArticolo(#ArticoloDaLimitare, #NumLinksVoluti) {
    decrement(#NumLinksVoluti)
    set(#ListaLinkTrovai, $find regular expression(#ArticoloDaLimitare, "<a href=(.*?)</a>"), "Global")
    set(#ListaEspressa, $replace(#ListaLinkTrovai, "</a>", "</a>,"), "Global")
    set(#ListaEspressa, $replace regular expression(#ListaEspressa, "\\n", ""), "Global")
    add list to list(%LinkList, $list from text(#ListaEspressa, ","), "Delete", "Global")
    set(#ListaLinkTrovai, $replace regular expression(#ListaLinkTrovai, "<a href=(.*?)>", ""), "Global")
    set(#ListaLinkTrovai, $replace regular expression(#ListaLinkTrovai, "\\n", ""), "Global")
    set(#ListaLinkTrovai, "{$replace(#ListaLinkTrovai, "</a>", ",")}END", "Global")
    set(#ListaLinkTrovai, $replace(#ListaLinkTrovai, ",END", ""), "Global")
    add list to list(%LinkTrovatiArticolo, $list from text(#ListaLinkTrovai, ","), "Delete", "Global")
    set list position(%LinkTrovatiArticolo, 0)
    set list position(%LinkList, 0)
    set(#NumeroLoop, 0, "Global")
    set(#var, #ArticoloDaLimitare, "Global")
    loop($list total(%LinkTrovatiArticolo)) {
        if($comparison(#NumLinksVoluti, ">=", #NumeroLoop)) {
            then {
            }
            else {
                set(#var, $replace regular expression(#var, $next list item(%LinkList), $next list item(%LinkTrovatiArticolo)), "Global")
            }
        }
        increment(#NumeroLoop)
    }
    return(#var)
}
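For comparison, here is a minimal Python sketch of the same idea that keeps the first links and strips the later ones, which is the ordering asked for at the start of the thread. It is not a translation of the uBot function: `LINK_RE`, `limit_links` and `keep_or_strip` are hypothetical names, and the pattern assumes simple, non-nested `<a ...>text</a>` links. One possible reading of the first-versus-last issue in the function above is that the list cursors only advance inside the else branch, so the replacements begin at the first list item; that is a guess from the code, not something confirmed in the thread.

```python
import re

# Assumption: links are plain, non-nested <a ...>text</a> elements.
LINK_RE = re.compile(r'<a\b[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)

def limit_links(article: str, max_links: int) -> str:
    """Keep the first max_links links; replace later ones with their text."""
    seen = 0

    def keep_or_strip(match: re.Match) -> str:
        nonlocal seen
        seen += 1
        # Links within the allowance are returned unchanged; the rest are
        # reduced to the text between the opening and closing tags.
        return match.group(0) if seen <= max_links else match.group(1)

    return LINK_RE.sub(keep_or_strip, article)

sample = (
    'Just a simple text with a <a href="http://www.link.com">Link</a> '
    'and after some words another <a href="http://www.otherlink.com">Another Link</a> '
    'and after more word here comes another <a href="http://www.thirdlink.com">3rd link</a>'
)
print(limit_links(sample, 1))
```

Counting matches inside the replacement callback avoids building two parallel lists, and because each link is replaced at its matched position rather than used as a search pattern, special characters in the URLs (such as `.` or `?`) are not interpreted as regex syntax.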