UBot Underground

Removing all spaces in a scraped block of html (help!)



I've got a bot that scrapes pretty happily and writes all the contents to a nicely formatted CSV, but there's a bug that breaks the output.

 

When the bot encounters a block of HTML that contains carriage returns, the scrape is captured across several lines, like this:

 

Example from this page:

 

        ipb.vars['highlight_color'] = "#ade57a";
        ipb.vars['charset']                = "UTF-8";
        ipb.vars['time_offset']            = "-7";
        ipb.vars['hour_format']            = "12";
        ipb.vars['seo_enabled']            = 1;
       

What I need to do is take the above and run some kind of regex (or similar) over it to get the following result:

 

ipb.vars['highlight_color'] = "#ade57a";ipb.vars['charset']                = "UTF-8"; ipb.vars['time_offset']            = "-7"; ipb.vars['hour_format']            = "12"; ipb.vars['seo_enabled']            = 1;

 

I.e. all the hard breaks get removed and the output ends up on one line.

 

Does anyone have any idea how to accomplish this?

 

My ubot can then return to duties.

 

Cheers.

 

       


Something like this?

set(#Content, "  ipb.vars[\'highlight_color\'] = \"#ade57a\";
        ipb.vars[\'charset\']                = \"UTF-8\";
        ipb.vars[\'time_offset\']            = \"-7\";
        ipb.vars[\'hour_format\']            = \"12\";
        ipb.vars[\'seo_enabled\']            = 1;", "Global")
set(#Pos, 0, "Global")
clear list(%List)
add list to list(%List, $list from text(#Content, "
"), "Delete", "Global")
set(#Output, "", "Global")
loop($list total(%List)) {
    set(#Output, "{#Output}{$list item(%List, #Pos)}", "Global")
    increment(#Pos)
}
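
If the cleanup can happen outside UBot (for example as a post-processing step on the CSV), the same idea can be sketched in Python. This is only an illustration, not UBot script: the function names `join_lines` and `join_lines_split` are made up for the example, and the single-space separator is an assumption based on the desired output in the first post. The first function uses a regex that swallows each line break together with the whitespace around it; the second mirrors the split-and-concatenate loop above.

```python
import re


def join_lines(text: str, sep: str = " ") -> str:
    """Collapse each run of whitespace that contains a line break into `sep`.

    Spaces inside a line (such as the alignment padding before '=') are kept,
    because only whitespace runs that include \r or \n are replaced.
    """
    collapsed = re.sub(r"\s*[\r\n]+\s*", sep, text)
    # Remove the indentation before the first line and any trailing separator.
    return collapsed.strip()


def join_lines_split(text: str, sep: str = " ") -> str:
    """Same result via split and join, mirroring the list loop above."""
    # splitlines() handles \n, \r\n and \r; blank lines are skipped.
    parts = [line.strip() for line in text.splitlines() if line.strip()]
    return sep.join(parts)


if __name__ == "__main__":
    scraped = (
        "        ipb.vars['highlight_color'] = \"#ade57a\";\r\n"
        "        ipb.vars['charset']                = \"UTF-8\";\r\n"
        "        ipb.vars['seo_enabled']            = 1;\r\n"
        "       \r\n"
    )
    print(join_lines(scraped))
```

Passing `sep=""` joins the statements with nothing in between, which is what the loop above does. The pattern itself should carry over to any regex replace that supports `\s`, though that is untested here.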
