UBot Underground


I'm having a really hard time with this one, so I'd like to ask for your help.

I have an HTML file that I need to clean up and sort out. The file looks like this:

<html>
<tag>ABC1</tag>
<tag>ABC2</tag>
<tag>ABC3</tag>
</html>

Now, I have sorted the text between the tags, so I have this list:
ABC1new
ABC2new
ABC3new

Now I want to replace the text between each pair of tags with the new text, and I have no idea how to achieve that.

Replace ABC1 with ABC1new, ABC2 with ABC2new, and so on.

 

THX
 


Wow, this looks great. Unfortunately, I forgot to mention that my HTML file is all on one line:

 

<html><tag>ABC1</tag><tag>ABC2</tag><tag>ABC3</tag></html>

 

Also, "ABC1" and "ABC2new" stand for whole sentences, so there is no actual "footprint" behind them (like the one you used: <tag>{#list item}new</tag>).

 

Still thanks a lot.


It's a bit shorter with regex and it goes like this:

set(#i, 1, "Global")
set(#HTML, "<html><tag>ABC1</tag><tag>ABC2</tag><tag>ABC3</tag></html>", "Global")
loop(3) {
    set(#HTML, $replace regular expression(#HTML, "(?<=<tag>)ABC{#i}(?=</tag>)", "ABC{#i}new"), "Global")
    increment(#i)
}
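
For readers who want to try the pattern outside UBot, here is a minimal sketch of the same loop in Python (not UBot script; the variable names are only illustrative). It assumes the one-line HTML from the post above and shows that the lookbehind `(?<=<tag>)` and lookahead `(?=</tag>)` require the tags to be there without making them part of the match, so only the text between them is replaced.

```python
import re

# The one-line HTML from the thread.
html = "<html><tag>ABC1</tag><tag>ABC2</tag><tag>ABC3</tag></html>"

for i in range(1, 4):
    # Lookbehind and lookahead anchor the match between the tags
    # but leave the tags themselves untouched.
    pattern = rf"(?<=<tag>)ABC{i}(?=</tag>)"
    html = re.sub(pattern, f"ABC{i}new", html)

print(html)
```

Like the UBot version, this only works while the old text follows a predictable pattern such as ABC1, ABC2, ABC3.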
  • 2 weeks later...

Guys, thanks a lot for your help, but (and this is entirely my fault, sorry) I didn't make it completely clear. ABC1, ABC2 and ABC3 are just illustrations of different texts. It is actually something like this:

 

<tag>sfhgydfghnyfg</tag>
<tag>xvxcvhdhf</tag>
<tag>xcvbdhydfh</tag>

 

 

And that is why I am still stuck.
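
When the old texts share no pattern, one option is to replace by position instead of by content: the first tag gets the first new text, the second tag the second, and so on. Below is a minimal sketch of that idea in Python (not UBot script). The function name `replace_tag_texts` and the replacement sentences are hypothetical, and it assumes the tags are not nested.

```python
import re

def replace_tag_texts(html, new_texts, tag="tag"):
    """Replace the text of the n-th <tag>...</tag> with new_texts[n]."""
    # Non-greedy match so each tag pair is handled separately,
    # even when the whole file is on one line.
    pattern = re.compile(rf"(<{tag}>)(.*?)(</{tag}>)", re.DOTALL)
    replacements = iter(new_texts)

    def swap(match):
        # If the list runs out, keep the original text unchanged.
        new_text = next(replacements, match.group(2))
        return match.group(1) + new_text + match.group(3)

    return pattern.sub(swap, html)

html = "<tag>sfhgydfghnyfg</tag>\n<tag>xvxcvhdhf</tag>\n<tag>xcvbdhydfh</tag>"
result = replace_tag_texts(
    html, ["first sentence", "second sentence", "third sentence"]
)
print(result)
```

Because the match never looks at the old text, it does not matter what the sentences contain, only that the list is in the same order as the tags.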


                    <h3>Search Results</h3>
                    <hr />
                        <h4>2012 Spring New Women's Jeans Washed Straight Jeans Cotton Patchwork Drawstring Waist Casual Jeans</h4>
                    </a>
                    <span class="Price_Info">
                        <span class="Price">Price:</span><br />
                        <span class="Price_Normal">$34.99</span>
                        <span class="Price_Currency">USD</span>
                    </span>
                    <p class="description">
                        Fabric: other fabric Size: cm Length Waist Front Crotch Back Crotch Hip Thigh Leg Opening Free 30 76 25 32 100 54 30.
                    </p>
                    <div class="Extra_Info">
                    </div>
                    <hr class="hr" />
                        <h4>Wrangler Blues Women's Jeans</h4>
                    </a>
                    <span class="Price_Info">
                        <span class="Price">Price:</span><br />
                        <span class="Price_Normal">$26.99</span>
                        <span class="Price_Currency">USD</span>
                    </span>
                    <p class="description">
                        Wrangler Blues Women's Jeans.
                    </p>
                    <div class="Extra_Info">
                        • <h5>Brand:</h5> <a href="search__wrangler_Women+Jeans-REL-1.htm" title="Wrangler">Wrangler</a>
                    </div>
                    <hr class="hr" />
                        <h4>Fashion Casual Jeans Blue 100% Cotton Womens Jeans</h4>
                    </a>
                    <span class="Price_Info">
                        <span class="Price">Price:</span><br />
                        <span class="Price_Normal">$34.99</span>
                        <span class="Price_Currency">USD</span>
                    </span>
                    <p class="description">
                        Category: / Women's Clothing / Women's Jeans.
                    </p>
                    <div class="Extra_Info">
                    </div>
                    <hr class="hr" />

 

 

I need to take out everything inside each description paragraph:

 

                    <p class="description">
                        *
                    </p>

 

And put it back once I have cleaned some stuff out, but in the same order.
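
The extract, clean, and put-back round trip described here could look roughly like this in Python (not UBot script). This is only a sketch: the helper names `extract_descriptions`, `clean` and `put_back` are hypothetical, the cleanup rule is a stand-in for whatever really needs removing, and the sample page is a shortened version of the search results above.

```python
import re

# Captures the opening tag, the description text, and the closing tag.
DESCRIPTION = re.compile(r'(<p class="description">)(.*?)(</p>)', re.DOTALL)

def extract_descriptions(html):
    """Return the description texts in document order."""
    return [m.group(2).strip() for m in DESCRIPTION.finditer(html)]

def clean(text):
    # Placeholder cleanup: collapse whitespace, drop a trailing period.
    return " ".join(text.split()).rstrip(".")

def put_back(html, cleaned):
    """Write the cleaned texts back in the order they were extracted."""
    remaining = iter(cleaned)

    def swap(match):
        # Keep the original text if there are fewer cleaned items.
        text = next(remaining, match.group(2))
        return match.group(1) + text + match.group(3)

    return DESCRIPTION.sub(swap, html)

page = """<h4>Wrangler Blues Women's Jeans</h4>
<p class="description">
    Wrangler Blues Women's Jeans.
</p>
<h4>Fashion Casual Jeans Blue 100% Cotton Womens Jeans</h4>
<p class="description">
    Category: / Women's Clothing / Women's Jeans.
</p>
"""

descriptions = extract_descriptions(page)
cleaned = [clean(d) for d in descriptions]
result = put_back(page, cleaned)
print(result)
```

Extraction and reinsertion both walk the page from top to bottom with the same pattern, which is what keeps the order intact. Note that this sketch drops the line breaks and indentation inside each paragraph when writing the text back.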


This may help get your brain matter flowing. Just drop that search file into the load html command, then run it and check the page source code.

 

load html("")
set(#Webpage, $document text, "Global")
clear list(%Temp)
add list to list(%Temp, $list from text($scrape attribute(<class="description">, "tagname"), $new line), "Don\'t Delete", "Global")
set list position(%Temp, 0)
set(#Counter, 0, "Global")
loop($list total(%Temp)) {
    set(#ItemIndex, $find index(#Webpage, "<p class=\"description\">"), "Global")
    set(#Webpage, $insert text(#Webpage, " {#Counter} ", $add(#ItemIndex, 21)), "Global")
    increment(#Counter)
}
load html(#Webpage)
clear list(%Temp)


 
