jewcat

April 28, 2010

Alright, I'm scraping a site for specs on different electronics. Each item on the site has a large number of values to scrape into a CSV. Some pages have certain specs while others don't, so I'm using an if command to search the page for each value's heading before setting the variable and adding it to the CSV string.

Then I encountered a problem. From page to page the headings for each value are different:

Digital:Media_Broadcast_Tuner:

Digital;Media+Broadcast-Tuner:

Digital_Media;Broadcast;Tuner:

Digital+Media-Broadcast_Tuner:

They are hiding different characters between the words, colored to match the background of the table cell. So simply searching the page for "Digital Media Broadcast Tuner", for example, will not work on every page because of this protection. I could go through and create an if command for every variation of every value, but let's face it, there are A LOT of values to scrape, and writing 4-16 if commands for every single value is going to take ages. Is there no way I can use a wildcard or something to ease the pain here?

Any ideas would be appreciated.
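A minimal sketch of the wildcard idea in Python (uBot script can't be shown here as runnable code). It assumes the scraped cell text looks like the four examples above: junk characters such as `:`, `;`, `_`, `+` and `-` sit between the words, never inside them, and any HTML tags have already been stripped. `heading_pattern` and `has_heading` are hypothetical helper names. Each word of the heading is matched literally, and any run of non-letter, non-digit characters is allowed between words, so one pattern covers every variation instead of 4-16 separate if checks.

```python
import re


def heading_pattern(heading: str) -> "re.Pattern[str]":
    """Compile a regex that matches `heading` even when the spaces between
    its words have been swapped for junk characters such as : ; _ + -.

    [\\W_]+ means "one or more characters that are not letters or digits",
    so Digital:Media, Digital;Media and Digital_Media all match.
    Assumption: the junk only appears between words, never inside one.
    """
    words = heading.split()
    return re.compile(r"[\W_]+".join(re.escape(w) for w in words), re.IGNORECASE)


def has_heading(page_text: str, heading: str) -> bool:
    """Stand-in for a plain 'does the page contain X' check."""
    return heading_pattern(heading).search(page_text) is not None


variants = [
    "Digital:Media_Broadcast_Tuner:",
    "Digital;Media+Broadcast-Tuner:",
    "Digital_Media;Broadcast;Tuner:",
    "Digital+Media-Broadcast_Tuner:",
]
for v in variants:
    print(v, has_heading(v, "Digital Media Broadcast Tuner"))
```

If the site ever hides letters or digits between the words instead of punctuation, this pattern would no longer match and the character class would need adjusting.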

April 7, 2010

Yeah, I'd just $replace all $newline with $nothing. That should do the trick, or at least get you close to your aim.
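In Python terms, the same $replace would look roughly like this (a sketch; `to_one_line` is a hypothetical name). Replacing each line break with nothing can glue the last word of one line to the first word of the next, so the optional `replacement` argument lets you swap in a space instead.

```python
import re


def to_one_line(text: str, replacement: str = "") -> str:
    """Replace every line break (\\r\\n, \\r or \\n) with `replacement`.

    The default "" mirrors $replace $newline with $nothing; pass " " if words
    on either side of a break would otherwise run together.
    """
    return re.sub(r"\r\n|\r|\n", replacement, text)


body = "First line of the article.\r\nSecond line.\nThird line."
print(to_one_line(body, " "))
# First line of the article. Second line. Third line.
```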

April 7, 2010

I agree that the UI could be far more flexible. One of the areas where ubot is lacking most is options for compiled bots (e.g. I would love to be able to set the onload URL, so the bot opens to a webpage instead of a blank white page on startup). But in the long run I'd say UI flexibility is an improvement that can wait a good while yet. There are lots of long-needed improvements to ubot's actual functionality still on the waiting list that are far more important than making compiled bots look prettier for the end user.

April 7, 2010

I haven't used it yet, but I was under the impression that ubot lets you tap directly into your own decaptcher account. If that's the case, I'd assume you're only going to pay the $2/1000. Hopefully that's right; I need to implement decaptcher in the current bot I'm working on too...

April 1, 2010

Any news regarding the new beta release? What's the status at this time?

March 31, 2010

Bumping this one. This is becoming a recurring issue when I'm looking at automating processes on a variety of sites. How can I work with this?

March 28, 2010

I was going to continue this thread http://ubotstudio.com/forum/index.php?/topic/1943-seeking-help-with-squidoo-lens-making-bot/page__hl__squidoo__fromsearch__1 but it seems no one had anything to say, and it doesn't sound like the OP ever made any progress.

The issue I'm seeing is that when you click an edit button, you get a flashy DHTML lightbox-style popup window. I've encountered the same thing a few times, and it never seems to let me select any elements within these popups.

Is there a way to handle these? They're the same sort of popup boxes you see in Facebook and Google Buzz...

March 26, 2010

I'm using a piece of PHP on my server. The bot enters a directory name in one text input and a list of scraped image URLs in another. The files are then downloaded directly to my server to work with. I had it hacked together by a coder on DP. The only issue is that it seems to drop about 20% of the images without fully downloading them.

Still can't wait for an actual Save As feature to come along.
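The PHP script itself isn't shown, so the following is not that script. It's a Python sketch of one common safeguard against partial downloads: compare the number of bytes received with the server's Content-Length header and retry when they don't match. `download_image`, `is_complete`, the file-naming scheme, the retry count and the timeout are all assumptions for illustration. It assumes plain HTTP(S) image URLs; when the server sends no Content-Length, the check falls back to accepting any non-empty response.

```python
import http.client
import os
import urllib.parse
import urllib.request


def is_complete(data: bytes, content_length) -> bool:
    """True if the body length matches the Content-Length header.

    Without a Content-Length header there is nothing to compare against,
    so any non-empty body is accepted.
    """
    if content_length is None:
        return len(data) > 0
    return len(data) == int(content_length)


def download_image(url: str, dest_dir: str, retries: int = 3, timeout: float = 30.0) -> str:
    """Download one image into dest_dir, retrying failed or truncated transfers."""
    os.makedirs(dest_dir, exist_ok=True)
    # Hypothetical naming scheme: last path segment of the URL.
    name = os.path.basename(urllib.parse.urlparse(url).path) or "image"
    path = os.path.join(dest_dir, name)
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                expected = resp.headers.get("Content-Length")
                data = resp.read()
        except (OSError, http.client.HTTPException):
            continue  # network error or connection closed early: try again
        if is_complete(data, expected):
            with open(path, "wb") as f:
                f.write(data)
            return path
    raise RuntimeError(f"could not fully download {url} after {retries} attempts")
```

Calling `download_image` once per URL and logging the `RuntimeError` cases would at least show which images are being dropped.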

March 24, 2010

Just curious which captchas are supported by decaptcher. Do they handle ANY image captcha?

March 21, 2010

Personally, I clear and reload.

March 19, 2010

Gonna drop a noob question here, lol. What do you mean by "hooks that are removed in compiled bots"?

March 19, 2010

I'm having an issue sort of like this with a client's bot. Everything works fantastically locally, yet it refuses to click some links when running on the client's machine. Not sure why yet.

March 19, 2010

I just did:

select by attribute -> innertext

click chosen

and it clicked fine here. Don't know what to say.

March 18, 2010

Use the document constant $meta keywords; you can also scrape the meta description...

meta_tag_scraper.ubot
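The contents of meta_tag_scraper.ubot aren't reproduced here. For comparison, this is a small Python sketch of the same scrape using the standard library's `html.parser`; `MetaTagScraper` and `scrape_meta` are hypothetical names. It collects the `content` of `<meta name="keywords">` and `<meta name="description">` tags, matching the name case-insensitively.

```python
from html.parser import HTMLParser


class MetaTagScraper(HTMLParser):
    """Collect keywords/description meta tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names, not values.
        if tag != "meta":
            return
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        if name in ("keywords", "description") and a.get("content") is not None:
            self.meta[name] = a["content"]


def scrape_meta(html: str) -> dict:
    parser = MetaTagScraper()
    parser.feed(html)
    parser.close()
    return parser.meta


html = ('<html><head><meta name="Keywords" content="ubot, scraping">'
        '<meta name="description" content="A demo page" /></head></html>')
print(scrape_meta(html))
# {'keywords': 'ubot, scraping', 'description': 'A demo page'}
```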

March 18, 2010

Navigate away to another page, then retry. That usually solves the problem for me.

March 18, 2010

Here, give this a try...

YT_description.ubot

March 18, 2010

Problem solved. Selecting a table element or div by position has worked in most cases.

March 17, 2010

I'm having a hell of a time trying to load the article text into a file from this particular article site. I have tried $page scrape and $scrape chosen attribute. I tried loading it into a variable and tried loading it into a list. I just can't seem to pull the data from here. Any suggestions?

Demo article url: http://www.articledashboard.com/Article/Celebrity-Gossip--Celeb-Relationships-/1415867

I tried $page scraping between the , which I thought would be the best option, but then I discovered it was scraping a different element elsewhere on the page. So I tried stepping up and scraping between the <td></td>, and it returns a blank value.

Any suggestions?
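One plausible explanation for both symptoms, sketched in Python under the assumption that a between-two-strings scrape returns the first match it finds: a common marker like `<td>` often occurs earlier on the page than the cell you want, and that first cell may be empty, which would give a blank value. Picking the match by position (occurrence number) is the same idea as selecting the table element by position. `scrape_between` and the sample HTML are made up for illustration; they are not uBot's implementation or the real article page.

```python
def scrape_between(text: str, left: str, right: str, occurrence: int = 0):
    """Return the text between the occurrence-th `left` marker (0-based) and
    the first `right` marker after it, or None if either marker is missing.

    This is plain string matching, so nested tags (e.g. a <td> inside
    another <td>) are not handled.
    """
    start = 0
    for _ in range(occurrence + 1):
        i = text.find(left, start)
        if i == -1:
            return None
        start = i + len(left)
    end = text.find(right, start)
    if end == -1:
        return None
    return text[start:end]


page = "<table><tr><td></td><td><p>Article text</p></td></tr></table>"
print(repr(scrape_between(page, "<td>", "</td>")))          # first match is the empty cell: ''
print(scrape_between(page, "<td>", "</td>", occurrence=1))  # <p>Article text</p>
```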

March 17, 2010

Yeah, there are definitely going to be multiple saves along the way from this point forward.
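A minimal Python sketch of the "multiple saves" habit, assuming you just want timestamped copies of the .ubot file in a separate folder with only the newest few kept. `backup` and `backup_name` are hypothetical helpers. Nothing here recovers an already corrupted file; it only makes sure an older good copy exists.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_name(source: Path, when: datetime) -> str:
    """e.g. squidoo.ubot -> squidoo_20100317_140509.ubot"""
    return f"{source.stem}_{when:%Y%m%d_%H%M%S}{source.suffix}"


def backup(source: str, backup_dir: str, keep: int = 10) -> Path:
    """Copy `source` into `backup_dir` under a timestamped name and delete
    all but the newest `keep` copies of that file."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    src = Path(source)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / backup_name(src, datetime.now())
    shutil.copy2(src, dest)
    # The timestamp format sorts chronologically, so the oldest copies come first.
    copies = sorted(dest_dir.glob(f"{src.stem}_*{src.suffix}"))
    for old in copies[:-keep]:
        old.unlink()
    return dest
```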

March 17, 2010

I have a funny feeling I'm not going to like the answer I get in this thread, but is there any way to recover a corrupted bot source? I have been working on a project for someone, and my computer just froze up. I saved the bot I was working on, rebooted the computer, and came back to ubot to finish things. Every time I attempt to open the file, ubot crashes. I can open my other saved files just fine.

Is this file lost forever? Nothing popped up from the emergency backup feature, so I'm assuming the save went through, it just didn't save properly. Tell me there's some way to recover this work. Things were about 80% complete, and if I've got to start over, that's a day down the drain...

March 16, 2010

So simple I'm embarrassed to have not thought of it myself. Nice loop action happening.

Thanks bluegoat!

March 16, 2010

I don't see why you couldn't set it up to run with Windows Scheduled Tasks. You shouldn't really need cron, though you can install CRONw on a Windows server.

March 16, 2010

Being able to save files and name them is a MUST. I really want to be able to save text files and open them back up in ubot (by reusing the save location and file name variables to open those files).

Remember, SAVE AS, not SAVE!

Agreed

March 16, 2010

Can't wait for a $local folder to come in. Until then, I'll have to try this out with a compiled bot.

March 15, 2010

Maybe I'm just not seeing it, but I fail to see where the replacement is happening there at all.

What I have is a list that I want to remove certain parts from. Another good example would be an HTML snippet from which you want to remove all italic and bold font styles.

The dog walked across the road

The cat ran away from the dog.

So I'd have to replace , , , & all with a $nothing variable.

Now, I can easily take one of those strings out, but it's the next one where I have problems. Specifically, I can get all the spaces out, but when I move on to removing all ' and " characters from the same item, getting rid of the quotation marks and other invalid characters does not work.

I am stripping down keyphrases for input into a form. Occasionally the phrases contain an invalid character, and I need to strip all of these invalid characters out.

The snippet posted seems to remove entire lines from the list. I need to remove characters (or entire strings in the case of the HTML example) from each item.
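The key difference from the posted snippet is replacing text inside each item rather than filtering items out of the list, and feeding the result of each replacement into the next one. A Python sketch of that, with hypothetical names; the exact tags in the HTML example were lost from the post, so `<i>`, `</i>`, `<b>` and `</b>` are assumed here.

```python
def strip_all(item: str, unwanted) -> str:
    """Remove every string in `unwanted` from one list item.

    Each replacement works on the result of the previous one, so all of
    them end up applied to the same item.
    """
    for s in unwanted:
        item = item.replace(s, "")
    return item


def clean_list(items, unwanted):
    """Apply strip_all to every item; items are changed, never dropped."""
    return [strip_all(item, unwanted) for item in items]


html_items = [
    "The <i>dog</i> walked across the <b>road</b>",
    "The <b>cat</b> ran away from the <i>dog</i>.",
]
print(clean_list(html_items, ["<i>", "</i>", "<b>", "</b>"]))
# ['The dog walked across the road', 'The cat ran away from the dog.']

keyphrases = ['don\'t "stop" now']
print(clean_list(keyphrases, [" ", "'", '"']))
# ['dontstopnow']
```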


Posts posted by jewcat

weird scrape protection in place on site, any ideas for an easier work around?

Putting article body in one line??

Building a better GUI

How much for decaptcha charges does ubot charge?

What is the status on the new beta model?

Squidoo Lens Bot?

Squidoo Lens Bot?

Why does "Save Chosen Image" take a screencap as opposed to actually saving the image?

Is anyone else having problems with captcha services?

Is it better to clear and reload the same %list used in different scripts?

Can somebody click this submit button?

Can somebody click this submit button?

Can somebody click this submit button?

Possible to Scrape Meta Keywords on a page?

Sometimes the browser inside my uBot can't load at all..

Can't scrape and store simple data

scraping article dashboard?

scraping article dashboard?

Corrupted Bot Source?

Corrupted Bot Source?

$replace multiple strings on a single list item?

Will Standalones Work on a Server w/ Cron Job?

Why does "Save Chosen Image" take a screencap as opposed to actually saving the image?

Loading Values automatically

$replace multiple strings on a single list item?
