# Which is the best web scraping software?



## sejau (Aug 16, 2012)

Hi there

I know this may be the wrong place for my question, but I posted it here because I did not know where else to ask.

I would like to know whether anybody has experience with web scraping and can recommend specific software.

Here is the task I want to complete:

I'm looking for easy-to-use web scraping software that can extract some simple HTML text to .txt format, or even save it into a local database (e.g. MySQL).

The HTML looks more or less like this. It is really simple HTML showing the content of articles from different newspapers or magazines. Instead of going through all the newspapers/magazines manually and copy-pasting every page, I want an automated solution.



```
<h1>Magazine Name</h1>

<h2>Category</h2>


<h4>Title of the Article</h4>
<p>
<p><strong>Subtitle</strong></p>
<p>Author Name</p>
<p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.</p>
<p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.</p>
<p>Author Name</p>
```

So again, does anybody have hints on how to achieve this? Which software meets my needs?

I tried Google, but I can't tell whether the software I found is (1) user-friendly and (2) suitable for my needs.

I hope somebody can help me out with this.

cheers & thanks
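
For readers who do not mind a short script, markup of the shape shown above can be reduced to plain text with Python's standard library alone. The following is only a minimal sketch under assumptions: the class name `ArticleTextExtractor` and the function `html_to_text` are made up for illustration, the sample string is a shortened copy of the HTML above, and real pages would likely need more careful handling.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect the text of h1/h2/h4/p elements, one output line per element."""

    WANTED = {"h1", "h2", "h4", "p"}

    def __init__(self):
        super().__init__()
        self.lines = []      # finished text lines, in document order
        self._buffer = None  # text pieces of the element being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._flush()        # also closes a stray, unclosed <p>
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.WANTED:
            self._flush()

    def handle_data(self, data):
        # Text inside inline tags such as <strong> lands here as well.
        if self._buffer is not None:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            # Collapse runs of whitespace and skip empty elements.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.lines.append(text)
        self._buffer = None


def html_to_text(html):
    """Return the heading and paragraph text of `html`, one element per line."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


# Shortened, hypothetical copy of the sample markup from the post.
sample = """
<h1>Magazine Name</h1>
<h2>Category</h2>
<h4>Title of the Article</h4>
<p>
<p><strong>Subtitle</strong></p>
<p>Author Name</p>
<p>Lorem ipsum dolor sit amet.</p>
"""

print(html_to_text(sample))
```

The returned string could then be written to a .txt file, for example with `pathlib.Path("article.txt").write_text(...)`; the file name is a placeholder.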


----------



## cl-scott (Jul 5, 2012)

The problem with this is that it gets into a legal quagmire known as copyright infringement. In most western nations what you want to do could very well result in a nasty letter from a lawyer arriving in your mailbox one day.

Not to mention the concept of "user friendly" is completely amorphous and different from one person to the next. A lot of people find Apple's iOS to be pretty user friendly, but I have several gripes about what I see as rather idiotic design choices made, which make the interface annoying and/or obnoxious to use. So only you can answer whether or not a program is user friendly.


----------



## sejau (Aug 16, 2012)

Thanks for the answer.

The legal question is settled. That is no problem: we have the approval of the website I want to scrape.

And yes, 'user-friendly' may be unclear. Let's put it like this: I'm looking for software that, first, can help me achieve the task described above and, second, does not require me to work in the console. So a GUI would be nice.

That's all I want. So I hope there is someone out there who knows one or another of these web scraping tools.

Thanks again.
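
On the storage half of the task from the first post (saving the extracted text into a local database), here is a rough sketch of what that step could look like in a script. It uses Python's built-in `sqlite3` module purely as a stand-in for MySQL so that it runs without a server. The table name `articles`, its columns, the helper `save_articles`, and the sample record are all assumptions for illustration; with a MySQL driver the connection call and placeholder syntax would differ.

```python
import sqlite3


def save_articles(conn, articles):
    """Insert article dicts into an 'articles' table, creating it if needed."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS articles (
            id       INTEGER PRIMARY KEY,
            magazine TEXT NOT NULL,
            category TEXT,
            title    TEXT NOT NULL,
            author   TEXT,
            body     TEXT
        )
        """
    )
    # Named placeholders keep the scraped text out of the SQL string itself.
    conn.executemany(
        "INSERT INTO articles (magazine, category, title, author, body) "
        "VALUES (:magazine, :category, :title, :author, :body)",
        articles,
    )
    conn.commit()


# Hypothetical record mirroring the fields in the sample HTML.
articles = [
    {
        "magazine": "Magazine Name",
        "category": "Category",
        "title": "Title of the Article",
        "author": "Author Name",
        "body": "Lorem ipsum dolor sit amet.",
    },
]

conn = sqlite3.connect(":memory:")  # a file path here would make it persistent
save_articles(conn, articles)
rows = conn.execute("SELECT magazine, title, author FROM articles").fetchall()
print(rows)
```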


----------



## bishjaishi (Sep 4, 2012)

Visual Web Ripper is the best software for your problem. It can export to Excel, XML, SQL Server and MySQL.

It has an easy-to-use GUI, and also allows programming for the advanced user.

It is very well documented and provides easy-to-follow tutorial videos.

Here is the link to their website.


----------



## MichaelDavid (Apr 30, 2013)

Web Content Extractor is the most powerful and easy-to-use web scraping software. It offers a friendly, wizard-driven interface.


----------



## sinclair_tm (Mar 11, 2005)

This thread is almost a year old. It's bad form to post in threads more than a couple of weeks old here.


----------

