Obtaining valuable content from the old forum

Formerly "Lemon Fool - Improve the Recipe" repurposed as Room 102 (see above).
BeaglesEnd
Posts: 7
Joined: November 4th, 2016, 1:05 pm

Obtaining valuable content from the old forum

#197

Postby BeaglesEnd » November 4th, 2016, 2:30 pm

There is so much incredible information and resource on the old boards.

As someone who has the technical capability to harvest data from websites, I can say it is very simple to scrape the old boards. As it is a public forum, there is not a lot of comeback in relation to obtaining the content and republishing it, as long as it isn't used commercially and the source is referenced. But IANAL, so those with experience in law might want to add/disagree. Happy for feedback.

The only area of concern would be the number of hits required to trawl the existing site and download the HTML. The hits issue can be overcome by gapping the GET requests, and downloading could be limited to specific boards or threads.
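To illustrate the gapping idea, here is a minimal Python sketch, not a tested scraper for the old boards. The function names (`fetch_url`, `crawl_with_gaps`), the 5 second default gap and the idea of passing in a hand-picked list of thread URLs are all assumptions made for illustration; the right delay would be whatever the site owners are comfortable with.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_url(url: str, timeout: float = 30.0) -> str:
    """Download one page with a plain GET and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_with_gaps(
    urls: Iterable[str],
    gap_seconds: float = 5.0,  # assumed value, pick something polite
    fetch: Callable[[str], str] = fetch_url,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in turn, pausing gap_seconds between requests.

    Limiting the crawl to specific boards or threads is just a matter of
    which URLs are passed in.
    """
    pages: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(gap_seconds)  # the "gap" between consecutive GET requests
        pages[url] = fetch(url)
    return pages
```

The `fetch` and `sleep` parameters are only there so the loop can be tried out without touching the real site, for example by passing a fake `fetch` that returns canned HTML.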

Has anybody approached, or is anybody approaching, the powers that be about some sort of agreement to do this in a legitimate manner?

I am by no means suggesting that this should be done without the permission of those that run the old forum, only that it is technically possible.

Meatyfool
Lemon Slice
Posts: 313
Joined: November 4th, 2016, 11:43 am
Has thanked: 2 times
Been thanked: 55 times

Re: Obtaining valuable content from the old forum

#215

Postby Meatyfool » November 4th, 2016, 2:42 pm

It would be charitable of TMF if they gave us the two-week read-only period to do as much as we can to scrape the site.

Their call.

GrahamPlatt
Lemon Quarter
Posts: 2077
Joined: November 4th, 2016, 9:40 am
Has thanked: 1039 times
Been thanked: 840 times

Re: Obtaining valuable content from the old forum

#234

Postby GrahamPlatt » November 4th, 2016, 3:15 pm

mc2fool has already addressed this, and I have emailed archive.org to ask if they'll do a last minute scrape.

http://boards.fool.co.uk/it-should-be-possible-to-archive-this-board39s-13457309.aspx

BeaglesEnd
Posts: 7
Joined: November 4th, 2016, 1:05 pm

Re: Obtaining valuable content from the old forum

#240

Postby BeaglesEnd » November 4th, 2016, 3:26 pm

GrahamPlatt wrote:mc2fool has already addressed this, and I have emailed archive.org to ask if they'll do a last minute scrape.

http://boards.fool.co.uk/it-should-be-possible-to-archive-this-board39s-13457309.aspx


Excellent. The good thing about the lack of advancement of the forum software is that only a few scraping templates would be needed to extract the data from the HTML into whatever format is required for storage/publication (XML, JSON, etc.).
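As a rough sketch of what one such template might look like, the Python below maps output field names to CSS classes and pulls matching text out of a page into JSON. The class names (`post`, `post-author`, `post-date`, `post-body`) and the sample markup are hypothetical; the real boards' HTML would need inspecting to write the actual templates.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List

# Hypothetical template: output field name -> CSS class assumed to hold it.
POST_TEMPLATE = {"author": "post-author", "date": "post-date", "body": "post-body"}

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TemplateScraper(HTMLParser):
    """Collect the text of templated elements, one record per post."""

    def __init__(self, template: Dict[str, str], record_class: str = "post"):
        super().__init__()
        self.class_to_field = {css: field for field, css in template.items()}
        self.record_class = record_class
        self.records: List[Dict[str, str]] = []
        self._field = None  # field currently being captured, if any
        self._depth = 0     # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1  # nested tag inside a captured element
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.record_class in classes:
            # A new post starts: begin a fresh record with empty fields.
            self.records.append({f: "" for f in self.class_to_field.values()})
        for css in classes:
            if css in self.class_to_field and self.records:
                self._field = self.class_to_field[css]
                self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._field = None  # left the captured element

    def handle_data(self, data):
        if self._field is not None:
            self.records[-1][self._field] += data


def scrape_posts(html: str, template: Dict[str, str] = POST_TEMPLATE) -> List[Dict[str, str]]:
    """Apply a template to one page of HTML and return cleaned records."""
    scraper = TemplateScraper(template)
    scraper.feed(html)
    scraper.close()
    return [{k: v.strip() for k, v in rec.items()} for rec in scraper.records]


# Made-up markup, only to show the shape of the input and output.
SAMPLE_HTML = """
<div class="post">
  <span class="post-author">BeaglesEnd</span>
  <span class="post-date">November 4th, 2016, 2:30 pm</span>
  <div class="post-body">There is so much <b>incredible</b> information.<br>More here.</div>
</div>
"""

posts = scrape_posts(SAMPLE_HTML)
posts_json = json.dumps(posts, indent=2)
```

Supporting a second page layout would then mean adding another template dictionary rather than writing another parser, and XML output could be produced from the same list of records.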

BEnd

dionaeamuscipula
Lemon Quarter
Posts: 1098
Joined: November 4th, 2016, 1:25 pm
Has thanked: 101 times
Been thanked: 375 times

Re: Obtaining valuable content from the old forum

#250

Postby dionaeamuscipula » November 4th, 2016, 3:47 pm

BeaglesEnd wrote:There is so much incredible information and resource on the old boards.

As someone who has the technical capability to harvest data from websites, it is very simple to scrape the old boards. As it is a public forum there is not a lot of comeback in relation to obtaining it and republishing as long as it isn't used commercially and it could be referenced.


TMF content will remain copyright of TMF Limited in accordance with their T's and C's. They may be happy to grant a licence or licences to copy it, but I would expect that to be stuffed firmly in the "too hard" basket.

Meatyfool
Lemon Slice
Posts: 313
Joined: November 4th, 2016, 11:43 am
Has thanked: 2 times
Been thanked: 55 times

Re: Obtaining valuable content from the old forum

#255

Postby Meatyfool » November 4th, 2016, 3:55 pm

dionaeamuscipula wrote:

TMF content will remain copyright of TMF Limited in accordance with their T's and C's. They may be happy to grant a licence or licences to copy it, but I would expect that to be stuffed firmly in the "too hard" basket.

Hence my suggestion back on TMF to monetise the boards. Let someone screen scrape and rejig the output into a searchable DVD and sell them at a reasonable price to those who want the back catalogue.

TMF could even insist on X number of hard requests backed up by readies before they give permission.

Meatyfool..

modellingman
Lemon Slice
Posts: 621
Joined: November 4th, 2016, 3:46 pm
Has thanked: 601 times
Been thanked: 368 times

Re: Obtaining valuable content from the old forum

#285

Postby modellingman » November 4th, 2016, 4:52 pm

BeaglesEnd wrote:There is so much incredible information and resource on the old boards.

As someone who has the technical capability to harvest data from websites, it is very simple to scrape the old boards. As it is a public forum there is not a lot of comeback in relation to obtaining it and republishing as long as it isn't used commercially and it could be referenced. But, IANAL, so those with experience in law might want to add/disagree. Happy for feedback.

The only area of concern would be the number of hits required to trawl the existing site and download the HTML. The hits issue can be overcome with gapping the GET requests and downloading could be limited to specific boards or threads.


I posted some metrics here on numbers of boards and posts: http://boards.fool.co.uk/if-tmf-history-data-could-somehow-be-saved-then-13457328.aspx

Legal issues aside, the difficulty of harvesting post content from boards.fool.co.uk lies not in scraping the text from webpages but in the volume of content that exists. If less than 100% of content is to be retained when Fool Towers finally pull the plug, then who decides what is kept, and on what basis? In addition: how will retained content be organised; where will it be kept (e.g. will it be kept on lemonfool.co.uk); and to what extent will the functionality of the old boards site be retained, particularly in keeping cross links between posts working?

I'm not trying to throw a spanner in anyone's works, and what stooz and Clariman have done so far is commendable, but I think there are things to consider before plunging into a large-scale data grab, not least because they determine how that grab can be made to achieve its objectives.

ReformedCharacter
Lemon Quarter
Posts: 3133
Joined: November 4th, 2016, 11:12 am
Has thanked: 3629 times
Been thanked: 1519 times

Re: Obtaining valuable content from the old forum

#510

Postby ReformedCharacter » November 4th, 2016, 10:16 pm

dionaeamuscipula wrote:TMF content will remain copyright of TMF Limited in accordance with their T's and C's. They may be happy to grant a licence or licences to copy it, but I would expect that to be stuffed firmly in the "too hard" basket.

According to the T&Cs, the copyright remains with the author of a post. TMF grant themselves a perpetual, non-exclusive licence.

RC

gbjbaanb
Lemon Slice
Posts: 582
Joined: November 4th, 2016, 1:17 pm
Has thanked: 192 times
Been thanked: 126 times

Re: Obtaining valuable content from the old forum

#535

Postby gbjbaanb » November 4th, 2016, 10:58 pm

ReformedCharacter wrote:TMF content will remain copyright of TMF Limited in accordance with their T's and C's. They may be happy to grant a licence or licences to copy it, but I would expect that to be stuffed firmly in the "too hard" basket.

According to the T&Cs, the copyright remains with the author of a post. TMF grant themselves a perpetual, non-exclusive licence.

RC


Bum. That means it's legally not possible. TMF has a licence from every poster to hold and display the posts, but they remain the copyright of the original poster. If you could get permission from the copyright holder (i.e. they'd post on TMF under their username explicitly saying you can do this) then you could scrape their old posts, but if you don't get that permission... you cannot, and TMF cannot give that permission either.

I think you'd have to cherry-pick posters with a good "back catalogue" (luniversal on Investing For Income, for example), then scrape as many of their posts as you like to re-post here, if they give permission.

Itsallaguess
Lemon Half
Posts: 9129
Joined: November 4th, 2016, 1:16 pm
Has thanked: 4140 times
Been thanked: 10025 times

Re: Obtaining valuable content from the old forum

#627

Postby Itsallaguess » November 5th, 2016, 8:42 am

I was having a think this morning about all the hard work that went into the many 'Announcements' notices on a great number of high-traffic boards over on TMF.

Presumably, and I'm happy to verify this with Stooz, the boards that are being duplicated over here can have some sort of 'Sticky' post system, where we could take the opportunity to migrate those 'Announcements' notices over to the high-profile 'Sticky' posts on the relevant boards.

Is this something to think about?

Is it something we're going to be motivated into doing?

Are there likely to be any copyright issues? I'm happy to ask for some sort of 'blanket permission' over on TMF, but I'm not sure if we'll get it, or get it in time.

It would be a real waste not to take advantage of the many fantastic 'Announcements' panels that have in some cases taken years to create and manipulate into concise, helpful announcements, so I wondered what people thought?

Cheers,

Itsallaguess

bionichamster
Lemon Slice
Posts: 406
Joined: November 4th, 2016, 10:52 pm
Has thanked: 242 times
Been thanked: 65 times

Re: Obtaining valuable content from the old forum

#629

Postby bionichamster » November 5th, 2016, 8:45 am

gbjbaanb wrote:I think you'd have to cherry-pick posters with a good "back catalogue" (luniversal on Investing For Income for example), then scrape as many of his posts as you like to re-post here if they give permission.

Why not change the sign-up for this site to ask anyone signing up here to return to TMF during the next couple of weeks and post, on a specific named thread there, their username here and their agreement for anyone to copy all of their TMF posts should any data harvesting go ahead? Or they could post their refusal. Everyone already signed up here could be asked to agree (or not) via the same method. As previously suggested, certain posters who are thought to be of particular value could be contacted directly via the TMF boards (reply to author only) while they are still active, to ask for permission.

Of course data harvesting may or may not happen but perhaps it would be practical to do something along these lines just in case there are copyright problems?

Realistically, who is going to complain about the TMF posts being preserved elsewhere? And if they do, their posts could be removed on request. Is that reasonable?

BH

Meatyfool
Lemon Slice
Posts: 313
Joined: November 4th, 2016, 11:43 am
Has thanked: 2 times
Been thanked: 55 times

Re: Obtaining valuable content from the old forum

#686

Postby Meatyfool » November 5th, 2016, 10:29 am

+1 to bionichamster's suggestion.

I have been to quite a number of sites where people have put up a notice saying "sorry if we have taken something of yours that is copyrighted, please ask us to take it down or give us your permission".

Meatyfool..

Midsmartin
Lemon Slice
Posts: 778
Joined: November 4th, 2016, 7:18 am
Has thanked: 211 times
Been thanked: 491 times

Re: Obtaining valuable content from the old forum

#701

Postby Midsmartin » November 5th, 2016, 11:00 am

While it wouldn't be the favourite option, I wonder if there is any scope for obtaining not only the old data, but the old forum software itself from TMF, at least as an archive of old posts? That would eliminate the problems of exporting/importing data, but may raise a whole host of other techie issues as no doubt it was never designed to be portable.

grimer
2 Lemon pips
Posts: 197
Joined: November 4th, 2016, 2:36 pm
Has thanked: 16 times
Been thanked: 24 times

Re: Obtaining valuable content from the old forum

#937

Postby grimer » November 5th, 2016, 11:23 pm

I'm not a legal expert, but if the site is available via the web 'time machine' and nobody cares, why is it a problem to scrape the site and import the posts? What's the real difference?

There is freeware available that will download an entire domain. I used to use it to track changes on a news website (they updated stories without updating the timestamp).

Personally, I'd just go for it and remove any 'infractions' upon request. If nobody complains, then there isn't a problem.

