Obtaining valuable content from the old forum
-
- Posts: 7
- Joined: November 4th, 2016, 1:05 pm
Obtaining valuable content from the old forum
There is so much incredible information and resource on the old boards.
As someone who has the technical capability to harvest data from websites, I can say it is very simple to scrape the old boards. As it is a public forum, there is not a lot of comeback in relation to obtaining the content and republishing it, as long as it isn't used commercially and it is referenced. But IANAL, so those with experience in law might want to add/disagree. Happy for feedback.
The only area of concern would be the number of hits required to trawl the existing site and download the HTML. The hits issue can be overcome with gapping the GET requests and downloading could be limited to specific boards or threads.
Is, or has, anybody approached the powers that be about some sort of agreement to do this in a legitimate manner?
I am by no means suggesting that this should be done without the permission of those who run the old forum, only that it is technically possible.
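A minimal sketch of the request gapping mentioned above, assuming plain HTTP GETs and a fixed pause between them. The function names, the 5-second default and the User-Agent string are illustrative assumptions, not anything agreed with the old forum's operators.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def http_get(url: str) -> str:
    """Fetch one page with a plain GET and return its body as text."""
    # The User-Agent value is a made-up label for this sketch.
    request = urllib.request.Request(
        url, headers={"User-Agent": "board-archiver-sketch"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_gaps(
    urls: Iterable[str],
    delay_seconds: float = 5.0,
    fetch: Callable[[str], str] = http_get,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Download each URL in turn, pausing between requests.

    The pause sits between requests only, so there is no wait
    before the first page or after the last one.
    """
    pages: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages
```

Limiting the download to specific boards or threads then just means passing a shorter list of URLs. The `fetch` and `sleep` parameters exist only so the loop can be tried out without touching the live site.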
-
- Lemon Slice
- Posts: 313
- Joined: November 4th, 2016, 11:43 am
- Has thanked: 2 times
- Been thanked: 55 times
Re: Obtaining valuable content from the old forum
It would be charitable of TMF if they gave us the two-week read-only period to do as much as we can to scrape the site.
Their call.
-
- Lemon Quarter
- Posts: 2083
- Joined: November 4th, 2016, 9:40 am
- Has thanked: 1040 times
- Been thanked: 842 times
Re: Obtaining valuable content from the old forum
mc2fool has already addressed this, and I have emailed archive.org to ask if they'll do a last minute scrape.
http://boards.fool.co.uk/it-should-be-possible-to-archive-this-board39s-13457309.aspx
-
- Posts: 7
- Joined: November 4th, 2016, 1:05 pm
Re: Obtaining valuable content from the old forum
GrahamPlatt wrote:mc2fool has already addressed this, and I have emailed archive.org to ask if they'll do a last minute scrape.
http://boards.fool.co.uk/it-should-be-possible-to-archive-this-board39s-13457309.aspx
Excellent. The good thing about the lack of advancement of the forum software is that only a few scraping templates would be needed to extract the data from the HTML into whatever format is required for storage/publication (XML, JSON, etc.).
BEnd
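A rough illustration of what one such scraping template could look like, using only the Python standard library. The class names in `POST_TEMPLATE` are invented for the example; the real markup of the old boards would have to be inspected first, and the sketch assumes every post carries each field exactly once.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Hypothetical template: output field -> CSS class that marks it in the page.
POST_TEMPLATE: Dict[str, str] = {
    "author": "post-author",
    "date": "post-date",
    "body": "post-body",
}

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TemplateParser(HTMLParser):
    """Collect the text of every element whose class appears in the template."""

    def __init__(self, template: Dict[str, str]) -> None:
        super().__init__()
        self.class_to_field = {css: field for field, css in template.items()}
        self.fields: Dict[str, List[str]] = {field: [] for field in template}
        self.current: Optional[str] = None  # field being captured, if any
        self.depth = 0  # open tags inside the captured element
        self.buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if tag == "br" and self.current is not None:
                self.buffer.append("\n")  # keep line breaks inside a post
            return
        if self.current is not None:
            self.depth += 1
            return
        for css in (dict(attrs).get("class") or "").split():
            if css in self.class_to_field:
                self.current = self.class_to_field[css]
                self.depth = 1
                self.buffer = []
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.current is None:
            return
        self.depth -= 1
        if self.depth == 0:
            self.fields[self.current].append("".join(self.buffer).strip())
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)


def extract_posts(html: str, template: Dict[str, str] = POST_TEMPLATE) -> List[Dict[str, str]]:
    """Apply the template to one page and return one dict per post."""
    parser = TemplateParser(template)
    parser.feed(html)
    parser.close()
    columns = [parser.fields[field] for field in template]
    # Pair up the n-th author, date and body; stops at the shortest column.
    return [dict(zip(template, values)) for values in zip(*columns)]
```

Calling `json.dumps(extract_posts(page_html), indent=2)` would then give the JSON form; a different page layout would need only a different template dict, not a different parser.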
-
- Lemon Quarter
- Posts: 1099
- Joined: November 4th, 2016, 1:25 pm
- Has thanked: 102 times
- Been thanked: 375 times
Re: Obtaining valuable content from the old forum
BeaglesEnd wrote:There is so much incredible information and resource on the old boards.
As someone who has the technical capability to harvest data from websites, it is very simple to scrape the old boards. As it is a public forum there is not a lot of comeback in relation to obtaining it and republishing as long as it isn't used commercially and it could be referenced.
TMF content will remain copyright of TMF Limited in accordance with their T's and C's. They may be happy to grant a licence or licences to copy it, but I would expect that to be stuffed firmly in the "too hard" basket.
-
- Lemon Slice
- Posts: 313
- Joined: November 4th, 2016, 11:43 am
- Has thanked: 2 times
- Been thanked: 55 times
Re: Obtaining valuable content from the old forum
dionaeamuscipula wrote:
TMF content will remain copyright of TMF Limited in accordance with their T's and C's. They may be happy to grant a licence or licences to copy it, but I would expect that to be stuffed firmly in the "too hard" basket.
Hence my suggestion back on TMF to monetise the boards. Let someone screen scrape and rejig the output into a searchable DVD and sell them at a reasonable price to those who want the back catalogue.
TMF could even insist on X number of hard requests backed up by readies before they give permission.
Meatyfool..
-
- Lemon Slice
- Posts: 621
- Joined: November 4th, 2016, 3:46 pm
- Has thanked: 606 times
- Been thanked: 368 times
Re: Obtaining valuable content from the old forum
BeaglesEnd wrote:There is so much incredible information and resource on the old boards.
As someone who has the technical capability to harvest data from websites, it is very simple to scrape the old boards. As it is a public forum there is not a lot of comeback in relation to obtaining it and republishing as long as it isn't used commercially and it could be referenced. But, IANAL, so those with experience in law might want to add/disagree. Happy for feedback.
The only area of concern would be the number of hits required to trawl the existing site and download the HTML. The hits issue can be overcome with gapping the GET requests and downloading could be limited to specific boards or threads.
I posted some metrics here on numbers of boards and posts: http://boards.fool.co.uk/if-tmf-history-data-could-somehow-be-saved-then-13457328.aspx
Legal issues aside, the difficulty of harvesting post content from boards.fool.co.uk lies not in scraping the text from webpages but in the volume of content that exists. If less than 100% of content is to be retained when Fool towers finally pull the plug, then who decides what is kept and on what basis? In addition: how will retained content be organised; where will it be kept (e.g. on lemonfool.co.uk); and to what extent will the functionality of the old boards site be retained, particularly working cross-links between posts?
I'm not trying to throw a spanner in anyone's works, and what stooz and Clariman have done so far is commendable, but I think there are things to consider before plunging into a large-scale data grab, not least because they determine how that grab can be made to achieve its objectives.
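On the cross-links point, one possible approach is to rewrite old links at import time. The sketch below assumes an old post link ends in `-<numeric id>.aspx`, as the links quoted in this thread do; the `id_to_new_url` mapping and the `archive.example` address are hypothetical.

```python
import re
from typing import Dict

# Assumption: the trailing number before ".aspx" is the old post's id.
OLD_LINK = re.compile(r"https?://boards\.fool\.co\.uk/[^\s\"'<>]*?-(\d+)\.aspx")


def rewrite_cross_links(text: str, id_to_new_url: Dict[str, str]) -> str:
    """Point links at archived copies where one exists; leave the rest alone."""

    def replace(match: re.Match) -> str:
        # Fall back to the original link when the post was not archived.
        return id_to_new_url.get(match.group(1), match.group(0))

    return OLD_LINK.sub(replace, text)


# Usage: only the first link has an archived copy in this made-up mapping.
mapping = {"13457309": "https://archive.example/post/13457309"}
post = (
    "See http://boards.fool.co.uk/it-should-be-possible-to-archive-this-board39s-13457309.aspx"
    " and http://boards.fool.co.uk/if-tmf-history-data-could-somehow-be-saved-then-13457328.aspx"
)
rewritten = rewrite_cross_links(post, mapping)
```

Links to posts that were not kept stay pointing at the old site, so a partial archive degrades gracefully rather than producing dead internal links.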
-
- Lemon Quarter
- Posts: 3140
- Joined: November 4th, 2016, 11:12 am
- Has thanked: 3640 times
- Been thanked: 1521 times
Re: Obtaining valuable content from the old forum
dionaeamuscipula wrote:TMF content will remain copyright of TMF Limited in accordance with their T's and C's. They may be happy to grant a licence or licences to copy it, but I would expect that to be stuffed firmly in the "too hard" basket.
According to the T&Cs, the copyright remains with the author of a post. TMF grant themselves a perpetual, non-exclusive license.
RC
-
- Lemon Slice
- Posts: 582
- Joined: November 4th, 2016, 1:17 pm
- Has thanked: 192 times
- Been thanked: 126 times
Re: Obtaining valuable content from the old forum
ReformedCharacter wrote:TMF content will remain copyright of TMF Limited in accordance with their T's and C's. They may be happy to grant a licence or licences to copy it, but I would expect that to be stuffed firmly in the "too hard" basket.
According to the T&Cs, the copyright remains with the author of a post. TMF grant themselves a perpetual, non-exclusive license.
RC
Bum. That means it's legally not possible - TMF has a licence from every poster to hold and display the posts, but the copyright remains with the original poster. If you could get permission from the copyright holder (i.e. they'd post on TMF under their username explicitly saying you can do this) then you could scrape their old posts, but if you don't get that permission... you cannot, and TMF cannot give that permission either.
I think you'd have to cherry-pick posters with a good "back catalogue" (luniversal on Investing For Income for example), then scrape as many of his posts as you like to re-post here if they give permission.
-
- Lemon Half
- Posts: 9129
- Joined: November 4th, 2016, 1:16 pm
- Has thanked: 4140 times
- Been thanked: 10032 times
Re: Obtaining valuable content from the old forum
I was having a think this morning about all the hard work that went into the many 'Announcements' notices on a great number of high-traffic boards over on TMF.
Presumably, and I'm happy to verify this with Stooz, the boards that are being duplicated over here can have some sort of 'Sticky' post system, where we could take the opportunity to migrate those 'Announcements' notices over to the high-profile 'Sticky' posts on the relevant boards.
Is this something to think about?
Is it something we're going to be motivated into doing?
Are there likely to be any copyright issues? I'm happy to ask for some sort of 'blanket permission' over on TMF, but I'm not sure if we'll get it, or get it in time.
It would be a real waste not to take advantage of the many fantastic 'Announcements' panels that have in some cases taken years to create and manipulate into concise, helpful announcements, so I wondered what people thought?
Cheers,
Itsallaguess
-
- Lemon Slice
- Posts: 406
- Joined: November 4th, 2016, 10:52 pm
- Has thanked: 242 times
- Been thanked: 65 times
Re: Obtaining valuable content from the old forum
I think you'd have to cherry-pick posters with a good "back catalogue" (luniversal on Investing For Income for example), then scrape as many of his posts as you like to re-post here if they give permission.
Why not change the sign-up for this site to ask anyone signing up here to return to TMF during the next couple of weeks and post there, on a specific named thread, their username here and their agreement for anyone to copy all of their posts on TMF should any data harvesting go ahead? Or they could post their refusal. All people already signed up here could be asked to agree or not via the same method. As previously suggested, certain posters who are thought to be of particular value could be contacted directly via the TMF boards (reply to author only) while they are still active to ask for permission.
Of course data harvesting may or may not happen, but perhaps it would be practical to do something along these lines just in case there are copyright problems?
Realistically, who is going to complain about the TMF posts being preserved elsewhere? If anyone does, their posts could be removed on request. Is that reasonable?
BH
-
- Lemon Slice
- Posts: 313
- Joined: November 4th, 2016, 11:43 am
- Has thanked: 2 times
- Been thanked: 55 times
Re: Obtaining valuable content from the old forum
+1 to binoichamsters suggestion.
I have been to quite a number of sites where people have put up a notice saying "sorry if we have taken something of yours that is copyrighted, please ask us to take it down or give us your permission".
Meatyfool..
-
- Lemon Slice
- Posts: 778
- Joined: November 4th, 2016, 7:18 am
- Has thanked: 211 times
- Been thanked: 491 times
Re: Obtaining valuable content from the old forum
While it wouldn't be the favourite option, I wonder if there is any scope for obtaining not only the old data, but the old forum software itself from TMF, at least as an archive of old posts? That would eliminate the problems of exporting/importing data, but may raise a whole host of other techie issues as no doubt it was never designed to be portable.
-
- 2 Lemon pips
- Posts: 197
- Joined: November 4th, 2016, 2:36 pm
- Has thanked: 16 times
- Been thanked: 24 times
Re: Obtaining valuable content from the old forum
I'm not a legal expert, but if the site is available via the web 'time machine' and nobody cares, why is it a problem to scrape the site and import the posts? What's the real difference?
There is freeware available that will download an entire domain. I used to use it to track changes on a news website (they updated stories without updating the timestamp).
Personally, I'd just go for it and remove any 'infractions' upon request. If nobody complains, then there isn't a problem.