Not sure if interest in this topic has waned given the lack of responses, but I had a few more thoughts that I think might be useful.
I'll try to keep it short and snappy (although that is not my forte!)
There are two options that have been considered (although I'll come to a possible third at the end of this post): 1- creating & hosting a replica of the TMF boards & structure, and 2- using the Wayback Machine (I kept calling it the time-machine in my last post - doh) to provide a view of how the boards were at date X. I'll take these in turn.
1- creating & hosting a replica of the TMF boards & structure
I had missed stooz's post on this topic from 15-Nov :
stooz wrote: A real benefit would be parts about directors dealings in shares for example. An historical tail as a guide to intended investments.
However we have been in discussions, lengthy and time consuming. We are looking at 40gb of data, many hours of conversion work, on going costs, continued legal protection costs and overall a bill over 5 figures... So don't get your hopes up.
So to cover the issues stooz raises one by one :
> We are looking at 40gb of data -
In fact with efficient data storage (not including text compression, which might save more) it will actually be about 8gb - which is probably more manageable.
> many hours of conversion work -
DONE!
> on going costs, -
Basically just the costs of hosting - not a massive financial amount I would think. As I don't know the future of my hosted sites at this stage, it is not something I can offer currently.
> continued legal protection costs -
Presumably this is by way of liability insurance in case any comment is called out as libel etc? Not sure how much this would be, but presumably it would diminish over time as the posts age?
> and overall a bill over 5 figures -
Not sure what those costs are? The insurance and ongoing hosting would be the only ones actually required.
It seems to me that this option is highly feasible. I could personally deliver a complete website that mirrors the functionality of the TMF boards (including reinstating Best of Boards (with date filters), which is currently not available in the read-only version - plus maybe some of the other features that I recall have been on people's wish lists in the past).
However, as I mentioned above, I'm not currently prepared to take on hosting this on an ongoing basis, nor the "legal cover"; but if someone else were prepared to lead on this and has a hosting package that includes PHP & MySQL, I'd be happy to deliver the data and the techy stuff.
{Favourite Fools & personal recommendations would not be included and they are not available now, but potentially a new solution could allow users on the new platform to select their favourite Fools and posts to aid navigation}
2. Wayback Machine
I had never previously considered how this works, but I now think that it works on "static" webpages and records the returned page content for each of its list of URLs.
This is problematic for a data-driven site like the Fool, as multiple URLs can result in the same content being presented, so each page view would have to be stored multiple times, and I am pretty sure that this would be seriously beyond the scope of what they do.
I could write reams on this issue, but I think one issue will suffice to illustrate the problem :
When you look at a list of posts on a board the url is (for eg) :
http://boards.fool.co.uk/investment-strategies-.aspx?mid=13454393&sort=postdate
Where the blue text is the id of the board, the red text (the mid value) is the post number that will appear at the top of the list and the green text (the sort value) is the sort order.
So for each board it would be required to store at least the number of posts x the number of sort options as individual archived web-pages.
Then there is threaded vs unthreaded & a number of additional options besides - it's definitely a finite number, but I can't begin to imagine how big.
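To give a feel for how quickly that multiplies up, here is a back-of-envelope sketch in Python. The board size and the number of sort options are made-up placeholders, not real TMF figures, and the function name is just for illustration.

```python
def min_list_pages(posts_on_board: int, sort_options: int, view_modes: int = 2) -> int:
    """Lower bound on the distinct list-view URLs for a single board.

    Any post can sit at the top of the list (the mid value), under any
    sort order, in either threaded or unthreaded view (view_modes=2).
    Further URL options would only push the figure higher.
    """
    return posts_on_board * sort_options * view_modes


# Hypothetical board: 50,000 posts and 3 sort orders (placeholder numbers).
print(min_list_pages(50_000, 3))
```

And that is one board only; the same sum would have to be repeated for every board on the site.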
BUT The point has been made that you can address each individual post on the boards by using the format :
http://boards.fool.co.uk/Message.aspx?mid=13419738 where 13419738 can be any integer in the range 5667966 to 13460917.
That's c. 7.8m individual pages to ask the waybackmachine to archive - I guess that would be manageable?
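As a rough sketch of what producing that list would involve, the Python below generates one Message.aspx URL per id. It takes the range as 5667966 to 13460917, which is the range that gives c. 7.8m ids and contains the example id above; the function names and the output file are only illustrative. Not every id will necessarily be a live post, so the real list would be somewhat shorter.

```python
from typing import Iterator

BASE = "http://boards.fool.co.uk/Message.aspx?mid={}"
FIRST_ID = 5667966
LAST_ID = 13460917  # assumption: the upper bound implied by the c. 7.8m figure


def message_urls(first: int = FIRST_ID, last: int = LAST_ID) -> Iterator[str]:
    """Yield one Message.aspx URL per id, lazily, so nothing is held in memory."""
    for mid in range(first, last + 1):
        yield BASE.format(mid)


def write_url_list(path: str, first: int = FIRST_ID, last: int = LAST_ID) -> int:
    """Write the URLs to a text file, one per line, and return how many were written."""
    count = 0
    with open(path, "w", encoding="ascii") as fh:
        for url in message_urls(first, last):
            fh.write(url + "\n")
            count += 1
    return count


# Number of candidate ids in the range (inclusive at both ends).
print(LAST_ID - FIRST_ID + 1)
```

Calling write_url_list("tmf_urls.txt") (a hypothetical filename) would produce the full list in a form that could be handed to whatever does the archiving.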
What I also found though is that the url above actually produces a redirect code to the "actual" address of the webpage in the format
http://boards.fool.co.uk/i-am-not-a-fan ... 19738.aspx
where the text bit appears to be the first few words/characters of the post content and the number at the end is the message id as per the earlier format.
If needed, I could provide a list of the (slightly fewer than 7.8m) urls in this format.
By capturing those, all of the CONTENT of the TMF boards could be preserved in perpetuity - however this would not preserve the functionality. Clicking on any (or at least most) of the links on a given post would result in a URL that had not been preserved.
So this is where I come to the third possible option, inevitably :
3. A HYBRID
My first option above - the "mirror site" - essentially has 2 components: (1) the post headers and all the (not too) clever code that deals with the navigation [about 1Gb of the data] and (2) the actual content of each post, flat text including html commands [7Gb of the data].
So IF the Wayback Machine were able to preserve the 7.8m (slightly less) individual posts, then a hosted navigation service could easily allow navigation around the site. (I could even translate any links embedded in the post contents.)
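To illustrate the idea, here is a minimal Python sketch of the two bits the navigation service would need: turning a message id into an archived address, and translating links inside a post. It relies on the Wayback Machine's usual address pattern of web.archive.org/web/<timestamp>/<original url>; the snapshot timestamp is a made-up placeholder, the function names are hypothetical, and only links in the Message.aspx?mid= format are handled.

```python
import re

WAYBACK_PREFIX = "https://web.archive.org/web"
MESSAGE_URL = "http://boards.fool.co.uk/Message.aspx?mid={}"
SNAPSHOT = "20161201000000"  # placeholder YYYYMMDDhhmmss, not a real capture date

# Matches links to individual posts in the Message.aspx?mid=<id> format only.
LINK_RE = re.compile(r"http://boards\.fool\.co\.uk/Message\.aspx\?mid=(\d+)")


def wayback_url(mid: int, timestamp: str = SNAPSHOT) -> str:
    """Build the archived address for one post from its message id."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{MESSAGE_URL.format(mid)}"


def rewrite_links(post_html: str, timestamp: str = SNAPSHOT) -> str:
    """Point every Message.aspx link in a post at its archived copy instead."""
    return LINK_RE.sub(lambda m: wayback_url(int(m.group(1)), timestamp), post_html)


print(wayback_url(13419738))
```

The hosted side would then only need the post headers (board, thread, author, date, message id) to build its lists, with each entry linking out via wayback_url. Links in the other formats (board lists, the redirected "text" addresses) would need their own patterns.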
I wonder if this would also address the legal liability point? The hosted site would only be addressing content published by another provider - the Wayback Machine - and presumably they already have some mechanism protecting themselves from any archived comment that may be considered libelous?
Anyway I clearly failed on the brevity point, but hopefully this may have given a possible new direction to this?
Cheers,
Gromley