Archiving TMF boards

Formerly "Lemon Fool - Improve the Recipe" repurposed as Room 102 (see above).
mc2fool
Lemon Half
Posts: 8089
Joined: November 4th, 2016, 11:24 am
Has thanked: 7 times
Been thanked: 3130 times

Archiving TMF boards

#1637

Postby mc2fool » November 7th, 2016, 12:27 pm

I've just noticed something about the WayBack Machine (https://archive.org/) that may give us a way of archiving most or all of the TMF boards.

If you try to go to a TMF post on the WayBack Machine that it doesn't have, it gives you the option to "Help make the Wayback Machine more complete! Save this url in the Wayback Machine". OK, so you click to do so and it does that. Fine.

However, if you then navigate from that post, by e.g. using Prev Thread | Prev | Next | Next Thread, it seems to go into auto-save mode, saving in the WayBack Machine every post you navigate to. Now, if that's true (and I've only made a cursory test), it means we can get large amounts of the TMF boards archived into the WayBack Machine by simply getting a few volunteers for each board to get into that mode on the WayBack Machine and blindly repeatedly clicking Next or Next Thread while watching the tele.

One of the challenges is to figure out how that'd work with TMF's different ways of viewing posts (threaded, unthreaded, whole thread) and different ways of addressing posts (text urls, message no urls, etc) and the ways of viewing the list of posts in a board (threaded, unthreaded, collapsed). Clearly saving whole threads would be a lot less clicking (of Prev Thread | Next Thread) than each individual post (Prev | Next) for the volunteers.

So, what seems to be needed is to work out what works and is practical, and then post instructions telling folks what to do and soliciting volunteers and, in the end, put up a sticky post in each forum that says "For the archived TMF board, start at <url-on-wayback-machine>".

I know it's a bit cheeky to come up with an idea and then ask others to do the work for it, but unfortunately I'm quite short of free time at the moment. So, it'd be great if somebody would be willing to check out the first part, "work out what works and is practical", so we can, hopefully, see if there's a way of getting the old boards archived on the WayBack Machine....
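For anyone who would rather script the clicking than do it by hand, here is a minimal sketch of the same idea. It assumes the Wayback Machine's "save" address is the page URL appended to https://web.archive.org/save/ and that its availability lookup lives at https://archive.org/wayback/available; the board URL in the usage line is made up for illustration, and the delay is a guess at what counts as polite. It defaults to a dry run, so nothing is fetched unless you ask for it.

```python
import time
import urllib.parse
import urllib.request

# Assumed public Wayback Machine endpoints (check before relying on them).
SAVE_PREFIX = "https://web.archive.org/save/"
AVAILABILITY_API = "https://archive.org/wayback/available"


def save_url(page_url: str) -> str:
    """Build the address that asks the Wayback Machine to capture page_url."""
    return SAVE_PREFIX + page_url


def availability_url(page_url: str) -> str:
    """Build the address that reports whether page_url is already archived."""
    return AVAILABILITY_API + "?" + urllib.parse.urlencode({"url": page_url})


def request_captures(page_urls, delay_seconds=15.0, dry_run=True):
    """Request one capture per URL, pausing between requests.

    With dry_run=True nothing is fetched; the save addresses are only returned.
    """
    requested = []
    for index, page_url in enumerate(page_urls):
        target = save_url(page_url)
        if not dry_run:
            if index > 0:
                time.sleep(delay_seconds)  # keep the load on archive.org light
            with urllib.request.urlopen(target, timeout=60) as response:
                response.read()
        requested.append(target)
    return requested


# Hypothetical post URL, for illustration only.
print(request_captures(["http://boards.fool.co.uk/example-post.aspx"]))
```

The list of post URLs still has to come from somewhere, which is the "work out what works and is practical" part: whole-thread URLs would mean far fewer requests than one per post.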

seekingbalance
2 Lemon pips
Posts: 163
Joined: November 7th, 2016, 11:14 am
Has thanked: 16 times
Been thanked: 66 times

Re: Archiving TMF boards

#1641

Postby seekingbalance » November 7th, 2016, 12:34 pm

Great post! Now hopefully people can see how useful recs are!

Have a rec

SB

seekingbalance
2 Lemon pips
Posts: 163
Joined: November 7th, 2016, 11:14 am
Has thanked: 16 times
Been thanked: 66 times

Re: Archiving TMF boards

#1644

Postby seekingbalance » November 7th, 2016, 12:37 pm

I'll look into this when I get home, but I wonder whether it is possible to request an archive of the entire site. Well, the forums at least, as the rest is pretty useless.

Gengulphus
Lemon Quarter
Posts: 4255
Joined: November 4th, 2016, 1:17 am
Been thanked: 2631 times

Re: Archiving TMF boards

#1696

Postby Gengulphus » November 7th, 2016, 2:04 pm

I'd be cautious about this idea. Many sites will have policies against misuse that essentially amount to saying "we'll take action against attempts to use up lots of our resources on things that aren't really in accordance with our aims". I haven't investigated the WayBack Machine's policies or aims, but large-scale 'blind' archiving (as opposed to archiving of things people have decided they want to be able to look at again) might fall foul of something like that...

Gengulphus

seekingbalance
2 Lemon pips
Posts: 163
Joined: November 7th, 2016, 11:14 am
Has thanked: 16 times
Been thanked: 66 times

Re: Archiving TMF boards

#1703

Postby seekingbalance » November 7th, 2016, 2:19 pm

Yep, I agree with that point. Looking at the site, however, it does not appear to say that (though as a US-based site it does have a lot of legalese). As the site has a stated aim to archive any and all "historically important" information, and as it already has an extensive but not exhaustive archive of the Fool, I think it is reasonable to assume they mean it when they say that if you can't find a page you want just add it, so we can certainly do that.

As I said I would, above, I have had a look at the site, especially the help and FAQ sections. It isn't obvious whether you can ask them to do a specific crawl, so I have emailed them to ask.

Don't ask, don't get!

We'll see what they come back with. If they say it is possible, maybe we should wait until final archive day from the Fool and get it done then.

SB

Gengulphus
Lemon Quarter
Posts: 4255
Joined: November 4th, 2016, 1:17 am
Been thanked: 2631 times

Re: Archiving TMF boards

#1812

Postby Gengulphus » November 7th, 2016, 5:36 pm

Sorry, SB, when I said "this idea", I was actually referring to the idea in mc2fool's OP of human volunteers doing the crawling - I'd not quite understood that what you'd said in reply was about the WayBack Machine doing it. That sounds an excellent suggestion to me if they'll do it, and it's certainly worth asking - the worst they can do is say no!

Gengulphus

88V8
Lemon Half
Posts: 5965
Joined: November 4th, 2016, 11:22 am
Has thanked: 4328 times
Been thanked: 2676 times

Re: Archiving TMF boards

#1903

Postby 88V8 » November 7th, 2016, 8:41 pm

Surely, the TMF fora are a significant social history snapshot, and uniquely worth saving.

I would be willing to help with this.

V8

nigelpm
Lemon Pip
Posts: 61
Joined: November 4th, 2016, 7:10 pm
Has thanked: 1 time
Been thanked: 11 times

Re: Archiving TMF boards

#1912

Postby nigelpm » November 7th, 2016, 8:51 pm

I would imagine given how "nice" the TMF structure/web address was there must be a way of extracting every post through a script into a csv file.

Then it might be possible to create an archive board on Lemon - and extract into there?

TMF have given their ok on use of the content.

Must be do-able - I wouldn't know how myself, but I know folk who have done this kind of exercise in the past.
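As a rough illustration of the "into a csv file" step only, here is a minimal sketch that assumes the posts have already been fetched and parsed into dictionaries. The column names and the sample posts are invented for the example; the real TMF data may not split up this neatly.

```python
import csv
import io

# Hypothetical column layout; not taken from the real TMF data.
FIELDS = ["board", "post_id", "author", "posted", "subject", "body"]


def write_posts_csv(posts, out):
    """Write an iterable of post dictionaries to an open text file as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for post in posts:
        writer.writerow(post)


# Made-up sample posts; commas and line breaks in the body are quoted safely.
sample = [
    {"board": "Example Board", "post_id": "1", "author": "someone",
     "posted": "2016-11-07", "subject": "First post",
     "body": "Line one, with a comma\nLine two"},
    {"board": "Example Board", "post_id": "2", "author": "someone else",
     "posted": "2016-11-08", "subject": "Re: First post",
     "body": 'A reply with "quotes" in it'},
]

buffer = io.StringIO()
write_posts_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, open it with `open("posts.csv", "w", newline="", encoding="utf-8")` and pass that in as `out`.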

uspaul666
2 Lemon pips
Posts: 233
Joined: November 4th, 2016, 6:35 am
Has thanked: 196 times
Been thanked: 112 times

Re: Archiving TMF boards

#1923

Postby uspaul666 » November 7th, 2016, 9:10 pm

Doable, but they hinted it was large and "raw".
Guess at about 400 boards, 5000 posts each, each 10 lines of 80 characters. That's 1.6 GB. I think that's a conservative estimate. It's probably in some kind of weird format that is only readable by some really old software. It might take several days for a regular PC to chew its way through all that data. If it's going to be processed in the cloud then there may be considerable bandwidth costs connected with uploading, downloading and transferring it all, plus it might be necessary to repeat the whole procedure several times until the conversion logic is perfected.
Doable but not trivial. I'd find it a fun and worthwhile project but it'd be hard work.
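The back-of-envelope sum can be checked in a few lines. This is only a sketch of the arithmetic, assuming one byte per character and the guessed counts above; none of the figures are measured.

```python
# Guessed figures, not measurements.
boards = 400
posts_per_board = 5000
lines_per_post = 10
chars_per_line = 80  # assuming one byte per character

total_posts = boards * posts_per_board
total_bytes = total_posts * lines_per_post * chars_per_line
total_gb = total_bytes / 1e9  # decimal gigabytes

print(f"{total_posts:,} posts, about {total_gb:.1f} GB")
```

That comes to two million posts at 800 bytes each, before any markup, headers or database overhead in the raw format.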

wickham
Lemon Slice
Posts: 363
Joined: November 6th, 2016, 8:13 am
Has thanked: 34 times
Been thanked: 10 times

Re: Archiving TMF boards

#1927

Postby wickham » November 7th, 2016, 9:19 pm

I wonder if it would be possible to take over the domain name and hosting service TMF UK uses and give a guarantee to TMF that the forum will always be Read Only, so then Lemon Fool would carry on paying for the domain and hosting and just put a link to the archive on this forum. As long as no new members or posts are added, the software would probably carry on working for a while if hackers couldn't get in, but crashes might be a problem.

I administer a forum which has done just that, but both the old and new forums are owned by the same club.

melonfool
Lemon Quarter
Posts: 2939
Joined: November 4th, 2016, 11:18 am
Has thanked: 1365 times
Been thanked: 794 times

Re: Archiving TMF boards

#1933

Postby melonfool » November 7th, 2016, 9:29 pm

TMF Tarantula has said that TMF will help with this:

"Would it be possible to export Board content to another system? Or licence it for 3rd party use?

Yes, either of those may be possible, but there is a very large amount of data and it's in a relatively 'raw' format, so it may be difficult to import it gracefully into another system. However, we could potentially pursue this option if a viable solution can be found."

http://boards.fool.co.uk/hi-everyone-ra ... 58233.aspx

Mel

nigelpm
Lemon Pip
Posts: 61
Joined: November 4th, 2016, 7:10 pm
Has thanked: 1 time
Been thanked: 11 times

Re: Archiving TMF boards

#1940

Postby nigelpm » November 7th, 2016, 9:41 pm

Sounds like a nice challenge for a student or someone as a uni project - they'd lap it up.

1.6 GB - peanuts by today's standards.

88V8
Lemon Half
Posts: 5965
Joined: November 4th, 2016, 11:22 am
Has thanked: 4328 times
Been thanked: 2676 times

Re: Archiving TMF boards

#2148

Postby 88V8 » November 8th, 2016, 12:33 pm

I just scraped this from TMF... perhaps he/she - RandomAmbler will post it over here. They've joined but not yet posted.

If it's of interest to anyone else I have knocked up a tool to download all of an author's posts on a particular board. The idea is that I can archive my posts before they are lost forever (whether they're worth keeping is another matter!) and it's a lot easier to do this with a tool than by hand.

At the moment I've only used this on Chrome and I know that it doesn't work on IE (yet!).

Anyway the tool is here: https://damiancannon.github.io/MotleyFoolDownloader/

Hopefully the instructions are clear enough. If anyone has any questions, feedback or problems then feel free to let me know.

NB when used for the first time Chrome will complain that you're trying to download too many files at once and will, eventually, show a dialog asking if you want to allow this - click yes to complete the downloading.


see #4541
http://boards.fool.co.uk/closure-of-the ... e#13458561

V8

modellingman
Lemon Slice
Posts: 638
Joined: November 4th, 2016, 3:46 pm
Has thanked: 625 times
Been thanked: 377 times

Re: Archiving TMF boards

#2178

Postby modellingman » November 8th, 2016, 1:32 pm

melonfool wrote:TMF Tarantula has said that TMF will help with this:

"Would it be possible to export Board content to another system? Or licence it for 3rd party use?

Yes, either of those may be possible, but there is a very large amount of data and it's in a relatively 'raw' format, so it may be difficult to import it gracefully into another system. However, we could potentially pursue this option if a viable solution can be found."

http://boards.fool.co.uk/hi-everyone-ra ... 58233.aspx

Mel


I'm glad this is starting to get noticed.

I've just written about what I think the implications of this might be here: viewtopic.php?f=21&t=197#p2164

RandomAmbler
Posts: 8
Joined: November 4th, 2016, 11:00 am

Re: Archiving TMF boards

#2211

Postby RandomAmbler » November 8th, 2016, 2:28 pm

88V8 wrote:I just scraped this from TMF... perhaps he/she - RandomAmbler will post it over here. They've joined but not yet posted.

If it's of interest to anyone else I have knocked up a tool to download all of an author's posts on a particular board. The idea is that I can archive my posts before they are lost forever (whether they're worth keeping is another matter!) and it's a lot easier to do this with a tool than by hand.

At the moment I've only used this on Chrome and I know that it doesn't work on IE (yet!).

Anyway the tool is here: https://damiancannon.github.io/MotleyFoolDownloader/

Hopefully the instructions are clear enough. If anyone has any questions, feedback or problems then feel free to let me know.

NB when used for the first time Chrome will complain that you're trying to download too many files at once and will, eventually, show a dialog asking if you want to allow this - click yes to complete the downloading.


see #4541
http://boards.fool.co.uk/closure-of-the ... e#13458561

V8


Hi V8,

Yes indeed I'm here and available for questions. The download tool appears to be working at the moment but this is in just a single environment (Chrome on Windows); I'm quite sure that there will be some issues on other environments as I only hacked this together on Sunday!

That said while I've limited the tool to download just the posts from a single user on a single board at the moment it would be feasible to extract all of the posts from any particular board - although we wouldn't want everyone doing it! Probably best to see if a data dump of the entire forum can be obtained first though.

Feel free to send me any questions/issues and I'll do my best to help.

Damian

seekingbalance
2 Lemon pips
Posts: 163
Joined: November 7th, 2016, 11:14 am
Has thanked: 16 times
Been thanked: 66 times

Re: Archiving TMF boards

#4446

Postby seekingbalance » November 13th, 2016, 12:21 pm

Update - I emailed the Wayback machine people asking if they have a process for a manual crawl for the entire boards section of the site. A week later, no reply from them!

I will try again tomorrow.

SB

mc2fool
Lemon Half
Posts: 8089
Joined: November 4th, 2016, 11:24 am
Has thanked: 7 times
Been thanked: 3130 times

Re: Archiving TMF boards

#4456

Postby mc2fool » November 13th, 2016, 12:37 pm

seekingbalance wrote:Update - I emailed the Wayback machine people asking if they have a process for a manual crawl for the entire boards section of the site. A week later, no reply from them!

I will try again tomorrow.

SB

They do have forums and one of those may be of utility (sorry, once again I'm short of time to research it myself :( )

https://archive.org/iathreads/forums.php

TawnyOwl
Posts: 20
Joined: November 11th, 2016, 1:24 pm
Been thanked: 4 times

Re: Archiving TMF boards

#5243

Postby TawnyOwl » November 15th, 2016, 1:47 pm

I cannot see any reason why saving the old TMF boards is worthwhile. Basically nobody reads old posts, it's a waste of time and effort.

Nowadays there are so many sites, so much stuff, out there that nobody can ever keep up with it. I used to post on many boards on many sites but have basically given them all up because it's like posting into a black hole. I'm still here simply because I have an interest in investment and might find or contribute something useful.

Tawny

mc2fool
Lemon Half
Posts: 8089
Joined: November 4th, 2016, 11:24 am
Has thanked: 7 times
Been thanked: 3130 times

Re: Archiving TMF boards

#5268

Postby mc2fool » November 15th, 2016, 3:09 pm

TawnyOwl wrote:I cannot see any reason why saving the old TMF boards is worthwhile. Basically nobody reads old posts, it's a waste of time and effort.

That may be true for some boards but for others, e.g. the Pensions - Practical Problems board, folks regularly refer -- and link -- back to previous detailed explanations of complex matters and their discussions, so I must disagree with the assertion that "nobody" reads old posts and it's a waste of time and effort.

poundcoin
Lemon Slice
Posts: 313
Joined: November 4th, 2016, 6:00 pm
Has thanked: 67 times
Been thanked: 44 times

Re: Archiving TMF boards

#5270

Postby poundcoin » November 15th, 2016, 3:16 pm

Unlike most others, there were only a couple of threads I initiated on TMF that are worth me saving.

I viewed them in whole thread mode, highlighted, copied and pasted into an email and sent it to myself. As long as my yahoo account doesn't get hijacked it should be there for ever.

It seems to view in the exact same format as the TMF page and the external URLs all work.

I realise that this method would be tedious for those wanting to save hundreds of posts.

