
Creating Fortean Times In Electronic Format

BulgariasGhost

Hello all, the subject of creating Fortean Times in electronic format has been discussed on a few occasions in the past with particular interest in getting back issues available either for general interest or as a searchable database of some kind.

The copyright clearly belongs to the magazine but I know they are quite stretched to keep the current version of the magazine going so I am proposing ( to them and to the user base ) that we pool our resources to create it for them.

I will speak to the management to see if I can get official approval for this, with the understanding that they will own the end product.

One thing we would need to sort out before scanning a single page to OCR ( ie turning a magazine page into text and photos ) is the way we are going to organise the material. Scanned pages are useless unless they are turned into text, proof read and organized in a way that is readable and which can be indexed.

I would like to hear your thoughts on how to do this, but my initial thoughts are:

1 - Scan in magazines and save them as TIFF files ( multi page, high resolution pictures ). This gives the chance for other people to proof read the resultant text against the original without having to pass around rare back issues.

2 - leave out all the adverts, back issue subscription offers etc and only spend time on the articles / letters and photos.

3 - create one file per article with a name of FT101-P43-UFO where it is issue 101, page 43 and about a general subject of UFOs. More detail on the indexing is to follow.

4 - save photos in the same format, and where there is more than one photo for the same article, just add a number to the end of the name.

5 - indexing. This is the tricky bit as I know there are plenty of knowledge-based systems out there that can do wonders with a load of raw data like this. What I would like to see is a recreation of something like the FT issue index that exists for the first 60 or so issues ( a book covering more may exist ). If I can find the original book with the index then that would be a great start to the project.
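As a rough illustration of the naming scheme in points 3 and 4, here is a minimal Python sketch that builds and parses names such as FT101-P43-UFO. The hyphen before the photo number, the letters-only subject and the function names are assumptions made for the example, not something agreed in the thread.

```python
import re
from typing import NamedTuple, Optional


class ArticleName(NamedTuple):
    issue: int
    page: int
    subject: str
    photo: Optional[int]  # None for the article text, 1, 2, ... for photos


# Assumed pattern: FT<issue>-P<page>-<SUBJECT>, optionally followed by -<n>.
_NAME = re.compile(r"^FT(\d+)-P(\d+)-([A-Za-z]+)(?:-(\d+))?$")


def build_name(issue: int, page: int, subject: str,
               photo: Optional[int] = None) -> str:
    """Return a file name (without extension) for an article or its photo."""
    base = f"FT{issue}-P{page}-{subject}"
    return base if photo is None else f"{base}-{photo}"


def parse_name(name: str) -> ArticleName:
    """Split a file name back into issue, page, subject and photo number."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised article name: {name!r}")
    issue, page, subject, photo = match.groups()
    return ArticleName(int(issue), int(page), subject,
                       int(photo) if photo else None)
```

Being able to parse the names back means a first browseable index (by issue, page and subject) could be generated from a directory listing alone, before any database exists.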

Once complete then I would like the project to be able to generate income for FT, whether by selling the data on CDs in blocks of 25 issues or whatever ( a big saving on printing costs for them as they only need a CD burner and CD labels from their local WH Smiths ). Think of almost 200 back issues - that is 8 CDs which could sell at around 8 quid each. A nice little earner for FT since CDs are 20p each or so. Not to mention the benefit to the Fortean community when it comes to research...
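The back-of-envelope sums above can be checked with a small Python sketch. The figures are the ones quoted in the post (200 issues, 25 per CD, 8 quid per CD, 20p per blank disc); the function name is made up for the example, and it ignores labels, postage and anyone's time.

```python
import math


def cd_set_estimate(issues: int, issues_per_cd: int,
                    price_pence: int, media_cost_pence: int) -> dict:
    """Estimate disc count, full-set price and blank-media cost (in pence)."""
    discs = math.ceil(issues / issues_per_cd)
    return {
        "discs": discs,
        "set_price_pence": discs * price_pence,
        "media_cost_pence": discs * media_cost_pence,
    }


# Figures from the post: 200 issues, 25 per CD, £8 per CD, 20p per blank CD.
estimate = cd_set_estimate(200, 25, 800, 20)
print(estimate)
```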

The feedback I would like from you is how we should organise the data, especially for an index. Please think through any response for viability so we can create something Charles Fort would be proud of :)

I suspect the can of worms is just opening now...

thanks

Iain
 
I'd buy them.
I think there could be problems with the issue of paying contributors. I seem to recall other projects like this having trouble due to this issue.
 
Iain: I'm very much in favour of this (and I suspect I have said so before on a number of occasions).

My suggested solution would be to provide a full text PDF: this looks like the actual magazine page but would also have the text selectable and searchable.

When using Adobe Acrobat you can get a plugin which does the work for you when you scan in the page. You can now create PDFs using a variety of different tools (a lot of them free) but I'd need to consult the print experts I know about what would work in these specific circumstances.

Also if you are contacting the management, ask them what they use for the print setup. It is possible that whatever they use (like Quark?) can be set up to export the issues as PDFs, which would really speed things up (again, if they aren't sure whether the software can do that then I can consult some people), although it will require a concerted effort to get the older issues done.

These would be hefty files and I would suggest that the costs could be covered by:

1. Making them available for subscribers as an added extra to their subscription (no one need ever then wait for their issue even if they have to wait for the paper copy) and they would also get access to the back issues. This should help increase subscriber numbers in the long run (evidence suggests the more value a website adds to the paper publication the more subscriptions).

2. Offer a purely electronic subscription - handy for places with trouble getting access to the paper copy.

3. Offer each individual issue for a quid or so - if someone is researching something and needs to get that issue but doesn't want to subscribe then they can do.

While we are on this subject I'd also like to raise the issue of Fortean Studies. I believe the magazine owns the full rights to them now and it would make sense to make them available again at the same time. As I've mentioned in another thread I was talking to a guy who works for a print on demand company (about another project I was working on) and if you have the PDFs you can then hook up with a PoD company who can produce very short runs of books to order so smallish shops and companies could start selling paper copies again (they could also be offered through the store here).

-------------
There is one big problem people tend to have with this, and that is that once you start making things available electronically they are open to file swapping. The flip side is that the magazine gets a much wider distribution, which is not too likely to lose many sales and may ultimately increase the number of subscribers (esp. if you offered an electronic version). You can also have a covering page saying - if you enjoyed this then think about going to the FT site and making a small donation (set up a PayPal account or something like that and people can simply throw in a small amount). I have talked this through with a number of people and although I doubt there are any good numbers on this it is generally considered that overall it might not have much impact or might actually result in a profit.

Also with FT specifically in mind, each issue is usually read by quite a few people, so you can consider it in that light - a number of those people will never buy their own issue or subscribe but a reasonable number will convert. I know I have converted at least two people to subscribers by letting them read through back issues - I was round at one of their houses last night and they have a rack over their toilet to keep their issues at close hand for easy reading access.

--------------------------------------
Anyway, if you haven't guessed, I'm 100% in favour of this idea (and have been for a bit) as it will both add value to current subscribers and greatly increase the audience and readers.

Emps
 
scanning settings

Hello there, I am looking for ideas on the best way to scan in old magazines so I can convert them to TIFF or PDF format, with the intention of converting many old, unavailable and hard to find journals of fortean interest.

Aside from the copyright issues ( which I will check in each case ) I want to find the most effective way to use a regular desktop scanner and a package such as Omnipage Pro to convert the file into a decent resolution scan plus use OCR to make the data available to search through, all as part of a larger project to create a general database of all things fortean.

Does anyone have practice in doing this sort of thing? I have tried a few times with varying degrees of success ( the OCR and PDF conversions are no problem ) and need some pointers on the best scan resolution to use to make the pictures useable without making the resultant scan hundreds of megabytes in size.

I guess I may have to turn the whole page into a HTML style document, but I wanted something a little simpler. Perhaps to get the pictures in high resolution, the whole page has to be, in which case it will take a little longer for me to scan these and save them individually.
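To put rough numbers on the resolution versus file size question, here is a minimal Python sketch that estimates the uncompressed size of one scanned page. It assumes an A4 page (about 8.27 x 11.69 inches) and that size is simply pixels times bits per pixel; real TIFF or PDF files will usually be smaller once compressed, so treat the results as an upper bound rather than a prediction.

```python
def uncompressed_scan_bytes(width_in: float, height_in: float, dpi: int,
                            bits_per_pixel: int = 24) -> int:
    """Approximate uncompressed size in bytes of one scanned page.

    bits_per_pixel: 24 for full colour, 8 for greyscale, 1 for black/white.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: A4, roughly 8.27 x 11.69 inches.
for dpi in (150, 300, 600):
    colour = uncompressed_scan_bytes(8.27, 11.69, dpi)
    mono = uncompressed_scan_bytes(8.27, 11.69, dpi, bits_per_pixel=1)
    print(f"{dpi} dpi: colour {colour / 1e6:.1f} MB, black/white {mono / 1e6:.1f} MB")
```

Doubling the resolution quadruples the pixel count, which is why scanning text pages at one setting and pictures at another is worth considering.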

I hope to avoid reinventing the wheel here, so any input is welcome.

thanks

Iain
 
In my real life, I am a consultant in Electronic Publishing. One of my recent jobs was to convert three million pages of the Royal Navy's technical manuals into Interactive Electronic Technical Publications (IETPs), and I've spent the last ten years in the "biz" so I've got a bit of know-how.

Ideally, the documents should be converted into text (OCR) and "tagged" using a metalanguage such as Standard Generalized Markup Language (SGML) or its younger, more trendy son, eXtensible Markup Language (XML). This will enable you to re-use any text, maintain cross-references and keep any management info you may need as metadata.

The information could be submitted into an on-line Content Management System (CMS) by volunteers working from home, and the software's free.

This method will probably give you the most future-proof and useable result.

Later FTs will already be in electronic format and will be more easily converted.
 
Yeah, what Arthur says.:confused:
 
escargot said:
Yeah, what Arthur says.:confused:

What I'm getting at Escargot, is that this will be a big undertaking, and we need to capture as much information (over and above the text and graphics) as we can.

The information held within past FTs is too precious to lose, and we'll probably only have one shot at it. Also, the amount of re-use that might be had by the denizens of Fortean Towers is considerable.

Adobe Acrobat is a proprietary product, and I've seen too many such products come and go leaving any documents created in them useless or needing more conversion. Once the information is captured in SGML or XML, it will NEVER become redundant and will be platform-independent.

Check out Microsoft Word 2003 pro (it's XML-enabled and may provide a means of submitting documents).
 
Thanks Arthur, I will look into this as a way of capturing the data once it has been manually checked after OCR. It will give us a chance to make an index much more accurate as any items which refer to a ghost sighting can be given an attribute of "ghost" rather than an article about UFOs which may say "we haven't a ghost of a chance to prove it happened", since both would show up on a basic search for the word ghost.
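The difference between a subject attribute and a basic word search can be sketched in a few lines of Python using the standard library's ElementTree. The element and attribute names (issue, article, subjects) and the sample text are invented for the example; they are not a proposed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each article carries a space-separated "subjects" attribute.
SAMPLE = """
<issue number="101">
  <article page="43" subjects="ghost">
    <title>The Haunted Stairs</title>
    <body>A ghost was seen on the stairs.</body>
  </article>
  <article page="50" subjects="ufo">
    <title>Lights Over the Moor</title>
    <body>We haven't a ghost of a chance to prove it happened.</body>
  </article>
</issue>
"""


def full_text_hits(root: ET.Element, word: str) -> list:
    """Pages of every article whose text contains the word anywhere."""
    return [a.get("page") for a in root.iter("article")
            if word.lower() in "".join(a.itertext()).lower()]


def subject_hits(root: ET.Element, subject: str) -> list:
    """Pages of articles explicitly tagged with the given subject."""
    return [a.get("page") for a in root.iter("article")
            if subject in a.get("subjects", "").split()]


root = ET.fromstring(SAMPLE.strip())
print(full_text_hits(root, "ghost"))  # both articles match
print(subject_hits(root, "ghost"))    # only the tagged article matches
```

The word search returns the UFO piece as well, while the attribute lookup returns only the article that was deliberately tagged as being about a ghost.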

Can you suggest which freeware/shareware packages are suitable for the XML conversions? Will MS Office XP have this facility?

I will also be throwing out some ideas on the indexing since we are not getting any response from the guys at FT (I appreciate they are plenty busy enough) as I think we will be able to provide a browseable index long before any kind of database is populated with the articles themselves.

I will also ask on another thread for ideas of other journals which are worth including in the database so I can try to locate the copyright owner and some copies to scan.

I get the feeling that I am about to volunteer away several years of my spare time here... :)

thanks

Iain
 
Arthur ASCII: Yep that is sort of my thinking on the matter - it is better to hold the actual text in a flexible format that allows us to export in various formats. My thinking behind going for the full text PDF is that the plugin does the OCR work and combines it with the scanned image so the text is selectable. We'd then also have the plain text available. My only concern about going for a purely text approach is that it is a magazine and part of it is visual but I'm sure we can

[edit: Ooooooops that'll teach me to multi-task ;)

come to the right solution. We should remember that one needs to design for the medium: one would expect more of a visual 'feast' from a magazine, but something aimed at web distribution should really be aiming for more stripped down content, i.e. text with the relevant pictures. As nice as some of the illustrations are, if it is a trade-off between large graphics which don't necessarily add any extra information to the piece and a larger file size then I'd vote every time for getting rid of the large illustration.

However, this is an issue we need to be very sure of as it will avoid wasted effort and it will need input from the editorial staff because we are deciding how their magazine will look in a format which is likely to get a greater circulation than the magazine]

There is a wide range of tools available to help with this - technically Notepad can be used to mark up the text. I have been sizing up things like the Apache FOP module, which allows XSL conversion of XML documents into a wide range of formats:

http://xml.apache.org/fop/

and I am really in favour of something like this as it future proofs the whole endeavour - if there were any need to change the original text we'd just be able to use XSL to move it across, and so something like this would be my favoured approach.

It's for reasons like this that it is best we take our time and get it right first time.

UncleBulgaria: I'll get the email off to you tonight as I kept thinking of other things to add ;)

Emps
 
Hello there Emps, I appreciate the need for the e-mag version to be well presented, but I thought it best to use all the original work of the editors and keep it as a scan for people to browse - that way the file never needs any kind of updating.
The data part - ie the text and pictures - can be kept in a simple format such as flat text (or marked up text ) and JPG pictures - this way the data can be used much more flexibly.

The second advantage to this is that a sellable electronic version of the scans of the magazine can be made available at the start of the project ( since we need it to extract the data from ) and this can be sold by FT to the public to fund their ongoing publication - something that has been mooted as being difficult to do recently.

There is a lot of demand for back issues ( as is evident by the prices reached on ebay for compilation books ) so why not meet that demand and make some money on it while spending minimal effort.

Perhaps other things are in the pipeline - but since I'm hearing nothing from the staff it is hard to find out if the effort will be wasted.

thanks

Iain
 
Regarding paying contributors, I believe for most articles they would be paid once and the article would then be property of FT.

What might be more difficult is the paying the publishers (John Brown/IFG/Dennis).
 
Storing the information in XML format does not preclude the inclusion of graphics. In fact, extra information can be stored with each graphic, making them a more useful resource.

Many of the early FTs had rather dire B/W graphics. Are the originals available for scanning?

Can you suggest which freeware/shareware packages are suitable for the XML conversions? Will MS Office XP have this facility?

Office XP Pro is XML-ready "out of the box" and comes complete with the widely used "Docbook" schema, but the conversion process should be a matter for more extended discussion.

I strongly urge you NOT to go down the scanned page route. The work will still need to be done again sometime in the future, and as Emps rightly says, a decent stylesheet or two will enable the information to be output in ANY format (paper-based, web-based, WAP etc).
 
Arthur ASCII said:
I strongly urge you NOT to go down the scanned page route. The work will still need to be done again sometime in the future, and as Emps rightly says, a decent stylesheet or two will enable the information to be output in ANY format (paper-based, web-based, WAP etc).

While I'm certainly in favour of some kind of stripped down XML (text/necessary images), this partly goes back to what I said about designing for the medium. We need to keep an eye on formats other than online and/or eBook (where file size is an issue), and it would also make sense to aim for a CD (or even DVD) collection of good quality scans of the magazine (and some kind of reader/browser, which can easily be done in something like Flash). This would give us a more flexible approach and both could be done at the same time - you'd have to scan the page in anyway when you run it through the OCR. This approach would mean that no fancy technology would be required beyond a scanner and it would give us nearly the usability of a full text PDF with much greater flexibility.

We will also have to make a decision about image format. Given the latest problems with JPEG and the ongoing legal issues over GIF I really think we need to store the images as something like TIFF but it would probably be best to distribute them as PNG. It's designed to be patent free and offers lossless compression - I'd personally go for this but there are some browser support issues (although this is largely over the opacity support).

See:

http://www.w3.org/Graphics/PNG/

Emps
 
There is likely to be a problem with copyright, as it happens.

For features and illustrations, FT was never rich enough to buy unlimited reprint rights - just one-time only publication plus syndication rights (each author / artist receiving a percentage of the reprint fee).

Illos and photos are a major chunk of the editorial budget, and we can only afford to buy one-time print rights. Any photo reprint (including web use) would have to be paid for again. You might find that nice people like Janet Bord at the Fortean Picture Library, or the Mary Evans Picture Library, would let you use their pictures for the convenience of setting up a hyperlink to their own websites, but you would have to ask them about that. News agencies and professional photographers do not seem amenable to that kind of arrangement, the last time we tried to resolve the problem for putting features up on the website.

I've no idea how these problems were resolved for the JBP reprint volumes (you would have to ask Paul or Bob), but web usage is an entirely different kettle of wombats.
 
in my small experience with writing for magazines it's usual to be paid for "first serial rights"..so they get the right to print it once..that doesn't stop a couple of things of mine turning up on the net without permission tho....
 
Owen whiteoak said:
There is likely to be a problem with copyright, as it happens.

For features and illustrations, FT was never rich enough to buy unlimited reprint rights - just one-time only publication plus syndication rights (each author / artist receiving a percentage of the reprint fee).

Thanks for the reply. I suspect there might be issues but just to clarify (to avoid misunderstanding - largely I suspect mine but.....) are we just talking about the graphics, illustrations, etc. or the actual text? I assume the pieces that are written 'in house' are under your copyright but what about the forum and article pieces?

Emps
 
Emperor said:
Thanks for the reply. I suspect there might be issues but just to clarify (to avoid misunderstanding - largely I suspect mine but.....) are we just talking about the graphics, illustrations, etc. or the actual text? I assume the pieces that are written 'in house' are under your copyright but what about the forum and article pieces?

Emps

The text for most of Strange Days is written In-House (i.e. by Paul), but the Strange Days columns, Features, Forum pieces, Traveller etc. are all commissioned. All illustrations are commissioned and photos are bought in from agencies.
 
Hmm, it sounds like there is quite a patchwork of things we can use and cannot (at least not yet). I wonder if it would be practical to continue with the plan to produce an index, and for some stalwart individual to contact some of the major contributors (who can be identified from the index) to see if we can get permission for the reproduction of the articles, or at least establish the cost of such a reproduction.

I don't want to place any more workload on the staff at FT than is absolutely necessary, so it will fall to volunteers.

This will need more planning than I thought.

Owen - do you know if FT are changing the article reproduction rights for the future? Also, do you know how far back you have the magazines currently available in electronic format ( ie with text available without OCR )?

thanks

Iain
 
UncleBulgaria said:
Hmm, it sounds like there is quite a patchwork of things we can use and cannot (at least not yet). I wonder if it would be practical to continue with the plan to produce an index, and for some stalwart individual to contact some of the major contributors (who can be identified from the index) to see if we can get permission for the reproduction of the articles, or at least establish the cost of such a reproduction.

Yes the index does show that a lot of text is generated by a small number of people - Karl Shuker would be a good bet for contacting (I would assume Bob Rickard and Paul Sieveking would also be fine with this). I have found his Oliver article floating around on the Internet and I'm sure quite a few people would be cool with that as long as they got their own copy to use (saves them effort) and distribution in other formats was kept to a cost level (which was sort of the plan anyway wasn't it?). As you say it might be worth looking into seeing if this can't be added to the 'contract' when pieces are commissioned.

Where does this leave your efforts though? From what has been said the online versions of the articles that are on the site might need to be taken down until we can get the right agreements. Is that right?

Emps
 
UncleBulgaria said:
Hmm, it sounds like there is quite a patchwork of things we can use and cannot (at least not yet). I wonder if it would be practical to continue with the plan to produce an index, and for some stalwart individual to contact some of the major contributors (who can be identified from the index) to see if we can get permission for the reproduction of the articles, or at least establish the cost of such a reproduction.

I don't want to place any more workload on the staff at FT than is absolutely necessary, so it will fall to volunteers.

This will need more planning than I thought.

Owen - do you know if FT are changing the article reproduction rights for the future? Also, do you know how far back you have the magazines currently available in electronic format ( ie with text available without OCR )?

thanks

Iain

The trend has always been not to increase the editorial budget, or even to reduce it instead, so I can’t see FT being in a position to buy full rights. Issues in Quark Xpress format on CD-Rom go back to around FT113. It’s possible there might even be some earlier ones, but that’s what we can readily lay our hands on.
 
Owen whiteoak said:
Issues in Quark Xpress format on CD-Rom go back to around FT113. It’s possible there might even be some earlier ones, but that’s what we can readily lay our hands on.

Thanks for the information - that should prove handy for converting things as we progress.

Emps
 
Thanks Owen, that means that we know that quite a lot of the work for issues 113 on is already done.

I will work with Emperor and the other volunteers to complete an index ASAP so we have a starting point and where we have contributors with a large number of articles, we can approach them to see if they approve of their work being reproduced in this way.

What we can do in the meantime is produce scanned versions of the files to begin with - can we agree on a way to store the raw data before OCR etc is used to extract the text? I believe TIFF format allows high resolution, multi-page files that can be viewed on any Windows PC (the lowest common denominator). I use Omnipage Pro 11 to extract the OCR text and pictures but we may need another package which is more affordable for others to use.

After speaking to Owen I realise that the pictures are much more difficult to reproduce because of copyright from the original sources (newspapers etc). For now I suggest we concentrate on the text component of the data.

FT staff are too busy to assist with the project by virtue of spending most of their waking hours keeping the mag going, so this is likely to be a volunteer project. I'll speak to the guys who have work in progress already and see if we can give some criteria for volunteers so they know what will be needed of them. I will then ask for people to come forward to do this in a co-ordinated and organised way.

thanks

Iain
 
Why not a WWW Fortean Times?

I was wondering about the possibility of making a web based edition of Fortean Times.

It would be a great asset to have all of the FT issues on-line, searchable, and updated each month.

Not living in the UK or USA, it's nigh impossible to get ahold of my favorite mag! I move around quite a bit so getting an overseas subscription is impractical (though I would pay the huge markup!).

You could even have the web edition follow one or two months behind the magazine releases. So you don't cut into your magazine sales.

I would pay a serious amount of money for access to all this FT goodness. (I'm not telling you how much, because it frightens even me. )

So how about a web edition? Can't be too expensive to produce, and is an additional revenue stream to the FT coffers.

JaiYenJohn
 
An electronic edition of the FT has come up in discussion before as it would help with the issues of distribution, etc. - I think it was concluded that there were legal/licensing issues. Not insurmountable probably but.....
 
I'd like to see it though. As much as I love having a bookshelf full of old issues, it would be really handy if there was an archive somewhere of them, especially the really old issues that are hard to come by now.

I know there's the reprinted collections, but I can't afford to complete my back catalogue like that, and I probably wouldn't be interested in half the stuff in them anyway.

Some way to search the backlist by article, author or subject would be cool.
 
Ahh the mysteries

It's ironic that a publication that thinks nothing of exploring the mysteries of aliens, ghosts, moth-men etc. is stymied by the equally baffling mysteries of modern licensing agreements, and copyright ;)

Seriously though, as a TV producer I understand the Pandora's box of trying to confirm that you actually own what you think you do. And the lovely letters from lawyers trying to prove the opposite.

In any case though, most licensing issues are just about money, unless you just can't find out who owns what you want to print.

So a subscription fee could be cheerfully levied on all of us eager readers to provide for the rights to display FT in a new medium. We would love to pay towards such a project.

Besides, FT is not the Weekly World News, yeah? It is not ONLY a for profit publication, in your mission statement it is also claimed as "being a journal of record..."

A fully searchable, online source of all the Fortean phenomena since FT's inception would be a towering achievement, an indispensable resource to the public, and a lasting monument to all those who have contributed articles to the field. Those whose words now lie moldering in a dank warehouse, crying to be read again!

:lol:
 
If there is an enthusiasm for this then we could push to get the situation clarified and then draw up a few angles of approach.

The magazine is currently put together in Quark (I believe) so the actual production of an electronic issue (exporting to a PDF) is only the matter of the press of a button.
 