2000: DISTRIBUTED PROOFREADERS
[Overview]
Conceived in October 2000 by Charles Franks, Distributed Proofreaders was launched online in March 2001 to help digitize public domain books. The method is to break up the tedious work of checking eBooks for errors into small, manageable chunks. Originally meant to assist Project Gutenberg in the handling of shared proofreading, Distributed Proofreaders has become the main source of Project Gutenberg eBooks. In 2002, Distributed Proofreaders became an official Project Gutenberg site. The number of books processed through Distributed Proofreaders has grown fast. In 2003, about 250-300 people were working each day all over the world, producing a daily total of 2,500-3,000 pages, the equivalent of two pages a minute. In 2004, an average of 300-400 proofreaders participated each day and finished 4,000-7,000 pages per day, the equivalent of four pages a minute. The cumulative total of books processed by Distributed Proofreaders reached 3,000 in February 2004, 5,000 in October 2004, 7,000 in May 2005, 8,000 in February 2006 and 10,000 in March 2007, with the help of 36,000 volunteers.
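The "pages a minute" figures above come from spreading the daily page counts over the 1,440 minutes of a day. The short sketch below redoes that arithmetic. The function name and the dictionary are illustrative only, and the daily ranges are the ones quoted in the text.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes


def pages_per_minute(pages_per_day: float) -> float:
    """Convert a daily page count into an average per-minute rate."""
    return pages_per_day / MINUTES_PER_DAY


# Daily page ranges quoted in the text (low end, high end).
daily_ranges = {2003: (2500, 3000), 2004: (4000, 7000)}

for year, (low, high) in daily_ranges.items():
    print(f"{year}: {pages_per_minute(low):.1f} to "
          f"{pages_per_minute(high):.1f} pages per minute")
```

The 2003 range works out to roughly 1.7 to 2.1 pages a minute, and the 2004 range to roughly 2.8 to 4.9, which is consistent with the rounded figures of two and four pages a minute.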
[In Depth (published in 2005, updated in 2008)]
The main "leap forward" of Project Gutenberg since 2000 is due to Distributed Proofreaders. In 2002, Distributed Proofreaders became an official Project Gutenberg site. In May 2006, Distributed Proofreaders became a separate entity, and it continues to maintain a strong relationship with Project Gutenberg.
Volunteers don't have a quota to fill, but it is recommended that they do a page a day if possible. It doesn't seem like much, but with hundreds of volunteers it really adds up. In December 2007, five books were produced per day by thousands of volunteers.
From the website, one can access a program that allows several proofreaders to work on the same book at the same time, each proofreading different pages. This significantly speeds up the proofreading process. Volunteers register and receive detailed instructions. For example, bold, italic or underlined words and footnotes are always treated the same way in every book. A discussion forum allows volunteers to ask questions or seek help at any time. A project manager oversees the progress of a particular book through its different steps on the website.
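The idea of several proofreaders sharing one book, each on a different page, can be pictured with a small in-memory model. This is only a sketch under stated assumptions, not the actual Distributed Proofreaders software: the Book class, its check_out and save methods, and the volunteer names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Book:
    """One book split into numbered pages (hypothetical model)."""
    title: str
    page_count: int
    in_progress: Dict[int, str] = field(default_factory=dict)  # page -> proofreader
    finished: Set[int] = field(default_factory=set)

    def check_out(self, proofreader: str) -> Optional[int]:
        """Hand out the lowest-numbered page nobody holds or has finished."""
        for page in range(1, self.page_count + 1):
            if page not in self.in_progress and page not in self.finished:
                self.in_progress[page] = proofreader
                return page
        return None  # every page is either taken or done

    def save(self, page: int, proofreader: str) -> None:
        """Mark a page finished; only the current holder may save it."""
        if self.in_progress.get(page) != proofreader:
            raise ValueError(f"page {page} is not checked out to {proofreader}")
        del self.in_progress[page]
        self.finished.add(page)


book = Book("Example Title", page_count=3)
first = book.check_out("alice")   # alice gets page 1
second = book.check_out("bob")    # bob gets page 2, never page 1
book.save(first, "alice")
third = book.check_out("alice")   # alice moves on to page 3
```

Because a page is held by at most one person at a time, two volunteers never duplicate each other's work, and the book advances as fast as pages are saved.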
The website gives a full list of the books that are: (a) completed, i.e. processed through the site and posted to Project Gutenberg; (b) in progress, i.e. processed through the site but not yet posted, because they are still going through final proofreading and assembly; (c) being proofread, i.e. currently being processed. On August 3, 2005, 7,639 books were completed, 1,250 books were in progress and 831 books were being proofread. On May 1, 2008, 13,039 books were completed, 1,840 books were in progress and 1,000 books were being proofread.
Each time a volunteer (proofreader) goes to the website, s/he chooses a book, any book. One page of the book then appears in two forms side by side: the scanned image of the page and the text extracted from that image by OCR software. The proofreader can easily compare the two versions, note the differences and fix them. OCR is usually 99% accurate, which makes for about 10 corrections a page. The proofreader saves each page as it is completed and can then either stop work or do another. Each book is proofread twice, the second time only by experienced proofreaders. All the pages of the book are then formatted, combined and assembled by post-processors to make an eBook. The eBook is then ready to be posted with an index entry (title, subtitle, author, eBook number and character set) for the database. After the release, indexers continue the cataloging process (author's dates of birth and death, Library of Congress classification, etc.).
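The step from "99% accurate" to "about 10 corrections a page" only holds for a certain page length. The sketch below makes that explicit. It assumes that accuracy is measured per character and that a page holds about 1,000 characters; neither assumption is stated in the text, and the function names are illustrative.

```python
def expected_corrections(chars_per_page: int, accuracy: float) -> float:
    """Average number of wrong characters on a page at a given OCR accuracy."""
    return chars_per_page * (1.0 - accuracy)


def implied_page_length(corrections: float, accuracy: float) -> float:
    """Page length (in characters) that would yield this many corrections."""
    return corrections / (1.0 - accuracy)


# Assumption for illustration only: about 1,000 characters per page.
ASSUMED_CHARS_PER_PAGE = 1000

print(round(expected_corrections(ASSUMED_CHARS_PER_PAGE, 0.99)))  # corrections per page
print(round(implied_page_length(10, 0.99)))                       # characters per page
```

Under these assumptions the two figures agree: 1% of 1,000 characters is 10 corrections. A denser page at the same accuracy would need proportionally more corrections.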
Volunteers can also work independently, after contacting Project Gutenberg directly, by keying in a book they particularly like using any text editor or word processor. They can also scan it, convert it into text using OCR software, and then make corrections by comparing the text with the original. In each case, someone else will proofread it. They can use ASCII or any other format. Everybody is welcome, whatever the method and whatever the format.
New volunteers are most welcome too at Distributed Proofreaders (DP), Distributed Proofreaders Europe (DP Europe) and Distributed Proofreaders Canada (DPC). Any volunteer anywhere is welcome, for any language. There is a lot to do. As stated on both websites, "Remember that there is no commitment expected on this site. Proofread as often or as seldom as you like, and as many or as few pages as you like. We encourage people to do 'a page a day', but it's entirely up to you! We hope you will join us in our mission of 'preserving the literary history of the world in a freely available form for everyone to use'."