How to use the NHSC webpages effectively

From GrassrootWiki

Latest revision as of 15:23, 22 May 2006

DOWNLOADING THE ENTIRE DOCUMENT, OR PARTICULAR SECTIONS, OR INDIVIDUAL PAGES

The preliminary pages are contained in one pdf file, and pages 1-497 are contained in another. Major sections of the report are also available separately; each section can be read on-line or downloaded in printer-friendly format. Individual pages can be viewed one at a time, with the text shown alongside the scanned image of that page.

The report was split into two pdf files so that each pdf page number matches the page number printed at the bottom of the corresponding page of the original document. This makes it easy to find any page listed in the table of contents, and to cite the correct page number when quoting from the document.

THE ENTIRE OCR'd (uncorrected) FINAL MAJORITY REPORT, COMPRISING PAGES 1-497 (but not including the preliminary pages), can be downloaded by clicking here.

Its abbreviated URL, useful for printed documents and e-mails, is:
http://tinyurl.com/rxd7n

THE OCR'd (uncorrected) PRELIMINARY PAGES include the cover, letter of transmittal, names of commissioners and staff members, preface, detailed TABLE OF CONTENTS, detailed list of tables, and list of charts. The pdf containing all of these OCR'd (uncorrected) preliminary pages can be downloaded by clicking here.

Its abbreviated URL, useful for printed documents and e-mails, is:
http://tinyurl.com/m8zou

MAJOR SECTIONS of pages are listed on the front page of this website, under the same titles they have in the table of contents. Not all sections are listed yet; only sections that have been edited to correct OCR errors and formatted for easy reading appear there. To read a particular section on-line, click on its title. To create a printer-friendly version of that section (which can be copied and pasted or printed), go to the toolbox near the bottom of the left-hand side of the screen and click on "printable".

INDIVIDUAL PAGES include both the scanned image and its text. Once editors have worked on a page, its text appears in corrected form; until then, the text remains in raw OCR form, without paragraph breaks or table columns. Each individual page can be reached by clicking on its page number in the "Full Page List", which is available by clicking here.

SEARCHING covers the entire majority report, including unedited pages. A search window is located on the front page of this website, on the left-hand side near the bottom. Type key words or phrases (or copy and paste them) into the search window.

CITATIONS for footnotes might follow this example:

"The Commission finds that the facts do not meet the tests for showing the existence of aboriginal title. Even if the tests had been met, the Commission finds that such title was extinguished by actions of the Hawaiian government before 1893, and certainly before annexation, which was the first assumption of sovereignty by the United States. Finally, even if these tests had been met, neither the Fifth Amendment to the United States Constitution nor current statutes provide authority for payment of compensation to native Hawaiians for loss of aboriginal title. The Commission examined whether a trust or fiduciary relationship exists between the United States and native Hawaiians and concluded that no statutes or treaties give rise to such a relationship because the United States did not exercise sovereignty over the Hawaiian Islands prior to annexation, and the Joint Resolution of Annexation, No. 55 (July 7, 1898) did not create a special relationship for native Hawaiians. The Commission considered whether native Hawaiians are entitled to compensation for loss of sovereignty, and found no present legal entitlement to compensation for any loss of sovereignty." (Final report of the Native Hawaiians Study Commission, presented to Congress June 23, 1983, pp. 25-26, as found at http://tinyurl.com/rxd7n )

BACKGROUND ABOUT THE PROCESS USED (explaining why editing was necessary, how it was done, and why there are problems for the reader and researcher)

The report of the Native Hawaiians Study Commission (NHSC) was published in 1983, before computers were routinely used to create electronic documents and before the age of the internet.

Converting the document into electronic form, placing it on the internet, and making its text searchable and capable of being copied required several steps.

First, each page had to be photographed by a scanner. But a photograph of printed words is not a text document. To turn it into a text document whose contents can be copied as text, and whose words can be found by a search engine, it was necessary to run the image through optical character recognition (OCR) software.

Few hardcopies of the NHSC report are available, and those that exist are often in very poor condition. Some pages are faded; some are missing an entire column of text along the right or left margin. In such cases, editors had to use their best judgment to make an intelligent guess about letters and digits that were never visible to the scanner.

Another source of problems arises when the OCR converts the scanned image into text. Every OCR program makes errors: a smudge on the page may be read as a letter or number that isn't really there, or one letter or number may look so much like another that it is misread. Because the editors did not have the original hardcopy, they had to correct such OCR errors by comparing the text produced by the OCR against the image produced by the scanner.

Each individual page has both the image produced by the scanner and the text produced by the OCR. Pages not yet edited show the text exactly as the OCR produced it, with no corrections and no paragraph breaks, italics, or formatting of charts and tables. Pages that have been edited may still contain occasional typographical errors, but the editors have at least made them easier to read by inserting paragraph breaks and lining up the columns in charts.