How to Build a Digital Library

Elsevier Science & Technology Books, 2003 - Computers - 518 pages
1. Orientation: The world of digital libraries
  One: Supporting human development
  Two: Pushing on the frontiers of science
  Three: Preserving a traditional culture
  Four: Exploring popular music
  The scope of digital libraries
  1.1. Libraries and digital libraries
  1.2. The changing face of libraries
    In the beginning
    The information explosion
    The Alexandrian principle
    Early technodreams
    The library catalog
    The changing nature of books
  1.3. Digital libraries in developing countries
    Disseminating humanitarian information
    Disaster relief
    Preserving indigenous culture
    Locally produced information
    The technological infrastructure
  1.4. The Greenstone software
  1.5. The pen is mighty: wield it wisely
    Copyright
    Collecting from the Web
    Illegal and harmful material
    Cultural sensitivity
  1.6. Notes and sources

2. Preliminaries: Sorting out the ingredients
  2.1. Sources of material
    Ideology
    Converting an existing library
    Building a new collection
    Virtual libraries
  2.2. Bibliographic organization
    Objectives of a bibliographic system
    Bibliographic entities
  2.3. Modes of access
  2.4. Digitizing documents
    Scanning
    Optical character recognition
    Interactive OCR systems
    Page handling
    Planning an image digitization project
    Inside an OCR shop
    An example project
  2.5. Notes and sources

3. Presentation: User interfaces
  3.1. Presenting documents
    Hierarchically structured documents
    Plain, unstructured text documents
    Page images
    Page images and extracted text
    Audio and photographic images
    Video
    Music
    Foreign languages
  3.2. Presenting metadata
  3.3. Searching
    Types of query
    Case-folding and stemming
    Phrase searching
    Different query interfaces
  3.4. Browsing
    Browsing alphabetical lists
    Ordering lists of words in Chinese
    Browsing by date
    Hierarchical classification structures
  3.5. Phrase browsing
    A phrase browsing interface
    Keyphrases
  3.6. Browsing using extracted metadata
    Acronyms
    Language identification
  3.7. Notes and sources

4. Documents: The raw material
  4.1. Representing characters
    Unicode
    The Unicode character set
    Composite and combining characters
    Unicode character encodings
    Hindi and related scripts
    Using Unicode in a digital library
  4.2. Representing documents
    Plain text
    Indexing
    Word segmentation
  4.3. Page description languages: PostScript and PDF
    PostScript
    Fonts
    Text extraction
    Using PostScript in a digital library
    Portable Document Format: PDF
    PDF and PostScript
  4.4. Word-processor documents
    Rich Text Format
    Native Word formats
    Latex format
  4.5. Representing images
    Lossless image compression: GIF and PNG
    Lossy image compression: JPEG
    Progressive refinement
  4.6. Representing audio and video
    Multimedia compression: MPEG
    MPEG video
    MPEG audio
    Mixing media
    Other multimedia formats
    Using multimedia in a digital library
  4.7. Notes and sources

5. Markup and metadata: Elements of organization
  5.1. Hypertext markup language
    Basic HTML
    Using HTML in a digital library
  5.2. Extensible markup language XML
    Development of markup and stylesheet languages
    The XML metalanguage
    Parsing XML
    Using XML in a digital library
  5.3. Presenting marked up documents
    Cascaded style sheets: CSS
    Extensible stylesheet language: XSL
  5.4. Bibliographic metadata
    MARC
    Dublin Core
    BibTeX
    Refer
  5.5. Metadata for images and multimedia
    Image metadata: TIFF
    Multimedia metadata: MPEG-7
  5.6. Extracting metadata
    Extracting document metadata
    Generic entity extraction
    Bibliographic references
    Language identification
    Acronym extraction
    Keyphrase extraction
    Phrase hierarchies
  5.7. Notes and sources

6. Construction: Building collections
  6.1. Why Greenstone?
    What it does
    How to use it
  6.2. Using the Collector
    Creating a new collection
    Working with existing collections
    Document formats
  6.3. Walkthrough
    Getting started
    Making a framework for the collection
    Importing the documents
    Building the indexes
    Installing the collection
  6.4. Importing and building
    Files and directories
    Object identifiers
    Plugins
    The import process
    The build process
  6.5. Greenstone archive documents
    Document metadata
    Inside the documents
  6.6. Collection configuration file
    Default configuration file
    Subcollections and supercollections
  6.7. Getting the most out of your documents
    Plugins
    Classifiers
    Format statements
  6.8. Building collections graphically
  6.9. Notes and sources

7. Delivery: How Greenstone works
  7.1. Users, processes and protocols
    Processes
    The null protocol implementation
    The Corba protocol implementation
  7.2. Preliminaries
    The macro language
    The collection information database
  7.3. Responding to user requests
    Performing a search
    Retrieving a document
    Browsing a hierarchical classifier
    Generating the home page
    Using the protocol
    Actions
  7.4. Operational aspects
    Configuring the receptionist
    Configuring the site
  7.5. Notes and sources

8. Interoperability: Standards and protocols
  8.1. More markup
    Names
    Links
    Types
  8.2. Resource description
    Collection-level metadata
  8.3. Document exchange
    Open eBook
  8.4. Query languages
    Common command language
    XML Query
  8.5. Protocols
    Z39.50
    Supporting the Z39.50 protocol
    The Open Archives Initiative
    Supporting the OAI protocol
  8.6. Research protocols
    Dienst
    Simple digital library interoperability protocol
    Translating between protocols
    Discussion
  8.7. Notes and sources

9. Visions: Future, past, and present
  9.1. Libraries of the future
    Today's visions
    Tomorrow's visions
    Working inside the digital library
  9.2. Preserving the past
    The problem of preservation
    A tale of preservation in the digital era
    The digital dark ages
    Preservation strategies
  9.3. Generalized documents: a challenge for the present
    Digital libraries of music
    Other media
    Generalized documents in Greenstone
    Digital libraries for oral cultures
  9.4. Notes and sources

References

Glossary of terms

Appendix A Installing and operating Greenstone
  A.1 Installation procedure
    Windows
    Unix
    How to find Greenstone
    Testing and troubleshooting
    Greenstone collections
    Associated software
  A.2 Setting up the web server
    Apache web server
    Security
    PWS and IIS web servers
    File permissions
  A.3 Managing your site
    Personalizing the Greenstone home page
    Redirecting a URL to Greenstone
    Administrative facility
    Configuration files and logs
    User management
    Technical information

Appendix B Greenstone source code
  B.1 Foundations
    Text_t object
    Library code
    Protocol API
  B.2 Collection server
    Search object
    Source object
    Filter object
    Collection server code
  B.3 Receptionist
    Actions
    Formatting
    Macro language
    Receptionist code
  B.4 Initialization

About the author (2003)

David Bainbridge is a senior lecturer in Computer Science at the University of Waikato, New Zealand. He holds a PhD in Optical Music Recognition from the University of Canterbury, New Zealand, where he studied as a Commonwealth Scholar. Since moving to Waikato in 1996 he has continued to broaden his interest in digital media, while retaining a particular emphasis on music. An active member of the New Zealand Digital Library project, he manages the group's digital music library, Meldex, and has collaborated with several United Nations agencies, the BBC, and various public libraries. David has also worked as a research engineer for Thorn EMI in the area of photo-realistic imaging, and he graduated from the University of Edinburgh in 1991 as the class medalist in Computer Science.