Electronic Library, The

ISSN: 0264-0473

Online from: 1983

Subject Area: Library and Information Studies

Converting PDF files to XML files


Document Information:
Title: Converting PDF files to XML files
Author(s): Wende Zhang (Fuzhou University Library, Fuzhou, Fujian, People's Republic of China)
Citation: Wende Zhang, (2008) "Converting PDF files to XML files", Electronic Library, The, Vol. 26, Iss. 1, pp. 68-74
Keywords: Extensible Markup Language, Information exchange, Portable document format
Article type: Case study
DOI: 10.1108/02640470810851743
Publisher: Emerald Group Publishing Limited
Abstract:

Purpose – The purpose of this paper is to develop a system that can convert PDF files to XML files.

Design/methodology/approach – The system uses XML as the information display model and XSLT to express the information extraction rules. The process is illustrated by converting a scientific and technological paper in PDF to a valid XML file.

Findings – Because the PDF format is self-describing, a file's content information and its display information are stored in different objects; as a result, it is not easy to extract information directly from a PDF source file. The system therefore takes an indirect approach: it first converts the PDF source file to an intermediate format that is relatively easy to process, and then automatically converts that intermediate file to the target file according to the relevant rules.
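The two-stage approach described above (PDF to an intermediate format, then rule-driven conversion to the target XML) can be illustrated with a minimal sketch. It is hypothetical and is not the author's implementation: the paper expresses its rules in XSLT, whereas this sketch uses Python's standard `xml.etree.ElementTree` with rules written as predicate/tag pairs, loosely analogous to XSLT templates. The intermediate format (`<page>` containing `<line>` elements with `size` attributes), the rule set, and the names `to_intermediate`, `apply_rules`, and `RULES` are all assumptions. No PDF parsing is shown. The input runs stand in for what a PDF text-extraction tool might report.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical stage-1 input: (text, font size) pairs, as a PDF text
# extractor might report them. Real PDF parsing is NOT shown here.
Run = tuple[str, float]

def to_intermediate(runs: list[Run]) -> ET.Element:
    """Stage 1: flatten extracted runs into a simple intermediate XML tree."""
    page = ET.Element("page")
    for index, (text, size) in enumerate(runs):
        line = ET.SubElement(page, "line", index=str(index), size=str(size))
        line.text = text
    return page

# Hypothetical stage-2 rules: each maps a predicate on an intermediate <line>
# to a target element name, loosely like an XSLT template that matches a
# pattern and emits output. Rules are tried in order; the first match wins.
Rule = tuple[Callable[[ET.Element], bool], str]

RULES: list[Rule] = [
    (lambda e: float(e.get("size", "0")) >= 16, "title"),  # large font -> title
    (lambda e: e.get("index") == "1", "author"),            # second line -> author
    (lambda e: True, "para"),                               # fallback: body text
]

def apply_rules(page: ET.Element, rules: list[Rule]) -> ET.Element:
    """Stage 2: convert the intermediate tree into the target XML tree."""
    article = ET.Element("article")
    for line in page.iter("line"):
        for matches, tag in rules:
            if matches(line):
                ET.SubElement(article, tag).text = line.text
                break
    return article

if __name__ == "__main__":
    runs = [
        ("Converting PDF files to XML files", 18.0),
        ("Wende Zhang", 11.0),
        ("The purpose of this paper is to develop a system.", 10.0),
    ]
    article = apply_rules(to_intermediate(runs), RULES)
    print(ET.tostring(article, encoding="unicode"))
```

Keeping the rules as data, separate from the traversal code, mirrors the design idea in the abstract. The conversion logic stays fixed, and adapting it to a different document layout means changing only the rule list, much as one would swap XSLT stylesheets.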

Originality/value – Being able to extract information from PDF files easily and conveniently is important, and this paper shows how it can be done. The design ideas in the paper can also be applied to extracting information from other types of files.

© Emerald Group Publishing Limited