We propose to develop an open source web service called 'Bibliographica' which would enable its users to create, edit, explore and share annotated lists of publications - such as reading lists, bibliographies or bibliographic indexes.
The idea arose while looking for a tool to facilitate the collaborative development of an index of research publications that could be searched and sorted according to the diverse needs and requirements of researchers.
Features would include:
* Open source and easy for anyone to set up their own branded
instance of the service at their own domain name (e.g. biblio.ed.ac.uk
or books.example.com)
* Easy to import and export data in a variety of common formats
(including from existing online sources of open bibliographic data such
as the Open Library)
* Fully versioned so that all changes to the bibliography can be
tracked and, if necessary, reversed
* Allows different read/edit permissions to be assigned to different
users and groups (e.g. individual researchers or multidisciplinary
research groups)
* Allows users and groups to easily create their own personalised
lists of publications (e.g. for a taught course, for a publication or
for a bibliographic index)
* Allows users to easily create a new 'record' for a publication
* Allows users to search, sort and query records by author, title,
subject matter, language, country/region of origin, date of
publication, date of subject matter, ...
* Support for arbitrary, user-generated tags for authors and works
* Uses existing best-of-breed technologies such as OpenID for login
* Well documented API
* Allows users to see which works are in the public domain in their
jurisdiction (using a series of public domain calculators)
* Allows users to easily find digital copies of works which have
fallen into the public domain - as well as links to online journal
archives and library catalogues
* Suggestions for related publications or lists.
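As an illustration of the public domain calculators mentioned in the feature list, here is a minimal sketch. It assumes a simple "life of the author plus N years" rule and that protection runs to the end of the calendar year in which the term expires. The names Work, TERMS_AFTER_DEATH and is_public_domain, and the rule labels, are hypothetical and exist only for this example; real calculators must encode each jurisdiction's actual rules.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative rule table: copyright term in years after the author's death.
# Keys and values are placeholders for this sketch, not legal advice.
TERMS_AFTER_DEATH = {"life+70": 70, "life+50": 50}


@dataclass
class Work:
    title: str
    author_death_year: Optional[int]  # None if unknown or the author is living


def is_public_domain(work, rule, current_year):
    """Return True or False, or None when there is not enough data to decide."""
    if work.author_death_year is None:
        return None
    term = TERMS_AFTER_DEATH[rule]
    # Assumption: protection lasts until the end of the year in which the
    # term expires, so the work is free from the following year onwards.
    return current_year > work.author_death_year + term


example = Work("Example Work", author_death_year=1939)
print(is_public_domain(example, "life+70", 2010))  # True
print(is_public_domain(example, "life+70", 2009))  # False
```

Returning None rather than guessing keeps "unknown" distinct from "still protected", which matters when records are incomplete.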
Final report available at: http://styx.org/~ww/2010/10/bibliographica/
Content is pasted below.
Bibliographica Retrospective Report
===================================
This report was prepared by William Waites of the Open Knowledge
Foundation in Edinburgh on October 14th 2010.
The project was to create a linked data (or RDF) system for
creating and curating bibliographic information. We started
with no good general toolkit for building RDF based web
applications.
Accomplishments
---------------
* Created RDF middleware called ORDF. More material is available
in the documentation and in the second half of a presentation
recently given at the University of Leipzig:
http://styx.org/~ww/2010/10/leipzig
Feedback from this presentation is that we are on the right
track -- AKSW has taken a similar approach with their development
of OntoWiki using different tools.
* source code: http://ordf.org/src
* documentation: http://packages.python.org/ordf
Briefly, ORDF provides:
* Smoothing of API bumps in the low-level rdflib (major incompatible
changes between versions 2.4.X and 3.X)
* Queueing and message passing for distributed processing of
RDF graphs
* Indexes for various purposes (simple retrieval, graph traversal,
full text searches)
* Inferencing (Description Logic and Horn Rule subsets of
FOPL expressed in a manner conformant to RIF)
* Vocabulary modules for commonly used RDF vocabularies including
an Object-Description-Mapper (analogue to ORM) using OWL
* Low-level drivers and modifications to the 4store quad store
were made to facilitate its use in a Python web application.
Specifically,
* Modification of 4store to support multiple client connections
* 4store drivers for rdflib
* Created a web interface (and supporting toolkit) for the
presentation of RDF data in general and bibliographic data
in particular. It uses Pylons, a framework commonly used for
building web applications in Python, and enables such an
application to be backed by an RDF store rather than an SQL
database.
* Ontology web service toolkit: http://ordf.org/src/ontosrv
* Bibliographica application: http://knowledgeforge.net/pdw/openbiblio
* Broke some new ground in natural language processing by
attempting to use natural language for data entry. The software
that does this is based on ORDF and the NLTK.
* http://bitbucket.org/ww/sembot
* http://blog.okfn.org/2010/08/09/cataloguing-bibliographic-data-with-natu...
Its strategy is:
* Tokenise and tag sentences
* Using a simplified English grammar, make an RDF representation
of a syntax tree
* Execute domain-specific RIF rules over the syntax tree to
entail statements about works, authors, etc.
This approach seems encouraging. There were observed cases where
syntactic ambiguity (i.e. a sentence that could be parsed in more
than one way) was resolved by the domain-specific (semantic)
inference rules. Sebastian Hellmann of AKSW is working on a system
that uses very much the same strategy for his Ph.D. thesis.
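The three-step strategy described above (tokenise and tag, build a syntax tree, run domain rules over it) can be sketched in a few lines. This is not the sembot code: it uses only the Python standard library, represents the syntax tree as plain (subject, predicate, object) tuples rather than real RDF, and writes the domain rule in Python rather than RIF. The lexicon, the predicate names and the functions tokenise_and_tag, parse, apply_rules and extract are all hypothetical.

```python
import re

# Hypothetical mini-lexicon: word -> part-of-speech tag. A real system would
# use a trained tagger; this table is a stand-in for the sketch.
LEXICON = {"wrote": "VERB"}


def tokenise_and_tag(sentence):
    """Step 1: split into words; tag known verbs, everything else as NAME."""
    tokens = re.findall(r"[A-Za-z0-9']+", sentence)
    return [(tok, LEXICON.get(tok.lower(), "NAME")) for tok in tokens]


def parse(tagged):
    """Step 2: simplified grammar S -> NP VERB NP, where NP is a run of NAMEs.

    Returns the syntax tree as a set of triples about the sentence node "s1".
    """
    verbs = [i for i, (_, tag) in enumerate(tagged) if tag == "VERB"]
    if len(verbs) != 1:
        raise ValueError("expected exactly one verb")
    v = verbs[0]
    left = " ".join(tok for tok, _ in tagged[:v])
    right = " ".join(tok for tok, _ in tagged[v + 1:])
    if not left or not right:
        raise ValueError("expected a noun phrase on each side of the verb")
    return {
        ("s1", "type", "Sentence"),
        ("s1", "subject", left),
        ("s1", "verb", tagged[v][0].lower()),
        ("s1", "object", right),
    }


def apply_rules(tree):
    """Step 3: domain rule. If a sentence has verb 'wrote', subject ?a and
    object ?w, entail that ?a is an Author, ?w is a Work and ?w creator ?a."""
    facts = set()
    for s, p, o in tree:
        if p == "verb" and o == "wrote":
            authors = [o2 for s2, p2, o2 in tree if s2 == s and p2 == "subject"]
            works = [o2 for s2, p2, o2 in tree if s2 == s and p2 == "object"]
            for a in authors:
                for w in works:
                    facts |= {(a, "type", "Author"),
                              (w, "type", "Work"),
                              (w, "creator", a)}
    return facts


def extract(sentence):
    """Run the whole pipeline on one sentence."""
    return apply_rules(parse(tokenise_and_tag(sentence)))


facts = extract("Jane Austen wrote Pride and Prejudice")
print(("Pride and Prejudice", "creator", "Jane Austen") in facts)  # True
```

Keeping the domain knowledge in the rule step rather than in the grammar is what lets the grammar stay small and general.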
These accomplishments notwithstanding, and although a viable system
for processing and presenting RDF bibliographic data has been created,
a key part of the original goal remains unachieved -- having a good
system for users to create and annotate this data. This is hard
because of the following tension. Either:
* Users have to have a thorough understanding of the
structure and layout of the data (seems to be necessary
for a general tool), or
* Each user interaction has to be modelled by hand
The first assumes an unacceptable level of system-specific
knowledge on the part of the user. The second means crafting a
large number of web forms by hand, work which is tedious and not
reusable. These considerations led to trying to use NLP techniques
to accomplish this task. Not only does the approach seem promising
enough to merit further pursuit, but the instigator of our proposal
is experiencing problems with carpal tunnel syndrome and has
difficulty typing. The speech recognition software he uses is
adapted for prose dictation, which seems like a natural fit here.
Future Work
-----------
Two threads from this work are being pursued. OKF is a contributor
to the JISC-funded JISCOBIB project, which could naturally be seen
as a continuation or successor to Bibliographica. This is being
undertaken in conjunction with the University of Cambridge and has
a focus on data about publications in academic journals.
The second thread concerns the NLP techniques, and there has been
some interest from Wikimedia Germany in funding a small project to
make the basic building blocks available as a RESTful service.
Project Update
Submitted by wwaites on Wed, 05/19/2010 - 21:15.
Much of the basic groundwork has been laid. Without going into much technical detail,
* a Pylons application has been written providing the front end
* a back-end datastore supporting multiple types of indices on RDF data has been written
* message handling for offline index building and inferencing has been implemented
* a JavaScript library implementing the W3C Fresnel standard has been written for presentation, along with some initial lenses
* a JavaScript editing interface has been implemented, supporting the addition of a restricted set of annotations
* exchange of data with other RDF services (or other instances of this service) is working
* support for complete revision history and associated metadata has been implemented
* a prototype instance is running at http://bibliographica.org
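To make the revision history item above more concrete, here is a minimal sketch of one way to keep a complete history with associated metadata: every edit is stored as a changeset of added and removed triples, and a revert is just another changeset. This is an assumption about the general technique, not the project's actual implementation; the class VersionedGraph and its methods are hypothetical names, and triples are plain tuples rather than RDF terms.

```python
class VersionedGraph:
    """In-memory set of triples with an append-only log of changesets."""

    def __init__(self):
        self.triples = set()
        self.log = []  # one dict per revision: author, reason, added, removed

    def commit(self, author, reason, add=(), remove=()):
        """Apply a change and record it; returns the new revision number."""
        # Record only effective changes so that each changeset can be inverted.
        added = set(add) - self.triples
        removed = set(remove) & self.triples
        self.triples = (self.triples - removed) | added
        self.log.append({"author": author, "reason": reason,
                         "added": added, "removed": removed})
        return len(self.log) - 1

    def revert(self, revision, author):
        """Undo one changeset by committing its inverse.

        History is never rewritten. The earlier state is restored exactly
        only if no later changeset touched the same triples.
        """
        change = self.log[revision]
        return self.commit(author, f"revert revision {revision}",
                           add=change["removed"], remove=change["added"])


g = VersionedGraph()
r0 = g.commit("alice", "create record", add=[("work1", "title", "Exmaple")])
r1 = g.commit("bob", "fix title",
              add=[("work1", "title", "Example")],
              remove=[("work1", "title", "Exmaple")])
g.revert(r1, "alice")  # bob's change is undone, but stays in the log
print(len(g.log))  # 3
```

Storing the revert as a new changeset keeps the log append-only, so the metadata about who changed what, and why, is never lost.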
Much remains to be done, in particular on the user interface, but we are confident we have a solid base to work from. It should be understood that this base is somewhat general and can be reused for other projects in different domains. To our delight we have had some quite positive feedback from faculty and staff at the University of Edinburgh.
Specific documentation of what has been done is available at http://tinyurl.com/35xk3bu; tickets 47 and later pertain to this project. Some background information is available at http://bibliographica.org/docs/ordf/design_choices.html and a blog post is forthcoming.
We have approached a potential funding source about continued support but do not expect to hear back for a couple of weeks. We would be most grateful for suggestions and references to help us continue this work.