Bibliographica

We propose to develop an open source web service called 'Bibliographica' which would enable its users to create, edit, explore and share annotated lists of publications - such as reading lists, bibliographies or bibliographic indexes.

The idea arose while looking for a tool to facilitate the collaborative development of an index of research publications that could be searched and sorted according to the diverse needs and requirements of researchers.

Features would include:
* Open source and easy for anyone to set up their own branded
instance of the service at their own domain name (e.g. biblio.ed.ac.uk
or books.example.com)
* Easy to import and export data in a variety of common formats
(including from existing online sources of open bibliographic data such
as the Open Library)
* Fully versioned so that all changes to the bibliography can be
tracked and, if necessary, reversed
* Allows different read/edit permissions to be assigned to different
users and groups (e.g. individual researchers or multidisciplinary
research groups)
* Allows users and groups to easily create their own personalised
lists of publications (e.g. for a taught course, for a publication or
for a bibliographic index)
* Allows users to easily create a new 'record' for a publication
* Allows users to search, sort and query records by author, title,
subject matter, language, country/region of origin, date of
publication, date of subject matter, ...
* Support for arbitrary, user-generated tags for authors and works
* Uses existing best-of-breed technologies such as OpenID for login
* Well documented API
* Allows users to see which works are in the public domain in their
jurisdiction (using a series of public domain calculators)
* Allows users to easily find digital copies of works which have
fallen into the public domain - as well as links to online journal
archives, library catalogues
* Suggestions for related publications or lists.
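
As a rough illustration of the search-and-sort feature listed above, the sketch below filters and sorts a small in-memory list of records. The `Record` fields, the `query` helper and the sample entries are all hypothetical; the proposal does not specify a data model, and a real service would query a database rather than a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical record shape; the proposal does not define one.
    author: str
    title: str
    language: str
    year: int
    tags: set = field(default_factory=set)


def query(records, sort_by="year", **criteria):
    """Return records whose fields equal every given criterion, sorted by one field."""
    matches = [
        r for r in records
        if all(getattr(r, name) == value for name, value in criteria.items())
    ]
    return sorted(matches, key=lambda r: getattr(r, sort_by))


# Illustrative sample data only.
records = [
    Record("Grimm, J.", "Kinder- und Hausmärchen", "de", 1812, {"folktales"}),
    Record("Perrault, C.", "Histoires ou contes du temps passé", "fr", 1697, {"folktales"}),
    Record("Goethe, J. W.", "Faust", "de", 1808, {"drama"}),
]

german = query(records, language="de")
print([(r.year, r.title) for r in german])
```

Exact-match filtering is the simplest possible choice here; range queries (e.g. a span of publication dates) would need a richer criteria format.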

Who am I?: 
* Michael Fourman, Informatics, University of Edinburgh
* Iain Emsley, Open Knowledge Foundation
* Jonathan Gray, University of London + Open Knowledge Foundation
* Sara Wingate Gray, UCL + Open Knowledge Foundation
* Rufus Pollock, University of Cambridge + Open Knowledge Foundation
* David Read, Open Knowledge Foundation
* Jo Walsh, EDINA + Open Knowledge Foundation
How is it novel? What is exciting about it?: 
Lists of publications are a fundamental and often overlooked part of research. Bibliographica would enable researchers to share, explore and collaborate on such lists in ways which were not previously possible. While the selection and evaluation of works is usually conducted on a case-by-case basis by individual researchers, Bibliographica would make this process ongoing, collaborative and dynamic.

On the one hand, digital technologies such as wikis or Google Docs allow users to create lists in document form, but we want something which can be explored and queried in more sophisticated ways. For example, users might be primarily interested in articles or monographs published in a certain language, within a specified time frame, pertaining to a particular period, region or theme. On the other hand, when 'out of the box' technologies are used to create a database-driven website, many of the features designed to support the creation of resources by multiple contributors are lost - such as versioning, commenting and so on. For example, users are not straightforwardly able to undo changes or annotate publications or lists.

Bibliographica would let users generate complex queries as well as enabling them to edit and annotate lists and publications in a wiki-like manner. It would enable communities of experts to create and curate contributions, allowing for quality and comprehensiveness beyond what is possible for an individual researcher.
What will I do next? What opportunities will it open up?: 
Initially we hope to pilot the project with 2-3 instances of the service for different domains (e.g. history of science, European folktales, intellectual property law, …) - to get user feedback as well as suggestions for improvements and additional desirable features. We would be very keen to make Bibliographica interoperable with other existing web services, and would actively encourage others to build on and extend the code base. In particular we think there are opportunities for representing data about authors and publications in new ways - e.g. using maps and timelines. We would also be interested in looking at how (open) data generated using Bibliographica could be inter-linked with data from other sources - such as Linked Data sets - and how data exchange between different instances of the service can be made as seamless as possible. Finally we would like to implement support for multiple languages - in both the generated content and the user interface.
What constitutes success? How risky is it?: 
We would consider the project successful if we have a stable, well documented web service with a reasonably sized (100+), relatively active group of researchers using it. We are also keen to engage other institutions interested in using, and possibly supporting, the service. Projects which we would regard as exemplars include OpenStreetMap (which allows for the collaborative creation of geodata), Open Journal Systems (an online journal system used by over 5000 publications), and WordPress, the open source blogging platform. While we consider the risks associated with the project to be minimal, a lack of publicity and community building could result in low user uptake.
What resources do I bring to the project?: 
We would be able to provide:
* Technical expertise and experience. For example we have worked on CKAN, which is used in data.gov.uk and supports versioning for data, user management, tagging and ratings
* Technical infrastructure, such as hosting and system administration
* Publicity, e.g. extensive network in various academic and open source developer communities, high profile blog
What resources and expertise do I need?: 
We require a dedicated developer (preferably with Python experience) and a web designer. We are in contact with several people who would be willing to undertake this work in a paid capacity, and would also be happy to work alongside researchers, students and developers within the University of Edinburgh.
What shared resources, if any, will the project create?: 
All of Bibliographica's code, content and data would be open (e.g. GPL, Creative Commons Attribution, Open Database License). We would also strongly encourage beta users of the project to license their contributions in a similar manner. If the project were actively used in several domains, we would hope that over time this would generate a substantial amount of open bibliographic metadata and associated user data (lists, annotations, etc.).
What is the timescale?: 
6 months

Project Update

Much of the basic groundwork has been laid. Without going into much technical detail:

* a Pylons application has been written providing the front end
* a back-end datastore supporting multiple types of indices on RDF data has been written
* message handling for offline index building and inferencing has been implemented
* a JavaScript library implementing the W3C Fresnel standard has been written for presentation, along with some initial lenses
* a JavaScript editing interface has been implemented, supporting the addition of a restricted set of annotations
* support for exchanging data with other RDF services (or other instances of this service) is functional
* support for complete revision history and associated metadata has been implemented
* a prototype instance is running at http://bibliographica.org
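
To make the "complete revision history and associated metadata" item concrete, here is a minimal sketch of one way such a history could behave. It is not the prototype's storage model, which this update does not describe: the `VersionedList` class, its method names and the snapshot-per-revision design are all assumptions made for illustration.

```python
class VersionedList:
    """Toy append-only history: every edit stores a full snapshot (hypothetical design)."""

    def __init__(self):
        # Each revision is (items, message, author); revision 0 is the empty list.
        self._revisions = [((), "initial empty list", "system")]

    def commit(self, items, message, author):
        """Record a new state and return its revision number."""
        self._revisions.append((tuple(items), message, author))
        return len(self._revisions) - 1

    def current(self):
        return list(self._revisions[-1][0])

    def revert(self, revision, author):
        """Restore an earlier state as a *new* revision, so nothing is ever lost."""
        items = self._revisions[revision][0]
        return self.commit(items, f"revert to revision {revision}", author)

    def log(self):
        """Return (revision number, message, author) for every revision."""
        return [(n, msg, who) for n, (_, msg, who) in enumerate(self._revisions)]


history = VersionedList()
r1 = history.commit(["Faust"], "add Faust", "alice")
r2 = history.commit(["Faust", "Ulysses"], "add Ulysses", "bob")
r3 = history.revert(r1, "alice")
print(history.current())
print(history.log())
```

Treating a revert as just another commit keeps the log linear and means a mistaken revert can itself be undone.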

Much remains to be done, in particular on the user interface, but we are confident we have a solid base to work from. It should be understood that this base is somewhat general and can be reused for other projects in different domains. To our delight we have had some quite positive feedback from faculty and staff at the University of Edinburgh.

Specific documentation of what has been done is available at http://tinyurl.com/35xk3bu; tickets 47 and later pertain to this project. Some background information is available at http://bibliographica.org/docs/ordf/design_choices.html and a blog post is forthcoming.

We have approached a potential funding source about continued support but do not expect to hear back for a couple of weeks. We would be most grateful for suggestions and references to help us continue this work.

Final Report

Final report available at: http://styx.org/~ww/2010/10/bibliographica/

Content is pasted below.

Bibliographica Retrospective Report
===================================

This report was prepared by William Waites of the Open Knowledge
Foundation in Edinburgh on October 14th 2010.

The project was to create a linked data (or RDF) system for
creating and curating bibliographic information. We started
with no good general toolkit for building RDF based web
applications.

Accomplishments
---------------

* Created RDF middleware called ORDF. More material is
available in the documentation
and in the second half of the presentation recently
given at the University of Leipzig: http://styx.org/~ww/2010/10/leipzig
Feedback from this presentation suggests that we are on the right
track -- AKSW has taken a similar approach with their development
of OntoWiki using different tools.

* source code: http://ordf.org/src
* documentation: http://packages.python.org/ordf

Briefly, ORDF provides:

* Smoothing over of API differences in the low-level rdflib (major
incompatible changes between versions 2.4.X and 3.X)
* Queueing and message passing for distributed processing of
RDF graphs
* Indexes for various purposes (simple retrieval, graph traversal,
full text searches)
* Inferencing (Description Logic and Horn Rule subsets of
FOPL expressed in a manner conformant to RIF)
* Vocabulary modules for commonly used RDF vocabularies including
an Object-Description-Mapper (analogue to ORM) using OWL
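
The inferencing item can be illustrated with a small forward-chaining loop over Horn-style rules. This is a sketch under stated assumptions, not ORDF's API and not RIF syntax: triples are plain Python tuples of strings, variables are strings starting with `?`, and the function names and the prefixed names in the sample data are chosen only for illustration.

```python
def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple; return extended bindings or None."""
    result = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must bind consistently across the whole rule body.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def match_all(patterns, triples, bindings=None):
    """Yield every variable binding that satisfies all patterns in a rule body."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for triple in triples:
        extended = match(patterns[0], triple, bindings)
        if extended is not None:
            yield from match_all(patterns[1:], triples, extended)


def forward_chain(triples, rules):
    """Apply (body, head) rules repeatedly until no new triples are entailed."""
    known = set(triples)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Materialise the matches first so the set is not mutated mid-iteration.
            for b in list(match_all(body, known)):
                new = tuple(b.get(term, term) for term in head)
                if new not in known:
                    known.add(new)
                    changed = True
    return known


# Illustrative facts and rules only.
facts = {
    ("ex:faust", "dc:creator", "ex:goethe"),
    ("ex:faust", "rdf:type", "bibo:Book"),
    ("bibo:Book", "rdfs:subClassOf", "bibo:Document"),
}
rules = [
    # Instances of a class are instances of its superclasses.
    ([("?x", "rdf:type", "?c"), ("?c", "rdfs:subClassOf", "?d")],
     ("?x", "rdf:type", "?d")),
    # Anything named as a creator is an agent.
    ([("?w", "dc:creator", "?a")], ("?a", "rdf:type", "foaf:Agent")),
]

entailed = forward_chain(facts, rules)
print(sorted(entailed - facts))
```

The naive fixpoint loop re-checks every rule on each pass, which is fine for a handful of triples but would need indexing to scale.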

* Low-level drivers and modifications to the 4store quad store
were made to facilitate its use in a Python web application.
Specifically,

* Modification of 4store to support multiple client connections
* 4store drivers for rdflib

* Created a web interface (and supporting toolkit) for the
presentation of RDF data in general and bibliographic data
in particular. It uses the Pylons framework, which is
commonly used for building web applications in Python, and
enables such an application to be backed by an RDF store
rather than an SQL database.

* Ontology web service toolkit: http://ordf.org/src/ontosrv
* Bibliographica application: http://knowledgeforge.net/pdw/openbiblio

* Broke some new ground in natural language processing by
attempting to use natural language for data entry. The software
that does this is, of course, based on ORDF and NLTK.

* http://bitbucket.org/ww/sembot
* http://blog.okfn.org/2010/08/09/cataloguing-bibliographic-data-with-natu...

Its strategy is:

* Tokenise and tag sentences
* Using a simplified English grammar, build an RDF representation
of the syntax tree
* Execute domain-specific RIF rules over the syntax tree to
entail statements about works, authors, etc.
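
The three steps above can be sketched end to end on a single sentence shape. This is a toy, not sembot's code: the lexicon, the `NAME+ VERB NAME+ [PREP YEAR]` grammar, the tuple-based stand-in for an RDF syntax tree, and the hard-coded rule (in place of RIF rules) are all assumptions, and a real system would use an NLTK tagger and parser.

```python
import re

# Hypothetical mini-lexicon standing in for a trained part-of-speech tagger.
LEXICON = {"wrote": "VERB", "in": "PREP"}


def tokenise_and_tag(sentence):
    """Split on whitespace and tag each token as a lexicon word, a year, or a name."""
    tagged = []
    for token in sentence.rstrip(".").split():
        if token.lower() in LEXICON:
            tagged.append((token, LEXICON[token.lower()]))
        elif re.fullmatch(r"\d{4}", token):
            tagged.append((token, "YEAR"))
        else:
            tagged.append((token, "NAME"))
    return tagged


def parse(tagged):
    """Group a 'NAME+ VERB NAME+ [PREP YEAR]' sentence into syntax-tree triples."""
    triples, phrase, role = [], [], "subject"
    for token, tag in tagged:
        if tag == "NAME":
            phrase.append(token)
            continue
        if phrase:
            # A non-name token closes the noun phrase collected so far.
            triples.append(("s1", role, " ".join(phrase)))
            phrase = []
        if tag == "VERB":
            triples.append(("s1", "verb", token.lower()))
            role = "object"
        elif tag == "YEAR":
            triples.append(("s1", "year", token))
    if phrase:
        triples.append(("s1", role, " ".join(phrase)))
    return triples


def entail(triples):
    """Domain rule: '<subject> wrote <object> [in <year>]' yields statements about a work."""
    tree = {role: value for _, role, value in triples}
    facts = []
    if tree.get("verb") == "wrote":
        facts.append((tree["object"], "dc:creator", tree["subject"]))
        if "year" in tree:
            facts.append((tree["object"], "dc:date", tree["year"]))
    return facts


facts = entail(parse(tokenise_and_tag("Goethe wrote Faust in 1808.")))
print(facts)
```

Keeping the grammar generic and putting the bibliographic knowledge in the final rule mirrors the separation described above, where domain-specific rules run over a domain-neutral syntax tree.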

This approach seems encouraging. We observed cases where
syntactic ambiguity (i.e. a sentence that could be parsed in more
than one way) was resolved by the domain-specific (semantic)
inference rules. Sebastian Hellmann of AKSW is working on a system
that uses very much the same strategy for his Ph.D. thesis.

These accomplishments notwithstanding, and although a viable system
for processing and presenting RDF bibliographic data has been created,
a key part of the original goal remains unachieved: a good
system for users to create and annotate this data. This is hard
because of the following tension. Either:

* Users have to have a thorough understanding of the
structure and layout of the data (seems to be necessary
for a general tool), or
* Each user interaction has to be modelled by hand

The first assumes an unacceptable level of system-specific
knowledge on the part of the user. The second means crafting a large
number of web forms by hand, work which is tedious and not
reusable. These considerations led to trying to use NLP techniques
to accomplish this task. Not only does the approach seem promising
enough to merit further pursuit, but the instigator of our proposal
is experiencing problems with carpal tunnel syndrome and has
difficulty typing. The speech recognition software he uses is
adapted for prose dictation, which seems like a natural fit here.

Future Work
-----------

Two threads from this work are being pursued. OKF is a contributor
to the JISC-funded JISCOBIB project, which could naturally be seen
as a continuation of, or successor to, Bibliographica. This is being
undertaken in conjunction with the University of Cambridge and has
a focus on data about publications in academic journals.

The second thread concerns the NLP techniques and there has been
some interest from Wikimedia Germany in funding a small project to
make the basic building blocks available as a RESTful service.
