Areas for Discussion
NISO Bibliographic Roadmap Meeting
April 15-16, 2013, Baltimore, MD
(April 15: 10 am-5 pm; April 16: 9 am-4 pm)

This document is intended to help frame our in-person and virtual meeting by providing a list of potential areas / topics we can discuss as a group or in small breakout groups. Those responding to the RSVP meeting survey were asked whether there were particular things they thought should be part of the meeting; these have been sorted into the topical areas below. Further additions and edits should be made as needed.

We'd like to turn this into a document we could use to seed meeting brainstorming by fleshing out the areas below a bit, reordering if need be, etc.  

Linked data "Requirements" - what are they?
  • Assuming new data as well as compatibility with existing MARC
    • What elements of MARC, UNIMARC and ISBD need to be carried forward into a future format? [gordond: ALL elements are part of the future, including local RDBS schema for specialist materials, and it is up to standards communities and local organizations to discuss this internally]
    • Can we move forward if "round-tripping" is required?
      • [kc: I would rather that we NOT discuss in terms of MARC, much less "what should be carried forward". I think that discussion is a dead end, and it would be better to talk about what we need for linked data rather than what to drag forward from old record formats] [dih: Agreed] [dunsire: agreed]
  • Granularity: dumb-down is not necessary if local schema is published in RDF along with linked data using that schema (see Mapping, not migration)
    • Each community has established the granularity of elements/attributes for their local user base (use cases are well understood and implemented). Linked data brings together the local and the global, and there is no reason to favour one over the other [dunsire; bourdon]
  • What do the changes in system capabilities plus the move to web-scale computing mean for bibliographic data?
    • Linked data and RDF affect this significantly
  • Cataloguing tools to produce linked data?
    • Right now it is possible to experiment with exposing our legacy data produced in MARC as linked data (example: data.bnf.fr), but how do we produce linked data from the beginning? How should cataloguing tools be designed for such production? What is the impact on future cataloguing work? [bourdon]
  • Provenance
    • Looking at the W3C and DCMI work and recommendations
    • Development of specific use cases with immediate application
    • Effective linked data applications will require unrestricted sharing of bibliographic data. This in turn requires the agreement of those holding large stores of such data, from Amazon to national libraries, from publishers to academics, and including library systems that use data as a revenue source, such as article indexers and OCLC.
      • This requires a re-thinking of the 'marketplace', given that most of the purveyors of free bib data are thinking of the data as a way to entice users to their services. The 'data as revenue source' services will not be able to compete. 
    • Deduplication: most bibliographic data exists in huge numbers of duplicate instances. How does this affect usability? [dih: this is only relevant in the current bibliographic 'master record' environment] [dunsire: the new environment expects, and accommodates, duplication on a much bigger scale; agree that deduplication is no longer relevant]
      • In a linked data environment there is no such thing as 'duplication': each statement has provenance, which provides a basis for selection
    • Trusted linked data and updated data: How can "trusted" linked data be managed, given that these data are constantly updated? What routines are needed to manage datasets? [bourdon]
  • Related: how can requirements for data be automatically checked as part of workflow (mechanisms, rules)? [dih: if in the RDF world anybody can say anything about any thing, shouldn't we be talking about this differently?]
    • Data integrity: what does this mean in a linked data world? [dunsire: it means that provenance is very, very important]
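
To make the provenance point above concrete, here is a minimal sketch of how provenance could serve as a basis for selection among competing statements about the same thing. Everything in it is an assumption made for illustration: the `Statement` class, the `TRUSTED_SOURCES` ranking, the `select` function, and the sample identifiers are hypothetical, and a real application would more likely build on the W3C provenance work mentioned above.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical trust ranking chosen by the consuming application:
# a lower number means a more preferred source.
TRUSTED_SOURCES = {"national-library": 0, "publisher": 1, "crowd": 2}


@dataclass(frozen=True)
class Statement:
    """One statement (subject, predicate, value) plus its provenance."""
    subject: str
    predicate: str
    value: str
    source: str      # who made the statement
    modified: date   # when it was last updated


def select(statements, subject, predicate):
    """Pick one statement about (subject, predicate) using provenance.

    Preference order: most trusted source first, then most recent update.
    Sources missing from the ranking are treated as least trusted.
    Returns None when no statement matches.
    """
    candidates = [
        s for s in statements
        if s.subject == subject and s.predicate == predicate
    ]
    if not candidates:
        return None
    unknown_rank = len(TRUSTED_SOURCES)
    return min(
        candidates,
        key=lambda s: (
            TRUSTED_SOURCES.get(s.source, unknown_rank),
            -s.modified.toordinal(),
        ),
    )


# Three "duplicate" title statements that differ only in value and provenance.
statements = [
    Statement("ex:book1", "dc:title", "Moby Dick", "crowd", date(2013, 4, 1)),
    Statement("ex:book1", "dc:title", "Moby-Dick; or, The Whale",
              "national-library", date(2012, 1, 10)),
    Statement("ex:book1", "dc:title", "Moby-Dick", "publisher", date(2013, 3, 1)),
]

chosen = select(statements, "ex:book1", "dc:title")
```

Nothing is deleted or merged here: all three statements remain available, and a different consumer could apply a different ranking to the same data.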

Data exchange / sharing records / data compatibility
  • Open bibliographic data with no copyright or licensing restrictions (see Open Bibliographic Data Principles)
    • How to manage linked data when these data are published under different types of licences, for example CC0 and CC BY [bourdon]
    • Linked data and personal information: how to reconcile authority control for names of persons and data privacy? Data privacy and rights information in authority records. [bourdon]
  • Diverse communities and users
    • Recognizing that we don't yet know who most of those will be, ergo we need to expose an 'everything' option
  • Relationships with other bibliographic standards: BIBFRAME / Schema.org / Dublin Core / BibTeX / RIS / etc.
  • What is required to foster good communication?
    • documentation
    • versioning and update management (e.g. auto-update, update notification, etc.)
  • Data compatibility and sharing among systems that maintain bibliographic control
    • What does 'bibliographic control' mean in a linked data world? 
  • Options for exchange of bibliographic data
  • Relationship of bibliographic data standards to Open Annotation -- is there any? Should there be any? - [I do see a great utility here for holdings data as annotations. - Krier]
  • Vocabulary development and management / vocabulary publication and maintenance
    • Managing statements
    • Interoperability and mapping
  • Machine matching to deduplicate new sets of bibliographic data (e.g. "records")
    • Doesn't de-duplication assume a continuing notion of 'master record'? 
  • Redis Library Services Platform: http://tuttdemo.coloradocollege.edu/code4lib/redis-library-services-platform.html
  • What is required (metadata, full text formation, APIs) to support big data research across existing digital databases of content, such as EBSCO and Gale?
    • Shouldn't such requirements be expressed as application profiles rather than generalized 'requirements'?
  • Making bibliographic data work with Wikipedia and Wikidata
  • Changes/updates related to Dublin Core and RDA

Community
  • Diverse communities and users
  • How can interested parties be identified?
  • What is required to foster good communication?
  • Identification of one or more organizations to be foundation for ongoing work
  • Role of IFLA (FRBR, ISBD, UNIMARC) and related bibliographic standards (RDA, RDA/ONIX) in the global information environment and place within the linked data ecosystem
  • No convergence among institutions like LC, BL, Europeana, etc. on RDA, RDF, etc.
  • How can we make sure LC will pay attention to the results of this effort? How should these Roadmap results be introduced into the BIBFRAME process? [kc: we should not concern ourselves with that. We should move ahead, and let BIBFRAME follow or route around it.] [dunsire: Agree. More generally, there is no center/top/core/leader in this environment; we are all the center of our own local universe and the non-local is peripheral. We do not need a black hole in this galaxy, sucking in everything around it and destroying information at the same time.]
Who? How?
  • This will take work. Who are the appropriate parties to take responsibility? Where will funding come from?
  • Efforts need to be long range and sustainable
  • Taking distributed responsibility more seriously
  • Who is responsible for what in the new bibliographic universe? Does the way we share responsibilities on a national basis according to the principles of "universal bibliographic control" continue to make sense or not? How should responsibilities be shared between different communities (libraries, archives, museums, publishers, etc.), especially for "authority control" (mainly in the assignment and maintenance of referents; that is, URIs)?
Defining Bibliographic Data [dunsire: Everything that is a source of information (an information resource) is bibliographic data. I'm not sure this needs substantive discussion - there are no boundaries or edges in linked data, except those defined by schema/attribute sets, and those boundaries can be shrunk to whatever level is required by refinement/extension of elements and mappings between them]
  • Full range from in-document citations, abstracting service data, library data, bookstore data.
  • Types of materials: Anything? Everything? Some things?
    • books
    • serials
    • articles
    • maps
    • musical scores
    • performances
    • drama scripts (stage, radio, film)
    • films
    • collections, archives
    • data collections
    • objects (e.g. museum objects, archive objects)
Users (pre-meeting)
  • User outcomes
  • Development of specific use cases with immediate application (see W3C LLD Report)
  • Definition of "users" needs to acknowledge the complexity: scholars, librarians, general public, applications, and external services we know nothing about
  • Some users and their needs (or maybe this is a kind of workflow?)
    • metadata schema designers
      • minimally constrained vocabularies
      • clearly defined terms (human-semantic definition)
      • community data model
    • metadata system designers
      • community data model
      • ontology with domains and ranges, as desired (RDF-semantics)
      • defined constraints related to quality control, validation
    • metadata creators (those who create metadata instances)
      • clearly defined terms (human-semantic definition)
      • creation guidance rules
      • value vocabularies
      • an input system that does validation on data
    • searchers
      • clearly defined terms
      • defined values
      • minimal constraints on data re-combination
      • optional inferencing where available
    • data re-users
      • clearly defined terms
      • defined values
    • scholars using bibliographic data
      • interoperability of software using bib data, including document creation

Users (meeting discussion)

Comes down to WHO WHAT WHY
This is a simple list of things that need to be considered. We will have to have metrics to make these actionable. This will be an iterative process.
  1. Who are the users? - Users today are beyond the traditional users of the library, and must include anyone seeking information in public fora.
  2. Uses determine system functionality - How people use information must lead to functional responses on the part of cultural heritage institutions. While some types of use must be assumed in order for systems to be created, the key today is to provide open systems that allow users to invent their own views, and then to respond to those uses by providing new system functionality. Functionality initially is assumed to include:
    1. Discovery platform (broadly defined -> local sources, remote resources)
    2. Reference management/bibliographic software
    3. Delivery services
    4. Resource sharing functions
    5. Web exposure
  3. Measurement - you have to capture information that allows you to respond to user needs. In most cases today we do not capture (or analyze, if captured) the data that would allow the institution to respond to user needs. We need to think about what we can measure and how such measurements can lead to system evolution.
  4. Users are, by definition, "unanticipated" until you have data about them (in other words, don't just measure usage of known groups - that's too constraining.  Allow for unanticipated users and unanticipated uses of your data)
  5. How open you are to users and user participation will determine how they can participate. If there is little or no possibility for users to "invent" new ways to use the system, then the system pre-determines what users can and cannot do. There must be ways for users to develop new uses of the system or it will be static. User views and user activities can be supported through APIs and through other forms of system openness. We suggest looking at the open source model to get ideas about this.
  6. Not all users will be active, or interested.  Don't just measure the 'motivated' user (i.e. scholar, librarian) - measure the passive user as well. The occasional user's needs also must be met, and their activity is as legitimate as that of the most active users. Less active users may need more help, and may become more active if the system shows them what it can do.
  7. Libraries may have to change their ideas of privacy/anonymity. The rise of social media has shown that crowd-sourcing and sharing can be valuable. Libraries need to allow this and at the same time protect the freedom of inquiry that is their mission. This is not a contradiction in terms; we often see it as such because most social media are primarily used to "sell" user preferences to advertisers.
  8. Library services and public-facing interfaces must be user-centric, not library-centric
    1. Users should not see library jargon or have to work with library management views.
    2. We must recognize that users are associated with more than one institution, and many institutions over time. They also are on the Web. There needs to be a way for a user to have an identity that travels. But this is a large problem on the Web in general, not a library-specific problem.
  9. Use cases that inform system design need to be more open. There is a tendency to develop use cases based on what the system can do, or on what services the library wants to provide. We need to gather use cases that are outside of our comfort zone, and then evaluate which ones we can or cannot fulfill. Some use cases will require us to partner with others, some may have to be declared outside of the library. In all cases we should allow library data and services to interact with any services that wish to use them.

Mapping [not Migration]

[kc: IMO this group should not be discussing this. That's for the LC BIBFRAME effort. This group should stay away from BIBFRAME, away from MARC and away from migration. We should focus on the bigger picture, not on library data.] [dih: Agreed--the real issue is mapping.]
  • The foreseeable schedule of RDA
  • Incremental shifts from MARC to BIBFRAME
    • Isn't this a mapping conversation, not a 'shift' or a 'migration'?
  • When and where should BIBFRAME be standardized?
    • What kinds of extensions to BIBFRAME are required if direct conversion from other MARC formats is needed?
      • Countries using national formats are unwilling to migrate to MARC 21 first
      • Can an analysis based on UNIMARC be produced?
  • Computational approaches to migration
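
As a rough illustration of the "mapping, not migration" idea and the earlier granularity point (publish the local schema alongside the data rather than dumbing the data down before export), the sketch below keeps fine-grained local statements intact and derives coarser ones on demand from a published map. The property names in `BROADER`, the `generalize` function, and the sample triples are hypothetical, invented for illustration; they are not taken from any actual element set or mapping.

```python
# Hypothetical published mapping: each fine-grained local element points
# to one broader element that a general-purpose consumer understands.
BROADER = {
    "local:titleProper": "dc:title",
    "local:parallelTitle": "dc:title",
    "local:illustrator": "dc:contributor",
}


def generalize(triples, broader=BROADER):
    """Return the original triples plus a coarser copy of each mapped one.

    The local statements are kept untouched, so no granularity is lost;
    the broader statements are derived, not substituted. This sketch
    follows only one mapping step and does not chain through several
    levels of broader elements.
    """
    result = list(triples)
    for subject, predicate, value in triples:
        if predicate in broader:
            result.append((subject, broader[predicate], value))
    return result


local_data = [
    ("ex:book1", "local:titleProper", "Moby-Dick"),
    ("ex:book1", "local:illustrator", "Rockwell Kent"),
]

expanded = generalize(local_data)
```

A consumer that only knows the broader elements can query `expanded`, while a specialist consumer can still rely on the local elements.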