It might not be clear from a first reading – I am making this analysis based on the blog comments of the people involved – but there are two important points here that, up until now, have been stumbling blocks in the discussion of open scientific data:
- A simple statement is required along the lines of “best practice in data publishing is to apply protocol X”. Not a broad selection of licenses with different effects, not a complex statement about what the options are, but “best practice is X”.
- The purpose of publishing public scientific data and collections of data, whether in the form of a paper, a patent, data publication, or deposition to a database, is to enable re-use and re-purposing of that data. Non-commercial terms prevent this in an unpredictable and unhelpful way. Share-alike and copyleft provisions have the potential to do the same under some circumstances.
- The scientific research community is governed by strong community norms, particularly with respect to attribution. If we could successfully expand these to include share-alike approaches as a community expectation, that would obviate many of the concerns that people attempt to address via licensing.
- Explicit statements of the status of data are required and we need effective technical and legal infrastructure to make this easy for researchers.
- The separation of the decision to publish from the question of open access to published data. Not all data can be published, for example data which identifies a specific person in clinical research. The scientific process knows how to deal with this, usually by making such data available to a couple of trusted outsiders (referees), on request and on the basis of confidentiality, and letting the referees vouch for its veracity or verisimilitude.
- The idea that "best practices" might be different in different domains. This is related to the point above, but also allows a healthy diversity in approaches adapted to different circumstances. Does a chemist really have to run (and publish) an NMR spectrum of every brown-tar reaction product, or will a photo suffice?!
This data has been obtained and made public for the benefit of Society as a whole. Anyone may use it for any purpose so long as the source is acknowledged.

These two sentences could be preceded by a reference to a Code of Practice from a learned society or funding body (e.g. the BBSRC data sharing policy), or could be completed with a reference to a specific licence, e.g. CC0 or the PDDL. And all of this needs to be fitted in with the parallel process at Science Commons, however much that process appears to be reinventing the wheel…