The “Protocol for Implementing Open Access Data” states in its point 4.1:
to facilitate data integration and open access data sharing, any implementation of this protocol […] MUST NOT apply any obligations on the user of the data or database such as “copyleft” or “share alike”, or even the legal requirement to provide attribution. Any implementation SHOULD define a non-legally binding set of citation norms in clear, lay-readable language.
I'll come back to the question of how much of a database can be copyrighted in a second – for the moment, we simply need to assume that some aspect of the database can be copyrighted. If that were not the case, there would be no need for protocols such as that prepared by Science Commons, or my "Protocol X" to go with the Panton Principles!
Now the justification for banning contractual requirements of attribution is that they raise the transaction costs for reusers of the data. Very true: users might have to record where they got their data from, which takes time and effort. So Science Commons recommends "that authors simply waive attribution, which does create legal certainty and provides freedom to operate to the data user."
Herein lies the problem. I am a Spanish resident, so all my works are covered by Spanish copyright law. Under Spanish law, I cannot "simply waive attribution": if I purport to do so in a clause in a contract, that waiver will be null and void. The right to be identified as the author of a work is one of the "moral rights" of copyright law in civil law countries, and it is deemed to be a part of the author and so untransferable except mortis causa. Even after my death, my heirs will be able to insist on attribution of my works for seventy years. That's just Spanish law: if I lived in France (which I did for several years), my moral rights would be eternal (in theory, at least!).
I cannot honestly waive my legal rights to attribution, and the same goes for many, many scientists. If I were to purport to do so, it would create exactly the sort of legal uncertainty that data-sharing protocols are meant to avoid. My rights to attribution are not enforceable in the United States unless I insist on them in a copyright license, but insisting on an attribution license creates legal certainty worldwide: I am only licensing what I am able to license, I am not purporting to license something that my local law will not allow me to, nor binding my heirs when that is self-evidently impossible.
So which parts of a database can I claim attribution to? Moral rights are a part of copyright law (article 6-bis of the Berne Convention for the legal geeks), so we need to ask how much of a database can be considered to be covered by copyright (European database rights don't enter into this problem, thank your-favourite-Deity). The current international statement is contained in article 10.2 of the TRIPS agreement:
Compilations of data or other material, whether in machine readable or other form, which by reason of the selection or arrangement of their contents constitute intellectual creations shall be protected as such. Such protection, which shall not extend to the data or material itself, shall be without prejudice to any copyright subsisting in the data or material itself.
To put it in layman's terms, the author of a database only has copyright over those parts that he or she has actually thought about and considered, not the bits which relate to simple data entry. Unfortunately, this "threshold of originality", as it is technically known in copyright law, varies from jurisdiction to jurisdiction! To simplify things outrageously, the threshold is lowest in the UK and highest in Germany, with the U.S. being somewhere in between. Obviously, nobody expects professional scientists to be expert international copyright lawyers, so it is best to set the protocol at the lowest threshold of originality, where pretty much everything except the facts themselves is subject to copyright.
What is the response of the Science Commons protocol to this question? It can be found in section 5.2, where the ban on "share-alike" clauses is justified:
a user would be able to extract the entire contents (to the extent those contents are uncopyrightable factual content) and republish those contents without observing the copyleft or share-alike terms.
I don't agree with the strength of the statement given in the Science Commons protocol, but I will accept that, in practice, that would be the case. If it were not the case, it would violate the pivotal concept that you cannot copyright the facts of nature: a few people would like to do that, but I still have confidence that judges would uphold a principle that has been generally accepted worldwide since at least 1883.
On the other hand, the Science Commons protocol falls into the trap of internal inconsistency. Share-alike clauses are banned because they are effectively unenforceable for databases, but attribution clauses are banned because it would cost far too much to comply with them, and because reusers might face actions for copyright infringement.
Everyone concerned with open-access publication has the greatest respect for the efforts that Creative Commons and its daughter projects have put into providing legal security for our aspirations. This doesn't mean that mistakes are never made, as John Wilbanks himself points out. I don't wish to knock the efforts that have obviously gone into preparing the Science Commons protocol, merely to try to make it workable outside of the Fifty States.