Google Books Settlement — an Open Source Project?
Here's a half-baked thought: In the
Google Books Settlement
why not require that Google place the unclaimed books into a
non-profit, open source project status? That is, anybody
— not just Google —
could make use of the scans and text.
Now before I get into the meat of this open source project idea, let me clearly
define which
aspects of the book scanning project I'm talking about: Unclaimed works.
Google is scanning a lot of books, and for many of them the owner of the
copyright is known. The settlement agreement encourages owners to come
forward and claim the books they have rights to.
"That book is mine." For those books, the situation with the rights is
unambiguous: The owner controls them. End of story. (They and Google
may enter into an agreement to sell them, and it is entirely up to the
copyright owner, as it should be.)
What I want to focus on are the books where no owner comes forward
to claim them. These are essentially "orphan" books. There are probably
a vast number of these. Millions. It would be a shame to lose them. I
agree with Google that this would be a loss to
society.[**]
However, under the proposed settlement, Google, and Google alone,
stands to profit from these unclaimed works — after
breaking the law to get them
— and that seems wrong. Google should not get a cash-cow monopoly
revenue stream from an illegal action. (I generally hate to see lawbreakers
rewarded, though obviously it happens.) More importantly, for the benefit
of society, Google should not be the only organization allowed to do this. That sort of monopoly sets up a bad situation for the future.
There should be a penalty for breaking the law, and there should be competition.
My suggestion is
that Google be required to establish open-source, non-profit competition to itself for these books.
This solves both issues: The monopoly is removed, and the penalty for
breaking the law is that their ability to profit from unclaimed works is
consequently diminished.
This actually meshes with what Google says it wants.
Google's VP and Chief Legal Officer, David Drummond, says
their Google Books digitization project is meant to help the world, and that they don't want to be a monopoly.
He writes (emphasis added):
In reality, nothing in this [settlement] agreement
precludes[**]
any other organisation from pursuing its own digitisation efforts.
We wish there were a hundred such services.
But despite a number of important projects to date -- and Google has helped fund some of them -- none has been on the same scale simply because no one else has yet chosen to invest the time and resources required. But if there are to be a hundred services in future, we have to start with one.
Well, okay, wonderful, you can help establish those hundred others!
Technically this would be simple. The scanned images, OCR'd text, and
metadata (titles, coordinates of the text within the images, etc.) for all
unclaimed works would be made available, free, in a standardized format,
to anyone to use as they see fit. (For profit or otherwise; the market
would decide which uses are worthwhile. For works that are later claimed,
I would think it fair if the settlement agreement's profit-sharing
provisions would then apply between the registry and whoever made use of
the work. [But not profit-sharing with Google.])
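To make "standardized format" a little more concrete, here is a minimal sketch of what one book's record might contain. Every class and field name below is my own assumption for illustration; the settlement does not define any such format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    """One OCR'd word plus where it sits in the page image (hypothetical fields)."""
    text: str
    x: int       # left edge of the word in the image, in pixels
    y: int       # top edge of the word in the image, in pixels
    width: int
    height: int


@dataclass
class Page:
    """One scanned page: the image file and the words found on it."""
    image_file: str
    words: List[Word] = field(default_factory=list)

    def text(self) -> str:
        # Join the OCR'd words in reading order, separated by spaces.
        return " ".join(w.text for w in self.words)


@dataclass
class BookRecord:
    """Metadata and pages for one unclaimed work (hypothetical layout)."""
    book_id: str
    title: str
    pages: List[Page] = field(default_factory=list)

    def full_text(self) -> str:
        # One line of text per page.
        return "\n".join(p.text() for p in self.pages)
```

Keeping the word coordinates next to the text is what would let a reader application highlight search hits on the scanned image itself.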
The one licensing proviso for use would be to periodically check for,
and cease using, any work that has become claimed. (Obviously those
who make use of the works could negotiate with the now-identified owners
to continue using the work. After a few years
the number of newly claimed works would probably be quite small so
this isn't much of a barrier. A list
of book-ids that have been recently claimed would be easy for
developers to check in their applications.) This ensures that only
works no owner cares about are being used for free.
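As a rough sketch of how small that check could be: assume, purely for illustration, that the registry publishes the recently claimed book-ids as plain text, one id per line. The file layout and the function names here are hypothetical, not anything the settlement specifies.

```python
def parse_claimed_list(text):
    """Parse a claimed-works list: one book-id per line.

    Blank lines and lines starting with '#' are ignored (an assumed convention).
    """
    ids = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line)
    return ids


def split_by_claim_status(books_in_use, claimed_ids):
    """Return (still_unclaimed, newly_claimed) for the books an application uses."""
    in_use = set(books_in_use)
    newly_claimed = in_use & claimed_ids    # must stop using these
    still_unclaimed = in_use - claimed_ids  # fine to keep using
    return still_unclaimed, newly_claimed
```

An application would run this on whatever schedule the license requires, stop serving the newly claimed books, and contact the now-identified owners if it wants to keep using them.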
The registry that the settlement creates would be the obvious place
to distribute all these files.
This would open up a wealth of uses, probably many clever ones
we would never imagine.
This addresses Google's valid point that they're
preserving a lot of orphan works that might otherwise be lost.
That rightsholders have to take action to protect their rights is
annoying (and technically illegal) but in this context I can see the logic that it's not possible
to find the rightsholders for all those millions of works; and "opt-in"
means they could be lost. That preservation aspect seems useful.
Yes, it does turn copyright law upside down, but it's a recent change
anyway that authors should not have to take action to protect their rights:
US copyright law used to require authors to take actions to preserve rights (registration
and renewal). This was the case in the US until 1976. Indeed, authors
are actually still required to take an action, registration, to obtain
maximum protection.
If the law had remained as it was before 1976,
with authors having to take steps to keep their work in copyright,
nobody would bat an eye at authors having
to take steps to assert their rights today in this settlement.
No-author-action-required
probably seemed like a good idea in 1976. I doubt anyone at the time
imagined computerized scanning of millions of old books or a Google-like
system for searching in a mere 30 years.
(Even science fiction writers hardly imagined it.)
That's fine — but times have changed, and perhaps, indeed, to
preserve a corpus of millions of old books, it may be the lesser
evil to require action on the part of authors to claim their works.
I would also note I make this suggestion in the context of the settlement
being a seeming done deal. If the judge decides this settlement is to
become law, and essentially overturn the requirement to locate copyright
owners before using their work, then (in that fait accompli
context) it would be better, in my opinion, to broaden the accessibility
of unclaimed works rather than leave Google the only gatekeeper.
Google should not, in any event, be the sole beneficiary of their
illegal action. If the judge decides society benefits by access to
digital works, then society will benefit more if unclaimed works are
freely available to all. (Until they are claimed, at which point the author's right
to control their work resumes as it always was.) This will spur the
hundreds of other such services that Google itself wishes for.
I realize the settlement may be close to cast in stone at this point,
but this would seem like a beneficial revision.
What do you think? Would this be a good thing?
Taking this a step further, all the other scanned books — the
claimed ones —
should also be available to all comers under the same exact terms as the
settlement provides to Google. (The same revenue split, the same rights
to remove one's book from the system, etc.)
If the settlement agreement goes forward as now written, with Google
the only entity expressly permitted to display unclaimed works, then —
and this is just musing here after a few glasses of the grape —
it would be ironic if some group pirated all the scans of truly orphaned
works (the ones nobody claims) and put them up free on BitTorrent. :)
Also, a final suggestion for the settlement folks: It would be nice
if rightsholders could be given a free copy of the scans/text/metadata
of their own works. This too would increase competition by allowing
copyright owners to easily take their own works elsewhere for display.
(Why should Google provide this? Again, because they broke the law to
obtain these books in the first place. It's a minor penalty to ask them
to share with the rightful owners.)
Anyway, enough post-grape musings for the night. :)
What are your thoughts? Crazy idea?
Notes
By way of biographical note, while I was VP of SFWA I chaired the
Orphan Works Committee. Making orphaned works available to the public in a fair
manner is a wish of mine.
To insulate users from their own potential copyright liability, in
addition to making the images/text/metadata available for download, Google
should also host all the data in directly usable form. For example, I
might write an ebook reader application that displays scanned image pages
— the URL I would use inside my app would point to the page on
Google's server. Thus I wouldn't have to host the page image myself,
unless I wanted to. Likewise the raw text should be retrievable in real
time from Google's servers.
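Here is a minimal sketch of what that could look like from the application's side. The host name and URL layout are invented for illustration (no such service exists); the point is only that the app builds a URL and lets the hosting server deliver the page.

```python
from urllib.parse import quote

# Hypothetical base URL; a real service would publish its own.
BASE_URL = "https://books.example.org/unclaimed"


def _page_url(book_id, page_number, kind, base_url):
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    # Percent-encode the id so unusual characters cannot break the path.
    return f"{base_url}/{quote(book_id, safe='')}/pages/{page_number}/{kind}"


def page_image_url(book_id, page_number, base_url=BASE_URL):
    """URL of the scanned image for one page (assumed layout)."""
    return _page_url(book_id, page_number, "image", base_url)


def page_text_url(book_id, page_number, base_url=BASE_URL):
    """URL of the raw OCR text for one page (assumed layout)."""
    return _page_url(book_id, page_number, "text", base_url)
```

An ebook reader would hand `page_image_url("book-0001", 3)` straight to its image widget and never store the scan itself.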
Disallowing derivative works and temporarily escrowing potential owner shares of profits if they later turn up, as per the current proposal, could also be reasonable
licensing restrictions.
Note this idea is orthogonal to the whole opt-in / opt-out question. (Requiring authors to actively claim their work, for example through registration or renewal, is what the Berne Convention calls a "formality" and forbids.)
That is: While it seems likely the settlement will get approved with some form of action required by copyright owners, even if it is not, some works will remain for which no owner can be found even after a diligent search. This proposal addresses
any unclaimed / orphaned works, no matter how many or few, or how hard Google
does or doesn't try to find the owners.
Regardless of how many works are in the "unclaimed" or "orphan" category, it would be nice to prevent Google from having a monopoly over these unclaimed works.
Related:
Axioms in the Future of Publishing
Thoughts on Copyright
Ebook (un)availability case study