Describe a new means by which provenance information can be stored using distributed ledgers.
Significance and Impact
This work enhances our ability to effectively capture and share workflow provenance.
- Document problems with current provenance solutions.
- Identify specific, near-term solutions for fully shared provenance.
- Provide recommendations for longer-term research needs.
Sharing provenance automatically across workflow management systems is not currently possible, but the value of such a capability is high: it could greatly reduce the number of duplicated workflows, accelerate the discovery of new knowledge, and help verify the integrity of past and present analyses. Although numerous technological challenges stand in the way of sharing provenance information efficiently across workflow management systems, permissioned distributed ledgers could surmount many of them. The primary benefit of permissioned distributed ledgers over other technologies is that they are distributed over a peer-to-peer network, encode transactions from across that network into an immutable hash list, and use a shared consensus mechanism to agree on the validity of new data. This work discusses provenance and distributed ledgers individually and then argues that distributed ledgers naturally satisfy many of the requirements of workflow provenance, that provenance information can exist in the ledger in multiple ways, and that this strategy opens a number of novel research areas.
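To make the "immutable hash list" idea concrete, the following is a minimal single-process sketch of provenance records linked by hashes. It is an illustration under stated assumptions, not the approach proposed in this work: the function names (`record_hash`, `append_record`, `verify_chain`, `build_example_chain`) and the record fields (`activity`, `used`, `generated`) are hypothetical, and the peer-to-peer distribution, permissioning, and consensus steps of a real distributed ledger are omitted entirely.

```python
import hashlib
import json

# Placeholder "previous hash" for the first record (an assumption of this sketch).
GENESIS_HASH = "0" * 64


def record_hash(prev_hash, payload):
    """Hash a record together with the hash of the record before it."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a provenance record that is linked to the current tail of the list."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {
        "prev": prev_hash,
        "payload": payload,
        "hash": record_hash(prev_hash, payload),
    }
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Return True only if every link and every stored hash is still consistent."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != record_hash(entry["prev"], entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


def build_example_chain():
    """Build a two-step workflow history using hypothetical record fields."""
    chain = []
    append_record(chain, {"activity": "align_reads",
                          "used": "raw.fastq",
                          "generated": "aligned.bam"})
    append_record(chain, {"activity": "call_variants",
                          "used": "aligned.bam",
                          "generated": "variants.vcf"})
    return chain
```

Calling `verify_chain(build_example_chain())` succeeds, whereas editing the payload of an earlier record afterwards makes verification fail, because that record's stored hash no longer matches its contents. In an actual ledger, peers would additionally have to reach consensus before a record is appended; that step is what this sketch leaves out.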