Demystifying and Adjusting the Promises of Blockchain-based Data Management in the Permissionless Setting

Roman Matzutt

The digital currency Bitcoin introduced the blockchain as a data structure that allows its users to establish consensus about who owns which coins in a decentralized manner. Since then, blockchain technology has evolved and now enables distrusting parties to engage in online interactions without the need for a trusted intermediary by immutably recording general events in transactions. This interaction model sparked tremendous interest in blockchain technology, its potential, and its applications. However, the identification of multiple shortcomings has since dampened this initial spirit of optimism. These shortcomings are especially apparent for permissionless blockchains, such as Bitcoin, which openly encourage participation by anybody. For instance, Bitcoin has to secure its blockchain against malicious actors by relying on energy-intensive computations, which further leads to scalability issues as only a few payments can be accepted at a time. While prior work has extensively studied such technical challenges, it has so far neglected the influence of the data stored on the blockchain. Yet, this influence is becoming undeniable: On the one hand, unknown actors can irrevocably append new data without a designated removal process. On the other hand, the operation of a blockchain system depends on a massive replication of its full and growing history. Hence, the impact of blockchain-recorded data requires thorough investigation to ensure the security and longevity of blockchain systems. In this dissertation, we thus take a data-driven perspective to assess and improve the applicability of permissionless blockchains as building blocks for decentralized data management systems. We identify two core challenges of blockchain-based data management, i.e., the need for moderating what data is recorded and the need for alleviating the continually growing storage requirements stemming from the append-only nature of blockchains.
Furthermore, we assess the potential of blockchains to enable additional applications by leveraging their characteristic properties. We address these challenges on a technical level via the following contributions. As our first contribution, we systematically analyze the phenomenon of blockchain content insertion on a conceptual, technical, and empirical level. Our analysis reveals that content insertion is a common practice and offers benefits for higher-level applications, but inserting illicit content can potentially have devastating consequences for the participants. As our second contribution, we explore means to mitigate these consequences, both before and after the fact, by proposing strategies to prevent the insertion of unwanted content as well as a redactable blockchain that enables a swift and transparent removal of content. Our third contribution addresses the challenge of growing blockchain sizes by defining a gradually deployable block-pruning scheme that is retrofittable to Bitcoin and enables users to retroactively forget obsolete data and thereby reduce their storage requirements. Finally, our fourth contribution shows that permissionless blockchains still hold untapped potential for fueling novel applications despite their limitations; namely, we demonstrate how Bitcoin can help securely bootstrap decentralized anonymity services. Overall, we shed new light on the potential impact of the data persisted on blockchains. Our analyses and technical contributions therefore widen the scope for resilient and durable blockchain designs for data management tasks.