A look at Scorex: a Scala blockchain framework

After slowly gaining visibility for the past few years, in 2017 cryptocurrencies and ‘the blockchain’ in general have suddenly gone from something ‘you may want to keep an eye on’ to something that you ignore at your peril, such is their claimed disruptive potential. Or maybe not—but the interest is certainly skyrocketing.

When I learned of Scorex, the self-described modular blockchain framework written in Scala, my interest was piqued. Scorex’s mission statement is to enable easier blockchain experimentation and prototyping through abstraction. This is attempted by trying to modularise the various building blocks that make up a working blockchain system.

One of my main questions was: who is this framework aimed at? Independent entrepreneurs who want to ship a new currency quickly but don’t want to reinvent the wheel (a sort of Spring for cypherpunks, then) or cutting edge researchers with academic-level knowledge of the subject, devising and simulating the algorithms for the next generation of blockchains? Maybe both? I wanted to find out by diving in.

This foray into Scorex was concurrent with my research into blockchain itself, a subject which I’ve just recently started to investigate—from scratch—not too long ago. So, here are my impressions form the point of view of a mostly-newcomer to the blockchain, and total newcomer to Scorex.

Blockchain concepts

This post is not meant to be a comprehensive introduction of Blockchain concepts. There are several resources available for those needing a primer, for example here.

The first difficulty a newcomer finds when trying to educate themselves about the blockchain, is mentally separating the various components of any given blockchain implementation, cataloguing them into what they already know and the real novelties. The ideas initially all seem enmeshed together and in a way they are.

Take the archetypal example, Bitcoin—more often than not the first application people start looking into. Bitcoin became famous as ‘a currency’, which will mislead the newcomer into assuming that the revolutionary mechanism is some clever book-keeping design. In a broad sense this is true, but thinking about it from this angle risks missing the central point which is the blockchain itself—an untamperable, immutable, distributed, append-only record that can exist without any central authority and without requiring any trust between nodes in the network.

The above is the essence of the technology, and it is in fact quite independent of what we choose to store in the ‘blocks’ that make up the ‘chain’: whether purely money transactions (as in the case of Bitcoin), or other types of data (such as in Namecoin) or even executable code (Ethereum)¹.

A growing lineage

There is now a large and growing number of alternative blockchain-based systems, so it seems natural to compare them to see what makes them similar and what makes them different, and if there are any common elements, whether they can be reused. After all, like pretty much all software in the world, there are various layers to a blockchain application:

The actual blockchain record technology, its physical format, validation rules, etc.
The book-keeping system and/or other data payload format within the blocks
The ‘mining’ (or ‘minting’, or ‘forging’) system which creates new valid blocks to add to the chain
The consensus system to resolve conflicts and detect attempts at malicious activity (which is often tied to the previous item, and is the area with the most active research at the moment)
The cryptographic system that controls who is authorised to perform actions on the blockchain
The network layer, which allows nodes to communicate with each other and stay in sync
The client API…

However, according to Scorex’s ‘mission statement’, many of the existing systems have each implemented their own solution, to their own requirements, without as much concern for abstraction, and the result is that the components of each system are monolithic, tightly coupled to each other and to the technical goals of the application in question. So, it seemed only a matter of time before someone tried to identify these similarities and envisioned a unified framework for blockchain development.

Scorex

Scorex in principle turns out to be pretty much what it claims to be: a modular framework which provides some ready functionality and some very useful abstractions, but whose usability still requires a lot of head-scratching (at least if you don’t understand blockchain theory at the research level—which I certainly don’t). It appears to be in a relatively early stage of development and the documentation is limited. Any specifics which I give below could therefore be subject to change.

Overview

As shown before, there are various ‘layers’ which are required to make up a working blockchain application as a whole. We shouldn’t let ourselves be fooled into thinking that a ‘blockchain application’ is all about the blockchain record system plus some mundane details: in order to have a complete system we have a large amount of ancillary requirements, all of which much less sexy than the central invention, however much of a stroke of genius we consider our novel blockchain or consensus idea.

Firstly let’s note that the authors of Scorex have chosen Scala. In their own documentation, in favour of this choice they cite the cross-platform advantage, Java interoperability, its expressive functional syntax, powerful type system and the good tools available for concurrency, presumably referring especially to Akka actors, on which Scorex relies in its outer layers.

Functional programmers will be especially interested in how Scorex leverages Scala’s type system, and in the abstract patterns that tend to recur in blockchain application design, which this project tries to identify and isolate. These patterns give us a chance to better organise our mental model of blockchain concepts that are otherwise easily jumbled up. Thanks to the type system it is possible, via type parameters, to express various blockchain designs by using only a handful of base traits. For example, we could in theory reimplement a working Bitcoin node as well as other currencies using the same basic building blocks.

Exploring the code

Scorex is an early project that is still under active development. The code still feels a little rough around the edges, and in several places I found myself thinking that the representations chosen could use a little finesse. There is probably much work still in progress. I may be seeing too much into it, but parts of the code remind me of my own code when I am solving a large and interesting problem: the sense of “I need to sketch this bit out as quickly as possible so I don’t get too absorbed in improving this detail at the cost of losing the global picture”. And it really is a large and complex global picture. Any real-world problem will resist clean decomposition to some degree. Sometimes there is a truly general, logical set of principles from which everything else follows, other times not. As far as I could gather, in Scorex it seems that this is still an active area of research. The abstractions may end up being refined and others discovered; looking through the code base, the impression is of a project that is still being reorganised, but I was impressed by the analytical work that led to a useful decomposition of what makes a cryptocurrency a cryptocurrency.

That’s probably what you get when you approach the topic of blockchain from a more rigorous research perspective, but to the more casual enthusiast it can look quite daunting. Indeed the other hurdle I found is that Scaladoc and other inline documentation is still very sparse at this stage, and the naming of the central traits only begins to make sense once a general picture of the design is gleaned. The currently available tutorial helps but it is still work-in-progress.

It does take a while to allow all the concepts to sink in. The various parts of the model are bound to each other via extensive type-parameterisation and this network of dependency takes a while to take shape in one’s head. I had to resort to drawing and scribbling on several sheets of paper until I had a clear-enough picture. I shall attempt to make the job a bit easier for the reader.

Top-level view

Extreme generality

In its quest to be very general, Scorex doesn’t even commit us to the concept of the blockchain itself. There is in fact a more generic trait, History, from which we can decide to derive some different structures (perhaps a block tree rather than a chain, or something else altogether). It does provide us with a convenience BlockChain sub-trait of History, but doesn’t force us to use it. Similarly, it provides the type ‘Block’ but we are not strictly required to use it. One could build a system without the concept of a block, or with a completely redefined kind of block. Also, there is a nice abstraction called a Box (more on this later) that represents a cryptographically-protected piece of state, and if one chooses to go along with this abstraction, conveniently there instances of components that employ this abstraction too (but one could just as well develop a different conceptual blueprint from scratch). It seems to me that Scorex tries hard not to paint itself into a corner and to stay supple as the blockchain world keeps experimenting and coming up with new ideas. The downside to this approach is that, even with the skeleton provided, it can still take a good amount of work to specify and wire up a working system. Perhaps in the future the library will contain more ready-made generic components that can be assembled with less drudge work. So far though, there is still much elbow grease required of the user.

The node state

One of the things we’ll need is a nice, modular model of the state of a node in the network. Functional programmers know the importance of making the presence of mutable state explicit and separating it from a system’s logic. In Scorex, this state is contained in a structure called NodeViewHolder and is made up of the following basic components:

A persisted history of previous activity, to be kept indefinitely: an example of this kind of record is the actual ‘block chain’ that gives the name to many such systems.
A minimal set of pieces of information that allow us to validate (i.e. accept or reject) incoming requests. Scorex calls this a minimal state. (In many cases much this information can be deduced by scanning the history). In the case of a cryptocurrency this minimal required state can include, for example, a set of all accounts and their balances, as well as their associated public keys.
A memory pool of information to be temporarily buffered before being written to the history. In the case of Bitcoin, for example, miners keep outstanding transactions in memory until they manage to ‘mine’ a block (which usually takes some time), and then they are committed to the history proper.
A vault, intended as a structure in which to store secret information known to the node. For instance, the secret keys to its own coin balance (in which case it’s called a Wallet).

Propositions

We’ll start by looking at Propositions as they are one of the first few types we’ll need to understand in order to define more complex components until we arrive at the complete structure we need, which is a complete node state and a set of rules to deal with relevant events.

At the root of ‘security’ lies the concept of certain actions that ought to be performed only if the party requesting their execution can prove something. The point to be proven is called by a Proposition. Examples of real-world propositions in this sense include:

“I know the password to this account”
“I am not a bot” (e.g. Captchas)
“I know the secret key corresponding to a given public key”.

In the cryptographic domain it is the latter type that tends to appear again and again. Something that ‘proves’ a proposition P is unsurprisingly a Proof[P <: Proposition]. A proof to a public-key proposition is, for example, a cryptographic signature added to a transaction, which can only be generated by knowing the private key but can be verified with the public key alone.

Boxes

Boxes are another nice general abstraction used in Scorex. A Box is any piece of information (i.e. an account balance, or any other variable or state element) protected in some way. It will have a Proposition which needs to be ‘proven’ to make changes to the box. If I want to take funds from a box, for example, I need to supply a Proof that satisfies the Proposition. Again, the usual concrete example is that of having to ‘sign’ my request to change the contents with the box. If my proof fails, the node refuses to make the changes requested (and if it were to allow them, other nodes in the network would detect and reject these changes.) The concept of a box has very general use once you absorb the idea, and it can be used to represent almost everything in the context of a crypto/blockchain system. All of the Bitcoins currently in circulation can be thought as Boxes: we need to (cryptographically) prove they’re ours in order to spend them.

Transactions

In Scorex there is a very general trait called a NodeViewModifier which captures any event that changes the state of a node. A Transaction will then be an special case of NodeViewModifier, because after a node receives a transaction, its state changes in some way: it will be added to a block, or a memory structure somewhere. The term ‘transaction’ may have monetary overtones but we should think of it as simply an atomic state modifier. If we employ the Box metaphor above, most transactions will then be actions that create, destroy or modify boxes (and of course they must include a valid proof).

Blocks

A Block, from the point of view of a data structure, is normally just a collection of transactions and auxiliary data and is what gets written on the persistent record known as the blockchain. Interestingly, as already mentioned, while Scorex contains a Block trait, it’s apparently not an essential component and we could design a blockchain without blocks!

A quick summary of concepts

First of all, When we refer to ‘node’ we mean one of the peers in a blockchain network. A NodeViewHolder is, in essence, the instantaneous state of a node. It includes pending transactions, the persisted history (the ‘blockchain’), and other pieces of information necessary to function. A NodeViewModifier is anything that can change the state of a node. A Transaction is a NodeViewModifier: it doesn’t (normally) get written to the history directly, but rather is included in a block. A transaction will usually be required to contain a Proof to a given Proposition if it attempts to act on a Box protected by that proposition. A Block is also a NodeViewModifier, but since it actually gets written into the stored History, it’s actually derived from the more specific trait PersistentNodeViewModifier. A BlockChain is a kind of History made up of Blocks.

All of the above are tied together by types, for example the actual type signature of BlockChain is

BlockChain[P <: Proposition,
           TX <: Transaction[P],
           B <: Block[P, TX], ...]

The network and API layers

These layers use Akka actors to communicate and will not be described in detail here. It suffices to say that NodeViewHolder—the part of the system which stores the state of the node—is an actor itself, and must react to such events as incoming transactions and sync data from other nodes. If the user implements a HTTP API, for example, many of the requests would likely find its way here. There is already some plumbing to allow nodes to be aware of each others and form a network—a lot of that submerged complexity which would be almost invisible to the specific crypto app developer but something to be thankful for. In my experiments I have not focused on creating networks of nodes so I cannot comment on the functionality—I can only observe that it is present and seemingly ready to use.

Conclusions

In this post we have covered the basic building blocks of blockchain and shown the general patterns that recur across implementations. There is also a huge amount of content that we have left uncovered, including consensus schemes (which are the subject of a lot of current research). Scorex aims to provide a general framework for investigating these patterns. While the library is still quite experimental, I believe that in time and with more documentation it can achieve this goal.

In order to work with Scorex, users currently need a level of knowledge that most beginners will underestimate. However, learning about Scorex provides value in itself. I would never have dreamt of delving into the Bitcoin source in order to understand this technology better, but after some days looking at Scorex I have emerged with a much better grounding in blockchain theory.

It’s too early to say whether Scorex will emerge as the go-to tool for blockchain prototyping and development. However, if the idea of the ‘blockchain development framework’ does become ‘a thing’, Scorex will have a headstart.

That said, it seems that, whatever the application, a currency component continues to be necessary in any blockchain application in order to incentivise the nodes in the network to keep running the system honestly. ↩

We encourage discussion of our blog posts on our Gitter channel. Please review our Community Guidelines before posting there. We encourage discussion in good faith, but do not allow combative, exclusionary, or harassing behaviour. If you have any questions, contact us!

Blog