Introducing Clique

Identity Oracles for Web2 User Behaviors

Feb 13, 2023

In the off-chain world, a much more robust set of user identity data is available compared to its web3 counterpart. For example, information about a user’s authenticated identity and social media behaviors allows for more sophisticated forms of user engagement, user quality control, and advertising.

In contrast, blockchains inherently lack a credible user reputation and identity system, both because of their built-in pseudonymity and because native user behaviors are often purely financial and speculative. This leaves these systems prone to Sybil attacks and other forms of identity manipulation.

As a result, the on-chain application design space is highly limited. A reliable on-chain reputation system would allow protocols to conduct higher-quality user selection, distribute directional incentives for a wider range of value-generating activities, and establish more sophisticated solutions in DeFi. For example, a GameFi project could release NFTs limited only to first-person shooter (FPS) players with over a thousand hours clocked in CS:GO. Or projects could incentivize long-tail influencers to create organic marketing content through the use of recurrent airdrops. On-chain peer-to-peer lending can also be conducted in a permissionless manner if users are issued credits based on off-chain FICO scores and KYC information.

Solutions

To solve these problems, Clique proposes a new type of primitive: identity oracles. In general, an oracle refers to a piece of software that channels off-chain data on-chain. A good example is Chainlink’s price oracles. In contrast, an identity oracle specializes in bringing user-specific data such as their identity information (social media influence, gaming skills, credit scores, etc.) and behavioral data (social media engagement, e-commerce consumption, etc.) on-chain.

The design problem for an identity oracle can be broken down into four parts: identity authentication, data retrieval, off-chain computation, and feeding the data into some on-chain vehicle. Authentication is usually done with some kind of OAuth token or private credential (for example, a signing key in a PKI), through which a user proves that they actually own the identity data on another platform. Preserving user anonymity becomes a key challenge in this context: ideally, no one, not even the middleware provider, should be able to link the user’s off-chain and on-chain identities together.
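The four stages can be pictured as a small pipeline. The sketch below is illustrative only: `OracleStages`, `run_oracle`, and the toy lambdas are hypothetical names chosen for this example, not part of any Clique API, and the sketch makes no attempt at the unlinkability requirement described above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class OracleStages:
    """Hypothetical container for the four stages of an identity oracle."""
    authenticate: Callable[[str], bool]                  # does the user control the off-chain account?
    retrieve: Callable[[str], Dict[str, Any]]            # fetch raw data from the source platform
    compute: Callable[[Dict[str, Any]], Dict[str, Any]]  # derive the value that will be published
    publish: Callable[[str, Dict[str, Any]], str]        # write the result to some on-chain vehicle


def run_oracle(stages: OracleStages, credential: str, wallet: str) -> str:
    """Run the stages in order and stop early if authentication fails."""
    if not stages.authenticate(credential):
        raise PermissionError("authentication failed")
    raw = stages.retrieve(credential)
    result = stages.compute(raw)
    return stages.publish(wallet, result)


# Toy stages standing in for OAuth, an HTTPS fetch, a metric, and a contract call.
demo = OracleStages(
    authenticate=lambda cred: cred == "valid-token",
    retrieve=lambda cred: {"followers": 1200, "handle": "alice"},
    compute=lambda raw: {"followers_over_1000": raw["followers"] > 1000},
    publish=lambda wallet, result: f"{wallet}:{sorted(result.items())}",
)
receipt = run_oracle(demo, "valid-token", "0xabc")
```

Note that only the derived boolean leaves the `compute` stage; the raw handle and follower count are never passed to `publish`.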

Data retrieval and computation, on the other hand, require that the entire process has both provenance and integrity. Provenance means that the relevant data can be correctly attributed to its original source, which is usually established by verifying TLS certificates and the corresponding chain of trust. Integrity means that the computation is executed correctly without any form of adversarial tampering. Of course, both can be achieved with centralized servers without any privacy preservation, but this would limit the use cases to public data (e.g. Twitter interactions), leaving private user information like KYC status, credit scores, and even e-commerce transaction histories out of the picture. Decentralizing the oracle nodes, in turn, is important for fault tolerance and custom access control. This is also in line with designing a decentralized identity (DID) system, or a network of nodes that issues verifiable credentials hosting identity information. User identity information can’t be exposed to an arbitrary node runner, as doing so would likely violate data protection laws like GDPR and CCPA. Privacy preservation is therefore essential here as well.

After the completion of the above three steps, the data can be fed into any decentralized data vehicle. It can be minted to an SBT (Soulbound token) or a non-transferable token on-chain, issued as a verifiable credential within a DID system, used to create upgradable NFTs, and also used to trigger arbitrary smart contract calls. Note that each of these vehicles requires a different application interface for signing the data, storing the data, and verifying the data.

Modular Privacy Layer

To solve the above problems, Clique uses cryptographic tools like zero-knowledge proofs (ZKPs), trusted execution environments (TEEs), and multi-party computation (MPC) to design a modular privacy-preserving layer and supply our identity oracles with custom trust assumptions.

Zero-knowledge proofs enable users to prove certain attributes about their identity without actually revealing them. Clique uses two types of ZKPs: membership proofs and query proofs. Membership proofs allow users to prove anonymously that they belong to a group or set. This is done by having the user prove that a valid Merkle path exists from the corresponding leaf (usually an identity commitment that hides the actual value) to the Merkle root that represents the set. The underlying idea is similar to on-chain mixers like Tornado Cash, which provide transactional anonymity.
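To make the Merkle path check concrete, here is a minimal sketch of the relation that a membership proof attests to. It is plain Python with no zero-knowledge layer: in a real system this check would run inside a circuit, with the leaf and path kept private and only the root public. SHA-256, the odd-level duplication rule, and all function names are assumptions made for the example.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return every level of the tree, from the leaves up to the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:                 # duplicate the last node on odd-sized levels
            cur = cur + [cur[-1]]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def merkle_path(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect (sibling, sibling_is_left) pairs from the leaf up to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify_path(leaf: bytes, path: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from the leaf and its path, then compare."""
    node = leaf
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Leaves are identity commitments (hashes of user secrets), not raw identities.
commitments = [h(s) for s in (b"alice-secret", b"bob-secret", b"carol-secret")]
levels = build_levels(commitments)
root = levels[-1][0]
proof = merkle_path(levels, 2)
is_member = verify_path(commitments[2], proof, root)
```

A verifier who only knows `root` learns that some leaf in the set was used, which is what gives the anonymity-set property once the check is wrapped in a ZKP.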

On the other hand, ZK query proofs can be used for maintaining confidentiality when generating general identity queries. For example, if a project wants to require that a social media account is of a certain age or that a user has a FICO score higher than some threshold, then it can use these proofs as a proxy for the actual data. It serves as a de-sensitization tool that can be verified in a highly interoperable manner.
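The statement behind such a query proof can be written as a relation between public and private inputs. The sketch below only evaluates that relation in the clear. A ZK proving system would let a verifier check it without ever seeing `score` or `salt`. The commitment format and the function names are assumptions for illustration.

```python
import hashlib


def commit(score: int, salt: bytes) -> bytes:
    """Salted hash commitment to a private score (illustrative format)."""
    return hashlib.sha256(salt + score.to_bytes(4, "big")).digest()


def relation(commitment: bytes, threshold: int, score: int, salt: bytes) -> bool:
    """Public inputs: commitment, threshold. Private witness: score, salt."""
    return commit(score, salt) == commitment and score >= threshold


salt = bytes(16)                      # fixed salt for the demo; use random bytes in practice
public_commitment = commit(742, salt)
passes = relation(public_commitment, 700, 742, salt)
```

The project consuming the proof sees only the threshold and the commitment, so the same commitment can be reused for different thresholds without revealing the underlying score.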

But using only ZKPs is not enough for identity oracles to function properly. Whenever a ZKP is generated, someone needs to have a valid witness that satisfies the circuit constraints. Because, in this case, the witness is the user’s off-chain data, which likely originates from some third-party web server, we need to make sure that the data has provenance and has not been tampered with throughout the proof generation process. If the user directly generates the ZKP in their frontend, they can easily change the witness to be something that’s different from its original source, as long as enough incentives are presented for making such an attack. A simple solution here is to use a centralized party to execute all the relevant computation, but as described above this limits both user privacy and network decentralization.

Right now, Clique mainly uses TEEs, or secure enclaves, to validate, encrypt, and process the data for ZK query proofs so that both data integrity and confidentiality are guaranteed. The execution runs in hardware-encrypted memory, preventing adversaries from altering the execution logic or accessing the memory. TEEs also provide attestations that allow end users to verify that the execution result was actually produced by an authentic enclave operating as expected. Currently, we use Intel SGX to create on-chain verifiable ECDSA signatures that can be directly verified against Intel’s Root CA. We are constantly tracking potential attack vectors against SGX based on newly released vulnerabilities like ÆPIC Leak and the MMIO stale data vulnerabilities. Some of the mitigation measures we have taken to reduce the attack surface include keeping the SGX trusted computing base (TCB) up-to-date, implementing various ORAM techniques, and limiting client diversity by allowing only proven hardware that enforces additional DCAP rules to join the network.

A similar solution can be constructed with a TLS-level MPC, where an external verifier engages in a two-party computation protocol with the user (after the MAC key is split into two shares) under a sequentially enforced commitment scheme to make sure that the data packet has not been tampered with. This enables end-user proof generation within the browser under a WASM-based computation environment. Clique is collaborating with Chainlink on DECO and also exploring potential integrations with TLS Notary (building extensions with emp-zk/Mystique) with respect to MPC-based solutions. Although information-theoretically secure, MPC-based mechanisms are known for incurring heavy communication overhead, which currently keeps their performance short of production-level readiness.
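The key-splitting step can be illustrated with simple XOR secret sharing. This is a sketch under assumptions, not the DECO or TLS Notary protocol: a real deployment derives the shares inside a joint handshake and computes the MAC in two-party computation, whereas the code below recombines the key locally only to show that the two shares together determine it.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key into two XOR shares; either share alone is uniformly random."""
    share_a = secrets.token_bytes(len(key))
    share_b = bytes(x ^ y for x, y in zip(key, share_a))
    return share_a, share_b


def combine(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine the two shares into the original key."""
    return bytes(x ^ y for x, y in zip(share_a, share_b))


mac_key = secrets.token_bytes(32)
user_share, verifier_share = split_key(mac_key)

# A record MAC can only be checked by someone holding both shares.
record = b'{"account_age_days": 812}'
tag = hmac.new(mac_key, record, hashlib.sha256).digest()
recomputed = hmac.new(combine(user_share, verifier_share), record, hashlib.sha256).digest()
tag_ok = hmac.compare_digest(tag, recomputed)
```

Because the user holds only one share while the session is live, they cannot forge a valid tag for altered data on their own.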

Below is a simple architectural diagram of the components mentioned above.

Current Products

Clique has built several products on top of our identity oracles in the last few months.

Attestor / Issuer Contracts

Attestor is an endpoint for users to make attestations about their off-chain (or cross-chain) behavior and identity data and put them on major L1/L2s. The data can be hosted in various vehicles on-chain and serve as a basis for constructing native reputation systems to unlock more applications. Examples include Sybil resistance checks, tiered NFT systems, and more robust governance structures (such as quadratic voting with different parameters).
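As one illustration of how attestations could parameterize governance, the sketch below gives each wallet a voice-credit budget based on an attestation tier and applies the standard quadratic voting rule, where casting n votes costs n squared credits. The tier names and credit amounts are hypothetical values chosen for the example.

```python
import math

# Hypothetical mapping from attestation tier to voice credits.
TIER_CREDITS = {"unverified": 0, "basic": 16, "trusted": 100}


def vote_cost(votes: int) -> int:
    """Quadratic voting: n votes on one proposal cost n**2 credits."""
    return votes * votes


def max_votes(credits: int) -> int:
    """Largest n such that n**2 fits within the credit budget."""
    return math.isqrt(credits)


def votes_for_tier(tier: str) -> int:
    return max_votes(TIER_CREDITS.get(tier, 0))


basic_votes = votes_for_tier("basic")
trusted_votes = votes_for_tier("trusted")
```

With these numbers a trusted wallet holds more than six times the credits of a basic one but gets only 2.5 times the votes, while an unattested wallet gets none, which is where the Sybil resistance comes from.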

We recently launched our first attestor with Optimism AttestationStation and have had more than 35,000 unique users in a month. Similar partnerships with Polygon ID and various other L1/L2s are in development. We are actively encouraging developers to build on top of the attestations created.

Provenance / SDK

Provenance (or campaigns) is a product that provides deep data insights to help projects better understand and manage their communities. For example, the data feed from Provenance allows organizations to measure metrics like the number of real followers, engagement over time, and impressions garnered from user-generated content.

Beyond Provenance, the Clique team also provides an SDK that grants projects granular control of how they would like to use Clique’s data feeds. The SDK lets projects obtain rich user behavior data about their users natively on their websites.

We are actively developing features to decentralize Provenance further. As an extension to regular campaign platforms, we are designing mechanisms to move this product fully on-chain. Ciphertexts of user data will be stored on-chain (or in any decentralized storage protocol) and an MPC scheme (as well as potentially threshold encryption) will be enabled for custom access management for parties who want to access this data.
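The post does not specify which threshold scheme will be used. As a stand-in, the sketch below uses Shamir secret sharing to show the general idea of threshold access: a decryption key for the stored ciphertexts is split across n parties so that any t of them can recover it. The field modulus and function names are assumptions for the example.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # Mersenne prime used as the field modulus


def split_secret(secret: int, threshold: int, n: int) -> List[Tuple[int, int]]:
    """Evaluate a random degree-(threshold - 1) polynomial with f(0) = secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):       # Horner evaluation modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]


def recover_secret(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


data_key = 123456789                     # stand-in for a symmetric decryption key
shares = split_secret(data_key, threshold=3, n=5)
recovered = recover_secret(shares[:3])
```

Any three of the five shares recover the key, while fewer than three reveal nothing about it, so access policy reduces to deciding which parties hold shares.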

Provenance has so far been used by 60+ projects for Sybil resistance, screening for high-value users, incentivizing UGC, and maintaining sustained user engagement.

Coming Soon: General-Purpose Action Trigger Contracts

Clique is also building a generalized infrastructure to allow arbitrary smart contract logic to be executed based on off-chain user behaviors. The Clique oracles authenticate user identities, perform computations upon particular metrics (e.g. the user has made social media posts that have generated a certain number of impressions), and then feed the data on-chain to trigger some contract call (e.g. marketing reward distribution). Any dApp builder can integrate with these contracts and supply custom user data pipelines for its business logic.
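A rough way to picture the trigger logic is a list of rules, each pairing a metric threshold with an action. Everything in the sketch below is hypothetical: `TriggerRule`, `process_report`, and the reward callback stand in for whatever interface the eventual contracts expose, and the action is a plain Python function rather than a contract call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TriggerRule:
    metric: str                      # name of the off-chain metric to check
    threshold: int                   # minimum value that fires the rule
    action: Callable[[str], None]    # stand-in for a smart contract call


def process_report(wallet: str, metrics: Dict[str, int], rules: List[TriggerRule]) -> int:
    """Fire every rule whose metric meets its threshold; return how many fired."""
    fired = 0
    for rule in rules:
        if metrics.get(rule.metric, 0) >= rule.threshold:
            rule.action(wallet)
            fired += 1
    return fired


rewarded: List[str] = []
rules = [TriggerRule("post_impressions", 10_000, rewarded.append)]

# Metrics as they might arrive from the oracle after authentication and computation.
fired_count = process_report("0xabc", {"post_impressions": 12_500}, rules)
skipped_count = process_report("0xdef", {"post_impressions": 900}, rules)
```

Keeping the rule data separate from the action is what would let a dApp plug in its own metrics and its own contract calls.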

Conclusion

Clique is actively growing our community and looking for more participation from developers to build upon our products. Don't hesitate to get in touch with us on Twitter to learn more and check our website!

If you are interested in topics like privacy, identity, and reputation systems, we’d also love to hear your thoughts and have a conversation about them. Feel free to reach out to Kevin or Jaden anytime!
