Proof of Data Segment Inclusion (PoDSI)
Data uploaded to Storacha is aggregated together with data from other users of the service. When an aggregate is big enough, it is stored with multiple Filecoin Storage Providers.
Storage Providers have minimum size requirements for data storage. Therefore it is necessary to aggregate data to satisfy the requirements. The minimum is typically between 16 and 32GB.
The Storacha service uses Proof of Data Segment Inclusion (PoDSI), which allows clients to verify the correct aggregation of their data and prove this fact to third parties.
Shard CIDs
Data uploaded to Storacha is packed up and sent as CAR files. We call these shards, and each one is referenced by its shard CID.
# example CAR shard CID
bagbaieralcueppbj7cpxhlsfuokxatqzqdutb47mgs44myg7dsmktsb34zxa
The IPLD codec for a CAR shard CID is 0x0202. You can inspect this CID on cid.ipfs.tech. It is not your content root CID, which is a different CID that refers to the root node of a DAG that has been built from your data. The shard CID is a hash of the (CAR) file your DAG has been packed into.
Your content may be split between 1 or more shards.
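To make the codec claim concrete, the following is a minimal sketch that splits the example shard CID into its self-describing parts using only the Python standard library. The helper names `read_varint` and `parse_cid_v1` are illustrative, not part of any Storacha client, and the sketch assumes a base32 (lower case) CIDv1 string.

```python
import base64


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def parse_cid_v1(cid: str) -> dict:
    """Split a base32 CIDv1 string into version, codec and multihash parts."""
    if not cid.startswith("b"):
        raise ValueError("expected multibase prefix 'b' (base32, lower case)")
    body = cid[1:].upper()
    # The standard library decoder needs '=' padding to a multiple of 8 chars.
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    digest = raw[pos:]
    if len(digest) != digest_len:
        raise ValueError("digest length does not match the multihash header")
    return {"version": version, "codec": codec,
            "hash_code": hash_code, "digest": digest}


shard = parse_cid_v1(
    "bagbaieralcueppbj7cpxhlsfuokxatqzqdutb47mgs44myg7dsmktsb34zxa"
)
# codec 0x0202 is CAR, hash code 0x12 is sha2-256
print(hex(shard["codec"]), hex(shard["hash_code"]), len(shard["digest"]))
# prints: 0x202 0x12 32
```

The 32-byte digest is the SHA-256 hash of the CAR file itself, which is why the shard CID differs from the content root CID.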
Piece CIDs
Piece CIDs are the primary means of referencing data stored in a sector of a Filecoin Storage Provider. Each piece CID is loosely equivalent to a corresponding shard CID.
The piece CIDs used in Storacha are v2 piece CIDs since these also encode tree height information. On chain and in various chain explorers online you may see v1 piece CIDs displayed. You can convert from v2 to v1 but not v1 to v2, unless you also know the tree height.
FRC-0069 documentation for Piece CID v2.
# example piece v2 CID
bafkzcibe52kq2dtip2bmrrw5qhphsa35onsxxvkuxl33dnotq2allfpz7tdxlhc5di
The multihash code for the above CID is 0x1011 (fr32-sha2-256-trunc254-padded-binary-tree). You can inspect this CID on cid.ipfs.tech.
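The sketch below illustrates why a v2 piece CID can be converted to v1 but not the other way round: the v2 digest carries the padding and tree height alongside the 32-byte root, while v1 keeps only the root. It assumes the digest layout described in FRC-0069 (varint padding, one height byte, 32-byte root) and the multicodec codes 0xf101 (fil-commitment-unsealed) and 0x1012 (sha2-256-trunc254-padded) for v1. All function names are illustrative.

```python
import base64

PIECE_V2 = "bafkzcibe52kq2dtip2bmrrw5qhphsa35onsxxvkuxl33dnotq2allfpz7tdxlhc5di"


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def write_varint(value: int) -> bytes:
    """Encode an unsigned integer as an LEB128 varint."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def parse_piece_v2(cid: str) -> tuple[int, int, bytes]:
    """Return (padding, height, root) from a base32 v2 piece CID."""
    body = cid[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    if (version, codec, hash_code) != (1, 0x55, 0x1011):
        raise ValueError("not a v2 piece CID")
    digest = raw[pos:pos + digest_len]
    padding, p = read_varint(digest, 0)   # payload padding in bytes
    height = digest[p]                    # height of the piece merkle tree
    root = digest[p + 1:]                 # 32-byte merkle root
    return padding, height, root


def piece_v2_to_v1(cid: str) -> str:
    """Drop padding and height, keeping only the root, to form a v1 piece CID."""
    _, _, root = parse_piece_v2(cid)
    raw = (b"\x01" + write_varint(0xF101) + write_varint(0x1012)
           + write_varint(len(root)) + root)
    return "b" + base64.b32encode(raw).decode().lower().rstrip("=")


padding, height, root = parse_piece_v2(PIECE_V2)
padded_size = 32 << height  # tree leaves are 32-byte nodes
v1 = piece_v2_to_v1(PIECE_V2)
print(height, padded_size, v1)
```

Going from v1 back to v2 would require supplying the height (and padding) from another source, since the v1 CID does not carry them.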
Proof of Data Segment Inclusion (PoDSI)
PoDSI enables clients of data aggregation services like Storacha to verify the correct aggregation of their data and to prove this fact to third parties.
Put simply, it is a proof that a smaller piece (a segment) has been included in a larger piece (an aggregate).
FRC-0058 documentation for PoDSI.
# example merkle proof showing path from aggregate (piece) CID to segment (piece) CID
bafkzcibcaapbrpjpxk32treyrtw5kamyh5ayxoj7rp4obkeoloydktubycnkufy
└─┬ bafkzcibcaapie6hlazrzph5ui3dx2xxhkie3qbju35shf2bhsdhocrgbp5i2opq
  └─┬ bafkzcibcaaorabhed2eafo5l7xhwiqycjqp24dfidpxtanhdi53uz33anbjpsma
    └─┬ bafkzcibcaaojol2me7hy6z2egwmr24otf3pklb2ybxu6qo7fqmwj7avmqise2mi
      └─┬ bafkzcibcaans72xvshaijv5ew3nnr7ytn32fesfhwkmh3bm42uqtmmckim3jcdy
        └─┬ bafkzcibcaanc4f4qf2fgmmmt6svgqkk3bkgek3xcubrp6zliq4ldtuhcgt3lcmi
          └─┬ bafkzcibcaam5ab7crtl3upsf24prtm7egprc4uamwmdy65or5ttx3tvtiipfypi
            └─┬ bafkzcibcaamdx5p77d4mvlhalb4le4l2mdgkivjqnr4y6yuccdyvda72wctwmoy
              └─┬ bafkzcibcaal6br6ujd6rlazb4grghepfitehzhxowh7zsbte33gvvcvzetnuuaa
                └─┬ bafkzcibcaalnrywyaaduk63o6vbewveipgvgn4hd6fbx7z3hfyrcxiakj2bzefa
                  └─┬ bafkzcibcaakuwcl737abdrpbanbtyhzrwwlqzjvshenvyfda5d66zbnkyuf5cey
                    └─┬ bafkzcibcaakdjuz3vxhpsh4z5enxfsphdquyufqki4jwfq6xwmpmzyzzvkzicfi
                      └─┬ bafkzcibcaajvatl2odwkinga5qdngzqhxkuv4np2vjmm2yzia6el3zktd2lc6ki
                        └─┬ bafkzcibcaajk3bjkzabhi53fmdmc2rgmbqhanpkmxoqrg6nvd4lmomu4klvc2lq
                          └─┬ bafkzcibcaaiua66kdlvtfpgbrhoyha32wemunseubsszahixj2io2zjvthxxypq
                            └─┬ bafkzcibcaaift45zx5rhxe4tucqjwpekucnij54onv4n52njju4zg7e2insaiea
                              └── bafkzcibe52kq2dtip2bmrrw5qhphsa35onsxxvkuxl33dnotq2allfpz7tdxlhc5di
The above example does not visualize all the information that a PoDSI contains, just the direct path from aggregate (piece) CID to segment (piece) CID.
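As an illustration of what a verifier does with such a path, here is a minimal sketch of a merkle inclusion check. It assumes the node hash is SHA-256 over the concatenated children with the two most significant bits of the last byte cleared (the "trunc254" part of the multihash name), and that the proof supplies the sibling node at each level, ordered from the leaf upwards. The function names and the toy four-leaf tree are hypothetical. A real PoDSI carries additional information and should be checked with an implementation of FRC-0058.

```python
import hashlib


def node_hash(left: bytes, right: bytes) -> bytes:
    """SHA-256 of left||right with the top two bits of the last byte cleared."""
    digest = bytearray(hashlib.sha256(left + right).digest())
    digest[31] &= 0x3F  # keep 254 bits so the node fits in a field element
    return bytes(digest)


def root_from_path(leaf: bytes, index: int, siblings: list[bytes]) -> bytes:
    """Recompute the root from a leaf, its position, and bottom-up siblings."""
    node = leaf
    for sibling in siblings:
        if index & 1:
            node = node_hash(sibling, node)  # current node is a right child
        else:
            node = node_hash(node, sibling)  # current node is a left child
        index >>= 1
    return node


# Toy tree with four 32-byte leaves, standing in for segment roots.
leaves = [bytes([i]) * 32 for i in range(4)]
n01 = node_hash(leaves[0], leaves[1])
n23 = node_hash(leaves[2], leaves[3])
root = node_hash(n01, n23)

# Inclusion proof for leaf 2: its sibling leaf, then the sibling subtree.
proof = [leaves[3], n01]
print(root_from_path(leaves[2], 2, proof) == root)  # prints: True
```

The verifier only needs the leaf, its index and one sibling per level, so the proof size grows with the tree height rather than with the number of pieces in the aggregate.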
Data Aggregation Proof
A data aggregation proof is a PoDSI, plus information that ties an aggregate piece to a Filecoin Storage Provider. At the time of writing, this information is one or more Deal IDs.
Verifiable Aggregation Pipeline
The Storacha aggregation pipeline is fully verifiable thanks to UCANs. Your piece can be tracked through the pipeline via signed UCAN receipts.
There are 4 roles in the aggregation pipeline:
- Storefront - facilitates data storage services to applications and users, getting the requested data stored into Filecoin deals asynchronously.
- Aggregator - aggregates smaller data (Filecoin Pieces) into a larger piece that can effectively be stored with a Filecoin Storage Provider.
- Dealer - arranges deals with Filecoin Storage Providers for the aggregates.
- Deal Tracker - follows the Filecoin chain to keep track of successful deals.
Roughly speaking, a piece progresses through the pipeline via the following stages:
filecoin/offer
The client submits a piece to the storefront (Storacha) for aggregation and storage with Filecoin Storage Providers. The receipt for this invocation contains two links for async tasks:
- filecoin/submit - allows the client to continue following the receipt chain through the aggregation pipeline. It is executed after the storefront has verified that the piece CID corresponds to the shard CID.
- filecoin/accept - a "short cut" to the end of the pipeline, where the data aggregation proof will eventually become available. It is executed when the dealer has successfully stored an aggregate containing the submitted piece in one or more Filecoin Storage Providers.
filecoin/submit
The storefront issues a receipt for filecoin/submit to indicate it has verified the offered piece and submitted it to the pipeline. The receipt for this invocation contains a link for an async task piece/offer, which is executed when the storefront offers the piece to an aggregator.
piece/offer
The aggregator issues a receipt for piece/offer when the storefront offers a piece to be aggregated. The receipt contains a link for an async task piece/accept, which is executed when the piece has been included in an aggregate.
piece/accept
The aggregator issues piece/accept receipts when an aggregate is big enough. Every piece in the aggregate is issued a receipt which includes a Proof of Data Segment Inclusion (PoDSI). The receipt contains a link for an async task aggregate/offer, which is executed when the aggregator offers the aggregate to a dealer.
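The "big enough" condition can be pictured with a toy buffer like the one below. This is only a sketch under stated assumptions: the 16 GiB threshold, the `Aggregator` class and its `offer` method are hypothetical, and the real aggregator also has to lay pieces out in a merkle tree and produce a PoDSI for each one, which is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed threshold for illustration only; the real minimum is set by the service.
MIN_AGGREGATE_SIZE = 16 * 2**30


@dataclass
class Aggregator:
    """Buffers offered pieces until their total size reaches a minimum."""
    min_size: int = MIN_AGGREGATE_SIZE
    buffered: list[tuple[str, int]] = field(default_factory=list)

    def offer(self, piece_cid: str, padded_size: int) -> Optional[list[tuple[str, int]]]:
        """Queue a piece; return the full batch once it is big enough."""
        self.buffered.append((piece_cid, padded_size))
        total = sum(size for _, size in self.buffered)
        if total < self.min_size:
            return None  # keep waiting for more pieces
        batch, self.buffered = self.buffered, []
        return batch  # every piece in the batch would now get a piece/accept receipt


aggregator = Aggregator(min_size=1024)
first = aggregator.offer("piece-a", 512)   # below the threshold, stays buffered
second = aggregator.offer("piece-b", 512)  # threshold reached, batch is released
print(first, second)
```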
aggregate/offer
The dealer issues an aggregate/offer receipt when the aggregator offers an aggregate to be stored by Filecoin Storage Providers. The receipt contains a link for an async task aggregate/accept, which is executed when the aggregate has been stored by at least one Filecoin Storage Provider.
aggregate/accept
The dealer issues an aggregate/accept receipt when an aggregate has been stored by at least one Filecoin Storage Provider. The receipt includes information that ties the aggregate to a Storage Provider, which is used in the next step to create a Data Aggregation Proof.
filecoin/accept
The storefront periodically checks for an aggregate/accept receipt for offered aggregates. When an aggregate is accepted, the storefront issues filecoin/accept receipts for each piece in the aggregate. The receipt includes a Data Aggregation Proof. This is the end of the pipeline.
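The stages above can be summarised as a small lookup table. The sketch below is a hypothetical in-memory model, not the UCAN receipt format or a Storacha client API: it only records which task each receipt links to and which role issues it, as described in this section, and walks the chain from filecoin/offer.

```python
# Task -> tasks linked from its receipt (first link continues the chain).
NEXT_TASKS = {
    "filecoin/offer": ["filecoin/submit", "filecoin/accept"],
    "filecoin/submit": ["piece/offer"],
    "piece/offer": ["piece/accept"],
    "piece/accept": ["aggregate/offer"],
    "aggregate/offer": ["aggregate/accept"],
    "aggregate/accept": [],
    "filecoin/accept": [],
}

# Task -> role that issues the receipt.
ISSUER = {
    "filecoin/offer": "storefront",
    "filecoin/submit": "storefront",
    "piece/offer": "aggregator",
    "piece/accept": "aggregator",
    "aggregate/offer": "dealer",
    "aggregate/accept": "dealer",
    "filecoin/accept": "storefront",
}


def follow_receipt_chain(start: str) -> list[str]:
    """Follow the first linked task of each receipt until no link remains."""
    chain = [start]
    while NEXT_TASKS[chain[-1]]:
        chain.append(NEXT_TASKS[chain[-1]][0])
    return chain


chain = follow_receipt_chain("filecoin/offer")
short_cut = NEXT_TASKS["filecoin/offer"][1]
for task in chain + [short_cut]:
    print(f"{task:18} issued by {ISSUER[task]}")
```

A client that only cares about the final Data Aggregation Proof can wait on the filecoin/accept link directly, while a client that wants progress updates can follow the chain step by step.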