InterPlanetary File System

From MgmtWiki
Latest revision as of 09:04, 29 April 2021

Full Title

The InterPlanetary File System (IPFS) is a protocol and peer-to-peer network for storing and sharing data in a distributed file system.

Context

  • IPFS was first deployed in 2015 and grew by word-of-mouth as a replacement for HTTP for static content.
  • It is claimed that IPFS uses content-addressing to uniquely identify each file in a Global Namespace connecting all computing devices.[1]
  • In fact, it is not possible to use the content of a file alone to get its address: the address also depends on how the file was chunked and encoded when it was added.
  • It allows users to host content as well as to search for it.
  • If users do not wish to host content, they can access IPFS by a public gateway.
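To make the content-addressing claims concrete, here is a minimal Python sketch of how a CIDv1 can be derived from raw bytes. It builds a SHA-256 multihash, prefixes it with the CID version and the `raw` codec, and base32-encodes the result. It assumes the bytes are stored as a single raw block. Files added with default settings are chunked and wrapped in UnixFS/dag-pb nodes, so the same bytes generally yield a different CID. The codes 0x01, 0x55 and 0x12 come from the multiformats tables. All of them are below 0x80, so each varint is a single byte.

```python
import base64
import hashlib

# Multiformats codes; each is < 0x80, so its unsigned varint is one byte.
CID_VERSION_1 = 0x01
RAW_CODEC = 0x55      # multicodec "raw" (plain bytes, no UnixFS wrapping)
SHA2_256 = 0x12       # multihash function code for sha2-256
SHA2_256_LEN = 0x20   # digest length: 32 bytes


def raw_cidv1(data: bytes) -> str:
    """Return a base32 CIDv1 for `data`, treated as a single raw block."""
    digest = hashlib.sha256(data).digest()
    multihash = bytes([SHA2_256, SHA2_256_LEN]) + digest
    cid_bytes = bytes([CID_VERSION_1, RAW_CODEC]) + multihash
    # Multibase "b" = RFC 4648 base32, lowercase, without padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def digest_from_raw_cidv1(cid: str) -> bytes:
    """Recover the SHA-256 digest from a CID produced by raw_cidv1."""
    if not cid.startswith("b"):
        raise ValueError("expected a base32 (multibase 'b') CID")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding stripped above
    cid_bytes = base64.b32decode(body)
    if cid_bytes[:4] != bytes([CID_VERSION_1, RAW_CODEC, SHA2_256, SHA2_256_LEN]):
        raise ValueError("not a CIDv1 / raw / sha2-256 CID")
    return cid_bytes[4:]


if __name__ == "__main__":
    print(raw_cidv1(b"hello world\n"))  # starts with "bafkrei"
```

The reverse function shows the limits of the scheme. From a CID alone you can recover the digest but not the content, which still has to be fetched from some peer.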

Taxonomy

  • IPFS is often claimed to be a Content Addressable Scheme (CAS), but the content hash is not consistently applied, so that statement is only partially correct.
  • A distributed hash table (DHT) is a distributed system for mapping keys to values. In IPFS, the DHT is used as the fundamental component of the content routing.
  • A Merkle Directed Acyclic Graph (Merkle DAG) is a DAG in which each node has an identifier. That identifier is the result of hashing the node's contents (any opaque payload carried by the node, plus the list of identifiers of its children) with a cryptographic hash function.
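The DHT bullet can be illustrated with the XOR-distance metric of Kademlia, the design the IPFS (libp2p) DHT builds on. The Python below is a simplified, hypothetical model and not the libp2p implementation. Keys and peer IDs are hashed with SHA-256 into one 256-bit space, and a lookup returns the k peers whose hashed IDs are closest to the hashed key. The default k = 20 mirrors the usual Kademlia bucket size.

```python
import hashlib
from typing import List


def to_keyspace(value: bytes) -> int:
    """Map a key or peer ID into a shared 256-bit integer keyspace."""
    return int.from_bytes(hashlib.sha256(value).digest(), "big")


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: bitwise XOR of the two hashed identifiers."""
    return to_keyspace(a) ^ to_keyspace(b)


def closest_peers(key: bytes, peers: List[bytes], k: int = 20) -> List[bytes]:
    """Return the k peers closest to `key` by XOR distance, closest first."""
    return sorted(peers, key=lambda p: xor_distance(key, p))[:k]


if __name__ == "__main__":
    peers = [f"peer-{i}".encode() for i in range(100)]  # hypothetical peer IDs
    key = b"example-content-key"                          # hypothetical key
    for p in closest_peers(key, peers, k=3):
        print(p.decode(), xor_distance(key, p).bit_length())
```

In a real network no node knows every peer. Each hop asks the closest peers it knows for even closer ones, and the provider records for a CID end up stored on the peers nearest to it in this keyspace.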

Swarm

Swarm addresses are the addresses on which the local daemon listens for connections from other IPFS peers. Make sure these addresses can be reached from a separate computer and that no firewall blocks the ports you specify. The port is typically 4001.

API

The API address is the address from which the daemon serves the HTTP API. This API is used to control the daemon, either through the ipfs command line or directly over HTTP (for example from PowerShell). Ensure that this address is not accessible from outside your machine or VPN, so that potentially malicious parties cannot send commands to your IPFS daemon. The port is typically 5001.

Gateway

The Gateway address is the address from which the daemon serves the gateway interface. The gateway can be used to view files through IPFS and to serve static web content. Making this port accessible from outside your machine is entirely optional. If the gateway address is left blank, the gateway service will not start. The port is typically 8080.
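The three address settings above are written as multiaddrs in the config file. The hypothetical helper below is not part of any IPFS tool. It parses the common `/ip4|ip6/<host>/tcp|udp/<port>` form and flags any address that listens on all interfaces, which the API section warns against. It only handles these simple forms. Real multiaddrs can use other protocols (for example `/dns4`), and this sketch rejects them.

```python
from typing import NamedTuple


class SimpleAddr(NamedTuple):
    family: str     # "ip4" or "ip6"
    host: str
    transport: str  # "tcp" or "udp"
    port: int


def parse_multiaddr(addr: str) -> SimpleAddr:
    """Parse '/ip4/127.0.0.1/tcp/5001'-style strings; trailing parts like /quic are ignored."""
    parts = addr.strip("/").split("/")
    if len(parts) < 4 or parts[0] not in ("ip4", "ip6") or parts[2] not in ("tcp", "udp"):
        raise ValueError(f"unsupported multiaddr: {addr!r}")
    return SimpleAddr(parts[0], parts[1], parts[2], int(parts[3]))


def is_exposed(addr: str) -> bool:
    """True if the address listens on every interface (0.0.0.0 or ::)."""
    return parse_multiaddr(addr).host in ("0.0.0.0", "::")


if __name__ == "__main__":
    for name, addr in [("Swarm", "/ip4/0.0.0.0/tcp/4001"),
                       ("API", "/ip4/127.0.0.1/tcp/5001"),
                       ("Gateway", "/ip4/127.0.0.1/tcp/8080")]:
        a = parse_multiaddr(addr)
        print(f"{name}: {a.host}:{a.port} exposed={is_exposed(addr)}")
```

Exposure is expected for Swarm, optional for Gateway, and a warning sign for API.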

The difference between Merkle DAG and Merkle Tree

For IPFS, Merkle DAG and Merkle Tree are two very important concepts.

Merkle DAG is the data structure of IPFS storage objects, and Merkle Tree is used for blockchain transaction verification.

A Merkle Tree is also commonly called a Hash Tree, which is a tree for storing hash values. Merkle DAG is short for Merkle Directed Acyclic Graph. The two have similarities and some differences.

In terms of object format, the leaves of a Merkle Tree are the hash values of data blocks (for example, files or transactions), and each non-leaf node is the hash of the concatenation of its child nodes' hashes. A Merkle DAG node has two parts, Data and Links. Data is binary data, and each Link contains three parts: Name, Hash, and Size. In terms of data structure, the Merkle DAG is the more general case: a Merkle Tree is a special kind of Merkle DAG. Functionally, Merkle Trees are usually used to verify the integrity of data, while Merkle DAGs are mostly used in file systems.
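The comparison can be sketched in Python. The first function computes a binary Merkle tree root by hashing concatenated child hashes. The node class models a Merkle DAG node with Data and Links (Name, Hash, Size), identified by the hash of its serialized contents. Two simplifications are assumed here. Serialization uses JSON with single SHA-256, whereas IPFS encodes dag-pb nodes with Protocol Buffers and Bitcoin uses double SHA-256, so the hashes match neither system. The link size is also simplified to total reachable bytes.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


def merkle_root(blocks: List[bytes]) -> str:
    """Merkle tree: leaves hash data blocks, parents hash their concatenated children."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [hashlib.sha256(b).digest() for b in blocks]
    while len(level) > 1:
        if len(level) % 2:  # odd level: pair the last hash with itself
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


@dataclass
class Link:
    name: str  # e.g. a file name inside a directory
    hash: str  # identifier of the child node
    size: int  # simplified: total bytes reachable through this link


@dataclass
class DagNode:
    data: bytes
    links: List[Link] = field(default_factory=list)

    def node_id(self) -> str:
        """Identifier = hash of the payload plus the list of child links."""
        encoded = json.dumps(
            {"data": self.data.hex(),
             "links": [[lk.name, lk.hash, lk.size] for lk in self.links]},
            sort_keys=True,
        ).encode()
        return hashlib.sha256(encoded).hexdigest()

    def total_size(self) -> int:
        return len(self.data) + sum(lk.size for lk in self.links)


if __name__ == "__main__":
    print(merkle_root([b"tx1", b"tx2", b"tx3"]))
    # One leaf linked from two parents: a DAG, not a tree.
    shared = DagNode(b"logo bytes")
    logo = Link("logo.png", shared.node_id(), shared.total_size())
    site_v1 = DagNode(b"", [logo, Link("index.html", DagNode(b"v1").node_id(), 2)])
    site_v2 = DagNode(b"", [logo, Link("index.html", DagNode(b"v2").node_id(), 2)])
    print(site_v1.node_id() != site_v2.node_id())  # True: a changed child changes the parent ID
```

A Merkle tree node corresponds to a DagNode with empty Data and exactly two unnamed Links. That is why the tree can be seen as a special case of the DAG.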

Problems

  • Your information is not "stored" on IPFS. If the source node does not maintain a copy, or pay a pinning service to maintain one, do not expect to find the data when it is needed.
  • On 2020-11-15 there was no production-ready code that implemented IPFS. The Go implementation, go-ipfs (see "Installing"), was considered to be an alpha test version.

Podcasts

Podcasts over IPFS is something lots of people are thinking about (Adam Curry and his new PodcastIndex is an example of a team considering it). It’s true that most IPFS gateways aren’t going to want to just be used as “free” bandwidth providers, although I’m not sure what they’re doing in general to combat that. I do know some don’t allow video streaming for example. They just block it.

The interesting use case would be pure browser-based IPFS instances running peer-to-peer and getting podcast data from each other (originally sourced from the normal non-IPFS url), and sharing the bandwidth and gaining performance similar to how BitTorrent works.

But here’s the thing: because most people will NOT be using any IPFS players, podcasters are forced to use a normal podcast hosting service (Libsyn, Blubrry, etc.), and once they do that, their bandwidth problems all vanish. So there’s no incentive. And on the consumer end, the podcast hosts are serving up data just fine today. So neither end of the equation currently has any incentive to jump to IPFS.

Have a look at https://d.tube

Installing

For Windows, using Chocolatey:

> choco install go-ipfs
> ipfs init
> ipfs daemon

By default the client looks for a daemon at http://localhost:5001. This can be overridden either by setting the environment variable IpfsHttpUrl or by initializing the client with a URL.

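using Ipfs.Http;  // from the Ipfs.Http.Client NuGet package
// js-ipfs serves its API on port 5002 by default (go-ipfs uses 5001); declare this inside a class
static readonly IpfsClient ipfs = new IpfsClient("http://127.0.0.1:5002");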

Test to see if the daemon is running by typing this in your browser.

http://localhost:5001/ipfs/bafybeianwe4vy7sprht5sm3hshvxjeqhwcmvbzq73u55sdhqngmohkjgs4/#/
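The same daemon can be reached from any language over its HTTP API. Below is a minimal Python sketch using only the standard library. It assumes a daemon at the default address. Reading the IpfsHttpUrl variable borrows the .NET client's convention and is purely an assumption here, since other tools need not read it. Since go-ipfs 0.5 the /api/v0 endpoints accept only POST, so the request is sent as POST.

```python
import json
import os
import urllib.request

DEFAULT_API = "http://localhost:5001"


def api_base() -> str:
    """Daemon base URL: $IpfsHttpUrl if set (borrowed convention), else the default."""
    return os.environ.get("IpfsHttpUrl", DEFAULT_API).rstrip("/")


def api_url(command: str) -> str:
    """URL for an /api/v0 command such as 'version' or 'swarm/peers'."""
    return f"{api_base()}/api/v0/{command.strip('/')}"


def daemon_version(timeout: float = 5.0) -> str:
    """Ask the running daemon for its version string (raises if none is reachable)."""
    req = urllib.request.Request(api_url("version"), method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["Version"]
```

With a daemon running, `daemon_version()` should return the version it reports. Without one, `urlopen` raises a `URLError`.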

Using

IPFS can run in either online or offline mode. Online mode is when you have IPFS running separately as a daemon process. If you do not have an IPFS daemon running, you are in offline mode. Some commands, like ipfs swarm peers, are only supported when online.
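A script can roughly tell which mode it is in by checking whether anything is listening on the API port. This Python sketch is a heuristic that assumes the default API address. An open port does not prove the listener is an IPFS daemon.

```python
import socket


def daemon_listening(host: str = "127.0.0.1", port: int = 5001, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the API address succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    mode = "online" if daemon_listening() else "offline"
    print(f"IPFS appears to be in {mode} mode")
```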

Solutions

Windows

The context of this wiki page is Windows, but go-ipfs documentation typically assumes a GNU/Linux environment, so some helpful translations are:

  • ~ (tilde) is the same as $HOME, which in PowerShell is $env:USERPROFILE
  • ~/.ipfs is the default directory for all things IPFS, including the config file
  • The most significant parts of the config that you might want to change are shown here. Note in particular "API": "/ip4/127.0.0.1/tcp/5001", which may need to be changed to /ip4/0.0.0.0/tcp/5001 if you need to access IPFS from devices other than localhost.
  "Addresses": {
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ],
    "Announce": [],
    "NoAnnounce": [],
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Gateway": "/ip4/127.0.0.1/tcp/8080"
  },
  "Mounts": {
    "IPFS": "/ipfs",
    "IPNS": "/ipns",
    "FuseAllowOther": false
  }
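These values can be checked without opening the file by hand. The Python sketch below loads the config and returns its Addresses section. It assumes the repo is at $IPFS_PATH when that variable is set and at ~/.ipfs otherwise; Path.home() resolves to %USERPROFILE% on Windows. It also assumes the config is a JSON file named config inside the repo. The function names are hypothetical.

```python
import json
import os
from pathlib import Path


def repo_path() -> Path:
    """IPFS repo directory: $IPFS_PATH if set, else ~/.ipfs."""
    return Path(os.environ.get("IPFS_PATH", Path.home() / ".ipfs"))


def load_addresses() -> dict:
    """Return the "Addresses" section of the repo's JSON config file."""
    with open(repo_path() / "config", encoding="utf-8") as f:
        return json.load(f)["Addresses"]


def api_is_local(addresses: dict) -> bool:
    """True if every API multiaddr is bound to loopback (API may be a string or a list)."""
    api = addresses.get("API", [])
    apis = [api] if isinstance(api, str) else list(api)
    return bool(apis) and all(a.startswith(("/ip4/127.0.0.1/", "/ip6/::1/")) for a in apis)
```

If `api_is_local(load_addresses())` is False, the daemon's API is reachable from other machines. Do that only deliberately, given the warning in the API section.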

SideTree

Troubleshooting

Check out the installation guide in the IPFS Docs, or try these common fixes:

Is your IPFS daemon running? Try starting or restarting it from your terminal:

ANY SHELL
$ ipfs daemon
Initializing daemon...
API server listening on /ip4/127.0.0.1/tcp/5001
or, if the API listens on all interfaces so that other computers can reach it:
API server listening on /ip4/0.0.0.0/tcp/5001

Is your IPFS API configured to allow cross-origin (CORS) requests? If not, run these commands and then start your daemon from the terminal:

WINDOWS POWERSHELL ONLY
$a = '[\"http://localhost:3000\", \"https://webui.ipfs.io\", \"http://127.0.0.1:5001\"]'
$a
[\"http://localhost:3000\", \"https://webui.ipfs.io\", \"http://127.0.0.1:5001\"]
ipfs config --json API.HTTPHeaders.Access-Control-Allow-Origin $a

OTHER SHELLS (e.g. bash)
$ ipfs config --json API.HTTPHeaders.Access-Control-Allow-Origin '["http://192.168.254.24:5001", "http://localhost:3000", "http://127.0.0.1:5001", "https://webui.ipfs.io"]'
$ ipfs config --json API.HTTPHeaders.Access-Control-Allow-Methods '["PUT", "POST"]'
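The PowerShell escaping is needed only because the JSON must survive the shell's own quoting. Calling the CLI from a script with an argument list sidesteps shell quoting entirely. Below is a Python sketch, assuming `ipfs` is on PATH. It builds the same two `ipfs config --json` commands shown above; the helper names are hypothetical.

```python
import json
import subprocess
from typing import List


def cors_commands(origins: List[str], methods: List[str]) -> List[List[str]]:
    """argv lists for the two `ipfs config --json` calls (no shell involved)."""
    return [
        ["ipfs", "config", "--json",
         "API.HTTPHeaders.Access-Control-Allow-Origin", json.dumps(origins)],
        ["ipfs", "config", "--json",
         "API.HTTPHeaders.Access-Control-Allow-Methods", json.dumps(methods)],
    ]


def apply_cors(origins: List[str], methods: List[str]) -> None:
    """Run each command; check=True raises CalledProcessError on failure."""
    for argv in cors_commands(origins, methods):
        subprocess.run(argv, check=True)
```

For example, `apply_cors(["http://localhost:3000", "http://127.0.0.1:5001", "https://webui.ipfs.io"], ["PUT", "POST"])`, followed by a daemon restart.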

Questions and Answers

  • How long is this stored for? Is it temporary?

I don’t know the JS implementation that well, but by default the file is pinned on your node (so your node will provide it forever). If someone else fetches it, they will not reprovide it if they have disabled reproviding. Otherwise it will stay on their node until they run a garbage collection, either manually with ipfs repo gc or automatically with ipfs daemon --enable-gc.

  • What happens if I turn off my computer and then someone else tries to retrieve it?

Theoretically, if other people have the file they will send it, but that is rare. Most of the time you don’t publish one file but many, and nodes will only download a subset of them (e.g. someone fetching your website might reprovide index.html but not your subpages if they haven’t visited them). If you want your file to be reliably accessible on the network, assume that if you don’t provide your files, no one will. (What is good about IPFS is that if lots of people download your file, they will reshare it, so bandwidth scales with demand: more users means more peers sharing the load.)

  • If its lifetime is temporary, how do I make it permanent?

You can either do it manually using ipfs pin, or through a cluster that manages your pins using IPFS Cluster. Even if you don’t want to use the cluster feature (syncing the pins of multiple servers), I would still advise you to set up a cluster alongside your server. It is simple to do and provides asynchronous pinning and more features than raw go-ipfs (such as names, expiration dates, …).

You could also pay a company to host them for you (some are listed here: https://docs.ipfs.io/concepts/persistence/#pinning-services). I’ve personally tried the free plan of Pinata for a short while and have nothing to complain about: it works as you would expect a pinning service to work. They pin your files and provide them. Latency and bandwidth are about average for what you can expect from any VPS you could set up yourself (you might get better performance with the enterprise option, which gives you your own VPS).

  • Explain pinning: so you can pin files, directories, and their children, …?

Pinned objects are never removed from the repo, and if they are not in the repo, they are downloaded from the network. Pinning is the canonical way to ensure your files are always available on the IPFS network.
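The pinning behaviour described in these answers can be summarised with a toy model. A recursive pin protects a root block and everything reachable through its links, and garbage collection deletes every block that is not protected. The Python below is a hypothetical in-memory illustration, not how go-ipfs stores blocks.

```python
from typing import Dict, List, Sequence, Set


class ToyRepo:
    """Hypothetical in-memory repo: block ID -> list of child block IDs."""

    def __init__(self) -> None:
        self.blocks: Dict[str, List[str]] = {}
        self.pins: Set[str] = set()  # recursive pins only, for simplicity

    def put(self, block_id: str, children: Sequence[str] = ()) -> None:
        self.blocks[block_id] = list(children)

    def pin(self, block_id: str) -> None:
        self.pins.add(block_id)

    def unpin(self, block_id: str) -> None:
        self.pins.discard(block_id)

    def protected(self) -> Set[str]:
        """Every block reachable from a pinned root."""
        seen: Set[str] = set()
        stack = list(self.pins)
        while stack:
            block_id = stack.pop()
            # A real node would fetch a missing pinned block from the network;
            # this toy model simply skips it.
            if block_id in seen or block_id not in self.blocks:
                continue
            seen.add(block_id)
            stack.extend(self.blocks[block_id])
        return seen

    def gc(self) -> Set[str]:
        """Delete unprotected blocks (the idea behind `ipfs repo gc`)."""
        removed = set(self.blocks) - self.protected()
        for block_id in removed:
            del self.blocks[block_id]
        return removed
```

For example, pin a "site" block that links to its pages. A GC pass then removes only blocks that were cached from other people, and unpinning "site" lets the next GC remove the whole site.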

CAS for DID methods

Orie Steele (2021-04-29)

  • If you can handle golang, I think SecureKey’s implementation might be modular enough for you to drop in a small Ethereum adapter.
  • @Troy Ronda (SecureKey) how hard do you think this is?

The core library is agnostic to the ledger and CAS. https://github.com/trustbloc/sidetree-core-go


Troy Ronda (SecureKey) 6:58 AM On top of that library, we have built: a mock-ledger Sidetree that we use in BDD tests (https://github.com/trustbloc/sidetree-mock), a Fabric-based ledger (https://github.com/trustbloc/sidetree-fabric), and a ledger-agnostic version based on ActivityPub, CAS graphs, and anchor origins (https://github.com/trustbloc/orb). So there is plenty of variation on top of it, if you can handle Golang. (And it’s Sidetree v1.0.0.)

Sebastian Dechant (evan GmbH) 6:59 AM Thx @Troy Ronda (SecureKey), I'm not deeply familiar with Go, but I will take a look :slightly_smiling_face:

daniel 9:02 AM I'll just say this: please use one of the existing DID Methods or contribute to one, unless you absolutely must create your own. I think we have a lot of good ones across the board, so I'm just hoping people start adopting them and circling the wagons to add strength and momentum.

Tom Jones 9:13 AM do all of the Sidetree methods interop at the IPFS level? Or, more specifically, can I take the IPFS from Sidetree, implement that, and then pick the method to layer on top of it?

Troy Ronda (SecureKey) 9:18 AM There is nuance to it. (a) Sidetree doesn’t require IPFS but rather a CAS. IPFS is a prime example of a CAS. (b) Different methods support different protocol parameters (so method interop is a harder question). (c) Implementation-wise, the Sidetree methods I know about are using, at least, IPFS conventions. 9:19 Why do I say conventions? In our TrustBloc work, we don’t require usage of the IPFS network, but we do use IPFS conventions such as the CID.

Tom Jones 9:19 AM is it possible for sidetree to settle on one CAS?

Troy Ronda (SecureKey) 9:20 AM No, not at a network level. 9:20 At a convention level, I think most (all?) are. 9:21 In our work, we generally use IPFS Go libraries, but we don’t require usage of the global IPFS DHT.

Tom Jones 9:22 AM is the CAS in the TrustBloc code stable and completely compliant with Orb?

Troy Ronda (SecureKey) 9:25 AM There are currently three CAS-related implementations in TrustBloc code.

  1. The Hyperledger Fabric CAS - IPFS objects propagated by a Fabric Network instead of the IPFS network.
  2. WebCAS - for when you can’t use the IPFS global DHT.
  3. ipfs - as in the IPFS network.

Orb focuses on 2 & 3.

  • You can see an example of how the DID string is formulated based on that choice here: https://trustbloc.github.io/did-method-orb/#format
  • btw - we still need to integrate the IPFS network to the Orb codebase - we wanted to have the lowest common denominator sorted out first (webcas).
  • There are also future ideas about if it would make sense to have a more specific IPFS-based DHT - so nodes could avoid fully participating in the ipfs DHT when they only care about Sidetree/Orb files… Not much thought into that so far.
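Troy's point that the core library is agnostic to the ledger and CAS can be illustrated with a small interface. Callers only write bytes to get an address and read bytes back by address. IPFS, a Fabric-backed store, or a WebCAS endpoint could then sit behind the same two methods. This is a hypothetical Python sketch, not the sidetree-core-go API. The in-memory store uses a hex SHA-256 digest as its address instead of a real CID.

```python
import hashlib
from typing import Dict, Protocol


class CAS(Protocol):
    """Hypothetical content-addressable storage interface."""

    def write(self, content: bytes) -> str: ...

    def read(self, address: str) -> bytes: ...


class InMemoryCAS:
    """Toy CAS: the address is derived from the content itself."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def write(self, content: bytes) -> str:
        address = hashlib.sha256(content).hexdigest()
        self._store[address] = content
        return address

    def read(self, address: str) -> bytes:
        content = self._store[address]  # KeyError if unknown
        # Re-hash on read so a tampered or mislabelled entry is detected.
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("content does not match its address")
        return content


def anchor_batch(cas: CAS, operations: bytes) -> str:
    """Hypothetical caller that depends only on the CAS interface."""
    return cas.write(operations)
```

Swapping in an IPFS- or WebCAS-backed class only requires the same write/read methods. That is the kind of modularity the thread describes, whatever the real library's types look like.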

References

  1. Klint Finley, "The Inventors of the Internet Are Trying to Build a Truly Permanent Web", Wired, 2016-06-20. https://www.wired.com/2016/06/inventors-internet-trying-build-truly-permanent-web/

Other Material