Immutable Logs

From MgmtWiki

Full Title or Meme

Data that cannot be changed, but can be appended.

Context

Immutable data is information in a database that cannot be deleted or modified, only appended to. It is highly useful for auditing and debugging. A real-life example is a person's medical record: over the years, a person might seek treatment for different ailments, and the record accumulates prescriptions, procedures, and test reports. These data files are immutable. When a person receives a new medical prescription, their old prescription should not be overwritten; instead, the database should append the new data to the existing record. Historical medical data is a classic example of immutable data.[1]

Immutable logs are log files protected from tampering and erroneous insertion.[2] They are highly useful for auditing and debugging. Depending on the implementation, the files can have additional protections against poisoning and fictional recreation or forgery. Secure logging is an event-logging technology that implements immutable logs.[3] It uses cryptographic measures to preserve the integrity and authenticity of logs without compromising the performance of the system. An immutable audit log is a record of how a system has been used.

Certificate Transparency

Transparency logs are a powerful tool for storing information and presenting it to all users in such a way that they can all verify they see the same entries. Originally deployed for Certificate Transparency over a decade ago, logs are now being used to provide tamper evidence for other ecosystems such as binary transparency and AI model transparency. When a transparency log is used correctly in a tight feedback loop it allows for timely detection and response to malfeasance, forming an important part of security response to protect users.

What does it mean to verify a log? Readers familiar with verifiable logs probably have an idea of what “verifying a log” means, which is likely one of the following:

  1. Verifying that an entry is included in a log
  2. Verifying that a new checkpoint for a log is consistent with any previous version (i.e. the log has grown in an append-only fashion)
  3. Verifying that all users are seeing the same entries in a log
  4. Verifying the entries in a log to discover any malfeasance

The fact that logs can be verified on so many levels is both a blessing and a curse. The blessing is that the first three verification options cover all of the properties of transparency logs needed to provide security guarantees. When these are fully verified, nobody needs to trust the log operator; if the log operator is misbehaving then they will be caught. The curse is that one or more of these verification options is often forgotten because it is easy to fall into the trap of believing that the log is already verified after performing some subset of the verification checks.

Note that the paragraph above delineates these checks into two subgroups:

  • Checks 1-3 verify that the log operator is behaving correctly
  • Check 4 verifies that the entries in the log are safe to rely on and aren’t evidence of malicious activity

Checking for correct log operation is a well beaten path at this point; libraries for verifying inclusion and consistency are available at github.com/transparency-dev/merkle, and witnessing libraries are available at github.com/transparency-dev/witness. This verification is standard across all logs that use the standard checkpoint format.
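To make checks 1 and 2 concrete, here is a minimal, self-contained sketch of RFC 6962-style Merkle hashing, inclusion-proof verification, and a naive append-only check. It is illustrative only: the production libraries linked above implement these checks efficiently (in particular, real verifiers use logarithmic consistency proofs rather than re-hashing every entry, as the naive `is_append_only` below does).

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _split(n: int) -> int:
    # Largest power of two strictly less than n (requires n >= 2).
    k = 1
    while k * 2 < n:
        k *= 2
    return k

def leaf_hash(leaf: bytes) -> bytes:
    # RFC 6962 domain-separated leaf hash: H(0x00 || leaf).
    return _h(b"\x00" + leaf)

def merkle_root(leaves) -> bytes:
    # RFC 6962 Merkle Tree Hash over a non-empty list of leaves.
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return _h(b"\x01" + merkle_root(leaves[:k]) + merkle_root(leaves[k:]))

def inclusion_path(index: int, leaves) -> list:
    # Audit path for leaves[index], ordered bottom-up (RFC 6962 PATH).
    if len(leaves) == 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return inclusion_path(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return inclusion_path(index - k, leaves[k:]) + [merkle_root(leaves[:k])]

def verify_inclusion(leaf: bytes, index: int, size: int, path, root: bytes) -> bool:
    # Check 1: recompute the root from the leaf and its audit path.
    return _root_from(leaf_hash(leaf), index, size, list(path)) == root

def _root_from(node, index, size, path):
    if size == 1:
        return node if not path else b""
    if not path:
        return b""
    k = _split(size)
    sibling = path.pop()  # the path is consumed from the top of the tree down
    if index < k:
        return _h(b"\x01" + _root_from(node, index, k, path) + sibling)
    return _h(b"\x01" + sibling + _root_from(node, index - k, size - k, path))

def is_append_only(old_size: int, old_root: bytes, new_leaves) -> bool:
    # Check 2, done naively: recompute the old root from the first
    # old_size entries of the new view and compare it to the old root.
    if not 0 < old_size <= len(new_leaves):
        return False
    return merkle_root(new_leaves[:old_size]) == old_root
```

A tampered leaf, a truncated log, or a rewritten history all produce a different root, so these checks fail loudly rather than silently.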

The rest of the article will discuss the remaining verification check: looking for evidence of malfeasance stored in the log. This check is arguably the most important, and is the primary motivation for introducing transparency logs: “sunlight is the best disinfectant” after all. Once a log has integrated an entry, an appropriate party must verify the contents of that entry in a timely manner. This verification must go beyond checking the cryptographic log proofs because an entry being present in a log does not mean that this entry is good. Lies can also be recorded in logs. Prompt verification of log entries allows such lies to be detected and corrective action taken, ideally before harm is caused.

A quick analogy: store CCTV

Many large stores have a CCTV security system. If nobody is watching this footage then theft can go undetected indefinitely. Even if the theft is detected via other security practices, e.g. a stock count, determining how and when the theft occurred will require someone to look through all of the footage.

If this footage were being watched in real time, the thief could be apprehended before they got away with the crime.

Transparency logs are like this CCTV in that they record everything that happens, but the security benefits are only realized if this recorded data is verified in a timely manner. If you verify the data close to real time, you can prevent fraud or other malicious activity. However, if you don’t verify the data until after the fact, the entries are only useful for forensics once the crime is complete and detected via its impact in the real world. Either way the perpetrator will be dealt with, but less damage will occur if they are caught sooner.

The role of Verifiers

Verifying entries in a log is clearly just as important as putting entries into it if transparency is to deliver security benefits. Any ecosystem trying to drive an increase in security through transparency logs must have one or more entities verifying the correctness of entries in the log. Such verifiers must understand how to parse and interpret an entry in order to verify it. This is quite different from verifiers that operate only on the log as a generic verifiable data structure, such as a log witness.

Some real world examples:

  • Certificate Transparency: each entry is an X.509 certificate issued by Certificate Authorities on behalf of a Domain Owner. The only actor that can verify a certificate to ensure that it hasn’t been misissued is the Domain Owner.
  • Go Sum DB: each entry is a tuple of {module, version, hash}. Any actor can verify that no module + version is mapped to more than one hash within the log.
  • Sigstore: each entry contains a tuple of {identity, certificate, signature}. Only the identity owner can verify that these signatures were created on their behalf, and not as the result of key, identity or service compromise.
  • USBArmory Firmware Transparency: each entry contains a tuple of {git commit hash, toolchain, binary hash}. Any actor can check out the code at the given commit, run the build toolchain, and verify that a binary with the given hash is the resulting build artefact.
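The Go Sum DB check in the list above is simple enough to sketch directly: scan every entry and flag any {module, version} pair that maps to more than one hash. The entry format here is a simplified tuple, not the exact sumdb wire format.

```python
def find_conflicts(entries):
    """Flag any (module, version) pair that maps to two different hashes.

    entries: iterable of (module, version, digest) tuples, e.g. read
    from a locally cloned copy of the log. Benign exact duplicates are
    allowed; a pair bound to two *different* digests is evidence of
    malfeasance.
    """
    seen = {}
    conflicts = set()
    for module, version, digest in entries:
        key = (module, version)
        if key in seen and seen[key] != digest:
            conflicts.add(key)
        seen.setdefault(key, digest)
    return sorted(conflicts)
```

Note that this check is only meaningful over a complete copy of the log: a conflicting entry that the verifier never downloads is a conflict it can never detect.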

Some of these ecosystems also have additional verifiers. For example, USBArmory Firmware Transparency also logs a signature, which enables the owner of the release keys to verify that no releases were made without their knowledge. This is similar to Sigstore identity verification.

The Claimant Model[4] provides precise terminology to allow clear analysis and discussion of the different claims that verifiers check.

In practice

Unlike most users of a log, verifiers must download and inspect every entry at least once. This is required even if the verifier is only responsible for verifying a subset of entries that match a particular condition. The reason for this is that logs cannot verifiably prove that they have returned all entries that match a particular condition, even if the log operator runs a search service alongside the log. Consequently, a verifier that operated only using a search service could be misled into believing they had verified all of the entries that applied to them, but bad results not returned in the search would remain unverified. The good news is that verifiable maps can provide log operators, or indeed anyone else, a way to run a verifiable search service.

To see these principles in practice, let’s look at a concrete implementation. The Go Sum DB Verifier can be deployed by anyone. When running, it downloads all of the entries from the log into a local database, and then verifies the entries in the log using this local database. Initially this runs in a batch mode to quickly copy the log to the local storage. After completing the initial download, the log is polled frequently to check for new entries. All downloaded entries are cryptographically checked to make sure the local copy is the same as the log publicly commits to. The verifier periodically checks the local copy of the log in the database to ensure that there are no duplicate entries.
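The clone-then-verify loop described above can be sketched as follows. This is a hypothetical skeleton with an in-memory stand-in for the log; the real verifier talks to the log's HTTP API, persists entries in a database, and additionally checks the cryptographic proofs against published checkpoints on every sync.

```python
class InMemoryLog:
    """Hypothetical stand-in for a transparency log's read API."""

    def __init__(self):
        self.entries = []  # list of (module, version, digest) tuples

    def size(self):
        return len(self.entries)

    def get_entries(self, start, end):
        return self.entries[start:end]

def sync_and_verify(log, local, conflicts):
    """One poll tick: clone any unseen entries, then re-run the
    duplicate check over the whole local copy.

    local: list acting as the verifier's local database.
    conflicts: set collecting (module, version) pairs bound to
    conflicting digests.
    """
    new = log.get_entries(len(local), log.size())
    local.extend(new)
    seen = {}
    for module, version, digest in local:
        key = (module, version)
        if key in seen and seen[key] != digest:
            conflicts.add(key)
        seen.setdefault(key, digest)
```

In a deployment, `sync_and_verify` would run inside a polling loop: a fast initial batch clone, then frequent incremental syncs so that a bad entry is flagged shortly after the log integrates it.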

Writing a verifier for another log should be possible using this verifier as a template. Much of the code remains the same, with customization only required for:

  • Binding the clone tool to a different log API
  • The verification logic

Solutions

  1. Ecosystem designers: be clear about what the entries in logs are claiming, and be thoughtful about who can verify the truth of these claims. Provide open source tooling to help verification, and be clear in your messaging that verification of logs contents is an important part of your security.
  2. Ecosystem participants: verify logs you depend on if possible. If you can’t verify claims yourself, ensure that someone is verifying them. If you depend on a claim that is never verified, you could be tricked and it might never be detected.

Learn more about tamper-evident logs

  • Benefits of tamper-evident logs
  • Trillian: an open-source verifiable log
  • Verifiable Data Structures
  • Applications of tamper-evident logs
  • Add tamper-checking to a package manager
  • Reliably log all actions performed on your servers
  • Strengthen discovery of encryption keys with Key Transparency
  • Discourage misbehaviour by third parties in Certificate Transparency

References

  1. Tibco, What is Immutable Data? https://www.tibco.com/reference-center/what-is-immutable-data
  2. Securosis, Immutable Log Files https://securosis.com/blog/immutable-log-files
  3. Jordi Cucurull & Jordi Puiggalí, Distributed Immutabilization of Secure Logs https://link.springer.com/chapter/10.1007/978-3-319-46598-2_9
  4. transparency.dev, How to design a verifiable system https://transparency.dev/how-to-design-a-verifiable-system/