wiki/notebook/data.archives.software-heritage.org
Gabriel Arazas 549f476c4c Update the notebook
The topics I've covered so far for Linux, package managers, archiving,
and learning.

I also updated some formatting for other notes especially with the
command line references.
2021-07-29 23:26:51 +08:00

Software Heritage

  • project link is at https://www.softwareheritage.org/
  • the infrastructure and tools they use are also open source; development primarily happens at their software forge
  • an ambitious project archiving all of humanity's publicly available source code
  • primarily made for researchers to easily refer to software; a centralized database for referencing software, similar to digital object identifiers (DOIs) for research materials and ISBNs for books
  • the archive itself is essentially a gigantic Merkle tree (strictly, a Merkle DAG) that lets you interact with individual objects such as revisions (commits), snapshots, and even the very source code files of an archived repo
  • it is agnostic to version control software; it archives software from several sources (e.g., Git, Mercurial)
  • each object is given an identifier referred to as a Software Heritage persistent identifier (SWHID)
  • funded by donations, including from big companies and several not-for-profit foundations
  • a big component of Reproducible research; other projects such as Nix package manager and Guix package manager use it as a fallback when upstream sources vanish; tools to integrate them further, such as archiving the code used to build the binary cache, are expected soon
  • there is a public interface for browsing the archive
  • they have dedicated resources to building infrastructure for a centralized software reference, such as the following:

    • swh.fuse, a tool that mounts the archive as a user-local filesystem, integrating it into development workflows
    • swh.search, which adds search functionality to the archive
    • swh.lister, which lists repositories from several forges (e.g., GitHub, GitLab)
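
To make the SWHID and Merkle structure notes above more concrete, here is a minimal sketch of how an identifier for a single file's content could be computed locally. It assumes the content ("cnt") identifier is the same SHA-1 that Git computes for a blob, prefixed with "swh:1:cnt:". The function names content_swhid and browse_url are made up for illustration and are not part of any Software Heritage tool, and the URL pattern is my assumption about how the public browsing interface resolves identifiers.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file's content (hypothetical helper)."""
    # Assumption: same framing Git uses for blobs,
    # i.e. "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


def browse_url(swhid: str) -> str:
    """Build a link to the public archive browser (assumed URL pattern)."""
    return f"https://archive.softwareheritage.org/{swhid}"


if __name__ == "__main__":
    swhid = content_swhid(b"hello world\n")
    print(swhid)
    print(browse_url(swhid))
```

Because the identifier is derived only from the bytes, the same file archived from two different forges ends up as one object. That is what lets the archive behave like a Merkle structure: directories and revisions are hashed over the identifiers of the things they contain.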