Add notes on archives

Gabriel Arazas 2021-07-27 23:13:13 +08:00
parent a71918683d
commit 48ed6ede94
4 changed files with 113 additions and 0 deletions

@ -0,0 +1,22 @@
:PROPERTIES:
:ID: 9c85ffb2-fc90-4b38-abce-f0425a2b79de
:END:
#+title: Software Heritage
#+date: 2021-07-25 21:01:45 +08:00
#+date_modified: 2021-07-27 23:11:52 +08:00
#+language: en
- project link is at https://www.softwareheritage.org/
- the infrastructure and tools they use are also open source;
development primarily happens at [[https://forge.softwareheritage.org/][their software forge]]
- an ambitious project archiving all of humanity's publicly available source code
- primarily made for researchers to easily refer to software;
in other words, a centralized database for referencing software
- it is stored in a global Merkle tree in which each archived object is given an identifier, referred to as a Software Heritage persistent identifier (SWHID)
- the archive itself is essentially a gigantic Merkle tree whose individual objects you can interact with, such as commits, revisions, snapshots, and even the very source code files of an archived repo
- funded by donations, including from big companies and several not-for-profit foundations
- a big component of [[id:6eeb7a24-b662-46d6-9ece-00a5028ff4d8][Reproducible research]]: other projects such as [[id:3b3fdcbf-eb40-4c89-81f3-9d937a0be53c][Nix package manager]] and [[id:be917383-84c4-4bf5-9ca0-b04bfb778f4f][Guix package manager]] use it as a fallback when upstream sources vanish;
further integration is expected, such as archiving the code used to build the binary cache
- there is a [[https://archive.softwareheritage.org/][public interface for browsing the archive]]
- they have dedicated resources to building infrastructure that makes the archive a centralized reference for software, such as a user-local filesystem that integrates the archive into the development workflow (see [[id:4703f8c2-225c-4c76-a788-af04b84309ac][The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development]])
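As a rough illustration of how the leaves of the Merkle tree get their identifiers: a SWHID for file contents has the form =swh:1:cnt:<hash>=, where the hash is the same SHA-1 that Git computes for a blob.
The sketch below assumes a =sha1sum= binary is available (as in GNU coreutils); the function name =swhid_for_content= is made up for this note and is not part of any Software Heritage tool.
#+begin_src shell
# Print the content SWHID of a file (hypothetical helper, not an official tool).
# The hash is SHA-1 over the Git blob header "blob <size>\0" followed by the bytes.
swhid_for_content() {
  file=$1
  # Byte count of the file; strip the padding some `wc` implementations add.
  size=$(wc -c < "$file" | tr -d '[:space:]')
  # Assumes `sha1sum` exists; it prints "<hash>  -", so keep the first field only.
  hash=$( { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f 1)
  printf 'swh:1:cnt:%s\n' "$hash"
}
#+end_src
For an empty file this should print =swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391=, the well-known hash of the empty Git blob.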

@ -0,0 +1,24 @@
:PROPERTIES:
:ID: 67ed9916-0e60-44fa-8844-af86727870fc
:END:
#+title: Software archives
#+date: 2021-07-25 19:20:39 +08:00
#+date_modified: 2021-07-27 17:31:16 +08:00
#+language: en
- [[id:9c85ffb2-fc90-4b38-abce-f0425a2b79de][Software Heritage]] is an ambitious project archiving all of the source code of humanity.
It has a [[https://archive.softwareheritage.org/][browsable interface]] to the whole archive.
- roam:archive.org has [[https://archive.org/details/software][a dedicated section for software]].
It is especially useful for finding historical versions of software.
- [[id:f3f1201a-9fb9-4481-981f-5f50f8982a5e][Arch Linux]] has an [[https://archive.archlinux.org/][archive]] that keeps the snapshots, packages, and repos for a few years before uploading them to their [[https://archive.org/details/archlinuxarchive][historical archive]].
- [[id:3b3fdcbf-eb40-4c89-81f3-9d937a0be53c][Nix package manager]] has [[https://cache.nixos.org/][a binary cache]].
According to [[https://chaos.social/@davidak/106640211327736471][one of the contributors]], it takes up 120TB as of 2021-07-25.
- [[id:be917383-84c4-4bf5-9ca0-b04bfb778f4f][Guix package manager]] has a binary cache primarily maintained with its [[https://ci.guix.gnu.org/][homegrown continuous integration software]].
- roam:F-Droid has an archive for its compiled apps.
From [[https://mastodon.technology/@fdroidorg/106635571616898675][one of their comments]], it is sitting at ~390GB as of 2021-07-25.

@ -0,0 +1,24 @@
:PROPERTIES:
:ID: 4703f8c2-225c-4c76-a788-af04b84309ac
:ROAM_REFS: cite:allanconSoftwareHeritageFilesystem2021
:END:
#+title: The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development
#+date: 2021-07-27 17:06:14 +08:00
#+date_modified: 2021-07-27 23:07:36 +08:00
#+published: 2021-02-12
#+author: Allançon, T., Pietri, A., & Zacchiroli, S.
#+source: http://arxiv.org/abs/2102.06390
#+language: en
- primarily features =swh-fuse=, a utility that lets you mount software from Software Heritage in your local environment
- it is a POSIX filesystem built with the FUSE framework;
as such, it does not require root privileges to use
- it exposes the global merkle tree as a filesystem along with its metadata, archive, etc.
- it can interact with the objects in the merkle tree such as the source code files, commits, snapshots, etc.
- the tool is essentially a FUSE adapter to the Software Heritage API
features:
- the tool loads the archives lazily to save bandwidth and disk space
- caches data for performance, which matters given how slow remote filesystems can be
- reduces redundancy by using symlinks extensively
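The lazy loading and caching listed above can be sketched as a fetch-on-first-access pattern.
This is not how =swh-fuse= is implemented: =fetch_remote= is a hypothetical stand-in for a request to the archive, the cache is just a temporary directory, and object names are assumed to contain no slashes.
#+begin_src shell
# Scratch cache directory plus a log that records every simulated remote fetch.
cache_dir=$(mktemp -d)
fetch_log="$cache_dir/.fetches"
: > "$fetch_log"

# Hypothetical stand-in for a slow network request to the archive.
fetch_remote() {
  printf '%s\n' "$1" >> "$fetch_log"
  printf 'contents of %s\n' "$1"
}

# Fetch an object only on first access; later reads are served from the cache.
read_object() {
  cached="$cache_dir/$1"
  if [ ! -f "$cached" ]; then
    fetch_remote "$1" > "$cached"
  fi
  cat "$cached"
}
#+end_src
Calling =read_object= twice with the same name adds only one line to the fetch log, because the second read never reaches =fetch_remote=.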

@ -0,0 +1,43 @@
#+title: swh-fuse
#+date: 2021-07-27 21:02:05 +08:00
#+date_modified: 2021-07-27 22:08:39 +08:00
#+language: en
A tool to interact with the Software Heritage Filesystem (SwhFS);
see the paper [[id:4703f8c2-225c-4c76-a788-af04b84309ac][The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development]] for an introduction.
Some details about the tool itself:
- It is mainly used with the =swh fs= subcommand.
- To mount the filesystem itself, use =swh fs mount DIRECTORY=.
When mounted, the directory should have the following structure:
#+begin_src
swhfs
├── archive/
├── cache/
├── origin/
└── README
#+end_src
- =archive/= is the entry point for the archived repos in the library;
the files inside there cannot be listed (e.g., =ls=, file managers)
but you can access the files inside of it (e.g., text editors, file openers)
- =cache/= contains the on-disk representation of metadata
- =origin/= is where origins are mounted, each under its encoded URL
For up-to-date information, you can read the =README= file.
With the complete setup, you are now ready to interact with the filesystem.
The point of interest here is the =archive/= directory, which holds all of the archived source code in the library.
You can interact with it by accessing one of the repos through its SWHID.
#+begin_src shell :eval no
ls swhfs/archive/${SWHID}
#+end_src
The tool lazily loads the repo, saving bandwidth and disk space.
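Since =archive/= cannot be listed, a mistyped identifier would presumably only show up as a missing path.
A small guard that checks the identifier before building the path might look like the following sketch.
The helper names are made up for this note, and the pattern only covers core SWHIDs (scheme version 1, one of the five object types, a 40-character lowercase hex hash) without qualifiers.
#+begin_src shell
# Succeed only if the argument looks like a core SWHID without qualifiers.
is_core_swhid() {
  printf '%s\n' "$1" | grep -Eq '^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}$'
}

# Print the path of an object under a mounted SwhFS directory.
# Both arguments are supplied by the caller; nothing is mounted or accessed here.
archive_path() {
  mountpoint=$1
  swhid=$2
  if is_core_swhid "$swhid"; then
    printf '%s/archive/%s\n' "$mountpoint" "$swhid"
  else
    printf 'not a core SWHID: %s\n' "$swhid" >&2
    return 1
  fi
}
#+end_src
With the filesystem mounted at =swhfs=, the result can be passed on directly, for example =ls "$(archive_path swhfs "$SWHID")"=.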