diff --git a/notebook/data.archives.software-heritage.org b/notebook/data.archives.software-heritage.org
new file mode 100644
index 0000000..14e71aa
--- /dev/null
+++ b/notebook/data.archives.software-heritage.org
@@ -0,0 +1,22 @@
+:PROPERTIES:
+:ID: 9c85ffb2-fc90-4b38-abce-f0425a2b79de
+:END:
+#+title: Software Heritage
+#+date: 2021-07-25 21:01:45 +08:00
+#+date_modified: 2021-07-27 23:11:52 +08:00
+#+language: en
+
+
+- project link is at https://www.softwareheritage.org/
+- the infrastructure and tools they use are also open source;
+  development primarily happens at [[https://forge.softwareheritage.org/][their software forge]]
+- an ambitious project archiving all of humanity's publicly available source code
+- primarily made for researchers to easily refer to software;
+  a centralized database for referencing software, in other words
+- it is stored in a global Merkle tree in which each archived object is given an identifier referred to as a Software Heritage persistent identifier (SWHID)
+- the archive itself is more of a gigantic Merkle tree with the ability to interact with individual objects such as commits, revisions, snapshots, and even the very source code files of an archived repo
+- funded by donations, including from big companies and several not-for-profit foundations
+- a big component for [[id:6eeb7a24-b662-46d6-9ece-00a5028ff4d8][Reproducible research]]: other projects such as [[id:3b3fdcbf-eb40-4c89-81f3-9d937a0be53c][Nix package manager]] and [[id:be917383-84c4-4bf5-9ca0-b04bfb778f4f][Guix package manager]] use it as a fallback when upstream sources vanish;
+  soon enough, it will develop tools to integrate with them further, such as archiving the code used to build the binary cache
+- there is a [[https://archive.softwareheritage.org/][public interface for browsing the archive]]
+- they have dedicated resources to building infrastructure for a centralized software reference, such as a user-local filesystem that integrates the archive into the development workflow (see [[id:4703f8c2-225c-4c76-a788-af04b84309ac][The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development]])
diff --git a/notebook/data.software-archives.org b/notebook/data.software-archives.org
new file mode 100644
index 0000000..d226fa9
--- /dev/null
+++ b/notebook/data.software-archives.org
@@ -0,0 +1,24 @@
+:PROPERTIES:
+:ID: 67ed9916-0e60-44fa-8844-af86727870fc
+:END:
+#+title: Software archives
+#+date: 2021-07-25 19:20:39 +08:00
+#+date_modified: 2021-07-27 17:31:16 +08:00
+#+language: en
+
+
+- [[id:9c85ffb2-fc90-4b38-abce-f0425a2b79de][Software Heritage]] is an ambitious project archiving all of the source code of humanity.
+  It has a [[https://archive.softwareheritage.org/][browsable interface]] to the entire archive.
+
+- roam:archive.org has [[https://archive.org/details/software][a dedicated section for software]].
+  It is especially nice for scouring historical versions of software.
+
+- [[id:f3f1201a-9fb9-4481-981f-5f50f8982a5e][Arch Linux]] has an [[https://archive.archlinux.org/][archive]] that keeps the snapshots, packages, and repos for a few years before uploading them to their [[https://archive.org/details/archlinuxarchive][historical archive]].
+
+- [[id:3b3fdcbf-eb40-4c89-81f3-9d937a0be53c][Nix package manager]] has [[https://cache.nixos.org/][a binary cache]].
+  According to [[https://chaos.social/@davidak/106640211327736471][one of the contributors]], it takes up 120TB as of 2021-07-25.
+
+- [[id:be917383-84c4-4bf5-9ca0-b04bfb778f4f][Guix package manager]] has a binary cache primarily maintained with its [[https://ci.guix.gnu.org/][homegrown continuous integration software]].
+
+- roam:F-Droid has an archive for its compiled apps.
+  According to [[https://mastodon.technology/@fdroidorg/106635571616898675][one of their comments]], it is sitting at ~390GB as of 2021-07-25.
diff --git a/notebook/literature.allanconSoftwareHeritageFilesystem2021.org b/notebook/literature.allanconSoftwareHeritageFilesystem2021.org
new file mode 100644
index 0000000..67ee967
--- /dev/null
+++ b/notebook/literature.allanconSoftwareHeritageFilesystem2021.org
@@ -0,0 +1,24 @@
+:PROPERTIES:
+:ID: 4703f8c2-225c-4c76-a788-af04b84309ac
+:ROAM_REFS: cite:allanconSoftwareHeritageFilesystem2021
+:END:
+#+title: The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development
+#+date: 2021-07-27 17:06:14 +08:00
+#+date_modified: 2021-07-27 23:07:36 +08:00
+#+published: 2021-02-12
+#+author: Allançon, T., Pietri, A., & Zacchiroli, S.
+#+source: http://arxiv.org/abs/2102.06390
+#+language: en
+
+
+- primarily features =swh-fuse=, a utility for mounting software from Software Heritage in your local environment
+- it is a POSIX filesystem built with the FUSE framework;
+  as such, it does not require root privileges to use
+- it exposes the global Merkle tree as a filesystem along with its metadata, archive, etc.
+- it can interact with the objects in the Merkle tree such as the source code files, commits, snapshots, etc.
+- the tool is essentially a FUSE adapter to the Software Heritage API
+
+features:
+- the tool loads the archive lazily to save bandwidth and disk space
+- caches data for performance, especially given how poorly remote filesystems can perform
+- reduces redundancy by using symlinks extensively
diff --git a/notebook/tools.swh-fuse.org b/notebook/tools.swh-fuse.org
new file mode 100644
index 0000000..a49ce2c
--- /dev/null
+++ b/notebook/tools.swh-fuse.org
@@ -0,0 +1,43 @@
+#+title: swh-fuse
+#+date: 2021-07-27 21:02:05 +08:00
+#+date_modified: 2021-07-27 22:08:39 +08:00
+#+language: en
+
+
+A tool to interact with the Software Heritage Filesystem (SwhFS);
+see the [[id:4703f8c2-225c-4c76-a788-af04b84309ac][The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development]] paper for an introduction.
+
+Some details about the tool itself...
+
+- It is mainly used through the =swh fs= subcommand.
+
+- To mount the filesystem itself, use =swh fs mount DIRECTORY=.
+
+When mounted, the directory should have the following structure:
+
+#+begin_src
+swhfs
+├── archive/
+├── cache/
+├── origin/
+└── README
+#+end_src
+
+- =archive/= is the entry point for the archived repos in the library;
+  the files inside it cannot be listed (e.g., with =ls= or file managers)
+  but you can still access them directly (e.g., with text editors, file openers)
+- =cache/= contains the on-disk representation of metadata
+- =origin/= is where origins are mounted, each under its encoded URL
+
+For up-to-date information, you can read the =README= file.
+
+With the setup complete, you are now ready to interact with the filesystem.
+The point of interest here is the =archive/= directory, which holds all of the archived source code in the library.
+
+You can interact with it by accessing one of the repos through its SWHID.
+
+#+begin_src shell :eval no
+ls "swhfs/archive/${SWHID}"
+#+end_src
+
+The tool lazily loads the repo, saving bandwidth and disk space.
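Since entries under =archive/= cannot be listed, a typo in a SWHID only shows up as a failed lookup. As a rough sketch, the helper below checks the identifier's syntax before building the path. The function name =swhfs_path= is hypothetical and not part of =swh-fuse=; it assumes the mount point layout shown above and only the core SWHID syntax (=swh:1:=, an object type among =cnt=, =dir=, =rev=, =rel=, =snp=, then 40 hexadecimal digits), without qualifiers.

```shell
#!/bin/sh
# Hypothetical helper (not part of swh-fuse): print the path of an archived
# object under a SwhFS mount point after a basic SWHID syntax check.
swhfs_path() {
    mountpoint=$1
    swhid=$2
    # Core SWHID syntax: swh:1:<type>:<40 lowercase hex digits>,
    # where <type> is one of cnt, dir, rev, rel, snp. Qualifiers are not handled.
    if printf '%s\n' "$swhid" | grep -Eq '^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}$'; then
        printf '%s/archive/%s\n' "$mountpoint" "$swhid"
    else
        printf 'not a valid core SWHID: %s\n' "$swhid" >&2
        return 1
    fi
}
```

With the filesystem mounted at =swhfs=, something like =ls "$(swhfs_path swhfs "$SWHID")"= would then only reach the archive when the identifier is well-formed.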