mirror of
https://github.com/foo-dogsquared/wiki.git
synced 2025-01-31 01:57:54 +00:00
Add notes on archives
parent
a71918683d
commit
48ed6ede94
22
notebook/data.archives.software-heritage.org
Normal file
@@ -0,0 +1,22 @@
:PROPERTIES:
:ID: 9c85ffb2-fc90-4b38-abce-f0425a2b79de
:END:
#+title: Software Heritage
#+date: 2021-07-25 21:01:45 +08:00
#+date_modified: 2021-07-27 23:11:52 +08:00
#+language: en


- the project's website is at https://www.softwareheritage.org/
- the infrastructure and tools they use are also open source;
  development primarily happens at [[https://forge.softwareheritage.org/][their software forge]]
- an ambitious project archiving all of humanity's publicly available source code
- primarily made for researchers to easily refer to software;
  in other words, a centralized database for referencing software
- it is stored in a global Merkle tree in which each archived object is given an identifier, referred to as a Software Heritage persistent identifier (SWHID)
- the archive itself is essentially one gigantic Merkle tree whose individual objects can be interacted with, such as commits, revisions, snapshots, and even the very source code files of an archived repo
- funded by donations, including from big companies and several not-for-profit foundations
- a big component of [[id:6eeb7a24-b662-46d6-9ece-00a5028ff4d8][Reproducible research]]; other projects such as [[id:3b3fdcbf-eb40-4c89-81f3-9d937a0be53c][Nix package manager]] and [[id:be917383-84c4-4bf5-9ca0-b04bfb778f4f][Guix package manager]] use it as a fallback when upstream sources vanish;
  soon enough, it will develop tools to integrate with them further, such as archiving the code used to build the binary cache
- there is a [[https://archive.softwareheritage.org/][public interface for browsing the archive]]
- they have dedicated resources to building infrastructure for a centralized software reference, such as a user-local filesystem that integrates the archive into the development workflow (see [[id:4703f8c2-225c-4c76-a788-af04b84309ac][The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development]])
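
The SWHID of a single file can be derived locally, because content identifiers reuse Git's blob hashing (SHA-1 over a =blob <size>= header, a NUL byte, and the file's bytes). A minimal sketch, assuming the =sha1sum= utility from GNU coreutils is installed; the function name =swhid_of_file= is made up for illustration:

#+begin_src shell
# Print the content SWHID (swh:1:cnt:...) of the file given as $1.
swhid_of_file() {
  # Size in bytes; some wc implementations pad the number with spaces.
  size=$(wc -c < "$1" | tr -d ' ')
  # Hash the Git-style header followed by the raw file content.
  hash=$( { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1)
  printf 'swh:1:cnt:%s\n' "$hash"
}

# Usage: swhid_of_file README
#+end_src

Running =git hash-object= on the same file should print the same 40 hexadecimal digits, which is a quick way to cross-check the result.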
24
notebook/data.software-archives.org
Normal file
@@ -0,0 +1,24 @@
:PROPERTIES:
:ID: 67ed9916-0e60-44fa-8844-af86727870fc
:END:
#+title: Software archives
#+date: 2021-07-25 19:20:39 +08:00
#+date_modified: 2021-07-27 17:31:16 +08:00
#+language: en


- [[id:9c85ffb2-fc90-4b38-abce-f0425a2b79de][Software Heritage]] is an ambitious project archiving all of the source code of humanity.
  It has a [[https://archive.softwareheritage.org/][browsable interface]] to the whole archive.

- roam:archive.org has [[https://archive.org/details/software][a dedicated section for software]].
  It is especially nice for scouring historical versions of software.

- [[id:f3f1201a-9fb9-4481-981f-5f50f8982a5e][Arch Linux]] has an [[https://archive.archlinux.org/][archive]] that keeps the snapshots, packages, and repos for a few years before uploading them to their [[https://archive.org/details/archlinuxarchive][historical archive]].

- [[id:3b3fdcbf-eb40-4c89-81f3-9d937a0be53c][Nix package manager]] has [[https://cache.nixos.org/][a binary cache]].
  According to [[https://chaos.social/@davidak/106640211327736471][one of the contributors]], it takes up 120TB as of 2021-07-25.

- [[id:be917383-84c4-4bf5-9ca0-b04bfb778f4f][Guix package manager]] has a binary cache primarily maintained with its [[https://ci.guix.gnu.org/][homegrown continuous integration software]].

- roam:F-Droid has an archive for its compiled apps.
  According to [[https://mastodon.technology/@fdroidorg/106635571616898675][one of their comments]], it is sitting at ~390GB as of 2021-07-25.
@@ -0,0 +1,24 @@
:PROPERTIES:
:ID: 4703f8c2-225c-4c76-a788-af04b84309ac
:ROAM_REFS: cite:allanconSoftwareHeritageFilesystem2021
:END:
#+title: The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development
#+date: 2021-07-27 17:06:14 +08:00
#+date_modified: 2021-07-27 23:07:36 +08:00
#+published: 2021-02-12
#+author: Allançon, T., Pietri, A., & Zacchiroli, S.
#+source: http://arxiv.org/abs/2102.06390
#+language: en


- primarily features =swh-fuse=, a utility that lets you mount software from Software Heritage into your local environment
- it is a POSIX filesystem built with the FUSE framework;
  as such, it does not require root privileges to use
- it exposes the global Merkle tree as a filesystem along with its metadata, archive, etc.
- it can interact with the objects in the Merkle tree such as the source code files, commits, snapshots, etc.
- the tool is essentially a FUSE adapter to the Software Heritage API

features:
- the tool loads the archives lazily to save bandwidth and disk space
- caches data for performance, which matters given how slow remote filesystems can be
- reduces redundancy by using symlinks extensively
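
The lazy loading and caching listed above can be illustrated with a small fetch-on-first-access helper. This is only a sketch of the general idea, not how =swh-fuse= is implemented; =fetch_remote=, =fetch_cached=, and the =SWH_SKETCH_CACHE= variable are hypothetical names, and =curl= and =sha1sum= are assumed to be installed:

#+begin_src shell
# Directory holding cached responses (hypothetical location).
SWH_SKETCH_CACHE=${SWH_SKETCH_CACHE:-${TMPDIR:-/tmp}/swh-sketch-cache}

# Download the URL given as $1 to standard output.
fetch_remote() {
  curl -fsSL "$1"
}

# Print the body of the URL given as $1.
# Nothing is downloaded until the first request (lazy), and later
# requests for the same URL are served from disk (cache).
fetch_cached() {
  key=$(printf '%s' "$1" | sha1sum | cut -d ' ' -f 1)
  entry=$SWH_SKETCH_CACHE/$key
  if [ ! -f "$entry" ]; then
    mkdir -p "$SWH_SKETCH_CACHE"
    # Write to a temporary name first so a failed download is never cached.
    fetch_remote "$1" > "$entry.tmp" || { rm -f "$entry.tmp"; return 1; }
    mv "$entry.tmp" "$entry"
  fi
  cat "$entry"
}
#+end_src

Keying the cache on a hash of the URL keeps the file names safe regardless of which characters the URL contains.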
43
notebook/tools.swh-fuse.org
Normal file
@@ -0,0 +1,43 @@
#+title: swh-fuse
#+date: 2021-07-27 21:02:05 +08:00
#+date_modified: 2021-07-27 22:08:39 +08:00
#+language: en


A tool to interact with the Software Heritage Filesystem (SwhFS);
see the paper [[id:4703f8c2-225c-4c76-a788-af04b84309ac][The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development]] for an introduction.

Some details about the tool itself:

- It is mainly used through the =swh fs= subcommand.

- To mount the filesystem itself, use =swh fs mount DIRECTORY=.

When mounted, the directory should have the following structure:

#+begin_src
swhfs
├── archive/
├── cache/
├── origin/
└── README
#+end_src

- =archive/= is the entry point for the archived repos in the library;
  its contents cannot be listed (e.g., with =ls= or file managers),
  but you can still access the files inside it by path (e.g., with text editors or file openers)
- =cache/= contains the on-disk representation of metadata
- =origin/= is where origins are mounted, each under its encoded URL

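
The note on =origin/= only says that the URL is "encoded". Assuming that means percent-encoding (worth checking against the mounted =README=), a directory name could be produced with a small POSIX shell helper. The name =urlencode= is chosen for this sketch, and it handles ASCII input only:

#+begin_src shell
# Percent-encode the string given as $1 (ASCII input only).
# Unreserved characters (letters, digits, ".", "_", "~", "-") are kept as-is.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      # A leading quote makes printf emit the character's numeric value.
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical usage, assuming the origin is archived:
#   ls "swhfs/origin/$(urlencode 'https://example.org/repo.git')"
#+end_src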
For up-to-date information, you can read the =README= file.

With the setup complete, you are now ready to interact with the filesystem.
The point of interest here is the =archive/= directory, which holds all of the archived source code in the library.

You can interact with it by accessing one of the repos through its SWHID.

#+begin_src shell :eval no
ls "swhfs/archive/${SWHID}"
#+end_src

The tool lazily loads the repo, saving bandwidth and disk space.
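
Since entries under =archive/= cannot be listed, a mistyped identifier is easy to miss. A small sketch that checks the shape of a SWHID before building the path; =swhfs_archive_path= is a hypothetical helper, and the pattern covers only the five core object types without qualifiers:

#+begin_src shell
# Print the path under MOUNTPOINT/archive/ for a SWHID.
# $1 is the mount point, $2 is the SWHID; returns 1 if the SWHID is malformed.
swhfs_archive_path() {
  mountpoint=$1
  swhid=$2
  # swh:1:<type>:<40 lowercase hex digits>, where <type> is one of
  # cnt (content), dir (directory), rev (revision), rel (release), snp (snapshot).
  if printf '%s\n' "$swhid" | grep -Eq '^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}$'; then
    printf '%s/archive/%s\n' "$mountpoint" "$swhid"
  else
    printf 'not a core SWHID: %s\n' "$swhid" >&2
    return 1
  fi
}

# Usage: ls "$(swhfs_archive_path swhfs "$SWHID")"
#+end_src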