darwin: encrypt nix volume if filevault is enabled #4181

Closed
wants to merge 37 commits into from

Conversation

abathur
Member

@abathur abathur commented Oct 23, 2020

Motive/Goal

Get single-invocation curl -L https://nixos.org/nix/install | sh installs working on macOS Catalina+


Recap in case it helps anyone:

  • To support read-only / in Catalina, we updated the installer to move /nix to its own volume.
  • We wanted to encrypt this volume when FileVault FDE was enabled on the system to meet the user's expectation that their data was encrypted, but didn't find an approach we could confidently recommend.
  • To sate the Nix-starved mob, we sliced off all of the cases we felt like we could handle, hid them behind the --darwin-use-unencrypted-nix-store-volume scare-flag to make sure people with Opinions and/or compliance concerns would pay attention, and left the remainder (non-T2 devices with FileVault enabled) to manually prepare their volumes.

We've recently confirmed that we can skirt the race condition by using a RunAtLoad LaunchDaemon to unlock and mount the volume. (I've tested straightforward cases like restoring editors with open files on the /nix volume, and restoring .apps managed by Nix. It may still be possible to force race conditions with multiple RunAtLoad daemons, but I think it's OK to work around those as-needed with wait4path.)
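
As a rough illustration of the RunAtLoad approach, here is a minimal sketch of a shell function that prints such a LaunchDaemon plist. The function name, the label, and the command passed in are assumptions for illustration and are not taken from this PR; only the plist keys (Label, RunAtLoad, ProgramArguments) are standard launchd keys.

```shell
#!/bin/sh
# Hypothetical sketch: print a LaunchDaemon plist that runs one command at load.
# The label and command are placeholders, not the installer's real values.
# Note: the command is inserted as-is, so it must not contain &, < or >
# unless they are already XML-escaped.
print_mount_daemon_plist() {
    label="$1"      # e.g. org.example.nix-volume (assumed name)
    command="$2"    # shell command that unlocks and mounts the volume
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>${label}</string>
  <key>RunAtLoad</key>
  <true/>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/sh</string>
    <string>-c</string>
    <string>${command}</string>
  </array>
</dict>
</plist>
EOF
}
```

Something like `print_mount_daemon_plist org.example.nix-volume "<mount command>"` would write the plist to stdout, from where it could be piped to a file under /Library/LaunchDaemons. Daemons that depend on /nix could still guard their own start with wait4path, as noted above.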

Status

  • PR currently works
    • tested locally with FileVault enabled on Big Sur beta using both single & multi-user installs
    • install working in CI on matrix of FileVault yes/no, Catalina/Big Sur, --daemon --no-daemon
  • To finish the code out, we need to haggle over how this should affect single-user mode and "default" macOS installs. (see upcoming 2nd post for more on this): blocked single-user macOS installs
  • To finish the code out, we need to haggle over the curing/uninstall concept?
  • Need doc edits once code is baked.

Try it out?

I have a temporary installer up at https://files.t-ravis.com/install-volume, so curl https://files.t-ravis.com/install-volume | sh should work. If you can't try it out, you can see output from the CI run (same one listed earlier). I refresh it to test larger functionality changes, but it may be a few commits behind if they don't merit the chore of regenerating. Here's a history of the versions it has pointed at, with the current at the bottom:

@LnL7 @lilyball @matthewbauer @dhess @mroi @domenkozar (others I'm forgetting are inevitably also interested; feel free to add people)

Broad strokes
- mount the volume with a launchdaemon instead of fstab; fstab isn't
  processed at the right time to avoid race conditions for mounting
  and decrypting /nix in time for programs that need to restore
- generate a random credential and store it in keychain; both the
  system and login keychains appear to be compatible with this
  approach
This should make the keychain entry we add compatible with the way
the system tries to check keychain for entries (it appears to look
for an entry with a "service" or "where" field == VolumeUUID).
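
To make the "service == VolumeUUID" idea concrete, here is a minimal sketch that generates a random passphrase and prints (rather than runs) a `security add-generic-password` command with the UUID in the service field. The function names and the choice to print the command are assumptions for illustration; the real script may build the entry differently, and a real installer would not echo the passphrase.

```shell
#!/bin/sh
# Hypothetical sketch, not the PR's code.
generate_passphrase() {
    # 32 random bytes, base64-encoded into a single 44-character line
    head -c 32 /dev/urandom | base64 | tr -d '\n'
}

print_keychain_command() {
    volume_uuid="$1"
    volume_label="$2"
    passphrase="$3"
    # -a sets the account, -s the "service" (shown as "where"), -w the password.
    # The last argument names the keychain to add the entry to.
    printf 'security add-generic-password -a %s -s %s -w %s %s\n' \
        "'$volume_label'" "'$volume_uuid'" "'$passphrase'" \
        "/Library/Keychains/System.keychain"
}
```

Printing the command keeps the sketch side-effect free; the simple quoting assumes the label and UUID contain no single quotes.
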
Making the FS overridable makes it easier to play with 'Case-sensitive APFS'
in CI or on local systems. This will help people explore whether it improves
compatibility with nixpkgs (or creates more trouble than it's worth...).
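
A minimal sketch of what an overridable filesystem setting could look like, assuming an environment variable named NIX_VOLUME_FS (a name chosen here for illustration) that defaults to plain APFS:

```shell
#!/bin/sh
# Hypothetical sketch: NIX_VOLUME_FS is an assumed variable name.
# Falls back to "APFS" when the variable is unset or empty.
volume_filesystem() {
    printf '%s\n' "${NIX_VOLUME_FS:-APFS}"
}
```

Under that assumption, a CI job could set `NIX_VOLUME_FS='Case-sensitive APFS'` before running the installer to try the case-sensitive variant.
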
@abathur
Member Author

abathur commented Oct 26, 2020

I could use help hammering out an issue with as many nix+macOS stakeholders as we can round up on short notice.

I have opinions here, but my overriding interest is finding the quickest path to single-invocation macOS Nix installs for Catalina and Big Sur that Just Work without making prospective users crawl through glass anymore. I'm going to try to sketch out the facts and opinions I'm already aware of without staking anyone down by naming names. Please point out anything I'm missing!

Question

Should this approach apply to single-user installs, or not? (and, in either case, how)

Context

My rough understanding is that single-user installs are intended to be simpler, work even if Nix doesn't support the service manager, install without root, (and be simple to uninstall?)

Starting with Catalina, I think it's fair to characterize both single and multi-user macOS installs as already violating:

  • the "install without root" goal (AFAIR, we currently need sudo to edit /etc/synthetic.conf and /etc/fstab, figure out if the device has a T2 chip, and create/mount the volume.)
  • the "simple to uninstall" goal (the intuitive rm -rf /nix isn't enough because the system root is read-only; you really need to do something like remove nix from fstab and synthetic.conf, remove the Nix volume, and reboot)

The PR as it stands:

  • adds a few additional sudo invocations to interact with the security and launchctl commands, write the LaunchDaemon, and a few more
  • adds an "active" component (the LaunchDaemon); note: the current .plist "KeepAlive" setting attempts to run until a path we expect to be on the volume is available; this can and should be refined to limit the resources it can waste if the volume isn't mountable for some reason
  • the user might fail to uninstall this active component (it may be possible to make it self-disable or self-destruct to avoid wasting resources if forgotten?)
  • when FileVault is enabled, generates a random password, uses it to encrypt the new volume, and inserts this credential into the system keychain (where it could be read by other users with admin access)
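
On the KeepAlive point above, one way to limit wasted resources would be to poll for the expected path a bounded number of times and then give up. This is a minimal sketch under that assumption; the function name and defaults are illustrative and not part of the PR.

```shell
#!/bin/sh
# Hypothetical sketch: poll for a path a bounded number of times instead of
# retrying forever. Returns 0 once the path exists, 1 after giving up.
wait_for_path() {
    path="$1"
    max_attempts="${2:-10}"   # how many times to check
    delay_seconds="${3:-1}"   # pause after each failed check
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if [ -e "$path" ]; then
            return 0
        fi
        sleep "$delay_seconds"
        attempt=$((attempt + 1))
    done
    return 1
}
```

A mounter script could call `wait_for_path /nix/store 30 2` (example values) and exit with an error if it returns non-zero, leaving it to launchd configuration whether to retry at all.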

There are also some more general concerns here, like minimizing the number of possible install variants.

Solutions?

  • include the LaunchDaemon mounter with multi-user installs; block single-user installs when FileVault is enabled (i.e., only support FileVault/FDE in daemon mode) and forward users who try it to the docs; change the default macOS install to multi-user mode @LnL7
  • include the LaunchDaemon mounter with both single and multi-user installs
  • prompt the user (for both types of install?) to accept the LaunchDaemon (explain the purpose of the daemon, show them the full plist, clarify the consequences of not accepting it)
  • entirely disable single-user installs for macOS @domenkozar @dhess @lilyball (maybe?) @thefloweringash (via thumbs-up?)

I'll edit in any suggestions that don't boil down to one of the above, and note apparent votes/endorsements (but do correct me if I misrepresent you).

Here's a related IRC discussion that has bearing on any path that makes multi-user the default:

emily | we should really ship an uninstaller
emily | or people will hate us for adding 32 users and a volume and etc.
emily | I guess the docs are at least pretty clear about what you have to undo
  LnL | yeah, that part will be less nice with the switch
  LnL | there are also a few other small tweaks that could be done to make it a bit better in a few cases
  LnL | eg. does the channel really have to be configured for root by default?
  LnL | and the installer could setup both /etc/bashrc and ~/.bashrc so the issue with updates doesn't really bother people that only use a single account
emily | defaulting to fewer users also doesn't solve /etc/synthetic.conf(.d), /etc/fstab, diskutil apfs blahblah, deleting the group, ...
LnL | yeah, it's too complicated to do manually now

@domenkozar
Member

As per my NixCon talk entirely disable single-user installs for macOS is the way forward and you lay down the complexity of such division by just a small tiny portion of the installation logic.

@edolstra
Member

@domenkozar I thought you were in favor of getting rid of multi-user installs?

you lay down the complexity of such division by just a small tiny portion of the installation logic.

Hm, I have trouble parsing this...

@domenkozar
Member

domenkozar commented Oct 26, 2020

I am in favor of reducing complexity across whole Nix code base that impacts users and our support.

One of the factors is branching of logic per installation method. I'm in favor of merging them, if that's the pragmatic solution for macOS by defaulting to multi-user installations.

@dhess

dhess commented Oct 26, 2020

As per my NixCon talk entirely disable single-user installs for macOS is the way forward and you lay down the complexity of such division by just a small tiny portion of the installation logic.

I completely agree. I've been solely using multi-user installs on all of my Macs for the past few years. It works reliably and is more secure than single-user, to boot.

@dhess

dhess commented Oct 26, 2020

@abathur Thank you for all of your work on this. It's a (mostly) thankless task and full of corner cases, but greatly appreciated.

@lilyball
Member

I was actually planning on looking into this problem this very week, as I'm currently working on figuring out how to deploy Nix to my team 😅 I have not yet looked at what this PR does but here are my immediate thoughts:

  1. Last time I looked into this, it appeared that putting the volume passphrase into the system keychain and granting access to a few specific processes allowed the system to mount it automatically upon boot, just as it mounts the volume when unencrypted. This is mentioned in this PR comment and the reference for this is this third-party install script. According to that script, the keychain entry needs to be accessible by APFSUserAgent and CSUserAgent. If this approach just works, that's fantastic. My worry here is just that the need to grant access to 2 specific daemons is undocumented and it's not clear if this will be reliable going forward.
  2. I like the convenience of single-user as I don't have to use sudo to update my channels, and the conceptual simplicity of only having one profile instead of having default + per-user is nice. I worry that when introducing Nix to colleagues, a multi-user install might lead them to accidentally installing some stuff in the default profile and some in the per-user profile (though my intended usage of Nix does not have them explicitly installing anything at all). Having said that, the fact that there are two different ways of installing Nix, and the fact that the shell setup is different for these, and the usage patterns are different, is annoying. Ultimately, it's probably a good idea to remove single-user install, given that we need sudo to install it anyway. Single-user install is probably only worth having if we have a sudoless installation (which is never going to happen as that means the installation path would be relative to the user's home folder and would break all binary caches).

@lilyball
Member

Regarding the keychain thing, I wonder if someone can expend an Apple developer account incident to see if we can get in touch with an Apple filesystem engineer in order to confirm whether the "grant keychain access to these 2 processes" is reliable going forward, or whether there's any alternative supported mechanism.

@lilyball
Member

adds an "active" component (the LaunchDaemon); note: the current .plist "KeepAlive" setting attempts to run until a path we expect to be on the volume is available; this can and should be refined to limit the resources it can waste if the volume isn't mountable for some reason

What is the reason for using KeepAlive here? Do we actually expect it to fail to mount a few times before it successfully mounts? With it set up this way, if I manually unmount the volume it will be expected to immediately remount it, which seems rather odd.

Comment on lines 229 to 231
if ! test_voldaemon; then
    echo "Configuring LaunchDaemon to mount '$volume'..." >&2
    generate_mount_daemon | sudo tee /Library/LaunchDaemons/org.nixos.darwin-store.plist >/dev/null
Member

We need to check if the volume is encrypted first. If this is using a pre-existing unencrypted Nix volume, then the LaunchDaemon is superfluous (and the keychain entry won't exist).

For that matter, if it's using a pre-existing Nix volume at all, then there's no guarantee about the keychain entry's status. We should probably actually just skip this if we're not creating the volume. Anyone who creates their own volume is then on the hook for managing its mounting, and we can print a warning to this effect in the "Using existing volume" path.

Member Author

I've wrestled with this a bit. The latest update may resolve it, though that's contingent on what people think of the "curing" approach (and it likely needs logical refinement).

Basically, I've cut attempts to infer user intent or do anything magic based on what exists. If the root is read-only and NIX_VOLUME_CREATE=1 (the default), the installer will want to create a volume (encrypted if filevault is on), and the "curing" version currently enabled asks the user for permission to remove anything that conflicts (volume, keychain credentials, fstab, synthetic.conf, daemon) before it'll install.
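
The decision described above can be sketched as a small function. NIX_VOLUME_CREATE and its default of 1 come from the comment; the function name, its arguments, and the result strings are assumptions for illustration. The inputs are passed in as arguments so the logic can be exercised without touching the system.

```shell
#!/bin/sh
# Hypothetical sketch of the volume decision; not the PR's actual code.
plan_volume() {
    root_read_only="$1"   # "yes" or "no"
    filevault_on="$2"     # "yes" or "no"
    if [ "$root_read_only" != "yes" ] || [ "${NIX_VOLUME_CREATE:-1}" != "1" ]; then
        # Writable root, or the user opted out of volume creation.
        echo "skip"
    elif [ "$filevault_on" = "yes" ]; then
        echo "create-encrypted"
    else
        echo "create-unencrypted"
    fi
}
```

On a real system the FileVault input could come from something like the exit status of `fdesetup isactive`; the "curing" prompts for conflicting state are left out of this sketch.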

I've gone to a little pain to set up create-darwin-volume.sh so that it'll still behave roughly as it currently does if invoked with no arguments, but to also support passing a function to call when you source or invoke it. I won't say there's a clear path to it yet since I haven't thought through the logistics of scripting this exact case, but I'm inclined to close off some options for now and focus on getting the golden path right--but make the installer scripts flexible enough that people can leverage them rather than just copy and customize them?
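
A minimal sketch of the "default behavior with no arguments, otherwise call the named function" pattern described above. All function names here are hypothetical and are not taken from create-darwin-volume.sh.

```shell
#!/bin/sh
# Hypothetical sketch: run a default action when called with no arguments,
# otherwise treat the first argument as the name of a function to call and
# pass the remaining arguments to it.
default_action() {
    echo "creating volume with defaults"
}

report_status() {
    echo "status: $*"
}

dispatch() {
    if [ "$#" -eq 0 ]; then
        default_action
    else
        "$@"
    fi
}
```

A script built this way would end with `dispatch "$@"`, so running it bare keeps the old behavior while another installer script can source it and call individual functions directly.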

@abathur
Member Author

abathur commented Oct 29, 2020

So you're saying if the volume is encrypted with the password stored in the system keychain, and I sit at the login screen without logging in, it will never mount?

Yes (with FileVault enabled, at least).

I'm also wondering if there's any way we can set up automounting for the volume. The info I'm seeing about /etc/auto_master is all about how to automount a remote volume, but if there's a way for us to mount a local volume, I'm wondering if that would then mean anything that tries to access /nix would block on the volume mounting (assuming of course that doing this would work with the system keychain approach pre-login).

I tried automount a month or so back, but it was also too late to solve the problem.

@mroi

mroi commented Oct 30, 2020

I tried automount a month or so back, but it was also too late to solve the problem.

It may be late, but it should block any process trying to access a file there until the volume is mounted? And I am pretty sure you can automount local volumes as well.

@abathur
Member Author

abathur commented Oct 30, 2020

I am pretty sure you can automount local volumes as well.

Yes.

It may be late, but it should block any process trying to access a file there until the volume is mounted?

Perhaps, but it didn't help apps restore and doesn't solve the problem.

@abathur
Member Author

abathur commented Oct 30, 2020

@mroi @lilyball I think I caught myself lying about automount working with local volumes. Are either of you familiar enough with it to verify?

I'm fairly sure it's a dead end by this point (after banging my head on it again today), but I revisited it to verify for sure whether or not automount was capable of unlocking a volume with keychain credentials (because that might mean we could force a call to automount in the daemon...) and ran into trouble.

I thought I made this work a month ago with a line in /etc/auto_master like /- auto_nix and a direct map like <mountpath> -fstype=apfs :/dev/disk1s7, but I didn't make a note of the mountpath I used, and I haven't been able to get it to work again today.

I hadn't used automount previously, so I suspect that I misinterpreted the output from automount -vc indicating it successfully mapped paths (and those showing up in mount), and didn't look closely enough at diskutil apfs list to notice that the volume wasn't actually mounting.

If you have any success--I also recall having trouble getting it to mount at any root path.

Replace the cumbersome two-step process, involving an expect script,
for passing the generated password in to the security program.
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/about-native-binary/9751/1

@lilyball
Member

lilyball commented Nov 2, 2020

I have not ever actually configured automount myself. I did look through the manpages the other day and was disappointed that it didn't have any direct examples of automounting a local volume.

@abathur
Member Author

abathur commented Nov 2, 2020

@mroi @lilyball I've been searching through some of the underlying components here to see if I can find any way we could trigger, on-demand, the system mechanism that unlocks from keychain. I stumbled on an interesting issue (openwall-com-au/BootUnlock#12) on a project (https://github.com/openwall-com-au/BootUnlock) that is picking at something similar.

I haven't had time to read much of it yet, but a quick skim makes me think it may help answer some timeline questions.

The repo also includes an approach to any process that can use security having access. I've seen this elsewhere but forgot about it. They're making/signing a copy of the security program, so that they can just permit that one.

I have not ever actually configured automount myself. I did look through the manpages the other day and was disappointed that it didn't have any direct examples of automounting a local volume.

Yes. Tricky to search, too. I assume I found an example somewhere, but I haven't been able to re-find it.

@lilyball
Member

lilyball commented Nov 2, 2020

The repo also includes an approach to any process that can use security having access. I've seen this elsewhere but forgot about it. They're making/signing a copy of the security program, so that they can just permit that one.

We really should not go down that rabbit hole. We should refrain from putting anything else on the system outside of the normal Nix locations that's not strictly necessary (which is to say, removing the launch daemon plist should be sufficient to clean up the daemon behavior). And since the volume is expected to be mounted always while the system is running I really would not worry about trying to keep the password secure. Any attacker who has access to the running system can already inspect the contents of the volume, we're encrypting it to protect against offline access, and I don't think "attacker has access to running system but can only exfiltrate a few dozen bytes of data, wants to record the password so they can swipe the physical machine later and decrypt the Nix volume" is a practical threat model to care about.

@abathur
Member Author

abathur commented Nov 3, 2020

We really should not go down that rabbit hole.

I agree. Mostly just find it interesting/telling that they haven't found anything better.

Manually invoking diskarbitrationd, or killing it and letting launchd restart it, will both trigger a failed attempt to mount the volume (visible in the logs), so I'm not sure it's the originator of the requests that actually unlock it. If it is, there's just something different about the two contexts...

I spent a while stepping through verbose logs, and tried restarting most of the other components I could spot as involved, but nothing triggered it to automatically mount.

The log entries closest to the actual auth process appear to be APFSUserAgent: [com.apple.diskunlock:Unlock] and APFSUserAgent: [com.apple.diskunlock:Keychain] if anyone wants a starting point.

@galaxy4public

galaxy4public commented Nov 3, 2020

I agree. Mostly just find it interesting/telling that they haven't found anything better.

We did :), but it requires creating a tool that is quite complex. Reverse engineering Apple's SecurityAgentPlugins shows that the HomeDirMechanism, a plugin that prepares the user's home directory right after the system has confirmed that the supplied credentials are valid, can mount different filesystems, including network (AFS, Samba, etc.) and encrypted APFS (if the latter has "Cryptousers" attached in its container keybag).

Now, the challenge is that all Apple tools (fdesetup, diskutil, etc.) are explicitly configured to allow adding this information only to the APFS container where the System volume resides. We are working on a tool that would allow an administrator to add the keybag with "Cryptousers" to any APFS container, but the work is progressing quite slowly.

@abathur
Member Author

abathur commented Nov 10, 2020

Apologies for the radio silence. I've pushed a large ~WIP refactor (fair warning: I hope to obsolete & remove chunks of this, pending discussion) of this PR (and the existing installer).

I think the most-important high-level changes are:

  1. closes the single-user macOS install path
  2. sources create-darwin-volume from the darwin multi-user installer
  3. supports "curing" darwin-volume cruft if any is left from a previous install (removing it if Nix is the only thing in it, editing Nix out otherwise)
  4. extends the "curing" concept one step further into an "uninstaller" proof-of-concept
    • not a full uninstaller--that's out-of-scope here
    • but I want to demonstrate the case because there's naturally a lot of code and UI/X overlap between the tasks, and I suspect that discussion will clarify whether the functionality belongs here or elsewhere
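
The "remove it if Nix is the only thing in it, edit Nix out otherwise" rule from item 3 can be sketched for a single config file as follows. The function name and the pattern argument are assumptions for illustration, and the sketch leaves out the user prompt that the PR asks for before changing anything.

```shell
#!/bin/sh
# Hypothetical sketch of "curing" one config file: delete the file if every
# non-blank line matches the Nix pattern, otherwise rewrite it without them.
cure_file() {
    file="$1"
    pattern="$2"   # extended regex matching the Nix-owned lines
    [ -f "$file" ] || return 0
    if grep -v -E "$pattern" "$file" | grep -q '[^[:space:]]'; then
        # Other content remains: keep the file, drop only the Nix lines.
        tmp="$(mktemp)"
        grep -v -E "$pattern" "$file" > "$tmp"
        cat "$tmp" > "$file"   # overwrite in place to keep owner and mode
        rm -f "$tmp"
        echo "edited"
    else
        # Nothing but Nix entries (or blank lines): remove the file.
        rm -f "$file"
        echo "removed"
    fi
}
```

For a file like /etc/synthetic.conf, a call such as `cure_file /etc/synthetic.conf '^nix$'` (run with the needed privileges) would either strip the nix entry or remove the file entirely.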

I have updated the installer in the first post if you'd like to try it. Otherwise, you can see it in action:

I'll also briefly outline ~motives in case it helps connect any dots. If the above makes (narrative?) sense to you, you can probably ignore the list below.

  • I hoped to skirt some state/ergonomic problems people might run into around uninstall/reinstall (and be a little lazy :) by keeping the volume-mounter generic, but @lilyball poked a few holes in this approach, making it clear that we need to embed the volume UUID. But, this makes detecting existing state more critical.
  • Working through detecting/objecting to these conditions and walking users through fixing them made me suspect it wouldn't be much harder (and better from a dev/community-relations perspective) to just "cure" cruft (with user input).
  • I took the tentative agreement on removing single-user mode for macOS as a green light to partially integrate the volume script and multi-user installer to take advantage of the latter's better UI/X affordances. I'm a little leery of blowback, here, so I've taken some pains to ensure it's still possible to run this script standalone for now.
  • It seemed like the logic, code, and UI idioms I needed to prompt users about curing each component would readily compose into an uninstaller, so I wanted to stretch a little to test out the concepts and see if anything interesting shakes out of it.

@abathur
Member Author

abathur commented Nov 28, 2020

I've done some additional refactoring of the work here. I've changed enough that I think it merits a fresh look, so I've opened a new PR #4289 to take over the work here and am closing this one. If you're subscribed here, you'll probably want to subscribe to that one.
