
They are everywhere, we can't see them, what do they want?

Note!

I started this post in November 2024, but I interrupted myself to take care of my mental health. This is not really a blog post, but rather my unedited notes from my research into environment variables. I just want to get it out.

How are environment variables accessed?

Python

We can access environment variables from the process environment via the os.environ object.

import os
print(os.environ["HOME"])

A mapping object where keys and values are strings that represent the process environment. For example, environ['HOME'] is the pathname of your home directory (on some platforms), and is equivalent to getenv("HOME") in C.

-- Python3 documentation: os - Miscellaneous operating system interfaces
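To make the "mapping of strings" part concrete, here is a minimal sketch of how os.environ behaves in practice. The variable names DEMO_SURELY_UNSET_VARIABLE and DEMO_NUMBER are made up for this illustration.

```python
import os

# Hypothetical variable names, used only for this illustration.
MISSING = "DEMO_SURELY_UNSET_VARIABLE"
os.environ.pop(MISSING, None)  # make sure it really is unset

# Indexing behaves like a dict: a missing key raises KeyError.
try:
    os.environ[MISSING]
except KeyError:
    print("indexing a missing variable raises KeyError")

# .get() and os.getenv() both accept a fallback value instead.
print(os.environ.get(MISSING, "fallback"))
print(os.getenv(MISSING, "fallback"))

# Keys and values must be strings; anything else is rejected.
try:
    os.environ["DEMO_NUMBER"] = 42
except TypeError as exc:
    print(f"non-string value rejected: {exc}")

os.environ["DEMO_NUMBER"] = str(42)
print(os.environ["DEMO_NUMBER"])
```

Since everything is a string, any number or boolean stored in the environment has to be converted on the way in and parsed on the way out.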

In the CPython codebase, we can see where os.environ is created:

# https://github.com/python/cpython Lib/os.py
def _create_environ_mapping() -> _Environ:
    # snip
    data = environ
    return _Environ(data, encodekey, decode, encode, decode)

# unicode environ
environ = _create_environ_mapping()

Earlier in this module, we can see where the environ used as data above comes from: a star import of the posix module.

# https://github.com/python/cpython Lib/os.py
if 'posix' in _names:
    name = 'posix'
    linesep = '\n'
    from posix import *
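The two layers can be compared directly from Python. This is a small sketch that assumes a POSIX system (the posix module cannot be imported elsewhere); entry_types is a helper I made up for the comparison.

```python
import os
import posix  # only importable on POSIX systems


def entry_types(mapping):
    """Return the set of Python types used for the mapping's keys and values."""
    return {type(part) for item in mapping.items() for part in item}


# Raw layer: the dict built by the C code in the posix module.
print(entry_types(posix.environ))

# Decoded layer: the _Environ wrapper that os.py builds on top of it.
print(entry_types(os.environ))
```

On Unix the first line should report bytes and the second str, which is why os.py passes those encode and decode helpers to _Environ.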

Let's go see the posix module in /Modules/posixmodule.c.

It looks like every OS has its own way of handling environment variables. But if we look only at Linux, we need to look somewhere else.

// https://github.com/python/cpython Modules/posixmodule.c

/* On Darwin/MacOSX a shared library or framework has no access to
** environ directly, we must obtain it with _NSGetEnviron(). See also
** man environ(7).
*/

extern char **environ;

man environ
NAME
     environ – user environment

SYNOPSIS
     extern char **environ;

DESCRIPTION
     An array of strings called the environment is made available by execve(2)
     when a process begins. By convention these strings have the form “name=value”.

man execve
NAME
     execve – execute a file

SYNOPSIS
     #include <unistd.h>

     int
     execve(const char *path, char *const argv[], char *const envp[]);

DESCRIPTION
     execve() transforms the calling process into a new process.  The new process is constructed from an ordinary file, whose name is pointed to by path,
     called the new process file.  This file is either an executable object file, or a file of data for an interpreter.

So that's how Python scripts get access to the environment: execve makes the environ table available to the process.

-> execve: https://man7.org/linux/man-pages/man2/execve.2.html
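The hand-off can be observed from the parent side too. Below is a minimal sketch in which a parent process chooses the environment for a child. As far as I understand, on POSIX the env mapping given to subprocess.run ends up as the envp argument of an exec call, but treat that as my assumption. DEMO_GREETING and child_sees are names made up for the example.

```python
import os
import subprocess
import sys


def child_sees(name, env):
    """Start a child Python process with `env` and return its value for `name`."""
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env,               # the environment handed to the new process
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Copy the parent's environment and add one hypothetical variable.
child_env = dict(os.environ, DEMO_GREETING="hello")

print(child_sees("DEMO_GREETING", child_env))  # value as seen by the child
print("DEMO_GREETING" in os.environ)           # the parent's own environment is untouched
```

The child only ever sees the table it was started with; nothing flows back to the parent.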

Rust

This time we start from std::env.

use std::env;

let key = "HOME";
match env::var(key) {
    Ok(val) => println!("{key}: {val:?}"),
    Err(e) => println!("couldn't interpret {key}: {e}"),
}

Let's look at the code of the standard library in library/std/src/env.rs.

// https://github.com/rust-lang/rust/ library/std/src/env.rs
#[stable(feature = "env", since = "1.0.0")]
pub fn var<K: AsRef<OsStr>>(key: K) -> Result<String, VarError> {
    _var(key.as_ref())
}

fn _var(key: &OsStr) -> Result<String, VarError> {
    match var_os(key) {
        Some(s) => s.into_string().map_err(VarError::NotUnicode),
        None => Err(VarError::NotPresent),
    }
}

#[must_use]
#[stable(feature = "env", since = "1.0.0")]
pub fn var_os<K: AsRef<OsStr>>(key: K) -> Option<OsString> {
    _var_os(key.as_ref())
}

fn _var_os(key: &OsStr) -> Option<OsString> {
    os_imp::getenv(key)
}
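To make the layering easier to follow, here is a rough Python analogue of var and var_os. This is my own sketch, not how Rust or CPython implement anything: it assumes a POSIX system (os.environb does not exist elsewhere), and the two exception classes are invented stand-ins for the VarError variants.

```python
import os
from typing import Optional


class NotPresent(Exception):
    """Stand-in for VarError::NotPresent."""


class NotUnicode(Exception):
    """Stand-in for VarError::NotUnicode."""


def var_os(key: str) -> Optional[bytes]:
    """Raw lookup: return the value's bytes, or None if the variable is unset."""
    return os.environb.get(os.fsencode(key))


def var(key: str) -> str:
    """Strict lookup: like var_os, but the value must decode as UTF-8."""
    raw = var_os(key)
    if raw is None:
        raise NotPresent(key)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise NotUnicode(raw) from exc
```

The point is that the "does it exist" question is answered by the raw lookup, and the Unicode check is a separate step layered on top.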

Earlier in the file, we find the following import, which I imagine hosts the OS-specific implementation of getenv:

use crate::sys::os as os_imp;

// https://github.com/rust-lang/rust/ library/std/src/sys/mod.rs

/// The PAL (platform abstraction layer) contains platform-specific abstractions
/// for implementing the features in the other submodules, e.g. UNIX file
/// descriptors.
mod pal;

Ah, here we go!

We finally end up at the same point as with CPython: accessing the environ variable. Same warning about Apple doing something different. Note that the getenv function calls libc's getenv rather than going through the environ() function.

// https://github.com/rust-lang/rust/ library/std/src/sys/pal/unix/os.rs


// Use `_NSGetEnviron` on Apple platforms.
//
// `_NSGetEnviron` is the documented alternative (see `man environ`), and has
// been available since the first versions of both macOS and iOS.
//
// Nowadays, specifically since macOS 10.8, `environ` has been exposed through
// `libdyld.dylib`, which is linked via. `libSystem.dylib`:
// <https://github.com/apple-oss-distributions/dyld/blob/dyld-1160.6/libdyld/libdyldGlue.cpp#L913>
//
// ...
#[cfg(target_vendor = "apple")]
pub unsafe fn environ() -> *mut *const *const c_char {
    libc::_NSGetEnviron() as *mut *const *const c_char
}

// Use the `environ` static which is part of POSIX.
#[cfg(not(target_vendor = "apple"))]
pub unsafe fn environ() -> *mut *const *const c_char {
    extern "C" {
        static mut environ: *const *const c_char;
    }
    &raw mut environ
}


// ...

pub fn getenv(k: &OsStr) -> Option<OsString> {
    // environment variables with a nul byte can't be set, so their value is
    // always None as well
    run_with_cstr(k.as_bytes(), &|k| {
        let _guard = env_read_lock();
        let v = unsafe { libc::getenv(k.as_ptr()) } as *const libc::c_char;

        if v.is_null() {
            Ok(None)
        } else {
            // SAFETY: `v` cannot be mutated while executing this line since we've a read lock
            let bytes = unsafe { CStr::from_ptr(v) }.to_bytes().to_vec();

            Ok(Some(OsStringExt::from_vec(bytes)))
        }
    })
    .ok()
    .flatten()
}
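The same libc getenv can be called from Python through ctypes, which shows that both languages end up asking the same C library. A minimal sketch, assuming a Linux-like system where ctypes.CDLL(None) gives access to the libc symbols already loaded in the process; c_getenv and DEMO_FROM_PYTHON are names I made up.

```python
import ctypes
import os

# Assumption: on Linux, CDLL(None) exposes the symbols of the running
# process, which include libc's getenv.
libc = ctypes.CDLL(None)
libc.getenv.argtypes = [ctypes.c_char_p]
libc.getenv.restype = ctypes.c_char_p


def c_getenv(name):
    """Look up `name` with libc's getenv; return None when it is unset."""
    raw = libc.getenv(os.fsencode(name))
    return None if raw is None else os.fsdecode(raw)


# Assignments to os.environ are forwarded to the C library,
# so the C-level lookup sees them.
os.environ["DEMO_FROM_PYTHON"] = "visible to C"
print(c_getenv("DEMO_FROM_PYTHON"))
```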

environ - the place where the variables reside

So, programs, scripts & friends have access to this piece of memory that contains the environment variables.
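We can even walk that piece of memory by hand. This sketch reads the C-level environ symbol with ctypes and follows the array until its NULL terminator. It assumes Linux with glibc (Apple platforms want _NSGetEnviron, as noted above); read_c_environ and DEMO_WALK are made-up names.

```python
import ctypes
import os


def read_c_environ():
    """Return the raw b"name=value" entries of the C `environ` array."""
    libc = ctypes.CDLL(None)  # assumption: exposes libc's `environ` on Linux
    environ = ctypes.POINTER(ctypes.c_char_p).in_dll(libc, "environ")
    entries = []
    if not environ:  # a NULL pointer means there is no environment at all
        return entries
    index = 0
    # The array ends with a NULL pointer, which ctypes reports as None.
    while environ[index] is not None:
        entries.append(environ[index])
        index += 1
    return entries


os.environ["DEMO_WALK"] = "1"
for entry in read_c_environ()[:3]:
    print(entry)
```

Each entry is a single name=value byte string, exactly the convention described in the environ man page.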

Some related links: