They are everywhere, we can't see them, what do they want?
I started this post in November 2024, but I interrupted myself to take care of my mental health. This is not really a blog post, but rather my unedited notes from my research into environment variables. I just want to get it out.
How are environment variables accessed?
Python
We can access environment variables from the process environment via the os.environ object.
import os
print(os.environ["HOME"])
A mapping object where keys and values are strings that represent the process environment. For example, environ['HOME'] is the pathname of your home directory (on some platforms), and is equivalent to getenv("HOME") in C.
-- Python3 documentation: os - Miscellaneous operating system interfaces
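A minimal sketch of the common ways to read a variable from Python. The names DEMO_ENV_NOTES and DEMO_ENV_NOTES_MISSING are made up for this illustration and are assumed not to be set in your shell.

```python
import os

# DEMO_ENV_NOTES is a hypothetical name used only for this illustration.
os.environ["DEMO_ENV_NOTES"] = "hello"

# Indexing works like a dict and raises KeyError when the name is missing.
value_by_index = os.environ["DEMO_ENV_NOTES"]

# os.getenv returns None (or the given default) instead of raising.
value_by_getenv = os.getenv("DEMO_ENV_NOTES")
missing_with_default = os.getenv("DEMO_ENV_NOTES_MISSING", "fallback")

try:
    os.environ["DEMO_ENV_NOTES_MISSING"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(value_by_index, value_by_getenv, missing_with_default, raised_key_error)
```

Indexing is handy when a missing variable should be an error; os.getenv or os.environ.get fits better when a default makes sense.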
In the CPython codebase, we can see that os.environ is created here:
# https://github.com/python/cpython Lib/os.py
def _create_environ_mapping() -> _Environ:
    # snip
    data = environ
    return _Environ(data, encodekey, decode, encode, decode)

# unicode environ
environ = _create_environ_mapping()
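The encode and decode arguments hint that os.environ is a string view over lower-level data. As a small sketch, on platforms where os.supports_bytes_environ is true (typically POSIX), the same data can be seen as bytes through os.environb. DEMO_WRAPPED is a made-up name for this example.

```python
import os

# DEMO_WRAPPED is a hypothetical name used only for this illustration.
os.environ["DEMO_WRAPPED"] = "some value"

if os.supports_bytes_environ:
    # os.environb is the bytes view of the same environment.
    raw = os.environb[b"DEMO_WRAPPED"]
    # os.fsdecode applies the filesystem encoding, as os.environ does.
    roundtrip = os.fsdecode(raw)
else:
    # No bytes view on this platform (e.g. Windows); fall back to str.
    raw = None
    roundtrip = os.environ["DEMO_WRAPPED"]

print(type(raw).__name__, roundtrip)
```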
Earlier in the same module (Lib/os.py), the environ name used above comes from a star import:
# https://github.com/python/cpython Lib/os.py
if 'posix' in _names:
    name = 'posix'
    linesep = '\n'
    from posix import *
Let's go see the posix module in /Modules/posixmodule.c.
It looks like every OS has its own way of handling environment variables. But if we look only at Linux, we need to look somewhere else.
// https://github.com/python/cpython Modules/posixmodule.c
/* On Darwin/MacOSX a shared library or framework has no access to
** environ directly, we must obtain it with _NSGetEnviron(). See also
** man environ(7).
*/
extern char **environ;
man environ
NAME
environ – user environment
SYNOPSIS
extern char **environ;
DESCRIPTION
An array of strings called the environment is made available by execve(2)
when a process begins. By convention these strings have the form “name=value”.
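To make the "name=value" convention concrete, here is a small sketch that turns such an array of strings into a dictionary. The function parse_environment and the sample entries are hypothetical, written only for this note. Keeping the first occurrence of a repeated name and skipping malformed entries are choices made for this sketch.

```python
def parse_environment(entries):
    """Turn a list of b"name=value" strings into a dict of bytes."""
    parsed = {}
    for entry in entries:
        # Split on the first "=" only: values may contain "=" themselves.
        name, sep, value = entry.partition(b"=")
        if not sep or not name:
            continue  # skip entries that don't follow the convention
        # Keep the first occurrence if a name repeats (sketch choice).
        parsed.setdefault(name, value)
    return parsed


sample = [b"HOME=/home/demo", b"OPTS=a=b", b"EMPTY=", b"malformed"]
result = parse_environment(sample)
print(result)
```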
man execve
NAME
execve – execute a file
SYNOPSIS
#include <unistd.h>
int
execve(const char *path, char *const argv[], char *const envp[]);
DESCRIPTION
execve() transforms the calling process into a new process. The new process is constructed from an ordinary file, whose name is pointed to by path,
called the new process file. This file is either an executable object file, or a file of data for an interpreter.
So that's how Python scripts get access to the environment: execve makes the environ table available to the process.
-> execve: https://man7.org/linux/man-pages/man2/execve.2.html
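A quick way to watch the environment being handed over at process creation is to start a child process with an explicit environment. This is only a sketch: it uses subprocess.run, whose env argument becomes the environment of the new process, and DEMO_CHILD_VAR is a made-up name.

```python
import os
import subprocess
import sys

# Copy the current environment and add one hypothetical variable.
child_env = dict(os.environ, DEMO_CHILD_VAR="from-parent")

# The child is a fresh Python interpreter that prints what it received.
child_code = "import os; print(os.environ.get('DEMO_CHILD_VAR'))"
completed = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
child_output = completed.stdout.strip()
print(child_output)
```

The variable is only placed in the dictionary passed as env, yet the child can read it through its own os.environ.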
Rust
This time we start from std::env.
use std::env;

let key = "HOME";
match env::var(key) {
    Ok(val) => println!("{key}: {val:?}"),
    Err(e) => println!("couldn't interpret {key}: {e}"),
}
Let's look at the code of the standard library in library/std/src/env.rs.
// https://github.com/rust-lang/rust/ library/std/src/env.rs
#[stable(feature = "env", since = "1.0.0")]
pub fn var<K: AsRef<OsStr>>(key: K) -> Result<String, VarError> {
    _var(key.as_ref())
}

fn _var(key: &OsStr) -> Result<String, VarError> {
    match var_os(key) {
        Some(s) => s.into_string().map_err(VarError::NotUnicode),
        None => Err(VarError::NotPresent),
    }
}

#[must_use]
#[stable(feature = "env", since = "1.0.0")]
pub fn var_os<K: AsRef<OsStr>>(key: K) -> Option<OsString> {
    _var_os(key.as_ref())
}

fn _var_os(key: &OsStr) -> Option<OsString> {
    os_imp::getenv(key)
}
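The split between var_os (raw value, maybe absent) and var (value that must also be valid Unicode) can be mimicked in Python to see the two failure modes side by side. This is a rough analogue and not how Rust implements it: the names var, var_os, NotPresent and NotUnicode below are re-created for illustration, and the sketch assumes a POSIX platform where os.environb exists.

```python
import os


class NotPresent(Exception):
    """The variable is not set (analogue of VarError::NotPresent)."""


class NotUnicode(Exception):
    """The value is not valid UTF-8 (analogue of VarError::NotUnicode)."""


def var_os(key):
    """Return the raw bytes value, or None if the variable is not set."""
    return os.environb.get(os.fsencode(key))


def var(key):
    """Return the value as str, raising on absence or invalid UTF-8."""
    raw = var_os(key)
    if raw is None:
        raise NotPresent(key)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        raise NotUnicode(raw) from None


# Hypothetical variables used only for this illustration.
os.environb[b"DEMO_TEXT"] = b"plain"
os.environb[b"DEMO_BYTES"] = b"\xff\xfe"  # not valid UTF-8

print(var("DEMO_TEXT"), var_os("DEMO_BYTES"))
```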
Earlier in env.rs, we can find the following import, which I imagine hosts the OS-specific implementation of getenv:
use crate::sys::os as os_imp;
// https://github.com/rust-lang/rust/ library/std/src/sys/mod.rs
/// The PAL (platform abstraction layer) contains platform-specific abstractions
/// for implementing the features in the other submodules, e.g. UNIX file
/// descriptors.
mod pal;
Ah, here we go!
We finally end up at the same point as with CPython: accessing the environ variable, with the same warning about Apple doing something different. Note that the getenv function calls into libc rather than going through the environ() function.
// https://github.com/rust-lang/rust/ library/std/src/sys/pal/unix/os.rs
// Use `_NSGetEnviron` on Apple platforms.
//
// `_NSGetEnviron` is the documented alternative (see `man environ`), and has
// been available since the first versions of both macOS and iOS.
//
// Nowadays, specifically since macOS 10.8, `environ` has been exposed through
// `libdyld.dylib`, which is linked via. `libSystem.dylib`:
// <https://github.com/apple-oss-distributions/dyld/blob/dyld-1160.6/libdyld/libdyldGlue.cpp#L913>
//
// ...
#[cfg(target_vendor = "apple")]
pub unsafe fn environ() -> *mut *const *const c_char {
    libc::_NSGetEnviron() as *mut *const *const c_char
}

// Use the `environ` static which is part of POSIX.
#[cfg(not(target_vendor = "apple"))]
pub unsafe fn environ() -> *mut *const *const c_char {
    extern "C" {
        static mut environ: *const *const c_char;
    }
    &raw mut environ
}
// ...
pub fn getenv(k: &OsStr) -> Option<OsString> {
    // environment variables with a nul byte can't be set, so their value is
    // always None as well
    run_with_cstr(k.as_bytes(), &|k| {
        let _guard = env_read_lock();
        let v = unsafe { libc::getenv(k.as_ptr()) } as *const libc::c_char;

        if v.is_null() {
            Ok(None)
        } else {
            // SAFETY: `v` cannot be mutated while executing this line since we've a read lock
            let bytes = unsafe { CStr::from_ptr(v) }.to_bytes().to_vec();

            Ok(Some(OsStringExt::from_vec(bytes)))
        }
    })
    .ok()
    .flatten()
}
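The comment about nul bytes in the Rust getenv has a visible counterpart in Python. As a small sketch, trying to set a variable whose name contains a nul byte is rejected, and looking it up simply finds nothing. The name below is made up for this illustration.

```python
import os

# A hypothetical name containing an embedded nul byte.
bad_name = "DEMO\0NAME"

try:
    os.environ[bad_name] = "x"
    set_failed = False
except ValueError:
    # The name cannot be passed to the C level as a nul-terminated string.
    set_failed = True

# Since it can never be set, a lookup finds nothing.
lookup = os.environ.get(bad_name)
print(set_failed, lookup)
```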
environ - the place where the variables reside
So, programs, scripts & friends have access to this piece of memory that contains the environment variables.
Some related links:
- environ variable defined by the UNIX specification https://en.wikipedia.org/wiki/Single_UNIX_Specification
- environ in "The Single UNIX Specification, Version 2" https://pubs.opengroup.org/onlinepubs/007908775/xsh/environ.html
- getenv in "The Single UNIX Specification, Version 2" https://pubs.opengroup.org/onlinepubs/007908775/xsh/getenv.html
- list of all commonly used variables https://pubs.opengroup.org/onlinepubs/007908775/xbd/envvar.html