Cogs and Levers A blog full of technical stuff

Linux PAM

Introduction

PAM (Pluggable Authentication Modules) is a flexible mechanism for authenticating users, which has become a fundamental part of system security on Linux and other Unix-like operating systems. PAM abstracts a range of authentication tasks into modular plugins that can be tailored to suit the needs of system administrators, providing a way to develop programs that are independent of authentication scheme specifics. This modularity not only enhances security but also simplifies the management of user authentication.

In today’s guide, we will delve into the process of creating a program that interacts with the Linux passwd system using PAM. By the end of this article, you’ll have a clear understanding of how to harness PAM’s capabilities to authenticate users in your own applications, ensuring secure and efficient access control.

Architecture

PAM (Pluggable Authentication Modules) employ a unique architecture that separates the specific implementation of authentication methods from the application programming interface (API).

At the heart of this design is the concept of “conversations,” a mechanism that facilitates communication between the application and the authentication modules.

This conversation-based model allows PAM to present a uniform interface to the application, regardless of the underlying authentication process.

As a result, developers can integrate various authentication technologies into their applications without having to tailor their code to each method.

Instead, they rely on PAM to handle the specifics through configurable modules, each responsible for a different aspect of the authentication process.

This abstraction not only simplifies development but also enhances the flexibility and scalability of security systems, accommodating a wide range of authentication schemes with minimal changes to core application code.

Conversation

The conversation function is the callback that PAM uses for interactivity. In this code snippet, we’re using the password of "password" to stuff into the response to use. You could use the conversation opportunity to ask for input from the user.

int converse(int num_msg, const struct pam_message **msg, struct pam_response **resp, void *appdata_ptr) {
    *resp = NULL;
    
    if (num_msg <= 0 || num_msg > PAM_MAX_NUM_MSG) {
        return PAM_CONV_ERR;
    }
    
    struct pam_response *response =
        malloc(num_msg * sizeof(struct pam_response));

    if (response == NULL) {
        return PAM_BUF_ERR;
    }
    
    for (int i = 0; i < num_msg; ++i) {
        if (msg[i]->msg_style == PAM_PROMPT_ECHO_OFF) {
            // supply the password directly here
            const char *password = "password";
            response[i].resp = strdup(password);
        } else {
            // Handle other message styles if needed
            response[i].resp = NULL;
        }
        response[i].resp_retcode = 0;
    }
    
    *resp = response;
    return PAM_SUCCESS;
}

You can see that there’s a preference asked for here with the message style of PAM_PROMPT_ECHO_OFF. This would be an indicator back to the user interface to not echo the user’s keystrokes to the interface as they type.

Test Harness

We can now use this conversation function with the pam library:

int main() {
    const char *username = "username";
    const char *service = "login";

    pam_handle_t *pamh = NULL;
    struct pam_conv conv = { converse, NULL };

    int retval = pam_start(service, username, &conv, &pamh);

    if (retval != PAM_SUCCESS) {
        fprintf(stderr, "pam_start failed: %s\n", pam_strerror(pamh, retval));
        return 1;
    }

    retval = pam_authenticate(pamh, 0);

    if (retval != PAM_SUCCESS) {
        fprintf(stderr, "Authentication failed: %s\n", pam_strerror(pamh, retval));
        return 1;
    }

    printf("Authentication successful!\n");

    pam_end(pamh, retval);
    return 0;
}

The pam_start function begins the authentication conversation. The first parameter supplied is the service, and in this case it’s set to "login". PAM uses this value and looks for a configuration file named /etc/pam.d/login. This is pretty standard on any system. This particular service is for the “Shadow ‘login’ service.

The username variable should contain the name of a registered user.

Our conversation function converse is supplied to pam_start via the conv variable.

We then use pam_authenticate to preform the conversation, and pam_end will do any clean up for us.

A full example

#include <security/pam_appl.h>
#include <security/pam_misc.h>
#include <stdio.h>

int converse(int num_msg, const struct pam_message **msg, struct pam_response **resp, void *appdata_ptr) {
    *resp = NULL;
    
    if (num_msg <= 0 || num_msg > PAM_MAX_NUM_MSG) {
        return PAM_CONV_ERR;
    }
    
    struct pam_response *response =
        malloc(num_msg * sizeof(struct pam_response));

    if (response == NULL) {
        return PAM_BUF_ERR;
    }
    
    for (int i = 0; i < num_msg; ++i) {
        if (msg[i]->msg_style == PAM_PROMPT_ECHO_OFF) {
            // supply the password directly here
            const char *password = "password";
            response[i].resp = strdup(password);
        } else {
            // Handle other message styles if needed
            response[i].resp = NULL;
        }
        response[i].resp_retcode = 0;
    }
    
    *resp = response;
    return PAM_SUCCESS;
}

int main() {
    const char *username = "username";
    const char *service = "login";

    pam_handle_t *pamh = NULL;
    struct pam_conv conv = { converse, NULL };

    int retval = pam_start(service, username, &conv, &pamh);

    if (retval != PAM_SUCCESS) {
        fprintf(stderr, "pam_start failed: %s\n", pam_strerror(pamh, retval));
        return 1;
    }

    retval = pam_authenticate(pamh, 0);

    if (retval != PAM_SUCCESS) {
        fprintf(stderr, "Authentication failed: %s\n", pam_strerror(pamh, retval));
        return 1;
    }

    printf("Authentication successful!\n");

    pam_end(pamh, retval);
    return 0;
}

Building

In order to build this test program you need to link with pam and pam_misc.

gcc auth.c -o auth -lpam -lpam_misc

Arbitrary length arithmetic with GMP

Introduction

GMP is a library that will allow you to perform calculations on numbers that extend past the reach of what your standard data sizes can hold. From their website:

GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating-point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.

This means you can embed large math into your programs and will only be limited by the amount of memory on your running system.

In today’s article, we’ll go through a few simple examples on how to use this library.

Getting setup

Get gmp installed locally on your system either by downloading the latest release from their site, or just using your package manager.

sudo apt-get install libgmp10

Building

For any program that will require the gmp library, we’ll need to add the -lgmp switch:

gcc test.c -o test -lgmp

Now, we’re ready to go.

Factorial

As an example implementation, we’ll write a program to calculate the n’th factorial for us. First of all, we’ll implement this using traditional data types offered to us through C, and then we’ll swap this out to get greater numbers.

/* test1.c */
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/** Compute the n'th factorial */
int factorial(int n) {
  int i;
  int x = 1;

  for (i = 1; i <= n; ++i) {
    x = x * i;
  }

  return x;
}

int main(int argc, char *argv[]) {
  int x;

  /* check that a number was specified */
  if (argc <= 1) {
    printf("usage: %s <number>\n", argv[0]);
    return 1;
  }

  /* convert the input to a number */
  x = atoi(argv[1]);
  assert(x >= 0);

  printf("%d\n", factorial(x));

  return 0;
}

We build this application not needing the gmp library:

gcc test1.c -o test1

We can then start to test it out.

$ ./test1 2  
2
$ ./test1 4
24
$ ./test1 5
120
$ ./test1 8
40320
$ ./test1 15
2004310016

The wheels start to fall off once we want to look at numbers higher than !19.

$ ./test1 19
109641728
$ ./test1 20
-2102132736

We overflowed our integer to where it wrapped into negative numbers. 19 is our limit for traditional data types.

Make the numbers bigger!

Now we can introduce gmp to help us break out of these constraints.

We’ll rewrite the factorial function above to operate on gmp types, and we’ll also convert the body of our program into a input capture function that will parse text into a gmp type for us.

Let’s deal with the input first:

void atoi_mpz(char *s, mpz_t res) {
  assert(s);
  
  int parse_result;

  /* allocate our number, and set an initial value of 0 */
  mpz_set_ui(res, 0);

  /* we assume base-10 when parsing */
  parse_result = mpz_set_str(res, s, 10);
  /* check that parsing was successful */
  assert(parse_result == 0);
}

First, you’ll notice that we’re not returning anything here. The mpz_t type is typed as an array, and as such can’t be used as a return. So, we supply it as an output parameter. This pattern will reoccur through these examples.

This function also assumes that res has already had mpz_init run on it, so it’s not magically allocating resources on your behalf.

mpz_set_ui sets the initial state of an mpz_t with a value from an integer (the real world!). mpz_set_str is really doing most of the work for us here, parsing out a string that its given into a mpz_t. The base needs to be supplied.

Now the factorial function will need to change as you’d expect:

/** Compute the n'th factorial */
void factorial(mpz_t n, mpz_t res) {
  mpz_t i;

  /* initialize x and set it to n */
  mpz_set(res, n);

  /* initialize i to 1 */
  mpz_init(i);
  mpz_set_ui(i, 1);

  while (mpz_cmp(i, n) < 0) {
    mpz_add_ui(i, i, 1);
    mpz_mul(res, res, i);
  }
 
  /* clean up work vars */
  mpz_clear(i);
}

Again, res is an output parameter with no return value. We do have a “work” variable here, so we set it up and destroy it all within the context of our function so we don’t have a memory leak.

mpz_cmp takes care of the looping for us. We’ve substituted a for loop here for a while loop to accommodate.

Full program!

Now we can use these functions in a program of our own. Here’s a full listing of an application that will give you the factorial of an arbitrary length integer.

/* test2.c */
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#include <gmp.h>

/** Compute the n'th factorial */
void factorial(mpz_t n, mpz_t res) {
  mpz_t i;

  /* inintialize x and set it to n */
  mpz_set(res, n);

  /* initialize i to 1 */
  mpz_init(i);
  mpz_set_ui(i, 1);

  while (mpz_cmp(i, n) < 0) {
    mpz_add_ui(i, i, 1);
    mpz_mul(res, res, i);
  }
 
  /* clean up work vars */
  mpz_clear(i);
}

void atoi_mpz(char *s, mpz_t res) {
  assert(s);
  
  int parse_result;

  /* allocate our number, and set an initial value of 0 */
  mpz_set_ui(res, 0);

  /* we assume base-10 when parsing */
  parse_result = mpz_set_str(res, s, 10);
  /* check that parsing was successful */
  assert(parse_result == 0);
}

int main(int argc, char *argv[]) {
  mpz_t x, f;

  /* check that a number was specified */
  if (argc <= 1) {
    printf("usage: %s <number>\n", argv[0]);
    return 1;
  }

  mpz_init(f);
  mpz_init(x);

  atoi_mpz(argv[1], x);
  factorial(x, f);

  mpz_out_str(stdout, 10, f);

  mpz_clear(x);
  mpz_clear(f);

  return 0;
}

We write the result number here with mpz_out_str. This can redirected to any stream of your choice.

Building this program is just adding the lgmp switch.

gcc test2.c -o test2 -lgmp

And, now you can calculate factorials as high as you like:

$ ./test2 5
600
$ ./test2 50
1520704660085668902180630408303238442218882078448025600000000000000
$ ./test2 508
2600912719299986667129836551161461729376055224067622828051124610945736767140539144806269014448212661926636052457507242531933990603974750482248704223132435196370906188185711501338510076590779372857463616608930177975159159937930181522646905544337180113404654114353667383757400970162882750213805498362693213099688856210937449924868526775622360531973548790490474776119049615868165281085383657990705679209654697390907170752684309098744014502095309729615191639442748317478208592870187172780119819882401997425092893189361279318323502318101157291409635548371757588486399668194699833678157051688803686737440480747043533781525086057498368946374944369614791535224963761053372201191517194843602022799832918767988197763611904441080442538109356964428607751413413409857782381537871839805616626750150075526770316820978596423627176357508773710490144878649275528649481402223674961313005296813402084900146024228278827847133177811762165315793180635312968824483728409835808810724283538893336754191301602850516144797238826325624248081953676797029938259558400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

systemd Services and Sockets

Introduction

systemd is a set of basic tools that any system can use to build more sophisticated service applications. Using these building you can create units which can be a:

  • service
  • socket
  • device
  • mount
  • automount
  • swap
  • target
  • path
  • timer
  • slice
  • scope

In today’s article, we’ll go through an example that uses service and socket to build a simple server.

Hello, World

To start with, let’s create a “Hello, World” service that will do nothing more than take your connection and send you back the string "Hello, world". First we define our service in a file. Ours is hw.service.

You can install this at ~/.config/systemd/user/hw.service.

# hw.service
[Unit]
Description=Hello World Service
After=network.target hw.socket
Requires=hw.socket

[Service]
Type=simple
ExecStart=/usr/bin/python3 %h/tmp/serve.py
TimeoutStopSec=5

[Install]
WantedBy=default.target

The ExecStart holds what systemd will hand the socket connection off to. In this case, we’re going to hand the connection off to a python socket server running from our ~/tmp directory.

You can see that our requires hw.socket. It needs to be up before it will respond to requests. You install this one at ~/.config/systemd/user/hw.socket.

# hw.socket
[Unit]
Description=Hello World Socket
PartOf=hw.service

[Socket]
ListenStream=127.0.0.1:7777

[Install]
WantedBy=sockets.target

Our socket will listen on 7777 waiting for connections.

The serve.py mentioned in the service file is what systemd will hand the connection off to. The implementation of that server is a simple socket server:

#!/usr/bin/env python3

from socketserver import TCPServer, StreamRequestHandler
import socket
import logging

class Handler(StreamRequestHandler):
    def handle(self):
        self.data = self.rfile.readline().strip()
        logging.info("From <%s>: %s" % (self.client_address, self.data))
        self.wfile.write("Hello, world!\r\n".encode("utf-8"))

class Server(TCPServer):
    
    # The constant would be better initialized by a systemd module
    SYSTEMD_FIRST_SOCKET_FD = 3

    def __init__(self, server_address, handler_cls):
        # ignore the bind/listen steps
        TCPServer.__init__(
            self, server_address, handler_cls, bind_and_activate=False
        )
        
        # take the socket from systemd
        self.socket = socket.fromfd(
            self.SYSTEMD_FIRST_SOCKET_FD, self.address_family, self.socket_type
        )

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    # host and port values are ignored here
    server = Server(("localhost", 9999), Handler)
    server.serve_forever()

Inside of the Handler class, in the constructor you can see that we avoid the bind and listen steps. This is because systemd has already done this for us. We’re just going to be handed a file descriptor with the socket already attached.

# ignore the bind/listen steps
TCPServer.__init__(
    self, server_address, handler_cls, bind_and_activate=False
)

# take the socket from systemd
self.socket = socket.fromfd(
    self.SYSTEMD_FIRST_SOCKET_FD, self.address_family, self.socket_type
)

That’s exactly what’s happening with fromfd here. We’re given a socket to work with via descriptor 3.

The actual implementation of our handler is not doing much more than taking in the request data, and sending back "Hello World".

def handle(self):
    self.data = self.rfile.readline().strip()
    logging.info("From <%s>: %s" % (self.client_address, self.data))
    self.wfile.write("Hello, world!\r\n".encode("utf-8"))

Getting it installed

You can start your server listening with the following now:

systemctl --user daemon-reload
systemctl --user start hw.socket

You should be up and running.

Testing

You can use telnet to take a look at your server:

➜  ~ telnet 127.0.0.1 7777
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Hello
Hello, world!
Connection closed by foreign host.

Alternatively, you can just use netcat:

➜  ~ echo 'test' | netcat 127.0.0.1 7777 
Hello, world!

Check that it’s working

After you’ve tested it a few times, you’ll be albe to see requests in the logs.

journalctl -f --user-unit hw.service

You should see the lines from the logging.info calls.

Nov 05 18:23:15 ditto systemd[2245]: Started Hello World Service.
Nov 05 18:23:18 ditto python3[9883]: INFO:root:From <('127.0.0.1', 38292)>: b'Hello'

Cleaning up

Once you’re done and you’d like to remove these, simply stop the service, remove the units, and reload.

# stop the service
systemctl --user stop hw.socket

# remove the service and socket
rm ~/.config/systemd/user/hw.*

# reload systemd state
systemctl --user daemon-reload 

Find Gems

Introduction

The find utility is an extremely flexible query tool for your file system. It allows you to navigate and traverse your folder structures.

From the man page:

This manual page documents the GNU version of find. GNU find searches the directory tree rooted at each given file name by evaluating the given expression from left to right, according to the rules of precedence (see section OPERATORS), until the outcome is known (the left hand side is false for and operations, true for or), at which point find moves on to the next file name.

In this article, we’ll go through some usages.

Find by name

You can simply find by name with the -name switch:

find /the/directory -name the-file-name.txt

You can also ignore case by switching out -name for -iname.

Find by permissions

If you’d like to find files with a specific permission (in this case 777):

find . -type f -perm 0777 -print

You can take the converse of this with !:

find . -type f ! -perm 0777 -print

Average file size

Let’s say you have a directory of text files, and you’d like to find the average size of them. By joining find with awk, you can use the following to do just that:

find . -type f -exec wc -w {} \; | awk '{numfiles=numfiles+1;total += $1} END{print total/numfiles}'

Random sampling

You may need to take a random sample of files in a given folder. You can start this process off with find, and then through clever usage of sort, and tail you can get a random sample.

This will take 500 random files from a directory:

find /the/path -type f -print0 | sort -zR | head -zn 500

find /the/path -type f -print0 prints out the files using \0 as a delimiter thanks to -print0.

sort is told to use \0 as its delimiter with -z and -R is to sort them randomly.

head now takes the first 500, again using \0 as the delimiter.

Parquet

The Apache Parquet file format is used widely in the data space. It’s a column-oriented format that focuses on storing data as efficiently as possible, with emphasis on data retrieval.

Why?

The most common storage format that you’d use to hold data is CSV. It’s a good, simple format but has quite a number of short-comings when it comes to data analysis.

Parquet is a column-oriented format which is much more sympathetic to data analysis. CSV is row-oriented, which is a much better application for an OLTP scenario.

Parquet offers compression and partitioning that is simply not available to the CSV format.

So, it stores information - but is better suited to the data warehouse idea.

Python

A really easy way to get started using Parquet is with Python. The PyArrow library is a set of utilities that allow you to work with in-memory analytics tools. PyArrow plays nicely with Pandas and NumPy so it’s a good fit.

Make sure you have pyarrow installed as a dependency.

import pandas as pd
import numpy as np
import pyarrow as pa

import pyarrow.parquet as pq

Create a table

First off, we’ll create a DataFrame from some raw data.

data = [
    ["John", "Smith", 23],
    ["Amber", "Brown", 31],
    ["Mark", "Green", 22],
    ["Jane", "Thomas", 26]
]

df = pd.DataFrame(
    data, 
    columns=["first_name", "last_name", "age"]
)

We can then use from_pandas function to create a pyarrow.Table from this DataFrame.

table = pa.Table.from_pandas(df)

Basic I/O

Now that we have loaded a Table, we can save this data using write_table and pick it back up off disk using read_table.

pq.write_table(table, 'people.parquet')

people = pq.read_table('people.parquet')
people.to_pandas()

Finishing up

There’s a lot more benefits to picking a more optimal data storage solution when working in the data analytics space.