How do I use my PTO? Part 2 – Math to Code

In Part 1 of this series, we sketched out our decision problem of choosing which trips to take given the amount of PTO days we have as a (barebones) optimization problem. That’s great and all (I’ve always like just looking at mathematical notation), but we should be able to solve this problem to get an answer to which trips should I take?

In this article, I’ll move the math to code. The code I show in this article is available in a jupyter notebook in a GitHub repository. Just as I mentioned at the end of the article in Part 1, this whole process is iterative. I will gradually modify our problem definition and code in future article to account for more complex preferences and constraints. These modifications will appear in new jupyter notebooks.

Importing our toolset

We are using python to solve the problem we defined in Part 1. I’m going to assume you have python already set up and know how to install libraries/modules.

The modules we need we import in the following:

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from ortools.linear_solver import pywraplp

We use…

  • numpy to construct vectors and matrices (arrays),
  • pandas to manage and manipulate input data,
  • scikit-learn (sklearn) to help us transform some text labels into numerical ones,
  • and ortools to actually setup and solve our optimization problem

Defining data inputs

Defining number of PTO days

This one is easy but must be defined somewhere: the total number of PTO days we get in a year. We’ll store this in a variable:

D_pto = 20

In Part 1, this variable is D_{PTO}. Here, I set our number of PTO days to 20 but it can be set to anything that is applicable to you.

Defining given days off

We need to identify the days in the year that we get off for “free”, i.e. days where we wouldn’t have to spend a PTO day. These freedays could include federal/bank/company holidays and weekends. Let’s start by creating a pandas dataframe of all the days of the year:

yearSeries = pd.date_range('2023-01-01', '2023-12-31', freq='D').to_series()
yearDF = pd.DataFrame(yearSeries, columns = ['Day'])
yearDF['Day of Year'] = yearDF['Day'].dt.dayofyear

In yearDF, each row is a day of the year, and we have a column for the date of that day (“Day”) and what day of the year that day is (“Day of Year”).

Next, we can create a new column called “Freeday” that will act as a flag if a particular day is a freeday. We start below with a list of the days we have free as company holidays, then set their value in the “Freeday” column of the dataframe as true:

company_holidays = [
    '2023-01-02',
    '2023-01-16',
    '2023-04-07',
    '2023-04-10',
    '2023-05-29',
    '2023-07-03',
    '2023-07-04',
    '2023-09-04',
    '2023-11-10',
    '2023-11-23',
    '2023-11-24',
    '2023-12-25',
    '2023-12-26',
    '2023-12-27',
    '2023-12-28',
    '2023-12-29',
]

yearDF['Freeday'] = yearDF['Day'].isin(pd.to_datetime(company_holidays))

These are example days and should be changed to whatever is applicable to you. However, we need to keep the format of YEAR-MONTH-DAY when defining these dates.

Now we find all the Saturdays (day 5 of the week) and all Sundays (day 6 of the week) and set them to be freedays as well:

yearDF.loc[(yearDF['Day'].dt.dayofweek == 5)|(yearDF['Day'].dt.dayofweek == 6), 'Freeday'] = True

Defining trips options

To serve as example trip options, I collected over a thousand trips offered by G Adventures in 2023. We will go over how I did this programmatically in a future post ;). Trip data includes the itinerary name, the start day of year, and the end day of year. I saved this data in a *.csv file that we can import:

tripsDF = pd.read_csv('trips/trips.csv')

I have my *.csv file in a folder called trips so point to that folder and file in this line of code.

Defining our constraints

Constraint 1: PTO Day Limit

Recall our first constraint where we can’t go over our PTO limit:

(1)   \begin{equation*}\mathbf{w}^T \mathbf{A}\mathbf{x} \leq D_{PTO}\end{equation*}

where \mathbf{w} is a vector defining workdays vs. freedays and \mathbf{A} is a matrix defining when trips occur during the year.

Let’s create \mathbf{w} using our “Freeday” flag in yearDF below:

w = np.ones(365)
w[yearDF.loc[yearDF['Freeday'], 'Day of Year'] - 1] = 0

where we find our freedays defined in yearDf and set their value to 0 signifying that they won’t count in our PTO usage calculations. Note that we are subtracting one from the “Day of Year” because day of year starts with 1 but our indices for \mathbf{w} start with 0.

Now we create \mathbf{A} :

A = np.zeros((365, len(tripsDF)))
for trip_index, row in tripsDF.iterrows():
    day_range = np.arange(row['start_doy'], row['end_doy'] + 1)
    A[day_range - 1, trip_index] = 1

Here, we iterate through each trip, find its start and end days, and set the corresponding row values in the trip’s column to 1.

Constraint 2: Only 1 Trip at a Time

Recall the second constraint that keeps us from recommending two trips that happen at the same time:

(2)   \begin{equation*}\mathbf{A} \mathbf{x} \leq \mathbf{1}_{365}\end{equation*}

Since we already created \mathbf{A} for the first constraint, we just need to create the vector \mathbf{1}_{365}:

Ones_365 = np.ones(365)

Simple…

Constraint 3: No Repeats

The third constraint keeps us from choosing the same itinerary multiple times in a year:

(3)   \begin{equation*}\mathbf{B} \mathbf{x} \leq \mathbf{1}_{T}\end{equation*}

To construct \mathbf{B}, we need to group trips by their itineraries. Below, we use scikit-learn’s LabelEncoder to assign a numerical ID to each itinerary:

itineraryID = LabelEncoder().fit_transform(tripsDF['itinerary_name'])

Next, we iterate through the trips in the tripsDF dataframe to assign each trip to an itinerary as captured by \mathbf{B}:

B = np.zeros((tripsDF['itinerary_name'].nunique(), len(tripsDF)))
for index, row in tripsDF.iterrows():
    B[itineraryID[index], index] = 1

Finally, we create a vector of ones that is as along as the number of unique itineraries we have in our trip set:

Ones_T = np.ones(tripsDF['itinerary_name'].nunique())

Defining our objective

In Part 1, we generically assigned a value to each of our trips, which we store in a value vector \mathbf{v}. Choosing what the value is of a given trip is not trivial as it can be a factor of the trip’s cost, where is it goes, when it is, activities included, etc. Defining value deserves its own article.

For the purposes of this article and to get some working code, let’s at least define a placeholder objective. A reasonable objective is to maximize the number of PTO days taken. Conceivably this objective wouldn’t be the best if we could roll over PTO days from one year to another but if you’re like me, you can’t :(. We know from constraint 1 that \mathbf{w}^T\mathbf{A}\mathbf{x} computes the total number of PTO days needed for the given trip recommendation in \mathbf{x}. So we’ll just use this as our objective.

Putting it together

For safe keeping, we’ll put all of our problem data into a dictionary:

data = dict()

First, we’ll save the number of decision variables we have:

data['num_vars'] = (w@A).shape[0]

Then we’ll stack all of our constraint coefficients into a single numpy array:

data['constraint_coeffs'] = np.vstack((
    w.T@A,
    A,
    B
))

and we’ll do the same with our constraint bounds:

data['bounds'] = np.hstack((
    20,
    Ones_365,
    Ones_T,
))

Next, we save the total number of constraints we have in our problem:

data['num_constraints'] = 1 + len(Ones_365) + len(Ones_T)

and our objective coefficients:

data['obj_coeffs'] = w.T@A

We’ll reference this data dictionary when setting up and solving the problem with ORTools below.

Solving the problem

In Part 1 of this series, we noted that our problem is a Integer Program. We need an Interger Program solver.  In Google ORTools, this is SCIP so we initialize our solver object to use the SCIP solver:

solver = pywraplp.Solver.CreateSolver('SCIP')

Next, we register all of our decision variables, which each can be 0 or 1, with the solver:

x = {}
for j in range(data['num_vars']):
    x[j] = solver.IntVar(0, 1, 'x[%i]' % j)
print('Number of variables =', solver.NumVariables())

Then we register our constraints.  We loop over the number of constraints we have, and for each constraint, we set the bounds with solver.RowConstraint().  Then we loop over all the decision variables to tie a constraint coefficient to them for our given constraint in the higher loop with constraint.SetCoefficient():

for i in range(data['num_constraints']):
    constraint = solver.RowConstraint(0, data['bounds'][i], '')
    for j in range(data['num_vars']):
        constraint.SetCoefficient(x[j], data['constraint_coeffs'][i][j])
print('Number of constraints =', solver.NumConstraints())

Finally before solving, we define our coefficients for our objective function with objective.SetCoefficient().  This essentially ties a value coefficient to each of our decision variables similarly to what we did with the constraints.  We also set the objective to maximize with objective.SetMaximization(), meaning a higher objective function value is better:

objective = solver.Objective()
for j in range(data['num_vars']):
    objective.SetCoefficient(x[j], data['obj_coeffs'][j])
objective.SetMaximization()

Now we solve:

status = solver.Solve()

Of course, we want to know what the recommended trips we should take are.  Below, we extract the trips that were chosen by the solver:

if status == pywraplp.Solver.OPTIMAL:
    print('Objective value =', solver.Objective().Value(), 'Vacation Days')
    sol_ind = []
    for j in range(data['num_vars']):
        if x[j].solution_value() > 0:
            sol_ind.append(j)
            print(x[j].name(), ' = ', x[j].solution_value(), tripsDF.iloc[j]['itinerary_name'])
    print()
    print('Problem solved in %f milliseconds' % solver.wall_time())
    print('Problem solved in %d iterations' % solver.iterations())
    print('Problem solved in %d branch-and-bound nodes' % solver.nodes())
else:
    print('The problem does not have an optimal solution.')

Reflecting

So what answer did we get? What trips will we take? The solution to the optimization problem defined above is the following:

Objective value = 20.0 Vacation Days
x[126]  =  1.0 Bangkok to Kuta: Summits & Sunsets in Malaysia, Asia - G Adventures

Problem solved in 4158.000000 milliseconds
Problem solved in 0 iterations
Problem solved in 1 branch-and-bound nodes

So we see that we do use up all of our PTO days, but it only recommends one trip (at least with the version of ORTools that I have… there may not be a unique optimal answer). The Bangkok to Kuta trip from G Adventures lasts 29 days. We use all our PTO days on one trip and we’ll be gone on that trip for nearly a month. Is that OK for a recommendation? If not, this is an indicator that something is missing. Either we should add more constraints to better reflect what we don’t want or choose our objective function to better reflect what we do want. This sort of build-test-reflect cycle drives us to better results.