TITLE: Tracking meals for a year
DATE: 2022-08-16
AUTHOR: John L. Godlee
====================================================================


I've just finished an experiment where I tracked what meals I ate
for just over a year. I did it for 460 days in total.

The original goal was to keep a reference to provide inspiration
for what to cook when I can't think of anything. I realised that my
cooking and eating habits have changed since I first moved away
from home about 10 years ago, and I have forgotten many of the
dishes I used to cook.

A second goal arose when I started the project, which was to track
how many meals I eat "out", i.e. paid for somebody else to cook in
a restaurant, fast food place, cafe etc. I suspected that I ate out
more often than I admitted, and cooking at home is a good way to
save money and stay healthy.

Collecting the data was quite easy. When I was living in London I
drew a table on paper and stuck it to the wall above the kitchen
table. When I moved to Edinburgh and started working in the office
again I switched to an Excel spreadsheet kept on my desktop, so it
remained visible and I wouldn't forget to fill it out.

The only times I found it difficult to keep the log were when I was
travelling abroad, as my routine was disrupted. Even then, I tried
to backdate the log as best as I could.

In order to analyse the data in a meaningful way I had to spend a
long time at the end recoding the entries. At the time of recording
I didn't attempt to standardise how I recorded a meal, so I had
lots of very similar entries like "Eggs on toast", "Fried egg", and
"Fried egg on toast". I decided that the best way to recode the
responses, rather than just combining values to something like "Egg
toast", was to add a variable number of tags to each entry. So all
the above entries would get the tags "bread", "egg", and "toast".
The tags describe the main ingredients in the dish, the cooking or
preparation method, and sometimes the food culture I think the dish
is from. Even with this system there are ambiguities that required
me to make a decision. Should a tortilla be classified as "bread",
or is it its own thing like "flatbread"?

I used R to recode and process the data.

The original spreadsheet looked like this:

 date         breakfast                 lunch            supper         location
 ------------ ------------------------- ---------------- -------------- -----------
 2022-08-20   Cereal                    Falafel wrap     NOTHING        Edinburgh
 2022-08-21   Toast and peanut butter   Sandwich (OUT)   Lentil salad   Edinburgh

The first step was to transform the table to a long format:

   x %>%
     pivot_longer(
       names_to = "meal",
       cols = c("breakfast", "lunch", "supper"))
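
As a self-contained sketch of that reshaping step, the two example
rows from the table above can be pivoted like this. The data frame
names wide and long are only for illustration. pivot_longer() puts
the cell contents in a column called value by default.

```r
library(tidyr)

# The two example days, in the same wide layout as the spreadsheet
wide <- data.frame(
  date = c("2022-08-20", "2022-08-21"),
  breakfast = c("Cereal", "Toast and peanut butter"),
  lunch = c("Falafel wrap", "Sandwich (OUT)"),
  supper = c("NOTHING", "Lentil salad"),
  location = c("Edinburgh", "Edinburgh")
)

# One row per date and meal type; the meal type goes in `meal`
# and the entry text goes in `value`
long <- pivot_longer(wide,
  cols = c("breakfast", "lunch", "supper"),
  names_to = "meal")

nrow(long)  # 2 days x 3 meals = 6 rows
```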

Then I added an ID value to each meal, recoded the date, and
created two extra logical columns, out and nothing, which record
whether the strings "OUT" and "NOTHING" were found in the meal
entry. Then I set about adding tags to each meal, separated by
semicolons.

   x %>%
     mutate(
       meal_id = row_number(),
       date = as.Date(date),
       # Flag meals eaten out and meals skipped before cleaning the text
       out = grepl("OUT", value),
       nothing = grepl("NOTHING", value),
       value = trimws(na_if(gsub("\\(OUT\\)", "", value), "NA")),
       tag = case_when(
         value == "Aloo paratha" ~ "flatbread;potato",
         value == "Apple" ~ "fruit;apple",
         value == "Aubergeine curry" ~ "curry;aubergeine"
         # ... and so on, one line per distinct entry (omitted here)
       ))

I also filtered out meals from incomplete months at the start and
end of the survey period, so I could calculate unbiased monthly
statistics. I separated out the tags, so a meal may span multiple
lines, depending on how many tags it has. Finally, I added a
logical column, vegetarian, which is FALSE where the tag "meat" is
encountered.

   x %>%
     filter(
       date >= as.Date("2021-05-01"),
       date <= as.Date("2022-07-31")) %>%
     separate_rows(tag, sep = ";") %>%
     # TRUE unless the tag is "meat"; untagged (NA) rows also get TRUE
     mutate(vegetarian = !tag %in% "meat")

The final table looked like this:

 date         meal        value          meal_id   out     nothing   tag        vegetarian
 ------------ ----------- -------------- --------- ------- --------- ---------- ------------
 2022-08-20   breakfast   Cereal         1         FALSE   FALSE     cereal     TRUE
 2022-08-20   lunch       Falafel wrap   2         FALSE   FALSE     falafel    TRUE
 2022-08-20   lunch       Falafel wrap   2         FALSE   FALSE     tortilla   TRUE
 2022-08-20   supper      NOTHING        3         FALSE   TRUE      NA         TRUE
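
Because a meal can span several rows in this table, any per-meal
statistic has to collapse the tag rows back to one row per meal_id
first. A minimal sketch, using hypothetical meals and tags (the
"Carne asada tacos" entry and its tags are made up for
illustration), and assuming a meal counts as vegetarian only if
none of its tags is "meat":

```r
library(dplyr)
library(tidyr)

# Hypothetical tagged meals, one row per meal, tags separated by ";"
meals <- data.frame(
  meal_id = 1:3,
  value = c("Falafel wrap", "Carne asada tacos", "NOTHING"),
  tag = c("falafel;tortilla", "meat;tortilla", NA)
)

# One row per tag, then flag the rows that carry the "meat" tag
meals_long <- meals %>%
  separate_rows(tag, sep = ";") %>%
  mutate(vegetarian = !tag %in% "meat")

# Collapse to one row per meal: vegetarian only if every tag row is
per_meal <- meals_long %>%
  group_by(meal_id) %>%
  summarise(vegetarian = all(vegetarian))

per_meal$vegetarian  # TRUE FALSE TRUE
```

Using all() rather than any() matters here: with any(), the
"meat;tortilla" meal would count as vegetarian because its
"tortilla" row is TRUE.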

I made a plot of the number of meals I missed per month:

   meals_clean %>%
     group_by(meal_id, date) %>%
     summarise(nothing = any(nothing)) %>%
     mutate(month = floor_date(date, "month")) %>%
     group_by(month) %>%
     summarise(sum_nothing = sum(nothing)) %>%
     ggplot(., aes(x = month, y = sum_nothing)) +
       geom_line() +
       geom_point(shape = 21, fill = "darkgrey") +
       scale_x_date(date_breaks = "1 month", date_labels = "%b %Y",
         guide = guide_axis(angle = 45)) +
       labs(x = "Date", y = "Meals missed per month") +
       theme_bw() +
       theme(panel.grid.minor = element_blank())

 ![Meals missed per
month](https://johngodlee.xyz/img_full/meals_year/missed.png)

There seems to be a fairly regular oscillation where I miss meals
fairly often one month, then rarely the next. May 2022 is a big
outlier, as I was travelling a lot and eating at irregular times.
It's possible that eating out has increased since I moved from
London to Edinburgh in September 2021, as we live closer to the
centre of the city and COVID restrictions have eased, but it's
difficult to tell from such noisy data.

Meals missed broken down by meal type:

   meals_clean %>%
     group_by(meal_id, date, meal) %>%
     summarise(nothing = any(nothing)) %>%
     group_by(meal) %>%
     summarise(sum_nothing = sum(nothing)) %>%
     ggplot(., aes(x = meal, y = sum_nothing)) +
       geom_bar(stat = "identity", colour = "black",
         fill = "darkgrey") +
       labs(x = "Meal", y = "Meals missed") +
       theme_bw() +
       theme(panel.grid.minor = element_blank())

 ![Missed meals by meal
type](https://johngodlee.xyz/img_full/meals_year/missed_meal.png)

This doesn't surprise me. Breakfast is an important meal for me, as
it helps me to wake up, but sometimes I miss evening meals if I
have had a big lunch or I'm doing something in the evening.

A monthly timeline of meals eaten out:

   meals_clean %>%
     group_by(meal_id, date) %>%
     summarise(out = any(out)) %>%
     group_by(month = floor_date(date, "month")) %>%
     summarise(sum_out = sum(out)) %>%
     ggplot(., aes(x = month, y = sum_out)) +
       geom_line() +
       geom_point(shape = 21, fill = "darkgrey") +
       scale_x_date(date_breaks = "1 month", date_labels = "%b %Y",
         guide = guide_axis(angle = 45)) +
       labs(x = "Date", y = "Meals out per month") +
       theme_bw() +
       theme(panel.grid.minor = element_blank())

 ![Meals eaten out per
month](https://johngodlee.xyz/img_full/meals_year/out.png)

On average I ate 10.33 meals out per month, from a total of ~90
meals per month. That's not bad. September 2021 is particularly low
because I was living with my parents in the countryside, so there
were few opportunities to eat out. April 2022 is also low because I
was on fieldwork in Angola, where most of our meals were cooked
either by us or by someone we employed in the National Park.
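
To put that in proportion, here is a rough calculation, assuming
the ~90 figure comes from three meals a day over a 30-day month:

```r
meals_out_per_month <- 10.33
meals_per_month <- 90  # approximate: 3 meals a day for 30 days

# Share of all meals that were eaten out, as a percentage
share_out <- meals_out_per_month / meals_per_month
round(100 * share_out, 1)  # about 11.5
```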

And a monthly timeline of non-vegetarian meals:

   meals_clean %>%
     group_by(meal_id, date) %>%
     # A meal is vegetarian only if none of its tags is "meat"
     summarise(vegetarian = all(vegetarian, na.rm = TRUE)) %>%
     group_by(month = floor_date(date, "month")) %>%
     summarise(sum_meat = sum(!vegetarian)) %>%
     ggplot(., aes(x = month, y = sum_meat)) +
       geom_line() +
       geom_point(shape = 21, fill = "darkgrey") +
       scale_x_date(date_breaks = "1 month", date_labels = "%b %Y",
         guide = guide_axis(angle = 45)) +
       labs(x = "Date", y = "Meaty meals per month") +
       theme_bw() +
       theme(panel.grid.minor = element_blank())

 ![Meaty meals per
month](https://johngodlee.xyz/img_full/meals_year/meat.png)

May 2022 is particularly high because I was in Mexico on holiday
and decided to eat whatever I wanted, including lots of carne asada
and cochinita pibil.

Finally, a breakdown of the most common tags by meal type:

   meals_clean %>%
     filter(!is.na(tag)) %>%
     group_by(tag, meal) %>%
     tally() %>%
     group_by(meal) %>%
     mutate(n_meals = sum(n)) %>%
     group_by(meal, tag) %>%
     mutate(prop = n / n_meals) %>%
     group_by(meal) %>%
     slice_max(prop, n = 10, with_ties = FALSE) %>%
     ggplot(., aes(x = reorder_within(tag, -prop, meal),
         y = prop)) +
       geom_bar(stat = "identity", fill = "darkgrey",
         colour = "black") +
       geom_label(aes(label = n)) +
       # Strip the "___meal" suffix that reorder_within() adds
       scale_x_discrete(name = NULL,
         labels = function(x) sub('^(.*)___.*$', '\\1', x)) +
       facet_wrap(~meal, scales = "free_x", nrow = 3) +
       labs(y = "Proportion") +
       theme_bw()

 ![Breakdown of most popular tags by meal
type](https://johngodlee.xyz/img_full/meals_year/tags.png)

This didn't work out as nicely as I'd hoped. Bread comes out on top
for all meal types, but that's not particularly interesting. I eat
a lot of sandwiches for lunch (which contain bread), and toast for
breakfast. The tag system doesn't really capture the essence of the
meal. But it's very hard to classify individual meals because they
overlap so much and are so variable.
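
Part of the problem may be that the proportions in the plot are
shares of all tag occurrences within a meal type, rather than
shares of meals. One alternative, sketched here on a small
hypothetical table (tags_long and its contents are made up for
illustration), is to count distinct meals per tag and divide by the
number of distinct meals of that type:

```r
library(dplyr)

# Hypothetical long-format data: one row per meal_id and tag
tags_long <- data.frame(
  meal = c("lunch", "lunch", "lunch", "lunch", "lunch"),
  meal_id = c(1, 1, 2, 2, 3),
  tag = c("bread", "cheese", "bread", "salad", "rice")
)

tag_share <- tags_long %>%
  group_by(meal) %>%
  # Number of distinct meals of this meal type
  mutate(n_meals = n_distinct(meal_id)) %>%
  group_by(meal, tag, n_meals) %>%
  # Number of distinct meals that carry this tag
  summarise(n = n_distinct(meal_id), .groups = "drop") %>%
  mutate(prop = n / n_meals) %>%
  arrange(desc(prop))

tag_share
```

In this toy example "bread" appears in 2 of 3 lunches, so its share
is about 0.67, which reads more naturally than a share of tags.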

I had a stab at creating a network graph for the most commonly
shared tags per meal, as I've been playing with {igraph} recently.

   # Create list of dataframes of edges, with freq. per meal type
   tag_edges <- meals_clean %>%
     dplyr::select(meal, meal_id, tag) %>%
     group_by(meal, meal_id) %>%
     filter(n() > 1) %>%
     do(data.frame(t(combn(.$tag, 2)))) %>%
     ungroup() %>%
     dplyr::select(-meal_id) %>%
     group_by(meal, X1, X2) %>%
     tally() %>%
     rename(
       from = X1,
       to = X2,
       weight = n) %>%
     ungroup() %>%
     split(., .$meal)

   # For each meal type dataframe
   tag_graph_list <- lapply(tag_edges, function(x) {
     # Create a graph object
     tag_graph <- x %>%
       dplyr::select(-meal) %>%
       filter(weight > 5) %>%
       graph.data.frame() %>%
       as.undirected()

     # Create a plot
     ggraph(tag_graph, layout = 'linear', circular = TRUE) +
       geom_edge_link(aes(width = weight)) +
       geom_node_label(aes(label = name)) +
       ggtitle(unique(x$meal)) +
       theme_graph() +
       theme(legend.position = "none")
   })

   # Use patchwork to mosaic plots
   wrap_plots(tag_graph_list, ncol = 1)

 ![Most common tag connections per meal
type](https://johngodlee.xyz/img_full/meals_year/conns.png)

Lots of eggs on toast for breakfast, or bananas with other stuff.
Sandwiches with salad, cheese and salad for lunch. Curries, tomato
pasta, and rice with beans and vegetables for supper.

If I were to do this again I'd try harder to describe the contents
of each meal in more detail. Rather than just "pasta", I'd write
"pasta with mushrooms, courgettes, tomato-based sauce, and crusty
bread".