Edit: If you want to see the full code, I include everything in this post in the Mathematica file here.
I promised previously that we would go into depth on the graph theory and food calculations, so today we will do just that. This will be Mathematica-heavy, so it is really only aimed at those who have played around with the Mathematica programming language.
I’ve been using this language now for over a decade, and while it is not the fastest language on the market for doing numerics-heavy calculations, it is an incredibly versatile language, and for getting code written fast, it’s hard to beat!
I tend to code in what is called a functional programming style (ideal for Mathematica), which doesn’t use loops as you would normally find in a procedural language. Perhaps the most oft used coding syntax you will see below is of the form:
somefunction[#]&/@{el1,el2,el3,el4…}
which takes the elements of a list and passes them one by one into a function. The output of this will be a list of the form:
{somefunction[el1],somefunction[el2],somefunction[el3],somefunction[el4]…}
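As a tiny illustration of this shorthand, here is a sketch that maps a pure function over a short list, alongside the same thing written out in long form with Map. The squaring function and the name squares are just made up for the example and are not used later in the post.

```mathematica
(* Illustrative only: square each element with a pure function *)
squares = #^2 & /@ {1, 2, 3, 4}
(* {1, 4, 9, 16} *)

(* The same thing written out with Map and an explicit Function *)
squares === Map[Function[x, x^2], {1, 2, 3, 4}]
(* True *)
```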
I shan’t explain all that much of the code below in detail in terms of the language, but I’ll try to explain what it is doing.
So, we started last time with a string of text, of the form:
- Allspice pairs well with: apples, beets, cabbage, caramel, cardamom, cinnamon, cloves, coriander, ginger, juniper, mace, mustard, nuts, nutmeg, onions, pears, pumpkin, root vegetables, yams
- Almond pairs well with: apple, apricot, banana, caramel, cherry, coffee, fig, honey, orange, peach, pear, plum
- Anice pairs well with: apples, beets, caramel, carrots, chocolate, citrus, cinnamon, coconut, coriander, cranberry, fennel, figs, fish, garlic, peaches, pomegranates, pumpkin
- Apple pairs well with: caramel, cardamom, chestnut, cinnamon, cranberry, currant, ginger, hazelnut, mango, maple, rosemary, walnut
- Apricot pairs well with: almond, black pepper, caramel, cardamom, ginger, hazelnut, honey, orange, peach, vanilla, plum
- Asian Pear pairs well with: ….
- and so on…
First we will define a variable strtotal with the whole of the above string, i.e.
strtotal="• Allspice pairs well with: apples, beets, cabbage…etc."
We see that the text is formatted with bullet points. This is to our advantage as we can now split this big string into a list of individual strings, where each one is the flavour pairing with one ingredient:
splittotal=StringSplit[strtotal,"•"]
The output of this will then be a list of the form:
{"Allspice pairs well with: apples, beets, cabbage, caramel, cardamom, cinnamon, etc.",
"Almond pairs well with: apple, apricot, banana, caramel, cherry, coffee, fig, etc.",
"Anice pairs well with: apples, beets, caramel, carrots, chocolate, citrus, cinnamon, etc.",
etc.}
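To try the splitting step without the full text, here is a minimal sketch on a made-up two-entry string. The names sample and pieces are hypothetical stand-ins, and the StringTrim is only there to tidy up the whitespace left around each piece.

```mathematica
(* sample is a shortened, made-up stand-in for strtotal *)
sample = "• Allspice pairs well with: apples, beets• Almond pairs well with: apple, apricot";

(* Split on the bullet, then trim the whitespace left around each piece *)
pieces = StringTrim /@ StringSplit[sample, "•"]
(* {"Allspice pairs well with: apples, beets", "Almond pairs well with: apple, apricot"} *)
```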
We can now take this and further split each string but this time not splitting on a bullet point, but on the words “pairs well with”. We have to be a bit careful because there are situations with singular nouns and with plurals, so we need both “pairs well with” and “pair well with”. In fact there is an instance of “pears well with” which we have to include. We will then pass this directly into StringTrim, which will remove whitespace:
ss2=StringTrim[#] & /@ (StringSplit[splittotal, "pairs well with:" | "pair well with:" | "pears well with:"])
The output of this is now of the form:
{{"Allspice", "apples,beets,cabbage,caramel,cardamom,cinnamon,etc."},
{"Almond", "apple,apricot,banana,caramel,cherry,coffee,fig,honey,orange,peach,pear,plum"},
{"Anice", "apples,beets,caramel,carrots,chocolate,citrus,cinnamon,coconut,coriander,cranberry,etc."},
{"Apple", "caramel,cardamom,chestnut,cinnamon,cranberry,currant,ginger,hazelnut,etc."},
{"Apricot", "almond,black pepper,caramel,cardamom,ginger,hazelnut,honey,orange,peach,vanilla,plum"},etc.}
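To see how splitting on several alternative phrases behaves, here is a small sketch on a single made-up entry which uses the plural form "pair". The variable names entry and parts are hypothetical.

```mathematica
(* entry is a made-up example using "pair" rather than "pairs" *)
entry = "Nuts pair well with: apples, honey";

(* Split on whichever phrasing matches, then trim both halves *)
parts = StringTrim[StringSplit[entry, "pairs well with:" | "pair well with:"]]
(* {"Nuts", "apples, honey"} *)
```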
OK, now we need to split the second element of each pair in this list into individual foods. To do this we use:
ss3=ToLowerCase[({#[[1]], Flatten[StringSplit[StringTrim[StringSplit[#[[2]], "," | " and "], "."], ","]]} & /@ss2)]
Here we split the second element of each item in the list on either "," or " and ", trim any "." from the ends of the resulting pieces, and then split once more on "," to remove superfluous commas. This now gives us a nice list of the form:
{{"allspice", {"apples", "beets", "cabbage", "caramel", "cardamom", "cinnamon", "cloves",etc.}},
{"almond", {"apple", "apricot", "banana", "caramel", "cherry", "coffee", "fig", "honey", "orange",etc.}},
{"anice", {"apples", "beets", "caramel", "carrots", "chocolate", "citrus", "cinnamon", "coconut",etc.}},
{"apple", {"caramel", "cardamom", "chestnut", "cinnamon", "cranberry", "currant", "ginger",etc.}},
{"apricot", {"almond", "black pepper", "caramel", "cardamom", "ginger", "hazelnut", "honey",etc.}},etc.}
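Here is a cut-down sketch of that splitting step on one made-up pair, so you can see the effect of " and " and the trailing full stop. The names pair and cleaned are hypothetical, and I have left out the second split on "," for brevity.

```mathematica
(* pair is a made-up entry with both a comma and an " and " separator *)
pair = {"Currant", "apples,pears and plums."};

(* Split the food list, strip any "." from the ends, and lower-case everything *)
cleaned = ToLowerCase[{pair[[1]],
   StringTrim[#, "."] & /@ StringSplit[pair[[2]], "," | " and "]}]
(* {"currant", {"apples", "pears", "plums"}} *)
```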
OK, now we’re getting somewhere.
To get all of the foods in this list we do:
words=ss3//Flatten//Union
However, this includes various foods in both their singular and plural forms as well as a few variations in spelling, for instance anice and anise. Here we take advantage of the command EditDistance.
The edit distance is simply the number of single-letter changes (substitutions, deletions or insertions) we would have to make to a word in order to turn it into another word. We form a table of the edit distance between all pairs of foods in our list words. This is simply given by:
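To get a feel for the numbers involved, here are a few edit distances for words taken from the lists above. This is just a sketch for orientation, and the variable names d1, d2 and d3 are made up.

```mathematica
d1 = EditDistance["anice", "anise"]       (* 1: a single substitution *)
d2 = EditDistance["apples", "apple"]      (* 1: a single deletion *)
d3 = EditDistance["cherries", "berries"]  (* 2: one deletion and one substitution *)
```

So a threshold of 1 or 2 catches spelling variants and plurals, but it also lets through pairs of genuinely different words.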
editdistances=Table[EditDistance[words[[m]], words[[n]]], {m, Length[words]}, {n, Length[words]}]
This gives a table of edit distances. We are really interested in those words which differ by just 1 or 2 letters, so we can pull out the positions in this table where the edit distance is 1 or 2. If we then pass these positions into the word list then we’ll have a list of pairs of words whose edit distance is 1 or 2. Then we simply want to sort these (as we don’t want the pair {“anice”,”anise”} as well as {“anise”,”anice”}) and find the union of these pairs, and finally pass these into a replacement pattern:
editdistance1or2=#[[2]] -> #[[1]] & /@Union[Sort[{words[[#[[1]]]], words[[#[[2]]]]}]&/@Position[editdistances, 1 | 2]]
Now we have a replacement list which looks like:
{"almonds" -> "almond", "anise" -> "anice", "aniseed" -> "anise", "apples" -> "apple", "maple" -> "apple", "bananas" -> "banana", "beets" -> "beans", "pears" -> "beans", "cherries" -> "berries", "curries" -> "berries", "capers" -> "cakes", "dates" -> "cakes", etc.}
which is good, but clearly not quite right. We’ve picked up words which are close to others, but are clearly different words, for instance “cherries” and “berries” or “beets” and “beans”. We can deal with this by asking that the first four letters of the two words be the same (or, if either word has three letters or fewer, that everything up to the last letter be the same):
pluralreps=Select[editdistance1or2, If[StringLength[#[[1]]] > 3 && StringLength[#[[2]]] > 3, StringTake[#[[1]], ;; 4] == StringTake[#[[2]], ;; 4], StringTake[#[[1]], ;; -2] == StringTake[#[[2]], ;; -2]] &]
This leaves us with:
{"almonds" -> "almond", "aniseed" -> "anise", "apples" -> "apple", "bananas" -> "banana", "carrots" -> "carrot", "chilies" -> "chili", "cloves" -> "clove", "coconuts" -> "coconut", etc.}
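The filter above can be a little hard to read in one line, so here is a sketch of the same test written as a named helper. The names samePrefixQ, candidates and kept are hypothetical, and I have assumed that StringTake[s, 4] and StringDrop[s, -1] pick out the same characters as the span notation used above.

```mathematica
(* Hypothetical helper: does a rule link two words that share a stem? *)
samePrefixQ[a_String -> b_String] :=
 If[StringLength[a] > 3 && StringLength[b] > 3,
  StringTake[a, 4] == StringTake[b, 4],     (* compare the first four letters *)
  StringDrop[a, -1] == StringDrop[b, -1]]   (* short words: all but the last letter *)

(* A few of the candidate rules from the list above *)
candidates = {"cherries" -> "berries", "apples" -> "apple", "beets" -> "beans"};
kept = Select[candidates, samePrefixQ]
(* {"apples" -> "apple"} *)
```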
Actually this leaves us with a few replacements which haven’t been captured using this method, so we have to add these by hand. If you want to follow along, these are:
{"cherries" -> "cherry", "cilantro (coriander)" -> "cilantro", "citrus fruit" -> "citrus",
"cranberries" -> "cranberry", "curries" -> "curry", "juniper berry" -> "juniper", "star anise" -> "anise", "sun-dried" -> "sun-dried tomato", "currants,black and red," -> "currants"}.
So we join the two previous lists into the pluralreps variable.
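One way to do the joining is with Join. Here is a minimal sketch using shortened stand-in lists; the names autoreps, manualreps and allreps are hypothetical, and in the real code the result would be assigned back to pluralreps.

```mathematica
(* autoreps stands in for the automatically found rules (shortened) *)
autoreps = {"almonds" -> "almond", "apples" -> "apple"};
(* manualreps stands in for the hand-written rules (shortened) *)
manualreps = {"cherries" -> "cherry", "curries" -> "curry"};

allreps = Join[autoreps, manualreps];

(* Applying the combined rules to a few foods *)
replaced = {"cherries", "apples", "ginger"} /. allreps
(* {"cherry", "apple", "ginger"} *)
```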
Now we are ready to make a graph! We can first form all the pairs of foods which go together from our list ss3 with:
gotogether=Flatten[Table[{ss3[[n, 1]], #} & /@ ss3[[n, 2]], {n, Length[ss3]}], 1] /. pluralreps // Sort // Union
This now gives all the pairs of foods which go together as:
{{"allspice", "apple"}, {"allspice", "beets"}, {"allspice", "cabbage"}, {"allspice", "caramel"},
{"allspice", "cardamom"}, {"allspice", "cinnamon"}, {"allspice", "clove"}, {"allspice", "coriander"},
{"allspice", "ginger"}, {"allspice", "juniper"}, {"allspice", "mace"}, {"allspice", "mustard"},
{"allspice", "nutmeg"}, {"allspice", "nuts"}, {"allspice", "onion"}, {"allspice", "pear"}, {"allspice", "pumpkin"}, {"allspice", "root vegetables"}, {"allspice", "yams"}, {"almond", "apple"}, {"almond", "apricot"}, {"almond", "banana"}, {"almond", "caramel"},etc.}
Now we simply pass this into the syntax of a set of graph edges:
graphedges1=#[[1]] <-> #[[2]] & /@ gotogether
There are a few pesky links in here which we don’t want, which we can delete using:
graphedges2=Union[DeleteCases[graphedges1, ("basil" <-> "sweet basil is the best basil for pesto") | ("celery seed" <-> "cinnamon coriander") | (n_ <-> n_) | (n_ <-> "tomato salads")]]
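To see the edge-building and clean-up steps in miniature, here is a sketch on three made-up pairs, one of which is a self-loop. I have written the edges with the full name UndirectedEdge rather than the infix shorthand purely to make the pattern explicit; the names pairs, edges and cleanedges are hypothetical.

```mathematica
(* Three made-up pairs; the middle one links "apple" to itself *)
pairs = {{"almond", "apple"}, {"apple", "apple"}, {"almond", "honey"}};

(* Turn each pair into an undirected edge *)
edges = UndirectedEdge @@@ pairs;

(* The repeated pattern name n_ only matches edges whose two ends are identical *)
cleanedges = DeleteCases[edges, UndirectedEdge[n_, n_]]
(* the self-loop on "apple" is removed, leaving two edges *)
```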
Now we’re truly good to go to make our first graph! As we saw before, the whole graph is too big to plot on a normal computer screen, so let’s take a random sample of all the edges of the graph:
Graph[RandomSample[graphedges2,100],VertexLabels -> "Name", ImageSize -> 1000, ImagePadding -> 100]
This gives us (for one particular random sample):
It looks like there are a number of disconnected components here, but in fact this is just an artifact of having chosen just 100 of the 800 or so edges in the full graph.
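To see why keeping only some of the edges can make a connected graph look fragmented, here is a toy sketch. The integer vertices are made up and have nothing to do with the food data, and the names toyedges, fullq and ncomponents are hypothetical.

```mathematica
(* A made-up chain of five vertices: 1 - 2 - 3 - 4 - 5 *)
toyedges = {UndirectedEdge[1, 2], UndirectedEdge[2, 3],
   UndirectedEdge[3, 4], UndirectedEdge[4, 5]};

(* With all the edges the graph is connected *)
fullq = ConnectedGraphQ[Graph[toyedges]]
(* True *)

(* Keep only the first and last edges and it falls into two pieces *)
ncomponents = Length[ConnectedComponents[Graph[toyedges[[{1, 4}]]]]]
(* 2 *)
```

The same ConnectedGraphQ check could presumably be run on Graph[graphedges2] to test the claim for the full graph.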
I think that is enough for now, having set up the graph. Next time we’ll start to do some analysis on the graph itself.
If you are able to implement this code, and can do some fun analysis yourself, please leave a comment and we can feature some of the code in a future post.