Embedding Architecture
Our ingredient encoder consists of three consecutive blocks: (1) Nutriment MLP, (2) Name encoder (BERT), and (3) Fusion MLP. Once each ingredient is encoded, we aggregate across the ingredient set with a Transformer encoder.
Nutriment MLP
Given per-ingredient nutriments $\mathbf{n}_i \in \mathbb{R}^{d_n}$ (with $d_n$ the number of nutriment fields) and a scalar quantity $q_i \in \mathbb{R}$, we form the input vector
$$\mathbf{x}_i = [\,\mathbf{n}_i \,\|\, q_i\,] \in \mathbb{R}^{d_n+1}.$$
The block applies two affine transformations, each followed by BatchNorm, ReLU, and dropout:
$$\mathbf{h}_i = \mathrm{Drop}\big(\mathrm{ReLU}(\mathrm{BN}(W_1 \mathbf{x}_i + \mathbf{b}_1))\big), \qquad \mathbf{u}_i = \mathrm{Drop}\big(\mathrm{ReLU}(\mathrm{BN}(W_2 \mathbf{h}_i + \mathbf{b}_2))\big).$$
We denote this entire operation as
$$\mathbf{u}_i = \mathrm{MLP}_{\text{nutr}}(\mathbf{x}_i).$$
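As an illustration, the nutriment block can be sketched in NumPy at inference time (dropout disabled). All sizes, the random weights, and the per-vector standardization standing in for trained BatchNorm statistics are assumptions for the example, not the trained model's values:

```python
import numpy as np

rng = np.random.default_rng(0)

def nutriment_mlp(x, W1, b1, W2, b2, eps=1e-5):
    """Two affine layers, each followed by normalization and ReLU.
    Dropout is the identity at inference; the per-vector standardization
    below is only a stand-in for BatchNorm's trained running statistics."""
    bn = lambda h: (h - h.mean()) / (h.std() + eps)
    h = np.maximum(bn(W1 @ x + b1), 0.0)
    return np.maximum(bn(W2 @ h + b2), 0.0)

d_n, d_h = 8, 16                       # assumed nutriment / hidden sizes
n_i = rng.normal(size=d_n)             # per-ingredient nutriments
q_i = 1.5                              # scalar quantity
x_i = np.concatenate([n_i, [q_i]])     # x_i = [n_i ; q_i]

W1 = rng.normal(size=(d_h, d_n + 1)) / np.sqrt(d_n + 1)
W2 = rng.normal(size=(d_h, d_h)) / np.sqrt(d_h)
u_i = nutriment_mlp(x_i, W1, np.zeros(d_h), W2, np.zeros(d_h))
```

The output `u_i` is the numeric feature vector passed on to the fusion step.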
Name Encoder (BERT)
Let the ingredient name be a token sequence $(t_1, \ldots, t_m)$. We extract the [CLS] embedding from a pretrained BERT:
$$\mathbf{c}_i = \mathrm{BERT}(t_1, \ldots, t_m)_{[\mathrm{CLS}]}.$$
We then project it to a lower-dimensional space:
$$\mathbf{v}_i = W_p \mathbf{c}_i + \mathbf{b}_p.$$
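A minimal sketch of the projection step, with a random vector standing in for the BERT [CLS] output (768-dimensional, as in BERT-base) and an assumed target dimension of 32:

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder for the pretrained BERT [CLS] embedding of the name tokens;
# in the real pipeline this comes from the name encoder, not from noise.
c_i = rng.normal(size=768)

d_t = 32                                       # assumed projected size
W_p = rng.normal(size=(d_t, 768)) / np.sqrt(768)
b_p = np.zeros(d_t)

v_i = W_p @ c_i + b_p                          # linear projection to R^{d_t}
```

The projection keeps the ingredient-name features at a scale comparable to the numeric features before fusion.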
Fusion MLP
We fuse the numeric and textual features by concatenation:
$$\mathbf{z}_i = [\,\mathbf{u}_i \,\|\, \mathbf{v}_i\,].$$
A two-layer MLP with BatchNorm, ReLU, and dropout produces the final ingredient embedding:
$$\mathbf{e}_i = \mathrm{MLP}_{\text{fuse}}(\mathbf{z}_i).$$
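The fusion step can be sketched the same way (inference mode, dropout disabled). The feature sizes and weights are again assumptions, and the per-vector standardization is only a stand-in for trained BatchNorm statistics:

```python
import numpy as np

rng = np.random.default_rng(2)

def fusion_mlp(z, W1, b1, W2, b2, eps=1e-5):
    """Concat -> Linear -> BN -> ReLU -> Linear -> BN -> ReLU."""
    bn = lambda h: (h - h.mean()) / (h.std() + eps)
    h = np.maximum(bn(W1 @ z + b1), 0.0)
    return np.maximum(bn(W2 @ h + b2), 0.0)

u_i = rng.normal(size=16)          # numeric features (nutriment MLP output)
v_i = rng.normal(size=32)          # projected name features
z_i = np.concatenate([u_i, v_i])   # z_i = [u_i ; v_i]

d_e = 64                           # assumed ingredient-embedding size
W1 = rng.normal(size=(d_e, z_i.size)) / np.sqrt(z_i.size)
W2 = rng.normal(size=(d_e, d_e)) / np.sqrt(d_e)
e_i = fusion_mlp(z_i, W1, np.zeros(d_e), W2, np.zeros(d_e))
```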
Ingredient Set Aggregator
Stacking all $k$ ingredient embeddings into a matrix $E = [\mathbf{e}_1; \ldots; \mathbf{e}_k]$, we apply a standard Transformer encoder:
$$H = \mathrm{TransformerEncoder}(E).$$
A simple mean-pool over the sequence yields the recipe-level ingredient representation:
$$\mathbf{r}_{\text{ingr}} = \frac{1}{k} \sum_{j=1}^{k} H_j.$$
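To make the aggregation concrete, here is the core of one encoder layer, single-head scaled dot-product self-attention, followed by the mean-pool. A full Transformer encoder adds multi-head projections, residual connections, LayerNorm, and a feed-forward sublayer, which are omitted here; the set size and dimensions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def self_attention(E, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the
    ingredient set: softmax(QK^T / sqrt(d)) V."""
    Q, K, V = E @ Wq, E @ Wk, E @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)             # rows sum to 1
    return A @ V

k, d = 5, 64                           # assumed set size x embedding dim
E = rng.normal(size=(k, d))            # stacked ingredient embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

H = self_attention(E, Wq, Wk, Wv)      # contextualized embeddings
r_ingr = H.mean(axis=0)                # mean-pool over the sequence
```

Mean-pooling makes the recipe-level vector invariant to the order in which ingredients are listed.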
Recipe Encoder
The recipe encoder transforms high-level recipe features (title, overall nutriments, and cooking steps) into embeddings, which are then fused with the ingredient representation.
Title Encoder
For title tokens $(w_1, \ldots, w_p)$, we follow the same scheme as the name encoder, a BERT [CLS] embedding followed by a linear projection:
$$\mathbf{r}_{\text{title}} = W_t \,\mathrm{BERT}(w_1, \ldots, w_p)_{[\mathrm{CLS}]} + \mathbf{b}_t.$$
Recipe Nutriment MLP
Given recipe-level nutriments $\mathbf{n}_R$:
$$\mathbf{r}_{\text{nutr}} = \mathrm{MLP}_{\text{rec-nutr}}(\mathbf{n}_R).$$
Steps Encoder
For step tokens $(s_1, \ldots, s_q)$, analogously:
$$\mathbf{r}_{\text{steps}} = W_s \,\mathrm{BERT}(s_1, \ldots, s_q)_{[\mathrm{CLS}]} + \mathbf{b}_s.$$
Final Fusion
We concatenate all high-level vectors with the ingredient representation:
$$\mathbf{z}_R = [\,\mathbf{r}_{\text{title}} \,\|\, \mathbf{r}_{\text{nutr}} \,\|\, \mathbf{r}_{\text{steps}} \,\|\, \mathbf{r}_{\text{ingr}}\,].$$
A final two-layer MLP produces the unified recipe representation:
$$\mathbf{r}_R = \mathrm{MLP}_{\text{final}}(\mathbf{z}_R).$$
Final Representation: $\mathbf{r}_R$, the single vector that embeds the entire recipe.
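The final fusion can be sketched end-to-end at the vector level. The four input dimensions and the output size are assumptions, and BatchNorm/dropout are omitted for brevity; only the concatenate-then-MLP structure matters here:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-ins for the four high-level vectors; in the real model these
# come from the title, nutriment, steps, and ingredient encoders.
r_title = rng.normal(size=32)
r_nutr  = rng.normal(size=16)
r_steps = rng.normal(size=32)
r_ingr  = rng.normal(size=64)

z_R = np.concatenate([r_title, r_nutr, r_steps, r_ingr])

d_R = 128                              # assumed recipe-embedding size
W1 = rng.normal(size=(d_R, z_R.size)) / np.sqrt(z_R.size)
W2 = rng.normal(size=(d_R, d_R)) / np.sqrt(d_R)
h   = np.maximum(W1 @ z_R, 0.0)        # first affine layer + ReLU
r_R = np.maximum(W2 @ h, 0.0)          # unified recipe representation
```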