
Embedding Architecture

Our ingredient encoder consists of three consecutive blocks: (1) Nutriment MLP, (2) Name encoder (BERT), and (3) Fusion MLP. Once each ingredient is encoded, we aggregate across the ingredient set with a Transformer encoder.

Nutriment MLP

Given per-ingredient nutriments $x_n \in \mathbb{R}^{d_n}$ (with $d_n = 9$) and a scalar quantity $q \in \mathbb{R}$, we form

$$x_{\mathrm{nut}} = \bigl[x_n;\, q\bigr] \in \mathbb{R}^{d_n+1}.$$

The block applies two affine transformations with BatchNorm, ReLU, and dropout:

$$
\begin{aligned}
h_1 &= \mathrm{ReLU}\bigl(\mathrm{BN}(W_1\,x_{\mathrm{nut}} + b_1)\bigr) \\
h_1' &= \mathrm{Dropout}(h_1) \\
h_2 &= \mathrm{ReLU}(W_2\,h_1' + b_2)
\end{aligned}
$$

We denote this entire operation as:

$$h_{\mathrm{nut}} = \mathrm{NutEnc}(x_{\mathrm{nut}}) = h_2.$$
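
A minimal PyTorch sketch of this block; the hidden and output widths and the dropout rate are illustrative assumptions, since the text does not specify them:

```python
import torch
import torch.nn as nn

class NutEnc(nn.Module):
    """Nutriment MLP: affine -> BN -> ReLU -> dropout -> affine -> ReLU."""
    def __init__(self, d_n: int = 9, hidden: int = 64, out: int = 32, p_drop: float = 0.2):
        super().__init__()
        self.fc1 = nn.Linear(d_n + 1, hidden)  # input is [x_n ; q]
        self.bn = nn.BatchNorm1d(hidden)
        self.drop = nn.Dropout(p_drop)
        self.fc2 = nn.Linear(hidden, out)

    def forward(self, x_n: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # x_n: (B, d_n) nutriments, q: (B, 1) scalar quantity per ingredient
        x_nut = torch.cat([x_n, q], dim=-1)        # (B, d_n + 1)
        h1 = torch.relu(self.bn(self.fc1(x_nut)))  # W_1, BN, ReLU
        return torch.relu(self.fc2(self.drop(h1))) # dropout, W_2, ReLU -> h_nut

x_n = torch.randn(8, 9)   # batch of 8 ingredients, 9 nutriment values each
q = torch.rand(8, 1)      # quantities
h_nut = NutEnc()(x_n, q)  # (8, 32)
```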

Name Encoder (BERT)

Let the ingredient name be a token sequence $\mathbf{x}_{\mathrm{name}} = (w_1, \dots, w_L)$. We extract the [CLS] embedding from a pretrained BERT:

$$e_{\mathrm{name}} = \mathrm{BERT}_{[\mathrm{CLS}]}\bigl(\mathbf{x}_{\mathrm{name}}\bigr) \in \mathbb{R}^{d_{\mathrm{bert}}}.$$

We then project to a lower-dimensional space:

$$e_{\mathrm{name}}^{\mathrm{proj}} = W_p\,e_{\mathrm{name}} + b_p.$$
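
A sketch of this encoder, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint; the projection width is an illustrative choice:

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class NameEnc(nn.Module):
    """BERT [CLS] embedding followed by a learned linear projection."""
    def __init__(self, model_name: str = "bert-base-uncased", d_proj: int = 32):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.proj = nn.Linear(self.bert.config.hidden_size, d_proj)  # W_p, b_p

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        e_name = out.last_hidden_state[:, 0]  # final hidden state of [CLS]
        return self.proj(e_name)              # e_name^proj

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = NameEnc()
batch = tok(["olive oil", "brown sugar"], padding=True, return_tensors="pt")
e_proj = enc(batch["input_ids"], batch["attention_mask"])  # (2, 32)
```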

Fusion MLP

We fuse the numeric and textual features by concatenation:

$$z_{\mathrm{fuse}} = \bigl[h_{\mathrm{nut}};\; e_{\mathrm{name}}^{\mathrm{proj}}\bigr].$$

A two-layer MLP with BatchNorm, ReLU, and dropout produces the final ingredient embedding:

$$
\begin{aligned}
f_1 &= \mathrm{ReLU}\bigl(\mathrm{BN}_f(W_f\,z_{\mathrm{fuse}} + b_f)\bigr) \\
f_1' &= \mathrm{Dropout}(f_1) \\
z_{\mathrm{ingr}} &= \mathrm{ReLU}(W_o\,f_1' + b_o)
\end{aligned}
$$
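
A sketch of the fusion MLP under the same assumed widths (32-dim $h_{\mathrm{nut}}$ and $e_{\mathrm{name}}^{\mathrm{proj}}$; the 128- and 64-dim layers are illustrative):

```python
import torch
import torch.nn as nn

fusion = nn.Sequential(
    nn.Linear(32 + 32, 128),  # W_f, b_f on z_fuse = [h_nut ; e_name^proj]
    nn.BatchNorm1d(128),      # BN_f
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(128, 64),       # W_o, b_o
    nn.ReLU(),
)

h_nut = torch.randn(8, 32)        # batch of nutriment encodings
e_name_proj = torch.randn(8, 32)  # matching projected name embeddings
z_ingr = fusion(torch.cat([h_nut, e_name_proj], dim=-1))  # (8, 64)
```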

Ingredient Set Aggregator

Stacking all $N$ ingredient embeddings $\{z_{\mathrm{ingr}}^{(i)}\}_{i=1}^{N}$ into a matrix, we apply a standard Transformer encoder:

$$Z_{\mathrm{trans}} = \mathrm{TransformerEncoder}\bigl([\,z_{\mathrm{ingr}}^{(1)}, \dots, z_{\mathrm{ingr}}^{(N)}\,]\bigr) \in \mathbb{R}^{N \times d_{\mathrm{trans}}}.$$

A simple mean-pool over the sequence yields the recipe-level ingredient representation:

$$z_{\mathrm{recipe}} = \frac{1}{N}\sum_{i=1}^{N} Z_{\mathrm{trans}}^{(i)}.$$
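
A possible implementation using PyTorch's `nn.TransformerEncoder`. The layer and head counts are assumptions, and the masked mean-pool (so padded ingredient slots do not dilute the average) is our addition; the text only specifies a standard encoder plus mean-pooling:

```python
import torch
import torch.nn as nn

d_model = 64  # must match the ingredient embedding width
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
aggregator = nn.TransformerEncoder(layer, num_layers=2)

def encode_ingredient_set(z_ingr: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
    # z_ingr: (B, N, d_model) stacked embeddings; pad_mask: (B, N), True = padding
    z_trans = aggregator(z_ingr, src_key_padding_mask=pad_mask)
    valid = (~pad_mask).unsqueeze(-1).float()            # 1 for real ingredients
    return (z_trans * valid).sum(1) / valid.sum(1)       # mean over real slots

z_ingr = torch.randn(8, 12, d_model)              # 8 recipes, up to 12 ingredients
pad_mask = torch.zeros(8, 12, dtype=torch.bool)   # mark padded slots True
z_recipe = encode_ingredient_set(z_ingr, pad_mask)  # (8, 64)
```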

Recipe Encoder

The recipe encoder transforms high-level recipe features (title, overall nutriments, and cooking steps) into embeddings, which are then fused with the ingredient representation.

Title Encoder

For title tokens $\mathbf{x}_{\mathrm{title}} = (w_1, \dots, w_T)$:

$$e_{\mathrm{title}} = \mathrm{BERT}_{[\mathrm{CLS}]}\bigl(\mathbf{x}_{\mathrm{title}}\bigr), \qquad z_{\mathrm{title}} = W_t\,e_{\mathrm{title}} + b_t.$$

Recipe Nutriment MLP

Given recipe-level nutriments $x_{\mathrm{nutr,rec}} \in \mathbb{R}^{d_{\mathrm{nutr,rec}}}$:

$$
\begin{aligned}
h_{\mathrm{rec}} &= \mathrm{ReLU}\bigl(\mathrm{BN}_r(W_r\,x_{\mathrm{nutr,rec}} + b_r)\bigr), \\
z_{\mathrm{nutr,rec}} &= \mathrm{ReLU}(W_r'\,h_{\mathrm{rec}} + b_r').
\end{aligned}
$$

Steps Encoder

For step tokens $\mathbf{x}_{\mathrm{steps}} = (w_1, \dots, w_S)$:

$$e_{\mathrm{steps}} = \mathrm{BERT}_{[\mathrm{CLS}]}\bigl(\mathbf{x}_{\mathrm{steps}}\bigr), \qquad z_{\mathrm{steps}} = W_s\,e_{\mathrm{steps}} + b_s.$$
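
These three recipe-level encoders mirror the per-ingredient blocks, so a sketch can reuse the `NameEnc` class from above; all widths, and the recipe nutriment dimension, are illustrative assumptions (the text does not fix them):

```python
import torch.nn as nn

d_nutr_rec = 9  # assumed; the text leaves this dimension unspecified

title_enc = NameEnc(d_proj=32)  # z_title = W_t e_title + b_t
steps_enc = NameEnc(d_proj=32)  # z_steps = W_s e_steps + b_s
rec_nut_mlp = nn.Sequential(    # Recipe Nutriment MLP
    nn.Linear(d_nutr_rec, 64),  # W_r, b_r
    nn.BatchNorm1d(64),         # BN_r
    nn.ReLU(),
    nn.Linear(64, 32),          # W_r', b_r'
    nn.ReLU(),
)
```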

Final Fusion

We concatenate all high-level vectors:

$$z_{\mathrm{all}} = \bigl[z_{\mathrm{recipe}};\; z_{\mathrm{title}};\; z_{\mathrm{nutr,rec}};\; z_{\mathrm{steps}}\bigr].$$

A final two-layer MLP produces the unified recipe representation:

$$
\begin{aligned}
f_{\mathrm{all},1} &= \mathrm{ReLU}\bigl(\mathrm{BN}_a(W_a\,z_{\mathrm{all}} + b_a)\bigr), \\
f_{\mathrm{all},1}' &= \mathrm{Dropout}(f_{\mathrm{all},1}), \\
z_{\mathrm{final}} &= \mathrm{ReLU}(W_{\mathrm{fin}}\,f_{\mathrm{all},1}' + b_{\mathrm{fin}}).
\end{aligned}
$$

(We write $W_{\mathrm{fin}}, b_{\mathrm{fin}}$ for the second affine layer to avoid colliding with $W_f, b_f$ in the ingredient Fusion MLP.)
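
A sketch of this final head, assuming the illustrative widths used throughout, so the concatenated input is $64 + 32 + 32 + 32 = 160$ dims; the 256- and 128-dim layers are likewise assumptions:

```python
import torch
import torch.nn as nn

final_mlp = nn.Sequential(
    nn.Linear(160, 256),  # W_a, b_a
    nn.BatchNorm1d(256),  # BN_a
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(256, 128),  # W_fin, b_fin
    nn.ReLU(),
)

z_recipe = torch.randn(8, 64)    # from the ingredient set aggregator
z_title = torch.randn(8, 32)     # from the title encoder
z_nutr_rec = torch.randn(8, 32)  # from the recipe nutriment MLP
z_steps = torch.randn(8, 32)     # from the steps encoder

z_all = torch.cat([z_recipe, z_title, z_nutr_rec, z_steps], dim=-1)
z_final = final_mlp(z_all)       # unified recipe representation, (8, 128)
```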

Final Representation:

$$\boxed{\,z_{\mathrm{final}}\,}$$