Ahmed Taha
1 min read · May 26, 2020


I think this [1] N-pair implementation is easier to use.

embeddings_anchor => code_embedding
embeddings_positive => docstring_embedding
labels => [1, 2, 3, ..., N]

where N is the batch_size.

I usually use N-pair loss in a different context, where there is a set of pre-defined classes and each class has a fixed label. That setup probably does not apply to your problem.

Thus, I think setting labels = [1, 2, 3, ..., N] would work. It signals that each code_embedding belongs to a different "class." I doubt the exact values matter; they just need to be distinct.
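The idea above can be sketched in plain NumPy. This is a toy re-implementation, not the TensorFlow code from [1]; the function name, shapes, and variable names are mine. With all labels distinct, the N-pair loss reduces to softmax cross-entropy over the batch similarity matrix, where the correct "class" for row i is column i:

```python
import numpy as np

def npairs_loss_distinct_labels(anchor, positive):
    """N-pair loss for a batch where every example has a distinct label.

    anchor:   (N, D) array, e.g. code embeddings.
    positive: (N, D) array, e.g. docstring embeddings.
    """
    logits = anchor @ positive.T                        # (N, N) similarity matrix
    # Row-wise log-softmax, numerically stabilized.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy picks the diagonal (matching pair) of each row.
    return -np.mean(np.diag(log_probs))

# Toy usage: when each anchor strongly matches only its own positive,
# the loss is near zero; for random embeddings it is much larger.
aligned = np.eye(4) * 10.0
print(npairs_loss_distinct_labels(aligned, aligned))
```

The diagonal-as-target structure is exactly what passing labels = [1, 2, 3, ..., N] achieves in the implementation from [1]: each row's only positive is its own column.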

[1] https://github.com/CongWeilin/cluster-loss-tensorflow/blob/ec20b1022a208a78d35291dda31e6f0522843ae6/metric_learning/metric_loss_ops.py#L245
