Fast Evaluation of S-Boxes With Garbled Circuits

Garbling schemes are vital primitives for privacy-preserving protocols and secure two-party computation. This paper presents a projective garbling scheme that assigns $2^{n}$ values to wires in a circuit comprising XOR and unary projection gates. A generalization of FreeXOR allows the XOR of wires with $2^{n}$ values to be very efficient. We then analyze the performance of our scheme by evaluating substitution-permutation ciphers. Using our proposal, we measure high-speed evaluation of the ciphers with a moderately increased cost in garbling and bandwidth. Theoretical analysis suggests that for evaluating the nine examined ciphers, one can expect a 4- to 70-fold improvement in evaluation performance with, at most, a 4-fold increase in garbling cost and, at most, an 8-fold increase in communication cost compared to the Half-Gates (Zahur, Rosulek and Evans; Eurocrypt’15) and ThreeHalves (Rosulek and Roy; Crypto’21) garbling schemes. In an offline/online setting, such as secure function evaluation as a service, the circuit garbling and communication to the evaluator can proceed in the offline phase. Thus, our scheme offers a fast online phase. Furthermore, we present efficient Boolean circuits for the S-boxes of TWINE and Midori64 ciphers. To our knowledge, our formulas give the smallest number of AND gates for the S-boxes of these two ciphers.


I. INTRODUCTION
Privacy-preserving protocols enable collaborative computation on sensitive data while keeping that data private. Successful implementations in a two-party scenario include privacy-preserving genome analysis [1], email spam filtering [2], image processing [3] and machine learning [4]. The formalization of such two-party computation is called Secure Function Evaluation (SFE). Here, the two parties, Alice and Bob, want to compute a public function f(x, y), where x is the input of Alice and y is the input of Bob, without revealing their inputs to each other. Yao's garbled circuit protocol [5] has become a practical solution for SFE. Moreover, garbling schemes (derived from the original garbled circuit construction) have also been identified as a useful cryptographic primitive. Most previous works focus on projective garbling schemes that assign two values to a wire, 0 and 1, such as the Half-Gates garbling scheme by Zahur et al. [6] or the work by Rosulek and Roy [7].
This paper considers garbling schemes in the offline/online setting. The offline phase performs function-dependent preprocessing. Concretely, the garbler garbles the circuit computing f and transmits the garbled gates to the evaluator but withholds the wire labels for the input layer. Once the input data of the garbler and the evaluator is available, the parties interact to obtain the appropriate wire labels for their respective inputs. Then, the evaluator evaluates the garbled circuit. The offline phase can be performed ahead of time and even batched to allow for optimal use of hardware and bandwidth if multiple function evaluations are expected. Hence, the online time, i.e., the time from having obtained the respective inputs to the evaluated output of the garbled circuit, is essential in this setting. This offline/online setting enables an efficient SFE as a service where the SFE service providers agree on a set of useful functions. The offline phase is run when the system is under low load, and the pre-processing results are stored. This way, the user of the service benefits from improved online times.
COSIC, KU Leuven, Belgium. This work is supported by the Flemish Government through FWO SBO project MOZAIK S003321N.
In this work, we examine a projective garbling scheme that assigns $2^{n}$ values to a wire. As a consequence, each wire in the circuit carries the semantics of an n-bit string. We generalize the encoding of FreeXOR by Kolesnikov and Schneider [8] to obtain a scheme where bitwise-XOR between n-bit strings is free. Our scheme allows fast evaluation of highly non-linear functions with n input bits at a moderate additional garbling and bandwidth cost in the offline phase. We demonstrate this trade-off by implementing several symmetric-key primitives of a certain structure (cf. Sect. VI). Further, the presented scheme may exhibit useful trade-offs for programs with many small, secret table lookups.
An Internet of Things (IoT) to Cloud scenario is a particular application of our garbling scheme. In this context, the focus is on encrypting data at the source, specifically on the IoT devices, employing a Substitution-Permutation Network (SPN) cipher and efficient distributed decryption in the Cloud prior to privacy-preserving computation on the data. This approach would facilitate end-to-end secure data collection and processing. Our proposed garbling scheme aims to balance the security demands of IoT-to-Cloud paradigms and the practical constraints of resource-constrained IoT devices, ultimately contributing to a more resilient and privacy-preserving IoT-to-Cloud ecosystem.
While the new garbling scheme assumes semi-honest adversaries, i.e., neither the garbler nor the evaluator deviates from the protocol, several general approaches exist to make a garbled circuit protocol secure in the presence of active adversaries, which are allowed to deviate arbitrarily from the protocol. Prominent examples are based on cut-and-choose [9], [10], [11], [12], on zero-knowledge proofs [13], [14] and on authenticated garbling [15], [16], [17]. Moreover, semi-honest garbling schemes can be compiled into actively secure three-party protocols in the honest majority setting [18].
1) Technical Overview: The core ideas of the scheme are summarized as follows. We encode an n-bit string with bits $x_1, x_2, \dots, x_n$ into a wire label as
$$W \oplus x_1 R_1 \oplus x_2 R_2 \oplus \dots \oplus x_n R_n,$$
where W is a random label. We call $R_i$ the wire label offsets; they are randomly chosen by the garbler but fixed for all encodings in the circuit (see Definition 1 for details). If $x_i = 1$, $x_i R_i$ is $R_i$; otherwise it is the zero string.
For n = 1, this is the encoding of FreeXOR. We define two types of gates, XOR and projection gates. XOR gates compute the bitwise-XOR of two n-bit strings, require little garbling and evaluation work and are non-interactive, making them practically free. A projection gate computes any n-bit to m-bit function on a wire value by using Yao's garbled table lookup, i.e., encrypting output wire labels using the respective input wire label as key. We apply standard garbled row reduction [19] and point-and-permute techniques [20]. For a projection gate, the garbler's work is $2^{n}$ calls to the encryption primitive, and $2^{n} - 1$ ciphertexts have to be sent to the evaluator. However, the evaluator only makes a single call to the encryption primitive, independent of the "size" n of the projection gate. This makes the scheme attractive in the pre-processed garbled circuit model since any non-linear n-bit functionality is evaluated with one call to the cryptographic primitive.
2) Contributions: We present a projective garbling scheme that assigns n-bit strings to each wire and in which XOR gates are free. The specific encoding of an n-bit string in a label allows seamless integration into existing garbling schemes that assign two values per wire. Following the spirit of modular proofs, we identify necessary properties of the cryptographic primitive that is used to encrypt the truth table. Subsequently, we obtain a generalization of tweakable circular correlation robustness (TCCR, first defined by Choi et al. [21]), which we call n-TCCR, for hash functions (denoted by H).
We apply the garbling scheme to compute a number of selected symmetric-key primitives that follow the SPN architecture. For these, we show a significant improvement in evaluation work in the online phase over the state-of-the-art schemes Half-Gates [6] and ThreeHalves [7] that is traded off with moderate additional garbling work and/or communication cost in the offline phase. Table I shows the estimated evaluation improvement based on calls to H, which is complemented by a practical implementation in Sect. VI that shows that this evaluation improvement translates into practice (see Table VII). We obtain evaluation times for, e.g., AES as low as 0.016 ms. Furthermore, to facilitate implementation, we give Boolean circuits for the S-boxes of TWINE [22] and Midori64 [23], the latter of which is also used in MANTIS [24] and CRAFT [25], using only AND and XOR gates. To the best of our knowledge, our Boolean circuits give the smallest number of AND gates for these two ciphers, namely, 6 AND gates for TWINE's 4-bit S-box and 4 AND gates for the Midori64 $\mathsf{Sb}_0$ S-box. Details can be found in Appendix A.

II. RELATED WORK
Recent improvements on Yao's garbled circuit protocol in the passive security setting focus on lowering bandwidth requirements, e.g., [30], [31]. In the line of work [19], [20], [8] leading to the state-of-the-art schemes Half-Gates [6] and ThreeHalves [7], AND gates require only 2κ bits and ≈ 1.5κ bits of communication, respectively, where κ is a security parameter, while XOR gates are free. Recently, Acharya et al. [32] proposed an approach to garbling where the garbled gate is no longer composed of ciphertexts from individual rows in the truth table, focusing only on binary gates.
While computation with binary values is mainly expressed in Boolean circuits with binary gates, gates with more than two inputs or more than one output have been studied as well. Dessouky et al. [33] define those gates as lookup tables and show how they can be evaluated in the passive security case in the Goldreich-Micali-Wigderson protocol [34]. Damgård et al. [35], [36] design a table lookup for two-party secure computation, and Keller et al. [37] extend it to the multiparty case based on secret-sharing. The basis for the aforementioned constructions is the one-time truth-table protocol OTTT by Ishai et al. [38]. Table II compares the (estimated) communication cost of these approaches for a 4-bit and 8-bit lookup table, respectively. AES as a function has been studied explicitly by Durak and Guajardo [39]; SKINNY and Photon were studied by Abidin et al. [40]. However, both works are in the arithmetic setting.
In the garbled circuit domain, Fairplay [41] and TASTY [42] already compute larger gates. Huang et al. [43] focus on an 8-bit to 8-bit AES S-box gate. But unlike our scheme, these works consider multiple wires instead of multiple values per wire. They also do not provide any security proof for the larger gates. Computing a gate with multiple input wires [...] [44], [45]. It is unclear how these constructions extend to the multi-input case in the context of garbled circuits. Our scheme uses a cryptographic primitive with fixed-length input, enabling the use of AES-NI instructions. An overview of garbling and evaluation work, circuit size and the hash function construction for a generic n-to-m-bit gate is given in Table III. Heath and Kolesnikov [46] construct a garbling gadget that computes a one-hot outer product of two bit-vectors, which can be used to select one entry from a truth table based on an index known by the evaluator. However, their approach doesn't generalize to secret access to arbitrary truth tables.

Table III: Comparison of multi-input, multi-output gates in garbling schemes. We note the cost for an n-to-m-bit gate. Columns: Scheme, Garbling Work, Circuit Size, Evaluation Work (κ-bit strings).
Since the work of Ball et al. [47] is conceptually very close to ours, we discuss it in detail in Sect. III-A. The main difference is that their scheme uses arithmetic circuits, where addition modulo an integer is free, while our proposal sticks to a bit representation where XOR is free.

III. BACKGROUND
We start with the arithmetic circuit scheme by Ball et al. [47] in Sect. III-A and detail the security model by Bellare, Hoang and Rogaway (BHR) [48] in Sect. III-B. Table IV lists the notation used throughout the paper.

A. Garbled Circuits for Bounded Integers
Ball et al. [47] propose a scheme based on garbled circuits that assigns integers $x \in \mathbb{Z}_m$ to each wire in the circuit. In this representation, addition (in $\mathbb{Z}_m$) is free in the same sense as FreeXOR. We briefly describe their scheme, as ours is similar but represents n-bit strings per wire instead of numbers in $\mathbb{Z}_m$.
The wire encoding of $x \in \mathbb{Z}_m$ is
$$W^{x} = W^{0} + x \odot \Delta_m.$$
Addition is component-wise in the ring $\mathbb{Z}_m$. Here, ⊙ denotes scalar multiplication. For each m, $\Delta_m$ is a secret, random vector known by the garbler.
The scheme mainly offers two types of gates, addition and unary projection. For addition of wires a and b with output wire c, let $W_a^{0}, W_b^{0}$ be the two input wire labels of zero; then the garbler computes $W_c^{0} = W_a^{0} + W_b^{0}$ as the output zero label. The evaluator, given $W_a^{x}$ and $W_b^{y}$ for evaluation, computes
$$W_c^{x+y} = W_a^{x} + W_b^{y} = W_c^{0} + (x + y) \odot \Delta_m.$$
Addition incurs neither transmitted ciphertexts nor invocations of the encryption primitive. Let $\phi : \mathbb{Z}_n \to \mathbb{Z}_m$ be an arbitrary function. The projection gate $\mathrm{Proj}_\phi$ computes the operation $\phi(x)$, $x \in \mathbb{Z}_n$, $\phi(x) \in \mathbb{Z}_m$. Let G be the garbled table; then the garbler fills G for every $x \in \mathbb{Z}_n$ as follows:
$$G[x + r] = H(W_a^{x}) + W_c^{0} + \phi(x) \odot \Delta_m,$$
where r is the secret cyclic shift offset. We can reduce the number of ciphertexts per projection gate to n − 1 by applying garbled row reduction. The zero label is obtained from the encryption above when r = −x: $W_c^{0} = -H(W_a^{-r}) - \phi(-r) \odot \Delta_m$. This, analogous to the binary case, fixes the ciphertext of the first garbled row to $0^{\lambda_m}$. Again, one element of the label can be used as a pointer and replace the shift r during garbling if $\Delta_n$ is chosen appropriately. With this, the evaluator only has to decrypt the ciphertext the pointer indicates.
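The free addition described above can be checked with a few lines of Python. This is a minimal sketch under assumptions: labels are modeled as short lists over $\mathbb{Z}_m$, and the helper names (`rand_vec`, `add`, `scale`, `encode`), the modulus and the label length are chosen for illustration only and are not taken from Ball et al.

```python
import secrets

def rand_vec(m, length):
    """Random vector over Z_m (stands in for a label or for Delta_m)."""
    return [secrets.randbelow(m) for _ in range(length)]

def add(u, v, m):
    """Component-wise addition in Z_m."""
    return [(a + b) % m for a, b in zip(u, v)]

def scale(x, delta, m):
    """Scalar multiplication x (.) Delta_m."""
    return [(x * d) % m for d in delta]

def encode(w0, x, delta, m):
    """Wire encoding W^x = W^0 + x (.) Delta_m."""
    return add(w0, scale(x, delta, m), m)

m, length = 5, 8                      # illustrative modulus and label length
delta = rand_vec(m, length)           # garbler's secret offset for modulus m
w0_a, w0_b = rand_vec(m, length), rand_vec(m, length)
w0_c = add(w0_a, w0_b, m)             # garbler: zero label of the output wire

x, y = 3, 4
# Evaluator: adds the two active labels, no ciphertext and no hash call needed.
w_c = add(encode(w0_a, x, delta, m), encode(w0_b, y, delta, m), m)
assert w_c == encode(w0_c, (x + y) % m, delta, m)
```

The evaluator's result is a valid encoding of x + y mod m under the output zero label, which is why addition needs no garbled table.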

B. Security Model by Bellare, Hoang and Rogaway
Bellare, Hoang and Rogaway [48] define a security model for garbling schemes that formalizes the principle of circuit garbling as a cryptographic primitive. Many recent garbling schemes were proven secure in their model, e.g., [6], [49], [50], [7]. As we will use the same model, we give a brief overview.
A garbling scheme is a tuple of Garble, Encode, Eval and Decode algorithms:
• Garble: Transforms the input circuit f into the tuple (GC, e, d), where GC is the garbled circuit, e is the input encoding information (e.g., all semantic labels for input wires) and d is the decoding information.
• Encode: Encodes a given input x using the semantic labels e and returns a garbled input X, e.g., the input label with semantic x.
• Eval: Evaluates the garbled circuit GC using the input wire labels $\{W_i\}_{i \in \mathrm{Inputs}}$ and returns the output wire labels $\{W_i\}_{i \in \mathrm{Outputs}}$.
• Decode: Decodes the output wire labels $\{W_i\}_{i \in \mathrm{Outputs}}$ using the decoding information d and returns the plaintext output $y \in \{0,1\}^{m}$ or ⊥ if the output wire labels are invalid.
The garbling scheme must produce correct circuit evaluations for any circuit f and inputs $x \in \{0,1\}^{n}$. Let GC, e, d be the outputs of Garble(f), and $X_i$ the output of Encode($x_i$, e) for i ∈ Inputs; then
$$\mathrm{Decode}\big(\mathrm{Eval}(GC, \{X_i\}_{i \in \mathrm{Inputs}}), d\big) = f(x),$$
where f(x) denotes the circuit evaluation in the clear.
Bellare et al. define two notions of secrecy. In the privacy notion, given (GC, X, d), a party cannot learn any information besides what is revealed from the final output y and the side-information function Φ. In our case, $\Phi = \Phi_{\text{topo*}}$, where only the circuit topology and the XOR gates are revealed but the function computed by projection gates remains hidden from the evaluator. The privacy property can be achieved by giving a simulator S for the Garble function that only receives the output y and Φ. In the code-based game in Fig. 1, the garbling scheme is prv.sim secure if for every polynomial-time adversary A there is a polynomial-time simulator S such that Adv(prv.sim), the advantage of A in that game, is negligible. Intuitively, if the output of the simulator is indistinguishable from the output of Garble and Encode on a circuit and input chosen by the adversary, the scheme is prv.sim secure. In the notion of obliviousness, the adversary does not learn the decoding function. So given (GC, X), a party cannot learn any information besides the side-information Φ. The advantage is defined analogously to Adv(prv.sim).

IV. THE SCHEME
In Sect. IV-A, we first describe the notation for a circuit comprising XOR gates and projection gates. Then, we detail how the garbler encodes n-bit strings and transforms them into wire labels. Next, in Sect. IV-B, we show how XOR gates are garbled and evaluated, followed by a description of how projection gates are garbled and evaluated. Section IV-C describes higher-level gadgets that can be obtained from the aforementioned gates. In Sect. IV-D, all concepts are pieced together to describe the garbling, evaluation and decoding function. We also describe how input is handled. The complete garbling scheme Π is given in Fig. 2. We start with some general notations. Let $\mathrm{lsb}_n(W)$ be the n least significant bits of the bit-vector $W \in \{0,1\}^{k}$. With k we denote the wire label length. We use a hash function $H : \{0,1\}^{k} \times \{0,1\}^{\tau} \to \{0,1\}^{k}$ that accepts a k-bit input, a τ-bit tweak and outputs k bits. Further properties of H are presented in Sect. V-A.

A. Circuit Definition
We define a circuit with p-bit input and q gates. The function computed by the circuit is denoted by f. Let the wire index be 1, ..., p, p + 1, ..., p + q, where the input wires have index 1, ..., p and the output wire of the i-th gate has index p + i. We denote the set of input wire indices as Inputs, and the set of output wire indices as Outputs. We associate a bit-length ℓ(i) to each wire i. Let $\bar{n}$ denote the maximum bit-length of wires used in f; then we use bit strings of length $k = \kappa + \bar{n}$ as wire labels. Let Gates be a topologically sorted list of gates $G_1, \dots, G_q$. We distinguish two types of gates: XOR and projection gates. XOR gates accept two wires of the same bit-length n as input and output a wire with bit-length n. The unary projection gate accepts one n-bit wire and outputs one m-bit wire.
Definition 1 (Wire Label Offsets). For each bit-length n ($1 \le n \le \bar{n}$) that is used in f, a wire label offset is a bit-vector of length $k = \kappa + \bar{n}$ with κ random bits and $\bar{n}$ fixed bits. The garbler draws the matrix $M \in \{0,1\}^{\kappa \times n}$ at random and appends fixed bits to each column-vector to form
$$R_n = \begin{pmatrix} M \\ 0 \\ I_n \end{pmatrix} \in \{0,1\}^{k \times n},$$
where $I_n \in \{0,1\}^{n \times n}$ is the identity matrix and 0 denotes $\bar{n} - n$ rows of zero padding.
The column vector $R_i$ in $R_n$ is the i-th wire label offset. We denote the distribution from which $R_n$ is drawn by $\mathcal{R}_n$. The matrix $R_n$ is used throughout the whole circuit for all wires of bit-length n. We use the last n bits of the label to fix distinct values to allow point-and-permute. The inner product $x \bullet R_n$ is defined as
$$x \bullet R_n = \bigoplus_{i=1}^{n} x_i R_i.$$
Definition 2 (Wire Label Encoding). The encoding $W_i^{x}$ of an n-bit string $x \in \{0,1\}^{n}$ on a wire with index i is defined as
$$W_i^{x} = W_i^{0^n} \oplus x \bullet R_n.$$
Note that this yields a unique encoding for all x and R even if the columns of the random part M are linearly dependent, because the lower n bits of $x \bullet R_n$ are always unique due to $I_n$.
Intuitively, there are n distinct offsets R, one for each encoded bit. The offset applied to a wire label that encodes x is the linear combination of R values.
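To make Definitions 1 and 2 concrete, the following Python sketch models labels as integers whose low n bits are the pointer bits. It is an illustration under assumptions, not the authors' implementation: it takes the special case where n equals the maximum bit-length (so no padding is needed), fixes κ = 128, and uses hypothetical helper names (`draw_offsets`, `dot`, `encode`, `lsb`).

```python
import secrets

KAPPA = 128  # security parameter (assumed value for this sketch)

def draw_offsets(n):
    """R_1..R_n as integers: KAPPA random bits on top, the i-th unit vector below."""
    return [(secrets.randbits(KAPPA) << n) | (1 << (n - 1 - i)) for i in range(n)]

def dot(x, offsets):
    """x . R_n: XOR of the offsets R_i for which bit x_i is set (x_1 is the top bit)."""
    n, acc = len(offsets), 0
    for i, r in enumerate(offsets):
        if (x >> (n - 1 - i)) & 1:
            acc ^= r
    return acc

def encode(w0, x, offsets):
    """Wire label encoding W^x = W^{0^n} XOR (x . R_n)."""
    return w0 ^ dot(x, offsets)

def lsb(w, n):
    """The n least significant (pointer) bits of a label."""
    return w & ((1 << n) - 1)

n = 4
R = draw_offsets(n)
w0_a = secrets.randbits(KAPPA + n)    # zero label of wire a
w0_b = secrets.randbits(KAPPA + n)    # zero label of wire b

# The identity part makes the low n bits of x . R_n equal to x itself.
assert all(lsb(dot(x, R), n) == x for x in range(1 << n))
# XOR of two labels encodes the XOR of the values under the XORed zero labels.
assert encode(w0_a, 0b1010, R) ^ encode(w0_b, 0b0110, R) == encode(w0_a ^ w0_b, 0b1100, R)
```

The first check is the uniqueness argument from Definition 2; the second is the linearity that makes XOR gates free.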

B. Gates
For an XOR gate with n-bit input wires a and b, and output wire c, the garbler generates the output wire label
$$W_c^{0^n} = W_a^{0^n} \oplus W_b^{0^n}.$$
To evaluate an XOR gate, let $W_a$ and $W_b$ be the wire labels that the evaluator obtained as input labels for the XOR gate. The output label is then computed as
$$W_c = W_a \oplus W_b.$$
A projection gate $\mathrm{Proj}_\phi$ computes the unary projection $\phi : \{0,1\}^{n} \to \{0,1\}^{m}$, an n-to-m-bit function. Let a be the input wire index to the projection gate and c be the index of the output wire. The garbler first draws the output wire label for $0^m$ at random, $W_c^{0^m} \leftarrow_\$ \{0,1\}^{k}$, and then generates $2^{n}$ ciphertexts, one for each $x \in \{0,1\}^{n}$, and stores the result in the garbled table at the position indicated by the pointer bits, i.e.,
$$GC[c, \mathrm{lsb}_n(W_a^{x})] = H(W_a^{x}, c) \oplus W_c^{\phi(x)}.$$
We apply the row-reduction technique and reduce the number of ciphertexts that need to be sent by one. Let a be the input wire index to the projection gate $\mathrm{Proj}_\phi$ and c be the index of the output wire. Then, instead of drawing it at random, the garbler chooses the output wire label for $0^m$ as
$$W_c^{0^m} = H(W_a^{x}, c) \oplus \phi(x) \bullet R_m \quad \text{with } x = \mathrm{lsb}_n(W_a^{0^n}),$$
and computes the remaining ciphertexts as described above.
Since the first ciphertext (where $x = \mathrm{lsb}_n(W_a^{0^n})$) is always $0^{k}$, it does not need to be sent. The number of rows sent to the evaluator is therefore $2^{n} - 1$.
For evaluation, let $W_a$ be the wire label that the evaluator obtained as input to the projection gate. The output label $W_c$ is computed by
$$W_c = H(W_a, c) \oplus GC[c, \mathrm{lsb}_n(W_a)],$$
where the position of the ciphertext to evaluate is indicated by the pointer bits of the input wire label. The first ciphertext is set to zero: $GC[c, 0^n] = 0^{k}$.
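The garbling and evaluation of a single projection gate with row reduction can be sketched in Python as follows. Several choices here are assumptions made for illustration and are not taken from the paper: truncated SHA-256 stands in for the hash function H (it is not claimed to be n-TCCR secure), the tweak is an arbitrary gate identifier, κ = 128, the maximum bit-length is 8, and the 4-bit table `SBOX` is just an example permutation.

```python
import hashlib
import secrets

KAPPA = 128            # security parameter (assumed value)
NBAR = 8               # maximum wire bit-length (assumed value)
K = KAPPA + NBAR       # label length k

def H(label, tweak):
    """Stand-in for the tweakable hash (assumption): SHA-256 truncated to K bits."""
    data = label.to_bytes((K + 7) // 8, "big") + tweak.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - K)

def draw_offsets(n):
    """Offsets for bit-length n: random top bits, unit vectors in the low n bits."""
    return [(secrets.randbits(KAPPA) << NBAR) | (1 << (n - 1 - i)) for i in range(n)]

def dot(x, offsets):
    """x . R: XOR of the offsets selected by the bits of x (top bit first)."""
    n, acc = len(offsets), 0
    for i, r in enumerate(offsets):
        if (x >> (n - 1 - i)) & 1:
            acc ^= r
    return acc

def garble_projection(phi, w0_a, r_in, r_out, tweak):
    """Return the output zero label and the 2^n - 1 ciphertexts that are sent."""
    n = len(r_in)
    mask = (1 << n) - 1
    x_star = w0_a & mask                       # value whose label has pointer 0^n
    w0_c = H(w0_a ^ dot(x_star, r_in), tweak) ^ dot(phi(x_star), r_out)
    table = {}
    for x in range(1 << n):
        if x == x_star:
            continue                           # all-zero row, not transmitted
        w_x = w0_a ^ dot(x, r_in)
        table[w_x & mask] = H(w_x, tweak) ^ w0_c ^ dot(phi(x), r_out)
    return w0_c, table

def eval_projection(w_a, n, table, tweak):
    """Single hash call; the pointer bits select the row (row 0^n is implicit zero)."""
    return H(w_a, tweak) ^ table.get(w_a & ((1 << n) - 1), 0)

# Example 4-bit permutation, used purely as test data.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
R4 = draw_offsets(4)
w0_a = secrets.randbits(K)
w0_c, table = garble_projection(lambda v: SBOX[v], w0_a, R4, R4, tweak=1)

x = 0x7
w_c = eval_projection(w0_a ^ dot(x, R4), 4, table, tweak=1)
assert w_c == w0_c ^ dot(SBOX[x], R4)          # label of SBOX[x] on the output wire
```

In this sketch the garbler performs $2^{n}$ hash calls and stores $2^{n} - 1$ rows, while the evaluator performs one hash call, matching the costs stated above.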

C. Circuit Constructions
Below, we give useful gadgets comprised of XOR and projection gates.
Wire Composition. We can compose an n-bit wire a with an m-bit wire b, resulting in an (n + m)-bit wire c. The composition construction computes the functionality
$$c = \mathrm{Proj}_{s}(a) \oplus \mathrm{Proj}_{s'}(b),$$
where $s : \{0,1\}^{n} \to \{0,1\}^{n+m}$ is defined as $s(x) = x \| 0^{m}$ and $s' : \{0,1\}^{m} \to \{0,1\}^{n+m}$ is given as $s'(y) = 0^{n} \| y$. Wire composition costs $2^{n} + 2^{m}$ ciphertexts to garble and two ciphertexts to evaluate. Note that the construction is not limited to two arguments.
It is efficient to compose many wires together at once instead of cascading or using a tree-based approach. E.g., to compose four 1-bit wires a, b, c, d, we may use
$$\mathrm{Proj}_{s_1}(a) \oplus \mathrm{Proj}_{s_2}(b) \oplus \mathrm{Proj}_{s_3}(c) \oplus \mathrm{Proj}_{s_4}(d),$$
where $s_j : \{0,1\} \to \{0,1\}^{4}$ places its input bit at position j and sets the remaining bits to zero.
Constants. In the garbling scheme, we can encode public constants or constants known only to the garbler at no cost. Let $x \in \{0,1\}^{n}$ be the constant for the n-bit wire a; then the garbler chooses $W_a^{0^n} \leftarrow x \bullet R_n$. This fixes the label $W_a^{x}$ to $0^{k}$. No ciphertext is sent to the evaluator. Likewise, the evaluator uses $W_a = 0^{k}$ for further evaluation.
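The composition gadget and its cost can be illustrated on plain values. The sketch below is hypothetical helper code: `s`, `s_prime` and `compose` mirror the functions s and s' from the text, and the two cost functions simply apply the stated cost of $2^{n} + 2^{m}$ garbling operations per pairwise composition (generalized to a sum of $2^{w}$ over all composed widths w) to compare composing at once with a balanced pairwise tree.

```python
def s(x, m):
    """s(x) = x || 0^m: move x into the upper part of the (n+m)-bit string."""
    return x << m

def s_prime(y):
    """s'(y) = 0^n || y: keep y in the lower part."""
    return y

def compose(x, y, m):
    """Plain-value view of Proj_s(a) XOR Proj_s'(b), which equals x || y."""
    return s(x, m) ^ s_prime(y)

def garble_cost_at_once(widths):
    """One projection per input wire: 2^w garbling operations each."""
    return sum(2 ** w for w in widths)

def garble_cost_pairwise_tree(widths):
    """Compose neighbours pairwise, level by level, until one wire remains."""
    cost, level = 0, list(widths)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            cost += 2 ** level[i] + 2 ** level[i + 1]
            nxt.append(level[i] + level[i + 1])
        if len(level) % 2:
            nxt.append(level[-1])      # odd wire out is carried to the next level
        level = nxt
    return cost

assert compose(0b101, 0b01, m=2) == 0b10101
assert garble_cost_at_once([1, 1, 1, 1]) == 8
assert garble_cost_pairwise_tree([1, 1, 1, 1]) == 16
```

For four 1-bit wires, composing at once needs half the garbling work of the pairwise tree under this cost model, and four instead of six decryptions during evaluation.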

D. Garbling Scheme
We now describe the complete garbling scheme (see Fig. 2).
Garble. The garbler chooses the matrices of offset values, one for each bit-length used in the circuit (see Definition 1). For each input bit i, a wire label $W_i^{0}$ is chosen uniformly at random. The garbling process applies the operations for projection and XOR gates as described in Sect. IV-B gate-by-gate in topological order. In the end, the garbling routine outputs the ciphertexts, input wire values, offsets and decoding information.
Encoding and Oblivious Transfer. The garbler encodes their own input by picking the respective wire label. In Yao's protocol, the evaluator obtains the appropriate wire labels that correspond to its input via oblivious transfer (OT) [51]. Using OT extensions [38], [52] speeds this up in practice. To obtain the correct label for an n-bit wire, one could simply perform a 1-out-of-$2^{n}$ OT. Naor and Pinkas [53] show how to reduce this to n 1-out-of-2 OTs by introducing additional pseudorandom function (PRF) evaluations. However, using the FreeXOR property of our scheme, we can instead perform only n 1-out-of-2 OTs (as in a garbling scheme with 2 wire labels). For each input bit $b_i$ at position i, the sender offers the pair
$$\left(W_i^{0^n},\; W_i^{0^n} \oplus R_i\right),$$
where $R_i$ is the i-th column vector in $R_n$. To obtain the wire label for the n-bit wire, we XOR the obtained labels together at no additional cost. Note that $W_i^{0^n}$ is a fresh random wire label for each bit i of the input.
Evaluation and Decoding. Once the evaluator obtains the garbled inputs, it computes the garbled output of each gate accordingly (see Sect. IV-B). Having computed the garbled output, the evaluator may either share the wire labels with the garbler or directly use the decoding information $d_i = \mathrm{lsb}_n(W_i^{0^n})$ for output wire i ∈ Outputs in the decoding function to obtain the output bits in the clear. Let us briefly look at why this decoding scheme is correct. Let i be an output wire. Since we fixed $\mathrm{lsb}_n(y \bullet R_n) = y \bullet I_n = y$ by construction of the offset values, for any value $y \in \{0,1\}^{n}$, we have
$$\mathrm{lsb}_n(W_i^{y}) \oplus d_i = \mathrm{lsb}_n(W_i^{0^n} \oplus y \bullet R_n) \oplus \mathrm{lsb}_n(W_i^{0^n}) = \mathrm{lsb}_n(y \bullet R_n) = y.$$
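The input step with n 1-out-of-2 OTs and the pointer-bit decoding can be sketched as follows. This is an illustration under assumptions: the OT itself is simulated by a local selection (no actual OT protocol is run), κ = 128, n equals the maximum bit-length, and the helper names (`draw_offsets`, `ot_pairs`, `receive`, `decode`) are hypothetical.

```python
import secrets

KAPPA = 128  # security parameter (assumed value)

def draw_offsets(n):
    """R_1..R_n as integers: random top bits, the i-th unit vector in the low n bits."""
    return [(secrets.randbits(KAPPA) << n) | (1 << (n - 1 - i)) for i in range(n)]

def ot_pairs(offsets):
    """Garbler: a fresh zero label per input bit and the pair offered in each OT."""
    n = len(offsets)
    zero_labels = [secrets.randbits(KAPPA + n) for _ in range(n)]
    pairs = [(w, w ^ r) for w, r in zip(zero_labels, offsets)]
    return zero_labels, pairs

def receive(bits, pairs):
    """Evaluator: select one label per bit (OT simulated locally) and XOR them."""
    label = 0
    for b, pair in zip(bits, pairs):
        label ^= pair[b]
    return label

def decode(label, d, n):
    """Decoding with d = lsb_n(zero label): pointer bits XOR d give the value."""
    return (label & ((1 << n) - 1)) ^ d

n = 4
R = draw_offsets(n)
zero_labels, pairs = ot_pairs(R)

w0 = 0
for w in zero_labels:
    w0 ^= w                              # zero label of the combined n-bit wire
d = w0 & ((1 << n) - 1)                  # decoding information for this wire

bits = [1, 0, 1, 1]                      # evaluator's input, x_1 first
label = receive(bits, pairs)
assert decode(label, d, n) == 0b1011     # the label encodes the 4-bit string 1011
```

The XOR of the n received labels is a valid label of the n-bit input under the zero label `w0`, so no additional ciphertexts are needed to form the n-bit wire.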

V. SECURITY
Using the BHR security model (see Sect. III-B), we show that if a hash function satisfies the properties of n-TCCR security defined in Sect. V-A below, our scheme is prv.sim (Sect. V-B) and obv.sim (Sect. V-C) secure. We sketch how to achieve authenticity in Sect. V-D.

A. (n-)TCCR Security
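We revisit the tweakable circular correlation robustness (TCCR) definition by Guo et al. [44] adapted to our notation.
Definition 3 (TCCR Security [44]). A TCCR (tweakable circular correlation robust) hash function H is a function $\{0,1\}^{k} \times \{0,1\}^{\tau} \to \{0,1\}^{k}$ that accepts a message m and a tweak t. In the TCCR security game, the distinguisher $D_{\mathrm{TCCR}}$ is given one of the two oracles with signature $\{0,1\}^{k} \times \{0,1\}^{\tau} \times \{0,1\} \to \{0,1\}^{k}$,
$$\mathcal{O}_R(m, t, b) = H(m \oplus R, t) \oplus b \cdot R \quad \text{or} \quad \mathrm{Rand}(m, t, b),$$
where Rand is a random function, with the goal to decide which is the oracle given to it. The distinguisher doesn't know the secret value $R \in \{0,1\}^{k}$, $R \leftarrow_\$ \mathcal{R}_{\mathrm{TCCR}}$, and is only allowed to make legal queries. An illegal query is (m, t, 1 − b) if (m, t, b) has been queried before. We define the advantage as
$$\mathrm{Adv}_{\mathcal{R}}(D_{\mathrm{TCCR}}) = \left| \Pr\left[D^{\mathcal{O}_R} = 1\right] - \Pr\left[D^{\mathrm{Rand}} = 1\right] \right|,$$
where $D^{O}$ signifies that the distinguisher has access to oracle O. We call H TCCR secure if $\mathrm{Adv}_{\mathcal{R}}(D_{\mathrm{TCCR}})$ is negligible in the security parameter κ.
Note that the advantage of $D_{\mathrm{TCCR}}$ depends on the distribution of the secret value R. Next, we define n-TCCR security, a generalized TCCR notion incorporating n secret offsets.

Definition 4 (n-TCCR Security). An n-TCCR hash function H is a function $\{0,1\}^{k} \times \{0,1\}^{\tau} \to \{0,1\}^{k}$ that accepts a message m and a tweak t. In the n-TCCR security game, the distinguisher $D_{n\text{-TCCR}}$ is given one of the two oracles with signature $\{0,1\}^{k} \times \{0,1\}^{\tau} \times \{0,1\}^{n} \times \{0,1\}^{n} \to \{0,1\}^{k}$,
$$\mathcal{O}_{R_n}(m, t, a, b) = H(m \oplus a \bullet R_n, t) \oplus b \bullet R_n \quad \text{or} \quad \mathrm{Rand}(m, t, a, b),$$
where Rand is a random function, with the goal to decide which is the oracle given to it. We interpret $a, b \in \{0,1\}^{n}$ as binary vectors; $a \bullet R_n$ is the linear combination of offsets defined by a. The distinguisher doesn't know the secret value $R_n$ and is only allowed to make legal queries. An illegal query is a = 0, or (m, t, a, b′) if (m, t, a, b) has been queried before for b ≠ b′. We define the advantage as
$$\mathrm{Adv}_{\mathcal{R}_n}(D_{n\text{-TCCR}}) = \left| \Pr\left[D^{\mathcal{O}_{R_n}} = 1\right] - \Pr\left[D^{\mathrm{Rand}} = 1\right] \right|.$$
We call H n-TCCR secure if $\mathrm{Adv}_{\mathcal{R}_n}(D_{n\text{-TCCR}})$ is negligible in κ.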

B. Privacy
The prv.sim definition states that given the garbled circuit GC, all the labels of the garbled input X and the decoding information d, no information is revealed about the input except what can be deduced from the output y.
Proof. We define a simulator S (see Fig. 3) and show through a series of hybrids that the output of S is indistinguishable for an adversary from the output of Garble. We require n ≪ κ, i.e., the largest bit length n used in a wire in the circuit is small compared to the security parameter κ, to ensure that for any adversarially chosen circuit, both the garbling scheme and the simulator run in polynomial time. When evaluating a garbled circuit, let the assignment of active labels to the wires be called the active path, i.e., for input wires, the active labels are retrieved via OT; for gate outputs, the active labels are retrieved by decrypting the row denoted by the point-and-permute bits.
The idea of the simulator is to produce a garbled circuit with a fixed active path. The simulator chooses the wire labels such that
• the garbled input X that is handed to the adversary corresponds to $0^{p}$;
• the active label on each gate's output wire that the adversary obtains if they choose to evaluate the circuit with X is $W^{0^n}$ (see Line 10 in Fig. 3).
The simulator adapts the decoding information such that if the garbled output is $W^{0^n}$, the expected output y is decoded.
S ≈ $G_1$. Hybrid $G_1$ (see Fig. 4) describes the simulator from the perspective of the evaluator. Let x be the input that the adversary chooses in the game. We view x as a black box as it is unknown. Suppose we evaluated the circuit on x in plaintext. We denote by $v_i$ the active value on wire i. Instead of fixing the active path on labels $W^{0^n}$, we fix it on $W^{v_i}$. The output values GC, d and the outputs of S are identically distributed, as $W^{0^n}$ and $W^{v_i}$ are both distributed uniformly at random. Further, the change of input arguments does not change the distribution, since all inputs $(x, 0^{m})$ for $x \in \{0,1\}^{n} \setminus \{0^{n}\}$ and $(v_a \oplus x, \phi(v_a \oplus x))$ for $x \in \{0,1\}^{n} \setminus \{v_a\}$, respectively, are unique and therefore amount to fresh randomness from the oracle, irrespective of ϕ.
[...] the n-TCCR secure function H (see Definition 4). In hybrid $G_3$ (see Fig. 5), we no longer compute the wire values $v_i$ explicitly from the black-box input x. We fix an encoding for $v_i$, namely $v_i = 0^{n}$. For the input wires, note that $x_i = v_i$ by definition of EVALWIRES, so $X_i \leftarrow W_i^{x_i}$ instead of $W_i^{v_i}$. Further, the ciphertext indexing $GC[i, \cdot]$ (Line 14 in Fig. 5) is identical after the re-write. In $G_2$, [...] $\oplus\, x$ by definition of $R_n$. In the output of all gates $G_i$, we now maintain the invariant [...] with $x \in \{0,1\}^{n}$.
And for the decoding information, first note that for i ∈ [...]. The decoding information in $G_2$ and $G_3$ yields correct results when used with the respective garbled inputs. $d_{G_2}$ and $d_{G_3}$ are both uniformly distributed, as $\mathrm{lsb}_n([...])$ are distributed at random. So $d_{G_2}$ and $d_{G_3}$ remain indistinguishable. We conclude the proof by noting that $G_3$ and Garble yield identical outputs in the prv.sim game. This can easily be seen: when the exceptional case for $x = 0^{n}$ (Line 21 in Fig. 5) in the projection gates part is incorporated into the loop and the computation of d is re-written, $G_3$ is a description of the Garble function.

C. Obliviousness
The notion of obv.sim expresses that the adversary cannot learn any information given the garbled circuit GC and all input wire labels X. Unlike the privacy notion, the adversary does not have access to the decoding information d.
Proof. Let $S_{\mathrm{auth}}$ be S from Fig. 3 with lines 13-14 removed. Then we note that the computation of GC and X doesn't depend on y, neither in $S_{\mathrm{auth}}$ nor in any of the hybrids $G_1$, $G_2$, $G_3$. We can thus use the same reasoning as for prv.sim security, omitting parts that correspond to y or d.

D. Authenticity
Authenticity states that an adversary cannot forge wire labels that are not obtained through evaluating the garbled circuit. Clearly, the presented scheme does not satisfy this property, as any wire label is decoded to output bits. If authenticity is desired, we modify the decoding information d to list hashes of all output wire labels and associations to their semantic meaning. As in [48], the decoding function checks if the presented wire label is indeed in the list d.

VI. EVALUATION OF SPN PRIMITIVES
In the following, we discuss how SPN primitives with a specific structure can be implemented with our new garbling scheme and how this improves over the state-of-the-art. Note that we don't intend to compare the performance of the primitives among each other in MPC protocols. Instead, we focus on how each primitive can be accelerated. Consequently, we will not consider other traditional or MPC-friendly primitives. We compare against the state-of-the-art garbling schemes Half-Gates by Zahur, Rosulek and Evans [6], which we abbreviate ZRE15, as well as the work of Rosulek and Roy [7], abbreviated RR21. Both schemes support free XOR gates and AND gates on wires holding one bit.
In SPN-based primitives, a state is updated with a round function consisting of a substitution layer, a permutation layer, and a round constant and/or (round) key addition layer. SPNs are commonly used to construct block ciphers and pseudo-random permutations used, e.g., in hash or MAC functions.
We show an efficient circuit representation with projection gates for primitives that satisfy the following conditions for state and round function parts.
• State.The state is (conceptually) split into n-bit cells.
• Substitution Layer.The substitution layer consists of Sboxes that are applied to each cell.• Permutation Layer.The permutation layer can be described by a permutation on the cells and/or by a mixing matrix which encodes a fixed matrix multiplication with the state.In this paper, we focus on primitives with a binary mixing matrix.• Round Constant/(Round) Key Addition Layer.The round constant or (round) key is XORed cell-wise.With this structure, we set n = n and implement a single cell as n-bit wire.Each S-box in the substitution layer is replaced with an n-bit projection gate computing the same functionality.The permutation layer and the addition layer are expressible with XOR gates only.
We identified nine SPN primitives in the literature that fulfill the conditions. Since the studied primitives have at most 8-bit cells, we set n = 8.

A. Implementation Details
For the implementation with projection gates, we assume that the input block is already set up in n-bit wires where n denotes the cell size in bits. This does not incur additional cost since the input phase using OT can already share wire labels with the desired wire label offset, as detailed in Sect. IV-D. For the implementation using AND gates, only the S-box costs AND gates in the data path of the primitive. We selected implementations for the S-boxes with the lowest number of AND gates since their cost dominates in Half-Gates and RR21. Table V details the number of projection and AND gates for each primitive.
For the 4-bit S-box used in TWINE-80 and TWINE-128 and for the 4-bit S-box used in Midori64, MANTIS and CRAFT, we found new circuits using the smallest number of AND gates so far, reducing the number of AND gates for the TWINE S-box from 7 to 6 and for the Midori64 S-box from 8 to 4. We used the heuristic optimization tool LIGHTER by Jean et al. [54] operating on a customized cost metric; for more details, see Appendix A.
For the key, we assume individual key bits to be available in 1-bit wires as this eases key scheduling in many cases. Note that the cost to transform the (round) key bits into n-bit wires is taken into account. In scenarios where one party knows the complete key, e.g., to offer blind symmetric encryption or decryption where the encryption or decryption is performed without learning the message and ciphertext, the key schedule does not need to be computed within the garbling scheme. Instead, if the garbler knows the key, they can compute the key schedule separately and insert the round keys as secret constants. Similarly, if the evaluator knows the key, they may receive the wire labels for round keys via OT instead.
If the key is shared among the players using a linear secret-sharing scheme, for instance as $k = k_G \oplus k_E$ where $k_G$ is the garbler's share and $k_E$ is the evaluator's share, each player can compute the key schedule on their own share outside of the garbling scheme for ciphers with a linear key schedule, e.g., for Piccolo, Midori, SKINNY, MANTIS and CRAFT. The resulting round key shares can then be treated as input and are recombined using only linear operations, saving any gates specified in the key schedule column for the cipher. However, the gate counts presented here compute the entire key schedule of the primitive, which is required in the distributed encryption/decryption scenario.
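The following minimal sketch illustrates why a linear key schedule can be evaluated on each share separately. The schedule toy_round_keys is hypothetical and does not correspond to any of the named ciphers; it only uses rotations and XOR, so it is linear over GF(2).

```python
MASK64 = (1 << 64) - 1

def rotl64(x, r):
    """Rotate a 64-bit word left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64

def toy_round_keys(key, rounds=4):
    """Hypothetical GF(2)-linear key schedule built from rotations and XOR."""
    return [rotl64(key, 16 * i) ^ rotl64(key, 16 * i + 1) for i in range(rounds)]

# XOR shares of the key held by garbler (k_G) and evaluator (k_E).
k_G = 0x0123456789ABCDEF
k_E = 0xFEDCBA9876543210 ^ 0x00FF00FF00FF00FF
k = k_G ^ k_E

# Each player expands their own share locally ...
rk_G = toy_round_keys(k_G)
rk_E = toy_round_keys(k_E)
# ... and the round keys are recombined with XOR only.
recombined = [a ^ b for a, b in zip(rk_G, rk_E)]
```

If a schedule additionally XORs round constants, it is affine rather than linear, and the constants would have to be added by only one of the two players in such a sketch.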

B. Performance
The gate counts from Table V can be turned into calls to H and sent ciphertexts. In ZRE15, each AND gate costs 4 calls to H for garbling, 2 ciphertexts are sent, and 2 calls to H for evaluation. In RR21, each AND gate costs 6 calls to H for garbling, 1.5 ciphertexts are sent, and 3 calls to H for evaluation.
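As a small worked example of this conversion, the sketch below applies the per-gate costs stated above to the AES-128 data path, using the figures from Appendix B (16 S-boxes per round, 10 rounds, 32 AND gates per S-box). The names COSTS and and_gate_cost are chosen for this illustration only.

```python
# Per-AND-gate costs as stated in the text.
COSTS = {
    "ZRE15": {"garble_H": 4, "ciphertexts": 2.0, "eval_H": 2},
    "RR21": {"garble_H": 6, "ciphertexts": 1.5, "eval_H": 3},
}

def and_gate_cost(scheme, and_gates):
    """Total calls to H and sent ciphertexts for a given number of AND gates."""
    return {metric: per_gate * and_gates
            for metric, per_gate in COSTS[scheme].items()}

# AES-128 data path: 16 S-boxes per round, 10 rounds, 32 AND gates per S-box.
aes_data_path_ands = 16 * 10 * 32

zre15 = and_gate_cost("ZRE15", aes_data_path_ands)
rr21 = and_gate_cost("RR21", aes_data_path_ands)
```

The key schedule can be added in the same way by increasing the AND gate count before calling and_gate_cost.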
Table VI lists all studied primitives with the corresponding trade-off in garbling and communication cost, and evaluation improvement measured in the number of calls to H and in the number of ciphertexts, respectively. We found three primitives in five configurations in total where our scheme improves both garbling and evaluation cost over both reference garbling schemes. In the remaining primitives and cases, projection gates trade off higher garbling and communication cost for faster evaluation performance. Note that for most primitives, the evaluation improvement is much higher than the additional communication cost. E.g., for Midori64, at a cost of slightly more garbling work (≈ 6% more) and less than twice the number of sent ciphertexts, we improve the evaluation work by a factor of five. We detail the implementation approach with projection gates for the ciphers in Appendix B.
Table V: Detailed gate counts for setup, key schedule and data path of the selected symmetric primitives. The top entry denotes the number of AND gates while the bottom entry denotes the number of projection gates.
Table VI: Estimated performance difference for selected symmetric ciphers. The notation ×x denotes an improvement by factor x in the category with respect to the base scheme, i.e., x > 1 is an improvement, x < 1 is a degradation.
Next, we experimentally compared the performance of four primitives in nine configurations in ZRE15, RR21 and our scheme. RR21 has been implemented by Hamacher et al. [56] in the MOTION framework while ZRE15 and our scheme have been implemented in MP-SPDZ [57]. Table VII lists the garbling and evaluation time, and the circuit size. We achieve a considerable speed-up in evaluation time of, e.g., factor 20 to 45 for AES. In general, the expected trade-off of faster evaluation and larger circuit size is evident for all implemented ciphers. However, we observed differences in garbling and evaluation time between ZRE15 and RR21 executions of the same circuit which cannot be explained by the differing number of hash function calls. We believe these differences are due to the implementations in the two MPC frameworks, which have differing overhead.
Besides oblivious computation of SPN primitives, statements where a prover proves knowledge of a key k for a pair x, y such that $\mathrm{AES}_k(x) = y$ are highly relevant. Garbling schemes have been used to construct efficient interactive zero-knowledge protocols that prove statements over "unstructured" languages expressible in Boolean circuits [58], [59]. Using our garbling scheme, proving statements involving SPN primitives would be much faster since proving equates to evaluating the garbled circuit. This is traded off against a larger proof size.

VII. CONCLUSION
We presented a garbling scheme that encodes n-bit strings per wire. It generalizes the idea of FreeXOR and integrates seamlessly into state-of-the-art schemes with FreeXOR on the 1-bit wire level. Projection gates can be used to convert strings from n- to m-bit or to compute arbitrary n- to m-bit functions, while XOR is free. We prove the scheme secure under the assumption of an n-TCCR secure hash function, a generalization of TCCR security.
For an important application in two-party secure function evaluation, the evaluation of symmetric primitives, we show that substitution-permutation network primitives with certain structure can be efficiently implemented in our scheme. Compared to AND gate-based circuits, we show a high-speed evaluation that is traded off with moderate additional garbling or communication cost. In scenarios where the garbling scheme runs in an offline/online setting, we shift the garbling work and garbled circuit transfer to the evaluator into the pre-processing phase and thus obtain a high-speed online phase. We obtained a considerable performance improvement, a 4- to 72-times faster online phase, for nine primitives in the literature when taking hash function calls as a metric. Implementation of some ciphers shows that this evaluation performance improvement translates into practical applications.

APPENDIX A FORMULAS FOR S-BOXES OF TWINE AND MIDORI64
We use the heuristic optimization tool LIGHTER by Jean et al. [54] operating on a customized cost metric. We restrict the tool to use only NOT, AND and XOR gates with the associated costs of 0.01, 1 and 0.01, respectively. These costs describe our setting where NOT and XOR gates are practically free, i.e., very low cost, and AND gates are expensive, i.e., high cost. The tool then searches for an implementation with low total cost following a heuristic. This approach reduces the number of AND gates for the TWINE S-box from 7 AND gates (algebraic normal form) to 6 AND gates (see Fig. 6a). For the Midori64 S-box, the number of AND gates is reduced from 8 AND gates (formula given in the specification [23]) to 4 AND gates (see Fig. 6b).
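The effect of this cost metric can be illustrated with a short calculation. The sketch below does not call LIGHTER; it only evaluates the stated gate costs on two hypothetical gate counts (the XOR and NOT counts are made up for the example) to show that the AND count dominates the total.

```python
# Gate costs as stated in the text.
GATE_COST = {"NOT": 0.01, "AND": 1.0, "XOR": 0.01}

def circuit_cost(gate_counts):
    """Total cost of a circuit given its number of gates per type."""
    return sum(GATE_COST[gate] * count for gate, count in gate_counts.items())

# Hypothetical candidate circuits for the same 4-bit S-box.
candidate_a = {"AND": 6, "XOR": 20, "NOT": 2}  # fewer ANDs, more linear gates
candidate_b = {"AND": 7, "XOR": 5, "NOT": 0}   # more ANDs, fewer linear gates

cost_a = circuit_cost(candidate_a)
cost_b = circuit_cost(candidate_b)
```

Under these costs, one AND gate weighs as much as 100 XOR or NOT gates, so a search guided by this metric prefers candidate_a even though it has more gates overall.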

APPENDIX B IMPLEMENTATION OF SPN PRIMITIVES
In the following, we give a more detailed explanation of the implementation from Tables V and VI for each primitive.
(a) The 4-bit S-box of TWINE [22] computed using 6 AND gates.
(b) The 4-bit S-box Sb0 of Midori64 computed using 4 AND gates.
Figure 6: Implementation formulas for the TWINE and Midori64 S-boxes. The input bits are $x_0$ through $x_3$, the output bits are $x'_0$ through $x'_3$.

A. AES
The key schedule of AES-128 applies 4 S-boxes per round to the state. All remaining key schedule operations can be expressed using XOR gates. The AES S-box can be computed with 32 AND gates, as described by Boyar and Peralta [60]. In the data path, 16 S-boxes are applied per round. The ShiftRows, MixColumns and AddRoundKey steps can be expressed with XOR gates. AES-128 defines 10 rounds.
For an implementation using projection gates, we first compose the key into 8-bit wires. Then, the key schedule can be computed by replacing the S-box with a single 8-bit projection gate computing the same functionality. For the data path, we replace S-boxes with 8-bit projection gates. The mixing step in AES cannot be described with a binary matrix alone, but we re-write the MixColumns step using only XORs of the cells and their doubled values, where $s_0, \ldots, s_{15}$ are the 8-bit cells of the state and $f(s) = 2s$ computes the finite field doubling in GF($2^8$) defined for AES. Therefore, we compute a round of AES with 2 · 16 8-bit projection gates. This yields a correct result, since $s \oplus f(s) = 3s$ in GF($2^8$).
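The identity behind this re-writing can be checked in plaintext. The sketch below uses the standard AES MixColumns coefficients (2, 3, 1, 1) on a single column and expresses every product through the doubling f and XOR; the function names and the column-wise indexing are choices made for this illustration and need not match the paper's cell numbering.

```python
def xtime(s):
    """f(s) = 2s in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    s <<= 1
    return (s ^ 0x11B) if s & 0x100 else s

def gf_mul(a, b):
    """Reference shift-and-add multiplication in the AES field."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def mix_column(col):
    """MixColumns on one column using only f (one lookup per cell) and XOR."""
    a0, a1, a2, a3 = col
    d0, d1, d2, d3 = (xtime(a) for a in col)  # doubled cells
    return [
        d0 ^ (a1 ^ d1) ^ a2 ^ a3,  # 2*a0 + 3*a1 +   a2 +   a3
        a0 ^ d1 ^ (a2 ^ d2) ^ a3,  #   a0 + 2*a1 + 3*a2 +   a3
        a0 ^ a1 ^ d2 ^ (a3 ^ d3),  #   a0 +   a1 + 2*a2 + 3*a3
        (a0 ^ d0) ^ a1 ^ a2 ^ d3,  # 3*a0 +   a1 +   a2 + 2*a3
    ]
```

Each cell needs one doubling lookup in addition to its S-box lookup, which matches the count of two projection gates per cell and round.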

B. CRAFT
The key and tweak bits are first composed into 4-bit wires. The remaining key schedule is linear w.r.t. 4-bit wires.
The data path is linear except for the 16 S-boxes that are applied in each of the 30 rounds. CRAFT uses the Midori Sb0 S-box which can be computed with 4 AND gates (see Fig. 6b), or one 4-bit projection gate.

C. Fides
The internal state of Fides is a 4 × 8 grid of 5-bit and 6-bit cells for Fides-80 and Fides-96, respectively. We can compute the 5-bit S-box with 10 AND gates (see Fig. 7), or one 5-bit projection gate. The 6-bit S-box may be computed with 34 AND gates by expressing each output bit in algebraic normal form. This approach does not aim to optimise the number of AND gates used. However, we count common terms from different output bits only once since they can be shared as intermediate results. In our garbling scheme, the S-box is expressed in one 6-bit projection gate.

D. MANTIS
The key $k = k_0 \| k_1$ is expanded as defined in [24]. Afterwards, we compose the required 4-bit wires for the expanded key, costing 192 1-bit projection gates. MANTIS uses the Midori Sb0 S-box, which can be computed with 4 AND gates (see Fig. 6b), or one 4-bit projection gate.

E. Midori64
The key bits are first composed into 4-bit wires. The key schedule can then be computed using XOR gates between the 4-bit wires.
In the data path, all steps except for the S-box can be computed with XOR gates alone. The 4-bit S-box Sb0 can be computed with 4 AND gates (see Fig. 6b), or one 4-bit projection gate.

F. Piccolo
The key schedule for Piccolo-80 and Piccolo-128 can be computed using only XOR gates after the key bits are composed into 4-bit wires.
Piccolo's data path applies the 16-bit function F two times per round to half of the state. This function F is composed of a parallel application of four 4-bit S-boxes, followed by a mixing matrix multiplication, followed by another parallel application of four 4-bit S-boxes.
The mixing matrix encodes multiplications with elements in the finite field GF($2^4$) with the irreducible polynomial $x^4 + x + 1$. Clearly, Piccolo does not have the property of a binary mixing matrix. However, we can still provide an implementation with projection gates at additional cost.
We re-write the function F as a function $F'$ expressed in terms of f and g, where $f(s) = S(s)$ and $g(s) = 2S(s)$. Subsequently, we compute f, g and the remaining S-box layer S via 4-bit projection gates. Therefore, $F'$ can be computed with 4 + 4 + 4 = 12 4-bit projection gates. This re-writing is correct because $f(s) \oplus g(s) = 3S(s)$ w.r.t. GF($2^4$).
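The two lookup tables f and g can be sketched as follows. The S-box used here is a placeholder permutation and not Piccolo's S-box, and the helper names are assumptions for this example; only the field GF(2^4) with the polynomial x^4 + x + 1 is taken from the text.

```python
def dbl4(s):
    """Doubling in GF(2^4) with the irreducible polynomial x^4 + x + 1."""
    s <<= 1
    return (s ^ 0x13) if s & 0x10 else s

def gf16_mul(a, b):
    """Reference shift-and-add multiplication in GF(2^4)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = dbl4(a)
        b >>= 1
    return result

def projection_tables(sbox):
    """Tables for f(s) = S(s) and g(s) = 2*S(s), one 4-bit projection gate each."""
    f = list(sbox)
    g = [dbl4(v) for v in sbox]
    return f, g

# Placeholder 4-bit permutation standing in for the S-box (NOT Piccolo's S-box).
PLACEHOLDER_SBOX = [(7 * x + 3) % 16 for x in range(16)]
f, g = projection_tables(PLACEHOLDER_SBOX)
```

Because the identity s ⊕ 2s = 3s holds for every field element, the check f[x] ^ g[x] == 3·S(x) is independent of the concrete S-box table.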

G. SKINNY
The SKINNY cipher family comprises three tweakey sizes, 64, 128 and 192 bit, of which we include the 128-bit size here. The key schedule for SKINNY-64-128 also includes the application of a linear feedback shift register (LFSR) to 8 cells per round. This LFSR is implemented with a 4-bit projection gate.
The SKINNY data path contains 16 4-bit S-boxes per round. Each S-box is implemented with 4 AND gates using the formula from [24], or one 4-bit projection gate.

H. TWINE
The key bits are first composed into 4-bit wires. The key schedule is linear except for 2 and 3 S-box computations per round for TWINE-80 and TWINE-128, respectively. In total, the key schedule comprises 35 rounds with S-box computation for both TWINE-80 and TWINE-128.
The data path is the same for TWINE-80 and TWINE-128 and contains 8 S-boxes per round in 36 rounds.The S-box can be computed with 6 AND gates (see Fig. 6a), or one 4-bit projection gate.

I. WAGE
The internal state of the WAGE permutation is represented as 37 7-bit cells. We load the initial state by computing the 7-bit wire composition for all bits.
We write $s_i$ to denote the i-th 7-bit cell and $s'_i$ to denote the updated i-th 7-bit cell. The internal state is updated 111 times. In the update procedure, the 7-bit functions WGP, Dbl and SB denote a Welch-Gong permutation, finite field doubling and a lightweight 7-bit S-box. All three are implemented using a 7-bit projection gate. Further, $rc_0$ is a round-dependent constant.

Figure 1: For every circuit f and input x of the adversary's choice, the respective game function is called and the adversary outputs a choice b′ given (GC, X, d) (resp. (GC, X)). The adversary wins if b = b′.

Figure 2: The new garbling scheme Π comprises a garbling, evaluation, encoding and decoding function.

Figure 4: Hybrid $G_1$. The simulator from the perspective of the evaluator where x is a black box value. Values in a box, such as $v_i$, highlight the difference between S and $G_1$.

Figure 5: Hybrid $G_3$. We fix the encoding of $W_i^{v_i}$ to $W_i^{0^n}$. Values in a box, such as $0^n$, highlight the difference between $G_2$ and $G_3$.

Table I: Evaluation work improvement for selected symmetric primitives over ZRE15 [6]. The garbling and communication trade-off is listed in Table VI.

Table II: Comparison of pre-processed lookup table (LUT) approaches in MPC protocols. The depth of the circuit is denoted by d. Total communication is denoted in kilobytes.

Table VII: Performance benchmark results for some SPN ciphers comparing garbling and evaluation time as well as the circuit size. All reported numbers are amortized over 500 (for SKINNY-128-*) or 1000 parallel primitive calls and averaged over 10 repetitions.