#ret #hw
1 Cryptography
This is a bit of a weirder one..
Done prior:
- Try turning on two-factor authentication, if you have that option.
- Think about how secure your password is, relative to how attackers would try guessing. Is it a dictionary or relatively common word? Is it all lower case letters, or do you also include uppercase letters, numbers, and/or symbols? Think about how large a search space is needed to find your password. Do some
- Generate a public/private key pair for yourself. Put the public key on our test laptop so that you can SSH into your account without typing a password. You could also use this key for your Github account, if you have one.
1.1 Creating a custom hash function
1.1.1 Requirements
What are the requirements for a hash function?
- Source
- No preimage: given \(y\), it should not be feasible to find \(x\) such that \(h(x) = y\).
- No second preimage: given \(x_1\), it should not be feasible to find \(x_2\) (distinct from \(x_1\)) such that \(h(x_1) = h(x_2)\).
- No collision: it should not be feasible to find any \(x_1\) and \(x_2\) (distinct from each other) such that \(h(x_1) = h(x_2)\).
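For a quick point of comparison (this uses a standard library hash, not the custom construction below), Python's hashlib shows the kind of behavior these requirements demand: a tiny input change gives an unrelated digest, and the only generic way back to the input is brute-force search.

```python
# Illustration with a standard cryptographic hash (SHA-256 via hashlib):
# a one-character change in the input yields a completely unrelated digest.
import hashlib

print(hashlib.sha256(b"password1").hexdigest())
print(hashlib.sha256(b"password2").hexdigest())  # differs in one byte of input
```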
What if we just.. use a neural network?
1.1.2 NN hash
Intuitively, using a neural network as a hash function seems like it won't work. In fact, I believe it doesn't work; I just don't know why it doesn't work. Of course, a vanilla neural network won't work because we can just train a model to reverse its mapping. To solve this problem, we can use something I call permute layers.
- Permute Layers
Concept -- Fundamentally, the concept of permute layers is to take an input and do some operation on it such that a non-continuous output space is generated. The goal would be that:
  - Similar inputs lead to vastly different outputs.
  - There is no guarantee that an adversarial NN's guess of \(x\) is closer to \(x+\epsilon_1\) than it is to \(x+\frac{1}{\epsilon_2}\) - as in, we can't train a NN to reverse it!
Implementation -- One possible implementation of these permute layers would be simply permuting the bits that make up our tensors.
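To make the bit-permutation idea concrete, here is a minimal standalone sketch (separate from the full PermuteLayer implementation below; the helper names float_to_bits / bits_to_float are just for this example) that permutes the trailing bits of a single float64 while protecting the leading sign/exponent bits:

```python
# Minimal sketch: apply a fixed permutation to the last 56 bits of a float64,
# keeping the first 8 bits (sign + high exponent) so we avoid infs/NaNs.
import struct
import numpy as np

def float_to_bits(v):
    return '{:064b}'.format(struct.unpack(">Q", struct.pack(">d", float(v)))[0])

def bits_to_float(b):
    return struct.unpack(">d", int(b, 2).to_bytes(8, "big"))[0]

rng = np.random.default_rng(0)
perm = rng.permutation(64 - 8)  # one fixed permutation of the 56 trailing bits

def permute_float(v):
    b = float_to_bits(v)
    head, tail = b[:8], b[8:]                  # protect sign/exponent bits
    shuffled = ''.join(tail[i] for i in perm)  # scatter mantissa (and low exponent) bits
    return bits_to_float(head + shuffled)

# Two inputs that differ by only 0.01 land at unrelated values after permutation.
print(1.00, "->", permute_float(1.00))
print(1.01, "->", permute_float(1.01))
```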
1.1.3 Proof of concept
The code below is meant as a proof of concept – or rather, a demonstration of concept. It is messy, unoptimized, and surely error-ridden, but it seems to work.
Outline:
- Create a deep neural network with randomly initialized weights.
- Ensure that it is deterministic with a set seed. This seed could potentially be carried with the hash.
- Add permute layers throughout the model.
  - Store the first 8 bits of each float to protect the sign and exponent (so we don't get infs or zeros), then concatenate and permute the rest.
  - Reform floats from the permuted bits, and convert back into a tensor.
- Do some post-processing to clean the output.
  - Convert each float in the output tensor to bits, then take the last half of each, as this is the part that is most shuffled.
  - Convert to base 10, then to 16.
- Profit.
The code can also be found here.
```python
#####################
#       SETUP       #
#####################
# imports
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
import struct
from codecs import decode
import numpy as np
import string

INP = [0.1, 0.0, 1.01]  # our input!
SAFE = 8  # don't permute the first 8 bits, so we don't get infs and 0

# makes things deterministic
np.random.seed(0)
torch.manual_seed(0)
torch.set_default_dtype(torch.float64)  # make sure we use the right datatype!

#######################
#       BASE NN       #
#######################
class Net(nn.Module):  # define the model
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(3, 128)         # linear layer, with input size 3
        self.pl1 = PermuteLayer(128, 256)   # custom permute layer
        self.l2 = nn.Linear(256, 512)
        self.pl2 = PermuteLayer(512, 512)
        self.l3 = nn.Linear(512, 256)
        self.pl3 = PermuteLayer(256, 256)
        self.l4 = nn.Linear(256, 128)
        self.pl4 = PermuteLayer(128, 8)     # output a tensor with 8 floats

    def forward(self, x):  # run it through!
        x = [100*(y+1) for y in x]  # add 1 and multiply by 100 for each input element
        x = torch.tensor(x)         # then convert it to a tensor
        x = self.l1(x)              # run it through the layers
        x = x.view(-1, 128)
        x = self.pl1(x)
        x = self.l2(x)
        x = self.pl2(x)
        x = self.l3(x)
        x = self.pl3(x)
        x = self.l4(x)
        x = self.pl4(x)
        return x

####################################
#       CUSTOM PERMUTE LAYER       #
####################################
class PermuteLayer(nn.Module):
    # not my code! default linear code comes from https://auro-227.medium.com/writing-a-custom-layer-in-pytorch-14ab6ac94b77
    # after modification, acts as a normal linear layer except it permutes the bits.
    def __init__(self, size_in, size_out):
        super().__init__()
        self.size_in, self.size_out = size_in, size_out
        weights = torch.Tensor(size_out, size_in)
        self.weights = nn.Parameter(weights)  # nn.Parameter is a Tensor that's a module parameter.
        bias = torch.Tensor(size_out)
        self.bias = nn.Parameter(bias)

        # initialize weights and biases
        nn.init.kaiming_uniform_(self.weights, a=math.sqrt(5))  # weight init
        fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.weights)
        bound = 1 / math.sqrt(fan_in)
        nn.init.uniform_(self.bias, -bound, bound)  # bias init

    def forward(self, x):  # where the permuting happens
        # this part isn't pretty..
        # but according to Dr. Brian Dean, we don't need to constant factor optimize!
        bits = ""   # store bits in a char array
        saved = []  # save the bits we want to protect
        for i, v in enumerate(x[0]):       # loop through the floats
            tnsr = float_to_bin(v)         # convert them to binary
            saved.append(tnsr[:SAFE])      # save what we need to
            bits += tnsr[SAFE:]            # and add to the char array
        p = np.random.permutation([x for x in bits])  # permute it!
        p = ''.join(map(str, p))                      # and then.. join it back together
        converted = []
        # loop through p, chunk it into segments
        for i in range(len(p)//(64-SAFE)):
            # convert segment to floats
            item = bin_to_float(saved[i]+p[(64-SAFE)*i:((64-SAFE)*i)+(64-SAFE)])
            converted.append(item)
        converted = torch.tensor([converted])  # change it back to a tensor
        x = converted
        w_times_x = torch.mm(x, self.weights.t())  # matrix multiply them
        return torch.add(w_times_x, self.bias)     # w times x + b

#########################
#        HELPERS        #
#########################
# not my code! from https://stackoverflow.com/questions/16444726/binary-representation-of-float-in-python-bits-not-hex
def bin_to_float(b):
    """ Convert binary string to a float. """
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64.
    return struct.unpack('>d', bf)[0]

def int_to_bytes(n, length):  # Helper function
    """ Int/long to byte string.
        Python 3.2+ has a built-in int.to_bytes() method that could be used
        instead, but the following works in earlier versions including 2.x. """
    return decode('%%0%dx' % (length << 1) % n, 'hex')[-length:]

def float_to_bin(value):  # For testing.
    """ Convert float to 64-bit binary string. """
    [d] = struct.unpack(">Q", struct.pack(">d", value))
    return '{:064b}'.format(d)

def int2base(x, base):
    # not my code! modified from https://stackoverflow.com/questions/2267362/how-to-convert-an-integer-to-a-string-in-any-base
    if x < 0:
        sign = -1
    elif x == 0:
        return digs[0]
    else:
        sign = 1
    x *= sign
    digits = []
    while x:
        digits.append(digs[x % base])
        x = x // base
    if sign < 0:
        digits.append('-')
    digits.reverse()
    return ''.join(digits)

digs = string.digits + string.ascii_letters

########################
#        OUTPUT        #
########################
model = Net()
result = list(model(INP).detach().numpy()[0])  # convert output to list
output_bits = ''
for i in result:
    # convert to bits, then take the second half
    # because it's more shuffled
    output_bits += float_to_bin(i)[32:]
print(int2base(int(output_bits, 2), 16))  # clean the output up and print it out
```
1.1.4 Results
Running the hash function with a few example inputs gives us:
| # | Input | Output |
|---|---|---|
| 1 | 1.0, 0.0, 0.0 | 7707b1d82074095cd7ec672e372dd54002594a60762ddab49954fd7536983af3 |
| 2 | 1.0, 0.1, 0.0 | 489014b1b6a164c4410abb09d38fa2c974dda663853a870d8da2e7bbe1276561 |
| 3 | 1.0, 0.01, 0.0 | ba1c5e18233cfd68ef14fa7d77cf47fbaeca8182bcc6688fdfce32b0feab6e69 |
| 4 | 1.0, 0.01, 1000.0 | 593c5d758312de157e7bff4733227ddb95033a364724e8e7492a355ca32b56e0 |
| # | Raw output (model output before bit post-processing) |
|---|---|
| 1 | 6898.310410004538, -27752.31448079027, 5368.972044730368, 15157.626683930517, 2565.062517919854, 3530.8778547601196, -1500.8912099151323, 12012.064166096481 |
| 2 | 10800.338151941398, 7383.0027867194185, 3447.3852618554415, 27197.54416266343, -16078.800441461795, 15308.535315814917, -10308.394947398328, 21108.498117302897 |
| 3 | 6385.139558575604, -11019.165137885793, -2616.5486990505, -4771.384640650819, -27374.854418398354, -22029.83964691363, 32460.107410001277, 39128.374837604184 |
| 4 | 28033.42732152106, 27429.429875103833, 19619.10146999292, -20498.237496016412, 28754.94659500775, 20997.47309229607, 11754.59598281484, 13.915845011749695 |
Cosine similarity of these outputs was calculated with:
similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
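Wrapped up as a self-contained helper (cosine_similarity is just a local name for this post, not something from the code above), the calculation looks like:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two raw-output vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# e.g. cosine_similarity(raw_output_1, raw_output_2), using rows of the raw-output table above
```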
The sorted cosine similarities of the outputs above are as follows:
A | B | Similarity |
---|---|---|
1 | 4 | -0.29788 |
3 | 4 | -0.24353 |
2 | 4 | -0.09071 |
2 | 3 | 0.157713 |
1 | 3 | 0.240493 |
1 | 2 | 0.372453 |
This table demonstrates that different inputs lead to drastically different outputs, regardless of how similar the inputs themselves are.
For comparison, the cosine similarities of the inputs themselves show that inputs 1-3 are nearly identical to one another, while input 4 is far from the others:

| A | B | Similarity |
|---|---|---|
| 2 | 4 | 0.000996 |
| 1 | 4 | 0.000999 |
| 3 | 4 | 0.001000 |
| 1 | 2 | 0.995037 |
| 2 | 3 | 0.995982 |
| 1 | 3 | 0.999950 |
| 1 | 1 | 1.0 |
| 2 | 2 | 1.000000 |
Further analysis is required: the collision rate and other properties have not yet been determined, because no large-scale tests have been done. However, this method of hash function seems potentially viable.
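As a rough idea of what such a test might look like, here is a hypothetical sketch. It assumes the whole pipeline above (model, permute layers, and post-processing) is wrapped into a single hash_input(inp) function that maps a 3-float input to its hex digest, with the seeds reset on each call so the hash stays deterministic; that wrapper is not part of the code above.

```python
# Hypothetical small-scale collision check: hash many random 3-float inputs
# and report any digest that appears more than once.
import numpy as np
from collections import Counter

def collision_test(hash_input, n=10_000, seed=1):
    rng = np.random.default_rng(seed)
    digests = [hash_input(list(rng.uniform(-1000.0, 1000.0, size=3)))
               for _ in range(n)]
    counts = Counter(digests)
    return {d: c for d, c in counts.items() if c > 1}  # repeated digests, if any
```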