pbctf 2021

Alkaloid Stream [134 pts]

I found a weird stream cipher scheme. Can you break this?


pbctf{super_duper_easy_brute_forcing_actually_this_one_was_made_by_mistake}


tl;dr

For each input bit, a key number and fake number is generated. A random keystream is generated and a corresponding public list, such that

public[i] = [fake[i], key[i]] if keystream[i] else [key[i], fake[i]]

The input is XORed with the random keystream to generate the ciphertext and this is output together with public. But fake was generated by XORing values from key, and based on this, we can guess which values from public are fake. The others must be from key, allowing us to recover the keystream and decrypt the flag.


Introduction

If you already understand the script file, you can safely skip to next section.

We are given the challenge files gen.py and output.txt. Let us first take a look at the script gen.py, inspecting the main functionality:

from flag import flag
flag = bytes_to_bits(flag)

key = keygen(len(flag))
keystream, public = gen_keystream(key)
assert keystream == recover_keystream(key, public)
enc = bits_to_bytes(xor(flag, keystream))

print(enc.hex())
print(public)

The script converts the flag to bits, generates a key of the same length, and generates some keystream and public value. Next, it performs a check to ensure the generated keystream matches what you would get from recover_keystream(key, public). Finally, it encrypts the flag by XORing with the keystream and then outputs the ciphertext as hex and the public value, whatever that is. The contents of output.txt matches this output and seems to be the encrypted flag followed by the corresponding public value.

Let’s look deeper into each step, starting with key = keygen(len(flag)):

def keygen(ln):
    # Generate a linearly independent key
    arr = [ 1 << i for i in range(ln) ]
    for i in range(ln):
        for j in range(i):
            if random.getrandbits(1):
                arr[j] ^= arr[i]
    for i in range(ln):
        for j in range(i):
            if random.getrandbits(1):
                arr[ln - 1 - j] ^= arr[ln - 1 - i]
    return arr

The function creates a list of ln powers of 2 (one per flag bit), so [1, 2, 4, 8, ...], and then runs through two loops. The first iterates over the list from the beginning and XORs each number with some of the previous (determined by random.getrandbits(1)). The second basically does the same but from behind and in reverse order. The end result is a list of ln seemingly random numbers, which is the key.

This is used in the next step, keystream, public = gen_keystream(key):

def gen_keystream(key):
    ln = len(key)
    
    # Generate some fake values based on the given key...
    fake = [0] * ln
    for i in range(ln):
        for j in range(ln // 3):
            if i + j + 1 >= ln:
                break
            fake[i] ^= key[i + j + 1]

    # Generate the keystream
    res = []
    for i in range(ln):
        t = random.getrandbits(1)
        if t:
            res.append((t, [fake[i], key[i]]))
        else:
            res.append((t, [key[i], fake[i]]))

    # Shuffle!
    random.shuffle(res)

    keystream = [v[0] for v in res]
    public = [v[1] for v in res]
    return keystream, public

This function first generates the list fake by XORing some numbers from the actual key together and then uses this as a sort of fake key. It then generates ln random bits which ends up being the keystream. It populates public with both a key value and fake value for each keystream bit. If the bit is a 1, it inserts [fake[i], key[i]], and else [key[i], fake[i]] - so the bit determines the order. This means public is a list of ln elements, where each element is a list of two numbers - a key value and fake value – but without the key or keystream, there should be no way of knowing which is which. As an example, if we have

key = [12, 34, 56, 78]
fake = [98, 76, 54, 32]  # Just example values

and the random keystream ends up being [0, 1, 1, 0], then we would have

public = [
    [12, 98],  # key, fake
    [76, 34],  # fake, key
    [54, 56],  # fake, key
    [78, 32]   # key, fake
]

The keystream tells us which values from public are key values and fake values - a 0 indicates the first is a key value and a 1 that the second is. At the end, the function returns keystream and public.

We can recover the keystream from key and public. This is what recover_keystream(key, public) does:

def recover_keystream(key, public):
    st = set(key)
    keystream = []
    for v0, v1 in public:
        if v0 in st:
            keystream.append(0)
        elif v1 in st:
            keystream.append(1)
        else:
            assert False, "Failed to recover the keystream"
    return keystream

This just goes through each public element and if the first value is somewhere in the key, it adds a 0 to the keystream, else a 1. The recovered keystream will only be correct if fake doesn’t end up containing any value that is also in key. This is the reason why the script runs

keystream, public = gen_keystream(key)
assert keystream == recover_keystream(key, public)

So after generating the keystream and public, it makes sure the same keystream would be recovered from the key and public before using it for encryption. If everything goes well, the encryption can finally be performed:

enc = bits_to_bytes(xor(flag, keystream))

This is a simple XOR of the flag and keystream.

Vulnerability

Since the encryption is just an XOR with a randomly generated keystream, we don’t get much info from the encrypted flag itself. We need to find the keystream, and since we only have the public list, we must somehow be able to guess the stream from that. Said differently, if this scheme is vulnerable, it must be because we can somehow guess which values in public are key values and which are fake.

Looking at the keygen() method, there doesn’t seem to be an obvious way to guess which of the values in public it could have generated. Instead, we look at how the fake values are generated from the key. This direct dependence and lack of inserted randomness could perhaps lead us to guess which values are from fake – and therefore which are from key. The fake list is generated as follows:

fake = [0] * ln
for i in range(ln):
    for j in range(ln // 3):
        if i + j + 1 >= ln:
            break
        fake[i] ^= key[i + j + 1]

The outer loop iterates over each position of fake. For each index i, it XORs into fake[i] the next ln // 3 values from key, starting from key[i + 1]. As an example, consider the encryption of a single byte, so ln = 8 and ln // 3 = 2. In this case, every value in fake is just the XOR of the two next values from key, so the following key and fake values will be paired in public (ignoring the reordering):

( key[0],  0 ^ key[1] ^ key[2] ),
( key[1],  0 ^ key[2] ^ key[3] ),
( key[2],  0 ^ key[3] ^ key[4] ),
( key[3],  0 ^ key[4] ^ key[5] ),
( key[4],  0 ^ key[5] ^ key[6] ),
( key[5],  0 ^ key[6] ^ key[7] ),
( key[6],  0 ^ key[7]          ),
( key[7],  0                   )

We see the last value of fake will always be 0, and we can work backwards from this: If we find a pair in public with a 0, then the other element is key[7]. Given this, we can look for 0 ^ key[7], which we know is paired with key[6]. We can then look for key[6] ^ key[7], which gives us key[5], etc. At the end of this process, we have the entire key.

Exploit

The following script exploits the vulnerability we just found:

key = []
k = 0
for i in range(ln):
    for a, b in public:
        if a == k:
            key.append(b)
            break
        elif b == k:
            key.append(a)
            break
    k = xor_list(key[-ln // 3:])

Here, k is the current value from fake we are looking for, starting with 0. The inner loop goes through each pair of public and checks if one of the two values is k. If that is the case, then we know that value is from fake, so we add the other value of the pair to key. Then we update k to be the XOR of the last ln // 3 values from key, since this is the next value from fake to look for. The outer loop then repeats this process ln times, and when done, we have the entire key.

Knowing both key and public, we can recover the keystream and decrypt the flag:

keystream = recover_keystream(key, public)
flag = bits_to_bytes(xor(enc, keystream))
print(flag.decode())

The full decryption script can be downloaded here.

Notes

A few things to note: First, we recover the key in reverse order. This has no significance, since the order of keystream is only based on the order of public. The recover function just uses key as a lookup set.

Secondly, in the actual decryption script, we make a copy of public and work on this instead of on public. This allows us to delete each pair we have already used, which is necessary to avoid some duplicate value issues.

pbctf{super_duper_easy_brute_forcing_actually_this_one_was_made_by_mistake}

____

11 October 2021
Tags: <crypto/>