pbctf 2021
Alkaloid Stream [134 pts]
I found a weird stream cipher scheme. Can you break this?
pbctf{super_duper_easy_brute_forcing_actually_this_one_was_made_by_mistake}
tl;dr
For each input bit, a key
number and fake
number is generated. A random keystream
is generated and a corresponding public
list, such that
public[i] = [fake[i], key[i]] if keystream[i] else [key[i], fake[i]]
The input is XORed with the random keystream to generate the ciphertext and this is output together with public
. But fake
was generated by XORing values from key
, and based on this, we can guess which values from public
are fake. The others must be from key
, allowing us to recover the keystream
and decrypt the flag
.
Introduction
If you already understand the script file, you can safely skip to next section.
We are given the challenge files gen.py and output.txt. Let us first take a look at the script gen.py
, inspecting the main functionality:
from flag import flag
flag = bytes_to_bits(flag)
key = keygen(len(flag))
keystream, public = gen_keystream(key)
assert keystream == recover_keystream(key, public)
enc = bits_to_bytes(xor(flag, keystream))
print(enc.hex())
print(public)
The script converts the flag to bits, generates a key
of the same length, and generates some keystream
and public
value. Next, it performs a check to ensure the generated keystream
matches what you would get from recover_keystream(key, public)
. Finally, it encrypts the flag
by XORing with the keystream
and then outputs the ciphertext as hex and the public
value, whatever that is. The contents of output.txt
matches this output and seems to be the encrypted flag followed by the corresponding public
value.
Let’s look deeper into each step, starting with key = keygen(len(flag))
:
def keygen(ln):
# Generate a linearly independent key
arr = [ 1 << i for i in range(ln) ]
for i in range(ln):
for j in range(i):
if random.getrandbits(1):
arr[j] ^= arr[i]
for i in range(ln):
for j in range(i):
if random.getrandbits(1):
arr[ln - 1 - j] ^= arr[ln - 1 - i]
return arr
The function creates a list of ln
powers of 2 (one per flag bit), so [1, 2, 4, 8, ...]
, and then runs through two loops. The first iterates over the list from the beginning and XORs each number with some of the previous (determined by random.getrandbits(1)
). The second basically does the same but from behind and in reverse order. The end result is a list of ln
seemingly random numbers, which is the key
.
This is used in the next step, keystream, public = gen_keystream(key)
:
def gen_keystream(key):
ln = len(key)
# Generate some fake values based on the given key...
fake = [0] * ln
for i in range(ln):
for j in range(ln // 3):
if i + j + 1 >= ln:
break
fake[i] ^= key[i + j + 1]
# Generate the keystream
res = []
for i in range(ln):
t = random.getrandbits(1)
if t:
res.append((t, [fake[i], key[i]]))
else:
res.append((t, [key[i], fake[i]]))
# Shuffle!
random.shuffle(res)
keystream = [v[0] for v in res]
public = [v[1] for v in res]
return keystream, public
This function first generates the list fake
by XORing some numbers from the actual key
together and then uses this as a sort of fake key. It then generates ln
random bits which ends up being the keystream
. It populates public
with both a key
value and fake
value for each keystream
bit. If the bit is a 1, it inserts [fake[i], key[i]]
, and else [key[i], fake[i]]
- so the bit determines the order. This means public
is a list of ln
elements, where each element is a list of two numbers - a key
value and fake
value – but without the key
or keystream
, there should be no way of knowing which is which. As an example, if we have
key = [12, 34, 56, 78]
fake = [98, 76, 54, 32] # Just example values
and the random keystream
ends up being [0, 1, 1, 0]
, then we would have
public = [
[12, 98], # key, fake
[76, 34], # fake, key
[54, 56], # fake, key
[78, 32] # key, fake
]
The keystream
tells us which values from public
are key
values and fake
values - a 0 indicates the first is a key
value and a 1 that the second is. At the end, the function returns keystream
and public
.
We can recover the keystream
from key
and public
. This is what recover_keystream(key, public)
does:
def recover_keystream(key, public):
st = set(key)
keystream = []
for v0, v1 in public:
if v0 in st:
keystream.append(0)
elif v1 in st:
keystream.append(1)
else:
assert False, "Failed to recover the keystream"
return keystream
This just goes through each public
element and if the first value is somewhere in the key
, it adds a 0 to the keystream
, else a 1. The recovered keystream
will only be correct if fake
doesn’t end up containing any value that is also in key
. This is the reason why the script runs
keystream, public = gen_keystream(key)
assert keystream == recover_keystream(key, public)
So after generating the keystream
and public
, it makes sure the same keystream
would be recovered from the key
and public
before using it for encryption. If everything goes well, the encryption can finally be performed:
enc = bits_to_bytes(xor(flag, keystream))
This is a simple XOR of the flag
and keystream
.
Vulnerability
Since the encryption is just an XOR with a randomly generated keystream
, we don’t get much info from the encrypted flag itself. We need to find the keystream
, and since we only have the public
list, we must somehow be able to guess the stream from that. Said differently, if this scheme is vulnerable, it must be because we can somehow guess which values in public
are key
values and which are fake
.
Looking at the keygen()
method, there doesn’t seem to be an obvious way to guess which of the values in public
it could have generated. Instead, we look at how the fake
values are generated from the key
. This direct dependence and lack of inserted randomness could perhaps lead us to guess which values are from fake
– and therefore which are from key
. The fake
list is generated as follows:
fake = [0] * ln
for i in range(ln):
for j in range(ln // 3):
if i + j + 1 >= ln:
break
fake[i] ^= key[i + j + 1]
The outer loop iterates over each position of fake
. For each index i
, it XORs into fake[i]
the next ln // 3
values from key
, starting from key[i + 1]
. As an example, consider the encryption of a single byte, so ln = 8
and ln // 3 = 2
. In this case, every value in fake
is just the XOR of the two next values from key
, so the following key
and fake
values will be paired in public
(ignoring the reordering):
( key[0], 0 ^ key[1] ^ key[2] ),
( key[1], 0 ^ key[2] ^ key[3] ),
( key[2], 0 ^ key[3] ^ key[4] ),
( key[3], 0 ^ key[4] ^ key[5] ),
( key[4], 0 ^ key[5] ^ key[6] ),
( key[5], 0 ^ key[6] ^ key[7] ),
( key[6], 0 ^ key[7] ),
( key[7], 0 )
We see the last value of fake
will always be 0, and we can work backwards from this: If we find a pair in public
with a 0, then the other element is key[7]
. Given this, we can look for 0 ^ key[7]
, which we know is paired with key[6]
. We can then look for key[6] ^ key[7]
, which gives us key[5]
, etc. At the end of this process, we have the entire key
.
Exploit
The following script exploits the vulnerability we just found:
key = []
k = 0
for i in range(ln):
for a, b in public:
if a == k:
key.append(b)
break
elif b == k:
key.append(a)
break
k = xor_list(key[-ln // 3:])
Here, k
is the current value from fake
we are looking for, starting with 0
. The inner loop goes through each pair of public
and checks if one of the two values is k
. If that is the case, then we know that value is from fake
, so we add the other value of the pair to key
. Then we update k
to be the XOR of the last ln // 3
values from key
, since this is the next value from fake
to look for. The outer loop then repeats this process ln
times, and when done, we have the entire key
.
Knowing both key
and public
, we can recover the keystream
and decrypt the flag
:
keystream = recover_keystream(key, public)
flag = bits_to_bytes(xor(enc, keystream))
print(flag.decode())
The full decryption script can be downloaded here.
Notes
A few things to note: First, we recover the key
in reverse order. This has no significance, since the order of keystream
is only based on the order of public
. The recover function just uses key
as a lookup set.
Secondly, in the actual decryption script, we make a copy of public
and work on this instead of on public
. This allows us to delete each pair we have already used, which is necessary to avoid some duplicate value issues.