juju.re/jujure/content/writeups/fcsc_2022/diplodocus.md

405 lines
16 KiB
Markdown
Raw Normal View History

---
title: "Exploiting logical bug to solve NP-complete reversing puzzle | Diplodocus @ FCSC 2022"
date: "2022-05-08 18:00:00"
author: "Juju"
tags: ["Reverse", "Writeup", "fcsc"]
toc: true
---
# Intro
Diplodocus is I think by far my favorite challenge of the 2022 edition of the
FCSC. First it is a reverse challenge, my reference category, then it does not
feature weird unknown CPU architecture, just plain x64 with the only layer of
obfuscation being the heavy optimization of the compiler. Finally it is an
algorithmic problem, and I love algorithms.
The complexity of this challenge relies on the underlying problem it
implements. Diplodocus is the successor of Triceratops, a similar challenge
from the 2021 edition of the FCSC where you were also asked to solve an
NP-complete puzzle to get the flag.
The twist here is that a conceptual flaw of the program allows us to recover
the flag without actually solving the intended puzzle, but more on that later.
{{< image src="/diplodocus/yee-dinosaur.gif" style="border-radius: 8px;" >}}
## Challenge description
`reverse` | `477 pts` `10 solves` `:star::star::star:`
```
Trouvez une entrée qui valide, et soumettez-la au service en ligne pour obtenir
le flag.
`nc challenges.france-cybersecurity-challenge.fr 2201`
SHA256(`diplodocus`) =
`9af3062da630d2b94ad3bfa0b5fd67328d2c6c7bbb79607d7d2fa28a67c7ff9c`.
```
Author: `\J`
## Given files
[diplodocus](/diplodocus/diplodocus)
# Writeup
## Overview
As stated the program is simply an x64 stripped ELF.
```console
$ file diplodocus
diplodocus: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
BuildID[sha1]=c489721789271e74d301cda200feb877bd22d80a, for GNU/Linux 3.2.0,
stripped
```
If we try to execute the program, it reads a line from standard input and exits
with status code 1.
```console
$ ./diplodocus
input
$ echo $status
1
```
stracing and ltracing doesn't apport much more information, the program indeed
only calls `read(2)`.
## Main
It's now time to open our favorite decompiler and investigate the main
function.
After cleaning up the decompiler output and renaming the variables we get the
following code:
{{< code file="/static/diplodocus/main.c" language="c" >}}
The code is really simple here, it reads the standard input and pass the input
to a function that seems to perform a check.
If the check is sucessful, the program prints the flag, otherwise it exits with
1.
## Check
The Check function starts by setting up some local variables like a pointer to
the start and to the end of the input and what seems to be a struct that I
called context that may contain informations relevant to the ongoing puzzle.
Most of this struct is set to 0:
{{< code file="/static/diplodocus/check_init.c" language="c" >}}
## Instruction fetching and dispatch
My first observation of this main function made me think it was a really small
virtual machine. We can see it goes through every character in the input and
matches it against opcodes between 0 and 4 included. Every other values seems
to end the main loop with a return value of 1, indicating an error.
Since I see that the return value is set by oring a field of the context (which
I assumed at that time was a processor structure) I assume that this field holds
some sort of error flags that will invalidate our puzzle.
We then have the main dispatcher, processing the correct instruction. One thing
that can be noted is the two `break` statement in `case 0`, this is obviously
my decompiler trolling me but what he is trying to say is that this case will
end the main loop as well. It thus probably is the only way to end the function
with a return value of 0. It must then be the instruction performing the final
check after executing all our instructions. We will take a look at it last
since other instructions will give us a better idea of the internal structure
of the context and are easier to reverse.
{{< code file="/static/diplodocus/check_loop.c" language="c" >}}
### Case 1
This instruction is really simple, it makes one field of the context cycle
between 0 and 3 included. Let's simply remember the offset of this field
(`0xa0`) for now to see where it is used.
{{< code file="/static/diplodocus/case1.c" language="c" >}}
### Case 2
Well we did not need to wait a long time to see where the field from case 1
is used. Right at the beginning of case 2, the program matches the value of the
variable against all its possible values, it then creates a copy of two other
variable of the context and update them depending on our matching.
The 2 variables are then bound checked against `0xb` and used to index what
seems to be an array of `int32_t`. We can therefore understand that the two
variables are indexes of this array and that the variable from case 1 is an
enum describing possible movements in this array. I therefore name them `move`,
`i` and `j`.
Next we notice that we check that the `jth` bit of the `ith` row of the array,
if is not set, it sets the said bit and update the coordinates in the context.
If however it is already set, the program sets the error flag that we already
identified earlier.
Clearly this is some sort of 12 * 12 bitboard implementation, the first thing
that came to my mind is chess since I'm familiar with bitboard based chess
engines programming so it was at this exact moment that I understood this
problem was probably a puzzle or some sort of board game.
So to rephrase all this information, this instruction changes our coordinates
on a 12 * 12 board based on a preselected move from case 1, it places a marker
at our new coordinates and errors if we already visited this tile.
I now know one rule of the game: I cannot step twice on the same tile. By lack
of inspiration, I name this bitboard `visited` since it keeps track of all my
visited tiles.
Bellow is the cleaned up pseudo-C code from the decompiler with all variables
renamed accordingly.
{{< code file="/static/diplodocus/case2.c" language="c" >}}
And here is the moves enum to understand better how case 1 manipulates our
moves.
{{< code file="/static/diplodocus/move_enum.c" language="c" >}}
### Case 3
Case 3 uses a variable from the context that we previously saw initialized to
`0xffffff` at the very start of the check function without understanding it.
We can now see that it is in fact a queue of 24 bits that is popped in case 3
to place the said bit on a different bitboard than case 2 but with the same
coordinates. If the bit queue is empty then we output an error.
So I understand here that we must place 24 bits on another board while
simultaneously moving on the first one that tracks the tiles we already visited.
Seems easy enough, however, the instruction does not end here, it calls a
function passing the row of the bitboard as parameter, if the return value of
this function is more than `2` we output an error.
{{< code file="/static/diplodocus/case3_stripped.c" language="c" >}}
#### Pop count
Here is the said function, so it seems to perform magic bitwise operation.
Could be anything really, my intuition tells me it count the number of bits set
on the line but it may as well be some weird bitboard magic, I mean you never
know the stuff I found on [chess programming](chessprogramming.org) while
working on chess engines really blows my mind.
By googling the first constant `0x5555555555555555` I confirm really fast my
intuition. It is indeed a population count function, the fifth result of the
search being, guess what? [Chess programming wiki on population
count](https://www.chessprogramming.org/Population_Count). I now really start
thinking that this is actually a chess problem.
{{< code file="/static/diplodocus/pop_count.c" language="c" >}}
With this information we can now rename the variables and symbols from case 3
and we now understand that we place a bit from the bit queue on the board, we
cannot allign more than 2 bits on the same line.
{{< code file="/static/diplodocus/case3.c" language="c" >}}
### Case 4
Do not be misled, this instruction seems simple enough but is by far the
hardest one of the challenge.
So first this instruction fetch 2 operand, these operands are later used to
access a third bitboard, so I name them accordingly `i` and `j`.
Then we check if our bit queue is empty, if it is not then we output an error.
This mean that we can call this instruction only when we placed all the 24
points at our disposition.
Now comes the fun part, we call a function passing the coordinates and the
whole context as parameter, this function will output a bit that will be placed
on a third bitboard at the coordinates indexed by the operands.
So my decompiler really despise this function, I told you it took the whole
context as parameters but its not exactly true as it actually packs the whole
context in 10 `int128_t` arguments using SIMD instructions. I cleaned up the
function call for your eyes so you do not have de keep tracks of the 12
parameters of the function.
{{< code file="/static/diplodocus/case4_stripped.c" language="c" >}}
#### \J dabbing on me
You can see below a decompilation of the function, looks fun right? And this is
actually cleaned up so you don't see the SIMD registers and weird struct
packing. Now I really recommend you to not actually read this code but to get
the main idea from the comments and my explanations. A much more readable python
reimplementation is available right after for your sanity.
Most variables are not renamed correctly because I did not actually reverse and
understood all this function. This is simply what it looked like in my
decompiler at the moment I undestood enough to throw it by the window, modulo
of course SIMD and stack fengshui :eyes:.
Let's not care too much about weird constants and implementation details for
now as I did not understood them at the time either.
The main idea of this is code is that it will run through all possible
increments combinations that will iterate on the second board (the one from
case 3, where we manually place our bits from the bit queue).
Basically, we check every possible way we can run through the board.
For each of these increment combination we see a weird loop performing modulos
of the increments. This is actually the euclidean algorithm that computes the
GCD of 2 numbers. This GCD is compared to 1. If it is not one then we skip this
increments combination and go to the next loop iteration. We are therefore only
concerned about running through the board using coprime increments, confusing I
know.
We then iterate on the board using these increments and count the number of
bits set while doing so. If at any point we encounter more than 2 points during
the same passing of the board we set a result flag. The function will output
`1` if and only if no result flag was set for any iteration. Meaning for any
traversal of the board, we did not encounter more than 2 bits.
Here I was really scared because I started to think that this was maybe `\J`
trolling me with again a cryptography problem modelised using bitboards are
something. And everyone who checks my results of the FCSC know how bad I am at
cryptography.
{{< code file="/static/diplodocus/check_align.c" language="c" >}}
#### Alignement count
So now I am really unhappy, I was having fun finding out puzzle and placing
bits on boards but now `\J` is throwing modular arithmetic at me.
I take a break crying in my bed after these findings before actually thinking.
Running through the array using coprime increments probably has a really
interesting property that may be plotted visually.
So I reimplemented this function in python but instead of summing the bits I
encountered, I mark the bits I would have summed to see the path that I take in
the array.
Which gives us this much more readable code:
{{< code file="/static/diplodocus/show.py" language="py" >}}
With this output (only a sample):
{{< code file="/static/diplodocus/show.out" >}}
It is now obvious what this function is doing, it draws every straight line
passing through the point given as parameter and checks if more than 2 points
of the board are alligned.
If we remember the context were this function is used, it means that we will
set a bit in the third bitboard if and only if no line passing through this
point intersects more than 2 points.
### Case 0 (Final check)
So case 0 is the instruction that performs the final check to see if our board
match the desired state. Again there is a lot of SIMD magic going on so I tried
to clean it a little bit, you lose the actual result of the decompilation but
the code is semantically equivalent.
First we perform a pop count on all lines of the `visited` bitboard, comparing
the total sum to `0x90` which is `12 * 12`, the total number of tiles. We must
therefore go through every single tile of the board exactly once.
Then, we pack the third bitboard (the one that stores if 3 points are alligned)
into the `zmm0` 128 bit register because why not? And basically `shifting` and
`bitwise anding` it to check that every bit of the board are set. So its
basically the same check than before but for the third bitboard, meaning that
no line drawn from any point on the board must intersect with more than 2
points.
The last check simply performs a xor on every line of the board and outputs
an error if it is different from 0. This means that every column of the board
must contain an even number of points.
{{< code file="/static/diplodocus/case0.c" language="c" >}}
Great we have everything ready to go! We simply need to go trough the board and
place 24 points on it and make sure there are never 3 points alligned.
So it turns out this problem is NP-complete (I learned after solving the
challenge that is called the Not-three-in-line problem), I might have a hard
time solving it manually. I might start to implement a SAT solver or som...
But wait, did you spot something sus with this implementation ?
{{< image src="/diplodocus/the_rock.jpg" style="border-radius: 8px;" >}}
## Bypassing the puzzle
There are actually two distinct bugs in this implementation of the puzzle.
First, the program does not check that you place a point when there is already
one. It does check that you do not go twice on the same tile but you can still
place multiple points without moving.
The second, the one I exploited here, is that the weird function we
reimplemented actually only count alligned points on all straight lines
**except** for lines and columns.
The special case of the line is checked independently in case 3 when inserting
the point if you remember well.
However the only columns check is in case 0 where it simply checks that there
is an even number of point per column.
Nothing forbids us to put 12 points on the first and last column, which
validates all our constraints.
```
|X| | | | | | | | | | |X|
|X| | | | | | | | | | |X|
|X| | | | | | | | | | |X|
|X| | | | | | | | | | |X|
|X| | | | | | | | | | |X|
|X| | | | | | | | | | |X|
|X| | | | | | | | | | |X|
|X| | | | | | | | | | |X|
|X| | | | | | | | | | |X|
|X| | | | | | | | | | |X|
|X| | | | | | | | | | |X|
|X| | | | | | | | | | |X|
```
{{< image src="/diplodocus/stop_the_count.png" style="border-radius: 8px;" >}}
## Solve
We now simply need to write the input that will give instructions to the
program to write this bugged board.
We first fill the first column, by going downward.
Then we go through the 10 next columns without placing any bit and we fill
the last one.
Finally we need to fill the third bitboard by manually performing the alignment
check for every point in the bitboard so we call case 4 for every coordinates.
We do not forget to put case 0 at the end to validate everything went fine.
{{< code file="/static/diplodocus/solve.py" language="py" >}}
```console
$ ./solve.py | ./diplodocus
Well done! You can submit your input to the remote service to grab the flag!
```
```console
$ ./solve.py | nc challenges.france-cybersecurity-challenge.fr 2201
Well done, the flag is FCSC{bf809d2614501166a890740116103410a69ede950b57a9186bf49eb734eaa1a1}
```