Jakob's Blog coding for fun and growth

Rust in a coding competition (Part 1)

What is the best programming language for a coding competition? What are the significant differences? And how does Rust fit in?

The other Friday night, I attended a coding competition called Codecon. It has been organized by Bloomberg and hosted by ETHZ. Although I did not know what type of tasks will expect me, I knew that Rust would be among the accepted programming languages and I thought it can be a good opportunity for me to practice it.

Arriving there, I have been told there will be a number of tasks with different difficulties and scores attached to them, all puzzles related to security. In fact, no task required prior knowledge of secure systems and each task could be solved by handing in a program that takes a string as an input and provides the correct solution string. Given these types of problems, I think choosing a language that allows for quick and easy string operating is crucial.

This post shows different implementations of the same solution to demonstrate pros and cons of different programming languages.

Problem statement

Taken unchanged from the competition:

Take a hex-encoded, encrypted message - with a single, ASCII character key. Guess the key, and output the original message.

Input Specifications

A single string, representing an XOR’ed message. len(string) % 2 is guarenteed to be 0. This message is intended to be read as a sequence of hexadecimal pairs, which you should convert to binary and XOR with a key to get the correct ASCII output character. len(string) <= 256.

Output Specifications

Your best-guess decrypted message

And then the example provided:

Input

023f6c386b3f39222820326b3f246b392428206b2a6b392332262e6b3f246b392428206b2a6b392332262e6b3f232a3f6c386b39222c233f6b24256b3f22262e

Output

It’s tricky to rock a rhyme to rock a rhyme that’s right on time

For the purpose of structuring the post, I will divide the problem into two subtasks, decoding and guessing. In each subtask, I will provide solutions in different programming languages as a basis for discussion.

The first subtask will provide a function that takes a key and an input string that it will decrypt. From there, we can brute-force all 128 possible keys and thereby generate 128 possible messages. We just don’t know which one is the correct message.

The second subtask will be to somehow decide the probability of each decrypted message to represent a proper English message. That will be done with two simple tools we can use without external dependencies: Heuristics and creativity.

Eventually, some boilerplate code will finalize the solution program to guess the correct message.

First subtask: Decoding

The decoding part boils down to splitting up the input string into chunks of two digits and converting the value of each chunk to a number. The numbers can then be XORed with a key and converted to characters.

Before I show you any code, I want to hand in that Rust has certainly not been designed to write quick and dirty solutions for a competition. More conventional choices would be, for instance, Python or Haskell. Python, as a dynamically typed language with powerful tools to manipulate strings, brings all it takes to solve such tasks fast. And it is considered a simple language and seems quite popular these days. Haskell on the other hand, with its functional elegance and inferred static types probably allows to solve the problems with very short programs codes. With that in mind, let us dive into some code.

First Rust implementation

I would say, my recently increased confrontation with Java 8 streams has influenced this implementation, the first that came to my mind.

  fn decode(input: &str, key: u8) -> String {
    input.chars()
      .map(|c| { c.to_digit(16).unwrap() } )
      .collect::<Vec<_>>()
      .chunks(2)
      .map(|chunk| { ( chunk[0] << 4 | chunk[1] ) as u8 })
      .map(|character| { (character ^ key) as char} ) // XOR
      .collect()
  }

What I like about this code is the functional flow in it and it seems easier to understand than a purely procedural implementation relying on loops. What I do not like is some of the unnecessary type information contained in the code.

Starting at the top, we see that the function borrows a string slice (represented by the type &str) and returns an owned string that is allocated on the heap (represented by the type String). While I love these explicit statements in systems programming, here, for the purpose of a competition, it seems too verbose and complicated to deal with different types of strings.

Then, on line 4 we have a call to collect that looks ugly and complicated, yet it does not add any logic to the algorithm. It basically only does a conversion from an iterator to a vector. That is required because there is no equivalent method to chunks on iterators. To add insult to injury, a type hint is required because type interference does not work here otherwise. In case you are not familiar with Rust, ::<Vec<_>> tells the type checker that the return type of collect should be a vector, where the _ is a wildcard that the type checker is able to fill in without our help. In most cases, we as a programmer do not have to give any type hints to the Rust type checker. But in this case, we are facing compiler errors if the hint is omitted.

Decoding in Haskell

After I have stated that I like the functional flow in the Rust program, let us have a look what a purely functional language like Haskell can offer.

import Data.Char (digitToInt, chr)
import Data.Bits (xor)

decode :: Int -> String -> String
decode key (a:b:xs) = decode_digit key a b : decode key xs
decode _ _ = []

decode_digit :: Int -> Char -> Char -> Char
decode_digit key a b = chr (xor key (16 * digitToInt a + digitToInt b))

Here, I have added the type information voluntarily to make it easier to follow. But I want to point out, I do not actually need to write down line 4 and 8.

With Haskell pattern matching on lists, the chunk from before can be merged right into the variable names declaration ((a:b:xs)). That looks neat and safes some typing.

Besides the recursive call on the remainder of the input, the rest of the program is a one-liner helper function that decodes two hexadecimal digits to a single decrypted character.

Converting in Haskell is provided by the standard library through chr and digitToInt. And the last bit we need, the xor, also comes out of the box.

Now, some people might want to point out that provided code can be shortened to one single line like so:

decode key (a:b:xs) = chr (xor key (16*digitToInt a + digitToInt b)) : decode key xs

Personally, I would say the first snippet is easier to understand while the second is a bit cooler. I will leave it to the reader to decide which is better.

Decoding in Python

Going further to Python 3, I want to show you some list comprehensions and some slicing:

def my_decode(msg, key):
  return ''.join([
      decode_digit(key, msg[i:i+2])
      for i in range(0, len(msg), 2)
    ]);

def decode_digit(key, doubleHexDigit):
  return chr(key ^ int(doubleHexDigit, 16));

Of course, the program does the same thing as the two programs above. The list comprehension takes over the chunking here. The rest is surprisingly similar to the Haskell version, both sharing a basically equivalent one-liner helper function to decode each digit.

Type conversion in Python is nice and consistent with chr(...) to get a character and int(...) to get a number. And thanks to the base parameter (=16 for hexadecimal representations) available on int(...), we can avoid doing the nasty calculations ourselves that have been necessary in Haskell.

This particular code snippet has been the one that I had right after the shortest coding time. And that even though from all mentioned languages here, I have the least experience with Python. Therefore I feel comfortable to say that Python is indeed a simple and powerful language when all we care for is a quick (and dirty) solution.

Again, formatting is up for debate. Some people may like it better on a single line. Although I am certainly not one of them, for completeness sake here it is on one single line:

def my_decode(msg, key): return ''.join([chr(key ^ int(msg[i:i+2], 16)) for i in range(0, len(msg), 2)])

Decoding in C

Now, for the purpose of demonstrating what is possible with a plain old procedural approach, I have also written an equivalent algorithm in C.

char* decode(char* input, char key) {
  char* output = malloc(MAX_LENGTH + 1);
  int i;
  for(i = 0; input[i*2] != '\0' ; i ++) {
        char hexdigit[3] = { input[2*i], input[2*i+1], '\0'};
        long int character = strtol(hexdigit, NULL, 16);
        output[i] = character ^ key;
  }
  output[i] = '\0';
  return output;
}

How does it compare to the Rust version?

The Rust version has the nice benefit that memory management is taken care of by the compiler and that there is no restriction on the input length. In contrast, the C function leaks memory that needs to be freed outside and the program will crash if the input is too long if it is not NULL-terminated properly. Definitely, I would prefer the Rust version in any productive environment.

However, the C version is about the same length as the Rust version and it actually took me less time to write it down, due to time I spent fighting the Rust type system. In the C code, type conversions are well-hidden in the background and require no explicit statements. Again, I would prefer explicitness in productive environments but here it seems unnecessary. Therefor I must admit, for the purpose of the competition, even the C version can be considered more competitive than my initial approach with Rust.

Wrapping up this post

This post demonstrates how the choice of the programming language heavily influences the sort of solution we come up with. In the next post, I will investigate how it is possible to imitate other languages within Rust, staying on subtask 1 a little longer.

Next post in series: Versatile Rust - Part 2 of Rust in a coding competition

Technology Stack

Rust

The Rust programming languages had its first stable release in May 2015. It has been designed with performance and low-level programming as a high priority. Therefore, it is well suited for applications that would otherwise be done in C++. The huge improvements of Rust over C++ are its strong safety guarantees like data-race freedom and memory safety at compile time.

Haskell

A purely functional programming language with a strong, static type system. Especially beloved in academia but also relevant to the industry.

Python

An interpreted language with a primary focus on code readability. It is strongly, dynamically typed and is often used in modern science to perform computations or even to train machine-learning models.

C

Arguably the most important programming language in human history. If you have not heard about it, go and read it up on Wikipedia by clicking on the title.