Sunday, May 24, 2015

HUST Festival - Network 300 Challenge

HUST festival was this weekend and we participated although many parts of the festival were Korean language, they made a great effort to put English clues in for all the challenges.

This challenge was one of the earliest available so we tackled it first. It was pretty much just a PCAP file with instructions to connect to an IP address on port 22.

We downloaded the file, unzipped it and found the PCAP to be pretty large in size:

root@mankrik:~/hust/net300# unzip net_7be01f8e.zip 
Archive:  net_7be01f8e.zip
  inflating: league_of_access.pcap   
root@mankrik:~/hust/net300# file league_of_access.pcap 
league_of_access.pcap: pcap-ng capture file - version 1.0
root@mankrik:~/hust/net300# ls -lah league_of_access.pcap 
-rw-r--r-- 1 root root 9.1M May 22 15:09 league_of_access.pcap

Whenever I have a bigger PCAP I use a Windows program called Network Miner to get a graphical overview of different attributes of the traffic in the PCAP. Network Miner needs a PCAP file (i.e. not a PCAP-NG file) so I used Wireshark to "Save As" a PCAP file before importing it into Network miner.

In Network Miner, I used the Session window to sort the connections found in the PCAP by different attributes. I sorted by "Server Port" column to notice that the user conducted a port scan against the target server


I then checked the SSH connections and noted the timestamps of the connections. I noted that the final SSH connection attempts happened some minutes after the port scan completed.


I then examined some of the other tabs in Network miner. In the Parameters window I noticed quite a few Referrer entries from a Russian blog post about "port knocking":


I also found some more "port knocking" hints in the DNS tab:


Lastly, there was a couple of other things I noted in the images tab, a few items mentioning "hidden files":


Ok so now I feel it's time to switch over to Wireshark because I have a good idea of the method of attack here:

  1. Port knocking, to open an SSH port
  2. Hidden files on the host

In Wireshark I setup a filter to just examine traffic to/from this host in question:

  • "ip.src == 223.194.105.178 or ip.dst == 223.194.105.178"

Next, I scrolled down to the Frame #'s shown during the Network Miner investigation to see if the user was able to successfully get that SSH fired up.

First I double checked that the user was still getting failed SSH connections, and at Frame # 10905 & 10932 I see a failed connection attempt.


Then later, at frame 10967 & 10968 a successful connection is found:


So what changed between 10905 and 10967? Well we're thinking port knocking so let's see if we can see a weird bunch of connection attempts before this and see the following:


At this point im not 100% sure which of these are port knocking and which is just coincidental traffic. So I decided to try all of it. I came with the following command line:


root@mankrik:~/hust/net300# nmap -sS -p T:80,4523,2351,2351,4523,443 223.194.105.178; ping -c 1 -t 1 223.194.105.178; icmpush -tstamp -seq 0 -id 13970 -to 1 223.194.105.178; ssh you@223.194.105.178

Which actually worked first go. So either I got right or I got lucky.

Anyway, once on the server getting the flag was pretty simple. Using the clue of "hidden" files I expected there to be a file beginning with "." and sure enough there was:


-bash-4.1$ find
.
./.bashrc
./.bash_logout
./.bash_profile
./.bashc
./.bash_history
-bash-4.1$ id
uid=500 gid=500(you) groups=500(you)
-bash-4.1$ cat .bashc


++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++
++++++++-----------------+++++++++
+++++++|                 |++++++++
+++++++| Hari60_iz_B3s+! |++++++++
+++++++|                 |++++++++
++++++++-----------------+++++++++
++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++
-bash-4.1$ exit

Monday, May 18, 2015

Defcon CTF 2015 - Access Control - Reverse Engineering - 1 Point challenge

Another one pointer, this time in the RE category. For this one I am going to give an example of solving something "just enough" to get the points so you can move on to the next flag. I've read other writeups about this and notice some people reversed very precisely to get a perfectly working exploit. I, on the other hand, was willing to cut corners!

The clue for this one is again brief,

accesscontrol

It's all about who you know and what you want. access_control_server_f380fcad6e9b2cdb3c73c651824222dc.quals.shallweplayaga.me:17069

With the clue is a binary file which you can find archived here.

First up I just connected to the server with netcat to see what it looked like:


root@mankrik:~/defcon/access# nc access_control_server_f380fcad6e9b2cdb3c73c651824222dc.quals.shallweplayaga.me 17069
connection ID: ZP]j[OD6|t9IMa


*** Welcome to the ACME data retrieval service ***
what version is your client?

Ok neat, no idea what it wants yet but check out that "Connection ID". I bet that's useful later right?

Let's examine the binary:


root@mankrik:~/defcon/access# file client_197010ce28dffd35bf00ffc56e3aeb9f 
client_197010ce28dffd35bf00ffc56e3aeb9f: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=0xcf260fd5e12b4ccf789d77ac706a049d83df4f05, stripped


root@mankrik:~/defcon/access# strings client_197010ce28dffd35bf00ffc56e3aeb9f 
/lib/ld-linux.so.2
__gmon_start__
libc.so.6
_IO_stdin_used
socket
exit
htons
sprintf
perror
connect
strncpy
puts
__stack_chk_fail
stdin
fgets
send
strstr
recv
inet_addr
__libc_start_main
GLIBC_2.4
GLIBC_2.0
PTRhP
D$tf
[^_]
VUUU
UWVS
[^_]
need IP
Could not create socket
Socket created
connect failed. Error
Enter message : 
hack the world
nope...%s
what version is your client?
version 3.11.54
hello...who is this?
grumpy
grumpy
enter user password
hello %s, what would you like to do?
list users
deadwood
print key
the key is:
challenge:
answer?
recv failed
<< %s
connection ID:
connection ID: 
challenge: 
Send failed
;*2$",

Standard stuff. There's a version string "version 3.11.54" so that's needed to answer the question to the server I bet.

Let's try running the client...


root@mankrik:~/defcon/access# ./client_197010ce28dffd35bf00ffc56e3aeb9f 
need IP

root@mankrik:~/defcon/access# host access_control_server_f380fcad6e9b2cdb3c73c651824222dc.quals.shallweplayaga.me
access_control_server_f380fcad6e9b2cdb3c73c651824222dc.quals.shallweplayaga.me has address 54.84.39.118

root@mankrik:~/defcon/access# ./client_197010ce28dffd35bf00ffc56e3aeb9f 54.84.39.118
Socket created
Enter message : go?
nope...go?

Ok time for more reversing... In IDA we find this tidbit quickly:


         printf("Enter message : ");
          fgets(byte_804B080, 1000, stdin);
          v6 = 16;
          v7 = byte_804B080;
          v8 = "hack the world\n";
          do
          {
            if ( !v6 )
              break;
            v4 = *v7 < *v8;
            v5 = *v7++ == *v8++;
            --v6;
          }
          while ( v5 );
          if ( (!v4 && !v5) != v4 )
          {
            printf("nope...%s\n", byte_804B080);
            result = -1;
            goto LABEL_54;
          }

So the "message" is probably "hack the world"? Back to the client:


root@mankrik:~/defcon/access# ./client_197010ce28dffd35bf00ffc56e3aeb9f 54.84.39.118
Socket created
Enter message : hack the world
<< connection ID: m:to/#$w'x6DYy


*** Welcome to the ACME data retrieval service ***
what version is your client?

<< hello...who is this?
<< 

<< enter user password

<< hello grumpy, what would you like to do?

<< grumpy
<< 
mrvito
gynophage
selir
jymbolia
sirgoon
duchess
deadwood
hello grumpy, what would you like to do?

<< the key is not accessible from this account. your administrator has been notified.
<< 
hello grumpy, what would you like to do?

hello



^C

Alrighty, thats cool. We get logged in, we see a user list, and we see theres some kind of key although we don't have privs to get the key yet. The client is not interactive either, so we can't try more stuff. It's looking like we need to build our own client... So we need to reverse how this one works so we can do that.

Let's look at the network layer to watch what it sends:


Ok all normal, we got:
  • the version number string (as expected)
  • the username
  • the two commands "list users" and "print key".
  • What's up with the password though. Is that the user password? 

I capture a few connections and the response to the password prompt changes each time. So this is somehow ciphered. So if we're going to write a client, we need to encrypt the password correctly...

Back to IDA! We find this code naming two functions "sub_8048EAB" and "sub_8048F67":

          sub_8048CFA("enter user password");
          if ( sub_8048CFA("enter user password") )
          {
            v28 = 0;
            v29 = 0;
            sub_8048EAB("grumpy", &v28);
            sub_8048F67(&v28);
            HIBYTE(v29) = 0;
            sprintf(&v28, "%s\n", &v28);
            sub_8048E3E(&v28);
          }

sub_8048EAB XORs the first 5 bytes of the given plaintext password with a key. The key is based on the connection ID (dword_804B04C) and an offset into the connection ID string is chosen based on (dword_804BC80 % 3):


char *__cdecl sub_8048EAB(int a1, int a2)
{
  char *result; // eax@1
  int v3; // ebx@4
  signed int i; // [sp+2Ch] [bp-1Ch]@1
  char dest[5]; // [sp+37h] [bp-11h]@1
  int v6; // [sp+3Ch] [bp-Ch]@1

  v6 = *MK_FP(__GS__, 20);
  result = strncpy(dest, &::dest[dword_804B04C] + dword_804BC80 % 3, 5u);
  for ( i = 0; i <= 4; ++i )
  {
    result = (a2 + i);
    *(a2 + i) = dest[i] ^ *(a1 + i);
  }
  v3 = *MK_FP(__GS__, 20) ^ v6;
  return result;
}

So what do we know, in plain language:

  • The ciphered password string really can only be 5 characters long
  • The key (connection ID) is public and given to us at the start of the connection
  • The offset of the connection ID where the key begins is possible to calculate by deriving how dword_804BC80 is generated then computing it modulo 3.
The first two are easy, the third item requires more reversing. When you think about it, what is the point of reversing more here? So we know exactly the offset for the key? Does it matter? Let's try something in the server with netcat:


root@mankrik:~/defcon/access# nc access_control_server_f380fcad6e9b2cdb3c73c651824222dc.quals.shallweplayaga.me 17069
connection ID: )Y<AH{"5mgnZU8


*** Welcome to the ACME data retrieval service ***
what version is your client?
version 3.11.54
hello...who is this?
grumpy
enter user password
nottherightpassword
wrong password, fat fingers
hello...who is this?
grumpy
enter user password
grumpy
wrong password, fat fingers
hello...who is this?

So the server, for 1 connection ID gives us multiple attempts at the password. So since the correct offset modulo 3 can only ever be 0,1,2,3 we have only a maximum of 4 password attempts before we will get the right password.

Long story short, I didn't bother reversing that bit and moved on to the second function that operates on the password before sending it:

This function is simpler, it just parses the output of the first function looking for characters <= 31 and if found, adds 32 to them. If the character == 127 it does other things.


int __cdecl sub_8048F67(int a1)
{
  int result; // eax@4
  signed int i; // [sp+Ch] [bp-4h]@1

  for ( i = 0; i <= 4; ++i )
  {
    if ( *(a1 + i) <= 31 )
      *(a1 + i) += 32;
    result = *(a1 + i);
    if ( result == 127 )
    {
      *(a1 + i) -= 126;
      result = a1 + i;
      *(a1 + i) += 32;
    }
  }
  return result;

Again I got lazy here, I don't care about the edge case single character == 127 so I completely ignored implementing that!

For both of these "optimizations" I made up front, I paid a price though. We'll see that later :)

So now I have the makings of  a client that can login, let's try it:


root@mankrik:~/defcon/access# ./m1.py 
[+] Opening connection to access_control_server_f380fcad6e9b2cdb3c73c651824222dc.quals.shallweplayaga.me on port 17069: Done
[+] Connection ID: /Sn]9;INS1Y~*P
[+] Sending username: grumpy
[+] Sending password attempt 0...
[+] Retrying...
[+] Sending username: grumpy
[+] Sending password attempt 1...
[+] Login success with offset 1
[*] Switching to interactive mode
$ list users
grumpy
mrvito
gynophage
selir
jymbolia
sirgoon
duchess
deadwood
hello grumpy, what would you like to do?
$  

Cool it works! And now I'm interactive I can try more commands. Except there aren't any! Poop. Well let's try these other users eh? Hope they all picked silly passwords like Mr Grumpy did.


root@mankrik:~/defcon/access# ./m1.py 
[+] Opening connection to access_control_server_f380fcad6e9b2cdb3c73c651824222dc.quals.shallweplayaga.me on port 17069: Done
[+] Connection ID: vB_Ydfnq}/m|"L
[+] Sending username: sirgoon
[+] Sending password attempt 0...
[+] Retrying...
[+] Sending username: sirgoon
[+] Sending password attempt 1...
[+] Retrying...
[+] Sending username: sirgoon
[+] Sending password attempt 2...
[+] Retrying...
[+] Sending username: sirgoon
[+] Sending password attempt 3...
[+] Login success with offset 3
[*] Switching to interactive mode
$ print key
the key is not accessible from this account. your administrator has been notified.
hello sirgoon, what would you like to do?

Ok fine. We go on this way for a while. Finally we get to the user "duchess":


[+] Opening connection to access_control_server_f380fcad6e9b2cdb3c73c651824222dc.quals.shallweplayaga.me on port 17069: Done
[+] Connection ID: Xo/Wug;}erP'PQ
[+] Sending username: duchess
[+] Sending password attempt 0...
[+] Retrying...
[+] Sending username: duchess
[+] Sending password attempt 1...
[+] Retrying...
[+] Sending username: duchess
[+] Sending password attempt 2...
[+] Retrying...
[+] Sending username: duchess
[+] Sending password attempt 3...
[+] Login success with offset 3
[*] Switching to interactive mode
$ print key
challenge: _(&:5
answer?
$ nope
you are not worthy
hello...who is this?

Ok neat. The user duchess can get the key, but first they must answer the challenge!

The challenge looks to be a 5 byte string, encrypted in the same way our password was when it was sent over the wire.

If we get the answer wrong we get dropped, but we don't lose our connection. We just have to relogin. That's pretty important because we keep our connection ID.

So this is the code I came up with as a "test" client to solve the challenge. I didn't continue coding past this point because my first attempt got me the flag:


#!/usr/bin/python

from pwn import *

HOST='access_control_server_f380fcad6e9b2cdb3c73c651824222dc.quals.shallweplayaga.me'
PORT=17069

username = 'duchess'
password = username[:5]
version = 'version 3.11.54'

def cipherpw(connectionid, password, offset):
 connectionid = connectionid[offset:]
 pl = list(password)
 cl = list(connectionid)
 result = ""
 for p in range(len(pl)):
  result += chr(ord(pl[p]) ^ ord(cl[p]))

 a1 = list(result)
 r2 = ""
 for i in range(5):
  if ord(a1[i]) <= 31:
   a1[i] = chr(ord(a1[i]) + 32)
  r2 += a1[i]
 return r2

conn = remote(HOST,PORT)

connectionid = conn.recvline().split(" ")[2]
print "[+] Connection ID: " + connectionid,
conn.recvlines(4) # banner
conn.sendline(version)

o = 0
while True:
 conn.recvline()  # login prompt
 print "[+] Sending username: " + username
 conn.sendline(username)
 conn.recvline()  # password prompt
 print "[+] Sending password attempt " + str(o) + "..."
 conn.sendline(cipherpw(connectionid,password,o))
 result = conn.recvline()
 if 'what would you like to do' in result:
  print "[+] Login success with offset " + str(o)
  break
 else:
  print "[+] Retrying..."
 o += 1

conn.sendline('print key')
challenge = conn.recvline().split(" ")[1]
print "[+] Challenge received: " + challenge
conn.recvline() # answer prompt
for i in range(8):
 answer = cipherpw(connectionid, challenge, i)
 print "[+] Possible answer: " + answer

print "[+] Challenge response: " + answer
conn.sendline(answer)
print "[+] Result: " + conn.recvline()
conn.close()

Here's what it looks like when run.


[+] Opening connection to access_control_server_f380fcad6e9b2cdb3c73c651824222dc.quals.shallweplayaga.me on port 17069: Done
[+] Connection ID: J2*mO()uQAMp7d
[+] Sending username: duchess
[+] Sending password attempt 0...
[+] Retrying...
[+] Sending username: duchess
[+] Sending password attempt 1...
[+] Login success with offset 1
[+] Challenge received: +]J=4

[+] Possible answer: ao`P{
[+] Possible answer: 9w'r<
[+] Possible answer: !0%5=
[+] Possible answer: F2b4A
[+] Possible answer: ducHe
[+] Possible answer: #t?lu
[+] Possible answer: "(;|y
[+] Possible answer: ^,+pD
[+] Challenge response: ^,+pD
[+] Result: the key is: The only easy day was yesterday. 44564

[*] Closed connection to access_control_server_f380fcad6e9b2cdb3c73c651824222dc.quals.shallweplayaga.me port 17069


To be fair, this client is not complete. The logic which selects the real correct answer is not in place. It was only by luck that my solution finds the correct answer some times. I have rerun the exploit about 10 times now and it solved it correctly 3 times out of those 10 attempts.

Since we're not about perfect execution, just about CTF, we leave it open ended but happy we solved it in time.

Defcon CTF 2015 - Cat western - 1 Point Coding Challenge

Defcon was finally here and I was so totally pumped for it. When it dropped, so did my jaw, it wasn't going to be easy. That's for sure. Defcon had some of the evilest, trickiest challenges I've ever faced in a CTF. And it was super fun because of it.

This challenge itself was a break from all the pwnables which I found really irritating in their multilayered complexity. This was simple, get in, work on data, get the flag. Single step stuff.

The challenge gives only the following clue:

Catwestern

meow catwestern_631d7907670909fc4df2defc13f2057c.quals.shallweplayaga.me 9999

Upon connecting to that host with netcat we see the following:


# nc catwestern_631d7907670909fc4df2defc13f2057c.quals.shallweplayaga.me 9999
****Initial Register State****
rax=0x33d34005adc0a57c
rbx=0xe318943106ae1bf9
rcx=0x805e3dd411fbc177
rdx=0x9a951fd093fea167
rsi=0xdecfe15b09b2a5f5
rdi=0x8c1473c5803b0f4c
r8=0xf3888ac64cb3b981
r9=0xda321f211f484523
r10=0xadb76e93d0e8fa1e
r11=0x54acc124703437a
r12=0x6ba39546c9366ffa
r13=0x2250452bedb3e99a
r14=0x525e1ed890af328e
r15=0xdfcbd919b08f5cbe
****Send Solution In The Same Format****
About to send 81 bytes: 
H��I�׵m
I �H��)o�zL)�I��� �^M!�I��M �H��M��I��H��I��


Hrmm, whats this even about?

Given that it's talking about registers, my first thought is that maybe the binary data represents a memory dump of registers that we're supposed to unpack and send back. But after some thought, that was not much of a challenge so I looked for other vectors.

Next I thought about the terminology used in the server response. "Initial Register State"? "Send solution in same format"? Hmm. Well what operates on registers? Machine code I suppose. I wrote a quick Python client to grab just the binary data from the host into a file called "data.bin".

I then wrote a quick C program to act as a host for my test:


root@mankrik:~/defcon/cat# cat hw.c
#include<stdio.h>

void main(void) {
 printf("Hello world!\n");
}



Next I used GDB to inject my data.bin file into this process:


root@mankrik:~/defcon/cat# gcc hw.c -o hw
root@mankrik:~/defcon/cat# gdb ./hw
GNU gdb (GDB) 7.4.1-debian
...
(no debugging symbols found)...done.
gdb-peda$ br main
Breakpoint 1 at 0x400510
gdb-peda$ r
...

Breakpoint 1, 0x0000000000400510 in main ()
gdb-peda$ restore data.bin binary $pc
Restoring binary file data.bin into memory (0x400510 to 0x40055b)
gdb-peda$ x/20i $pc
=> 0x400510 <main+4>: nop
   0x400511 <main+5>: inc    rdx
   0x400514 <main+8>: shrd   r13,r11,0x6
   0x400519 <main+13>: shrd   r9,r11,cl
   0x40051d: mul    rdi
   0x400520 <__libc_csu_fini>: adc    rbx,0x5b4087d2
   0x400527: adc    rbx,r13
   0x40052a: shld   rax,r11,0x1
   0x40052f: push   r8
   0x400531 <__libc_csu_init+1>: shld   rdi,rax,0xd
   0x400536 <__libc_csu_init+6>: add    rbx,r13
   0x400539 <__libc_csu_init+9>: adc    r13,r12
   0x40053c <__libc_csu_init+12>: and    r15,0x7fac5ea2
   0x400543 <__libc_csu_init+19>: xor    rdi,rdx
   0x400546 <__libc_csu_init+22>: xor    r14,r11
   0x400549 <__libc_csu_init+25>: sbb    r9,r14
   0x40054c <__libc_csu_init+28>: xor    rax,0x586fd268
   0x400552 <__libc_csu_init+34>: imul   r15,r15,0x7053ef41
   0x400559 <__libc_csu_init+41>: pop    rax
   0x40055a <__libc_csu_init+42>: ret    

As I suspected, the binary blob is x86 instructions that operate on registers and then returns. Cool. Now, how to programatically do this?

Since my host binary is working so well in hosting this parasite, I decided to make GDB do all the heavy lifting in this exploit and simply script it to do what I wanted.

In Python I grab the binary data and initial states from the network, write a GDB script to load the binary data into the main() function of a hello world C program. Execute the code, crash on the "ret" instruction, examine the final state registers and send them to the server.

The output looks like this:


[+] Opening connection to catwestern_631d7907670909fc4df2defc13f2057c.quals.shallweplayaga.me on port 9999: Done
[+] Getting register states.
[+] Received binary data of size 75:
[+] Building GDB script ...
[+] Executing code...
[+] Parsing registers...
[+] Uploading final state registers...
[>>] rax=0xc4dd04450d8a82e2
[>>] rbx=0xecec03e91686ddeb
[>>] rcx=0xbc71ad0689c1b33d
[>>] rdx=0xc53b9356b687985d
[>>] rsi=0x4997cd7d0cbedd1d
[>>] rdi=0x1bcddcc53dbff749
[>>] r8=0xc4dd04450d8a82e2
[>>] r9=0xc4af29843ee93987
[>>] r10=0x5eb67a469067a63d
[>>] r11=0x554004b2b7283702
[>>] r12=0x677751cce192919e
[>>] r13=0x711711cc7f408fec
[>>] r14=0xe550fc117a587e8f
[>>] r15=0x2748e3d257567b22
[+] Recieving all data: Done (66B)
[*] Closed connection to catwestern_631d7907670909fc4df2defc13f2057c.quals.shallweplayaga.me port 9999
[+] Result: The flag is: Cats with frickin lazer beamz on top of their heads!

And here's the code!


#!/usr/bin/python

from pwn import *
import subprocess
import re
import os

HOST='catwestern_631d7907670909fc4df2defc13f2057c.quals.shallweplayaga.me'
PORT=9999

conn = remote(HOST,PORT)

print "[+] Getting register states."
regstates = conn.recvuntil("****Send")

# Store the states in a list
initials = []
for register in regstates.splitlines():
 if '****' in register:
  continue
 else:
  initials.append(register.split("=")[0])
  initials.append(register.split("=")[1])
   
conn.recvline()
size = int(conn.recvline().split(" ")[3],10)
data = conn.recvn(size)
print "[+] Received binary data of size " + str(size)
with open('data.bin','wb') as f:
 f.write(data) 

print "[+] Building GDB script ..."
with open('gdbscript.txt','wb') as f:
 f.write("pset option ansicolor off\nbr main\nr\n")
 a = 0
 while a < len(initials):
  f.write("set $" + initials[a] + "=" + initials[a+1]+"\n")
  a += 2
 
 f.write("info reg\nrestore data.bin binary $pc\nc\ninfo reg\nquit\n")

print "[+] Executing code..."
with open(os.devnull) as DEVNULL:
 gdbout = subprocess.check_output(['gdb','-x','gdbscript.txt','./hw'], stderr=DEVNULL)
print "[+] Parsing registers..."

foundsegv = 0
finals = []
for line in gdbout.splitlines():
 if 'Stopped reason: SIGSEGV' in line: # the ret instruction causes segv
  foundsegv = 1
 
 i = 0
 while i < len(initials):
  if initials[i] in line and foundsegv > 0:
   finalval = re.split('\s+',line)[1]
   finals.append(initials[i])
   finals.append(finalval)
  i+=2

print "[+] Uploading final state registers..."
i = 0
while i < len(finals):
 payload = finals[i] + "=" + finals[i+1]
 print "[>>] " + payload
 conn.sendline(payload)
 i += 2
 
result = conn.recvall()
print "[+] Result: " + result
conn.close()


Monday, May 11, 2015

ASIS CTF 2015 - simple_algorithm - 100 point Crypto Challenge


I worked on this one first because it was one of the earliest challenges available and for the point value must be a quick solve. We are given very little information in the challenge but at least the provided file has everything we need.


root@mankrik:~/asis/algo# tar -Jxvf simple_algorithm_5a0058082857cf27d6e51c095ac59bd5
simple_algorithm/
simple_algorithm/enc.txt
simple_algorithm/._simple_algorithm.py
simple_algorithm/simple_algorithm.py

If we examine the ciphertext we see a long integer:


root@mankrik:~/asis/algo/simple_algorithm# cat enc.txt 
2712733801194381163880124319146586498182192151917719248224681364019142438188097307292437016388011943193619457377217328473027324319178428

And if we check the python code we see the details of the cipher algorithm:


#!/usr/bin/python

flag = '[censored]'
hflag = flag.encode('hex')
iflag = int(hflag[2:], 16)

def FAN(n, m):
    i = 0
    z = []
    s = 0
    while n > 0:
     if n % 2 != 0:
      z.append(2 - (n % 4))
     else:
      z.append(0)
     n = (n - z[i])/2
     i = i + 1
    z = z[::-1]
    l = len(z)
    for i in range(0, l):
        s += z[i] * m ** (l - 1 - i)
    return s

i = 0
r = ''
while i < len(str(iflag)):
    d = str(iflag)[i:i+2]
    nf = FAN(int(d), 3)
    r += str(nf)
    i += 2

print r

What it does is take 2 bytes of the plaintext at a time, converts that to an integer, does some math on that integer and then concatenates the result of that math to the list of existing integers to form the ciphertext.

Initially I tried to simply reverse the algorithm but that turned out not to be feasible. The output of the math results in integers that are either 2 or 3 digits long and there's no way to know from the ciphertext which you're dealing with and so you'd have a large amount of trial and error.

I next tried enciphering some sample strings and noticed that enciphering a simple "ASIS{00000000000000000000000000000000}" string gets me 14 integers closer to solving the ciphertext:

root@mankrik:~/asis/algo/simple_algorithm# ./simple_algorithm.py 
2712733801194321673080924280272919148112712871921790216656907207572172432448111947191682811944216233193618028182875719416452697218
root@mankrik:~/asis/algo/simple_algorithm# cat enc.txt 
2712733801194381163880124319146586498182192151917719248224681364019142438188097307292437016388011943193619457377217328473027324319178428

So given this idea, and the knowledge that all ASIS flags contain only hexadecimal digits 0-9 and a-f I decided to brute force attack this part of the challenge.

What I decided initially to try is to iterate each character through 0-9, a-f byte by byte. However since the algorithm takes 2 bytes of plaintext and produces 2-3 digit integers it was extremely unreliable.

The basic algorithm I came up with is:

  • Beginning at the 1st character in the MD5sum, encrypt the text ASIS{x0000000000000000000000000000000} call it Ci
  • Compare Ci with the ciphertext C by counting the number of correct integers in the result
  • If Ci improves the count of integers we have correct in our last pass by 2+ then put this Ci in a list and label it with how many correct integers it had. Call this result a "good result"
  • Once we've checked every character 0-9, a-f in that position in the plaintext then compare all of the "good results" for this position to find the best result. Assume that is correct and move to the next position.
Next I switched to n-grams and iterated through digrams and trigrams and got increasingly accurate results but finally decided to settle on quadgrams which gave 100% reliable cracking in an acceptable amount of time.


root@mankrik:~/asis/algo/crack# ./simplepwn2.py 
[+] Cracking ciphertext...
[+] Flag so far: ASIS{a9ab0000000000000000000000000000}
[+] Flag so far: ASIS{a9ab115c000000000000000000000000}
[+] Flag so far: ASIS{a9ab115c488a00000000000000000000}
[+] Flag so far: ASIS{a9ab115c488a31180000000000000000}
[+] Flag so far: ASIS{a9ab115c488a311896da000000000000}
[+] Flag so far: ASIS{a9ab115c488a311896dac4e800000000}
[+] Flag so far: ASIS{a9ab115c488a311896dac4e8bc200000}
[+] Flag so far: ASIS{a9ab115c488a311896dac4e8bc20a6d7}
[+] Flag: ASIS{a9ab115c488a311896dac4e8bc20a6d7}

And finally, here's the code which takes the basic code provided in the challenge and uses it to crack the cipher:


#!/usr/bin/python

import string
import itertools

# try n-grams of how many characters at a time?
atatime=4

# Example:  ASIS{b026324c6904b2a9cb4b88d6d61c81d1}
initflag = 'ASIS{00000000000000000000000000000000}'

encint = "2712733801194381163880124319146586498182192151917719248224681364019142438188097307292437016388011943193619457377217328473027324319178428"

def FAN(n, m):
    i = 0
    z = []
    s = 0
    while n > 0:
     if n % 2 != 0:
      z.append(2 - (n % 4))
     else:
      z.append(0)
     n = (n - z[i])/2
     i = i + 1
    z = z[::-1]
    l = len(z)
    for i in range(0, l):
        s += z[i] * m ** (l - 1 - i)
    return s

def enc(plaintext):
 hflag = plaintext.encode('hex')
 iflag = int(hflag[2:], 16)
 i = 0
 r = ''
 while i < len(str(iflag)):
     d = str(iflag)[i:i+2]
     nf = FAN(int(d), 3)
     r += str(nf)
     i += 2

 return r 

def compareit(attemptstr):
       enclist = list(encint)
       attempt = list(attemptstr)
       correct = 0
       for c in range(len(enclist)):
           if attempt[c] == enclist[c]:
         correct += 1 
           else:
                break

       return correct


# start exchanging pairs at the n'th column
startchar = 5
# baseline from that column of correct integers in output
baseline  = 14
alphabet = '0123456789abcdef'
pair = startchar
currentflag = initflag

print "[+] Cracking ciphertext..."

while pair < len(initflag)-1:

 flaglist = list(currentflag)
 goodresults = []
 # for every pair of hexdigits
 for maybe in itertools.product(alphabet, repeat=atatime): 
  flaglist[pair] = maybe[0]
  flaglist[pair+1] = maybe[1]
  flaglist[pair+2] = maybe[2]
  flaglist[pair+3] = maybe[3]

  tryflag = "".join(flaglist)
  attempt = enc(tryflag)
  result = compareit(attempt)
  if result > baseline+2:
   maybelist = list(maybe)
   maybelist.append(result)
   goodresults.append(maybelist)
   
 bestresult = 0
 bestfit = ""

 # Parse the good results looking for the best result
 for r in goodresults:
  if r[atatime] > bestresult:
   bestresult  = r[atatime]
   flaglist[pair]  = r[0]
   flaglist[pair+1] = r[1]
   flaglist[pair+2] = r[2]
   flaglist[pair+3] = r[3]

 currentflag = "".join(flaglist)
 print "[+] Flag so far: " + currentflag
   
 pair += atatime

print "[+] Flag: " + currentflag
























Monday, May 4, 2015

Volga CTF 2015 - Captcha - 150 point Stego challenge

captcha

We've got a rather strange png file. Very strange png. Something isn't right about it...

png

Stego challenges are not my favorite but still I gave this one a try because I felt the point value meant it would be a reasonably quick solve.




The png file when viewed just appeared to be a single 256x256 image of the letter "i". Sort of like this (this is not the actual PNG from the challenge):
When we investigated further though it's rather large in size to be a single image:

root@mankrik:~/volga/captcha# ls -lah capthca.png 
-rw-r--r-- 1 root root 1.6M Apr 30 22:56 capthca.png
root@mankrik:~/volga/captcha# file capthca.png 
capthca.png: PNG image data, 256 x 256, 8-bit/color RGB, non-interlaced
root@mankrik:~/volga/captcha# pngcheck -v capthca.png 
File: capthca.png (1622884 bytes)
  chunk IHDR at offset 0x0000c, length 13
    256 x 256 image, 24-bit RGB, non-interlaced
  chunk IDAT at offset 0x00025, length 735
    zlib: deflated, 32K window, default compression
  chunk IEND at offset 0x00310, length 0
  additional data after IEND chunk
ERRORS DETECTED in capthca.png

So pngcheck says there's additional data after the IEND chunk. Let's try carving the file with foremost:

root@mankrik:~/volga/captcha# foremost -v capthca.png 
Foremost version 1.5.7 by Jesse Kornblum, Kris Kendall, and Nick Mikus
Audit File

Foremost started at Mon May  4 19:04:58 2015
Invocation: foremost -v capthca.png 
Output directory: /root/volga/captcha/output
Configuration file: /etc/foremost.conf
Processing: capthca.png
|------------------------------------------------------------------
File: capthca.png
Start: Mon May  4 19:04:58 2015
Length: 1 MB (1622884 bytes)
 
Num  Name (bs=512)        Size  File Offset  Comment 

0: 00000000.png        792 B            0    (256 x 256)
1: 00000001.png        867 B          792    (256 x 256)
2: 00000003.png        859 B         1659    (256 x 256)
3: 00000004.png        916 B         2518    (256 x 256)

...

1890: 00003166.png        781 B      1621311    (256 x 256)
1891: 00003168.png        792 B      1622092    (256 x 256)
*|
Finish: Mon May  4 19:04:58 2015

1892 FILES EXTRACTED
 
png:= 1892
------------------------------------------------------------------

Foremost finished at Mon May  4 19:04:58 2015

Oh cool, 1892 PNG files embedded inside. Wow, what are all those files of I wonder:



Each PNG file contained a single letter. At first I thought this was the flag, that there would be a message in this and I could just read it back and win the points. Unfortunately this was just step one. After scrolling through the image previews in my Linux file explorer it quickly made sense that these images described a base64 encoded string. The biggest give away of this was the final image in the list was an "=".

So it seemed like the next step was to decode these images, get a base64 string, decode the string into a binary and get the flag. How to do that?

I've read other writeups of this challenge and saw that other people approached this in a smarter way than I did. I did this the long way, with OCR. I think if you want to know the best way to do this challenge, read those writeups. If you want to know how to OCR large groups of single letter images, read on!

Having never done any OCR before, this was going to be fun. First I found that on Linux one of the accepted OCR solutions is called Tesseract OCR and a Python interface to Tesseract OCR is called PyTesser.

So I grab those things quickly and read up on using it...


root@mankrik:~/volga/captcha# apt-get install tesseract-ocr
root@mankrik:~/volga/captcha# wget https://pytesser.googlecode.com/files/pytesser_v0.0.1.zip
root@mankrik:~/volga/captcha# unzip pytesser_v0.0.1.zip

It seems to use PyTesser we just import it and use image_to_string from a PIL image in memory. Note here that we used PyTesser 0.0.1 which uses PIL. It seems on Github there's a new version of PyTesser that uses OpenCV. I'm certain OpenCV is better but im more familiar with PIL so i'm happy to use the old version.

Our first attempt was a flop:


#!/usr/bin/python

from pytesser import *
import os
import Image
import subprocess

PNG_PATH='output/png/'

print "[+] Foremosting input..."
subprocess.call(['foremost','-Q','capthca.png'])

pngfiles = [ f for f in os.listdir(PNG_PATH) if os.path.isfile(os.path.join(PNG_PATH, f)) ]
pngfiles.sort()
flag = ""
print "[+] Processing image files ..."
for f in pngfiles:
 im = Image.open(PNG_PATH + f)
 flag += image_to_string(im)

flag = "".join(flag.split())
print "[+] Encoded flag is: " + flag 


This gave no output at all. When I looked into it I found that Tesseract is tunable and in the default mode, PyTesser has it tuned to all default settings.


root@mankrik:~/volga/captcha# tesseract 
Usage:tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before anyconfigfile.

So I wanted to use "pagesegmode" 10 to treat each image as a single character. Let's modify the args value in pytesser.py to suit our needs:


args = [tesseract_exe_name, input_filename, output_filename,'-psm','10']


Let's try our script again:


root@mankrik:~/volga/captcha# python ./fail1.py 
[+] Foremosting input...
[+] Processing image files ...
[+] Encoded flag is: .VBO.W0KG90AAAANSUhEU9AAA.AAAACDCAIAAADK7dMbAAAAAXNS.0IAFS4C.QAAAA.nQU1BAACXWV8YQUAAAAJCEhZCWAADSMAAA7DACdVqGQAAAUfSU.BVHhe7dhtYt0WEEV.1SWCWA+FYTNdTOq.hTSD.NHEmSe37Z8k9.GV.T.PQBYBmBAQQEBhAQGEBAYAAB9QEEBAYQEBhAQGAAAYEBBAQGEBAYQEB9AAGBAQQEBhAQGEBAYAAB9QEEBAYQEBhAQGAAAYEBBAQGEBAYQEB9AAGBAQQEBhAQGEBAYAAB9QEEBAYQEBhAQGAAAYEBBAQGEB

Cool! Output! Oh but wait. It's totally wrong. Doh! Firstly base64 strings don't have "." in them. It must be interpreting the lowercase i characters as a ".". Half the other characters are similarly mangled. Ouch. So not good enough, it seems our OCR library needs more context about the letters so it can do a better job at OCR?

Firstly let's tweak tesseract a bit more. I learnt about character whitelists so I set one up containing only the base64 alphabet:


root@mankrik:~/volga/captcha# cat /usr/share/tesseract-ocr/tessdata/configs/base64 
tessedit_char_whitelist 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz/+=

Next I tweak pytesser.py to use PSM 7(Treat image as a single text line) and to include our tesseract configuration file by changing the args array:



args = [tesseract_exe_name, input_filename, output_filename,'-psm','7','base64']

Next I modify our script so that it imports some number of images into a single image and then OCRs them all at once. This greatly increases our OCR efficiency but still not perfect results:

#!/usr/bin/python

from pytesser import *
import os
import Image
import subprocess
import base64

PNG_PATH='output/png/'
BEST_WIDTH=38
BEST_HEIGHT=50

# use factors of the total number of files or this will error
CHARS_WIDE=43

print "[+] Foremosting input..."
subprocess.call(['foremost','-Q','capthca.png'])

pngfiles = [ f for f in os.listdir(PNG_PATH) if os.path.isfile(os.path.join(PNG_PATH, f)) ]
pngfiles.sort()
flag64 = ""
print "[+] Processing " + str(len(pngfiles)) + " image files " + str(CHARS_WIDE) + " characters at a time."
im = Image.new("RGB", (BEST_WIDTH * CHARS_WIDE,50), "white")
idx = 0
for f in pngfiles:
 letter = Image.open(PNG_PATH + f)
 width, height = letter.size
 left = (width - BEST_WIDTH)/2
 top = (height - BEST_HEIGHT)/2
 right = (width + BEST_WIDTH)/2
 bottom = (height + BEST_HEIGHT)/2
 letter = letter.crop((left,top,right,bottom))
 im.paste(letter,(idx*BEST_WIDTH,0))
 idx+=1
 if idx == CHARS_WIDE:
  thislot = image_to_string(im) 
  thislot = "".join(thislot.split())
  flag64 += thislot
  idx=0

print "[+] Base64 of flag PNG: " + flag64


And the output:


root@mankrik:~/volga/captcha# python ./fail3.py 
[+] Foremosting input...
[+] Processing 1892 image files 43 characters at a time.
[+] Base64 of flag PNG: iVBORWOKGgoAAAANSUhEUQAAARAAAACDCAIAAADK7dMbAAAAAXNSROIArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAUfSURBVHhe7dhtYtowEEVR1sWCWA+rYTNdTOqRhTSjD6NHEmjSe37Z8kgaGV6T9PQBYBmBAQQEBhAQGEBAYAABgQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBQAAGBAQQEBhAQGEBAYAABgQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBgAAGBAQQEBhAQGEBAYAABQQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBQAAGBAQQEBhAQGEBAYAABgQEEQmD+XM+n0+I8/ZPvB6zkcss3/53b5TQ4/fSdjMvf7zd9iOk7mw9j7/vzb3w9MOU9piZ6IqRvet

And when we assemble a PNG file from the output:



root@mankrik:~/volga/captcha# pngcheck -v fail3.png 
File: fail3.png (1418 bytes)
  File is CORRUPTED.  It seems to have suffered EOL conversion.
ERRORS DETECTED in fail3.png

Now i'm really unhappy because I've spent a bit of time on these 150 points and I wan't to just solve it now. Not pretty code any more, just solution. So instead of fussing more with this I resign myself to a manual process of weeding out these final byte errors using the following script.

It assembles a string of characters, OCR's the string, displays it and asks the user to proof read. If any issues are found it can correct them. This requires about 10-20 minutes of the user's time but as I said. I just wanted a solution and 10-20 minutes of focused reading seemed like a good trade off at this point:


#!/usr/bin/python

from pytesser import *
import os
import Image
import subprocess
import base64

PNG_PATH='output/png/'
BEST_WIDTH=38
BEST_HEIGHT=50

# use factors of the total number of files or this will error
CHARS_WIDE=43

print "[+] Foremosting input..."
subprocess.call(['foremost','-Q','capthca.png'])

pngfiles = [ f for f in os.listdir(PNG_PATH) if os.path.isfile(os.path.join(PNG_PATH, f)) ]
pngfiles.sort()
flag64 = ""
print "[+] Processing " + str(len(pngfiles)) + " image files " + str(CHARS_WIDE) + " characters at a time."
im = Image.new("RGB", (BEST_WIDTH * CHARS_WIDE,50), "white")
idx = 0
for f in pngfiles:
 letter = Image.open(PNG_PATH + f)
 width, height = letter.size
 left = (width - BEST_WIDTH)/2
 top = (height - BEST_HEIGHT)/2
 right = (width + BEST_WIDTH)/2
 bottom = (height + BEST_HEIGHT)/2
 letter = letter.crop((left,top,right,bottom))
 im.paste(letter,(idx*BEST_WIDTH,0))
 idx+=1
 if idx == CHARS_WIDE:
  im.show()
  thislot = image_to_string(im) 
  thislot = "".join(thislot.split())
  print "[+] OCRd: " + thislot
  check = raw_input("These ok? >> " )
  if 'n' in check:
   redo = list(thislot)
   chk = 0
   for r in redo:
    ok = raw_input("Correct this " + r + "["+r+"] >> ")
    if ok <> '':
     redo[chk] = ok
    chk += 1
   thislot = "".join(redo)
   flag64 += thislot  
   print "[+] Corrected: " + thislot
   subprocess.call(['killall','-9','display'])
  else:
   flag64 += thislot
   subprocess.call(['killall','-9','display'])
   
  flag64 = "".join(flag64.split())
  idx = 0
  im = Image.new("RGB", (BEST_WIDTH * CHARS_WIDE,50), "white")

subprocess.call(['rm','-fr','./output/'])

f=open('flag.b64','wb')
f.write(flag64)
f.close()
print "[+] Base64 of flag PNG: " + flag64

And the output looks like this, it'll throw up an image on your screen:




root@mankrik:~/volga/captcha# ./readcaptcha.py 
[+] Foremosting input...
[+] Processing 1892 image files 43 characters at a time.
[+] OCRd: iVBORWOKGgoAAAANSUhEUQAAARAAAACDCAIAAADK7dM
These ok? >> n
Correct this i[i] >> 
Correct this V[V] >> 
Correct this B[B] >> 
Correct this O[O] >> 
Correct this R[R] >> 
Correct this W[W] >> w
Correct this O[O] >> 0
Correct this K[K] >> 
Correct this G[G] >> 
Correct this g[g] >> 
Correct this o[o] >> 
Correct this A[A] >> 
Correct this A[A] >> 
Correct this A[A] >> 
Correct this A[A] >> 
Correct this N[N] >> 
Correct this S[S] >> 
Correct this U[U] >> 
Correct this h[h] >> 
Correct this E[E] >> 
Correct this U[U] >> 
Correct this Q[Q] >> 
Correct this A[A] >> 
Correct this A[A] >> 
Correct this A[A] >> 
Correct this R[R] >> 
Correct this A[A] >> 
Correct this A[A] >> 
Correct this A[A] >> 
Correct this A[A] >> 
Correct this C[C] >> 
Correct this D[D] >> 
Correct this C[C] >> 
Correct this A[A] >> 
Correct this I[I] >> 
Correct this A[A] >> 
Correct this A[A] >> 
Correct this A[A] >> 
Correct this D[D] >> 
Correct this K[K] >> 
Correct this 7[7] >> 
Correct this d[d] >> 
Correct this M[M] >> 
[+] Corrected: iVBORw0KGgoAAAANSUhEUQAAARAAAACDCAIAAADK7dM
...

And so on... You get the idea. You'd think we'd be finished by now right? Noooo. There was a time bomb in this challenge that I only just now discovered. And that is the upercase letter "I" looks identical to the lowercase letter "l". The OCR and even human recognition steps have no way to tell these apart. The best I could do after the best OCR and human recognition i could do resulted in a dud file still:



root@mankrik:~/volga/captcha/best# pngcheck -v flag.png 
File: flag.png (1418 bytes)
  chunk IHDR at offset 0x0000c, length 13
    272 x 131 image, 24-bit RGB, non-interlaced
  chunk sRGB at offset 0x00025, length 1
    rendering intent = perceptual
  chunk gAMA at offset 0x00032, length 4: 0.45455
  chunk pHYs at offset 0x00042, length 9: 3779x3779 pixels/meter (96 dpi)
  chunk IDAT at offset 0x00057, length 1311
    zlib: deflated, 32K window, fast compression
  CRC error in chunk IDAT (computed 29d1d0fd, expected 6eb66220)
ERRORS DETECTED in flag.png

So I was stuck here and ready to throw in the towel, but I had one more idea. I didn't need to make the file perfect, I just needed enough image data to read a flag. So I counted the number of letter I's in the base64. There were 29 of them. So I had 229 (536,870,912 different possible file combinations here. I don't NEED a perfect image though. So here was my final idea:


  • First, only permutate the data in the obvious part of the base64 data where the actual useful image data is found. This obvious when you see all the base64 data.
  • Only try some reasonable number of permutations

So the base64 data we needed to focus on was quite obvious to the naked eye. I've highlighted it below. It's obvious because the surrounding data is repeated QEEBAYhEBAY etc which looks like it might be a repeating data, like a blank background of the image maybe?


iVBORw0KGgoAAAANSUhEUgAAARAAAACDCAIAAADK7dMbAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAUfSURBVHhe7dhtYtowEEVR1sWCWA+rYTNdTOqRhTSjD6NHEmjSe37Z8kgaGV6T9PQBYBmBAQQEBhAQGEBAYAABgQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBgAAGBAQQEBhAQGEBAYAABgQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBgAAGBAQQEBhAQGEBAYAABgQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBgAAGBAQQEBhAQGEBAYAABgQEEQmD+XM+n0+I8/ZPvB6zkcss3/53b5TQ4/fSdjMvf7zd9iOk7mw9j7/vzb3w9MOU9piZ6IqRvetf+2JKnJz6IJsDvO30nBOb7NV+ALzjaemAGn283JDe09qVpjr3u6YIPITDvNvxCut+JvuBoPyIwP8T4MATmdYZfyDcGpv37Zdjf5WbD5be0O3t0t8/xIw++OmGjsvym7ajIJq71Iou3svP1tg/vIX3/SR0+X691s5V9Xc1msv6ITXTt28x8O1xkbyDfNLuOTBsert8049YfNjOzFW+rrG5qSu3Gyn1ZLdyq3FpW87CVY8uBGe1VX062N507tKf5Mkz2s7oVxmqZreTf5gNu/aPefD/p0p67zkKNa8GPhymzfdsJviZf+/UnbGapKOVhkdTNfmPj9+Gw60Ra576+2yqsU5dZaKaMzoVN3WS7HGw6WbE/3Dbi6/oC1Vpgmm3vuu3D8Wbd+cMuHsCV2eWoIzE3cdLb+NXHvtqa7Wm6bcbdpOm+7km9mq0/5SaUS7dIUu5jA21Zb6I+1lG7dGPJ0oncNuNd3apW0K/YTxts/LCXQ8s/YUZH6Mbisf1je+LcOx6tOhDL7C6sMucmTnobdxBHm+YTe9rMdbfuMu7rntSr2fpzNiOdvly0zdT72EBb1pvXN33eX/+gmaYyOdw1bup3nWy61wyG4jbbSH3sO3zWd/wNk2/q43TmMu6b7k84NC6z0QendxMPeusXiRuOa7pxN2m6r93I21ozW//IPtstHIt297GBtqw3qXed57vScr5xE8PjFW7upky3i8mmhXWYRx8ebiuQ2upJgWmb6YbiscvjcGwbLV3bzcIJBntvxu8vcBMnvXXNpctmw1DjWFnuIJWUombfWDWpGax/xKa7HfPA/c515jfaJx3vZM3Uijl59JjWce9+uz9fLm6eeqJU37+Iw00zq8mjVhEKmjbsVmhq5CWByY3v0v8mIVOVB4fnqCvZVXU4yTQt1PrQuIs0j4XHiTuBuT+sw7N/7rfR8h9um/p5xj0m6x/Yuw6F7iDhe1PGQ5MTsSKeJC/TflibQTPSifZNS5916fGm7qCbunSp9i3Hx8d9PPSpwOCd7NMPsfjJvuCrvOCIgXnNkbDqV+XINwYm7db8qP9K8ads9fCAT0/8xx2f65N5ectLO9j0BYFJu3/66ysEBgCBAQQEBhAQGEBAYAABgQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBgAAGBAQQEBhAQGEBAYAABgQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBgAAGBAQQEBhAQGEBAYAABgQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBgAAGBAQQEBhAQGEBAYAABgQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBgAAGBAQQEBhAQGEBAYAABgQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBgAAGBAQQEBhAQGEBAYAABgQEEBAYQEBhAQGAAAYEBBAQGEBAYQEBgAAGBAQQEBlj28fEX4wPz1G62YiCAAAAASUVORK5CYll=

The "reasonable" number of permutated I/l combinations I settled on was 211 which is just 2,048 combinations. If I could get enough image data in the first 11 I/l substitutions, I would be super happy.

Here's the final code:


#!/usr/bin/python
#
# Captcha solver for VolgaCTF 2015.
# CaptureTheSwag - capturetheswag.blogspot.com
# by dacat

from pytesser import *
import os
import Image
import subprocess
import base64
import itertools

PNG_PATH='output/png/'
ATTEMPT_PATH='attempts/'
BEST_WIDTH=38
BEST_HEIGHT=50

# use factors of the total number of files or this will error
CHARS_WIDE=43

alphabet = ('I','l')
MAXLEN=11

print "[+] Foremosting input..."
subprocess.call(['foremost','-Q','capthca.png'])

pngfiles = [ f for f in os.listdir(PNG_PATH) if os.path.isfile(os.path.join(PNG_PATH, f)) ]
pngfiles.sort()
flag64 = ""
print "[+] Processing " + str(len(pngfiles)) + " image files " + str(CHARS_WIDE) + " characters at a time."
im = Image.new("RGB", (BEST_WIDTH * CHARS_WIDE,50), "white")
idx = 0
for f in pngfiles:
 letter = Image.open(PNG_PATH + f)
 width, height = letter.size
 left = (width - BEST_WIDTH)/2
 top = (height - BEST_HEIGHT)/2
 right = (width + BEST_WIDTH)/2
 bottom = (height + BEST_HEIGHT)/2
 letter = letter.crop((left,top,right,bottom))
 im.paste(letter,(idx*BEST_WIDTH,0))
 idx+=1
 if idx == CHARS_WIDE:
  im.show()
  thislot = image_to_string(im) 
  thislot = "".join(thislot.split())
  print "[+] OCRd: " + thislot
  check = raw_input("These ok? >> " )
  if 'n' in check:
   redo = list(thislot)
   chk = 0
   for r in redo:
    ok = raw_input("Correct this " + r + "["+r+"] >> ")
    if ok <> '':
     redo[chk] = ok
    chk += 1
   thislot = "".join(redo)
   flag64 += thislot  
   print "[+] Corrected: " + thislot
   subprocess.call(['killall','-9','display'])
  else:
   flag64 += thislot
   subprocess.call(['killall','-9','display'])
   
  flag64 = "".join(flag64.split())
  idx = 0
  im = Image.new("RGB", (BEST_WIDTH * CHARS_WIDE,50), "white")

subprocess.call(['rm','-fr','./output/'])
print "[+] OCR Completed. Creating permutations"
header=flag64[:361]
body=flag64[361:]

try:
 os.makedirs(ATTEMPT_PATH)
except OSError as exc:
 pass

# fiddle with all the l/I combinations and create a lot of PNGs
for i in itertools.product(alphabet, repeat=MAXLEN):
        idx = 0
        bodylist = list(body)
        for char in range(len(bodylist)):
                if 'I' in bodylist[char]:
                        bodylist[char] = i[idx]
                        idx += 1
                if idx >= MAXLEN:
                        flag = header + "".join(bodylist) 
                        flagbin = base64.b64decode(flag)
                        f=open(ATTEMPT_PATH+'attempt'+"".join(i)+".png", 'wb')
                        f.write(flagbin)
                        f.close()
                        break

print "[+] Completed, copy the png files from the " + ATTEMPT_PATH + " folder to a Windows system and find the flag!" 


And when we run it, we find 2,048 PNG files. Copying them to Windows 7 system allows simple browsing with Extra Large thumbnail mode right in explorer:



And finally we spot the "best" attempt:



Which is by no means perfect, but was enough for me to get the flag:

{That_is_incredible_you_have_past!}

Incredible indeed. That's how you do a challenge the wrong way!