Introduction
The simulator is largely functional now and quite stable as well. It’s running on a server and I’ve even exposed it on the internet: http://cray.modularcircuits.com/. If you visit that page, you’ll see that you can ssh to the machine. But it’s sort of a cheat: you don’t actually ssh into UNICOS itself. Instead, you ssh into the host machine running the simulator, which in turn does an rlogin for you into the simulator. Not only is this ugly, but it doesn’t give you two essential features: scp and X11 forwarding.
In this post I’ll describe my quest to get a native sshd running on UNICOS.
Now, before we get there, let me say this: it shouldn’t have been this hard. There used to be something called Cray Open Software, a package of pre-compiled OSS projects for UNICOS. This included, among other things, ssh. If only I could get a copy of it… But of course not: it’s nowhere to be found, and no one at Cray even bothered to reply to my requests.
At the same time, would I really be comfortable running a ~20 year old ssh server? That sounds like asking for trouble.
Peeling the onion
If I need ssh (openssh to be more precise), I’ll need its dependencies of course. Small ones, like zlib are simple: download, configure, make, make install and you’re done. The big one however is openssl. And openssl needs Perl.
Perl is a whopper. I’ve spent quite some time unsuccessfully trying to massage it to compile. I eventually got to the point where the first step (building miniperl) succeeded, but then the build process just hung with no further progress. The major problem in getting even this far was Perl’s insistence on IEEE-compatible floating-point numbers, which of course the J90 doesn’t have. I was hacking things in, hoping that the Perl-based openssl build process doesn’t actually use floating point, but that didn’t help.
Then I got lucky: browsing all the content that I’ve amassed over the years, I came across a CrayDOC documentation CD. This contains a bunch of manuals, man pages, and a simple web-server-based environment to search and display all that. Now, for some reason this environment is included in source form: apache 1.3.20, Berkeley DB 3.2.9 and Perl 5.6.1. Of course, these tools, as far as I can tell, were intended to be run on the SWS (a Sun workstation used to manage the J90), but (what the hell) I tried to compile this Perl on UNICOS. And it worked!
So, one down, several more to go. Next one up is openssl. The configuration and compilation steps went rather smoothly, though excruciatingly slowly. For some reason linking anything with the openssl libraries takes 15 minutes.
Along the way I’ve found out that UNICOS lacks /dev/urandom, so a replacement (prngd) was needed. No biggie, it worked like a charm. It even included a sample configuration for UNICOS.
On to the main course
With all that being done, I was finally ready for openssh. I usually take versions of tools from the same era as the OS release for two reasons:
- I think it’s more authentic to the experience not to include the latest features and functionality that would not have been available to the users of the time
- It’s quite likely that even if a project nominally supports UNICOS, compatibility bugs crept in during the two decades when no one was testing these projects on the J90s.
To minimize my exposure to vulnerabilities though, I’ve made an exception as far as openssl and openssh go: I’ve grabbed the latest version of both (1.0.2l and 7.5p1 respectively).
Getting openssh to compile was not terribly complicated, though it involved fixing a few places where the configure script misidentified features of the platform.
I needed to turn off two features by setting the following in config.h:
    #define BROKEN_READV_COMPARISON 1
    #define DISABLE_UTMPX
I opted to simply patch up the generated config rather than trying to figure out how to fix the God-awful autoconf scripts.
After a lot of waiting, due to the aforementioned link-time problems, I had an ssh client to try.
So I did.
And nothing worked.
That was rather unpleasant, as I know nothing about ssh, ssl or cryptography in general, so the prospect of digging into the guts of these tools and trying to find obscure compatibility issues wasn’t really palatable. I happen to work together with a talented security engineer though, so I picked his mind as to what the cause might be.
His advice was to sprinkle debug printfs all over the place, compile the same code on both Linux and UNICOS and start digging for divergent execution paths.
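To give an idea of what such instrumentation can look like, here is a minimal sketch. The names trace_hex and TRACE_BUF are made up for this illustration and are not part of openssh; the idea is simply to print buffers in a platform-independent form, tagged with file and line, so the logs of the two builds can be compared with diff.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: format len bytes of buf as lowercase hex into out.
 * out must have room for 2*len + 1 characters. Returns the string length. */
size_t trace_hex(char *out, size_t outsz, const unsigned char *buf, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    assert(out != NULL && buf != NULL);
    assert(outsz >= 2 * len + 1);
    for (i = 0; i < len; i++) {
        /* Mask explicitly so the text is identical on every platform. */
        out[2 * i]     = digits[(buf[i] >> 4) & 0x0f];
        out[2 * i + 1] = digits[buf[i] & 0x0f];
    }
    out[2 * len] = '\0';
    return 2 * len;
}

/* Hypothetical macro: dump at most 64 bytes of a buffer to stderr,
 * prefixed with the source location and a label. */
#define TRACE_BUF(label, buf, len)                                           \
    do {                                                                     \
        char hex_[2 * 64 + 1];                                               \
        size_t n_ = (len) < 64 ? (size_t)(len) : 64;                         \
        trace_hex(hex_, sizeof hex_, (const unsigned char *)(buf), n_);      \
        fprintf(stderr, "%s:%d %s=%s\n", __FILE__, __LINE__, (label), hex_); \
    } while (0)
```

A call such as TRACE_BUF("digest", digest, 32) right after a hash is computed produces one line per call, and the first line that differs between the Linux and UNICOS logs points at the code that misbehaves.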
I did that, and pretty soon I found that the code calls into an obscure openssl function.
At this point I started to despair: if I have to keep re-compiling (and re-deploying) openssl to make any further progress, I would grow a pretty long beard by the time I find anything interesting.
A new tack
Googling around, I came across a post stating that openssh can be compiled without openssl. It even gave instructions on how to do that. That was good news: the turnaround time for ssh after modifying a single C file went from 20 minutes to 2. That is actually pretty reasonable, so my progress got significantly faster. I quickly zeroed in on sha2.c.
Here, I have to make a little detour and introduce you to the weird world of C on the J90. You see, the J90, like all Cray-1 descendant machines, is a 64-bit architecture, but one where the smallest integer the hardware handles natively is 64 bits. On most architectures, people are accustomed to the fact that large integers (say uint64_t on a 32-bit architecture) are made up of smaller integers and that the compiler generates multi-instruction sequences for operations on those numbers.
For the J90 it’s the opposite: 64-bit integers are the most efficient, and if you want to go smaller, say 32 bits, the compiler has to do extra gymnastics to simulate the expected behavior for you. As such, both int and long are defined as 64-bit integers, short is 32 bits long and char is 8 bits long.
There’s more however: since the architecture doesn’t natively understand anything but 64-bit integers, it can’t even address sub-64-bit quantities. Every pointer is 8-byte aligned! Now, that’s a lie actually, the C compiler does make char-pointers work, but that involves even more extra gymnastics. The end result is this:
    sizeof(unsigned int) = 8; UINT_MAX = 18446744073709551615
    sizeof(unsigned long) = 8; ULONG_MAX = 18446744073709551615
    sizeof(unsigned char) = 1; UCHAR_MAX = 255
    sizeof(unsigned short) = 8; USHRT_MAX = 4294967295
Now, that’s weird: short consumes 8 bytes of storage, yet it can only represent 32-bit integers!
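The gap between storage size and value range can be measured rather than eyeballed. The sketch below is my own illustration (the function names are invented): it counts the value bits of an unsigned type from its maximum and subtracts that from the storage size in bits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Count the value bits of an unsigned type, given its maximum value. */
unsigned value_bits(unsigned long max)
{
    unsigned bits = 0;

    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

/* Storage bits minus value bits: non-zero means the type has padding. */
unsigned padding_bits(size_t size_in_bytes, unsigned long max)
{
    unsigned storage = (unsigned)(size_in_bytes * CHAR_BIT);
    unsigned value = value_bits(max);

    assert(storage >= value);
    return storage - value;
}
```

Calling padding_bits(sizeof(unsigned short), USHRT_MAX) gives 0 on a typical Linux machine. Fed with the J90 figures from the listing above (8 bytes of storage, a maximum of 4294967295, and 8-bit chars), it gives 32: half of every short is padding.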
This fact breaks a lot of assumptions in the code. For example:
    SHA256_Final(u_int8_t digest[SHA256_DIGEST_LENGTH], SHA256_CTX *context)
    {
        SHA256_Pad(context);

        /* If no digest buffer is passed, we don't bother doing this: */
        if (digest != NULL) {
            memcpy(digest, context->state, SHA256_DIGEST_LENGTH);
            memset(context, 0, sizeof(*context));
        }
    }
Here we memcpy an array of 32-bit integers into a byte-array. Well, that works, but every other group of four bytes in the result is zero padding. Plus, SHA256_DIGEST_LENGTH doesn’t even describe the true size of the source in bytes, so we only copy half of the array.
Notice how the code is actually non-compliant in the sense that it makes the underlying assumption that a 32-bit integer is 4 bytes long. The standard only guarantees that anything can be copied into a byte array (well, char array to be precise) and back and keep its meaning. It doesn’t guarantee any particular semantics of the byte-array itself.
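One portable way around this kind of problem is to serialize each word byte by byte with shifts instead of copying raw storage. The following is only a sketch under assumptions: store_be32_words is a name I made up, and it relies on a C99 stdint.h providing uint_least32_t, which an older compiler may not have. SHA-256 defines its digest as the state words in big-endian order, which is what the shifts produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write nwords 32-bit values as 4 big-endian bytes each.
 * out must have room for 4 * nwords bytes. The result does not depend on
 * how many bytes of storage a word occupies on the host. */
void store_be32_words(unsigned char *out, const uint_least32_t *words,
                      size_t nwords)
{
    size_t i;

    assert(out != NULL && words != NULL);
    for (i = 0; i < nwords; i++) {
        uint_least32_t w = words[i];

        out[4 * i]     = (unsigned char)((w >> 24) & 0xff);
        out[4 * i + 1] = (unsigned char)((w >> 16) & 0xff);
        out[4 * i + 2] = (unsigned char)((w >> 8) & 0xff);
        out[4 * i + 3] = (unsigned char)(w & 0xff);
    }
}
```

With a helper like this, the memcpy in the excerpt would become a call along the lines of store_be32_words(digest, context->state, 8), which writes exactly 32 bytes whether a state word takes 4 or 8 bytes of storage.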
Another example, this one from sha1.c:
    typedef union {
        u_int8_t c[64];
        u_int32_t l[16];
    } CHAR64LONG16;

    /*
     * Hash a single 512-bit block. This is the core of the algorithm.
     */
    void
    SHA1Transform(u_int32_t state[5], const u_int8_t buffer[SHA1_BLOCK_LENGTH])
    {
        u_int32_t a, b, c, d, e;
        u_int8_t workspace[SHA1_BLOCK_LENGTH];
        CHAR64LONG16 *block = (CHAR64LONG16 *)workspace;

        (void)memcpy(block, buffer, SHA1_BLOCK_LENGTH);
        ...
Here we assume that we can overlay the union onto a byte-array, then later on interpret it as an array of 32-bit integers. Again, the union will not help you: the standard doesn’t make any claims that you can safely re-interpret types like that. In fact it states the opposite. So, here again, the C compiler is right, the code is wrong.
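The type punning can be avoided by assembling each word from its bytes explicitly. Again, this is a sketch with invented names (load_be32, load_block) that assumes a C99 stdint.h; it is not the code from sha1.c. SHA-1 reads its input block as sixteen big-endian 32-bit words, so that is the order used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble one 32-bit big-endian value from four bytes. */
uint_least32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint_least32_t)(p[0] & 0xff) << 24) |
           ((uint_least32_t)(p[1] & 0xff) << 16) |
           ((uint_least32_t)(p[2] & 0xff) << 8)  |
            (uint_least32_t)(p[3] & 0xff);
}

/* Expand a 64-byte block into sixteen 32-bit words without overlaying
 * one type on top of another. */
void load_block(uint_least32_t words[16], const unsigned char block[64])
{
    size_t i;

    for (i = 0; i < 16; i++)
        words[i] = load_be32(block + 4 * i);
}
```

This costs a few shifts per word, but it makes no assumption about how wide a word is in memory or how it is aligned.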
Finally, in poly1305.c, there’s this piece of code:
    f0 = ((h0      ) | (h1 << 26)) + (uint64_t)U8TO32_LE(&key[16]);
    f1 = ((h1 >>  6) | (h2 << 20)) + (uint64_t)U8TO32_LE(&key[20]);
    f2 = ((h2 >> 12) | (h3 << 14)) + (uint64_t)U8TO32_LE(&key[24]);
    f3 = ((h3 >> 18) | (h4 <<  8)) + (uint64_t)U8TO32_LE(&key[28]);
Here, h0…h4 are 32-bit integers, while f0…f3 are 64-bit ones. In this case the standard says to promote h0…h4 to int, then carry out the operations. Well, on this architecture int is 64 bits long, so the truncation that the code expects after the shift-left operations doesn’t happen. Again, the compiler wins.
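The portable fix for this pattern is to make the truncation explicit with a mask instead of relying on the width of the promoted type. The sketch below uses invented names (MASK32, shl32, combine32) and assumes a C99 stdint.h; it shows the idea rather than the actual patch to poly1305.c.

```c
#include <assert.h>
#include <stdint.h>

#define MASK32 ((uint_least64_t)0xffffffffUL)

/* Left shift with the result reduced modulo 2^32, no matter how wide the
 * type used for the arithmetic is. */
uint_least64_t shl32(uint_least64_t x, unsigned n)
{
    assert(n < 32);
    return ((x & MASK32) << n) & MASK32;
}

/* One combining step in the style of the excerpt: lo is shifted down, hi is
 * shifted up and truncated to 32 bits, the two are OR-ed together, and the
 * key word is added in 64-bit arithmetic so the carry is kept. */
uint_least64_t combine32(uint_least64_t lo, unsigned lo_shift,
                         uint_least64_t hi, unsigned hi_shift,
                         uint_least64_t key_word)
{
    assert(lo_shift < 32);
    return (((lo & MASK32) >> lo_shift) | shl32(hi, hi_shift))
           + (key_word & MASK32);
}
```

In these terms the second line of the excerpt corresponds to combine32(h1, 6, h2, 20, key_word), where key_word stands for the value read from the key. On a machine where the 32-bit type really is 32 bits wide the masks are redundant and change nothing.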
There were a few other problems: for example, in some files the 32-bit integer type (u_int32_t) was actually typedef-ed to unsigned int instead of unsigned short, making the math 64 bits wide, but that can easily be fixed.
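A wrong typedef like that can be caught at compile time. Here is a sketch of such a guard, using the old negative-array-size trick so that it doesn’t need a C11 compiler. The names my_u32 and to_u32 are hypothetical, chosen so they don’t clash with the real u_int32_t.

```c
#include <assert.h>

/* Hypothetical alias: on the J90 described above this would have to be
 * unsigned short; on most other platforms unsigned int is the right pick. */
typedef unsigned int my_u32;

/* Compile-time guard: the array size becomes -1, and the build fails, if
 * the largest value my_u32 can hold is not exactly 2^32 - 1. */
typedef char my_u32_has_32_value_bits[
    ((my_u32)-1 == 0xffffffffUL) ? 1 : -1];

/* Checked narrowing: trips in debug builds if v needs more than 32 bits. */
my_u32 to_u32(unsigned long v)
{
    assert(v <= 0xffffffffUL);
    return (my_u32)v;
}
```

With the typedef pointing at a 64-bit type the guard stops the build right away, which beats finding out from a corrupted hash later.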
With these changes, I’ve finally managed to get ssh and sshd to work with the chacha20-poly1305@openssh.com cipher. Fixing the AES ciphers seems way more involved. I won’t bore you with the details: it boils down to the same incorrect assumption, namely that 32-bit integers actually occupy 32 bits of storage. I still don’t have a complete fix, even after several additional fixes. I believe my current problems boil down to stack corruption, where a u_int32_t array is overlaid with a byte-array, then over-indexed because of the incorrect assumption.
It doesn’t help matters that the debugger (TotalView) doesn’t debug into child processes and sshd loves to fork. So for now, only chacha20 is enabled as a cipher. That shouldn’t be a big deal for modern ssh clients, but just be warned: you might need to update your client.
X11
As I’ve said before, one of the main reasons to get sshd working on UNICOS is to get X11 forwarding as well.
Now, theoretically, it should “just work”. In practice, when I logged into my server with X forwarding enabled, I got the following:
    tantos@paprika:~$ ssh -X crayusr@<<omitted>>
    crayusr@192.168.169.3's password:
    Last successful login was : Sat Oct 21 10:46:59 from sn9000.

    /usr/bin/X11/xauth: (stdin):1: bad display name "unix:12.0" in "remove" command
    /usr/bin/X11/xauth: (stdin):2: bad display name "unix:12.0" in "add" command
And of course X applications complain:
    bash-2.03$ xclock
    X11 connection rejected because of wrong authentication.
    X connection to localhost:12.0 broken (explicit kill or server shutdown).
What could this be? Google wasn’t terribly helpful; it seems that this particular problem is unique to my installation.
I don’t have source code for the X11 install on UNICOS, but that doesn’t mean I can’t look at the X11 source. It just means I can’t exactly reproduce the build, but an X11R6 source should be close enough.
After downloading and starting to look into the code, it pretty soon became clear what the problem was. For example, in parsedpy.c, which is responsible for parsing the server string in xauth, this is what one finds:
    #ifdef UNIXCONN
    #define UNIX_CONNECTION "unix"
    #define UNIX_CONNECTION_LENGTH 4
    #endif
In other words “unix” as the name of the display is not even considered, unless UNIXCONN is defined. Apparently it wasn’t in the build that Cray used.
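To illustrate the effect of such a guard, here is a small sketch. It is not the actual parsedpy.c code: is_unix_display is a function I made up, and UNIXCONN is defined inside the example itself, whereas a real build would normally get it from the compiler command line or the build configuration.

```c
#include <assert.h>
#include <string.h>

/* Defined here only for the sake of the example. */
#define UNIXCONN

#ifdef UNIXCONN
#define UNIX_CONNECTION "unix"
#define UNIX_CONNECTION_LENGTH 4
#endif

/* Hypothetical check: returns 1 if the host part of a display name
 * (everything before the ':') names the local socket transport. */
int is_unix_display(const char *display)
{
    const char *colon;

    assert(display != NULL);
    colon = strchr(display, ':');
    if (colon == NULL)
        return 0;
#ifdef UNIXCONN
    if ((size_t)(colon - display) == UNIX_CONNECTION_LENGTH &&
        strncmp(display, UNIX_CONNECTION, UNIX_CONNECTION_LENGTH) == 0)
        return 1;
#endif
    return 0; /* without UNIXCONN, "unix" is never recognized */
}
```

Remove the #define UNIXCONN line and the comparison is compiled out entirely, so a name like "unix:12.0" falls through as unrecognized, which matches the symptom I was seeing.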
Now, I’m not looking forward to rebuilding the whole of X (especially because even the first imake step failed), but I can rebuild xauth; that’s a simple enough program. With that, I can make xauth accept the server string, but will the rest of X11 work as well?
Anyway, it’s worth a shot, so I recompiled xauth with the right #define set, put the executable into /usr/bin/X11 and retried.
No complaints of course from xauth anymore during login, so, here’s the moment of truth:
    bash-2.03$ xclock &
    [1] 24347
    bash-2.03$
Yay! Apparently the rest of X11 really doesn’t care about this #define (it is used in a couple of places though), or xauth was simply mis-compiled by Cray. Either way, now both ssh and X11 forwarding work.
Closing words
This work was way more convoluted than it should have been: yes, the C compiler on Cray is funky, the type system on J90 is also weird, but every single problem I’ve identified in openSSH was the result of relying on undefined behavior.
I still don’t know why openSSL doesn’t work, but chances are, similar if not identical issues exist in that codebase as well. I’m not terribly inclined to dig into it and fix them.
Unfortunately that means that other ‘secure’ protocols, like https are still not supported on my machine which is a bit of a bummer.
The last thing to do was to recompile all of this with optimizations enabled and deploy it on the ‘production’ simulator.
As a teaser of what X11 gives you, though, here’s source-level debugging of a program with stack trace, local variables and source listings, all in an interactive, graphical interface called totalview: