Jeffrey Wang's Blog @ cs.utexas.edu/~wang

Important notice about this site

I graduated from the University of Texas at Austin in December 2020. This blog is archived and no updates will be made to it.

Welcome to my blog!

Here, I post on topics relevant to my time here at the university, general topics on computer science and business administration, and anything interesting that I learn as a student and lifelong learner.

Wish to reach out to me about my blog posts? I'm open to questions, comments, and suggestions. Feel free to contact me.

Getting started with the UTCS lab machines

May 15, 2019

The UT Computer Science department provides some pretty good facilities for students to use. When students first enter UTCS, they are told to create an account, with the expectation that they will figure out how to use the rest of the facilities on their own. This post aims to explain what you can do with your UTCS account and the lab machines, and how to use the facilities.

How the lab machines work

The lab machines are computers on the 1st and 3rd floors of the Gates–Dell Complex (GDC), which is the computer science building here at UT Austin. They run Ubuntu Linux.

Students log into the lab machines using their UTCS account credentials. All of your user files are stored on the same storage cluster, so it does not matter which machine you use.

The lab machines are listed on the CS website here. You can SSH into them using ssh csid@labmachinehost.cs.utexas.edu, where csid is your UTCS account username (discussed in more detail below) and labmachinehost is the lab machine's hostname. For instance, I might do ssh wang@linux.cs.utexas.edu to log into a lab machine.

Creating a UTCS account

Each CS student gets to create a UTCS account online beginning about two weeks before classes start for their first semester at UT Austin. It is separate from your UT EID. (In fact, your EID password and your UTCS account password can't be the same, last I checked.) This is what you use for all UTCS student services and what you use to log in to the lab machines.

Unlike your UT EID, you get to pick your UTCS account name. For instance, I picked wang because it's my last name. However, as long as it's appropriate, you can pick anything you want. Someone I know decided to go with dab.

Each UTCS account can be configured to use a shell of your choice. The default is bash, but zsh and tcsh are also available.

The account comes with two really nice perks: your own @cs.utexas.edu email and a webpage on cs.utexas.edu. They are both determined by your username. For instance, right now, you are reading this webpage on https://www.cs.utexas.edu/~wang, and you can email me via wang@cs.utexas.edu. You can either have the email forwarded to another address or use the mail servers that the department maintains.

Note: Logging into the UTCS machines from an external network (i.e. neither on utexas Wi-Fi nor on the UT VPN) requires you to use public-key authentication. The department provides a tutorial for those who need assistance setting this up.

Creating your UTCS website

Once you have an account, you can log in to a lab machine, either via SSH or physically on a lab machine.

In your user directory, you'll see there is a public_html directory. Any files placed inside it with world-readable permissions will be served by the UTCS web servers under https://www.cs.utexas.edu/~csid or https://www.cs.utexas.edu/users/csid, where csid is your UTCS account username.
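To make the "world-readable" requirement concrete, here is a minimal Python sketch of writing a page into public_html and setting its permissions. The publish_page helper is a hypothetical name made up for this example, and the 0o755/0o644 modes are the usual Unix choice for web-served content rather than anything the department documents; a temporary directory stands in for the real home directory.

```python
import stat
import tempfile
from pathlib import Path


def publish_page(home: Path, name: str, html: str) -> Path:
    """Write html to home/public_html/name and make it world-readable.

    Hypothetical helper for illustration only.
    """
    web_root = home / "public_html"
    web_root.mkdir(exist_ok=True)
    # Directories need the execute bit so other users can traverse them.
    web_root.chmod(0o755)
    page = web_root / name
    page.write_text(html)
    # Read/write for the owner, read-only for group and others.
    page.chmod(0o644)
    return page


if __name__ == "__main__":
    # A temporary directory stands in for the real home directory here.
    with tempfile.TemporaryDirectory() as tmp:
        page = publish_page(Path(tmp), "index.html", "<h1>Hello from UTCS</h1>\n")
        print(page.name, oct(stat.S_IMODE(page.stat().st_mode)))
```

On a lab machine, the same effect comes from copying the file into ~/public_html and running chmod on it by hand.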

The server supports PHP scripts and CGI scripts for languages such as Perl and Python. In the future, I'll publish another blog post on how to get a Python CGI script running on the UTCS web servers, since it's not possible to set up a Flask/WSGI configuration.

Using UTCS-specific functionality on lab machines

To look someone's UTCS account up, you can use the finger command. For instance, if you do finger wang@cs.utexas.edu, then the following information would appear:

[net8.cs.utexas.edu: phingerd responds at Wed May 15 18:15:31 2019]

Last login:   never logged in

Recent sessions:
wang     the-professor:pts/3      May 12 03:43 - 07:54  (1+04:11)
wang     the-professor:pts/1      May 13 14:00 - 15:19  (01:19)
wang     the-professor:pts/0      May 14 20:32 - 02:45  (06:13)
wang     the-professor:pts/4      May 14 22:28 - 09:24  (10:56)
wang     the-professor:pts/5      May 14 22:29 - 09:24  (10:55)

Name:         Jeffrey Wang                                                    
Org:          University of Texas at Austin, Department of Computer Science   
Office:                                                                       
Office Ph#:                                                                   

Home Ph#:                              Birthday:                             

Login:        wang                     Sponsor:    mcicero                   
Group:        under                    Type:       under                     
Shell:        /bin/zsh                 Expires:    Sep 30, 2019              
Server:       /v/filer5b/v38q001/wang  Quota:      under-default             

Mailbox:      Aliased to <jeffreywang@utexas.edu>

[PLAN]
My email is wang@cs.utexas.edu, not sure what's supposed to go here. -- August 16, 2018

You can also simply finger for names instead of using a specific username. For instance, my user would appear if you did finger wang. (To protect the privacy of others, that output will not be displayed here.) You can finger yourself by typing in finger $(whoami).

To see who else is on the lab machine with you at that point, you can type users or who, with who giving more detailed output than users does.

Each undergraduate student is given a 10 GB user space quota. (It used to be 2 GB, so we're not complaining too much at the moment!) To check how much disk space you are using, type chkquota. You can actually check how much disk space other users are using by doing chkquota csid, but don't be a creep.

To print from the lab machines, you can do so directly from the command line by using lpr -PlwXXX filename.pdf, where XXX is the printer number. The printer numbers will be evident in the GDC. To check on the print queue, use lpq -PlwXXX. In case you want to cancel, you can remove a print job by using lprm -PlwXXX printjobnum. Obviously, you should not use this command if you are not in the GDC.

As long as you follow the rules below, the lab machines are your oyster!

Please note: this system is intended to serve the instructional,
research, and administrative needs of the students, faculty, and
staff of the UT Austin Department of Computer Sciences.  Any other
use of this system, including but not limited to using any method
to circumvent proper authentication or authorization, constitutes
unauthorized access and may subject the user to criminal prosecution
under Texas Computer Crime Statutes and other state or federal laws.

(Updated May 24, 2019 to include the who command.)

Emulating 'Hello World' in ARM

May 15, 2019

Who knew "Hello world!" would be so difficult to emulate?

For my Computer Architecture class, we got to pick our final project. Three classmates and I decided to group up and extend the ARM AArch64 emulator we had built earlier in the semester so that it could support the printf function in C.

Unfortunately, this was much easier said than done.

To understand the difficulty behind emulating printf, let's explore what C has to do with our ARM emulator. We would write a C program and compile it without the C standard library, so the compiler only had to turn the C code that we wrote into AArch64 assembly instructions. However, real programs run with the C standard library, which itself is a lot of assembly instructions.

For starters, we would have to emulate every instruction the C standard library uses during startup, before it can even begin to execute whatever is in the main() function. It turns out there is a mind-boggling amount of work that the GNU C library does for each program in order to prepare for whatever awaits it in main(). That's hundreds of instructions, some of which were SIMD/vector instructions that we did not even know about originally.

The real kicker is that we have to emulate syscalls. Whenever the assembly file says svc #0x0, that means we have to take whatever value is stored in x8 and look up which syscall to perform as specified in the syscall table that the OS (in this case, Linux) provides. The emulator therefore has to trick the program into thinking that the syscall executed correctly and that the expected return value is provided.
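As a rough illustration of that dispatch, here is a minimal Python sketch. It is not code from the actual project: the Machine class, the handle_svc function, and the output buffer are hypothetical names made up for this example. The syscall numbers (64 for write, 93 for exit) come from the AArch64 Linux syscall table, and the convention of arguments in x0 through x2 with the result returned in x0 matches what write_char does further down.

```python
SYS_WRITE = 64  # AArch64 Linux syscall number for write (0x40)
SYS_EXIT = 93   # AArch64 Linux syscall number for exit
ENOSYS = 38     # Linux errno for an unimplemented syscall
MASK64 = 0xFFFFFFFFFFFFFFFF


class Machine:
    """Hypothetical, stripped-down emulator state."""

    def __init__(self, memory: bytes):
        self.x = [0] * 31                # general-purpose registers x0..x30
        self.memory = bytearray(memory)  # flat memory starting at address 0
        self.output = bytearray()        # bytes the program has "written"
        self.exited = False


def handle_svc(m: Machine) -> None:
    """Emulate `svc #0x0`: dispatch on x8 and leave the result in x0."""
    number = m.x[8]
    if number == SYS_WRITE:
        # x0 (the file descriptor) is ignored in this sketch.
        buf, count = m.x[1], m.x[2]
        data = bytes(m.memory[buf:buf + count])
        # A real emulator would forward this to the host's file descriptor.
        m.output += data
        m.x[0] = len(data)               # write returns the byte count
    elif number == SYS_EXIT:
        m.exited = True
    else:
        # Linux reports failure as a negative errno in x0.
        m.x[0] = (-ENOSYS) & MASK64


if __name__ == "__main__":
    message = b"Hello world!\n"
    m = Machine(message)
    # Mimic write_string: one write syscall per character.
    for addr in range(len(message)):
        m.x[8], m.x[0], m.x[1], m.x[2] = SYS_WRITE, 1, addr, 1
        handle_svc(m)
    print(m.output.decode(), end="")
```

The point of the sketch is that the emulated program never notices the difference: it only sees the value left in x0 after the svc instruction.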

Since we weren't making much progress on emulating the GNU C library's startup functions, we eventually switched over to the musl C library. It used far fewer instructions than the GNU C library, and didn't make as many pointless syscalls (such as calling brk for a simple hello world printf program - why does the heap need to be expanded for that?). Unfortunately, we didn't make much progress on emulating the entire musl startup process either.

At the end of the day, we realized just how much work was necessary in order to support the C standard library, whether the GNU implementation or musl. That being said, we sure did learn a lot about syscalls and other tidbits about the ARMv8 architecture. While we couldn't get "Hello world" to print out with the C standard library, we were at least able to emulate a simple hello world assembly program that used syscalls to make this happen. Overall, extending our ARMv8 emulator to support the C standard library was a somewhat disappointing yet very insightful experience.

Here is the disassembly of the simple hello world assembly program that makes syscalls directly without the overhead of the C standard library. Actually, write_char at 400174 is slightly inaccurate; it is missing a ret instruction. In reality, it should include one; I am just too lazy to regenerate the .disas to get it.

writesyscall:     file format elf64-littleaarch64

Disassembly of section .note.gnu.build-id:

00000000004000e8 <.note.gnu.build-id>:
  4000e8:   00000004    .inst   0x00000004 ; undefined
  4000ec:   00000014    .inst   0x00000014 ; undefined
  4000f0:   00000003    .inst   0x00000003 ; undefined
  4000f4:   00554e47    .inst   0x00554e47 ; undefined
  4000f8:   6ec8f7ae    .inst   0x6ec8f7ae ; undefined
  4000fc:   ab44b93d    adds    x29, x9, x4, lsr #46
  400100:   4d434f15    .inst   0x4d434f15 ; undefined
  400104:   410d9a08    .inst   0x410d9a08 ; undefined
  400108:   cce822a0    .inst   0xcce822a0 ; undefined

Disassembly of section .text:

000000000040010c <write_string>:
  40010c:   a9bd7bfd    stp x29, x30, [sp,#-48]!
  400110:   910003fd    mov x29, sp
  400114:   f9000fa0    str x0, [x29,#24]
  400118:   f9400fa0    ldr x0, [x29,#24]
  40011c:   39400000    ldrb    w0, [x0]
  400120:   3900bfa0    strb    w0, [x29,#47]
  400124:   3940bfa0    ldrb    w0, [x29,#47]
  400128:   7100001f    cmp w0, #0x0
  40012c:   540000e0    b.eq    400148 <write_string+0x3c>
  400130:   f9400fa0    ldr x0, [x29,#24]
  400134:   94000010    bl  400174 <write_char>
  400138:   f9400fa0    ldr x0, [x29,#24]
  40013c:   91000400    add x0, x0, #0x1
  400140:   f9000fa0    str x0, [x29,#24]
  400144:   17fffff5    b   400118 <write_string+0xc>
  400148:   d503201f    nop
  40014c:   a8c37bfd    ldp x29, x30, [sp],#48
  400150:   d65f03c0    ret

0000000000400154 <start>:
  400154:   a9bf7bfd    stp x29, x30, [sp,#-16]!
  400158:   910003fd    mov x29, sp
  40015c:   90000000    adrp    x0, 400000 <write_string-0x10c>
  400160:   91062000    add x0, x0, #0x188
  400164:   97ffffea    bl  40010c <write_string>
  400168:   d503201f    nop
  40016c:   a8c17bfd    ldp x29, x30, [sp],#16
  400170:   d65f03c0    ret

0000000000400174 <write_char>:
  400174:   d2800808    mov x8, #0x40                   // #64
  400178:   aa0003e1    mov x1, x0
  40017c:   d2800020    mov x0, #0x1                    // #1
  400180:   d2800022    mov x2, #0x1                    // #1
  400184:   d4000001    svc #0x0

Disassembly of section .rodata:

0000000000400188 <__bss_end__-0x10007>:
  400188:   6c6c6568    .word   0x6c6c6568
  40018c:   Address 0x000000000040018c is out of bounds.

Disassembly of section .comment:

0000000000000000 <.comment>:
   0:   3a434347    ccmn    w26, w3, #0x7, mi
   4:   694c2820    ldpsw   x0, x10, [x1,#96]
   8:   6f72616e    umlsl2  v14.4s, v11.8h, v2.h[3]
   c:   43434720    .inst   0x43434720 ; undefined
  10:   352e3520    cbnz    w0, 5c6b4 <write_string-0x3a3a58>
  14:   3130322d    adds    w13, w17, #0xc0c
  18:   30312e37    adr x23, 625dd <write_string-0x39db2f>
  1c:   2e352029    usubl   v9.8h, v1.8b, v21.8b
  20:   00302e35    .inst   0x00302e35 ; NYI

The binary was created by compiling and linking writesyscall.c and write_char.S.

writesyscall.c:

extern void write_char(const char* c);

void write_string(const char* s) {
    do {
        char c = *s;
        if (c == 0) return;
        write_char(s);
        s++;
    } while(1);
}

void _start() {
    write_string("Hello world!\n");
}

write_char.S:

.global write_char

write_char:
mov x8, #0x40
mov x1, x0
mov x0, #1
mov x2, #1
svc #0x0
ret

Perks of Python

May 14, 2019

I've had the great pleasure of learning quite a few programming languages over the past several years. I started with PHP in 8th grade. (I maintain that it is a necessary evil for my job, but that is a discussion for later.) Then, I learned Java in 9th grade in my AP Computer Science class, like most computer science students do in the United States. When I changed schools in 11th grade, I took two semesters of introductory programming in C++. Along the way, I've picked up JavaScript on my own. These few languages cover a great variety of applications. However, something has always been missing in the mix: Python.

Why do I know all of these languages but never bothered to learn Python?

To start off with, I'm not a big fan of Python's syntax. Astute observers may notice that all of the languages I mentioned in the prior paragraph are certainly in the C family in terms of syntax. Python's syntax is a significant departure from the bread and butter of C syntax to which I was accustomed, and I wasn't comfortable with this, to be honest. It prevented me from appreciating Python and therefore prevented me from exploring it too. Furthermore, everything that Python is used for can be done just as well by other languages. I hurled excuses at each potential application of Python. Web server? Don't use Flask, just use Express.js instead. Statistics? Don't use Python, just use R. General-purpose programming? Not even a question, Java or C++ is the way to go.

However, over the past year, I've come to realize that I need to step away from my comfort zone and start to learn Python. This semester, I was taking a Competitive Programming class, and we had the option to submit our assignments in C++, Java, or Python. Ironically, I went from using Java (which is slow) to trying out Python (which is even slower) instead of moving to C++. I made this change because I got tired of the bloat that Java syntax had. (C++ syntax would not have been better.) Every time I have to write Scanner sc = new Scanner(System.in);, a little bit of me dies inside.

Thus, I became determined to finally learn Python. The syntax may not be my cup of tea, but I could recognize that there were benefits to using Python's syntax over other languages' syntax for the same constructs. For instance, .charAt() and .substring() are two very common String methods in Java that would be much better served if there were some special shortcut syntax for them. Thankfully, Python offers just that: indexing and slicing. varname.charAt(i) becomes varname[i] and varname.substring(0, j) becomes varname[:j]. Quite refreshing!
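Here is a quick sketch of the bracket notation next to the Java calls it replaces. The sample string and the values of i and j are made up for illustration.

```python
varname = "Hello world!"
i, j = 4, 5

char_at_i = varname[i]     # Java: varname.charAt(i)
prefix = varname[:j]       # Java: varname.substring(0, j)
middle = varname[i:j + 3]  # Java: varname.substring(i, j + 3)
suffix = varname[j + 1:]   # Java: varname.substring(j + 1)

print(char_at_i, prefix, middle, suffix, sep="|")
```

As in Java's substring, the end index is exclusive, so varname[:j] has exactly j characters.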

Where does this come in handy? There are often simple actions that we want to achieve in programming, but they can be quite verbose to write out. For instance, if I were given an array of strings and asked to find the shortest string and return its length, I would think to myself "this is a really easy thing to ask for". Unfortunately, it's not really as simple to write out in Java:

String[] strs = new String[]{"string1", "string2", "string3", ...};
int minLen = Integer.MAX_VALUE;
for ( int i = 0; i < strs.length; i++ ) {
    if ( strs[i].length() < minLen ) {
        minLen = strs[i].length();
    }
}
// result stored in minLen

However, the same thing is extremely easy to accomplish in Python using list comprehensions. These are basically embedded loops that apply a simple expression to each element in order to generate or modify a list.

Here's what I did: I used a list comprehension to get the length of each string in the list, and then used the built-in min() function in Python to return the minimum.

strs = ['string1', 'string2', 'string3', ...]

minLen = min([len(x) for x in strs])
# result stored in minLen

I admit, this might not be space efficient, but the elegance of the syntax to convey the same meaning as the Java code I just wrote cannot be denied.
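For what it's worth, the space concern can be sidestepped by dropping the square brackets, which turns the list comprehension into a generator expression that feeds min() one length at a time. The sample strings below are made up for illustration.

```python
strs = ["kiwi", "banana", "fig"]

# Generator expression: no intermediate list of lengths is built.
min_len = min(len(s) for s in strs)

# min() also accepts a key function, which returns the shortest string itself.
shortest = min(strs, key=len)

print(min_len, shortest)
```

Note that min() raises a ValueError on an empty sequence unless a default= argument is supplied.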

Another part of Python I really like is assigning multiple variables at once, or multiple assignment, which is accomplished through tuples. This is also possible in PHP, but let's be honest, why would I use PHP just for multiple assignment benefits?

For instance, this code in Java:

int a = -3;
int b = 25;
int c = 47;

would be this in Python:

a, b, c = -3, 25, 47

This came in handy in Competitive Programming whenever I needed to load in multiple variables on one line:

n, j, k = input().split(' ')

If we wanted to convert each input from a string to an integer, using list comprehensions comes in handy. The key to what I'm about to do lies in the fact that Python uses tuples to represent multiple assignment. In the below code sample, the input is split by spaces into a list. Then, each input is converted into an int and stored into a list. Finally, this list is converted into a tuple. The element at index 0 is stored in n, the element at index 1 is stored in j, and the element at index 2 is stored in k.

n, j, k = tuple([int(x) for x in input().split(' ')])
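Strictly speaking, the tuple() call is optional, since Python can unpack any iterable of the right length. The sketch below shows two equivalent forms; parse_ints is a hypothetical helper, and a fixed string stands in for input() so the example runs without typing anything.

```python
def parse_ints(line: str) -> list:
    """Split a line on whitespace and convert each piece to an int."""
    return [int(x) for x in line.split()]


line = "3 14 15"  # stands in for input()

# Unpacking works directly on a list...
n, j, k = parse_ints(line)

# ...and on any other iterable, such as a map object.
a, b, c = map(int, line.split())

print(n, j, k, a, b, c)
```

Calling split() with no argument also tolerates repeated spaces, which split(' ') does not.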

I started freshman year recognizing Python as something that exists that I dislike, but now I see why it might be at least somewhat useful. Python is slowly winning me over with things like list comprehensions and multiple assignment. I hope to share what I find next and incorporate it into my programming repertoire.

Welcome to my blog

May 14, 2019

As finals season wraps up, I slowly realize that I have nothing to do. All of the personal projects that I wanted to do throughout the semester were put on the back burner, but now that the semester has ended, I can't seem to remember what I had put there. As my friend put it, we went from doing Computer Architecture assignments every week to absolutely nothing at all.

This past weekend, I attended the semesterly banquet that Turing Scholars holds for its students. Towards the end of the banquet, the graduating seniors each give a short speech of advice. One senior made an excellent suggestion: although we are likely going to become software engineers, we should all have great communication skills. One way she mentioned we could sharpen our communication skills is by having a blog.

So, here I am, two days later, bored and with nothing to do, launching my blog on my UTCS website. Thanks, Jo!

What will I post on here? Given that this is hosted on cs.utexas.edu, I don't think it'd be the appropriate venue to hash out political grievances or deep philosophical thoughts that college students typically think of in the middle of the night. Instead, my intention is to post on topics relevant to my time here at the university, general topics on computer science and business administration (my two areas of study), and anything interesting that I learn as a student and lifelong learner.

In the spirit of my friend's blog, I'm also keeping this blog clean and simple. Instead of using WordPress or some other blog management system, I'm running things with PHP (because the UTCS web servers only support PHP) and Markdown. This blog is using an excellent open source Markdown parser for PHP called Parsedown. When I have time, I'll put my quick blog script on GitHub. It's a little bit more work than using WordPress, but it's much easier for me to set up in my opinion, and doesn't have as much overhead. At the same time, I am not a big fan of regenerating websites every time I update them, hence my decision to use PHP over Jekyll.

My only regret in launching this blog is that I didn't start sooner. I could have written a lot for freshman year. However, the old saying goes: "The best time to plant an oak tree was 20 years ago. The second best time is now." With that, welcome to my blog!
