Important notice about this site
I graduated from the University of Texas at Austin in December 2020. This blog is archived and no updates will be made to it.
Welcome to my blog!
Here, I post on topics relevant to my time here at the university, general topics on computer science and business administration, and anything interesting that I learn as a student and lifelong learner.
Wish to reach out to me about my blog posts? I'm open to questions, comments, and suggestions. Feel free to contact me.
May 15, 2019
The UT Computer Science department provides some pretty good facilities for students to use. When students first enter UTCS, they are told to create an account, with the expectation they will figure out how to use the rest of the facilities on their own. This post aims to explain what you can do with your UTCS account and how to use the lab machines and the other facilities.
The lab machines are computers on the 1st and 3rd floors of the Gates–Dell Complex (GDC), which is the computer science building here at UT Austin. They run Ubuntu Linux.
Students log into the lab machines using their UTCS account credentials. All of your user files are stored on the same storage cluster, so it does not matter which machine you use.
The lab machines are listed on the CS website. You can SSH into them using ssh csid@labmachinehost.cs.utexas.edu, where csid is your UTCS account username (discussed in more detail below) and labmachinehost is the lab machine's hostname. For instance, I might do ssh wang@linux.cs.utexas.edu to log into a lab machine.
Each CS student gets to create a UTCS account online beginning about two weeks before classes start for their first semester at UT Austin. It is separate from your UT EID. (In fact, your EID password and your UTCS account password can't be the same, last I checked.) This is what you use for all UTCS student services and what you use to log in to the lab machines.
Unlike your UT EID, you get to pick your UTCS account name. For instance, I picked wang because it's my last name. However, as long as it's appropriate, you can pick anything you want. Someone I know decided to go with dab.
Each UTCS account can be configured to use a shell of your choice. The default is bash, but zsh and tcsh are also available.
The account comes with two really nice perks: your own @cs.utexas.edu email and a webpage on cs.utexas.edu. They are both determined by your username. For instance, right now, you are reading this webpage on https://www.cs.utexas.edu/~wang, and you can email me via wang@cs.utexas.edu. You can either forward the email to another address or use the mail servers that the department maintains.
Note: Logging into the UTCS machines from an external network (i.e. neither on utexas Wi-Fi nor on the UT VPN) requires you to use public-key authentication. The department provides a tutorial for those who need assistance setting this up.
Once you have an account, you can log in to a lab machine, either via SSH or physically on a lab machine.
In your user directory, you'll see there is a public_html directory. Any files placed in there with world-readable permissions will be served by the UTCS web servers under https://www.cs.utexas.edu/~csid or https://www.cs.utexas.edu/users/csid, where csid is your UTCS account username.
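As a rough illustration of the "world-readable" requirement, here is a minimal Python sketch that adds read permission for everyone to a file. The helper names and the temporary directory standing in for public_html are assumptions made for the example; on a lab machine you would point it at a real file under your own public_html.

```python
import stat
import tempfile
from pathlib import Path


def make_world_readable(path: Path) -> None:
    """Add read permission for user, group, and others, keeping other bits."""
    mode = stat.S_IMODE(path.stat().st_mode)
    path.chmod(mode | stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def is_world_readable(path: Path) -> bool:
    """Return True if the 'others' read bit is set."""
    return bool(path.stat().st_mode & stat.S_IROTH)


# A throwaway directory stands in for ~/public_html (an assumption).
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_text("<h1>Hello from my UTCS page</h1>\n")
    page.chmod(0o600)                 # start private: owner read/write only
    before = is_world_readable(page)  # others cannot read it yet
    make_world_readable(page)
    after = is_world_readable(page)   # now the others-read bit is set
```

The same effect is available from the shell with chmod; the sketch only shows which permission bit the phrase refers to.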
The server supports PHP scripts and CGI scripts for languages such as Perl and Python. In the future, I'll publish another blog post on how to get a Python CGI script running on the UTCS web servers, since it's not possible to set up a Flask/WSGI configuration.
To look up someone's UTCS account, you can use the finger command. For instance, if you run finger wang@cs.utexas.edu, the following information appears:
[net8.cs.utexas.edu: phingerd responds at Wed May 15 18:15:31 2019]
Last login: never logged in
Recent sessions:
wang the-professor:pts/3 May 12 03:43 - 07:54 (1+04:11)
wang the-professor:pts/1 May 13 14:00 - 15:19 (01:19)
wang the-professor:pts/0 May 14 20:32 - 02:45 (06:13)
wang the-professor:pts/4 May 14 22:28 - 09:24 (10:56)
wang the-professor:pts/5 May 14 22:29 - 09:24 (10:55)
Name: Jeffrey Wang
Org: University of Texas at Austin, Department of Computer Science
Office:
Office Ph#:
Home Ph#: Birthday:
Login: wang Sponsor: mcicero
Group: under Type: under
Shell: /bin/zsh Expires: Sep 30, 2019
Server: /v/filer5b/v38q001/wang Quota: under-default
Mailbox: Aliased to <jeffreywang@utexas.edu>
[PLAN]
My email is wang@cs.utexas.edu, not sure what's supposed to go here. -- August 16, 2018
You can also simply finger a name instead of a specific username. For instance, my user would appear if you did finger wang. (To protect the privacy of others, that output will not be displayed here.) You can finger yourself by typing in finger $(whoami).
To see who else is on the lab machine with you at that point, you can type users or who, with who giving more detailed output than users does.
Each undergraduate student is given a 10 GB user space quota. (It used to be 2 GB, so we're not complaining too much at the moment!) To check how much disk space you are using, type chkquota. You can actually check how much disk space other users are using by doing chkquota csid, but don't be a creep.
You can print from the lab machines directly from the command line by using lpr -PlwXXX filename.pdf, where XXX is the printer number. The printer numbers will be evident in the GDC. To check on the print queue, use lpq -PlwXXX. In case you want to cancel, you can remove a print job by using lprm -PlwXXX printjobnum. Obviously, you should not print this way if you are not in the GDC.
As long as you follow the rules below, the lab machines are your oyster!
Please note: this system is intended to serve the instructional,
research, and administrative needs of the students, faculty, and
staff of the UT Austin Department of Computer Sciences. Any other
use of this system, including but not limited to using any method
to circumvent proper authentication or authorization, constitutes
unauthorized access and may subject the user to criminal prosecution
under Texas Computer Crime Statutes and other state or federal laws.
(Updated May 24, 2019 to include the who command.)
May 15, 2019
Who knew "Hello world!" would be so difficult to emulate?
For my Computer Architecture class, we got to pick our final project. Three classmates and I decided to team up and extend the ARM AArch64 emulator we had built earlier in the semester so that it could support the printf function in C.
Unfortunately, this was much easier said than done.
To understand the difficulty behind emulating printf, let's explore what C has to do with our ARM emulator. We would write a C program and compile it without the C standard library, so that only the C code that we wrote was converted into AArch64 assembly instructions. However, real programs run with the C standard library, which itself is a lot of assembly instructions.
For starters, we would have to emulate all of the instructions the C standard library uses during startup, before the program could even begin to execute whatever is in the main() function. It turns out there is a mind-boggling amount of work that the GNU C library does for each program to prepare for whatever awaits it in main(). That's hundreds of instructions, some of which were SIMD/vector instructions that we did not even know about originally.
The real kicker is that we have to emulate syscalls. Whenever the assembly file says svc #0x0, we have to take whatever value is stored in x8 and look up which syscall to perform in the syscall table that the OS (in this case, Linux) provides. The emulator therefore has to trick the program into thinking that the syscall executed correctly, and it has to provide the expected return value.
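To make the dispatch step concrete, here is a minimal Python sketch of how an emulator might handle svc #0x0. This is not the code from our project: the Emulator class, its fields, and the use of an in-memory buffer for stdout are all hypothetical, and registers are kept as plain Python integers rather than 64-bit values. The syscall numbers (64 for write, 93 for exit) are the ones AArch64 Linux uses.

```python
import io

# AArch64 Linux syscall numbers (the value the program places in x8).
SYS_WRITE = 64
SYS_EXIT = 93


class Emulator:
    """Hypothetical, stripped-down emulator state: registers, memory, stdout."""

    def __init__(self, memory: bytes):
        self.regs = [0] * 31          # x0..x30
        self.memory = bytearray(memory)
        self.stdout = io.BytesIO()    # stands in for file descriptor 1
        self.running = True

    def sys_write(self) -> int:
        # write(fd, buf, count): arguments arrive in x0, x1, x2.
        fd, addr, count = self.regs[0], self.regs[1], self.regs[2]
        if fd != 1:
            return -9                 # -EBADF: only stdout is modeled here
        data = bytes(self.memory[addr:addr + count])
        self.stdout.write(data)
        return len(data)              # number of bytes written

    def sys_exit(self) -> int:
        self.running = False
        return 0

    def svc(self) -> None:
        """Handle `svc #0x0`: dispatch on x8, put the result in x0."""
        handlers = {SYS_WRITE: self.sys_write, SYS_EXIT: self.sys_exit}
        handler = handlers.get(self.regs[8])
        # -38 is -ENOSYS, what Linux returns for an unknown syscall number.
        self.regs[0] = handler() if handler else -38


# Mirror write_char: x8 = 64, x0 = 1 (stdout), x1 = address, x2 = 1 byte.
emu = Emulator(b"Hello world!\n")
for addr in range(13):
    emu.regs[8], emu.regs[0], emu.regs[1], emu.regs[2] = SYS_WRITE, 1, addr, 1
    emu.svc()
output = emu.stdout.getvalue()
```

A real C library startup path calls many more syscalls than these two, which is exactly where the work piles up: each one needs a handler that returns something the library will accept.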
Eventually, since we weren't making much progress on emulating the GNU C library's startup functions, we switched over to the musl C library. It used far fewer instructions than the GNU C library and didn't make as many pointless syscalls (such as calling brk for a simple hello world printf program - why does the heap need to be expanded for that?). Unfortunately, we still didn't make too much progress on emulating the entire musl startup process either.
At the end of the day, we realized just how much work was necessary in order to implement the C standard library, whether the GNU implementation or musl. That being said, we sure did learn a lot about syscalls and other tidbits about the ARMv8 architecture. While we couldn't get "Hello world" to print out with the C standard library, we were at least able to emulate a simple hello world assembly program that used syscalls to make this happen. Overall, extending our ARMv8 Emulator to support the C standard library was a somewhat disappointing yet very insightful experience.
Here is the disassembly of the simple hello world assembly program that makes syscalls directly without the overhead of the C standard library. Actually, write_char at 400174 is slightly inaccurate; it is missing a ret instruction. In reality, it should include one; I am just too lazy to regenerate the .disas file to get it.
writesyscall: file format elf64-littleaarch64
Disassembly of section .note.gnu.build-id:
00000000004000e8 <.note.gnu.build-id>:
4000e8: 00000004 .inst 0x00000004 ; undefined
4000ec: 00000014 .inst 0x00000014 ; undefined
4000f0: 00000003 .inst 0x00000003 ; undefined
4000f4: 00554e47 .inst 0x00554e47 ; undefined
4000f8: 6ec8f7ae .inst 0x6ec8f7ae ; undefined
4000fc: ab44b93d adds x29, x9, x4, lsr #46
400100: 4d434f15 .inst 0x4d434f15 ; undefined
400104: 410d9a08 .inst 0x410d9a08 ; undefined
400108: cce822a0 .inst 0xcce822a0 ; undefined
Disassembly of section .text:
000000000040010c <write_string>:
40010c: a9bd7bfd stp x29, x30, [sp,#-48]!
400110: 910003fd mov x29, sp
400114: f9000fa0 str x0, [x29,#24]
400118: f9400fa0 ldr x0, [x29,#24]
40011c: 39400000 ldrb w0, [x0]
400120: 3900bfa0 strb w0, [x29,#47]
400124: 3940bfa0 ldrb w0, [x29,#47]
400128: 7100001f cmp w0, #0x0
40012c: 540000e0 b.eq 400148 <write_string+0x3c>
400130: f9400fa0 ldr x0, [x29,#24]
400134: 94000010 bl 400174 <write_char>
400138: f9400fa0 ldr x0, [x29,#24]
40013c: 91000400 add x0, x0, #0x1
400140: f9000fa0 str x0, [x29,#24]
400144: 17fffff5 b 400118 <write_string+0xc>
400148: d503201f nop
40014c: a8c37bfd ldp x29, x30, [sp],#48
400150: d65f03c0 ret
0000000000400154 <start>:
400154: a9bf7bfd stp x29, x30, [sp,#-16]!
400158: 910003fd mov x29, sp
40015c: 90000000 adrp x0, 400000 <write_string-0x10c>
400160: 91062000 add x0, x0, #0x188
400164: 97ffffea bl 40010c <write_string>
400168: d503201f nop
40016c: a8c17bfd ldp x29, x30, [sp],#16
400170: d65f03c0 ret
0000000000400174 <write_char>:
400174: d2800808 mov x8, #0x40 // #64
400178: aa0003e1 mov x1, x0
40017c: d2800020 mov x0, #0x1 // #1
400180: d2800022 mov x2, #0x1 // #1
400184: d4000001 svc #0x0
Disassembly of section .rodata:
0000000000400188 <__bss_end__-0x10007>:
400188: 6c6c6568 .word 0x6c6c6568
40018c: Address 0x000000000040018c is out of bounds.
Disassembly of section .comment:
0000000000000000 <.comment>:
0: 3a434347 ccmn w26, w3, #0x7, mi
4: 694c2820 ldpsw x0, x10, [x1,#96]
8: 6f72616e umlsl2 v14.4s, v11.8h, v2.h[3]
c: 43434720 .inst 0x43434720 ; undefined
10: 352e3520 cbnz w0, 5c6b4 <write_string-0x3a3a58>
14: 3130322d adds w13, w17, #0xc0c
18: 30312e37 adr x23, 625dd <write_string-0x39db2f>
1c: 2e352029 usubl v9.8h, v1.8b, v21.8b
20: 00302e35 .inst 0x00302e35 ; NYI
The binary was created by compiling and linking writesyscall.c and write_char.S.
writesyscall.c:
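/* Implemented in write_char.S: writes the single byte at c to stdout. */
extern void write_char(const char* c);

/* Write a NUL-terminated string one character at a time. */
void write_string(const char* s) {
    do {
        char c = *s;
        if (c == 0) return;   /* stop at the terminating NUL */
        write_char(s);
        s++;
    } while(1);
}

/* Entry point; there is no C library to call main() for us. */
void _start() {
    write_string("Hello world!\n");
}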
write_char.S:
.global write_char
write_char:
    mov x8, #0x40    // syscall number 64: write
    mov x1, x0       // buf: pointer to the character
    mov x0, #1       // fd 1: stdout
    mov x2, #1       // count: one byte
    svc #0x0
    ret
May 14, 2019
I've had the great pleasure of learning quite a few programming languages over the past several years. I started with PHP in 8th grade. (I maintain that it is a necessary evil for my job, but that is a discussion for later.) Then, I learned Java in 9th grade in my AP Computer Science class, like most computer science students do in the United States. When I changed schools in 11th grade, I took two semesters of introductory programming in C++. Along the way, I've picked up JavaScript on my own. These few languages cover a great variety of applications. However, something has always been missing in the mix: Python.
Why do I know all of these languages but never bothered to learn Python?
To start off with, I'm not a big fan of Python's syntax. Astute observers may notice that all of the languages I mentioned in the prior paragraph are certainly in the C family in terms of syntax. Python's syntax is a significant departure from the bread and butter of C syntax to which I was accustomed, and I wasn't comfortable with this, to be honest. It prevented me from appreciating Python and therefore prevented me from exploring it too. Furthermore, everything that Python is used for can be done just as well by other languages. I hurled excuses at each potential application of Python. Web server? Don't use Flask, just use Express.js instead. Statistics? Don't use Python, just use R. General-purpose programming? Not even a question, Java or C++ is the way to go.
However, over the past year, I've come to realize that I need to step away from my comfort zone and start to learn Python. This semester, I was taking a Competitive Programming class, and we had the option to submit our assignments in either C++, Java, or Python. Ironically, I went from using Java (which is slow) to trying out Python (which is even slower) instead of moving to C++. I made this change because I got tired of the bloat in Java's syntax. (C++ syntax would not have been better.) Every time I have to write Scanner sc = new Scanner(System.in);, a little bit of me dies inside.
Thus, I became determined to finally learn Python. The syntax may not be my cup of tea, but I could recognize that there were benefits to using Python's syntax over other languages' syntax for the same constructs. For instance, .charAt() and .substring() are two very common String methods in Java that would be much better served if there were some special shortcut syntax for them. Thankfully, Python offers just that: indexing and slicing. varname.charAt(i) becomes varname[i] and varname.substring(0, j) becomes varname[:j]. Quite refreshing!
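For reference, here is a short sketch of a few more Java-to-Python correspondences along the same lines. The string and the index values are arbitrary choices for the example.

```python
varname = "Hello world!"
i, j = 4, 5

char_at_i = varname[i]     # Java: varname.charAt(i)
prefix = varname[:j]       # Java: varname.substring(0, j)
middle = varname[i:j]      # Java: varname.substring(i, j)
suffix = varname[j:]       # Java: varname.substring(j)
last_char = varname[-1]    # Java: varname.charAt(varname.length() - 1)
```

One difference worth knowing: an out-of-range slice in Python just comes back shorter (or empty), while an out-of-range index still raises IndexError.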
Where does this come in handy? There are often simple actions that we want to achieve in programming, but they can be quite verbose to write out. For instance, if I were given an array of strings and asked to find the shortest string and return its length, I would think to myself "this is a really easy thing to ask for". Unfortunately, it's not really as simple to write out in Java:
String[] strs = new String[]{"string1", "string2", "string3", ...};
int minLen = Integer.MAX_VALUE;
for ( int i = 0; i < strs.length; i++ ) {
    if ( strs[i].length() < minLen ) {
        minLen = strs[i].length();
    }
}
// result stored in minLen
However, the same thing is extremely easy to accomplish in Python using list comprehensions. These are basically embedded loops that apply a simple expression to each element in order to generate a new list.
Here's what I did: I used a list comprehension to get the length of each string in the list, and then used the built-in min() function in Python to return the minimum.
strs = ['string1', 'string2', 'string3', ...]
minLen = min([len(x) for x in strs])
# result stored in minLen
I admit, this might not be space efficient, but the elegance of the syntax to convey the same meaning as the Java code I just wrote cannot be denied.
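On the space-efficiency point, one option is to drop the square brackets so that min() receives a generator expression instead of a list. The sketch below uses made-up sample strings; the default= and key= arguments are standard parameters of Python 3's built-in min().

```python
strs = ["string1", "a much longer string", "str"]

# No square brackets: min() consumes the lengths one at a time,
# so no intermediate list of lengths is built.
min_len = min(len(x) for x in strs)

# min() raises ValueError on an empty input; default= supplies a fallback.
empty_min = min((len(x) for x in []), default=0)

# key= returns the shortest string itself rather than its length.
shortest = min(strs, key=len)
```

For a handful of strings the difference is negligible, so this is more about habit than performance.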
Another part of Python I really like is assigning multiple variables at once, or multiple assignment, which is accomplished through tuples. This is also possible in PHP, but let's be honest, why would I use PHP just for multiple assignment benefits?
For instance, this code in Java:
int a = -3;
int b = 25;
int c = 47;
would be this in Python:
a, b, c = -3, 25, 47
This came in handy in Competitive Programming whenever I needed to load in multiple variables on one line:
n, j, k = input().split(' ')
If we wanted to convert each input from a string to an integer, using list comprehensions comes in handy. The key to what I'm about to do lies in the fact that Python uses tuples to represent multiple assignment. In the below code sample, the input is split by spaces into a list. Then, each input is converted into an int and stored into a list. Finally, this list is converted into a tuple. The element at index 0 is stored in n, the element at index 1 is stored in j, and the element at index 2 is stored in k.
n, j, k = tuple([int(x) for x in input().split(' ')])
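A couple of variations on the same idea, as a sketch. Unpacking works with any iterable of the right length, so the explicit tuple() call is optional, and split() with no argument also copes with repeated spaces and the trailing newline. The parse_ints helper is a name made up for this example, and literal strings stand in for input() so the snippet runs on its own.

```python
def parse_ints(line: str) -> list:
    """Split a line on any whitespace and convert each piece to an int."""
    return [int(x) for x in line.split()]


# Unpacking a list directly; no tuple() needed.
n, j, k = parse_ints("3 25 47")

# map() is another common spelling of the same conversion.
n2, j2, k2 = map(int, "3  25 47\n".split())

# Starred unpacking: first value is a count, the rest are the data.
count, *values = parse_ints("4 10 20 30 40")
```

If the number of values on the line does not match the number of names on the left, Python raises ValueError, which is usually what you want when the input format is fixed.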
I started freshman year recognizing Python as something that exists that I dislike, but now I see why it might be at least somewhat useful. Python is slowly winning me over with things like list comprehensions and multiple assignment. I hope to share what I find next and incorporate it into my programming repertoire.
May 14, 2019
As finals season wraps up, I slowly realize that I have nothing to do. All of the personal projects that I wanted to do throughout the semester were put on the backburner, but now that the semester has ended, I can't seem to remember what I had put on the backburner. As my friend put it, we went from doing Computer Architecture assignments every week to absolutely nothing at all.
This past weekend, I attended the semesterly banquet that Turing Scholars holds for its students. Towards the end of the banquet, the graduating seniors each give a short speech of advice. One senior made an excellent suggestion: although we are likely going to become software engineers, we should all have great communication skills. One way she mentioned we could sharpen our communication skills is by having a blog.
So, here I am, two days later, bored and with nothing to do, launching my blog on my UTCS website. Thanks, Jo!
What will I post on here? Given that this is hosted on cs.utexas.edu, I don't think it'd be the appropriate venue to hash out political grievances or deep philosophical thoughts that college students typically think of in the middle of the night. Instead, my intention is to post on topics relevant to my time here at the university, general topics on computer science and business administration (my two areas of study), and anything interesting that I learn as a student and lifelong learner.
In the spirit of my friend's blog, I'm also keeping this blog clean and simple. Instead of using WordPress or some other blog management system, I'm running things with PHP (because the UTCS web servers only support PHP) and Markdown. This blog uses an excellent open source Markdown parser for PHP called Parsedown. When I have time, I'll put my quick blog script on GitHub. It's a little bit more work than using WordPress, but in my opinion it's much easier to set up and doesn't have as much overhead. At the same time, I am not a big fan of regenerating websites every time I update them, hence my decision to use PHP over Jekyll.
My only regret in launching this blog is that I didn't start sooner. I could have written a lot for freshman year. However, the old saying goes: "The best time to plant an oak tree was 20 years ago. The second best time is now." With that, welcome to my blog!