We've crunched the numbers and generated the graphs. It's time to take a look at Spring 2016 course registration. This period consists of the dates Friday October 30 through Friday November 13, when all Graduate and Undergraduate students choose their classes for the upcoming semester. Technically registration extends through the Schedule Adjustment Period (December 14 - December 18) and the final Add / Drop period (January 4 - January 15), but that data won't be covered in this analysis.

We'll be looking at the registration patterns and behavior of the student body at large to get a different perspective on students' academic interests and personal course preferences. Data for this analysis is provided by examining the changes in course details on CNU's course listings over time. This data is stored by the Open Course system as it runs periodic updates, for use in future analysis. Recently we've added Course Versions to the API, which allows anyone to examine these changes over time.

Let's get into it!

Data

We'll first look at the big picture - the changes in overall availability of courses over time. The following graph shows the number of open seats in all courses across the registration period.

This graph follows the tiered registration pattern laid out by the registrar. Seniors and graduate students are permitted to register at 7 AM and 7:30 AM of October 30. Juniors register on November 3, Sophomores on November 5, and Freshmen on November 11. However, there are some dips in the graph that are unaccounted for by this schedule. For instance, it looks as though there is an additional registration time on November 2, although not as many seats are taken during this time (around 1,878 seats registered for, compared with 3,916 on the 3rd). Assuming the average student has 4-5 classes on their schedule, we can say that this registration involves around 400-500 students. Similarly small and undocumented drops occur on November 4 and 10, right before the official registration times for Sophomores and Freshmen respectively.
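The back-of-the-envelope estimate above can be reproduced with a couple of divisions. This is only a sketch: the function name is made up for illustration, and the 4-5 course load is the same assumption stated in the text. The bounds work out to roughly 375 to 470 students, in the same ballpark as the rounded figure quoted above.

```python
def estimate_students(seats_taken, min_classes=4, max_classes=5):
    """Return a (low, high) estimate of how many students registered.

    Assumes every student takes between min_classes and max_classes courses.
    """
    low = seats_taken // max_classes   # everyone takes the heaviest load
    high = seats_taken // min_classes  # everyone takes the lightest load
    return low, high


# Seats taken during the undocumented November 2 drop
low, high = estimate_students(1878)
print(f"Roughly {low} to {high} students")
```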

It's worth noting that, once a student has been opened to registration, their slot does not close until November 13th. It is possible that these drops are a result of students from previous registration times wanting to get a spot in a class before the next wave of registration takes place. However, this is unlikely due to the time and scale of the drop in seats. If the scenario just mentioned were occurring, we would expect to see a much more gradual decrease taking place between registration periods. Instead, we can clearly see a much more discrete drop occurring exactly between 7 and 9 AM, making it clear that this is a planned and regulated registration time provided by the registrar.
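One way to tell a discrete drop from a gradual decline is to compare consecutive snapshots of the total open-seat count and flag any interval where the count falls by more than some threshold. The sketch below assumes snapshots are stored as (timestamp, open seats) pairs; the function name, the threshold, and the sample numbers are all made up for illustration and are not the real Open Course data.

```python
from datetime import datetime


def find_sharp_drops(snapshots, min_drop):
    """Return (start, end, drop) for each pair of consecutive snapshots
    where the open-seat count fell by at least min_drop."""
    drops = []
    for (t0, seats0), (t1, seats1) in zip(snapshots, snapshots[1:]):
        drop = seats0 - seats1
        if drop >= min_drop:
            drops.append((t0, t1, drop))
    return drops


# Hypothetical snapshots taken two hours apart on November 2
snapshots = [
    (datetime(2015, 11, 2, 5, 0), 20000),
    (datetime(2015, 11, 2, 7, 0), 19950),
    (datetime(2015, 11, 2, 9, 0), 18100),
    (datetime(2015, 11, 2, 11, 0), 18050),
]

for start, end, drop in find_sharp_drops(snapshots, min_drop=1000):
    print(f"{drop} seats taken between {start:%H:%M} and {end:%H:%M}")
```

With a gradual decline, no single interval would cross the threshold; with a scheduled registration time, nearly all of the day's change lands in one interval.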

This also makes it unlikely that these seats are being held by upperclassmen in preparation for younger students' registration times, as the graph would be much more gradual as courses were dropped and then added if this were a widespread issue.

Since Open Course does not directly interface with the registration system, it's impossible to tell who is registering for these classes, and thus to work out which students qualify for these unpublished registration periods. As it turns out, however, these registration periods are the advance opportunities given to students in the Honors or President's Leadership programs. This matches the estimated number of students registering during these periods, which becomes smaller each year as students drop out of the programs, but still encompasses about half of each class. Although this is curiously not listed on the Registrar's website, students in these programs can expect to be allowed to register a full day before their listed academic registration time. Both of these programs list "Priority schedule registration" as one of their benefits, and the benefit is clearly seen here in the data.

With that solved, let's move on to the semester ahead!

The spring semester is decidedly less musical than the fall. We've lost courses in VIOL, OBOE, HARP, PIAN, CLAR, FLUT, PERC, VOIC, TRMB, HORN, TRPT, TUBA, and IMPR. However, we've gained EVST (environmental studies), GREK (Greek language), and EENG (electrical engineering, a new program at CNU).

There are also substantial changes in the number of courses offered this semester. While there are 15% fewer ENGL courses, 18% fewer MATH courses, and 28% fewer BUSN courses, there are 450% more TCHG (teaching) opportunities available (27 more, up only 3 from last semester) with a wealth of internships provided for 4 or even 8 credit hours. There are also small increases in availability of MLAN (modern language), HIST, COMM, and MGMT courses.



The Open Course database was searched over 11,000 times by users this semester alone. The combined results of these searches amount to over 620,000 courses found.

The award for most searches of the database goes to... non-logged-in users! This is not surprising, as most web traffic comes from unidentified sources (even if a user has an account but only wants to search, it's easier to remain logged out). Searches from anonymous users totaled 4,352, close to 40% of the overall total. For comparison, the next most active user made 320 searches. Still, this means over 60% of the searches came from users who had an account and were logged in.

This unusually high number (in the grand scheme of things, web-crawlers alone make up over 60% of Internet traffic) can be attributed to the fact that on Open Course it is incredibly easy to sign up for an account. In fact, you don't have to sign up for anything - the site uses Google's OAuth system to allow users to log in with their existing CNU accounts. This is important to keep in mind, as it shows that ease of use is an important factor in users' experience on a site. It's also fair to say that most bots don't care about which English class they'll be taking in the spring and won't be sticking around long, so this might not be a fair comparison.

Log in to see how you measure up in these numbers.

What did people most often search by when they wanted to find a course? The graph below shows which methods students used to look up courses they might want to take - whether it's by searching for a specific CRN, for a professor they like, or for a course subject or number they have to take.

Clearly, searches by course subject are the most common (a subject being a four-letter department identifier or a course number). These sorts of searches make up 64% of the totals. This backs up the conclusion of the last Add-Drop breakdown that, in general, people see more importance in the substance of the course than who teaches it. Let's have a look at the most popular subjects people searched for:

While these were the most popular subject searches made, there was also a substantial number of searches found that showed a misunderstanding of how course searching works.

By narrowing our analysis to focus only on searches which returned no results (and are thus more often invalid searches), we can see some issues with this system. Here are some examples of commonly-made searches which are invalid.

Obviously, these are very small mistakes that could easily be made by anyone, and a search that returns no results with no explanation could confuse people. This is why data is important. Looking at the search form, it's very easy to understand how these mistakes would be made - the help text given for the "Course" field, "(ACCT, PHYS 151, etc.)", is meant to instruct users on the format, but fails to tell people when they've made a mistake. Even the field name, "Course", can be misleading. It's not easy to understand why you cannot search for a choir course using that box, and it's surprisingly hard to find that course at all unless you know it's listed under "MUSC".

This indicates that something needs to be done to make searching easier on users. Making things easier through new presentations is, after all, the reason this site was created. Even though the vast majority of searches are successful, some who are in a hurry or have very specific searches to make can be put off by the lack of feedback on what went wrong.

As a first step, I've made changes to the search engine to automatically correct the 3rd example, where no space is left between the course subject and number, without bothering the user. The other examples are a bit more complex. I've been discontent with the current search system for a while now, and this has given me a reason to start over and make something better, fixing those issues (or at least letting people know what's wrong).
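A correction like that can be fairly small. The sketch below is not the actual Open Course code, just one way it might look: it assumes a subject is exactly four letters and a course number is three digits with an optional trailing letter, and the function name and the upper-casing step are assumptions of the example.

```python
import re

# Four letters immediately followed by a course number, e.g. "PHYS151".
# The "three digits plus optional letter" shape is an assumption.
JOINED_QUERY = re.compile(r"^([A-Za-z]{4})(\d{3}[A-Za-z]?)$")


def normalize_course_query(query):
    """Insert the missing space in queries like "PHYS151".

    Anything that does not match the joined pattern is returned
    unchanged (apart from surrounding whitespace being stripped).
    """
    query = query.strip()
    match = JOINED_QUERY.match(query)
    if match:
        subject, number = match.groups()
        return f"{subject.upper()} {number.upper()}"
    return query


print(normalize_course_query("PHYS151"))
print(normalize_course_query("ACCT"))
```

Queries that already follow the "(ACCT, PHYS 151, etc.)" format pass through untouched, so the correction never gets in the way of a valid search.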

After considering what can go wrong when searching, let's move on to look at successful searches and what people most look for.

When it comes to picking a day, students seem to look for classes on Tuesdays and Thursdays far more than those on others. While 233 searches were made for TR courses, only 103 were made for those on MW/F. This could mean most people prefer their classes on these days, or that they already have their MWF courses figured out and are using the search function to find classes on the other days. Which do you prefer?

As for times of day, it turns out that, in order, 3 PM, 9 AM, 8 AM, 11 AM and 6 PM are the most searched class times. However, when times are grouped together in blocks, searches are most often made for classes between 11:00 AM and 1:30 PM. This represents, for many college students, a sweet spot where classes are not too early and not too late. This is consistent with the findings of the fall semester's add/drop analysis, which found that classes which meet around 10 AM are generally most popular.

Surprisingly, instructors are not searched for by name very often (queries with instructors named make up less than 1.5% of all searches). Still, some professors are sought after by searchers. Among these popular folks are professors Throupe (LDSP), Camobreco (GOVT), Lopez (MUSC), Sheffield (LDSP), Falk (HIST), Manning (COMM), Redick (RSTD), and... Staff. It seems like one adventurous user really wanted a surprise in their semester.

The top course attributes students looked for were AINW (Investigating the Natural World), AICE (Creative Expressions), and AIWT (Western Traditions). In fact, the 5 Area of Inquiry attributes made up 87% of the attributes that were searched for, and the Liberal Learning Foundations, Writing Intensive, and Honors attributes were far less sought after.

Conclusion

One of the major limitations of this data is the time between data sampling. Thus far, Open Course has only updated its data every 2 or 3 hours. This is because CNU places limits on its Schedule of Classes web service that lock out the system if more frequent queries are attempted. To be able to see these sorts of breakdowns without any inside access is great, but it's not nearly as detailed and informative as it could be if the database were more complete and up-to-date. There's definitely still more progress to be made there.

Still, major improvements on this site alone can come from analysis. In this update alone, multiple issues with the search function were found that will lead to a better user experience for everyone in the future. I'd be happy and interested to hear anyone's feedback or ideas about what can be done to make things more clear.

Since the query logging system was new as of writing this post, this analysis focused more on that data than on the courses themselves (those were covered a lot in the last one, and the course makeup doesn't change substantially semester to semester). It's good to see how data collected right here on the site (as opposed to from CNU's website as a third-party) can be so detailed and offers more fine selection, such as the ability to look at only searches that failed to determine their cause. I'll be looking forward to a point where this is available for all course data.

I do hope that everyone's semester goes well, and wish everyone a wonderful 2016. Until next time!

We're looking back on Add/Drop period -- the first week of classes, where students can update their scheduled courses at will. This week consists of the dates Monday August 24 2015 through Friday August 28 2015, after which adding and dropping classes is no longer allowed. This period encompasses a large shift in registration as students make final decisions on which courses they want to take, as well as a number of new courses being added to the listing.

The goal of analysing these numbers is to gain insights into the add/drop process and to better understand the student body, but the primary motivation is always simple curiosity. We'll break this data down in a variety of ways to see what we can learn in this interest. Data for this analysis is provided by the recently added course versioning functionality, which is the technology used to send relevant updates via email when course details change. Course versioning tracks courses over time and retains past data for future crunching.

With all that said, let's get to it!

Data

The Fall Semester 2015 listings, as of the posting of this article, contain 1,355 courses. Of these, lectures are of course the overwhelming majority, encompassing 74% of them. The next largest type is labs, coming in at 16%. The type with the fewest offered courses is practicum, of which only 16 are offered.

An attempt to break these courses down by subject quickly reveals the diversity of subjects offered at CNU. The top three contenders for most classes are, in order, Biology with 107 courses offered, English with 101 courses offered, and Math with 76 courses offered. A large number of musical instruments are offered as their own subjects, similar to how foreign languages are separated, and most contain only a single course, including Violin, Cello, Clarinet, Oboe, and more. Other subjects with only one course include a Musical Improvisation course and NSCI - The Study of Science (not to be mistaken for NCIS, unfortunately). You can click on the subjects below to disable them from the graph.

Of these classes, the majority (57%) are currently closed, which is to be expected following add/drop.

However, simply comparing the numbers of closed and open courses does not paint the entire picture. Many courses frequently flip between statuses as students modify their schedules over time. Data from the Open Course following system, which notifies users when a course status changes, show that more users were notified about a closed course opening (79 notifications) than about an open course closing (65 notifications) during add/drop.

In fact, 519 courses changed their status at least once during this period. 470 courses were closed from the beginning and never lost a registration, and 366 courses were open from the beginning and have still not reached capacity.
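Sorting courses into these three groups only requires each course's status history from the stored versions. The sketch below assumes each history is a chronological list of "open" / "closed" strings keyed by CRN; the function name and the sample histories are invented for illustration. As a sanity check, the three groups reported above add up to the 1,355 courses in the listings.

```python
from collections import Counter


def classify_course(statuses):
    """Label a course from its chronological list of statuses."""
    if len(set(statuses)) > 1:
        return "changed"
    return "always " + statuses[0]


# Hypothetical status histories keyed by CRN
histories = {
    "1001": ["closed", "open", "closed"],
    "1002": ["closed", "closed"],
    "1003": ["open", "open", "open"],
    "1004": ["open", "closed"],
}

counts = Counter(classify_course(h) for h in histories.values())
print(counts)

# The groups from the post: changed + always closed + always open
print(519 + 470 + 366)  # 1355
```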

Looking at the change in the total number of open seats in all classes, an upward trend is evident which grows as the weekend nears, indicating more available spots being opened up. Whether this is from students dropping classes (the majority of the changes) or from a number of sections added, courses become more available over time.

Note that this graph also includes the open seats from new classes added to the listings. The jump in available seats on the 24th is the result of a large batch of 50 courses that were opened that day. Following the graph from that point at the beginning of the week, students can be seen largely registering for seats by the temporary downward trend before it heads back up as drops occur.

This number of open seats (over 1,200) is at first surprisingly large - enough for nearly an entire extra class of students to fill. However, considering the number of courses from which they are summed, the overall average number of seats is only 2.9 per course.

Zeroing in on that set of 50 sections that was added on the 24th, we can break down the courses by subject to see what was added. What happened there?

The explanation for this addition is more technical in nature. These classes are actually the new sections that apply to the Fall 2015 catalog. Around 10 AM on the 24th, a change was made to the code of the Schedule of Classes page from which Open Course gets its data. This change meant that the "Fall 2014" catalog option now shows classes for catalogs after Fall 2014, even though the label has remained the same. This has the effect of bringing in all of the newly introduced courses, notably ENGL 123 and LDSP 210 courses for Freshmen.

Prior to this change, the data on Schedule of Classes, and by extension on Open Course, was incomplete (missing the Fall 2015 catalog-specific sections). This issue was known because several freshmen who had used the browser plugin to sync their classes to Open Course had synced an unknown CRN, causing database errors that had to be resolved. Looking into this issue revealed the inconsistencies in the public Schedule of Classes site which had remained unfixed by CNU until this time (and resulted in much better API validation).

With that sorted, let's take a look at the state of courses by what time they start. Time is often a factor when it comes to scheduling your courses, but when it comes to early and late classes, there is still much debate about which are better. To determine what the majority prefers, we'll look at the average available number of seats in courses at 12 hours of the day.

Not surprisingly, the courses beginning at 8 AM are highly undesired by students, at least 44% less popular than other classes which begin at 9 AM, just an hour later. It seems that waking up much earlier than 8 to get ready for the day is avoided by most people. Interestingly, classes at this time aren't the least popular according to this data. Seats in courses around 7 PM are the least popular, possibly because taking one of these will cause you to miss regular hours for dinner. Courses which meet at most other times during the day are the more widely taken classes, and 10 AM courses are the favorites. Of course, these hours aren't exhaustive.
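The per-hour averages behind this comparison come from grouping courses by start hour and averaging their open seats. Here is a minimal sketch of that grouping, assuming each course is reduced to a (start hour, open seats) pair; the function name and the sample numbers are hypothetical.

```python
from collections import defaultdict


def average_open_seats_by_hour(courses):
    """Map each start hour (0-23) to the mean number of open seats
    among the courses that begin in that hour."""
    totals = defaultdict(lambda: [0, 0])  # hour -> [seat total, course count]
    for start_hour, open_seats in courses:
        totals[start_hour][0] += open_seats
        totals[start_hour][1] += 1
    return {
        hour: seat_total / count
        for hour, (seat_total, count) in sorted(totals.items())
    }


# Made-up sample: two 8 AM courses and two 9 AM courses
sample = [(8, 6), (8, 4), (9, 2), (9, 4)]
print(average_open_seats_by_hour(sample))
```

A higher average means more seats were left unclaimed at that hour, which is how "less popular" is measured here.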

What about the instructor who teaches the course? RateMyProfessors is an often-used site that allows students to give reviews on their courses, and generates a rating for each professor. Are these reviews heeded by students, or does the convenience of the course (time and other factors) level the playing field?

Let's look at the average number of seats in the 50 highest and lowest rated courses.

As it turns out, students do appear to take general class or instructor rating into account when selecting their classes. This result could come from online sources, such as RateMyProfessors, from simple word of mouth, or could in fact be related to unanalyzed factors. Keep in mind that this evaluation is highly unscientific. From the data at hand, however, students do tend to choose courses from more highly recommended instructors to add to their schedule.

It should be noted that, even among the bottom 50 courses, the average number of open seats is only 3.48. This is lower than the average number of seats in classes which meet at 8 AM from the data above, and significantly lower than those that meet at 7 PM. This indicates that, even though instructors are taken into account during registration, time can provide a much more compelling reason to choose one class over another.

Conclusion

There are a lot of cool things to be learned from data. Taking a look at this data is incredibly validating and shows the importance (and coolness) of this project, because without it these insights wouldn't have been available. If you have an angle you'd like to see that hasn't been looked at yet, please let me know!

Though the results of this analysis are interesting and very cool, they are limited in scope. The functionality created to track courses over time offers better insight into how they change over time, but still from an outside perspective. Better analytics could obviously be leveraged directly from course systems.

Furthermore, this data only comes from the add/drop period of course registration. The Open Course system launched this summer and does not have any records of the full registration period at the end of the Spring 2015 Semester. With the limitations of this analysis understood, there is an opportunity to make changes and upgrades where needed to get a better view of next semester's registration when the time comes.

With that said, boundaries must be respected. Data collection is often at odds with privacy. Companies which manage popular internet websites often grapple with the desire to both respect their users' perceived rights and better understand them. When this is done wrong, it creates security concerns for everyone involved because users' personal information is put at risk, and it makes users feel as though they are being watched. "Surveillance breeds conformity" (Glenn Greenwald). The goal of this project is to always, in the face of such a dilemma, err on the side of privacy. These implications need to be kept in mind for future research.

Here's the traffic spike to the website from students checking on the details of their first classes before they head out. Hope everyone had a great first day back!