What I Learned from Teaching a Data Analytics Course
At the beginning of the year, I started to teach an online college course on Data Analytics at a four-year university. This was a completely new experience for me, as I had never taught a course before at any level. The work was terrifying, uncertain, and demanding, but ultimately very fulfilling. I had to manage my own expectations for the course as well as the reactions of the students as they experienced it, but in the end I learned many things that I hope to apply if I get another opportunity to teach.
This post will revisit my experiences surrounding this course and summarize the lessons that I learned. I’ve already shared most of these thoughts with my department chair, and I’ve sought to anonymize student experiences and feedback as much as possible. Hopefully my own self-assessment will be useful to someone.
How I Got the Job
Like a lot of things, it all started with a phone call.
Last July I received a call from one of my close friends from graduate school. This friend was in the process of becoming chairman of the Mechanical Engineering department at the university. He was looking to expand the department’s teaching and research portfolios in data science and analytics, and he immediately thought of me. He asked if I would be interested in teaching any course in his department. And specifically, would I be interested in teaching Data Analytics to the students in the department?
Teaching would be a new and terrifying experience for me. I had never taught a course at any level before! I had managed to complete graduate school without working as a teaching assistant, and I didn’t get a chance to teach as a postdoc. I thought that I would like the experience if given the opportunity, but didn’t know if I would ever get one. Now I had an opportunity to teach, and it scared me! I would have to create a new course from the ground up, carry it out, and handle all content and grading duties myself. It would require at least a six-month commitment from initial planning to grade submission, and all of my time outside of my day job would be devoted to this course. It would be a new, uncomfortable, and terrifying undertaking to say the least.
It would have been easy to run away from such an opportunity. I would be stretching out into areas that were unfamiliar to me. It’s easy to think of all the ways that this endeavor could go poorly. But it could also go very well! I decided that the only way to grow personally and professionally was to do things that made me uncomfortable. So I gave my friend a call back and agreed to teach the course.
The Course
Every course needs a title, and the one I came up with was “Data Analytics for Mechanical Engineers”. My goal for the course was to teach Data Analytics topics to mechanical engineers, which meant applying analytics concepts to engineering problems. I used the term “Data Analytics” instead of “Machine Learning” for a couple of reasons. First, Machine Learning tends to be in the province of Computer Science; second, Data Analytics, in my view, covers Machine Learning and other topics such as business problem formulation, data modeling and management, data visualization and reporting, and explainable/ethical modeling, and I wanted to give my students a solid introduction to those topics.
This course was designated as a “Special Topics” course, which means a course on miscellaneous topics that are outside the course curriculum and taught intermittently (usually once every other year). The course was taught 100% online, which wasn’t surprising in a COVID-19 world, but worked very well for me because I lived nowhere near the university campus (it is in the metro Boston area, while I live in the metro Atlanta area). Despite a scant description of the course, 23 students signed up, which was extremely unusual for a Special Topics course. By the Drop deadline a month into the course, 19 students remained enrolled, which was significantly more than I expected (more details later). Seventeen (17) students completed the course.
Teaching the Course
When I was planning the course, I considered approaching it along the lines of either a Coursera/Udemy/edX (massive open online) course or a traditional university course that happened to be offered online. I decided on the latter approach because it was a course aimed at enrolled students at Tufts, and it was best to present the material in a manner that students would expect if they were in a classroom. This approach required a lot more work in a fully online setting because all course material had to be complete ahead of the class meeting. I didn’t always succeed in delivering all the content ahead of time.
I organized the class into lectures in which I would present Data Analytics topics, with problem sets given out about every two weeks to evaluate students’ understanding of the material. The class met twice a week for 75-minute lectures. Outside of lectures I organized three hours of online office hour sessions per week. In the second half of the course the lectures continued while the class organized themselves into teams of 2-3 students to carry out final projects. All of the teams gave oral presentations of their projects in class — first the proposals and, about 5-6 weeks later, full presentations. The teams had to submit written project reports as well.
I tried to recycle as much content as I could. I looked at a number of similar courses at other universities and how they were organized. I used those materials to refine my presentation of certain topics, but I found their published course policies and logistics to be most useful to me.
The actual teaching of the class was one of the easier aspects of the experience. I’d log onto Zoom, see students appear online, and start the class a few minutes after the top of the hour. Public speaking is a skill that for me requires a lot of practice, but the act of a class lecture is something different and not nearly as stressful. Online lectures are much easier because I have a slide deck to recall things I want to talk about. I made the initial mistake of packing too much content into slides and then going through them quickly. No wonder that when I asked if there were any questions I didn’t receive many.
It was a challenge to find examples that weren’t the typical ones presented in a Machine Learning course. I wanted to find as many examples from engineering as possible, but I eventually had to abandon this desire. Many engineering examples were too advanced for what is a first Data Analytics/Machine Learning course, and it was too difficult to come up with decent, tractable examples on short notice.
As one might expect with a Data Analytics course, no one book covered all topics I wanted to cover. I assigned six books and ended up using five:
- “Teach Yourself SQL in 10 Minutes” by Ben Forta.
- “Python for Data Analysis” (PDA) by Wes McKinney.
- “An Introduction to Statistical Learning” (ISL) by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
- “Visualization Analysis and Design” by Tamara Munzner.
- “Storytelling with Data” by Cole Nussbaumer Knaflic.
The ISL book got a lot more use than the other books as it covered most of the Machine Learning content of the course. The other books were used to cover other topics of the course, which meant that most of them were used for just one lecture. I would say that the ISL and PDA books got the bulk of use in the course.
I supplemented the books with journal publications that covered certain topics in Data Analytics. The idea was to have the class discuss the findings of the papers and recall the main points in problem set assignments, but with the exception of one paper (Galit Shmueli’s work on using models to predict or explain), I didn’t incorporate those publications into my lectures.
One innovation that I came up with was the use of Jupyter Notebooks to create tutorials that illustrated concepts presented in Data Analytics lectures. I wanted students to be able to get a feel for the practical applications of these concepts, and have a space to examine examples and convince themselves that they really understood the concepts and their applications. With Jupyter Notebooks, I was able to use prose, images, and code to present the course topics. To ensure that my students were reading these tutorials, I adapted at least one of the examples into a question on my problem set assignments. Many students were able to figure out what I was doing, but I was dismayed at the number of students who copied computer code verbatim without understanding what it meant. At any rate, the tutorials turned out to be an inspired idea, but creating the content was a massive undertaking.
Lessons Learned
As I wrote at the beginning, I learned many things from this experience. Here are those things:
My pre-class prep was necessary, but insufficient.
I started preparing for my course in October and focused on creating a detailed outline of course content and policies. In a desire to make my course self-contained to students, I started writing supplemental material on Python, linear algebra, probability and statistics, and the use of Jupyter. By the time December rolled around, I realized that I hadn’t written any actual course content, so I tried to get to work on that. In the end, I had completed slides for my first two lectures and outlines for the following ten.
I came to realize that my pre-class prep was necessary to give my course some needed focus and structure. But it still wasn’t enough time to prepare lectures ahead of time. Perhaps that is to be expected — it is very difficult to gauge how much information is too much for a 65-75 minute lecture. I never got much further ahead than one or two lectures — some of the slide decks for my lectures were completed with minutes to spare before class time. Problem sets weren’t completed until the day that they were supposed to be released, which is a practice that I don’t recommend and ended up causing me a lot of headaches.
I was told by a co-worker who had taught as an adjunct that every hour of lecture for a new class takes about 20 hours of preparation. I also remember my PhD advisor telling me that preparation for a new lecture takes about 3-4 hours. In my experience, class preparation required significantly less than 20 hours per hour of lecture time, but a lot more than 3-4 hours. When you consider that the class preparation included not just lecture preparation, but also creation of tutorials, problem sets, and supplemental materials, preparation time probably was closer to 20 hours!
Watch the prerequisites.
Course prerequisites are like a contract. There is a difference between a prerequisite and a recommended course, and students will take these differences literally. For my course, I set prerequisites to be courses that were part of the ME course curriculum. There were prerequisites that I really wanted to designate (linear algebra background, for example), but I found out that they weren’t required courses for the students, so very few had the level of preparation that I had considered necessary. This has implications for how certain subjects are covered.
Teach the class with the students you have.
This was the hardest lesson for me to learn, but one that had the greatest impact on the students’ satisfaction with the course. There is a difference between the students you’d like to have — or the students who attended your graduate school — and the students that you actually have in your class. So the students’ preparation is different and their expectations are different. Teach accordingly. I am certain that the source of the students’ frustrations with my course lay in my assumption that they had certain skills that they did not have. I tried to compensate for this in later lectures and tutorial notebooks. The best way to understand the preparation of your students is to have them take a diagnostic test early in the course.
Write problem sets with a purpose.
I centered my course around problem sets, which are typical in STEM courses. I wanted students to demonstrate understanding of the basic concepts and terms within Data Analytics and apply those ideas to solve familiar and unfamiliar problems. Doing so required developing problem sets that are workable, consistent, and fair to the students.
On these qualities, I think I fell short. A big reason for falling short is that I finalized the problem sets the day that they were supposed to be released, and a couple of times I couldn’t manage to do even that. As a result, I wasn’t able to work my own problems and identify inconsistencies or null results before the students did. Another reason for falling short is that I didn’t create a clear rubric for evaluating problems. I did have an internal rubric that I tried to apply consistently, but I believe that the students would have been less frustrated with me (and I with them) if I had given clear guidelines of what I was looking for in problem set solutions.
But who knows. Maybe I did better with the problem sets than I thought. I only remember the problems that didn’t turn out so well.
Maybe I needed to give more assignments.
I evaluated students based on (1) problem sets (four + one bonus), (2) the final project, and (3) class participation (lectures, breakout sessions, office hour sessions). This wasn’t unreasonable, but it caused a lot of uncertainty among students. It turned out that only one assignment had been graded and returned to students before the drop deadline, and it was a very challenging assignment with a relatively low class average. I saw one student withdraw from the course even though he had one of the best scores on the first assignment. In retrospect, I should have assigned more quizzes as basic knowledge checks each week. It would have resulted in more work for me, but the quizzes would have given students more feedback on their progress with the course.
Involve students in the evaluation process.
Students’ comments are requested at the end of the semester, but their thoughts during the course are just as valuable because of the ability to correct things that aren’t going well. Comments won’t always be fair or realistic, and some comments can’t be followed exactly. But you do need to know which comments to take on board. One thing that I did for the final project was to ask students to score their peers’ final project presentations. I presented them with a rubric and clear instructions on what to look for. I didn’t know what to expect, but the peer scoring turned out to be quite fair, and their averages were close to my own scores for the project teams.
So, Will I Teach It Again?
I didn’t teach Data Analytics perfectly, but I taught it the best way that I could by example and instinct. My students didn’t like everything I did in teaching the course, but they persevered and ended up more knowledgeable about machine learning and data analytics than before. The (uncurved) class average was equivalent to an A-, which was well beyond my expectations.
Would I teach it again?
I have a clearer idea of which course topics need refinement and redesign, and I would like the opportunity to teach it again, whether at this university or elsewhere. Much of the country is returning to in-office work and in-person instruction, and I’m sure that after fifteen months, students are tired of remote classes. Teaching remotely is convenient for someone like me, but I also recognize that in-person instruction will always be preferable to students. Teaching a course for a second time doesn’t require the massive amount of time needed to create course materials from scratch, but it still requires a significant amount of time. So it’s not a decision to be made impulsively, and I won’t make it impulsively.
I’ll just say this, and I thank you for actually reading this entire piece: teaching Data Analytics to a new generation of engineering students has been a privilege and an experience that I dreamed about but for some time wasn’t sure would actually happen. I thank my friends for helping me make this happen. My hope is that, in my own imperfect way, I did some good in the world and transferred a bit of my passion to others.