Setting the scene
I am compiling a series of articles on undergraduate CS research, and there are more to come along this line. I recently wrote a blog post on SOTA reproduction as a key pathway into undergrad research, and conference workshops make a timely and valuable follow-up to that discussion. Today I will talk about workshop tracks at top machine learning and natural language processing conferences and argue why CS undergrad students from Bangladesh should seriously go for them. I first learned about workshop tasks from SemEval18. Since then I have always hoped to work on some tasks that interest me, but sadly never got the chance. Last December I attended a talk titled “Story of Natural Language Processing” by Dr. Sudipta Kar, Ph.D., University of Houston. He gave this talk in Dhaka, strongly encouraged the participants to join NLP workshop tracks, and pushed them to work honestly towards a project paper. He also referred to the rising CS research culture in India: grad students there do a lot of workshop projects, which leads to publications at good venues, builds strong communication channels, and eventually helps them secure positions at good schools. His point was clear: given our merit but limited resources and weak research environment, we have great potential in ML/NLP conference workshops.
Conferences and workshops, an introduction
I agree with Sudipta da that workshops are a pretty cool research hack for undergrad students and early researchers. There are many exciting and fun workshops happening now, both independently and jointly under several ML/NLP/Vision conferences. ML workshops are typically one or two days long and dedicated to a specific topic, ranging from core computer science to cross-disciplinary fields. Workshop deadlines are spread throughout the year. A workshop is not exactly the same as a conference; it is a smaller version of one, so you can think of it as minimalism in research conferences. Workshop registration fees are usually small compared to a full conference, and sometimes attendance is free for all, so students can usually join for less. These events are also very diverse and inclusive in terms of attendees and hosts. Workshop tracks are much more action-driven: they are motivated by specific immediate problems and call for concrete solutions. Most of them address underrepresented, recently promising, or fairly new but approachable problems in order to draw attention from the broader research community. You may find some tracks you have literally never heard of! These tracks are mostly organized by groups around the world that specialize in certain areas. They are mostly doable for undergrad students because you can submit a short paper (4 pages long), and a small piece of work done over four to five months is pretty much enough for that! Some workshops also call for 8-page papers, but those are less common. That said, workshops are definitely fun places where you can learn how research is done on interesting and important problems.
A few exemplary workshops to follow and doable tracks
Here I am pointing to some good workshops that will be happening throughout the year. Most of these cover areas such as machine learning, learning theory, big data, social media analysis, social science, healthcare, agriculture, and natural language processing. SemEval is one of the most popular workshop hosts in the Natural Language Processing community. I find SemEval tasks quite fun because they cover some of the most exciting and interesting problems, e.g. semantics in texts, sentiment and emotion analysis, and common sense reasoning. The fun part is that they seem doable for early researchers. There are other great workshops at notable venues, e.g. EMNLP, ACL, ICLR, ICML, and NeurIPS. These venues address a broader set of fields: learning theory, neuroscience and AI, healthcare and AI, social science and AI, privacy and AI, social media, language understanding, language representation, transfer learning, generative text, conversation and discourse, semantic evaluation, common sense reasoning, and so on. I suggest my readers go for NLP and social media tracks for three reasons. First, these tracks are somewhat easier and so more doable. Second, we have computation and resource constraints. Text processing is possible on a machine with a modest configuration, whereas Vision problems quickly become hard and computationally demanding. Third, other tracks require a strong understanding of the field and a strong skill set. As for easygoing workshops, I will list a few from SemEval, along with NLP and social media analysis workshops from ACL and EMNLP. As an example of how workshops respond to urgent issues, here is an NLP COVID-19 Workshop from ACL, an emergency call for timely research and scientific analysis on problems related to combating COVID-19.
This year NeurIPS had some exciting and fun workshops. To name a few: Learning Meaningful Representations of Life; Joint Workshop on AI for Social Good; Minding the Gap: Between Fairness and Ethics; and Real Neurons & Hidden Units: future directions at the intersection of neuroscience and AI. Given the wide range of questions being addressed and the areas open to explore in the coming days, these give some idea of how the world is moving towards embedding machines and automation in everyday life.
In many cases, the workshop hosts provide some data and formulate a problem so that you can develop a machine learning system or build an analysis on those datasets. If it’s a system development problem, the organizers typically share a baseline solution for the task. You can try to build a better solution than that baseline, and if you manage to do so, your work stands a good chance of being accepted. If the task is not limited to system development, they will also suggest possible directions, e.g. open discussions, insights and analysis, evaluation, etc. Some are more like open tasks, where the relevant research questions are clearly laid out.
My first paper was a workshop paper at ICWSM20 on hate speech detection datasets, which I happened to formulate from a very silly idea. It took us around one and a half months to finish the project. So to be honest, if you work hard and form a strong team, it is always possible to solve one or two problems from workshop tracks.
Workshops as a potential venue for being noticed
Aha! One thing I forgot to mention earlier is that workshop participation is a key to building a good relationship with the community. Only a few people join a workshop, which is typically a closed event for researchers from a special interest group, so it is very likely that you will get noticed by others, and maybe you can find someone there who matches your research interests. Why not make some friends, right? These connections are extremely useful if you are looking for a Ph.D. position or an internship opportunity in a professor's lab. They could bring you some collaborations, and maybe open a door to the outer world if you are okay with some wise chit-chat. Actually, my supervisor made two friends at the last workshop I attended with him, and we recently started a project with a CMU Ph.D. student we connected with through that workshop. Sadly, our students spend a whole lot of time chasing GRE/TOEFL scores and writing emails to professors. I believe getting seriously involved in workshop tasks could be a smart hack here. You can even plan your bachelor thesis around workshop tasks. I mean, we have nothing to lose, right?
A Twitter thread for starting right away!
At this point, I have to direct you to some popular and interesting workshops in machine learning and NLP that will be happening for the rest of 2020. This Twitter thread from UCPH Professor Anna Rogers is extremely useful if you are thinking about doing a workshop paper right away!
How to build something nice
Okayyy! We are pretty much done for today! I will stop here with a few practical notes. A typical setup for starting your workshop project could look something like this.
Step by step, how to start working on a workshop paper:
a) Follow workshops of main conferences
b) Pick a task which broadly fits your interest and seems doable
c) Read some relevant papers; you can start with the previous year's edition of that workshop
d) Pair with a friend or active researcher
e) Formulate a proposal. Don’t be shy about being an artist and borrowing ideas from others, but beware of plagiarism!
f) Start working with a timeline
g) Keep making incremental progress
h) Have an expert review your work before submission. Do this one month ahead of the deadline so that you can incorporate corrections and changes.
Time needed: 4~5 months
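If you like to plan with dates, here is a minimal Python sketch that works backwards from a submission deadline using the numbers above (about five months of work, expert review one month before the deadline). The function name `plan_timeline`, the 30-day month approximation, and the example deadline are all my own assumptions for illustration, not taken from any real call for papers.

```python
from datetime import date, timedelta

# Rough approximation for planning only (an assumption, not a calendar rule).
DAYS_PER_MONTH = 30


def plan_timeline(deadline: date, months_of_work: int = 5,
                  review_lead_months: int = 1) -> dict:
    """Work backwards from a submission deadline to rough milestone dates."""
    start = deadline - timedelta(days=months_of_work * DAYS_PER_MONTH)
    review = deadline - timedelta(days=review_lead_months * DAYS_PER_MONTH)
    return {"start_work": start, "send_to_expert": review, "submit": deadline}


if __name__ == "__main__":
    # Hypothetical deadline, chosen only to show the output format.
    plan = plan_timeline(date(2021, 3, 1))
    for milestone, day in plan.items():
        print(f"{milestone}: {day.isoformat()}")
```

Swap in the real deadline of the workshop you pick, and shorten `months_of_work` if your team can move faster.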