Let’s be honest — research papers are scary. In the field of machine learning, reading a research paper can feel like staring into an abyss of dense words and complicated formulas. It can be very easy to look into that abyss and assume it is too much to overcome. Learning how to extract information from research papers, though, is critical. The field of machine learning is moving so quickly that often the only way to stay up to date is by reading papers. My hope is to help you develop some skills and strategies in order to not feel overwhelmed.
Start Broad, Not Deep
It can be tempting to pick a research paper and decide that, no matter what, you will conquer it. I have found, though, that when you are starting out this can lead to burnout and despair. You can find yourself weeks later feeling like you have made little progress and deciding you are not cut out for the task.
Instead, I would recommend skimming many research papers to start. The goal of this process is to start to feel comfortable with the way papers are written. Papers almost always follow a similar flow: an abstract at the beginning, followed by an introduction and background research; a middle that tends to consist of a detailed description of the research contribution; and an end with experimental results and a conclusion with proposed next steps.
To start, I would check out a list of curated papers such as Awesome Deep Learning Papers (note: Arxiv Sanity is also a great place to find papers). The Awesome Deep Learning Papers list is no longer maintained but it is still a great starting point to get familiar with key research papers in the field of deep learning. Start by picking research papers that look interesting and do the following:
- Read the entire abstract.
- Skip the equations, but read the figures. This one might be unique to deep learning, but there are almost always great figures with an overview of the proposed architecture.
- Review the tables in the experimental results section.
- Read the conclusion.
This entire process should take less than 10 minutes and provide you with a decent summary of the paper. You should walk away understanding what the paper was trying to accomplish (abstract), a high-level idea of the methodology (figures), how well it actually worked (tables), and the shortcomings and potential next steps (conclusion).
You will not have a deep understanding of how the methodology works, but that’s okay — the goal is broad coverage of papers. So pick up the next paper and repeat the process.
After reading 10 to 20 papers in this way, I have found something magical starts to happen. You not only start to feel comfortable just picking up a research paper and extracting the key points, but you also start to develop a knowledge network. You begin to see how different ideas connect and which ideas keep resurfacing. For example, if you were reading recent NLP papers, you would start to realize that transformers are all the rage and that new state-of-the-art results are being published at an incredible rate.
This knowledge network is incredibly valuable because the next time you skim a paper you will see how the research connects to other papers you have read. For example, you might read the following (from the XLNet abstract):
"However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation."
And think to yourself: okay — yeah, at a high level, that makes sense because I remember reading a bit about masking positions in the figures of the BERT paper. In just a few minutes you understand that one of the contributions of XLNet is doing something better than masking tokens. Your knowledge is shallow, but it is enough to add to your knowledge network. Since skimming only takes minutes, it also allows you to keep your knowledge network fresh with the latest research even in a fast-moving field such as machine learning.
Once you’ve skimmed a handful of papers you should start to have a feel for the key subjects that keep recurring. For example, in NLP, a key subject would be transformers. In my experience, key subjects are rare enough that when you spot one, it is worth the time to go deep and really understand it (note: you should have bought yourself a lot of time by only skimming most papers). My strategy for going deep is the following:
- Start from the beginning. For example, if you decide you really need to understand BERT, you might discover concepts used in BERT that you don’t quite grasp. Try to trace those concepts back to their origin by skimming referenced papers. BERT is built on transformers, and to understand transformers you’d need to understand attention, which would probably take you to the Attention Is All You Need paper.
- Leverage blogs. It turns out that most research papers are not really written for understanding (at least not by most people). Fortunately, many amazing people have taken the time to describe difficult concepts with clarity. For example, check out this amazing write-up on the transformer from Attention Is All You Need. Search for these resources on Google, Medium, Reddit, really wherever you can. They can save you hours.
- Code it up. I find that once I can code something, I have a pretty solid grasp of how it works. If you are in the field of deep learning, I would highly recommend PyTorch, because I have often found that you can convert almost directly from the words in a paper to PyTorch code. A group at Harvard even did this for the Attention Is All You Need paper.
- Teach it. At this point, you have developed a solid understanding of the paper and I would highly recommend teaching what you have learned. You could create a blog post, present what you have learned to co-workers, or even give a talk at a local meetup. Teaching will force you to solidify and clarify what you learned and also help others.
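To make the "code it up" step a little more concrete, here is a minimal sketch of the scaled dot-product attention formula from Attention Is All You Need, softmax(QK^T / sqrt(d_k))V. It is written in plain Python rather than PyTorch so that it runs with no dependencies. The function names and the list-of-lists layout are illustrative choices, not taken from the paper, and the sketch leaves out masking, multiple heads, and the learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    queries, keys, and values are lists of row vectors (lists of floats).
    keys and values must have the same number of rows.
    """
    d_k = len(keys[0])
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        # Turn the scores into weights that sum to 1.
        weights = softmax(scores)
        # The output row is the weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return outputs


if __name__ == "__main__":
    # Toy input: both keys are identical, so the attention weights are uniform.
    q = [[1.0, 0.0]]
    k = [[1.0, 1.0], [1.0, 1.0]]
    v = [[0.0, 2.0], [4.0, 6.0]]
    print(scaled_dot_product_attention(q, k, v))
```

In the toy input both keys are the same, so the query attends to them equally and the output is simply the average of the two value vectors. Checking small cases like this by hand is a quick way to confirm that the code matches your reading of the paper.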
This process takes time and requires patience. Don’t be deceived by the fact that research papers are only 5–10 pages long and you can read all the words in an hour. A research paper is a distilled version of a significant amount of effort and time — usually by a group of people. Don’t expect to master the concepts in a day. Instead, focus on putting in the work of understanding. For example, you could set a goal to spend 30 minutes a day better understanding the paper. Don’t stress about how long the entire process will take. Just continue to put in the work and you’ll find yourself on the other side with substantially more knowledge.