Final Project - inspirations, datasets and papers
Interesting papers
To inspire project ideas, here are some cool NLP papers:
- Attention is All You Need
- Quasi-Recurrent Neural Networks
- Semi-supervised Sequence Learning
- A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
- Semi-supervised sequence tagging with bidirectional language models
- Deep Biaffine Attention for Neural Dependency Parsing
- Generating Sentences from a Continuous Space
- Improving Neural Language Models with a Continuous Cache
- Reasoning about Entailment with Neural Attention
- Ultradense Word Embeddings by Orthogonal Transformation
To inspire project ideas, here are some cool Computer Vision papers:
- Object recognition: [Krizhevsky et al.], [Russakovsky et al.], [Szegedy et al.], [Simonyan et al.], [He et al.]
- Object detection: [Girshick et al.], [Sermanet et al.], [Erhan et al.]
- Image segmentation: [Long et al.]
- Video classification: [Karpathy et al.], [Simonyan and Zisserman]
- Scene classification: [Zhou et al.]
- Face recognition: [Taigman et al.]
- Depth estimation: [Eigen et al.]
- Image-to-sentence generation: [Karpathy and Fei-Fei], [Donahue et al.], [Vinyals et al.]
- Visualization and optimization: [Szegedy et al.], [Nguyen et al.], [Zeiler and Fergus], [Goodfellow et al.], [Schaul et al.]
Interesting datasets
NLP datasets:
- Sequence Tagging: Named Entity Recognition and Chunking
- Dependency Parsing
- Quora Question Pairs
- Sentence-Level Sentiment Analysis and Document-Level Sentiment Analysis
- Textual Entailment
- Machine Translation (Ambitious)
- Yelp Reviews
- WikiText Language Modeling
- Fake News Challenge
- Toxic Comment Classification
Computer vision datasets:
- Meta Pointer: a large collection of computer vision datasets, organized by CV Datasets.
- Yet another Meta pointer
- ImageNet: a large-scale image dataset for visual recognition organized by WordNet hierarchy
- SUN Database: a benchmark for scene recognition and object detection with annotated scene categories and segmented objects
- Places Database: a scene-centric database with 205 scene categories and 2.5 million labeled images
- NYU Depth Dataset v2: an RGB-D dataset of segmented indoor scenes
- Microsoft COCO: a new benchmark for image recognition, segmentation and captioning
- Flickr100M: 100 million Creative Commons Flickr images
- Labeled Faces in the Wild: a dataset of 13,000 labeled face photographs
- Human Pose Dataset: a benchmark for articulated human pose estimation
- YouTube Faces DB: a face video dataset for unconstrained face recognition in videos
- UCF101: an action recognition dataset of realistic action videos with 101 action categories
- HMDB-51: a large human motion dataset of 51 action classes
Others: you can always explore Kaggle Datasets for many other kinds of data.