Final Project - inspirations, datasets and papers

Interesting papers

To inspire project ideas, here are some cool NLP papers:

To inspire project ideas, here are some cool Computer Vision papers:

Object recognition: [Krizhevsky et al.], [Russakovsky et al.], [Szegedy et al.], [Simonyan et al.], [He et al.]
Object detection: [Girshick et al.], [Sermanet et al.], [Erhan et al.]
Image segmentation: [Long et al.]
Video classification: [Karpathy et al.], [Simonyan and Zisserman]
Scene classification: [Zhou et al.]
Face recognition: [Taigman et al.]
Depth estimation: [Eigen et al.]
Image-to-sentence generation: [Karpathy and Fei-Fei], [Donahue et al.], [Vinyals et al.]
Visualization and optimization: [Szegedy et al.], [Nguyen et al.], [Zeiler and Fergus], [Goodfellow et al.], [Schaul et al.]

Interesting datasets

NLP datasets:

Computer vision datasets:

Meta Pointer: a large collection of datasets, organized by CV Datasets.
Yet another Meta pointer
ImageNet: a large-scale image dataset for visual recognition organized by WordNet hierarchy
SUN Database: a benchmark for scene recognition and object detection with annotated scene categories and segmented objects
Places Database: a scene-centric database with 205 scene categories and 2.5 million labelled images
NYU Depth Dataset v2: an RGB-D dataset of segmented indoor scenes
Microsoft COCO: a new benchmark for image recognition, segmentation and captioning
Flickr100M: 100 million Creative Commons Flickr images
Labeled Faces in the Wild: a dataset of 13,000 labeled face photographs
Human Pose Dataset: a benchmark for articulated human pose estimation
YouTube Faces DB: a face video dataset for unconstrained face recognition in videos
UCF101: an action recognition dataset of realistic action videos with 101 action categories
HMDB-51: a large human motion dataset of 51 action classes

Others: you can always explore Kaggle Datasets for many other types of data.

Sample projects

We have created a few sample projects to give you an idea of the types of challenges you can tackle:

Troll tweet / toxic comments detection
- goal: detect toxic / troll comments on Facebook or on other social media / forums
- max team size: 3 people
- datasets:
  - https://www.kaggle.com/vikasg/russian-troll-tweets
  - https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
3D human pose from a 2D image
- goal: implement a cutting-edge research paper, which you can see here
- max team size: 5 people
- datasets:
  - The DensePose dataset (expected to be released before mid-June)
  - Unite the People dataset
Progressive growing of NNs during training (similar to GANs)
- goal: apply progressive growing of neural networks during training (borrowed from the GAN paper by Nvidia)
- max team size: 4 people
- dataset: the idea can be applied to various problems; the dataset will be chosen depending on the selected problem
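
As a rough illustration of the progressive growing idea in this project: when a new, higher-resolution layer is added during training, its output is usually faded in gradually by blending it with an upsampled copy of the previous, lower-resolution output. The sketch below shows only that blending step in plain Python. The function names (`upsample_nearest`, `fade_in`) and the toy inputs are hypothetical, and a real project would do this inside a deep learning framework.

```python
def upsample_nearest(image, factor=2):
    """Nearest-neighbour upsampling of a 2D list of numbers."""
    upsampled = []
    for row in image:
        # Repeat every value `factor` times horizontally...
        wide_row = [value for value in row for _ in range(factor)]
        # ...and repeat the widened row `factor` times vertically.
        for _ in range(factor):
            upsampled.append(list(wide_row))
    return upsampled


def fade_in(old_output, new_output, alpha):
    """Blend the upsampled old output with the new layer's output.

    alpha = 0 -> only the old (low-resolution) path is used
    alpha = 1 -> only the new (high-resolution) path is used
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    old_up = upsample_nearest(old_output)
    return [
        [(1.0 - alpha) * o + alpha * n for o, n in zip(old_row, new_row)]
        for old_row, new_row in zip(old_up, new_output)
    ]


# Toy example (made-up numbers): a 2x2 "old" output and a 4x4 "new" output.
low_res = [[0.0, 1.0],
           [2.0, 3.0]]
high_res = [[1.0] * 4 for _ in range(4)]

# In training, alpha would be ramped from 0 to 1 over many steps.
for alpha in (0.0, 0.5, 1.0):
    blended = fade_in(low_res, high_res, alpha)
    print(alpha, blended[0])
```

Ramping `alpha` slowly is meant to keep the already-trained low-resolution layers from being disrupted when the new layer starts out with random weights.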
Fake news
- goal: attempt to develop a mathematical / heuristic model plus a deep learning approach for detecting fake news
- max team size: 4 people
- datasets:
  - https://github.com/FakeNewsChallenge/fnc-1
Predicting song popularity
- goal: attempt to predict how popular a song can be, based on sound data, lyrics and other criteria
- max team size: 5 people
- datasets:
  - https://labrosa.ee.columbia.edu/millionsong/
  - https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking
  - https://www.kaggle.com/mousehead/songlyrics
  - https://www.kaggle.com/artimous/every-song-you-have-heard-almost
Predicting bitcoin prices (based on financial indicators and sentiment)
- goal: a highly experimental topic; the main idea is to explore the feasibility of such an algorithm
- max team size: 5 people
- datasets:
  - https://www.kaggle.com/mczielinski/bitcoin-historical-data
  - http://eventregistry.org/
  - https://www.kaggle.com/bigquery/bitcoin-blockchain
  - https://www.kaggle.com/jessevent/all-crypto-currencies
  - https://www.kaggle.com/snapcrack/all-the-news