Interesting papers

To inspire project ideas, here are some cool NLP papers:

To inspire project ideas, here are some cool Computer Vision papers:

Interesting datasets

NLP datasets:

Computer vision datasets:

  • Meta Pointer: A large collection organized by CV Datasets.
  • Yet another Meta pointer
  • ImageNet: a large-scale image dataset for visual recognition, organized by the WordNet hierarchy
  • SUN Database: a benchmark for scene recognition and object detection with annotated scene categories and segmented objects
  • Places Database: a scene-centric database with 205 scene categories and 2.5 million labelled images
  • NYU Depth Dataset v2: an RGB-D dataset of segmented indoor scenes
  • Microsoft COCO: a new benchmark for image recognition, segmentation and captioning
  • Flickr100M: 100 million Creative Commons Flickr images
  • Labeled Faces in the Wild: a dataset of 13,000 labeled face photographs
  • Human Pose Dataset: a benchmark for articulated human pose estimation
  • YouTube Faces DB: a face video dataset for unconstrained face recognition in videos
  • UCF101: an action recognition dataset of realistic action videos with 101 action categories
  • HMDB-51: a large human motion dataset of 51 action classes

Others: you can always explore the Kaggle Datasets for various types of datasets.

Sample projects

We have created a few sample projects to give you an idea of the types of challenges you can tackle:

  1. Troll tweet / toxic comments detection
    • goal: detect toxic / troll comments on Facebook or on other social media / forums
    • max team size: 3 people
    • datasets:
      • https://www.kaggle.com/vikasg/russian-troll-tweets
      • https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
  2. 3D human pose from a 2D image
  3. Growing of NNs during training (similar to GAN)
    • goal: apply progressive growing of neural networks during training (borrowed from the GAN paper by Nvidia)
    • max team size: 4 people
    • dataset: the idea can be applied to various problems; the dataset would be chosen depending on the selected problem
  4. Fake news
    • goal: attempt to develop a mathematical / heuristic model combined with a Deep Learning approach to detecting fake news
    • max team size: 4 people
    • datasets:
      • https://github.com/FakeNewsChallenge/fnc-1
  5. Predicting song popularity
    • goal: attempt to predict how popular a song can be, based on sound data, lyrics and other criteria
    • max team size: 5 people
    • datasets:
      • https://labrosa.ee.columbia.edu/millionsong/
      • https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking
      • https://www.kaggle.com/mousehead/songlyrics
      • https://www.kaggle.com/artimous/every-song-you-have-heard-almost
  6. Predicting bitcoin prices (based on financial indicators and sentiment)
    • goal: highly experimental topic; the main idea is to explore the feasibility of such an algorithm
    • max team size: 5 people
    • datasets:
      • https://www.kaggle.com/mczielinski/bitcoin-historical-data
      • http://eventregistry.org/
      • https://www.kaggle.com/bigquery/bitcoin-blockchain
      • https://www.kaggle.com/jessevent/all-crypto-currencies
      • https://www.kaggle.com/snapcrack/all-the-news