I've been working on a few projects where I need both historical weather data (going back roughly 4-5 years) and pollen count data. I haven't been able to find any datasets or APIs that provide this.
Has anyone else had this problem and found a solution? For context, one project is about predicting public transit usage (with weather as the main factor), and the other is about marketing allocation with pollen data as a main factor.
[D] Are small transformers better than small LSTMs?[reddit]/r/MachineLearning
Transformers currently set the state of the art on many NLP tasks.
Some examples are:
Machine translation: Transformer Big + BT
Named entity recognition: BERT large
Natural language inference: RoBERTa
Something I noticed is that in all of these papers the models are massive, with maybe 20 layers and hundreds of millions of parameters.
Of course, using larger models is a general trend in NLP, but it raises the question of whether small transformers are any good. I recently had to train a sequence-to-sequence model from scratch, and I was unable to get better results with a transformer than with LSTMs.
I am wondering if someone here has had similar experiences or knows of any papers on this topic.
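One way to keep a small-transformer vs. small-LSTM comparison like the one above fair is to match parameter budgets before training anything. Here is a rough sketch of the per-layer arithmetic, assuming the layouts used by PyTorch's nn.LSTM (two bias vectors per layer) and nn.TransformerEncoderLayer (biases on every projection, two LayerNorms). The function names are made up for illustration, and embeddings and output layers are left out.

```python
def lstm_layer_params(input_size, hidden_size):
    """Parameters in one unidirectional LSTM layer.

    Four gates, each with an input weight matrix, a recurrent weight
    matrix, and two bias vectors (assumed PyTorch-style layout).
    """
    return 4 * hidden_size * (input_size + hidden_size + 2)


def transformer_encoder_layer_params(d_model, d_ff):
    """Parameters in one transformer encoder layer.

    The count does not depend on the number of attention heads,
    because the heads split d_model between them.
    """
    attention = 4 * d_model * d_model + 4 * d_model      # Q, K, V, output projections + biases
    feed_forward = 2 * d_model * d_ff + d_ff + d_model   # two linear layers + biases
    layer_norms = 2 * 2 * d_model                        # scale and shift, twice
    return attention + feed_forward + layer_norms


if __name__ == "__main__":
    print(lstm_layer_params(512, 512))                  # 2101248
    print(transformer_encoder_layer_params(512, 2048))  # 3152384
```

At the same width, one encoder layer with a 4x feed-forward block is roughly 1.5 LSTM layers' worth of parameters, so a layer-for-layer comparison already gives the transformer a larger budget.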
I made a notebook with examples of cool Python features that either took me a long time to find out or were too intimidating for me to use. I especially focus on the features I find useful for machine learning. [github.com] @chipro
Excited to share our work: collaboration requires understanding! In Overcooked, self-play doesn't gel with humans: it expects them to play like itself. (1/4) Demo: [github.io] Blog: [berkeley.edu] Paper: [arxiv.org] Code: [github.com][twitter] @rohinmshah
We introduced a simple environment based on the game Overcooked that is particularly well-suited for studying coordination, and demonstrated quantitatively the poor performance of such agents when paired with a learned human model, and with actual humans.
Looking for suggestions for biomedical datasets similar to the Wisconsin Breast cancer database[reddit]/r/datasets
I am looking for biomedical databases similar to the Wisconsin breast cancer database (available at https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original) ). This database has 9 features (each taking integer values from 1 to 10) and two classes: benign and malignant. A defining characteristic of this dataset is that higher feature values generally indicate a higher chance of abnormality (malignancy). I am looking for other biomedical datasets whose features have this property. The features need not be integer valued (real valued is fine), and a low number of features is preferred (fewer than 30 or so).
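To screen candidate datasets for the "higher value means higher chance of abnormality" property described above, one option is a per-feature AUC: the probability that a randomly chosen positive case has a larger feature value than a randomly chosen negative one. This is only a sketch. The function name and the toy values are made up for illustration and are not taken from the Wisconsin data.

```python
def feature_auc(values, labels):
    """Fraction of (positive, negative) pairs where the positive case
    has the larger feature value; ties count as half.

    1.0 means the feature separates the classes perfectly in the
    "higher is more abnormal" direction, 0.5 means no association.
    """
    positives = [v for v, y in zip(values, labels) if y == 1]
    negatives = [v for v, y in zip(values, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    score = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up feature column on a 1-10 scale; 1 = malignant, 0 = benign.
    toy_values = [1, 2, 3, 8, 9, 10]
    toy_labels = [0, 0, 0, 1, 1, 1]
    print(feature_auc(toy_values, toy_labels))
```

A dataset fits the description if most of its features score well above 0.5 under this check. The pairwise loop is quadratic, which is fine for datasets of a few hundred rows.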
[D] What's a hypothesis that you would really like to see tested, but never will get around to testing yourself, and hoping that someone else will get around to doing it?[reddit]/r/MachineLearning
-I really want to see doc2vec but with contextualized vectors (BERT, ELMo, etc.) instead of word2vec. I think it'll be a slam dunk. I don't think I'll ever get around to testing this. If anyone wants to do it, I'll be happy to give some guidance if it's needed.
-I would really like to see word2vec or GloVe tested with the context limited to other words within the same sentence as the target word, or perhaps extended to any word in the same paragraph. I was sort of planning on doing this, but lost some motivation with the rise of contextualized vectors. I think it would give some great insight though.
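For the sentence-limited context idea, the change is mostly in how the (target, context) training pairs are generated. Here is a minimal sketch, assuming the text is already split into sentences and tokens. The function name is hypothetical, and this covers only the pair-generation step, not word2vec or GloVe training itself.

```python
def sentence_bounded_pairs(sentences, window=None):
    """Build skip-gram style (target, context) pairs that never cross
    a sentence boundary.

    sentences: list of token lists, one list per sentence.
    window: max distance between target and context; None means the
            whole sentence is the context.
    """
    pairs = []
    for tokens in sentences:
        for i, target in enumerate(tokens):
            start = 0 if window is None else max(0, i - window)
            end = len(tokens) if window is None else min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    pairs.append((target, tokens[j]))
    return pairs


if __name__ == "__main__":
    toy = [["the", "cat", "sat"], ["dogs", "bark"]]
    for pair in sentence_bounded_pairs(toy):
        print(pair)
```

Passing paragraphs instead of sentences as the token lists gives the paragraph-level variant with no code changes.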