2021-09-16: Torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision https://arxiv.org/abs/2109.08203v1 This work has serious limitations that could be overcome by spending ten times the computation budget on CIFAR to ensure all models are trained as well as possible, and probably around 50 to 100 times the computation budget on ImageNet to allow training models from scratch.
In this paper I investigate the effect of random seed selection on the
accuracy when using popular deep learning architectures for computer vision. I
scan a large amount of seeds (up to $10^4$) on CIFAR 10 and I also scan fewer
seeds on Imagenet using pre-trained models to investigate large scale datasets.
The conclusions are that even if the variance is not very large, it is
surprisingly easy to find an outlier that performs much better or much worse
than the average.
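The seed scan described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's code: `scan_seeds` and `toy_evaluate` are hypothetical names, and `toy_evaluate` is a deterministic placeholder that returns made-up scores so the snippet runs without any training.

```python
import statistics

def scan_seeds(evaluate, seeds):
    """Run `evaluate(seed)` for every seed and summarize the spread of scores."""
    scores = {seed: evaluate(seed) for seed in seeds}
    values = list(scores.values())
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "best_seed": max(scores, key=scores.get),
        "worst_seed": min(scores, key=scores.get),
        "range": max(values) - min(values),
    }

def toy_evaluate(seed):
    # Placeholder, NOT a real experiment: a real version would seed the
    # framework (e.g. torch.manual_seed(seed)), train, and return test accuracy.
    return 90.0 + (seed % 5) * 0.1

summary = scan_seeds(toy_evaluate, range(10))
print(summary)
```

Reporting the mean and standard deviation over seeds, rather than only the best seed, is what keeps a lucky outlier from being mistaken for a real improvement.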
Very interesting thread on plagiarism in machine learning papers at top conferences in this post on r/ML. The comments are especially revealing, with links to detailed discussions on the Chinese discussion forum Zhuanlan Zhihu, in case you read Chinese. [redd.it]
[R] The Autodidactic Universe: “We present an approach to cosmology in which the Universe learns its own physical laws. It does so by exploring a landscape of possible laws, which we express as a certain class of matrix models.” [reddit] https://arxiv.org/abs/2104.03902 /r/MachineLearning These employ the renormalization group, the idea that there are no laws except to follow precedence, self-sampling methods, systems that maximize variety and geometrical self-assembly, and a direct mapping from a mathematical model of a learning system into the equations of motion of general relativity and gauge fields.
Abstract We present an approach to cosmology in which the Universe learns its own physical laws. It does so by exploring a landscape of possible laws, which we express as a certain class of matrix models. We discover maps that put each of these matrix models in correspondence with both a gauge/gravity theory and a mathematical model of a learning machine, such as a deep recurrent, cyclic neural network. This establishes a correspondence between each solution of the physical theory and a run of a neural network. This correspondence is not an equivalence, partly because gauge theories emerge from N → ∞ limits of the matrix models, whereas the same limits of the neural networks used here are not well-defined. We discuss in detail what it means to say that learning takes place in autodidactic systems, where there is no supervision. We propose that if the neural network model can be said to learn without supervision, the same can be said for the corresponding physical theory. We consider other protocols for autodidactic physical systems, such as optimization of graph variety, subset-replication using self-attention and look-ahead, geometrogenesis guided by reinforcement learning, structural learning using renormalization group techniques, and extensions. These protocols together provide a number of directions in which to explore the origin of physical laws based on putting machine learning architectures in correspondence with physical theories.
Mixed input model: one covariate needs 500 input units, the other is just a single one-hot. Is that going to work? [reddit] /r/MLQuestions
To be more precise, the code runs fine, but I am worried that the architecture-as-written makes the first input, which is a 500-dimensional vector of temperature readings (normalized / standardized to be mean-0, std. dev. = 1), seem way more important than the second input, which is just a binary indicator for whether it's currently "winter."
My understanding is that during backpropagation, the model updates the weights associated with each neuron at the same relative scale, e.g. neuron #7 gets +0.3 to the weight, and neuron #501 gets maybe -0.2. This I think will create an issue, alongside the random initialization of weights, that suggests that the 501st neuron is about as "heavy" as the other 500; at least at first. With enough training time, I assume the model could discover that the 501st neuron is critical, and give it a ton of weight. But in the interest of speeding up training + ensuring it understands its importance, I was thinking to duplicate this input such that it has 500 input neurons, which all fire either 0 or 1, depending on if it's winter. I think it's a dumb idea, but (1) I don't know how much of a hazard my stated problem actually is; (2) I don't know a better way to solve it. Any ideas? Thanks either way!
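One way to reason about the duplication idea, as a sketch under simplifying assumptions (a plain dense first layer, looking at a single hidden unit, hypothetical weight scale of 0.05): duplicating the winter flag 500 times produces exactly the same pre-activation as a single winter input whose weight is the sum of the 500 duplicated weights. The names below are illustrative only.

```python
import random

def pre_activation(weights, inputs, bias=0.0):
    """Weighted sum computed by one hidden unit before its nonlinearity."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

rng = random.Random(0)
n_temp = 500     # temperature readings, as in the post
n_copies = 500   # hypothetical number of duplicated "winter" inputs

temps = [rng.gauss(0.0, 1.0) for _ in range(n_temp)]  # standardized readings
winter = 1.0                                          # binary indicator

w_temp = [rng.gauss(0.0, 0.05) for _ in range(n_temp)]
w_winter_copies = [rng.gauss(0.0, 0.05) for _ in range(n_copies)]

# Network A: the winter flag is duplicated n_copies times.
z_dup = pre_activation(w_temp + w_winter_copies, temps + [winter] * n_copies)

# Network B: one winter input whose weight is the sum of the copies' weights.
z_single = pre_activation(w_temp + [sum(w_winter_copies)], temps + [winter])

same_output = abs(z_dup - z_single) < 1e-9
print(same_output)
```

Under these assumptions, duplication does not add information. It only changes the effective initial scale of the winter weight and, with plain gradient descent, makes that effective weight move about `n_copies` times faster, since every copy receives the same gradient. A commonly used alternative is to pass the 500 readings through their own small sub-network and concatenate the winter flag with that compressed representation.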
Is there any multilingual BERT model that was fine-tuned for sentiment analysis? [reddit] /r/MLQuestions
I’ve been searching for the most accurate open-source multilingual sentiment analysis library/API/ML model, and I have found that BERT is one of the few technologies that were trained on multilingual data (and do not rely on translation libraries). So my question is: can I find a fine-tuned BERT model for multilingual sentiment analysis (huggingface/transformers, tfhub …)? If not, are there any resources I can use to learn how to fine-tune the pretrained model?
Also, is there any other open-source, accurate, and easier-to-use library you recommend for multilingual sentiment analysis?
Zip code analysis; need information from 2020 census??! [reddit] /r/datasets
I’m trying to figure out the average income and population of zip codes in the U.S. What’s the best (and possibly free) way of going about that? I’ve used zipmapdata in the past, but they use the 2010 census data. I have to make a spreadsheet of income by zip code and I’m having a difficult time finding a reliable website.
Also, bonus if the website lists all the schools within a zip code!
2021-09-23: VBridge: Connecting the Dots Between Features and Data to Explain Healthcare Models https://arxiv.org/abs/2108.02550v2 In this work, we identified three key challenges limiting the use of ML in clinical settings, including clinicians’ unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence.
Machine learning (ML) is increasingly applied to Electronic Health Records
(EHRs) to solve clinical prediction tasks. Although many ML models perform
promisingly, issues with model transparency and interpretability limit their
adoption in clinical practice. Directly using existing explainable ML
techniques in clinical settings can be challenging. Through literature surveys
and collaborations with six clinicians with an average of 17 years of clinical
experience, we identified three key challenges, including clinicians'
unfamiliarity with ML features, lack of contextual information, and the need
for cohort-level evidence. Following an iterative design process, we further
designed and developed VBridge, a visual analytics tool that seamlessly
incorporates ML explanations into clinicians' decision-making workflow. The
system includes a novel hierarchical display of contribution-based feature
explanations and enriched interactions that connect the dots between ML
features, explanations, and data. We demonstrated the effectiveness of VBridge
through two case studies and expert interviews with four clinicians, showing
that visually associating model explanations with patients' situational records
can help clinicians better interpret and use model predictions when making
clinical decisions. We further derived a list of design implications for
developing future explainable ML tools to support clinical decision-making.