[R] How Smart is BERT? Evaluating the Language Model’s Commonsense Knowledge[reddit]/r/MachineLearning
In the new paper Does BERT Solve Commonsense Task via Commonsense Knowledge?, a team of researchers from Westlake University, Fudan University and Microsoft Research Asia dives deep into the large language model to discover how it encodes the structured commonsense knowledge it leverages on downstream commonsense tasks.
Abstract: Deep learning techniques have recently demonstrated broad success in predicting complex dynamical systems ranging from turbulence to human speech, motivating broader questions about how neural networks encode and represent dynamical rules. We explore this problem in the context of cellular automata (CA), simple dynamical systems that are intrinsically discrete and thus difficult to analyze using standard tools from dynamical systems theory. We show that any CA may readily be represented using a convolutional neural network with a network-in-network architecture. This motivates our development of a general convolutional multilayer perceptron architecture, which we find can learn the dynamical rules for arbitrary CA when given videos of the CA as training data. In the limit of large network widths, we find that training dynamics are nearly identical across replicates, and that common patterns emerge in the structure of networks trained on different CA rulesets. We train ensembles of networks on randomly-sampled CA, and we probe how the trained networks internally represent the CA rules using an information-theoretic technique based on distributions of layer activation patterns. We find that CA with simpler rule tables produce trained networks with hierarchical structure and layer specialization, while more complex CA produce shallower representations, illustrating how the underlying complexity of the CA's rules influences the specificity of these internal representations. Our results suggest how the entropy of a physical process can affect its representation when learned by neural networks.
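The construction in the abstract lends itself to a short illustration. Below is a minimal sketch of such a "convolutional MLP" (assumed hyperparameters and layer sizes, not the authors' released code): one 3x3 convolution gathers each cell's neighbourhood, and a stack of 1x1 convolutions acts as a per-cell network-in-network that can fit a CA rule table from consecutive video frames.

```python
# Sketch of a convolutional MLP for learning a CA update rule
# (assumed hyperparameters; not the authors' released implementation).
import torch
import torch.nn as nn

class ConvMLP(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        # 3x3 conv gathers each cell's neighbourhood into a feature vector.
        self.gather = nn.Conv2d(1, 9, kernel_size=3, padding=1)
        # 1x1 convs = a small MLP applied independently at every cell
        # (the "network-in-network" part), which can encode the rule table.
        self.rule = nn.Sequential(
            nn.Conv2d(9, hidden, kernel_size=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, grid):                  # grid: (B, 1, H, W), binary cell states
        return self.rule(self.gather(grid))   # predicted next state in (0, 1)

# Fit the rule from pairs of consecutive frames of a CA "video".
frames = (torch.rand(100, 1, 32, 32) > 0.5).float()  # placeholder random video
model, loss_fn = ConvMLP(), nn.BCELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for t in range(len(frames) - 1):
    pred = model(frames[t:t + 1])
    loss = loss_fn(pred, frames[t + 1:t + 2])
    opt.zero_grad(); loss.backward(); opt.step()
```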
[D] Reusable unit tests for deep learning projects[reddit]/r/MachineLearning
Recently there was some discussion in this sub about software engineering for ML/DL. Inspired by this, I thought it would be nice to go over some of the tools for good software engineering again. So I wrote a blog post about how to unit test your deep learning project:
Obviously, I am not the first to write about this topic, but I focused on how to maximize test reusability. After all, you don't want to spend all your time writing unit tests. I think it's an important topic for practitioners (especially in research), yet next to no public project repositories have unit tests in them.
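To make the reusability point concrete, here is a rough sketch of what a shared test can look like (an illustration, not code from the linked post; `MODEL_REGISTRY` and `input_shape` are assumed project conventions): one parametrized pytest check covers the output shape and gradient flow of every registered model, so new models get tested for free.

```python
# Illustrative reusable test (assumed project layout, not from the linked post).
import pytest
import torch

from my_project.models import MODEL_REGISTRY   # hypothetical dict: name -> model class

@pytest.mark.parametrize("model_cls", MODEL_REGISTRY.values())
def test_forward_shape_and_gradients(model_cls):
    model = model_cls()
    x = torch.randn(2, *model.input_shape)      # assumes each model declares its input shape
    y = model(x)
    assert y.shape[0] == 2                      # batch dimension is preserved
    y.sum().backward()
    # every trainable parameter should receive a gradient from the backward pass
    assert all(p.grad is not None for p in model.parameters() if p.requires_grad)
```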
What do you think? Did I miss an important test? Do you use unit tests in research at all?
Video Time: How to bring neuroplasticity into RL? This paper uses Hebbian learning to reconfigure the policy *during* each episode, letting it adapt to changing environments and damaged parts on the fly! Watch Now! [youtu.be] @risii1979 @enasmel[twitter]
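For context on the mechanism, a generalized Hebbian rule of the kind used in plasticity-based RL updates each weight from pre- and post-synaptic activity at every timestep. The sketch below (assumed coefficient shapes and update form; not the paper's code) shows the idea.

```python
# Sketch of a generalized Hebbian update (assumed form; not the paper's code).
import numpy as np

def hebbian_step(W, pre, post, A, B, C, D, eta):
    """One plasticity step: each synapse has its own coefficients A, B, C, D and rate eta."""
    corr = np.outer(post, pre)                     # correlation term post_i * pre_j
    return W + eta * (A * corr + B * pre[None, :] + C * post[:, None] + D)

# Inside an episode the policy acts with the current weights,
# then those weights are immediately rewired by the Hebbian rule.
n_in, n_out = 8, 4
W = 0.1 * np.random.randn(n_out, n_in)
A, B, C, D = (0.01 * np.random.randn(n_out, n_in) for _ in range(4))
eta = np.full((n_out, n_in), 0.01)
obs = np.random.randn(n_in)
act = np.tanh(W @ obs)                             # policy output for this timestep
W = hebbian_step(W, obs, act, A, B, C, D, eta)     # weights change during the episode
```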
[D] List of over 30 data annotation companies grouped by data type: vision, NLP, audio and API availability[reddit]/r/MachineLearning
There are plenty of data annotation/labeling companies all over the world. To get a better overview, I started curating a list of them and added information such as the data types they support and whether they offer an API to access their service.
The list has over 30 entries and is still growing on a monthly basis. If you know any company not on the list feel free to reach out and I will add it.
Looking for datasets for GPA prediction[reddit]/r/datasets
Looking for GPA prediction datasets containing information about student demographics, plus surveys about their lifestyle, mood, and similar factors. I found the StudentLife dataset ( https://studentlife.cs.dartmouth.edu/dataset.html ); it has very detailed information, but it covers only 59 students. I would prefer less detailed features but a larger number of students. Any information is highly appreciated.