In the era of the Internet of Things (IoT), an enormous amount of sensing
devices collect and/or generate various sensory data over time for a wide range
of fields and applications. Depending on the nature of the application, these
devices produce big or fast/real-time data streams. Applying analytics
over such data streams to discover new information, predict future insights,
and make control decisions is a crucial process that makes IoT a worthy
paradigm for businesses and a quality-of-life improving technology. In this
paper, we provide a thorough overview of using a class of advanced machine
learning techniques, namely Deep Learning (DL), to facilitate the analytics and
learning in the IoT domain. We start by articulating IoT data characteristics
and identifying two major treatments for IoT data from a machine learning
perspective, namely IoT big data analytics and IoT streaming data analytics. We
also discuss why DL is a promising approach to achieve the desired analytics in
these types of data and applications. The potential of using emerging DL
techniques for IoT data analytics is then discussed, along with its promises and
challenges are introduced. We present a comprehensive background on different
DL architectures and algorithms. We also analyze and summarize major reported
research attempts that leveraged DL in the IoT domain. Smart IoT devices that
embed DL in their on-device intelligence are also discussed. Approaches to
implementing DL on fog and cloud platforms in support of IoT applications are
also surveyed. Finally, we shed light on some challenges and potential
directions for future research. At the end of each section, we highlight the
lessons learned based on our experiments and our review of the recent
literature.
[D] what percent of top conference papers fudge results?[reddit]/r/MachineLearning
I’ve noticed recently that a lot of top conference papers with available repos are not reproducible. What has everyone’s experience been with this, and are there any studies estimating what percentage of papers fudge results?
[P] Looking for people to test my new GPU instance "cloud" service![reddit]/r/MachineLearning
Hi everyone! I've spent the last couple months building and configuring a virtual GPU "cloud/instance" service. I'm looking for anyone with ML/Ubuntu/GPU experience to put my VMs to the test and let me know what you like/dislike about it. User location in east Canada/USA would be preferable because host is located there. It's still in the early beta stages so I'd like to know how training times and latency compare to what you're currently used to. Absolutely free of charge. SSH and VNC connections are available through the web. Let me know if you're willing to try this out. All constructive criticism is greatly appreciated!
Advocates for Neuro-Symbolic AI (NeSy) assert that combining deep learning
with symbolic reasoning will lead to stronger AI than either paradigm on its
own. As successful as deep learning has been, it is generally accepted that
even our best deep learning systems are not very good at abstract reasoning.
And since reasoning is inextricably linked to language, it makes intuitive
sense that Natural Language Processing (NLP) would be a particularly
well-suited candidate for NeSy. We conduct a structured review of studies
implementing NeSy for NLP, discuss their challenges and future directions,
and aim to answer
the question of whether NeSy is indeed meeting its promises: reasoning,
out-of-distribution generalization, interpretability, learning and reasoning
from small data, and transferability to new domains. We examine the impact of
knowledge representation, such as rules and semantic networks, language
structure and relational structure, and whether implicit or explicit reasoning
contributes to higher promise scores. We find that knowledge encoded in
relational structures and explicit reasoning tend to lead to more NeSy goals
being satisfied. We also advocate for a more methodical approach to the
application of theories of reasoning, which we hope can reduce some of the
friction between the symbolic and sub-symbolic schools of AI.
Geoscientists, as well as researchers in many fields, need to read a huge
amount of literature to locate, extract, and aggregate relevant results and
data to enable future research or to build a scientific database, but there is
no existing system to support this use case well. In this paper, based on the
findings of a formative study about how geoscientists collaboratively annotate
literature and extract and aggregate data, we propose DeepShovel, a
publicly-available AI-assisted data extraction system to support their needs.
DeepShovel leverages state-of-the-art neural network models to help
researchers easily and accurately annotate papers (in PDF format) and
extract data from tables, figures, maps, etc., in a human-AI collaborative
manner. A follow-up user evaluation with 14 researchers suggested that
DeepShovel improved users' efficiency of data extraction for building
scientific databases, and encouraged teams to form a larger-scale but more
tightly-coupled collaboration.
Author Attribute dataset from Reddit[reddit]/r/datasets
Hello, we are conducting research on authorship attribution on Reddit.
We are trying to match two accounts belonging to the same author. We are looking for a volunteer who has two accounts with enough textual data to train the model. We aim to obtain at least one match to demonstrate the effectiveness of our model.
You can just type your usernames here and we will take care of collecting the posts.
[R] The Shapley Value in Machine Learning[reddit]/r/MachineLearning
Over the last few years, the Shapley value, a solution concept from cooperative game theory, has found numerous applications in machine learning. In this paper, we first discuss fundamental concepts of cooperative game theory and axiomatic properties of the Shapley value. Then we give an overview of the most important applications of the Shapley value in machine learning: feature selection, explainability, multi-agent reinforcement learning, ensemble pruning, and data valuation. We examine the most crucial limitations of the Shapley value and point out directions for future research.
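To make the solution concept concrete, here is a minimal exact Shapley value computation for a toy three-player cooperative game. This is a sketch, not code from the paper, and the value function is invented purely for illustration.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every order in which players could join."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            # Marginal contribution of p to the coalition formed so far
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: total / factorial(len(players)) for p, total in phi.items()}

# Symmetric toy game: a coalition's worth is the square of its size.
v = lambda s: len(s) ** 2
phi = shapley_values(["a", "b", "c"], v)
# By symmetry each player receives v({a, b, c}) / 3 = 3.0, and the
# values sum to v({a, b, c}) = 9 (the efficiency axiom).
```

This brute-force enumeration is exponential in the number of players, which is why the machine learning applications the paper surveys typically rely on sampling-based approximations.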
[R][P] Announcing audax, an audio ML/DL framework in Jax[reddit]/r/MachineLearning
I'm here to announce the release of audax, a framework for audio ML/DL in Jax! I had been working on reproducing several recent papers as well as unifying my personal code-base, and decided to consolidate it all together in one place!
JAX implementation of a torchaudio-inspired short-time Fourier transform (STFT), which is jit- and vmap-compatible. At the time of writing, Jax doesn't have an official native STFT implementation. This STFT implementation powers a torchaudio-inspired vectorized Spectrogram and Melspectrogram feature-extraction pipeline, which can run on CPUs, TPUs, and GPUs alike.
AudioSet supervised pretrained models!
JAX implementation of the raw-waveform frontends LEAF and SincNet. AudioSet pretrained weights (EfficientNet-b0) available!
Jax implementation of EfficientNets and ConvNeXT, the latest CNN architecture, with pretrained AudioSet weights for ConvNeXT-Tiny!
Self-supervised learning: Jax implementation of COLA and SimCLR. Pre-trained weights coming very soon.
A lot of pretrained models, self-supervised and supervised, across various audio tasks.
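A jit- and vmap-compatible STFT of the kind the post describes can be sketched as framing plus a real FFT in JAX. The function name and parameters below are illustrative assumptions, not the actual audax API.

```python
import jax
import jax.numpy as jnp

def stft(signal, n_fft=512, hop_length=256):
    """Frame the signal, apply a Hann window, and take the real FFT."""
    window = jnp.hanning(n_fft)
    n_frames = 1 + (signal.shape[-1] - n_fft) // hop_length
    # Indices of overlapping frames: shape (n_frames, n_fft)
    idx = hop_length * jnp.arange(n_frames)[:, None] + jnp.arange(n_fft)[None, :]
    frames = signal[idx] * window
    return jnp.fft.rfft(frames, n=n_fft)  # (n_frames, n_fft // 2 + 1)

# Because everything is pure array indexing and FFTs, the transform
# composes with jit and vmap to process a batch of waveforms at once.
batched_stft = jax.jit(jax.vmap(stft))
specs = batched_stft(jnp.ones((4, 16000)))  # four 1-second 16 kHz signals
```

Taking the squared magnitude of the result gives a spectrogram, and applying a mel filterbank on top yields the melspectrogram pipeline mentioned above.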
Grouping and recognition are important components of visual scene
understanding, e.g., for object detection and semantic segmentation. With
end-to-end deep learning systems, grouping of image regions usually happens
implicitly via top-down supervision from pixel-level recognition labels.
Instead, in this paper, we propose to bring back the grouping mechanism into
deep networks, which allows semantic segments to emerge automatically with only
text supervision. We propose a hierarchical Grouping Vision Transformer
(GroupViT), which goes beyond the regular grid structure representation and
learns to group image regions into progressively larger arbitrary-shaped
segments. We train GroupViT jointly with a text encoder on a large-scale
image-text dataset via contrastive losses. With only text supervision and
without any pixel-level annotations, GroupViT learns to group together semantic
regions and successfully transfers to the task of semantic segmentation in a
zero-shot manner, i.e., without any further fine-tuning. It achieves a
zero-shot accuracy of 51.2% mIoU on the PASCAL VOC 2012 dataset and 22.3%
mIoU on the PASCAL Context dataset, and performs competitively with
state-of-the-art transfer-learning methods that require greater levels of
supervision. Project page
is available at https://jerryxu.net/GroupViT.
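GroupViT is trained with contrastive losses between image and text embeddings. A minimal NumPy sketch of a symmetric (CLIP-style) image-text contrastive loss of this general kind is shown below; the temperature value and function names are assumptions for illustration, not GroupViT's actual implementation.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text contrastive loss over a batch of pairs."""
    img = img_emb / np.linalg.norm(img_emb, axis=-1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=-1, keepdims=True)
    logits = img @ txt.T / temperature  # (B, B) cosine similarities

    def log_softmax(x, axis):
        x = x - x.max(axis=axis, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

    diag = np.arange(len(img_emb))  # matched pairs lie on the diagonal
    loss_i = -log_softmax(logits, axis=1)[diag, diag].mean()  # image -> text
    loss_t = -log_softmax(logits, axis=0)[diag, diag].mean()  # text -> image
    return (loss_i + loss_t) / 2

# Perfectly aligned embeddings yield a near-zero loss.
loss = contrastive_loss(np.eye(4, 8), np.eye(4, 8))
```

The loss pulls each image embedding toward its paired caption and pushes it away from the other captions in the batch, which is what lets semantic groups emerge without pixel-level labels.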