python_feedgen09_jnboehm.com.atom.xml - sfeed_tests - sfeed tests and RSS and A… | |
git clone git://git.codemadness.org/sfeed_tests | |
--- | |
python_feedgen09_jnboehm.com.atom.xml (142453B) | |
--- | |
1 <?xml version='1.0' encoding='UTF-8'?> | |
2 <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><id>http://arxiv… | |
3 networks for time series (TS) forecasting on a large set of time series. | |
4 Current state-of-the-art deep ensemble models have high memory and | |
5 computational requirements, hampering their use to forecast millions of … | |
6 practical scenarios. We propose N-BEATS(P), a global multivariate varian… | |
7 the N-BEATS model designed to allow simultaneous training of multiple | |
8 univariate TS forecasting models. Our model addresses the practical limi… | |
9 of related models, reducing the training time by half and memory require… | |
10 a factor of 5, while keeping the same level of accuracy. We have perform… | |
11 multiple experiments detailing the various ways to train our model and h… | |
12 obtained results that demonstrate its capacity to support zero-shot TS | |
13 forecasting, i.e., to train a neural network on a source TS dataset and … | |
14 it on a different target TS dataset without retraining, which provides an | |
15 efficient and reliable solution to forecast at scale even in difficult | |
16 forecasting conditions. | |
17 </summary></entry><entry><id>http://arxiv.org/abs/2109.02624</id><title>… | |
18 and -- for shapes -- also scale, we extend generalized additive regressi… | |
19 models for the shape/form of planar curves or landmark configurations. T… | |
20 model respects the resulting quotient geometry of the response, employin… | |
21 squared geodesic distance as loss function and a geodesic response funct… | |
22 mapping the additive predictor to the shape/form space. For fitting the … | |
23 we propose a Riemannian $L_2$-Boosting algorithm well-suited for a poten… | |
24 large number of possibly parameter-intensive model terms, which also yie… | |
25 automated model selection. We provide novel intuitively interpretable | |
26 visualizations for (even non-linear) covariate effects in the shape/form… | |
27 via suitable tensor based factorizations. The usefulness of the proposed | |
28 framework is illustrated in an analysis of 1) astragalus shapes of wild … | |
29 domesticated sheep and 2) cell forms generated in a biophysical model, a… | |
30 as 3) in a realistic simulation study with response shapes and forms mot… | |
31 from a dataset on bottle outlines. | |
32 </summary></entry><entry><id>http://arxiv.org/abs/2107.04136</id><title>… | |
33 precision matrices encodes complete information about independence and | |
34 conditional independence properties. For general distributions, the cova… | |
35 and precision matrices reveal correlations and so-called partial correla… | |
36 between variables, but these do not, in general, have any correspondence… | |
37 respect to independence properties. In this paper, we prove that, for a … | |
38 class of non-Gaussian distributions, these correspondences still hold, e… | |
39 for the covariance and approximately for the precision. The distribution… | |
40 sometimes referred to as "nonparanormal" -- are given by diagonal | |
41 transformations of multivariate normal random variables. We provide seve… | |
42 analytic and numerical examples illustrating these results. | |
43 </summary></entry><entry><id>http://arxiv.org/abs/2106.09370</id><title>… | |
44 renewables is one of the pillars to power a carbon-neutral society by 20… | |
45 However, in contrast to conventional power plants, renewable energy is s… | |
46 to uncertainty raising challenges for their interaction with power syste… | |
47 Scenario-based probabilistic forecasting models have become a vital tool… | |
48 equip decision-makers. This paper presents to the power systems forecast… | |
49 practitioners a recent deep learning technique, the normalizing flows, to | |
50 produce accurate scenario-based probabilistic forecasts that are crucial… | |
51 face the new challenges in power systems applications. The strength of t… | |
52 technique is to directly learn the stochastic multivariate distribution … | |
53 underlying process by maximizing the likelihood. Through comprehensive | |
54 empirical evaluations using the open data of the Global Energy Forecasti… | |
55 Competition 2014, we demonstrate that this methodology is competitive wi… | |
56 other state-of-the-art deep learning generative models: generative adver… | |
57 networks and variational autoencoders. The models producing weather-base… | |
58 solar power, and load scenarios are properly compared in terms of foreca… | |
59 value by considering the case study of an energy retailer and quality us… | |
60 several complementary metrics. The numerical experiments are simple and … | |
61 reproducible. Thus, we hope it will encourage other forecasting practiti… | |
62 to test and use normalizing flows in power system applications such as b… | |
63 on electricity markets, scheduling power systems with high renewable ene… | |
64 sources penetration, energy management of virtual power plants or microgri… | |
65 unit commitment. | |
66 </summary></entry><entry><id>http://arxiv.org/abs/2105.14367</id><title>… | |
67 probability of an event conditioned on some inputs. A neural network (NN… | |
68 be used to compute the output distribution for continuous-domain, but it… | |
69 difficult to explicitly approximate a free-form one without knowing the | |
70 information of its general form a priori. In order to fit an arbitrary | |
71 conditional distribution, discretizing the continuous domain into bins i… | |
72 effective strategy, as long as we have sufficiently narrow bins and very… | |
73 data. However, collecting enough data is often hard to reach and falls f… | |
74 short of that ideal in many circumstances, especially in multivariate CD… | |
75 the curse of dimensionality. In this paper, we demonstrate the benefits … | |
76 modeling free-form conditional distributions using a deconvolution-based… | |
77 net framework, coping with data deficiency problems in discretization. I… | |
78 the advantage of being flexible but also takes advantage of the hierarch… | |
79 smoothness offered by the deconvolution layers. We compare our method to… | |
80 number of other density-estimation approaches and show that our Deconvol… | |
81 Density Network (DDN) outperforms the competing methods on many univaria… | |
82 multivariate tasks. | |
83 </summary></entry><entry><id>http://arxiv.org/abs/2102.07767</id><title>… | |
84 agents seeks to agree on a set of hypotheses that best describes a seque… | |
85 private observations. In the scenario where the set of hypotheses is lar… | |
86 propose a belief update rule where agents share compressed (either spars… | |
87 quantized) beliefs with an arbitrary positive compression rate. Our algo… | |
88 leverages a unified communication rule that enables agents to access | |
89 wide-ranging compression operators as black-box modules. We prove the al… | |
90 sure asymptotic exponential convergence of beliefs around the set of opt… | |
91 hypotheses. Additionally, we show a non-asymptotic, explicit, and linear | |
92 concentration rate in probability of the beliefs on the optimal hypothes… | |
93 We provide numerical experiments to illustrate the communication benefit… | |
94 our method. The simulation results show that the number of transmitted b… | |
95 be reduced to 5-10% of the non-compressed method in the studied scenario… | |
96 </summary></entry><entry><id>http://arxiv.org/abs/2012.15059</id><title>… | |
97 models that are trained across sets of time series, known as Global Fore… | |
98 Models (GFM), are regularly outperforming traditional univariate forecas… | |
99 models that work on isolated series. As GFMs usually share the same set … | |
100 parameters across all time series, they often have the problem of not be… | |
101 localised enough to a particular series, especially in situations where | |
102 datasets are heterogeneous. We study how ensembling techniques can be us… | |
103 generic GFMs and univariate models to solve this issue. Our work systema… | |
104 and compares relevant current approaches, namely clustering series and t… | |
105 separate submodels per cluster, the so-called ensemble of specialists ap… | |
106 and building heterogeneous ensembles of global and local models. We fill… | |
107 gaps in the existing GFM localisation approaches, in particular by | |
108 incorporating varied clustering techniques such as feature-based cluster… | |
109 distance-based clustering and random clustering, and generalise them to … | |
110 different underlying GFM model types. We then propose a new methodology … | |
111 clustered ensembles where we train multiple GFMs on different clusters of | |
112 series, obtained by changing the number of clusters and cluster seeds. U… | |
113 Feed-forward Neural Networks, Recurrent Neural Networks, and Pooled Regr… | |
114 models as the underlying GFMs, in our evaluation on eight publicly avail… | |
115 datasets, the proposed models are able to achieve significantly higher a… | |
116 than baseline GFM models and univariate forecasting methods. | |
117 </summary></entry><entry><id>http://arxiv.org/abs/2009.13267</id><title>… | |
118 such as BLEU score has been studied before for autoregressive neural mac… | |
119 translation (NMT) and resulted in alternative training algorithms (Ranza… | |
120 al., 2016; Norouzi et al., 2016; Shen et al., 2016; Wu et al., 2018). Ho… | |
121 MLE training remains the de facto approach for autoregressive NMT becaus… | |
122 its computational efficiency and stability. Despite this mismatch betwee… | |
123 training objective and task measure, we notice that the samples drawn fr… | |
124 MLE-based trained NMT support the desired distribution -- there are samp… | |
125 with much higher BLEU score compared to the beam decoding output. To be… | |
126 from this observation, we train an energy-based model to mimic the behav… | |
127 the task measure (i.e., the energy-based model assigns lower energy to s… | |
128 with higher BLEU score), which results in a re-ranking algorithm bas… | |
129 the samples drawn from NMT: energy-based re-ranking (EBR). We use both m… | |
130 energy models (over target sentence) and joint energy models (over both … | |
131 and target sentences). Our EBR with the joint energy model consistently | |
132 improves the performance of the Transformer-based NMT: +4 BLEU points on | |
133 IWSLT'14 German-English, +3.0 BLEU points on Sinhala-English, +1.2 BLEU … | |
134 WMT'16 English-German tasks. | |
135 </summary></entry><entry><id>http://arxiv.org/abs/2005.11079</id><title>… | |
136 neural networks (GNNs) have been extensively explored. However, most exi… | |
137 GNNs inherently suffer from the limitations of over-smoothing, non-robus… | |
138 and weak-generalization when labeled nodes are scarce. In this paper, we | |
139 propose a simple yet effective framework -- GRAPH RANDOM NEURAL NETWORKS | |
140 (GRAND) -- to address these issues. In GRAND, we first design a random | |
141 propagation strategy to perform graph data augmentation. Then we leverage | |
142 consistency regularization to optimize the prediction consistency of unl… | |
143 nodes across different data augmentations. Extensive experiments on graph | |
144 benchmark datasets suggest that GRAND significantly outperforms | |
145 state-of-the-art GNN baselines on semi-supervised node classification. F… | |
146 we show that GRAND mitigates the issues of over-smoothing and non-robust… | |
147 exhibiting better generalization behavior than existing GNNs. The source… | |
148 of GRAND is publicly available at https://github.com/Grand20/grand. | |
149 </summary></entry><entry><id>http://arxiv.org/abs/2004.14427</id><title>… | |
150 restless bandits with average reward, using the paradigms of Q-learning … | |
151 Whittle index. Specifically, we leverage the structure of the Whittle in… | |
152 policy to reduce the search space of Q-learning, resulting in major | |
153 computational gains. Rigorous convergence analysis is provided, supporte… | |
154 numerical experiments. The numerical experiments show excellent empirical | |
155 performance of the proposed scheme. | |
156 </summary></entry><entry><id>http://arxiv.org/abs/2003.05738</id><title>… | |
157 state and action spaces. Multi-agent reinforcement learning attempts to … | |
158 this challenge by distributing control to specialized agents. However, | |
159 specialization hinders generalization and transferability, and the | |
160 computational graphs underlying neural-networks architectures -- dominat… | |
161 the multi-agent setting -- do not offer the flexibility to handle an arb… | |
162 number of entities which changes both between road networks, and over ti… | |
163 vehicles traverse the network. We introduce Inductive Graph Reinforcement | |
164 Learning (IG-RL) based on graph-convolutional networks which adapts to t… | |
165 structure of any road network, to learn detailed representations of | |
166 traffic-controllers and their surroundings. Our decentralized approach e… | |
167 learning of a transferable-adaptive-traffic-signal-control policy. After… | |
168 trained on an arbitrary set of road networks, our model can generalize t… | |
169 road networks, traffic distributions, and traffic regimes, with no addit… | |
170 training and a constant number of parameters, enabling greater scalabili… | |
171 compared to prior methods. Furthermore, our approach can exploit the | |
172 granularity of available data by capturing the (dynamic) demand at both … | |
173 lane and the vehicle levels. The proposed method is tested on both road | |
174 networks and traffic settings never experienced during training. We comp… | |
175 IG-RL to multi-agent reinforcement learning and domain-specific baseline… | |
176 both synthetic road networks and in a larger experiment involving the co… | |
177 of the 3,971 traffic signals of Manhattan, we show that different | |
178 instantiations of IG-RL outperform baselines. | |
179 </summary></entry><entry><id>http://arxiv.org/abs/1905.10029</id><title>… | |
180 data. However, they have been recently shown to be vulnerable to topolog… | |
181 attacks. To enhance adversarial robustness, we go beyond spectral graph … | |
182 to robust graph theory. By challenging the classical graph Laplacian, we | |
183 propose a new convolution operator that is provably robust in the spectr… | |
184 domain and is incorporated in the GCN architecture to improve expressivi… | |
185 interpretability. By extending the original graph to a sequence of graph… | |
186 also propose a robust training paradigm that encourages transferability … | |
187 graphs that span a range of spatial and spectral characteristics. The pr… | |
188 approaches are demonstrated in extensive experiments to simultaneously i… | |
189 performance in both benign and adversarial situations. | |
190 </summary></entry><entry><id>http://arxiv.org/abs/2109.10319</id><title>… | |
191 physiology and computer science. However, at present, most network analy… | |
192 ignores the direction. In this paper, we construct a spectral clustering… | |
193 based on the singular decomposition of the adjacency matrix to detect co… | |
194 in directed stochastic block model (DiSBM). By considering a sparsity | |
195 parameter, under some mild conditions, we show the proposed approach can | |
196 consistently recover hidden row and column communities for different sca… | |
197 degrees. | |
198 </summary></entry><entry><id>http://arxiv.org/abs/2109.10298</id><title>… | |
199 Linear Unit (ReLU) Neural Network (NN) architecture (number of layers and | |
200 number of neurons per layer) with the assurance that it is sufficiently | |
201 parametrized to control a nonlinear system; i.e. control the system to s… | |
202 a given formal specification. This is unlike current techniques, which p… | |
203 no assurances on the resultant architecture. Moreover, our approach requ… | |
204 only limited knowledge of the underlying nonlinear system and specificat… | |
205 assume only that the specification can be satisfied by a Lipschitz-conti… | |
206 controller with a known bound on its Lipschitz constant; the specific | |
207 controller need not be known. From this assumption, we bound the number … | |
208 affine functions needed to construct a Continuous Piecewise Affine (CPWA) | |
209 function that can approximate any Lipschitz-continuous controller that | |
210 satisfies the specification. Then we connect this CPWA to a NN architect… | |
211 using the authors' recent results on the Two-Level Lattice (TLL) NN | |
212 architecture; the TLL architecture was shown to be parameterized by the … | |
213 of affine functions present in the CPWA function it realizes. | |
214 </summary></entry><entry><id>http://arxiv.org/abs/2109.10279</id><title>… | |
215 challenging setup in applied machine learning. Even though model interpr… | |
216 has attracted more attention in recent years, many modeling approaches s… | |
217 focus mainly on performance. To further improve the interpretability of … | |
218 learning models, we suggest the adoption of concepts and tools from the | |
219 well-established framework of component based multiblock analysis, also … | |
220 as chemometrics. Nevertheless, artificial neural networks provide greater | |
221 flexibility in model architecture and thus, often deliver superior predi… | |
222 performance. In this study, we propose a setup to transfer the concepts … | |
223 component based statistical models, including multiblock variants of pri… | |
224 component regression and partial least squares regression, to neural net… | |
225 architectures. Thereby, we combine the flexibility of neural networks wi… | |
226 concepts for interpreting block relevance in multiblock methods. In two … | |
227 cases we demonstrate how the concept can be implemented in practice, and | |
228 compare it to both common feed-forward neural networks without blocks, a… | |
229 as statistical component based multiblock methods. Our results underline… | |
230 multiblock networks allow for basic model interpretation while matching … | |
231 performance of ordinary feed-forward neural networks. | |
232 </summary></entry><entry><id>http://arxiv.org/abs/2109.10262</id><title>… | |
233 reverse-mode automatic differentiation. We use this operator to generali… | |
234 several optimization algorithms, including a straightforward generalizat… | |
235 gradient descent and a novel generalization of Newton's method. We then … | |
236 which properties of these algorithms are preserved in this generalized s… | |
237 First, we show that the transformation invariances of these algorithms a… | |
238 preserved: while generalized Newton's method is invariant to all inverti… | |
239 linear transformations, generalized gradient descent is invariant only to | |
240 orthogonal linear transformations. Next, we show that we can express the… | |
241 in loss of generalized gradient descent with an inner product-like expre… | |
242 thereby generalizing the non-increasing and convergence properties of the | |
243 gradient descent optimization flow. Finally, we include several numerical | |
244 experiments to illustrate the ideas in the paper and demonstrate how we … | |
245 them to optimize polynomial functions over an ordered ring. | |
246 </summary></entry><entry><id>http://arxiv.org/abs/2109.10254</id><title>… | |
247 tasks, there is a greater need for accurate quantification of predictive | |
248 uncertainty. While the common goal in uncertainty quantification (UQ) in | |
249 machine learning is to approximate the true distribution of the target d… | |
250 many works in UQ tend to be disjoint in the evaluation metrics utilized,… | |
251 disparate implementations for each metric lead to numerical results that… | |
252 not directly comparable across different works. To address this, we intr… | |
253 Uncertainty Toolbox, an open-source python library that helps to assess, | |
254 visualize, and improve UQ. Uncertainty Toolbox additionally provides | |
255 pedagogical resources, such as a glossary of key terms and an organized | |
256 collection of key paper references. We hope that this toolbox is useful … | |
257 accelerating and uniting research efforts in uncertainty in machine lear… | |
258 </summary></entry><entry><id>http://arxiv.org/abs/2109.10219</id><title>… | |
259 are available. Physical experiments or detailed simulations that accurat… | |
260 capture the behavior of the system are regarded as high-fidelity models … | |
261 low model uncertainty, however, they are expensive to run. On the other … | |
262 simplified physical experiments or numerical models are seen as low-fide… | |
263 models that are cheaper to evaluate. Although low-fidelity models are of… | |
264 suitable for direct use in reliability analysis due to their low accurac… | |
265 can offer information about the trend of the high-fidelity model thus pr… | |
266 the opportunity to explore the design space at a low cost. This study pr… | |
267 a new approach called adaptive multi-fidelity Gaussian process for relia… | |
268 analysis (AMGPRA). Contrary to selecting training points and information | |
269 sources in two separate stages as done in state-of-the-art mfEGRA method… | |
270 proposed approach finds the optimal training point and information source | |
271 simultaneously using the novel collective learning function (CLF). CLF i… | |
272 to assess the global impact of a candidate training point from an inform… | |
273 source and it accommodates any learning function that satisfies a certain | |
274 profile. In this context, CLF provides a new direction for quantifying t… | |
275 impact of new training points and can be easily extended with new learni… | |
276 functions to adapt to different reliability problems. The performance of… | |
277 proposed method is demonstrated by three mathematical examples and one | |
278 engineering problem concerning the wind reliability of transmission towe… | |
279 is shown that the proposed method achieves similar or higher accuracy wi… | |
280 reduced computational costs compared to state-of-the-art single and | |
281 multi-fidelity methods. A key application of AMGPRA is high-fidelity fra… | |
282 modeling using complex and costly physics-based computational models. | |
283 </summary></entry><entry><id>http://arxiv.org/abs/2109.10162</id><title>… | |
284 $\varepsilon,\delta\in(0,1)$, a bounded function $f:\{-1,1\}^n\to[-1,1]$… | |
285 degree at most $d$ can be learned with probability at least $1-\delta$ a… | |
286 $L_2$-error $\varepsilon$ using $\log(\tfrac{n}{\delta})\,\varepsilon^{-… | |
287 C^{d^{3/2}\sqrt{\log d}}$ random queries for a universal finite constant… | |
288 </summary></entry><entry><id>http://arxiv.org/abs/2109.09988</id><title>… | |
289 has many practical applications. With existing classifiers we may be abl… | |
290 accurately classify signals, however that accuracy may decline if using a | |
291 reduced number of attributes. Transforming the data then undertaking red… | |
292 in dimensionality may improve the quality of the data analysis, decrease… | |
293 required for classification and simplify models. We propose an approach,… | |
294 chooses suitable wavelets to transform the data, then combines the outpu… | |
295 these transforms to construct a dataset to then apply ensemble classifie… | |
296 We demonstrate this on different data sets, across different classifiers… | |
297 use differing evaluation methods. Our experimental results demonstrate t… | |
298 effectiveness of the proposed technique, compared to the approaches that… | |
299 either raw signal data or a single wavelet transform. | |
300 </summary></entry><entry><id>http://arxiv.org/abs/2109.09859</id><title>… | |
301 covariates, and the associated nonconvex problem of fitting these models… | |
302 data. We develop a general recipe for analyzing the convergence of itera… | |
303 algorithms for this task from a random initialization. In particular, pr… | |
304 each iteration can be written as the solution to a convex optimization p… | |
305 satisfying some natural conditions, we leverage Gaussian comparison theo… | |
306 derive a deterministic sequence that provides sharp upper and lower boun… | |
307 the error of the algorithm with sample-splitting. Crucially, this determ… | |
308 sequence accurately captures both the convergence rate of the algorithm … | |
309 eventual error floor in the finite-sample regime, and is distinct from t… | |
310 commonly used "population" sequence that results from taking the | |
311 infinite-sample limit. We apply our general framework to derive several | |
312 concrete consequences for parameter estimation in popular statistical mo… | |
313 including phase retrieval and mixtures of regressions. Provided the samp… | |
314 scales near-linearly in the dimension, we show sharp global convergence … | |
315 for both higher-order algorithms based on alternating updates and first-… | |
316 algorithms based on subgradient descent. These corollaries, in turn, yie… | |
317 multiple consequences, including: (a) Proof that higher-order algorithms… | |
318 converge significantly faster than their first-order counterparts (and | |
319 sometimes super-linearly), even if the two share the same population upd… | |
320 (b) Intricacies in super-linear convergence behavior for higher-order | |
321 algorithms, which can be nonstandard (e.g., with exponent 3/2) and sensi… | |
322 the noise level in the problem. We complement these results with extensi… | |
323 numerical experiments, which show excellent agreement with our theoretic… | |
324 predictions. | |
325 </summary></entry><entry><id>http://arxiv.org/abs/2109.09856</id><title>… | |
326 proposed for predicting failure of a system or device with multivariate … | |
327 series sensor data. We treat the multivariate time series sensor data as… | |
328 for both visualization and computation. Failure follows various patterns… | |
329 are closely related to the root causes. Different predefined transformat… | |
330 are applied on the original sensors data to better characterize the fail… | |
331 patterns. In addition to feature derivation, ensemble method is used to … | |
332 improve the performance. In addition, a general algorithm architecture o… | |
333 neural network is proposed to handle multiple types of data with less ma… | |
334 feature engineering. We apply the proposed method on the early predict f… | |
335 of computer disk drive in order to improve storage systems availability … | |
336 avoid data loss. The classification accuracy is largely improved with the | |
337 enriched features, named smart features. | |
338 </summary></entry><entry><id>http://arxiv.org/abs/2109.09855</id><title>… | |
339 actions, dubbed R(MA)^2B. The state of each arm evolves according to a | |
340 controlled Markov decision process (MDP), and the reward of pulling an a… | |
341 depends on both the current state of the corresponding MDP and the action | |
342 taken. The goal is to sequentially choose actions for arms so as to maxi… | |
343 the expected value of the cumulative rewards collected. Since finding the | |
344 optimal policy is typically intractable, we propose a computationally ap… | |
345 index policy which we call Occupancy-Measured-Reward Index Policy. Our p… | |
346 is well-defined even if the underlying MDPs are not indexable. We prove … | |
347 is asymptotically optimal when the activation budget and number of arms … | |
348 scaled up, while keeping their ratio as a constant. For the case when the | |
349 system parameters are unknown, we develop a learning algorithm. Our lear… | |
350 algorithm uses the principle of optimism in the face of uncertainty and … | |
351 uses a generative model in order to fully exploit the structure of | |
352 Occupancy-Measured-Reward Index Policy. We call it the R(MA)^2B-UCB algo… | |
353 As compared with the existing algorithms, R(MA)^2B-UCB performs close to… | |
354 offline optimum policy, and also achieves a sub-linear regret with a low | |
355 computational complexity. Experimental results show that R(MA)^2B-UCB | |
356 outperforms the existing algorithms in both regret and run time. | |
357 </summary></entry><entry><id>http://arxiv.org/abs/2109.09847</id><title>… | |
358 interpreting machine learning models, with strong theoretical guarantees | |
359 (consistency, local accuracy) and a wide availability of implementations… | |
360 use cases. Even though computing SHAP values takes exponential time in g… | |
361 TreeSHAP takes polynomial time on tree-based models. While the speedup is | |
362 significant, TreeSHAP can still dominate the computation time of industr… | |
363 machine learning solutions on datasets with millions or more entries, ca… | |
364 delays in post-hoc model diagnosis and interpretation service. In this p… | |
365 present two new algorithms, Fast TreeSHAP v1 and v2, designed to improve… | |
366 computational efficiency of TreeSHAP for large datasets. We empirically … | |
367 that Fast TreeSHAP v1 is 1.5x faster than TreeSHAP while keeping the mem… | |
368 cost unchanged. Similarly, Fast TreeSHAP v2 is 2.5x faster than TreeSHAP… | |
369 the cost of a slightly higher memory usage, thanks to the pre-computatio… | |
370 expensive TreeSHAP steps. We also show that Fast TreeSHAP v2 is well-sui… | |
371 multi-time model interpretations, resulting in as high as 3x faster expl… | |
372 of newly incoming samples. | |
373 </summary></entry><entry><id>http://arxiv.org/abs/2109.09831</id><title>… | |
374 algorithms, can substantially impact their performance. To support users… | |
375 determining well-performing hyperparameter configurations for their algo… | |
376 datasets and applications at hand, SMAC3 offers a robust and flexible fr… | |
377 for Bayesian Optimization, which can improve performance within a few | |
378 evaluations. It offers several facades and pre-sets for typical use case… | |
379 as optimizing hyperparameters, solving low dimensional continuous (artif… | |
380 global optimization problems and configuring algorithms to perform well … | |
381 multiple problem instances. The SMAC3 package is available under a permi… | |
382 BSD-license at https://github.com/automl/SMAC3. | |
383 </summary></entry><entry><id>http://arxiv.org/abs/2109.09816</id><title>… | |
384 systems. In the beginning, the recommender and rational users have diffe… | |
385 pieces of knowledge, and the recommender needs to learn the users' knowl… | |
386 make better recommendations. The recommender learns users' knowledge by | |
387 observing whether each user followed or deviated from her recommendation… | |
388 show that learning frequently stalls if the recommender always recommend… | |
389 choice: users tend to follow the recommendation blindly, and their choic… | |
390 not reflect their knowledge. Social welfare and the learning rate are im… | |
391 drastically if the recommender abstains from recommending a choice when … | |
392 predicts that multiple arms will produce a similar payoff. | |
393 </summary></entry><entry><id>http://arxiv.org/abs/2011.02602</id><title>… | |
394 of small businesses and online shops. When processing these digital | |
395 transactions, recognizing each merchant's real identity (i.e., business … | |
396 is vital to ensure the integrity of payment processing systems. Conventi… | |
397 this problem is formulated as a time series classification problem solel… | |
398 the merchant transaction history. However, with the large scale of the d… | |
399 and changing behaviors of merchants and consumers over time, it is extre… | |
400 challenging to achieve satisfying performance from off-the-shelf classif… | |
401 methods. In this work, we approach this problem from a multi-modal learn… | |
402 perspective, where we use not only the merchant time series data but als… | |
403 information of merchant-merchant relationship (i.e., affinity) to verify… | |
404 self-reported business type (i.e., merchant category) of a given merchan… | |
405 Specifically, we design two individual encoders, where one is responsibl… | |
406 encoding temporal information and the other is responsible for affinity | |
407 information, and a mechanism to fuse the outputs of the two encoders to | |
408 accomplish the identification task. Our experiments on real-world credit… | |
409 transaction data between 71,668 merchants and 433,772,755 customers have | |
410 demonstrated the effectiveness and efficiency of the proposed model. | |
411 </summary></entry><entry><id>http://arxiv.org/abs/2007.05303</id><title>… | |
412 provide critical insights for payment processing companies. The capabili… | |
413 predicting merchants' future is crucial for fraud detection and recommen… | |
414 systems. Conventionally, this problem is formulated to predict one multi… | |
415 time series under the multi-horizon setting. However, real-world applica… | |
416 often require more than one future trend prediction considering the | |
417 uncertainties, where more than one multivariate time series needs to be | |
418 predicted. This problem is called multi-future prediction. In this work,… | |
419 combine the two research directions and propose to study this new proble… | |
420 multi-future, multi-horizon and multivariate time series prediction. This | |
421 problem is crucial as it has broad use cases in the financial industry to | |
422 reduce the risk while improving user experience by providing alternative | |
423 futures. This problem is also challenging as now we not only need to cap… | |
424 the patterns and insights from the past but also train a model that has a | |
425 strong inference capability to project multiple possible outcomes. To so… | |
426 this problem, we propose a new model using convolutional neural networks… | |
427 simple yet effective encoder-decoder structure to learn the time series … | |
428 from multiple perspectives. We use experiments on real-world merchant | |
429 transaction data to demonstrate the effectiveness of our proposed model.… | |
430 also provide extensive discussions on different model design choices in … | |
431 experimental section. | |
432 </summary></entry><entry><id>http://arxiv.org/abs/2109.09690</id><title>… | |
433 fast uncertainty estimates for predictions with Deep Neural Networks (DN… | |
434 Our main contribution is a practical and principled combination of DNNs … | |
435 sparse Gaussian Processes (GPs). We prove theoretically that DNNs can be… | |
436 as a special case of sparse GPs, namely mixtures of GP experts (MoE-GP),… | |
437 devise a learning algorithm that brings the derived theory into practice… | |
438 experiments from two different robotic tasks -- inverse dynamics of a | |
439 manipulator and object detection on a micro-aerial vehicle (MAV) -- we s… | |
440 effectiveness of our approach in terms of predictive uncertainty, improv… | |
441 scalability, and run-time efficiency on a Jetson TX2. We thus argue that… | |
442 approach can pave the way towards reliable and fast robot learning syste… | |
443 uncertainty awareness. | |
444 </summary></entry><entry><id>http://arxiv.org/abs/2109.09658</id><title>… | |
445 extensive amount of data generated by today's clinical systems, has led … | |
446 development of imaging AI solutions across the whole value chain of medi… | |
447 imaging, including image reconstruction, medical image segmentation, | |
448 image-based diagnosis and treatment planning. Notwithstanding the succes… | |
449 future potential of AI in medical imaging, many stakeholders are concern… | |
450 the potential risks and ethical implications of imaging AI solutions, wh… | |
451 perceived as complex, opaque, and difficult to comprehend, utilise, and … | |
452 in critical clinical applications. Despite these concerns and risks, the… | |
453 currently no concrete guidelines and best practices for guiding future AI | |
454 developments in medical imaging towards increased trust, safety and adop… | |
455 To bridge this gap, this paper introduces a careful selection of guiding | |
456 principles drawn from the accumulated experiences, consensus, and best | |
457 practices from five large European projects on AI in Health Imaging. The… | |
458 guiding principles are named FUTURE-AI and its building blocks consist o… | |
459 Fairness, (ii) Universality, (iii) Traceability, (iv) Usability, (v) Rob… | |
460 and (vi) Explainability. In a step-by-step approach, these guidelines are | |
461 further translated into a framework of concrete recommendations for spec… | |
462 developing, evaluating, and deploying technically, clinically and ethica… | |
463 trustworthy AI solutions into clinical practice. | |
464 </summary></entry><entry><id>http://arxiv.org/abs/2109.09105</id><title>… | |
465 including spoken language understanding (SLU). Spoken language requires … | |
466 understanding of speaker interactions, dialog states and speech induced | |
467 multimodal behaviors to generate a meaningful representation of the | |
468 conversation. In this work, we propose to dissect SLU into three represe… | |
469 properties: conversational (disfluency, pause, overtalk), channel (speake… | |
470 turn-tasks) and ASR (insertion, deletion, substitution). We probe BERT ba… | |
471 language models (BERT, RoBERTa) trained on spoken transcripts to investi… | |
472 its ability to understand multifarious properties in absence of any spee… | |
473 cues. Empirical results indicate that LM is surprisingly good at capturi… | |
474 conversational properties such as pause prediction and overtalk detectio… | |
475 lexical tokens. On the downside, the LM scores low on turn-tasks and ASR | |
476 error predictions. Additionally, pre-training the LM on spoken transcri… | |
477 restrain its linguistic understanding. Finally, we establish the efficac… | |
478 transferability of the mentioned properties on two benchmark datasets: | |
479 Switchboard Dialog Act and Disfluency datasets. | |
480 </summary></entry><entry><id>http://arxiv.org/abs/2109.07436</id><title>… | |
481 errors and deviations in execution if there is uncertainty in identifyin… | |
482 state. So an algorithm that computes a policy for a human to execute oug… | |
483 consider these effects in its computations. An optimal MDP policy that is | |
484 poorly executed (because of a human agent) may be much worse than another… | |
485 that is executed with fewer errors. In this paper, we consider the probl… | |
486 erroneous execution and execution delay when computing policies for a hu… | |
487 agent that would act in a setting modeled by a Markov Decision Process. … | |
488 present a framework to model the likelihood of policy execution errors a… | |
489 likelihood of non-policy actions like inaction (delays) due to state | |
490 uncertainty. This is followed by a hill climbing algorithm to search for… | |
491 policies that account for these errors. We then use the best policy foun… | |
492 hill climbing with a branch and bound algorithm to find the optimal poli… | |
493 show experimental results in a Gridworld domain and analyze the performa… | |
494 the two algorithms. We also present human studies that verify if our | |
495 assumptions on policy execution by humans under state-aliasing are reaso… | |
496 </summary></entry><entry><id>http://arxiv.org/abs/2109.01134</id><title>… | |
497 for representation learning. It shifts from the tradition of using image… | |
498 discrete labels for learning a fixed set of weights, seen as visual conc… | |
499 to aligning images and raw text for two separate encoders. Such a paradi… | |
500 benefits from a broader source of supervision and allows zero-shot trans… | |
501 downstream tasks since visual concepts can be diametrically generated fr… | |
502 natural language, known as prompt. In this paper, we identify that a maj… | |
503 challenge of deploying such models in practice is prompt engineering. Th… | |
504 because designing a proper prompt, especially for context words surround… | |
505 class name, requires domain expertise and typically takes a significant … | |
506 of time for word tuning since a slight change in wording could have a h… | |
507 impact on performance. Moreover, different downstream tasks require spec… | |
508 designs, further hampering the efficiency of deployment. To overcome this | |
509 challenge, we propose a novel approach named context optimization (CoOp)… | |
510 main idea is to model context in prompts using continuous representation… | |
511 perform end-to-end learning from data while keeping the pre-trained para… | |
512 fixed. In this way, the design of task-relevant prompts can be fully aut… | |
513 Experiments on 11 datasets show that CoOp effectively turns pre-trained | |
514 vision-language models into data-efficient visual learners, requiring as… | |
515 one or two shots to beat hand-crafted prompts with a decent margin and a… | |
516 gain significant improvements when using more shots (e.g., at 16 shots t… | |
517 average gain is around 17% with the highest reaching over 50%). CoOp also | |
518 exhibits strong robustness to distribution shift. | |
519 </summary></entry><entry><id>http://arxiv.org/abs/2108.09432</id><title>… | |
520 deformation shape generators. The key idea is to enforce the preservatio… | |
521 local rigidity among the generated shapes. Our approach builds on an | |
522 approximation of the as-rigid-as-possible (or ARAP) deformation energy. … | |
523 how to develop the unsupervised loss via a spectral decomposition of the | |
524 Hessian of the ARAP energy. Our loss nicely decouples pose and shape var… | |
525 through a robust norm. The loss admits simple closed-form expressions. I… | |
526 easy to train and can be plugged into any standard generation models, e.… | |
527 variational auto-encoder (VAE) and auto-decoder (AD). Experimental resul… | |
528 that our approach outperforms existing shape generation approaches consi… | |
529 on public benchmark datasets of various shape categories such as human, … | |
530 and bone. | |
531 </summary></entry><entry><id>http://arxiv.org/abs/2107.11913</id><title>… | |
532 has become the subject of interest of academia, government, and industry. | |
533 Efforts towards measuring different phenomena have gained traction in th… | |
534 community, as illustrated by the publication of several influential field | |
535 reports and policy documents. These metrics are designed to help decision | |
536 takers to inform themselves about the fast-moving and impacting influenc… | |
537 key advances in Artificial Intelligence in general and Machine Learning … | |
538 particular. In this paper we propose to use such newfound capabilities o… | |
539 technologies to augment our AI measuring capabilities. We do so by train… | |
540 model to classify publications related to ethical issues and concerns. I… | |
541 methodology we use an expert, manually curated dataset as the training s… | |
542 then evaluate a large set of research papers. Finally, we highlight the | |
543 implications of AI metrics, in particular their contribution towards dev… | |
544 trustful and fair AI-based tools and technologies. Keywords: AI Ethics; … | |
545 Fairness; AI Measurement. Ethics in Computer Science. | |
546 </summary></entry><entry><id>http://arxiv.org/abs/2107.04775</id><title>… | |
547 high-dimensional environments to learn complex tasks, but can often exhi… | |
548 unsafe behaviors and require extensive environment interaction when expl… | |
549 is unconstrained. A promising strategy for learning in dynamically uncer… | |
550 environments is requiring that the agent can robustly return to learned … | |
551 sets, where task success (and therefore safety) can be guaranteed. While… | |
552 approach has been successful in low-dimensions, enforcing this constrain… | |
553 environments with visual observations is exceedingly challenging. We pre… | |
554 novel continuous representation for safe sets by framing it as a binary | |
555 classification problem in a learned latent space, which flexibly scales … | |
556 image observations. We then present a new algorithm, Latent Space Safe S… | |
557 (LS3), which uses this representation for long-horizon tasks with sparse | |
558 rewards. We evaluate LS3 on 4 domains, including a challenging sequential | |
559 pushing task in simulation and a physical cable routing task. We find th… | |
560 can use prior task successes to restrict exploration and learn more effi… | |
561 than prior algorithms while satisfying constraints. See | |
562 https://tinyurl.com/latent-ss for code and supplementary material. | |
563 </summary></entry><entry><id>http://arxiv.org/abs/2106.07857</id><title>… | |
564 human-robot interaction. Current research in this field mainly focuses on | |
565 generating responses consistent with the robot's pre-assigned persona, w… | |
566 ignoring the user's persona. Such responses may be inappropriate or even | |
567 offensive, which may lead to a bad user experience. Therefore, we prop… | |
568 Bilateral Personalized Dialogue Generation (BPDG) method for dyadic | |
569 conversation, which integrates user and robot personas into dialogue gen… | |
570 via designing a dynamic persona-aware fusion method. To bridge the gap b… | |
571 the learning objective function and evaluation metrics, the Conditional … | |
572 Information Maximum (CMIM) criterion is adopted with contrastive learnin… | |
573 select the proper response from the generated candidates. Moreover, a bi… | |
574 persona accuracy metric is designed to measure the degree of bilateral | |
575 personalization. Experimental results demonstrate that, compared with se… | |
576 state-of-the-art methods, the final results of the proposed method are m… | |
577 personalized and consistent with bilateral personas in terms of both aut… | |
578 and manual evaluations. | |
579 </summary></entry><entry><id>http://arxiv.org/abs/2105.15033</id><title>… | |
580 and conceptual knowledge, especially in the medical domain. However, the… | |
581 of high-quality annotated corpora remains a crucial problem for advancin… | |
582 research and applications on this task. In order to accelerate the resea… | |
583 domain-specific knowledge graphs in the medical domain, we introduce Dia… | |
584 high-quality Chinese dataset for Diabetes knowledge graph, which contains | |
585 22,050 entities and 6,890 relations in total. We implement recent typical | |
586 methods for Named Entity Recognition and Relation Extraction as a benchm… | |
587 evaluate the proposed dataset thoroughly. Empirical results show that th… | |
588 is challenging for most existing methods and further analysis is conduct… | |
589 discuss future research direction for improvements. We hope the release … | |
590 dataset can assist the construction of diabetes knowledge graphs and fac… | |
591 AI-based applications. | |
592 </summary></entry><entry><id>http://arxiv.org/abs/2105.11844</id><title>… | |
593 aerial and satellite images is of high importance in several fields such… | |
594 security, anomaly detection, land use planning and land use change detec… | |
595 However, the detection of such infrastructures is complex as they have h… | |
596 variable shapes and sizes, i.e., some infrastructures, such as electrical | |
597 substations, are too small while others, such as airports, are too large. | |
598 Besides, airports can have a surface area either too small or too large with | |
599 completely different shapes, which makes their correct detection challengi… | |
600 far as we know, these limitations have not been tackled yet in previous … | |
601 This paper presents (1) a smart Critical Infrastructure dataset, named | |
602 CI-dataset, organised into two scales, small- and large-scale critical | |
603 infrastructures and (2) a two-level resolution-independent critical | |
604 infrastructure detection (DetDSCI) methodology that first determines the | |
605 spatial resolution of the input image using a classification model, then | |
606 analyses the image using the appropriate detector for that spatial resol… | |
607 The present study targets two representative classes, airports and elect… | |
608 substations. Our experiments show that DetDSCI methodology achieves up to | |
609 37.53% F1 improvement with respect to Faster R-CNN, one of the most infl… | |
610 detection models. | |
611 </summary></entry><entry><id>http://arxiv.org/abs/2103.13460</id><title>… | |
612 widely deployed in industrial robotics settings. Part of the challenge l… | |
613 identifying slip and other key events from the tactile data stream. In t… | |
614 paper, we present a learning-based method to detect slip using barometric | |
615 tactile sensors. Although these sensors have a low resolution, they have… | |
616 other desirable properties including high reliability and durability, a … | |
617 slim profile, and a low cost. We are able to achieve slip detection accu… | |
618 of greater than 91% while being robust to the speed and direction of the… | |
619 motion. Further, we test our detector on two robot manipulation tasks in… | |
620 common household objects and demonstrate successful generalization to | |
621 real-world scenarios not seen during training. We show that barometric t… | |
622 sensing technology, combined with data-driven learning, is potentially s… | |
623 for complex manipulation tasks such as slip compensation. | |
624 </summary></entry><entry><id>http://arxiv.org/abs/2102.08633</id><title>… | |
625 rules, answer high-level questions such as "May I qualify for VA health … | |
626 benefits?", and ask follow-up clarification questions whose answer is ne… | |
627 to answer the original question. However, existing works assume the rule… | |
628 is provided for each user question, which neglects the essential retriev… | |
629 in real scenarios. In this work, we propose and investigate an open-retr… | |
630 setting of conversational machine reading. In the open-retrieval setting… | |
631 relevant rule texts are unknown so that a system needs to retrieve | |
632 question-relevant evidence from a collection of rule texts, and answer u… | |
633 high-level questions according to multiple retrieved rule texts in a | |
634 conversational manner. We propose MUDERN, a Multi-passage Discourse-aware | |
635 Entailment Reasoning Network which extracts conditions in the rule texts | |
636 through discourse segmentation, conducts multi-passage entailment reason… | |
637 answer user questions directly, or asks clarification follow-up question… | |
638 inquire for more information. On our created OR-ShARC dataset, MUDERN achiev… | |
639 state-of-the-art performance, outperforming existing single-passage | |
640 conversational machine reading models as well as a new multi-passage | |
641 conversational machine reading baseline by a large margin. In addition, … | |
642 conduct in-depth analyses to provide new insights into this new setting … | |
643 model. | |
644 </summary></entry><entry><id>http://arxiv.org/abs/2102.07358</id><title>… | |
645 methods. In some target problem domains, there are not many data samples | |
646 available, which could significantly hinder the learning process. While … | |
647 from similar domains may be leveraged to help through domain adaptation, | |
648 obtaining high-quality labeled data for those source domains themselves … | |
649 be difficult or costly. To address such challenges on data insufficiency… | |
650 classification problem in a target domain, we propose a weak adaptation | |
651 learning (WAL) approach that leverages unlabeled data from a similar sou… | |
652 domain, a low-cost weak annotator that produces labels based on task-spe… | |
653 heuristics, labeling rules, or other methods (albeit with inaccuracy), a… | |
654 small amount of labeled data in the target domain. Our approach first co… | |
655 a theoretical analysis on the error bound of the trained classifier with | |
656 respect to the data quantity and the performance of the weak annotator, … | |
657 then introduces a multi-stage weak adaptation learning method to learn an | |
658 accurate classifier by lowering the error bound. Our experiments demonst… | |
659 the effectiveness of our approach in learning an accurate classifier with | |
660 limited labeled data in the target domain and unlabeled data in the sour… | |
661 domain. | |
662 </summary></entry><entry><id>http://arxiv.org/abs/2102.04394</id><title>… | |
663 powerful formalism to represent both the quantum and classical uncertain… | |
664 quantum systems and to express different statistical operations such as | |
665 measurement, system combination and expectations as linear algebra opera… | |
666 This paper explores how density matrices can be used as a building block… | |
667 build machine learning models exploiting their ability to straightforwar… | |
668 combine linear algebra and probability. One of the main results of the p… | |
669 to show that density matrices coupled with random Fourier features could | |
670 approximate arbitrary probability distributions over $\mathbb{R}^n$. Bas… | |
671 this finding the paper builds different models for density estimation, | |
672 classification and regression. These models are differentiable, so it is | |
673 possible to integrate them with other differentiable components, such as… | |
674 learning architectures and to learn their parameters using gradient-based | |
675 optimization. In addition, the paper presents optimization-less training | |
676 strategies based on estimation and model averaging. The models are evalu… | |
677 benchmark tasks and the results are reported and discussed. | |
678 </summary></entry><entry><id>http://arxiv.org/abs/2011.11152</id><title>… | |
679 training deep neural networks that generalize well. Previous work usually | |
680 interpreted weight decay as a Gaussian prior from the Bayesian perspecti… | |
681 However, weight decay sometimes shows mysterious behaviors beyond the | |
682 conventional understanding. For example, the optimal weight decay value … | |
683 to be zero given long enough training time. Moreover, existing work typi… | |
684 failed to recognize the importance of scheduling weight decay during tra… | |
685 Our work aims at theoretically understanding novel behaviors of weight d… | |
686 and designing schedulers for weight decay in deep learning. This paper m… | |
687 has three contributions. First, we propose a novel theoretical interpret… | |
688 of weight decay from the perspective of learning dynamics. Second, we pr… | |
689 novel weight-decay linear scaling rule for large-batch training that | |
690 proportionally increases weight decay rather than the learning rate as t… | |
691 batch size increases. Third, we provide an effective learning-rate-aware | |
692 scheduler for weight decay, called the Stable Weight Decay (SWD) method,… | |
693 to the best of our knowledge, is the first practical design for weight d… | |
694 scheduling. In our various experiments, the SWD method often makes impro… | |
695 over $L_{2}$ Regularization and Decoupled Weight Decay. | |
696 </summary></entry><entry><id>http://arxiv.org/abs/2011.02073</id><title>… | |
697 policies for high-dimensional, complex robotic tasks, but tends to be | |
698 data-inefficient. Model-based RL tends to be more data-efficient but oft… | |
699 suffers from learning a high-dimensional model that is good enough for p… | |
700 improvement. This limits its use to learning simple models for restricti… | |
701 domains. Optimal control generates solutions without collecting any data, | |
702 assuming an accurate model of the system and environment is known, which… | |
703 often true in many control theory applications. However, optimal control… | |
704 be scaled to problems with a high-dimensional state space. In this paper… | |
705 propose a novel approach to alleviate data inefficiency of model-free RL… | |
706 high-dimensional problems by warm-starting the learning process using a | |
707 lower-dimensional model-based solution. Particularly, we initialize a ba… | |
708 function for the high-dimensional RL problem via supervision from a | |
709 lower-dimensional value function, which can be obtained by solving a | |
710 lower-dimensional problem with a known, approximate model using "classic… | |
711 techniques such as value iteration or optimal control. Therefore, our ap… | |
712 implicitly exploits the model priors from simplified problem space to | |
713 facilitate the policy learning in high-dimensional RL tasks. We demonstr… | |
714 approach on two representative robotic learning tasks and observe signif… | |
715 improvement in policy performance and learning efficiency. We also evalu… | |
716 method empirically with a third task. | |
717 </summary></entry><entry><id>http://arxiv.org/abs/2004.12908</id><title>… | |
718 current task, but also on previously encountered and as yet unencountered | |
719 tasks. In contrast, classical machine learning starts from a blank slate… | |
720 tabula rasa, using data only for the single task at hand. While typical | |
721 transfer learning algorithms can improve performance on future tasks, th… | |
722 performance on prior tasks degrades upon learning new tasks (called | |
723 catastrophic forgetting). Many recent approaches for continual or lifelo… | |
724 learning have attempted to maintain performance given new tasks. But str… | |
725 to avoid forgetting sets the goal unnecessarily low: the goal of lifelong | |
726 learning, whether biological or artificial, should be to improve perform… | |
727 all tasks (including past and future) with any new data. We propose | |
728 omnidirectional transfer learning algorithms, which include two special… | |
729 of interest: decision forests and deep networks. Our key insight is the | |
730 development of the omni-voter layer, which ensembles representations lea… | |
731 independently on all tasks to jointly decide how to proceed on any given… | |
732 data point, thereby improving performance on both past and future tasks.… | |
733 algorithms demonstrate omnidirectional transfer in a variety of simulate… | |
734 real data scenarios, including tabular data, image data, spoken data, and | |
735 adversarial tasks. Moreover, they do so with quasilinear space and time | |
736 complexity. | |
737 </summary></entry><entry><id>http://arxiv.org/abs/2109.10322</id><title>… | |
738 dense visual recognition tasks, such as scene segmentation. The last lay… | |
739 FCN is typically a global classifier (1x1 convolution) to recognize each… | |
740 to a semantic label. We empirically show that this global classifier, ig… | |
741 the intra-class distinction, may lead to sub-optimal results. | |
742 </summary></entry><entry><id>http://arxiv.org/abs/2109.10317</id><title>… | |
743 do. But deep neural networks are fragile and their behaviors are often | |
744 surprising. In many settings, we need to provide formal guarantees on the | |
745 safety, security, correctness, or robustness of neural networks. This bo… | |
746 covers foundational ideas from formal verification and their adaptation … | |
747 reasoning about neural networks and deep learning. | |
748 </summary></entry><entry><id>http://arxiv.org/abs/2109.10312</id><title>… | |
749 skills from raw images that can be sequenced to complete long-horizon | |
750 visuomotor tasks. Reinforcement learning (RL) is a promising approach for | |
751 acquiring short-horizon skills autonomously. However, the focus of RL | |
752 algorithms has largely been on the success of those individual skills, m… | |
753 than learning and grounding a large repertoire of skills that can be seq… | |
754 to complete extended multi-stage tasks. The latter demands robustness and | |
755 persistence, as errors in skills can compound over time, and may require… | |
756 robot to have a number of primitive skills in its repertoire, rather tha… | |
757 one. To this end, we introduce EMBR, a model-based RL method for learning | |
758 primitive skills that are suitable for completing long-horizon visuomotor | |
759 tasks. EMBR learns and plans using a learned model, critic, and success | |
760 classifier, where the success classifier serves both as a reward functio… | |
761 RL and as a grounding mechanism to continuously detect if the robot shou… | |
762 retry a skill when unsuccessful or under perturbations. Further, the lea… | |
763 model is task-agnostic and trained using data from all skills, enabling … | |
764 robot to efficiently learn a number of distinct primitives. These visuom… | |
765 primitive skills and their associated pre- and post-conditions can then … | |
766 directly combined with off-the-shelf symbolic planners to complete long-… | |
767 tasks. On a Franka Emika robot arm, we find that EMBR enables the robot … | |
768 complete three long-horizon visuomotor tasks at 85% success rate, such as | |
769 organizing an office desk, a file cabinet, and drawers, which require | |
770 sequencing up to 12 skills, involve 14 unique learned primitives, and de… | |
771 generalization to novel objects. | |
772 </summary></entry><entry><id>http://arxiv.org/abs/2109.10303</id><title>… | |
773 deterministic finite automata with rewards as outputs, based on Kolmogor… | |
774 complexity. Kolmogorov complexity is considered since it can detect | |
775 computational regularities of deterministic optimal policies. We present… | |
776 planning objective yielding an explicit trade-off between a policy's | |
777 performance and complexity. It is proven that maximising this objective … | |
778 non-trivial in the sense that dynamic programming is infeasible. We pres… | |
779 algorithms obtaining low-complexity policies, where the first algorithm … | |
780 a low-complexity optimal policy, and the second algorithm finds a policy | |
781 maximising performance while maintaining local (stage-wise) complexity | |
782 constraints. We evaluate the algorithms on a simple navigation task for a | |
783 mobile robot, where our algorithms yield low-complexity policies that co… | |
784 with intuition. | |
785 </summary></entry><entry><id>http://arxiv.org/abs/2109.10285</id><title>… | |
786 light of its significance in a wide range of applications including healt… | |
787 transportation and finance. Until now, the early classification problem… | |
788 been dealt with by considering only irrevocable decisions. This paper int… | |
789 a new problem called early and revocable time series classification, where … | |
790 decision maker can revoke its earlier decisions based on the new available | |
791 measurements. In order to formalize and tackle this problem, we propose … | |
792 cost-based framework and derive two new approaches from it. The first ap… | |
793 does not consider explicitly the cost of changing decision, while the sec… | |
794 does. Extensive experiments are conducted to evaluate these approaches … | |
795 large benchmark of real datasets. The empirical results obtained convinci… | |
796 show (i) that the ability of revoking decisions significantly improves | |
797 performance over the irrevocable regime, and (ii) that taking into accoun… | |
798 cost of changing decision brings even better results in | |
799 general. Keywords: revocable decisions, cost estimation, online decision m… | |
800 </summary></entry><entry><id>http://arxiv.org/abs/2109.10246</id><title>… | |
801 their lack of grounding, i.e., connecting words to their meanings in the | |
802 physical world. Vision-and-Language (VL) models, trained jointly on text… | |
803 image or video data, have been offered as a response to such criticisms. | |
804 However, while VL pretraining has shown success on multimodal tasks such… | |
805 visual question answering, it is not yet known how the internal linguist… | |
806 representations themselves compare to their text-only counterparts. This… | |
807 compares the semantic representations learned via VL vs. text-only pretr… | |
808 for two recent VL models using a suite of analyses (clustering, probing,… | |
809 performance on a commonsense question answering task) in a language-only | |
810 setting. We find that the multimodal models fail to significantly outper… | |
811 the text-only variants, suggesting that future work is required if multi… | |
812 pretraining is to be pursued as a means of improving NLP in general. | |
813 </summary></entry><entry><id>http://arxiv.org/abs/2109.10231</id><title>… | |
814 provide insights towards behavior change. Prior work has explored how | |
815 self-trackers reflect on their logged data, but it remains unclear how m… | |
816 they learn from the tracking feedback, and which information is more use… | |
817 Indeed, the feedback can still be overwhelming, and making it concise can | |
818 improve learning by increasing focus and reducing interpretation burden.… | |
819 conducted a field study of mobile food logging with two feedback modes (… | |
820 journaling and automatic annotation of food images) and identified learn… | |
821 differences regarding nutrition, assessment, behavioral, and contextual | |
822 information. We propose a Self-Tracking Feedback Saliency Framework to d… | |
823 when to provide feedback, on which specific information, why those detai… | |
824 how to present them (as manual inquiry or automatic feedback). We propose | |
825 SalienTrack to implement these requirements. Using the data collected fr… | |
826 user study, we trained a machine learning model to predict whether a use… | |
827 learn from each tracked event. Using explainable AI (XAI) techniques, we | |
828 identified the most salient features per instance and why they lead to p… | |
829 learning outcomes. We discuss implications for learnability in self-trac… | |
830 and how adding model explainability expands opportunities for improving | |
831 feedback experience. | |
832 </summary></entry><entry><id>http://arxiv.org/abs/2109.10217</id><title>… | |
833 of content in various industries. These techniques require extensive kno… | |
834 of the desired content, and about how to actually implement such procedu… | |
835 methods. Algorithms for learning interpretable generative models from ex… | |
836 content could alleviate both difficulties. We propose SIGI, a novel meth… | |
837 inferring shapes and inducing a shape grammar from grid-based 3D building | |
838 examples. This interpretable grammar is well-suited for co-creative desi… | |
839 Applied to Minecraft buildings, we show how the shape grammar can be use… | |
840 automatically generate new buildings in a similar style. | |
841 </summary></entry><entry><id>http://arxiv.org/abs/2109.10200</id><title>… | |
842 where both customer locations and demands are uncertain. In particular, | |
843 potential customers are not restricted to a predefined customer set but … | |
844 continuously spatially distributed in a given service area. The objectiv… | |
845 maximize the served demands while fulfilling vehicle capacities and time | |
846 restrictions. We call this problem the VRP with stochastic customers and | |
847 demands (VRPSCD). For this problem, we first propose a Markov Decision P… | |
848 (MDP) formulation representing the classical centralized decision-making | |
849 perspective where one decision-maker establishes the routes of all vehic… | |
850 While the resulting formulation turns out to be intractable, it provides… | |
851 with the ground to develop a new MDP formulation of the VRPSCD represent… | |
852 decentralized decision-making framework, where vehicles autonomously est… | |
853 their own routes. This new formulation allows us to develop several stra… | |
854 to reduce the dimension of the state and action spaces, resulting in a | |
855 considerably more tractable problem. We solve the decentralized problem … | |
856 Reinforcement Learning, and in particular, we develop a Q-learning algor… | |
857 featuring state-of-the-art acceleration techniques such as Replay Memory… | |
858 Double Q Network. Computational results show that our method considerably | |
859 outperforms two commonly adopted benchmark policies (random and heuristi… | |
860 Moreover, when comparing with existing literature, we show that our appr… | |
861 can compete with specialized methods developed for the particular case o… | |
862 VRPSCD where customer locations and expected demands are known in advanc… | |
863 Finally, we show that the value functions and policies obtained by our | |
864 algorithm can be easily embedded in Rollout algorithms, thus further imp… | |
865 their performances. | |
866 </summary></entry><entry><id>http://arxiv.org/abs/2109.10199</id><title>… | |
867 led researchers and engineers to investigate novel models for robust and | |
868 reliable control of autonomous robots (navigation, obstacle detection and | |
869 avoidance, etc.), especially for quadrotors in challenging contexts such… | |
870 drone racing and aggressive maneuvers. Using spiking neural networks, th… | |
871 models can be run on neuromorphic hardware to benefit from outstanding u… | |
872 rates and high energy efficiency. Yet, low-level controllers are often | |
873 neglected and remain outside of the neuromorphic loop. Designing low-lev… | |
874 neuromorphic controllers is crucial to remove the standard PID, and ther… | |
875 benefit from all the advantages of closing the neuromorphic loop. In this | |
876 paper, we propose a parsimonious and adjustable neuromorphic PID control… | |
877 endowed with a minimal number of 93 neurons sparsely connected to achieve | |
878 autonomous, onboard altitude control of a quadrotor equipped with Intel'… | |
879 neuromorphic chip. We successfully demonstrate the robustness of our pro… | |
880 network in a set of experiments where the quadrotor is requested to reac… | |
881 target altitude from take-off. Our results confirm the suitability of su… | |
882 low-level neuromorphic controllers, ultimately with a very high update | |
883 frequency. | |
884 </summary></entry><entry><id>http://arxiv.org/abs/2109.10187</id><title>… | |
885 in aerial images are displayed in arbitrary directions and usually dense… | |
886 packed. Although considerable progress has been made, there are still | |
887 challenges that existing regression-based rotation detectors suffer the … | |
888 of discontinuous boundaries, which is directly caused by angular periodi… | |
889 corner ordering. In this paper, we propose a simple effective framework … | |
890 address the above challenges. Instead of directly regressing the five | |
891 parameters (coordinates of the central point, width, height, and rotation | |
892 angle) or the four vertices, we use the area ratio of parallelogram (ARP… | |
893 accurately describe a multi-oriented object. Specifically, we regress | |
894 coordinates of center point, height and width of minimum circumscribed | |
895 rectangle of oriented object and three area ratios {\lambda}_1, {\lambda… | |
896 {\lambda}_3. This may facilitate the offset learning and avoid the issue… | |
897 angular periodicity or label points sequence for oriented objects. To fu… | |
898 remedy the confusion issue of nearly horizontal objects, we employ the area… | |
899 between the object and its horizontal bounding box (minimum circumscribed | |
900 rectangle) to guide the selection of horizontal or oriented detection fo… | |
901 object. We also propose a rotation efficient IoU loss (R-EIoU) to connec… | |
902 horizontal bounding box with the three area ratios and improve the accur… | |
903 the rotating bounding box. Experimental results on three remote sensing | |
904 datasets including HRSC2016, DOTA and UCAS-AOD and scene text including | |
905 ICDAR2015 show that our method achieves superior detection performance c… | |
906 with many state-of-the-art approaches. The code and model will be coming… | |
907 paper published. | |
908 </summary></entry><entry><id>http://arxiv.org/abs/2109.10173</id><title>… | |
909 the quality of learned policy. Hard-exploration environments are defined… | |
910 huge state space and sparse rewards. In such conditions, an exhaustive | |
911 exploration of the environment is often impossible, and the successful t… | |
912 of an agent requires a lot of interaction steps. In this paper, we propo… | |
913 exploration method called Rollback-Explore (RbExplore), which utilizes t… | |
914 concept of the persistent Markov decision process, in which agents during | |
915 training can roll back to visited states. We test our algorithm in the | |
916 hard-exploration Prince of Persia game, without rewards and domain knowl… | |
917 At all used levels of the game, our agent outperforms or shows comparable | |
918 results with state-of-the-art curiosity methods with knowledge-based int… | |
919 motivation: ICM and RND. An implementation of RbExplore can be found at | |
920 https://github.com/cds-mipt/RbExplore. | |
921 </summary></entry><entry><id>http://arxiv.org/abs/2109.10149</id><title>… | |
922 feedback methods require human assessment from facilitators or peers. Th… | |
923 not scalable to large crowds. We propose Interpretable Directed Diversit… | |
924 automatically predict ideation quality and diversity scores, and provide… | |
925 explanations - Attribution, Contrastive Attribution, and Counterfactual | |
926 Suggestions - for deeper feedback on why ideations were scored (low), an… | |
927 to get higher scores. These explanations provide multi-faceted feedback … | |
928 users iteratively improve their ideation. We conducted think aloud and | |
929 controlled user studies to understand how various explanations are used,… | |
930 evaluated whether explanations improve ideation diversity and quality. U… | |
931 appreciated that explanation feedback helped focus their efforts and pro… | |
932 directions for improvement. This resulted in explanations improving dive… | |
933 compared to no feedback or feedback with predictions only. Hence, our ap… | |
934 opens opportunities for explainable AI towards scalable and rich feedbac… | |
935 iterative crowd ideation. | |
936 </summary></entry><entry><id>http://arxiv.org/abs/2109.10129</id><title>… | |
937 domains can be expressed and learned in terms of a pool of features defi… | |
938 from the domain predicates using a description logic grammar. At the sam… | |
939 most description logics correspond to a fragment of $k$-variable countin… | |
940 ($C_k$) for $k=2$, that has been shown to provide a tight characterizati… | |
941 the expressive power of graph neural networks. In this work, we make use… | |
942 these results to understand the power and limits of using graph neural n… | |
943 (GNNs) for learning optimal general policies over a number of tractable | |
944 planning domains where such policies are known to exist. For this, we tr… | |
945 simple GNN in a supervised manner to approximate the optimal value funct… | |
946 $V^{*}(s)$ of a number of sample states $s$. As predicted by the theory,… | |
947 observed that general optimal policies are obtained in domains where gen… | |
948 optimal value functions can be defined with $C_2$ features but not in th… | |
949 requiring more expressive $C_3$ features. In addition, it is observed th… | |
950 features learned are in close correspondence with the features needed to | |
951 express $V^{*}$ in closed form. The theory and the analysis of the domai… | |
952 us understand the features that are actually learned as well as those th… | |
953 cannot be learned in this way, and let us move in a principled manner fr… | |
954 combinatorial optimization approach to learning general policies to a | |
955 potentially more robust and scalable approach based on deep learning. | |
956 </summary></entry><entry><id>http://arxiv.org/abs/2109.10106</id><title>… | |
957 planning complex missions for heterogeneous multi-robot teams. This clas… | |
958 problems involves tasks that can be executed in different ways and are | |
959 associated with cross-schedule dependencies that constrain the schedules… | |
960 different robots in the system. The proposed approach involves a | |
961 multi-objective heuristic search of the mission, represented as a hierar… | |
962 tree that defines the mission goal. This procedure outputs several favor… | |
963 ways to fulfill the mission, which directly feed into the next stage of … | |
964 method. We propose a distributed metaheuristic based on evolutionary | |
965 computation to allocate tasks and generate schedules for the set of chos… | |
966 decompositions. The method is evaluated in a simulation setup of an auto… | |
967 greenhouse use case, where we demonstrate the method's ability to adapt … | |
968 planning strategy depending on the available robots and the given optimi… | |
969 criteria. | |
970 </summary></entry><entry><id>http://arxiv.org/abs/2109.10100</id><title>… | |
971 deep neural networks. However the computation of Fisher information matr… | |
972 becomes more and more difficult as the network structure turns large and | |
973 complex. This paper proposes a new optimization method whose main idea i… | |
974 accurately replace the natural gradient optimization by reconstructing t… | |
975 network. More specifically, we reconstruct the structure of the deep neu… | |
976 network, and optimize the new network using traditional gradient descent… | |
977 The reconstructed network achieves the effect of the optimization way wi… | |
978 natural gradient descent. Experimental results show that our optimization | |
979 method can accelerate the convergence of deep network models and achieve… | |
980 performance than GD while sharing its computational simplicity. | |
981 </summary></entry><entry><id>http://arxiv.org/abs/2109.10086</id><title>… | |
982 improving the first retriever in ranking pipelines. Learning dense embed… | |
983 to conduct retrieval using efficient approximate nearest neighbors metho… | |
984 proven to work well. Meanwhile, there has been a growing interest in lea… | |
985 \emph{sparse} representations for documents and queries, that could inhe… | |
986 from the desirable properties of bag-of-words models such as the exact m… | |
987 of terms and the efficiency of inverted indexes. Introduced recently, the | |
988 SPLADE model provides highly sparse representations and competitive resu… | |
989 with respect to state-of-the-art dense and sparse approaches. In this pa… | |
990 build on SPLADE and propose several significant improvements in terms of | |
991 effectiveness and/or efficiency. More specifically, we modify the pooling | |
992 mechanism, benchmark a model solely based on document expansion, and int… | |
993 models trained with distillation. We also report results on the BEIR ben… | |
994 Overall, SPLADE is considerably improved with more than $9$\% gains on N… | |
995 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchma… | |
996 </summary></entry><entry><id>http://arxiv.org/abs/2109.10085</id><title>… | |
997 seen increasing importance for investment decisions. Hence, investors (a… | |
998 managers and asset owners) who wanted to incorporate these issues starte… | |
999 assess companies based on how they handle such topics. For this assessme… | |
1000 investors rely on specialized rating agencies that issue ratings along t… | |
1001 environmental, social and governance (ESG) dimensions. Such ratings allo… | |
1002 to make investment decisions in favor of sustainability. However, rating | |
1003 agencies base their analysis on subjective assessment of sustainability | |
1004 reports, not provided by every company. Furthermore, due to human labor | |
1005 involved, rating agencies are currently facing the challenge to scale up… | |
1006 coverage in a timely manner. | |
1007 </summary></entry><entry><id>http://arxiv.org/abs/2109.10065</id><title>… | |
1008 based on accuracy, quickness, and consistency for antenna modelling. Usi… | |
1009 Nntool by MATLAB, 22 different combinations of networks and training alg… | |
1010 are used to predict the dimensions of a rectangular microstrip antenna u… | |
1011 dielectric constant, height of substrate, and frequency of operation as … | |
1012 Comparison and characterization of networks is done based on accuracy, m… | |
1013 square error, and training time. Algorithms, on the other hand, are anal… | |
1014 their accuracy, speed, reliability, and smoothness in the training proce… | |
1015 Finally, these results are analyzed, and recommendations are made for ea… | |
1016 neural network and algorithm based on uses, advantages, and disadvantage… | |
1017 example, it is observed that Reduced Radial Bias network is the most acc… | |
1018 network and Scaled Conjugate Gradient is the most reliable algorithm for | |
1019 electromagnetic modelling. This paper will help a researcher find the op… | |
1020 network and algorithm directly without doing time-taking experimentation. | |
1021 </summary></entry><entry><id>http://arxiv.org/abs/2109.10057</id><title>… | |
1022 network named Localization Transformer (LOTR). The proposed framework is… | |
1023 direct coordinate regression approach leveraging a Transformer network to | |
1024 better utilize the spatial information in the feature map. An LOTR model | |
1025 consists of three main modules: 1) a visual backbone that converts an in… | |
1026 image into a feature map, 2) a Transformer module that improves the feat… | |
1027 representation from the visual backbone, and 3) a landmark prediction he… | |
1028 directly predicts the landmark coordinates from the Transformer's | |
1029 representation. Given cropped-and-aligned face images, the proposed LOTR… | |
1030 trained end-to-end without requiring any post-processing steps. This pap… | |
1031 introduces the smooth-Wing loss function, which addresses the gradient | |
1032 discontinuity of the Wing loss, leading to better convergence than stand… | |
1033 loss functions such as L1, L2, and Wing loss. Experimental results on th… | |
1034 landmark dataset provided by the First Grand Challenge of 106-Point Faci… | |
1035 Landmark Localization indicate the superiority of LOTR over the existing | |
1036 methods on the leaderboard and two recent heatmap-based approaches. | |
1037 </summary></entry><entry><id>http://arxiv.org/abs/2109.10047</id><title>… | |
1038 aggregate components with shallow and simple architectures, which are li… | |
1039 by the 'over-smooth' problem. To further explore the benefits from struc… | |
1040 diversity and depth of GNN architectures, we propose a GNN generation pi… | |
1041 with a novel two-stage search space, which aims at automatically generat… | |
1042 high-performance while transferable deep GNN models in a block-wise mann… | |
1043 Meanwhile, to alleviate the 'over-smooth' problem, we incorporate multip… | |
1044 flexible residual connection in our search space and apply identity mapp… | |
1045 the basic GNN layers. For the search algorithm, we use deep-q-learning w… | |
1046 epsilon-greedy exploration strategy and reward reshaping. Extensive expe… | |
1047 on real-world datasets show that our generated GNN models outperform ex… | |
1048 manually designed and NAS-based ones. | |
1049 </summary></entry><entry><id>http://arxiv.org/abs/2109.10034</id><title>… | |
1050 key functions. This process has often been conceptualised within the fra… | |
1051 of reinforcement learning, which has also gained prominence in machine l… | |
1052 and artificial intelligence (AI) as a way to optimise decision-making. A… | |
1053 aspect of both biological and machine reinforcement learning is the | |
1054 reactivation of previously experienced episodes, referred to as replay. … | |
1055 is important for memory consolidation in biological neural networks, and… | |
1056 to stabilising learning in deep neural networks. Here, we review recent | |
1057 developments concerning the functional roles of replay in the fields of | |
1058 neuroscience and AI. Complementary progress suggests how replay might su… | |
1059 learning processes, including generalisation and continual learning, aff… | |
1060 opportunities to transfer knowledge across the two fields to advance the | |
1061 understanding of biological and artificial learning and memory. | |
1062 </summary></entry><entry><id>http://arxiv.org/abs/2109.10020</id><title>… | |
1063 payment processing networks is essential for system monitoring. Multivar… | |
1064 time series, aggregated from the past transaction history, can provide v… | |
1065 insights for such prediction. The general multivariate time series predi… | |
1066 problem has been well studied and applied across several domains, includ… | |
1067 manufacturing, medical, and entomology. However, new domain-related chal… | |
1068 associated with the data such as concept drift and multi-modality have s… | |
1069 in addition to the real-time requirements of handling the payment transa… | |
1070 data at scale. In this work, we study the problem of multivariate time s… | |
1071 prediction for estimating transaction metrics associated with entities i… | |
1072 payment transaction database. We propose a model with five unique compon… | |
1073 estimate the transaction metrics from multi-modality data. Four of these | |
1074 components capture interaction, temporal, scale, and shape perspectives,… | |
1075 the fifth component fuses these perspectives together. We also propose a… | |
1076 offline/online training scheme to address concept drift in the data and … | |
1077 the real-time requirements. Combining the estimation model with a graphi… | |
1078 user interface, the prototype transaction metric estimation system has | |
1079 demonstrated its potential benefit as a tool for improving a payment pro… | |
1080 company's system monitoring capability. | |
1081 </summary></entry><entry><id>http://arxiv.org/abs/2109.10016</id><title>… | |
1082 This task is essential because advanced video retrieval applications sho… | |
1083 enable users to retrieve a precise moment from a large video corpus. We … | |
1084 a novel CONtextual QUery-awarE Ranking (CONQUER) model for effective mom… | |
1085 localization and ranking. CONQUER explores query context for multi-modal… | |
1086 and representation learning in two different steps. The first step deriv… | |
1087 fusion weights for the adaptive combination of multi-modal video content… | |
1088 second step performs bi-directional attention to tightly couple video an… | |
1089 as a single joint representation for moment localization. As query conte… | |
1090 fully engaged in video representation learning, from feature fusion to | |
1091 transformation, the resulting feature is user-centered and has a larger | |
1092 capacity in capturing multi-modal signals specific to query. We conduct … | |
1093 on two datasets, TVR for closed-world TV episodes and DiDeMo for open-wo… | |
1094 user-generated videos, to investigate the potential advantages of fusing… | |
1095 and query online as a joint representation for moment retrieval. | |
1096 </summary></entry><entry><id>http://arxiv.org/abs/2109.10011</id><title>… | |
1097 intelligence, and it has been widely used to measure the abstract reason… | |
1098 ability of humans. In this paper, to study the abstract reasoning capabi… | |
1099 deep neural networks, we propose the first unsupervised learning method … | |
1100 solving RPM problems. Since the ground truth labels are not allowed, we … | |
1101 a pseudo target based on the prior constraints of the RPM formulation to | |
1102 approximate the ground truth label, which effectively converts the unsup… | |
1103 learning strategy into a supervised one. However, the correct answer is … | |
1104 labelled by the pseudo target, and thus the noisy contrast will lead to | |
1105 inaccurate model training. To alleviate this issue, we propose to improv… | |
1106 model performance with negative answers. Moreover, we develop a | |
1107 decentralization method to adapt the feature representation to different… | |
1108 problems. Extensive experiments on three datasets demonstrate that our m… | |
1109 even outperforms some of the supervised approaches. Our code is availabl… | |
1110 https://github.com/visiontao/ncd. | |
1111 </summary></entry><entry><id>http://arxiv.org/abs/2109.10007</id><title>… | |
1112 to measure the degree of similarity between scientific papers. These app… | |
1113 are intuitive, easy to put into practice, and computationally cheap. Mor… | |
1114 they have been used to generate a map of science, allowing visualizing r… | |
1115 field interactions. Nonetheless, these methods do not work unless two pa… | |
1116 share a standard reference, limiting the two papers usability with no di… | |
1117 connection. In this work, we propose to extend bibliographic coupling to… | |
1118 deep neighborhood, by using graph diffusion methods. This method allows | |
1119 defining similarity between any two papers, making it possible to genera… | |
1120 local map of science, highlighting field organization. | |
1121 </summary></entry><entry><id>http://arxiv.org/abs/2109.09975</id><title>… | |
1122 trajectories of autonomous vehicles when probabilistic predictions of ot… | |
1123 agents' futures are generated by deep neural networks (DNNs). The presen… | |
1124 methods address a wide range of representations for uncertain predictions | |
1125 including both Gaussian and non-Gaussian mixture models to predict both … | |
1126 positions and control inputs conditioned on the scene contexts. We show … | |
1127 the problem of risk assessment when Gaussian mixture models (GMMs) of ag… | |
1128 positions are learned can be solved rapidly to arbitrary levels of accur… | |
1129 with existing numerical methods. To address the problem of risk assessme… | |
1130 non-Gaussian mixture models of agent position, we propose finding upper … | |
1131 on risk using nonlinear Chebyshev's Inequality and sums-of-squares (SOS) | |
1132 programming; they are both of interest as the former is much faster whil… | |
1133 latter can be arbitrarily tight. These approaches only require higher or… | |
1134 statistical moments of agent positions to determine upper bounds on risk… | |
1135 perform risk assessment when models are learned for agent control inputs… | |
1136 opposed to positions, we propagate the moments of uncertain control inpu… | |
1137 through the nonlinear motion dynamics to obtain the exact moments of unc… | |
1138 position over the planning horizon. To this end, we construct determinis… | |
1139 linear dynamical systems that govern the exact time evolution of the mom… | |
1140 uncertain position in the presence of uncertain control inputs. The pres… | |
1141 methods are demonstrated on realistic predictions from DNNs trained on t… | |
1142 Argoverse and CARLA datasets and are shown to be effective for rapidly | |
1143 assessing the probability of low probability events. | |
1144 </summary></entry><entry><id>http://arxiv.org/abs/2109.09968</id><title>… | |
1145 games in studying natural language communication between humans and arti… | |
1146 agents. However, the generalization still remains a big challenge as the… | |
1147 depend critically on the complexity and variety of training tasks. In th… | |
1148 paper, we address this problem by introducing a hierarchical framework b… | |
1149 upon the knowledge graph-based RL agent. In the high level, a meta-polic… | |
1150 executed to decompose the whole game into a set of subtasks specified by | |
1151 textual goals, and select one of them based on the KG. Then a sub-policy… | |
1152 low level is executed to conduct goal-conditioned reinforcement learning… | |
1153 carry out experiments on games with various difficulty levels and show t… | |
1154 proposed method enjoys favorable generalizability. | |
1155 </summary></entry><entry><id>http://arxiv.org/abs/2109.09960</id><title>… | |
1156 effectively exploit the unlabeled hard regions for semi-supervised medic… | |
1157 image segmentation. The MC-Net+ model is motivated by the observation th… | |
1158 models trained with limited annotations are prone to output highly uncer… | |
1159 and easily mis-classified predictions in the ambiguous regions (e.g. adh… | |
1160 edges or thin branches) for the image segmentation task. Leveraging these | |
1161 region-level challenging samples can make the semi-supervised segmentati… | |
1162 model training more effective. Therefore, our proposed MC-Net+ model con… | |
1163 of two new designs. First, the model contains one shared encoder and mul… | |
1164 slightly different decoders (i.e. using different up-sampling strategies)… | |
1165 statistical discrepancy of multiple decoders' outputs is computed to den… | |
1166 model's uncertainty, which indicates the unlabeled hard regions. Second,… | |
1167 mutual consistency constraint is enforced between one decoder's probabil… | |
1168 output and other decoders' soft pseudo labels. In this way, we minimize … | |
1169 model's uncertainty during training and force the model to generate inva… | |
1170 and low-entropy results in such challenging areas of unlabeled data, in … | |
1171 to learn a generalized feature representation. We compared the segmentat… | |
1172 results of the MC-Net+ with five state-of-the-art semi-supervised approa… | |
1173 three public medical datasets. Extension experiments with two common | |
1174 semi-supervised settings demonstrate the superior performance of our mod… | |
1175 other existing methods, which sets a new state of the art for semi-super… | |
1176 medical image segmentation. | |
1177 </summary></entry><entry><id>http://arxiv.org/abs/2109.09946</id><title>… | |
1178 case data has long been recognized. Here, we study the problem of identi… | |
1179 and measuring biases in large-scale legal case data from an algorithmic | |
1180 fairness perspective. Our approach utilizes two regression models: A bas… | |
1181 that represents the decisions of a "typical" judge as given by the data … | |
1182 "fair" judge that applies one of three fairness concepts. Comparing the | |
1183 decisions of the "typical" judge and the "fair" judge allows for quantif… | |
1184 biases across demographic groups, as we demonstrate in four case studies… | |
1185 criminal data from Cook County (Illinois). | |
1186 </summary></entry><entry><id>http://arxiv.org/abs/2109.09906</id><title>… | |
1187 visual or audio content. This typically augments the use of technologies… | |
1188 as AI and ML by allowing to use natural speech for searching by keywords… | |
1189 video descriptions. Prior research has successfully provided a number of | |
1190 solutions for speech to text, in the case of human speech, but this ar… | |
1191 aims to investigate possible solutions to retrieve sound events based on… | |
1192 natural language query, and estimate how effective and accurate they are… | |
1193 this study, we specifically focus on the YamNet, AlexNet, and ResNet-50 | |
1194 pre-trained models to automatically classify audio samples using their | |
1195 respective melspectrograms into a number of predefined classes. The pred… | |
1196 classes can represent sounds associated with actions within a video frag… | |
1197 Two tests are conducted to evaluate the performance of the models on two | |
1198 separate problems: audio classification and intervals retrieval based on… | |
1199 natural language query. Results show that the benchmarked models are com… | |
1200 in terms of performance, with YamNet slightly outperforming the other two | |
1201 models. YamNet was able to classify single fixed-size audio samples with… | |
1202 accuracy and 68.75% precision while its average accuracy on intervals re… | |
1203 was 71.62% and precision was 41.95%. The investigated method may be embe… | |
1204 into an automated event marking architecture for streaming services. | |
1205 </summary></entry><entry><id>http://arxiv.org/abs/2109.09904</id><title>… | |
1206 own representations, there is significant discontent about their inscrut… | |
1207 and the attendant problems in their ability to interact with humans. Whi… | |
1208 alternatives such as neuro-symbolic approaches have been proposed, there… | |
1209 lack of consensus on what they are about. There are often two independent | |
1210 motivations (i) symbols as a lingua franca for human-AI interaction and … | |
1211 symbols as (system-produced) abstractions used in its internal reasoning.… | |
1212 jury is still out on whether AI systems will need to use symbols in their | |
1213 internal reasoning to achieve general intelligence capabilities. Whateve… | |
1214 answer there is, the need for (human-understandable) symbols in human-AI | |
1215 interaction seems quite compelling. Symbols, like emotions, may well not… | |
1216 sine qua non for intelligence per se, but they will be crucial for AI sy… | |
1217 to interact with us humans--as we can neither turn off our emotions nor … | |
1218 without our symbols. In particular, in many human-designed domains, huma… | |
1219 would be interested in providing explicit (symbolic) knowledge and advic… | |
1220 expect machine explanations in kind. This alone requires AI systems to a… | |
1221 do their I/O in symbolic terms. In this blue sky paper, we argue this po… | |
1222 view, and discuss research directions that need to be pursued to allow f… | |
1223 type of human-AI interaction. | |
1224 </summary></entry><entry><id>http://arxiv.org/abs/2109.09889</id><title>… | |
1225 beyond the scope of an RL policy. Such states may make the RL system uns… | |
1226 impede its deployment in real scenarios. In this paper, we propose a sim… | |
1227 effective anomaly detection framework for deep RL algorithms that | |
1228 simultaneously considers random, adversarial and out-of-distribution (OO… | |
1229 state outliers. In particular, we attain the class-conditional distribut… | |
1230 for each action class under the Gaussian assumption, and rely on these | |
1231 distributions to discriminate between inliers and outliers based on Maha… | |
1232 Distance (MD) and Robust Mahalanobis Distance. We conduct extensive expe… | |
1233 on Atari games that verify the effectiveness of our detection strategies… | |
1234 the best of our knowledge, we present the first in-detail study of stati… | |
1235 and adversarial anomaly detection in deep RL algorithms. This simple uni… | |
1236 anomaly detection paves the way towards deploying safe RL systems in rea… | |
1237 applications. | |
1238 </summary></entry><entry><id>http://arxiv.org/abs/2109.09876</id><title>… | |
1239 extended actions, such as options, that can provide benefits in problems | |
1240 requiring extensive exploration. One promising approach that learns these | |
1241 options end-to-end is the option-critic (OC) framework. We examine and s… | |
1242 this paper that OC does not decompose a problem into simpler sub-problem… | |
1243 instead increases the size of the search over policy space with each opt… | |
1244 considering the entire state space during learning. This issue can resul… | |
1245 practical limitations of this method, including sample inefficient learn… | |
1246 address this problem, we introduce Context-Specific Representation Abstr… | |
1247 for Deep Option Learning (CRADOL), a new framework that considers both t… | |
1248 abstraction and context-specific representation abstraction to effective… | |
1249 reduce the size of the search over policy space. Specifically, our method | |
1250 learns a factored belief state representation that enables each option t… | |
1251 a policy over only a subsection of the state space. We test our method a… | |
1252 hierarchical, non-hierarchical, and modular recurrent neural network bas… | |
1253 demonstrating significant sample efficiency improvements in challenging | |
1254 partially observable environments. | |
1255 </summary></entry><entry><id>http://arxiv.org/abs/2109.09862</id><title>… | |
1256 pipelines (Jauhiainen et al., 2019) and is not a solved problem in real-w… | |
1257 settings. We present a lightweight and effective language identifier tha… | |
1258 robust to changes of domain and to the absence of copious training data. | |
1259 </summary></entry><entry><id>http://arxiv.org/abs/2109.09861</id><title>… | |
1260 for autonomous driving, empirical evidence shows that there are still op… | |
1261 questions around dealing with the challenges of common knowledge assumpt… | |
1262 well as modeling bounded rationality. To address some of these practical | |
1263 challenges, we develop a framework of generalized dynamic cognitive hier… | |
1264 for both modelling naturalistic human driving behavior as well as behavi… | |
1265 planning for autonomous vehicles (AV). This framework is built upon a ri… | |
1266 model of level-0 behavior through the use of automata strategies, an | |
1267 interpretable notion of bounded rationality through safety and maneuver | |
1268 satisficing, and a robust response for planning. Based on evaluation on … | |
1269 large naturalistic datasets as well as simulation of critical traffic | |
1270 scenarios, we show that i) automata strategies are well suited for level… | |
1271 behavior in a dynamic level-k framework, and ii) the proposed robust res… | |
1272 to a heterogeneous population of strategic and non-strategic reasoners c… | |
1273 an effective approach for game theoretic planning in AV. | |
1274 </summary></entry><entry><id>http://arxiv.org/abs/2109.09844</id><title>… | |
1275 monitoring of multiple sclerosis is an important component of successful | |
1276 disease management. Prior studies have established that multiple scleros… | |
1277 correlated with speech discrepancies. Early research using objective aco… | |
1278 measurements has discovered measurable dysarthria. | |
1279 </summary></entry><entry><id>http://arxiv.org/abs/2109.09833</id><title>… | |
1280 the noise-induced dynamics during training deep neural networks by | |
1281 gradient-based optimizers. Specifically, we first show that the stocha… | |
1282 gradient noise possesses finite variance, and therefore the classical Ce… | |
1283 Limit Theorem (CLT) applies; this indicates that the gradient noise is | |
1284 asymptotically Gaussian. Such an asymptotic result validates the wide-ac… | |
1285 assumption of Gaussian noise. We clarify that the recently observed phen… | |
1286 of heavy tails within gradient noise may not be intrinsic properties, bu… | |
1287 consequence of insufficient mini-batch size; the gradient noise, which i… | |
1288 of limited i.i.d. random variables, has not reached the asymptotic regim… | |
1289 CLT, thus deviates from Gaussian. We quantitatively measure the goodness… | |
1290 Gaussian approximation of the noise, which supports our conclusion. Seco… | |
1291 we analyze the noise-induced dynamics of stochastic gradient descent usi… | |
1292 Langevin equation, granting for momentum hyperparameter in the optimizer… | |
1293 physical interpretation. We then proceed to demonstrate the existence of… | |
1294 steady-state distribution of stochastic gradient descent and approximate… | |
1295 distribution at a small learning rate. | |
1296 </summary></entry><entry><id>http://arxiv.org/abs/2109.09829</id><title>… | |
1297 required to be processed on regular basis has pushed processing to the e… | |
1298 the computing systems. Deploying advanced Neural Networks (NN), such as … | |
1299 neural networks (DNNs) and spiking neural networks (SNNs), that offer | |
1300 state-of-the-art results on resource-constrained edge devices is challen… | |
1301 due to the stringent memory and power/energy constraints. Moreover, these | |
1302 systems are required to maintain correct functionality under diverse sec… | |
1303 and reliability threats. This paper first discusses existing approaches … | |
1304 address energy efficiency, reliability, and security issues at different… | |
1305 layers, i.e., hardware (HW) and software (SW). Afterward, we discuss how… | |
1306 further improve the performance (latency) and the energy efficiency of E… | |
1307 systems through HW/SW-level optimizations, such as pruning, quantization… | |
1308 approximation. To address reliability threats (like permanent and transi… | |
1309 faults), we highlight cost-effective mitigation techniques, like fault-a… | |
1310 training and mapping. Moreover, we briefly discuss effective detection a… | |
1311 protection techniques to address security threats (like model and data | |
1312 corruption). Towards the end, we discuss how these techniques can be com… | |
1313 in an integrated cross-layer framework for realizing robust and | |
1314 energy-efficient Edge AI systems. | |
1315 </summary></entry><entry><id>http://arxiv.org/abs/2109.09825</id><title>… | |
1316 many others, unrealized (null) arguments in certain syntactic positions … | |
1317 refer to a previously introduced entity, and are thus called anaphoric z… | |
1318 pronouns. The existing resources for studying anaphoric zero pronoun | |
1319 interpretation are however still limited. In this paper, we use five data | |
1320 augmentation methods to generate and detect anaphoric zero pronouns | |
1321 automatically. We use the augmented data as additional training material… | |
1322 two anaphoric zero pronoun systems for Arabic. Our experimental results … | |
1323 that data augmentation improves the performance of the two systems, surp… | |
1324 the state-of-the-art results. | |
1325 </summary></entry><entry><id>http://arxiv.org/abs/2109.09809</id><title>… | |
1326 machine learning systems. An increasingly popular approach has been to s… | |
1327 provide counterfactual instance explanations. These specify close | |
1328 possible worlds in which, contrary to the facts, a person receives their | |
1329 desired decision from the machine learning system. This paper will draw … | |
1330 literature from the philosophy of science to argue that a satisfactory | |
1331 explanation must consist of both counterfactual instances and a causal e… | |
1332 (or system of equations) that support the counterfactual instances. We w… | |
1333 show that counterfactual instances by themselves explain little. We will | |
1334 further illustrate how explainable AI methods that provide both causal | |
1335 equations and counterfactual instances can successfully explain machine | |
1336 learning predictions. | |
1337 </summary></entry><entry><id>http://arxiv.org/abs/2109.09807</id><title>… | |
1338 risk associated with dynamic occlusion, i.e., occlusion caused by other | |
1339 vehicles in traffic. Based on the theory of hypergames, we develop a nov… | |
1340 multi-agent dynamic occlusion risk (DOR) measure for assessing situation… | |
1341 in dynamic occlusion scenarios. Furthermore, we present a white-box, | |
1342 scenario-based, accelerated safety validation framework for assessing sa… | |
1343 strategic planners in AV. Based on evaluation over a large naturalistic | |
1344 database, our proposed validation method achieves a 4000% speedup compar… | |
1345 direct validation on naturalistic data, a more diverse coverage, and abi… | |
1346 generalize beyond the dataset and generate commonly observed dynamic occ… | |
1347 crashes in traffic in an automated manner. | |
1348 </summary></entry><entry><id>http://arxiv.org/abs/2109.09791</id><title>… | |
1349 either numerical methods for the solution of dynamic model equations or | |
1350 data-driven artificial intelligence algorithms. Within this latter frame… | |
1351 the present paper illustrates how a deep learning method, exploiting vid… | |
1352 radar reflectivity frames as input, can be used to realize a warning mac… | |
1353 able to sound timely alarms of possible severe thunderstorm events. From… | |
1354 technical viewpoint, the computational core of this approach is the use … | |
1355 value-weighted skill score for both transforming the probabilistic outco… | |
1356 the deep neural network into binary classification and assessing the | |
1357 forecasting performance. The warning machine has been validated against | |
1358 weather radar data recorded in the Liguria region, in Italy, | |
1359 |