	python_feedgen09_jnboehm.com.atom.xml - sfeed_tests - sfeed tests and RSS and A…
	git clone git://git.codemadness.org/sfeed_tests
	---
	python_feedgen09_jnboehm.com.atom.xml (142453B)
	---
	1 <?xml version='1.0' encoding='UTF-8'?>
	2 <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><id>http://arxiv…
	3 networks for time series (TS) forecasting on a large set of time series.
	4 Current state-of-the-art deep ensemble models have high memory and
	5 computational requirements, hampering their use to forecast millions of …
	6 practical scenarios. We propose N-BEATS(P), a global multivariate varian…
	7 the N-BEATS model designed to allow simultaneous training of multiple
	8 univariate TS forecasting models. Our model addresses the practical limi…
	9 of related models, reducing the training time by half and memory require…
	10 a factor of 5, while keeping the same level of accuracy. We have perform…
	11 multiple experiments detailing the various ways to train our model and h…
	12 obtained results that demonstrate its capacity to support zero-shot TS
	13 forecasting, i.e., to train a neural network on a source TS dataset and …
	14 it on a different target TS dataset without retraining, which provides an
	15 efficient and reliable solution to forecast at scale even in difficult
	16 forecasting conditions.
	17 </summary></entry><entry><id>http://arxiv.org/abs/2109.02624</id><title>…
	18 and -- for shapes -- also scale, we extend generalized additive regressi…
	19 models for the shape/form of planar curves or landmark configurations. T…
	20 model respects the resulting quotient geometry of the response, employin…
	21 squared geodesic distance as loss function and a geodesic response funct…
	22 mapping the additive predictor to the shape/form space. For fitting the …
	23 we propose a Riemannian $L_2$-Boosting algorithm well-suited for a poten…
	24 large number of possibly parameter-intensive model terms, which also yie…
	25 automated model selection. We provide novel intuitively interpretable
	26 visualizations for (even non-linear) covariate effects in the shape/form…
	27 via suitable tensor based factorizations. The usefulness of the proposed
	28 framework is illustrated in an analysis of 1) astragalus shapes of wild …
	29 domesticated sheep and 2) cell forms generated in a biophysical model, a…
	30 as 3) in a realistic simulation study with response shapes and forms mot…
	31 from a dataset on bottle outlines.
	32 </summary></entry><entry><id>http://arxiv.org/abs/2107.04136</id><title>…
	33 precision matrices encodes complete information about independence and
	34 conditional independence properties. For general distributions, the cova…
	35 and precision matrices reveal correlations and so-called partial correla…
	36 between variables, but these do not, in general, have any correspondence…
	37 respect to independence properties. In this paper, we prove that, for a …
	38 class of non-Gaussian distributions, these correspondences still hold, e…
	39 for the covariance and approximately for the precision. The distribution…
	40 sometimes referred to as "nonparanormal" -- are given by diagonal
	41 transformations of multivariate normal random variables. We provide seve…
	42 analytic and numerical examples illustrating these results.
	43 </summary></entry><entry><id>http://arxiv.org/abs/2106.09370</id><title>…
	44 renewables is one of the pillars to power a carbon-neutral society by 20…
	45 However, in contrast to conventional power plants, renewable energy is s…
	46 to uncertainty raising challenges for their interaction with power syste…
	47 Scenario-based probabilistic forecasting models have become a vital tool…
	48 equip decision-makers. This paper presents to the power systems forecast…
	49 practitioners a recent deep learning technique, the normalizing flows, to
	50 produce accurate scenario-based probabilistic forecasts that are crucial…
	51 face the new challenges in power systems applications. The strength of t…
	52 technique is to directly learn the stochastic multivariate distribution …
	53 underlying process by maximizing the likelihood. Through comprehensive
	54 empirical evaluations using the open data of the Global Energy Forecasti…
	55 Competition 2014, we demonstrate that this methodology is competitive wi…
	56 other state-of-the-art deep learning generative models: generative adver…
	57 networks and variational autoencoders. The models producing weather-base…
	58 solar power, and load scenarios are properly compared in terms of foreca…
	59 value by considering the case study of an energy retailer and quality us…
	60 several complementary metrics. The numerical experiments are simple and …
	61 reproducible. Thus, we hope it will encourage other forecasting practiti…
	62 to test and use normalizing flows in power system applications such as b…
	63 on electricity markets, scheduling power systems with high renewable ene…
	64 sources penetration, energy management of virtual power plants or microgri…
	65 unit commitment.
	66 </summary></entry><entry><id>http://arxiv.org/abs/2105.14367</id><title>…
	67 probability of an event conditioned on some inputs. A neural network (NN…
	68 be used to compute the output distribution for continuous-domain, but it…
	69 difficult to explicitly approximate a free-form one without knowing the
	70 information of its general form a priori. In order to fit an arbitrary
	71 conditional distribution, discretizing the continuous domain into bins i…
	72 effective strategy, as long as we have sufficiently narrow bins and very…
	73 data. However, collecting enough data is often hard to reach and falls f…
	74 short of that ideal in many circumstances, especially in multivariate CD…
	75 the curse of dimensionality. In this paper, we demonstrate the benefits …
	76 modeling free-form conditional distributions using a deconvolution-based…
	77 net framework, coping with data deficiency problems in discretization. I…
	78 the advantage of being flexible but also takes advantage of the hierarch…
	79 smoothness offered by the deconvolution layers. We compare our method to…
	80 number of other density-estimation approaches and show that our Deconvol…
	81 Density Network (DDN) outperforms the competing methods on many univaria…
	82 multivariate tasks.
	83 </summary></entry><entry><id>http://arxiv.org/abs/2102.07767</id><title>…
	84 agents seeks to agree on a set of hypotheses that best describes a seque…
	85 private observations. In the scenario where the set of hypotheses is lar…
	86 propose a belief update rule where agents share compressed (either spars…
	87 quantized) beliefs with an arbitrary positive compression rate. Our algo…
	88 leverages a unified communication rule that enables agents to access
	89 wide-ranging compression operators as black-box modules. We prove the al…
	90 sure asymptotic exponential convergence of beliefs around the set of opt…
	91 hypotheses. Additionally, we show a non-asymptotic, explicit, and linear
	92 concentration rate in probability of the beliefs on the optimal hypothes…
	93 We provide numerical experiments to illustrate the communication benefit…
	94 our method. The simulation results show that the number of transmitted b…
	95 be reduced to 5-10% of the non-compressed method in the studied scenario…
	96 </summary></entry><entry><id>http://arxiv.org/abs/2012.15059</id><title>…
	97 models that are trained across sets of time series, known as Global Fore…
	98 Models (GFM), are regularly outperforming traditional univariate forecas…
	99 models that work on isolated series. As GFMs usually share the same set …
	100 parameters across all time series, they often have the problem of not be…
	101 localised enough to a particular series, especially in situations where
	102 datasets are heterogeneous. We study how ensembling techniques can be us…
	103 generic GFMs and univariate models to solve this issue. Our work systema…
	104 and compares relevant current approaches, namely clustering series and t…
	105 separate submodels per cluster, the so-called ensemble of specialists ap…
	106 and building heterogeneous ensembles of global and local models. We fill…
	107 gaps in the existing GFM localisation approaches, in particular by
	108 incorporating varied clustering techniques such as feature-based cluster…
	109 distance-based clustering and random clustering, and generalise them to …
	110 different underlying GFM model types. We then propose a new methodology …
	111 clustered ensembles where we train multiple GFMs on different clusters of
	112 series, obtained by changing the number of clusters and cluster seeds. U…
	113 Feed-forward Neural Networks, Recurrent Neural Networks, and Pooled Regr…
	114 models as the underlying GFMs, in our evaluation on eight publicly avail…
	115 datasets, the proposed models are able to achieve significantly higher a…
	116 than baseline GFM models and univariate forecasting methods.
	117 </summary></entry><entry><id>http://arxiv.org/abs/2009.13267</id><title>…
	118 such as BLEU score has been studied before for autoregressive neural mac…
	119 translation (NMT) and resulted in alternative training algorithms (Ranza…
	120 al., 2016; Norouzi et al., 2016; Shen et al., 2016; Wu et al., 2018). Ho…
	121 MLE training remains the de facto approach for autoregressive NMT becaus…
	122 its computational efficiency and stability. Despite this mismatch betwee…
	123 training objective and task measure, we notice that the samples drawn fr…
	124 MLE-based trained NMT support the desired distribution -- there are samp…
	125 with much higher BLEU score comparing to the beam decoding output. To be…
	126 from this observation, we train an energy-based model to mimic the behav…
	127 the task measure (i.e., the energy-based model assigns lower energy to s…
	128 with higher BLEU score), which results in a re-ranking algorithm bas…
	129 the samples drawn from NMT: energy-based re-ranking (EBR). We use both m…
	130 energy models (over target sentence) and joint energy models (over both …
	131 and target sentences). Our EBR with the joint energy model consistently
	132 improves the performance of the Transformer-based NMT: +4 BLEU points on
	133 IWSLT'14 German-English, +3.0 BLEU points on Sinhala-English, +1.2 BLEU …
	134 WMT'16 English-German tasks.
	135 </summary></entry><entry><id>http://arxiv.org/abs/2005.11079</id><title>…
	136 neural networks (GNNs) have been extensively explored. However, most exi…
	137 GNNs inherently suffer from the limitations of over-smoothing, non-robus…
	138 and weak-generalization when labeled nodes are scarce. In this paper, we
	139 propose a simple yet effective framework -- GRAPH RANDOM NEURAL NETWORKS
	140 (GRAND) -- to address these issues. In GRAND, we first design a random
	141 propagation strategy to perform graph data augmentation. Then we leverage
	142 consistency regularization to optimize the prediction consistency of unl…
	143 nodes across different data augmentations. Extensive experiments on graph
	144 benchmark datasets suggest that GRAND significantly outperforms
	145 state-of-the-art GNN baselines on semi-supervised node classification. F…
	146 we show that GRAND mitigates the issues of over-smoothing and non-robust…
	147 exhibiting better generalization behavior than existing GNNs. The source…
	148 of GRAND is publicly available at https://github.com/Grand20/grand.
	149 </summary></entry><entry><id>http://arxiv.org/abs/2004.14427</id><title>…
	150 restless bandits with average reward, using the paradigms of Q-learning …
	151 Whittle index. Specifically, we leverage the structure of the Whittle in…
	152 policy to reduce the search space of Q-learning, resulting in major
	153 computational gains. Rigorous convergence analysis is provided, supporte…
	154 numerical experiments. The numerical experiments show excellent empirical
	155 performance of the proposed scheme.
	156 </summary></entry><entry><id>http://arxiv.org/abs/2003.05738</id><title>…
	157 state and action spaces. Multi-agent reinforcement learning attempts to …
	158 this challenge by distributing control to specialized agents. However,
	159 specialization hinders generalization and transferability, and the
	160 computational graphs underlying neural-networks architectures -- dominat…
	161 the multi-agent setting -- do not offer the flexibility to handle an arb…
	162 number of entities which changes both between road networks, and over ti…
	163 vehicles traverse the network. We introduce Inductive Graph Reinforcement
	164 Learning (IG-RL) based on graph-convolutional networks which adapts to t…
	165 structure of any road network, to learn detailed representations of
	166 traffic-controllers and their surroundings. Our decentralized approach e…
	167 learning of a transferable-adaptive-traffic-signal-control policy. After…
	168 trained on an arbitrary set of road networks, our model can generalize t…
	169 road networks, traffic distributions, and traffic regimes, with no addit…
	170 training and a constant number of parameters, enabling greater scalabili…
	171 compared to prior methods. Furthermore, our approach can exploit the
	172 granularity of available data by capturing the (dynamic) demand at both …
	173 lane and the vehicle levels. The proposed method is tested on both road
	174 networks and traffic settings never experienced during training. We comp…
	175 IG-RL to multi-agent reinforcement learning and domain-specific baseline…
	176 both synthetic road networks and in a larger experiment involving the co…
	177 of the 3,971 traffic signals of Manhattan, we show that different
	178 instantiations of IG-RL outperform baselines.
	179 </summary></entry><entry><id>http://arxiv.org/abs/1905.10029</id><title>…
	180 data. However, they have been recently shown to be vulnerable to topolog…
	181 attacks. To enhance adversarial robustness, we go beyond spectral graph …
	182 to robust graph theory. By challenging the classical graph Laplacian, we
	183 propose a new convolution operator that is provably robust in the spectr…
	184 domain and is incorporated in the GCN architecture to improve expressivi…
	185 interpretability. By extending the original graph to a sequence of graph…
	186 also propose a robust training paradigm that encourages transferability …
	187 graphs that span a range of spatial and spectral characteristics. The pr…
	188 approaches are demonstrated in extensive experiments to simultaneously i…
	189 performance in both benign and adversarial situations.
	190 </summary></entry><entry><id>http://arxiv.org/abs/2109.10319</id><title>…
	191 physiology and computer science. However, at present, most network analy…
	192 ignores the direction. In this paper, we construct a spectral clustering…
	193 based on the singular decomposition of the adjacency matrix to detect co…
	194 in directed stochastic block model (DiSBM). By considering a sparsity
	195 parameter, under some mild conditions, we show the proposed approach can
	196 consistently recover hidden row and column communities for different sca…
	197 degrees.
	198 </summary></entry><entry><id>http://arxiv.org/abs/2109.10298</id><title>…
	199 Linear Unit (ReLU) Neural Network (NN) architecture (number of layers and
	200 number of neurons per layer) with the assurance that it is sufficiently
	201 parametrized to control a nonlinear system; i.e. control the system to s…
	202 a given formal specification. This is unlike current techniques, which p…
	203 no assurances on the resultant architecture. Moreover, our approach requ…
	204 only limited knowledge of the underlying nonlinear system and specificat…
	205 assume only that the specification can be satisfied by a Lipschitz-conti…
	206 controller with a known bound on its Lipschitz constant; the specific
	207 controller need not be known. From this assumption, we bound the number …
	208 affine functions needed to construct a Continuous Piecewise Affine (CPWA)
	209 function that can approximate any Lipschitz-continuous controller that
	210 satisfies the specification. Then we connect this CPWA to a NN architect…
	211 using the authors' recent results on the Two-Level Lattice (TLL) NN
	212 architecture; the TLL architecture was shown to be parameterized by the …
	213 of affine functions present in the CPWA function it realizes.
	214 </summary></entry><entry><id>http://arxiv.org/abs/2109.10279</id><title>…
	215 challenging setup in applied machine learning. Even though model interpr…
	216 has attracted more attention in recent years, many modeling approaches s…
	217 focus mainly on performance. To further improve the interpretability of …
	218 learning models, we suggest the adoption of concepts and tools from the
	219 well-established framework of component based multiblock analysis, also …
	220 as chemometrics. Nevertheless, artificial neural networks provide greater
	221 flexibility in model architecture and thus, often deliver superior predi…
	222 performance. In this study, we propose a setup to transfer the concepts …
	223 component based statistical models, including multiblock variants of pri…
	224 component regression and partial least squares regression, to neural net…
	225 architectures. Thereby, we combine the flexibility of neural networks wi…
	226 concepts for interpreting block relevance in multiblock methods. In two …
	227 cases we demonstrate how the concept can be implemented in practice, and
	228 compare it to both common feed-forward neural networks without blocks, a…
	229 as statistical component based multiblock methods. Our results underline…
	230 multiblock networks allow for basic model interpretation while matching …
	231 performance of ordinary feed-forward neural networks.
	232 </summary></entry><entry><id>http://arxiv.org/abs/2109.10262</id><title>…
	233 reverse-mode automatic differentiation. We use this operator to generali…
	234 several optimization algorithms, including a straightforward generalizat…
	235 gradient descent and a novel generalization of Newton's method. We then …
	236 which properties of these algorithms are preserved in this generalized s…
	237 First, we show that the transformation invariances of these algorithms a…
	238 preserved: while generalized Newton's method is invariant to all inverti…
	239 linear transformations, generalized gradient descent is invariant only to
	240 orthogonal linear transformations. Next, we show that we can express the…
	241 in loss of generalized gradient descent with an inner product-like expre…
	242 thereby generalizing the non-increasing and convergence properties of the
	243 gradient descent optimization flow. Finally, we include several numerical
	244 experiments to illustrate the ideas in the paper and demonstrate how we …
	245 them to optimize polynomial functions over an ordered ring.
	246 </summary></entry><entry><id>http://arxiv.org/abs/2109.10254</id><title>…
	247 tasks, there is a greater need for accurate quantification of predictive
	248 uncertainty. While the common goal in uncertainty quantification (UQ) in
	249 machine learning is to approximate the true distribution of the target d…
	250 many works in UQ tend to be disjoint in the evaluation metrics utilized,…
	251 disparate implementations for each metric lead to numerical results that…
	252 not directly comparable across different works. To address this, we intr…
	253 Uncertainty Toolbox, an open-source python library that helps to assess,
	254 visualize, and improve UQ. Uncertainty Toolbox additionally provides
	255 pedagogical resources, such as a glossary of key terms and an organized
	256 collection of key paper references. We hope that this toolbox is useful …
	257 accelerating and uniting research efforts in uncertainty in machine lear…
	258 </summary></entry><entry><id>http://arxiv.org/abs/2109.10219</id><title>…
	259 are available. Physical experiments or detailed simulations that accurat…
	260 capture the behavior of the system are regarded as high-fidelity models …
	261 low model uncertainty, however, they are expensive to run. On the other …
	262 simplified physical experiments or numerical models are seen as low-fide…
	263 models that are cheaper to evaluate. Although low-fidelity models are of…
	264 suitable for direct use in reliability analysis due to their low accurac…
	265 can offer information about the trend of the high-fidelity model thus pr…
	266 the opportunity to explore the design space at a low cost. This study pr…
	267 a new approach called adaptive multi-fidelity Gaussian process for relia…
	268 analysis (AMGPRA). Contrary to selecting training points and information
	269 sources in two separate stages as done in state-of-the-art mfEGRA method…
	270 proposed approach finds the optimal training point and information source
	271 simultaneously using the novel collective learning function (CLF). CLF i…
	272 to assess the global impact of a candidate training point from an inform…
	273 source and it accommodates any learning function that satisfies a certain
	274 profile. In this context, CLF provides a new direction for quantifying t…
	275 impact of new training points and can be easily extended with new learni…
	276 functions to adapt to different reliability problems. The performance of…
	277 proposed method is demonstrated by three mathematical examples and one
	278 engineering problem concerning the wind reliability of transmission towe…
	279 is shown that the proposed method achieves similar or higher accuracy wi…
	280 reduced computational costs compared to state-of-the-art single and
	281 multi-fidelity methods. A key application of AMGPRA is high-fidelity fra…
	282 modeling using complex and costly physics-based computational models.
	283 </summary></entry><entry><id>http://arxiv.org/abs/2109.10162</id><title>…
	284 $\varepsilon,\delta\in(0,1)$, a bounded function $f:\{-1,1\}^n\to[-1,1]$…
	285 degree at most $d$ can be learned with probability at least $1-\delta$ a…
	286 $L_2$-error $\varepsilon$ using $\log(\tfrac{n}{\delta})\,\varepsilon^{-…
	287 C^{d^{3/2}\sqrt{\log d}}$ random queries for a universal finite constant…
	288 </summary></entry><entry><id>http://arxiv.org/abs/2109.09988</id><title>…
	289 has many practical applications. With existing classifiers we may be abl…
	290 accurately classify signals, however that accuracy may decline if using a
	291 reduced number of attributes. Transforming the data then undertaking red…
	292 in dimensionality may improve the quality of the data analysis, decrease…
	293 required for classification and simplify models. We propose an approach,…
	294 chooses suitable wavelets to transform the data, then combines the outpu…
	295 these transforms to construct a dataset to then apply ensemble classifie…
	296 We demonstrate this on different data sets, across different classifiers…
	297 use differing evaluation methods. Our experimental results demonstrate t…
	298 effectiveness of the proposed technique, compared to the approaches that…
	299 either raw signal data or a single wavelet transform.
	300 </summary></entry><entry><id>http://arxiv.org/abs/2109.09859</id><title>…
	301 covariates, and the associated nonconvex problem of fitting these models…
	302 data. We develop a general recipe for analyzing the convergence of itera…
	303 algorithms for this task from a random initialization. In particular, pr…
	304 each iteration can be written as the solution to a convex optimization p…
	305 satisfying some natural conditions, we leverage Gaussian comparison theo…
	306 derive a deterministic sequence that provides sharp upper and lower boun…
	307 the error of the algorithm with sample-splitting. Crucially, this determ…
	308 sequence accurately captures both the convergence rate of the algorithm …
	309 eventual error floor in the finite-sample regime, and is distinct from t…
	310 commonly used "population" sequence that results from taking the
	311 infinite-sample limit. We apply our general framework to derive several
	312 concrete consequences for parameter estimation in popular statistical mo…
	313 including phase retrieval and mixtures of regressions. Provided the samp…
	314 scales near-linearly in the dimension, we show sharp global convergence …
	315 for both higher-order algorithms based on alternating updates and first-…
	316 algorithms based on subgradient descent. These corollaries, in turn, yie…
	317 multiple consequences, including: (a) Proof that higher-order algorithms…
	318 converge significantly faster than their first-order counterparts (and
	319 sometimes super-linearly), even if the two share the same population upd…
	320 (b) Intricacies in super-linear convergence behavior for higher-order
	321 algorithms, which can be nonstandard (e.g., with exponent 3/2) and sensi…
	322 the noise level in the problem. We complement these results with extensi…
	323 numerical experiments, which show excellent agreement with our theoretic…
	324 predictions.
	325 </summary></entry><entry><id>http://arxiv.org/abs/2109.09856</id><title>…
	326 proposed for predicting failure of a system or device with multivariate …
	327 series sensor data. We treat the multivariate time series sensor data as…
	328 for both visualization and computation. Failure follows various patterns…
	329 are closely related to the root causes. Different predefined transformat…
	330 are applied on the original sensors data to better characterize the fail…
	331 patterns. In addition to feature derivation, ensemble method is used to …
	332 improve the performance. In addition, a general algorithm architecture o…
	333 neural network is proposed to handle multiple types of data with less ma…
	334 feature engineering. We apply the proposed method on the early predict f…
	335 of computer disk drive in order to improve storage systems availability …
	336 avoid data loss. The classification accuracy is largely improved with the
	337 enriched features, named smart features.
	338 </summary></entry><entry><id>http://arxiv.org/abs/2109.09855</id><title>…
	339 actions, dubbed R(MA)^2B. The state of each arm evolves according to a
	340 controlled Markov decision process (MDP), and the reward of pulling an a…
	341 depends on both the current state of the corresponding MDP and the action
	342 taken. The goal is to sequentially choose actions for arms so as to maxi…
	343 the expected value of the cumulative rewards collected. Since finding the
	344 optimal policy is typically intractable, we propose a computationally ap…
	345 index policy which we call Occupancy-Measured-Reward Index Policy. Our p…
	346 is well-defined even if the underlying MDPs are not indexable. We prove …
	347 is asymptotically optimal when the activation budget and number of arms …
	348 scaled up, while keeping their ratio as a constant. For the case when the
	349 system parameters are unknown, we develop a learning algorithm. Our lear…
	350 algorithm uses the principle of optimism in the face of uncertainty and …
	351 uses a generative model in order to fully exploit the structure of
	352 Occupancy-Measured-Reward Index Policy. We call it the R(MA)^2B-UCB algo…
	353 As compared with the existing algorithms, R(MA)^2B-UCB performs close to…
	354 offline optimum policy, and also achieves a sub-linear regret with a low
	355 computational complexity. Experimental results show that R(MA)^2B-UCB
	356 outperforms the existing algorithms in both regret and run time.
	357 </summary></entry><entry><id>http://arxiv.org/abs/2109.09847</id><title>…
	358 interpreting machine learning models, with strong theoretical guarantees
	359 (consistency, local accuracy) and a wide availability of implementations…
	360 use cases. Even though computing SHAP values takes exponential time in g…
	361 TreeSHAP takes polynomial time on tree-based models. While the speedup is
	362 significant, TreeSHAP can still dominate the computation time of industr…
	363 machine learning solutions on datasets with millions or more entries, ca…
	364 delays in post-hoc model diagnosis and interpretation service. In this p…
	365 present two new algorithms, Fast TreeSHAP v1 and v2, designed to improve…
	366 computational efficiency of TreeSHAP for large datasets. We empirically …
	367 that Fast TreeSHAP v1 is 1.5x faster than TreeSHAP while keeping the mem…
	368 cost unchanged. Similarly, Fast TreeSHAP v2 is 2.5x faster than TreeSHAP…
	369 the cost of a slightly higher memory usage, thanks to the pre-computatio…
	370 expensive TreeSHAP steps. We also show that Fast TreeSHAP v2 is well-sui…
	371 multi-time model interpretations, resulting in as high as 3x faster expl…
	372 of newly incoming samples.
	373 </summary></entry><entry><id>http://arxiv.org/abs/2109.09831</id><title>…
	374 algorithms, can substantially impact their performance. To support users…
	375 determining well-performing hyperparameter configurations for their algo…
	376 datasets and applications at hand, SMAC3 offers a robust and flexible fr…
	377 for Bayesian Optimization, which can improve performance within a few
	378 evaluations. It offers several facades and pre-sets for typical use case…
	379 as optimizing hyperparameters, solving low dimensional continuous (artif…
	380 global optimization problems and configuring algorithms to perform well …
	381 multiple problem instances. The SMAC3 package is available under a permi…
	382 BSD-license at https://github.com/automl/SMAC3.
	383 </summary></entry><entry><id>http://arxiv.org/abs/2109.09816</id><title>…
	384 systems. In the beginning, the recommender and rational users have diffe…
	385 pieces of knowledge, and the recommender needs to learn the users' knowl…
	386 make better recommendations. The recommender learns users' knowledge by
	387 observing whether each user followed or deviated from her recommendation…
	388 show that learning frequently stalls if the recommender always recommend…
	389 choice: users tend to follow the recommendation blindly, and their choic…
	390 not reflect their knowledge. Social welfare and the learning rate are im…
	391 drastically if the recommender abstains from recommending a choice when …
	392 predicts that multiple arms will produce a similar payoff.
	393 </summary></entry><entry><id>http://arxiv.org/abs/2011.02602</id><title>…
	394 of small businesses and online shops. When processing these digital
	395 transactions, recognizing each merchant's real identity (i.e., business …
	396 is vital to ensure the integrity of payment processing systems. Conventi…
	397 this problem is formulated as a time series classification problem solel…
	398 the merchant transaction history. However, with the large scale of the d…
	399 and changing behaviors of merchants and consumers over time, it is extre…
	400 challenging to achieve satisfying performance from off-the-shelf classif…
	401 methods. In this work, we approach this problem from a multi-modal learn…
	402 perspective, where we use not only the merchant time series data but als…
	403 information of merchant-merchant relationship (i.e., affinity) to verify…
	404 self-reported business type (i.e., merchant category) of a given merchan…
	405 Specifically, we design two individual encoders, where one is responsibl…
	406 encoding temporal information and the other is responsible for affinity
	407 information, and a mechanism to fuse the outputs of the two encoders to
	408 accomplish the identification task. Our experiments on real-world credit…
	409 transaction data between 71,668 merchants and 433,772,755 customers have
	410 demonstrated the effectiveness and efficiency of the proposed model.
	411 </summary></entry><entry><id>http://arxiv.org/abs/2007.05303</id><title>…
	412 provide critical insights for payment processing companies. The capabili…
	413 predicting merchants' future is crucial for fraud detection and recommen…
	414 systems. Conventionally, this problem is formulated to predict one multi…
	415 time series under the multi-horizon setting. However, real-world applica…
	416 often require more than one future trend prediction considering the
	417 uncertainties, where more than one multivariate time series needs to be
	418 predicted. This problem is called multi-future prediction. In this work,…
	419 combine the two research directions and propose to study this new proble…
	420 multi-future, multi-horizon and multivariate time series prediction. This
	421 problem is crucial as it has broad use cases in the financial industry to
	422 reduce the risk while improving user experience by providing alternative
	423 futures. This problem is also challenging as now we not only need to cap…
	424 the patterns and insights from the past but also train a model that has a
	425 strong inference capability to project multiple possible outcomes. To so…
	426 this problem, we propose a new model using convolutional neural networks…
	427 simple yet effective encoder-decoder structure to learn the time series …
	428 from multiple perspectives. We use experiments on real-world merchant
	429 transaction data to demonstrate the effectiveness of our proposed model.…
	430 also provide extensive discussions on different model design choices in …
	431 experimental section.
	432 </summary></entry><entry><id>http://arxiv.org/abs/2109.09690</id><title>…
	433 fast uncertainty estimates for predictions with Deep Neural Networks (DN…
	434 Our main contribution is a practical and principled combination of DNNs …
	435 sparse Gaussian Processes (GPs). We prove theoretically that DNNs can be…
	436 as a special case of sparse GPs, namely mixtures of GP experts (MoE-GP),…
	437 devise a learning algorithm that brings the derived theory into practice…
	438 experiments from two different robotic tasks -- inverse dynamics of a
	439 manipulator and object detection on a micro-aerial vehicle (MAV) -- we s…
	440 effectiveness of our approach in terms of predictive uncertainty, improv…
	441 scalability, and run-time efficiency on a Jetson TX2. We thus argue that…
	442 approach can pave the way towards reliable and fast robot learning syste…
	443 uncertainty awareness.
	444 </summary></entry><entry><id>http://arxiv.org/abs/2109.09658</id><title>…
	445 extensive amount of data generated by today's clinical systems, has led …
	446 development of imaging AI solutions across the whole value chain of medi…
	447 imaging, including image reconstruction, medical image segmentation,
	448 image-based diagnosis and treatment planning. Notwithstanding the succes…
	449 future potential of AI in medical imaging, many stakeholders are concern…
	450 the potential risks and ethical implications of imaging AI solutions, wh…
	451 perceived as complex, opaque, and difficult to comprehend, utilise, and …
	452 in critical clinical applications. Despite these concerns and risks, the…
	453 currently no concrete guidelines and best practices for guiding future AI
	454 developments in medical imaging towards increased trust, safety and adop…
	455 To bridge this gap, this paper introduces a careful selection of guiding
	456 principles drawn from the accumulated experiences, consensus, and best
	457 practices from five large European projects on AI in Health Imaging. The…
	458 guiding principles are named FUTURE-AI and its building blocks consist o…
	459 Fairness, (ii) Universality, (iii) Traceability, (iv) Usability, (v) Rob…
	460 and (vi) Explainability. In a step-by-step approach, these guidelines are
	461 further translated into a framework of concrete recommendations for spec…
	462 developing, evaluating, and deploying technically, clinically and ethica…
	463 trustworthy AI solutions into clinical practice.
	464 </summary></entry><entry><id>http://arxiv.org/abs/2109.09105</id><title>…
	465 including spoken language understanding (SLU). Spoken language requires …
	466 understanding of speaker interactions, dialog states and speech induced
	467 multimodal behaviors to generate a meaningful representation of the
	468 conversation. In this work, we propose to dissect SLU into three represe…
	469 properties: conversational (disfluency, pause, overtalk), channel (speake…
	470 turn-tasks) and ASR (insertion, deletion, substitution). We probe BERT ba…
	471 language models (BERT, RoBERTa) trained on spoken transcripts to investi…
	472 its ability to understand multifarious properties in absence of any spee…
	473 cues. Empirical results indicate that LM is surprisingly good at capturi…
	474 conversational properties such as pause prediction and overtalk detectio…
	475 lexical tokens. On the downside, the LM scores low on turn-tasks and ASR
	476 error prediction. Additionally, pre-training the LM on spoken transcri…
	477 restrain its linguistic understanding. Finally, we establish the efficac…
	478 transferability of the mentioned properties on two benchmark datasets:
	479 Switchboard Dialog Act and Disfluency datasets.
	480 </summary></entry><entry><id>http://arxiv.org/abs/2109.07436</id><title>…
	481 errors and deviations in execution if there is uncertainty in identifyin…
	482 state. So an algorithm that computes a policy for a human to execute oug…
	483 consider these effects in its computations. An optimal MDP policy that is
	484 poorly executed (because of a human agent) may be much worse than another…
	485 that is executed with fewer errors. In this paper, we consider the probl…
	486 erroneous execution and execution delay when computing policies for a hu…
	487 agent that would act in a setting modeled by a Markov Decision Process. …
	488 present a framework to model the likelihood of policy execution errors a…
	489 likelihood of non-policy actions like inaction (delays) due to state
	490 uncertainty. This is followed by a hill climbing algorithm to search for…
	491 policies that account for these errors. We then use the best policy foun…
	492 hill climbing with a branch and bound algorithm to find the optimal poli…
	493 show experimental results in a Gridworld domain and analyze the performa…
	494 the two algorithms. We also present human studies that verify if our
	495 assumptions on policy execution by humans under state-aliasing are reaso…
	496 </summary></entry><entry><id>http://arxiv.org/abs/2109.01134</id><title>…
	497 for representation learning. It shifts from the tradition of using image…
	498 discrete labels for learning a fixed set of weights, seen as visual conc…
	499 to aligning images and raw text for two separate encoders. Such a paradi…
	500 benefits from a broader source of supervision and allows zero-shot trans…
	501 downstream tasks since visual concepts can be diametrically generated fr…
	502 natural language, known as prompt. In this paper, we identify that a maj…
	503 challenge of deploying such models in practice is prompt engineering. Th…
	504 because designing a proper prompt, especially for context words surround…
	505 class name, requires domain expertise and typically takes a significant …
	506 of time for word tuning since a slight change in wording could have a h…
	507 impact on performance. Moreover, different downstream tasks require spec…
	508 designs, further hampering the efficiency of deployment. To overcome this
	509 challenge, we propose a novel approach named context optimization (CoOp)…
	510 main idea is to model context in prompts using continuous representation…
	511 perform end-to-end learning from data while keeping the pre-trained para…
	512 fixed. In this way, the design of task-relevant prompts can be fully aut…
	513 Experiments on 11 datasets show that CoOp effectively turns pre-trained
	514 vision-language models into data-efficient visual learners, requiring as…
	515 one or two shots to beat hand-crafted prompts with a decent margin and a…
	516 gain significant improvements when using more shots (e.g., at 16 shots t…
	517 average gain is around 17% with the highest reaching over 50%). CoOp also
	518 exhibits strong robustness to distribution shift.
	519 </summary></entry><entry><id>http://arxiv.org/abs/2108.09432</id><title>…
	520 deformation shape generators. The key idea is to enforce the preservatio…
	521 local rigidity among the generated shapes. Our approach builds on an
	522 approximation of the as-rigid-as possible (or ARAP) deformation energy. …
	523 how to develop the unsupervised loss via a spectral decomposition of the
	524 Hessian of the ARAP energy. Our loss nicely decouples pose and shape var…
	525 through a robust norm. The loss admits simple closed-form expressions. I…
	526 easy to train and can be plugged into any standard generation models, e.…
	527 variational auto-encoder (VAE) and auto-decoder (AD). Experimental resul…
	528 that our approach outperforms existing shape generation approaches consi…
	529 on public benchmark datasets of various shape categories such as human, …
	530 and bone.
	531 </summary></entry><entry><id>http://arxiv.org/abs/2107.11913</id><title>…
	532 has become the subject of interest of academia, government, and industry.
	533 Efforts towards measuring different phenomena have gained traction in th…
	534 community, as illustrated by the publication of several influential field
	535 reports and policy documents. These metrics are designed to help decision
	536 takers to inform themselves about the fast-moving and impacting influenc…
	537 key advances in Artificial Intelligence in general and Machine Learning …
	538 particular. In this paper we propose to use such newfound capabilities o…
	539 technologies to augment our AI measuring capabilities. We do so by train…
	540 model to classify publications related to ethical issues and concerns. I…
	541 methodology we use an expert, manually curated dataset as the training s…
	542 then evaluate a large set of research papers. Finally, we highlight the
	543 implications of AI metrics, in particular their contribution towards dev…
	544 trustful and fair AI-based tools and technologies. Keywords: AI Ethics; …
	545 Fairness; AI Measurement. Ethics in Computer Science.
	546 </summary></entry><entry><id>http://arxiv.org/abs/2107.04775</id><title>…
	547 high-dimensional environments to learn complex tasks, but can often exhi…
	548 unsafe behaviors and require extensive environment interaction when expl…
	549 is unconstrained. A promising strategy for learning in dynamically uncer…
	550 environments is requiring that the agent can robustly return to learned …
	551 sets, where task success (and therefore safety) can be guaranteed. While…
	552 approach has been successful in low-dimensions, enforcing this constrain…
	553 environments with visual observations is exceedingly challenging. We pre…
	554 novel continuous representation for safe sets by framing it as a binary
	555 classification problem in a learned latent space, which flexibly scales …
	556 image observations. We then present a new algorithm, Latent Space Safe S…
	557 (LS3), which uses this representation for long-horizon tasks with sparse
	558 rewards. We evaluate LS3 on 4 domains, including a challenging sequential
	559 pushing task in simulation and a physical cable routing task. We find th…
	560 can use prior task successes to restrict exploration and learn more effi…
	561 than prior algorithms while satisfying constraints. See
	562 https://tinyurl.com/latent-ss for code and supplementary material.
	563 </summary></entry><entry><id>http://arxiv.org/abs/2106.07857</id><title>…
	564 human-robot interaction. Current research in this field mainly focuses on
	565 generating responses consistent with the robot's pre-assigned persona, w…
	566 ignoring the user's persona. Such responses may be inappropriate or even
	567 offensive, which may lead to a bad user experience. Therefore, we prop…
	568 Bilateral Personalized Dialogue Generation (BPDG) method for dyadic
	569 conversation, which integrates user and robot personas into dialogue gen…
	570 via designing a dynamic persona-aware fusion method. To bridge the gap b…
	571 the learning objective function and evaluation metrics, the Conditional …
	572 Information Maximum (CMIM) criterion is adopted with contrastive learnin…
	573 select the proper response from the generated candidates. Moreover, a bi…
	574 persona accuracy metric is designed to measure the degree of bilateral
	575 personalization. Experimental results demonstrate that, compared with se…
	576 state-of-the-art methods, the final results of the proposed method are m…
	577 personalized and consistent with bilateral personas in terms of both aut…
	578 and manual evaluations.
	579 </summary></entry><entry><id>http://arxiv.org/abs/2105.15033</id><title>…
	580 and conceptual knowledge, especially in the medical domain. However, the…
	581 of high-quality annotated corpora remains a crucial problem for advancin…
	582 research and applications on this task. In order to accelerate the resea…
	583 domain-specific knowledge graphs in the medical domain, we introduce Dia…
	584 high-quality Chinese dataset for Diabetes knowledge graph, which contains
	585 22,050 entities and 6,890 relations in total. We implement recent typical
	586 methods for Named Entity Recognition and Relation Extraction as a benchm…
	587 evaluate the proposed dataset thoroughly. Empirical results show that th…
	588 is challenging for most existing methods and further analysis is conduct…
	589 discuss future research direction for improvements. We hope the release …
	590 dataset can assist the construction of diabetes knowledge graphs and fac…
	591 AI-based applications.
	592 </summary></entry><entry><id>http://arxiv.org/abs/2105.11844</id><title>…
	593 aerial and satellite images is of high importance in several fields such…
	594 security, anomaly detection, land use planning and land use change detec…
	595 However, the detection of such infrastructures is complex as they have h…
	596 variable shapes and sizes, i.e., some infrastructures, such as electrical
	597 substations, are too small while others, such as airports, are too large.
	598 Besides, airports can have a surface area either small or too large with
	599 completely different shapes, which makes its correct detection challengi…
	600 far as we know, these limitations have not been tackled yet in previous …
	601 This paper presents (1) a smart Critical Infrastructure dataset, named
	602 CI-dataset, organised into two scales, small and large scales critical
	603 infrastructures and (2) a two-level resolution-independent critical
	604 infrastructure detection (DetDSCI) methodology that first determines the
	605 spatial resolution of the input image using a classification model, then
	606 analyses the image using the appropriate detector for that spatial resol…
	607 The present study targets two representative classes, airports and elect…
	608 substations. Our experiments show that DetDSCI methodology achieves up to
	609 37.53% F1 improvement with respect to Faster R-CNN, one of the most infl…
	610 detection models.
	611 </summary></entry><entry><id>http://arxiv.org/abs/2103.13460</id><title>…
	612 widely deployed in industrial robotics settings. Part of the challenge l…
	613 identifying slip and other key events from the tactile data stream. In t…
	614 paper, we present a learning-based method to detect slip using barometric
	615 tactile sensors. Although these sensors have a low resolution, they have…
	616 other desirable properties including high reliability and durability, a …
	617 slim profile, and a low cost. We are able to achieve slip detection accu…
	618 of greater than 91% while being robust to the speed and direction of the…
	619 motion. Further, we test our detector on two robot manipulation tasks in…
	620 common household objects and demonstrate successful generalization to
	621 real-world scenarios not seen during training. We show that barometric t…
	622 sensing technology, combined with data-driven learning, is potentially s…
	623 for complex manipulation tasks such as slip compensation.
	624 </summary></entry><entry><id>http://arxiv.org/abs/2102.08633</id><title>…
	625 rules, answer high-level questions such as "May I qualify for VA health …
	626 benefits?", and ask follow-up clarification questions whose answer is ne…
	627 to answer the original question. However, existing works assume the rule…
	628 is provided for each user question, which neglects the essential retriev…
	629 in real scenarios. In this work, we propose and investigate an open-retr…
	630 setting of conversational machine reading. In the open-retrieval setting…
	631 relevant rule texts are unknown so that a system needs to retrieve
	632 question-relevant evidence from a collection of rule texts, and answer u…
	633 high-level questions according to multiple retrieved rule texts in a
	634 conversational manner. We propose MUDERN, a Multi-passage Discourse-aware
	635 Entailment Reasoning Network which extracts conditions in the rule texts
	636 through discourse segmentation, conducts multi-passage entailment reason…
	637 answer user questions directly, or asks clarification follow-up question…
	638 inquire for more information. On our created OR-ShARC dataset, MUDERN achiev…
	639 state-of-the-art performance, outperforming existing single-passage
	640 conversational machine reading models as well as a new multi-passage
	641 conversational machine reading baseline by a large margin. In addition, …
	642 conduct in-depth analyses to provide new insights into this new setting …
	643 model.
	644 </summary></entry><entry><id>http://arxiv.org/abs/2102.07358</id><title>…
	645 methods. In some target problem domains, there are not many data samples
	646 available, which could significantly hinder the learning process. While …
	647 from similar domains may be leveraged to help through domain adaptation,
	648 obtaining high-quality labeled data for those source domains themselves …
	649 be difficult or costly. To address such challenges on data insufficiency…
	650 classification problem in a target domain, we propose a weak adaptation
	651 learning (WAL) approach that leverages unlabeled data from a similar sou…
	652 domain, a low-cost weak annotator that produces labels based on task-spe…
	653 heuristics, labeling rules, or other methods (albeit with inaccuracy), a…
	654 small amount of labeled data in the target domain. Our approach first co…
	655 a theoretical analysis on the error bound of the trained classifier with
	656 respect to the data quantity and the performance of the weak annotator, …
	657 then introduces a multi-stage weak adaptation learning method to learn an
	658 accurate classifier by lowering the error bound. Our experiments demonst…
	659 the effectiveness of our approach in learning an accurate classifier with
	660 limited labeled data in the target domain and unlabeled data in the sour…
	661 domain.
	662 </summary></entry><entry><id>http://arxiv.org/abs/2102.04394</id><title>…
	663 powerful formalism to represent both the quantum and classical uncertain…
	664 quantum systems and to express different statistical operations such as
	665 measurement, system combination and expectations as linear algebra opera…
	666 This paper explores how density matrices can be used as a building block…
	667 build machine learning models exploiting their ability to straightforwar…
	668 combine linear algebra and probability. One of the main results of the p…
	669 to show that density matrices coupled with random Fourier features could
	670 approximate arbitrary probability distributions over $\mathbb{R}^n$. Bas…
	671 this finding the paper builds different models for density estimation,
	672 classification and regression. These models are differentiable, so it is
	673 possible to integrate them with other differentiable components, such as…
	674 learning architectures and to learn their parameters using gradient-based
	675 optimization. In addition, the paper presents optimization-less training
	676 strategies based on estimation and model averaging. The models are evalu…
	677 benchmark tasks and the results are reported and discussed.
	678 </summary></entry><entry><id>http://arxiv.org/abs/2011.11152</id><title>…
	679 training deep neural networks that generalize well. Previous work usually
	680 interpreted weight decay as a Gaussian prior from the Bayesian perspecti…
	681 However, weight decay sometimes shows mysterious behaviors beyond the
	682 conventional understanding. For example, the optimal weight decay value …
	683 to be zero given long enough training time. Moreover, existing work typi…
	684 failed to recognize the importance of scheduling weight decay during tra…
	685 Our work aims at theoretically understanding novel behaviors of weight d…
	686 and designing schedulers for weight decay in deep learning. This paper m…
	687 has three contributions. First, we propose a novel theoretical interpret…
	688 of weight decay from the perspective of learning dynamics. Second, we pr…
	689 novel weight-decay linear scaling rule for large-batch training that
	690 proportionally increases weight decay rather than the learning rate as t…
	691 batch size increases. Third, we provide an effective learning-rate-aware
	692 scheduler for weight decay, called the Stable Weight Decay (SWD) method,…
	693 to the best of our knowledge, is the first practical design for weight d…
	694 scheduling. In our various experiments, the SWD method often makes impro…
	695 over $L_{2}$ Regularization and Decoupled Weight Decay.
	696 </summary></entry><entry><id>http://arxiv.org/abs/2011.02073</id><title>…
	697 policies for high-dimensional, complex robotic tasks, but tends to be
	698 data-inefficient. Model-based RL tends to be more data-efficient but oft…
	699 suffers from learning a high-dimensional model that is good enough for p…
	700 improvement. This limits its use to learning simple models for restricti…
	701 domains. Optimal control generates solutions without collecting any data,
	702 assuming an accurate model of the system and environment is known, which…
	703 often true in many control theory applications. However, optimal control…
	704 be scaled to problems with a high-dimensional state space. In this paper…
	705 propose a novel approach to alleviate data inefficiency of model-free RL…
	706 high-dimensional problems by warm-starting the learning process using a
	707 lower-dimensional model-based solution. Particularly, we initialize a ba…
	708 function for the high-dimensional RL problem via supervision from a
	709 lower-dimensional value function, which can be obtained by solving a
	710 lower-dimensional problem with a known, approximate model using "classic…
	711 techniques such as value iteration or optimal control. Therefore, our ap…
	712 implicitly exploits the model priors from simplified problem space to
	713 facilitate the policy learning in high-dimensional RL tasks. We demonstr…
	714 approach on two representative robotic learning tasks and observe signif…
	715 improvement in policy performance and learning efficiency. We also evalu…
	716 method empirically with a third task.
	717 </summary></entry><entry><id>http://arxiv.org/abs/2004.12908</id><title>…
	718 current task, but also on previously encountered and as yet unencountered
	719 tasks. In contrast, classical machine learning starts from a blank slate…
	720 tabula rasa, using data only for the single task at hand. While typical
	721 transfer learning algorithms can improve performance on future tasks, th…
	722 performance on prior tasks degrades upon learning new tasks (called
	723 catastrophic forgetting). Many recent approaches for continual or lifelo…
	724 learning have attempted to maintain performance given new tasks. But str…
	725 to avoid forgetting sets the goal unnecessarily low: the goal of lifelong
	726 learning, whether biological or artificial, should be to improve perform…
	727 all tasks (including past and future) with any new data. We propose
	728 omnidirectional transfer learning algorithms, which include two special…
	729 of interest: decision forests and deep networks. Our key insight is the
	730 development of the omni-voter layer, which ensembles representations lea…
	731 independently on all tasks to jointly decide how to proceed on any given…
	732 data point, thereby improving performance on both past and future tasks.…
	733 algorithms demonstrate omnidirectional transfer in a variety of simulate…
	734 real data scenarios, including tabular data, image data, spoken data, and
	735 adversarial tasks. Moreover, they do so with quasilinear space and time
	736 complexity.
	737 </summary></entry><entry><id>http://arxiv.org/abs/2109.10322</id><title>…
	738 dense visual recognition tasks, such as scene segmentation. The last lay…
	739 FCN is typically a global classifier (1x1 convolution) to recognize each…
	740 to a semantic label. We empirically show that this global classifier, ig…
	741 the intra-class distinction, may lead to sub-optimal results.
	742 </summary></entry><entry><id>http://arxiv.org/abs/2109.10317</id><title>…
	743 do. But deep neural networks are fragile and their behaviors are often
	744 surprising. In many settings, we need to provide formal guarantees on the
	745 safety, security, correctness, or robustness of neural networks. This bo…
	746 covers foundational ideas from formal verification and their adaptation …
	747 reasoning about neural networks and deep learning.
	748 </summary></entry><entry><id>http://arxiv.org/abs/2109.10312</id><title>…
	749 skills from raw images that can be sequenced to complete long-horizon
	750 visuomotor tasks. Reinforcement learning (RL) is a promising approach for
	751 acquiring short-horizon skills autonomously. However, the focus of RL
	752 algorithms has largely been on the success of those individual skills, m…
	753 than learning and grounding a large repertoire of skills that can be seq…
	754 to complete extended multi-stage tasks. The latter demands robustness and
	755 persistence, as errors in skills can compound over time, and may require…
	756 robot to have a number of primitive skills in its repertoire, rather tha…
	757 one. To this end, we introduce EMBR, a model-based RL method for learning
	758 primitive skills that are suitable for completing long-horizon visuomotor
	759 tasks. EMBR learns and plans using a learned model, critic, and success
	760 classifier, where the success classifier serves both as a reward functio…
	761 RL and as a grounding mechanism to continuously detect if the robot shou…
	762 retry a skill when unsuccessful or under perturbations. Further, the lea…
	763 model is task-agnostic and trained using data from all skills, enabling …
	764 robot to efficiently learn a number of distinct primitives. These visuom…
	765 primitive skills and their associated pre- and post-conditions can then …
	766 directly combined with off-the-shelf symbolic planners to complete long-…
	767 tasks. On a Franka Emika robot arm, we find that EMBR enables the robot …
	768 complete three long-horizon visuomotor tasks at 85% success rate, such as
	769 organizing an office desk, a file cabinet, and drawers, which require
	770 sequencing up to 12 skills, involve 14 unique learned primitives, and de…
	771 generalization to novel objects.
	772 </summary></entry><entry><id>http://arxiv.org/abs/2109.10303</id><title>…
	773 deterministic finite automata with rewards as outputs, based on Kolmogor…
	774 complexity. Kolmogorov complexity is considered since it can detect
	775 computational regularities of deterministic optimal policies. We present…
	776 planning objective yielding an explicit trade-off between a policy's
	777 performance and complexity. It is proven that maximising this objective …
	778 non-trivial in the sense that dynamic programming is infeasible. We pres…
	779 algorithms obtaining low-complexity policies, where the first algorithm …
	780 a low-complexity optimal policy, and the second algorithm finds a policy
	781 maximising performance while maintaining local (stage-wise) complexity
	782 constraints. We evaluate the algorithms on a simple navigation task for a
	783 mobile robot, where our algorithms yield low-complexity policies that co…
	784 with intuition.
	785 </summary></entry><entry><id>http://arxiv.org/abs/2109.10285</id><title>…
	786 light of its significance in a wide range of applications including healt…
	787 transportation and finance. Until now, the early classification problem…
	788 been dealt with by considering only irrevocable decisions. This paper int…
	789 a new problem called early and revocable time series classification, where …
	790 decision maker can revoke its earlier decisions based on the new available
	791 measurements. In order to formalize and tackle this problem, we propose …
	792 cost-based framework and derive two new approaches from it. The first ap…
	793 does not consider explicitly the cost of changing decision, while the sec…
	794 does. Extensive experiments are conducted to evaluate these approaches …
	795 large benchmark of real datasets. The empirical results obtained convinci…
	796 show (i) that the ability of revoking decisions significantly improves
	797 performance over the irrevocable regime, and (ii) that taking into accoun…
	798 cost of changing decision brings even better results in
	799 general. Keywords: revocable decisions, cost estimation, online decision m…
	800 </summary></entry><entry><id>http://arxiv.org/abs/2109.10246</id><title>…
	801 their lack of grounding, i.e., connecting words to their meanings in the
	802 physical world. Vision-and-Language (VL) models, trained jointly on text…
	803 image or video data, have been offered as a response to such criticisms.
	804 However, while VL pretraining has shown success on multimodal tasks such…
	805 visual question answering, it is not yet known how the internal linguist…
	806 representations themselves compare to their text-only counterparts. This…
	807 compares the semantic representations learned via VL vs. text-only pretr…
	808 for two recent VL models using a suite of analyses (clustering, probing,…
	809 performance on a commonsense question answering task) in a language-only
	810 setting. We find that the multimodal models fail to significantly outper…
	811 the text-only variants, suggesting that future work is required if multi…
	812 pretraining is to be pursued as a means of improving NLP in general.
	813 </summary></entry><entry><id>http://arxiv.org/abs/2109.10231</id><title>…
	814 provide insights towards behavior change. Prior work has explored how
	815 self-trackers reflect on their logged data, but it remains unclear how m…
	816 they learn from the tracking feedback, and which information is more use…
	817 Indeed, the feedback can still be overwhelming, and making it concise can
	818 improve learning by increasing focus and reducing interpretation burden.…
	819 conducted a field study of mobile food logging with two feedback modes (…
	820 journaling and automatic annotation of food images) and identified learn…
	821 differences regarding nutrition, assessment, behavioral, and contextual
	822 information. We propose a Self-Tracking Feedback Saliency Framework to d…
	823 when to provide feedback, on which specific information, why those detai…
	824 how to present them (as manual inquiry or automatic feedback). We propose
	825 SalienTrack to implement these requirements. Using the data collected fr…
	826 user study, we trained a machine learning model to predict whether a use…
	827 learn from each tracked event. Using explainable AI (XAI) techniques, we
	828 identified the most salient features per instance and why they lead to p…
	829 learning outcomes. We discuss implications for learnability in self-trac…
	830 and how adding model explainability expands opportunities for improving
	831 feedback experience.
	832 </summary></entry><entry><id>http://arxiv.org/abs/2109.10217</id><title>…
	833 of content in various industries. These techniques require extensive kno…
	834 of the desired content, and about how to actually implement such procedu…
	835 methods. Algorithms for learning interpretable generative models from ex…
	836 content could alleviate both difficulties. We propose SIGI, a novel meth…
	837 inferring shapes and inducing a shape grammar from grid-based 3D building
	838 examples. This interpretable grammar is well-suited for co-creative desi…
	839 Applied to Minecraft buildings, we show how the shape grammar can be use…
	840 automatically generate new buildings in a similar style.
	841 </summary></entry><entry><id>http://arxiv.org/abs/2109.10200</id><title>…
	842 where both customer locations and demands are uncertain. In particular,
	843 potential customers are not restricted to a predefined customer set but …
	844 continuously spatially distributed in a given service area. The objectiv…
	845 maximize the served demands while fulfilling vehicle capacities and time
	846 restrictions. We call this problem the VRP with stochastic customers and
	847 demands (VRPSCD). For this problem, we first propose a Markov Decision P…
	848 (MDP) formulation representing the classical centralized decision-making
	849 perspective where one decision-maker establishes the routes of all vehic…
	850 While the resulting formulation turns out to be intractable, it provides…
	851 with the ground to develop a new MDP formulation of the VRPSCD represent…
	852 decentralized decision-making framework, where vehicles autonomously est…
	853 their own routes. This new formulation allows us to develop several stra…
	854 to reduce the dimension of the state and action spaces, resulting in a
	855 considerably more tractable problem. We solve the decentralized problem …
	856 Reinforcement Learning, and in particular, we develop a Q-learning algor…
	857 featuring state-of-the-art acceleration techniques such as Replay Memory…
	858 Double Q Network. Computational results show that our method considerably
	859 outperforms two commonly adopted benchmark policies (random and heuristi…
	860 Moreover, when comparing with existing literature, we show that our appr…
	861 can compete with specialized methods developed for the particular case o…
	862 VRPSCD where customer locations and expected demands are known in advanc…
	863 Finally, we show that the value functions and policies obtained by our
	864 algorithm can be easily embedded in Rollout algorithms, thus further imp…
	865 their performances.
	866 </summary></entry><entry><id>http://arxiv.org/abs/2109.10199</id><title>…
	867 led researchers and engineers to investigate novel models for robust and
	868 reliable control of autonomous robots (navigation, obstacle detection and
	869 avoidance, etc.), especially for quadrotors in challenging contexts such…
	870 drone racing and aggressive maneuvers. Using spiking neural networks, th…
	871 models can be run on neuromorphic hardware to benefit from outstanding u…
	872 rates and high energy efficiency. Yet, low-level controllers are often
	873 neglected and remain outside of the neuromorphic loop. Designing low-lev…
	874 neuromorphic controllers is crucial to remove the standard PID, and ther…
	875 benefit from all the advantages of closing the neuromorphic loop. In this
	876 paper, we propose a parsimonious and adjustable neuromorphic PID control…
	877 endowed with a minimal number of 93 neurons sparsely connected to achieve
	878 autonomous, onboard altitude control of a quadrotor equipped with Intel'…
	879 neuromorphic chip. We successfully demonstrate the robustness of our pro…
	880 network in a set of experiments where the quadrotor is requested to reac…
	881 target altitude from take-off. Our results confirm the suitability of su…
	882 low-level neuromorphic controllers, ultimately with a very high update
	883 frequency.
	884 </summary></entry><entry><id>http://arxiv.org/abs/2109.10187</id><title>…
	885 in aerial images are displayed in arbitrary directions and usually dense…
	886 packed. Although considerable progress has been made, there are still
	887 challenges that existing regression-based rotation detectors suffer the …
	888 of discontinuous boundaries, which is directly caused by angular periodi…
	889 corner ordering. In this paper, we propose a simple effective framework …
	890 address the above challenges. Instead of directly regressing the five
	891 parameters (coordinates of the central point, width, height, and rotation
	892 angle) or the four vertices, we use the area ratio of parallelogram (ARP…
	893 accurately describe a multi-oriented object. Specifically, we regress
	894 coordinates of center point, height and width of minimum circumscribed
	895 rectangle of oriented object and three area ratios {\lambda}_1, {\lambda…
	896 {\lambda}_3. This may facilitate the offset learning and avoid the issue…
	897 angular periodicity or label points sequence for oriented objects. To fu…
	898 remedy the confusion issue of nearly horizontal objects, we employ the area…
	899 between the object and its horizontal bounding box (minimum circumscribed
	900 rectangle) to guide the selection of horizontal or oriented detection fo…
	901 object. We also propose a rotation efficient IoU loss (R-EIoU) to connec…
	902 horizontal bounding box with the three area ratios and improve the accur…
	903 the rotating bounding box. Experimental results on three remote sensing
	904 datasets including HRSC2016, DOTA and UCAS-AOD and scene text including
	905 ICDAR2015 show that our method achieves superior detection performance c…
	906 with many state-of-the-art approaches. The code and model will be coming…
	907 paper published.
	908 </summary></entry><entry><id>http://arxiv.org/abs/2109.10173</id><title>…
	909 the quality of learned policy. Hard-exploration environments are defined…
	910 huge state space and sparse rewards. In such conditions, an exhaustive
	911 exploration of the environment is often impossible, and the successful t…
	912 of an agent requires a lot of interaction steps. In this paper, we propo…
	913 exploration method called Rollback-Explore (RbExplore), which utilizes t…
	914 concept of the persistent Markov decision process, in which agents during
	915 training can roll back to visited states. We test our algorithm in the
	916 hard-exploration Prince of Persia game, without rewards and domain knowl…
	917 At all used levels of the game, our agent outperforms or shows comparable
	918 results with state-of-the-art curiosity methods with knowledge-based int…
	919 motivation: ICM and RND. An implementation of RbExplore can be found at
	920 https://github.com/cds-mipt/RbExplore.
	921 </summary></entry><entry><id>http://arxiv.org/abs/2109.10149</id><title>…
	922 feedback methods require human assessment from facilitators or peers. Th…
	923 not scalable to large crowds. We propose Interpretable Directed Diversit…
	924 automatically predict ideation quality and diversity scores, and provide…
	925 explanations - Attribution, Contrastive Attribution, and Counterfactual
	926 Suggestions - for deeper feedback on why ideations were scored (low), an…
	927 to get higher scores. These explanations provide multi-faceted feedback …
	928 users iteratively improve their ideation. We conducted think aloud and
	929 controlled user studies to understand how various explanations are used,…
	930 evaluated whether explanations improve ideation diversity and quality. U…
	931 appreciated that explanation feedback helped focus their efforts and pro…
	932 directions for improvement. This resulted in explanations improving dive…
	933 compared to no feedback or feedback with predictions only. Hence, our ap…
	934 opens opportunities for explainable AI towards scalable and rich feedbac…
	935 iterative crowd ideation.
	936 </summary></entry><entry><id>http://arxiv.org/abs/2109.10129</id><title>…
	937 domains can be expressed and learned in terms of a pool of features defi…
	938 from the domain predicates using a description logic grammar. At the sam…
	939 most description logics correspond to a fragment of $k$-variable countin…
	940 ($C_k$) for $k=2$, that has been shown to provide a tight characterizati…
	941 the expressive power of graph neural networks. In this work, we make use…
	942 these results to understand the power and limits of using graph neural n…
	943 (GNNs) for learning optimal general policies over a number of tractable
	944 planning domains where such policies are known to exist. For this, we tr…
	945 simple GNN in a supervised manner to approximate the optimal value funct…
	946 $V^{*}(s)$ of a number of sample states $s$. As predicted by the theory,…
	947 observed that general optimal policies are obtained in domains where gen…
	948 optimal value functions can be defined with $C_2$ features but not in th…
	949 requiring more expressive $C_3$ features. In addition, it is observed th…
	950 features learned are in close correspondence with the features needed to
	951 express $V^{*}$ in closed form. The theory and the analysis of the domai…
	952 us understand the features that are actually learned as well as those th…
	953 cannot be learned in this way, and let us move in a principled manner fr…
	954 combinatorial optimization approach to learning general policies to a
	955 potentially, more robust and scalable approach based on deep learning.
	956 </summary></entry><entry><id>http://arxiv.org/abs/2109.10106</id><title>…
	957 planning complex missions for heterogeneous multi-robot teams. This clas…
	958 problems involves tasks that can be executed in different ways and are
	959 associated with cross-schedule dependencies that constrain the schedules…
	960 different robots in the system. The proposed approach involves a
	961 multi-objective heuristic search of the mission, represented as a hierar…
	962 tree that defines the mission goal. This procedure outputs several favor…
	963 ways to fulfill the mission, which directly feed into the next stage of …
	964 method. We propose a distributed metaheuristic based on evolutionary
	965 computation to allocate tasks and generate schedules for the set of chos…
	966 decompositions. The method is evaluated in a simulation setup of an auto…
	967 greenhouse use case, where we demonstrate the method's ability to adapt …
	968 planning strategy depending on the available robots and the given optimi…
	969 criteria.
	970 </summary></entry><entry><id>http://arxiv.org/abs/2109.10100</id><title>…
	971 deep neural networks. However, the computation of Fisher information matr…
	972 becomes more and more difficult as the network structure turns large and
	973 complex. This paper proposes a new optimization method whose main idea i…
	974 accurately replace the natural gradient optimization by reconstructing t…
	975 network. More specifically, we reconstruct the structure of the deep neu…
	976 network, and optimize the new network using traditional gradient descent…
	977 The reconstructed network achieves the effect of the optimization way wi…
	978 natural gradient descent. Experimental results show that our optimization
	979 method can accelerate the convergence of deep network models and achieve…
	980 performance than GD while sharing its computational simplicity.
	981 </summary></entry><entry><id>http://arxiv.org/abs/2109.10086</id><title>…
	982 improving the first retriever in ranking pipelines. Learning dense embed…
	983 to conduct retrieval using efficient approximate nearest neighbors metho…
	984 proven to work well. Meanwhile, there has been a growing interest in lea…
	985 \emph{sparse} representations for documents and queries, that could inhe…
	986 from the desirable properties of bag-of-words models such as the exact m…
	987 of terms and the efficiency of inverted indexes. Introduced recently, the
	988 SPLADE model provides highly sparse representations and competitive resu…
	989 with respect to state-of-the-art dense and sparse approaches. In this pa…
	990 build on SPLADE and propose several significant improvements in terms of
	991 effectiveness and/or efficiency. More specifically, we modify the pooling
	992 mechanism, benchmark a model solely based on document expansion, and int…
	993 models trained with distillation. We also report results on the BEIR ben…
	994 Overall, SPLADE is considerably improved with more than $9$\% gains on N…
	995 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchma…
	996 </summary></entry><entry><id>http://arxiv.org/abs/2109.10085</id><title>…
	997 seen increasing importance for investment decisions. Hence, investors (a…
	998 managers and asset owners) who wanted to incorporate these issues starte…
	999 assess companies based on how they handle such topics. For this assessme…
	1000 investors rely on specialized rating agencies that issue ratings along t…
	1001 environmental, social and governance (ESG) dimensions. Such ratings allo…
	1002 to make investment decisions in favor of sustainability. However, rating
	1003 agencies base their analysis on subjective assessment of sustainability
	1004 reports, not provided by every company. Furthermore, due to human labor
	1005 involved, rating agencies are currently facing the challenge to scale up…
	1006 coverage in a timely manner.
	1007 </summary></entry><entry><id>http://arxiv.org/abs/2109.10065</id><title>…
	1008 based on accuracy, quickness, and consistency for antenna modelling. Usi…
	1009 Nntool by MATLAB, 22 different combinations of networks and training alg…
	1010 are used to predict the dimensions of a rectangular microstrip antenna u…
	1011 dielectric constant, height of substrate, and frequency of operation as …
	1012 Comparison and characterization of networks is done based on accuracy, m…
	1013 square error, and training time. Algorithms, on the other hand, are anal…
	1014 their accuracy, speed, reliability, and smoothness in the training proce…
	1015 Finally, these results are analyzed, and recommendations are made for ea…
	1016 neural network and algorithm based on uses, advantages, and disadvantage…
	1017 example, it is observed that Reduced Radial Bias network is the most acc…
	1018 network and Scaled Conjugate Gradient is the most reliable algorithm for
	1019 electromagnetic modelling. This paper will help a researcher find the op…
	1020 network and algorithm directly without doing time-taking experimentation.
	1021 </summary></entry><entry><id>http://arxiv.org/abs/2109.10057</id><title>…
	1022 network named Localization Transformer (LOTR). The proposed framework is…
	1023 direct coordinate regression approach leveraging a Transformer network to
	1024 better utilize the spatial information in the feature map. An LOTR model
	1025 consists of three main modules: 1) a visual backbone that converts an in…
	1026 image into a feature map, 2) a Transformer module that improves the feat…
	1027 representation from the visual backbone, and 3) a landmark prediction he…
	1028 directly predicts the landmark coordinates from the Transformer's
	1029 representation. Given cropped-and-aligned face images, the proposed LOTR…
	1030 trained end-to-end without requiring any post-processing steps. This pap…
	1031 introduces the smooth-Wing loss function, which addresses the gradient
	1032 discontinuity of the Wing loss, leading to better convergence than stand…
	1033 loss functions such as L1, L2, and Wing loss. Experimental results on th…
	1034 landmark dataset provided by the First Grand Challenge of 106-Point Faci…
	1035 Landmark Localization indicate the superiority of LOTR over the existing
	1036 methods on the leaderboard and two recent heatmap-based approaches.
	1037 </summary></entry><entry><id>http://arxiv.org/abs/2109.10047</id><title>…
	1038 aggregate components with shallow and simple architectures, which are li…
	1039 by the 'over-smooth' problem. To further explore the benefits from struc…
	1040 diversity and depth of GNN architectures, we propose a GNN generation pi…
	1041 with a novel two-stage search space, which aims at automatically generat…
	1042 high-performance while transferable deep GNN models in a block-wise mann…
	1043 Meanwhile, to alleviate the 'over-smooth' problem, we incorporate multip…
	1044 flexible residual connection in our search space and apply identity mapp…
	1045 the basic GNN layers. For the search algorithm, we use deep-q-learning w…
	1046 epsilon-greedy exploration strategy and reward reshaping. Extensive expe…
	1047 on real-world datasets show that our generated GNN models outperform ex…
	1048 manually designed and NAS-based ones.
	1049 </summary></entry><entry><id>http://arxiv.org/abs/2109.10034</id><title>…
	1050 key functions. This process has often been conceptualised within the fra…
	1051 of reinforcement learning, which has also gained prominence in machine l…
	1052 and artificial intelligence (AI) as a way to optimise decision-making. A…
	1053 aspect of both biological and machine reinforcement learning is the
	1054 reactivation of previously experienced episodes, referred to as replay. …
	1055 is important for memory consolidation in biological neural networks, and…
	1056 to stabilising learning in deep neural networks. Here, we review recent
	1057 developments concerning the functional roles of replay in the fields of
	1058 neuroscience and AI. Complementary progress suggests how replay might su…
	1059 learning processes, including generalisation and continual learning, aff…
	1060 opportunities to transfer knowledge across the two fields to advance the
	1061 understanding of biological and artificial learning and memory.
	1062 </summary></entry><entry><id>http://arxiv.org/abs/2109.10020</id><title>…
	1063 payment processing networks is essential for system monitoring. Multivar…
	1064 time series, aggregated from the past transaction history, can provide v…
	1065 insights for such prediction. The general multivariate time series predi…
	1066 problem has been well studied and applied across several domains, includ…
	1067 manufacturing, medical, and entomology. However, new domain-related chal…
	1068 associated with the data such as concept drift and multi-modality have s…
	1069 in addition to the real-time requirements of handling the payment transa…
	1070 data at scale. In this work, we study the problem of multivariate time s…
	1071 prediction for estimating transaction metrics associated with entities i…
	1072 payment transaction database. We propose a model with five unique compon…
	1073 estimate the transaction metrics from multi-modality data. Four of these
	1074 components capture interaction, temporal, scale, and shape perspectives,…
	1075 the fifth component fuses these perspectives together. We also propose a…
	1076 offline/online training scheme to address concept drift in the data and …
	1077 the real-time requirements. Combining the estimation model with a graphi…
	1078 user interface, the prototype transaction metric estimation system has
	1079 demonstrated its potential benefit as a tool for improving a payment pro…
	1080 company's system monitoring capability.
	1081 </summary></entry><entry><id>http://arxiv.org/abs/2109.10016</id><title>…
	1082 This task is essential because advanced video retrieval applications sho…
	1083 enable users to retrieve a precise moment from a large video corpus. We …
	1084 a novel CONtextual QUery-awarE Ranking~(CONQUER) model for effective mom…
	1085 localization and ranking. CONQUER explores query context for multi-modal…
	1086 and representation learning in two different steps. The first step deriv…
	1087 fusion weights for the adaptive combination of multi-modal video content…
	1088 second step performs bi-directional attention to tightly couple video an…
	1089 as a single joint representation for moment localization. As query conte…
	1090 fully engaged in video representation learning, from feature fusion to
	1091 transformation, the resulting feature is user-centered and has a larger
	1092 capacity in capturing multi-modal signals specific to query. We conduct …
	1093 on two datasets, TVR for closed-world TV episodes and DiDeMo for open-wo…
	1094 user-generated videos, to investigate the potential advantages of fusing…
	1095 and query online as a joint representation for moment retrieval.
	1096 </summary></entry><entry><id>http://arxiv.org/abs/2109.10011</id><title>…
	1097 intelligence, and it has been widely used to measure the abstract reason…
	1098 ability of humans. In this paper, to study the abstract reasoning capabi…
	1099 deep neural networks, we propose the first unsupervised learning method …
	1100 solving RPM problems. Since the ground truth labels are not allowed, we …
	1101 a pseudo target based on the prior constraints of the RPM formulation to
	1102 approximate the ground truth label, which effectively converts the unsup…
	1103 learning strategy into a supervised one. However, the correct answer is …
	1104 labelled by the pseudo target, and thus the noisy contrast will lead to
	1105 inaccurate model training. To alleviate this issue, we propose to improv…
	1106 model performance with negative answers. Moreover, we develop a
	1107 decentralization method to adapt the feature representation to different…
	1108 problems. Extensive experiments on three datasets demonstrate that our m…
	1109 even outperforms some of the supervised approaches. Our code is availabl…
	1110 https://github.com/visiontao/ncd.
	1111 </summary></entry><entry><id>http://arxiv.org/abs/2109.10007</id><title>…
	1112 to measure the degree of similarity between scientific papers. These app…
	1113 are intuitive, easy to put into practice, and computationally cheap. Mor…
	1114 they have been used to generate a map of science, allowing visualizing r…
	1115 field interactions. Nonetheless, these methods do not work unless two pa…
	1116 share a standard reference, limiting the two papers usability with no di…
	1117 connection. In this work, we propose to extend bibliographic coupling to…
	1118 deep neighborhood, by using graph diffusion methods. This method allows
	1119 defining similarity between any two papers, making it possible to genera…
	1120 local map of science, highlighting field organization.
	1121 </summary></entry><entry><id>http://arxiv.org/abs/2109.09975</id><title>…
	1122 trajectories of autonomous vehicles when probabilistic predictions of ot…
	1123 agents' futures are generated by deep neural networks (DNNs). The presen…
	1124 methods address a wide range of representations for uncertain predictions
	1125 including both Gaussian and non-Gaussian mixture models to predict both …
	1126 positions and control inputs conditioned on the scene contexts. We show …
	1127 the problem of risk assessment when Gaussian mixture models (GMMs) of ag…
	1128 positions are learned can be solved rapidly to arbitrary levels of accur…
	1129 with existing numerical methods. To address the problem of risk assessme…
	1130 non-Gaussian mixture models of agent position, we propose finding upper …
	1131 on risk using nonlinear Chebyshev's Inequality and sums-of-squares (SOS)
	1132 programming; they are both of interest as the former is much faster whil…
	1133 latter can be arbitrarily tight. These approaches only require higher or…
	1134 statistical moments of agent positions to determine upper bounds on risk…
	1135 perform risk assessment when models are learned for agent control inputs…
	1136 opposed to positions, we propagate the moments of uncertain control inpu…
	1137 through the nonlinear motion dynamics to obtain the exact moments of unc…
	1138 position over the planning horizon. To this end, we construct determinis…
	1139 linear dynamical systems that govern the exact time evolution of the mom…
	1140 uncertain position in the presence of uncertain control inputs. The pres…
	1141 methods are demonstrated on realistic predictions from DNNs trained on t…
	1142 Argoverse and CARLA datasets and are shown to be effective for rapidly
	1143 assessing the probability of low probability events.
	1144 </summary></entry><entry><id>http://arxiv.org/abs/2109.09968</id><title>…
	1145 games in studying natural language communication between humans and arti…
	1146 agents. However, the generalization still remains a big challenge as the…
	1147 depend critically on the complexity and variety of training tasks. In th…
	1148 paper, we address this problem by introducing a hierarchical framework b…
	1149 upon the knowledge graph-based RL agent. In the high level, a meta-polic…
	1150 executed to decompose the whole game into a set of subtasks specified by
	1151 textual goals, and select one of them based on the KG. Then a sub-policy…
	1152 low level is executed to conduct goal-conditioned reinforcement learning…
	1153 carry out experiments on games with various difficulty levels and show t…
	1154 proposed method enjoys favorable generalizability.
	1155 </summary></entry><entry><id>http://arxiv.org/abs/2109.09960</id><title>…
	1156 effectively exploit the unlabeled hard regions for semi-supervised medic…
	1157 image segmentation. The MC-Net+ model is motivated by the observation th…
	1158 models trained with limited annotations are prone to output highly uncer…
	1159 and easily mis-classified predictions in the ambiguous regions (e.g. adh…
	1160 edges or thin branches) for the image segmentation task. Leveraging these
	1161 region-level challenging samples can make the semi-supervised segmentati…
	1162 model training more effective. Therefore, our proposed MC-Net+ model con…
	1163 of two new designs. First, the model contains one shared encoder and mul…
	1164 slightly different decoders (i.e. using different up-sampling strategies)…
	1165 statistical discrepancy of multiple decoders' outputs is computed to den…
	1166 model's uncertainty, which indicates the unlabeled hard regions. Second,…
	1167 mutual consistency constraint is enforced between one decoder's probabil…
	1168 output and other decoders' soft pseudo labels. In this way, we minimize …
	1169 model's uncertainty during training and force the model to generate inva…
	1170 and low-entropy results in such challenging areas of unlabeled data, in …
	1171 to learn a generalized feature representation. We compared the segmentat…
	1172 results of the MC-Net+ with five state-of-the-art semi-supervised approa…
	1173 three public medical datasets. Extension experiments with two common
	1174 semi-supervised settings demonstrate the superior performance of our mod…
	1175 other existing methods, which sets a new state of the art for semi-super…
	1176 medical image segmentation.
	1177 </summary></entry><entry><id>http://arxiv.org/abs/2109.09946</id><title>…
	1178 case data has long been recognized. Here, we study the problem of identi…
	1179 and measuring biases in large-scale legal case data from an algorithmic
	1180 fairness perspective. Our approach utilizes two regression models: A bas…
	1181 that represents the decisions of a "typical" judge as given by the data …
	1182 "fair" judge that applies one of three fairness concepts. Comparing the
	1183 decisions of the "typical" judge and the "fair" judge allows for quantif…
	1184 biases across demographic groups, as we demonstrate in four case studies…
	1185 criminal data from Cook County (Illinois).
	1186 </summary></entry><entry><id>http://arxiv.org/abs/2109.09906</id><title>…
	1187 visual or audio content. This typically augments the use of technologies…
	1188 as AI and ML by allowing the use of natural speech for searching by keywords…
	1189 video descriptions. Prior research has successfully provided a number of
	1190 solutions for speech to text, in the case of human speech, but this ar…
	1191 aims to investigate possible solutions to retrieve sound events based on…
	1192 natural language query, and estimate how effective and accurate they are…
	1193 this study, we specifically focus on the YamNet, AlexNet, and ResNet-50
	1194 pre-trained models to automatically classify audio samples using their
	1195 respective melspectrograms into a number of predefined classes. The pred…
	1196 classes can represent sounds associated with actions within a video frag…
	1197 Two tests are conducted to evaluate the performance of the models on two
	1198 separate problems: audio classification and intervals retrieval based on…
	1199 natural language query. Results show that the benchmarked models are com…
	1200 in terms of performance, with YamNet slightly outperforming the other two
	1201 models. YamNet was able to classify single fixed-size audio samples with…
	1202 accuracy and 68.75% precision while its average accuracy on intervals re…
	1203 was 71.62% and precision was 41.95%. The investigated method may be embe…
	1204 into an automated event marking architecture for streaming services.
	1205 </summary></entry><entry><id>http://arxiv.org/abs/2109.09904</id><title>…
	1206 own representations, there is significant discontent about their inscrut…
	1207 and the attendant problems in their ability to interact with humans. Whi…
	1208 alternatives such as neuro-symbolic approaches have been proposed, there…
	1209 lack of consensus on what they are about. There are often two independent
	1210 motivations (i) symbols as a lingua franca for human-AI interaction and …
	1211 symbols as (system-produced) abstractions used in its internal reasoning.…
	1212 jury is still out on whether AI systems will need to use symbols in their
	1213 internal reasoning to achieve general intelligence capabilities. Whateve…
	1214 answer there is, the need for (human-understandable) symbols in human-AI
	1215 interaction seems quite compelling. Symbols, like emotions, may well not…
	1216 sine qua non for intelligence per se, but they will be crucial for AI sy…
	1217 to interact with us humans--as we can neither turn off our emotions nor …
	1218 without our symbols. In particular, in many human-designed domains, huma…
	1219 would be interested in providing explicit (symbolic) knowledge and advic…
	1220 expect machine explanations in kind. This alone requires AI systems to a…
	1221 do their I/O in symbolic terms. In this blue sky paper, we argue this po…
	1222 view, and discuss research directions that need to be pursued to allow f…
	1223 type of human-AI interaction.
	1224 </summary></entry><entry><id>http://arxiv.org/abs/2109.09889</id><title>…
	1225 beyond the scope of an RL policy. Such states may make the RL system uns…
	1226 impede its deployment in real scenarios. In this paper, we propose a sim…
	1227 effective anomaly detection framework for deep RL algorithms that
	1228 simultaneously considers random, adversarial and out-of-distribution~(OO…
	1229 state outliers. In particular, we attain the class-conditional distribut…
	1230 for each action class under the Gaussian assumption, and rely on these
	1231 distributions to discriminate between inliers and outliers based on Maha…
	1232 Distance~(MD) and Robust Mahalanobis Distance. We conduct extensive expe…
	1233 on Atari games that verify the effectiveness of our detection strategies…
	1234 the best of our knowledge, we present the first in-detail study of stati…
	1235 and adversarial anomaly detection in deep RL algorithms. This simple uni…
	1236 anomaly detection paves the way towards deploying safe RL systems in rea…
	1237 applications.
	1238 </summary></entry><entry><id>http://arxiv.org/abs/2109.09876</id><title>…
	1239 extended actions, such as options, that can provide benefits in problems
	1240 requiring extensive exploration. One promising approach that learns these
	1241 options end-to-end is the option-critic (OC) framework. We examine and s…
	1242 this paper that OC does not decompose a problem into simpler sub-problem…
	1243 instead increases the size of the search over policy space with each opt…
	1244 considering the entire state space during learning. This issue can resul…
	1245 practical limitations of this method, including sample inefficient learn…
	1246 address this problem, we introduce Context-Specific Representation Abstr…
	1247 for Deep Option Learning (CRADOL), a new framework that considers both t…
	1248 abstraction and context-specific representation abstraction to effective…
	1249 reduce the size of the search over policy space. Specifically, our method
	1250 learns a factored belief state representation that enables each option t…
	1251 a policy over only a subsection of the state space. We test our method a…
	1252 hierarchical, non-hierarchical, and modular recurrent neural network bas…
	1253 demonstrating significant sample efficiency improvements in challenging
	1254 partially observable environments.
	1255 </summary></entry><entry><id>http://arxiv.org/abs/2109.09862</id><title>…
	1256 pipelines (Jauhiainen et al.,2019) and is not a solved problem in real-w…
	1257 settings. We present a lightweight and effective language identifier tha…
	1258 robust to changes of domain and to the absence of copious training data.
	1259 </summary></entry><entry><id>http://arxiv.org/abs/2109.09861</id><title>…
	1260 for autonomous driving, empirical evidence shows that there are still op…
	1261 questions around dealing with the challenges of common knowledge assumpt…
	1262 well as modeling bounded rationality. To address some of these practical
	1263 challenges, we develop a framework of generalized dynamic cognitive hier…
	1264 for both modelling naturalistic human driving behavior as well as behavi…
	1265 planning for autonomous vehicles (AV). This framework is built upon a ri…
	1266 model of level-0 behavior through the use of automata strategies, an
	1267 interpretable notion of bounded rationality through safety and maneuver
	1268 satisficing, and a robust response for planning. Based on evaluation on …
	1269 large naturalistic datasets as well as simulation of critical traffic
	1270 scenarios, we show that i) automata strategies are well suited for level…
	1271 behavior in a dynamic level-k framework, and ii) the proposed robust res…
	1272 to a heterogeneous population of strategic and non-strategic reasoners c…
	1273 an effective approach for game theoretic planning in AV.
	1274 </summary></entry><entry><id>http://arxiv.org/abs/2109.09844</id><title>…
	1275 monitoring of multiple sclerosis is an important component of successful
	1276 disease management. Prior studies have established that multiple scleros…
	1277 correlated with speech discrepancies. Early research using objective aco…
	1278 measurements has discovered measurable dysarthria.
	1279 </summary></entry><entry><id>http://arxiv.org/abs/2109.09833</id><title>…
	1280 the noise-induced dynamics during training deep neural networks by
	1281 gradient-based optimizers. Specifically, we firstly show that the stocha…
	1282 gradient noise possesses finite variance, and therefore the classical Ce…
	1283 Limit Theorem (CLT) applies; this indicates that the gradient noise is
	1284 asymptotically Gaussian. Such an asymptotic result validates the wide-ac…
	1285 assumption of Gaussian noise. We clarify that the recently observed phen…
	1286 of heavy tails within gradient noise may not be intrinsic properties, bu…
	1287 consequence of insufficient mini-batch size; the gradient noise, which i…
	1288 of limited i.i.d. random variables, has not reached the asymptotic regim…
	1289 CLT, thus deviates from Gaussian. We quantitatively measure the goodness…
	1290 Gaussian approximation of the noise, which supports our conclusion. Seco…
	1291 we analyze the noise-induced dynamics of stochastic gradient descent usi…
	1292 Langevin equation, granting for momentum hyperparameter in the optimizer…
	1293 physical interpretation. We then proceed to demonstrate the existence of…
	1294 steady-state distribution of stochastic gradient descent and approximate…
	1295 distribution at a small learning rate.
	1296 </summary></entry><entry><id>http://arxiv.org/abs/2109.09829</id><title>…
	1297 required to be processed on regular basis has pushed processing to the e…
	1298 the computing systems. Deploying advanced Neural Networks (NN), such as …
	1299 neural networks (DNNs) and spiking neural networks (SNNs), that offer
	1300 state-of-the-art results on resource-constrained edge devices is challen…
	1301 due to the stringent memory and power/energy constraints. Moreover, these
	1302 systems are required to maintain correct functionality under diverse sec…
	1303 and reliability threats. This paper first discusses existing approaches …
	1304 address energy efficiency, reliability, and security issues at different…
	1305 layers, i.e., hardware (HW) and software (SW). Afterward, we discuss how…
	1306 further improve the performance (latency) and the energy efficiency of E…
	1307 systems through HW/SW-level optimizations, such as pruning, quantization…
	1308 approximation. To address reliability threats (like permanent and transi…
	1309 faults), we highlight cost-effective mitigation techniques, like fault-a…
	1310 training and mapping. Moreover, we briefly discuss effective detection a…
	1311 protection techniques to address security threats (like model and data
	1312 corruption). Towards the end, we discuss how these techniques can be com…
	1313 in an integrated cross-layer framework for realizing robust and
	1314 energy-efficient Edge AI systems.
	1315 </summary></entry><entry><id>http://arxiv.org/abs/2109.09825</id><title>…
	1316 many others, unrealized (null) arguments in certain syntactic positions …
	1317 refer to a previously introduced entity, and are thus called anaphoric z…
	1318 pronouns. The existing resources for studying anaphoric zero pronoun
	1319 interpretation are however still limited. In this paper, we use five data
	1320 augmentation methods to generate and detect anaphoric zero pronouns
	1321 automatically. We use the augmented data as additional training material…
	1322 two anaphoric zero pronoun systems for Arabic. Our experimental results …
	1323 that data augmentation improves the performance of the two systems, surp…
	1324 the state-of-the-art results.
	1325 </summary></entry><entry><id>http://arxiv.org/abs/2109.09809</id><title>…
	1326 machine learning systems. An increasingly popular approach has been to s…
	1327 provide \emph{counterfactual instance explanations}. These specify close
	1328 possible worlds in which, contrary to the facts, a person receives their
	1329 desired decision from the machine learning system. This paper will draw …
	1330 literature from the philosophy of science to argue that a satisfactory
	1331 explanation must consist of both counterfactual instances and a causal e…
	1332 (or system of equations) that support the counterfactual instances. We w…
	1333 show that counterfactual instances by themselves explain little. We will
	1334 further illustrate how explainable AI methods that provide both causal
	1335 equations and counterfactual instances can successfully explain machine
	1336 learning predictions.
	1337 </summary></entry><entry><id>http://arxiv.org/abs/2109.09807</id><title>…
	1338 risk associated with dynamic occlusion, i.e., occlusion caused by other
	1339 vehicles in traffic. Based on the theory of hypergames, we develop a nov…
	1340 multi-agent dynamic occlusion risk (DOR) measure for assessing situation…
	1341 in dynamic occlusion scenarios. Furthermore, we present a white-box,
	1342 scenario-based, accelerated safety validation framework for assessing sa…
	1343 strategic planners in AV. Based on evaluation over a large naturalistic
	1344 database, our proposed validation method achieves a 4000% speedup compar…
	1345 direct validation on naturalistic data, a more diverse coverage, and abi…
	1346 generalize beyond the dataset and generate commonly observed dynamic occ…
	1347 crashes in traffic in an automated manner.
	1348 </summary></entry><entry><id>http://arxiv.org/abs/2109.09791</id><title>…
	1349 either numerical methods for the solution of dynamic model equations or
	1350 data-driven artificial intelligence algorithms. Within this latter frame…
	1351 the present paper illustrates how a deep learning method, exploiting vid…
	1352 radar reflectivity frames as input, can be used to realize a warning mac…
	1353 able to sound timely alarms of possible severe thunderstorm events. From…
	1354 technical viewpoint, the computational core of this approach is the use …
	1355 value-weighted skill score for both transforming the probabilistic outco…
	1356 the deep neural network into binary classification and assessing the
	1357 forecasting performances. The warning machine has been validated against
	1358 weather radar data recorded in the Liguria region, in Italy,
	1359