Dates: July 10-11, 2018
Address: University of Iceland - Sæmundargata 4, 101 Reykjavík
Room: University Centre - Háskólatorg - HT-101
July 10, 2018
|14:15||Analyzing and Predicting Human Activities in Video||Greg Mori (SFU)|
|15:00||Structured Deep Learning of Human Motion||Christian Wolf (INSA-Lyon/INRIA)|
|16:00||Handling Missing Annotations for Semantic Segmentation of Medical Images||Nicolas Thome (Cnam Paris)|
|16:45||Geometric Deep Learning and Applications||Gudmundur Einarsson (Technical University of Denmark)|
|20:30||Buffet dinner at Satt Restaurant||Nauthólsvegur 52, 101 Reykjavík (4900 ISK per person)|
|Note time change!|
July 11, 2018
|09:30||Deep Neural Network Compression||Fred Tung (SFU)|
|10:15||Negative Evidence Pooling in Deep ConvNets||Thibaut Durand (SFU)|
|11:15||Deep Preference Neural Network for Move Prediction in Board Games||Tómas Philip Rúnarsson (University of Iceland)|
|12:00||Lunch at Stúdentakjallarinn||
|13:30||From Design to Search in High-Dimensional Spaces||Graham Taylor (University of Guelph/Vector Institute)|
|14:15||Using a Neural Network to Estimate Field Strength of Groundwave-Propagated Radio Signals||Gísli Bergur Sigurðsson|
|14:35||From Smileys to Smileycoins: Using a Cryptocurrency in Education||Anna Helga Jónsdóttir (University of Iceland)|
- Analyzing and Predicting Human Activities in Video - Greg Mori
- Structured Deep Learning of Human Motion - Christian Wolf
- Handling Missing Annotations for Semantic Segmentation of Medical Images - Nicolas Thome
- Geometric Deep Learning and Applications - Gudmundur Einarsson
- Deep Neural Network Compression - Fred Tung
- Negative Evidence Pooling in Deep ConvNets - Thibaut Durand
- Deep Preference Neural Network for Move Prediction in Board Games - Tómas Philip Rúnarsson
- From Design to Search in High-Dimensional Spaces - Graham Taylor
Visual recognition involves reasoning about structured relations at multiple levels of detail. For example, human behaviour analysis requires a comprehensive labeling covering everything from individual low-level actions, through pairwise interactions, up to high-level events. In this talk I will present recent work by our group on building deep learning approaches capable of modeling these structures. I will present models for learning trajectory features that represent individual human actions, and hierarchical temporal models for group activity recognition. I will also demonstrate methods for learning where to look in internet videos to efficiently detect human actions.
Visual data consists of massive numbers of variables, and making sense of its content requires modeling their complex dependencies and relationships. This talk presents an overview of our past activities, which aim at enforcing coherence in this large ensemble of observed and latent variables and at inferring estimates from it. In particular, the presentation covers work on attention mechanisms for video analysis, where structure in the data is not imposed but predicted from the input through a fully trained model.
Application-wise, we address human action recognition from RGB data and study the role of articulated pose and of visual attention mechanisms for this task. Articulated pose is well established as an intermediate representation and is capable of providing precise cues relevant to human motion and behavior. We explore how articulated pose can be complemented, and in some cases replaced, by mechanisms that draw attention to local positions in space and time. This allows us to model interactions between humans and relevant objects in the scene, as well as regularities between the objects themselves.
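As a toy illustration of the attention idea in this abstract, the following NumPy sketch scores a set of local spatio-temporal features and pools them with softmax weights. It is illustrative only: the fixed random scoring vector stands in for a learned component, and none of the names come from the speaker's model.

```python
import numpy as np

def spatial_attention(features, score_w):
    """Soft attention over local positions: score each position,
    softmax the scores into weights, return the weighted summary."""
    scores = features @ score_w                       # one score per position
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over positions
    summary = weights @ features                      # attended feature vector
    return weights, summary

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))   # 16 spatio-temporal positions, 8-dim features
w = rng.normal(size=8)             # stand-in for a learned scoring vector
att, ctx = spatial_attention(feats, w)
```

The weights sum to one, so the summary vector is a convex combination of the local features; in a real model the scoring vector would itself be predicted from the input.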
Annotation of medical images for semantic segmentation is a difficult and very time-consuming task. Moreover, clinical experts often focus on specific anatomical structures and thus produce partially annotated images. In this talk, I introduce SMILE, a new deep convolutional neural network which addresses the issue of learning with incomplete ground truth. SMILE aims to identify ambiguous labels and ignore them during training, so that incorrect or noisy information is not propagated. A second contribution is SMILEr, which uses SMILE as initialization for automatically relabeling missing annotations using a curriculum strategy. Experiments on three organ classes (liver, stomach, pancreas) show the relevance of the proposed approach for semantic segmentation: with 70% of annotations missing, SMILEr performs on par with a baseline trained with complete ground-truth annotations.
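The basic mechanism of ignoring unannotated pixels in the loss can be sketched as follows. This is a minimal NumPy illustration, not the SMILE code itself; the `IGNORE` sentinel convention is an assumption made for this sketch.

```python
import numpy as np

IGNORE = -1  # sentinel label marking a missing/ambiguous annotation

def masked_cross_entropy(logits, labels):
    """Per-pixel cross-entropy that skips pixels labelled IGNORE,
    so missing annotations contribute no training signal.
    logits: (pixels, classes); labels: (pixels,) class indices or IGNORE."""
    mask = labels != IGNORE
    z = logits[mask]
    y = labels[mask]
    z = z - z.max(axis=1, keepdims=True)              # stable softmax
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

logits = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
labels = np.array([0, 1, IGNORE])   # third pixel has no annotation
loss = masked_cross_entropy(logits, labels)
```

Because the masked pixel is dropped before the loss is computed, its logits have no effect on the result, which is the property the abstract describes.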
Geometric Deep Learning (GDL) concerns the problem of applying deep learning-based approaches to graph and manifold data. I will focus my attention on graphs and meshes obtained from 3D scanned faces. We will tackle the problem of performing landmark prediction on this kind of data, and demonstrate how the resulting model can be applied to a different imaging modality.
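A minimal example of the kind of building block used in deep learning on graphs is a graph-convolution layer. The sketch below is a simplified Kipf-and-Welling-style layer in NumPy; the tiny graph, features, and weights are illustrative and not taken from the talk.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: add self-loops, symmetrically
    normalise the adjacency, aggregate neighbour features, project, ReLU."""
    a_hat = adj + np.eye(adj.shape[0])               # self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU

# a tiny 4-node mesh-like graph
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.eye(4)                                        # one-hot node features
w = np.random.default_rng(1).normal(size=(4, 2))
h = gcn_layer(adj, x, w)                             # (4, 2) node embeddings
```

Stacking such layers lets per-vertex predictions (e.g. landmark probabilities on a face mesh) depend on progressively larger neighbourhoods.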
Deep neural networks enable state-of-the-art accuracy on visual recognition tasks such as image classification and object detection. However, modern deep networks contain millions of learned connections; a more efficient utilization of computation resources would assist in a variety of deployment scenarios, from embedded platforms with resource constraints to computing clusters running ensembles of networks.
In this talk, I will first consider the common scenario of adapting a pre-trained neural network to a narrower, specialized image domain. I will introduce the fine-pruning method, which jointly fine-tunes and compresses the pre-trained network to produce an efficient network tailored to the target domain. Next, I will consider the general scenario of compressing a pre-trained neural network. I will present the CLIP-Q method (Compression Learning by In-Parallel Pruning-Quantization), which performs weight pruning and quantization jointly, and in parallel with fine-tuning. CLIP-Q compresses AlexNet by 51-fold, GoogLeNet by 10-fold, ResNet by 15-fold, and MobileNet by 7-fold, while preserving the uncompressed network accuracies on ImageNet.
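To make the pruning-plus-quantization idea concrete, here is a heavily simplified NumPy sketch: magnitude pruning followed by k-level weight sharing. The function name, thresholds, and sequential procedure are illustrative stand-ins; the actual CLIP-Q method performs pruning and quantization jointly, in parallel with fine-tuning.

```python
import numpy as np

def prune_and_quantize(weights, prune_frac=0.5, n_levels=4):
    """Illustrative compression: zero out the smallest-magnitude weights,
    then snap the survivors to n_levels shared values (bin means)."""
    w = weights.copy()
    thresh = np.quantile(np.abs(w), prune_frac)
    w[np.abs(w) < thresh] = 0.0                       # magnitude pruning
    nz = w != 0
    if nz.any():
        edges = np.quantile(w[nz], np.linspace(0, 1, n_levels + 1))
        bins = np.digitize(w[nz], edges[1:-1])        # bin index per weight
        centers = np.array([w[nz][bins == b].mean() if (bins == b).any()
                            else 0.0 for b in range(n_levels)])
        w[nz] = centers[bins]                         # weight sharing
    return w

rng = np.random.default_rng(0)
w = rng.normal(size=100)
wc = prune_and_quantize(w)   # ~50% zeros, at most 4 distinct nonzero values
```

After this step the network needs to store only the codebook of shared values plus a small index per surviving weight, which is where the large compression ratios come from.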
In this talk I discuss how to adapt deep architectures to complex scene analysis (large and cluttered backgrounds, off-center objects, variable object sizes). I first present the limitations of modern deep ConvNet architectures when dealing with large and complex images. To process large images, several methods use Fully Convolutional Networks (FCNs). An FCN preserves spatial information throughout the network, but requires a global pooling step to predict a class label. Several recent approaches differ only in how, and where inside the network, this pooling is performed. I detail several block combinations in deep architectures for achieving global pooling, and compare different pooling functions. Our WILDCAT architecture is used to illustrate the different strategies. Results and evaluations on different datasets for visual classification tasks will support (or not) our statements. Finally, I present how the WILDCAT architecture can be used for weakly-supervised pointwise localization and semantic segmentation.
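The pooling alternatives can be sketched in NumPy as follows. The `wildcat` branch is only a loose caricature of top-instance plus negative-evidence pooling; the constant `0.6` and `k = 3` are arbitrary illustrative choices, not the paper's parameters.

```python
import numpy as np

def global_pool(fmap, kind="avg"):
    """Collapse an FCN's (classes, H, W) score maps to per-class scores.
    'avg' and 'max' are the classic choices; 'wildcat' mixes the top
    positive regions with the most negative ("negative evidence") ones."""
    if kind == "avg":
        return fmap.mean(axis=(1, 2))
    if kind == "max":
        return fmap.max(axis=(1, 2))
    if kind == "wildcat":
        k = 3                                         # regions per class
        flat = np.sort(fmap.reshape(fmap.shape[0], -1), axis=1)
        return flat[:, -k:].mean(axis=1) + 0.6 * flat[:, :k].mean(axis=1)
    raise ValueError(kind)

maps = np.random.default_rng(2).normal(size=(5, 8, 8))  # 5 classes, 8x8 maps
avg = global_pool(maps, "avg")
mx = global_pool(maps, "max")
```

Average pooling spreads the gradient over the whole map while max pooling concentrates it on one location; the mixed variant sits between the two, which is what makes it usable for weakly-supervised localization.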
The training of deep neural networks for move prediction in board games using comparison training is studied. Specifically, the aim is to predict moves for the game Othello from championship tournament game data. A general deep preference neural network will be presented, based on a twenty-year-old model by Tesauro. Over-fitting becomes an immediate concern when training deep preference neural networks, and it will be shown how dropout may combat this problem to a certain extent. It is also illustrated how classification test accuracy does not necessarily correspond to move accuracy, and the key difference between preference training and single-label classification is discussed. The careful use of dropout coupled with richer game data produces an evaluation function that is a better move predictor, but will not necessarily produce a stronger game player.
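The comparison-training objective can be sketched as a pairwise logistic loss: the network scores each legal move, and training pushes the expert's move above every alternative. This is an illustrative reading of Tesauro-style preference training, not the speaker's exact formulation.

```python
import numpy as np

def preference_loss(score_best, score_other):
    """Pairwise preference term: -log sigmoid(s_best - s_other),
    small when the expert's move already outscores the alternative."""
    margin = score_best - score_other
    return np.log1p(np.exp(-margin))

# toy position: the expert's move scores 2.0, three legal alternatives
s_best = 2.0
others = np.array([0.5, 1.0, -0.3])
loss = preference_loss(s_best, others).mean()
```

Unlike single-label classification, only score *differences* between moves in the same position matter, which is why classification-style test accuracy need not track move-prediction accuracy.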
There is no doubt that machine learning is changing the way scientists and engineers perform their craft. In simple terms, it is “software writing software”. Software that is not constrained by the linearity of human thinking, or by the limits of our knowledge, has two huge implications. The first is that machine learning can write software that is more accurate and systematic than humans can. The second is that it can write software to solve problems that are currently out of our reach.
In this talk, I will discuss the transition from human-driven design to algorithmic search in high-dimensional spaces. I will use software engineering as an example, but also describe the more general implications for engineering design. I will profile several examples of machine learning’s use in design, from multi-modal data processing systems to micro-hydro generators.
- Greg Mori
- Christian Wolf
- Graham Taylor
Greg Mori received the Ph.D. degree in Computer Science from the University of California, Berkeley in 2004. He received an Hon. B.Sc. in Computer Science and Mathematics with High Distinction from the University of Toronto in 1999. He spent one year (1997-1998) as an intern at Advanced Telecommunications Research (ATR) in Kyoto, Japan. He spent part of 2014-2015 as a Visiting Scientist at Google in Mountain View, CA. After graduating from Berkeley, he returned home to Vancouver and is currently a Professor in the School of Computing Science at Simon Fraser University and Research Director for Borealis AI Vancouver. Dr. Mori's research interests are in computer vision and machine learning. Dr. Mori has served on the organizing committees of the major computer vision conferences (CVPR, ECCV, ICCV). Dr. Mori is an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and an Editorial Board Member of the International Journal of Computer Vision (IJCV). He will be a Program Chair for CVPR 2020.
Christian Wolf has been an associate professor (Maître de Conférences, HDR) at INSA de Lyon and LIRIS UMR 5205, a CNRS laboratory, since 2005. He is interested in computer vision and machine learning, in particular deep learning and the visual analysis of complex scenes in motion: gesture and activity recognition and pose estimation. In his work he puts an emphasis on models of complex interactions, on structured models, graphical models, and on deep learning. He received his MSc in computer science from Vienna University of Technology (TU Wien) in 2000, and a PhD in computer science from INSA de Lyon, France, in 2003. In 2012 he obtained the habilitation diploma, also from INSA de Lyon.
Graham Taylor is a Canada Research Chair in Machine Learning at the University of Guelph, a CIFAR Azrieli Global Scholar, Academic Director of NextAI, and a member of the Vector Institute for Artificial Intelligence. His research aims to discover new algorithms and architectures for deep learning: the automatic construction of hierarchical algorithms from high-dimensional, unstructured data. He is especially interested in time series, having applied his work to better understand human and animal behaviour, environmental data (climate or agricultural), audio (music or speech), and financial time series. His work also intersects high-performance computing, investigating better ways to leverage hardware accelerators to cope with the challenges of large-scale machine learning. He co-organizes the annual CIFAR Deep Learning Summer School, and has trained more than 50 students and staff members on AI-related projects.
Machine Learning Research Group, 2018