15th Annual Workshop of
The Australasian Language Technology Association

Queensland University of Technology, Brisbane, Australia

6th - 8th December 2017

The proceedings are now available in the ACL Anthology.

ALTA 2017 Programme

Below is the tentative programme. (This may be subject to change closer to the conference date.)

6th December 2017 (Wednesday) Tutorial

12:30 Registration
Tutorial Session (Presenter: Ben Hachey)
Room P419
13:00 Tutorial Part I: From Zero to Hero
14:15 Break
14:30 Tutorial Part II: Live Shared Task
15:45 Break
16:00 Tutorial Part III: Wild Blue Yonder
17:00 End of Tutorial

7th December 2017 (Thursday) Day 1

8:30 Registration
9:00 Opening
Room P421
Keynote 1 (Chair: Bevan Koopman)
Room P421
9:15 Dan Russell, Google: What do you really need to know? Learning and knowing in the age of the Internet
10:15 Morning Tea
Session 1: Machine Learning and Applications (Chair: Massimo Piccardi)
Room P419
10:45 Leonardo Dos Santos Pinheiro and Mark Dras: Stock Market Prediction with Deep Learning: A Character-based Neural Language Model for Event-based Trading
11:05 Fei Liu, Trevor Cohn and Timothy Baldwin: Improving End-to-End Memory Networks with Unified Weight Tying
11:25 Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin and Julian Brooke: Joint Sentence-Document Model for Manifesto Text Analysis
11:45 Ming Liu, Gholamreza Haffari, Wray Buntine and Michelle Ananda-Rajah: Leveraging Linguistic Resources for Improving Neural Text Classification
12:05 Hamideh Hajiabadi, Diego Molla-Aliod and Reza Monsefi: On Extending Neural Networks with Loss Ensembles for Text Classification
12:15 Lunch
Keynote 2 (Chair: Cecile Paris)
Room P421
13:15 Lewis Mitchell, University of Adelaide: Characterising information and happiness in online social activity
Session 2: Special Session (Chair: Falk Scholer)
Room P421
14:15 Shiwei Zhang, Xiuzhen Zhang and Jeffrey Chan: A Word-Character Convolutional Neural Network for Language-Agnostic Twitter Sentiment Analysis
14:30 Lance De Vine, Shlomo Geva and Peter Bruza: Efficient Analogy Completion with Word Embedding Clusters
14:45 Aili Shen, Jianzhong Qi and Timothy Baldwin: A Hybrid Model for Quality Assessment of Wikipedia Articles
15:05 Diego Molla-Aliod: Towards the Use of Deep Reinforcement Learning with Global Policy for Query-based Extractive Summarisation
15:15 Afternoon Tea
Session 3: Translation and Low Resource Languages (Chair: Reza Haffari)
Room P419
15:45 Inigo Jauregi Unanue, Lierni Garmendia Arratibel, Ehsan Zare Borzeshi and Massimo Piccardi: English-Basque Statistical and Neural Machine Translation
16:05 Oliver Adams, Trevor Cohn, Graham Neubig and Alexis Michaud: Phonemic Transcription of Low-Resource Tonal Languages
16:25 Hanieh Poostchi, Ehsan Zare Borzeshi and Massimo Piccardi: BiLSTM-CRF for Persian Named-Entity Recognition
16:45 End of Day 1
19:00 Dinner

8th December 2017 (Friday) Day 2

Keynote 3 (Chair: Mark Carman)
Room P421
9:15 Victor Kovalev, Redbubble: Solving hard problems at massive scale – applied data science research approach at Redbubble
10:15 Morning Tea
Session 4: Computational Linguistics and Information Extraction (Chair: Mark Dras)
Room P419
10:45 Dat Quoc Nguyen, Thanh Vu, Dai Quoc Nguyen, Mark Dras and Mark Johnson: From Word Segmentation to POS Tagging for Vietnamese
10:55 Shunichi Ishihara: A Comparative Study of Two Statistical Modelling Approaches for Estimating Multivariate Likelihood Ratios in Forensic Voice Comparison
11:15 Katharine Cheng, Timothy Baldwin and Karin Verspoor: Automatic Negation and Speculation Detection in Veterinary Clinical Text
11:35 Xiang Dai, Sarvnaz Karimi and Cecile Paris: Medication and Adverse Event Extraction from Noisy Text
11:55 Maria Myunghee Kim: Incremental Knowledge Acquisition Approach for Information Extraction on both Semi-structured and Unstructured Text from the Open Domain Web
12:15 Lunch
Keynote 4 (Chair: Stephen Wan)
Room P421
13:15 Robert Dale, Language Technology Group Pty Ltd: Commercialised NLP: The state of the art
14:15 Poster Session
Rooms S407 and S408 (S Block)
Afternoon Tea
Shared Task Session (Chair: Diego Molla-Aliod)
Room P419
15:30 Diego Molla-Aliod and Steve Cassidy: Overview of the 2017 ALTA Shared Task: Correcting OCR Errors
Gitansh Khirbat: OCR Post-Processing Text Correction using Simulated Annealing (OPTeCA)
Yufei Wang: SuperOCR for ALTA 2017 Shared Task
16:00 Best Paper and Poster Presentation Awards
16:15 Business Meeting
16:45 Closing
17:00 End of Day 2

Invited Keynotes

Title: Characterising Information and Happiness in Online Social Activity

Speaker: Lewis Mitchell (Lecturer in Applied Mathematics, University of Adelaide)

Lewis’s research focusses on large-scale methods for extracting useful information from online social networks, and on mathematical techniques for inference and prediction using these data. He works on building tools for real-time estimation of social phenomena such as happiness from written text, and prediction of population-level events like disease outbreaks, elections, and civil unrest.


Understanding the nature of influence and information propagation in social networks is of clear societal importance, as they form the basis for phenomena like "echo chambers" and "emotional contagion". However, these concepts remain surprisingly ill-defined. In studies of large online social networks, proxies for influence and information are routinely employed, leading to confusion as to whether the phenomena they underlie actually exist. In this talk I will demonstrate how online social media streams can be used as proxies for population-level health characteristics such as obesity and happiness, and introduce information-theoretic tools for constructing social networks from underlying information flows between individuals. I will present results relating individual predictability to popularity and contact volume, and introduce a paradigmatic mathematical model of information flow over social networks.

Title: Commercialised NLP: The State of the Art

Speaker: Robert Dale (Principal Consultant, Language Technology Group Pty Ltd)

Robert Dale runs the Language Technology Group, an independent consultancy providing unbiased advice to corporations and businesses on the selection and deployment of NLP technologies. Until recently, he was Chief Technology Officer of Arria NLG, where he led the development of a cloud-based natural language generation tool; prior to joining Arria in 2012, he held a chair in the Department of Computing at Macquarie University in Sydney, where he was Director of that university's Centre for Language Technology. After receiving his PhD from the University of Edinburgh in 1989, he taught there for several years before moving to Sydney in 1994. He played a foundational role in building up the NLP community in Australia, and was editor in chief of the Computational Linguistics journal from 2003 to 2012. He writes a semi-regular column titled 'Industry Watch' for the Journal of Natural Language Engineering.


The last few years have seen a tremendous surge in commercial interest in Artificial Intelligence, and with it, a widespread recognition that technologies based on Natural Language Processing can support valuable commercial applications. In this talk, I'll aim to give a comprehensive picture of the commercial NLP landscape, focussing on what I see as the key categories of activity: [1] virtual assistants, including chatbots; [2] text analytics and text mining technologies; [3] machine translation; [4] natural language generation; and [5] text correction technologies. In each case my goal is to sketch the history of work in the area, to identify the major players, and to give a realistic appraisal of the state of the art.