Ankit Kumar Srivastava, B.S., M.A., Ph.D.

Researcher, DFKI Language Technology Lab,
German Research Center for Artificial Intelligence

DFKI GmbH, Alt-Moabit 91c,
10559 Berlin, Germany

T: +49 30 238 95 1846
E: Ankit DOT Srivastava AT dfki DOT de

I am a computer scientist working on the computational modeling of natural languages, with a focus on Machine Translation (MT) and multilingual applications. Recently I have been working on:

  • Machine Translation: Statistical, Neural, Reranking, Retraining
  • Linked Open Data
  • Summarisation: Single-Document and Multi-Document
  • Information Extraction: Event, Entity, Coreference

I have a Doctor of Philosophy in Statistical Machine Translation (Ph.D., 2014) from the School of Computing at Dublin City University, a Master of Arts in Computational Linguistics (M.A., 2008) from the Department of Linguistics at the University of Washington, Seattle, and a Bachelor of Science in Computer Sciences (B.S., 2006) from the Department of Computer Science at the University of Texas at Austin.

MT is one of the earliest non-numeric applications of computers and an active use case of Artificial Intelligence (AI). Multilingual online chatting, multilingual customer support (see the BBC Business article dated Nov 14, 2014), automatic email translation (learn how to enable message translation in Gmail), multilingual video games, relief and aid workers communicating in a disaster-struck foreign country (read the Lewis 2010 paper on rapid MT deployment for Haitian Creole), cross-lingual search on the web, machine-aided human translation (e.g. the MateCat Computer-Assisted Translation tool): each of the aforementioned scenarios currently uses or has the potential to use MT technology in some fashion.

For further information, see my publications and research projects.

Publications

Vivien Macketanz, Eleftherios Avramidis, Aljoscha Burchardt, Jindrich Helcl, and Ankit Srivastava. 2017. Machine Translation: Phrase-based, Rule-based, and Neural Approaches with Linguistic Evaluation. Journal. In Cybernetics and Information Technologies (CIT), Vol. 17(2).

Ankit Srivastava, Georg Rehm, and Julian Moreno Schneider. 2017. DFKI-DKT at SemEval 2017 Task 8: Rumour Detection and Classification using Cascading Heuristics. System Description. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval 2017), Association for Computational Linguistics (ACL 2017), Vancouver, Canada, pp. 477-481.

Georg Rehm, Julian Moreno Schneider, Peter Bourgonje, Ankit Srivastava, Jan Nehring, Armin Berger, Luca Konig, Soren Rochle, and Jens Gerth. 2017. Event Detection and Semantic Storytelling: Generating a Travelogue from a Large Collection of Personal Letters. Workshop. In Proceedings of the Workshop on Events and Stories in the News, Association for Computational Linguistics (ACL 2017), Vancouver, Canada.

Ankit Srivastava, Georg Rehm, and Felix Sasaki. 2017. Improving Machine Translation through Linked Data. Journal. In Proceedings of the 20th Annual Conference of the European Association for Machine Translation (EAMT 2017), Prague Bulletin of Mathematical Linguistics, Vol. 108, Prague, Czech Republic, pp. 355-366.

Ankit Srivastava, Felix Sasaki, Peter Bourgonje, Julian Moreno Schneider, Jan Nehring, and Georg Rehm. 2016. How To Configure Statistical Machine Translation with Linked Open Data Resources. Conference. In Proceedings of the 38th Annual Conference on Translating and the Computer (TC38), London, UK, pp. 138-148.

Ankit Srivastava, Vivien Macketanz, Aljoscha Burchardt, and Eleftherios Avramidis. 2016. Towards Deeper MT: Parallel Treebanks, Entity Linking, and Linguistic Evaluation. Workshop. In Proceedings of the Workshop on Deep Language Processing for Quality Machine Translation, Varna, Bulgaria, pp. 34-39.

Eleftherios Avramidis, Aljoscha Burchardt, Vivien Macketanz, and Ankit Srivastava. 2016. DFKI’s System for WMT 16 IT-domain Task, including Analysis of Systematic Errors. System Description. In Proceedings of the First Conference on Machine Translation, Association for Computational Linguistics (ACL 2016), Berlin, Germany, pp. 415-422.

Julian Moreno Schneider, Peter Bourgonje, Jan Nehring, Georg Rehm, Felix Sasaki, and Ankit Srivastava. 2016. Towards Semantic Storytelling with Digital Curation Technologies. Workshop. In Proceedings of the Workshop on Natural Language Processing meets Journalism, International Joint Conference on Artificial Intelligence (IJCAI 2016), New York, USA, pp. 20-24.

Peter Bourgonje, Julian Moreno Schneider, Jan Nehring, Georg Rehm, Felix Sasaki, and Ankit Srivastava. 2016. Towards a Platform for Curation Technologies: Enriching Text Collections with a Semantic-Web Layer. Book Chapter. In Proceedings of the Semantic Web: ESWC 2016, Lecture Notes in Computer Science, Vol. 9989, Heraklion, Crete, Greece, pp. 65-68.

Meghan Dowling, Lauren Cassidy, Eimear Maguire, Teresa Lynn, Ankit Srivastava, and John Judge. 2015. Tapadóir: Developing a Statistical Machine Translation Engine and Associated Resources for Irish. Conference. In Proceedings of the 4th Workshop for the Language Technologies in Support of Less-Resourced Languages (LRL 2015), Poznan, Poland, pp. 314-318.

Jinhua Du, Ankit Srivastava, Andy Way, Alfredo Maldonado-Guerra, and David Lewis. 2015. An Empirical Study of Segment Prioritization for Incrementally Retrained Post-Editing-Based SMT. Conference. In Proceedings of the 15th Machine Translation Summit (MT Summit XV), Miami, Florida, USA, pp. 172-185.

Jinhua Du, Ankit Srivastava, Martin Lauer, Andy Way, Alfred Maldonado, and Dave Lewis. 2015. Translation Project-Level Evaluation. Technical Report. EU-FP7 Project (Grant No. 610979) FALCON: Federated Active Linguistic Data Curation, Public Deliverable 4.3.

Joss Moorkens, Ankit Srivastava, Balasaz Benedek, Dave Lewis, and Kazemi Fatema. 2015. Linguistic Task usability and Reuse Efficacy Results. Technical Report. EU-FP7 Project (Grant No. 610979) FALCON: Federated Active Linguistic Data Curation, Public Deliverable 4.2.

Jinhua Du, Ankit Srivastava, Alfred Maldonado, Sandipan Dandapat, Bruno del Campo, Dave Lewis, and Andy Way. 2015. Integrated System Performance Evaluation. Technical Report. EU-FP7 Project (Grant No. 610979) FALCON: Federated Active Linguistic Data Curation, Public Deliverable 4.1.

Ankit Srivastava, Jinhua Du, Andy Way, Alfred Maldonado, and Dave Lewis. 2015. SMT and NER Integration into L3Data Federation Platform. Technical Report. EU-FP7 Project (Grant No. 610979) FALCON: Federated Active Linguistic Data Curation, Public Deliverable 3.7.

Leroy Finn, Kevin Koidl, Dave Lewis, Sandipan Dandapat, Ankit Srivastava, Alfred Maldonado, and Bruno del Campo. 2014. System Test Suite. Technical Report. EU-FP7 Project (Grant No. 610979) FALCON: Federated Active Linguistic Data Curation, Public Deliverable 3.1.

Santanu Pal, Ankit Srivastava, Sandipan Dandapat, Josef van Genabith, Qun Liu, and Andy Way. 2014. USAAR-DCU Hybrid Machine Translation System for ICON 2014. System Description. In Proceedings of the 11th International Conference on Natural Language Processing (ICON 2014), Goa, India.

Santiago Cortes, Piyush Arora, Chris Hokamp, Federico Fancellu, Alex Killen, Lamia Tounsi, Antonio Toral, Ankit Srivastava, Maria Alecu, Iacer Calixto, Sheila Castilho, Keith Curtis, Federico Gaspari, Akira Hayakawa, Teresa Lynn, Peyman Passban, Eziz Tursun, Ali Hosseinzadeh Vahid, Xiaofeng Wu, Xiaojun Zhang, Debasis Ganguly, Louise Irwin, Anna Kostekidou, Liangyou Li, Tsuyoshi Okita, Ximo Planells, David Racca, Joris Vreeke, Jian Zhang, Andy Way, Will Lewis, Declan Groves, Federico Garcea, and Chris Wendt. 2014. Brazilator: Machine Translation and Sentiment Analysis for World Cup 2014. Software Demo. In Proceedings of the Association for Machine Translation in the Americas (AMTA 2014) Technology Showcase, Vancouver, Canada, pp. 4-5.

Ankit Srivastava, Federico Gaspari, Lucia Specia, John Judge, and Qun Liu. 2014. Workshop and Shared Task Proceedings. Technical Report. EU-FP7 Project (Grant No. 296347) QTLaunchPad: Preparation and Launch of a Large-Scale Action for Quality Translation Technology, Public Deliverable 5.2.1.

Ankit Srivastava. 2014. Phrase Extraction and Rescoring in Statistical Machine Translation. PhD Thesis. Dublin City University.

Ankit Srivastava, Declan Groves, and John Judge. 2013. Metadata-Aware Machine Translation Experiments. Technical Report. EU-FP7 Project (Grant No. 287815) MultilingualWeb-LT (LT-Web): Language Technology in the Web, Public Deliverable 5.2 annex.

Ankit Srivastava, Declan Groves, and John Judge. 2013. Report on Metadata-Aware Machine Translation Training Tools. Technical Report. EU-FP7 Project (Grant No. 287815) MultilingualWeb-LT (LT-Web): Language Technology in the Web, Public Deliverable 5.2.

Pablo Caride, Giuseppe Nolasco, Pedro Orzas, Felix Fernandez, Consuelo Aldana, Pablo Badia, Ankit Srivastava, Declan Groves, Thomas Ruedesheim, Roman Diez, and Alberto Crespo. 2013. Report on Online MT System. Technical Report. EU-FP7 Project (Grant No. 287815) MultilingualWeb-LT (LT-Web): Language Technology in the Web, Public Deliverable 4.2.2.

Pablo Caride, Giuseppe Nolasco, Pedro Orzas, Felix Fernandez, Consuelo Aldana, Pablo Badia, Ankit Srivastava, Declan Groves, Thomas Ruedesheim, Roman Diez, and Alberto Crespo. 2013. Online MT System Linguaserve Showcase. Technical Report. EU-FP7 Project (Grant No. 287815) MultilingualWeb-LT (LT-Web): Language Technology in the Web, Public Deliverable 4.2.1.

Ankit Srivastava and Declan Groves. 2013. ITS 2.0-enabled Statistical Machine Translation and Training Web Services. Conference. In Proceedings of the Federated Event for Interoperability Standardization in Globalization, Internationalization, Localization, and Translation Technologies (FEISGILTT) at Localization World Conference 2013, London, UK.

Ankit Srivastava, Declan Groves, Leroy Finn, and Dave Lewis. 2013. Simple Segment Machine Translation Using DCU MaTrEx for Translating Segments extracted from ITS 2.0-aware XLIFF and HTML documents. Software Demo. In Proceedings of the 6th Multilingual Web Workshop (Multilingual 2013), Rome, Italy.

Pablo Caride, Thomas Ruedesheim, Ankit Srivastava, Giuseppe Nolasco, Declan Groves, and Pedro Orzas. 2013. Report on Modifications in MT Systems. Technical Report. EU-FP7 Project (Grant No. 287815) MultilingualWeb-LT (LT-Web): Language Technology in the Web, Public Deliverable 4.1.4.

Ankit Srivastava, Yanjun Ma, and Andy Way. 2011. Oracle-based Training for Phrase-based Statistical Machine Translation. Conference. In Proceedings of the 15th Annual Conference of the European Association for Machine Translation (EAMT 2011), Leuven, Belgium, pp. 169-176.

Sergio Penkale, Rejwanul Haque, Sandipan Dandapat, Pratyush Banerjee, Ankit Srivastava, Jinhua Du, Pavel Pecina, Sudip Naskar, Mikel Forcada, and Andy Way. 2010. MaTrEx: The DCU MT System for WMT 2010. System Description. In Proceedings of the Joint 5th Workshop on Statistical Machine Translation (WMT 2010) and Metrics MATR at 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala, Sweden, pp. 143-148.

Ankit Srivastava, Sergio Penkale, Declan Groves, and John Tinsley. 2009. Evaluating Syntax-Driven Approaches to Phrase Extraction for MT. Workshop. In Proceedings of the 3rd International Workshop on Example-Based Machine Translation (EBMT 2009), Dublin, Ireland, pp. 19-28.

Jinhua Du, Yifan He, Ankit Srivastava, Rejwanul Haque, Sandipan Dandapat, and Andy Way. 2009. DCU MT System and Recent Research Improvements. System Description. Poster at NIST Open Machine Translation Evaluation (MT09), Ottawa, Canada.

Ankit Srivastava and Andy Way. 2009. Using Percolated Dependencies for Phrase Extraction in SMT. Conference. In Proceedings of the 12th Machine Translation Summit (MT Summit 2009), Ottawa, Canada, pp. 316-323.

Rejwanul Haque, Sandipan Dandapat, Ankit Srivastava, Sudip Naskar, and Andy Way. 2009. English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System. System Description. In Proceedings of the 2009 Named Entities Workshop (NEWS): Shared Task on Transliteration at Joint Conference of 47th Annual Meeting of the Association for Computational Linguistics (ACL 2009) and 4th International Joint Conference on Natural Language Processing (IJCNLP 2009), Suntec, Singapore, pp. 104-107.

Ankit Srivastava, Rejwanul Haque, Sudip Naskar, and Andy Way. 2008. MaTrEx: the DCU MT System for ICON 2008. System Description. In Proceedings of the NLP Tools Contest: Statistical Machine Translation (English to Hindi) at 6th International Conference on Natural Language Processing (ICON 2008), Pune, India.

Note

If you have any queries on the data and/or associated methodologies and code for any of this research, please do not hesitate to email me.

Ankit - Ankit DOT Srivastava AT dfki DOT de

Research

I work with millions of sentences written in two or more languages and design mathematical models to represent them, so that computers can learn patterns between these languages and emulate human communication.

Machine Translation is the automation of translation between human (natural) languages. Currently, there is more content to be translated than there are human translators available. This demands automating the translation process as much as possible. It involves gathering large amounts of online content written in two or more languages, such as the European Union Parliamentary Proceedings (billions of words; Big Data), and writing grammars (mathematical models) for these languages. These grammars can be written by hand (rule-based) or learned automatically (machine learning of syntax, semantics, pragmatics), drawing heavily from language theories (computational linguistics).

List of Research Projects:

  • DKT (Digital Curation Technologies)

    DKT is a 2-year German government (BMBF) funded research project. The project aims at supporting curation processes carried out by knowledge workers at museums, newsrooms, and digital archives via robust, precise, and modular language and knowledge technologies for semantic analysis, semantic generation, and multilingual processing. Consortium with 4 Industry partners, 2015-2017.

  • FALCON (Federated Active Linguistic Data Curation)

    FALCON was a 2-year European Commission (EU-FP7) funded research project. It focused on combining the power of open data on the web with data-driven language technologies such as Machine Translation (MT) and Named Entity Recognition (NER) to deliver a modular software platform that enables SMEs (small and medium enterprises) in the translation and localisation industry to leverage Semantic Web resources such as Linked Open Data (LOD). Consortium of 2 Academic and 3 Industry partners, 2013-2015.

  • QTLaunchPad (Preparation and Launch of a Large Scale Action for Quality Translation Technology)

    The QTLaunchPad project was a 2-year European Commission (EU-FP7) funded support action (CSA) focused on collaborative research on high-quality MT with the ultimate goal of launching an international action dedicated to significantly advancing quality translation technology. This was the precursor to the Horizon 2020-funded QT21 project. Consortium of 4 Academic Partners and Research Centres, 2012-2014.

  • LT-Web (Language Technology in the Web / MultilingualWeb-LT)

    The LT-Web project was a 2-year European Commission (EU-FP7) funded support action (CSA) focused on collaborative research on developing standards for language technology on the web. Consortium of 13 Academic, Industry Partners and Research Centres, 2012-2013.

  • CNGL (Next Generation Localisation)

    Localisation is the adaptation of digital content to culture, locale and linguistic environment. The Centre for Next Generation Localisation (CNGL) was a large Academia-Industry partnership, funded by Science Foundation Ireland and Industry Partners (2007-12), with over 100 researchers developing novel technologies addressing the key localisation challenges of volume, access, and personalisation. Consortium of 4 Academic and 9 Industry Partners, 2008-2012.
