Analysis and Evaluation of Comparable Corpora for Under-Resourced Areas of Machine Translation

Latest news

The ACCURAT Toolkit version 3.0 released

The ACCURAT consortium is pleased to announce the release of the third version of the ACCURAT toolkit. Free access is granted to the toolkit after filling out the registration form.

You are welcome to use the ACCURAT Toolkit (including the code) under the terms of the Apache 2.0 licence; however, please acknowledge its use with a citation:

Pinnis, M., Ion, R., Ştefănescu, D., Su, F., Skadiņa, I., Vasiļjevs, A., & Babych, B. (2012). ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora. Proceedings of the ACL 2012 System Demonstrations (pp. 91–96). Association for Computational Linguistics. Jeju, South Korea.

Skadiņa, I., Aker, A., Mastropavlos, N., Su, F., Tufiș, D., Verlic, M., Vasiļjevs, A., Babych, B., Clough, P., Gaizauskas, R., Glaros, N., Paramita, M. & Pinnis, M. (2012). Collecting and Using Comparable Corpora for Statistical Machine Translation. In Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12) (pp. 438–445). European Language Resources Association (ELRA). Istanbul, Turkey.

If you would like to cite a particular tool included in the ACCURAT Toolkit, please refer to the documentation of the ACCURAT Toolkit (or the paper above) for specific tool references.

| 2012-07-02 |

The ACCURAT Toolkit Demonstrated at ACL2012

The ACCURAT Toolkit was successfully demonstrated to the research community in the system demonstrations track of the 50th annual meeting of the Association for Computational Linguistics (ACL2012) by Mārcis Pinnis. The conference took place from July 8 to July 14 in Jeju, South Korea. The following paper was published in the Proceedings of ACL 2012:

The poster created specially for the demonstration can be acquired here.

| 2012-08-21 |

ACCURAT at TKE2012 and CHAT2012

ACCURAT was presented at the Terminology and Knowledge Engineering Conference TKE 2012 - New frontiers in the constructive symbiosis of terminology and knowledge engineering, held in Madrid, Spain from June 19 to June 22. ACCURAT methods for term extraction, tagging and bilingual mapping were presented at the conference by Mārcis Pinnis in an oral paper presentation:

The ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora was also demonstrated by Mārcis Pinnis at the TKE2012 post-conference workshop CHAT2012 (the Second Workshop on Creation, Harmonization and Application of Terminology Resources).

| 2012-08-21 |

EAMT2012 successful for ACCURAT

This year's annual EAMT2012 conference was organised in Trento, Italy, from 2012-05-28 to 2012-05-30. The picturesque summer Alpine landscape was considered by participants to be very inspiring, so the conference turned into a series of interesting presentations and discussions. This was particularly noticeable at several poster sessions; at one of them Raivis Skadiņš and Marko Tadić presented the ACCURAT final poster, while three ACCURAT partners presented their papers:

| 2012-06-02 |

Another ACCURAT workshop organised

The 5th Workshop on Building and Using Comparable Corpora was held on 2012-05-26 as a full-day LREC2012 post-conference workshop in Istanbul, Turkey. The ACCURAT project was one of the organising projects; ACCURAT partners Marko Tadić and Andrejs Vasiļjevs were members of the workshop Organising Committee, and Marko Tadić was one of the editors of the Proceedings. This workshop was planned to serve as the final and most important event where results of the ACCURAT project would be presented to the scientific community. The project was represented by four presentations (oral or poster):

  • Skadiņa, I., Analysis and Evaluation of Comparable Corpora for Under-Resourced Areas of Machine Translation
  • Ljubešić, N., Vintar, Š., Fišer, D. Multi-word term extraction from comparable corpora by combining contextual and constituent clues
  • Irimia, E. Experimenting with Extracting Lexical Dictionaries from Comparable Corpora for English-Romanian language pair
  • Ştefănescu, D. Mining for Term Translations in Comparable Corpora

| 2012-06-02 |

ACCURAT boosting its scientific output at LREC2012

LREC2012 could be considered the most successful conference for the ACCURAT project so far. The project partners presented eight papers either as oral presentations or posters. A large number of contributions to this conference was expected, since it came close to the end of the ACCURAT project when many interesting results were available for presentation, but the number of accepted papers surpassed all expectations. This shows that the ACCURAT project produced important scientific output. The papers presented were:

At the EU Project Village, the ACCURAT project had a booth where information about the project was successfully disseminated using posters, leaflets, t-shirts and a system demonstration.

Aivars Bērziņš and Tatiana Gornostay proudly presented the ACCURAT project and its achievements during the EU Project Village at LREC2012.

| 2012-06-04 |

ACCURAT at 7th Web as Corpus Workshop (WAC7)

The 7th Web as Corpus Workshop (WAC7) was held in Lyon, France on 2012-04-17. The ACCURAT project was represented by Marco Brunello, who gave a talk titled Understanding the composition of parallel corpora from the web.

| 2012-05-02 |

ACCURAT with three papers at ConsILR 2011-2012

The conference Linguistic Resources and Tools for Processing the Romanian Language was held in Bucharest, Romania in a somewhat unusual manner: it was split into two sessions four months apart, on 2011-12-08/09 and 2012-04-26/27. The ACCURAT project was represented by three papers written by RACAI partners: Irimia, E. DEACC – Lexical Dictionary Extractor from Comparable Corpora; Ion, R. Graphic Comparability Levels for Comparable Corpora; Ştefănescu, D. Extracting Parallel Terminology from Comparable Corpora.

| 2012-05-02 |

ACCURAT organised a workshop at the GALA2012 conference

GALA2012 was the annual global meeting of the Globalization and Localization Association, the largest global non-profit association within the language industry, providing resources, education, ideas and research for companies working with translation services, language technology and content localization. The ACCURAT and LetsMT! projects jointly organised a pre-conference workshop Customized Machine Translation: Platform, Tools and Application LetsMT! cloud platform and ACCURAT tools that was held in Monte Carlo, Monaco on 2012-03-25. The workshop was aimed at professional users in localisation and language technology, to whom we presented the latest developments and the ACCURAT tools. The presenters were partners from both projects, while the invited speaker was Achim Ruopp from Digital Silk Road. You can read more on the workshop web page.

| 2012-04-02 |

The attendees of CLARA Career Course get acquainted with ACCURAT

CLARA is an Initial Training Network within the Marie Curie Actions. Its Career Course on Product Planning for Next Generation Information Access Technology Solutions was held in the Centre for Advanced Academic Studies, University of Zagreb in Dubrovnik, Croatia from 2011-09-20 until 2011-09-23. The course is aimed at early-stage and experienced researchers in Language Resources and Technology and offers complementary skills for their future R&D careers in industry or in academic cooperation with industry. Within this course different applications of LRT were presented, covering their whole life cycle through to their scientific results. The ACCURAT project was introduced through a poster presented by Aivars Bērziņš.

| 2011-10-05 |

ACCURAT with two papers and a poster at SlaviCorp2011 conference

The 2nd Slavic Corpora Conference, SlaviCorp2011, was held in the Centre for Advanced Academic Studies, University of Zagreb in Dubrovnik, Croatia from 2011-09-12 until 2011-09-14. The research within the ACCURAT project was presented in two papers: hrWaC and slWac: Web Corpora for Croatian and Slovene by Nikola Ljubešić, Tomaž Erjavec and Development and Applications of the Croatian 1984 Corpus for the MULTEXT-East Resources by Željko Agić, Daša Berović, Danijela Merkler. The ACCURAT poster was also presented during the conference at the exhibition of European projects that deal with the usage of corpora of Slavic languages. You can watch these presentations at our Video lectures pages, while the references and the full papers are available at our Publications page.

| 2011-09-27 |

At RANLP2011 conference ACCURAT also appears with a paper in the main conference program

The International Conference "Recent Advances in NLP", RANLP2011 took place in Hissar, Bulgaria from 2011-09-12 until 2011-09-14 with preceding tutorials (2011-09-10 and 2011-09-11) and following workshops (2011-09-15 and 2011-09-16). The ACCURAT project appeared in the main conference programme with a paper Bilingual Lexicon Extraction from Comparable Corpora for Closely Related Languages by Darja Fišer and Nikola Ljubešić. The references and full papers are available at our Publications page.

| 2011-09-18 |

ACCURAT expanding the scientific output: two new papers at TSD2011

The 14th International Conference TSD 2011 was held from 2011-09-01 until 2011-09-05 in Plzeň, Czech Republic. As usual, the Text, Speech and Dialogue (TSD) conference was concerned with topics in the field of natural language processing, in particular: corpora, texts and transcription; speech analysis, recognition and synthesis; and their intertwining within NL dialogue systems. The keynote topic of the conference for this year was Integrating Modern Web with Speech and Language Technologies. The ACCURAT project appeared with two papers on the use of web-collected comparable corpora: Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages by Nikola Ljubešić, Darja Fišer and hrWaC and slWac: Compiling Web Corpora for Croatian and Slovene by Nikola Ljubešić, Tomaž Erjavec. The references and full papers are available at our Publications page.

| 2011-09-11 |

ACCURAT presented at SFCM 2011

The Second Workshop on Systems and Frameworks for Computational Morphology, SFCM 2011, took place on 2011-08-26 in Zürich, Switzerland. The workshop aims to bring together researchers and developers in the area of computational morphology. At this workshop Mārcis Pinnis (Tilde) presented the research paper Maximum Entropy Model for Disambiguation of Rich Morphological Tags, describing recently obtained results on morphological tagging for languages of the Baltic countries which are applied in the ACCURAT project. He also presented the ACCURAT toolkit for multilevel alignment and information extraction at the demo session of the workshop.

| 2011-08-29 |

Two ACCURAT papers at ACL2011 workshop in Portland, USA

The 4th Workshop on Building and Using Comparable Corpora (BUCC), organized by Pierre Zweigenbaum (LIMSI-CNRS and ERTIM-INALCO), Reinhard Rapp (University of Mainz and University of Tarragona) and Serge Sharoff (University of Leeds), was held on 2011-06-24 in conjunction with ACL-HLT 2011 in Portland, Oregon, USA. The ACCURAT project was represented by two papers: An Expectation Maximization Algorithm for Textual Unit Alignment by Radu Ion, Alexandru Ceauşu and Elena Irimia and Building and Using Comparable Corpora for Domain-Specific Bilingual Lexicon Extraction by Darja Fišer, Nikola Ljubešić, Špela Vintar and Senja Pollak. The references and full papers are available at our Publications page.

| 2011-07-08 |

ACCURAT presented with a poster at META-FORUM in Budapest

The central dissemination event of the META-NET community in 2011, META-FORUM 2011, took place in Budapest on 2011-06-27 and 2011-06-28. Within these two days, densely packed with parallel activities (oral presentations, poster presentations, software demonstrations), an exhibition of European projects was organised. The ACCURAT project poster was presented and attracted considerable interest since the first tangible results were taking shape.

| 2011-07-08 |

ACCURAT project at NooJ2011

NooJ (author Max Silberztein) is a very popular development environment for the construction of formal grammars and their immediate application to corpora. The NooJ community organises its yearly conferences regularly in May or June. From 2011-06-13 to 2011-06-16 the NooJ conference took place in the Centre for Advanced Academic Studies, University of Zagreb in Dubrovnik, Croatia. Within this three-day conference an exhibition of European projects was organised where the ACCURAT project poster was presented. At the main conference the paper Disambiguation of homographic adjective and adverb forms in Croatian by Daša Berović, Danijela Merkler and Željko Agić was presented. You can watch this presentation at our video lectures pages.

| 2011-07-08 |

ACCURAT presented at EU projects exhibition at EAMT2011

The European Association for Machine Translation (EAMT) organises its yearly conferences regularly in May. This year the venue was the Faculty of Arts, Katholieke Universiteit Leuven in Belgium. This two-day conference started on 2011-05-30, and on 2011-05-31 there was an exhibition of European projects related to machine translation. Since this audience is considered a natural one for ACCURAT, our poster and flyers were presented there by Inguna Skadiņa. Since our project attracted considerable interest, she had to answer a lot of questions from the conference participants.

| 2011-06-08 |

New ACCURAT poster presented at FLaReNet Forum 2011

The FLaReNet Forum 2011 took place in Venice from 2011-05-26 to 2011-05-27. It assembled numerous representatives of the LRT community from all over Europe and can be considered the largest event in Europe so far this year. ACCURAT was presented there with a new mid-term poster and leaflets and attracted considerable interest. Andrejs Vasiļjevs also gave a presentation, How to get more data for under-resourced languages and domains?, in which the ACCURAT project was presented. The whole presentation is available at our video lectures pages.

Thierry Declerck from DFKI carefully observed the results achieved so far.

| 2011-06-08 |

ACCURAT presented at workshop Machine Translation and Morphologically-rich Languages

The research workshop Machine Translation and Morphologically-rich Languages was held on January 23-27, 2011 in Haifa, Israel. During the workshop, Tilde presented its work on English-Latvian and English-Lithuanian statistical machine translation systems. The authors presented results of human evaluation showing that integrating morphological knowledge into SMT significantly improves translation quality compared to baseline SMT.

| 2011-04-08 |

ACCURAT presented at SlaviCorp2010: Corpora of Slavic Languages

The SlaviCorp2010: Corpora of Slavic Languages conference was held from 2010-11-22 to 2010-11-24 in Warsaw, Poland. Among the presentations of different Slavic corpora and their usage in national and EU projects, the ACCURAT project was presented as it deals with comparable corpora covering two Slavic languages: Croatian and Slovenian.

The ACCURAT project was presented by Marko Tadić.

| 2010-11-25 |

ACCURAT project presented at the first META-NET Forum

The first META-NET Forum was organized on 2010-11-17 and 2010-11-18 in Brussels. It gathered the largest number of experts from the field of LR&T in 2010 after the LREC2010 conference. Since ACCURAT was presented as one of the cooperating projects within the META-NET alliance, its poster was presented in the poster session.

Mike Rosner from the University of Malta was interested in the ACCURAT project results.

| 2010-11-20 |

ACCURAT project presented at the Fourth International Conference HUMAN LANGUAGE TECHNOLOGIES — THE BALTIC PERSPECTIVE

The ACCURAT project was presented at the Fourth International Conference HUMAN LANGUAGE TECHNOLOGIES — THE BALTIC PERSPECTIVE, which was organized from 2010-10-07 to 2010-10-08 in Riga by the Institute of Mathematics and Computer Science (University of Latvia) and Tilde. The invited speech "From corpora to resources and tools – towards a proper treatment of Eastern European languages" was given by Andreas Eisele. The poster "A Collection of Comparable Corpora for Under-resourced Languages" was presented by Inguna Skadiņa.

| 2010-10-14 |

ACCURAT project presented at projects exhibition within the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages

The Seventh International Conference Formal Approaches to South Slavic and Balkan Languages (FASSBL7) was held from 2010-10-04 to 2010-10-06 in Dubrovnik, Croatia. The current EC-funded projects from the LRT field were presented with posters and other dissemination materials.

The poster was presented by Marko Tadić.

Conference participants were interested in the latest research results from the project.

| 2010-10-12 |

ACCURAT project presented at the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages

The Seventh International Conference Formal Approaches to South Slavic and Balkan Languages (FASSBL7) was held from 2010-10-04 to 2010-10-06 in Dubrovnik, Croatia. Since this conference's primary goal is to present current research on formal and computational approaches to South-Slavic and Balkan languages, presenting advances in the ACCURAT project was natural, as the languages of four project partners are covered by this definition. Three presentations were given by staff from two consortium members (RACAI and FFZG) whose papers were accepted for this conference:

  • Radu Ion et al.: On-line Compilation of Comparable Corpora and their Evaluation;
  • Kristina Vučković et al.: Sentence Classification and Clause Detection for Croatian;
  • Krešimir Šojat et al.: Verb Valency Frame Extraction Using Morphological and Syntactic Features of Croatian

See the whole presentations as video lectures at our Video lectures pages.

| 2010-10-12 |

ACCURAT project progress meeting held in Dubrovnik

In conjunction with the Formal Approaches to South-Slavic and Balkan Languages (FASSBL) conference, an ACCURAT progress meeting was held in Dubrovnik, Croatia on 2010-09-29 and 2010-09-30. Representatives from all partners were present; relevant indicators of project progress were closely examined and deliverables in all WPs were discussed. With some minor adaptations, the project is progressing as planned.

Participants of the project meeting in front of the Centre for Advanced Academic Studies in Dubrovnik, Croatia

| 2010-10-12 |

ACCURAT project presented at the 14th EAMT2010 annual conference

The ACCURAT project was presented with a poster at the 14th annual European Association for Machine Translation conference, which was held from 2010-05-27 to 2010-05-28 in Saint-Raphaël, France.

The poster was presented by Andreas Eisele.

Interested participant Marcello Federico was examining the project aims

| 2010-06-20 |

ACCURAT project presented at the Workshop on Methods for the automatic acquisition of Language Resources and their evaluation methods

The ACCURAT project was presented in a lecture at the "Workshop on Methods for the automatic acquisition of Language Resources and their evaluation methods" that was held on 2010-05-23 as one of the satellite workshops at the Language Resources and Evaluation Conference (LREC2010) in Malta.

The talk "ACCURAT: Metrics for the evaluation of comparability of multilingual corpora" was given by Andrejs Vasiļjevs.

See the whole talk at our Video lectures pages.

| 2010-05-31 |

ACCURAT project presented at the 3rd BUCC

The 3rd "Workshop on Building and Using Comparable Corpora" (BUCC) was held on 2010-05-22 as one of the satellite workshops at the Language Resources and Evaluation Conference (LREC2010) in Malta. Since the workshop's primary aim was to show the full breadth of research on comparable corpora, it was a natural venue to present the ACCURAT project. Presentations were given by two consortium members whose papers were accepted for this workshop:

See the whole talks at our Video lectures pages.

| 2010-05-31 |

ACCURAT project participation at EC Projects Village successful

A three-day presentation of the ACCURAT project took place at the EC Projects Village in conjunction with the LREC 2010 conference in Valletta, Malta between 2010-05-19 and 2010-05-21. The ACCURAT booth was set up and staffed during all three days, and its dissemination materials (poster, flyers and t-shirts) attracted a lot of attention, raising awareness of the project in the LT research community.

Inguna Skadiņa explaining the project goals

| 2010-05-27 |

ACCURAT project progress meeting held in Valletta

In conjunction with the Language Resources and Evaluation Conference (LREC2010) in Malta, an ACCURAT progress meeting was held in Valletta on 2010-05-18. Representatives from all partners were present, and key issues regarding the deliverables in all WPs were discussed in four sessions running from 9:00 until 19:00. With some minor adjustments, the project is progressing as planned.

| 2010-05-26 |

ACCURAT project will participate in 3rd BUCC

The 3rd Workshop on Building and Using Comparable Corpora (BUCC) will be held on 22nd of May, 2010 as one of the satellite workshops at the Language Resources and Evaluation Conference (LREC2010) in Malta. By bringing together researchers from several disciplines, the workshop aims to show the full breadth of research on comparable corpora. Two papers by the ACCURAT consortium have been accepted for presentation at this workshop:

| 2010-05-12 |

ACCURAT project participates at EC Projects Village

The LREC 2010 conference has invited the EC projects to promote their activities and boost their dissemination efforts through their participation in the 7th edition of the conference, May 17-23, 2010 at the Mediterranean Conference Centre in Valletta, Malta.

The EC Projects Village is being set up to promote EC-funded projects and it will be open from May 19 to 21 during the Main conference days. The EC-sponsored projects will participate in this exhibition and show their objectives, progress and activities, either through demos, or through brochures, leaflets or posters if the project is still at the early stages.

Since the ACCURAT project has just started, we will participate in the latter capacity, showing our ideas, plans and expected results.

| 2010-05-03 |

ACCURAT presentation at Language Technology Days

The ACCURAT project was presented by Andrejs Vasiļjevs at Language Technology Days in Luxembourg, 22-23 March 2010. The project was presented at Session 7: Presentation of newly started Language Resources projects, where the META-NET (T4ME), PANACEA and TTC projects were also presented.

| 2010-03-17 |

ACCURAT co-organizer of LREC2010 workshop

The ACCURAT project is one of the organizers of the workshop "Methods for the automatic acquisition of Language Resources and their evaluation methods" that will be held as one of the satellite workshops at the Language Resources and Evaluation Conference (LREC2010) in Malta on 23rd May 2010. More information is available at the workshop web pages here.

| 2010-03-12 |

ACCURAT project kick-off meeting

The kick-off meeting of the ACCURAT project took place at Tilde, SIA in Riga, Latvia on 2010-02-02 and 2010-02-03. After the presentation of individual work packages, the general procedures and the first steps were coordinated. The first deliverables were also agreed upon.

Participants of the kick-off meeting in front of the Tilde building

| 2010-01-20 |

ACCURAT project started

The ACCURAT project, which has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement n° 248347, started on 2010-01-01.

| 2010-01-01 |