Software and data

The research I supervised and contributed to has produced a number of software, data, and model repositories. They are accessible from Idiap’s Data Distribution Portal (which points to Zenodo repositories), Idiap’s GitHub repository, the HEIG-VD IDA group’s GitHub repository, and the Hugging Face model hub. The items in each category are listed in alphabetical order, with co-authors (mostly PhD students) in parentheses.

Software

  • ACT: metric for the Accuracy of Connective Translation (with N. Hajlaoui, Th. Meyer)
  • APT: metric for the Accuracy of Pronoun Translation (with L. Miculicich Werlen)
  • CBrec: content-based recommender (with N. Pappas)
  • CRPO: Computer-assisted poetic creation (with A.R. Atrio, G. Luthier, V. Minder, A. Xanthos)
  • DiscoConn: classifier for connective labeling in English texts (with Th. Meyer)
  • DocRec: keyword extraction and document recommendation (with M. Habibi)
  • EMORec: emotion-based recommendation generator (with N. Pappas)
  • PLACAT: robust conversational agent for information access (with G. Luthier)
  • SLOG: learning similarity metrics on graphs (with M. Yazdani)
  • WMIL-SGD: multiple-instance learning for sentiment analysis (with N. Pappas)

Data

  • AREX: AMI requests for explanations, with relevance judgments (with M. Habibi)
  • Discourse Connective Annotation (with Th. Meyer, B. Cartoni, S. Zufferey)
  • HATDOC: Human Attention Scores in Document Classification (with N. Pappas)
  • TED Ratings: ground truth human-made recommendations (with N. Pappas)
  • Tense-annotation: parallel verb tense annotation on Europarl (with Th. Meyer, C. Grisot)

Models

  • GPT2-Poetry: a gpt2-small model (137M parameters) fine-tuned on the Gutenberg Poetry Corpus to generate English poetry (with T. Ferrari)
  • RoBERTa-Poetry: a series of eight RoBERTa-based models (125M parameters each) fine-tuned on poetry, each for one of five topics or one of three emotions (with T. Ferrari)