Natural Language Processing for Portuguese
Leaders: Marcelo Finger, Sandra M. Aluísio and Thiago A. S. Pardo

The goal of this research challenge is to put together open data and tools to enable high-level NLP for Portuguese (written and spoken). The research group has focused on:

  • with a syntactical perspective, a multi-genre corpus of syntactically annotated texts for building robust tagging and parsing models;
  • with a linguistic and computational perspective, a general corpus of Brazilian Portuguese that can serve, among other uses, as a basis for foundational models; and,
  • for spoken language, multitask corpora for speech recognition, multi-speaker synthesis, speaker identification and voice cloning.

Applications have included sentiment analysis-related tasks, opinion aspect extraction and categorization, and fake news detection.

Project Websites (External)

NLP2 – Web portal

POeTiSA – Portuguese processing: Towards Syntactic Analysis and parsing

TaRSila – Tarefa de Anotação para o Reconhecimento e Síntese de fala da Língua Portuguesa

Carolina – General corpus of contemporary Brazilian Portuguese


Strengthening Brazilian Indigenous Languages with AI
Leaders: Luciana Storto and Claudio Pinhanez

This research challenge explores three key questions:

  • How can AI technologies improve documentation and related efforts for Brazilian indigenous languages?
  • How should current tools be adapted to work with low-resource and endangered languages?
  • How can AI tools be developed for indigenous communities in an ethical way?

The team works to empower indigenous communities through AI: it collaborates ethically with indigenous groups in São Paulo, investigates the Guarani Mbya language, and develops student support tools.


Knowledge Enhanced Machine Learning
Leaders: Sarajane Peres and Fabio Cozman

The broad goal of this research challenge has been to merge data-driven learning and knowledge-based reasoning; even though AI is now very successful due to machine learning breakthroughs, more attention must be devoted to reasoning that has formal guarantees. The research team has worked on a conversational agent that combines several sources of information about the ocean, looking both at web-based and at robot-based interfaces. Significant effort is now devoted to evaluation of large language models so as to obtain guarantees of factuality. The team has also investigated neuro-symbolic combinations of probabilistic programming and neural networks to address automated argumentation.


Oceanography-based Machine Learning
Leaders: Eduardo Tannuri and Zhao Liang

This research challenge aims to combine knowledge of physics with data-driven methods for the prediction of ocean variables, with a special focus on the Brazilian coastal region (referred to as the Blue Amazon), an exclusive economic zone extending about 200 nautical miles into the Atlantic Ocean. By integrating state-of-the-art machine learning algorithms with centuries' worth of physics modeling, the team has improved current computational predictions and has reduced the size of the datasets required to obtain those predictions. Results are applied to settings with social impact, such as flood and storm tide prediction, estimation of oil spill movements, and optimal placement of wind and water turbines.


Multicriteria Decision-Making on Climate
Leaders: Alexandre Delbem and Antonio Saraiva

Climate modeling has shown notable advances in the last decades. However, the challenge of coupling climate measurements to mathematical models remains, especially with respect to extreme weather events. New possibilities emerge with machine learning advances in physics-informed neural networks. This research challenge investigates many-objective hyperparameter optimization to find proper trade-offs among performance criteria (such as accuracy, computing cost, and model complexity or size) and physical constraints (arranged as margins to maximize). The team looks at physics-based models that can compete or cooperate, at causal models, and at knowledge-based constraints within data-driven machine learning.


Observing and Discussing the Impact of Artificial Intelligence
Leaders: Cristina Oliveira and João Veiga

The goal of this big challenge is to map, understand, and debate the impact of AI in Brazil and to extend this analysis to other emerging countries. Interests in the team range from disinformation to regulation to ethical behavior. Given AI’s broad impact, these pressing questions can only be successfully addressed from a multi-disciplinary perspective. Since June 2022 the team has participated as an academic branch in the Brazilian AI Observatory, led by the agency that regulates and controls the Brazilian internet. The AI Observatory was created by the federal Ministry of Science, Technology and Innovation, and is usually referred to as “OBIA” (for “Observatório Brasileiro de Inteligência Artificial”).

AI Health

Graph-Oriented Machine Learning for Diagnosis and Rehabilitation
Leaders: José Krieger and Zhao Liang

The recent advances of machine learning in medicine have been remarkable. However, important issues still need to be addressed; for instance: 1) How to integrate and select relevant medical features (biomarkers) from large-scale, heterogeneous and dynamic sources? 2) How to interpret decisions made by machine learning algorithms and how to integrate human and artificial intelligence? This research group addresses such questions, focusing to a great extent on cerebrovascular accidents (CVA), cancer, and COVID-19. The team has developed a host of AI techniques in connection with these applications, in particular by relying on graph-oriented machine learning.


Causal Multicriteria Decision Making in Food Production Networks
Leaders: Alexandre Delbem and Antonio Saraiva

This research group is interested in causal multicriteria AI models for decision making under uncertainty within food production networks, in particular as related to food security. The team investigates methods to automatically find and represent the fundamental structures of social, economic and physical subsystems related to food production by creating reliable (causal) models. Many AgriBio investigations are data-driven, using multi-source heterogeneous databases with thousands of variables from the Open Government Partnership (OGP). Advances in representation learning, resilience enhancement of the learned models, and multicriteria decision-making are pursued in that context.