Hi!👋😊
I'm José, an NLP researcher deeply passionate about exploring the limitless possibilities of natural language processing. My PhD focused on summarization and attention-based models, but my work spans a wide range of NLP topics, including: 📚 [Zero- and few-shot] Text Classification, 😊 Sentiment and Emotion Analysis, 🌟 Figurative Language Understanding, 🗣️ Dialogue Systems, 📄 Information Extraction, and 🤖 Machine-Generated Text Detection.
Since 2016, my research has centered on the intersection of deep learning and NLP, striving to develop efficient solutions for complex language challenges. I'm also dedicated to advancing NLP for Spanish and co-official languages in Spain, working on initiatives that bridge linguistic and technological gaps.
Over the years, I've been an active participant in shared tasks across a variety of NLP domains. I was part of the winning teams in several competitions, including TASS 2017 to 2020, IroSVA, COSET, and SemEval 2024 Task 8. I’ve also achieved strong results in other SemEval challenges, such as 2017, 2018 (1), 2018 (2), and 2019.
In 2023, I began organizing a line of shared tasks at the Iberian Languages Evaluation Forum (IberLEF) focused on machine-generated text detection and attribution: AuTexTification, IberAuTexTification, and MIMIC. I was also on the program committee for the GenAI content detection task at COLING 2025, and I am one of the three organizers of IberLEF from 2025 to 2027.
Outside of research, I’m passionate about teaching. I currently teach courses on information retrieval, intelligent agents, and programming at Universidad Europea, as well as advanced machine learning techniques in the Master’s in Big Data program at Universidad de Barcelona. I am also a recurring invited speaker in the Master's in Artificial Intelligence at the UPV, where I give a talk on language modeling and embeddings.
I'm also proud to share that my PhD thesis was awarded cum laude and received the best NLP thesis award from the Spanish Society for Natural Language Processing.
Here are some of my projects with public source code, along with a few publications from these years:
Work | Repo | Paper | Journal/Conference |
---|---|---|---|
BERT for tweets before the HuggingFace era | Link | Link | Neurocomputing |
Hierarchical dual Transformer encoders for summarization | Link | Link | Intelligent & Fuzzy Systems |
Spanish and Catalan datasets for summarization | Link | Link | NAACL |
Source summary entity aggregations in abstractive summarization | Link | Link | COLING |
Transformer-based contextualization for irony detection | Link | Link | Information Processing & Management |
LLMixtic, winning system at SemEval 2024 Task 8 | Link | Link | Proceedings of SemEval 2024 |
TextMachina, a framework to build MGT datasets | Link | Link | KES 2024 |
MinGRU implementation | Link | N/A | N/A |
ProxyTuning implementation | Link | N/A | N/A |
Copy mechanism for Transformers | Link | N/A | N/A |
I look forward to collaborating in any area of NLP. Feel free to reach me through LinkedIn, Google Scholar, ResearchGate, and HuggingFace!