jogonba2/README.md

Bio 🌱

Hi!👋😊

I'm José, an NLP researcher deeply passionate about exploring the possibilities of natural language processing. My PhD focused on summarization and attention-based models, but my work spans a wide range of NLP topics, including 📚 Text Classification (including zero- and few-shot), 😊 Sentiment and Emotion Analysis, 🌟 Figurative Language Understanding, 🗣️ Dialogue Systems, 📄 Information Extraction, and 🤖 Machine-Generated Text Detection.

Since 2016, my research has centered on the intersection of deep learning and NLP, striving to develop efficient solutions for complex language challenges. I'm also dedicated to advancing NLP for Spanish and co-official languages in Spain, working on initiatives that bridge linguistic and technological gaps.

Over the years, I've actively participated in shared tasks across many NLP domains. I was part of the winning teams in several competitions, including TASS (2017 to 2020), IroSVA, COSET, and SemEval 2024 Task 8. I've also achieved strong results in other SemEval editions: 2017, 2018 (two tasks), and 2019.

In 2023, I began organizing a line of shared tasks at the Iberian Languages Evaluation Forum (IberLEF), focusing on machine-generated text detection and attribution, such as AuTexTification, IberAuTexTification, and MIMIC. I was also on the program committee for the GenAI content detection task at COLING 2025, and I am one of the three organizers of IberLEF from 2025 to 2027.

Outside of research, I’m passionate about teaching. I currently teach courses on information retrieval, intelligent agents, and programming at Universidad Europea, as well as a course on advanced machine learning techniques in the Master’s in Big Data program at Universidad de Barcelona. I am also a regular invited speaker in the Master's in Artificial Intelligence at UPV, where I give a talk on language modeling and embeddings.

I'm also proud to share that my PhD thesis was awarded cum laude and received the best NLP thesis award from the Spanish Society for Natural Language Processing.

Works 👨🏻‍🔧

Here are some of my works from over the years, all with public source code and some with associated publications:

| Work | Repo | Paper | Journal/Conference |
|---|---|---|---|
| BERT for tweets before the Hugging Face era | Link | Link | Neurocomputing |
| Hierarchical dual Transformer encoders for summarization | Link | Link | Intelligent & Fuzzy Systems |
| Spanish and Catalan datasets for summarization | Link | Link | NAACL |
| Source summary entity aggregations in abstractive summarization | Link | Link | COLING |
| Transformer-based contextualization for irony detection | Link | Link | Information Processing & Management |
| LLMixtic, winning system at SemEval 2024 Task 8 | Link | Link | Proceedings of SemEval 2024 |
| TextMachina, a framework to build MGT datasets | Link | Link | KES 2024 |
| MinGRU implementation | Link | N/A | N/A |
| ProxyTuning implementation | Link | N/A | N/A |
| Copy mechanism for Transformers | Link | N/A | N/A |
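The table lists a ProxyTuning implementation without describing it. As a rough sketch of the general idea, and not code taken from that repository: proxy-tuning steers a large base model at decoding time by adding, to its next-token logits, the difference between a small tuned "expert" model and its untuned "anti-expert" counterpart. The function names are hypothetical, and the logits are toy numbers over a shared three-token vocabulary.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_tuned_distribution(base_logits, expert_logits, antiexpert_logits):
    """Hypothetical helper: next-token distribution of a proxy-tuned model.

    The base model's logits are shifted by (expert - anti-expert);
    all three vectors must be over the same vocabulary.
    """
    if not (len(base_logits) == len(expert_logits) == len(antiexpert_logits)):
        raise ValueError("all logit vectors must share the same vocabulary size")
    steered = [b + e - a for b, e, a in zip(base_logits, expert_logits, antiexpert_logits)]
    return softmax(steered)


if __name__ == "__main__":
    # Toy 3-token vocabulary; the numbers are made up for illustration
    base = [2.0, 1.0, 0.5]
    expert = [0.5, 2.5, 0.0]
    antiexpert = [1.0, 0.5, 0.0]
    probs = proxy_tuned_distribution(base, expert, antiexpert)
    print([round(p, 3) for p in probs])
```

In this toy case the base model prefers token 0, but the expert/anti-expert difference shifts most of the probability mass to token 1, without touching the base model's weights.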

Stack and stats 🛠️

Python · Torch · TensorFlow · Transformers · SentenceTransformers · spaCy · scikit-learn · Pandas

Triton · vLLM · Docker · Azure · Streamlit · FastAPI

Git · GitHub · LaTeX


Reach me! 🤙

I'm looking forward to collaborating in any NLP field. Feel free to reach me through LinkedIn, Google Scholar, ResearchGate, or Hugging Face!

Pinned

  1. explain-ink

    Efficient token attribution for text classifiers.

    Python

  2. Genaios/TextMachina

    A modular and extensible Python framework, designed to aid in the creation of high-quality, unbiased datasets to build robust models for MGT-related tasks such as detection, attribution, and bounda…

    Python 15

  3. supervised-finetuning

    Train and evaluate supervised-finetuned LLMs in zero- and few-shot settings

    Python

  4. llmixtic

    Winning model of SemEval 2024 Task 8: Machine-generated text detection

    Python 1

  5. symanto-research/merge-tokenizers

    Package to align tokens from different tokenizations.

    Python 3

  6. twilbert

    Specialization of BERT architecture both for the Spanish language and the Twitter domain

    Python 13
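To illustrate what aligning tokens from different tokenizations can involve, here is a minimal sketch based on overlapping character spans. It is not the merge-tokenizers API: `char_spans` and `align_by_char_spans` are hypothetical helpers, and the sketch assumes every token string appears verbatim and in order in the original text (no subword markers such as `##` or `Ġ`).

```python
def char_spans(text, tokens):
    """Hypothetical helper: recover (start, end) offsets, end exclusive,
    by scanning the text left to right.

    Raises ValueError (from str.index) if a token is not found verbatim.
    """
    spans, cursor = [], 0
    for tok in tokens:
        start = text.index(tok, cursor)
        end = start + len(tok)
        spans.append((start, end))
        cursor = end
    return spans


def align_by_char_spans(spans_a, spans_b):
    """Hypothetical helper: for each token in A, list the indices of the
    tokens in B whose character spans overlap it."""
    alignment = []
    for start_a, end_a in spans_a:
        # Two half-open intervals overlap iff each starts before the other ends
        overlapping = [
            j for j, (start_b, end_b) in enumerate(spans_b)
            if start_b < end_a and start_a < end_b
        ]
        alignment.append(overlapping)
    return alignment


if __name__ == "__main__":
    text = "unbelievable results"
    word_tokens = ["unbelievable", "results"]
    subword_tokens = ["un", "believ", "able", "res", "ults"]
    alignment = align_by_char_spans(
        char_spans(text, word_tokens), char_spans(text, subword_tokens)
    )
    print(alignment)
```

Here each word maps to the subword pieces that cover it (`unbelievable` to pieces 0 to 2, `results` to pieces 3 and 4). Real tokenizers usually expose offsets directly, which avoids the verbatim-string assumption.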