Hello, I am
PIETRO
Ask me anything
I fine-tuned an LLM to be a virtual me, so you can ask me anything! If you want my take on tech, design, pineapple on pizza, or even the secret to a perfect carbonara, just ask below!
Who am I
Technical Engineering Leader, specialized in Multimodal Generative AI and Cross-Functional Teams
I create AI systems that make people more creative and efficient, focusing on image, design, text, and audio generation.
At Snap, I lead Multimodal Generative AI for Lens Studio, managing a team and building pipelines serving ~900M monthly active users. Previously, at Kittl, I built graphic design tools using diffusion models and LLMs; at Kling Klang Klong, a real-time text-to-song pipeline for the first AI-generated lyric opera; and at DoReMir, a generative melody model that competed in the first AI Eurovision Song Contest.
As a technical leader, I work closely with product, design, business, and engineering teams to turn complex AI systems into products people love.
When I'm not planning roadmaps or optimizing GPU throughput, I'm likely hiking in the Alps, cooking risotto, writing my technical blog, or making generative art, which I've exhibited at a few New Media festivals around the world. More below ↓
Experience


Snap Inc. – Senior AI Engineer (Lead)
2025 – Today ⟜ London, United Kingdom

Kittl – Principal AI Engineer (Lead)
2023 – 2025 ⟜ Berlin, Germany

Kling Klang Klong – AI Engineer (Lead)
2020 – 2023 ⟜ Berlin, Germany

DoReMir – AI Research Scientist
2019 – 2020 ⟜ Stockholm, Sweden
Education


TU Berlin + KTH Stockholm
Double Master's Degree – Artificial Intelligence
2019 – 2020 ⟜ Stockholm, Sweden
2018 – 2019 ⟜ Berlin, Germany

University of Trento
Bachelor's Degree – Computer Science
2015 – 2018 ⟜ Trento, Italy
Selected Works

From a Million-Scale Data Engine to On-Device AR Face Generation

Snap ⟜ 2026
I built a state-of-the-art system that turns a text prompt and/or reference image into a real-time, on-device AR face effect. It is now in production and used by ~733 million monthly users. Through foundation model post-training, I specialized it for high-fidelity facial edits with strong prompt control and reference following. To enable this, I first built a million-scale data pipeline: diffusion models generate the training pairs, VLMs evaluate and filter them for quality, and dense captioning plus automated prompt enhancement add rich supervision. I then used teacher-student distillation to compress the teacher into a lightweight on-device student that runs in real time. As of 2026, it's the highest-quality AR system in the industry, with far better prompt following, combined text-plus-image conditioning, and a broader, higher-quality range of effects.
Post-Training
On-device Distillation
Large-Scale
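
The teacher-student distillation step described above can be illustrated with a toy loss. This is a minimal sketch under stated assumptions, not the production training code: it uses the classic temperature-softened KL objective on logits, whereas distilling a generative image model would more likely match outputs or intermediate features. The function names and the temperature value are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits: one student close to the teacher, one far from it.
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]
bad_student = [0.5, 1.0, 4.0]
```

A student whose softened distribution is closer to the teacher's gets a lower loss, which is the signal that pushes the small on-device model toward the large model's behavior.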

Agentic AR Filter Creation

Snap ⟜ 2025
I was one of the core contributors to EasyLens, an agentic platform that turns a text prompt, image, or sketch into a publishable AR experience in minutes, across web, iOS, and Android. The system has already supported millions of creators, driving close to a billion daily impressions. My focus was bringing advanced generative ML into the product. I designed the integration patterns that let heavy, long-running ML pipelines run within the agentic flow. As a result, state-of-the-art generative capabilities, once limited to pro tools, are now accessible to millions of casual creators.
Large-Scale
Agentic AI
Infrastructure
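
One common way to fit heavy, long-running ML work into an interactive agent loop is to submit it as a background job and hand the agent a handle it can check or await later. The sketch below shows that pattern with `asyncio`. It is an illustration only: `JobRunner`, `fake_generation`, and the job IDs are hypothetical names, not EasyLens APIs, and the actual integration patterns may differ.

```python
import asyncio
import itertools


class JobRunner:
    """Runs slow coroutines in the background and tracks them by job id."""

    def __init__(self):
        self._tasks = {}
        self._ids = itertools.count(1)

    def submit(self, coro):
        """Start the work without blocking the caller; return a job id."""
        job_id = f"job-{next(self._ids)}"
        self._tasks[job_id] = asyncio.create_task(coro)
        return job_id

    def status(self, job_id):
        return "done" if self._tasks[job_id].done() else "running"

    async def result(self, job_id):
        return await self._tasks[job_id]


async def fake_generation(prompt):
    """Stand-in for a heavy generative pipeline (hypothetical)."""
    await asyncio.sleep(0.05)
    return f"asset for {prompt!r}"


async def agent_turn():
    runner = JobRunner()
    job_id = runner.submit(fake_generation("neon cat ears"))
    early_status = runner.status(job_id)  # the agent can keep working meanwhile
    asset = await runner.result(job_id)   # collect the result when it is needed
    return early_status, asset, runner.status(job_id)


outcome = asyncio.run(agent_turn())
```

The design choice here is that submission returns immediately, so the agent's turn is never blocked by the slow pipeline until it actually needs the generated asset.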

Editable designs in under 30 seconds

Kittl ⟜ 2025
I built one of the first editable design generators in the industry: a system that produces fully editable, structured designs rather than flat raster images. Because the output is native design (real layers, text, shapes, and assets, not pixels), users get far greater control, and the generations plug directly into the Kittl editor and every other function of the app. Under the hood is a custom transformer architecture built on autoregressive LLMs, trained on millions of real designs and running in three stages: a VLM post-trained for visual layout planning and reasoning, then asset retrieval and visual matching, and finally an autoregressive approach to rendering into an editable design. It's a deceptively hard problem: there is no objectively right or wrong design and no fixed rules to optimize against, so the model has to learn "graphic design taste", an implicit sense of balance, hierarchy, and aesthetics. On internal benchmarks it beats both Canva and Adobe on layout quality (+19%), brand and visual coherence (+27%), and editability, and it reached ~47% adoption among paid users with a +39% increase in user-generated content and download rates.
VLM Post-Training
Editable Design Generation
Layout Reasoning

Consistent image set generation in less than a minute

Kittl ⟜ 2024
I built a "Saved Style" feature that distills any reference image into a reusable embedding. Under the hood, I trained a style adapter to encode the visual identity of a reference into the diffusion model's conditioning space, injecting style embeddings directly into the cross-attention of the diffusion backbone. This delivers high-fidelity, repeatable zero-shot style transfer that disentangles style from content, with no prompt engineering required from the user. The insight was simple: users struggled to write the "perfect" prompt, and preset styles boxed them into a narrow set of use cases. Images provide much richer and more explicit conditioning, so instead of describing the look they want, users just show it and let the model operate on it, unlocking a far more precise generation space with far less effort. I also re-architected the inference stack for high-throughput, highly parallel generation, batching and scheduling requests to maximize GPU utilization and cutting end-to-end latency by ~70%, which made style previews fast enough to feel interactive. The result was millions of generations per hour and a +27% month-on-month increase in subscription rate driven by this feature.
Style Adapter Training
Diffusion Inference Optimization
Zero-Shot Transfer
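
To make the idea of injecting style embeddings into cross-attention concrete, here is a toy illustration in plain Python. It assumes one common adapter design, in which style tokens are simply appended to the text tokens that the image queries attend over. The text above does not specify the exact mechanism (for example, a separate key/value branch for style), and the learned projections are omitted, so treat every name below as hypothetical.

```python
import math


def attention(queries, keys, values):
    """Single-head scaled dot-product attention on plain lists of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return outputs


def cross_attend_with_style(image_queries, text_tokens, style_tokens):
    """Let image queries attend over text and style tokens together."""
    context = text_tokens + style_tokens
    return attention(image_queries, context, context)


# Hypothetical 2-D embeddings: one text token and one style token.
image_queries = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0]]
style_tokens = [[0.0, 1.0]]

text_only = attention(image_queries, text_tokens, text_tokens)
with_style = cross_attend_with_style(image_queries, text_tokens, style_tokens)
```

With text alone, every query can only read the text token. Once the style token is in the context, the query aligned with it draws most of its output from the style embedding, which is how a reference image can steer generation without any change to the prompt.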

The First Fully AI-Generated Opera: From Libretto to Live Voice

Kling Klang Klong ⟜ 2021 – 2023
I built the AI behind the first opera generated entirely by AI: the concept, the libretto, the soundscapes, and the full score were all machine-generated, end to end. A genuine first for the medium, it toured opera houses worldwide and contributed to 136% year-over-year revenue growth for Kling Klang Klong. The centerpiece was the lead singing voice, generated live and uniquely for every performance, so no two shows were the same. I built the in-house pipeline behind it: a post-trained LLM for lyrics, a fine-tuned LLM that composes a melody conditioned on those lyrics, and a voice synthesis stage that turns the result into expressive singing, using a diffusion-based acoustic model to generate mel spectrograms and a neural vocoder to convert them into the final waveform, all fast enough to run in real time on stage. Press play with audio on to hear the generated voice, as played in the premiere! It's a hard problem, and for its time (2021 – 2023) it was the most advanced approach taken: opera demands long-form musical coherence, emotional expression, and tight alignment between lyrics, melody, and voice, with no second take once the curtain is up.
Singing Voice Synthesis
Real-Time Inference
Generative Art