Pietro Lesci
pietrolesci
18 followers · 33 following
https://pietrolesci.github.io/
pietro_lesci
pietrolesci
pietrolesci
pietrolesci.bsky.social
AI & ML interests
I like developing and applying causal methods to study the effect of training choices on models’ behaviour, including memorisation, shortcut learning, and tokenisation.
pietrolesci's activity
New activity in
cmeister/multilingual-tok-corpus
6 months ago
Create README.md
#2 opened 6 months ago by
pietrolesci
New activity in
JeanKaddour/minipile
about 1 year ago
Domain and provenance annotation
9
#1 opened over 2 years ago by
haukur
New activity in
HuggingFaceTB/SmolLM-135M
over 1 year ago
Trapezoidal scheduler with cooldown phase
👍
1
3
#4 opened over 1 year ago by
maveriq
New activity in
EleutherAI/pythia-160m
over 1 year ago
Tokenizer `merges.txt` files
3
#5 opened over 1 year ago by
pietrolesci
New activity in
EleutherAI/pile-deduped-pythia-preshuffled
almost 2 years ago
Sequence "packing" logic
👍
2
2
#2 opened about 2 years ago by
pietrolesci
New activity in
EleutherAI/pile-deduped-pythia-preshuffled
about 2 years ago
Pad-only sequences from mmap'ed dataset after a certain index
#1 opened about 2 years ago by
pietrolesci
New activity in
EleutherAI/pile-duped-pythia-random-sampled
about 2 years ago
Add full sequences (beyond the first 64 tokens)
3
#1 opened about 2 years ago by
pietrolesci
New activity in
JeanKaddour/minipile
over 2 years ago
Domain and provenance annotation
9
#1 opened over 2 years ago by
haukur