Note: this is cross posted from ferritin

PLM-Local

This is a quick update post.

Release of ESMC
Began to port ESMC model in subcrate ferritin-esm
Ported the model in this PR, and this PR
However, when I went to implement the tokenizer, I ended up encountering some issues. Although ESMC is similar to ESM2 in that it is trained only on sequences, it shares a common codebase with ESM3 which uses structural information
The major implication is that the tokenizer is not as simple and self-contained, and porting requires either:
- a full minimal reimplementation of ESM functionality or
- a simpler tokenization scheme that deviates from the current codebase
While brainstorming on ESMC, I remembered that I hadn’t implemented METAL for AMPLIFY; so I did
This makes AMPLIFY really pleasantly fast!
I was therefore inspired to build and serve the binary via a new project, plm-local, and can build and deploy to a Conda channel
h/t to luizirber for showing me this nice trick

pixi exec -c https://repo.prefix.dev/protein-language-local-public \
    plm-local --protein-string MSVVGIDLGFQSCYVAVARAGGIETIANEYSDRCTPACISFGPKNR

# that's pretty awesome.
time pixi exec ... <as above>   0.53s user 0.65s system 42% cpu 2.800 total
time pixi exec ... <as above>  0.53s user 0.27s system 172% cpu 0.464 total

What it looks like at the moment.#

Current CLI simply encodes and decodes the protein. But it can do it in half a second with no compilation step or dependencies! Boom.