PLM-Local
Note: this is cross-posted from ferritin
This is a quick update post.
- Release of ESMC
- Began porting the ESMC model in the subcrate ferritin-esm
- Ported the model in this PR, and this PR
- However, when I went to implement the tokenizer, I ran into some issues. Although ESMC is similar to ESM2 in that it is trained only on sequences, it shares a codebase with ESM3, which also uses structural information
- The major implication is that the tokenizer is not as simple or self-contained, so porting requires either:
  - a full minimal reimplementation of ESM functionality, or
  - a simpler tokenization scheme that deviates from the current codebase
- While brainstorming on ESMC, I remembered that I hadn’t implemented Metal support for AMPLIFY, so I did
- This makes AMPLIFY really pleasantly fast!
- That inspired me to build and serve the binary via a new project, plm-local, which I can now build and deploy to a Conda channel
- h/t to luizirber for showing me this nice trick
pixi exec -c https://repo.prefix.dev/protein-language-local-public \
plm-local --protein-string MSVVGIDLGFQSCYVAVARAGGIETIANEYSDRCTPACISFGPKNR
# that's pretty awesome.
time pixi exec ... <as above> 0.53s user 0.65s system 42% cpu 2.800 total
time pixi exec ... <as above> 0.53s user 0.27s system 172% cpu 0.464 total
What it looks like at the moment.
The current CLI simply encodes and decodes the protein sequence. But it can do it in about half a second, with no compilation step or dependencies to install! Boom.
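To make "encodes and decodes" concrete, and to illustrate what the simpler tokenization option mentioned earlier could look like, here is a minimal sketch of a sequence-only, per-residue tokenizer in Rust. It is not the code in plm-local or ferritin-esm. The vocabulary, the token ids, and the names `encode`, `decode`, and `residue_id` are all assumptions made for illustration, and the ids do not match the real ESMC or AMPLIFY vocabularies.

```rust
// Hypothetical vocabulary for illustration only: four special tokens
// followed by the 20 standard amino acids in alphabetical order.
// These ids do NOT match the real ESMC or AMPLIFY vocabularies.
const SPECIAL: [&str; 4] = ["<cls>", "<pad>", "<eos>", "<unk>"];
const RESIDUES: &str = "ACDEFGHIKLMNPQRSTVWY";

const CLS_ID: u32 = 0;
const EOS_ID: u32 = 2;
const UNK_ID: u32 = 3;

/// Map one residue letter to a token id; anything unrecognized becomes <unk>.
fn residue_id(c: char) -> u32 {
    match RESIDUES.find(c.to_ascii_uppercase()) {
        // RESIDUES is pure ASCII, so the byte index is also the residue index.
        Some(i) => SPECIAL.len() as u32 + i as u32,
        None => UNK_ID,
    }
}

/// Encode a protein string as <cls>, one id per residue, then <eos>.
pub fn encode(seq: &str) -> Vec<u32> {
    let mut ids = Vec::with_capacity(seq.len() + 2);
    ids.push(CLS_ID);
    ids.extend(seq.chars().map(residue_id));
    ids.push(EOS_ID);
    ids
}

/// Decode ids back to a protein string. <cls>, <pad>, and <eos> are dropped,
/// <unk> is rendered as 'X', and out-of-range ids are skipped.
pub fn decode(ids: &[u32]) -> String {
    let offset = SPECIAL.len() as u32;
    ids.iter()
        .filter_map(|&id| match id {
            UNK_ID => Some('X'),
            id if id >= offset => RESIDUES.chars().nth((id - offset) as usize),
            _ => None,
        })
        .collect()
}
```

With this scheme, `decode(&encode(seq))` returns the original sequence whenever `seq` contains only the 20 standard uppercase residues. A real port would still need to load the model's actual vocabulary and handle its special tokens, which is exactly the part that is entangled with the shared ESM3 codebase.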