Show HN: Ldump – serialize any Lua data

github.com

77 points by girvel 9 hours ago

Some time ago, I was implementing saves for my LOVE2D game. I wanted to do a full dump of the game state -- which included closures (AI), complex graphs, sets with tables as keys, and also fundamentally non-serializable data (coroutines and userdata) that requires user-defined serialization/deserialization logic. I went through every Lua serialization library, and none covered all of these data types and cases. So I wrote my own.

This is a polished version: thoroughly annotated, tested and documented. It is meant to be as functional and customizable as possible (or at least I did everything I could think of). I would be happy to hear suggestions and corrections for both the code and the documentation -- even nitpicky ones.
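
To make the "complex graphs" and "sets with tables as keys" requirement concrete: a serializer has to track table identity, so that shared references and cycles come back as shared references and cycles instead of becoming copies or causing infinite recursion. Below is a minimal sketch of that idea under simplifying assumptions (Lua 5.2+; only tables, strings, booleans and finite numbers). `serialize` is a hypothetical name, and this is not ldump's actual algorithm.

```lua
-- Hypothetical minimal serializer (not ldump's implementation).
-- Each distinct table gets an index. The emitted chunk creates all tables
-- first and fills in fields afterwards, so cycles, shared references and
-- table-valued keys survive a round trip through load().
local function serialize(root)
  local ids, order = {}, {}

  -- Walk the graph once, numbering every table the first time it is seen.
  local function collect(value)
    if type(value) ~= "table" or ids[value] then return end
    order[#order + 1] = value
    ids[value] = #order
    for k, v in pairs(value) do
      collect(k)
      collect(v)
    end
  end
  collect(root)

  -- Render one value as a Lua expression.
  local function literal(value)
    local kind = type(value)
    if kind == "table" then
      return "t[" .. ids[value] .. "]"
    elseif kind == "string" then
      return string.format("%q", value)
    elseif kind == "number" then
      -- %.17g round-trips finite doubles; inf, nan and huge integers are not handled
      return string.format("%.17g", value)
    elseif kind == "boolean" then
      return tostring(value)
    end
    error("unsupported type: " .. kind)
  end

  local out = { "local t = {}" }
  for i = 1, #order do
    out[#out + 1] = "t[" .. i .. "] = {}"
  end
  for i, tbl in ipairs(order) do
    for k, v in pairs(tbl) do
      out[#out + 1] = string.format("t[%d][%s] = %s", i, literal(k), literal(v))
    end
  end
  out[#out + 1] = "return " .. literal(root)
  return table.concat(out, "\n")
end
```

Reading the result back means evaluating the emitted chunk, e.g. `load(serialize(value))()`.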

lifthrasiir 7 hours ago

Maybe I'm being too pedantic, but allowing anything to be "deserialized", which here means "evaluated", is not secure. I think it only has to accept a very limited subset of Lua anyway, so you could switch to a non-Lua format that is easy to parse. That way the library has total control over what is being evaluated.

  • girvel 7 hours ago

    This is an interesting thought. It is currently unsafe and intended only for loading files you trust. I should definitely add a warning to the README.

    Overall, it would be nice to make it safer. I don't think switching to a non-Lua format would help, though, because ldump is intended to serialize functions too, and those can contain arbitrary code even if everything else were stored as data. Maybe it is possible to make a function like `ldump.safe_load` that restricts `load`'s environment, so the loaded code has no access to the debug/os/io modules.

    • gvx 7 hours ago

      You could take a look at SELÖVE, a (severely out of date) fork of LÖVE that is intended to make it safe to run arbitrary .love games. (It used to be on Bitbucket, but it looks like it's gone? I'm not sure if I have the repo locally :/)

      Running arbitrary code was such a problem that I just completely ruled it out for bitser. Instead of serializing functions, you can register safe functions as resources. This doesn't solve the upvalue problem, though.
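
      A minimal sketch of the "register safe functions as resources" approach, assuming a hand-rolled registry. The names `register`, `dump_function` and `load_function` are hypothetical and are not bitser's or ldump's API. Only a function's registered name is written out, and loading looks the name up instead of evaluating code.

      ```lua
      -- Hypothetical function registry: functions are saved by name, never as code.
      local by_name, by_function = {}, {}

      local function register(name, fn)
        assert(by_name[name] == nil, "name already registered: " .. name)
        by_name[name] = fn
        by_function[fn] = name
      end

      -- "Serialize" a function as its registered name (store it as a string).
      local function dump_function(fn)
        local name = by_function[fn]
        if name == nil then
          error("refusing to serialize an unregistered function", 2)
        end
        return name
      end

      -- Deserialize by lookup only: unknown names are rejected, nothing is evaluated.
      local function load_function(name)
        local fn = by_name[name]
        if fn == nil then
          error("unknown function: " .. tostring(name), 2)
        end
        return fn
      end

      -- Hypothetical example resource.
      register("ai.wander", function(npc) npc.x = npc.x + 1 end)
      ```

      The upvalue caveat still applies: a registered function uses whatever upvalues it has in the loading process, not the ones it had when the data was saved.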

    • girvel 3 hours ago

      I looked into it, and Lua allows limiting the environment when `load`ing: through the `env` argument since 5.2, or through `setfenv` before that. I will add a helper function that produces the minimal environment needed for safe loading, plus a documentation page about safety.
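
      A minimal sketch of such a helper, assuming Lua 5.2+ (`load` with `mode` and `env` arguments; on 5.1 the rough equivalent is `loadstring` plus `setfenv`). `safe_load` is a hypothetical name, not part of ldump. Mode `"t"` refuses precompiled bytecode. An empty environment also hides `load` itself, so anything that needs to rebuild functions from bytecode at load time will not work inside it.

      ```lua
      -- Hypothetical helper (Lua 5.2+): evaluate an untrusted chunk in an empty
      -- environment. Returns true plus the chunk's results, or false plus an error.
      local function safe_load(source)
        local env = {}  -- no io, os, debug, load, require...
        -- "t" = text only: precompiled (binary) chunks are rejected outright
        local chunk, err = load(source, "=safe_load", "t", env)
        if not chunk then
          return false, err
        end
        return pcall(chunk)
      end
      ```

      This limits what the chunk can reach, not how much it can consume. `while true do end` still hangs, and strings keep their metatable, so something like `("x"):rep(1e9)` can still exhaust memory.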

      • myrmidon 3 hours ago

        Note that loading (maliciously crafted) bytecode is generally not safe in Lua; sandboxing can be escaped in more ways than when loading plain-text source code. As far as I know, there are currently no full mitigations for this (and any would probably be highly interpreter/version sensitive anyway). The only "real" mitigation strategy is to not `load` bytecode at all.

        But this is probably a non-issue for a lot of use cases.

        See e.g.

        https://gist.github.com/corsix/6575486

        https://www.corsix.org/content/malicious-luajit-bytecode

    • lifthrasiir 7 hours ago

      Yeah, you would need an allowlist for functions. Using bytecode would make it much harder, though I haven't given it deep thought yet.

koeng 2 hours ago

I've been looking for something similar! Here is what I'd like to do:

I have a long-running script. At several steps, the script has to pause for a long time while operations are done in real life (biological experiments, so think wait times of around 2 days between runs) before it gets some data and continues. From what I can see, I'd add yielding coroutines at the data pause points, right? How would you handle that?
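
Plain Lua can't write out a suspended coroutine's stack (the original post lists coroutines among the data that needs user-defined logic). One hedged alternative is to make each pause point an explicit step boundary and keep all progress in a plain table. In this sketch, `steps`, `run` and the step bodies are hypothetical placeholders. `state` is what you would serialize (with ldump or anything else) while waiting for lab results.

```lua
-- Hypothetical checkpointed pipeline: all progress lives in `state`, which
-- holds only plain data, so it can be saved to disk while waiting.
local steps = {
  function(state) state.samples = { "A", "B" } end,                -- prepare
  function(state) state.plan = "incubate " .. #state.samples end,  -- then wait days
  function(state)                                                  -- after lab results
    state.summary = #state.results .. " results for " .. #state.samples .. " samples"
  end,
}

-- Runs steps from state.step onward; stops after any step listed in `pauses`.
local function run(state, pauses)
  while state.step <= #steps do
    local current = state.step
    steps[current](state)
    state.step = current + 1
    if pauses[current] then
      return "paused"  -- caller serializes `state` and exits here
    end
  end
  return "finished"
end
```

On restart, the caller deserializes `state`, adds the new measurements, and calls `run` again. Because the step functions are looked up by index, the saved data never needs to contain code.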

ithkuil 8 hours ago

"lump" would have been a nice name

  • girvel 8 hours ago

    Damn, it really would

JourneyJourney 7 hours ago

I'm afraid I've spent too much time with Lua lately and fallen in love with its simplicity. Kinda hard to go back to OOP after that.

  • nicoloren 7 hours ago

    Same for me. I used Lua for some desktop software for a client and enjoyed it a lot!

    I'm thinking of starting to dev a game with LOVE2D just to have an excuse to use Lua.

    • girvel 3 hours ago

      LOVE2D is a great gamedev framework; I cannot recommend it enough. It is so pleasant to work with.

  • aldanor 5 hours ago

    It's simple until you dig deep into metatables lol

appleorchard46 2 hours ago

Very cool! I was just needing something like this for my Defold game, this looks way better than my hacky solution.

Semi-unrelated: you say you're using tables as keys in your project. I didn't know you could do that! What are you using it for?

  • GranPC 2 hours ago

    FWIW, any Lua value except nil and NaN can be used as a key, including functions, userdata, etc.
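
    A small illustration of what that means in practice: tables (and functions) are keyed by identity, not by contents, which is what makes sets of tables work. The variable names below are just illustrative. The two values that cannot be keys are nil and NaN.

    ```lua
    -- Tables are keys by identity: equal contents are still different keys.
    local visited = {}
    local room_a = { name = "hall" }
    local room_b = { name = "hall" }
    visited[room_a] = true
    assert(visited[room_a] == true and visited[room_b] == nil)

    -- Functions (and coroutines, userdata, booleans...) work as keys too.
    local on_event = {}
    on_event[print] = "log"

    -- Assigning with a nil or NaN key raises an error.
    local ok_nil = pcall(function() visited[nil] = true end)
    local ok_nan = pcall(function() visited[0 / 0] = true end)
    assert(not ok_nil and not ok_nan)
    ```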

bflesch 8 hours ago

Nice! I wonder why something like this is not built into the language itself. Debugging tables in particular is painful, to say the least :)

gvx 7 hours ago

Cool to see you were inspired by Ser!

  • girvel 6 hours ago

    Oh wow, didn't expect to meet the author, thank you!

sebstefan 8 hours ago

So it also dumps functions and is able to import them back?

Does the function still need to be in memory to be loaded again ("does it just dump the pointer") or can I save it to disk, shut off the interpreter, boot it again and it imports it fine (in which case it somehow dumps them as code...?)?

Even the linked test case in the README doesn't show the output/expected result of the serialization.

  • myrmidon 7 hours ago

    > can I save it to disk, shut off the interpreter, boot it again and it imports it fine (in which case it somehow dumps them as code...?

    Yes, it dumps them as bytecode (probably not compatible between completely different interpreters).

    It even preserves debug metadata, so stack traces involving serialized/deserialized functions look right, and still show the original source file.

    This is really neat.
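
    A quick way to see the debug-metadata point for yourself. This sketch assumes Lua 5.3+ for the `strip` argument of `string.dump`, and the chunk name "game.lua" is just an illustrative example. The unstripped dump keeps the source name and line numbers, while a stripped one loses them.

    ```lua
    -- Define a function in a chunk named "game.lua" so errors carry that name.
    local fail = assert(load("return function() error('boom') end", "=game.lua"))()

    -- Round-trip through bytecode, with and without debug info.
    local kept = assert(load(string.dump(fail), "=ignored", "b"))
    local stripped = assert(load(string.dump(fail, true), "=ignored", "b"))  -- strip flag: Lua 5.3+

    local _, kept_msg = pcall(kept)
    local _, stripped_msg = pcall(stripped)
    print(kept_msg)      -- position info still points at game.lua
    print(stripped_msg)  -- no source/line info left
    ```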

    • girvel 7 hours ago

      Thank you, that is really nice to hear. I have to give credit to Lua's standard library, though: the basic function serialization (without upvalues) is implemented there as `string.dump`.
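
      For readers unfamiliar with `string.dump`: it turns a Lua function into a bytecode string that `load` can turn back into a new function, but upvalue values are not part of the dump. In Lua 5.2+, `load` sets the first upvalue of such a function to the global environment and leaves the rest nil. The sketch below uses a hypothetical `clone_closure` helper (Lua 5.2+, not ldump's code). It shows that step and then copies upvalue values over with `debug.getupvalue`/`debug.setupvalue`, which is roughly the missing piece being discussed.

      ```lua
      -- Hypothetical helper (Lua 5.2+): rebuild a closure from its bytecode,
      -- then copy each upvalue's current value into the new function.
      local function clone_closure(fn)
        -- "b": accept only a binary chunk; the dump keeps debug info by default
        local copy = assert(load(string.dump(fn), "=clone", "b"))
        local i = 1
        while true do
          local name, value = debug.getupvalue(fn, i)
          if name == nil then break end
          debug.setupvalue(copy, i, value)  -- copies the value, not the shared cell
          i = i + 1
        end
        return copy
      end

      local count = 0
      local function tick()
        count = count + 1
        return count
      end

      tick()                          -- count is now 1
      local copy = clone_closure(tick)
      print(copy(), tick())           -- prints 2 and 2: each has its own counter
      print(copy == tick)             -- false: a new function, not the original
      ```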

      • elpocko 6 hours ago

        Be aware that you're gonna have a bad time in scenarios where code is serialized using one Lua version and deserialized using another. Bytecode compatibility is not guaranteed between different versions of Lua(JIT).

        I've shipped Love2D games as bytecode that wouldn't run on many Linux boxes because their LuaJIT installation (which is not part of Love2D but part of the system) was too old, or they stopped working after the user updated their system. There's a plethora of situations where something like that can happen.

        I'm also wary of the "upvalues are preserved" feature, which sounds like a huge footgun, but I haven't looked into the details of your implementation.

  • lifthrasiir 7 hours ago

    Functions are apparently serialized as a bytecode dump contained in a self-extracting expression, so everything can indeed be serialized as a Lua expression. It seems the author also tried to preserve as many upvalues as possible, though I feel that is way more dangerous than I would like.

    • girvel 7 hours ago

      Yep, that is correct. I think ldump is able to preserve all upvalues, even in edge cases such as `_ENV` and joined upvalues (multiple functions referencing one upvalue). A closure is basically an object with a single method and upvalues as fields, so serialization is straightforward. I think I have it covered, but I would be glad to hear ideas about where the serialization could be unstable.
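
      To make "joined upvalues" concrete: two closures created together can share one upvalue cell, and copying values with a plain `debug.setupvalue` loop would split it into two independent cells. The standard debug library can detect sharing with `debug.upvalueid` and rebuild it with `debug.upvaluejoin` (both Lua 5.2+). This is a minimal sketch of that idea with illustrative names, not necessarily how ldump implements it.

      ```lua
      -- Illustrative example: two closures sharing one upvalue cell `n`.
      local function make_counter()
        local n = 0
        local function inc() n = n + 1 end
        local function get() return n end
        return inc, get
      end

      local inc, get = make_counter()
      -- Same id means the upvalues are joined (one shared cell).
      assert(debug.upvalueid(inc, 1) == debug.upvalueid(get, 1))

      -- Rebuild both functions from bytecode.
      local inc2 = assert(load(string.dump(inc), "=inc2", "b"))
      local get2 = assert(load(string.dump(get), "=get2", "b"))
      debug.setupvalue(inc2, 1, 0)         -- restore the value once...
      debug.upvaluejoin(get2, 1, inc2, 1)  -- ...then make get2 share inc2's cell

      inc2()
      inc2()
      print(get2())  --> 2
      ```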

  • girvel 7 hours ago

    The function (even a closure) is fully recreated on deserialization, so it is safe to save it to disk. It doesn't preserve reference equality -- it will be a new function -- but the behaviour and the state (for closures) will be equivalent.

    I didn't include asserts in the linked example because I thought they would be too verbose. You can see the asserts in the test linked below the example. Maybe that was the wrong call; I will think about including asserts in the example itself.

    • sebstefan 6 hours ago

      That's super cool

      I think you could make it clearer. Try reading the README as someone with the preconceived notion that this is Yet Another Lua Serializer that translates functions, userdata and threads to their tostring() output. There are hundreds of those projects.

brunocroh 7 hours ago

I will try it on my next love2d project, thank you!

synergy20 4 hours ago

Newbie question: when is this useful in practice?

  • girvel 4 hours ago

    It is intended for cases where you need to store data on disk or transfer it to another machine, like a video game save or a network data exchange.
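
    As a concrete example of the game-save case, here is a minimal sketch (Lua 5.2+) of writing a serialized state to disk and reading it back. `save_game` and `load_game` are hypothetical helpers, and `serialize` stands in for any function that returns a Lua chunk. ldump is assumed to be one, but check its README for the exact call. Loading runs the file as code, so this should only be used on saves you trust.

    ```lua
    -- Hypothetical helpers. `serialize` must return a Lua chunk whose
    -- evaluation yields the saved value.
    local function save_game(path, state, serialize)
      local file = assert(io.open(path, "wb"))
      file:write(serialize(state))
      file:close()
    end

    local function load_game(path)
      local file = assert(io.open(path, "rb"))
      local source = file:read("*a")  -- whole file; "*a" is also accepted by 5.3+
      file:close()
      -- Evaluates the file: only load saves you trust.
      return assert(load(source, "=" .. path))()
    end
    ```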

jhatemyjob 4 hours ago

Try running this in a repl and tell me what you get, OP:

    string.format('%q', 'hi\n')
  • myrmidon 4 hours ago

    If you insinuate that %q obviates the need for ldump then you are wrong.

    There is not even significant overlap in what they do; all %q does is escape Lua strings sufficiently for the interpreter to read them back. It does not serialize functions, or even tables, in any shape or form.

    edit: Sorry for being unreasonably harsh after misunderstanding your message.

    • girvel 4 hours ago

      I actually thought the comment was about ldump's implementation: it uses %q to serialize strings, which may not be reliable.

  • girvel 4 hours ago

    On my machine it produces an equivalent string, although formatted differently. It seems that ldump preserves all special characters (`"\a\b\f\n\r\t\v\\\"\'"`), although I will need to test it on all supported versions.

    • jhatemyjob 3 hours ago

      Ah, you know what, you're right. It's an equivalent string for me too:

          "hi\
          "
      
      I didn't know Lua treated \ before newlines like that. That's cool! I made a similar Lua serialization library for myself and was using a chain of `string.match` calls to escape my strings. Now I can make it way simpler. Lol. Thanks
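
          A quick way to check that round trip: `%q` writes a newline as a backslash followed by a real line break, and the Lua lexer reads that back as `\n`. So quoting a string and evaluating the result gives back the same string, control characters included. A small sketch (Lua 5.2+ for `load` on a string):

          ```lua
          local original = "hi\n\ttab \"quote\" \\ back\0zero"
          local quoted = string.format("%q", original)

          -- The newline is emitted as a backslash followed by an actual line break.
          assert(quoted:find("\\\n", 1, true))

          -- Evaluating the quoted form yields an identical string.
          local restored = assert(load("return " .. quoted))()
          print(restored == original)  --> true
          ```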