id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason
1578790070,I_kwDOCGYnMM5eGmy2,527,`Table.convert()` skips falsey values,167893,closed,0,,,5,2023-02-10T00:00:52Z,2023-05-09T21:15:05Z,2023-05-08T21:03:24Z,CONTRIBUTOR,,"# Summary

By design, `Table.convert()` does [not attempt](https://github.com/simonw/sqlite-utils/blob/fc221f9b62ed8624b1d2098e564f525c84497969/sqlite_utils/db.py#L2663) conversion of falsey values (`None`, `""""`, `0`, ...). This is surprising (directly contradicts the docstring) and `convert()` may quietly skip cells where the user assumed a conversion would take place.

# Example

Increment a column of integers by one

``` python
from sqlite_utils import Database

db = Database(memory=True)
table = db['table']
col = 'x'
table.insert_all([{col: 0}, {col: 1}])
print(table.get(1)) # 0
print(table.get(2)) # 1
print()
table.convert(col, lambda x: x+1)
print(table.get(1)) # got 0, expected 1 ⚠⚠⚠
print(table.get(2)) # got 2, expected 2
```

Another example might be, say, transforming cells containing empty string to `NULL`.

# Discussion

This was, I think, a pragmatic choice so that consumers can skip writing guard clauses for these falsey values (particularly from the CLI). But this surprising undocumented behavior can lead to incorrect data. I don't think this is a good trade-off between convenience and correctness.

In the absence of this convenience users will either have to write guard clauses into their conversion expressions (or adapt the called function to do the same), so:

``` python
fn(value) if value else value
```

instead of:

``` python
fn(value)
```

This is more typing and sometimes I will forget, and there will be errors. (But they will be noisy errors, which is a good thing).

Such a change will certainly inconvenience some existing consumers; there will be some breakage.
But I think this is worth it to avoid quietly not converting some values by default, which can lead to quietly bad data. I have a PR that I will attach, please take a look and see what you think.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/527/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1575131737,I_kwDOCGYnMM5d4ppZ,525,Repeated calls to `Table.convert()` fail,167893,closed,0,,,4,2023-02-07T22:40:47Z,2023-05-08T21:59:41Z,2023-05-08T21:54:02Z,CONTRIBUTOR,,"## Summary

When using the API, repeated calls to `Table.convert()` do not work correctly since all conversions quietly use the callable (function, lambda) from the first call to `convert()` only. Subsequent invocations with different callables use the callable from the first invocation only.

## Example

```python
from sqlite_utils import Database

db = Database(memory=True)
table = db['table']
col = 'x'
table.insert_all([{col: 1}])
print(table.get(1))

table.convert(col, lambda x: x*2)
print(table.get(1))

def zeroize(x):
    return 0

#zeroize = lambda x: 0
#zeroize.__name__ = 'zeroize'

table.convert(col, zeroize)
print(table.get(1))
```

Output:
```
{'x': 1}
{'x': 2}
{'x': 4}
```

Expected:
```
{'x': 1}
{'x': 2}
{'x': 0}
```

## Explanation

This is some relevant [documentation](https://github.com/simonw/sqlite-utils/blob/1491b66dd7439dd87cd5cd4c4684f46eb3c5751b/docs/python-api.rst#registering-custom-sql-functions:~:text=By%20default%20registering%20a%20function%20with%20the%20same%20name%20and%20number%20of%20arguments%20will%20have%20no%20effect).

* `Table.convert()` takes a `Callable` to perform data conversion on a column
* The `Callable` is passed to `Database.register_function()`
* `Database.register_function()` uses the callable's `__name__` attribute for registration
* (Aside: all lambdas have a `__name__` of `<lambda>`: I thought this was the problem, and it was close, but not quite)
* However `convert()` first wraps the callable in the local function [`convert_value()`](https://github.com/simonw/sqlite-utils/blob/fc221f9b62ed8624b1d2098e564f525c84497969/sqlite_utils/db.py#L2661)
* Consequently `register_function()` sees name `convert_value` for all invocations from `convert()`
* `register_function()` silently ignores registrations using the same name, retaining only the first such registration

There's a mismatch between the comments and the code:

https://github.com/simonw/sqlite-utils/blob/fc221f9b62ed8624b1d2098e564f525c84497969/sqlite_utils/db.py#L404

but actually the existing function is returned/used instead (as the ""registering custom sql functions"" doc I linked above says too). Seems like this can be rectified to match the comment?

## Suggested fix

I think there are four things:

1. The call to `register_function()` from `convert()` should have an explicit `name=` parameter (to continue using `convert_value()` and the progress bar).
2. For functions, this name can be the real function name. (I understand the sqlite api needs a name, and it's nice if those are recognizable names where possible). For lambdas would `'lambda-{uuid}'` or similar be acceptable?
3. `register_function()` really should throw an error on repeated attempts to register a duplicate (function, arity)-pair.
4. A test? I haven't looked at the test framework here but seems this should be testable.
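
The naming problem and suggestion 2 can be illustrated without SQLite at all. This is only a rough sketch: `wrap()` stands in for the local wrapper inside `convert()`, and `registration_name()` is a hypothetical helper (not part of sqlite-utils) that keeps real function names and gives each lambda a unique one.

```python
import uuid


def wrap(fn):
    # Stand-in for the local wrapper created inside convert(): every wrapper
    # built here has the same __name__, whatever fn is.
    def convert_value(value):
        return fn(value)

    return convert_value


def registration_name(fn):
    # Hypothetical helper: reuse a real function's name, but give each
    # lambda a unique name so two lambdas never collide.
    name = getattr(fn, '__name__', '<lambda>')
    if name == '<lambda>':
        return 'lambda_' + uuid.uuid4().hex
    return name


def zeroize(x):
    return 0


double = lambda x: x * 2

# Both wrappers report the same name, which is why only the first registration wins.
print(wrap(double).__name__, wrap(zeroize).__name__)  # convert_value convert_value
print(registration_name(zeroize))  # zeroize
print(registration_name(double).startswith('lambda_'))  # True
```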

## See also

- #458 ",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/525/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed
1576990618,PR_kwDOCGYnMM5JkkED,526,Fix repeated calls to `Table.convert()`,167893,closed,0,,,0,2023-02-09T00:14:49Z,2023-05-08T21:56:05Z,2023-05-08T21:53:58Z,CONTRIBUTOR,simonw/sqlite-utils/pulls/526,"Fixes #525. All tests pass.

There's perhaps a better way to name lambdas? There could be a collision if a caller passes a function with name like `lambda_123456`. SQLite [documentation](https://www.sqlite.org/appfunc.html) is a little, ah, lite on function name specs. If there is a character that can be used in place of underscore in a SQLite function name that is not permitted in a Python function identifier then that could be a good way to prevent accidental collisions. (I tried dash, colon, dot, no joy). Otherwise, there is little chance of this happening and if it should happen the risk is mitigated by now throwing an exception in the case of a (name, arity) collision without `replace=True`.
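
The (name, arity) collision check mentioned above could look roughly like the following. This is a standalone sketch, not the code in this PR: `FunctionRegistry` and `DuplicateFunctionError` are hypothetical names, and the arity is assumed to come from `inspect.signature`.

```python
import inspect


class DuplicateFunctionError(Exception):
    pass


class FunctionRegistry:
    # Hypothetical stand-in for the bookkeeping around registered functions.
    def __init__(self):
        self._functions = {}

    def register(self, fn, name=None, replace=False):
        name = name or fn.__name__
        arity = len(inspect.signature(fn).parameters)
        key = (name, arity)
        # Fail loudly on a duplicate instead of silently keeping the first function.
        if key in self._functions and not replace:
            raise DuplicateFunctionError('%s/%d is already registered' % key)
        self._functions[key] = fn
        return key


registry = FunctionRegistry()
registry.register(lambda x: x * 2, name='lambda_1')
try:
    registry.register(lambda x: 0, name='lambda_1')
except DuplicateFunctionError as error:
    print(error)  # lambda_1/1 is already registered

# An explicit replace=True swaps in the new callable.
registry.register(lambda x: 0, name='lambda_1', replace=True)
```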

----
:books: Documentation preview :books:: https://sqlite-utils--526.org.readthedocs.build/en/526/ ",140912432,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/526/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1578793661,PR_kwDOCGYnMM5Jqn1u,528,Enable `Table.convert()` on falsey values,167893,closed,0,,,1,2023-02-10T00:04:09Z,2023-05-08T21:08:23Z,2023-05-08T21:08:23Z,CONTRIBUTOR,simonw/sqlite-utils/pulls/528,"Fixes #527

----
:books: Documentation preview :books:: https://sqlite-utils--528.org.readthedocs.build/en/528/ ",140912432,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/528/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,