html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app https://github.com/simonw/sqlite-utils/issues/61#issuecomment-533818697,https://api.github.com/repos/simonw/sqlite-utils/issues/61,533818697,MDEyOklzc3VlQ29tbWVudDUzMzgxODY5Nw==,49260,2019-09-21T18:09:01Z,2019-09-21T18:09:28Z,CONTRIBUTOR,"@witeshadow The library version doesn't have helpers around CSV (at least not from what I can see in the code). But here's a snippet that makes it easy to insert from CSV using the library. ``` import csv from sqlite_utils import Database # CSV Reader csv_file = open(""filename.csv"") # open the csv file. reader = csv.reader(csv_file) # Create a CSV reader headers = next(reader) # First line is the header docs = (dict(zip(headers, row)) for row in reader) # Now you can use the `sqlite_utils` library. db = Database(""my_database.db"") db[""table_name""].insert_all(docs) ``` This snippet is adapted from reading the CLI source code on how it implements the csv option.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",491219910, https://github.com/simonw/sqlite-utils/pull/604#issuecomment-1843975536,https://api.github.com/repos/simonw/sqlite-utils/issues/604,1843975536,IC_kwDOCGYnMM5t6NVw,16437338,2023-12-07T01:17:05Z,2023-12-07T01:17:05Z,CONTRIBUTOR,Apologies - I pushed a fix that addresses the mypy failures.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",2001006157, https://github.com/simonw/sqlite-utils/issues/578#issuecomment-1648339661,https://api.github.com/repos/simonw/sqlite-utils/issues/578,1648339661,IC_kwDOCGYnMM5iP6rN,25778,2023-07-24T17:44:30Z,2023-07-24T17:44:30Z,CONTRIBUTOR,"> A related feature would be support for plugins to add new ways of ingesting data - currently sqlite-utils insert works against JSON, newline-JSON, CSV and TSV. This is my goal, to have one plugin that handles input and output symmetrically. I'd like to be able to do something like this: ```sh sqlite-utils insert data.db table file.geojson --format geojson # ... explore and manipulate in Datasette sqlite-utils query data.db ... --format geojson > output.geojson ``` This would work especially well with [datasette-query-files](https://github.com/eyeseast/datasette-query-files), since I already have the queries I need saved in standalone SQL files. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1818838294, https://github.com/simonw/sqlite-utils/issues/578#issuecomment-1668113177,https://api.github.com/repos/simonw/sqlite-utils/issues/578,1668113177,IC_kwDOCGYnMM5jbWMZ,25778,2023-08-07T15:41:49Z,2023-08-07T15:41:49Z,CONTRIBUTOR,I wonder if this should be two hooks: input and output. The current `--csv` (and `--tsv`) options apply to both. Haven't looked at how it's implemented. Or maybe it's one hook that returns a format for reading and for writing.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1818838294, https://github.com/simonw/sqlite-utils/pull/57#issuecomment-527211047,https://api.github.com/repos/simonw/sqlite-utils/issues/57,527211047,MDEyOklzc3VlQ29tbWVudDUyNzIxMTA0Nw==,49260,2019-09-02T17:30:43Z,2019-09-02T17:30:43Z,CONTRIBUTOR,"I have merged the other PR (#56) into this one. I have incorporated your suggestions. Cheers!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",487987958, https://github.com/simonw/sqlite-utils/issues/567#issuecomment-1638910473,https://api.github.com/repos/simonw/sqlite-utils/issues/567,1638910473,IC_kwDOCGYnMM5hr8oJ,15178711,2023-07-17T21:27:41Z,2023-07-17T21:27:41Z,CONTRIBUTOR,"Another use-case: I want to make a `sqlite-utils` plugin that'll help me insert data into Datasette. ```bash sqlite-utils insert-datasette \ --token $DATASETTE_API_KEY \ https://latest.datasette.io/fixtures/my-table \ 'select ...' ``` This could also be a datasette plugin (ex `datasette upload-data ...`, but you can also think of `sqlite-utils` plugins that upload to S3, a postgres table, other DBMS's, etc.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1801394744, https://github.com/simonw/sqlite-utils/issues/567#issuecomment-1642808866,https://api.github.com/repos/simonw/sqlite-utils/issues/567,1642808866,IC_kwDOCGYnMM5h60Yi,25778,2023-07-19T21:54:27Z,2023-07-19T21:54:27Z,CONTRIBUTOR,Would this possibly make a bunch of `x-to-sqlite` tools obsolete? Or nudge some to become plugins?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1801394744, https://github.com/simonw/sqlite-utils/issues/567#issuecomment-1646656283,https://api.github.com/repos/simonw/sqlite-utils/issues/567,1646656283,IC_kwDOCGYnMM5iJfsb,25778,2023-07-22T19:32:24Z,2023-07-22T19:32:24Z,CONTRIBUTOR,Cool. I might try to add a geojson plugin that handles both input and output. That would help me out a lot. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1801394744, https://github.com/simonw/sqlite-utils/pull/56#issuecomment-527209840,https://api.github.com/repos/simonw/sqlite-utils/issues/56,527209840,MDEyOklzc3VlQ29tbWVudDUyNzIwOTg0MA==,49260,2019-09-02T17:23:21Z,2019-09-02T17:23:21Z,CONTRIBUTOR,"I have updated the other PR with the changes from this one and added tests. I have also changed the escaping from double quotes to brackets. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",487847945, https://github.com/simonw/sqlite-utils/issues/557#issuecomment-1577355134,https://api.github.com/repos/simonw/sqlite-utils/issues/557,1577355134,IC_kwDOCGYnMM5eBId-,7908073,2023-06-05T19:26:26Z,2023-06-05T19:26:26Z,CONTRIBUTOR,"this isn't really actionable... I'm just being a whiny baby. I have tasted the milk of being able to use `upsert_all`, `insert_all`, etc without having to write DDL to create tables. The meat of the issue is that SQLITE doesn't make rowid stable between vacuums so it is not possible to take shortcuts","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1740150327, https://github.com/simonw/sqlite-utils/issues/557#issuecomment-1590531892,https://api.github.com/repos/simonw/sqlite-utils/issues/557,1590531892,IC_kwDOCGYnMM5ezZc0,7908073,2023-06-14T06:09:21Z,2023-06-14T06:09:21Z,CONTRIBUTOR,"I put together a [simple script](https://github.com/chapmanjacobd/library/blob/42129c5ebe15f9d74653c0f5ca4ed0c991d383e0/xklb/scripts/dedupe_db.py) to upsert and remove duplicate rows based on business keys. If anyone has similar problems with above this might help ``` CREATE TABLE my_table ( id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT, column3 TEXT ); INSERT INTO my_table (column1, column2, column3) VALUES ('Value 1', 'Duplicate 1', 'Duplicate A'), ('Value 2', 'Duplicate 2', 'Duplicate B'), ('Value 3', 'Duplicate 2', 'Duplicate C'), ('Value 4', 'Duplicate 3', 'Duplicate D'), ('Value 5', 'Duplicate 3', 'Duplicate E'), ('Value 6', 'Duplicate 3', 'Duplicate F'); ``` ``` library dedupe-db test.db my_table --bk column2 ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1740150327, https://github.com/simonw/sqlite-utils/issues/556#issuecomment-1575310378,https://api.github.com/repos/simonw/sqlite-utils/issues/556,1575310378,IC_kwDOCGYnMM5d5VQq,601708,2023-06-04T01:21:15Z,2023-06-04T01:21:15Z,CONTRIBUTOR,"I've resolved my use, with the line-buffered output and while read loop for line buffered input, but I leave this here so the incremental saving or line-buffered use-case can be explicitly handled or rejected (or deferred).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1740026046, https://github.com/simonw/sqlite-utils/issues/555#issuecomment-1592047502,https://api.github.com/repos/simonw/sqlite-utils/issues/555,1592047502,IC_kwDOCGYnMM5e5LeO,7908073,2023-06-14T22:00:10Z,2023-06-14T22:01:57Z,CONTRIBUTOR,"You may want to try doing a performance comparison between this and just selecting all the ids with few constraints and then doing the filtering within python. That might seem like a lazy-programmer, inefficient way but queries with large resultsets are a different profile than what databases like SQLITE are designed for. That is not to say that SQLITE is slow or that python is always faster but when you start reading >20% of an index there is an equilibrium that is reached. Especially when adding in writing extra temp tables and stuff to memory/disk. And especially given the `NOT IN` style of query... You may also try chunking like this: ```py def chunks(lst, n) -> Generator: for i in range(0, len(lst), n): yield lst[i : i + n] SQLITE_PARAM_LIMIT = 32765 data = [] chunked = chunks(video_ids, consts.SQLITE_PARAM_LIMIT) for ids in chunked: data.expand( list( db.query( f""""""SELECT * from videos WHERE id in ("""""" + "","".join([""?""] * len(ids)) + "")"", (*ids,), ) ) ) ``` but that actually won't work with your `NOT IN` requirements. You need to query the full resultset to check any row. Since you are doing stuff with files/videos in SQLITE you might be interested in my side project: https://github.com/chapmanjacobd/library","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1733198948, https://github.com/simonw/sqlite-utils/issues/535#issuecomment-1592052320,https://api.github.com/repos/simonw/sqlite-utils/issues/535,1592052320,IC_kwDOCGYnMM5e5Mpg,7908073,2023-06-14T22:05:28Z,2023-06-14T22:05:28Z,CONTRIBUTOR,piping to `jq` is good enough usually,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1655860104, https://github.com/simonw/sqlite-utils/pull/531#issuecomment-1465315726,https://api.github.com/repos/simonw/sqlite-utils/issues/531,1465315726,IC_kwDOCGYnMM5XVvGO,25778,2023-03-12T22:21:56Z,2023-03-12T22:21:56Z,CONTRIBUTOR,"Exactly, that's what I was running into. On my M2 MacBook, SpatiaLite ends up in what is -- for the moment -- a non-standard location, so even when I passed in the location with `--load-extension`, I still hit an error on `create-spatial-index`. What I learned doing this originally is that SQLite needs to load the extension for each connection, even if all the SpatiaLite stuff is already in the database. So that's why `init_spatialite()` gets called again. Here's the code where I hit the error: https://github.com/eyeseast/boston-parcels/blob/main/Makefile#L30 It works using this branch. I'm not attached to this solution if you can think of something better. And I'm not sure, TBH, my test would actually catch what I'm after here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1620164673, https://github.com/simonw/sqlite-utils/pull/531#issuecomment-1501017004,https://api.github.com/repos/simonw/sqlite-utils/issues/531,1501017004,IC_kwDOCGYnMM5Zd7Os,25778,2023-04-09T01:49:43Z,2023-04-09T01:49:43Z,CONTRIBUTOR,I'm going to close this in favor of #536. Will try a cleaner approach to custom paths once that one is merge.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1620164673, https://github.com/simonw/sqlite-utils/issues/529#issuecomment-1592110694,https://api.github.com/repos/simonw/sqlite-utils/issues/529,1592110694,IC_kwDOCGYnMM5e5a5m,7908073,2023-06-14T23:11:47Z,2023-06-14T23:12:12Z,CONTRIBUTOR,"sorry i was wrong. `sqlite-utils --raw-lines` works correctly ``` sqlite-utils --raw-lines :memory: ""SELECT * FROM (VALUES ('test'), ('line2'))"" | cat -A test$ line2$ sqlite-utils --csv --no-headers :memory: ""SELECT * FROM (VALUES ('test'), ('line2'))"" | cat -A test$ line2$ ``` I think this was fixed somewhat recently","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1581090327, https://github.com/simonw/sqlite-utils/issues/527#issuecomment-1540900733,https://api.github.com/repos/simonw/sqlite-utils/issues/527,1540900733,IC_kwDOCGYnMM5b2Ed9,167893,2023-05-09T21:15:05Z,2023-05-09T21:15:05Z,CONTRIBUTOR,"Sorry, I completely missed your first comment whilst on Easter break. This looks like a good practical compromise before v4. Thanks!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1578790070, https://github.com/simonw/sqlite-utils/issues/525#issuecomment-1423387341,https://api.github.com/repos/simonw/sqlite-utils/issues/525,1423387341,IC_kwDOCGYnMM5U1yrN,167893,2023-02-08T23:48:52Z,2023-02-09T00:17:30Z,CONTRIBUTOR,PR below,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1575131737, https://github.com/simonw/sqlite-utils/issues/525#issuecomment-1435318713,https://api.github.com/repos/simonw/sqlite-utils/issues/525,1435318713,IC_kwDOCGYnMM5VjTm5,167893,2023-02-17T21:55:01Z,2023-02-17T21:55:01Z,CONTRIBUTOR,"Meanwhile, a cheap workaround is to invalidate the registered function cache: ``` python table.convert(...) db._registered_functions = set() table.convert(...) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1575131737, https://github.com/simonw/sqlite-utils/issues/524#issuecomment-1419357290,https://api.github.com/repos/simonw/sqlite-utils/issues/524,1419357290,IC_kwDOCGYnMM5Umaxq,25778,2023-02-06T16:21:44Z,2023-02-06T16:21:44Z,CONTRIBUTOR,SQLite doesn't have a native `DATETIME` type. It stores dates internally as strings and then has [functions](https://www.sqlite.org/lang_datefunc.html) to work with date-like strings. Yes it's weird.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1572766460, https://github.com/simonw/sqlite-utils/issues/523#issuecomment-1407264466,https://api.github.com/repos/simonw/sqlite-utils/issues/523,1407264466,IC_kwDOCGYnMM5T4SbS,536941,2023-01-28T02:41:14Z,2023-01-28T02:41:14Z,CONTRIBUTOR,"I also often then run another little script to cast all empty strings to null, but i save that for another issue if this gets accepted.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1560651350, https://github.com/simonw/sqlite-utils/issues/520#issuecomment-1421571810,https://api.github.com/repos/simonw/sqlite-utils/issues/520,1421571810,IC_kwDOCGYnMM5Uu3bi,167893,2023-02-07T22:43:09Z,2023-02-07T22:43:09Z,CONTRIBUTOR,"Hey, isn't this essentially the same issue as #448 ?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1516644980, https://github.com/simonw/sqlite-utils/issues/511#issuecomment-1304078945,https://api.github.com/repos/simonw/sqlite-utils/issues/511,1304078945,IC_kwDOCGYnMM5Nuqph,7908073,2022-11-04T19:38:36Z,2022-11-04T20:13:17Z,CONTRIBUTOR,"Even more bizarre, the source db only has one record and the target table has no conflicting record: ``` 875 0.3s lb:/ (main|✚2) [0|0]🌺 sqlite-utils tube_71.db 'select * from media where path = ""https://archive.org/details/088ghostofachanceroygetssackedrevengeofthelivinglunchdvdripxvidphz""' | jq [ { ""size"": null, ""time_created"": null, ""play_count"": 1, ""language"": null, ""view_count"": null, ""width"": null, ""height"": null, ""fps"": null, ""average_rating"": null, ""live_status"": null, ""age_limit"": null, ""uploader"": null, ""time_played"": 0, ""path"": ""https://archive.org/details/088ghostofachanceroygetssackedrevengeofthelivinglunchdvdripxvidphz"", ""id"": ""088ghostofachanceroygetssackedrevengeofthelivinglunchdvdripxvidphz/074 - Home Away from Home, Rainy Day Robot, Odie the Amazing DVDRip XviD [PhZ].mkv"", ""ie_key"": ""ArchiveOrg"", ""playlist_path"": ""https://archive.org/details/088ghostofachanceroygetssackedrevengeofthelivinglunchdvdripxvidphz"", ""duration"": 1424.05, ""tags"": null, ""title"": ""074 - Home Away from Home, Rainy Day Robot, Odie the Amazing DVDRip XviD [PhZ].mkv"" } ] 875 0.3s lb:/ (main|✚2) [0|0]🥧 sqlite-utils video.db 'select * from media where path = ""https://archive.org/details/088ghostofachanceroygetssackedrevengeofthelivinglunchdvdripxvidphz""' | jq [] ``` I've been able to use this code successfully several times before so not sure what's causing the issue. I guess the way that I'm handling multiple databases is an issue, though it hasn't ever inserted into the source db, not sure what's different. The only reasonable explanation is that it is trying to insert into the source db from the source db for some reason? Or maybe sqlite3 is checking the source db for primary key violation because the table name is the same","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1436539554, https://github.com/simonw/sqlite-utils/issues/511#issuecomment-1304320521,https://api.github.com/repos/simonw/sqlite-utils/issues/511,1304320521,IC_kwDOCGYnMM5NvloJ,7908073,2022-11-04T22:54:09Z,2022-11-04T22:59:54Z,CONTRIBUTOR,I ran `PRAGMA integrity_check` and it returned `ok`. but then I tried restoring from a backup and I didn't get this `IntegrityError: constraint failed` error. So I think it was just something wrong with my database. If it happens again I will first try to reindex and see if that fixes the issue,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1436539554, https://github.com/simonw/sqlite-utils/issues/510#issuecomment-1318777114,https://api.github.com/repos/simonw/sqlite-utils/issues/510,1318777114,IC_kwDOCGYnMM5OmvEa,7908073,2022-11-17T15:09:47Z,2022-11-17T15:09:47Z,CONTRIBUTOR,"why close? is the only problem that the _config table that incorrectly says 4 for fts5? if so, that's still something that should be fixed","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1434911255, https://github.com/simonw/sqlite-utils/pull/508#issuecomment-1297788531,https://api.github.com/repos/simonw/sqlite-utils/issues/508,1297788531,IC_kwDOCGYnMM5NWq5z,7908073,2022-10-31T22:54:33Z,2022-11-17T15:11:16Z,CONTRIBUTOR,"Maybe this is actually a problem in the python sqlite bindings. Given [SQLITE's stance on this](https://www.sqlite.org/invalidutf.html) they should probably use `encode('utf-8', 'surrogatepass')`. As far as I understand the error here won't actually be resolved by this PR as-is. We would need to modify the data with `surrogateescape`... :/ or modify the sqlite3 module to use `surrogatepass`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1430563092, https://github.com/simonw/sqlite-utils/issues/507#issuecomment-1297859539,https://api.github.com/repos/simonw/sqlite-utils/issues/507,1297859539,IC_kwDOCGYnMM5NW8PT,7908073,2022-11-01T00:40:16Z,2022-11-01T00:40:16Z,CONTRIBUTOR,"Ideally people could fix their data if they run into this issue. If you are using filenames try [convmv](https://linux.die.net/man/1/convmv) ``` convmv --preserve-mtimes -f utf8 -t utf8 --notest -i -r . ``` maybe this script will also help: ```py import argparse, shutil from pathlib import Path import ftfy from xklb import utils from xklb.utils import log def parse_args() -> argparse.Namespace: parser = argparse.ArgumentParser() parser.add_argument(""paths"", nargs='*') parser.add_argument(""--verbose"", ""-v"", action=""count"", default=0) args = parser.parse_args() log.info(utils.dict_filter_bool(args.__dict__)) return args def rename_invalid_paths() -> None: args = parse_args() for path in args.paths: log.info(path) for p in sorted([str(p) for p in Path(path).rglob(""*"")], key=len): fixed = ftfy.fix_text(p, uncurl_quotes=False).replace(""\r\n"", ""\n"").replace(""\r"", ""\n"").replace(""\n"", """") if p != fixed: try: shutil.move(p, fixed) except FileNotFoundError: log.warning(""FileNotFound. %s"", p) else: log.info(fixed) if __name__ == ""__main__"": rename_invalid_paths() ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1430325103, https://github.com/simonw/sqlite-utils/issues/50#issuecomment-1303660293,https://api.github.com/repos/simonw/sqlite-utils/issues/50,1303660293,IC_kwDOCGYnMM5NtEcF,7908073,2022-11-04T14:38:36Z,2022-11-04T14:38:36Z,CONTRIBUTOR,where did you see the limit as 999? I believe the limit has been 32766 for quite some time. If you could detect which one this could speed up batch insert of some types of data significantly,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",473083260, https://github.com/simonw/sqlite-utils/pull/499#issuecomment-1292401308,https://api.github.com/repos/simonw/sqlite-utils/issues/499,1292401308,IC_kwDOCGYnMM5NCHqc,7908073,2022-10-26T17:54:26Z,2022-10-26T17:54:51Z,CONTRIBUTOR,"The problem with how it is currently is that the transformed fts table _will_ return incorrect results (unless the table was only 1 row or something), even if create_triggers was enabled previously. Maybe the simplest solution is to disable fts on a transformed table rather than try to recreate it? Thoughts?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1405196044, https://github.com/simonw/sqlite-utils/pull/498#issuecomment-1274153135,https://api.github.com/repos/simonw/sqlite-utils/issues/498,1274153135,IC_kwDOCGYnMM5L8giv,7908073,2022-10-11T06:34:31Z,2022-10-11T06:34:31Z,CONTRIBUTOR,nevermind it was because I was running `db[table].transform`. The fts tables would still be there but the triggers would be dropped,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1404013495, https://github.com/simonw/sqlite-utils/issues/493#issuecomment-1264219650,https://api.github.com/repos/simonw/sqlite-utils/issues/493,1264219650,IC_kwDOCGYnMM5LWnYC,7908073,2022-10-01T03:22:50Z,2022-10-01T03:23:58Z,CONTRIBUTOR,"this is likely what you are looking for: https://stackoverflow.com/a/51076749/697964 but yeah I would say just disable smart quotes","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1386562662, https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1256858763,https://api.github.com/repos/simonw/sqlite-utils/issues/491,1256858763,IC_kwDOCGYnMM5K6iSL,7908073,2022-09-24T04:50:59Z,2022-09-24T04:52:08Z,CONTRIBUTOR,"Instead of outputting binary data to stdout the interface might be better like this ``` sqlite-utils merge animals.db cats.db dogs.db ``` similar to `zip`, `ogr2ogr`, etc Actually I think this might already be possible within `ogr2ogr`. I don't believe spatial data is a requirement though it might add an `ogc_id` column or something ``` cp cats.db animals.db ogr2ogr -append animals.db dogs.db ogr2ogr -append animals.db another.db ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1383646615, https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258508215,https://api.github.com/repos/simonw/sqlite-utils/issues/491,1258508215,IC_kwDOCGYnMM5LA0-3,25778,2022-09-26T19:22:14Z,2022-09-26T19:22:14Z,CONTRIBUTOR,"This might be fairly straightforward using SQLite's backup utility: https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.backup ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1383646615, https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258712931,https://api.github.com/repos/simonw/sqlite-utils/issues/491,1258712931,IC_kwDOCGYnMM5LBm9j,25778,2022-09-26T22:31:58Z,2022-09-26T22:31:58Z,CONTRIBUTOR,"Right. The backup command will copy tables completely, but in the case of conflicting table names, the destination gets overwritten silently. That might not be what you want here. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1383646615, https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1264218914,https://api.github.com/repos/simonw/sqlite-utils/issues/491,1264218914,IC_kwDOCGYnMM5LWnMi,7908073,2022-10-01T03:18:36Z,2023-06-14T22:14:24Z,CONTRIBUTOR,"> some good concrete use-cases in mind I actually found myself wanting something like this the past couple days. The use-case was databases with slightly different schema but same table names. here is a full script: ``` import argparse from pathlib import Path from sqlite_utils import Database def connect(args, conn=None, **kwargs) -> Database: db = Database(conn or args.database, **kwargs) with db.conn: db.conn.execute(""PRAGMA main.cache_size = 8000"") return db def parse_args() -> argparse.Namespace: parser = argparse.ArgumentParser() parser.add_argument(""database"") parser.add_argument(""dbs_folder"") parser.add_argument(""--db"", ""-db"", help=argparse.SUPPRESS) parser.add_argument(""--verbose"", ""-v"", action=""count"", default=0) args = parser.parse_args() if args.db: args.database = args.db Path(args.database).touch() args.db = connect(args) return args def merge_db(args, source_db): source_db = str(Path(source_db).resolve()) s_db = connect(argparse.Namespace(database=source_db, verbose = args.verbose)) for table in s_db.table_names(): data = s_db[table].rows args.db[table].insert_all(data, alter=True, replace=True) args.db.conn.commit() def merge_directory(): args = parse_args() source_dbs = list(Path(args.dbs_folder).glob('*.db')) for s_db in source_dbs: merge_db(args, s_db) if __name__ == '__main__': merge_directory() ``` edit: I've made some improvements to this and put it on PyPI: ``` $ pip install xklb $ lb merge-db -h usage: library merge-dbs DEST_DB SOURCE_DB ... [--only-target-columns] [--only-new-rows] [--upsert] [--pk PK ...] [--table TABLE ...] Merge-DBs will insert new rows from source dbs to target db, table by table. If primary key(s) are provided, and there is an existing row with the same PK, the default action is to delete the existing row and insert the new row replacing all existing fields. Upsert mode will update matching PK rows such that if a source row has a NULL field and the destination row has a value then the value will be preserved instead of changed to the source row's NULL value. Ignore mode (--only-new-rows) will insert only rows which don't already exist in the destination db Test first by using temp databases as the destination db. Try out different modes / flags until you are satisfied with the behavior of the program library merge-dbs --pk path (mktemp --suffix .db) tv.db movies.db Merge database data and tables library merge-dbs --upsert --pk path video.db tv.db movies.db library merge-dbs --only-target-columns --only-new-rows --table media,playlists --pk path audio-fts.db audio.db library merge-dbs --pk id --only-tables subreddits reddit/81_New_Music.db audio.db library merge-dbs --only-new-rows --pk subreddit,path --only-tables reddit_posts reddit/81_New_Music.db audio.db -v positional arguments: database source_dbs ``` Also if you want to dedupe a table based on a ""business key"" which isn't explicitly your primary key(s) you can run this: ``` $ lb dedupe-db -h usage: library dedupe-dbs DATABASE TABLE --bk BUSINESS_KEYS [--pk PRIMARY_KEYS] [--only-columns COLUMNS] Dedupe your database (not to be confused with the dedupe subcommand) It should not need to be said but *backup* your database before trying this tool! Dedupe-DB will help remove duplicate rows based on non-primary-key business keys library dedupe-db ./video.db media --bk path If --primary-keys is not provided table metadata primary keys will be used If --only-columns is not provided all non-primary and non-business key columns will be upserted positional arguments: database table options: -h, --help show this help message and exit --skip-0 --only-columns ONLY_COLUMNS Comma separated column names to upsert --primary-keys PRIMARY_KEYS, --pk PRIMARY_KEYS Comma separated primary keys --business-keys BUSINESS_KEYS, --bk BUSINESS_KEYS Comma separated business keys ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1383646615, https://github.com/simonw/sqlite-utils/pull/480#issuecomment-1232356302,https://api.github.com/repos/simonw/sqlite-utils/issues/480,1232356302,IC_kwDOCGYnMM5JdEPO,7908073,2022-08-31T01:51:49Z,2022-08-31T01:51:49Z,CONTRIBUTOR,Thanks for pointing me to the right place,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1355433619, https://github.com/simonw/sqlite-utils/issues/467#issuecomment-1224382336,https://api.github.com/repos/simonw/sqlite-utils/issues/467,1224382336,IC_kwDOCGYnMM5I-peA,50527,2022-08-23T17:16:13Z,2022-08-23T17:16:13Z,CONTRIBUTOR,"> Should passing `alter=True` also drop any columns that aren't included in the new table structure? > > It could even spot column types that aren't correct and fix those. > > Is that consistent with the expectations set by how `alter=True` works elsewhere? I would lean towards not dropping them (or making a `drop=True` or `drop_columns=True`or `drop_missing_columns=True`) to work with existing tables easier. I do like that sqlite-utils mostly just works with existing tables but it's also nice to add to existing fields in a few cases. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1348169997, https://github.com/simonw/sqlite-utils/issues/456#issuecomment-1190272780,https://api.github.com/repos/simonw/sqlite-utils/issues/456,1190272780,IC_kwDOCGYnMM5G8h8M,536941,2022-07-20T13:14:54Z,2022-07-20T13:14:54Z,CONTRIBUTOR,"for example, i have data on votes that look like this: | ballot_id | option_id | choice | |-|-|-| | 1 | 1 | 0 | | 1 | 2 | 1 | | 1 | 3 | 0 | | 1 | 4 | 1 | | 2 | 1 | 1 | | 2 | 2 | 0 | | 2 | 3 | 1 | | 2 | 4 | 0 | and i want to reshape from this long form to this wide form: | ballot_id | option_id_1 | option_id_2 | option_id_3 | option_id_ 4| |-|-|-|-| -| | 1 | 0 | 1 | 0 | 1 | | 2 | 1 | 0 | 1| 0 | i could do such a think like this. ```sql select ballot_id, sum(choice) filter (where option_id = 1) as option_id_1, sum(choice) filter (where option_id = 2) as option_id_2, sum(choice) filter (where option_id = 3) as option_id_3, sum(choice) filter (where option_id = 4) as option_id_4 from vote group by ballot_id ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1310243385, https://github.com/simonw/sqlite-utils/issues/456#issuecomment-1190277829,https://api.github.com/repos/simonw/sqlite-utils/issues/456,1190277829,IC_kwDOCGYnMM5G8jLF,536941,2022-07-20T13:19:15Z,2022-07-20T13:19:15Z,CONTRIBUTOR,hadley wickham's melt and reshape could be good inspo: http://had.co.nz/reshape/introduction.pdf,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1310243385, https://github.com/simonw/sqlite-utils/issues/449#issuecomment-1174027079,https://api.github.com/repos/simonw/sqlite-utils/issues/449,1174027079,IC_kwDOCGYnMM5F-jtH,1690072,2022-07-04T17:33:04Z,2022-07-04T17:48:43Z,CONTRIBUTOR,"I've written the code and test. Would you be able to advise how to compare table columns in a pytest function properly? Experiencing a challenge when comparing columns. Test: ```python def test_duplicate(fresh_db): table = fresh_db.create_table( ""table1"", { ""text_col"": str, ""float_col"": float, ""int_col"": int, ""bool_col"": bool, ""bytes_col"": bytes, ""datetime_col"": datetime.datetime, }, ) dt = datetime.datetime.now() b = bytes('hello world', 'utf-8') data = {""text_col"": ""Cleo"", ""float_col"": 3.14, ""int_col"": -2, ""bool_col"": True, ""bytes_col"": b, ""datetime_col"": str(dt)} table1 = fresh_db[""table1""] row_id = table1.insert(data).last_rowid table1.duplicate('table2') table2 = fresh_db[""table2""] assert data == table2.get(row_id) assert table1.columns == table2.columns # FAILS HERE ``` Result: ![Screenshot 2022-07-05 at 1 31 55 AM](https://user-images.githubusercontent.com/1690072/177198814-daac48c9-5746-49d0-a14a-14fe181c5a2f.png) Failure is due to column types being named differently -- e.g. 'FLOAT' vs 'REAL', 'INTEGER' vs 'INT'. How should I go about comparing columns while accounting for equivalent types? Or did I miss out something in my duplication code correctly? Here's how I did it: in `db.py`, I've added the following code: ```python class Table(Queryable): [...] def duplicate( self, name_new: str ) -> ""Table"": """""" Duplicate this table in this database. :param name_new: Name of new table. """""" assert self.exists() with self.db.conn: sql = ""CREATE TABLE [{new_table}] AS SELECT * FROM [{table}];"".format( new_table = name_new, table = self.name, ) self.db.execute(sql) return self.db[name_new] ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1279863844, https://github.com/simonw/sqlite-utils/issues/449#issuecomment-1179579878,https://api.github.com/repos/simonw/sqlite-utils/issues/449,1179579878,IC_kwDOCGYnMM5GTvXm,1690072,2022-07-09T17:41:32Z,2022-07-09T17:41:50Z,CONTRIBUTOR,Learnt that the types in Sqlite-utils differ somewhat from those in Sqlite. I've changed my test to account for this difference and the test has passed successfully. I will submit a PR.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1279863844, https://github.com/simonw/sqlite-utils/issues/448#issuecomment-1297703307,https://api.github.com/repos/simonw/sqlite-utils/issues/448,1297703307,IC_kwDOCGYnMM5NWWGL,167893,2022-10-31T21:23:51Z,2022-10-31T21:27:32Z,CONTRIBUTOR,"The Windows aspect is a red herring: OP's sample above produces the same error on Linux. (Though I don't know what's going on with the CI). The same error can also be obtained by passing an `io` from a file opened in non-binary mode (`'r'` as opposed to `'rb'`) to `rows_from_file()`. This is how I got here. The fix for my case is easy: open the file in mode `'rb'`. The analagous fix for OP's problem also works: use `BytesIO` in place of `StringIO`. Minimal test case (derived from [utils.py](https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/utils.py#L304)): ``` python import io from typing import cast #fp = io.StringIO(""id,name\n1,Cleo"") # error fp = io.BytesIO(bytes(""id,name\n1,Cleo"", encoding='utf-8')) # okay reader = io.BufferedReader(cast(io.RawIOBase, fp)) reader.peek(1) # exception thrown here ``` I see the signature of `rows_from_file()` correctly has `fp: BinaryIO` but I guess you'd need either a runtime type check for that (not all `io`s have `mode()`), or to catch the `AttributeError` on `peek()` to produce a better error for users. Neither option is ideal. Some thoughts on testing binary-ness of `io`s in this SO question: https://stackoverflow.com/questions/44584829/how-to-determine-if-file-is-opened-in-binary-or-text-mode","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1279144769, https://github.com/simonw/sqlite-utils/issues/433#issuecomment-1252898131,https://api.github.com/repos/simonw/sqlite-utils/issues/433,1252898131,IC_kwDOCGYnMM5KrbVT,7908073,2022-09-20T20:51:21Z,2022-09-20T20:56:07Z,CONTRIBUTOR,"When I run `reset` it fixes my terminal. I suspect it is related to the progress bar https://linux.die.net/man/1/reset ``` 950 1s /m/d/03_Downloads 🐑 echo $TERM xterm-kitty ▓░▒░ /m/d/03_Downloads 🌏 kitty -v kitty 0.26.2 created by Kovid Goyal $ sqlite-utils insert test.db facility facility-boundary-us-all.csv --csv blah blah blah (no offense) $ $ reset $ ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239034903, https://github.com/simonw/sqlite-utils/issues/433#issuecomment-1444474487,https://api.github.com/repos/simonw/sqlite-utils/issues/433,1444474487,IC_kwDOCGYnMM5WGO53,167893,2023-02-24T20:57:43Z,2023-02-24T22:22:18Z,CONTRIBUTOR,"I think I see what is happening here, although I haven't quite work out a fix yet. Usually: * `click.progressbar.render_progress()` renders the cursor invisible on each invocation (update of the bar) * When the progress bar goes out of scope, the `__exit()__` method is invoked, which calls `render_finish()` to make the cursor re-appear. (See terminal escape sequences `BEFORE_BAR` and `AFTER_BAR` in click). However the sqlite-utils `utils.file_progress` context manager wraps `click.progressbar` and yields an instance of a helper class: ``` python @contextlib.contextmanager def file_progress(file, silent=False, **kwargs): ... with click.progressbar(length=file_length, **kwargs) as bar: yield UpdateWrapper(file, bar.update) ``` The yielded `UpdateWrapper` goes out of scope quickly and `click.progressbar.__exit__()` is called. The cursor is made un-invisible. Hoewever `bar` is still live and so when the caller iterates on the yielded wrapper this invokes the bar's update method, calling `render_progress()`, each time printing the ""make cursor invisible"" escape code. The `progressbar.__exit__` function is not called again, so the cursor doesn't re-appear. ","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239034903, https://github.com/simonw/sqlite-utils/issues/433#issuecomment-1747231893,https://api.github.com/repos/simonw/sqlite-utils/issues/433,1747231893,IC_kwDOCGYnMM5oJKSV,62745,2023-10-04T16:15:09Z,2023-10-04T16:28:21Z,CONTRIBUTOR,"I confirm the bug, as above, and that @jonafato 's patch fixes it for me. However, it's not the right fix. The problem is that ProgressBar is being used in the wrong way. This also results in two lines being printed instead of one, like this: ``` [#######-----------------------------] 20% [####################################] 100%% ``` The bug is reproducible for me in any terminal, including Gnome Terminal and Guake, and VSCode. With VSCode I can use this launch.json to reproduce it: ```json { ""version"": ""0.2.0"", ""configurations"": [ { ""name"": ""Python: Module"", ""type"": ""python"", ""request"": ""launch"", ""module"": ""sqlite_utils"", ""justMyCode"": false, ""args"": [""insert"", ""test.db"", ""test"", ""--csv"", ""tests/sniff/example1.csv""] } ] } ``` [edit - deleted my analysis of why the current code is wrong, which was confused and confusing]","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1239034903, https://github.com/simonw/sqlite-utils/issues/423#issuecomment-1189010812,https://api.github.com/repos/simonw/sqlite-utils/issues/423,1189010812,IC_kwDOCGYnMM5G3t18,536941,2022-07-19T12:47:39Z,2022-07-19T12:47:39Z,CONTRIBUTOR,just ran into this!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1199158210, https://github.com/simonw/sqlite-utils/issues/412#issuecomment-1059647114,https://api.github.com/repos/simonw/sqlite-utils/issues/412,1059647114,IC_kwDOCGYnMM4_KO6K,25778,2022-03-05T01:54:24Z,2022-03-05T01:54:24Z,CONTRIBUTOR,"I haven't tried this, but it looks like Pandas has a method for this: https://pandas.pydata.org/docs/reference/api/pandas.read_sql_query.html ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1160182768, https://github.com/simonw/sqlite-utils/issues/411#issuecomment-1065477258,https://api.github.com/repos/simonw/sqlite-utils/issues/411,1065477258,IC_kwDOCGYnMM4_geSK,25778,2022-03-11T20:14:59Z,2022-03-11T20:14:59Z,CONTRIBUTOR,"Good call on adding this to `create-table`, especially for stored columns. Having the stored/virtual split might make this tricky to implement, but I haven't gone any farther than thinking about what the CLI looks like. I'm going to try making the SQL side work first and figure that'll tell me more about what it needs.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1160034488, https://github.com/simonw/sqlite-utils/issues/409#issuecomment-1264223363,https://api.github.com/repos/simonw/sqlite-utils/issues/409,1264223363,IC_kwDOCGYnMM5LWoSD,7908073,2022-10-01T03:41:45Z,2022-10-01T03:41:45Z,CONTRIBUTOR,"``` pytest xklb/check.py --pdb xklb/check.py:11: in test_transaction assert list(db2[""t""].rows) == [] E AssertionError: assert [{'foo': 1}] == [] E + where [{'foo': 1}] = list() E + where = .rows >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > /home/xk/github/xk/lb/xklb/check.py(11)test_transaction() 9 with db1.conn: 10 db1[""t""].insert({""foo"": 1}) ---> 11 assert list(db2[""t""].rows) == [] 12 assert list(db2[""t""].rows) == [{""foo"": 1}] ``` It fails because it is already inserted. btw if you put these two lines in you pyproject.toml you can get `ipdb` in pytest ``` [tool.pytest.ini_options] addopts = ""--pdbcls=IPython.terminal.debugger:TerminalPdb --ignore=tests/data --capture=tee-sys --log-cli-level=ERROR"" ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1149661489, https://github.com/simonw/sqlite-utils/issues/409#issuecomment-1264223554,https://api.github.com/repos/simonw/sqlite-utils/issues/409,1264223554,IC_kwDOCGYnMM5LWoVC,7908073,2022-10-01T03:42:50Z,2022-10-01T03:42:50Z,CONTRIBUTOR,oh weird. it inserts into db2,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1149661489, https://github.com/simonw/sqlite-utils/pull/407#issuecomment-1040580250,https://api.github.com/repos/simonw/sqlite-utils/issues/407,1040580250,IC_kwDOCGYnMM4-Bf6a,25778,2022-02-15T17:40:00Z,2022-02-15T17:40:00Z,CONTRIBUTOR,@simonw I think this is ready for a look.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1138948786, https://github.com/simonw/sqlite-utils/pull/407#issuecomment-1040998433,https://api.github.com/repos/simonw/sqlite-utils/issues/407,1040998433,IC_kwDOCGYnMM4-DGAh,25778,2022-02-16T01:29:39Z,2022-02-16T01:29:39Z,CONTRIBUTOR,Happy to do it and have it in the library. Going to use it a bunch. This whole SpatiaLite toolchain become a huge part of my work in the past year.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1138948786, https://github.com/simonw/sqlite-utils/issues/403#issuecomment-1032126353,https://api.github.com/repos/simonw/sqlite-utils/issues/403,1032126353,IC_kwDOCGYnMM49hP-R,536941,2022-02-08T01:45:15Z,2022-02-08T01:45:31Z,CONTRIBUTOR,"you can hack something like this to achieve this result: `sqlite-utils convert my_database my_table rowid ""{'id': value}"" --multi`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1126692066, https://github.com/simonw/sqlite-utils/issues/403#issuecomment-1033332570,https://api.github.com/repos/simonw/sqlite-utils/issues/403,1033332570,IC_kwDOCGYnMM49l2da,536941,2022-02-09T04:22:43Z,2022-02-09T04:22:43Z,CONTRIBUTOR,dddoooope,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1126692066, https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1031779460,https://api.github.com/repos/simonw/sqlite-utils/issues/402,1031779460,IC_kwDOCGYnMM49f7SE,25778,2022-02-07T18:24:56Z,2022-02-07T18:24:56Z,CONTRIBUTOR,"I wonder if there's any overlap with the goals here and the `sqlite3` module's concept of adapters and converters: https://docs.python.org/3/library/sqlite3.html#sqlite-and-python-types I'm not sure that's _exactly_ what we're talking about here, but it might be a parallel with some useful ideas to borrow.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1125297737, https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1031791783,https://api.github.com/repos/simonw/sqlite-utils/issues/402,1031791783,IC_kwDOCGYnMM49f-Sn,25778,2022-02-07T18:37:40Z,2022-02-07T18:37:40Z,CONTRIBUTOR,"I've never used it either, but it's interesting, right? Feel like I should try it for something. I'm trying to get my head around how this conversions feature might work, because I really like the idea of it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1125297737, https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1032732242,https://api.github.com/repos/simonw/sqlite-utils/issues/402,1032732242,IC_kwDOCGYnMM49jj5S,25778,2022-02-08T15:26:59Z,2022-02-08T15:26:59Z,CONTRIBUTOR,"What if you did something like this: ```python class Conversion: def __init__(self, *args, **kwargs): ""Put whatever settings you need here"" def python(self, row, column, value): # not sure on args here ""Python step to transform value"" return value def sql(self, row, column, value): ""Return the actual sql that goes in the insert/update step, and maybe params"" # value is the return of self.python() return value, [] ``` This way, you're always passing an instance, which has methods that do the conversion. (Or you're passing a SQL string, as you would now.) The `__init__` could take column names, or SRID, or whatever other setup state you need per row, but the row is getting processed with the `python` and `sql` methods (or whatever you want to call them). This is pretty rough, so do what you will with names and args and such. You'd then use it like this: ```python # subclass might be unneeded here, if methods are present class LngLatConversion(Conversion): def __init__(self, x=""longitude"", y=""latitude""): self.x = x self.y = y def python(self, row, column, value): x = row[self.x] y = row[self.y] return x, y def sql(self, row, column, value): # value is now a tuple, returned above s = ""GeomFromText(POINT(? ?))"" return s, value table.insert_all(rows, conversions={""point"": LngLatConversion(""lng"", ""lat""))} ``` I haven't thought through all the implementation details here, and it'll probably break in ways I haven't foreseen, but wanted to get this idea out of my head. Hope it helps.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1125297737, https://github.com/simonw/sqlite-utils/issues/402#issuecomment-1035057014,https://api.github.com/repos/simonw/sqlite-utils/issues/402,1035057014,IC_kwDOCGYnMM49sbd2,25778,2022-02-10T15:30:28Z,2022-02-10T15:30:40Z,CONTRIBUTOR,"Yeah, the CLI experience is probably where any kind of multi-column, configured setup is going to fall apart. Sticking with GIS examples, one way I might think about this is using the [fiona CLI](https://fiona.readthedocs.io/en/latest/cli.html): ```sh # assuming a database is already created and has SpatiaLite fio cat boundary.shp | sqlite-utils insert boundaries --conversion geometry GeometryGeoJSON - ``` Anyway, very interested to see where you land here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1125297737, https://github.com/simonw/sqlite-utils/issues/399#issuecomment-1030740653,https://api.github.com/repos/simonw/sqlite-utils/issues/399,1030740653,IC_kwDOCGYnMM49b9qt,25778,2022-02-06T02:57:17Z,2022-02-06T02:57:17Z,CONTRIBUTOR,"I like the idea of having stock conversions you could import. I'd actually move them to a dedicated module (call it `sqlite_utils.conversions` or something), because it's different from other utilities. Maybe they even take configuration, or they're composable. ```python from sqlite_utils.conversions import LongitudeLatitude db[""places""].insert( { ""name"": ""London"", ""lng"": -0.118092, ""lat"": 51.509865, }, conversions={""point"": LongitudeLatitude(""lng"", ""lat"")}, ) ``` I would definitely use that for every CSV I get with lat/lng columns where I actually need GeoJSON.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1124731464, https://github.com/simonw/sqlite-utils/issues/399#issuecomment-1030740826,https://api.github.com/repos/simonw/sqlite-utils/issues/399,1030740826,IC_kwDOCGYnMM49b9ta,25778,2022-02-06T02:59:10Z,2022-02-06T02:59:10Z,CONTRIBUTOR,"All this said, I don't think it's unreasonable to point people to dedicated tools like `geojson-to-sqlite`. If I'm dealing with a bunch of GeoJSON or Shapefiles, I need to something to read those anyway (or I need to figure out virtual tables). But something like this might make it easier to build those libraries, or standardize the underlying parts.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1124731464, https://github.com/simonw/sqlite-utils/issues/399#issuecomment-1030741289,https://api.github.com/repos/simonw/sqlite-utils/issues/399,1030741289,IC_kwDOCGYnMM49b90p,25778,2022-02-06T03:03:43Z,2022-02-06T03:03:43Z,CONTRIBUTOR,"> I wonder if there are any interesting non-geospatial canned conversions that it would be worth including? Off the top of my head: - Un-nesting JSON objects into columns - Splitting arrays - Normalizing dates and times - URL munging with `urlparse` - Converting strings to numbers Some of this is easy enough with SQL functions, some is easier in Python. Maybe that's where having pre-built classes gets really handy, because it saves you from thinking about which way it's implemented.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1124731464, https://github.com/simonw/sqlite-utils/issues/399#issuecomment-1077671779,https://api.github.com/repos/simonw/sqlite-utils/issues/399,1077671779,IC_kwDOCGYnMM5AO_dj,25778,2022-03-24T14:11:33Z,2022-03-24T14:11:43Z,CONTRIBUTOR,"Coming back to this. I was about to add a utility function to [datasette-geojson]() to convert lat/lng columns to geometries. Thankfully I googled first. There's a SpatiaLite function for this: [MakePoint](https://www.gaia-gis.it/gaia-sins/spatialite-sql-latest.html#p0). ```sql select MakePoint(longitude, latitude) as geometry from places; ``` I'm not sure if that would work with `conversions`, since it needs two columns, but it's an option for tables that already have latitude, longitude columns.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1124731464, https://github.com/simonw/sqlite-utils/issues/398#issuecomment-1030629879,https://api.github.com/repos/simonw/sqlite-utils/issues/398,1030629879,IC_kwDOCGYnMM49bin3,25778,2022-02-05T13:57:33Z,2022-02-05T19:49:38Z,CONTRIBUTOR,"I'm mostly using [geojson-to-sqlite](https://github.com/simonw/geojson-to-sqlite) at the moment. Even with shapefiles, I'm usually converting to GeoJSON and projecting to EPSG:4326 (with [ogr2ogr](https://gdal.org/programs/ogr2ogr.html)) first. I think an open question here is how much you want to leave to external libraries and how much you want here. My thinking has been that adding Spatialite helpers here would make external stuff easier, but it would be nice to have some standard way to insert geometries. I'm in the middle of adding GeoJSON and Spatialite support to [geocode-sqlite](https://github.com/eyeseast/geocode-sqlite), and that will probably use WKT. Since that's all points, I think I can just make the string inline. But for polygons, I'd generally use Shapely, which probably isn't a dependency you want to add to sqlite-utils. I've also been trying to get some of the approaches [here](https://www.gaia-gis.it/fossil/libspatialite/wiki?name=Supporting+GeoJSON) to work, but haven't had any success so far.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1124237013, https://github.com/simonw/sqlite-utils/issues/398#issuecomment-1038336591,https://api.github.com/repos/simonw/sqlite-utils/issues/398,1038336591,IC_kwDOCGYnMM4948JP,25778,2022-02-13T18:48:21Z,2022-02-13T18:49:49Z,CONTRIBUTOR,"Been chipping away at this between other things and realized `sqlite-utils init-spatialite` is probably unnecessary. Any of the other commands requires running `db.init_spatialite` to have the extension functions available, and that will do everything `init-spatialite` would do. I think it's probably worth keeping a SpatiaLite flag on `create-database` in case you wanted to create all the spatial metadata up front. Otherwise, it's going to get added the first time you run `add-geometry-column` or `create-spatial-index`, which is probably fine in most cases.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1124237013, https://github.com/simonw/sqlite-utils/pull/385#issuecomment-1029175907,https://api.github.com/repos/simonw/sqlite-utils/issues/385,1029175907,IC_kwDOCGYnMM49V_pj,25778,2022-02-03T16:36:54Z,2022-02-03T16:36:54Z,CONTRIBUTOR,"@simonw Not sure if you've seen this, but any chance you can run the tests?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1102899312, https://github.com/simonw/sqlite-utils/pull/385#issuecomment-1029180984,https://api.github.com/repos/simonw/sqlite-utils/issues/385,1029180984,IC_kwDOCGYnMM49WA44,25778,2022-02-03T16:42:04Z,2022-02-03T16:42:04Z,CONTRIBUTOR,Fixed my spelling. That's a useful thing.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1102899312, https://github.com/simonw/sqlite-utils/pull/385#issuecomment-1029306428,https://api.github.com/repos/simonw/sqlite-utils/issues/385,1029306428,IC_kwDOCGYnMM49Wfg8,25778,2022-02-03T19:03:43Z,2022-02-03T19:03:43Z,CONTRIBUTOR,"I thought about adding these as methods on `Database` and `Table`, and I'm back and forth on it for the same reasons you are. It's certainly cleaner, and it's clearer what you're operating on. I could go either way. I do sort of like having all the Spatialite stuff in its own module, just because it's built around an extension you might not have or want, but I don't know if that's a good reason to have a different API. You could have `init_spatialite` add methods to `Database` and `Table`, so they're only there if you have Spatialite set up. Is that too clever? It feels too clever. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1102899312, https://github.com/simonw/sqlite-utils/pull/385#issuecomment-1029326568,https://api.github.com/repos/simonw/sqlite-utils/issues/385,1029326568,IC_kwDOCGYnMM49Wkbo,25778,2022-02-03T19:28:26Z,2022-02-03T19:28:26Z,CONTRIBUTOR,"> `from sqlite_utils.utils import find_spatialite` is part of the documented API already: > > https://sqlite-utils.datasette.io/en/3.22.1/python-api.html#finding-spatialite > > To avoid needing to bump the major version number to 4 to indicate a backwards incompatible change, we should keep a `from .gis import find_spatialite` line at the top of `utils.py` such that any existing code with that documented import continues to work. This is fixed now. I had to take out the type annotations for `Database` and `Table` to avoid a circular import, but that's fine and may be moot if these become class methods.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1102899312, https://github.com/simonw/sqlite-utils/pull/385#issuecomment-1029338360,https://api.github.com/repos/simonw/sqlite-utils/issues/385,1029338360,IC_kwDOCGYnMM49WnT4,25778,2022-02-03T19:43:56Z,2022-02-03T19:43:56Z,CONTRIBUTOR,"Works for me. I was just looking at how the FTS extensions work and they're just methods, too. So this can be consistent with that.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1102899312, https://github.com/simonw/sqlite-utils/pull/385#issuecomment-1029370537,https://api.github.com/repos/simonw/sqlite-utils/issues/385,1029370537,IC_kwDOCGYnMM49WvKp,25778,2022-02-03T20:25:58Z,2022-02-03T20:25:58Z,CONTRIBUTOR,"OK, I moved all the GIS helpers into `db.py` as methods on `Database` and `Table`, and I put `find_spatialite` back in `utils.py`. I deleted `gis.py`, since there's nothing left it. Docs and tests are updated and passing. I think this is better.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1102899312, https://github.com/simonw/sqlite-utils/pull/385#issuecomment-1030002502,https://api.github.com/repos/simonw/sqlite-utils/issues/385,1030002502,IC_kwDOCGYnMM49ZJdG,25778,2022-02-04T13:50:19Z,2022-02-04T13:50:19Z,CONTRIBUTOR,Awesome. Thanks for your help getting it in. Will now look at adding CLI versions of this. It's going to be super helpful on a bunch of my projects.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1102899312, https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007636709,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1007636709,IC_kwDOCGYnMM48D1Dl,536941,2022-01-07T18:28:33Z,2022-01-07T18:29:43Z,CONTRIBUTOR,"i added an index to one table with sqlite-utils, and then a query that used to take about 1 second started taking hundreds of seconds. running analyze got me back to sub second speed.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279, https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008161965,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1008161965,IC_kwDOCGYnMM48F1St,536941,2022-01-08T22:02:56Z,2022-01-08T22:02:56Z,CONTRIBUTOR,"for options 2 and 3, i would worry about discoverablity. in other db’s it is not necessary to explicitly call analyze for most indices. ie for postgres > The system regularly collects statistics on all of a table's columns. Newly-created non-expression indexes can immediately use these statistics to determine an index's usefulness. i suppose i would propose raising a warning if the stats table is created that explains what is going on and informs users about a —no-analyze argument.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279, https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164116,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1008164116,IC_kwDOCGYnMM48F10U,536941,2022-01-08T22:18:57Z,2022-01-08T22:18:57Z,CONTRIBUTOR,"the table with the query ran so bad was about 50k. i think the scenario should not be worse than no stats. i also did not know that sqlite was so different from postgres and needed an explicit analyze call.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279, https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164786,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1008164786,IC_kwDOCGYnMM48F1-y,536941,2022-01-08T22:24:19Z,2022-01-08T22:24:19Z,CONTRIBUTOR,the out-of-date scenario you describe could be addressed by automatically adding an analyze to the insert or convert commands if they implicate an index,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279, https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008166084,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1008166084,IC_kwDOCGYnMM48F2TE,536941,2022-01-08T22:32:47Z,2022-01-08T22:32:47Z,CONTRIBUTOR,or using “ pragma optimize”,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279, https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008275546,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1008275546,IC_kwDOCGYnMM48GRBa,536941,2022-01-09T11:01:15Z,2022-01-09T13:37:51Z,CONTRIBUTOR,"i don’t want to be such a partisan for analyze, but the query planner deciding *not* to use an index based on information collected by analyze is not necessarily a bug, but could be the correct choice. the original poster in that stack overflow doesn’t say there’s a performance regression ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279, https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1009548580,https://api.github.com/repos/simonw/sqlite-utils/issues/365,1009548580,IC_kwDOCGYnMM48LH0k,536941,2022-01-11T02:43:34Z,2022-01-11T02:43:34Z,CONTRIBUTOR,thanks so much! always a pleasure to see how you work through these things,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1096558279, https://github.com/simonw/sqlite-utils/issues/353#issuecomment-991405755,https://api.github.com/repos/simonw/sqlite-utils/issues/353,991405755,IC_kwDOCGYnMM47F6a7,536941,2021-12-11T01:38:29Z,2021-12-11T01:38:29Z,CONTRIBUTOR,"wow! that's awesome! thanks so much, @simonw!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1077102934, https://github.com/simonw/sqlite-utils/issues/348#issuecomment-983155079,https://api.github.com/repos/simonw/sqlite-utils/issues/348,983155079,IC_kwDOCGYnMM46mcGH,25778,2021-12-01T00:28:40Z,2021-12-01T00:28:40Z,CONTRIBUTOR,"I'd use this. Right now, I tend to do `touch my.db` and then `enable-wal` or whatever else, but I'm never sure if that's a bad idea.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1067771698, https://github.com/simonw/sqlite-utils/issues/344#issuecomment-1815825863,https://api.github.com/repos/simonw/sqlite-utils/issues/344,1815825863,IC_kwDOCGYnMM5sO03H,16437338,2023-11-17T06:44:49Z,2023-11-17T06:44:49Z,CONTRIBUTOR,"hello Simon, I've added more STRICT table support per https://github.com/simonw/sqlite-utils/issues/344#issuecomment-982014776 in changeset https://github.com/simonw/sqlite-utils/commit/e4b9b582cdb4e48430865f8739f341bc8017c1e4. It also fixes table.transform() to preserve STRICT mode. Please pull if you deem appropriate. Thanks!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1066474200, https://github.com/simonw/sqlite-utils/pull/326#issuecomment-916119657,https://api.github.com/repos/simonw/sqlite-utils/issues/326,916119657,IC_kwDOCGYnMM42muBp,191622,2021-09-09T13:54:10Z,2021-09-09T13:54:10Z,CONTRIBUTOR,dupe of #293?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",991237645, https://github.com/simonw/sqlite-utils/issues/278#issuecomment-864621099,https://api.github.com/repos/simonw/sqlite-utils/issues/278,864621099,MDEyOklzc3VlQ29tbWVudDg2NDYyMTA5OQ==,601708,2021-06-20T22:39:57Z,2021-06-20T22:39:57Z,CONTRIBUTOR,"Fair. I looked into it, it looks like it could be done, but it would be _a bit ugly_. I can upload and link a gist of my exploration. **Click** can parse a first argument while still recognizing it as a sub-command keyword. From there, the program could: 1. ignore it preemptively if it matches a sub-command 2. and/or check if a (db) file exists at the path. It would then also need to set a shared db argument variable. Click also makes it easy to parse arguments from environment variables. If you're amenable, I may submit a patch for only that, which would update each sub-command to check for a DB/SQLITE_UTILS_DB environment variable. The goal would be usage that looks like: `DB=./convenient.db sqlite-utils [operation] [args]`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",923697888, https://github.com/simonw/sqlite-utils/issues/272#issuecomment-861944202,https://api.github.com/repos/simonw/sqlite-utils/issues/272,861944202,MDEyOklzc3VlQ29tbWVudDg2MTk0NDIwMg==,25778,2021-06-16T01:41:03Z,2021-06-16T01:41:03Z,CONTRIBUTOR,"So, I do things like this a lot, too. I like the idea of piping in from stdin. Something like this would be nice to do in a makefile: ```sh cat file.csv | sqlite-utils --csv --table data - 'SELECT * FROM data WHERE col=""whatever""' > filtered.csv ``` If you assumed that you're always piping out the same format you're piping in, the option names don't have to change. Depends how much you want to change formats.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",921878733, https://github.com/simonw/sqlite-utils/issues/26#issuecomment-964205475,https://api.github.com/repos/simonw/sqlite-utils/issues/26,964205475,IC_kwDOCGYnMM45eJuj,536941,2021-11-09T14:31:29Z,2021-11-09T14:31:29Z,CONTRIBUTOR,i was just reaching for a tool to do this this morning,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",455486286, https://github.com/simonw/sqlite-utils/issues/26#issuecomment-1032120014,https://api.github.com/repos/simonw/sqlite-utils/issues/26,1032120014,IC_kwDOCGYnMM49hObO,536941,2022-02-08T01:32:34Z,2022-02-08T01:32:34Z,CONTRIBUTOR,"if you are curious about prior art, https://github.com/jsnell/json-to-multicsv is really good!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",455486286, https://github.com/simonw/sqlite-utils/issues/242#issuecomment-787121933,https://api.github.com/repos/simonw/sqlite-utils/issues/242,787121933,MDEyOklzc3VlQ29tbWVudDc4NzEyMTkzMw==,25778,2021-02-27T19:18:57Z,2021-02-27T19:18:57Z,CONTRIBUTOR,"I think HTTPX gets it exactly right, with a clear separation between sync and async clients, each with a basically identical API. (I'm about to switch [feed-to-sqlite](https://github.com/eyeseast/feed-to-sqlite) over to it, from Requests, to eventually make way for async support.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",817989436, https://github.com/simonw/sqlite-utils/issues/242#issuecomment-953911245,https://api.github.com/repos/simonw/sqlite-utils/issues/242,953911245,IC_kwDOCGYnMM4424fN,25778,2021-10-28T14:37:55Z,2021-10-28T14:37:55Z,CONTRIBUTOR,"I've been thinking about this a bit lately, doing a project that involves moving a lot of data in and out of SQLite files, datasette and GeoJSON. This has me leaning toward the idea that something like [`datasette query`](https://github.com/simonw/datasette/issues/1356) would be a better place to do async queries. I know there's a lot of overlap in sqlite-utils and datasette, and maybe keeping sqlite-utils synchronous would let datasette be entirely async and give a cleaner separation of implementations. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",817989436, https://github.com/simonw/sqlite-utils/pull/203#issuecomment-1404070841,https://api.github.com/repos/simonw/sqlite-utils/issues/203,1404070841,IC_kwDOCGYnMM5TsGu5,536941,2023-01-25T18:47:18Z,2023-01-25T18:47:18Z,CONTRIBUTOR,i'll adopt this PR to make the changes @simonw suggested https://github.com/simonw/sqlite-utils/pull/203#issuecomment-753567932,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",743384829, https://github.com/simonw/sqlite-utils/pull/189#issuecomment-717359145,https://api.github.com/repos/simonw/sqlite-utils/issues/189,717359145,MDEyOklzc3VlQ29tbWVudDcxNzM1OTE0NQ==,35681,2020-10-27T16:20:32Z,2020-10-27T16:20:32Z,CONTRIBUTOR,"No problem. I added a test. Let me know if it looks sufficient or if you want me to to tweak something! If you don't mind, would you tag this PR as ""hacktoberfest-accepted""? If you do mind, no problem and I'm sorry for asking :) My kiddos like the shirts.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",729818242, https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688479163,https://api.github.com/repos/simonw/sqlite-utils/issues/146,688479163,MDEyOklzc3VlQ29tbWVudDY4ODQ3OTE2Mw==,96218,2020-09-07T19:10:33Z,2020-09-07T19:11:57Z,CONTRIBUTOR,"@simonw -- I've gone ahead updated the documentation to reflect the changes introduced in this PR. IMO it's ready to merge now. In writing the documentation changes, I begin to wonder about the value and role of `batch_size` at all, tbh. May I assume it was originally intended to prevent using the entire row set to determine columns and column types, and that this was a performance consideration? If so, this PR entirely undermines its purpose. I've been passing in excess of 500,000 rows at a time to `insert_all()` with these changes and although I'm sure the performance difference is measurable it's not really noticeable; given #145, I don't know that any performance advantages outweigh the problems doing it this way removes. What do you think about just dropping the argument and defaulting to the maximum `batch_size` permissible given `SQLITE_MAX_VARS`? Are there other reasons one might want to restrict `batch_size` that I've overlooked? I could open a new issue to discuss/implement this. Of course the documentation will need to change again too if/when something is done about #147.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",688668680, https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688481317,https://api.github.com/repos/simonw/sqlite-utils/issues/146,688481317,MDEyOklzc3VlQ29tbWVudDY4ODQ4MTMxNw==,96218,2020-09-07T19:18:55Z,2020-09-07T19:18:55Z,CONTRIBUTOR,"Just force-pushed to update d042f9c with more formatting changes to satisfy `black==20.8b1` and pass the GitHub Actions ""Test"" workflow.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",688668680, https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688573964,https://api.github.com/repos/simonw/sqlite-utils/issues/146,688573964,MDEyOklzc3VlQ29tbWVudDY4ODU3Mzk2NA==,96218,2020-09-08T01:55:07Z,2020-09-08T01:55:07Z,CONTRIBUTOR,"Okay, I've rewritten this PR to preserve the batching behaviour but still fix #145, and rebased the branch to account for the `db.execute()` api change. It's not terribly sophisticated -- if it attempts to insert a batch which has too many variables, the exception is caught, the batch is split in two and each half is inserted separately, and then it carries on as before with the same `batch_size`. In the edge case where this gets triggered, subsequent batches will all be inserted in two groups too if they continue to have the same number of columns (which is presumably reasonably likely). Do you reckon this is acceptable when set against the awkwardness of recalculating the `batch_size` on the fly?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",688668680, https://github.com/simonw/sqlite-utils/issues/145#issuecomment-683382252,https://api.github.com/repos/simonw/sqlite-utils/issues/145,683382252,MDEyOklzc3VlQ29tbWVudDY4MzM4MjI1Mg==,96218,2020-08-30T06:27:25Z,2020-08-30T06:27:52Z,CONTRIBUTOR,"Note: had to adjust the test above because trying to exhaust a `SQLITE_MAX_VARIABLE_NUMBER` of 250000 in 99 records requires 2526 columns, and trips the ` ""Rows can have a maximum of {} columns"".format(SQLITE_MAX_VARS)` check even before it trips the default `SQLITE_MAX_COLUMN` value (2000).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",688659182, https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682182178,https://api.github.com/repos/simonw/sqlite-utils/issues/139,682182178,MDEyOklzc3VlQ29tbWVudDY4MjE4MjE3OA==,96218,2020-08-27T20:46:18Z,2020-08-27T20:46:18Z,CONTRIBUTOR,"> I tried changing the batch_size argument to the total number of records, but it seems only to effect the number of rows that are committed at a time, and has no influence on this problem. So the reason for this is that the `batch_size` for import is limited (of necessity) here: https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/db.py#L1048 With regard to the issue of ignoring columns, however, I made a fork and hacked a temporary fix that looks like this: https://github.com/simonwiles/sqlite-utils/commit/3901f43c6a712a1a3efc340b5b8d8fd0cbe8ee63 It doesn't seem to affect performance enormously (but I've not tested it thoroughly), and it now does what I need (and would expect, tbh), but it now fails the test here: https://github.com/simonw/sqlite-utils/blob/main/tests/test_create.py#L710-L716 The existence of this test suggests that `insert_all()` is behaving as intended, of course. It seems odd to me that this would be a desirable default behaviour (let alone the only behaviour), and its not very prominently flagged-up, either. @simonw is this something you'd be willing to look at a PR for? I assume you wouldn't want to change the default behaviour at this point, but perhaps an option could be provided, or at least a bit more of a warning in the docs. Are there oversights in the implementation that I've made? Would be grateful for your thoughts! Thanks! ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",686978131, https://github.com/simonw/sqlite-utils/issues/139#issuecomment-682815377,https://api.github.com/repos/simonw/sqlite-utils/issues/139,682815377,MDEyOklzc3VlQ29tbWVudDY4MjgxNTM3Nw==,96218,2020-08-28T16:14:58Z,2020-08-28T16:14:58Z,CONTRIBUTOR,"Thanks! And yeah, I had updating the docs on my list too :) Will try to get to it this afternoon (budgeting time is fraught with uncertainty at the moment!).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",686978131, https://github.com/simonw/sqlite-utils/issues/131#issuecomment-1067981656,https://api.github.com/repos/simonw/sqlite-utils/issues/131,1067981656,IC_kwDOCGYnMM4_qBtY,25778,2022-03-15T13:21:42Z,2022-03-15T13:21:42Z,CONTRIBUTOR,"Just ran into this issue last night. I have a big table that's _mostly_ numbers, but also a zip code column in a state where ZIP codes start with 0. Would be great to run something like this: ```sh sqlite-utils insert data.db places file.csv --csv --detect-types --type zipcode text ``` Maybe I'll take a crack at this one.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",675753042, https://github.com/simonw/sqlite-utils/issues/121#issuecomment-655652679,https://api.github.com/repos/simonw/sqlite-utils/issues/121,655652679,MDEyOklzc3VlQ29tbWVudDY1NTY1MjY3OQ==,79913,2020-07-08T17:24:46Z,2020-07-08T17:24:46Z,CONTRIBUTOR,"Better transaction handling would be really great. Some of my thoughts on implementing better transaction discipline are in https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655239728. My preferences: - Each CLI command should operate in a single transaction so that either the whole thing succeeds or the whole thing is rolled back. This avoids partially completed operations when an error occurs part way through processing. Partially completed operations are typically much harder to recovery from gracefully and may cause inconsistent data states. - The Python API should be transaction-agnostic and rely on the caller to coordinate transactions. Only the caller knows how individual insert, create, update, etc operations/methods should be bundled conceptually into transactions. When the caller is the CLI, for example, that bundling would be at the CLI command-level. Other callers might want to break up operations into multiple transactions. Transactions are usually most useful when controlled at the application-level (like logging configuration) instead of the library level. The library needs to provide an API that's conducive to transaction use, though. - The Python API should provide a context manager to provide consistent transactions handling with more useful defaults than Python's `sqlite3` module. The latter issues implicit `BEGIN` statements by default for most DML (`INSERT`, `UPDATE`, `DELETE`, … but not `SELECT`, I believe), but **not** DDL (`CREATE TABLE`, `DROP TABLE`, `CREATE VIEW`, …). Notably, the `sqlite3` module doesn't issue the implicit `BEGIN` until the first DML statement. It _does not_ issue it when entering the `with conn` block, like other DBAPI2-compatible modules do. The `with conn` block for `sqlite3` only arranges to commit or rollback an existing transaction when exiting. Including DDL and `SELECT`s in transactions is important for operation consistency, though. There are several existing bugs.python.org tickets about this and future changes are in the works, but sqlite-utils can provide its own API sooner. sqlite-utils's `Database` class could itself be a context manager (built on the `sqlite3` connection context manager) which additionally issues an explicit `BEGIN` when entering. This would then let Python API callers do something like: ```python db = sqlite_utils.Database(path) with db: # ← BEGIN issued here by Database.__enter__ db.insert(…) db.create_view(…) # ← COMMIT/ROLLBACK issue here by sqlite3.connection.__exit__ ```","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",652961907, https://github.com/simonw/sqlite-utils/issues/121#issuecomment-655898722,https://api.github.com/repos/simonw/sqlite-utils/issues/121,655898722,MDEyOklzc3VlQ29tbWVudDY1NTg5ODcyMg==,79913,2020-07-09T04:53:08Z,2020-07-09T04:53:08Z,CONTRIBUTOR,"Yep, I agree that makes more sense for backwards compat and more casual use cases. I think it should be possible for the Database/Queryable methods to DTRT based on seeing if it's within a context-manager-managed transaction.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",652961907, https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655018966,https://api.github.com/repos/simonw/sqlite-utils/issues/118,655018966,MDEyOklzc3VlQ29tbWVudDY1NTAxODk2Ng==,79913,2020-07-07T17:41:06Z,2020-07-07T17:41:06Z,CONTRIBUTOR,"Hmm, while tests pass, this may not work as intended on larger datasets. Looking into it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",651844316,