{"id": 455486286, "node_id": "MDU6SXNzdWU0NTU0ODYyODY=", "number": 26, "title": "Mechanism for turning nested JSON into foreign keys / many-to-many", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 14, "created_at": "2019-06-13T00:52:06Z", "updated_at": "2022-06-29T23:35:29Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "The GitHub JSON APIs have a really interesting convention with respect to related objects.\r\n\r\nConsider https://api.github.com/repos/simonw/sqlite-utils/issues - here's a truncated subset:\r\n```json\r\n {\r\n \"id\": 449818897,\r\n \"node_id\": \"MDU6SXNzdWU0NDk4MTg4OTc=\",\r\n \"number\": 24,\r\n \"title\": \"Additional Column Constraints?\",\r\n \"user\": {\r\n \"login\": \"IgnoredAmbience\",\r\n \"id\": 98555,\r\n \"node_id\": \"MDQ6VXNlcjk4NTU1\",\r\n \"avatar_url\": \"https://avatars0.githubusercontent.com/u/98555?v=4\",\r\n \"gravatar_id\": \"\"\r\n },\r\n \"labels\": [\r\n {\r\n \"id\": 993377884,\r\n \"node_id\": \"MDU6TGFiZWw5OTMzNzc4ODQ=\",\r\n \"url\": \"https://api.github.com/repos/simonw/sqlite-utils/labels/enhancement\",\r\n \"name\": \"enhancement\",\r\n \"color\": \"a2eeef\",\r\n \"default\": true\r\n }\r\n ],\r\n \"state\": \"open\"\r\n }\r\n```\r\nThe `user` column lists a complete user. The `labels` column has a list of labels.\r\n\r\nSince both user and label have populated `id` field this is actually enough information for us to create records for them AND set up the corresponding foreign key (for user) and m2m relationships (for labels).\r\n\r\nIt would be really neat if `sqlite-utils` had some kind of mechanism for correctly processing these kind of patterns.\r\n\r\nThanks to `jq` there's not much need for extra customization of the shape here - if we support a narrowly defined structure users can use `jq` to reshape arbitrary JSON to match.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/26/reactions\", \"total_count\": 4, \"+1\": 4, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 472115381, "node_id": "MDU6SXNzdWU0NzIxMTUzODE=", "number": 49, "title": "extracts= should support multiple-column extracts", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 10, "created_at": "2019-07-24T07:06:41Z", "updated_at": "2020-10-16T19:18:19Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Lookup tables can be constructed on compound columns, but the `extracts=` option doesn't currently support that.\r\n\r\nRight now extracts can be defined in two ways:\r\n```python\r\n# Extract these columns into tables with the same name:\r\ndogs = db.table(\"dogs\", extracts=[\"breed\", \"most_recent_trophy\"])\r\n\r\n# Same as above but with custom table names:\r\ndogs = db.table(\"dogs\", extracts={\"breed\": \"Breeds\", \"most_recent_trophy\": \"Trophies\"})\r\n```\r\nNeed some kind of syntax for much more complicated extractions, like when two columns (say \"source\" and \"source_version\") are extracted into a single table.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/49/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 539204432, "node_id": "MDU6SXNzdWU1MzkyMDQ0MzI=", "number": 70, "title": "Implement ON DELETE and ON UPDATE actions for foreign keys", "user": {"value": 26292069, "label": "LucasElArruda"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2019-12-17T17:19:10Z", "updated_at": "2020-02-27T04:18:53Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Hi! I did not find any mention on the library about ON DELETE and ON UPDATE actions for foreign keys. Are those expected to be implemented? If not, it would be a nice thing to include!", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/70/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 546073980, "node_id": "MDU6SXNzdWU1NDYwNzM5ODA=", "number": 74, "title": "Test failures on openSUSE 15.1: AssertionError: Explicit other_table and other_column", "user": {"value": 15092, "label": "jayvdb"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2020-01-07T04:35:50Z", "updated_at": "2020-01-12T07:21:17Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "openSUSE 15.1 is using python 3.6.5 and click-7.0 , however it has test failures while openSUSE Tumbleweed on py37 passes.\r\n\r\nMost fail on the cli exit code like\r\n```py\r\n[ 74s] =================================== FAILURES ===================================\r\n[ 74s] _________________________________ test_tables __________________________________\r\n[ 74s] \r\n[ 74s] db_path = '/tmp/pytest-of-abuild/pytest-0/test_tables0/test.db'\r\n[ 74s] \r\n[ 74s] def test_tables(db_path):\r\n[ 74s] result = CliRunner().invoke(cli.cli, [\"tables\", db_path])\r\n[ 74s] > assert '[{\"table\": \"Gosh\"},\\n {\"table\": \"Gosh2\"}]' == result.output.strip()\r\n[ 74s] E assert '[{\"table\": \"...e\": \"Gosh2\"}]' == ''\r\n[ 74s] E - [{\"table\": \"Gosh\"},\r\n[ 74s] E - {\"table\": \"Gosh2\"}]\r\n[ 74s] \r\n[ 74s] tests/test_cli.py:28: AssertionError\r\n```\r\n\r\npackaging project at https://build.opensuse.org/package/show/home:jayvdb:py-new/python-sqlite-utils\r\n\r\nI'll keep digging into this after I have github-to-sqlite working on Tumbleweed, as I'll need openSUSE Leap 15.1 working before I can submit this into the main python repo.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/74/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 573578548, "node_id": "MDU6SXNzdWU1NzM1Nzg1NDg=", "number": 89, "title": "Ability to customize columns used by extracts= feature", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2020-03-01T16:54:48Z", "updated_at": "2020-10-16T19:17:50Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "@simonw any thoughts on allow extracts to specify the lookup column name? If I'm understanding the documentation right, `.lookup()` allows you to define the \"value\" column (the documentation uses name), but when you use `extracts` keyword as part of `.insert()`, `.upsert()` etc. the lookup must be done against a column named \"value\". I have an existing lookup table that I've populated with columns \"id\" and \"name\" as opposed to \"id\" and \"value\", and seems I can't use `extracts=`, unless I'm missing something...\r\n\r\nInitial thought on how to do this would be to allow the dictionary value to be a tuple of table name column pair... so:\r\n```\r\ntable = db.table(\"trees\", extracts={\"species_id\": (\"Species\", \"name\"})\r\n```\r\n\r\nI haven't dug too much into the existing code yet, but does this make sense? Worth doing?\r\n\r\n_Originally posted by @chrishas35 in https://github.com/simonw/sqlite-utils/issues/46#issuecomment-592999503_", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/89/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 581795570, "node_id": "MDU6SXNzdWU1ODE3OTU1NzA=", "number": 93, "title": "Support more string values for types in .add_column()", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2020-03-15T19:32:49Z", "updated_at": "2020-09-24T20:36:46Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "https://sqlite-utils.readthedocs.io/en/2.4.2/python-api.html#adding-columns says:\r\n> SQLite types you can specify are \"TEXT\", \"INTEGER\", \"FLOAT\" or \"BLOB\".\r\n\r\nAs discovered in #92 this isn't the right list of values. I should expand this to match https://www.sqlite.org/datatype3.html", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/93/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 644161221, "node_id": "MDU6SXNzdWU2NDQxNjEyMjE=", "number": 117, "title": "Support for compound (composite) foreign keys", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2020-06-23T21:33:42Z", "updated_at": "2020-06-23T21:40:31Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "It turns out SQLite supports composite foreign keys: https://www.sqlite.org/foreignkeys.html#fk_composite\r\n\r\nTheir example looks like this:\r\n```sql\r\nCREATE TABLE album(\r\n albumartist TEXT,\r\n albumname TEXT,\r\n albumcover BINARY,\r\n PRIMARY KEY(albumartist, albumname)\r\n);\r\n\r\nCREATE TABLE song(\r\n songid INTEGER,\r\n songartist TEXT,\r\n songalbum TEXT,\r\n songname TEXT,\r\n FOREIGN KEY(songartist, songalbum) REFERENCES album(albumartist, albumname)\r\n);\r\n```\r\n\r\nHere's what that looks like in sqlite-utils:\r\n\r\n```\r\nIn [1]: import sqlite_utils \r\n\r\nIn [2]: import sqlite3 \r\n\r\nIn [3]: conn = sqlite3.connect(\":memory:\") \r\n\r\nIn [4]: conn \r\nOut[4]: \r\n\r\nIn [5]: conn.executescript(\"\"\" \r\n ...: CREATE TABLE album( \r\n ...: albumartist TEXT, \r\n ...: albumname TEXT, \r\n ...: albumcover BINARY, \r\n ...: PRIMARY KEY(albumartist, albumname) \r\n ...: ); \r\n ...: \r\n ...: CREATE TABLE song( \r\n ...: songid INTEGER, \r\n ...: songartist TEXT, \r\n ...: songalbum TEXT, \r\n ...: songname TEXT, \r\n ...: FOREIGN KEY(songartist, songalbum) REFERENCES album(albumartist, albumname) \r\n ...: ); \r\n ...: \"\"\") \r\nOut[5]: \r\n\r\nIn [6]: db = sqlite_utils.Database(conn) \r\n\r\nIn [7]: db.tables \r\nOut[7]: \r\n[,\r\n
]\r\n\r\nIn [8]: db.tables[0].foreign_keys \r\nOut[8]: []\r\n\r\nIn [9]: db.tables[1].foreign_keys \r\nOut[9]: \r\n[ForeignKey(table='song', column='songartist', other_table='album', other_column='albumartist'),\r\n ForeignKey(table='song', column='songalbum', other_table='album', other_column='albumname')]\r\n```\r\nThe table appears to have two separate foreign keys, when actually it has a single compound composite foreign key.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/117/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 652961907, "node_id": "MDU6SXNzdWU2NTI5NjE5MDc=", "number": 121, "title": "Improved (and better documented) support for transactions", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2020-07-08T04:56:51Z", "updated_at": "2020-09-24T20:36:46Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655283393_\r\n\r\nWe should put some thought into how this library supports and encourages smart use of transactions.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/121/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 675753042, "node_id": "MDU6SXNzdWU2NzU3NTMwNDI=", "number": 131, "title": "sqlite-utils insert: options for column types", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2020-08-09T18:59:11Z", "updated_at": "2022-03-15T13:21:42Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "The `insert` command currently results in string types for every column - at least when used against CSV or TSV inputs.\r\n\r\nIt would be useful if you could do the following:\r\n\r\n- automatically detects the column types based on eg the first 1000 records\r\n- explicitly state the rule for specific columns\r\n\r\n`--detect-types` could work for the former - or it could do that by default and allow opt-out using `--no-detect-types`\r\n\r\nFor specific columns maybe this:\r\n\r\n sqlite-utils insert db.db images images.tsv \\\r\n --tsv \\\r\n -c id int \\\r\n -c score float", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/131/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 688351054, "node_id": "MDU6SXNzdWU2ODgzNTEwNTQ=", "number": 140, "title": "Idea: insert-files mechanism for adding extra columns with fixed values", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2020-08-28T20:57:36Z", "updated_at": "2022-03-20T19:45:45Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Say for example you want to populate a `file_type` column with the value `gif`. That could work like this:\r\n\r\n```\r\nsqlite-utils insert-files gifs.db images *.gif \\\r\n -c path -c md5 -c last_modified:mtime \\\r\n -c file_type:text:gif --pk=path\r\n```\r\nSo a column defined as a `text` column with a value that follows a second colon.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/140/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 688352145, "node_id": "MDU6SXNzdWU2ODgzNTIxNDU=", "number": 141, "title": "insert-files support for compressed values", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2020-08-28T20:59:46Z", "updated_at": "2020-09-24T20:36:08Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "The `sqlar` format supports this, it would be useful if `insert-files` could support this too.\r\n\r\nhttps://www.sqlite.org/sqlar.html", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/141/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 688670158, "node_id": "MDU6SXNzdWU2ODg2NzAxNTg=", "number": 147, "title": "SQLITE_MAX_VARS maybe hard-coded too low", "user": {"value": 96218, "label": "simonwiles"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 7, "created_at": "2020-08-30T07:26:45Z", "updated_at": "2021-02-15T21:27:55Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "I came across this while about to open an issue and PR against the documentation for `batch_size`, which is a bit incomplete.\r\n\r\nAs mentioned in #145, while:\r\n\r\n> [`SQLITE_MAX_VARIABLE_NUMBER`](https://www.sqlite.org/limits.html#max_variable_number) ... defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0.\r\n\r\nit is common that it is increased at compile time. Debian-based systems, for example, seem to ship with a version of sqlite compiled with SQLITE_MAX_VARIABLE_NUMBER set to 250,000, and I believe this is the case for homebrew installations too.\r\n\r\nIn working to understand what `batch_size` was actually doing and why, I realized that by setting `SQLITE_MAX_VARS` in `db.py` to match the value my sqlite was compiled with (I'm on Debian), I was able to decrease the time to `insert_all()` my test data set (~128k records across 7 tables) from ~26.5s to ~3.5s. Given that this about .05% of my total dataset, this is time I am keen to save...\r\n\r\nUnfortunately, it seems that `sqlite3` in the python standard library doesn't expose the `get_limit()` C API (even though `pysqlite` used to), so it's hard to know what value sqlite has been compiled with (note that this could mean, I suppose, that it's less than 999, and even hardcoding `SQLITE_MAX_VARS` to the conservative default might not be adequate. It can also be lowered -- but not raised -- at runtime). The best I could come up with is `echo \"\" | sqlite3 -cmd \".limits variable_number\"` (only available in `sqlite >= 2015-05-07 (3.8.10)`).\r\n\r\nObviously this couldn't be relied upon in `sqlite_utils`, but I wonder what your opinion would be about exposing `SQLITE_MAX_VARS` as a user-configurable parameter (with suitable \"here be dragons\" warnings)? I'm going to go ahead and monkey-patch it for my purposes in any event, but it seems like it might be worth considering.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/147/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 695441530, "node_id": "MDU6SXNzdWU2OTU0NDE1MzA=", "number": 154, "title": "OperationalError: cannot change into wal mode from within a transaction", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2020-09-07T23:42:44Z", "updated_at": "2020-09-07T23:47:10Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "I'm getting this error when running:\r\n\r\n sqlite-utils enable-wal beta.db\r\n\r\n`OperationalError: cannot change into wal mode from within a transaction`\r\n\r\nI'm worried that maybe that's because of this new code from #152:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/deb2eb013ff85bbc828ebc244a9654f0d9c3139e/sqlite_utils/db.py#L128-L129", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/154/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 702386948, "node_id": "MDU6SXNzdWU3MDIzODY5NDg=", "number": 159, "title": ".delete_where() does not auto-commit (unlike .insert() or .upsert())", "user": {"value": 11712349, "label": "spdkils"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 9, "created_at": "2020-09-16T01:55:52Z", "updated_at": "2023-04-01T17:21:05Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "When you use the delete_where() function on a table, it never commits....\r\n\r\nIs that intentional?", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/159/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 706001517, "node_id": "MDU6SXNzdWU3MDYwMDE1MTc=", "number": 163, "title": "Idea: conversions= could take Python functions", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2020-09-22T00:37:12Z", "updated_at": "2021-12-20T00:56:52Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Right now you use `conversions=` like this:\r\n\r\n```python\r\ndb[\"example\"].insert({\r\n \"name\": \"The Bigfoot Discovery Museum\"\r\n}, conversions={\"name\": \"upper(?)\"})\r\n```\r\nHow about if you could optionally provide a Python function (or a lambda) like this?\r\n```python\r\ndb[\"example\"].insert({\r\n \"name\": \"The Bigfoot Discovery Museum\"\r\n}, conversions={\"name\": lambda s: s.upper()})\r\n```\r\nThis would work by creating a random name for that function, registering it (similar to #162), executing the SQL and then un-registering the custom function at the end.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/163/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 722816436, "node_id": "MDU6SXNzdWU3MjI4MTY0MzY=", "number": 186, "title": ".extract() shouldn't extract null values", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 7, "created_at": "2020-10-16T02:41:08Z", "updated_at": "2021-08-12T12:32:14Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "This almost works, but it creates a rogue `type` record with a value of None.\r\n```\r\nIn [1]: import sqlite_utils\r\nIn [2]: db = sqlite_utils.Database(memory=True)\r\nIn [5]: db[\"creatures\"].insert_all([\r\n {\"id\": 1, \"name\": \"Simon\", \"type\": None},\r\n {\"id\": 2, \"name\": \"Natalie\", \"type\": None},\r\n {\"id\": 3, \"name\": \"Cleo\", \"type\": \"dog\"}], pk=\"id\")\r\nOut[5]:
\r\nIn [7]: db[\"creatures\"].extract(\"type\")\r\nOut[7]:
\r\nIn [8]: list(db[\"creatures\"].rows)\r\nOut[8]: \r\n[{'id': 1, 'name': 'Simon', 'type_id': None},\r\n {'id': 2, 'name': 'Natalie', 'type_id': None},\r\n {'id': 3, 'name': 'Cleo', 'type_id': 2}]\r\nIn [9]: db[\"type\"]\r\nOut[9]:
\r\nIn [10]: list(db[\"type\"].rows)\r\nOut[10]: [{'id': 1, 'type': None}, {'id': 2, 'type': 'dog'}]\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/186/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 743384829, "node_id": "MDExOlB1bGxSZXF1ZXN0NTIxMjg3OTk0", "number": 203, "title": "changes to allow for compound foreign keys", "user": {"value": 1049910, "label": "drkane"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 7, "created_at": "2020-11-16T00:30:10Z", "updated_at": "2023-01-25T18:47:18Z", "closed_at": null, "author_association": "FIRST_TIME_CONTRIBUTOR", "pull_request": "simonw/sqlite-utils/pulls/203", "body": "Add support for compound foreign keys, as per issue #117 \r\n\r\nNot sure if this is the right approach. In particular I'm unsure about:\r\n\r\n - the new `ForeignKey` class, which replaces the namedtuple in order to ensure that `column` and `other_column` are forced into tuples. The class does the job, but doesn't feel very elegant.\r\n - I haven't rewritten `guess_foreign_table` to take account of multiple columns, so it just checks for the first column in the foreign key definition. This isn't ideal.\r\n - I haven't added any ability to the CLI to add compound foreign keys, it's only in the python API at the moment.\r\n\r\nThe PR also contains a minor related change that columns and tables are always quoted in foreign key definitions.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/203/reactions\", \"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 1, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 816526538, "node_id": "MDU6SXNzdWU4MTY1MjY1Mzg=", "number": 239, "title": "sqlite-utils extract could handle nested objects", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 16, "created_at": "2021-02-25T15:10:28Z", "updated_at": "2022-09-03T23:46:02Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Imagine a table (imported from a nested JSON file) where one of the columns contains values that look like this:\r\n\r\n {\"email\": \"anonymous@noreply.airtable.com\", \"id\": \"usrROSHARE0000000\", \"name\": \"Anonymous\"}\r\n\r\nThe `sqlite-utils extract` command already uses single text values in a column to populate a new table. It would not be much of a stretch for it to be able to use JSON instead, including specifying which of those values should be used as the primary key in the new table.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/239/reactions\", \"total_count\": 6, \"+1\": 5, \"-1\": 0, \"laugh\": 0, \"hooray\": 1, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 816601354, "node_id": "MDExOlB1bGxSZXF1ZXN0NTgwMjM1NDI3", "number": 241, "title": "Extract expand - work in progress", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-02-25T16:36:38Z", "updated_at": "2021-02-25T16:36:38Z", "closed_at": null, "author_association": "OWNER", "pull_request": "simonw/sqlite-utils/pulls/241", "body": "Refs #239. Still needs documentation and CLI implementation.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/241/reactions\", \"total_count\": 3, \"+1\": 3, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 1, "state_reason": null} {"id": 817989436, "node_id": "MDU6SXNzdWU4MTc5ODk0MzY=", "number": 242, "title": "Async support", "user": {"value": 25778, "label": "eyeseast"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 13, "created_at": "2021-02-27T18:29:38Z", "updated_at": "2021-10-28T14:37:56Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Following our conversation last week, want to note this here before I forget.\r\n\r\nI've had a couple situations where I'd like to do a bunch of updates in an async event loop, but I run into SQLite's issues with concurrent writes. This feels like something sqlite-utils could help with.\r\n\r\nPeeWee ORM has a [SQLite write queue](http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#sqliteq) that might be a good model. It's using threads or gevent, but I _think_ that approach would translate well enough to asyncio. \r\n\r\nHappy to help with this, too.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/242/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 818684978, "node_id": "MDU6SXNzdWU4MTg2ODQ5Nzg=", "number": 243, "title": "How can i use this utils to deal with fts on column meta of tables ?", "user": {"value": 27874014, "label": "svjack"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-03-01T09:45:05Z", "updated_at": "2021-03-01T09:45:05Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Thank you to release this bravo project.\r\nWhen i use this project on multi table db, I want to implement \r\nconvenient search on column name from different tables.\r\nI want to develop a meta table to save the meta data of different columns\r\nof different tables and search on this meta table to get rows from the\r\ndata table (which the meta table describes)\r\ndoes this project provide some simple function on it ?\r\n\r\nYou can think a have a knowledge graph about the table in the db, \r\nand i save this knowledge graph into the db with fts enabled.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/243/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 836829560, "node_id": "MDU6SXNzdWU4MzY4Mjk1NjA=", "number": 248, "title": "support for Apache Arrow / parquet files I/O", "user": {"value": 649467, "label": "mhalle"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-03-20T14:59:30Z", "updated_at": "2021-10-28T23:46:48Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "I just started looking at Apache Arrow using pyarrow for import and export of tabular datasets, and it looks quite compelling. It might be worth looking at for sqlite-utils and/or datasette.\r\n\r\nAs a test, I took a random jsonl data dump of a dataset I have with floats, strings, and ints and converted it to arrow's parquet format using the naive `pyarrow.parquet.write_file()` command, which has automatic type inferrence. It compressed down to 7% of the original size. Conversion of a 26MB JSON file and serializing it to parquet was eyeblink instantaneous. Parquet files are portable and can be directly imported into pandas and other analytics software. \r\n\r\nThe only hangup is the automatic type inference of the naive reader. It's great for general laziness and for parsing JSON columns (it correctly interpreted a table of mine with a JSON array). However, I did get an exception for a string column where most entries looked integer-like but had a couple values that weren't -- the reader tried to coerce all of them for some reason, even though the JSON type is string. Since the writer optionally takes a schema, it shouldn't be too hard to grab the sqlite header types. With some additional hinting, you might get datetime columns and JSON, which are native Arrow types. \r\n\r\nSomewhat tangentially, someone even wrote an sqlite vfs extension for Parquet: https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html\r\n\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/248/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 907795562, "node_id": "MDU6SXNzdWU5MDc3OTU1NjI=", "number": 265, "title": "Using enable_fts before search term", "user": {"value": 36287, "label": "prabhur"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-06-01T01:43:34Z", "updated_at": "2023-04-01T17:27:18Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Many thanks for the sqlite-utils suite of utilities. Has made my life much much easier. \r\nI used this to create a table and enable FTS. All works fine. The datasette utility detects FTS and shows a text box. Searching for a term using that interface works well.\r\n\r\nHowever, when I start to use features by following https://www.sqlite.org/fts5.html section **\"3. Full-text Query Syntax\"** I seem to run into issues that I suspect is due to `escape_fts` wrapper function.\r\n\r\nAs an example, if i search for the term `\"^\u0b95\u0bc1\u0b95\u0bc8\" `on the text box in datasette it produces 140 results. However, when i tweak the query produced by datasette to not use \"escape_fts\" it produces 5 results.\r\n\r\nSimilarly, when I try to restrict the search to a single column in FTS using a spec like `{title : ^\u0b95\u0bc1\u0b95\u0bc8}` it returns no rows. The same thing pulls results when used without `escape_fts`. The text in the table is in Tamil language and the search term is a Tamil word.\r\n\r\n```\r\n ...\r\n where\r\n posts_fts match escape_fts(:search)\r\n```\r\nvs\r\n\r\n```\r\n ...\r\n where\r\n posts_fts match (:search)\r\n```\r\n\r\nAny ideas why? How can I get the benefits of both escaping as well as utilizing different facets of providing / controlling search terms? Thanks.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/265/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 915421499, "node_id": "MDU6SXNzdWU5MTU0MjE0OTk=", "number": 267, "title": "row.update() or row.pk", "user": {"value": 12721157, "label": "Gravitar64"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2021-06-08T19:56:00Z", "updated_at": "2021-06-22T17:27:27Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Hi,\r\n\r\nfantastic framework for working with Sqlite3 databases!!!\r\n\r\nI tried to update spezific rows in a table and used\r\n\r\nfor row in db[tablename]:\r\n newValue = row[\"counter\"] * row[\"prize\"] \r\n row.update({\"Fieldname\": newValue})\r\n print(row)\r\n\r\n\r\nThis updates the value in the printet row, but not in the database. So I switched to\r\n\r\ndb[tablename].update(id, {\"Filedname\": newValue})\r\n\r\nThis works fine. But row.update would be nicer, because no need for the id (its that row), no need for the tablename and the db (all defined in the for row ... loop).\r\n\r\nThx\r\n\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/267/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 944846776, "node_id": "MDU6SXNzdWU5NDQ4NDY3NzY=", "number": 297, "title": "Option for importing CSV data using the SQLite .import mechanism", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 23, "created_at": "2021-07-14T22:36:41Z", "updated_at": "2023-09-22T20:49:52Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "As seen in https://til.simonwillison.net/sqlite/import-csv - `.mode csv` and then `.import school.csv schools` is hugely faster than importing via `sqlite-utils insert` and doing the work in Python - but it can only be implemented by shelling out to the `sqlite3` CLI tool, it's not functionality that is exposed to the Python `sqlite3` module.\r\n\r\nAn option to use this would be useful - maybe something like this:\r\n\r\n sqlite-utils insert blah.db blah blah.csv --fast", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/297/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 961008507, "node_id": "MDU6SXNzdWU5NjEwMDg1MDc=", "number": 308, "title": "Add an interactive tutorial as a Jupyter notebook", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2021-08-04T20:34:22Z", "updated_at": "2021-08-04T21:30:59Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Can show people how to open this up in Binder.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/308/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 974067156, "node_id": "MDU6SXNzdWU5NzQwNjcxNTY=", "number": 318, "title": "Research: handle gzipped CSV directly", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2021-08-18T21:23:04Z", "updated_at": "2021-08-18T21:25:30Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Would it be worthwhile for the `sqlite-utils` command-line tool to grow features to efficiently directly interact with gzipped CSV data?\r\n\r\nMaybe add `--gz` options to both `insert` and to the various commands that output query results.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/318/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1066563554, "node_id": "I_kwDOCGYnMM4_knfi", "number": 346, "title": "Way to test SQLite 3.37 (and potentially other versions) in CI", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2021-11-29T22:21:06Z", "updated_at": "2021-11-29T23:12:49Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "> Need to figure out a good pattern for testing this in CI too - it will currently skip the new tests if it doesn't have SQLite 3.37 or higher.\r\n\r\n_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/344#issuecomment-982076924_", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/346/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1066603133, "node_id": "PR_kwDOCGYnMM4vKAzW", "number": 347, "title": "Test against pysqlite3 running SQLite 3.37", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 9, "created_at": "2021-11-29T23:17:57Z", "updated_at": "2021-12-11T01:02:19Z", "closed_at": null, "author_association": "OWNER", "pull_request": "simonw/sqlite-utils/pulls/347", "body": "Refs #346 and #344.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/347/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 1071531082, "node_id": "I_kwDOCGYnMM4_3kRK", "number": 349, "title": "A way of creating indexes on newly created tables", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-12-05T18:56:12Z", "updated_at": "2021-12-07T01:04:37Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "I'm writing code for https://github.com/simonw/git-history/issues/33 that creates a table inside a loop:\r\n\r\n```python\r\nitem_pk = db[item_table].lookup(\r\n {\"_item_id\": item_id},\r\n item_to_insert,\r\n column_order=(\"_id\", \"_item_id\"),\r\n pk=\"_id\",\r\n)\r\n```\r\nI need to look things up by `_item_id` on this table, which means I need an index on that column (the table can get very big).\r\n\r\nBut there's no mechanism in SQLite utils to detect if the table was created for the first time and add an index to it. And I don't want to run `CREATE INDEX IF NOT EXISTS` every time through the loop.\r\n\r\nThis should work like the `foreign_keys=` mechanism.\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/349/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1072435124, "node_id": "I_kwDOCGYnMM4_7A-0", "number": 350, "title": "Optional caching mechanism for table.lookup()", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-12-06T17:54:25Z", "updated_at": "2021-12-06T17:56:57Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Inspired by work on `git-history` where I used this pattern:\r\n```python\r\n column_name_to_id = {}\r\n\r\n def column_id(column):\r\n if column not in column_name_to_id:\r\n id = db[\"columns\"].lookup(\r\n {\"namespace\": namespace_id, \"name\": column},\r\n foreign_keys=((\"namespace\", \"namespaces\", \"id\"),),\r\n )\r\n column_name_to_id[column] = id\r\n return column_name_to_id[column]\r\n```\r\nIf you're going to be doing a large number of `table.lookup(...)` calls and you know that no other script will be modifying the database at the same time you can presumably get a big speedup using a Python in-memory cache - maybe even a LRU one to avoid memory bloat.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/350/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1072792507, "node_id": "I_kwDOCGYnMM4_8YO7", "number": 352, "title": "`sqlite-utils insert --extract colname`", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2021-12-07T00:55:44Z", "updated_at": "2022-02-03T22:59:36Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Is there a reason I've not added `--extract` as an option for `sqlite-utils insert` next? There's a `extracts=` option for the various `table.insert()` etc methods - last line in this code block:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/213a0ff177f23a35f3b235386366ff132eb879f1/sqlite_utils/db.py#L2483-L2495", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/352/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1082651698, "node_id": "I_kwDOCGYnMM5Ah_Qy", "number": 358, "title": "Support for CHECK constraints", "user": {"value": 11597658, "label": "luxint"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 7, "created_at": "2021-12-16T21:19:45Z", "updated_at": "2022-09-25T07:15:59Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Hi,\r\n\r\nI noticed the `transform.table()` method doesn't have an option to add/change or drop a check constraint (see https://sqlite.org/lang_createtable.html -> 3.7 Check Constraints. would be great to have this as an option! \r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/358/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1090798237, "node_id": "I_kwDOCGYnMM5BBEKd", "number": 359, "title": "Use RETURNING if available to populate last_pk", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-12-29T23:43:23Z", "updated_at": "2021-12-29T23:43:23Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Inspired by this: https://news.ycombinator.com/item?id=29729283\r\n\r\n> Because SQLite is effectively serializing all the writes for us, we have zero locking in our code. We used to have to lock when inserting new items (to get the LastInsertRowId), but the newer version of SQLite supports the RETURNING keyword, so we don't even have to lock on inserts now.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/359/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1122446693, "node_id": "I_kwDOCGYnMM5C5y1l", "number": 394, "title": "Test against Python 3.11-dev", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-02-02T22:21:03Z", "updated_at": "2022-02-03T21:06:35Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Same as:\r\n- https://github.com/simonw/datasette/issues/1621", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/394/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1124731464, "node_id": "I_kwDOCGYnMM5DCgpI", "number": 399, "title": "Make it easier to insert geometries, with documentation and maybe code", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 25, "created_at": "2022-02-05T00:11:26Z", "updated_at": "2023-05-16T03:11:52Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "In playing with the new SpatiaLite helpers from #385 I noticed that actually populating geometry columns is still a little bit tricky. Here's what I ended up doing:\r\n\r\n```python\r\nimport httpx, sqlite_utils\r\ndb = sqlite_utils.Database(\"/tmp/spatial.db\")\r\nattractions = httpx.get(\"https://latest.datasette.io/fixtures/roadside_attractions.json?_shape=array\").json()\r\ndb[\"attractions\"].insert_all(attractions, pk=\"pk\")\r\n\r\n# Schema of that table is now:\r\n# CREATE TABLE [attractions] (\r\n# [pk] INTEGER PRIMARY KEY,\r\n# [name] TEXT,\r\n# [address] TEXT,\r\n# [latitude] FLOAT,\r\n# [longitude] FLOAT\r\n# )\r\n\r\ndb.init_spatialite()\r\ndb[\"attractions\"].add_geometry_column(\"point\", \"POINT\")\r\n\r\ndb.execute(\"\"\"\r\n update attractions set point = GeomFromText(\r\n 'POINT(' || longitude || ' ' || latitude || ')', 4326\r\n )\r\n\"\"\")\r\n```\r\nThat last line took some figuring out - especially the need for the SRID of `4326`, without which I got this error:\r\n\r\n> `IntegrityError: attractions.point violates Geometry constraint [geom-type or SRID not allowed]`\r\n\r\nIt would be good to both document this in more detail, but ideally also to come up with a more obvious pattern for inserting common types of spatial data.\r\n\r\nAlso related:\r\n- #398\r\n- #79", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/399/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1125297737, "node_id": "I_kwDOCGYnMM5DEq5J", "number": 402, "title": "Advanced class-based `conversions=` mechanism", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 14, "created_at": "2022-02-06T19:47:41Z", "updated_at": "2022-02-16T10:18:55Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "The `conversions=` parameter works like this at the moment: https://sqlite-utils.datasette.io/en/3.23/python-api.html#converting-column-values-using-sql-functions\r\n\r\n```python\r\ndb[\"places\"].insert(\r\n {\"name\": \"Wales\", \"geometry\": wkt},\r\n conversions={\"geometry\": \"GeomFromText(?, 4326)\"},\r\n)\r\n```\r\nThis proposal is to support values in that dictionary that are objects, not strings, which can represent more complex conversions - spun out from #399.\r\n\r\nNew proposed mechanism:\r\n```python\r\nfrom sqlite_utils.utils import LongitudeLatitude\r\n\r\ndb[\"places\"].insert(\r\n {\r\n \"name\": \"London\",\r\n \"point\": (-0.118092, 51.509865)\r\n },\r\n conversions={\"point\": LongitudeLatitude},\r\n)\r\n```\r\nHere `LongitudeLatitude` is a magical value which does TWO things: it sets up the `GeomFromText(?, 4326)` SQL function, and it handles converting the `(51.509865, -0.118092)` tuple into a `POINT({} {})` string.\r\n\r\nThis would involve a change to the `conversions=` contract - where it usually expects a SQL string fragment, but it can also take an object which combines that SQL string fragment with a Python conversion function.\r\n\r\nBest of all... this resolves the `lat, lon` v.s. `lon, lat` dilemma because you can use `from sqlite_utils.utils import LongitudeLatitude` OR `from sqlite_utils.utils import LatitudeLongitude` depending on which you prefer!\r\n\r\n_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/399#issuecomment-1030739566_", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/402/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1128466114, "node_id": "I_kwDOCGYnMM5DQwbC", "number": 406, "title": "Creating tables with custom datatypes", "user": {"value": 82988, "label": "psychemedia"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2022-02-09T12:16:31Z", "updated_at": "2022-09-15T18:13:50Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Via https://stackoverflow.com/a/18622264/454773 I note the ability to register custom handlers for novel datatypes that can map into and out of things like sqlite `BLOB`s.\r\n\r\nFrom a quick look and a quick play, I didn't spot a way to do this in `sqlite_utils`?\r\n\r\nFor example:\r\n\r\n```python\r\n# Via https://stackoverflow.com/a/18622264/454773\r\nimport sqlite3\r\nimport numpy as np\r\nimport io\r\n\r\ndef adapt_array(arr):\r\n \"\"\"\r\n http://stackoverflow.com/a/31312102/190597 (SoulNibbler)\r\n \"\"\"\r\n out = io.BytesIO()\r\n np.save(out, arr)\r\n out.seek(0)\r\n return sqlite3.Binary(out.read())\r\n\r\ndef convert_array(text):\r\n out = io.BytesIO(text)\r\n out.seek(0)\r\n return np.load(out)\r\n\r\n\r\n# Converts np.array to TEXT when inserting\r\nsqlite3.register_adapter(np.ndarray, adapt_array)\r\n\r\n# Converts TEXT to np.array when selecting\r\nsqlite3.register_converter(\"array\", convert_array)\r\n```\r\n\r\n```python\r\nfrom sqlite_utils import Database\r\ndb = Database('test.db')\r\n\r\n# Reset the database connection to used the parsed datatype\r\n# sqlite_utils doesn't seem to support eg:\r\n# Database('test.db', detect_types=sqlite3.PARSE_DECLTYPES)\r\ndb.conn = sqlite3.connect(db_name, detect_types=sqlite3.PARSE_DECLTYPES)\r\n\r\n# Create a table the old fashioned way\r\n# but using the new custom data type\r\nvector_table_create = \"\"\"\r\nCREATE TABLE dummy \r\n (title TEXT, vector array );\r\n\"\"\"\r\n\r\ncur = db.conn.cursor()\r\ncur.execute(vector_table_create)\r\n\r\n\r\n# sqlite_utils doesn't appear to support custom types (yet?!)\r\n# The following errors on the \"array\" datatype\r\n\"\"\"\r\ndb[\"dummy\"].create({\r\n \"title\": str,\r\n \"vector\": \"array\",\r\n})\r\n\"\"\"\r\n```\r\n\r\nWe can then add / retrieve records from the database where the datatype of the `vector` field is a custom registered `array` type (which is to say, a `numpy` array):\r\n\r\n```python\r\nimport numpy as np\r\n\r\ndb[\"dummy\"].insert({'title':\"test1\", 'vector':np.array([1,2,3])})\r\n\r\nfor row in db.query(\"SELECT * FROM dummy\"):\r\n print(row['title'], row['vector'], type(row['vector']))\r\n\r\n\"\"\"\r\ntest1 [1 2 3] \r\n\"\"\"\r\n```\r\n\r\nIt would be handy to be able to do this idiomatically in `sqlite_utils`.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/406/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1149661489, "node_id": "I_kwDOCGYnMM5EhnEx", "number": 409, "title": "`with db:` for transactions", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2022-02-24T19:22:06Z", "updated_at": "2022-10-01T03:42:50Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "This can be a documented wrapper around `with db.conn:`.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/409/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1160034488, "node_id": "I_kwDOCGYnMM5FJLi4", "number": 411, "title": "Support for generated columns", "user": {"value": 25778, "label": "eyeseast"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 8, "created_at": "2022-03-04T20:41:33Z", "updated_at": "2022-03-11T22:32:43Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "This is a fairly new feature -- SQLite version 3.31.0 (2020-01-22) -- that I, admittedly, haven't gotten to work yet. But it looks _incredibly_ useful: https://dgl.cx/2020/06/sqlite-json-support\r\n\r\nI'm not sure if this is an option on `add-column` or a separate command like `add-generated-column`. Either way, it needs an argument to populate it. It could be something like this:\r\n\r\n```sh\r\nsqlite-utils add-column data.db table-name generated --as 'json_extract(data, \"$.field\")' --virtual\r\n```\r\n\r\nMore here: https://www.sqlite.org/gencol.html", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/411/reactions\", \"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1160182768, "node_id": "I_kwDOCGYnMM5FJvvw", "number": 412, "title": "Optional Pandas integration", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 13, "created_at": "2022-03-05T01:49:27Z", "updated_at": "2022-06-14T15:36:29Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "It would be neat if there was a way to use this more seamlessly with Pandas, in particular Pandas dataframes - but without making Pandas a required dependency.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/412/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1181236173, "node_id": "I_kwDOCGYnMM5GaDvN", "number": 422, "title": "Reconsider not running convert functions against null values", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-03-25T20:22:40Z", "updated_at": "2022-03-25T20:23:21Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "I just got caught out by the fact that `None` values are not processed by the `.convert()` mechanism https://github.com/simonw/sqlite-utils/blob/0b7b80bd40fe86e4d66a04c9f607d94991c45c0b/sqlite_utils/db.py#L2504-L2510\r\n\r\nI had run this code while working on #420 and I wasn't sure why it didn't work:\r\n\r\n```\r\n$ sqlite-utils add-column content.db articles score float\r\n$ sqlite-utils convert content.db articles score '\r\nimport random\r\nrandom.seed(10)\r\n\r\ndef convert(value):\r\n global random\r\n return random.random()\r\n'\r\n```\r\nThe reason it didn't work is that the newly added `score` column was full of `null` values.\r\n\r\nI fixed it by doing this instead:\r\n\r\n $ sqlite-utils add-column content.db articles score float --not-null-default 1.0\r\n\r\nBut this indicates to me that the design of `convert()` here may be incorrect.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/422/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1215216249, "node_id": "I_kwDOCGYnMM5Ibrp5", "number": 428, "title": "Research adding support for savepoints", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-04-26T01:04:01Z", "updated_at": "2022-04-26T01:05:29Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "https://www.sqlite.org/lang_savepoint.html\r\n\r\nSavepoints are like regular transactions except they have names and can be nested.\r\n\r\nWould there be any value in adding support to them to `sqlite-utils`, potentially as some kind of context manager? Something like this:\r\n```python\r\nwith db.savepoint(\"name\"):\r\n # do stuff\r\n with db.savepoint(\"name2\"):\r\n # do more stuff\r\n raise Release # Rolls back to before \"name2\" savepoint\r\n```\r\nI've never used this feature so I'm not comfortable adding anything like this without a bunch of extra research.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/428/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1224112817, "node_id": "I_kwDOCGYnMM5I9nqx", "number": 430, "title": "Document how to use `PRAGMA temp_store` to avoid errors when running VACUUM against huge databases", "user": {"value": 9308268, "label": "rayvoelker"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2022-05-03T13:33:58Z", "updated_at": "2022-06-14T23:26:37Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "I'm trying to figure out a way to get the `table.extract()` method to complete successfully -- I'm not sure if maybe the cause (and a possible solution) of this on Ubuntu Server 22.04 is to adjust some of the PRAGMA values within SQLite itself ... on another Linux system (PopOS), using this method on this same database appears to work just fine.\r\n\r\nHere's the bit that's causing the error, and the resulting error output:\r\n```python\r\n# combine these columns into 1 table \"bib_properties\" :\r\n# best_title\r\n# bib_level_code\r\n# mat_type\r\n# material_code\r\n# best_author\r\ndb[\"circ_trans\"].extract(\r\n [\"best_title\", \"bib_level_code\", \"mat_type\", \"material_code\", \"best_author\"], \r\n table=\"bib_properties\", \r\n fk_column=\"bib_properties_id\"\r\n)\r\n\r\ndb[\"circ_trans\"].extract(\r\n [\"call_number\"], \r\n table=\"call_number\", \r\n fk_column=\"call_number_id\",\r\n rename={\"call_number\": \"value\"}\r\n)\r\n```\r\n\r\n```python\r\n---------------------------------------------------------------------------\r\nOperationalError Traceback (most recent call last)\r\nInput In [17], in ()\r\n 1 # combine these columns into 1 table \"bib_properties\" :\r\n 2 # best_title\r\n 3 # bib_level_code\r\n 4 # mat_type\r\n 5 # material_code\r\n 6 # best_author\r\n----> 7 db[\"circ_trans\"].extract(\r\n 8 [\"best_title\", \"bib_level_code\", \"mat_type\", \"material_code\", \"best_author\"], \r\n 9 table=\"bib_properties\", \r\n 10 fk_column=\"bib_properties_id\"\r\n 11 )\r\n 13 db[\"circ_trans\"].extract(\r\n 14 [\"call_number\"], \r\n 15 table=\"call_number\", \r\n 16 fk_column=\"call_number_id\",\r\n 17 rename={\"call_number\": \"value\"}\r\n 18 )\r\n\r\nFile ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:1764, in Table.extract(self, columns, table, fk_column, rename)\r\n 1761 column_order.append(c.name)\r\n 1763 # Drop the unnecessary columns and rename lookup column\r\n-> 1764 self.transform(\r\n 1765 drop=set(columns),\r\n 1766 rename={magic_lookup_column: fk_column},\r\n 1767 column_order=column_order,\r\n 1768 )\r\n 1770 # And add the foreign key constraint\r\n 1771 self.add_foreign_key(fk_column, table, \"id\")\r\n\r\nFile ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:1526, in Table.transform(self, types, rename, drop, pk, not_null, defaults, drop_foreign_keys, column_order)\r\n 1524 with self.db.conn:\r\n 1525 for sql in sqls:\r\n-> 1526 self.db.execute(sql)\r\n 1527 # Run the foreign_key_check before we commit\r\n 1528 if pragma_foreign_keys_was_on:\r\n\r\nFile ~/jupyter/venv/lib/python3.10/site-packages/sqlite_utils/db.py:465, in Database.execute(self, sql, parameters)\r\n 463 return self.conn.execute(sql, parameters)\r\n 464 else:\r\n--> 465 return self.conn.execute(sql)\r\n\r\nOperationalError: database or disk is full\r\n```\r\n\r\nThis database is about 17G in total size, so I'm assuming the error is coming from the vacuum ... where i'm assuming it's maybe trying to do the temp storage in a location that doesn't have sufficient room. The disk space is more than ample on the host in question (1.8T is free in the directory where the sqlite db resides) The `/tmp` directory however is limited on a smaller disk associated with the OS\r\n\r\nI'm trying to think if there's a way to set the `PRAGMA temp_store` or maybe if it's `temp_store_directory` that I'm after ... to use the same local directory of where the file is located (maybe this is a property of the version of sqlite on the system?) \r\n\r\n```python\r\n# SET the temp file store to be a file ...\r\nprint(db.execute('PRAGMA temp_store').fetchall())\r\nprint(db.execute('PRAGMA temp_store=FILE').fetchall())\r\n\r\nprint(db.execute('PRAGMA temp_store').fetchall())\r\n\r\n# the users home directory ...\r\nprint(db.execute(\"PRAGMA temp_store_directory='/home/plchuser/'\").fetchall())\r\nprint(db.execute(\"PRAGMA sqlite3_temp_directory='/home/plchuser/'\").fetchall())\r\n\r\nprint(db.execute(\"PRAGMA temp_store_directory\").fetchall())\r\nprint(db.execute(\"PRAGMA sqlite3_temp_directory\").fetchall())\r\n```\r\n```text\r\n[(1,)]\r\n[]\r\n[(1,)]\r\n[]\r\n[]\r\n[('/home/plchuser/',)]\r\n[]\r\n```\r\n\r\nHere's the docs on the Temporary File Storage Locations \r\nhttps://www.sqlite.org/tempfiles.html", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/430/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1227571375, "node_id": "I_kwDOCGYnMM5JK0Cv", "number": 431, "title": "Allow making m2m relation of a table to itself", "user": {"value": 738408, "label": "rafguns"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2022-05-06T08:30:43Z", "updated_at": "2022-06-23T14:12:51Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "I am building a database, in which one of the tables has a many-to-many relationship to itself. As far as I can see, this is not (yet) possible using `.m2m()` in sqlite-utils. This may be a bit of a niche use case, so feel free to close this issue if you feel it would introduce too much complexity compared to the benefits.\r\n\r\nExample: suppose I have a table of people, and I want to store the information that John and Mary have two children, Michael and Suzy. It would be neat if I could do something like this:\r\n\r\n```python\r\nfrom sqlite_utils import Database\r\n\r\ndb = Database(memory=True)\r\ndb[\"people\"].insert({\"name\": \"John\"}, pk=\"name\").m2m(\r\n \"people\", [{\"name\": \"Michael\"}, {\"name\": \"Suzy\"}], m2m_table=\"parent_child\", pk=\"name\"\r\n)\r\ndb[\"people\"].insert({\"name\": \"Mary\"}, pk=\"name\").m2m(\r\n \"people\", [{\"name\": \"Michael\"}, {\"name\": \"Suzy\"}], m2m_table=\"parent_child\", pk=\"name\"\r\n)\r\n```\r\n\r\nBut if I do that, the many-to-many table `parent_child` has only one column:\r\n```\r\nCREATE TABLE [parent_child] (\r\n [people_id] TEXT REFERENCES [people]([name]),\r\n PRIMARY KEY ([people_id], [people_id])\r\n)\r\n```\r\n\r\nThis could be solved by adding one or two keyword_arguments to `.m2m()`, e.g. `.m2m(..., left_name=None, right_name=None)` or `.m2m(..., names=(None, None))`.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/431/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1236693079, "node_id": "I_kwDOCGYnMM5JtnBX", "number": 432, "title": "Support `rows_where()`, `delete_where()` etc for attached alias databases", "user": {"value": 11597658, "label": "luxint"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2022-05-16T06:38:58Z", "updated_at": "2022-06-14T22:16:48Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Hi,\r\n\r\nI noticed `rows_where()` doesn't return any rows from tables which are from attached databases. The `exists()` function returns false. As far as I can see this is because the `table_names()` function only looks for table names in the current database and not in attached (or temp) databases.\r\n\r\nBesides, `rows_where()`, also `insert_all()` and `delete_where()` didn't do what I was expecting because of this. For the moment I've patched `table_names()` for myself, see below but I'm not sure what the total impact is on the other functions like lookup truncate etc which all use `exists()`. Also `view_names()` doesn't look for views in attached or temp databases. \r\n```python\r\n def table_names(self, fts4: bool = False, fts5: bool = False) -> List[str]:\r\n \"A list of string table names in this database.\"\r\n where = [\"type = 'table'\"]\r\n if fts4:\r\n where.append(\"sql like '%USING FTS4%'\")\r\n if fts5:\r\n where.append(\"sql like '%USING FTS5%'\")\r\n dbs = [x[1] for x in self.execute('pragma database_list').fetchall()] \r\n lst=[]\r\n for db in dbs: \r\n sql = \"select name from {} where {}\".format(db+\".sqlite_master\",\" AND \".join(where))\r\n lst.extend(r[0] for r in self.execute(sql).fetchall())\r\n return lst\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/432/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1250495688, "node_id": "I_kwDOCGYnMM5KiQzI", "number": 439, "title": "Misleading progress bar against utf-16-le CSV input", "user": {"value": 4068, "label": "frafra"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 12, "created_at": "2022-05-27T08:34:49Z", "updated_at": "2022-06-15T03:53:43Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "The program crashes without any error.\r\n```\r\nwget \"https://artsdatabanken.no/Fab2018/api/export/csv\"\r\nsqlite-utils create-database test.db\r\nsqlite-utils insert --csv --delimiter \";\" --encoding \"utf-16-le\" test test.db csv \r\n [------------------------------------] 0%\r\n [#################-------------------] 49% 00:00:01\r\n```\r\nI would like to highlight various issues:\r\n1. sqlite-utils catches exceptions without printing the stacktrace and/or reraising the exception, so there is no easy way to use `pdb` or similar to debug the program, solution: add a debug option\r\n2. Silent crash: this is related to (1.), and it happens when there is a catch-all mechanism; solution: let the program fail.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/439/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1271426387, "node_id": "I_kwDOCGYnMM5LyG1T", "number": 444, "title": "CSV `extras_key=` and `ignore_extras=` equivalents for CLI tool", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2022-06-14T22:22:47Z", "updated_at": "2022-07-07T16:39:18Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "> I forgot to add equivalents of `extras_key=` and `ignore_extras=` to the CLI tool - will do that in a separate issue.\r\n\r\n_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/440#issuecomment-1155767915_", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/444/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1310243385, "node_id": "I_kwDOCGYnMM5OGLo5", "number": 456, "title": "feature request: pivot command", "user": {"value": 536941, "label": "fgregg"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2022-07-20T00:58:08Z", "updated_at": "2022-07-20T17:50:50Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "pivoting long-format table to wide-format tables is pretty common and kind of pain. would love to see this feature in sqlite-utils!", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/456/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1324659241, "node_id": "I_kwDOCGYnMM5O9LIp", "number": 459, "title": "Single quoted transform recipes on Windows do not work as expected ", "user": {"value": 19921, "label": "shakeel"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2022-08-01T16:14:54Z", "updated_at": "2022-08-01T16:14:54Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Trying to follow the tutorial for sqlite-utils and datasette https://datasette.io/tutorials/clean-data on Windows 11 OS `Microsoft Windows [Version 10.0.22622.440]`, with sqlite-utils and datasette installed using pipx.\r\n\r\n```\r\npipx list\r\npackage datasette 0.61.1, installed using Python 3.10.4\r\n - datasette.exe\r\npackage sqlite-utils 3.28, installed using Python 3.10.4\r\n - sqlite-utils.exe\r\n``` \r\n\r\nIn the step to transform dates into ISO dates the quoted value `'r.parsedatetime(value)'` is copied verbatim into the columns instead of applying the output of the Python recipe.\r\n\r\n```\r\nsqlite-utils convert manatees.db locations \\\r\n REPDATE created_date last_edited_date \\\r\n 'r.parsedatetime(value)' --dry-run\r\n\r\n1975/01/31 00:00:00+00\r\n --- becomes:\r\nr.parsedatetime(value)\r\n\r\nWould affect 13568 rows\r\n```\r\n\r\nHowever, if I change the code from single quotes to double quotes, it works as expected.\r\n\r\n```\r\nsqlite-utils convert manatees.db locations \\\r\n REPDATE created_date last_edited_date \\\r\n \"r.parsedatetime(value)\" --dry-run\r\n\r\n1975/01/31 00:00:00+00\r\n --- becomes:\r\n1975-01-31T00:00:00+00:00\r\n\r\nWould affect 13568 rows\r\n```\r\n\r\nSpecifying the transform code recipe should work with single quotes on Windows.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/459/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1326349129, "node_id": "I_kwDOCGYnMM5PDntJ", "number": 461, "title": "Consider including animated SVG console demos", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-08-02T20:10:04Z", "updated_at": "2022-08-02T20:12:14Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "I recorded this one using https://github.com/nbedos/termtosvg - with `pipx install termtosvg` and then `termtosvg` - execute demo - `exit` to save.\r\n\r\n![sqlite-utils-insert-json](https://user-images.githubusercontent.com/9599/182464206-f4976af4-eda8-4020-8257-4ada1867fb44.svg)\r\n\r\n```json\r\n[\r\n {\r\n \"id\": 1,\r\n \"name\": \"Catimus\"\r\n },\r\n {\r\n \"id\": 2,\r\n \"name\": \"Feliopia\"\r\n }\r\n]\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/461/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1353074021, "node_id": "I_kwDOCGYnMM5QpkVl", "number": 474, "title": "Add an option for specifying column names when inserting CSV data", "user": {"value": 14294, "label": "hubgit"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2022-08-27T15:29:59Z", "updated_at": "2022-08-31T03:42:36Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "https://sqlite-utils.datasette.io/en/stable/cli.html#csv-files-without-a-header-row\r\n\r\n> The first row of any CSV or TSV file is expected to contain the names of the columns in that file.\r\n\r\n> If your file does not include this row, you can use the `--no-headers` option to specify that the tool should not use that fist row as headers.\r\n\r\n> If you do this, the table will be created with column names called `untitled_1` and `untitled_2` and so on. You can then rename them using the `sqlite-utils transform ... --rename` command.\r\n\r\nIt would be nice to be able to specify the column names when importing CSV/TSV without a header row, via an extra command line option.\r\n\r\n(renaming a column of a large table can take a long time, which makes it an inconvenient workaround)", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/474/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1353481513, "node_id": "I_kwDOCGYnMM5QrH0p", "number": 478, "title": "`sqlite-utils tables data.db table1 table2`", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-08-28T22:05:53Z", "updated_at": "2022-08-28T22:22:35Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "The `sqlite-utils tables` command currently lists all tables.\r\n\r\nIf you have a huge table in there then running it with `--counts` can get expensive, because of the huge table.\r\n\r\nWould be useful if it could accept an optional list of tables that it should execute against, as an alternative to the default of all of them.\r\n\r\nThis should be a backwards compatible change. Current design is: https://sqlite-utils.datasette.io/en/stable/cli-reference.html#tables\r\n\r\n```\r\nUsage: sqlite-utils tables [OPTIONS] PATH\r\n\r\n List the tables in the database\r\n\r\n Example:\r\n\r\n sqlite-utils tables trees.db\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/478/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1355193529, "node_id": "I_kwDOCGYnMM5Qxpy5", "number": 479, "title": "OperationalError: cannot VACUUM from within a transaction", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2022-08-30T05:34:24Z", "updated_at": "2022-08-30T05:34:24Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Maybe when calling `.vacuum()` and other DB-level write-lock operations `sqlite_utils` could guard against this error message by automatically committing first?\r\n\r\n```\r\n 46 db[\"media\"].optimize() # type: ignore\r\n---> 47 db.vacuum()\r\n\r\nFile ~/.local/lib/python3.10/site-packages/sqlite_utils/db.py:1047, in Database.vacuum(self)\r\n 1045 def vacuum(self):\r\n 1046 \"Run a SQLite ``VACUUM`` against the database.\"\r\n-> 1047 self.execute(\"VACUUM;\")\r\n\r\nFile ~/.local/lib/python3.10/site-packages/sqlite_utils/db.py:470, in Database.execute(self, sql, parameters)\r\n 468 return self.conn.execute(sql, parameters)\r\n 469 else:\r\n--> 470 return self.conn.execute(sql)\r\n\r\nOperationalError: cannot VACUUM from within a transaction\r\n```\r\n\r\nIt might also be nice to add a sentence or two about how transactions are committed on the [docs page](https://sqlite-utils.datasette.io/en/latest/python-api.html#detect-fts). When I was swapping out my sqlite3 code for this library it was nice that everything was pretty much drop-in but I was/am unsure what to do about the places I explicitly call `.commit()` in my code\r\n\r\nRelated to https://github.com/simonw/sqlite-utils/issues/121", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/479/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1359604075, "node_id": "I_kwDOCGYnMM5RCelr", "number": 481, "title": "Idea: `sqlite-utils create-table tablename --sql \"select ...\"`", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2022-09-02T01:41:24Z", "updated_at": "2022-09-02T01:42:08Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Could offer syntactic sugar for:\r\n\r\n```sql\r\ncreate table foo as select * from bar\r\n```\r\n\r\n```\r\nsqlite-utils create-table data.db foo --sql \"select * from bar\"\r\n```\r\nhttps://sqlite-utils.datasette.io/en/stable/cli-reference.html#create-table", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/481/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1363766973, "node_id": "I_kwDOCGYnMM5RSW69", "number": 484, "title": "Expose convert recipes to `sqlite-utils --functions`", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 11, "created_at": "2022-09-06T20:15:08Z", "updated_at": "2022-09-07T19:09:52Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "`--functions` was added in:\r\n- #471 \r\n\r\nIt would be useful if the `r.jsonsplit()` and similar recipes for `sqlite-utils convert` could be used in these blocks of code too: https://sqlite-utils.datasette.io/en/stable/cli.html#sqlite-utils-convert-recipes", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/484/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1373224657, "node_id": "I_kwDOCGYnMM5R2b7R", "number": 488, "title": "`sqlite-utils transform` should set empty strings to null when converting text columns to integer/float", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2022-09-14T15:51:30Z", "updated_at": "2022-12-23T17:38:55Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "```\r\n/tmp % echo \"id,age,weight\\n1,3,2.5\\n2,,\" | sqlite-utils insert test.db test - --csv\r\n/tmp % sqlite-utils schema test.db \r\nCREATE TABLE [test] (\r\n [id] TEXT,\r\n [age] TEXT,\r\n [weight] TEXT\r\n);\r\n/tmp % sqlite-utils transform test.db test --type age integer --type weight float \r\n/tmp % sqlite-utils schema test.db \r\nCREATE TABLE \"test\" (\r\n [id] TEXT,\r\n [age] INTEGER,\r\n [weight] FLOAT\r\n);\r\n/tmp % sqlite-utils rows test.db test\r\n[{\"id\": \"1\", \"age\": 3, \"weight\": 2.5},\r\n {\"id\": \"2\", \"age\": \"\", \"weight\": \"\"}]\r\n```\r\nIt would be neat if this resulted in the following instead:\r\n```\r\n {\"id\": \"2\", \"age\": null, \"weight\": null}\r\n```\r\nRelated Discord discussion: https://discord.com/channels/823971286308356157/823971286941302908/1019635490833567794", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/488/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1374939463, "node_id": "I_kwDOCGYnMM5R8-lH", "number": 489, "title": "Ability to load JSON records held in a file with a single top level key that is a list of objects", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 9, "created_at": "2022-09-15T18:46:03Z", "updated_at": "2022-09-15T20:56:10Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "It's very common for JSON to look like this:\r\n```json\r\n{\r\n \"Version\": \"5.5.52.6\",\r\n \"List\": [\r\n {\r\n \"Description\": \"Nonpartisan\",\r\n \"Id\": 1,\r\n \"ExternalId\": \"\"\r\n },\r\n {\r\n \"Description\": \"Undeclared\",\r\n \"Id\": 2,\r\n \"ExternalId\": \"\"\r\n }\r\n ]\r\n}\r\n```\r\nThis example taken from the records downloaded from https://www.elections.alaska.gov/election-results/e/\r\n\r\nRight now you can't import this into `sqlite-utils` - you need to run it through `jq .List` first.\r\n\r\nBut since this is so common, it would be neat if `sqlite-utils` could have a rule of thumb that says \"if it's an object, but it has a single key that is is a list of objects, use that instead\".", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/489/reactions\", \"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1383646615, "node_id": "I_kwDOCGYnMM5SeMWX", "number": 491, "title": "Ability to merge databases and tables", "user": {"value": 8904453, "label": "sgraaf"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 7, "created_at": "2022-09-23T11:10:55Z", "updated_at": "2023-06-14T22:14:24Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Hi! Let me firstly say that I am a big fan of your work -- I follow your tweets and blog posts with great interest \ud83d\ude04.\r\n\r\nNow onto the matter at hand: I think it would be great if `sqlite-utils` included a `merge` or `combine` command, with the purpose of combining different SQLite databases into a single SQLite database. This way, the newly \"merged\" database would contain all differently named tables contained in the databases to be merged as-is, as well a concatenation of all tables of the same name.\r\n\r\nThis could look something like this:\r\n\r\n```bash\r\nsqlite-utils merge cats.db dogs.db > animals.db\r\n```\r\n\r\nI imagine this is rather straightforward if all databases involved in the merge contain differently named tables (i.e. no chance of conflicts), but things get slightly more complicated if two or more of the databases to be merged contain tables with the same name. Not only do you have to \"do something\" with the primary key(s), but these tables could also simply have different schemas (and therefore be incompatible for concatenation to begin with).\r\n\r\nAnyhow, I would love your thoughts on this, and, if you are open to it, work together on the design and implementation!", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/491/reactions\", \"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1386530156, "node_id": "I_kwDOCGYnMM5SpMVs", "number": 492, "title": "Idea: ability to pass extra variables to `--convert` scripts", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-09-26T18:30:45Z", "updated_at": "2022-09-26T18:33:19Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Got this idea from this example in https://jeqo.github.io/notes/2022-09-24-ingest-logs-sqlite/\r\n\r\n```bash\r\nsqlite-utils insert /tmp/kafka-logs.db logs server.log.2022-09-24-21 --text --convert \"\r\nimport re\r\nr = re.compile(r'^\\[(?P\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\] (?P\\w+) (?P(.+(\\n(?\\!\\[).+|)+))', re.MULTILINE)\r\ndef convert(text):\r\n rows = [m.groupdict() for m in r.finditer(text)]\r\n for row in rows:\r\n row.update({'server': 'localhost'})\r\n row.update({'component': 'broker'})\r\n return rows\r\n\"\r\n```\r\nAnd the accompanying note:\r\n\r\n> The `row.update` allows to label rows as I\u2019m planning to ingest logs from different hosts and potentially different components.\r\n\r\nThis made me think: it might be neat if you could inject additional variable values into that script with extra command-line options, to make this kind of reuse easier. Something like this:\r\n\r\n```bash\r\nsqlite-utils insert /tmp/kafka-logs.db logs server.log.2022-09-24-21 --text --convert \"\r\nimport re\r\nr = re.compile(r'^\\[(?P\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\] (?P\\w+) (?P(.+(\\n(?\\!\\[).+|)+))', re.MULTILINE)\r\ndef convert(text):\r\n rows = [m.groupdict() for m in r.finditer(text)]\r\n for row in rows:\r\n row.update({'server': server})\r\n row.update({'component': component})\r\n return rows\r\n\" --var server \"localhost\" --var component \"broker\"\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/492/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1386562662, "node_id": "I_kwDOCGYnMM5SpURm", "number": 493, "title": "Tiny typographical error in install/uninstall docs", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2022-09-26T19:00:42Z", "updated_at": "2022-10-25T21:31:15Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Added in:\r\n- #483\r\n\r\nI don't know how to fix this in Sphinx: I'm getting this: https://sqlite-utils.datasette.io/en/latest/cli.html#cli-install\r\n\r\n> The [insert \u2013convert](https://sqlite-utils.datasette.io/en/latest/cli.html#cli-insert-convert) and [query \u2013functions](https://sqlite-utils.datasette.io/en/latest/cli.html#cli-query-functions) options\r\n\r\n\"image\"\r\n\r\nBut I want it to display `insert --convert` and not `insert \u2013convert` there.\r\n\r\nHere's the code: https://github.com/simonw/sqlite-utils/blob/85247038f70d7eb2f3e272cfeaa4c44459cafba8/docs/cli.rst#L2125", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/493/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1393202060, "node_id": "I_kwDOCGYnMM5TCpOM", "number": 496, "title": "devrel/python api: Pylance type hinting", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2022-10-01T03:03:34Z", "updated_at": "2023-05-03T05:53:27Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Pylance is generally pretty good at figuring out stuff but `sqlite-utils` has some quirks which make type hinting kinda useless. Maybe you don't care but I thought I would bring it to your attention.\r\n\r\nFor example:\r\n\r\n```\r\ndb[\"subs\"].insert_all(subs, pk=\"index\")\r\n```\r\n\r\n```\r\nCannot access member \"insert_all\" for type \"View\"\r\n Member \"insert_all\" is unknown\r\n```\r\n\r\n`insert_all` and all the other methods show up as a type issues because the program can't know whether something is a View or a Table. Fair enough. But that basically throws all type checking out the window.\r\n\r\n`pk=\"index\"` also shows up as a type issue:\r\n\r\n```\r\nArgument of type \"Literal['index']\" cannot be assigned to parameter \"pk\" of type \"Default\" in function \"insert_all\"\r\n \"Literal['index']\" is incompatible with \"Default\"\r\n```\r\n\r\nI think this is because DEFAULT is an empty class? \r\n\r\nmaybe a few small changes could be made to make the library more type-friendly\r\n\r\nThe interim solution is of course to turn off type hints completely for the line\r\n```\r\ndb[\"subs\"].insert_all(subs, pk=\"index\") # type: ignore\r\n```\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/496/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1405196044, "node_id": "PR_kwDOCGYnMM5AmYzy", "number": 499, "title": "feat: recreate fts triggers after table transform", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2022-10-11T20:35:39Z", "updated_at": "2022-10-26T17:54:51Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": "simonw/sqlite-utils/pulls/499", "body": "https://github.com/simonw/sqlite-utils/pull/498\r\n\r\n\r\n----\r\n:books: Documentation preview :books:: https://sqlite-utils--499.org.readthedocs.build/en/499/\r\n\r\n\r\n\r\nalternatively, `self.disable_fts()`", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/499/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 1453134846, "node_id": "I_kwDOCGYnMM5WnRP-", "number": 513, "title": "Add or document streamlined workflow for importing Datasette csv / json exports", "user": {"value": 19328961, "label": "henry501"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2022-11-17T10:54:47Z", "updated_at": "2022-11-17T10:54:47Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "I'm working on some small front-end enhancements to the laion-aesthetic-datasette project, and I wanted to partially populate a database directly using exports from the existing Datasette instance instead of downloading the parquet files and creating my own multi-GB database.\r\n\r\nThere have been a number of small issues that are certainly related to my relative lack of familiarity with the toolkit, but that are still surprising. \r\n\r\nFor example: a CSV export of the images table (http://laion-aesthetic.datasette.io/laion-aesthetic-6pls.csv?sql=select+rowid%2C+url%2C+text%2C+domain_id%2C+width%2C+height%2C+similarity%2C+punsafe%2C+pwatermark%2C+aesthetic%2C+hash%2C+__index_level_0__+from+images+order+by+random%28%29+limit+100) has nested single quotes, double quotes, and commas that aren't handled by rows_from_file. Similarly, the json output has to be manually transformed to add the column names and remove extraneous information before sqlite_utils can import it.\r\n\r\nI was able to work through these issues, but as an enhancement it would be really helpful to create or document a clear workflow that avoids the friction of this data transformation.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/513/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1479914599, "node_id": "I_kwDOCGYnMM5YNbRn", "number": 516, "title": "Feature request: output number of ignored/replaced rows for insert command", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2022-12-06T18:59:21Z", "updated_at": "2022-12-06T19:08:14Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "https://hachyderm.io/@briandorsey/109468185742876820\r\n\r\n> I'm fiddling with piping json to `insert -ignore` I'd love to see the count of records inserted & ignored, but didn't see a way to do that in the help/docs.\r\n>\r\n> Example: `xh \"https://hachyderm.io/api/v1/timelines/tag/rust?max_id=109443380308326328\" | sqlite-utils insert aoc.db aoc - --pk=id --ignore`", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/516/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1550536442, "node_id": "I_kwDOCGYnMM5ca076", "number": 521, "title": "Custom JSON encoder", "user": {"value": 31504, "label": "janrito"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2023-01-20T09:19:40Z", "updated_at": "2023-01-20T09:19:40Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "It would be nice if we could specify a custom encoder (and decoder) for types that will need extra deserialisation \u2013 e.g., sets, enums or sparse matrices \u2013 or even project-specific types", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/521/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1560651350, "node_id": "I_kwDOCGYnMM5dBaZW", "number": 523, "title": "Feature request: trim all leading and trailing white space for all columns for all tables in a database", "user": {"value": 536941, "label": "fgregg"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-01-28T02:40:10Z", "updated_at": "2023-01-28T02:41:14Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "It's pretty common that i need to trim leading or trailing white space from lots of columns in a database a part of an initial ETL.\r\n\r\nI use the following recipe a lot, and it would be great to include this functionality into sqlite-utils\r\n\r\n`trimify.sql`\r\n```sql\r\nselect 'select group_concat(''update [' || name || '] set ['' || name || ''] = trim(['' || name || ''])'', '';\r\n'') || '';\r\n'' as sql_to_run from pragma_table_info('''||name||''');' from sqlite_schema;\r\n```\r\n\r\nthen something like:\r\n\r\n```bash\r\n\tsqlite3 example.db < scripts/trimify.sql > table_trim.sql && \\\r\n sqlite3 $example.db < table_trim.sql > trim.sql && \\\r\n sqlite3 $example.db < trim.sql\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/523/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1595340692, "node_id": "I_kwDOCGYnMM5fFveU", "number": 530, "title": "add ability to configure \"on delete\" and \"on update\" attributes of foreign keys:", "user": {"value": 536941, "label": "fgregg"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2023-02-22T15:44:14Z", "updated_at": "2023-05-08T20:39:01Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "sqlite supports these, and it would be quite nice to be able to add them with sqlite-utils.\r\n\r\nhttps://www.sqlite.org/foreignkeys.html#fk_actions", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/530/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1700840265, "node_id": "I_kwDOCGYnMM5lYMNJ", "number": 541, "title": "Get tests to pass with `pytest -Werror`", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-05-08T19:57:23Z", "updated_at": "2023-05-08T19:59:35Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Inspired by:\r\n- #534", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/541/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1700936245, "node_id": "I_kwDOCGYnMM5lYjo1", "number": 542, "title": "Remove `skip_false=True` and `--no-skip-false` in `sqlite-utils` 4.0", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": {"value": 9374594, "label": "4.0 backwards incomatible changes"}, "comments": 1, "created_at": "2023-05-08T21:04:28Z", "updated_at": "2023-05-08T21:07:41Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Following:\r\n- #527\r\n\r\nThe only reason I didn't remove fix this mis-feature entirely is that it represents a backwards incompatible change. I'll make that change in 4.0.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/542/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1720096994, "node_id": "I_kwDOCGYnMM5mhpji", "number": 554, "title": "`IndexError` when doing `.insert(..., pk='id')` after `insert_all`", "user": {"value": 1231935, "label": "xavdid"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-05-22T17:13:02Z", "updated_at": "2023-05-22T17:18:33Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "I believe this is related to https://github.com/simonw/sqlite-utils/issues/98.\r\n\r\nWhen `pk` is specified by table A's `insert` call, it throws an index error if a different table has written a row with a higher rowid than exists in the first table. Here's a basic example:\r\n\r\n```py\r\nfrom sqlite_utils import Database\r\n\r\n\r\ndef test_pk_for_insert(fresh_db):\r\n user = {\"id\": \"abc\", \"name\": \"david\"}\r\n\r\n fresh_db[\"users\"].insert(user, pk=\"id\")\r\n\r\n fresh_db[\"comments\"].insert_all(\r\n [\r\n {\"id\": \"def\", \"text\": \"ok\"},\r\n {\"id\": \"ghi\", \"text\": \"great\"},\r\n ],\r\n )\r\n\r\n fresh_db[\"users\"].insert(\r\n user,\r\n ignore=True,\r\n # BUG: when specifying pk on the second insert call \r\n # db.py goes into a block it doesn't expect and we get the error\r\n pk=\"id\",\r\n )\r\n\r\n\r\nif __name__ == \"__main__\":\r\n db = Database(\"bug.db\")\r\n if db[\"users\"].exists():\r\n raise ValueError(\r\n \"bug only shows on a new database - remove bug.db before running the script\"\r\n )\r\n test_pk_for_insert(db)\r\n```\r\n\r\nThe error is:\r\n\r\n```py\r\n File \"/Users/david/projects/reddit-to-sqlite/.venv/lib/python3.11/site-packages/sqlite_utils/db.py\", line 2960, in insert_chunk\r\n row = list(self.rows_where(\"rowid = ?\", [self.last_rowid]))[0]\r\n ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^\r\nIndexError: list index out of range\r\n```\r\n\r\nThe issue is in this block: \r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/2747257a3334d55e890b40ec58fada57ae8cfbfd/sqlite_utils/db.py#L2954-L2958\r\n\r\nrelevant locals are:\r\n\r\n- `pk`: `'id'`\r\n- `result.lastrowid`: `2`\r\n\r\nWhat's most interesting is the comment `# self.last_rowid will be 0 if a \"INSERT OR IGNORE\" happened`, which doesn't seem to be the case here. ", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/554/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1733198948, "node_id": "I_kwDOCGYnMM5nToRk", "number": 555, "title": "Filter table by a large bunch of ids", "user": {"value": 10843208, "label": "redraw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-05-31T00:29:51Z", "updated_at": "2023-06-14T22:01:57Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Hi! this might be a question related to both SQLite & sqlite-utils, and you might be more experienced with them.\r\n\r\nI have a large bunch of ids, and I'm wondering which is the best way to query them in terms of performance, and simplicity if possible.\r\n\r\nThe naive approach would be something like `select * from table where rowid in (?, ?, ?...)` but that wouldn't scale if ids are >1k.\r\n\r\nAnother approach might be creating a temp table, or in-memory db table, insert all ids in that table and then join with the target one.\r\n\r\nI failed to attach an in-memory db both using sqlite-utils, and plain sql's execute(), so my closest approach is something like,\r\n\r\n```python\r\ndef filter_existing_video_ids(video_ids):\r\n db = get_db() # contains a \"videos\" table\r\n db.execute(\"CREATE TEMPORARY TABLE IF NOT EXISTS tmp (video_id TEXT NOT NULL PRIMARY KEY)\")\r\n db[\"tmp\"].insert_all([{\"video_id\": video_id} for video_id in video_ids])\r\n for row in db[\"tmp\"].rows_where(\"video_id not in (select video_id from videos)\"):\r\n yield row[\"video_id\"]\r\n db[\"tmp\"].drop()\r\n```\r\n\r\nThat kinda worked, I couldn't find an option in sqlite-utils's `create_table()` to tell it's a temporary table. Also, `tmp` table is not dropped finally, neither using `.drop()` despite being created with the keyword `TEMPORARY`. I believe it should be automatically dropped after connection/session ends though I read.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/555/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1740026046, "node_id": "I_kwDOCGYnMM5ntrC-", "number": 556, "title": "Support storing incrementally piped values", "user": {"value": 601708, "label": "mcint"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-06-04T00:45:23Z", "updated_at": "2023-06-04T01:21:15Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "I'm trying to use sqlite-utils to data generated incrementally. There are a few\naspects of this that I don't currently know how to handle. I would like an option\nto apply writes incrementally, line-by-line as they are received. I would like an\noption to echo incremental progress. And, it would be nice to have \n\nIn particular, I'm using CoreLocationCLI -w -j to generate, newline-delimited JSON.\n\nOne variant of the command \n\n`stdbuf -oL CoreLocationCLI -w -j | pee 'sqlite-utils insert loc.db loc -' nl`\n\n`pee`, from `moreutils`, is like `tee` but spawns and pipes to the processes\ncreated by invoking each of its arguments, so, for gratuitous demonstration,\n`pee 'sponge out.log' cat` would behave like `tee`.\n\nIt looks like I can get what I want with:\n`stdbuf -oL CoreLocationCLI -w -j | while read line; do <<<\"$line\" sqlite-utils insert loc.db loc -; echo \"$line\"; done | nl`\n\n\n\n\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/556/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1754174496, "node_id": "I_kwDOCGYnMM5ojpQg", "number": 558, "title": "Ability to define unique columns when creating a table", "user": {"value": 1910303, "label": "aguinane"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2023-06-13T06:56:19Z", "updated_at": "2023-08-18T01:06:03Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "When creating a new table, it would be good to have an option to set unique columns similar to how not_null is set.\r\n\r\n```python\r\nfrom sqlite_utils import Database\r\n\r\ncolumns = {\"mRID\": str, \"name\": str}\r\ndb = Database(\"example.db\")\r\ndb[\"ExampleTable\"].create(columns, pk=\"mRID\", not_null=[\"mRID\"], if_not_exists=True)\r\ndb[\"ExampleTable\"].create_index([\"mRID\"], unique=True, if_not_exists=True)\r\n```\r\n\r\nSo something like this would add the UNIQUE flag to the table definition. \r\n\r\n```python\r\ndb[\"ExampleTable\"].create(columns, pk=\"mRID\", not_null=[\"mRID\"], unique=[\"mRID\"], if_not_exists=True)\r\n```\r\n\r\n```sql\r\nCREATE TABLE ExampleTable (\r\n mRID TEXT PRIMARY KEY\r\n NOT NULL\r\n UNIQUE,\r\n name TEXT\r\n);\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/558/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1784794489, "node_id": "I_kwDOCGYnMM5qYc15", "number": 562, "title": "Explore the intersection between sqlite-utils and dataclasses", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-07-02T19:23:08Z", "updated_at": "2023-07-02T19:26:39Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "> Aside: this makes me think it might be cool if `sqlite-utils` had a way of working with dataclasses rather than just dicts, and knew how to create a SQLite table to match a dataclass and maybe how to code-generate dataclasses for a specific table schema (dynamically or even using code-generation that can be written to disk, for better editor integrations).\r\n\r\n_Originally posted by @simonw in https://github.com/simonw/llm/issues/65#issuecomment-1616742529_\r\n ", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/562/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1795219865, "node_id": "I_kwDOCGYnMM5rAOGZ", "number": 566, "title": "`--no-headers` doesn't work on most formats", "user": {"value": 33625, "label": "zellyn"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2023-07-09T03:43:36Z", "updated_at": "2023-07-09T04:13:35Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Version 3.33\r\n\r\n```\r\nsqlite-utils query library.db 'select asin from audible' --fmt plain --no-headers | head -3\r\nasin\r\n0062804006\r\n0062891421\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/566/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1818838294, "node_id": "I_kwDOCGYnMM5saUUW", "number": 578, "title": "Plugin hook for adding new output formats", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2023-07-24T17:29:18Z", "updated_at": "2023-08-07T15:41:49Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "> What would it take to add a format hook? I'm still thinking about my GIS workflow, and being able to do `sqlite-utils query ... --geojson` would be nice. It's the one place my Datasette workflow is messy, having to do `datasette . --get /path/to/query.geojson --setting max_rows_returned 10000 --load-extension spatialite`.\r\n> I know the current pattern is `--csv`, but maybe `--format geojson` is more future-proof.\r\n\r\nhttps://discord.com/channels/823971286308356157/997738192360964156/1133076679011602432", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/578/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1821108702, "node_id": "I_kwDOCGYnMM5si-ne", "number": 579, "title": "Special handling for SQLite column of type `JSON`", "user": {"value": 15178711, "label": "asg017"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2023-07-25T20:37:23Z", "updated_at": "2023-07-25T20:37:23Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "`sqlite-utils` should detect and have specially handling for column with a `JSON` column. For example:\r\n\r\n```sql\r\nCREATE TABLE \"dogs\" (\r\n id INTEGER PRIMARY KEY,\r\n name TEXT,\r\n friends JSON \r\n);\r\n```\r\n\r\n## Automatic Nesting\r\n\r\nAccording to [\"Nested JSON Values\"](https://sqlite-utils.datasette.io/en/stable/cli.html#nested-json-values), sqlite-utils will only expand JSON if the `--json-cols` flag is passed. It looks like it'll try to `json.load` all text column to test if its JSON, which can get expensive on non-json columns. \r\n\r\nInstead, `sqlite-utils` should be default (ie without the `--json-cols` flags) do the `maybe_json()` operation on columns with a declared `JSON` type. So the above table would expand the `\"friends\"` column as expected, withoutthe `--json-cols` flag:\r\n\r\n```bash\r\nsqlite-utils dogs.db \"select * from dogs\" | python -mjson.tool\r\n```\r\n\r\n```\r\n[\r\n {\r\n \"id\": 1,\r\n \"name\": \"Cleo\",\r\n \"friends\": [\r\n {\r\n \"name\": \"Pancakes\"\r\n },\r\n {\r\n \"name\": \"Bailey\"\r\n }\r\n ]\r\n }\r\n]\r\n```\r\n\r\n---\r\n\r\nI'm sure there's other ways `sqlite-utils` can specially handle JSON columns, so keeping this open while I think of more", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/579/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1822918995, "node_id": "I_kwDOCGYnMM5sp4lT", "number": 580, "title": "Add way to export to a csv file using the Python library", "user": {"value": 44324811, "label": "kevinlinxc"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2023-07-26T18:09:26Z", "updated_at": "2023-07-26T18:09:26Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "According to the documentation, we can make a csv output using the CLI tool, but not the Python library. Could we have the latter?", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/580/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1839344979, "node_id": "I_kwDOCGYnMM5toi1T", "number": 582, "title": "Handling CSV/file input that contains NUL bytes", "user": {"value": 1448859, "label": "betatim"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2023-08-07T12:24:14Z", "updated_at": "2023-08-07T12:24:14Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "I was using sqlite-utils to create a DB from a CSV and it turns out the CSV contains a NUL byte.\r\n\r\nWhen the processing reaches the line that contains the NUL an exception is raised.\r\n\r\nI'm wondering if there is something that can be done in `sqlite-utils` to say \"skip lines with encoding errors\" or some such. I think it isn't super straightforward though as the exception comes from inside the `csv` module that does all the parsing.\r\n\r\nConcretely the file is the `KernelVersions.csv` from https://www.kaggle.com/datasets/kaggle/meta-kaggle\r\n\r\nThis is the command and output:\r\n```\r\n$ sqlite-utils insert --csv kaggle.db kaggle KernelVersions.csv\r\n [------------------------------------] 0%\r\n [#####################---------------] 60% 00:04:24Traceback (most recent call last):\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/bin/sqlite-utils\", line 10, in \r\n sys.exit(cli())\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py\", line 1128, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py\", line 1053, in main\r\n rv = self.invoke(ctx)\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py\", line 1659, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py\", line 1395, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py\", line 754, in invoke\r\n return __callback(*args, **kwargs)\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py\", line 1223, in insert\r\n insert_upsert_implementation(\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py\", line 1085, in insert_upsert_implementation\r\n db[table].insert_all(\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/db.py\", line 3198, in insert_all\r\n chunk = list(chunk)\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/db.py\", line 3742, in fix_square_braces\r\n for record in records:\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py\", line 1071, in \r\n docs = (decode_base64_values(doc) for doc in docs)\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py\", line 1068, in \r\n docs = (verify_is_dict(doc) for doc in docs)\r\n File \"/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py\", line 1003, in \r\n docs = (dict(zip(headers, row)) for row in reader)\r\n_csv.Error: line contains NUL\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/582/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1856075668, "node_id": "I_kwDOCGYnMM5uoXeU", "number": 586, "title": ".transform() fails to drop column if table is part of a view", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2023-08-18T05:25:22Z", "updated_at": "2023-08-18T06:13:47Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "I got this error trying to drop a column from a table that was part of a SQL view:\r\n\r\n> error in view plugins: no such table: main.pypi_releases\r\n\r\nUpon further investigation I found that this pattern seemed to fix it:\r\n```python\r\ndef transform_the_table(conn):\r\n # Run this in a transaction:\r\n with conn:\r\n # We have to read all the views first, because we need to drop and recreate them\r\n db = sqlite_utils.Database(conn)\r\n views = {v.name: v.schema for v in db.views if table.lower() in v.schema.lower()}\r\n for view in views.keys():\r\n db[view].drop()\r\n db[table].transform(\r\n types=types,\r\n rename=rename,\r\n drop=drop,\r\n column_order=[p[0] for p in order_pairs],\r\n )\r\n # Now recreate the views\r\n for name, schema in views.items():\r\n db.create_view(name, schema)\r\n```\r\nSo grab a copy of any view that might reference this table, start a transaction, drop those views, run the transform, recreate the views again.\r\n\r\n> I wonder if this should become an option in `sqlite-utils`? Maybe a `recreate_views=True` argument for `table.tranform(...)`? Should it be opt-in or opt-out?\r\n\r\n_Originally posted by @simonw in https://github.com/simonw/datasette-edit-schema/issues/35#issuecomment-1683370548_\r\n ", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/586/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1868713944, "node_id": "I_kwDOCGYnMM5vYk_Y", "number": 588, "title": "`table.get(column=value)` option for retrieving things not by their primary key", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-08-28T00:41:23Z", "updated_at": "2023-08-28T00:41:54Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "This came up working on this feature:\r\n- https://github.com/simonw/llm/pull/186\r\n\r\nI have a table with this schema:\r\n```sql\r\nCREATE TABLE [collections] (\r\n [id] INTEGER PRIMARY KEY,\r\n [name] TEXT,\r\n [model] TEXT\r\n);\r\nCREATE UNIQUE INDEX [idx_collections_name]\r\n ON [collections] ([name]);\r\n```\r\nSo the primary key is an integer (because it's going to have a huge number of rows foreign key related to it, and I don't want to store a larger text value thousands of times), but there is a unique constraint on the `name` - that would be the primary key column if not for all of those foreign keys.\r\n\r\nProblem is, fetching the collection by name is actually pretty inconvenient.\r\n\r\nFetch by numeric ID:\r\n\r\n```python\r\ntry:\r\n table[\"collections\"].get(1)\r\nexcept NotFoundError:\r\n # It doesn't exist\r\n```\r\nFetching by name:\r\n```python\r\ndef get_collection(db, collection):\r\n rows = db[\"collections\"].rows_where(\"name = ?\", [collection])\r\n try:\r\n return next(rows)\r\n except StopIteration:\r\n raise NotFoundError(\"Collection not found: {}\".format(collection))\r\n```\r\nIt would be neat if, for columns where we know that we should always get 0 or one result, we could do this instead:\r\n```python\r\ntry:\r\n collection = table[\"collections\"].get(name=\"entries\")\r\nexcept NotFoundError:\r\n # It doesn't exist\r\n```\r\nThe existing `.get()` method doesn't have any non-positional arguments, so using `**kwargs` like that should work:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/1260bdc7bfe31c36c272572c6389125f8de6ef71/sqlite_utils/db.py#L1495", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/588/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1879209560, "node_id": "I_kwDOCGYnMM5wAnZY", "number": 589, "title": "Mechanism for de-registering registered SQL functions", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2023-09-03T19:32:39Z", "updated_at": "2023-09-03T19:36:34Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "I used a custom SQL function in a migration script and then realized that it should be de-registered before the end of the script to avoid leaking into the calling code.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/589/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1879214365, "node_id": "I_kwDOCGYnMM5wAokd", "number": 590, "title": "Ability to tell if a Database is an in-memory one", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-09-03T19:50:15Z", "updated_at": "2023-09-03T19:50:36Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Currently the constructor accepts `memory=True` or `memory_name=...` and uses those to create a connection, but does not record what those values were:\r\n\r\nhttps://github.com/simonw/sqlite-utils/blob/1260bdc7bfe31c36c272572c6389125f8de6ef71/sqlite_utils/db.py#L307-L349\r\n\r\nThis makes it hard to tell if a database object is to an in-memory or a file-based database, which is sometimes useful to know.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/590/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1891614971, "node_id": "I_kwDOCGYnMM5wv8D7", "number": 594, "title": "Represent compound foreign keys in table.foreign_keys output", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2023-09-12T03:48:24Z", "updated_at": "2023-09-12T03:51:13Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "Given this schema:\r\n```sql\r\nCREATE TABLE departments (\r\n campus_name TEXT NOT NULL,\r\n dept_code TEXT NOT NULL,\r\n dept_name TEXT,\r\n PRIMARY KEY (campus_name, dept_code)\r\n);\r\nCREATE TABLE courses (\r\n course_code TEXT PRIMARY KEY,\r\n course_name TEXT,\r\n campus_name TEXT NOT NULL,\r\n dept_code TEXT NOT NULL,\r\n FOREIGN KEY (campus_name, dept_code) REFERENCES departments(campus_name, dept_code)\r\n);\r\n```\r\nThe output of `db[\"courses\"].foreign_keys` right now is:\r\n```\r\n[ForeignKey(table='courses', column='campus_name', other_table='departments', other_column='campus_name'),\r\n ForeignKey(table='courses', column='dept_code', other_table='departments', other_column='dept_code')]\r\n```\r\nWhich suggests two normal foreign keys, not one compound foreign key.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/594/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1920416843, "node_id": "I_kwDOCGYnMM5ydzxL", "number": 597, "title": "sqlite-utils insert-files should be able to convert fields", "user": {"value": 1737541, "label": "grimnight"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2023-09-30T22:20:47Z", "updated_at": "2023-09-30T22:20:47Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Currently using both `insert-files` and `convert` is needed in order to create sqlar files, it would be more convenient if it could be done with just one command.\r\n\r\n```shell\r\n~\r\n\u276f cat test.py\r\nimport os\r\n\r\nclass Example:\r\n def __init__(self, arg1, arg2):\r\n self.arg1 = arg1\r\n\r\n~\r\n\u276f sqlite-utils insert-files test.sqlar sqlar test.py -c name:name -c data:content -c mode:mode -c mtime:mtime -c sz:size --pk=name\r\n [####################################] 100%\r\n\r\n~\r\n\u276f sqlite-utils convert test.sqlar sqlar data \"zlib.compress(value)\" --import=zlib --where \"name = 'test.py'\"\r\n[####################################] 100%\r\n\r\n~\r\n\u276f cat test.py | sqlite-utils convert test.sqlar sqlar data \"zlib.compress(sys.stdin.buffer.read())\" --import=zlib --import=sys --where \"name = 'test.py'\" # Alternative way\r\n [####################################] 100%\r\n\r\n~\r\n\u276f sqlite3 test.sqlar \"SELECT hex(data) FROM sqlar WHERE name = 'test.py';\" | python3 -c \"import sys, zlib; sys.stdout.buffer.write(zlib.decompress(bytes.fromhex(sys.stdin.read())))\"\r\nimport os\r\n\r\nclass Example:\r\n def __init__(self, arg1, arg2):\r\n self.arg1 = arg1\r\n\r\n~\r\n\u276f rm test.py\r\n\r\n~\r\n\u276f sqlar -l test.sqlar\r\ntest.py\r\n\r\n~\r\n\u276f sqlar -x test.sqlar\r\n\r\n~\r\n\u276f cat test.py\r\nimport os\r\n\r\nclass Example:\r\n def __init__(self, arg1, arg2):\r\n self.arg1 = arg1\r\n\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/597/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1977155641, "node_id": "I_kwDOCGYnMM512QA5", "number": 601, "title": "Move plugin directory into documentation", "user": {"value": 9599, "label": "simonw"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2023-11-04T04:07:52Z", "updated_at": "2023-11-04T04:07:52Z", "closed_at": null, "author_association": "OWNER", "pull_request": null, "body": "https://github.com/simonw/sqlite-utils-plugins should be in the official documentation.\r\n\r\nI can use the same pattern as https://llm.datasette.io/en/stable/plugins/directory.html\r\n\r\nhttps://til.simonwillison.net/readthedocs/stable-docs", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/601/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1978603203, "node_id": "I_kwDOCGYnMM517xbD", "number": 602, "title": "`sqlite-utils transform` removes the `AUTOINCREMENT` keyword", "user": {"value": 4472046, "label": "ArsTapatun"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2023-11-06T08:48:43Z", "updated_at": "2023-11-06T08:48:43Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "### Context\r\n\r\nWe ran into this bug randomly, noticing that deleted `ROWID` would get reused after migrating the DB. Using `transform` to change any column in the table will also unexpectedly strip away the `AUTOINCREMENT` keyword from the primary key definition, even if it was not the transformation target.\r\n\r\n### Reproducible example\r\n\r\n**Original database**\r\n\r\n```sql\r\n$ sqlite3 test.db << EOF\r\nCREATE TABLE mytable (\r\n col1 INTEGER PRIMARY KEY AUTOINCREMENT,\r\n col2 TEXT NOT NULL\r\n)\r\nEOF\r\n\r\n$ sqlite3 test.db \".schema mytable\"\r\nCREATE TABLE mytable (\r\n col1 INTEGER PRIMARY KEY AUTOINCREMENT,\r\n col2 TEXT NOT NULL\r\n);\r\n```\r\n\r\n**Modified database after sqlite-utils**\r\n\r\n```sql\r\n$ sqlite-utils transform test.db mytable --rename col2 renamedcol2\r\n\r\n$ sqlite3 test.db \"SELECT sql FROM sqlite_master WHERE name = 'mytable';\"\r\nCREATE TABLE IF NOT EXISTS \"mytable\" (\r\n [col1] INTEGER PRIMARY KEY,\r\n [renamedcol2] TEXT NOT NULL\r\n);\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/602/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1988525411, "node_id": "I_kwDOCGYnMM52hn1j", "number": 603, "title": "Pyhton 3.12 Bug report", "user": {"value": 1324252, "label": "constantinedev"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-11-10T22:57:48Z", "updated_at": "2023-12-08T05:10:31Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "I start with new python3 verison 3.12.0\r\nAlso have the error where connect DataBase\r\n\r\n```\r\nTraceback (most recent call last):\r\n File \"/home/t/Development/python/FKPJ/ClinicSYS/run.py\", line 1, in \r\n import re, os, io, json, sqlite_utils, requests, pytz, logging\r\n File \"/home/t/.local/lib/python3.12/site-packages/sqlite_utils/__init__.py\", line 1, in \r\n from .db import Database\r\n File \"/home/t/.local/lib/python3.12/site-packages/sqlite_utils/db.py\", line 277, in \r\n class Database:\r\n File \"/home/t/.local/lib/python3.12/site-packages/sqlite_utils/db.py\", line 306, in Database\r\n filename_or_conn: Optional[Union[str, pathlib.Path, sqlite3.Connection]] = None,\r\n ^^^^^^^^^^^^^^^^^^\r\n```\r\nThis bug come from `sqlite-utils` since's v3.33.\r\nAnyone get the same ?\r\n\r\nAs well now of the resolved plan just keep the sqlite-utils version in python3.12 with v3.32.1 [tested]\r\nbut where are the sqlite3.Connection problem.... \r\n\r\nThis won't happen on python version down to 3.11[tested]\r\nJust the python3.12.0, I have test this error are come from the sqlite3 connection\r\nThe error say from `sqlite_utils` and with the sqlite3 Connection, what can I do.\r\n\r\nLet fix together.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/603/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 347058326, "node_id": "MDExOlB1bGxSZXF1ZXN0MjA1NzcwOTk2", "number": 1, "title": "Make .indexes compatible with older SQLite versions", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2018-08-02T15:17:05Z", "updated_at": "2018-08-02T15:17:30Z", "closed_at": "2018-08-02T15:17:30Z", "author_association": "OWNER", "pull_request": "simonw/sqlite-utils/pulls/1", "body": "Older SQLite versions return a different set of columns from the PRAGMA we are using.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/1/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 403028630, "node_id": "MDExOlB1bGxSZXF1ZXN0MjQ3NTc2OTQy", "number": 4, "title": "Fts5", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2019-01-25T06:54:05Z", "updated_at": "2019-01-25T06:54:33Z", "closed_at": "2019-01-25T06:54:33Z", "author_association": "OWNER", "pull_request": "simonw/sqlite-utils/pulls/4", "body": "", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/4/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 403396009, "node_id": "MDExOlB1bGxSZXF1ZXN0MjQ3ODYxNDE5", "number": 5, "title": "Run Travis tests against Python 3.8-dev", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2019-01-26T02:30:55Z", "updated_at": "2019-01-26T02:37:54Z", "closed_at": "2019-01-26T02:37:54Z", "author_association": "OWNER", "pull_request": "simonw/sqlite-utils/pulls/5", "body": "", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/5/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 403624090, "node_id": "MDU6SXNzdWU0MDM2MjQwOTA=", "number": 6, "title": "\"sqlite-utils insert\" should support newline-delimited JSON", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2019-01-28T02:00:02Z", "updated_at": "2019-01-28T02:17:45Z", "closed_at": "2019-01-28T02:17:45Z", "author_association": "OWNER", "pull_request": null, "body": "We can already export newline delimited JSON. We should learn to import it as well.\r\n\r\nThe neat thing about importing it is that you can import GBs of data without having to read the whole lot into memory in order to decode the wrapping JSON array.\r\n\r\nDatasette can export it now: https://github.com/simonw/datasette/issues/405\r\n\r\nDemo: https://latest.datasette.io/fixtures/facetable.json?_shape=array&_nl=on\r\n\r\nIt should be possible to do this:\r\n\r\n $ curl \"https://latest.datasette.io/fixtures/facetable.json?_shape=array&_nl=on\" \\\r\n | sqlite-utils insert data.db facetable - --nl\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/6/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 403625674, "node_id": "MDU6SXNzdWU0MDM2MjU2NzQ=", "number": 7, "title": ".insert_all() should accept a generator and process it efficiently", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2019-01-28T02:11:58Z", "updated_at": "2019-01-28T06:26:53Z", "closed_at": "2019-01-28T06:26:53Z", "author_association": "OWNER", "pull_request": null, "body": "Right now you have to load every record into memory before passing the list to `.insert_all()` and friends.\r\n\r\nIf you want to process millions of rows, this is inefficient. Python has generators - we should use them!\r\n\r\nThe only catch here is that part of the magic of `sqlite-utils` is that it guesses the column types and creates the table for you. This code will need to be updated to notice if the table needs creating and, if it does, create it using the first X (where x=1,000 but can be customized) records.\r\n\r\nIf a record outside of those first 1,000 has a rogue column, we can crash with an error.\r\n\r\nThis will free us up to make the `--nl` option added in #6 much more efficient.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/7/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 403922644, "node_id": "MDU6SXNzdWU0MDM5MjI2NDQ=", "number": 8, "title": "Problems handling column names containing spaces or - ", "user": {"value": 82988, "label": "psychemedia"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2019-01-28T17:23:28Z", "updated_at": "2019-04-14T15:29:33Z", "closed_at": "2019-02-23T21:09:03Z", "author_association": "NONE", "pull_request": null, "body": "Irrrespective of whether using column names containing a space or - character is good practice, SQLite does allow it, but `sqlite-utils` throws an error in the following cases:\r\n\r\n```python\r\nfrom sqlite_utils import Database\r\n\r\ndbname = 'test.db'\r\nDB = Database(sqlite3.connect(dbname))\r\n\r\nimport pandas as pd\r\ndf = pd.DataFrame({'col1':range(3), 'col2':range(3)})\r\n\r\n#Convert pandas dataframe to appropriate list/dict format\r\nDB['test1'].insert_all( df.to_dict(orient='records') )\r\n#Works fine\r\n```\r\n\r\nHowever:\r\n\r\n```python\r\ndf = pd.DataFrame({'col 1':range(3), 'col2':range(3)})\r\nDB['test1'].insert_all(df.to_dict(orient='records'))\r\n```\r\n\r\nthrows:\r\n\r\n```\r\n---------------------------------------------------------------------------\r\nOperationalError Traceback (most recent call last)\r\n in ()\r\n 1 import pandas as pd\r\n 2 df = pd.DataFrame({'col 1':range(3), 'col2':range(3)})\r\n----> 3 DB['test1'].insert_all(df.to_dict(orient='records'))\r\n\r\n/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, upsert, batch_size, column_order)\r\n 327 jsonify_if_needed(record.get(key, None)) for key in all_columns\r\n 328 )\r\n--> 329 result = self.db.conn.execute(sql, values)\r\n 330 self.db.conn.commit()\r\n 331 self.last_id = result.lastrowid\r\n\r\nOperationalError: near \"1\": syntax error\r\n```\r\n\r\nand:\r\n\r\n```python\r\ndf = pd.DataFrame({'col-1':range(3), 'col2':range(3)})\r\nDB['test1'].upsert_all(df.to_dict(orient='records'))\r\n```\r\n\r\nresults in:\r\n\r\n```\r\n---------------------------------------------------------------------------\r\nOperationalError Traceback (most recent call last)\r\n in ()\r\n 1 import pandas as pd\r\n 2 df = pd.DataFrame({'col-1':range(3), 'col2':range(3)})\r\n----> 3 DB['test1'].insert_all(df.to_dict(orient='records'))\r\n\r\n/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, upsert, batch_size, column_order)\r\n 327 jsonify_if_needed(record.get(key, None)) for key in all_columns\r\n 328 )\r\n--> 329 result = self.db.conn.execute(sql, values)\r\n 330 self.db.conn.commit()\r\n 331 self.last_id = result.lastrowid\r\n\r\nOperationalError: near \"-\": syntax error\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/8/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 405801771, "node_id": "MDExOlB1bGxSZXF1ZXN0MjQ5NjgwOTQ0", "number": 9, "title": ":pencil: Updates my_database.py to my_database.db", "user": {"value": 50527, "label": "jefftriplett"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2019-02-01T17:35:43Z", "updated_at": "2019-02-24T03:55:04Z", "closed_at": "2019-02-24T03:55:04Z", "author_association": "CONTRIBUTOR", "pull_request": "simonw/sqlite-utils/pulls/9", "body": "I noticed that both `.py` and `.db` were used in the docs and assumed you'd prefer `.db`. ", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/9/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 413740684, "node_id": "MDU6SXNzdWU0MTM3NDA2ODQ=", "number": 11, "title": "Detect numpy types when creating tables", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2019-02-23T21:09:35Z", "updated_at": "2019-02-24T04:02:20Z", "closed_at": "2019-02-24T04:02:20Z", "author_association": "OWNER", "pull_request": null, "body": "Inspired by #8", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/11/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 413778585, "node_id": "MDExOlB1bGxSZXF1ZXN0MjU1NjU4MTEy", "number": 12, "title": "Support for numpy types, closes #11", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2019-02-24T03:57:32Z", "updated_at": "2019-02-24T04:02:20Z", "closed_at": "2019-02-24T04:02:20Z", "author_association": "OWNER", "pull_request": "simonw/sqlite-utils/pulls/12", "body": "", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/12/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 413779210, "node_id": "MDU6SXNzdWU0MTM3NzkyMTA=", "number": 13, "title": "Ability to automatically create IDs from content hash of row", "user": {"value": 9599, "label": "simonw"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2019-02-24T04:07:08Z", "updated_at": "2019-02-24T04:36:48Z", "closed_at": "2019-02-24T04:36:48Z", "author_association": "OWNER", "pull_request": null, "body": "Sometimes when you are importing data the underlying source provides records without IDs that can be uniquely identified by their contents.\r\n\r\nA utility mechanism for calculating a sha1 hash of the contents and using that as a unique ID would be useful.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/13/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 411066700, "node_id": "MDU6SXNzdWU0MTEwNjY3MDA=", "number": 10, "title": "Error in upsert if column named 'order'", "user": {"value": 82988, "label": "psychemedia"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2019-02-16T12:05:18Z", "updated_at": "2019-02-24T16:55:38Z", "closed_at": "2019-02-24T16:55:37Z", "author_association": "NONE", "pull_request": null, "body": "The following works fine:\r\n```\r\nconnX = sqlite3.connect('DELME.db', timeout=10)\r\n\r\ndfX=pd.DataFrame({'col1':range(3),'col2':range(3)})\r\nDBX = Database(connX)\r\nDBX['test'].upsert_all(dfX.to_dict(orient='records'))\r\n```\r\n\r\nBut if a column is named `order`:\r\n```\r\nconnX = sqlite3.connect('DELME.db', timeout=10)\r\n\r\ndfX=pd.DataFrame({'order':range(3),'col2':range(3)})\r\nDBX = Database(connX)\r\nDBX['test'].upsert_all(dfX.to_dict(orient='records'))\r\n```\r\n\r\nit throws an error:\r\n\r\n```\r\n---------------------------------------------------------------------------\r\nOperationalError Traceback (most recent call last)\r\n in \r\n 3 dfX=pd.DataFrame({'order':range(3),'col2':range(3)})\r\n 4 DBX = Database(connX)\r\n----> 5 DBX['test'].upsert_all(dfX.to_dict(orient='records'))\r\n\r\n/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in upsert_all(self, records, pk, foreign_keys, column_order)\r\n 347 foreign_keys=foreign_keys,\r\n 348 upsert=True,\r\n--> 349 column_order=column_order,\r\n 350 )\r\n 351 \r\n\r\n/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, upsert, batch_size, column_order)\r\n 327 jsonify_if_needed(record.get(key, None)) for key in all_columns\r\n 328 )\r\n--> 329 result = self.db.conn.execute(sql, values)\r\n 330 self.db.conn.commit()\r\n 331 self.last_id = result.lastrowid\r\n\r\nOperationalError: near \"order\": syntax error\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/10/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"}