{"id": 488293926, "node_id": "MDU6SXNzdWU0ODgyOTM5MjY=", "number": 58, "title": "Support enabling FTS on views", "user": {"value": 49260, "label": "amjith"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2019-09-02T18:56:36Z", "updated_at": "2020-10-16T18:39:36Z", "closed_at": "2020-10-16T18:39:31Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Right now enable_fts() is only implemented for Table(). Technically sqlite supports enabling fts on views. But it requires deeper thought since views don't have `rowid` and the current implementation of enable_fts() relies on the presence of `rowid` column. \r\n\r\nIt is possible to provide an alternative rowid using the `content_rowid` option to the FTS5() function. \r\n\r\nRef: https://sqlite.org/fts5.html#fts5_table_creation_and_initialization\r\n\r\n> The \"content_rowid\" option, used to set the rowid field of an external content table. \r\n\r\nThis will further complicate `enable_fts()` function by adding an extra argument. I'm wondering if that is outside the scope of this tool or should I work on that feature and send a PR? ", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/58/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 546073980, "node_id": "MDU6SXNzdWU1NDYwNzM5ODA=", "number": 74, "title": "Test failures on openSUSE 15.1: AssertionError: Explicit other_table and other_column", "user": {"value": 15092, "label": "jayvdb"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2020-01-07T04:35:50Z", "updated_at": "2020-01-12T07:21:17Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "openSUSE 15.1 is using python 3.6.5 and click-7.0 , however it has test failures while openSUSE Tumbleweed on py37 passes.\r\n\r\nMost fail on the cli exit code like\r\n```py\r\n[ 74s] =================================== FAILURES ===================================\r\n[ 74s] _________________________________ test_tables __________________________________\r\n[ 74s] \r\n[ 74s] db_path = '/tmp/pytest-of-abuild/pytest-0/test_tables0/test.db'\r\n[ 74s] \r\n[ 74s] def test_tables(db_path):\r\n[ 74s] result = CliRunner().invoke(cli.cli, [\"tables\", db_path])\r\n[ 74s] > assert '[{\"table\": \"Gosh\"},\\n {\"table\": \"Gosh2\"}]' == result.output.strip()\r\n[ 74s] E assert '[{\"table\": \"...e\": \"Gosh2\"}]' == ''\r\n[ 74s] E - [{\"table\": \"Gosh\"},\r\n[ 74s] E - {\"table\": \"Gosh2\"}]\r\n[ 74s] \r\n[ 74s] tests/test_cli.py:28: AssertionError\r\n```\r\n\r\npackaging project at https://build.opensuse.org/package/show/home:jayvdb:py-new/python-sqlite-utils\r\n\r\nI'll keep digging into this after I have github-to-sqlite working on Tumbleweed, as I'll need openSUSE Leap 15.1 working before I can submit this into the main python repo.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/74/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", 
"draft": null, "state_reason": null} {"id": 610517472, "node_id": "MDU6SXNzdWU2MTA1MTc0NzI=", "number": 103, "title": "sqlite3.OperationalError: too many SQL variables in insert_all when using rows with varying numbers of columns", "user": {"value": 32605365, "label": "b0b5h4rp13"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 8, "created_at": "2020-05-01T02:26:14Z", "updated_at": "2020-05-14T00:18:57Z", "closed_at": "2020-05-14T00:18:57Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "If using insert_all to put in 1000 rows of data with varying number of columns, it comes up with this message `sqlite3.OperationalError: too many SQL variables` if the number of columns is larger in later records (past the first row)\r\n\r\nI've reduced `SQLITE_MAX_VARS` by 100 to 899 at the top of `db.py` to add wiggle room, so that if the column count increases it wont go past SQLite's batch limit as calculated by this line of code based on the count of the first row's dict keys\r\n\r\n batch_size = max(1, min(batch_size, SQLITE_MAX_VARS // num_columns))", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/103/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 686978131, "node_id": "MDU6SXNzdWU2ODY5NzgxMzE=", "number": 139, "title": "insert_all(..., alter=True) should work for new columns introduced after the first 100 records", "user": {"value": 96218, "label": "simonwiles"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 7, "created_at": "2020-08-27T06:25:25Z", "updated_at": "2020-08-28T22:48:51Z", "closed_at": "2020-08-28T22:30:14Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Is there a way to make `.insert_all()` work properly when new columns are introduced outside the first 100 records (with or without the `alter=True` argument)?\r\n\r\nI'm using `.insert_all()` to bulk insert ~3-4k records at a time and it is common for records to need to introduce new columns. However, if new columns are introduced after the first 100 records, `sqlite_utils` doesn't even raise the `OperationalError: table ... has no column named ...` exception; it just silently drops the extra data and moves on.\r\n\r\nIt took me a while to find this little snippet in the [documentation for `.insert_all()`](https://sqlite-utils.readthedocs.io/en/stable/python-api.html#bulk-inserts) (it's not mentioned under [Adding columns automatically on insert/update](https://sqlite-utils.readthedocs.io/en/stable/python-api.html#bulk-inserts)):\r\n\r\n> The column types used in the CREATE TABLE statement are automatically derived from the types of data in that first batch of rows. **_Any additional or missing columns in subsequent batches will be ignored._**\r\n\r\nI tried changing the `batch_size` argument to the total number of records, but it seems only to effect the number of rows that are committed at a time, and has no influence on this problem.\r\n\r\nIs there a way around this that you would suggest? 
It seems like it should raise an exception at least.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/139/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 688659182, "node_id": "MDU6SXNzdWU2ODg2NTkxODI=", "number": 145, "title": "Bug when first record contains fewer columns than subsequent records", "user": {"value": 96218, "label": "simonwiles"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2020-08-30T05:44:44Z", "updated_at": "2020-09-08T23:21:23Z", "closed_at": "2020-09-08T23:21:23Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "`insert_all()` selects the maximum batch size based on the number of fields in the first record. If the first record has fewer fields than subsequent records (and `alter=True` is passed), this can result in SQL statements with more than the maximum permitted number of host parameters. This situation is perhaps unlikely to occur, but could happen if the first record had, say, 10 columns, such that `batch_size` (based on `SQLITE_MAX_VARIABLE_NUMBER = 999`) would be 99. If the next 98 rows had 11 columns, the resulting SQL statement for the first batch would have `10 * 1 + 11 * 98 = 1088` host parameters (and subsequent batches, if the data were consistent from thereon out, would have `99 * 11 = 1089`).\r\n\r\nI suspect that this bug is masked somewhat by the fact that while:\r\n> [`SQLITE_MAX_VARIABLE_NUMBER`](https://www.sqlite.org/limits.html#max_variable_number) ... defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0.\r\n\r\nit is common that it is increased at compile time. Debian-based systems, for example, seem to ship with a version of sqlite compiled with `SQLITE_MAX_VARIABLE_NUMBER` set to 250,000, and I believe this is the case for homebrew installations too.\r\n\r\nA test for this issue might look like this:\r\n```python\r\ndef test_columns_not_in_first_record_should_not_cause_batch_to_be_too_large(fresh_db):\r\n # sqlite on homebrew and Debian/Ubuntu etc. is typically compiled with\r\n # SQLITE_MAX_VARIABLE_NUMBER set to 250,000, so we need to exceed this value to\r\n # trigger the error on these systems.\r\n THRESHOLD = 250000\r\n extra_columns = 1 + (THRESHOLD - 1) // 99\r\n records = [\r\n {\"c0\": \"first record\"}, # one column in first record -> batch_size = 100\r\n # fill out the batch with 99 records with enough columns to exceed THRESHOLD\r\n *[\r\n dict([(\"c{}\".format(i), j) for i in range(extra_columns)])\r\n for j in range(99)\r\n ]\r\n ]\r\n try:\r\n fresh_db[\"too_many_columns\"].insert_all(records, alter=True)\r\n except sqlite3.OperationalError:\r\n raise\r\n```\r\n\r\nThe best solution, I think, is simply to process all the records when determining columns, column types, and the batch size. In my tests this doesn't seem to be particularly costly at all, and cuts out a lot of complications (including obviating my implementation of #139 at #142). 
I'll raise a PR for your consideration.\r\n\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/145/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 688670158, "node_id": "MDU6SXNzdWU2ODg2NzAxNTg=", "number": 147, "title": "SQLITE_MAX_VARS maybe hard-coded too low", "user": {"value": 96218, "label": "simonwiles"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 7, "created_at": "2020-08-30T07:26:45Z", "updated_at": "2021-02-15T21:27:55Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "I came across this while about to open an issue and PR against the documentation for `batch_size`, which is a bit incomplete.\r\n\r\nAs mentioned in #145, while:\r\n\r\n> [`SQLITE_MAX_VARIABLE_NUMBER`](https://www.sqlite.org/limits.html#max_variable_number) ... defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0.\r\n\r\nit is common that it is increased at compile time. Debian-based systems, for example, seem to ship with a version of sqlite compiled with SQLITE_MAX_VARIABLE_NUMBER set to 250,000, and I believe this is the case for homebrew installations too.\r\n\r\nIn working to understand what `batch_size` was actually doing and why, I realized that by setting `SQLITE_MAX_VARS` in `db.py` to match the value my sqlite was compiled with (I'm on Debian), I was able to decrease the time to `insert_all()` my test data set (~128k records across 7 tables) from ~26.5s to ~3.5s. Given that this about .05% of my total dataset, this is time I am keen to save...\r\n\r\nUnfortunately, it seems that `sqlite3` in the python standard library doesn't expose the `get_limit()` C API (even though `pysqlite` used to), so it's hard to know what value sqlite has been compiled with (note that this could mean, I suppose, that it's less than 999, and even hardcoding `SQLITE_MAX_VARS` to the conservative default might not be adequate. It can also be lowered -- but not raised -- at runtime). The best I could come up with is `echo \"\" | sqlite3 -cmd \".limits variable_number\"` (only available in `sqlite >= 2015-05-07 (3.8.10)`).\r\n\r\nObviously this couldn't be relied upon in `sqlite_utils`, but I wonder what your opinion would be about exposing `SQLITE_MAX_VARS` as a user-configurable parameter (with suitable \"here be dragons\" warnings)? 
I'm going to go ahead and monkey-patch it for my purposes in any event, but it seems like it might be worth considering.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/147/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 766156875, "node_id": "MDU6SXNzdWU3NjYxNTY4NzU=", "number": 209, "title": "Test failure with sqlite 3.34 in test_cli.py::test_optimize", "user": {"value": 191622, "label": "meatcar"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2020-12-14T08:58:18Z", "updated_at": "2021-01-01T23:52:46Z", "closed_at": "2021-01-01T23:52:46Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "pytest output:\r\n```\r\n...\r\n============================== short test summary info ===============================\r\nFAILED tests/test_cli.py::test_optimize[tables0] - assert 1662976 < 1662976\r\nFAILED tests/test_cli.py::test_optimize[tables1] - assert 1667072 < 1662976\r\n===================== 2 failed, 538 passed, 3 skipped in 34.32s ======================\r\n```\r\n\r\nCame across this while packaging `sqlite-utils` for NixOS, but it can be recreated it using the `alpine:edge` docker image as well as follows:\r\n\r\n```\r\ndocker run --rm -it alpine:edge /bin/sh\r\n# apk update && apk add git sqlite python3 gcc python3-dev musl-dev && python3 -m ensurepip\r\n# git clone https://github.com/simonw/sqlite-utils.git\r\n# cd sqlite-utils/\r\n# pip3 install -e .[test]\r\n# pytest\r\n```\r\n\r\nThis definitely works on sqlite v3.33.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/209/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 817989436, "node_id": "MDU6SXNzdWU4MTc5ODk0MzY=", "number": 242, "title": "Async support", "user": {"value": 25778, "label": "eyeseast"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 13, "created_at": "2021-02-27T18:29:38Z", "updated_at": "2021-10-28T14:37:56Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Following our conversation last week, want to note this here before I forget.\r\n\r\nI've had a couple situations where I'd like to do a bunch of updates in an async event loop, but I run into SQLite's issues with concurrent writes. This feels like something sqlite-utils could help with.\r\n\r\nPeeWee ORM has a [SQLite write queue](http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#sqliteq) that might be a good model. It's using threads or gevent, but I _think_ that approach would translate well enough to asyncio. 
\r\n\r\nHappy to help with this, too.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/242/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 831751367, "node_id": "MDU6SXNzdWU4MzE3NTEzNjc=", "number": 246, "title": "Escaping FTS search strings", "user": {"value": 16001974, "label": "DeNeutoy"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2021-03-15T12:15:09Z", "updated_at": "2021-08-18T18:57:13Z", "closed_at": "2021-08-18T18:43:12Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "\r\nThanks for the excellent library, it's very nice to use!\r\n\r\nI've been building some in memory search functionality for a data annotation tool i'm making, and I got tripped up a little bit with escaping the full text search queries. First I tried using `db.quote(q)`, which doesn't work, because sqlite FTS has it's own (separate)[ query syntax](https://www2.sqlite.org/fts5.html#full_text_query_syntax). You can see this happening here also:\r\n\r\nhttp://search-24ways.herokuapp.com/24ways-f8f455f/articles?_search=acces%2A\r\n\r\nI got around this by aggressively escaping quotes inside the query string like this:\r\n\r\n```python\r\n quoted = q.replace('\"', '\"\"')\r\n quoted = f'\"{quoted}\"'\r\n print(quoted)\r\n results = db[\"data\"].search(quoted, columns=[\"id\"])\r\n return [x[\"id\"] for x in results]\r\n\r\n```\r\n\r\nThis works in the sense it doesn't crash, but it also removes access to the search query syntax. Given the well specified definition, it might be possible for sqlite-utils to provide a `db.quote_query(q)` which would intelligently escape a query whilst leaving the syntax intact. 
This would be very nice!\r\n\r\n\r\n\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/246/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 868188068, "node_id": "MDU6SXNzdWU4NjgxODgwNjg=", "number": 257, "title": "Insert from JSON containing strings with non-ascii characters are escaped as unicode for lists, tuples, dicts.", "user": {"value": 6586811, "label": "dylan-wu"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-04-26T20:46:25Z", "updated_at": "2021-05-19T02:57:05Z", "closed_at": "2021-05-19T02:57:05Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "JSON Test File (test.json):\r\n\r\n```json\r\n[\r\n {\r\n \"id\": 123,\r\n \"text\": \"FR Th\u00e9\u00e2tre\"\r\n },\r\n {\r\n \"id\": 223,\r\n \"text\": [\r\n \"FR Th\u00e9\u00e2tre\"\r\n ]\r\n }\r\n]\r\n```\r\n\r\nCommand to import:\r\n\r\n```bash\r\nsqlite-utils insert test.db text test.json --pk=id\r\n```\r\n\r\nResulting table view from datasette:\r\n\r\n![image](https://user-images.githubusercontent.com/6586811/116147833-cdf2fb00-a6a5-11eb-8412-0aae81b6e6dd.png)\r\n\r\nOriginal, db.py line 2225:\r\n\r\n```python\r\n return json.dumps(value, default=repr)\r\n```\r\n\r\nFix, db.py line 2225:\r\n\r\n```python\r\n return json.dumps(value, default=repr, ensure_ascii=False)\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/257/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 907642546, "node_id": "MDU6SXNzdWU5MDc2NDI1NDY=", "number": 264, "title": "Supporting additional output formats, like GeoJSON", "user": {"value": 25778, "label": "eyeseast"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-05-31T18:03:32Z", "updated_at": "2021-06-03T05:12:21Z", "closed_at": "2021-06-03T05:12:21Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "I have a project going where it would be useful to do some spatial processing in SQLite (instead of large files) and then output GeoJSON. So my workflow would be something like this:\r\n\r\n1. Read Shapefiles, GeoJSON, CSVs into a SQLite database\r\n2. Join, filter, prune as needed\r\n3. Export GeoJSON for just the stuff I need at that moment, while still having a database of things that will be useful later\r\n\r\nI'm wondering if this is worth adding to SQLite-utils itself (GeoJSON, at least), or if it's better to make a counterpart to the ecosystem of `*-to-sqlite` tools, say a suite of `sqlite-to-*` things. 
Or would it be crazy to have a plugin system?", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/264/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 923602693, "node_id": "MDU6SXNzdWU5MjM2MDI2OTM=", "number": 276, "title": "support small help flag -h", "user": {"value": 601708, "label": "mcint"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-06-17T07:59:31Z", "updated_at": "2021-06-18T14:56:59Z", "closed_at": "2021-06-18T14:56:59Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/276/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 923697888, "node_id": "MDU6SXNzdWU5MjM2OTc4ODg=", "number": 278, "title": "Support db as first parameter before subcommand, or as environment variable", "user": {"value": 601708, "label": "mcint"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-06-17T09:26:29Z", "updated_at": "2021-06-20T22:39:57Z", "closed_at": "2021-06-18T15:43:19Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/278/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1044267332, "node_id": "I_kwDOCGYnMM4-PkFE", "number": 336, "title": "sqlite-util tranform --column-order mangles columns of type \"timestamp\"", "user": {"value": 536941, "label": "fgregg"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-11-04T01:15:38Z", "updated_at": "2023-05-08T21:13:38Z", "closed_at": "2023-05-08T21:13:38Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Reproducible code below:\r\n\r\n```bash\r\n> echo 'create table bar (baz text, created_at timestamp default CURRENT_TIMESTAMP)' | sqlite3 foo.db\r\n> sqlite3 foo.db\r\nSQLite version 3.36.0 2021-06-18 18:36:39\r\nEnter \".help\" for usage hints.\r\nsqlite> .schema bar\r\nCREATE TABLE bar (baz text, created_at timestamp default CURRENT_TIMESTAMP);\r\nsqlite> .exit\r\n> sqlite-utils transform foo.db bar --column-order baz\r\nsqlite3 foo.db\r\nSQLite version 3.36.0 2021-06-18 18:36:39\r\nEnter \".help\" for usage hints.\r\nsqlite> .schema bar\r\nCREATE TABLE IF NOT EXISTS \"bar\" (\r\n [baz] TEXT,\r\n [created_at] FLOAT DEFAULT 'CURRENT_TIMESTAMP'\r\n);\r\nsqlite> .exit\r\n> sqlite-utils transform foo.db bar --column-order baz\r\n> sqlite3 foo.db\r\nSQLite version 3.36.0 2021-06-18 18:36:39\r\nEnter \".help\" for usage hints.\r\nsqlite> .schema bar\r\nCREATE TABLE IF NOT EXISTS \"bar\" (\r\n [baz] TEXT,\r\n 
[created_at] FLOAT DEFAULT '''CURRENT_TIMESTAMP'''\r\n);\r\n```\r\n\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/336/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1077102934, "node_id": "I_kwDOCGYnMM5AM0lW", "number": 353, "title": "Allow passing a file of code to \"sqlite-utils convert\"", "user": {"value": 536941, "label": "fgregg"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 8, "created_at": "2021-12-10T18:06:14Z", "updated_at": "2021-12-11T01:38:29Z", "closed_at": "2021-12-11T01:09:39Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "sqlite-utils is so nice, but the ergonomics of the multiline code is kind of tough. It's really hard (maybe impossible) to make the newlines play well with Makefiles.\r\n\r\nit would be great to write your code fragment in a separate file and direct it into sqlite-utils\r\n\r\neither like\r\n\r\n```sqlite-utils convert my.db my_table my_column < custom_code.py```\r\n\r\nor\r\n\r\n```sqlite-utils convert my.db my_table my_column --custom-code=custom_code.py```\r\n\r\nThanks, as ever, for these great tools!", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/353/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1096558279, "node_id": "I_kwDOCGYnMM5BXCbH", "number": 365, "title": "create-index should run analyze after creating index", "user": {"value": 536941, "label": "fgregg"}, "state": "closed", "locked": 0, "assignee": null, "milestone": {"value": 7558727, "label": "3.21"}, "comments": 16, "created_at": "2022-01-07T18:21:25Z", "updated_at": "2022-01-11T02:43:34Z", "closed_at": "2022-01-11T01:36:48Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "sqlite's query planner depends upon analyze to make good use of indices. It would be nice if analyze was run as part of the create-index command.\r\n\r\nIf data is inserted later, things can get out of date, but it would still probably be a net win. 
", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/365/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1124237013, "node_id": "I_kwDOCGYnMM5DAn7V", "number": 398, "title": "Add SpatiaLite helpers to CLI", "user": {"value": 25778, "label": "eyeseast"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 9, "created_at": "2022-02-04T14:01:28Z", "updated_at": "2022-02-16T01:02:29Z", "closed_at": "2022-02-16T00:58:07Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Now that #385 is merged, add CLI versions of those methods.\r\n\r\n```sh\r\n# init spatialite\r\nsqlite-utils init-spatialite database.db\r\n\r\n# or maybe/also\r\nsqlite-utils create database.db --enable-wal --spatialite\r\n\r\n# add geometry columns\r\n# needs a database, table, geometry column name, type, with optional SRID and not-null\r\n# this needs to create a table if it doesn't already exist\r\nsqlite-utils add-geometry-column database.db table-name geometry --srid 4326 --not-null\r\n\r\n# spatial index an existing table/column\r\nsqlite-utils create-spatial-index database.db table-name geometry\r\n```\r\n\r\nShould be mostly straightforward. The one thing worth highlighting in docs is that geometry columns can only be added to existing tables. Trying to add a geometry column to a table that doesn't exist yet might mean you have a schema like `{\"rowid\": int, \"geometry\": bytes}`. Might be worth nudging people to explicitly create a table first, then add geometry columns.\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/398/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1126692066, "node_id": "I_kwDOCGYnMM5DJ_Ti", "number": 403, "title": "Document how to add a primary key to a rowid table using `sqlite-utils transform --pk`", "user": {"value": 536941, "label": "fgregg"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2022-02-08T01:39:40Z", "updated_at": "2022-02-09T04:22:43Z", "closed_at": "2022-02-08T19:33:59Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "*Original title: Add option for adding a new, serial, primary key*\r\n\r\nsometimes we have tables that don't have primary keys, but ought to have them. we *can* use rowid for that, but it would often be nicer to have an explicit primary key. 
using the current value of rowid would be fine.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/403/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1160034488, "node_id": "I_kwDOCGYnMM5FJLi4", "number": 411, "title": "Support for generated columns", "user": {"value": 25778, "label": "eyeseast"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 8, "created_at": "2022-03-04T20:41:33Z", "updated_at": "2022-03-11T22:32:43Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "This is a fairly new feature -- SQLite version 3.31.0 (2020-01-22) -- that I, admittedly, haven't gotten to work yet. But it looks _incredibly_ useful: https://dgl.cx/2020/06/sqlite-json-support\r\n\r\nI'm not sure if this is an option on `add-column` or a separate command like `add-generated-column`. Either way, it needs an argument to populate it. It could be something like this:\r\n\r\n```sh\r\nsqlite-utils add-column data.db table-name generated --as 'json_extract(data, \"$.field\")' --virtual\r\n```\r\n\r\nMore here: https://www.sqlite.org/gencol.html", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/411/reactions\", \"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1178456794, "node_id": "I_kwDOCGYnMM5GPdLa", "number": 418, "title": "Add generated files to .gitignore", "user": {"value": 25778, "label": "eyeseast"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2022-03-23T17:48:12Z", "updated_at": "2022-03-24T21:01:44Z", "closed_at": "2022-03-24T21:01:44Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "I end up with these in my local directory:\r\n\r\n\t.hypothesis/\r\n\tPipfile\r\n\tPipfile.lock\r\n\tpyproject.toml\r\n\r\nMight as well gitignore them.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/418/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1239034903, "node_id": "I_kwDOCGYnMM5J2iwX", "number": 433, "title": "CLI eats my cursor", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 10, "created_at": "2022-05-17T18:52:52Z", "updated_at": "2023-11-04T00:46:30Z", "closed_at": "2023-11-04T00:46:30Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "I'm not sure why this happens but `sqlite-utils` makes my terminal cursor disappear after running commands like `sqlite-utils insert`. 
I've only noticed this behavior in `sqlite-utils`, not in any other CLI tools\r\n\r\nI can still type commands after it runs but the text cursor is invisible", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/433/reactions\", \"total_count\": 5, \"+1\": 5, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1279863844, "node_id": "I_kwDOCGYnMM5MSSwk", "number": 449, "title": "Utilities for duplicating tables and creating a table with the results of a query", "user": {"value": 1690072, "label": "davidleejy"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2022-06-22T09:41:43Z", "updated_at": "2022-07-15T21:46:13Z", "closed_at": "2022-07-15T21:21:36Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "is there a duplicate table functionality? Otherwise, I'd be happy to submit a PR.\r\n\r\nIn sqlite3 it would look like:\r\n\r\n```python\r\nimport sqlite3 as sl\r\n\r\ncon = sl.connect('prompt-tune.db')\r\n\r\ndef db_duplicate_table(table_name, table_name_new, con=con):\r\n    # Duplicates table `table_name` to a new table `table_name_new`.\r\n    try:\r\n        cur = con.cursor()\r\n        cur.execute(f\"\"\"CREATE TABLE {table_name_new} AS SELECT * FROM {table_name}\"\"\")\r\n    except Exception as e:\r\n        print(e)\r\n    finally:\r\n        cur.close()\r\n\r\ndb_duplicate_table('orig_table', 'new_table')\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/449/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1310243385, "node_id": "I_kwDOCGYnMM5OGLo5", "number": 456, "title": "feature request: pivot command", "user": {"value": 536941, "label": "fgregg"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2022-07-20T00:58:08Z", "updated_at": "2022-07-20T17:50:50Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "pivoting a long-format table to a wide-format table is pretty common and kind of a pain. 
would love to see this feature in sqlite-utils!", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/456/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1324659241, "node_id": "I_kwDOCGYnMM5O9LIp", "number": 459, "title": "Single quoted transform recipes on Windows do not work as expected ", "user": {"value": 19921, "label": "shakeel"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2022-08-01T16:14:54Z", "updated_at": "2022-08-01T16:14:54Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Trying to follow the tutorial for sqlite-utils and datasette https://datasette.io/tutorials/clean-data on Windows 11 OS `Microsoft Windows [Version 10.0.22622.440]`, with sqlite-utils and datasette installed using pipx.\r\n\r\n```\r\npipx list\r\npackage datasette 0.61.1, installed using Python 3.10.4\r\n - datasette.exe\r\npackage sqlite-utils 3.28, installed using Python 3.10.4\r\n - sqlite-utils.exe\r\n``` \r\n\r\nIn the step to transform dates into ISO dates the quoted value `'r.parsedatetime(value)'` is copied verbatim into the columns instead of applying the output of the Python recipe.\r\n\r\n```\r\nsqlite-utils convert manatees.db locations \\\r\n REPDATE created_date last_edited_date \\\r\n 'r.parsedatetime(value)' --dry-run\r\n\r\n1975/01/31 00:00:00+00\r\n --- becomes:\r\nr.parsedatetime(value)\r\n\r\nWould affect 13568 rows\r\n```\r\n\r\nHowever, if I change the code from single quotes to double quotes, it works as expected.\r\n\r\n```\r\nsqlite-utils convert manatees.db locations \\\r\n REPDATE created_date last_edited_date \\\r\n \"r.parsedatetime(value)\" --dry-run\r\n\r\n1975/01/31 00:00:00+00\r\n --- becomes:\r\n1975-01-31T00:00:00+00:00\r\n\r\nWould affect 13568 rows\r\n```\r\n\r\nSpecifying the transform code recipe should work with single quotes on Windows.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/459/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1355193529, "node_id": "I_kwDOCGYnMM5Qxpy5", "number": 479, "title": "OperationalError: cannot VACUUM from within a transaction", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2022-08-30T05:34:24Z", "updated_at": "2022-08-30T05:34:24Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Maybe when calling `.vacuum()` and other DB-level write-lock operations `sqlite_utils` could guard against this error message by automatically committing first?\r\n\r\n```\r\n 46 db[\"media\"].optimize() # type: ignore\r\n---> 47 db.vacuum()\r\n\r\nFile ~/.local/lib/python3.10/site-packages/sqlite_utils/db.py:1047, in Database.vacuum(self)\r\n 1045 def vacuum(self):\r\n 1046 \"Run a SQLite ``VACUUM`` against the database.\"\r\n-> 1047 self.execute(\"VACUUM;\")\r\n\r\nFile 
~/.local/lib/python3.10/site-packages/sqlite_utils/db.py:470, in Database.execute(self, sql, parameters)\r\n 468 return self.conn.execute(sql, parameters)\r\n 469 else:\r\n--> 470 return self.conn.execute(sql)\r\n\r\nOperationalError: cannot VACUUM from within a transaction\r\n```\r\n\r\nIt might also be nice to add a sentence or two about how transactions are committed on the [docs page](https://sqlite-utils.datasette.io/en/latest/python-api.html#detect-fts). When I was swapping out my sqlite3 code for this library it was nice that everything was pretty much drop-in but I was/am unsure what to do about the places I explicitly call `.commit()` in my code\r\n\r\nRelated to https://github.com/simonw/sqlite-utils/issues/121", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/479/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1361355564, "node_id": "I_kwDOCGYnMM5RJKMs", "number": 482, "title": "balanced table default column_order", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-09-05T03:00:18Z", "updated_at": "2022-10-10T17:43:02Z", "closed_at": "2022-09-06T20:17:27Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Is there any performance or size difference with column order in SQLITE ? similar to this https://www.cybertec-postgresql.com/en/column-order-in-postgresql-does-matter/\r\n\r\nIt might be interesting to have an option to create with an optimized column order. I'm assuming this would look something like INTEGER columns, REAL columns, BLOB columns, TEXT columns, NULL columns. NULL columns at the end because they are more likely to be TEXT and it is impossible to know if they will become INTEGER\r\n\r\n(Of course, any schema evolution would reduce optimization but maybe column order could also be re-evaluated when schema changes)\r\n\r\nedit:\r\n\r\nthis is easy to accomplish with the existing `transform` method:\r\n\r\n```\r\nint_columns = [k for k, v in table_columns.items() if v == int]\r\ndb[table].transform(column_order=[*int_columns])\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/482/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1366423176, "node_id": "I_kwDOCGYnMM5RcfaI", "number": 485, "title": "Progressbar not shown when inserting/upserting jsonlines file", "user": {"value": 99098079, "label": "MischaU8"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-09-08T14:13:18Z", "updated_at": "2022-09-15T20:39:52Z", "closed_at": "2022-09-15T20:37:52Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "When inserting or upserting a jsonlines file, no progressbar is shown. 
Expected behavior is that, just like with .csv/.tsv files, a progressbar is also shown for a jsonlines file (--nl), unless --silent is provided.\r\n\r\n```bash\r\nsql-utils upsert mydb.db posts posts.jl --nl --pk post_id\r\n(silence)\r\n```\r\n\r\nCurrently `file_progress` is only called within the tsv/csv logic, however I think it can be safely wrapped around all the input formats that use `decoded`: https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/cli.py#L963", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/485/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1393202060, "node_id": "I_kwDOCGYnMM5TCpOM", "number": 496, "title": "devrel/python api: Pylance type hinting", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2022-10-01T03:03:34Z", "updated_at": "2023-05-03T05:53:27Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Pylance is generally pretty good at figuring out stuff but `sqlite-utils` has some quirks which make type hinting kinda useless. Maybe you don't care but I thought I would bring it to your attention.\r\n\r\nFor example:\r\n\r\n```\r\ndb[\"subs\"].insert_all(subs, pk=\"index\")\r\n```\r\n\r\n```\r\nCannot access member \"insert_all\" for type \"View\"\r\n  Member \"insert_all\" is unknown\r\n```\r\n\r\n`insert_all` and all the other methods show up as type issues because the program can't know whether something is a View or a Table. Fair enough. But that basically throws all type checking out the window.\r\n\r\n`pk=\"index\"` also shows up as a type issue:\r\n\r\n```\r\nArgument of type \"Literal['index']\" cannot be assigned to parameter \"pk\" of type \"Default\" in function \"insert_all\"\r\n  \"Literal['index']\" is incompatible with \"Default\"\r\n```\r\n\r\nI think this is because DEFAULT is an empty class? \r\n\r\nmaybe a few small changes could be made to make the library more type-friendly\r\n\r\nThe interim solution is of course to turn off type hints completely for the line\r\n```\r\ndb[\"subs\"].insert_all(subs, pk=\"index\") # type: ignore\r\n```\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/496/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1393212964, "node_id": "I_kwDOCGYnMM5TCr4k", "number": 497, "title": "column_names", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-10-01T03:34:21Z", "updated_at": "2022-10-25T21:09:28Z", "closed_at": "2022-10-25T21:09:28Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "It would be nice to have a `column_names`. 
Similar to `table_names`.\r\n\r\nOr if you could get one or all of the following syntax to work for both Database and Table that might be even better: \r\n\r\nStyle 1\r\n- `if 'table1' in db`\r\n- `if 'col1' in db['table1']`\r\n\r\nStyle 2\r\n- `if 'table1' in db.tables`\r\n- `if 'col1' in db['table1'].columns`\r\n\r\nmaybe the table ones actually work but I'm too lazy to check. I just know that I have to do:\r\n\r\n `[c.name for c in db['table1'].columns]`\r\n\r\nEdit: This is possible with `columns_dict`. I have actually used that before but I forgot about it. Feel free to close, but I do think accessing this data could be more consistent and intuitive.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/497/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1430325103, "node_id": "I_kwDOCGYnMM5VQQdv", "number": 507, "title": "conn.execute: UnicodeEncodeError: 'utf-8' codec can't encode character", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-10-31T18:49:51Z", "updated_at": "2022-11-01T00:40:17Z", "closed_at": "2022-11-01T00:40:16Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "I'm not really sure what caused this and it happened in the middle of my program (after running for 35775 seconds).\r\n\r\n```\r\nExtracting metadata 49.9% (chunk 9893 of 19831)\r\n...\r\n File \"/home/xk/.local/lib/python3.10/site-packages/xklb/fs_extract.py\", line 90, in extract_chunk\r\n args.db[\"media\"].insert_all(utils.list_dict_filter_bool(media), pk=\"path\", alter=True, replace=True)\r\n File \"/home/xk/.local/lib/python3.10/site-packages/sqlite_utils/db.py\", line 3107, in insert_all\r\n self.insert_chunk(\r\n File \"/home/xk/.local/lib/python3.10/site-packages/sqlite_utils/db.py\", line 2872, in insert_chunk\r\n result = self.db.execute(query, params)\r\n File \"/home/xk/.local/lib/python3.10/site-packages/sqlite_utils/db.py\", line 483, in execute\r\n return self.conn.execute(sql, parameters)\r\nUnicodeEncodeError: 'utf-8' codec can't encode character '\\udcc3' in position 62: surrogates not allowed\r\n```\r\n\r\nThis might be relevant: https://stackoverflow.com/questions/31898353/python-cant-encode-with-surrogateescape\r\n\r\nI'm going to try re-running with \r\n\r\n```py\r\n def execute(\r\n self, sql: str, parameters: Optional[Union[Iterable, dict]] = None\r\n ) -> sqlite3.Cursor:\r\n \"\"\"\r\n Execute SQL query and return a ``sqlite3.Cursor``.\r\n\r\n :param sql: SQL query to execute\r\n :param parameters: Parameters to use in that query - an iterable for ``where id = ?``\r\n parameters, or a dictionary for ``where id = :id``\r\n \"\"\"\r\n try:\r\n if self._tracer:\r\n self._tracer(sql, parameters)\r\n if parameters is not None:\r\n return self.conn.execute(sql, parameters)\r\n else:\r\n return self.conn.execute(sql)\r\n except UnicodeEncodeError:\r\n sql = sql.encode('utf-8', 'surrogatepass').decode('utf-8')\r\n if parameters is not None:\r\n parameters = parameters.encode('utf-8', 'surrogatepass').decode('utf-8')\r\n return self.execute(sql, parameters)\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": 
null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/507/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1436539554, "node_id": "I_kwDOCGYnMM5Vn9qi", "number": 511, "title": "[insert_all, upsert_all] IntegrityError: constraint failed", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2022-11-04T19:21:48Z", "updated_at": "2022-11-04T22:59:54Z", "closed_at": "2022-11-04T22:54:09Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "My understand is that `INSERT OR IGNORE` will ignore when inserts would cause duplicate keys so I'm not sure exactly why the error is raised from `sqlite3`.\r\n\r\n```\r\nimport argparse\r\nfrom pathlib import Path\r\n\r\nfrom xklb import db, utils\r\nfrom xklb.utils import log\r\n\r\n\r\ndef parse_args() -> argparse.Namespace:\r\n parser = argparse.ArgumentParser()\r\n parser.add_argument(\"database\")\r\n parser.add_argument(\"dbs\", nargs=\"*\")\r\n parser.add_argument(\"--upsert\")\r\n parser.add_argument(\"--db\", \"-db\", help=argparse.SUPPRESS)\r\n parser.add_argument(\"--verbose\", \"-v\", action=\"count\", default=0)\r\n args = parser.parse_args()\r\n\r\n if args.db:\r\n args.database = args.db\r\n Path(args.database).touch()\r\n args.db = db.connect(args)\r\n log.info(utils.dict_filter_bool(args.__dict__))\r\n\r\n return args\r\n\r\n\r\ndef merge_db(args, source_db):\r\n source_db = str(Path(source_db).resolve())\r\n\r\n s_db = db.connect(argparse.Namespace(database=source_db, verbose=args.verbose))\r\n for table in [s for s in s_db.table_names() if not \"_fts\" in s and not s.startswith(\"sqlite_\")]:\r\n log.info(\"[%s]: %s\", source_db, table)\r\n with s_db.conn:\r\n data = s_db[table].rows\r\n\r\n with args.db.conn:\r\n if args.upsert:\r\n args.db[table].upsert_all(data, pk=args.upsert.split(\",\"), alter=True)\r\n else:\r\n args.db[table].insert_all(data, alter=True, replace=True)\r\n\r\n\r\ndef merge_dbs():\r\n args = parse_args()\r\n for s_db in args.dbs:\r\n merge_db(args, s_db)\r\n\r\n\r\nif __name__ == \"__main__\":\r\n merge_dbs()\r\n\r\n```\r\n\r\n```\r\n$ lb-dev merge video.db tube_71.db --upsert path -vv\r\nSQL: INSERT OR IGNORE INTO [media]([path]) VALUES(?); - params: ['https://archive.org/details/088ghostofachanceroygetssackedrevengeofthelivinglunchdvdripxvidphz']\r\n...\r\nFile ~/.local/lib/python3.10/site-packages/sqlite_utils/db.py:3122, in Table.insert_all(self, records, pk, foreign_keys, column_order, not_null, defaults, batch_size, hash_id, hash_id_columns, alter, ignore, replace, truncate, extracts, conversions, columns, upsert, analyze)\r\n 3116 all_columns += [\r\n 3117 column for column in record if column not in all_columns\r\n 3118 ]\r\n 3120 first = False\r\n-> 3122 self.insert_chunk(\r\n 3123 alter,\r\n 3124 extracts,\r\n 3125 chunk,\r\n 3126 all_columns,\r\n 3127 hash_id,\r\n 3128 hash_id_columns,\r\n 3129 upsert,\r\n 3130 pk,\r\n 3131 conversions,\r\n 3132 num_records_processed,\r\n 3133 replace,\r\n 3134 ignore,\r\n 3135 )\r\n 3137 if analyze:\r\n 3138 self.analyze()\r\n\r\nFile ~/.local/lib/python3.10/site-packages/sqlite_utils/db.py:2887, in Table.insert_chunk(self, alter, extracts, chunk, all_columns, hash_id, hash_id_columns, upsert, pk, conversions, num_records_processed, replace, 
ignore)\r\n 2885 for query, params in queries_and_params:\r\n 2886 try:\r\n-> 2887 result = self.db.execute(query, params)\r\n 2888 except OperationalError as e:\r\n 2889 if alter and (\" column\" in e.args[0]):\r\n 2890 # Attempt to add any missing columns, then try again\r\n\r\nFile ~/.local/lib/python3.10/site-packages/sqlite_utils/db.py:484, in Database.execute(self, sql, parameters)\r\n 482 self._tracer(sql, parameters)\r\n 483 if parameters is not None:\r\n--> 484 return self.conn.execute(sql, parameters)\r\n 485 else:\r\n 486 return self.conn.execute(sql)\r\n\r\nIntegrityError: constraint failed\r\n> /home/xk/.local/lib/python3.10/site-packages/sqlite_utils/db.py(484)execute()\r\n 482 self._tracer(sql, parameters)\r\n 483 if parameters is not None:\r\n--> 484 return self.conn.execute(sql, parameters)\r\n 485 else:\r\n 486 return self.conn.execute(sql)\r\n```\r\n\r\n```\r\nsqlite3 --version\r\n3.36.0 2021-06-18 18:36:39\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/511/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1560651350, "node_id": "I_kwDOCGYnMM5dBaZW", "number": 523, "title": "Feature request: trim all leading and trailing white space for all columns for all tables in a database", "user": {"value": 536941, "label": "fgregg"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-01-28T02:40:10Z", "updated_at": "2023-01-28T02:41:14Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "It's pretty common that i need to trim leading or trailing white space from lots of columns in a database a part of an initial ETL.\r\n\r\nI use the following recipe a lot, and it would be great to include this functionality into sqlite-utils\r\n\r\n`trimify.sql`\r\n```sql\r\nselect 'select group_concat(''update [' || name || '] set ['' || name || ''] = trim(['' || name || ''])'', '';\r\n'') || '';\r\n'' as sql_to_run from pragma_table_info('''||name||''');' from sqlite_schema;\r\n```\r\n\r\nthen something like:\r\n\r\n```bash\r\n\tsqlite3 example.db < scripts/trimify.sql > table_trim.sql && \\\r\n sqlite3 $example.db < table_trim.sql > trim.sql && \\\r\n sqlite3 $example.db < trim.sql\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/523/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1575131737, "node_id": "I_kwDOCGYnMM5d4ppZ", "number": 525, "title": "Repeated calls to `Table.convert()` fail", "user": {"value": 167893, "label": "mcarpenter"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2023-02-07T22:40:47Z", "updated_at": "2023-05-08T21:59:41Z", "closed_at": "2023-05-08T21:54:02Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "## Summary\r\nWhen using the API, repeated calls to `Table.convert()` do not work correctly since all conversions quietly use the callable (function, lambda) from the first call to 
`convert()` only. Subsequent invocations with different callables use the callable from the first invocation only.\r\n\r\n## Example\r\n```python\r\nfrom sqlite_utils import Database\r\n\r\ndb = Database(memory=True)\r\ntable = db['table']\r\ncol = 'x'\r\ntable.insert_all([{col: 1}])\r\nprint(table.get(1))\r\n\r\ntable.convert(col, lambda x: x*2)\r\nprint(table.get(1))\r\n\r\ndef zeroize(x):\r\n return 0\r\n#zeroize = lambda x: 0\r\n#zeroize.__name__ = 'zeroize'\r\ntable.convert(col, zeroize)\r\nprint(table.get(1))\r\n```\r\n\r\nOutput:\r\n```\r\n{'x': 1}\r\n{'x': 2}\r\n{'x': 4}\r\n```\r\nExpected:\r\n```\r\n{'x': 1}\r\n{'x': 2}\r\n{'x': 0}\r\n```\r\n\r\n## Explanation\r\nThis is some relevant [documentation](https://github.com/simonw/sqlite-utils/blob/1491b66dd7439dd87cd5cd4c4684f46eb3c5751b/docs/python-api.rst#registering-custom-sql-functions:~:text=By%20default%20registering%20a%20function%20with%20the%20same%20name%20and%20number%20of%20arguments%20will%20have%20no%20effect).\r\n\r\n * `Table.convert()` takes a `Callable` to perform data conversion on a column\r\n * The `Callable` is passed to `Database.register_function()`\r\n * `Database.register_function()` uses the callable's `__name__` attribute for registration\r\n * (Aside: all lambdas have a `__name__` of ``: I thought this was the problem, and it was close, but not quite)\r\n * However `convert()` first wraps the callable by local function [`convert_value()`](https://github.com/simonw/sqlite-utils/blob/fc221f9b62ed8624b1d2098e564f525c84497969/sqlite_utils/db.py#L2661)\r\n * Consequently `register_function()` sees name `convert_value` for all invocations from `convert()`\r\n * `register_function()` silently ignores registrations using the same name, retaining only the first such registration\r\n\r\nThere's a mismatch between the comments and the code: https://github.com/simonw/sqlite-utils/blob/fc221f9b62ed8624b1d2098e564f525c84497969/sqlite_utils/db.py#L404\r\n\r\nbut actually the existing function is returned/used instead (as the \"registering custom sql functions\" doc I linked above says too). Seems like this can be rectified to match the comment?\r\n\r\n## Suggested fix\r\nI think there are four things:\r\n1. The call to `register_function()` from `convert()`should have an explicit `name=` parameter (to continue using `convert_value()` and the progress bar).\r\n2. For functions, this name can be the real function name. (I understand the sqlite api needs a name, and it's nice if those are recognizable names where possible). For lambdas would `'lambda-{uuid}'` or similar be acceptable? \r\n3. `register_function()` really should throw an error on repeated attempts to register a duplicate (function, arity)-pair.\r\n4. A test? 
I haven't looked at the test framework here but seems this should be testable.\r\n\r\n## See also \r\n- #458 ", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/525/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1578790070, "node_id": "I_kwDOCGYnMM5eGmy2", "number": 527, "title": "`Table.convert()` skips falsey values", "user": {"value": 167893, "label": "mcarpenter"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2023-02-10T00:00:52Z", "updated_at": "2023-05-09T21:15:05Z", "closed_at": "2023-05-08T21:03:24Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "# Summary\r\n\r\nBy design, `Table.convert()` does [not attempt](https://github.com/simonw/sqlite-utils/blob/fc221f9b62ed8624b1d2098e564f525c84497969/sqlite_utils/db.py#L2663) conversion of falsey values (`None`, `\"\"`, `0`, ...). This is surprising (directly contradicts the docstring) and `convert()` may quietly skip cells where the user assumed a conversion would take place. \r\n\r\n# Example\r\nIncrement a column of integers by one\r\n\r\n``` python\r\nfrom sqlite_utils import Database\r\n\r\ndb = Database(memory=True)\r\ntable = db['table']\r\ncol = 'x'\r\ntable.insert_all([{col: 0}, {col:1}])\r\nprint(table.get(1)) # 0\r\nprint(table.get(2)) # 1\r\nprint()\r\n\r\ntable.convert(col, lambda x: x+1)\r\nprint(table.get(1)) # got 0, expected 1 \u26a0\u26a0\u26a0\r\nprint(table.get(2)) # got 2, expected 2\r\n```\r\n\r\nAnother example might be, say, transforming cells containing empty string to `NULL`.\r\n\r\n# Discussion\r\n\r\nThis was, I think, a pragmatic choice so that consumers can skip writing guard clauses for these falsey values (particularly from the CLI). But this surprising undocumented behavior can lead to incorrect data. I don't think this is a good trade-off between convenience and correctness.\r\n\r\nIn the absence of this convenience users will either have to write guard clauses into their conversion expressions (or adapt the called function to do the same), so: \r\n``` python\r\n fn(value) if value else value\r\n```\r\ninstead of:\r\n``` python\r\n fn(value)\r\n```\r\nThis is more typing and sometimes I will forget, and there will be errors. (But they will be noisy errors, which is a good thing).\r\n\r\nSuch a change will certainly inconvenience some existing consumers; there will be some breakage. 
But I think this is worth it to avoid quietly not converting some values by default, which can lead to quietly bad data.\r\n\r\nI have a PR that I will attach, please take a look and see what you think.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/527/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1581090327, "node_id": "I_kwDOCGYnMM5ePYYX", "number": 529, "title": "Microsoft line endings", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-02-12T02:20:48Z", "updated_at": "2023-06-14T23:12:12Z", "closed_at": "2023-06-14T23:11:47Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "sqlite-utils prints `\\r\\n` but [it should probably](https://devblogs.microsoft.com/commandline/extended-eol-in-notepad/) print `\\n` (unless the platform is detected as Windows?)\r\n\r\nIt has tripped me up a few times when piping the output of sqlite-utils to other programs:\r\n\r\n```\r\n$ sqlite-utils --no-headers --csv ~/lb/fs/d.db 'select path from media limit 1' | cat -A\r\n/mnt/d7/file^M$\r\n$ sqlite-utils --no-headers --csv ~/lb/fs/d.db 'select path from media limit 1' | tr -d '\\r' | cat -A\r\n/mnt/d7/file$\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/529/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1595340692, "node_id": "I_kwDOCGYnMM5fFveU", "number": 530, "title": "add ability to configure \"on delete\" and \"on update\" attributes of foreign keys:", "user": {"value": 536941, "label": "fgregg"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2023-02-22T15:44:14Z", "updated_at": "2023-05-08T20:39:01Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "sqlite supports these, and it would be quite nice to be able to add them with sqlite-utils.\r\n\r\nhttps://www.sqlite.org/foreignkeys.html#fk_actions", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/530/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1655860104, "node_id": "I_kwDOCGYnMM5ismuI", "number": 535, "title": "rows: --transpose or psql extended view-like functionality", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2023-04-05T15:37:33Z", "updated_at": "2023-06-15T08:39:49Z", "closed_at": "2023-06-14T22:05:28Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "It would be nice if the rows subcommand had a flag, perhaps called `--transpose` which would print in long form 
instead of wide. Similar to extended display mode in psql (`\\x`)\r\n\r\nIn other words instead of this:\r\n\r\n```\r\nsqlite-utils rows --limit 5 --fmt github track_metadata.db songs\r\n```\r\n\r\n| track_id | title | song_id | release | artist_id | artist_mbid | artist_name | duration | artist_familiarity | artist_hotttnesss | year | track_7digitalid | shs_perf | shs_work |\r\n|--------------------|-------------------|--------------------|--------------------------------------|--------------------|--------------------------------------|------------------|------------|----------------------|---------------------|--------|--------------------|------------|------------|\r\n| TRMMMYQ128F932D901 | Silent Night | SOQMMHC12AB0180CB8 | Monster Ballads X-Mas | ARYZTJS1187B98C555 | 357ff05d-848a-44cf-b608-cb34b5701ae5 | Faster Pussy cat | 252.055 | 0.649822 | 0.394032 | 2003 | 7032331 | -1 | 0 |\r\n| TRMMMKD128F425225D | Tanssi vaan | SOVFVAK12A8C1350D9 | Karkuteill\u00e4 | ARMVN3U1187FB3A1EB | 8d7ef530-a6fd-4f8f-b2e2-74aec765e0f9 | Karkkiautomaatti | 156.551 | 0.439604 | 0.356992 | 1995 | 1514808 | -1 | 0 |\r\n| TRMMMRX128F93187D9 | No One Could Ever | SOGTUKN12AB017F4F1 | Butter | ARGEKB01187FB50750 | 3d403d44-36ce-465c-ad43-ae877e65adc4 | Hudson Mohawke | 138.971 | 0.643681 | 0.437504 | 2006 | 6945353 | -1 | 0 |\r\n| TRMMMCH128F425532C | Si Vos Quer\u00e9s | SOBNYVR12A8C13558C | De Culo | ARNWYLR1187B9B2F9C | 12be7648-7094-495f-90e6-df4189d68615 | Yerba Brava | 145.058 | 0.448501 | 0.372349 | 2003 | 2168257 | -1 | 0 |\r\n| TRMMMWA128F426B589 | Tangle Of Aspens | SOHSBXH12A8C13B0DF | Rene Ablaze Presents Winter Sessions | AREQDTE1269FB37231 | | Der Mystic | 514.298 | 0 | 0 | 0 | 2264873 | -1 | 0 |\r\n\r\n\r\nThe output would look something like this:\r\n\r\n```\r\n$ for col in (sqlite-columns track_metadata.db songs)\r\n sqlite-utils --fmt github track_metadata.db \"select $col from songs order by rowid desc limit 5\"\r\nend\r\n```\r\n\r\n| track_id |\r\n|--------------------|\r\n| TRYYYVU12903CD01E3 |\r\n| TRYYYDJ128F9310A21 |\r\n| TRYYYMG128F4260ECA |\r\n| TRYYYJO128F426DA37 |\r\n| TRYYYUS12903CD2DF0 |\r\n| title |\r\n|-------------------------------------|\r\n| Fernweh feat. Sektion Kuchik\u00e4schtli |\r\n| Faraday |\r\n| Novemba |\r\n| Jago Chhadeo |\r\n| O Samba Da Vida |\r\n| song_id |\r\n|--------------------|\r\n| SOWXJXQ12AB0189F43 |\r\n| SOLXGOR12A81C21EB7 |\r\n| SOHODZI12A8C137BB3 |\r\n| SOXQYIQ12A8C137FBB |\r\n| SOTXAME12AB018F136 |\r\n| release |\r\n|---------------------------------|\r\n| So Oder So |\r\n| The Trance Collection Vol. 2 |\r\n| Dub_Connected: electronic music |\r\n| Naale Baba Lassi Pee Gya |\r\n| Pacha V.I.P. 
|\r\n| artist_id |\r\n|--------------------|\r\n| AR7PLM21187B990D08 |\r\n| ARCMCOK1187B9B1073 |\r\n| ARZ3R6M1187B9AF750 |\r\n| ART5FZD1187B9A7FCF |\r\n| AR7Z4J81187FB3FC59 |\r\n| artist_mbid |\r\n|--------------------------------------|\r\n| 3af2b07e-c91c-4160-9bda-f0b9e3144ed3 |\r\n| 4ac5f3de-c5ad-475e-ad50-41f1ef9dba20 |\r\n| 8b97e9c8-61f5-4615-9a96-276f24204e34 |\r\n| 2357c400-9109-42b6-b3fe-9e2d9f8e3872 |\r\n| 9d50cb20-7e42-45cc-b0dd-154c3e92a577 |\r\n| artist_name |\r\n|----------------|\r\n| Texta |\r\n| Elude |\r\n| Gabriel Le Mar |\r\n| Kuldeep Manak |\r\n| Kiko Navarro |\r\n| duration |\r\n|------------|\r\n| 295.079 |\r\n| 484.519 |\r\n| 553.038 |\r\n| 244.166 |\r\n| 217.443 |\r\n| artist_familiarity |\r\n|----------------------|\r\n| 0.552977 |\r\n| 0.403668 |\r\n| 0.556918 |\r\n| 0.4015 |\r\n| 0.528617 |\r\n| artist_hotttnesss |\r\n|---------------------|\r\n| 0.454869 |\r\n| 0.256935 |\r\n| 0.336914 |\r\n| 0.374866 |\r\n| 0.411595 |\r\n| year |\r\n|--------|\r\n| 2004 |\r\n| 0 |\r\n| 0 |\r\n| 0 |\r\n| 0 |\r\n| track_7digitalid |\r\n|--------------------|\r\n| 8486723 |\r\n| 5472456 |\r\n| 2219291 |\r\n| 1632096 |\r\n| 7522478 |\r\n| shs_perf |\r\n|------------|\r\n| -1 |\r\n| -1 |\r\n| -1 |\r\n| -1 |\r\n| -1 |\r\n| shs_work |\r\n|------------|\r\n| 0 |\r\n| 0 |\r\n| 0 |\r\n| 0 |\r\n| 0 |\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/535/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1740026046, "node_id": "I_kwDOCGYnMM5ntrC-", "number": 556, "title": "Support storing incrementally piped values", "user": {"value": 601708, "label": "mcint"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-06-04T00:45:23Z", "updated_at": "2023-06-04T01:21:15Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "I'm trying to use sqlite-utils to data generated incrementally. There are a few\naspects of this that I don't currently know how to handle. I would like an option\nto apply writes incrementally, line-by-line as they are received. I would like an\noption to echo incremental progress. 
And, it would be nice to have \n\nIn particular, I'm using CoreLocationCLI -w -j to generate, newline-delimited JSON.\n\nOne variant of the command \n\n`stdbuf -oL CoreLocationCLI -w -j | pee 'sqlite-utils insert loc.db loc -' nl`\n\n`pee`, from `moreutils`, is like `tee` but spawns and pipes to the processes\ncreated by invoking each of its arguments, so, for gratuitous demonstration,\n`pee 'sponge out.log' cat` would behave like `tee`.\n\nIt looks like I can get what I want with:\n`stdbuf -oL CoreLocationCLI -w -j | while read line; do <<<\"$line\" sqlite-utils insert loc.db loc -; echo \"$line\"; done | nl`\n\n\n\n\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/556/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1740150327, "node_id": "I_kwDOCGYnMM5nuJY3", "number": 557, "title": "Aliased ROWID option for tables created from alter=True commands", "user": {"value": 7908073, "label": "chapmanjacobd"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2023-06-04T05:29:28Z", "updated_at": "2023-06-14T06:09:21Z", "closed_at": "2023-06-05T19:26:26Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "> If you use INTEGER PRIMARY KEY column, the VACUUM does not change the values of that column. However, if you use unaliased rowid, the VACUUM command will reset the rowid values.\r\n\r\nROWID should never be used with foreign keys but the simple act of aliasing rowid to id (which is what happens when one does `id integer primary key` DDL) makes it OK.\r\n\r\nIt would be convenient if there were more options to use a string column (eg. filepath) as the PK, and be able to use it during upserts, but when creating a foreign key, to create an integer column which aliases rowid\r\n\r\nI made an attempt to switch to integer primary keys here but it is not going well... In my usecase the path column is a business key. Yes, it should be as simple as including the `id` column in any select statement where I plan on using `upsert` but it would be nice if this could be abstracted away somehow https://github.com/chapmanjacobd/library/commit/788cd125be01d76f0fe2153335d9f6b21db1343c\r\n\r\nhttps://github.com/chapmanjacobd/library/actions/runs/5173602136/jobs/9319024777", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/557/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1801394744, "node_id": "I_kwDOCGYnMM5rXxo4", "number": 567, "title": "Plugin system", "user": {"value": 15178711, "label": "asg017"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 9, "created_at": "2023-07-12T17:02:14Z", "updated_at": "2023-07-22T22:59:37Z", "closed_at": "2023-07-22T22:59:36Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "I'd like there to be a plugin system for sqlite-utils, similar to the datasette/llm plugins. 
I'd like to make plugins that would do things like:\r\n\r\n- Register SQLite extensions for more SQL functions + virtual tables\r\n- Register new subcommands\r\n- Different input file formats for `sqlite-utils memory`\r\n- Different output file formats (in addition to `--csv` `--tsv` `--nl` etc.\r\n\r\nA few real-world use-cases of plugins I'd like to see in sqlite-utils:\r\n\r\n- Register many of my sqlite extensions in sqlite-utils (`sqlite-http`, `sqlite-lines`, `sqlite-regex`, etc.)\r\n- New subcommands to work with `sqlite-vss` vector tables\r\n- Input/ouput Parquet/Avro/Arrow IPC files with `sqlite-arrow`", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/567/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1821108702, "node_id": "I_kwDOCGYnMM5si-ne", "number": 579, "title": "Special handling for SQLite column of type `JSON`", "user": {"value": 15178711, "label": "asg017"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2023-07-25T20:37:23Z", "updated_at": "2023-07-25T20:37:23Z", "closed_at": null, "author_association": "CONTRIBUTOR", "pull_request": null, "body": "`sqlite-utils` should detect and have specially handling for column with a `JSON` column. For example:\r\n\r\n```sql\r\nCREATE TABLE \"dogs\" (\r\n id INTEGER PRIMARY KEY,\r\n name TEXT,\r\n friends JSON \r\n);\r\n```\r\n\r\n## Automatic Nesting\r\n\r\nAccording to [\"Nested JSON Values\"](https://sqlite-utils.datasette.io/en/stable/cli.html#nested-json-values), sqlite-utils will only expand JSON if the `--json-cols` flag is passed. It looks like it'll try to `json.load` all text column to test if its JSON, which can get expensive on non-json columns. \r\n\r\nInstead, `sqlite-utils` should be default (ie without the `--json-cols` flags) do the `maybe_json()` operation on columns with a declared `JSON` type. 
So the above table would expand the `\"friends\"` column as expected, withoutthe `--json-cols` flag:\r\n\r\n```bash\r\nsqlite-utils dogs.db \"select * from dogs\" | python -mjson.tool\r\n```\r\n\r\n```\r\n[\r\n {\r\n \"id\": 1,\r\n \"name\": \"Cleo\",\r\n \"friends\": [\r\n {\r\n \"name\": \"Pancakes\"\r\n },\r\n {\r\n \"name\": \"Bailey\"\r\n }\r\n ]\r\n }\r\n]\r\n```\r\n\r\n---\r\n\r\nI'm sure there's other ways `sqlite-utils` can specially handle JSON columns, so keeping this open while I think of more", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/579/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1976986318, "node_id": "I_kwDOCGYnMM511mrO", "number": 599, "title": "Cannot find spatialite on arm64 linux", "user": {"value": 37802088, "label": "MikeCoats"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2023-11-03T22:05:51Z", "updated_at": "2023-11-04T01:06:31Z", "closed_at": "2023-11-04T00:33:28Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Initially, I found an issue in `datasette` where it wouldn\u2019t find `spatialite` when running on my Radxa Rock 5B - an RK3588 powered SBC, running the arm64 build of Debian Bullseye. I confirmed the same behaviour on my Raspberry Pi 4 - a BCM2711 powered SBC, running the arm64 build of Debian Bookworm.\r\n\r\n```\r\n$ datasette --load-extension=spatialite example.db\r\nError: Could not find SpatiaLite extension\r\n```\r\n\r\nI did some digging and realised the issue originates in this project. Even with the `libsqlite3-mod-spatialite` package installed, `pytest` skips all of the GIS tests in the project.\r\n\r\n```\r\n$ apt list --installed | grep spatial\r\n[\u2026]\r\nlibsqlite3-mod-spatialite/stable,now 5.0.1-3 arm64 [installed]\r\n\r\n$ ls -l /usr/lib/*/*spatial*\r\nlrwxrwxrwx 1 root root 23 Dec 1 2022 /usr/lib/aarch64-linux-gnu/mod_spatialite.so -> mod_spatialite.so.7.1.0\r\nlrwxrwxrwx 1 root root 23 Dec 1 2022 /usr/lib/aarch64-linux-gnu/mod_spatialite.so.7 -> mod_spatialite.so.7.1.0\r\n-rw-r--r-- 1 root root 7348584 Dec 1 2022 /usr/lib/aarch64-linux-gnu/mod_spatialite.so.7.1.0\r\n```\r\n\r\n```\r\n$ pytest\r\ntests/test_get.py ...... [ 73%]\r\ntests/test_gis.py ssssssssssss [ 75%]\r\ntests/test_hypothesis.py .... [ 75%]\r\n```\r\n\r\nI tracked the issue down to the [`find_sqlite()` function in the `utils.py`](https://github.com/simonw/sqlite-utils/blob/622c3a5a7dd53a09c029e2af40c2643fe7579340/sqlite_utils/utils.py#L60) file. 
The [`SPATIALITE_PATHS`](https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/utils.py#L34-L39) array doesn\u2019t have an entry for the location of this module on arm64 linux.\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/599/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 403922644, "node_id": "MDU6SXNzdWU0MDM5MjI2NDQ=", "number": 8, "title": "Problems handling column names containing spaces or - ", "user": {"value": 82988, "label": "psychemedia"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2019-01-28T17:23:28Z", "updated_at": "2019-04-14T15:29:33Z", "closed_at": "2019-02-23T21:09:03Z", "author_association": "NONE", "pull_request": null, "body": "Irrrespective of whether using column names containing a space or - character is good practice, SQLite does allow it, but `sqlite-utils` throws an error in the following cases:\r\n\r\n```python\r\nfrom sqlite_utils import Database\r\n\r\ndbname = 'test.db'\r\nDB = Database(sqlite3.connect(dbname))\r\n\r\nimport pandas as pd\r\ndf = pd.DataFrame({'col1':range(3), 'col2':range(3)})\r\n\r\n#Convert pandas dataframe to appropriate list/dict format\r\nDB['test1'].insert_all( df.to_dict(orient='records') )\r\n#Works fine\r\n```\r\n\r\nHowever:\r\n\r\n```python\r\ndf = pd.DataFrame({'col 1':range(3), 'col2':range(3)})\r\nDB['test1'].insert_all(df.to_dict(orient='records'))\r\n```\r\n\r\nthrows:\r\n\r\n```\r\n---------------------------------------------------------------------------\r\nOperationalError Traceback (most recent call last)\r\n in ()\r\n 1 import pandas as pd\r\n 2 df = pd.DataFrame({'col 1':range(3), 'col2':range(3)})\r\n----> 3 DB['test1'].insert_all(df.to_dict(orient='records'))\r\n\r\n/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, upsert, batch_size, column_order)\r\n 327 jsonify_if_needed(record.get(key, None)) for key in all_columns\r\n 328 )\r\n--> 329 result = self.db.conn.execute(sql, values)\r\n 330 self.db.conn.commit()\r\n 331 self.last_id = result.lastrowid\r\n\r\nOperationalError: near \"1\": syntax error\r\n```\r\n\r\nand:\r\n\r\n```python\r\ndf = pd.DataFrame({'col-1':range(3), 'col2':range(3)})\r\nDB['test1'].upsert_all(df.to_dict(orient='records'))\r\n```\r\n\r\nresults in:\r\n\r\n```\r\n---------------------------------------------------------------------------\r\nOperationalError Traceback (most recent call last)\r\n in ()\r\n 1 import pandas as pd\r\n 2 df = pd.DataFrame({'col-1':range(3), 'col2':range(3)})\r\n----> 3 DB['test1'].insert_all(df.to_dict(orient='records'))\r\n\r\n/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, upsert, batch_size, column_order)\r\n 327 jsonify_if_needed(record.get(key, None)) for key in all_columns\r\n 328 )\r\n--> 329 result = self.db.conn.execute(sql, values)\r\n 330 self.db.conn.commit()\r\n 331 self.last_id = result.lastrowid\r\n\r\nOperationalError: near \"-\": syntax error\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": 
\"https://api.github.com/repos/simonw/sqlite-utils/issues/8/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 411066700, "node_id": "MDU6SXNzdWU0MTEwNjY3MDA=", "number": 10, "title": "Error in upsert if column named 'order'", "user": {"value": 82988, "label": "psychemedia"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2019-02-16T12:05:18Z", "updated_at": "2019-02-24T16:55:38Z", "closed_at": "2019-02-24T16:55:37Z", "author_association": "NONE", "pull_request": null, "body": "The following works fine:\r\n```\r\nconnX = sqlite3.connect('DELME.db', timeout=10)\r\n\r\ndfX=pd.DataFrame({'col1':range(3),'col2':range(3)})\r\nDBX = Database(connX)\r\nDBX['test'].upsert_all(dfX.to_dict(orient='records'))\r\n```\r\n\r\nBut if a column is named `order`:\r\n```\r\nconnX = sqlite3.connect('DELME.db', timeout=10)\r\n\r\ndfX=pd.DataFrame({'order':range(3),'col2':range(3)})\r\nDBX = Database(connX)\r\nDBX['test'].upsert_all(dfX.to_dict(orient='records'))\r\n```\r\n\r\nit throws an error:\r\n\r\n```\r\n---------------------------------------------------------------------------\r\nOperationalError Traceback (most recent call last)\r\n in \r\n 3 dfX=pd.DataFrame({'order':range(3),'col2':range(3)})\r\n 4 DBX = Database(connX)\r\n----> 5 DBX['test'].upsert_all(dfX.to_dict(orient='records'))\r\n\r\n/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in upsert_all(self, records, pk, foreign_keys, column_order)\r\n 347 foreign_keys=foreign_keys,\r\n 348 upsert=True,\r\n--> 349 column_order=column_order,\r\n 350 )\r\n 351 \r\n\r\n/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, upsert, batch_size, column_order)\r\n 327 jsonify_if_needed(record.get(key, None)) for key in all_columns\r\n 328 )\r\n--> 329 result = self.db.conn.execute(sql, values)\r\n 330 self.db.conn.commit()\r\n 331 self.last_id = result.lastrowid\r\n\r\nOperationalError: near \"order\": syntax error\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/10/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 432727685, "node_id": "MDU6SXNzdWU0MzI3Mjc2ODU=", "number": 20, "title": "JSON column values get extraneously quoted ", "user": {"value": 649467, "label": "mhalle"}, "state": "closed", "locked": 0, "assignee": null, "milestone": {"value": 4348046, "label": "1.0"}, "comments": 1, "created_at": "2019-04-12T20:15:30Z", "updated_at": "2019-05-25T00:57:19Z", "closed_at": "2019-05-25T00:57:19Z", "author_association": "NONE", "pull_request": null, "body": "If the input to `sqlite-utils insert` includes a column that is a JSON array or object, `sqlite-utils query` will introduce an extra level of quoting on output:\r\n\r\n```\r\n# echo '[{\"key\": [\"one\", \"two\", \"three\"]}]' | sqlite-utils insert t.db t -\r\n\r\n# sqlite-utils t.db 'select * from t'\r\n[{\"key\": \"[\\\"one\\\", \\\"two\\\", \\\"three\\\"]\"}]\r\n\r\n# sqlite3 t.db 'select * from t'\r\n[\"one\", \"two\", \"three\"]\r\n```\r\n\r\nThis might require an imperfect solution, since sqlite3 doesn't have a JSON type. 
Perhaps fields that start with `[\"` or `{\"` and end with `\"]` or `\"}` could be detected, with a flag to turn off that behavior for weird text fields (or vice versa).", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/20/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 449818897, "node_id": "MDU6SXNzdWU0NDk4MTg4OTc=", "number": 24, "title": "Additional Column Constraints?", "user": {"value": 98555, "label": "IgnoredAmbience"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 6, "created_at": "2019-05-29T13:47:03Z", "updated_at": "2019-06-13T06:47:17Z", "closed_at": "2019-06-13T06:30:26Z", "author_association": "NONE", "pull_request": null, "body": "I'm looking to import data from XML with a pre-defined schema that maps fairly closely to a relational database.\r\nIn particular, it has explicit annotations for when fields are required, optional, or when a default value should be inferred.\r\n\r\nWould there be value in adding the ability to define `NOT NULL` and `DEFAULT` column constraints to sqlite-utils?", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/24/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 480961330, "node_id": "MDU6SXNzdWU0ODA5NjEzMzA=", "number": 54, "title": "Ability to list views, and to access db[\"view_name\"].rows / rows_where / etc", "user": {"value": 20264, "label": "ftrain"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2019-08-15T02:00:28Z", "updated_at": "2019-08-23T12:41:09Z", "closed_at": "2019-08-23T12:20:15Z", "author_association": "NONE", "pull_request": null, "body": "The docs show me how to create a view via `db.create_view()` but I can't seem to get back to that view post-creation; if I query it as a table it returns `None`, and it doesn't appear in the table listing, even though querying the view works fine from inside the sqlite3 command-line.\r\n\r\nIt'd be great to have the view as a pseudo-table, or if the python/sqlite3 module makes that hard to pull off (I couldn't figure it out), to have that edge-case documented next to the `db.create_view()` docs.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/54/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 491219910, "node_id": "MDU6SXNzdWU0OTEyMTk5MTA=", "number": 61, "title": "importing CSV to SQLite as library", "user": {"value": 17739, "label": "witeshadow"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2019-09-09T17:12:40Z", "updated_at": "2019-11-04T16:25:01Z", "closed_at": "2019-11-04T16:25:01Z", "author_association": "NONE", 
"pull_request": null, "body": "CSV can be imported to SQLite when used CLI, but I don't see documentation for when using as library. ", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/61/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 500783373, "node_id": "MDU6SXNzdWU1MDA3ODMzNzM=", "number": 62, "title": "[enhancement] Method to delete a row in python", "user": {"value": 4454869, "label": "Sergeileduc"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2019-10-01T09:45:47Z", "updated_at": "2019-11-04T16:30:34Z", "closed_at": "2019-11-04T16:18:18Z", "author_association": "NONE", "pull_request": null, "body": "Hi !\r\nThanks for the lib !\r\n\r\nObviously, every possible sql queries won't have a dedicated method.\r\n\r\nBut I was thinking : a method to delete a row (I'm terrible with names, maybe `delete_where()` or something, would be useful.\r\n\r\nI have a Database, with primary key.\r\n\r\nFor the moment, I use :\r\n\r\n```Python3\r\ndb.conn.execute(f\"DELETE FROM table WHERE key = {key_id}\")\r\ndb.conn.commit()\r\n```\r\nto delete a row I don't need anymore, giving his primary key.\r\n\r\nWorks like a charm.\r\n\r\nJust an idea :\r\n\r\n```Python3\r\ntable.delete_where_pkey({'key': key_id})\r\n```\r\nor something (I know, I'm terrible at naming methods...).\r\n\r\nPros : well, no need to write SQL query.\r\n\r\nCons : WHERE normally allows to do many more things (operators =, <>, >, <, BETWEEN), not to mention AND, OR, etc...\r\nMethod is maybe to specific, and/or a pain to render more flexible.\r\n\r\nAgain, just a thought. 
Writing his own sql works too, so...\r\n\r\nThanks again.\r\nSee yah.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/62/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 534507142, "node_id": "MDU6SXNzdWU1MzQ1MDcxNDI=", "number": 69, "title": "Feature request: enable extensions loading", "user": {"value": 30607, "label": "aborruso"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2019-12-08T08:06:25Z", "updated_at": "2022-02-05T00:04:25Z", "closed_at": "2020-10-16T18:42:49Z", "author_association": "NONE", "pull_request": null, "body": "Hi, it would be great to add a parameter that enables the load of a sqlite extension you need.\r\n\r\nSomething like \"-ext modspatialite\".\r\n\r\nIn this way your great tool would be even more comfortable and powerful.\r\n\r\n\r\nThank you very much", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/69/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 539204432, "node_id": "MDU6SXNzdWU1MzkyMDQ0MzI=", "number": 70, "title": "Implement ON DELETE and ON UPDATE actions for foreign keys", "user": {"value": 26292069, "label": "LucasElArruda"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2019-12-17T17:19:10Z", "updated_at": "2020-02-27T04:18:53Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Hi! I did not find any mention on the library about ON DELETE and ON UPDATE actions for foreign keys. Are those expected to be implemented? 
If not, it would be a nice thing to include!", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/70/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 545407916, "node_id": "MDU6SXNzdWU1NDU0MDc5MTY=", "number": 73, "title": "upsert_all() throws issue when upserting to empty table", "user": {"value": 82988, "label": "psychemedia"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 6, "created_at": "2020-01-05T11:58:57Z", "updated_at": "2020-01-31T14:21:09Z", "closed_at": "2020-01-05T17:20:18Z", "author_association": "NONE", "pull_request": null, "body": "If I try to add a list of `dict`s to an empty table using `upsert_all`, I get an error:\r\n\r\n```python\r\nimport sqlite3\r\nfrom sqlite_utils import Database\r\nimport pandas as pd\r\n\r\nconx = sqlite3.connect(':memory')\r\ncx = conx.cursor()\r\ncx.executescript('CREATE TABLE \"test\" (\"Col1\" TEXT);')\r\n\r\nq=\"SELECT * FROM test;\"\r\npd.read_sql(q, conx) #shows empty table\r\n\r\ndb = Database(conx)\r\ndb['test'].upsert_all([{'Col1':'a'},{'Col1':'b'}])\r\n\r\n---------------------------------------------------------------------------\r\nTypeError Traceback (most recent call last)\r\n in \r\n 1 db = Database(conx)\r\n----> 2 db['test'].upsert_all([{'Col1':'a'},{'Col1':'b'}])\r\n\r\n/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in upsert_all(self, records, pk, foreign_keys, column_order, not_null, defaults, batch_size, hash_id, alter, extracts)\r\n 1157 alter=alter,\r\n 1158 extracts=extracts,\r\n-> 1159 upsert=True,\r\n 1160 )\r\n 1161 \r\n\r\n/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, column_order, not_null, defaults, batch_size, hash_id, alter, ignore, replace, extracts, upsert)\r\n 1040 sql = \"INSERT OR IGNORE INTO [{table}]({pks}) VALUES({pk_placeholders});\".format(\r\n 1041 table=self.name,\r\n-> 1042 pks=\", \".join([\"[{}]\".format(p) for p in pks]),\r\n 1043 pk_placeholders=\", \".join([\"?\" for p in pks]),\r\n 1044 )\r\n\r\nTypeError: 'NoneType' object is not iterable\r\n\r\n```\r\n\r\nA hacky workaround in use is:\r\n\r\n```python\r\ntry:\r\n db['test'].upsert_all([{'Col1':'a'},{'Col1':'b'}])\r\nexcept:\r\n db['test'].insert_all([{'Col1':'a'},{'Col1':'b'}])\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/73/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 549287310, "node_id": "MDU6SXNzdWU1NDkyODczMTA=", "number": 76, "title": "order_by mechanism", "user": {"value": 10501166, "label": "metab0t"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2020-01-14T02:06:03Z", "updated_at": "2020-04-16T06:23:29Z", "closed_at": "2020-04-16T03:13:06Z", "author_association": "NONE", "pull_request": null, "body": "In some cases, I want to iterate rows in a table with `ORDER BY` clause. 
It would be nice to have a `rows_order_by` function similar to `rows_where`.\r\nIn a more general case, `rows_filter` function might be added to allow more customized filtering to iterate rows.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/76/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 559197745, "node_id": "MDU6SXNzdWU1NTkxOTc3NDU=", "number": 82, "title": "Tutorial command no longer works", "user": {"value": 10350886, "label": "petey284"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2020-02-03T16:36:11Z", "updated_at": "2020-02-27T04:16:43Z", "closed_at": "2020-02-27T04:16:30Z", "author_association": "NONE", "pull_request": null, "body": "Issue with command on [tutorial](https://simonwillison.net/2019/Feb/25/sqlite-utils/) on Simon's site.\r\n\r\nThe following command no longer works, and breaks with the previous too many variables error: #50\r\n\r\n``` cmd\r\n> curl \"https://data.nasa.gov/resource/y77d-th95.json\" | \\\r\n sqlite-utils insert meteorites.db meteorites - --pk=id\r\n```\r\n\r\nOutput:\r\n``` cmd\r\nTraceback (most recent call last):\r\n File \"continuum\\miniconda3\\envs\\main\\lib\\runpy.py\", line 193, in _run_module_as_main\r\n \"__main__\", mod_spec)\r\n File \"continuum\\miniconda3\\envs\\main\\lib\\runpy.py\", line 85, in _run_code\r\n exec(code, run_globals)\r\n File \"Continuum\\miniconda3\\envs\\main\\Scripts\\sqlite-utils.exe\\__main__.py\", line 9, in \r\n File \"continuum\\miniconda3\\envs\\main\\lib\\site-packages\\click\\core.py\", line 764, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"continuum\\miniconda3\\envs\\main\\lib\\site-packages\\click\\core.py\", line 717, in main\r\n rv = self.invoke(ctx)\r\n File \"continuum\\miniconda3\\envs\\main\\lib\\site-packages\\click\\core.py\", line 1137, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"continuum\\miniconda3\\envs\\main\\lib\\site-packages\\click\\core.py\", line 956, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"continuum\\miniconda3\\envs\\main\\lib\\site-packages\\click\\core.py\", line 555, in invoke\r\n return callback(*args, **kwargs)\r\n File \"continuum\\miniconda3\\envs\\main\\lib\\site-packages\\sqlite_utils\\cli.py\", line 434, in insert\r\n default=default,\r\n File \"continuum\\miniconda3\\envs\\main\\lib\\site-packages\\sqlite_utils\\cli.py\", line 384, in insert_upsert_implementation\r\n docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs\r\n File \"continuum\\miniconda3\\envs\\main\\lib\\site-packages\\sqlite_utils\\db.py\", line 1081, in insert_all\r\n result = self.db.conn.execute(query, params)\r\nsqlite3.OperationalError: too many SQL variables\r\n```\r\n\r\nMy thought is that maybe the dataset grew over the last few years and so didn't run into this issue before.\r\n\r\nNo error when I reduce the count of entries to 83. 
Once the number of entries hits 84 the command fails.\r\n\r\n// This passes\r\n``` cmd\r\ntype meteorite_83.txt | sqlite-utils insert meteorites.db meteorites - --pk=id\r\n```\r\n\r\n// But this fails\r\n``` cmd\r\ntype meteorite_84.txt | sqlite-utils insert meteorites.db meteorites - --pk=id\r\n```\r\n\r\nA potential fix might be to chunk the incoming data? I can work on a PR if pointed in right direction.\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/82/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 564579430, "node_id": "MDU6SXNzdWU1NjQ1Nzk0MzA=", "number": 86, "title": "Problem with square bracket in CSV column name", "user": {"value": 8149512, "label": "foscoj"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 7, "created_at": "2020-02-13T10:19:57Z", "updated_at": "2020-02-27T04:16:08Z", "closed_at": "2020-02-27T04:16:07Z", "author_association": "NONE", "pull_request": null, "body": "testing some data from european power information (entsoe.eu), the title of the csv contains square brackets.\r\nas I am playing with glitch, sqlite-utils are used for creating the db.\r\n\r\nTraceback (most recent call last):\r\n\r\n File \"/app/.local/bin/sqlite-utils\", line 8, in \r\n\r\n sys.exit(cli())\r\n\r\n File \"/app/.local/lib/python3.7/site-packages/click/core.py\", line 764, in __call__\r\n\r\n return self.main(*args, **kwargs)\r\n\r\n File \"/app/.local/lib/python3.7/site-packages/click/core.py\", line 717, in main\r\n\r\n rv = self.invoke(ctx)\r\n\r\n File \"/app/.local/lib/python3.7/site-packages/click/core.py\", line 1137, in invoke\r\n\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n\r\n File \"/app/.local/lib/python3.7/site-packages/click/core.py\", line 956, in invoke\r\n\r\n return ctx.invoke(self.callback, **ctx.params)\r\n\r\n File \"/app/.local/lib/python3.7/site-packages/click/core.py\", line 555, in invoke\r\n\r\n return callback(*args, **kwargs)\r\n\r\n File \"/app/.local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 434, in insert\r\n\r\n default=default,\r\n\r\n File \"/app/.local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 384, in insert_upsert_implementation\r\n\r\n docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs\r\n\r\n File \"/app/.local/lib/python3.7/site-packages/sqlite_utils/db.py\", line 997, in insert_all\r\n\r\n extracts=extracts,\r\n\r\n File \"/app/.local/lib/python3.7/site-packages/sqlite_utils/db.py\", line 618, in create\r\n\r\n extracts=extracts,\r\n\r\n File \"/app/.local/lib/python3.7/site-packages/sqlite_utils/db.py\", line 310, in create_table\r\n\r\n self.conn.execute(sql)\r\n\r\nsqlite3.OperationalError: unrecognized token: \"]\"\r\n\r\nentsoe_2016.csv\r\n\r\nrenamed to txt for uploading compatibility\r\n\r\n[entsoe_2016.txt](https://github.com/simonw/sqlite-utils/files/4197688/entsoe_2016.txt)\r\n\r\ncode is remixed directly from your https://glitch.com/edit/#!/datasette-csvs repo\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/86/reactions\", \"total_count\": 0, \"+1\": 0, 
\"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 577302229, "node_id": "MDU6SXNzdWU1NzczMDIyMjk=", "number": 91, "title": "Enable ordering FTS results by rank", "user": {"value": 416374, "label": "gfrmin"}, "state": "closed", "locked": 0, "assignee": null, "milestone": {"value": 6079500, "label": "3.0"}, "comments": 1, "created_at": "2020-03-07T08:43:51Z", "updated_at": "2020-11-06T23:53:26Z", "closed_at": "2020-11-06T23:53:25Z", "author_association": "NONE", "pull_request": null, "body": "According to https://www.sqlite.org/fts5.html (not sure about FTS4) results can be sorted by relevance. At the moment results are returned by default by `rowid`. Perhaps a flag can be added to the `search` method?", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/91/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 593751293, "node_id": "MDU6SXNzdWU1OTM3NTEyOTM=", "number": 97, "title": "Adding a \"recreate\" flag to the `Database` constructor", "user": {"value": 1448859, "label": "betatim"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2020-04-04T05:41:10Z", "updated_at": "2020-04-15T14:29:31Z", "closed_at": "2020-04-13T03:52:29Z", "author_association": "NONE", "pull_request": null, "body": "I have a [script](https://github.com/betatim/binder-datasette/blob/master/create-db.ipynb) that imports data into a sqlite DB. When I re-run that script I'd like to remove the existing sqlite DB, instead of adding to it. The pragmatic answer is to add the check and file deletion to my script.\r\n\r\nHowever I thought it would be easy and useful for others to add a `recreate=True` flag to `db = sqlite_utils.Database(\"binder-launches.db\")`. After taking a look at the code for it I am not so sure any more. This is because the connection string could be a URL (or \"connection string\") like `\"file:///tmp/foo.db\"`. 
I don't know what the equivalent of `os.path.exists()` is for a connection string or how to detect that something is a connection string and raise an error \"can't use recreate=True and conn_string at the same time\".\r\n\r\nDoes anyone have an idea/suggestion where to start investigating?", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/97/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 613755043, "node_id": "MDU6SXNzdWU2MTM3NTUwNDM=", "number": 110, "title": "Support decimal.Decimal type", "user": {"value": 134771, "label": "dvhthomas"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 6, "created_at": "2020-05-07T03:57:19Z", "updated_at": "2020-05-11T01:58:20Z", "closed_at": "2020-05-11T01:50:11Z", "author_association": "NONE", "pull_request": null, "body": "Decimal types in Postgres cause a failure in db.py data type selection\r\n---\r\nI have a Django app using a MoneyField, which uses a `numeric(14,0)` data type in Postgres (https://www.postgresql.org/docs/9.3/datatype-numeric.html). When attempting to export that table I get the following error:\r\n\r\n```bash\r\n$ db-to-sqlite --table isaweb_proposal \"postgres://connection\" test.db\r\n....\r\n column_type=COLUMN_TYPE_MAPPING[column_type],\r\nKeyError: \r\n```\r\n\r\nLooking at `sql_utils.db.py` at 292-ish it's clear that there is no matching type for what I assume SQLAlchemy interprets as Python decimal.Decimal.\r\n\r\nFrom the [SQLite docs](https://www.sqlite.org/datatype3.html#affinity_name_examples) it looks like DECIMAL in other DBs are considered numeric.\r\n\r\nI'm not quite sure if it's as simple as adding a data type to that list or if there are repercussions beyond it.\r\n\r\nThanks for a great tool!", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/110/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 702386948, "node_id": "MDU6SXNzdWU3MDIzODY5NDg=", "number": 159, "title": ".delete_where() does not auto-commit (unlike .insert() or .upsert())", "user": {"value": 11712349, "label": "spdkils"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 9, "created_at": "2020-09-16T01:55:52Z", "updated_at": "2023-04-01T17:21:05Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "When you use the delete_where() function on a table, it never commits....\r\n\r\nIs that intentional?", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/159/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 707407567, "node_id": "MDU6SXNzdWU3MDc0MDc1Njc=", "number": 171, "title": "Idea: transitive closure tables for tree structures", "user": 
{"value": 649467, "label": "mhalle"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2020-09-23T14:17:33Z", "updated_at": "2020-10-22T04:38:35Z", "closed_at": "2020-10-22T04:07:14Z", "author_association": "NONE", "pull_request": null, "body": "I just read that sqlite has a transitive closure table extension using a virtual table in order to represent trees:\r\n\r\nhttps://charlesleifer.com/blog/querying-tree-structures-in-sqlite-using-python-and-the-transitive-closure-extension/\r\n\r\nEven without this extension, though, a util to build a transitive closure table would allow trees to be queried easily. Since it relies on self-referential foreign keys, the relationships might even be able to be automatically detected. ", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/171/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 711649325, "node_id": "MDU6SXNzdWU3MTE2NDkzMjU=", "number": 182, "title": "Better handling of encodings other than utf-8 for \"sqlite-utils insert\"", "user": {"value": 765871, "label": "kaihendry"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2020-09-30T05:43:48Z", "updated_at": "2020-10-16T17:20:41Z", "closed_at": "2020-10-16T17:18:52Z", "author_association": "NONE", "pull_request": null, "body": "Makefile:\r\n```\r\ndata.db:\r\n curl -O http://maps.natalian.org/data.txt\r\n go run csv-write.go > data.csv\r\n sqlite-utils insert data.db travels data.csv --csv\r\n\r\nclean:\r\n rm data*\r\n```\r\n[csv-write.go](https://gist.github.com/kaihendry/dff2442de20d73f900026d13bf7a11d9)\r\n\r\n\r\nError message is:\r\n\r\n```\r\nsqlite-utils insert data.db travels data.csv --csv\r\nTraceback (most recent call last):\r\n File \"/home/hendry/.local/bin/sqlite-utils\", line 8, in \r\n sys.exit(cli())\r\n File \"/home/hendry/.local/lib/python3.8/site-packages/click/core.py\", line 829, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/home/hendry/.local/lib/python3.8/site-packages/click/core.py\", line 782, in main\r\n rv = self.invoke(ctx)\r\n File \"/home/hendry/.local/lib/python3.8/site-packages/click/core.py\", line 1259, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/home/hendry/.local/lib/python3.8/site-packages/click/core.py\", line 1066, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/home/hendry/.local/lib/python3.8/site-packages/click/core.py\", line 610, in invoke\r\n return callback(*args, **kwargs)\r\n File \"/home/hendry/.local/lib/python3.8/site-packages/sqlite_utils/cli.py\", line 614, in insert\r\n insert_upsert_implementation(\r\n File \"/home/hendry/.local/lib/python3.8/site-packages/sqlite_utils/cli.py\", line 553, in insert_upsert_implementation\r\n headers = next(reader)\r\n File \"/usr/lib/python3.8/codecs.py\", line 322, in decode\r\n (result, consumed) = self._buffer_decode(data, self.errors, final)\r\nUnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 1234: invalid continuation byte\r\nmake: *** [Makefile:4: data.db] Error 1\r\n[hendry@t14s datasette-map]$ sqlite-utils --version\r\nsqlite-utils, version 2.19\r\n```\r\n\r\nLittle bit surprised if Go is 
spewing out bad Unicode, but I'm not sure how to grok `position 1234`..\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/182/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 723708310, "node_id": "MDU6SXNzdWU3MjM3MDgzMTA=", "number": 188, "title": "About loading spatialite", "user": {"value": 30607, "label": "aborruso"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2020-10-17T08:47:02Z", "updated_at": "2022-02-05T00:04:26Z", "closed_at": "2020-10-17T08:52:58Z", "author_association": "NONE", "pull_request": null, "body": "Hi @simonw ,\r\nIf I run\r\n\r\n```\r\nsqlite3\r\n.load /usr/local/lib/mod_spatialite.so\r\nselect spatialite_version();\r\n```\r\n\r\nI have `5.0.0`.\r\n\r\n![image](https://user-images.githubusercontent.com/30607/96332706-d8cd3300-1065-11eb-906b-daf99963198e.png)\r\n\r\n\r\nIf I run\r\n\r\n```\r\nsqlite-utils :memory: \"select spatialite_version()\" --load-extension=spatialite\r\n```\r\n\r\nI have\r\n\r\n```\r\nTraceback (most recent call last):\r\n File \"/home/aborruso/.local/bin/sqlite-utils\", line 8, in \r\n sys.exit(cli())\r\n File \"/home/aborruso/.local/lib/python3.8/site-packages/click/core.py\", line 829, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/home/aborruso/.local/lib/python3.8/site-packages/click/core.py\", line 782, in main\r\n rv = self.invoke(ctx)\r\n File \"/home/aborruso/.local/lib/python3.8/site-packages/click/core.py\", line 1259, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/home/aborruso/.local/lib/python3.8/site-packages/click/core.py\", line 1066, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/home/aborruso/.local/lib/python3.8/site-packages/click/core.py\", line 610, in invoke\r\n return callback(*args, **kwargs)\r\n File \"/home/aborruso/.local/lib/python3.8/site-packages/sqlite_utils/cli.py\", line 936, in query\r\n _load_extensions(db, load_extension)\r\n File \"/home/aborruso/.local/lib/python3.8/site-packages/sqlite_utils/cli.py\", line 1326, in _load_extensions\r\n db.conn.load_extension(ext)\r\nTypeError: argument 1 must be str, not None\r\n```\r\n\r\nHow do I properly load the spatialite extension in sqlite-utils?\r\n\r\nThank you very much", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/188/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 760960559, "node_id": "MDU6SXNzdWU3NjA5NjA1NTk=", "number": 205, "title": "sqlite3.OperationalError: near \"(\": syntax error", "user": {"value": 765871, "label": "kaihendry"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2020-12-10T06:44:40Z", "updated_at": "2020-12-10T19:18:22Z", "closed_at": "2020-12-10T07:24:23Z", "author_association": "NONE", "pull_request": null, "body": "The sqlite version is 3.22.0 2018-01-22 18:45:57 
0c55d179733b46d8d0ba4d88e01a25e10677046ee3da1d5b1581e86726f2alt1\r\nsqlite-utils, version 3.0\r\n\r\nIt fails here: https://github.com/kaihendry/aws-partners-datasette/runs/1528432635?check_suite_focus=true\r\n\r\nI'm not sure where the problem is, since it works _fine locally_ on Archlinux system running 3.34.0 2020-12-01 16:14:00 a26b6597e3ae272231b96f9982c3bcc17ddec2f2b6eb4df06a224b91089fed5b\r\n\r\n\r\nhttps://github.com/kaihendry/aws-partners-datasette/blob/main/create-summary-view.sh\r\n\r\nMaybe I need to bump up from ubuntu-latest to ?\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/205/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 767685961, "node_id": "MDU6SXNzdWU3Njc2ODU5NjE=", "number": 210, "title": "Support of RData files", "user": {"value": 23739126, "label": "PeterBailey"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2020-12-15T15:04:14Z", "updated_at": "2021-01-02T00:02:40Z", "closed_at": "2021-01-02T00:02:40Z", "author_association": "NONE", "pull_request": null, "body": "Hi Simon,\r\n\r\nWould be great if you could ingest RData files!\r\n\r\nI could do this in a few lines of code but I am too lazy - sorry!\r\n\r\nPeter", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/210/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 783778672, "node_id": "MDU6SXNzdWU3ODM3Nzg2NzI=", "number": 220, "title": "Better error message for *_fts methods against views", "user": {"value": 649467, "label": "mhalle"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-01-11T23:24:00Z", "updated_at": "2021-02-22T20:44:51Z", "closed_at": "2021-02-14T22:34:26Z", "author_association": "NONE", "pull_request": null, "body": "enable_fts and its related methods only work on tables, not views. 
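\r\n\r\nA minimal sketch of the situation (table and view names invented; on current versions the last line fails with an unhelpful error rather than a clear message):\r\n\r\n```python\r\nimport sqlite_utils\r\n\r\ndb = sqlite_utils.Database(\"data.db\")\r\ndb[\"articles\"].insert({\"title\": \"hello\"})\r\ndb.create_view(\"articles_view\", \"select title from articles\")\r\n\r\n# db[\"articles_view\"] is a View, not a Table, so it has no enable_fts:\r\ndb[\"articles_view\"].enable_fts([\"title\"])\r\n```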
\r\n\r\nCould those methods and possibly others move up to the Queryable superclass?\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/220/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 807174161, "node_id": "MDU6SXNzdWU4MDcxNzQxNjE=", "number": 227, "title": "Error reading csv files with large column data", "user": {"value": 295329, "label": "camallen"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2021-02-12T11:51:47Z", "updated_at": "2021-02-16T11:48:03Z", "closed_at": "2021-02-14T21:17:19Z", "author_association": "NONE", "pull_request": null, "body": "*Feel free to close this issue - I mostly added it for reference for future folks that run into this :)*\r\n\r\nI have a CSV file with one column that has very long strings. When i try to import this file via the `insert` command I get the following error: \r\n```\r\nsqlite-utils insert database.db table_name file_with_large_column.csv\r\n\r\nTraceback (most recent call last):\r\n File \"/usr/local/bin/sqlite-utils\", line 10, in \r\n sys.exit(cli())\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 829, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 782, in main\r\n rv = self.invoke(ctx)\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 1259, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 1066, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 610, in invoke\r\n return callback(*args, **kwargs)\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 774, in insert\r\n default=default,\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 705, in insert_upsert_implementation\r\n docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py\", line 1852, in insert_all\r\n first_record = next(records)\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 703, in \r\n docs = (decode_base64_values(doc) for doc in docs)\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 681, in \r\n docs = (dict(zip(headers, row)) for row in reader)\r\n_csv.Error: field larger than field limit (131072)\r\n```\r\nBuilt with the docker image `datasetteproject/datasette:0.54` with the following versions:\r\n```\r\n# sqlite-utils --version\r\nsqlite-utils, version 3.4.1\r\n\r\n# datasette --version\r\ndatasette, version 0.54\r\n```\r\nIt appears this is a [known issue](https://stackoverflow.com/a/54517228/2761423) reading in csv files in python and [doesn't look to be modifiable](https://github.com/python/cpython/blob/ea46579067fd2d4e164d6605719ffec690c4d621/Modules/_csv.c#L1685) through system / env vars (i may be very wrong on this).\r\n\r\nNoting that using sqlite3 `import` command work without error (not using the python csv reader)\r\n```\r\nsqlite3 database.db\r\nsqlite> .mode csv\r\nsqlite> .import file_with_large_column.csv 
table_name\r\n```\r\nSadly I couldn't see an easy way around this while using the CLI, as it appears this value needs to be changed in Python code. FWIW I've switched to using https://datasette.io/tools/csvs-to-sqlite for importing csv data and it's working well. \r\n\r\nFinally, I'm loving https://datasette.io/ thank you very much for an amazing tool and data ecosystem \ud83d\ude47\u200d\u2640\ufe0f ", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/227/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 807817197, "node_id": "MDU6SXNzdWU4MDc4MTcxOTc=", "number": 229, "title": "Hitting `_csv.Error: field larger than field limit (131072)`", "user": {"value": 631242, "label": "frosencrantz"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-02-13T19:52:44Z", "updated_at": "2021-02-14T21:33:33Z", "closed_at": "2021-02-14T21:33:33Z", "author_association": "NONE", "pull_request": null, "body": "I have a csv file where one of the fields is so large that it throws an exception with this error and stops loading:\r\n```\r\n_csv.Error: field larger than field limit (131072)\r\n```\r\n\r\nThe stack trace occurs here: https://github.com/simonw/sqlite-utils/blob/3.1/sqlite_utils/cli.py#L633\r\n\r\n\r\nThere is a way to handle this that helps:\r\nhttps://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072\r\n\r\nOne issue I had with this problem was that sqlite-utils only provides limited context as to where the problem line is.\r\nThere is the progress bar, but that is by percent rather than by line number. It would have been helpful if it could have provided a line number.\r\n\r\nAlso, it would have been useful if it had allowed the loading to continue with later lines.\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/229/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 810618495, "node_id": "MDU6SXNzdWU4MTA2MTg0OTU=", "number": 235, "title": "Extract columns cannot create foreign key relation: sqlite3.OperationalError: table sqlite_master may not be modified", "user": {"value": 6913891, "label": "kristomi"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 18, "created_at": "2021-02-17T23:33:23Z", "updated_at": "2023-06-26T01:47:01Z", "closed_at": "2023-06-25T23:25:53Z", "author_association": "NONE", "pull_request": null, "body": "Thanks for what seems like a truly great suite of libraries. I wanted to try out Datasette, but never got more than half way through your YouTube video with the SF tree dataset. Whenever I try to extract a column, I get a `sqlite3.OperationalError: table sqlite_master may not be modified` error from Python. 
This snippet reproduces the error on my system, Python 3.9.1 and sqlite-utils 3.5 on an M1 Macbook Pro running in rosetta mode:\r\n```\r\ncurl \"https://data.nasa.gov/resource/y77d-th95.json\" | \\\r\n sqlite-utils insert meteorites.db meteorites - --pk=id\r\nsqlite-utils extract meteorites.db meteorites recclass\r\n```\r\n\r\nI have tried googling the problem, but all I've found is that this *might* be a problem with the sqlite3 database running in defensive mode, but I definitely can't know for sure. Does the problem seem familiar to you?\r\n\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/235/reactions\", \"total_count\": 3, \"+1\": 3, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 815554385, "node_id": "MDU6SXNzdWU4MTU1NTQzODU=", "number": 237, "title": "db[\"my_table\"].drop(ignore=True) parameter, plus sqlite-utils drop-table --ignore and drop-view --ignore", "user": {"value": 649467, "label": "mhalle"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-02-24T14:55:06Z", "updated_at": "2021-02-25T17:11:41Z", "closed_at": "2021-02-25T17:11:41Z", "author_association": "NONE", "pull_request": null, "body": "When I'm generating a derived table in python, I often drop the table and create it from scratch. However, the first time I generate the table, it doesn't exist, so the drop raises an exception. That means more boilerplate.\r\n\r\nI was going to submit a pull request that adds an \"if_exists\" option to the `drop` method of tables and views. 
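\r\n\r\nRoughly this shape (the `if_exists` keyword here is the proposal, not an existing argument):\r\n\r\n```python\r\nimport sqlite3\r\nimport sqlite_utils\r\n\r\ndb = sqlite_utils.Database(\"derived.db\")\r\n\r\n# today: swallow the error on the first run, when the table doesn't exist yet\r\ntry:\r\n    db[\"derived\"].drop()\r\nexcept sqlite3.OperationalError:\r\n    pass\r\n\r\n# proposed, mirroring SQL's DROP TABLE IF EXISTS\r\ndb[\"derived\"].drop(if_exists=True)\r\n```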
\r\n\r\nHowever, for a utility like sqlite_utils, perhaps the \"IF EXISTS\" SQL semantics is what you want most of the time, and thus should be the default.\r\n\r\nWhat do you think?", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/237/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 818684978, "node_id": "MDU6SXNzdWU4MTg2ODQ5Nzg=", "number": 243, "title": "How can i use this utils to deal with fts on column meta of tables ?", "user": {"value": 27874014, "label": "svjack"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-03-01T09:45:05Z", "updated_at": "2021-03-01T09:45:05Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Thank you for releasing this bravo project.\r\nWhen I use this project on a multi-table db, I want to implement convenient search on column names from different tables. I want to develop a meta table to save the metadata of different columns of different tables, and search on this meta table to get rows from the data table (which the meta table describes). Does this project provide some simple function for this?\r\n\r\nYou can think of it as: I have a knowledge graph about the tables in the db, and I save this knowledge graph into the db with FTS enabled.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/243/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 836829560, "node_id": "MDU6SXNzdWU4MzY4Mjk1NjA=", "number": 248, "title": "support for Apache Arrow / parquet files I/O", "user": {"value": 649467, "label": "mhalle"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-03-20T14:59:30Z", "updated_at": "2021-10-28T23:46:48Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "I just started looking at Apache Arrow using pyarrow for import and export of tabular datasets, and it looks quite compelling. It might be worth looking at for sqlite-utils and/or datasette.\r\n\r\nAs a test, I took a random jsonl data dump of a dataset I have with floats, strings, and ints and converted it to arrow's parquet format using the naive `pyarrow.parquet.write_file()` command, which has automatic type inference. It compressed down to 7% of the original size. Conversion of a 26MB JSON file and serializing it to parquet was eyeblink instantaneous. Parquet files are portable and can be directly imported into pandas and other analytics software. \r\n\r\nThe only hangup is the automatic type inference of the naive reader. It's great for general laziness and for parsing JSON columns (it correctly interpreted a table of mine with a JSON array). However, I did get an exception for a string column where most entries looked integer-like but had a couple values that weren't -- the reader tried to coerce all of them for some reason, even though the JSON type is string. 
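\r\n\r\nPassing an explicit schema sidesteps that coercion; a sketch with made-up column names:\r\n\r\n```python\r\nimport pyarrow as pa\r\nimport pyarrow.parquet as pq\r\n\r\n# \"code\" holds mostly integer-like strings plus a few that aren't\r\nschema = pa.schema([(\"id\", pa.int64()), (\"code\", pa.string())])\r\ntable = pa.table({\"id\": [1, 2], \"code\": [\"123\", \"12A\"]}, schema=schema)\r\npq.write_table(table, \"rows.parquet\")\r\n```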
Since the writer optionally takes a schema, it shouldn't be too hard to grab the sqlite header types. With some additional hinting, you might get datetime columns and JSON, which are native Arrow types. \r\n\r\nSomewhat tangentially, someone even wrote an sqlite vfs extension for Parquet: https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html\r\n\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/248/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 836963850, "node_id": "MDU6SXNzdWU4MzY5NjM4NTA=", "number": 249, "title": "Full text search possibly broken?", "user": {"value": 36287, "label": "prabhur"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2021-03-21T02:03:44Z", "updated_at": "2021-03-21T02:43:32Z", "closed_at": "2021-03-21T02:43:32Z", "author_association": "NONE", "pull_request": null, "body": "I'm not quite sure if this is an issue with sqlite-utils or datasette.\r\n\r\n**Background**\r\nI was previously using sqlite-utils version < 3.6. I have a bunch of csv files that have some data scraped from a website.\r\n\r\n```\r\nsqlite-utils create-table mydb.db post \\\r\n posted_date text \\\r\n url text \\\r\n title text \\\r\n raw_text text \\\r\n --not-null posted_date \\\r\n --not-null url \\\r\n --pk=url\r\n```\r\nFTS is enabled via\r\n`sqlite-utils enable-fts ./mydb.db post title raw_text`\r\n\r\nData is loaded to the table via\r\n`sqlite-utils insert ./mydb.db post ${filename} --csv`\r\n\r\nNote that the data contains text in my language Tamil.\r\n\r\nLoading happens just fine.\r\ndatasette serves the db file just fine. It recognizes FTS and shows the \"search\" box. However, none of the queries work. Whatever text I supply, it always returns 0 rows. I literally copy paste words from the row listing on the screen and paste it on the search box.\r\n\r\nInterestingly, only thing I can remember is switching to sqlite-utils 3.6. I had to do this because the prior version had an issue with column size.\r\n\r\nI have attached one of the csv files that can be loaded to the table. Substitute \"${filename}\" with that file for the sqlite-utils insert command.\r\n[posts_20200417-20201231.csv.zip](https://github.com/simonw/sqlite-utils/files/6176697/posts_20200417-20201231.csv.zip)\r\n\r\nInterestingly, the FTS based search from datasette worked just fine before this version upgrade. That is, the queries returned results. I will try to downgrade just to see if the theory is correct. \r\n\r\nI appreciate any help here. 
Thanks.\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/249/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 842062949, "node_id": "MDU6SXNzdWU4NDIwNjI5NDk=", "number": 252, "title": "Support json-line files", "user": {"value": 279769, "label": "rathboma"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-03-26T15:19:39Z", "updated_at": "2021-03-26T15:21:38Z", "closed_at": "2021-03-26T15:21:38Z", "author_association": "NONE", "pull_request": null, "body": "It's common for many processes to create flat files where each line is a JSON object.\r\n\r\nSo the file isn't a json array.\r\n\r\nMany tools (like jq) support this natively, it'd be great for sqlite-utils to also!", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/252/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 847423559, "node_id": "MDU6SXNzdWU4NDc0MjM1NTk=", "number": 253, "title": "fixtures.db example error in sql-utils blog post", "user": {"value": 192568, "label": "mroswell"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2021-03-31T22:07:36Z", "updated_at": "2021-05-19T03:31:48Z", "closed_at": "2021-05-19T03:31:47Z", "author_association": "NONE", "pull_request": null, "body": "En route to trying to understand column order transform documentation, I tried the instructions here:\r\nhttps://simonwillison.net/2020/Sep/23/sqlite-advanced-alter-table/\r\nI get a malformed database schema syntax error.\r\n\r\n```\r\n$ wget https://latest.datasette.io/fixtures.db\r\n--2021-03-31 18:00:23-- https://latest.datasette.io/fixtures.db\r\nResolving latest.datasette.io (latest.datasette.io)... 2607:f8b0:4004:801::2013, 142.250.73.211\r\nConnecting to latest.datasette.io (latest.datasette.io)|2607:f8b0:4004:801::2013|:443... connected.\r\nHTTP request sent, awaiting response... 
200 OK\r\nLength: unspecified [application/octet-stream]\r\nSaving to: \u2018fixtures.db\u2019\r\n\r\nfixtures.db [ <=> ] 260.00K --.-KB/s in 0.1s\r\n\r\n2021-03-31 18:00:23 (2.41 MB/s) - \u2018fixtures.db\u2019 saved [266240]\r\n\r\n$ sqlite3 fixtures.db '.schema facetable'\r\nError: malformed database schema (generated_columns) - near \"AS\": syntax error\r\n\r\n$ sqlite3 fixtures.db\r\nSQLite version 3.28.0 2019-04-15 14:49:49\r\nEnter \".help\" for usage hints.\r\nsqlite> .schema\r\nError: malformed database schema (generated_columns) - near \"AS\": syntax error\r\n```\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/253/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 861622839, "node_id": "MDU6SXNzdWU4NjE2MjI4Mzk=", "number": 256, "title": "inserting with --nl errors with: sqlite3.OperationalError: table has no column named ", "user": {"value": 279769, "label": "rathboma"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2021-04-19T18:01:03Z", "updated_at": "2021-05-19T03:26:54Z", "closed_at": "2021-05-19T03:26:54Z", "author_association": "NONE", "pull_request": null, "body": "I have a `jsonl` file, it is 10,000 lines long.\r\n\r\nInserting from the cli with `sqlite-utils insert db table file --nl --batch-size 10000` fails with this missing column error, even though I'm telling it to use the whole file in the first batch.\r\n\r\nThis seems similar to #18 and #139, but maybe it's unique to `--nl`?", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/256/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 907795562, "node_id": "MDU6SXNzdWU5MDc3OTU1NjI=", "number": 265, "title": "Using enable_fts before search term", "user": {"value": 36287, "label": "prabhur"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-06-01T01:43:34Z", "updated_at": "2023-04-01T17:27:18Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Many thanks for the sqlite-utils suite of utilities. Has made my life much much easier. \r\nI used this to create a table and enable FTS. All works fine. The datasette utility detects FTS and shows a text box. Searching for a term using that interface works well.\r\n\r\nHowever, when I start to use features by following https://www.sqlite.org/fts5.html section **\"3. Full-text Query Syntax\"** I seem to run into issues that I suspect is due to `escape_fts` wrapper function.\r\n\r\nAs an example, if i search for the term `\"^\u0b95\u0bc1\u0b95\u0bc8\" `on the text box in datasette it produces 140 results. However, when i tweak the query produced by datasette to not use \"escape_fts\" it produces 5 results.\r\n\r\nSimilarly, when I try to restrict the search to a single column in FTS using a spec like `{title : ^\u0b95\u0bc1\u0b95\u0bc8}` it returns no rows. The same thing pulls results when used without `escape_fts`. 
The text in the table is in the Tamil language and the search term is a Tamil word.\r\n\r\n```\r\n ...\r\n where\r\n posts_fts match escape_fts(:search)\r\n```\r\nvs\r\n\r\n```\r\n ...\r\n where\r\n posts_fts match (:search)\r\n```\r\n\r\nAny ideas why? How can I get the benefits of both escaping as well as utilizing different facets of providing / controlling search terms? Thanks.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/265/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 915421499, "node_id": "MDU6SXNzdWU5MTU0MjE0OTk=", "number": 267, "title": "row.update() or row.pk", "user": {"value": 12721157, "label": "Gravitar64"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2021-06-08T19:56:00Z", "updated_at": "2021-06-22T17:27:27Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Hi,\r\n\r\nfantastic framework for working with SQLite databases!!!\r\n\r\nI tried to update specific rows in a table and used\r\n\r\n```python\r\nfor row in db[tablename]:\r\n    newValue = row[\"counter\"] * row[\"prize\"]\r\n    row.update({\"Fieldname\": newValue})\r\n    print(row)\r\n```\r\n\r\nThis updates the value in the printed row, but not in the database. So I switched to\r\n\r\n```python\r\ndb[tablename].update(id, {\"Fieldname\": newValue})\r\n```\r\n\r\nThis works fine. But row.update would be nicer, because there is no need for the id (it's that row), and no need for the tablename and the db (all defined in the for row ... loop).\r\n\r\nThx\r\n\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/267/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 919250621, "node_id": "MDU6SXNzdWU5MTkyNTA2MjE=", "number": 269, "title": "bool type not supported", "user": {"value": 4068, "label": "frafra"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-06-11T22:00:36Z", "updated_at": "2021-06-15T01:34:10Z", "closed_at": "2021-06-15T01:34:10Z", "author_association": "NONE", "pull_request": null, "body": "Hi! Thank you for sharing this very nice tool :)\r\nIt would be nice to have support for more types, like `bool`: it is not possible to convert to boolean at the moment. 
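\r\n\r\nFor example (a sketch; table and column names invented):\r\n\r\n```python\r\nimport sqlite_utils\r\n\r\ndb = sqlite_utils.Database(memory=True)\r\ndb[\"tasks\"].insert({\"name\": \"write docs\", \"done\": \"1\"})\r\n\r\n# transform() maps int, float, str and bytes to SQLite column types,\r\n# but there is no bool target:\r\ndb[\"tasks\"].transform(types={\"done\": int})  # works\r\n# types={\"done\": bool} has no equivalent today\r\n```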
My suggestion would be to handle it as `bool(int(value))`, like csvkit does.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/269/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 919314806, "node_id": "MDU6SXNzdWU5MTkzMTQ4MDY=", "number": 270, "title": "Cannot set type JSON", "user": {"value": 4068, "label": "frafra"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2021-06-11T23:53:22Z", "updated_at": "2021-06-16T17:34:49Z", "closed_at": "2021-06-16T15:47:06Z", "author_association": "NONE", "pull_request": null, "body": "It would be great if the column type could be set to JSON. That would not be different from handling a regular string. It would be something like `repr(value)` and it would work with both JSON and CSV inputs, no matter if `value` is a real list or just a string representing a list.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/270/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 925677191, "node_id": "MDU6SXNzdWU5MjU2NzcxOTE=", "number": 289, "title": "Mypy fixes for rows_from_file()", "user": {"value": 857609, "label": "adamchainz"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-06-20T20:34:59Z", "updated_at": "2021-06-22T18:44:36Z", "closed_at": "2021-06-22T18:13:26Z", "author_association": "NONE", "pull_request": null, "body": "Following https://github.com/simonw/sqlite-utils/issues/279#issuecomment-864328927\r\n\r\nYou had two mypy errors.\r\n\r\nThe first:\r\n\r\n> sqlite_utils/utils.py:157: error: Argument 1 to \"BufferedReader\" has incompatible type \"BinaryIO\"; expected \"RawIOBase\"\r\n\r\nLooking at the `BufferedReader` docs, it seems to expect a `RawIOBase`, and this [has been copied into typeshed](https://github.com/python/typeshed/blob/9ec2f8712480c57353cea097a65d75a2c4ec1846/stdlib/io.pyi#L100). There may be scope to change how `BufferedReader` is documented and typed upstream, but for now it wouldn't be too bad to use a `typing.cast()`:\r\n\r\n```\r\n# Detect the format, then call this recursively\r\nbuffered = io.BufferedReader(\r\n cast(io.RawIOBase, fp), # Undocumented BufferedReader support for BinaryIO\r\n buffer_size=4096,\r\n)\r\n```\r\n\r\nThe second error seems to be flagging a legitimate bug in your code:\r\n\r\n> sqlite_utils/utils.py:163: error: Argument 1 to \"decode\" of \"bytes\" has incompatible type \"Optional[str]\"; expected \"str\"\r\n\r\nFrom your type hints, `encoding` may be `None`. 
In the CSV format block, you use `encoding or \"utf-8-sig\"` to set a default, maybe that's desirable in this case too?\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/289/reactions\", \"total_count\": 1, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 1, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 934123448, "node_id": "MDU6SXNzdWU5MzQxMjM0NDg=", "number": 295, "title": "Insert with --tsv and --no-headers give error about --nl arguments", "user": {"value": 7288187, "label": "davidscotson"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-06-30T21:01:01Z", "updated_at": "2021-08-18T20:19:04Z", "closed_at": "2021-08-18T20:18:57Z", "author_association": "NONE", "pull_request": null, "body": "Not quite sure if this is a bug, or just an assumption I made but I thought `--tsv` and `--no-headers` would work together when inserting from a file, and currently they seem not to (sqlite-utils, version 3.12, installed on Mac OS X via brew)\r\n\r\nInstead it says:\r\n\r\n`Error: Use just one of --nl, --csv or --tsv`\r\n\r\nAs if it has interpreted the --no-headers as --nl.\r\n\r\nThe --help does specifically say CSV:\r\n `--no-headers CSV file has no header row`\r\n \r\nAnd this heading in the documentation also only refers to CSV, but the text does mention TSV in passing, and I'd generally expect them to behave the same in most cases.\r\nhttps://sqlite-utils.datasette.io/en/stable/cli.html#csv-files-without-a-header-row", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/295/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 944326512, "node_id": "MDU6SXNzdWU5NDQzMjY1MTI=", "number": 296, "title": "`table.search(..., quote=True)` parameter and `sqlite-utils search --quote` option", "user": {"value": 32427188, "label": "deafmute1"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 6, "created_at": "2021-07-14T11:26:47Z", "updated_at": "2021-08-18T20:13:12Z", "closed_at": "2021-08-18T20:10:48Z", "author_association": "NONE", "pull_request": null, "body": "Hi,\r\nRecently got this error:\r\n```\r\nTraceback (most recent call last):\r\n File \"\", line 1, in \r\n File \"/home/ethan/git/music-metadata-indexer/src/mmindexer/__init__.py\", line 38, in \r\n start(\"/home/ethan/git/music-metadata-indexer/sample\", \"/home/ethan/git/music-metadata-indexer/test.db\")\r\n File \"/home/ethan/git/music-metadata-indexer/src/mmindexer/__init__.py\", line 23, in start\r\n scanner.build_database()\r\n File \"/home/ethan/git/music-metadata-indexer/src/mmindexer/scan.py\", line 79, in build_database\r\n _import_song(self.db, Path(dirpath).joinpath(f), self.logger) \r\n File \"/home/ethan/git/music-metadata-indexer/src/mmindexer/scan.py\", line 23, in _import_song\r\n db.add_song(filepath)\r\n File \"/home/ethan/git/music-metadata-indexer/src/mmindexer/index.py\", line 166, in add_song\r\n for match in self.search(\"albums\", album): \r\n File 
\"/home/ethan/git/music-metadata-indexer/env/lib/python3.9/site-packages/sqlite_utils/db.py\", line 1625, in search\r\n cursor = self.db.execute(\r\n File \"/home/ethan/git/music-metadata-indexer/env/lib/python3.9/site-packages/sqlite_utils/db.py\", line 243, in execute\r\n return self.conn.execute(sql, parameters)\r\nsqlite3.OperationalError: fts5: syntax error near \".\" \r\n```\r\nSo, the error seems to suggest there was a \".\" character somewhere in the SQL command that was causing the error. I did a little digging and found this in the docs: https://www.sqlite.org/fts5.html#fts5_strings. \".\" is one of the many prohibited characters.\r\n\r\nMy solution was to just strip these out of the query using this line\r\n`query = query.translate({e: None for e in itertools.chain(range(0,26), range(27, 48), range(58,65), range(91,95), [96], range(123,128))})`\r\n\r\nPerhaps this could be included into the `table.search()` function?\r\n\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/296/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 951581763, "node_id": "MDU6SXNzdWU5NTE1ODE3NjM=", "number": 298, "title": "Read lines with JSON object", "user": {"value": 2172260, "label": "qqilihq"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2021-07-23T13:28:52Z", "updated_at": "2021-08-03T06:50:47Z", "closed_at": "2021-08-02T21:55:16Z", "author_association": "NONE", "pull_request": null, "body": "I found this posted on HN a while ago and love it -- thank you!\r\n\r\nAs a minor improvement, it would be great to have the ability to parse a file with line-separated JSON objects. 
Currently the parser obviously requires an array wrapping all these objects.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/298/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 956832836, "node_id": "MDU6SXNzdWU5NTY4MzI4MzY=", "number": 300, "title": "Returning underlying cause for User Defined Functions ", "user": {"value": 71236, "label": "wsargent"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-07-30T15:08:21Z", "updated_at": "2021-08-02T21:53:50Z", "closed_at": "2021-08-02T21:53:50Z", "author_association": "NONE", "pull_request": null, "body": "The sqlite3 client takes user defined functions and replaces the text with \"user-defined function raised exception`\" so it's not apparent what's gone wrong:\r\n\r\n```\r\nUnexpected error: user-defined function raised exception\r\n```\r\n\r\nAs mentioned in https://code.djangoproject.com/ticket/29500 and https://stackoverflow.com/questions/45824209/how-to-get-an-error-kind-from-sqlite-create-function/45834923#45834923 the workaround for this is to enable callback tracebacks:\r\n\r\n```\r\nsqlite3.enable_callback_tracebacks(True)\r\n```\r\n\r\nIt would be nice if https://sqlite-utils.datasette.io/en/stable/python-api.html#registering-custom-sql-functions either included a reference to `enable_callback_tracebacks` or if registering a user defined function set this flag automatically.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/300/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 963897111, "node_id": "MDU6SXNzdWU5NjM4OTcxMTE=", "number": 309, "title": "sqlite-utils insert errors should show SQL and parameters, if possible", "user": {"value": 16622642, "label": "scaleoutsean"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 6, "created_at": "2021-08-09T11:24:14Z", "updated_at": "2021-08-09T23:40:29Z", "closed_at": "2021-08-09T22:25:58Z", "author_association": "NONE", "pull_request": null, "body": "I've tried several approaches, but this is the current one:\r\n\r\n```sh\r\necho $json-line | sqlite-utils insert json.db jsontable --truncate --alter --detect-types -\r\n```\r\nIn all cases, I get this error:\r\n\r\n```sh\r\nOverflowError: Python int too large to convert to SQLite INTEGER\r\nTraceback (most recent call last):\r\n File \"/home/sean/.local/bin/sqlite-utils\", line 8, in \r\n sys.exit(cli())\r\n File \"/usr/lib/python3/dist-packages/click/core.py\", line 764, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/usr/lib/python3/dist-packages/click/core.py\", line 717, in main\r\n rv = self.invoke(ctx)\r\n File \"/usr/lib/python3/dist-packages/click/core.py\", line 1137, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/usr/lib/python3/dist-packages/click/core.py\", line 956, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File 
\"/usr/lib/python3/dist-packages/click/core.py\", line 555, in invoke\r\n return callback(*args, **kwargs)\r\n File \"/home/sean/.local/lib/python3.8/site-packages/sqlite_utils/cli.py\", line 841, in insert\r\n insert_upsert_implementation(\r\n File \"/home/sean/.local/lib/python3.8/site-packages/sqlite_utils/cli.py\", line 780, in insert_upsert_implementation\r\n db[table].insert_all(\r\n File \"/home/sean/.local/lib/python3.8/site-packages/sqlite_utils/db.py\", line 2145, in insert_all\r\n self.insert_chunk(\r\n File \"/home/sean/.local/lib/python3.8/site-packages/sqlite_utils/db.py\", line 1957, in insert_chunk\r\n result = self.db.execute(query, params)\r\n File \"/home/sean/.local/lib/python3.8/site-packages/sqlite_utils/db.py\", line 257, in execute\r\n return self.conn.execute(sql, parameters)\r\n```\r\n\r\nI googled the error and checked SO answers and advice, all good. I changed my JSON file to not use integers so I no longer get this error. Of course, that makes using the database a bit harder, so I also tried to solve the problem by modifying DB structure (while using integers in JSON).\r\n\r\nIf change all `INTEGER` Data Types to something else (`STRING`, `TEXT`) and try to import again using `--truncate`, I still get this error. I suppose I should tell sqlite-utils which columns should use non-INTEGER Data Type rather than rely on it to check my SQL table configuration.\r\n\r\nIf that is the case, can this error be a bit more specific for easier troubleshooting - maybe tell us which which record caused the problem when that error is thrown? \r\n\r\nMy table has 60+ columns, many of which use 64-bit integers (not all records are large or known in advance), so while I can modify JSON to use strings instead of integers, it decreases usability and finding out which records have values for which SQLite integers aren't sufficient requires some work (I'm thinking about parsing all integers with `jq` and sorting output by length to identify those columns, but I'd prefer if sqlite-utils could tell me that). \r\n\r\nMy environment:\r\n\r\n- Python 3.8.10\r\n - sqlite-utils 3.14 \r\n - pandas 1.3.1 \r\n - numpy 1.21.1 \r\n - sqlite-fts4 1.0.1\r\n- sqlite 3.31.1-4ubuntu0.2\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/309/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 976399638, "node_id": "MDU6SXNzdWU5NzYzOTk2Mzg=", "number": 319, "title": "[Enhancement] Please allow 'insert-files' to insert content as text.", "user": {"value": 66709385, "label": "pjamargh"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 10, "created_at": "2021-08-22T15:10:46Z", "updated_at": "2021-08-24T23:33:45Z", "closed_at": "2021-08-24T23:33:44Z", "author_association": "NONE", "pull_request": null, "body": "'insert-files' creates BLOB columns for file contents. Transforming the column to TEXT still keep the content as binary. 
Even though I'm sure there is a transform that could be applied to decode the text, it would be great to have an argument that makes 'insert-files' store it as text (with optional text encoding).\r\n\r\nThe use case is a bunch of HTML files (each a single file) in a directory structure that, inserted with this command, could be served in Datasette, allowing full-text search.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/319/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 990844088, "node_id": "MDU6SXNzdWU5OTA4NDQwODg=", "number": 325, "title": "sqlite-utils memory can't deal with multiple files with the same name", "user": {"value": 144773, "label": "karlb"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2021-09-08T08:14:42Z", "updated_at": "2021-09-22T20:52:56Z", "closed_at": "2021-09-22T20:45:45Z", "author_association": "NONE", "pull_request": null, "body": "When I use multiple files with the same name, e.g. in `sqlite-utils memory a/bug.csv b/bug.csv`, sqlite-utils creates invalid views.\r\n```\r\nTraceback (most recent call last):\r\n File \"/home/karl/.local/bin/sqlite-utils\", line 8, in \r\n sys.exit(cli())\r\n File \"/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/click/core.py\", line 1137, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/click/core.py\", line 1062, in main\r\n rv = self.invoke(ctx)\r\n File \"/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/click/core.py\", line 1668, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/click/core.py\", line 1404, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/click/core.py\", line 763, in invoke\r\n return __callback(*args, **kwargs)\r\n File \"/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/sqlite_utils/cli.py\", line 1299, in memory\r\n db[csv_table].transform(types=tracker.types)\r\n File \"/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/sqlite_utils/db.py\", line 1287, in transform\r\n self.db.execute(sql)\r\n File \"/home/karl/.local/pipx/venvs/sqlite-utils/lib/python3.9/site-packages/sqlite_utils/db.py\", line 421, in execute\r\n return self.conn.execute(sql)\r\nsqlite3.OperationalError: error in view t1: no such table: main.bug\r\n```\r\n\r\nThis can be reproduced with\r\n```sh\r\n#!/bin/bash\r\nmkdir foo\r\nmkdir bar\r\necho -e 'col1,col2\\nval1,val2' > foo/bug.csv\r\necho -e 'col3,col4\\nval3,val4' > bar/bug.csv\r\nsqlite-utils memory */bug.csv 'SELECT 1'\r\n```\r\n\r\nIdeally, the tables would get unique names by including the next path segment until the names are unique. 
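\r\n\r\nSomething like this sketch (helper name invented; assumes the paths themselves differ somewhere):\r\n\r\n```python\r\nfrom pathlib import Path\r\n\r\ndef disambiguate(paths):\r\n    # append parent directory segments until every derived name is unique\r\n    names = [Path(p).stem for p in paths]\r\n    depth = 1\r\n    while len(set(names)) != len(names):\r\n        names = [\r\n            \"_\".join([Path(p).stem] + list(Path(p).parts[-depth - 1:-1]))\r\n            for p in paths\r\n        ]\r\n        depth += 1\r\n    return names\r\n\r\nprint(disambiguate([\"foo/bug.csv\", \"bar/bug.csv\"]))  # ['bug_foo', 'bug_bar']\r\n```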
But just making the numbered t* aliases work would be good enough.\r\n\r\nThis problem can of course be worked around by renaming the files, but it would be nice if this case was handled more gracefully.\r\n\r\nThanks a lot for this great tool!", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/325/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1004613267, "node_id": "I_kwDOCGYnMM474S6T", "number": 328, "title": "Invalid JSON output when no rows", "user": {"value": 12752, "label": "gravis"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-09-22T18:37:26Z", "updated_at": "2021-09-22T20:21:34Z", "closed_at": "2021-09-22T20:20:18Z", "author_association": "NONE", "pull_request": null, "body": "`sqlite-utils query` generates a JSON output with the result from the query:\r\n\r\n```json\r\n[{...},{...}]\r\n```\r\nIf no rows are returned by the query, I'm expecting an empty JSON array:\r\n\r\n```json\r\n[]\r\n```\r\n\r\nBut actually I'm getting an empty string. To be consistent, the output should be `[]` when the request succeeds (return code == `0`).", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/328/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1026794056, "node_id": "I_kwDOCGYnMM49M6JI", "number": 331, "title": "Mypy error: found module but no type hints or library stubs", "user": {"value": 53032010, "label": "andreaslongo"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2021-10-14T20:29:50Z", "updated_at": "2021-11-14T23:21:08Z", "closed_at": "2021-11-14T23:21:08Z", "author_association": "NONE", "pull_request": null, "body": "```\r\nPython 3.9.5\r\nmypy 0.910\r\nsqlite-utils 3.17.1\r\n```\r\n\r\nWhile using sqlite-utils as a library, when I use mypy for static type checking, it throws an error:\r\n\r\n```\r\nmypy .\r\nsrc/etl.py:5: error: Skipping analyzing \"sqlite_utils\": found module but no type hints or library stubs\r\n import sqlite_utils\r\n ^\r\nsrc/etl.py:5: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports\r\ntest/test_etl.py:4: error: Skipping analyzing \"sqlite_utils\": found module but no type hints or library stubs\r\n import sqlite_utils\r\n ^\r\nFound 2 errors in 2 files (checked 7 source files)\r\n```\r\n\r\n\r\nWhen I add a `py.typed` file to the sqlite-utils package to mark it as PEP 561 compatible, the error goes away.\r\n\r\n```\r\nal@nbal ..b/python3.9/site-packages/sqlite_utils (git)-[main] % la\r\ntotal 200\r\ndrwx------ 3 al al 4096 Oct 14 22:00 .\r\ndrwx------ 117 al al 4096 Oct 12 21:12 ..\r\n-rw------- 1 al al 64409 Oct 12 21:11 cli.py\r\n-rw------- 1 al al 109092 Oct 12 21:11 db.py\r\n-rw------- 1 al al 0 Oct 14 22:00 py.typed\r\n-rw------- 1 al al 684 Oct 12 21:11 recipes.py\r\n-rw------- 1 al al 7988 Oct 12 21:11 utils.py\r\n-rw------- 1 al al 113 Oct 12 21:11 __init__.py\r\n```\r\n\r\nI would like 
to suggest adding a `py.typed` file to the repository.\r\n\r\nSee also the mypy docs on creating PEP 561 compatible packages:\r\nhttps://mypy.readthedocs.io/en/stable/installed_packages.html#creating-pep-561-compatible-packages\r\n\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/331/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1028056713, "node_id": "I_kwDOCGYnMM49RuaJ", "number": 332, "title": "`sqlite-utils memory --flatten` option to flatten nested JSON", "user": {"value": 22523840, "label": "rdtq"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-10-16T14:04:42Z", "updated_at": "2021-11-14T23:05:05Z", "closed_at": "2021-11-14T23:05:05Z", "author_association": "NONE", "pull_request": null, "body": "currently --flatten option works only for `insert` command, it would be cool if it worked for `memory` as well to query nested json", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/332/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1041778507, "node_id": "I_kwDOCGYnMM4-GEdL", "number": 334, "title": "Filter by datetime objects using rows_where()", "user": {"value": 11642379, "label": "viseshrp"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-11-02T00:44:08Z", "updated_at": "2021-11-13T19:23:21Z", "closed_at": "2021-11-13T19:23:21Z", "author_association": "NONE", "pull_request": null, "body": "Firstly, thanks for this nice utility. \r\nIt would be nice to have an example in the docs on how to filter by date range using `rows_where()`. 
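\r\n\r\nSomething along these lines, say (a sketch using `rows_where()`'s `where_args` parameter; assumes `created` holds ISO-format strings):\r\n\r\n```python\r\nrows = table.rows_where(\r\n    \"created between :start and :end\",\r\n    {\"start\": start.isoformat(), \"end\": end.isoformat()},\r\n)\r\n```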
\r\nThis doesn't seem to work:\r\n```\r\ntable.rows_where('datetime(created) between datetime(\"2021-10-31T17:29:59.277428-04:00\") AND datetime(\"2021-11-01T03:44:04.544651+00:00\")')\r\n```\r\n\r\n\r\nI could probably just use `db.query()`, which works for the above, but it would be nice if I could pass in `datetime` objects in `rows_where()`.\r\nThanks.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/334/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1042569687, "node_id": "I_kwDOCGYnMM4-JFnX", "number": 335, "title": "sqlite-utils index-foreign-keys fails due to pre-existing index", "user": {"value": 596279, "label": "zaneselvans"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 11, "created_at": "2021-11-02T16:22:11Z", "updated_at": "2021-11-14T22:55:56Z", "closed_at": "2021-11-14T22:55:56Z", "author_association": "NONE", "pull_request": null, "body": "While running the command:\r\n```sh\r\nsqlite-utils index-foreign-keys $SQLITE_DIR/pudl.sqlite\r\n```\r\n\r\nI got the following error:\r\n\r\n```\r\nTraceback (most recent call last):\r\n File \"/home/zane/miniconda3/envs/pudl-dev/bin/sqlite-utils\", line 8, in \r\n sys.exit(cli())\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py\", line 829, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py\", line 782, in main\r\n rv = self.invoke(ctx)\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py\", line 1259, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py\", line 1066, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py\", line 610, in invoke\r\n return callback(*args, **kwargs)\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/cli.py\", line 454, in index_foreign_keys\r\n db.index_foreign_keys()\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/db.py\", line 902, in index_foreign_keys\r\n table.create_index([fk.column])\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/db.py\", line 1563, in create_index\r\n self.db.execute(sql)\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/db.py\", line 421, in execute\r\n return self.conn.execute(sql)\r\nsqlite3.OperationalError: index idx_generators_eia860_report_date already exists\r\n```\r\n\r\nThis DB was created with the foreign key constraint `PRAGMA` enabled and a bunch of column-level `CHECK` constraints. Is this an expected behavior? 
\r\nThanks.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/334/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1042569687, "node_id": "I_kwDOCGYnMM4-JFnX", "number": 335, "title": "sqlite-utils index-foreign-keys fails due to pre-existing index", "user": {"value": 596279, "label": "zaneselvans"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 11, "created_at": "2021-11-02T16:22:11Z", "updated_at": "2021-11-14T22:55:56Z", "closed_at": "2021-11-14T22:55:56Z", "author_association": "NONE", "pull_request": null, "body": "While running the command:\r\n```sh\r\nsqlite-utils index-foreign-keys $SQLITE_DIR/pudl.sqlite\r\n```\r\n\r\nI got the following error:\r\n\r\n```\r\nTraceback (most recent call last):\r\n File \"/home/zane/miniconda3/envs/pudl-dev/bin/sqlite-utils\", line 8, in <module>\r\n sys.exit(cli())\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py\", line 829, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py\", line 782, in main\r\n rv = self.invoke(ctx)\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py\", line 1259, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py\", line 1066, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py\", line 610, in invoke\r\n return callback(*args, **kwargs)\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/cli.py\", line 454, in index_foreign_keys\r\n db.index_foreign_keys()\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/db.py\", line 902, in index_foreign_keys\r\n table.create_index([fk.column])\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/db.py\", line 1563, in create_index\r\n self.db.execute(sql)\r\n File \"/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/db.py\", line 421, in execute\r\n return self.conn.execute(sql)\r\nsqlite3.OperationalError: index idx_generators_eia860_report_date already exists\r\n```\r\n\r\nThis DB was created with the foreign key constraint `PRAGMA` enabled and a bunch of column-level `CHECK` constraints. Is this expected behavior? Should one not try to index foreign keys if FK constraints are already being enforced within the DB?\r\n\r\nI'm also noticing that the size of the DB after FK indexes have been added went from 483MB to 835MB, which seems like a much bigger jump than when I've done this previously.\r\n\r\nSoftware versions...\r\n* sqlite-utils 3.17.1\r\n* sqlite 3.36.0\r\n* SQLAlchemy 1.4.26 (used to create the DB)", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/335/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1063388037, "node_id": "I_kwDOCGYnMM4_YgOF", "number": 343, "title": "Provide function to generate hash_id from specified columns", "user": {"value": 82988, "label": "psychemedia"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2021-11-25T10:12:12Z", "updated_at": "2022-03-02T04:25:25Z", "closed_at": "2022-03-02T04:25:25Z", "author_association": "NONE", "pull_request": null, "body": "Hi\r\n\r\nI note that you define `_hash()` to create a `hash_id` from non-id column values in a table [here](https://github.com/simonw/sqlite-utils/blob/8f386a0d300d1b1c76132bb75972b755049fb742/sqlite_utils/db.py#L2996).\r\n\r\nIt would be useful to be able to call a complementary function to generate a corresponding `_id` from a subset of specified columns when adding items to another table, e.g. to support the creation of foreign keys.\r\n\r\nOr is there a better pattern for doing that?
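\r\n\r\nFor concreteness, a standalone sketch of the kind of helper I mean, approximating how `_hash()` builds its digest (this mirrors the md5-over-sorted-JSON approach but is not the library's actual API):\r\n\r\n```python\r\nimport hashlib\r\nimport json\r\n\r\ndef hash_id(record, columns):\r\n    # Hash only the specified columns, serialized deterministically,\r\n    # so the same subset of values always produces the same id\r\n    subset = {column: record[column] for column in columns}\r\n    return hashlib.md5(\r\n        json.dumps(subset, separators=(\",\", \":\"), sort_keys=True, default=repr).encode(\"utf8\")\r\n    ).hexdigest()\r\n\r\nrow = {\"first_name\": \"Ada\", \"last_name\": \"Lovelace\", \"note\": \"not hashed\"}\r\nprint(hash_id(row, [\"first_name\", \"last_name\"]))\r\n```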
", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/343/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1079422215, "node_id": "I_kwDOCGYnMM5AVq0H", "number": 357, "title": "pytest-runner is not required", "user": {"value": 4067843, "label": "pgajdos"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2021-12-14T07:51:24Z", "updated_at": "2021-12-16T20:43:19Z", "closed_at": "2021-12-16T20:43:13Z", "author_association": "NONE", "pull_request": null, "body": "The deprecated pytest-runner is not necessary for running the test suite.", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/357/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1082651698, "node_id": "I_kwDOCGYnMM5Ah_Qy", "number": 358, "title": "Support for CHECK constraints", "user": {"value": 11597658, "label": "luxint"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 7, "created_at": "2021-12-16T21:19:45Z", "updated_at": "2022-09-25T07:15:59Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Hi,\r\n\r\nI noticed the `table.transform()` method doesn't have an option to add, change, or drop a CHECK constraint (see https://sqlite.org/lang_createtable.html -> 3.7 Check Constraints). It would be great to have this as an option!
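\r\n\r\nFor now the only route seems to be hand-written DDL; a sketch of creating a table with a CHECK constraint directly (the `accounts` table and its constraint are invented for illustration):\r\n\r\n```python\r\nimport sqlite_utils\r\n\r\ndb = sqlite_utils.Database(\"demo.db\")\r\n# db.execute() runs arbitrary SQL, so a CHECK constraint can at least\r\n# be declared at creation time, even without transform() support\r\ndb.execute(\r\n    \"CREATE TABLE accounts (id INTEGER PRIMARY KEY, \"\r\n    \"balance REAL CHECK (balance >= 0))\"\r\n)\r\ndb[\"accounts\"].insert({\"id\": 1, \"balance\": 10.0})\r\n```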
\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/358/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": null} {"id": 1091819089, "node_id": "I_kwDOCGYnMM5BE9ZR", "number": 360, "title": "MemoryError", "user": {"value": 559453, "label": "nzaar9"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 1, "created_at": "2022-01-01T13:39:17Z", "updated_at": "2022-03-21T04:22:46Z", "closed_at": "2022-03-21T04:22:46Z", "author_association": "NONE", "pull_request": null, "body": "Hi, when dealing with a large JSON file (~170GB) I got the following error:\r\n```\r\nTraceback (most recent call last):\r\n File \"/usr/local/bin/sqlite-utils\", line 8, in <module>\r\n sys.exit(cli())\r\n File \"/usr/lib/python3/dist-packages/click/core.py\", line 1126, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/usr/lib/python3/dist-packages/click/core.py\", line 1051, in main\r\n rv = self.invoke(ctx)\r\n File \"/usr/lib/python3/dist-packages/click/core.py\", line 1657, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/usr/lib/python3/dist-packages/click/core.py\", line 1393, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/usr/lib/python3/dist-packages/click/core.py\", line 752, in invoke\r\n return __callback(*args, **kwargs)\r\n File \"/usr/local/lib/python3.9/dist-packages/sqlite_utils/cli.py\", line 1300, in memory\r\n rows, format_used = rows_from_file(csv_fp, format=format, encoding=encoding)\r\n File \"/usr/local/lib/python3.9/dist-packages/sqlite_utils/utils.py\", line 185, in rows_from_file\r\n return rows_from_file(buffered, format=Format.JSON)\r\n File \"/usr/local/lib/python3.9/dist-packages/sqlite_utils/utils.py\", line 156, in rows_from_file\r\n decoded = json.load(fp)\r\n File \"/usr/lib/python3.9/json/__init__.py\", line 293, in load\r\n return loads(fp.read(),\r\nMemoryError\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/360/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1123903919, "node_id": "I_kwDOCGYnMM5C_Wmv", "number": 397, "title": "Support IF NOT EXISTS for table creation", "user": {"value": 738408, "label": "rafguns"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2022-02-04T07:41:15Z", "updated_at": "2022-02-06T01:30:46Z", "closed_at": "2022-02-06T01:29:01Z", "author_association": "NONE", "pull_request": null, "body": "Currently, I have a bunch of code that looks like this:\r\n\r\n```python\r\nsubjects = db[\"subjects\"] if db[\"subjects\"].exists() else db[\"subjects\"].create({\r\n ...\r\n})\r\n```\r\nIt would be neat if sqlite-utils could simplify that by supporting `CREATE TABLE IF NOT EXISTS`, so that I'd be able to write, e.g.\r\n\r\n```python\r\nsubjects = db[\"subjects\"].create({...}, if_not_exists=True)\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/397/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1128466114, "node_id": "I_kwDOCGYnMM5DQwbC", "number": 406, "title": "Creating tables with custom datatypes", "user": {"value": 82988, "label": "psychemedia"}, "state": "open", "locked": 0, "assignee": null, "milestone": null, "comments": 5, "created_at": "2022-02-09T12:16:31Z", "updated_at": "2022-09-15T18:13:50Z", "closed_at": null, "author_association": "NONE", "pull_request": null, "body": "Via https://stackoverflow.com/a/18622264/454773 I note the ability to register custom handlers for novel datatypes that can map into and out of things like sqlite `BLOB`s.\r\n\r\nFrom a quick look and a quick play, I didn't spot a way to do this in `sqlite_utils`?\r\n\r\nFor example:\r\n\r\n```python\r\n# Via https://stackoverflow.com/a/18622264/454773\r\nimport sqlite3\r\nimport numpy as np\r\nimport io\r\n\r\ndef adapt_array(arr):\r\n \"\"\"\r\n http://stackoverflow.com/a/31312102/190597 (SoulNibbler)\r\n \"\"\"\r\n out = io.BytesIO()\r\n np.save(out, arr)\r\n out.seek(0)\r\n return sqlite3.Binary(out.read())\r\n\r\ndef convert_array(text):\r\n out = io.BytesIO(text)\r\n out.seek(0)\r\n return np.load(out)\r\n\r\n\r\n# Converts np.ndarray to BLOB when inserting\r\nsqlite3.register_adapter(np.ndarray, adapt_array)\r\n\r\n# Converts BLOB back to np.ndarray when selecting \"array\" columns\r\nsqlite3.register_converter(\"array\", convert_array)\r\n```\r\n\r\n```python\r\nfrom sqlite_utils import Database\r\ndb = Database('test.db')\r\n\r\n# Reset the database connection to use the parsed datatypes\r\n# sqlite_utils doesn't seem to support eg:\r\n# Database('test.db', detect_types=sqlite3.PARSE_DECLTYPES)\r\ndb.conn = sqlite3.connect('test.db', detect_types=sqlite3.PARSE_DECLTYPES)\r\n\r\n# Create a table the old fashioned way\r\n# but using the new custom data type\r\nvector_table_create = \"\"\"\r\nCREATE TABLE dummy \r\n (title TEXT, vector array );\r\n\"\"\"\r\n\r\ncur = db.conn.cursor()\r\ncur.execute(vector_table_create)\r\n\r\n\r\n# sqlite_utils doesn't appear to support custom types (yet?!)\r\n# The following errors on the \"array\" datatype\r\n\"\"\"\r\ndb[\"dummy\"].create({\r\n \"title\": str,\r\n \"vector\": \"array\",\r\n})\r\n\"\"\"\r\n```\r\n\r\nWe can then add / retrieve records from the database where the datatype of the `vector` field is a custom registered `array` type (which is to say, a `numpy` array):\r\n\r\n```python\r\nimport numpy as np\r\n\r\ndb[\"dummy\"].insert({'title':\"test1\", 'vector':np.array([1,2,3])})\r\n\r\nfor row in db.query(\"SELECT * FROM dummy\"):\r\n print(row['title'], row['vector'], type(row['vector']))\r\n\r\n\"\"\"\r\ntest1 [1 2 3] <class 'numpy.ndarray'>\r\n\"\"\"\r\n```\r\n\r\nIt would be handy to be able to do this idiomatically in `sqlite_utils`.
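\r\n\r\nOne workaround for the connection part, at least: `Database()` accepts an existing `sqlite3.Connection`, so `detect_types` can be configured up front rather than by reassigning `db.conn` afterwards (a sketch; the adapter/converter registration above is still needed):\r\n\r\n```python\r\nimport sqlite3\r\nfrom sqlite_utils import Database\r\n\r\n# Configure type detection on the raw connection, then hand it to sqlite_utils\r\nconn = sqlite3.connect(\"test.db\", detect_types=sqlite3.PARSE_DECLTYPES)\r\ndb = Database(conn)\r\n```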
"node_id": "I_kwDOCGYnMM5ETMfS", "number": 408, "title": "`deterministic=True` fails on versions of SQLite prior to 3.8.3", "user": {"value": 24938923, "label": "learning4life"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 6, "created_at": "2022-02-21T14:36:43Z", "updated_at": "2022-03-13T16:54:09Z", "closed_at": "2022-03-02T00:38:11Z", "author_association": "NONE", "pull_request": null, "body": "Hi, love your work.\r\n\r\nI am unable to lookup indexes in a database using sqlite-utils:\r\n\r\n`\r\nsqlite-utils indexes city_spec.db --table`\r\n\r\nor\r\n\r\n`sqlite-utils indexes city_spec.db MyTable\r\n`\r\n\r\n**Software**\r\nsqlite-utils, version 3.24\r\nsqlite3 --version: 3.36.0 \r\n\r\n**Output:**\r\n\r\nTraceback (most recent call last):\r\n File \"/opt/app-root/bin/sqlite-utils\", line 8, in \r\n sys.exit(cli())\r\n File \"/opt/app-root/lib64/python3.8/site-packages/click/core.py\", line 1128, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/opt/app-root/lib64/python3.8/site-packages/click/core.py\", line 1053, in main\r\n rv = self.invoke(ctx)\r\n File \"/opt/app-root/lib64/python3.8/site-packages/click/core.py\", line 1659, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/opt/app-root/lib64/python3.8/site-packages/click/core.py\", line 1395, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/opt/app-root/lib64/python3.8/site-packages/click/core.py\", line 754, in invoke\r\n return __callback(*args, **kwargs)\r\n File \"/opt/app-root/lib64/python3.8/site-packages/click/decorators.py\", line 26, in new_func\r\n return f(get_current_context(), *args, **kwargs)\r\n File \"/opt/app-root/lib64/python3.8/site-packages/sqlite_utils/cli.py\", line 2123, in indexes\r\n ctx.invoke(\r\n File \"/opt/app-root/lib64/python3.8/site-packages/click/core.py\", line 754, in invoke\r\n return __callback(*args, **kwargs)\r\n File \"/opt/app-root/lib64/python3.8/site-packages/sqlite_utils/cli.py\", line 1624, in query\r\n db.register_fts4_bm25()\r\n File \"/opt/app-root/lib64/python3.8/site-packages/sqlite_utils/db.py\", line 403, in register_fts4_bm25\r\n self.register_function(rank_bm25, deterministic=True)\r\n File \"/opt/app-root/lib64/python3.8/site-packages/sqlite_utils/db.py\", line 399, in register_function\r\n register(fn)\r\n File \"/opt/app-root/lib64/python3.8/site-packages/sqlite_utils/db.py\", line 392, in register\r\n self.conn.create_function(name, arity, fn, **kwargs)\r\nsqlite3.NotSupportedError: deterministic=True requires SQLite 3.8.3 or higher\r\n", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/408/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1171599874, "node_id": "I_kwDOCGYnMM5F1TIC", "number": 415, "title": "Convert with `--multi` and `--dry-run` flag does not work", "user": {"value": 3976183, "label": "dotcs"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2022-03-16T21:59:46Z", "updated_at": "2022-03-21T04:18:24Z", "closed_at": "2022-03-21T04:18:24Z", "author_association": "NONE", "pull_request": null, "body": "It's not possible to combine `--multi` and `--dry-run` flag in the `convert` command.\r\n\r\nLet's 
", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/408/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 1171599874, "node_id": "I_kwDOCGYnMM5F1TIC", "number": 415, "title": "Convert with `--multi` and `--dry-run` flag does not work", "user": {"value": 3976183, "label": "dotcs"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 2, "created_at": "2022-03-16T21:59:46Z", "updated_at": "2022-03-21T04:18:24Z", "closed_at": "2022-03-21T04:18:24Z", "author_association": "NONE", "pull_request": null, "body": "It's not possible to combine the `--multi` and `--dry-run` flags in the `convert` command.\r\n\r\nLet's first create a simple database from a JSON string\r\n\r\n```console\r\n$ echo '[{\"foo\": \"abc\"}]' | sqlite-utils insert demo.db demo -\r\n$ sqlite-utils query demo.db \"SELECT * FROM demo\" \r\n[{\"foo\": \"abc\"}]\r\n```\r\n\r\nand then try to convert the \"foo\" column with a static value \"bar\" (see docs [Converting a column into multiple columns](https://sqlite-utils.datasette.io/en/stable/cli.html#converting-a-column-into-multiple-columns))\r\n\r\n```console\r\n$ sqlite-utils convert demo.db demo foo '{\"foo\": \"bar\"}' --multi --dry-run\r\nTraceback (most recent call last):\r\n File \"/home/dotcs/anaconda3/envs/tools/bin/sqlite-utils\", line 8, in <module>\r\n sys.exit(cli())\r\n File \"/home/dotcs/anaconda3/envs/tools/lib/python3.9/site-packages/click/core.py\", line 1128, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/home/dotcs/anaconda3/envs/tools/lib/python3.9/site-packages/click/core.py\", line 1053, in main\r\n rv = self.invoke(ctx)\r\n File \"/home/dotcs/anaconda3/envs/tools/lib/python3.9/site-packages/click/core.py\", line 1659, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/home/dotcs/anaconda3/envs/tools/lib/python3.9/site-packages/click/core.py\", line 1395, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/home/dotcs/anaconda3/envs/tools/lib/python3.9/site-packages/click/core.py\", line 754, in invoke\r\n return __callback(*args, **kwargs)\r\n File \"/home/dotcs/anaconda3/envs/tools/lib/python3.9/site-packages/sqlite_utils/cli.py\", line 2686, in convert\r\n for row in db.conn.execute(sql, where_args).fetchall():\r\nsqlite3.OperationalError: user-defined function raised exception\r\n```\r\n\r\nBut without the `--dry-run` flag it does work as expected:\r\n\r\n```console\r\n$ sqlite-utils convert demo.db demo foo '{\"foo\": \"bar\"}' --multi\r\n$ sqlite-utils query demo.db \"SELECT * FROM demo\" \r\n[{\"foo\": \"bar\"}]\r\n```\r\n\r\n```console\r\n$ sqlite-utils --version\r\nsqlite-utils, version 3.25.1\r\n```", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/415/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"}