{"id": 810507413, "node_id": "MDExOlB1bGxSZXF1ZXN0NTc1MTg3NDU3", "number": 1229, "title": "ensure immutable databses when starting in configuration directory mode with", "user": {"value": 295329, "label": "camallen"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 3, "created_at": "2021-02-17T20:18:26Z", "updated_at": "2022-04-22T13:16:36Z", "closed_at": "2021-03-29T00:17:32Z", "author_association": "CONTRIBUTOR", "pull_request": "simonw/datasette/pulls/1229", "body": "fixes #1224 \r\n\r\nThis PR ensures all databases found in a configuration directory that match the files in `inspect-data.json` will be set to `immutable` as outlined in https://docs.datasette.io/en/latest/settings.html#configuration-directory-mode\r\n\r\nspecifically on building the `datasette` instance it checks:\r\n- if `immutables` is an empty tuple - as passed by the cli code\r\n- if `immutables` is the default function value `None` - when it's not explicitly set\r\n\r\nAnd correctly builds the immutable database list from the `inspect-data[file]` keys.\r\n\r\nNote for this to work the `inspect-data.json` file must contain `file` paths which are relative to the configuration directory otherwise the file paths won't match and the dbs won't be set to immutable. \r\n\r\nI couldn't find an easy way to test this due to the way `make_app_client` works, happy to take directions on adding a test for this. \r\n\r\nI've updated the relevant docs as well, i.e. use the `inspect` cli cmd from the config directory path to create the relevant file\r\n```\r\ncd $config_dir\r\ndatasette inspect *.db --inspect-file=inspect-data.json\r\n```\r\nhttps://docs.datasette.io/en/latest/performance.html#using-datasette-inspect", "repo": {"value": 107914493, "label": "datasette"}, "type": "pull", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/datasette/issues/1229/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": 0, "state_reason": null} {"id": 807433181, "node_id": "MDU6SXNzdWU4MDc0MzMxODE=", "number": 1224, "title": "can't start immutable databases from configuration dir mode", "user": {"value": 295329, "label": "camallen"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 0, "created_at": "2021-02-12T17:50:13Z", "updated_at": "2021-03-29T00:17:31Z", "closed_at": "2021-03-29T00:17:31Z", "author_association": "CONTRIBUTOR", "pull_request": null, "body": "Say I have a `/databases/` directory with multiple sqlite db files in that dir (`1.db` & `2.db`) and an `inspect-data.json` file.\r\n\r\nIf I start datasette via `datasette -h 0.0.0.0 /databases/` then the resulting databases are set to `is_mutable: true` as inspected via http://127.0.0.1:8001/-/databases.json\r\n\r\nI don't want to have to list out the databases by name, e.g. `datasette -i /databases/1.db -i /databases/2.db` as i want the system to autodetect the sqlite dbs i have in the configuration directory \r\n\r\nAccording to the docs outlined in https://docs.datasette.io/en/latest/settings.html?highlight=immutable#configuration-directory-mode this should be possible\r\n> `inspect-data.json` the result of running datasette inspect - any database files listed here will be treated as immutable, so they should not be changed while Datasette is running\r\n \r\nI believe that if the `inspect-json.json` file present, then in theory the databases will be automatically set to immutable via this code https://github.com/simonw/datasette/blob/9603d893b9b72653895318c9104d754229fdb146/datasette/app.py#L211-L216\r\n\r\nHowever it appears the Click Multiple Options will return a tuple via https://github.com/simonw/datasette/blob/9603d893b9b72653895318c9104d754229fdb146/datasette/cli.py#L311-L317\r\n\r\nThe resulting tuple is passed to the Datasette app via `kwargs` and overrides the behaviour to set the databases to immutable via this arg https://github.com/simonw/datasette/blob/9603d893b9b72653895318c9104d754229fdb146/datasette/app.py#L182\r\n\r\nIf you think this is a bug and needs fixing, I am willing to make a PR to check for the empty `immutable` tuple before calling the Datasette class initializer as I think leaving that class interface alone is the best path here.\r\n\r\nThoughts?\r\n\r\nAlso - i'm loving Datasette, it truly is a wonderful tool, thank you :)", "repo": {"value": 107914493, "label": "datasette"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/datasette/issues/1224/reactions\", \"total_count\": 1, \"+1\": 1, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"} {"id": 807174161, "node_id": "MDU6SXNzdWU4MDcxNzQxNjE=", "number": 227, "title": "Error reading csv files with large column data", "user": {"value": 295329, "label": "camallen"}, "state": "closed", "locked": 0, "assignee": null, "milestone": null, "comments": 4, "created_at": "2021-02-12T11:51:47Z", "updated_at": "2021-02-16T11:48:03Z", "closed_at": "2021-02-14T21:17:19Z", "author_association": "NONE", "pull_request": null, "body": "*Feel free to close this issue - I mostly added it for reference for future folks that run into this :)*\r\n\r\nI have a CSV file with one column that has very long strings. When i try to import this file via the `insert` command I get the following error: \r\n```\r\nsqlite-utils insert database.db table_name file_with_large_column.csv\r\n\r\nTraceback (most recent call last):\r\n File \"/usr/local/bin/sqlite-utils\", line 10, in \r\n sys.exit(cli())\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 829, in __call__\r\n return self.main(*args, **kwargs)\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 782, in main\r\n rv = self.invoke(ctx)\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 1259, in invoke\r\n return _process_result(sub_ctx.command.invoke(sub_ctx))\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 1066, in invoke\r\n return ctx.invoke(self.callback, **ctx.params)\r\n File \"/usr/local/lib/python3.7/site-packages/click/core.py\", line 610, in invoke\r\n return callback(*args, **kwargs)\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 774, in insert\r\n default=default,\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 705, in insert_upsert_implementation\r\n docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py\", line 1852, in insert_all\r\n first_record = next(records)\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 703, in \r\n docs = (decode_base64_values(doc) for doc in docs)\r\n File \"/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py\", line 681, in \r\n docs = (dict(zip(headers, row)) for row in reader)\r\n_csv.Error: field larger than field limit (131072)\r\n```\r\nBuilt with the docker image `datasetteproject/datasette:0.54` with the following versions:\r\n```\r\n# sqlite-utils --version\r\nsqlite-utils, version 3.4.1\r\n\r\n# datasette --version\r\ndatasette, version 0.54\r\n```\r\nIt appears this is a [known issue](https://stackoverflow.com/a/54517228/2761423) reading in csv files in python and [doesn't look to be modifiable](https://github.com/python/cpython/blob/ea46579067fd2d4e164d6605719ffec690c4d621/Modules/_csv.c#L1685) through system / env vars (i may be very wrong on this).\r\n\r\nNoting that using sqlite3 `import` command work without error (not using the python csv reader)\r\n```\r\nsqlite3 database.db\r\nsqlite> .mode csv\r\nsqlite> .import file_with_large_column.csv table_name\r\n```\r\nSadly I couldn't see an easy way around this while using the cli as it appears this value needs to be changed in python code. FWIW I've switched to using https://datasette.io/tools/csvs-to-sqlite for importing csv data and it's working well. \r\n\r\nFinally, I'm loving https://datasette.io/ thank you very much for an amazing tool and data ecosytem \ud83d\ude47\u200d\u2640\ufe0f ", "repo": {"value": 140912432, "label": "sqlite-utils"}, "type": "issue", "active_lock_reason": null, "performed_via_github_app": null, "reactions": "{\"url\": \"https://api.github.com/repos/simonw/sqlite-utils/issues/227/reactions\", \"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "draft": null, "state_reason": "completed"}