{"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-752098906", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 752098906, "node_id": "MDEyOklzc3VlQ29tbWVudDc1MjA5ODkwNg==", "user": {"value": 82988, "label": "psychemedia"}, "created_at": "2020-12-29T14:34:30Z", "updated_at": "2020-12-29T14:34:50Z", "author_association": "CONTRIBUTOR", "body": "FWIW, I had a look at `watchdog` for a `datasette` powered Jupyter notebook search tool: https://github.com/ouseful-testing/nbsearch/blob/main/nbsearch/nbwatchdog.py\r\n\r\nNot a production thing, just an experiment trying to explore what might be possible...", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, "label": "Datasette Library"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-751504136", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 751504136, "node_id": "MDEyOklzc3VlQ29tbWVudDc1MTUwNDEzNg==", "user": {"value": 212369, "label": "drewda"}, "created_at": "2020-12-27T19:02:06Z", "updated_at": "2020-12-27T19:02:06Z", "author_association": "NONE", "body": "Very much looking forward to seeing this functionality come together. This is probably out-of-scope for an initial release, but in the future it could be useful to also think of how to run this in a container'ized context. For example, an immutable datasette container that points to an S3 bucket of SQLite DBs or CSVs. 
Or an immutable datasette container pointing to an NFS volume elsewhere on a Kubernetes cluster.", "reactions": "{\"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, "label": "Datasette Library"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-751127485", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 751127485, "node_id": "MDEyOklzc3VlQ29tbWVudDc1MTEyNzQ4NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-12-24T22:58:05Z", "updated_at": "2020-12-24T22:58:05Z", "author_association": "OWNER", "body": "That's a great idea. I'd ruled that out because working with the different operating system versions of those is tricky, but if `watchdog` can handle those differences for me this could be a really good option.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, "label": "Datasette Library"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-751127384", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 751127384, "node_id": "MDEyOklzc3VlQ29tbWVudDc1MTEyNzM4NA==", "user": {"value": 1279360, "label": "dyllan-to-you"}, "created_at": "2020-12-24T22:56:48Z", "updated_at": "2020-12-24T22:56:48Z", "author_association": "NONE", "body": "Instead of scanning the directory every 10s, have you considered listening for the native system events to notify you of updates?\r\n\r\nI think python has a nice module to do this for you called [watchdog](https://pypi.org/project/watchdog/)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, 
"label": "Datasette Library"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-586599424", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 586599424, "node_id": "MDEyOklzc3VlQ29tbWVudDU4NjU5OTQyNA==", "user": {"value": 82988, "label": "psychemedia"}, "created_at": "2020-02-15T15:12:19Z", "updated_at": "2020-02-15T15:12:33Z", "author_association": "CONTRIBUTOR", "body": "So could the polling support also allow you to call sqlite_utils to update a database with csv files? (Though I'm guessing you would only want to handle changed files? Do your scrapers check and cache csv datestamps/hashes?)", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, "label": "Datasette Library"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-586066798", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 586066798, "node_id": "MDEyOklzc3VlQ29tbWVudDU4NjA2Njc5OA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-02-14T02:24:54Z", "updated_at": "2020-02-14T02:24:54Z", "author_association": "OWNER", "body": "I'm going to move this over to a draft pull request.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, "label": "Datasette Library"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-586065843", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 586065843, "node_id": "MDEyOklzc3VlQ29tbWVudDU4NjA2NTg0Mw==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-02-14T02:20:53Z", "updated_at": "2020-02-14T02:20:53Z", "author_association": "OWNER", 
"body": "MVP for this feature: just do it once on startup, don't scan for new files every X seconds.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, "label": "Datasette Library"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-586047525", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 586047525, "node_id": "MDEyOklzc3VlQ29tbWVudDU4NjA0NzUyNQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-02-14T01:03:43Z", "updated_at": "2020-02-14T01:59:02Z", "author_association": "OWNER", "body": "OK, I have a plan. I'm going to try and implement this as a core Datasette feature (no plugins) with the following design:\r\n\r\n- You can tell Datasette \"load any databases you find in this directory\" by passing the `--dir=path/to/dir` option to `datasette`; any files in that directory that are valid SQLite databases will be attached to Datasette\r\n- Every 10 seconds Datasette will re-scan those directories to see if any new files have been added\r\n- That 10s will be the default for a new `--config directory_scan_s:10` config option. You can set this to `0` to disable scanning entirely, at which point Datasette will only run the scan once on startup.\r\n\r\nTo check if a file is valid SQLite, Datasette will first check if the first few bytes of the file are `b\"SQLite format 3\\x00\"`. If they are, it will open a connection to the file and attempt to run `select * from sqlite_master` against it. 
If that runs without any errors it will assume the file is usable and connect to it.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, "label": "Datasette Library"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-586047995", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 586047995, "node_id": "MDEyOklzc3VlQ29tbWVudDU4NjA0Nzk5NQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2020-02-14T01:05:20Z", "updated_at": "2020-02-14T01:05:20Z", "author_association": "OWNER", "body": "I'm going to add two methods to the Datasette class to help support this work (and to enable exciting new plugin opportunities in the future):\r\n\r\n- `datasette.add_database(name, db)` - adds a new named database to the list of connected databases. `db` will be a `Database()` object, which may prove useful in the future for things like #670 and could also allow some plugins to provide in-memory SQLite databases.\r\n- `datasette.remove_database(name)`", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, "label": "Datasette Library"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-474280581", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 474280581, "node_id": "MDEyOklzc3VlQ29tbWVudDQ3NDI4MDU4MQ==", "user": {"value": 82988, "label": "psychemedia"}, "created_at": "2019-03-19T10:06:42Z", "updated_at": "2019-03-19T10:06:42Z", "author_association": "CONTRIBUTOR", "body": "This would be really interesting but several possibilities in use arise, I think?\r\n\r\nFor example:\r\n\r\n- I put a new CSV file into the import dir and a new table is created 
therefrom\r\n- I put a CSV file into the import dir that replaces a previous file / table of the same name as a pre-existing table (eg files that contain monthly data in year to date). The data may also patch previous months, so a full replace / DROP on the original table may well be in order.\r\n- I put a CSV file into the import dir that updates a table of the same name as a pre-existing table (eg files that contain last month's data)\r\n\r\nCSV files may also have messy names compared to the table you want. Or for an update CSV, may have the form `MYTABLENAME-February2019.csv` etc", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, "label": "Datasette Library"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-473312514", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 473312514, "node_id": "MDEyOklzc3VlQ29tbWVudDQ3MzMxMjUxNA==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-03-15T14:42:07Z", "updated_at": "2019-03-17T22:12:30Z", "author_association": "OWNER", "body": "A neat ability of Datasette Library would be if it can work against other files that have been dropped into the folder. 
In particular: if a user drops a CSV file into the folder, how about automatically converting that CSV file to SQLite using [sqlite-utils](https://github.com/simonw/sqlite-utils)?", "reactions": "{\"total_count\": 2, \"+1\": 2, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, "label": "Datasette Library"}, "performed_via_github_app": null} {"html_url": "https://github.com/simonw/datasette/issues/417#issuecomment-473308631", "issue_url": "https://api.github.com/repos/simonw/datasette/issues/417", "id": 473308631, "node_id": "MDEyOklzc3VlQ29tbWVudDQ3MzMwODYzMQ==", "user": {"value": 9599, "label": "simonw"}, "created_at": "2019-03-15T14:32:13Z", "updated_at": "2019-03-15T14:32:13Z", "author_association": "OWNER", "body": "This would allow Datasette to be easily used as a \"data library\" (like a data warehouse but less expectation of big data querying technology such as Presto).\r\n\r\nOne of the things I learned at the NICAR CAR 2019 conference in Newport Beach is that there is a very real need for some kind of easily accessible data library at most newsrooms.", "reactions": "{\"total_count\": 0, \"+1\": 0, \"-1\": 0, \"laugh\": 0, \"hooray\": 0, \"confused\": 0, \"heart\": 0, \"rocket\": 0, \"eyes\": 0}", "issue": {"value": 421546944, "label": "Datasette Library"}, "performed_via_github_app": null}