github
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/simonw/datasette/pull/672#issuecomment-586067794 | https://api.github.com/repos/simonw/datasette/issues/672 | 586067794 | MDEyOklzc3VlQ29tbWVudDU4NjA2Nzc5NA== | 9599 | 2020-02-14T02:29:16Z | 2020-02-14T02:29:16Z | OWNER | One design issue: how to pick neat unique names for database files in a file hierarchy? Here's what I have so far: https://github.com/simonw/datasette/blob/fe6f9e6a7397cab2e4bc57745a8da9d824dad218/datasette/app.py#L231-L237 For these files: ``` ../travel-old.db ../sf-tree-history/trees.db ../library-of-congress/records-from-df.db ``` It made these names: ``` travel-old sf-tree-history_trees library-of-congress_records-from-df ``` Maybe this is good enough? Needs some tests. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
565064079 | |
https://github.com/simonw/datasette/pull/672#issuecomment-586068095 | https://api.github.com/repos/simonw/datasette/issues/672 | 586068095 | MDEyOklzc3VlQ29tbWVudDU4NjA2ODA5NQ== | 9599 | 2020-02-14T02:30:37Z | 2020-02-14T02:30:46Z | OWNER | This can take a LONG time to run, and at the moment it's blocking and prevents Datasette from starting up. It would be much better if this ran in a thread, or an asyncio task. Probably have to be a thread because there's no easy `async` version of `pathlib.Path.glob()` that I've seen. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
565064079 | |
https://github.com/simonw/datasette/pull/672#issuecomment-586069529 | https://api.github.com/repos/simonw/datasette/issues/672 | 586069529 | MDEyOklzc3VlQ29tbWVudDU4NjA2OTUyOQ== | 9599 | 2020-02-14T02:37:17Z | 2020-02-14T02:37:17Z | OWNER | Another problem: if any of the found databases use SpatiaLite then Datasette will fail to start at all. It should skip them instead. The `select * from sqlite_master` check apparently isn't quite enough to catch this case. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
565064079 | |
https://github.com/simonw/datasette/pull/672#issuecomment-586107989 | https://api.github.com/repos/simonw/datasette/issues/672 | 586107989 | MDEyOklzc3VlQ29tbWVudDU4NjEwNzk4OQ== | 9599 | 2020-02-14T05:45:12Z | 2020-02-14T05:45:12Z | OWNER | I tried running the `scan_dirs()` method in a thread and got an interesting error while trying to load the homepage: `RuntimeError: OrderedDict mutated during iteration` Makes sense - I had a thread that added an item to that dictionary right while the homepage was attempting to run this code: https://github.com/simonw/datasette/blob/efa54b439fd0394440c302602b919255047b59c5/datasette/views/index.py#L24-L27 | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
565064079 | |
https://github.com/simonw/datasette/pull/672#issuecomment-586109032 | https://api.github.com/repos/simonw/datasette/issues/672 | 586109032 | MDEyOklzc3VlQ29tbWVudDU4NjEwOTAzMg== | 9599 | 2020-02-14T05:50:15Z | 2020-02-14T05:50:15Z | OWNER | So I need to ensure the `ds.databases` data structure is manipulated in a thread-safe manner. Mainly I need to ensure that it is locked during iterations over it, then unlocked at the end. Trickiest part is probably ensuring there is a test that proves this is working - I feel like I got lucky encountering that `RuntimeError` as early as I did. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
565064079 | |
https://github.com/simonw/datasette/pull/672#issuecomment-586109238 | https://api.github.com/repos/simonw/datasette/issues/672 | 586109238 | MDEyOklzc3VlQ29tbWVudDU4NjEwOTIzOA== | 9599 | 2020-02-14T05:51:12Z | 2020-02-14T05:51:12Z | OWNER | ... or maybe I can cheat and wrap the access to `self.ds.databases.items()` in `list()`, so I'm iterating over an atomically-created list of those things instead? I'll try that first. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
565064079 | |
https://github.com/simonw/datasette/pull/672#issuecomment-586109784 | https://api.github.com/repos/simonw/datasette/issues/672 | 586109784 | MDEyOklzc3VlQ29tbWVudDU4NjEwOTc4NA== | 9599 | 2020-02-14T05:53:50Z | 2020-02-14T05:54:21Z | OWNER | ... cheating like this seems to work: ``` for name, db in list(self.ds.databases.items()): ``` Python built-in operations are supposedly threadsafe, so in this case I can grab a copy of the list atomically (I think) and then safely iterate over it. Seems to work in my testing. Wish I could prove it with a unit test though. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
565064079 | |
https://github.com/simonw/datasette/pull/672#issuecomment-586111102 | https://api.github.com/repos/simonw/datasette/issues/672 | 586111102 | MDEyOklzc3VlQ29tbWVudDU4NjExMTEwMg== | 9599 | 2020-02-14T05:59:24Z | 2020-02-14T06:00:36Z | OWNER | Interesting new problem: hitting Ctrl+C no longer terminates the program if the `scan_dirs()` thread is still running. https://stackoverflow.com/questions/49992329/the-workers-in-threadpoolexecutor-is-not-really-daemon has clues. The workers are only meant to exit when their worker queues are empty. But... I want to run the worker every 10 seconds. How do I do that without having it loop forever and hence never quit? | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
565064079 | |
https://github.com/simonw/datasette/pull/672#issuecomment-586111619 | https://api.github.com/repos/simonw/datasette/issues/672 | 586111619 | MDEyOklzc3VlQ29tbWVudDU4NjExMTYxOQ== | 9599 | 2020-02-14T06:01:24Z | 2020-02-14T06:01:24Z | OWNER | https://gist.github.com/clchiou/f2608cbe54403edb0b13 might work. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
565064079 | |
https://github.com/simonw/datasette/pull/672#issuecomment-586112662 | https://api.github.com/repos/simonw/datasette/issues/672 | 586112662 | MDEyOklzc3VlQ29tbWVudDU4NjExMjY2Mg== | 9599 | 2020-02-14T06:05:27Z | 2020-02-14T06:05:27Z | OWNER | I think the fix is to use an old-fashioned `threading` module daemon thread directly. That should exit cleanly when the program exits. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
565064079 | |
https://github.com/simonw/datasette/pull/672#issuecomment-586441484 | https://api.github.com/repos/simonw/datasette/issues/672 | 586441484 | MDEyOklzc3VlQ29tbWVudDU4NjQ0MTQ4NA== | 9599 | 2020-02-14T19:34:25Z | 2020-02-14T19:34:25Z | OWNER | I've figured out how to tell if a database is safe to open or not: ```sql select sql from sqlite_master where sql like 'CREATE VIRTUAL TABLE%'; ``` This returns the SQL definitions for virtual tables. The bit after `using` tells you what they need. Run this against a SpatiaLite database and you get the following: ```sql CREATE VIRTUAL TABLE SpatialIndex USING VirtualSpatialIndex() CREATE VIRTUAL TABLE ElementaryGeometries USING VirtualElementary() ``` Run it against an Apple Photos `photos.db` file (found with `find ~/Library | grep photos.db`) and you get this (partial list): ```sql CREATE VIRTUAL TABLE RidList_VirtualReader using RidList_VirtualReaderModule CREATE VIRTUAL TABLE Array_VirtualReader using Array_VirtualReaderModule CREATE VIRTUAL TABLE LiGlobals_VirtualBufferReader using VirtualBufferReaderModule CREATE VIRTUAL TABLE RKPlace_RTree using rtree (modelId,minLongitude,maxLongitude,minLatitude,maxLatitude) ``` For a database with FTS4 you get: ```sql CREATE VIRTUAL TABLE "docs_fts" USING FTS4 ( [title], [content], content="docs" ) ``` FTS5: ```sql CREATE VIRTUAL TABLE [FARA_All_Registrants_fts] USING FTS5 ( [Name], [Address_1], [Address_2], content=[FARA_All_Registrants] ) ``` So I can use this to figure out all of the `using` pieces and then compare them to a list of known supported ones. | { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
565064079 |
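
The last comment's idea (collect the `using` pieces from `sqlite_master` and compare them to a list of known supported modules) could be sketched roughly as below. This is only an illustration, not the code from the pull request: the names `module_from_sql`, `virtual_table_modules`, `is_safe_to_open` and the contents of `KNOWN_MODULES` are all assumptions made up for this example, and the thread does not say which modules Datasette actually treats as supported.

```python
import re
import sqlite3

# Hypothetical allow-list for illustration only; the thread does not state
# which virtual table modules are actually considered supported.
KNOWN_MODULES = {"fts3", "fts4", "fts5", "rtree"}

# Matches the module name that follows USING in a CREATE VIRTUAL TABLE statement.
_USING_RE = re.compile(r"\busing\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def module_from_sql(sql):
    """Return the lower-cased module name after USING, or None if absent."""
    match = _USING_RE.search(sql)
    return match.group(1).lower() if match else None


def virtual_table_modules(conn):
    """Return the set of modules used by virtual tables in this database."""
    rows = conn.execute(
        "select sql from sqlite_master where sql like 'CREATE VIRTUAL TABLE%'"
    ).fetchall()
    modules = set()
    for (sql,) in rows:
        name = module_from_sql(sql)
        if name is not None:
            modules.add(name)
    return modules


def is_safe_to_open(conn, known=KNOWN_MODULES):
    """True if every virtual table module in the database is in the allow-list."""
    return virtual_table_modules(conn) <= known
```

With this allow-list, the SpatiaLite and Apple Photos examples quoted above would be skipped because `VirtualSpatialIndex` and `RidList_VirtualReaderModule` are not in the set, while the FTS4, FTS5 and `rtree` tables would pass. The regular expression is deliberately simple and would be confused by a table whose own name is `using`.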