issue_comments

15 rows where issue = 565064079 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue
604667029 https://github.com/simonw/datasette/pull/672#issuecomment-604667029 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDYwNDY2NzAyOQ== simonw 9599 2020-03-26T20:26:46Z 2020-03-26T20:26:46Z OWNER

I think I can find the current open-file limit like so:

In [1]: import resource

In [2]: resource.getrlimit(resource.RLIMIT_NOFILE)
Out[2]: (256, 9223372036854775807)

So maybe I should have Datasette refuse to open more database files than the soft limit (the first number, 256 here) minus 5, to leave some spare room for opening CSS files etc.
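A minimal sketch of that idea, assuming a Unix platform where the `resource` module is available. The function name `max_database_files` is hypothetical and is not part of Datasette; the reserve of 5 is the figure suggested above.

```python
import resource

RESERVED_DESCRIPTORS = 5  # headroom for static files etc. (figure from the comment above)


def max_database_files(reserved=RESERVED_DESCRIPTORS):
    """Return how many database files fit under the soft RLIMIT_NOFILE limit.

    Hypothetical helper: returns None when no limit is configured.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    # Never return a negative count if the limit is smaller than the reserve.
    return max(soft - reserved, 0)
```

This is probably an upper bound rather than a safe target: a single SQLite database can hold more than one descriptor open (for example a journal or WAL file), so the real ceiling may be lower.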

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
604665229 https://github.com/simonw/datasette/pull/672#issuecomment-604665229 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDYwNDY2NTIyOQ== simonw 9599 2020-03-26T20:22:48Z 2020-03-26T20:22:48Z OWNER

I also eventually get this error:

Traceback (most recent call last):
  File "/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py", line 121, in route_path
    return await view(new_scope, receive, send)
  File "/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py", line 336, in inner_static
    await asgi_send_file(send, full_path, chunk_size=chunk_size)
  File "/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py", line 303, in asgi_send_file
    async with aiofiles.open(str(filepath), mode="rb") as fp:
  File "/Users/simonw/.local/share/virtualenvs/datasette-oJRYYJuA/lib/python3.7/site-packages/aiofiles/base.py", line 78, in __aenter__
  File "/Users/simonw/.local/share/virtualenvs/datasette-oJRYYJuA/lib/python3.7/site-packages/aiofiles/threadpool/__init__.py", line 35, in _open
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 57, in run
OSError: [Errno 24] Too many open files: '/Users/simonw/Dropbox/Development/datasette/datasette/static/app.css'
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
604569063 https://github.com/simonw/datasette/pull/672#issuecomment-604569063 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDYwNDU2OTA2Mw== simonw 9599 2020-03-26T17:32:06Z 2020-03-26T17:32:06Z OWNER

While running it against a nested directory with a TON of databases I kept seeing errors like this:

Traceback (most recent call last):
  File "/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py", line 121, in route_path
    return await view(new_scope, receive, send)
  File "/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py", line 193, in view
    request, **scope["url_route"]["kwargs"]
  File "/Users/simonw/Dropbox/Development/datasette/datasette/views/index.py", line 58, in get
    tables[table]["num_relationships_for_sorting"] = count
KeyError: 'primary-candidates-2018/rep_candidates'

And

Traceback (most recent call last):
  File "/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py", line 121, in route_path
    return await view(new_scope, receive, send)
  File "/Users/simonw/Dropbox/Development/datasette/datasette/utils/asgi.py", line 193, in view
    request, **scope["url_route"]["kwargs"]
  File "/Users/simonw/Dropbox/Development/datasette/datasette/views/index.py", line 58, in get
    tables[table]["num_relationships_for_sorting"] = count
KeyError: 'space_used'
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
604561639 https://github.com/simonw/datasette/pull/672#issuecomment-604561639 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDYwNDU2MTYzOQ== simonw 9599 2020-03-26T17:22:07Z 2020-03-26T17:22:07Z OWNER

Here's the new utility function I should be using to verify database files that I find:

https://github.com/simonw/datasette/blob/6aa516d82dea9885cb4db8d56ec2ccfd4cd9b840/datasette/utils/__init__.py#L773-L787

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
586441484 https://github.com/simonw/datasette/pull/672#issuecomment-586441484 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDU4NjQ0MTQ4NA== simonw 9599 2020-02-14T19:34:25Z 2020-02-14T19:34:25Z OWNER

I've figured out how to tell if a database is safe to open or not:

select sql from sqlite_master where sql like 'CREATE VIRTUAL TABLE%';

This returns the SQL definitions for virtual tables. The part after USING names the module each table needs.

Run this against a SpatiaLite database and you get the following:

CREATE VIRTUAL TABLE SpatialIndex USING VirtualSpatialIndex()
CREATE VIRTUAL TABLE ElementaryGeometries USING VirtualElementary()

Run it against an Apple Photos photos.db file (found with find ~/Library | grep photos.db) and you get this (partial list):

CREATE VIRTUAL TABLE RidList_VirtualReader using RidList_VirtualReaderModule
CREATE VIRTUAL TABLE Array_VirtualReader using Array_VirtualReaderModule
CREATE VIRTUAL TABLE LiGlobals_VirtualBufferReader using VirtualBufferReaderModule
CREATE VIRTUAL TABLE RKPlace_RTree using rtree (modelId,minLongitude,maxLongitude,minLatitude,maxLatitude)

For a database with FTS4 you get:

CREATE VIRTUAL TABLE "docs_fts" USING FTS4 (
    [title], [content], content="docs"
)

FTS5:

CREATE VIRTUAL TABLE [FARA_All_Registrants_fts] USING FTS5 (
                [Name], [Address_1], [Address_2],
                content=[FARA_All_Registrants]
            )

So I can use this to extract the module name after each USING and then compare those names to a list of known supported ones.
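Here is a rough sketch of that check. The allow-list `KNOWN_MODULES` and both function names are assumptions for illustration, not Datasette's actual implementation; which modules are really available depends on how SQLite was compiled.

```python
import re
import sqlite3

# Hypothetical allow-list: modules assumed to be available in this SQLite build.
KNOWN_MODULES = {"fts3", "fts4", "fts5", "rtree"}

_USING_RE = re.compile(
    r"""CREATE\s+VIRTUAL\s+TABLE\s+
        (?:IF\s+NOT\s+EXISTS\s+)?
        (?:"[^"]+"|\[[^\]]+\]|`[^`]+`|\S+)   # table name, quoted or bare
        \s+USING\s+(\w+)""",
    re.IGNORECASE | re.VERBOSE,
)


def module_from_sql(sql):
    """Return the lowercased module name from a CREATE VIRTUAL TABLE statement."""
    match = _USING_RE.match(sql.strip())
    return match.group(1).lower() if match else None


def unsupported_modules(conn):
    """Return the virtual table modules a database uses that are not allow-listed.

    A statement that cannot be parsed yields None, which is also reported,
    so the check fails closed.
    """
    rows = conn.execute(
        "select sql from sqlite_master where sql like 'CREATE VIRTUAL TABLE%'"
    ).fetchall()
    modules = {module_from_sql(sql) for (sql,) in rows}
    return {m for m in modules if m not in KNOWN_MODULES}
```

A database would then be treated as safe to open only when `unsupported_modules(conn)` returns an empty set.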

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
586112662 https://github.com/simonw/datasette/pull/672#issuecomment-586112662 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDU4NjExMjY2Mg== simonw 9599 2020-02-14T06:05:27Z 2020-02-14T06:05:27Z OWNER

I think the fix is to use an old-fashioned threading module daemon thread directly. That should exit cleanly when the program exits.
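A sketch of what that could look like, assuming the periodic work is wrapped in a plain callable. The name `start_periodic` and the `stop` event are illustrative and not taken from the pull request.

```python
import threading


def start_periodic(task, interval=10.0):
    """Run task() repeatedly on a daemon thread; return (thread, stop_event)."""
    stop = threading.Event()

    def loop():
        while True:
            task()
            # wait() returns True as soon as stop is set, False on timeout.
            if stop.wait(interval):
                break

    # daemon=True means the interpreter will not wait for this thread at exit.
    thread = threading.Thread(target=loop, name="scan-dirs", daemon=True)
    thread.start()
    return thread, stop
```

Because the thread is a daemon, Ctrl+C should end the process even while the loop is sleeping. The `Event` is optional, but it gives tests a clean way to stop the loop without waiting for a full interval.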

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
586111619 https://github.com/simonw/datasette/pull/672#issuecomment-586111619 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDU4NjExMTYxOQ== simonw 9599 2020-02-14T06:01:24Z 2020-02-14T06:01:24Z OWNER

https://gist.github.com/clchiou/f2608cbe54403edb0b13 might work.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
586111102 https://github.com/simonw/datasette/pull/672#issuecomment-586111102 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDU4NjExMTEwMg== simonw 9599 2020-02-14T05:59:24Z 2020-02-14T06:00:36Z OWNER

Interesting new problem: hitting Ctrl+C no longer terminates the program while the scan_dirs() thread is still running.

https://stackoverflow.com/questions/49992329/the-workers-in-threadpoolexecutor-is-not-really-daemon has clues. The workers are only meant to exit when their worker queues are empty.

But... I want to run the worker every 10 seconds. How do I do that without having it loop forever and hence never quit?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
586109784 https://github.com/simonw/datasette/pull/672#issuecomment-586109784 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDU4NjEwOTc4NA== simonw 9599 2020-02-14T05:53:50Z 2020-02-14T05:54:21Z OWNER

... cheating like this seems to work:

for name, db in list(self.ds.databases.items()):

Python built-in operations are supposedly threadsafe, so in this case I can grab a copy of the list atomically (I think) and then safely iterate over it.

Seems to work in my testing. Wish I could prove it with a unit test though.
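The failure and the workaround can at least be reproduced in a single thread, by mutating the dictionary inside the loop to stand in for the scanning thread. This is only a sketch; the function names are made up for the example.

```python
from collections import OrderedDict


def iterate_live(d):
    """Iterate the live dict while it is mutated; raises RuntimeError."""
    for name, _db in d.items():
        d[name + "_copy"] = None  # simulates another thread adding a database


def iterate_snapshot(d):
    """Iterate a list() copy, so later mutations cannot affect the loop."""
    seen = []
    for name, _db in list(d.items()):
        d[name + "_copy"] = None
        seen.append(name)
    return seen
```

This shows that the snapshot protects the loop itself. It does not prove that `list(...)` is atomic across threads: that relies on CPython's global interpreter lock rather than on a language guarantee.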

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
586109238 https://github.com/simonw/datasette/pull/672#issuecomment-586109238 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDU4NjEwOTIzOA== simonw 9599 2020-02-14T05:51:12Z 2020-02-14T05:51:12Z OWNER

... or maybe I can cheat and wrap the access to self.ds.databases.items() in list(), so I'm iterating over an atomically-created list of those things instead? I'll try that first.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
586109032 https://github.com/simonw/datasette/pull/672#issuecomment-586109032 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDU4NjEwOTAzMg== simonw 9599 2020-02-14T05:50:15Z 2020-02-14T05:50:15Z OWNER

So I need to ensure the ds.databases data structure is manipulated in a thread-safe manner.

Mainly I need to ensure that it is locked during iterations over it, then unlocked at the end.

Trickiest part is probably ensuring there is a test that proves this is working - I feel like I got lucky encountering that RuntimeError as early as I did.
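One possible shape for this, sketched with a hypothetical `DatabaseRegistry` wrapper (not an existing Datasette class): writers take a lock to mutate, and readers take the same lock only long enough to copy the items.

```python
import threading
from collections import OrderedDict


class DatabaseRegistry:
    """Hypothetical wrapper guarding an OrderedDict with a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._databases = OrderedDict()

    def add(self, name, db):
        # Mutations happen only while the lock is held.
        with self._lock:
            self._databases[name] = db

    def snapshot(self):
        # Copy under the lock so callers can iterate without holding it.
        with self._lock:
            return list(self._databases.items())
```

Returning a copy keeps the lock held briefly, at the cost of readers seeing a slightly stale view. A stress test with a writer thread can raise confidence, but it cannot prove the absence of a race.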

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
586107989 https://github.com/simonw/datasette/pull/672#issuecomment-586107989 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDU4NjEwNzk4OQ== simonw 9599 2020-02-14T05:45:12Z 2020-02-14T05:45:12Z OWNER

I tried running the scan_dirs() method in a thread and got an interesting error while trying to load the homepage: RuntimeError: OrderedDict mutated during iteration

Makes sense - I had a thread that added an item to that dictionary right while the homepage was attempting to run this code:

https://github.com/simonw/datasette/blob/efa54b439fd0394440c302602b919255047b59c5/datasette/views/index.py#L24-L27

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
586069529 https://github.com/simonw/datasette/pull/672#issuecomment-586069529 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDU4NjA2OTUyOQ== simonw 9599 2020-02-14T02:37:17Z 2020-02-14T02:37:17Z OWNER

Another problem: if any of the found databases use SpatiaLite then Datasette will fail to start at all.

It should skip them instead.

The select * from sqlite_master check apparently isn't quite enough to catch this case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
586068095 https://github.com/simonw/datasette/pull/672#issuecomment-586068095 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDU4NjA2ODA5NQ== simonw 9599 2020-02-14T02:30:37Z 2020-02-14T02:30:46Z OWNER

This can take a LONG time to run, and at the moment it's blocking and prevents Datasette from starting up.

It would be much better if this ran in a thread, or an asyncio task. Probably have to be a thread because there's no easy async version of pathlib.Path.glob() that I've seen.
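For illustration, a blocking glob can be pushed onto a worker thread from asyncio with `run_in_executor`. This sketch assumes databases are found by a `*.db` suffix, which may not match the real scanning logic; both function names are made up.

```python
import asyncio
from pathlib import Path


def find_db_files(root):
    """Blocking scan: recursively collect *.db paths under root."""
    return sorted(Path(root).glob("**/*.db"))


async def find_db_files_async(root):
    """Run the blocking scan in the default thread pool executor."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, find_db_files, root)
```

The event loop stays responsive while the scan runs. One caveat: the default executor waits for running work when the interpreter exits, so a very long scan can delay shutdown.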

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079
586067794 https://github.com/simonw/datasette/pull/672#issuecomment-586067794 https://api.github.com/repos/simonw/datasette/issues/672 MDEyOklzc3VlQ29tbWVudDU4NjA2Nzc5NA== simonw 9599 2020-02-14T02:29:16Z 2020-02-14T02:29:16Z OWNER

One design issue: how to pick neat unique names for database files in a file hierarchy?

Here's what I have so far:

https://github.com/simonw/datasette/blob/fe6f9e6a7397cab2e4bc57745a8da9d824dad218/datasette/app.py#L231-L237

For these files:

../travel-old.db
../sf-tree-history/trees.db
../library-of-congress/records-from-df.db

It made these names:

travel-old
sf-tree-history_trees
library-of-congress_records-from-df

Maybe this is good enough? Needs some tests.
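The linked code is not reproduced here, so the following is only a guess at an equivalent scheme that yields the names listed above: drop the suffix and join the path components below the scanned directory with underscores. `database_name` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def database_name(path, root):
    """Join the path components below root with underscores, minus the suffix."""
    relative = PurePosixPath(path).relative_to(root).with_suffix("")
    return "_".join(relative.parts)
```

For example, `database_name("../sf-tree-history/trees.db", "..")` gives `sf-tree-history_trees`. Names built this way are not guaranteed to be unique: `a_b/c.db` and `a/b_c.db` both map to `a_b_c`, which is the kind of case the tests would need to cover.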

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--dirs option for scanning directories for SQLite databases 565064079

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Query took 32.128ms · About: github-to-sqlite