
issue_comments


24 rows where "updated_at" is on date 2020-02-24 sorted by updated_at descending



issue 5

  • Mechanism for writing to database via a queue 10
  • .execute_write() and .execute_write_fn() methods on Database 9
  • --cp option for datasette publish and datasette package for shipping additional files and directories 3
  • ?_searchmode=raw option for running FTS searches without escaping characters 1
  • Cashe-header missing in http-response 1

user 4

  • simonw 20
  • aviflax 2
  • clausjuhl 1
  • tunguyenatwork 1

author_association 2

  • OWNER 20
  • NONE 4
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
590608228 https://github.com/simonw/datasette/pull/683#issuecomment-590608228 https://api.github.com/repos/simonw/datasette/issues/683 MDEyOklzc3VlQ29tbWVudDU5MDYwODIyOA== simonw 9599 2020-02-24T23:52:35Z 2020-02-24T23:52:35Z OWNER

I'm going to punt on the ability to introspect the write queue and poll for completion using a UUID for the moment. Can add those later.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.execute_write() and .execute_write_fn() methods on Database 570101428  
590607385 https://github.com/simonw/datasette/pull/683#issuecomment-590607385 https://api.github.com/repos/simonw/datasette/issues/683 MDEyOklzc3VlQ29tbWVudDU5MDYwNzM4NQ== simonw 9599 2020-02-24T23:49:37Z 2020-02-24T23:49:37Z OWNER

Here's the `upload_csv.py` plugin file I've been playing with:

```python
from datasette import hookimpl
from starlette.responses import PlainTextResponse, HTMLResponse
from starlette.endpoints import HTTPEndpoint
import csv as csv_std
import codecs
import sqlite_utils


class UploadApp(HTTPEndpoint):
    def __init__(self, scope, receive, send, datasette):
        self.datasette = datasette
        super().__init__(scope, receive, send)

    def get_database(self):
        # For the moment just use the first one that's not immutable
        mutable = [db for db in self.datasette.databases.values() if db.is_mutable]
        return mutable[0]

    async def get(self, request):
        return HTMLResponse(
            await self.datasette.render_template(
                "upload_csv.html", {"database_name": self.get_database().name}
            )
        )

    async def post(self, request):
        formdata = await request.form()
        csv = formdata["csv"]
        # csv.file is a SpooledTemporaryFile, I can read it directly
        filename = csv.filename
        # TODO: Support other encodings:
        reader = csv_std.reader(codecs.iterdecode(csv.file, "utf-8"))
        headers = next(reader)
        docs = (dict(zip(headers, row)) for row in reader)
        if filename.endswith(".csv"):
            filename = filename[:-4]
        # Import data into a table of that name using sqlite-utils
        db = self.get_database()

        def fn(conn):
            writable_conn = sqlite_utils.Database(db.path)
            writable_conn[filename].insert_all(docs, alter=True)
            return writable_conn[filename].count

        # Without block=True we may attempt 'select count(*) from ...'
        # before the table has been created by the write thread
        count = await db.execute_write_fn(fn, block=True)

        return HTMLResponse(
            await self.datasette.render_template(
                "upload_csv_done.html",
                {
                    "database": self.get_database().name,
                    "table": filename,
                    "num_docs": count,
                },
            )
        )


@hookimpl
def asgi_wrapper(datasette):
    def wrap_with_asgi_auth(app):
        async def wrapped_app(scope, receive, send):
            if scope["path"] == "/-/upload-csv":
                await UploadApp(scope, receive, send, datasette)
            else:
                await app(scope, receive, send)

        return wrapped_app

    return wrap_with_asgi_auth
```

I also dropped copies of the two template files from https://github.com/simonw/datasette-upload-csvs/tree/699e6ca591f36264bfc8e590d877e6852f274beb/datasette_upload_csvs/templates into my `write-templates/` directory.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.execute_write() and .execute_write_fn() methods on Database 570101428  
590606825 https://github.com/simonw/datasette/pull/683#issuecomment-590606825 https://api.github.com/repos/simonw/datasette/issues/683 MDEyOklzc3VlQ29tbWVudDU5MDYwNjgyNQ== simonw 9599 2020-02-24T23:47:38Z 2020-02-24T23:47:38Z OWNER

Another demo plugin: `delete_table.py`

```python
from datasette import hookimpl
from datasette.utils import escape_sqlite
from starlette.responses import HTMLResponse
from starlette.endpoints import HTTPEndpoint


class DeleteTableApp(HTTPEndpoint):
    def __init__(self, scope, receive, send, datasette):
        self.datasette = datasette
        super().__init__(scope, receive, send)

    async def post(self, request):
        formdata = await request.form()
        database = formdata["database"]
        db = self.datasette.databases[database]
        await db.execute_write(
            "drop table {}".format(escape_sqlite(formdata["table"]))
        )
        return HTMLResponse("Table has been deleted.")


@hookimpl
def asgi_wrapper(datasette):
    def wrap_with_asgi_auth(app):
        async def wrapped_app(scope, receive, send):
            if scope["path"] == "/-/delete-table":
                await DeleteTableApp(scope, receive, send, datasette)
            else:
                await app(scope, receive, send)

        return wrapped_app

    return wrap_with_asgi_auth
```

Then I saved this as `table.html` in the `write-templates/` directory:

```html+django
{% extends "default:table.html" %}

{% block content %}
<form action="/-/delete-table" method="POST">
</form>
{{ super() }}
{% endblock %}
```

(Needs CSRF protection added)

I ran Datasette like this:

```shell
$ datasette --plugins-dir=write-plugins/ data.db --template-dir=write-templates/
```

Result: I can delete tables!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.execute_write() and .execute_write_fn() methods on Database 570101428  
590599257 https://github.com/simonw/datasette/pull/683#issuecomment-590599257 https://api.github.com/repos/simonw/datasette/issues/683 MDEyOklzc3VlQ29tbWVudDU5MDU5OTI1Nw== simonw 9599 2020-02-24T23:21:56Z 2020-02-24T23:22:35Z OWNER

Also: are UUIDs really necessary here or could I use a simpler form of task identifier? Like an in-memory counter variable that starts at 0 and increments every time this instance of Datasette issues a new task ID?

The neat thing about UUIDs is that I don't have to worry if there are multiple Datasette instances accepting writes behind a load balancer. That seems pretty unlikely (especially considering SQLite databases encourage only one process to be writing at a time)... but I am experimenting with PostgreSQL support in #670 so it's probably worth ensuring these task IDs really are globally unique.

I'm going to stick with UUIDs. They're short-lived enough that their size doesn't really matter.
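
The trade-off between the two ID schemes can be sketched with stdlib tools alone (a hypothetical illustration, not code from the PR):

```python
import itertools
import uuid

# Option 1: a per-process counter - simple, but two Datasette instances
# behind a load balancer would both hand out task IDs 0, 1, 2...
counter = itertools.count()
task_id_a = next(counter)
task_id_b = next(counter)

# Option 2: uuid4 - 122 random bits, globally unique in practice,
# with no coordination between processes needed
uuid_a = str(uuid.uuid4())
uuid_b = str(uuid.uuid4())
```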

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.execute_write() and .execute_write_fn() methods on Database 570101428  
590598689 https://github.com/simonw/datasette/pull/683#issuecomment-590598689 https://api.github.com/repos/simonw/datasette/issues/683 MDEyOklzc3VlQ29tbWVudDU5MDU5ODY4OQ== simonw 9599 2020-02-24T23:20:11Z 2020-02-24T23:20:11Z OWNER

I think with `block=True` it makes sense to return the return value of the function that was executed. Without it, all I really need to do is return the UUID so something could theoretically poll for completion later on.

But is it weird having a function that returns different types depending on whether you passed block=True? Should they be differently named functions?

I'm OK with the block=True pattern changing the return value I think.
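
For what it's worth, the two return shapes being weighed up here look roughly like this (`SketchDatabase` is a made-up stand-in for illustration, not Datasette's actual `Database` class):

```python
import asyncio
import uuid


class SketchDatabase:
    # Hypothetical sketch: one method whose return type depends on block -
    # a task UUID in fire-and-forget mode, the function's own return
    # value when block=True.
    async def execute_write_fn(self, fn, block=False):
        if block:
            return fn(None)  # pretend we waited on the write thread
        # pretend the task was queued; hand back an ID to poll with
        return str(uuid.uuid4())


async def demo():
    db = SketchDatabase()
    result = await db.execute_write_fn(lambda conn: 42, block=True)
    token = await db.execute_write_fn(lambda conn: 42)
    return result, token


result, token = asyncio.run(demo())
```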

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.execute_write() and .execute_write_fn() methods on Database 570101428  
590598248 https://github.com/simonw/datasette/pull/683#issuecomment-590598248 https://api.github.com/repos/simonw/datasette/issues/683 MDEyOklzc3VlQ29tbWVudDU5MDU5ODI0OA== simonw 9599 2020-02-24T23:18:50Z 2020-02-24T23:18:50Z OWNER

I'm not convinced by the return value of the .execute_write_fn() method:

https://github.com/simonw/datasette/blob/ab2348280206bde1390b931ae89d372c2f74b87e/datasette/database.py#L79-L83

Do I really need that WriteResponse class or can I do something nicer?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.execute_write() and .execute_write_fn() methods on Database 570101428  
590593247 https://github.com/simonw/datasette/issues/675#issuecomment-590593247 https://api.github.com/repos/simonw/datasette/issues/675 MDEyOklzc3VlQ29tbWVudDU5MDU5MzI0Nw== aviflax 141844 2020-02-24T23:02:52Z 2020-02-24T23:02:52Z NONE

> Design looks great to me.

Excellent, thanks!

> I'm not keen on two letter short versions (`-cp`) - I'd rather either have a single character or no short form at all.

Hmm, well, anyone running `datasette package` is probably at least somewhat familiar with UNIX CLIs… so how about `--cp` as a middle ground?

```shell
$ datasette package --cp /the/source/path /the/target/path data.db
```

I think I like it. Easy to remember!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--cp option for datasette publish and datasette package for shipping additional files and directories 567902704  
590593120 https://github.com/simonw/datasette/pull/683#issuecomment-590593120 https://api.github.com/repos/simonw/datasette/issues/683 MDEyOklzc3VlQ29tbWVudDU5MDU5MzEyMA== simonw 9599 2020-02-24T23:02:30Z 2020-02-24T23:02:30Z OWNER

I'm going to muck around with a couple more demo plugins - in particular one derived from datasette-upload-csvs - to make sure I'm comfortable with this API - then add a couple of tests and merge it with documentation that warns "this is still an experimental feature and may change".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.execute_write() and .execute_write_fn() methods on Database 570101428  
590592581 https://github.com/simonw/datasette/pull/683#issuecomment-590592581 https://api.github.com/repos/simonw/datasette/issues/683 MDEyOklzc3VlQ29tbWVudDU5MDU5MjU4MQ== simonw 9599 2020-02-24T23:00:44Z 2020-02-24T23:01:09Z OWNER

I've been testing this out by running one-off demo plugins. I saved the following in a file called `write-plugins/log_asgi.py` (it's a hacked around copy of asgi-log-to-sqlite) and then ran `datasette data.db --plugins-dir=write-plugins/`:

```python
from datasette import hookimpl
import sqlite_utils
import time


class AsgiLogToSqliteViaWriteQueue:
    lookup_columns = (
        "path",
        "user_agent",
        "referer",
        "accept_language",
        "content_type",
        "query_string",
    )

    def __init__(self, app, db):
        self.app = app
        self.db = db
        self._tables_ensured = False

    async def ensure_tables(self):
        def _ensure_tables(conn):
            db = sqlite_utils.Database(conn)
            for column in self.lookup_columns:
                table = "{}s".format(column)
                if not db[table].exists():
                    db[table].create({"id": int, "name": str}, pk="id")
            if "requests" not in db.table_names():
                db["requests"].create(
                    {
                        "start": float,
                        "method": str,
                        "path": int,
                        "query_string": int,
                        "user_agent": int,
                        "referer": int,
                        "accept_language": int,
                        "http_status": int,
                        "content_type": int,
                        "client_ip": str,
                        "duration": float,
                        "body_size": int,
                    },
                    foreign_keys=self.lookup_columns,
                )

        await self.db.execute_write_fn(_ensure_tables)

    async def __call__(self, scope, receive, send):
        if not self._tables_ensured:
            self._tables_ensured = True
            await self.ensure_tables()

        response_headers = []
        body_size = 0
        http_status = None

        async def wrapped_send(message):
            nonlocal body_size, response_headers, http_status
            if message["type"] == "http.response.start":
                response_headers = message["headers"]
                http_status = message["status"]

            if message["type"] == "http.response.body":
                body_size += len(message["body"])

            await send(message)

        start = time.time()
        await self.app(scope, receive, wrapped_send)
        end = time.time()

        path = str(scope["path"])
        query_string = None
        if scope.get("query_string"):
            query_string = "?{}".format(scope["query_string"].decode("utf8"))

        request_headers = dict(scope.get("headers") or [])

        referer = header(request_headers, "referer")
        user_agent = header(request_headers, "user-agent")
        accept_language = header(request_headers, "accept-language")

        content_type = header(dict(response_headers), "content-type")

        def _log_to_database(conn):
            db = sqlite_utils.Database(conn)
            db["requests"].insert(
                {
                    "start": start,
                    "method": scope["method"],
                    "path": lookup(db, "paths", path),
                    "query_string": lookup(db, "query_strings", query_string),
                    "user_agent": lookup(db, "user_agents", user_agent),
                    "referer": lookup(db, "referers", referer),
                    "accept_language": lookup(db, "accept_languages", accept_language),
                    "http_status": http_status,
                    "content_type": lookup(db, "content_types", content_type),
                    "client_ip": scope.get("client", (None, None))[0],
                    "duration": end - start,
                    "body_size": body_size,
                },
                alter=True,
                foreign_keys=self.lookup_columns,
            )

        await self.db.execute_write_fn(_log_to_database)


def header(d, name):
    return d.get(name.encode("utf8"), b"").decode("utf8") or None


def lookup(db, table, value):
    return db[table].lookup({"name": value}) if value else None


@hookimpl
def asgi_wrapper(datasette):
    def wrap_with_class(app):
        return AsgiLogToSqliteViaWriteQueue(
            app, next(iter(datasette.databases.values()))
        )

    return wrap_with_class
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.execute_write() and .execute_write_fn() methods on Database 570101428  
590543398 https://github.com/simonw/datasette/issues/681#issuecomment-590543398 https://api.github.com/repos/simonw/datasette/issues/681 MDEyOklzc3VlQ29tbWVudDU5MDU0MzM5OA== clausjuhl 2181410 2020-02-24T20:53:56Z 2020-02-24T20:53:56Z NONE

Excellent. I'll implement the simple plugin-solution now. And will have a go at a more mature plugin later. Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Cashe-header missing in http-response 569317377  
590539805 https://github.com/simonw/datasette/issues/675#issuecomment-590539805 https://api.github.com/repos/simonw/datasette/issues/675 MDEyOklzc3VlQ29tbWVudDU5MDUzOTgwNQ== simonw 9599 2020-02-24T20:44:59Z 2020-02-24T20:45:08Z OWNER

Design looks great to me.

I'm not keen on two letter short versions (-cp) - I'd rather either have a single character or no short form at all.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--cp option for datasette publish and datasette package for shipping additional files and directories 567902704  
590518182 https://github.com/simonw/datasette/pull/683#issuecomment-590518182 https://api.github.com/repos/simonw/datasette/issues/683 MDEyOklzc3VlQ29tbWVudDU5MDUxODE4Mg== simonw 9599 2020-02-24T19:53:12Z 2020-02-24T19:53:12Z OWNER

Next steps are from comment https://github.com/simonw/datasette/issues/682#issuecomment-590517338:

> I'm going to move ahead without needing that ability though. I figure SQLite writes are fast, and plugins can be trusted to implement just fast writes. So I'm going to support either fire-and-forget writes (they get added to the queue and a task ID is returned) or have the option to block awaiting the completion of the write (using Janus) but let callers decide which version they want. I may add optional timeouts some time in the future.
>
> I am going to make both execute_write() and execute_write_fn() awaitable functions though, for consistency with .execute() and to give me flexibility to change how they work in the future.
>
> I'll also add a block=True option to both of them which causes the function to wait for the write to be successfully executed - defaults to False (fire-and-forget mode).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
.execute_write() and .execute_write_fn() methods on Database 570101428  
590517744 https://github.com/simonw/datasette/issues/682#issuecomment-590517744 https://api.github.com/repos/simonw/datasette/issues/682 MDEyOklzc3VlQ29tbWVudDU5MDUxNzc0NA== simonw 9599 2020-02-24T19:52:16Z 2020-02-24T19:52:16Z OWNER

Moving further development to a pull request: https://github.com/simonw/datasette/pull/683

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for writing to database via a queue 569613563  
590517338 https://github.com/simonw/datasette/issues/682#issuecomment-590517338 https://api.github.com/repos/simonw/datasette/issues/682 MDEyOklzc3VlQ29tbWVudDU5MDUxNzMzOA== simonw 9599 2020-02-24T19:51:21Z 2020-02-24T19:51:21Z OWNER

I filed a question / feature request with Janus about supporting timeouts for .get() against async queues here: https://github.com/aio-libs/janus/issues/240

I'm going to move ahead without needing that ability though. I figure SQLite writes are fast, and plugins can be trusted to implement just fast writes. So I'm going to support either fire-and-forget writes (they get added to the queue and a task ID is returned) or have the option to block awaiting the completion of the write (using Janus) but let callers decide which version they want. I may add optional timeouts some time in the future.

I am going to make both execute_write() and execute_write_fn() awaitable functions though, for consistency with .execute() and to give me flexibility to change how they work in the future.

I'll also add a block=True option to both of them which causes the function to wait for the write to be successfully executed - defaults to False (fire-and-forget mode).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for writing to database via a queue 569613563  
590511601 https://github.com/simonw/datasette/issues/682#issuecomment-590511601 https://api.github.com/repos/simonw/datasette/issues/682 MDEyOklzc3VlQ29tbWVudDU5MDUxMTYwMQ== simonw 9599 2020-02-24T19:38:27Z 2020-02-24T19:38:27Z OWNER

I tested this using the following code in a view (after `from sqlite_utils import Database`):

```python
db = next(iter(self.ds.databases.values()))
db.execute_write_fn(
    lambda conn: Database(conn)["counter"].insert(
        {"id": 1, "count": 0}, pk="id", ignore=True
    )
)
db.execute_write("update counter set count = count + 1 where id = 1")
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for writing to database via a queue 569613563  
590436368 https://github.com/simonw/datasette/issues/682#issuecomment-590436368 https://api.github.com/repos/simonw/datasette/issues/682 MDEyOklzc3VlQ29tbWVudDU5MDQzNjM2OA== simonw 9599 2020-02-24T17:00:21Z 2020-02-24T17:00:21Z OWNER

Interesting challenge: I would like to be able to "await" on queue.get() (with a timeout).

Problem is: queue.Queue() is designed for threading and cannot be awaited. asyncio.Queue can be awaited but is not meant to be used with threads.

https://stackoverflow.com/a/32894169 suggests using Janus, a thread-aware asyncio queue: https://github.com/aio-libs/janus
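
The gap janus fills can be illustrated with the stdlib alone: a thread-safe `queue.Queue` cannot be awaited directly, but wrapping its blocking `.get()` in `run_in_executor` bridges the two worlds (janus itself instead exposes paired `sync_q`/`async_q` views of one queue, which avoids burning an executor thread per wait):

```python
import asyncio
import queue
import threading


def demo():
    q = queue.Queue()

    def writer_thread():
        # Runs in a plain thread, using the synchronous queue API
        q.put("write result")

    async def waiter():
        threading.Thread(target=writer_thread).start()
        loop = asyncio.get_running_loop()
        # Await the blocking get without stalling the event loop
        return await loop.run_in_executor(None, q.get)

    return asyncio.run(waiter())
```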

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for writing to database via a queue 569613563  
590430988 https://github.com/simonw/datasette/issues/682#issuecomment-590430988 https://api.github.com/repos/simonw/datasette/issues/682 MDEyOklzc3VlQ29tbWVudDU5MDQzMDk4OA== simonw 9599 2020-02-24T16:50:48Z 2020-02-24T16:50:48Z OWNER

I'm dropping the progress bar idea. This mechanism is supposed to guarantee exclusive access to the single write connection, which means it should be targeted by operations that are as short as possible. An operation running long enough to need a progress bar is too long!

Any implementation of progress bars for long running write operations needs to happen elsewhere in the stack.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for writing to database via a queue 569613563  
590417619 https://github.com/simonw/datasette/issues/682#issuecomment-590417619 https://api.github.com/repos/simonw/datasette/issues/682 MDEyOklzc3VlQ29tbWVudDU5MDQxNzYxOQ== simonw 9599 2020-02-24T16:27:36Z 2020-02-24T16:27:36Z OWNER

Error handling could be tricky. Exceptions thrown in threads don't show up anywhere by default - I would need to explicitly catch them and decide what to do with them.
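
A minimal illustration of the problem and the obvious fix: catch inside the thread and ship the exception back over a reply queue so the caller can inspect or re-raise it (names here are mine, not from the PR):

```python
import queue
import threading


def demo():
    reply_queue = queue.Queue()

    def write_task(conn):
        # A write that goes wrong inside the writer thread
        raise ValueError("constraint failed")

    def writer():
        # Without this try/except the exception would vanish silently
        try:
            result = write_task(None)
            reply_queue.put(("ok", result))
        except Exception as e:
            reply_queue.put(("error", e))

    t = threading.Thread(target=writer)
    t.start()
    t.join()
    return reply_queue.get()


status, payload = demo()
```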

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for writing to database via a queue 569613563  
590417366 https://github.com/simonw/datasette/issues/682#issuecomment-590417366 https://api.github.com/repos/simonw/datasette/issues/682 MDEyOklzc3VlQ29tbWVudDU5MDQxNzM2Ng== simonw 9599 2020-02-24T16:27:10Z 2020-02-24T16:27:10Z OWNER

I wonder if I even need the reply_queue mechanism? Are the replies from writes generally even interesting?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for writing to database via a queue 569613563  
590405736 https://github.com/simonw/datasette/issues/675#issuecomment-590405736 https://api.github.com/repos/simonw/datasette/issues/675 MDEyOklzc3VlQ29tbWVudDU5MDQwNTczNg== aviflax 141844 2020-02-24T16:06:27Z 2020-02-24T16:06:27Z NONE

> So yeah - if you're happy to design this I think it would be worth us adding.

Great! I'll give it a go.

> Small design suggestion: allow --copy to be applied multiple times…

Makes a ton of sense, will do.

> Also since Click arguments can take multiple options I don't think you need to have the : in there - although if it better matches Docker's own UI it might be more consistent to have it.

Great point. I double checked the docs for `docker cp` and in that context the colon is used to delimit a container and a path, while spaces are used to separate the source and target.

The usage string is:

```
docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH|-
docker cp [OPTIONS] SRC_PATH|- CONTAINER:DEST_PATH
```

so in fact it'll be more consistent to use a space to delimit the source and destination paths, like so:

```shell
$ datasette package --copy /the/source/path /the/target/path data.db
```

and I suppose the short-form version of the option should be `cp`, like so:

```shell
$ datasette package -cp /the/source/path /the/target/path data.db
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
--cp option for datasette publish and datasette package for shipping additional files and directories 567902704  
590399600 https://github.com/simonw/datasette/issues/682#issuecomment-590399600 https://api.github.com/repos/simonw/datasette/issues/682 MDEyOklzc3VlQ29tbWVudDU5MDM5OTYwMA== simonw 9599 2020-02-24T15:56:10Z 2020-02-24T15:56:23Z OWNER

Implementation plan:

- Method on the `Database` class called `execute_write(sql)`
- Which calls `.execute_write_fn(fn)` - so you can instead create a function that applies a whole batch of writes and pass that in if you need to
- Throws an error if the database isn't mutable
- Add a `._writer_thread` thread property to `Database` - we start that thread the first time we need it. It blocks on `._writer_queue.get()`
- We write to that queue with `WriteTask(fn, uuid, reply_queue)` namedtuples - then block awaiting the reply with a 0.5s timeout
- Have a `.write_status(uuid)` method that checks if `uuid` has completed

This should be enough to get it all working. The MVP can skip the 0.5s timeout entirely.

But... what about that progress bar stretch goal?

For that, let's have each write operation that's currently in progress expose `total` and `done` integer properties. So I guess we can add those to the `WriteTask`.

Should we have the ability to see what the currently executing write is? Seems useful.

Hopefully I can integrate https://github.com/tqdm/tqdm such that it calculates ETAs without actually trying to print to the console.
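
A rough sketch of the plumbing this plan describes - the names follow the comment (`WriteTask`, `_writer_queue`) but the details are my guesses for illustration, not the actual PR code:

```python
import collections
import queue
import threading
import uuid

WriteTask = collections.namedtuple("WriteTask", ("fn", "uuid", "reply_queue"))

_writer_queue = queue.Queue()


def writer_thread(conn):
    # One thread per database, with exclusive access to the write connection
    while True:
        task = _writer_queue.get()
        if task is None:  # shutdown sentinel
            break
        result = task.fn(conn)
        if task.reply_queue is not None:
            task.reply_queue.put(result)


def execute_write_fn(fn, block=False):
    reply_queue = queue.Queue() if block else None
    task = WriteTask(fn, str(uuid.uuid4()), reply_queue)
    _writer_queue.put(task)
    if block:
        return reply_queue.get()  # the plan adds a 0.5s timeout here
    return task.uuid  # fire-and-forget: hand back an ID to poll with


# Demo with a fake "connection" that is just an integer:
threading.Thread(target=writer_thread, args=(10,), daemon=True).start()
blocking_result = execute_write_fn(lambda conn: conn + 1, block=True)
task_id = execute_write_fn(lambda conn: None)
```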

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for writing to database via a queue 569613563  
590209074 https://github.com/simonw/datasette/issues/676#issuecomment-590209074 https://api.github.com/repos/simonw/datasette/issues/676 MDEyOklzc3VlQ29tbWVudDU5MDIwOTA3NA== tunguyenatwork 58088336 2020-02-24T08:20:15Z 2020-02-24T08:20:15Z NONE

Awesome, thank you so much. I’ll try it out and let you know.

On Sun, Feb 23, 2020 at 1:44 PM Simon Willison notifications@github.com wrote:

> You can try this right now like so:
>
>     pip install https://github.com/simonw/datasette/archive/search-raw.zip
>
> Then use the following:
>
>     ?_search=foo*&_searchmode=raw


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
?_searchmode=raw option for running FTS searches without escaping characters 568091133  
590154309 https://github.com/simonw/datasette/issues/682#issuecomment-590154309 https://api.github.com/repos/simonw/datasette/issues/682 MDEyOklzc3VlQ29tbWVudDU5MDE1NDMwOQ== simonw 9599 2020-02-24T03:14:10Z 2020-02-24T03:14:10Z OWNER

Some prior art: Charles Leifer implemented a SqliteQueueDatabase class that automatically queues writes for you: https://charlesleifer.com/blog/multi-threaded-sqlite-without-the-operationalerrors/

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for writing to database via a queue 569613563  
590153892 https://github.com/simonw/datasette/issues/682#issuecomment-590153892 https://api.github.com/repos/simonw/datasette/issues/682 MDEyOklzc3VlQ29tbWVudDU5MDE1Mzg5Mg== simonw 9599 2020-02-24T03:10:45Z 2020-02-24T03:13:03Z OWNER

Some more detailed notes I made earlier:

Datasette would run a single write thread per database. That thread gets an exclusive connection, and a queue. Plugins can add functions to the queue which will be called and given access to that connection.

The write thread for that database is created the first time a write is attempted.

Question: should that thread have its own asyncio loop so that async techniques like httpx can be used within the thread? I think not at first - only investigate this if it turns out to be necessary in the future.

This thread will run as part of the Datasette process. This means there is always a risk that the thread will die in the middle of something because the server got restarted - so use transactions to limit the risk of damage to the database should that happen.

I don’t want web responses blocking waiting for stuff to happen here - so every task put on that queue will have a task ID, and that ID will be returned such that client code can poll for its completion.

Could the request block for up to 0.5s just in case the write is really fast, then return a polling token if it isn't finished yet? Looks possible - Queue.get can block with a timeout.
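
That pattern - block briefly in case the write finishes quickly, otherwise fall back to a polling token - can be sketched like so (a toy stand-in using a plain thread rather than the real write queue; `submit_and_maybe_wait` is my name for it):

```python
import queue
import threading
import time
import uuid


def submit_and_maybe_wait(fn, timeout=0.5):
    # Block up to `timeout` seconds; if the write hasn't finished by
    # then, return a token the caller can poll with later.
    reply_queue = queue.Queue()
    task_id = str(uuid.uuid4())

    def run():
        reply_queue.put(fn())

    threading.Thread(target=run, daemon=True).start()
    try:
        return ("done", reply_queue.get(timeout=timeout))
    except queue.Empty:
        return ("pending", task_id)


fast = submit_and_maybe_wait(lambda: 42)
slow = submit_and_maybe_wait(lambda: time.sleep(0.5) or 99, timeout=0.1)
```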

There will be a /-/writes page which shows currently queued writes - so each one needs a human-readable description of some sort. (You can access a deque called q.queue to see what’s in there)

Stretch goal: It would be cool if write operations could optionally handle their own progress reports. That way I can do some really nice UI around what’s going on with these things.

This mechanism has a ton of potential. It may even be how we handle things like Twitter imports and suchlike - queued writing tasks.

One catch with this approach: if a plugin is reading from APIs etc it shouldn't block writes to the database while it is doing so. So sticking a function in the queue that does additional time consuming stuff is actually an anti pattern. Instead, plugins should schedule their API access in the main event loop and occasionally write just the updates they need to make to that write queue.
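
A sketch of the do/don't being described: the slow I/O stays on the event loop, and only the cheap insert goes on the write queue (here the queue is just a list and the "connection" another list, standing in for the real mechanism):

```python
import asyncio


async def sync_tweets(queue_write):
    # The slow part - a network fetch - runs on the event loop, so other
    # queued writes are not blocked while it is in flight.
    async def fake_fetch_tweets():
        await asyncio.sleep(0.01)  # stands in for a slow HTTP call
        return [{"id": 1, "text": "hello"}]

    tweets = await fake_fetch_tweets()
    # The fast part - just the insert - is all that goes on the write queue.
    queue_write(lambda conn: conn.extend(tweets))


def demo():
    fake_db = []   # stands in for the write connection
    pending = []   # stands in for the write queue
    asyncio.run(sync_tweets(pending.append))
    for fn in pending:  # the writer thread draining the queue
        fn(fake_db)
    return fake_db
```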

Implementation notes

Maybe each item in the queue is a (callable, uuid, reply_queue) triple. You can do a blocking .get() on the reply_queue if you want to wait for the answer. The execution framework could look for the return value from callable() and automatically send it to reply_queue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Mechanism for writing to database via a queue 569613563  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 927.096ms · About: github-to-sqlite