11 rows where user = 79913 sorted by updated_at descending

View and edit SQL

Suggested facets: issue_url, reactions, created_at (date), updated_at (date)

author_association

user

  • tsibley · 11
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
795950636 https://github.com/simonw/datasette/issues/838#issuecomment-795950636 https://api.github.com/repos/simonw/datasette/issues/838 MDEyOklzc3VlQ29tbWVudDc5NTk1MDYzNg== tsibley 79913 2021-03-10T19:24:13Z 2021-03-10T19:24:13Z NONE

I think this could be solved by one of:

  1. Stop generating absolute URLs, e.g. ones that include an origin. Relative URLs with absolute paths are fine, as long as they take base_url into account (as they do now, yay!).
  2. Extend base_url to include the expected frontend origin, and then use that information when generating absolute URLs.
  3. Document which HTTP headers the reverse proxy should set (e.g. the X-Forwarded-* family of conventional headers) to pass the frontend origin information to Datasette, and then use that information when generating absolute URLs.

Option 1 seems like the easiest to me, if you can get away with never having to generate an absolute URL.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Incorrect URLs when served behind a proxy with base_url set 637395097  
795939998 https://github.com/simonw/datasette/issues/838#issuecomment-795939998 https://api.github.com/repos/simonw/datasette/issues/838 MDEyOklzc3VlQ29tbWVudDc5NTkzOTk5OA== tsibley 79913 2021-03-10T19:16:55Z 2021-03-10T19:16:55Z NONE

Nod. The problem with the tests is that they're ignoring the origin (hostname, port) of links. In a reverse proxy situation, the frontend request origin is different than the backend request origin. The problem is Datasette generates links with the backend request origin.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Incorrect URLs when served behind a proxy with base_url set 637395097  
795893813 https://github.com/simonw/datasette/issues/838#issuecomment-795893813 https://api.github.com/repos/simonw/datasette/issues/838 MDEyOklzc3VlQ29tbWVudDc5NTg5MzgxMw== tsibley 79913 2021-03-10T18:43:39Z 2021-03-10T18:43:39Z NONE

@simonw Unfortunately this issue as I reported it is not actually solved in version 0.55.

Every link which is returned by the Datasette.absolute_url method is still wrong, because it uses the request URL as the base. This still includes the suggested facet links and pagination links.

What I wrote originally still stands:

Although many of the URLs in the pages are correct (presumably because they either use absolute paths which include base_url or relative paths), the faceting and pagination links still use fully-qualified URLs pointing at http://localhost:8001.

I looked into this a little in the source code, and it seems to be an issue anywhere request.url or request.path is used, as these contain the values for the request between the frontend (Apache) and backend (Datasette) server. Those properties are primarily used via the path_with_… family of utility functions and the Datasette.absolute_url method.

Would you prefer to re-open this issue or have me create a new one?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Incorrect URLs when served behind a proxy with base_url set 637395097  
790857004 https://github.com/simonw/datasette/issues/1238#issuecomment-790857004 https://api.github.com/repos/simonw/datasette/issues/1238 MDEyOklzc3VlQ29tbWVudDc5MDg1NzAwNA== tsibley 79913 2021-03-04T19:06:55Z 2021-03-04T19:06:55Z NONE

@rgieseke Ah, that's super helpful. Thank you for the workaround for now!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Custom pages don't work with base_url setting 813899472  
655898722 https://github.com/simonw/sqlite-utils/issues/121#issuecomment-655898722 https://api.github.com/repos/simonw/sqlite-utils/issues/121 MDEyOklzc3VlQ29tbWVudDY1NTg5ODcyMg== tsibley 79913 2020-07-09T04:53:08Z 2020-07-09T04:53:08Z CONTRIBUTOR

Yep, I agree that makes more sense for backwards compat and more casual use cases. I think it should be possible for the Database/Queryable methods to DTRT based on seeing if it's within a context-manager-managed transaction.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Improved (and better documented) support for transactions 652961907  
655652679 https://github.com/simonw/sqlite-utils/issues/121#issuecomment-655652679 https://api.github.com/repos/simonw/sqlite-utils/issues/121 MDEyOklzc3VlQ29tbWVudDY1NTY1MjY3OQ== tsibley 79913 2020-07-08T17:24:46Z 2020-07-08T17:24:46Z CONTRIBUTOR

Better transaction handling would be really great. Some of my thoughts on implementing better transaction discipline are in https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655239728.

My preferences:

  • Each CLI command should operate in a single transaction so that either the whole thing succeeds or the whole thing is rolled back. This avoids partially completed operations when an error occurs part way through processing. Partially completed operations are typically much harder to recovery from gracefully and may cause inconsistent data states.

  • The Python API should be transaction-agnostic and rely on the caller to coordinate transactions. Only the caller knows how individual insert, create, update, etc operations/methods should be bundled conceptually into transactions. When the caller is the CLI, for example, that bundling would be at the CLI command-level. Other callers might want to break up operations into multiple transactions. Transactions are usually most useful when controlled at the application-level (like logging configuration) instead of the library level. The library needs to provide an API that's conducive to transaction use, though.

  • The Python API should provide a context manager to provide consistent transactions handling with more useful defaults than Python's sqlite3 module. The latter issues implicit BEGIN statements by default for most DML (INSERT, UPDATE, DELETE, … but not SELECT, I believe), but not DDL (CREATE TABLE, DROP TABLE, CREATE VIEW, …). Notably, the sqlite3 module doesn't issue the implicit BEGIN until the first DML statement. It does not issue it when entering the with conn block, like other DBAPI2-compatible modules do. The with conn block for sqlite3 only arranges to commit or rollback an existing transaction when exiting. Including DDL and SELECTs in transactions is important for operation consistency, though. There are several existing bugs.python.org tickets about this and future changes are in the works, but sqlite-utils can provide its own API sooner. sqlite-utils's Database class could itself be a context manager (built on the sqlite3 connection context manager) which additionally issues an explicit BEGIN when entering. This would then let Python API callers do something like:

db = sqlite_utils.Database(path)

with db: # ← BEGIN issued here by Database.__enter__
    db.insert(…)
    db.create_view(…)
# ← COMMIT/ROLLBACK issue here by sqlite3.connection.__exit__
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Improved (and better documented) support for transactions 652961907  
655643078 https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655643078 https://api.github.com/repos/simonw/sqlite-utils/issues/118 MDEyOklzc3VlQ29tbWVudDY1NTY0MzA3OA== tsibley 79913 2020-07-08T17:05:59Z 2020-07-08T17:05:59Z CONTRIBUTOR

The only thing missing from this PR is updates to the documentation.

Ah, yes, thanks for this reminder! I've repushed with doc bits added.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add insert --truncate option 651844316  
655239728 https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655239728 https://api.github.com/repos/simonw/sqlite-utils/issues/118 MDEyOklzc3VlQ29tbWVudDY1NTIzOTcyOA== tsibley 79913 2020-07-08T02:16:42Z 2020-07-08T02:16:42Z CONTRIBUTOR

I fixed my original oops by moving the DELETE FROM $table out of the chunking loop and repushed. I think this change can be considered in isolation from issues around transactions, which I discuss next.

I wanted to make the DELETE + INSERT happen all in the same transaction so it was robust, but that was more complicated than I expected. The transaction handling in the Database/Table classes isn't systematic, and this poses big hurdles to making Table.insert_all (or other operations) consistent and robust in the face of errors.

For example, I wanted to do this (whitespace ignored in diff, so indentation change not highlighted):

diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index d6b9ecf..4107ceb 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -1028,6 +1028,11 @@ class Table(Queryable):
         batch_size = max(1, min(batch_size, SQLITE_MAX_VARS // num_columns))
         self.last_rowid = None
         self.last_pk = None
+        with self.db.conn:
+            # Explicit BEGIN is necessary because Python's sqlite3 doesn't
+            # issue implicit BEGINs for DDL, only DML.  We mix DDL and DML
+            # below and might execute DDL first, e.g. for table creation.
+            self.db.conn.execute("BEGIN")
             if truncate and self.exists():
                 self.db.conn.execute("DELETE FROM [{}];".format(self.name))
             for chunk in chunks(itertools.chain([first_record], records), batch_size):
@@ -1038,7 +1043,11 @@ class Table(Queryable):
                         # Use the first batch to derive the table names
                         column_types = suggest_column_types(chunk)
                         column_types.update(columns or {})
-                    self.create(
+                        # Not self.create() because that is wrapped in its own
+                        # transaction and Python's sqlite3 doesn't support
+                        # nested transactions.
+                        self.db.create_table(
+                            self.name,
                             column_types,
                             pk,
                             foreign_keys,
@@ -1139,7 +1148,6 @@ class Table(Queryable):
                     flat_values = list(itertools.chain(*values))
                     queries_and_params = [(sql, flat_values)]

-            with self.db.conn:
                 for query, params in queries_and_params:
                     try:
                         result = self.db.conn.execute(query, params)

but that fails in tests because other methods call insert/upsert/insert_all/upsert_all in the middle of their transactions, so the BEGIN statement throws an error (no nested transactions allowed).

Stepping back, it would be nice to make the transaction handling systematic and predictable. One way to do this is to make the sqlite_utils/db.py code generally not begin or commit any transactions, and require the caller to do that instead. This lets the caller mix and match the Python API calls into transactions as appropriate (which is impossible for the API methods themselves to fully determine). Then, make sqlite_utils/cli.py begin and commit a transaction in each @cli.command function, making each command robust and consistent in the face of errors. The big change here, and why I didn't just submit a patch, is that it dramatically changes the Python API to require callers to begin a transaction rather than just immediately calling methods.

There is also the caveat that for each transaction, an explicit BEGIN is also necessary so that DDL as well as DML (as well as SELECTs) are consistent and rolled back on error. There are several bugs.python.org discussions around this particular problem of DDL and some plans to make it better and consistent with DBAPI2, eventually. In the meantime, the sqlite-utils Database class could be a context manager which supports the incantations necessary to do proper transactions. This would still be a Python API change for callers but wouldn't expose them to the weirdness of the sqlite3's default transaction handling.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add insert --truncate option 651844316  
655052451 https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655052451 https://api.github.com/repos/simonw/sqlite-utils/issues/118 MDEyOklzc3VlQ29tbWVudDY1NTA1MjQ1MQ== tsibley 79913 2020-07-07T18:45:23Z 2020-07-07T18:45:23Z CONTRIBUTOR

Ah, I see the problem. The truncate is inside a loop I didn't realize was there.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add insert --truncate option 651844316  
655018966 https://github.com/simonw/sqlite-utils/pull/118#issuecomment-655018966 https://api.github.com/repos/simonw/sqlite-utils/issues/118 MDEyOklzc3VlQ29tbWVudDY1NTAxODk2Ng== tsibley 79913 2020-07-07T17:41:06Z 2020-07-07T17:41:06Z CONTRIBUTOR

Hmm, while tests pass, this may not work as intended on larger datasets. Looking into it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Add insert --truncate option 651844316  
643083451 https://github.com/simonw/datasette/issues/838#issuecomment-643083451 https://api.github.com/repos/simonw/datasette/issues/838 MDEyOklzc3VlQ29tbWVudDY0MzA4MzQ1MQ== tsibley 79913 2020-06-12T06:04:14Z 2020-06-12T06:04:14Z NONE

Hmm, I haven't tried removing ProxyPassReverse, but it doesn't touch the HTML, which is the issue I'm seeing. You can read the documentation here. ProxyPassReverse is a standard directive when proxying with Apache. I've used it dozens of times with other applications.

Looking a little more at the code, I think the issue here is that the behaviour of base_url makes sense when Datasette is mounted at a path within a larger application, but not when HTTP requests are being proxied to it.

In a mount situation, it is perfectly fine to construct URLs reusing the domain and path from the request. In a proxy situation, it never is, as the domain and path in the request are not the domain and path that the non-proxy client actually needs to use. That is, links which include the Apache → Datasette request origin, localhost:8001, instead of the browser → Apache request origin, example.com, will be broken.

The tests you pointed to also reflect this in two ways:

  1. They strip a leading http://localhost, allowing such URLs in the facet links to pass, but inclusion of that in a proxy situation would mean the URL is broken.

  2. The test client emits direct ASGI events instead of actual proxied HTTP requests. The headers of these ASGI events don't reflect the way an HTTP proxy works; instead they pass through the original request path which contains base_url. This works because Datasette responds to requests equivalently at either /… or /{base_url}/…, which makes some sense in a mount situation but is unconventional (albeit workable) for a proxied app.

Apps that support being proxied automatically support being mounted, but apps that only support being mounted don't automatically support being proxied.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Incorrect URLs when served behind a proxy with base_url set 637395097  

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);