1,691 rows sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association pull_request body repo type active_lock_reason performed_via_github_app
617323873 MDU6SXNzdWU2MTczMjM4NzM= 766 Enable wildcard-searches by default clausjuhl 2181410 open 0     2 2020-05-13T10:14:48Z 2021-03-05T16:35:21Z   NONE  

Hi Simon.

It seems that datasette currently has wildcard-searches disabled by default (along with the boolean search-options, NEAR-queries and more, and despite the docs). If I try out the search-url provided in the docs (https://fara.datasettes.com/fara/FARA_All_ShortForms?_search=manafort), it does not handle wildcard-searches, and I'm unable to make it work on my datasette-instance.

I would argue that wildcard searches are such a standard query that they should be enabled by default. Requiring "_searchmode=raw" when using prefix searches seems unnecessary. Plus: what happens to non-ASCII searches when using "_searchmode=raw"? Is the "escape_fts" function from datasette.utils ignored?
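
For example, a prefix search that should match "manafort" looks something like this once raw mode is enabled:

https://fara.datasettes.com/fara/FARA_All_ShortForms?_search=mana*&_searchmode=raw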

Thanks!

/Claus

datasette 107914493 issue    
813880401 MDExOlB1bGxSZXF1ZXN0NTc3OTUzNzI3 5 WIP: Add Gmail takeout mbox import UtahDave 306240 open 0     19 2021-02-22T21:30:40Z 2021-03-05T16:28:07Z   FIRST_TIME_CONTRIBUTOR dogsheep/google-takeout-to-sqlite/pulls/5

WIP

This PR adds the ability to import emails from a Gmail mbox export from Google Takeout.

This is my first PR to a datasette/dogsheep repo. I've tested this on my personal Google Takeout mbox with ~520,000 emails going back to 2004. This took around 20 minutes to process.

To provide some feedback on the progress of the import I added the "rich" python module. I'm happy to remove that if adding a dependency is discouraged. However, I think it makes a nice addition to give feedback on the progress of a long import.

Do we want to log emails that have errors when trying to import them?

Dealing with encodings with emails is a bit tricky. I'm very open to feedback on how to deal with those better. As well as any other feedback for improvements.

google-takeout-to-sqlite 206649770 pull    
823035080 MDU6SXNzdWU4MjMwMzUwODA= 1248 duckdb database (very low performance in SQLite) verajosemanuel 15836677 open 0     0 2021-03-05T12:20:29Z 2021-03-05T12:20:29Z   NONE  

My SQLite database is getting too big to be processed by datasette (more than 10 minutes waiting to load), so I am working with DuckDB and it is way faster. I think it's the fastest embeddable database, actually.

https://duckdb.org/

Given that DuckDB is a SQLite-style embeddable database, it would be GREAT to use it with datasette.

Is that possible?

Regards and thanks for a superb job

datasette 107914493 issue    
803333769 MDU6SXNzdWU4MDMzMzM3Njk= 32 KeyError: 'Contents' on running upload robmarkcole 11855322 open 0     1 2021-02-08T08:36:37Z 2021-03-05T00:31:28Z   NONE  

Following the readme, on big sur, and having entered my auth creds via dogsheep-photos s3-auth:

(venv) (base) Robins-MacBook:datasette robin$ dogsheep-photos upload photos.db     ~/Pictures/Photos\ /Users/robin/Pictures/Library.photoslibrary --dry-run

Fetching existing keys from S3...
Traceback (most recent call last):
  File "/Users/robin/datasette/venv/bin/dogsheep-photos", line 8, in <module>
    sys.exit(cli())
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/dogsheep_photos/cli.py", line 96, in upload
    key.split(".")[0] for key in get_all_keys(client, creds["photos_s3_bucket"])
  File "/Users/robin/datasette/venv/lib/python3.8/site-packages/dogsheep_photos/utils.py", line 46, in get_all_keys
    for row in page["Contents"]:
KeyError: 'Contents'

Possibly since the bucket is in EU (London) eu-west-2 and this info is not included in the request?
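
For what it's worth, a minimal sketch of a defensive fix, assuming the boto3 list_objects_v2 paginator that the traceback points at (the bucket name below is just an example):

import boto3

def get_all_keys(client, bucket):
    # Pages for an empty bucket/prefix carry no "Contents" key at all,
    # so fall back to an empty list instead of raising KeyError.
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket):
        for row in page.get("Contents", []):
            keys.append(row["Key"])
    return keys

client = boto3.client("s3")  # region_name="eu-west-2" might also matter here
print(len(get_all_keys(client, "my-photos-bucket")))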

dogsheep-photos 256834907 issue    
778380836 MDU6SXNzdWU3NzgzODA4MzY= 4 Feature Request: Gmail Btibert3 203343 open 0     5 2021-01-04T21:31:09Z 2021-03-04T20:54:44Z   NONE  

From takeout, I only exported my Gmail account. Ideally I could parse this into sqlite via this tool.

google-takeout-to-sqlite 206649770 issue    
813899472 MDU6SXNzdWU4MTM4OTk0NzI= 1238 Custom pages don't work with base_url setting tsibley 79913 open 0     5 2021-02-22T21:58:58Z 2021-03-04T19:06:55Z   NONE  

It seems that custom pages aren't routing properly when the base_url setting is used.

To reproduce, with Datasette 0.55.

Create a templates/pages/custom.html with some text.

mkdir -p templates/pages/
echo "Hello, world!" > templates/pages/custom.html

Start Datasette.

datasette --template-dir templates/

Visit http://localhost:8001/custom and see "Hello, world!".

Start Datasette with a base_url.

datasette --template-dir templates/ --setting base_url /prefix/

Visit http://localhost:8001/prefix/custom and see a "Database not found: custom" 404.

Note that like all routes, http://localhost:8001/custom still works when run with base_url.

datasette 107914493 issue    
821841046 MDU6SXNzdWU4MjE4NDEwNDY= 6 Upgrade to latest sqlite-utils simonw 9599 open 0     1 2021-03-04T07:21:54Z 2021-03-04T07:22:51Z   MEMBER  

This is pinned to v1 at the moment.

google-takeout-to-sqlite 206649770 issue    
815955014 MDExOlB1bGxSZXF1ZXN0NTc5Njk3ODMz 1243 fix small typo UtahDave 306240 closed 0     2 2021-02-25T00:22:34Z 2021-03-04T05:46:10Z 2021-03-04T05:46:10Z CONTRIBUTOR simonw/datasette/pulls/1243
datasette 107914493 pull    
323718842 MDU6SXNzdWUzMjM3MTg4NDI= 268 Mechanism for ranking results from SQLite full-text search simonw 9599 open 0   Datasette Next 6158551 7 2018-05-16T17:36:40Z 2021-03-04T03:20:23Z   OWNER  

This isn't particularly straightforward - all the more reason for Datasette to implement it for you. This article is helpful: http://charlesleifer.com/blog/using-sqlite-full-text-search-with-python/
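
For reference, a minimal sketch of what built-in ranking looks like with FTS5 (this assumes a SQLite build that includes FTS5; FTS4 needs a custom rank function along the lines of the linked article):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create virtual table docs using fts5(title, body);
insert into docs (title, body) values
    ('one', 'datasette is a tool for exploring and publishing data'),
    ('two', 'data data data and yet more data');
""")
# FTS5 exposes a "rank" column (based on bm25) for MATCH queries;
# ordering by it returns the most relevant rows first.
for title, rank in conn.execute(
    "select title, rank from docs where docs match ? order by rank", ("data",)
):
    print(title, rank)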

datasette 107914493 issue    
325958506 MDU6SXNzdWUzMjU5NTg1MDY= 283 Support cross-database joins simonw 9599 closed 0     25 2018-05-24T04:18:39Z 2021-03-03T12:28:42Z 2021-02-18T22:16:46Z OWNER  

SQLite has the ability to attach multiple databases to a single connection and then run joins across multiple databases.

Since Datasette supports more than one database, this would make a pretty neat feature.
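
For reference, the raw SQLite mechanism looks like this (the table and column names below are made up for illustration):

import sqlite3

# One connection, two database files, a join across both of them
conn = sqlite3.connect("one.db")
conn.execute("ATTACH DATABASE 'two.db' AS two")
rows = conn.execute(
    "select main.users.name, two.orders.total "
    "from main.users join two.orders on two.orders.user_id = main.users.id"
).fetchall()
print(rows)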

datasette 107914493 issue    
820468864 MDExOlB1bGxSZXF1ZXN0NTgzNDA3OTg5 244 Typo in upsert example j-e-d 387669 open 0     0 2021-03-02T23:14:14Z 2021-03-02T23:14:14Z   FIRST_TIME_CONTRIBUTOR simonw/sqlite-utils/pulls/244

Remove extra [

sqlite-utils 140912432 pull    
818684978 MDU6SXNzdWU4MTg2ODQ5Nzg= 243 How can i use this utils to deal with fts on column meta of tables ? svjack 27874014 open 0     0 2021-03-01T09:45:05Z 2021-03-01T09:45:05Z   NONE  

Thank you for releasing this excellent project.
When I use this project on a multi-table database, I want to implement
convenient search on column names across the different tables.
I want to build a meta table that stores the metadata of the different columns
of the different tables, search on this meta table, and then get rows from the
data table that the meta table describes.
Does this project provide some simple function for that?

You can think of it as me having a knowledge graph about the tables in the db,
and I want to save this knowledge graph into the db with FTS enabled.

sqlite-utils 140912432 issue    
818430405 MDU6SXNzdWU4MTg0MzA0MDU= 1247 datasette.add_memory_database() method simonw 9599 closed 0     2 2021-03-01T03:48:38Z 2021-03-01T04:02:26Z 2021-03-01T04:02:26Z OWNER  

I just wrote this code:

https://github.com/simonw/datasette/blob/47eb885cc2c3aafa03645c330c6f597bee9b3b25/tests/test_facets.py#L334-L335

It would be nice if you didn't have to separately instantiate a database object here.

datasette 107914493 issue    
817597268 MDU6SXNzdWU4MTc1OTcyNjg= 1246 Suggest for ArrayFacet possibly confused by blank values simonw 9599 closed 0     3 2021-02-26T19:11:52Z 2021-03-01T03:46:11Z 2021-03-01T03:46:11Z OWNER  

I sometimes don't get the suggestion for facet-by-array for columns that contain arrays. I think it may be because they have empty spaces in them - or perhaps it's because the null detection doesn't actually work.

datasette 107914493 issue    
718259202 MDU6SXNzdWU3MTgyNTkyMDI= 1005 Remove xfail tests when new httpx is released simonw 9599 closed 0   Datasette 1.0 3268330 3 2020-10-09T16:00:19Z 2021-02-28T22:41:08Z 2021-02-28T22:41:08Z OWNER  

My httpx pull request adding raw_path support was just merged: https://github.com/encode/httpx/pull/1357 - but it's not in a release yet.

I'm going to mark these tests as xfail so I can land this change - I'll remove that once an httpx release comes out that I can use to get the tests passing.

_Originally posted by @simonw in https://github.com/simonw/datasette/pull/1000#issuecomment-706263157_

datasette 107914493 issue    
817989436 MDU6SXNzdWU4MTc5ODk0MzY= 242 Async support eyeseast 25778 open 0     12 2021-02-27T18:29:38Z 2021-02-28T22:09:37Z   NONE  

Following our conversation last week, want to note this here before I forget.

I've had a couple situations where I'd like to do a bunch of updates in an async event loop, but I run into SQLite's issues with concurrent writes. This feels like something sqlite-utils could help with.

PeeWee ORM has a SQLite write queue that might be a good model. It's using threads or gevent, but I think that approach would translate well enough to asyncio.

Happy to help with this, too.

sqlite-utils 140912432 issue    
816526538 MDU6SXNzdWU4MTY1MjY1Mzg= 239 sqlite-utils extract could handle nested objects simonw 9599 open 0     12 2021-02-25T15:10:28Z 2021-02-26T18:52:40Z   OWNER  

Imagine a table (imported from a nested JSON file) where one of the columns contains values that look like this:

{"email": "anonymous@noreply.airtable.com", "id": "usrROSHARE0000000", "name": "Anonymous"}

The sqlite-utils extract command already uses single text values in a column to populate a new table. It would not be much of a stretch for it to be able to use JSON instead, including specifying which of those values should be used as the primary key in the new table.

sqlite-utils 140912432 issue    
814591962 MDU6SXNzdWU4MTQ1OTE5NjI= 1240 Allow facetting on custom queries Kabouik 7107523 closed 0     3 2021-02-23T15:52:19Z 2021-02-26T18:19:46Z 2021-02-26T18:18:18Z NONE  

Facets are a tremendously useful feature, especially for people peeking at the database for the first time and still having little knowledge about the details of the data. It is of great assistance to discover interesting features to explore further in advanced queries.

Yet, it seems it's impossible to use facets when running a custom SQL query, be it from the little gear icons in column names, the facet suggestions at the top (hidden when performing a custom query), or by appending a facet code to the URL.

Is there a technical limitation, or is this something that could be unlocked easily?

datasette 107914493 issue    
817544251 MDU6SXNzdWU4MTc1NDQyNTE= 1245 Sticky table column headers would be useful, especially on the query page simonw 9599 open 0     0 2021-02-26T17:42:51Z 2021-02-26T17:42:57Z   OWNER  

Suggestion from office hours.

datasette 107914493 issue    
817528452 MDU6SXNzdWU4MTc1Mjg0NTI= 1244 Plugin tip: look at the examples linked from the hooks page simonw 9599 closed 0     1 2021-02-26T17:18:27Z 2021-02-26T17:30:38Z 2021-02-26T17:27:15Z OWNER  

Someone asked "what are good example plugins I can look at?" and I realized that the answer is to look through the example links on https://docs.datasette.io/en/stable/plugin_hooks.html - but that tip should be written down somewhere on the https://docs.datasette.io/en/stable/writing_plugins.html page.

datasette 107914493 issue    
815554385 MDU6SXNzdWU4MTU1NTQzODU= 237 db["my_table"].drop(ignore=True) parameter, plus sqlite-utils drop-table --ignore and drop-view --ignore mhalle 649467 closed 0     3 2021-02-24T14:55:06Z 2021-02-25T17:11:41Z 2021-02-25T17:11:41Z NONE  

When I'm generating a derived table in python, I often drop the table and create it from scratch. However, the first time I generate the table, it doesn't exist, so the drop raises an exception. That means more boilerplate.

I was going to submit a pull request that adds an "if_exists" option to the drop method of tables and views.

However, for a utility like sqlite_utils, perhaps the "IF EXISTS" SQL semantics is what you want most of the time, and thus should be the default.
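
For context, the boilerplate today versus the proposed call might look like this (a sketch, with sqlite3.OperationalError being the "no such table" error the first run raises):

import sqlite3
from sqlite_utils import Database

db = Database("data.db")

# Current workaround when regenerating a derived table from scratch:
try:
    db["derived"].drop()
except sqlite3.OperationalError:
    pass

# Proposed instead:
# db["derived"].drop(ignore=True)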

What do you think?

sqlite-utils 140912432 issue    
816523763 MDU6SXNzdWU4MTY1MjM3NjM= 238 .add_foreign_key() corrupts database if column contains a space simonw 9599 closed 0     1 2021-02-25T15:07:20Z 2021-02-25T16:54:02Z 2021-02-25T16:54:02Z OWNER  

I ran this:

db["Reports"].add_foreign_key("Reported by ID", "Reporters", "id")

And got this:

~/jupyter-venv/lib/python3.9/site-packages/sqlite_utils/db.py in add_foreign_keys(self, foreign_keys)
    616         # Have to VACUUM outside the transaction to ensure .foreign_keys property
    617         # can see the newly created foreign key.
--> 618         self.vacuum()
    619 
    620     def index_foreign_keys(self):

~/jupyter-venv/lib/python3.9/site-packages/sqlite_utils/db.py in vacuum(self)
    629 
    630     def vacuum(self):
--> 631         self.execute("VACUUM;")
    632 
    633 

~/jupyter-venv/lib/python3.9/site-packages/sqlite_utils/db.py in execute(self, sql, parameters)
    234             return self.conn.execute(sql, parameters)
    235         else:
--> 236             return self.conn.execute(sql)
    237 
    238     def executescript(self, sql):

DatabaseError: database disk image is malformed
sqlite-utils 140912432 issue    
816560819 MDU6SXNzdWU4MTY1NjA4MTk= 240 table.pks_and_rows_where() method returning primary keys along with the rows simonw 9599 closed 0     7 2021-02-25T15:49:28Z 2021-02-25T16:39:23Z 2021-02-25T16:28:23Z OWNER  

Original title: Easier way to update a row returned from .rows

Here's a surprisingly hard problem I ran into while trying to implement #239 - given a row returned by db[table].rows how can you update that row?

The problem is that the db[table].update(...) method requires a primary key. But if you have a row from the db[table].rows iterator it might not even contain the primary key - provided the table is a rowid table.

Instead, currently, you need to introspect the table and, if rowid is a primary key, explicitly include that in the select= argument to table.rows_where(...) - otherwise it will not be returned.
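
A sketch of that workaround for a rowid table (no explicit primary key), just to show the boilerplate involved - the "processed" column here is purely illustrative:

from sqlite_utils import Database

table = Database("data.db")["my_table"]
# Ask for rowid explicitly so .update() has a primary key to work with
for row in table.rows_where(select="rowid, *"):
    table.update(row["rowid"], {"processed": 1}, alter=True)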

A utility mechanism to make this easier would be very welcome.

sqlite-utils 140912432 issue    
816601354 MDExOlB1bGxSZXF1ZXN0NTgwMjM1NDI3 241 Extract expand - work in progress simonw 9599 open 0     0 2021-02-25T16:36:38Z 2021-02-25T16:36:38Z   OWNER simonw/sqlite-utils/pulls/241

Refs #239. Still needs documentation and CLI implementation.

sqlite-utils 140912432 pull    
814595021 MDU6SXNzdWU4MTQ1OTUwMjE= 1241 [Feature request] Button to copy URL Kabouik 7107523 open 0     4 2021-02-23T15:55:40Z 2021-02-23T22:46:12Z   NONE  

I use datasette in an iframe inside another HTML file that contains other ways to represent my data (mostly leaflets maps built with R on summarized data), and the datasette iframe is a tab in that page.

This particular use prevents users to access the full URLs of their datasette views and queries, which is a shame because the way datasette handles URLs to make every view or query easy to share is awesome. I know how to get the URL from the context menu of my browser, but I don't think many visitors would do it or even notice that datasette uses permalinks for pretty much every action they do. Would it be possible to add a "Share link" button to the interface, either in datasette itself or in a plugin?

datasette 107914493 issue    
803356942 MDU6SXNzdWU4MDMzNTY5NDI= 1218 /usr/local/opt/python3/bin/python3.6: bad interpreter: No such file or directory robmarkcole 11855322 open 0     1 2021-02-08T09:07:00Z 2021-02-23T12:12:17Z   NONE  

Error as above, however I do have python3.8 and the readme indicates this is supported.

(venv) (base) Robins-MacBook:datasette robin$ ls /usr/local/opt/python3/bin/

.. pip3 python3 python3.8
datasette 107914493 issue    
813978858 MDU6SXNzdWU4MTM5Nzg4NTg= 1239 JSON filter fails if column contains spaces simonw 9599 closed 0     1 2021-02-23T00:18:07Z 2021-02-23T00:22:53Z 2021-02-23T00:22:53Z OWNER  

Got this exception:

ERROR: conn=<sqlite3.Connection object at 0x10ea68e40>, sql = 'select Address, Affiliation, County, [Has Report], [Latest report notes], [Latest report yes], Latitude, [Location Type], Longitude, Name, id, [Appointment scheduling instructions], [Availability Info], [Latest report] from locations where rowid in (\n select locations.rowid from locations, json_each(locations.Availability Info) j\n where j.value = :p0\n ) and "Latest report yes" = :p1 order by id limit 101', params = {'p0': 'Yes: appointment required', 'p1': '1'}: near "Info": syntax error
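
Reproducing the fix by hand (this assumes a SQLite build with the JSON1 functions available): the column name containing a space has to be wrapped in [...] inside json_each().

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table locations (id integer primary key, [Availability Info] text)")
conn.execute(
    "insert into locations ([Availability Info]) values (?)",
    ('["Yes: appointment required"]',),
)
print(conn.execute(
    "select id from locations, json_each(locations.[Availability Info]) j where j.value = ?",
    ("Yes: appointment required",),
).fetchall())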

datasette 107914493 issue    
812704869 MDU6SXNzdWU4MTI3MDQ4Njk= 1237 ?_pretty=1 option for pretty-printing JSON output simonw 9599 open 0     1 2021-02-20T20:54:40Z 2021-02-22T21:10:25Z   OWNER   datasette 107914493 issue    
811505638 MDU6SXNzdWU4MTE1MDU2Mzg= 1234 Runtime support for ATTACHing multiple databases simonw 9599 open 0     1 2021-02-18T22:06:47Z 2021-02-22T21:06:28Z   OWNER  

The implementation in #1232 is ready to land. It's the simplest-thing-that-could-possibly-work: you can run datasette one.db two.db three.db --crossdb and then use the /_memory page to run joins across tables from multiple databases.

It only works on the first 10 databases that were passed to the command-line. This means that if you have a Datasette instance with hundreds of attached databases (see Datasette Library) this won't be particularly useful for you.

So... a better, future version of this feature would be one that lets you join across databases on command - maybe by hitting /_memory?attach=db1&attach=db2 to get a special connection.

_Originally posted by @simonw in https://github.com/simonw/datasette/issues/283#issuecomment-781665560_

datasette 107914493 issue    
812228314 MDU6SXNzdWU4MTIyMjgzMTQ= 1236 Ability to increase size of the SQL editor window simonw 9599 closed 0     8 2021-02-19T18:09:27Z 2021-02-22T21:05:22Z 2021-02-22T21:05:21Z OWNER  
datasette 107914493 issue    
783778672 MDU6SXNzdWU3ODM3Nzg2NzI= 220 Better error message for *_fts methods against views mhalle 649467 closed 0     3 2021-01-11T23:24:00Z 2021-02-22T20:44:51Z 2021-02-14T22:34:26Z NONE  

enable_fts and its related methods only work on tables, not views.

Could those methods and possibly others move up to the Queryable superclass?

sqlite-utils 140912432 issue    
777140799 MDU6SXNzdWU3NzcxNDA3OTk= 1166 Adopt Prettier for JavaScript code formatting simonw 9599 open 0   Datasette 0.54 6346396 10 2020-12-31T21:25:27Z 2021-02-22T18:13:11Z   OWNER  

https://prettier.io/ - I'm going to go with 2 spaces.

datasette 107914493 issue    
627794879 MDU6SXNzdWU2Mjc3OTQ4Nzk= 782 Redesign default JSON format in preparation for Datasette 1.0 simonw 9599 open 0   Datasette 1.0 3268330 45 2020-05-30T18:47:07Z 2021-02-22T10:21:14Z   OWNER  

The default JSON just isn't right. I find myself using ?_shape=array for almost everything I build against the API.

datasette 107914493 issue    
797651831 MDU6SXNzdWU3OTc2NTE4MzE= 1212 Tests are very slow. kbaikov 4488943 closed 0     4 2021-01-31T08:06:16Z 2021-02-19T22:54:13Z 2021-02-19T22:54:13Z NONE  

Working on my PR i noticed that tests are very slow.

The plain pytest run took about 37 minutes for me.
However I could shave off about 10 minutes from that if I used pytest-xdist to parallelize execution.
pytest -n 8 runs in only 28 minutes on my machine.

I can create a PR to mention that in your documentation.
This will be a simple change to add pytest-xdist to requirements and change a command to run pytest in documentation.

Does that make sense to you?

After a bit more investigation it looks like pytest-xdist is not the answer. It creates a race condition for tests that try to clean the temp dir before running.

Profiling shows that most time is spent on conn.executescript(TABLES) in the make_app_client function. Which makes sense.

Perhaps the better approach would be to look at the app_client fixture, which is already session scoped, but not used by all test cases.
And/or use conn = sqlite3.connect(":memory:") which is much faster.
And/or truncate tables after each test case instead of deleting the file and re-creating them.

I can take a look which is the best approach if you give the go-ahead.

datasette 107914493 issue    
520655983 MDU6SXNzdWU1MjA2NTU5ODM= 619 "Invalid SQL" page should let you edit the SQL simonw 9599 open 0   Datasette Next 6158551 8 2019-11-10T20:54:12Z 2021-02-19T18:11:22Z   OWNER  

https://latest.datasette.io/fixtures?sql=select%0D%0A++*%0D%0Afrom%0D%0A++%5Bfoo%5D

Would be useful if this page showed you the invalid SQL you entered so you can edit it and try again.

datasette 107914493 issue    
810507413 MDExOlB1bGxSZXF1ZXN0NTc1MTg3NDU3 1229 ensure immutable databases when starting in configuration directory mode with camallen 295329 open 0     2 2021-02-17T20:18:26Z 2021-02-19T12:47:19Z   FIRST_TIME_CONTRIBUTOR simonw/datasette/pulls/1229

fixes #1224

This PR ensures all databases found in a configuration directory that match the files in inspect-data.json will be set to immutable as outlined in https://docs.datasette.io/en/latest/settings.html#configuration-directory-mode

specifically on building the datasette instance it checks:
- if immutables is an empty tuple - as passed by the cli code
- if immutables is the default function value None - when it's not explicitly set

And correctly builds the immutable database list from the inspect-data[file] keys.

Note for this to work the inspect-data.json file must contain file paths which are relative to the configuration directory otherwise the file paths won't match and the dbs won't be set to immutable.

I couldn't find an easy way to test this due to the way make_app_client works, happy to take directions on adding a test for this.

I've updated the relevant docs as well, i.e. use the inspect cli cmd from the config directory path to create the relevant file

cd $config_dir
datasette inspect *.db --inspect-file=inspect-data.json

https://docs.datasette.io/en/latest/performance.html#using-datasette-inspect

datasette 107914493 pull    
811680502 MDU6SXNzdWU4MTE2ODA1MDI= 236 --attach command line option for attaching extra databases simonw 9599 closed 0     1 2021-02-19T04:38:30Z 2021-02-19T05:10:41Z 2021-02-19T05:08:43Z OWNER  

This will enable cross-database joins, as seen in https://github.com/simonw/datasette/issues/283

Also refs #113

sqlite-utils 140912432 issue    
621286870 MDU6SXNzdWU2MjEyODY4NzA= 113 Syntactic sugar for ATTACH DATABASE simonw 9599 closed 0     2 2020-05-19T21:10:00Z 2021-02-19T05:09:12Z 2021-02-19T04:56:36Z OWNER  

https://www.sqlite.org/lang_attach.html

Maybe something like this:

db.attach("other_db", "other_db.db")
sqlite-utils 140912432 issue    
811589344 MDU6SXNzdWU4MTE1ODkzNDQ= 1235 Upgrade Python version used by official Datasette Docker image simonw 9599 closed 0     2 2021-02-19T00:47:40Z 2021-02-19T01:48:31Z 2021-02-19T01:48:30Z OWNER  

Currently uses 3.7.2:

https://github.com/simonw/datasette/blob/73bed175631a79e13a521eee82f8451dd0477eb3/Dockerfile#L1

There's a security fix for Python which it would be good to ship in this image (even though I'm reasonably confident it doesn't affect Datasette): https://bugs.python.org/issue42938

datasette 107914493 issue    
811407131 MDExOlB1bGxSZXF1ZXN0NTc1OTQwMTkz 1232 --crossdb option for joining across databases simonw 9599 closed 0     8 2021-02-18T19:48:50Z 2021-02-18T22:09:13Z 2021-02-18T22:09:12Z OWNER simonw/datasette/pulls/1232

Refs #283. Still needs:

  • Unit test for --crossdb queries
  • Show warning on console if it truncates at ten databases (or on web interface)
  • Show connected databases on the /_memory database page
  • Documentation
  • https://latest.datasette.io/ demo should demonstrate this feature
datasette 107914493 pull    
811458446 MDU6SXNzdWU4MTE0NTg0NDY= 1233 "datasette publish cloudrun" cannot publish files with spaces in their name simonw 9599 open 0     1 2021-02-18T21:08:31Z 2021-02-18T21:10:08Z   OWNER  

Got this error:

Step 6/9 : RUN datasette inspect fixtures.db extra database.db --inspect-file inspect-data.json
 ---> Running in db9da0068592
Usage: datasette inspect [OPTIONS] [FILES]...
Try 'datasette inspect --help' for help.

Error: Invalid value for '[FILES]...': Path 'extra' does not exist.
The command '/bin/sh -c datasette inspect fixtures.db extra database.db --inspect-file inspect-data.json' returned a non-zero code: 2
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/docker" failed: step exited with non-zero status: 2

While working on the demo for #1232, using this deploy command:

GITHUB_SHA=crossdb datasette publish cloudrun fixtures.db 'extra database.db' \
            -m fixtures.json \
            --plugins-dir=plugins \
            --branch=$GITHUB_SHA \
            --version-note=$GITHUB_SHA \
            --extra-options="--setting template_debug 1 --crossdb" \
            --install=pysqlite3-binary \
            --service=datasette-latest-crossdb
datasette 107914493 issue    
811367257 MDU6SXNzdWU4MTEzNjcyNTc= 1231 Race condition errors in new refresh_schemas() mechanism simonw 9599 open 0     2 2021-02-18T18:49:54Z 2021-02-18T18:50:53Z   OWNER  

I tried running a Locust load test against Datasette and hit an error message about a failure to create tables because they already existed. I think this means there are race conditions in the new refresh_schemas() mechanism added in #1150.

datasette 107914493 issue    
808843401 MDU6SXNzdWU4MDg4NDM0MDE= 1226 --port option should validate port is between 0 and 65535 simonw 9599 closed 0     4 2021-02-15T22:01:33Z 2021-02-18T18:41:27Z 2021-02-18T18:41:27Z OWNER  

Currently throws an ugly error message:

(datasette-graphql) datasette-graphql % datasette fivethirtyeight.db -p 80094
INFO:     Started server process [45497]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/datasette-graphql-n1OSJCS8/bin/datasette", line 8, in <module>
    sys.exit(cli())
...
    server = await loop.create_server(
  File "/Users/simon/.pyenv/versions/3.8.2/lib/python3.8/asyncio/base_events.py", line 1461, in create_server
    sock.bind(sa)
OverflowError: bind(): port must be 0-65535.
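
A minimal sketch of the kind of validation Click can do here (hypothetical option wiring, not Datasette's actual CLI code):

import click

@click.command()
@click.option("-p", "--port", type=click.IntRange(0, 65535), default=8001)
def serve(port):
    click.echo(f"would listen on port {port}")

if __name__ == "__main__":
    serve()

Running this with -p 80094 then fails fast with Click's own "not in the range" error instead of reaching Uvicorn and raising OverflowError.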
datasette 107914493 issue    
811054000 MDU6SXNzdWU4MTEwNTQwMDA= 1230 Vega charts are plotted only for rows on the visible page, cluster maps only for rows in the remaining pages Kabouik 7107523 open 0     1 2021-02-18T12:27:02Z 2021-02-18T15:22:15Z   NONE  

I filtered a data set on some criteria and obtained 265 results, split over three pages (100, 100, 65), and realized that Vega plots are only applied to the results displayed on the current page, instead of the whole filtered data, e.g., 100 on page 1, 100 on page 2, 65 on page 3. Is there a way to force the graphs to consider all results instead of just the page, considering that pages rarely represent sensible information?

Likewise, while the cluster map does show all results on the first page, if you go to next pages, it will show all remaining results except the previous page(s), e.g., 265 on page 1, 165 on page 2, 65 on page 3.

In both cases, I don't see many situations where one would like to represent the data this way, and it might even lead to interpretation errors when viewing the data. Am I missing some cases where this would be best? Perhaps a clickable option to subset visual representations according visible pages vs. display all search results would do?

[Edit] Oh, I just saw the "Load all" button under the cluster map as well as the setting to alter the max number of results. So I guess this issue only is about the Vega charts.

datasette 107914493 issue    
810394616 MDU6SXNzdWU4MTAzOTQ2MTY= 1227 Configure sphinx.ext.extlinks for issues simonw 9599 closed 0     0 2021-02-17T17:38:02Z 2021-02-18T01:20:33Z 2021-02-18T01:20:33Z OWNER  

Spotted this in the aspw documentation: https://github.com/rogerbinns/apsw/blob/3.34.0-r1/doc/conf.py#L29-L36

extlinks={
    'cvstrac': ('https://sqlite.org/cvstrac/tktview?tn=%s',
                'SQLite ticket #'),
    'sqliteapi': ('https://sqlite.org/c3ref/%s.html', 'XXYouShouldNotSeeThisXX'),
    'issue': ('https://github.com/rogerbinns/apsw/issues/%s',
              'APSW issue '),
    'source': ('https://github.com/rogerbinns/apsw/blob/master/%s',
               ''),
    }

Which lets you link to issues like this:

:issue:`268`
datasette 107914493 issue    
810618495 MDU6SXNzdWU4MTA2MTg0OTU= 235 Extract columns cannot create foreign key relation: sqlite3.OperationalError: table sqlite_master may not be modified kristomi 6913891 open 0     0 2021-02-17T23:33:23Z 2021-02-17T23:33:23Z   NONE  

Thanks for what seems like a truly great suite of libraries. I wanted to try out Datasette, but never got more than half way through your YouTube video with the SF tree dataset. Whenever I try to extract a column, I get a sqlite3.OperationalError: table sqlite_master may not be modified error from Python. This snippet reproduces the error on my system, Python 3.9.1 and sqlite-utils 3.5 on an M1 Macbook Pro running in rosetta mode:

curl "https://data.nasa.gov/resource/y77d-th95.json" | \
    sqlite-utils insert meteorites.db meteorites - --pk=id
sqlite-utils extract meteorites.db meteorites  recclass

I have tried googling the problem, but all I've found is that this might be a problem with the sqlite3 database running in defensive mode, but I definitely can't know for sure. Does the problem seem familiar to you?

sqlite-utils 140912432 issue    
810397025 MDU6SXNzdWU4MTAzOTcwMjU= 1228 '>' not supported between instances of 'str' and 'int' Kabouik 7107523 open 0     0 2021-02-17T17:41:20Z 2021-02-17T17:41:20Z   NONE  

I recently discovered datasette thanks to your great talk at FOSDEM and would like to use it for some projects. However, when trying to use it on databases created from some csv or tsv files, I am sometimes getting this issue when going to http://127.0.0.1:8001/databasetest/databasetest and I don't exactly understand what it refers to.

So far, I couldn't find anything relevant when reviewing the raw text files that could explain this issue, nor could I find something obvious between the files that generate this issue and those that don't. Does the error ring a bell and, if so, could you please point me to the right direction?

$ datasette databasetest.db 
INFO:     Started server process [1408482]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)
INFO:     127.0.0.1:56394 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:56394 - "GET /-/static/app.css?4e362c HTTP/1.1" 200 OK
INFO:     127.0.0.1:56396 - "GET /-/static-plugins/datasette_vega/main.2acbb312.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:56398 - "GET /-/static-plugins/datasette_vega/main.08f5d3d8.js HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "/home/kabouik/.local/lib/python3.7/site-packages/datasette/app.py", line 1099, in route_path
    response = await view(request, send)
  File "/home/kabouik/.local/lib/python3.7/site-packages/datasette/views/base.py", line 147, in view
    request, **request.scope["url_route"]["kwargs"]
  File "/home/kabouik/.local/lib/python3.7/site-packages/datasette/views/base.py", line 121, in dispatch_request
    return await handler(request, *args, **kwargs)
  File "/home/kabouik/.local/lib/python3.7/site-packages/datasette/views/base.py", line 260, in get
    request, database, hash, correct_hash_provided, **kwargs
  File "/home/kabouik/.local/lib/python3.7/site-packages/datasette/views/base.py", line 434, in view_get
    request, database, hash, **kwargs
  File "/home/kabouik/.local/lib/python3.7/site-packages/datasette/views/table.py", line 782, in data
    suggested_facets.extend(await facet.suggest())
  File "/home/kabouik/.local/lib/python3.7/site-packages/datasette/facets.py", line 168, in suggest
    and any(r["n"] > 1 for r in distinct_values)
  File "/home/kabouik/.local/lib/python3.7/site-packages/datasette/facets.py", line 168, in <genexpr>
    and any(r["n"] > 1 for r in distinct_values)
TypeError: '>' not supported between instances of 'str' and 'int'
INFO:     127.0.0.1:56402 - "GET /databasetest/databasetest HTTP/1.1" 500 Internal Server Error
INFO:     127.0.0.1:56402 - "GET /-/static/app.css?4e362c HTTP/1.1" 200 OK
INFO:     127.0.0.1:56404 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:56404 - "GET /-/static/app.css?4e362c HTTP/1.1" 200 OK
INFO:     127.0.0.1:56406 - "GET /-/static-plugins/datasette_vega/main.2acbb312.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:56408 - "GET /-/static-plugins/datasette_vega/main.08f5d3d8.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:56408 - "GET /databasetest HTTP/1.1" 200 OK
INFO:     127.0.0.1:56408 - "GET /-/static/app.css?4e362c HTTP/1.1" 200 OK
INFO:     127.0.0.1:56404 - "GET /-/static-plugins/datasette_vega/main.2acbb312.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:56406 - "GET /-/static/codemirror-5.57.0.min.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:56410 - "GET /-/static-plugins/datasette_vega/main.08f5d3d8.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:56414 - "GET /-/static/codemirror-5.57.0-sql.min.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:56412 - "GET /-/static/codemirror-5.57.0.min.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:56408 - "GET /-/static/sql-formatter-2.3.3.min.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:56408 - "GET /databasetest?sql=select+*+from+databasetest HTTP/1.1" 200 OK
INFO:     127.0.0.1:56410 - "GET /-/static/app.css?4e362c HTTP/1.1" 200 OK
INFO:     127.0.0.1:56408 - "GET /-/static-plugins/datasette_vega/main.2acbb312.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:56412 - "GET /-/static/codemirror-5.57.0.min.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:56404 - "GET /-/static/sql-formatter-2.3.3.min.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:56406 - "GET /-/static/codemirror-5.57.0.min.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:56414 - "GET /-/static-plugins/datasette_vega/main.08f5d3d8.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:56408 - "GET /-/static/codemirror-5.57.0-sql.min.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:56410 - "GET /databasetest.json?sql=select+*+from+databasetest&_shape=array&_shape=array HTTP/1.1" 200 OK
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1408482]

Note that there is no error if I go to http://127.0.0.1:8001/databasetest and then click on Run SQL.

datasette 107914493 issue    
807174161 MDU6SXNzdWU4MDcxNzQxNjE= 227 Error reading csv files with large column data camallen 295329 closed 0     4 2021-02-12T11:51:47Z 2021-02-16T11:48:03Z 2021-02-14T21:17:19Z NONE  

Feel free to close this issue - I mostly added it for reference for future folks that run into this :)

I have a CSV file with one column that has very long strings. When i try to import this file via the insert command I get the following error:

sqlite-utils insert database.db table_name file_with_large_column.csv

Traceback (most recent call last):
  File "/usr/local/bin/sqlite-utils", line 10, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 774, in insert
    default=default,
  File "/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 705, in insert_upsert_implementation
    docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs
  File "/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py", line 1852, in insert_all
    first_record = next(records)
  File "/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 703, in <genexpr>
    docs = (decode_base64_values(doc) for doc in docs)
  File "/usr/local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 681, in <genexpr>
    docs = (dict(zip(headers, row)) for row in reader)
_csv.Error: field larger than field limit (131072)

Built with the docker image datasetteproject/datasette:0.54 with the following versions:

# sqlite-utils --version
sqlite-utils, version 3.4.1

# datasette --version
datasette, version 0.54

It appears this is a known issue with reading csv files in Python and it doesn't look to be modifiable through system / env vars (I may be very wrong on this).

Noting that the sqlite3 import command works without error (it does not use the Python csv reader):

sqlite3 database.db
sqlite> .mode csv
sqlite> .import file_with_large_column.csv table_name

Sadly I couldn't see an easy way around this while using the cli as it appears this value needs to be changed in python code. FWIW I've switched to using https://datasette.io/tools/csvs-to-sqlite for importing csv data and it's working well.

Finally, I'm loving https://datasette.io/ thank you very much for an amazing tool and data ecosystem 🙇‍♀️

sqlite-utils 140912432 issue    
688670158 MDU6SXNzdWU2ODg2NzAxNTg= 147 SQLITE_MAX_VARS maybe hard-coded too low simonwiles 96218 open 0     7 2020-08-30T07:26:45Z 2021-02-15T21:27:55Z   CONTRIBUTOR  

I came across this while about to open an issue and PR against the documentation for batch_size, which is a bit incomplete.

As mentioned in #145, while:

SQLITE_MAX_VARIABLE_NUMBER ... defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0.

it is common that it is increased at compile time. Debian-based systems, for example, seem to ship with a version of sqlite compiled with SQLITE_MAX_VARIABLE_NUMBER set to 250,000, and I believe this is the case for homebrew installations too.

In working to understand what batch_size was actually doing and why, I realized that by setting SQLITE_MAX_VARS in db.py to match the value my sqlite was compiled with (I'm on Debian), I was able to decrease the time to insert_all() my test data set (~128k records across 7 tables) from ~26.5s to ~3.5s. Given that this is about .05% of my total dataset, this is time I am keen to save...

Unfortunately, it seems that sqlite3 in the python standard library doesn't expose the get_limit() C API (even though pysqlite used to), so it's hard to know what value sqlite has been compiled with (note that this could mean, I suppose, that it's less than 999, and even hardcoding SQLITE_MAX_VARS to the conservative default might not be adequate. It can also be lowered -- but not raised -- at runtime). The best I could come up with is echo "" | sqlite3 -cmd ".limits variable_number" (only available in sqlite >= 2015-05-07 (3.8.10)).

Obviously this couldn't be relied upon in sqlite_utils, but I wonder what your opinion would be about exposing SQLITE_MAX_VARS as a user-configurable parameter (with suitable "here be dragons" warnings)? I'm going to go ahead and monkey-patch it for my purposes in any event, but it seems like it might be worth considering.
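
The monkey-patch I have in mind is just this (a sketch, not a supported API - 250,000 is what my Debian-packaged SQLite reports):

import sqlite_utils.db

# Value obtained from: echo "" | sqlite3 -cmd ".limits variable_number"
sqlite_utils.db.SQLITE_MAX_VARS = 250_000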

sqlite-utils 140912432 issue    
808771690 MDU6SXNzdWU4MDg3NzE2OTA= 1225 More flexible formatting of records with CSS grid mhalle 649467 open 0     0 2021-02-15T19:28:17Z 2021-02-15T19:28:35Z   NONE  

In several applications I've been experimenting with alternate formatting of datasette query results. Lately I've found that CSS grids work very well and seem quite general for formatting rows. In CSS I use grid templates to define the layout of each record and the regions for each field, hiding the fields I don't want. It's pretty flexible and looks good. It's also a great basis for highly responsive layout.

I initially thought I'd only use this feature for record detail views, but now I use it for index views as well.

However, there are some limitations:

  • With the existing table templates, it seems that you can change the display property on the enclosing table, tbody, and tr to make them be grid-like, but that seems hacky (convert table and tbody to be display: block and tr to be display: grid).
  • More significantly, it's very nice to have the column name available when rendering each record to display headers/field labels. The existing templates don't do that, so a custom _table template is necessary.
  • I don't know if any plugins are sensitive to whether data is rendered as a table or not since I'm not completely clear how plugins get their data.
  • Regardless, you need custom CSS to take full advantage of grids. I don't have a proposal on how to integrate them more deeply.

It would be helpful to at least have an official example or test that used a grid layout for records to make sure nothing in datasette breaks with it.

datasette 107914493 issue    
807437089 MDU6SXNzdWU4MDc0MzcwODk= 228 --no-headers option for CSV and TSV simonw 9599 closed 0     9 2021-02-12T17:56:51Z 2021-02-14T22:25:17Z 2021-02-14T22:25:17Z OWNER  

https://bl.iro.bl.uk/work/ns/3037474a-761c-456d-a00c-9ef3c6773f4c has a fascinating CSV file that doesn't have a header row - it starts like this:

Computation and measurement of turbulent flow through idealized turbine blade passages,,"Loizou, Panos A.",https://isni.org/isni/0000000136122593,,University of Manchester,https://isni.org/isni/0000000121662407,1989,Thesis (Ph.D.),,Physical Sciences,,,https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.232781,
"Prolactin and growth hormone secretion in normal, hyperprolactinaemic and acromegalic man",,"Prescott, R. W. G.",https://isni.org/isni/0000000134992122,,University of Newcastle upon Tyne,https://isni.org/isni/0000000104627212,1983,Thesis (Ph.D.),,Biological Sciences,,,https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.232784,

It would be useful if sqlite-utils insert ... --csv had a mechanism for importing files like this one.

sqlite-utils 140912432 issue    
807817197 MDU6SXNzdWU4MDc4MTcxOTc= 229 Hitting `_csv.Error: field larger than field limit (131072)` frosencrantz 631242 closed 0     3 2021-02-13T19:52:44Z 2021-02-14T21:33:33Z 2021-02-14T21:33:33Z NONE  

I have a csv file where one of the fields is so large it is throwing an exception with this error and stops loading:
_csv.Error: field larger than field limit (131072)

The stack trace occurs here: https://github.com/simonw/sqlite-utils/blob/3.1/sqlite_utils/cli.py#L633

There is a way to handle this that helps:
https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072
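
That workaround boils down to raising the csv module's per-field limit before parsing, e.g.:

import csv
import sys

# sys.maxsize can overflow the underlying C long on some platforms,
# so back off until a value is accepted.
limit = sys.maxsize
while True:
    try:
        csv.field_size_limit(limit)
        break
    except OverflowError:
        limit = int(limit / 10)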

One issue I had with this problem was sqlite-utils only provides limited context as to where the problem line is.
There is the progress bar, but that is by percent rather than by line number. It would have been helpful if it could have provided a line number.

Also, it would have been useful if it had allowed the loading to continue with later lines.

sqlite-utils 140912432 issue    
808008305 MDU6SXNzdWU4MDgwMDgzMDU= 230 --sniff option for sniffing delimiters simonw 9599 closed 0     8 2021-02-14T17:43:54Z 2021-02-14T21:15:33Z 2021-02-14T19:24:32Z OWNER  

I just spotted that csv.Sniffer in the Python standard library has a .has_header(sample) method which detects if the first row appears to be a header or not, which is interesting. https://docs.python.org/3/library/csv.html#csv.Sniffer

_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/228#issuecomment-778812050_
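
For reference, the standard-library API in question (the file name is just an example):

import csv

with open("data.csv", newline="") as f:
    sample = f.read(4096)
sniffer = csv.Sniffer()
print(sniffer.sniff(sample).delimiter)  # the detected delimiter
print(sniffer.has_header(sample))       # True if the first row looks like a header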

sqlite-utils 140912432 issue    
797159961 MDExOlB1bGxSZXF1ZXN0NTY0MjE1MDEx 225 fix for problem in Table.insert_all on search for columns per chunk of rows nieuwenhoven 261237 closed 0     2 2021-01-29T20:16:07Z 2021-02-14T21:04:13Z 2021-02-14T21:04:13Z NONE simonw/sqlite-utils/pulls/225

Hi,

I ran into a problem when trying to create a database from my Apple Healthkit data using healthkit-to-sqlite. The program crashed because of an invalid insert statement that was generated for table rDistanceCycling.

The actual problem turned out to be in sqlite-utils. Table.insert_all processes the data to be inserted in chunks of rows and checks for every chunk which columns are used, and it will collect all column names in the variable all_columns. The collection of columns is done using a nested list comprehension that is not completely correct.

I'm using a Windows machine and had to make a few adjustments to the tests in order to be able to run them because they had a posix dependency.

Thanks, kind regards,

Frans

# this is a (condensed) chunk of data from my Apple healthkit export that caused the problem.
# the 3 last items in the chunk have additional keys: metadata_HKMetadataKeySyncVersion and metadata_HKMetadataKeySyncIdentifier

chunk = [{'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '7.0.1',
          'device': '<<HKDevice: 0x281cf6c70>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',
          'unit': 'km', 'creationDate': '2020-10-10 12:29:09 +0100', 'startDate': '2020-10-10 12:29:06 +0100',
          'endDate': '2020-10-10 12:29:07 +0100', 'value': '0.00518016'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '7.0.1',
          'device': '<<HKDevice: 0x281cf6c70>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:7.0.1>',
          'unit': 'km', 'creationDate': '2020-10-10 12:29:10 +0100', 'startDate': '2020-10-10 12:29:07 +0100',
          'endDate': '2020-10-10 12:29:08 +0100', 'value': '0.00544049'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:40:50 +0100',
          'endDate': '2020-07-15 16:42:49 +0100', 'value': '0.952092', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520450.99823:616520569.99360:119'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:42:49 +0100',
          'endDate': '2020-07-15 16:44:51 +0100', 'value': '0.848983', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520569.99360:616520691.98826:119'},
         {'sourceName': 'AppleÂ\xa0Watch van Frans', 'sourceVersion': '6.2.6',
          'device': '<<HKDevice: 0x281cf83e0>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch3,4, software:6.2.6>',
          'unit': 'km', 'creationDate': '2020-10-14 05:54:12 +0100', 'startDate': '2020-07-15 16:44:51 +0100',
          'endDate': '2020-07-15 16:46:50 +0100', 'value': '0.834403', 'metadata_HKMetadataKeySyncVersion': '1',
          'metadata_HKMetadataKeySyncIdentifier': '3:674DBCDB-3FE8-40D1-9FC1-E54A2B413805:616520691.98826:616520810.98305:119'}]



def all_columns_old():
    all_columns = [col for col in chunk[0]]
    all_columns += [column for record in chunk
                           for column in record if column not in all_columns]
    return all_columns


def all_columns_new():
    all_columns = [col for col in chunk[0]]
    for record in chunk:
        all_columns += [column for column in record if column not in all_columns]
    return all_columns



if __name__ == '__main__':
    from pprint import pprint

    print('problem: ')
    pprint(all_columns_old())
    print('\nfix: ')
    pprint(all_columns_new())
sqlite-utils 140912432 pull    
808046597 MDU6SXNzdWU4MDgwNDY1OTc= 234 .insert_all() fails if subsequent chunks contain additional columns simonw 9599 closed 0     1 2021-02-14T21:01:51Z 2021-02-14T21:03:40Z 2021-02-14T21:03:40Z OWNER  

Reported by @nieuwenhoven in #225 along with a proposed fix.

sqlite-utils 140912432 issue    
808036774 MDU6SXNzdWU4MDgwMzY3NzQ= 232 Run tests against Windows in GitHub Actions simonw 9599 closed 0     0 2021-02-14T20:09:45Z 2021-02-14T20:39:55Z 2021-02-14T20:39:55Z OWNER  

I'm going to try and get the test suite to run in Windows on GitHub Actions.

_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/225#issuecomment-778834504_

sqlite-utils 140912432 issue    
808037010 MDExOlB1bGxSZXF1ZXN0NTczMTQ3MTY4 233 Run tests against Ubuntu, macOS and Windows simonw 9599 closed 0     0 2021-02-14T20:11:02Z 2021-02-14T20:39:54Z 2021-02-14T20:39:54Z OWNER simonw/sqlite-utils/pulls/233

Refs #232

sqlite-utils 140912432 pull    
808028757 MDU6SXNzdWU4MDgwMjg3NTc= 231 limit=X, offset=Y parameters for more Python methods simonw 9599 closed 0     2 2021-02-14T19:31:23Z 2021-02-14T20:03:08Z 2021-02-14T20:03:08Z OWNER  

I'm going to add a offset= parameter to support this case. Thanks for the suggestion!

_Originally posted by @simonw in https://github.com/simonw/sqlite-utils/issues/224#issuecomment-778828495_

sqlite-utils 140912432 issue    
792297010 MDExOlB1bGxSZXF1ZXN0NTYwMjA0MzA2 224 Add fts offset docs. polyrand 37962604 closed 0     2 2021-01-22T20:50:58Z 2021-02-14T19:31:06Z 2021-02-14T19:31:06Z NONE simonw/sqlite-utils/pulls/224

The limit can be passed as a string to the query builder to have an offset. I have tested it using the shorthand limit=f"15, 30"; the standard syntax should work too.

sqlite-utils 140912432 pull    
675753042 MDU6SXNzdWU2NzU3NTMwNDI= 131 sqlite-utils insert: options for column types simonw 9599 open 0     3 2020-08-09T18:59:11Z 2021-02-12T23:25:06Z   OWNER  

The insert command currently results in string types for every column - at least when used against CSV or TSV inputs.

It would be useful if you could do the following:

  • automatically detects the column types based on eg the first 1000 records
  • explicitly state the rule for specific columns

--detect-types could work for the former - or it could do that by default and allow opt-out using --no-detect-types

For specific columns maybe this:

sqlite-utils insert db.db images images.tsv \
  --tsv \
  -c id int \
  -c score float
sqlite-utils 140912432 issue    
806743116 MDU6SXNzdWU4MDY3NDMxMTY= 1220 Installing datasette via docker: Path 'fixtures.db' does not exist aborruso 30607 closed 0     4 2021-02-11T21:09:14Z 2021-02-12T21:35:17Z 2021-02-12T21:35:17Z NONE  

Hi,
If I run

docker run -p 8001:8001 -v `pwd`:/mnt \
    datasetteproject/datasette \
    datasette -p 8001 -h 0.0.0.0 fixtures.db

I have

Error: Invalid value for '[FILES]...': Path 'fixtures.db' does not exist.

If I run test -f fixtures.db && echo "it exists." I have it exists..

What's my error?

Thank you

datasette 107914493 issue    
797097140 MDU6SXNzdWU3OTcwOTcxNDA= 60 Use Data from SQLite in other commands daniel-butler 22578954 open 0     3 2021-01-29T18:35:52Z 2021-02-12T18:29:43Z   NONE  

As a total beginner here, how could you access data from the sqlite table to run other commands?

What I am thinking is: I want to get all the repos in an organization, then use the repo list to pull all the commit messages for each repo.
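
Something like this sketch is what I have in mind (the sub-command names and the full_name column are assumptions based on the README):

import sqlite3
import subprocess

# Read the repos already saved to github.db, then fetch commits for each one
db = sqlite3.connect("github.db")
repo_names = [row[0] for row in db.execute("select full_name from repos")]
for full_name in repo_names:
    subprocess.run(["github-to-sqlite", "commits", "github.db", full_name], check=True)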

I love this project by the way!

github-to-sqlite 207052882 issue    
807433181 MDU6SXNzdWU4MDc0MzMxODE= 1224 can't start immutable databases from configuration dir mode camallen 295329 open 0     0 2021-02-12T17:50:13Z 2021-02-12T17:50:13Z   NONE  

Say I have a /databases/ directory with multiple sqlite db files in that dir (1.db & 2.db) and an inspect-data.json file.

If I start datasette via datasette -h 0.0.0.0 /databases/ then the resulting databases are set to is_mutable: true as inspected via http://127.0.0.1:8001/-/databases.json

I don't want to have to list out the databases by name, e.g. datasette -i /databases/1.db -i /databases/2.db as i want the system to autodetect the sqlite dbs i have in the configuration directory

According to the docs outlined in https://docs.datasette.io/en/latest/settings.html?highlight=immutable#configuration-directory-mode this should be possible

inspect-data.json the result of running datasette inspect - any database files listed here will be treated as immutable, so they should not be changed while Datasette is running

I believe that if the inspect-data.json file is present, then in theory the databases will be automatically set to immutable via this code https://github.com/simonw/datasette/blob/9603d893b9b72653895318c9104d754229fdb146/datasette/app.py#L211-L216

However it appears the Click Multiple Options will return a tuple via https://github.com/simonw/datasette/blob/9603d893b9b72653895318c9104d754229fdb146/datasette/cli.py#L311-L317

The resulting tuple is passed to the Datasette app via kwargs and overrides the behaviour to set the databases to immutable via this arg https://github.com/simonw/datasette/blob/9603d893b9b72653895318c9104d754229fdb146/datasette/app.py#L182

If you think this is a bug and needs fixing, I am willing to make a PR to check for the empty immutable tuple before calling the Datasette class initializer as I think leaving that class interface alone is the best path here.

Thoughts?

Also - i'm loving Datasette, it truly is a wonderful tool, thank you :)

datasette 107914493 issue    
803338729 MDU6SXNzdWU4MDMzMzg3Mjk= 33 photo-to-sqlite: command not found robmarkcole 11855322 open 0     4 2021-02-08T08:42:57Z 2021-02-12T15:00:44Z   NONE  

Having installed in a venv I get:

(venv) (base) Robins-MacBook:datasette robin$ photo-to-sqlite apple-photos photos.db

-bash: photo-to-sqlite: command not found
dogsheep-photos 256834907 issue    
806918878 MDExOlB1bGxSZXF1ZXN0NTcyMjU0MTAz 1223 Add compile option to Dockerfile to fix failing test (fixes #696) bobwhitelock 7476523 open 0     1 2021-02-12T03:38:05Z 2021-02-12T03:45:31Z   FIRST_TIME_CONTRIBUTOR simonw/datasette/pulls/1223

This test was failing when run inside the Docker container: test_searchable[/fixtures/searchable.json?_search=te*+AND+do*&_searchmode=raw-expected_rows3],

with this error:

    def test_searchable(app_client, path, expected_rows):
        response = app_client.get(path)
>       assert expected_rows == response.json["rows"]
E       AssertionError: assert [[1, 'barry c...sel', 'puma']] == []
E         Left contains 2 more items, first extra item: [1, 'barry cat', 'terry dog', 'panther']
E         Full diff:
E         + []
E         - [[1, 'barry cat', 'terry dog', 'panther'],
E         -  [2, 'terry dog', 'sara weasel', 'puma']]

The issue was that the version of sqlite3 built inside the Docker container was built with FTS3 and FTS4 enabled, but without the
SQLITE_ENABLE_FTS3_PARENTHESIS compile option passed, which adds support for using AND and NOT within match expressions (see https://sqlite.org/fts3.html#compiling_and_enabling_fts3_and_fts4 and https://www.sqlite.org/compile.html).

Without this, the AND used in the search in this test was being interpreted as a literal string, and so no matches were found. Adding this compile option fixes this.


I actually ran into this issue because the same test was failing when I ran the test suite on my own machine, outside of Docker, and so I eventually tracked this down to my system sqlite3 also being compiled without this option.

I wonder if this is a sign of a slightly deeper issue, that Datasette can silently behave differently based on the version and compilation of sqlite3 it is being used with. On my own system I fixed the test suite by running pip install pysqlite3-binary, so that this would be picked up instead of the sqlite package, as this seems to be compiled with this option. Maybe using pysqlite3-binary could be installed/recommended by default so a more deterministic version of sqlite is used? Or there could be some feature detection done on the available sqlite version, to know what features are available and can be used/tested?
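
For what it's worth, a sketch of what that feature detection could look like (not something Datasette does today - just the standard PRAGMA, which lists compile flags without the SQLITE_ prefix):

import sqlite3

def supports_fts_parenthesis():
    # Detects whether SQLite was built with SQLITE_ENABLE_FTS3_PARENTHESIS
    conn = sqlite3.connect(":memory:")
    options = {row[0] for row in conn.execute("PRAGMA compile_options")}
    return "ENABLE_FTS3_PARENTHESIS" in options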

datasette 107914493 pull    
806849424 MDU6SXNzdWU4MDY4NDk0MjQ= 1221 Support SSL/TLS directly simonw 9599 closed 0     3 2021-02-12T00:18:29Z 2021-02-12T01:09:54Z 2021-02-12T00:52:18Z OWNER  

This should be pretty easy because Uvicorn supports them already. Need a good mechanism for testing it - https://pypi.org/project/trustme/ looks ideal.
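
An untested sketch of how trustme could generate throwaway certificates for a test (API names from the trustme README; the file names and CLI options here are assumptions):

import trustme

ca = trustme.CA()
cert = ca.issue_cert("localhost")
# write PEM files somewhere the test can point Datasette at
cert.private_key_pem.write_to_path("key.pem")
cert.cert_chain_pems[0].write_to_path("cert.pem")
# then e.g.: datasette --ssl-keyfile key.pem --ssl-certfile cert.pem fixtures.db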

datasette 107914493 issue    
806861312 MDExOlB1bGxSZXF1ZXN0NTcyMjA5MjQz 1222 --ssl-keyfile and --ssl-certfile, refs #1221 simonw 9599 closed 0     0 2021-02-12T00:45:58Z 2021-02-12T00:52:18Z 2021-02-12T00:52:17Z OWNER simonw/datasette/pulls/1222
datasette 107914493 pull    
770712149 MDExOlB1bGxSZXF1ZXN0NTQyNDA2OTEw 10 BugFix for encoding and not update info. riverzhou 1277270 closed 0     1 2020-12-18T08:58:54Z 2021-02-11T22:37:56Z 2021-02-11T22:37:56Z FIRST_TIMER dogsheep/evernote-to-sqlite/pulls/10

Bugfix 1:

Traceback (most recent call last):
  File "d:\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Anaconda3\Scripts\evernote-to-sqlite.exe\__main__.py", line 7, in <module>
  File "d:\anaconda3\lib\site-packages\click\core.py", line 829, in __call__
  File "d:\anaconda3\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "d:\anaconda3\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "d:\anaconda3\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "d:\anaconda3\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "d:\anaconda3\lib\site-packages\evernote_to_sqlite\cli.py", line 30, in enex
    for tag, note in find_all_tags(fp, ["note"], progress_callback=bar.update):
  File "d:\anaconda3\lib\site-packages\evernote_to_sqlite\utils.py", line 11, in find_all_tags
    chunk = fp.read(1024 * 1024)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 383: illegal multibyte sequence

Bugfix 2:

Traceback (most recent call last):
  File "D:\Anaconda3\Scripts\evernote-to-sqlite-script.py", line 33, in <module>
    sys.exit(load_entry_point('evernote-to-sqlite==0.3', 'console_scripts', 'evernote-to-sqlite')())
  File "D:\Anaconda3\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "D:\Anaconda3\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "D:\Anaconda3\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "D:\Anaconda3\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\Anaconda3\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "D:\Anaconda3\lib\site-packages\evernote_to_sqlite-0.3-py3.8.egg\evernote_to_sqlite\cli.py", line 31, in enex
  File "D:\Anaconda3\lib\site-packages\evernote_to_sqlite-0.3-py3.8.egg\evernote_to_sqlite\utils.py", line 28, in save_note
AttributeError: 'NoneType' object has no attribute 'text'

evernote-to-sqlite 303218369 pull    
748370021 MDExOlB1bGxSZXF1ZXN0NTI1MzcxMDI5 8 fix import error if note has no "updated" element mkorosec 4028322 closed 0     0 2020-11-22T22:51:05Z 2021-02-11T22:34:06Z 2021-02-11T22:34:06Z CONTRIBUTOR dogsheep/evernote-to-sqlite/pulls/8

I got the following error when executing evernote-to-sqlite enex evernote.db evernote.enex

... 
  File "evernote_to_sqlite/cli.py", line 31, in enex
    save_note(db, note)
  File "evernote_to_sqlite/utils.py", line 28, in save_note
    updated = note.find("updated").text
AttributeError: 'NoneType' object has no attribute 'text'

Seems that in some cases the updated element is not added to the note, this is a part of the problematic note:

<created>20201019T074518Z</created>
<note-attributes>
   <source>web.clip7</source>
   <source-application>webclipper.evernote</source-application>
</note-attributes>
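
A minimal illustration of the guard this fix amounts to (not the exact diff):

# in save_note(): tolerate notes without an <updated> element
updated_el = note.find("updated")
updated = updated_el.text if updated_el is not None else None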
evernote-to-sqlite 303218369 pull    
743297582 MDU6SXNzdWU3NDMyOTc1ODI= 7 evernote-to-sqlite on windows 10 give this error: TypeError: insert() got an unexpected keyword argument 'replace' martinvanwieringen 42387931 closed 0     1 2020-11-15T16:57:28Z 2021-02-11T22:13:17Z 2021-02-11T22:13:17Z NONE  

Running evernote-to-sqlite 0.2 on Windows 10. Command:

evernote-to-sqlite enex evernote.db MyNotes.enex

I get the following error:

File "C:\Users\marti\AppData\Roaming\Python\Python38\site-packages\evernote_to_sqlite\utils.py", line 46, in save_note
note_id = db["notes"].insert(row, hash_id="id", replace=True, alter=True).last_pk
TypeError: insert() got an unexpected keyword argument 'replace'

Removing replace=True,

Leads to the error below:

note_id = db["notes"].insert(row, hash_id="id", alter=True).last_pk
File "C:\Users\marti\AppData\Roaming\Python\Python38\site-packages\sqlite_utils\db.py", line 924, in insert
return self.insert_all(
File "C:\Users\marti\AppData\Roaming\Python\Python38\site-packages\sqlite_utils\db.py", line 1046, in insert_all
result = self.db.conn.execute(sql, values)
sqlite3.IntegrityError: UNIQUE constraint failed: notes.id

evernote-to-sqlite 303218369 issue    
748372469 MDU6SXNzdWU3NDgzNzI0Njk= 9 ParseError: undefined entity &scaron; mkorosec 4028322 closed 0     1 2020-11-22T23:04:35Z 2021-02-11T22:10:55Z 2021-02-11T22:10:55Z CONTRIBUTOR  

I encountered a parse error if the enex file contained &scaron; or &nbsp;

Run command:
evernote-to-sqlite enex evernote.db evernote.enex

Traceback (most recent call last):
...
  File "evernote_to_sqlite/cli.py", line 31, in enex
    save_note(db, note)
  File "evernote_to_sqlite/utils.py", line 35, in save_note
    content = ET.tostring(ET.fromstring(content_xml)).decode("utf-8")
  File "/usr/lib/python3.8/xml/etree/ElementTree.py", line 1320, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: undefined entity &scaron;: line 3, column 35

Workaround:

sed -i 's/&scaron;//g' evernote.enex
sed -i 's/&nbsp;//g' evernote.enex
evernote-to-sqlite 303218369 issue    
792851444 MDU6SXNzdWU3OTI4NTE0NDQ= 11 XML parse error dskrad 3613583 closed 0     2 2021-01-24T17:38:54Z 2021-02-11T21:18:58Z 2021-02-11T21:18:48Z NONE  

I am on Windows 10 using Windows Subsystem for Linux, Python 3.8. I installed evernote-to-sqlite via pipx (in a venv). I tried using enex files from the latest version of Evernote for Windows (10.6.9 which only lets you export 50 notes at a time) and from Legacy Evernote (6.25.2.9198 which lets you export all your notes at once). The enex file from latest evernote gives this error:

File "/usr/lib/python3.8/xml/etree/ElementTree.py", line 1320, in XML parser.feed(text)
xml.etree.ElementTree.ParseError: XML or text declaration not at start of entity: line 2, column 6

The enex file from Legacy Evernote gives this error:

File "/home/david/.local/pipx/venvs/evernote-to-sqlite/lib/python3.8/site-packages/evernote_to_sqlite/utils.py", line 28, in save_note
updated = note.find("updated").text
AttributeError: 'NoneType' object has no attribute 'text'
evernote-to-sqlite 303218369 issue    
792890765 MDU6SXNzdWU3OTI4OTA3NjU= 1200 ?_size=10 option for the arbitrary query page would be useful simonw 9599 open 0     2 2021-01-24T20:55:35Z 2021-02-11T03:13:59Z   OWNER  

https://latest.datasette.io/fixtures?sql=select+*+from+compound_three_primary_keys&_size=10 - _size=10 does not do anything at the moment. It would be useful if it did.

Would also be good if it persisted in a hidden form field.

datasette 107914493 issue    
803929694 MDU6SXNzdWU4MDM5Mjk2OTQ= 1219 Try profiling Datasette using scalene simonw 9599 open 0     2 2021-02-08T20:37:06Z 2021-02-08T22:13:00Z   OWNER  

https://github.com/emeryberger/scalene looks like an interesting profiling tool.

datasette 107914493 issue    
801780625 MDU6SXNzdWU4MDE3ODA2MjU= 9 SSL Error jfeiwell 12669260 open 0     2 2021-02-05T02:12:56Z 2021-02-07T18:45:04Z   NONE  

Here's the error I get when running pip install pocket-to-sqlite:

  Could not fetch URL https://pypi.python.org/simple/pocket-to-sqlite/: There was a problem confirming the ssl certificate: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:661) - skipping
  Could not find a version that satisfies the requirement pocket-to-sqlite (from versions: )
No matching distribution found for pocket-to-sqlite

Does this require python 3?

pocket-to-sqlite 213286752 issue    
802513359 MDU6SXNzdWU4MDI1MTMzNTk= 1217 Possible to deploy as a python app (for Rstudio connect server)? pavopax 6165713 open 0     2 2021-02-05T22:21:24Z 2021-02-06T19:23:41Z   NONE  

Is it possible to deploy a datasette application as a python web app?

In my enterprise, I have option to deploy python apps via Rstudio Connect, and I would like to publish a datasette dashboard for sharing.

I welcome any pointers to converting datasette serve into a python app that can be run as something like python datasette.py --my_data.db
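
One possible direction, assuming RStudio Connect can host an ASGI app served by something like uvicorn, is to expose Datasette's ASGI application object from a small Python module (a sketch, not a tested recipe - the file and database names are placeholders):

# app.py - serve with e.g. `uvicorn app:application`
from datasette.app import Datasette

application = Datasette(["my_data.db"]).app()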

datasette 107914493 issue    
802583450 MDU6SXNzdWU4MDI1ODM0NTA= 226 3.4 release is broken - includes a rogue line simonw 9599 closed 0     0 2021-02-06T02:08:01Z 2021-02-06T02:10:26Z 2021-02-06T02:10:26Z OWNER  

I started seeing weird errors, caused by this line: https://github.com/simonw/sqlite-utils/blob/f8010ca78fed8c5fca6cde19658ec09fdd468420/sqlite_utils/cli.py#L1-L3

That was added by accident in 1b666f9315d4ea6bb332b2e75e48480c26100199

I'm surprised the tests didn't catch this!

sqlite-utils 140912432 issue    
788527932 MDU6SXNzdWU3ODg1Mjc5MzI= 223 --delimiter option for CSV import simonw 9599 closed 0     2 2021-01-18T20:25:03Z 2021-02-06T01:39:47Z 2021-02-06T01:34:54Z OWNER  

https://bruxellesdata.opendatasoft.com/explore/dataset/dog-toilets/export/?location=12,50.85802,4.38054 says:

CSV uses semicolon (;) as a separator.

Would be useful to be able to do this:

sqlite-utils insert places.db places places.csv --delimiter ';'

--delimiter could imply --csv

sqlite-utils 140912432 issue    
794554881 MDU6SXNzdWU3OTQ1NTQ4ODE= 1208 A lot of open(file) functions are used without a context manager thus producing ResourceWarning: unclosed file <_io.TextIOWrapper kbaikov 4488943 open 0     2 2021-01-26T20:56:28Z 2021-02-05T21:02:39Z   NONE  

Your code is full of open files that are never closed, especially when you deal with reading/writing json/yaml files.

If you run Python with warnings enabled this problem becomes evident.
This probably contributes to memory leaks in long-running Datasette instances if the GC does not 'collect' those resources properly.

This is easily fixed by using a context manager instead of just using open:

with open('some_file', 'w') as opened_file:
    opened_file.write('string')

In some newer parts of the code you use Path objects' 'read_text' and 'write_text' functions, which close the file properly and are preferred in some cases.
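
For example, the equivalent of the open() snippet above using pathlib:

from pathlib import Path

# read_text / write_text open and close the file internally
Path("some_file").write_text("string")
content = Path("some_file").read_text()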

If you want, I can create a PR for all the places I found this pattern.

Below is a fraction of the places where I found a ResourceWarning:

update-docs-help.py:
  20          actual = actual.replace("Usage: cli ", "Usage: datasette ")
  21:         open(docs_path / filename, "w").write(actual)
  22  

datasette\app.py:
  210          ):
  211:             inspect_data = json.load((config_dir / "inspect-data.json").open())
  212              if immutables is None:

  266          if config_dir and (config_dir / "settings.json").exists() and not config:
  267:             config = json.load((config_dir / "settings.json").open())
  268          self._settings = dict(DEFAULT_SETTINGS, **(config or {}))

  445              self._app_css_hash = hashlib.sha1(
  446:                 open(os.path.join(str(app_root), "datasette/static/app.css"))
  447                  .read()

datasette\cli.py:
  130      else:
  131:         out = open(inspect_file, "w")
  132      loop = asyncio.get_event_loop()

  459      if inspect_file:
  460:         inspect_data = json.load(open(inspect_file))
  461  
datasette 107914493 issue    
743384829 MDExOlB1bGxSZXF1ZXN0NTIxMjg3OTk0 203 changes to allow for compound foreign keys drkane 1049910 open 0     5 2020-11-16T00:30:10Z 2021-02-05T18:44:13Z   FIRST_TIME_CONTRIBUTOR simonw/sqlite-utils/pulls/203

Add support for compound foreign keys, as per issue #117

Not sure if this is the right approach. In particular I'm unsure about:

  • the new ForeignKey class, which replaces the namedtuple in order to ensure that column and other_column are forced into tuples. The class does the job, but doesn't feel very elegant.
  • I haven't rewritten guess_foreign_table to take account of multiple columns, so it just checks for the first column in the foreign key definition. This isn't ideal.
  • I haven't added any ability to the CLI to add compound foreign keys, it's only in the python API at the moment.

The PR also contains a minor related change: columns and tables are always quoted in foreign key definitions.

sqlite-utils 140912432 pull    
796234313 MDU6SXNzdWU3OTYyMzQzMTM= 1210 Immutable Database w/ Canned Queries heyarne 525780 closed 0     2 2021-01-28T18:08:29Z 2021-02-05T11:30:34Z 2021-02-05T11:30:34Z NONE  

I have a database that I only want to read from; when instructing datasette to treat the database as immutable, my defined canned queries disappear. Are these two features incompatible, or have I hit an unintended bug? Thanks for Datasette either way - it's a joy to use!

datasette 107914493 issue    
800669347 MDU6SXNzdWU4MDA2NjkzNDc= 1216 /-/databases should reflect connection order, not alphabetical order simonw 9599 open 0     1 2021-02-03T20:20:23Z 2021-02-03T20:20:48Z   OWNER  

The order in which databases are attached to Datasette matters - it affects the homepage, and it's beginning to influence how certain plugins work (see https://github.com/simonw/datasette-tiles/issues/8).

Two years ago in cccea85be6aaaeadb31f3b588ec7f732628815f5 I made /-/databases return things in alphabetical order, to fix a test failure in Python 3.5.

Python 3.5 is no longer supported, so this is no longer necessary - and this behaviour should now be treated as a bug.

datasette 107914493 issue    
796736607 MDU6SXNzdWU3OTY3MzY2MDc= 56 Not all quoted statuses get fetched? gsajko 42315895 closed 0     3 2021-01-29T09:48:44Z 2021-02-03T10:36:36Z 2021-02-03T10:36:36Z NONE  

In my database I have 13,300 quote tweets, but around 3,600 have quoted_status empty.

I fetched some of them using https://api.twitter.com/1.1/statuses/show.json?id=xx and they did have ids of quoted tweets.

twitter-to-sqlite 206156866 issue    
799693777 MDU6SXNzdWU3OTk2OTM3Nzc= 1214 Re-submitting filter form duplicates _x querystring arguments simonw 9599 closed 0     3 2021-02-02T21:13:35Z 2021-02-02T21:28:53Z 2021-02-02T21:21:13Z OWNER  

Really nasty bug, caused by #1194 fix in 07e163561592c743e4117f72102fcd350a600909

Navigate to this page: https://github-to-sqlite.dogsheep.net/github/labels?_search=help&_sort=id

Click "Apply" to submit the form and the resulting URL is https://github-to-sqlite.dogsheep.net/github/labels?_search=help&_sort=id&_search=help&_sort=id

That's because the (truncated) HTML for the form looks like this:

    ... <input id="_search" type="search" name="_search" value="help">
    ...
            <div class="select-wrapper small-screen-only">
                <select name="_sort" id="sort_by">
                    <option value="">Sort...</option>
                            <option value="id" selected>Sort by id</option>
                            <option value="node_id">Sort by node_id</option>
                            ...
                </select>
            </div>
            ...
            <input type="hidden" name="_search" value="help">
            <input type="hidden" name="_sort" value="id">
        <input type="submit" value="Apply">
datasette 107914493 issue    
799663959 MDU6SXNzdWU3OTk2NjM5NTk= 1213 gzip support for HTML (and JSON) responses simonw 9599 open 0     3 2021-02-02T20:36:28Z 2021-02-02T20:41:55Z   OWNER  

This page https://datasette-tiles-demo.datasette.io/San_Francisco/tiles is 2MB because of all of the base64 images. Gzipped it's 1.5MB.

Since Datasette is usually deployed without a frontend gzipping proxy, Datasette itself needs to solve for this.

Gzipping everything won't work because some endpoints - the all-rows CSV endpoint and the download-database endpoint - are streaming and hence can't be buffered-and-gzipped.
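
For the buffered responses the compression itself is simple; a minimal sketch (not Datasette code) of gzipping a rendered body only when the client asks for it:

import gzip

def maybe_gzip(body: bytes, accept_encoding: str):
    # Buffered HTML/JSON responses can be compressed in one go;
    # streaming endpoints would need chunked compression or none at all.
    if "gzip" in accept_encoding.lower():
        return gzip.compress(body), {"content-encoding": "gzip"}
    return body, {}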

datasette 107914493 issue    
797649915 MDExOlB1bGxSZXF1ZXN0NTY0NjA4MjY0 1211 Use context manager instead of plain open kbaikov 4488943 open 0     2 2021-01-31T07:58:10Z 2021-02-01T20:13:39Z   FIRST_TIME_CONTRIBUTOR simonw/datasette/pulls/1211

Context manager with open closes the files after usage. Fixes: https://github.com/simonw/datasette/issues/1208

When the object is already a pathlib.Path I used the read_text / write_text functions.

In some cases pathlib.Path.open was used in a context manager; it is basically the same as the builtin open.

Tests are passing: 850 passed, 5 xfailed, 10 xpassed

datasette 107914493 pull    
774332247 MDExOlB1bGxSZXF1ZXN0NTQ1MjY0NDM2 1159 Improve the display of facets information lovasoa 552629 open 0     3 2020-12-24T11:01:47Z 2021-02-01T13:42:29Z   FIRST_TIME_CONTRIBUTOR simonw/datasette/pulls/1159

This PR changes the display of facets to hopefully make them more readable.

[Before / after screenshots]
datasette 107914493 pull    
797784080 MDU6SXNzdWU3OTc3ODQwODA= 62 Stargazers and workflows commands always require an auth file when using GITHUB_TOKEN frosencrantz 631242 open 0     0 2021-01-31T18:56:05Z 2021-01-31T18:56:05Z   NONE  

Requested fix in https://github.com/dogsheep/github-to-sqlite/pull/59

The stargazers and workflows commands always require an auth file, even when using a GITHUB_TOKEN. Other commands don't require the auth file.

github-to-sqlite 207052882 issue    
797728929 MDU6SXNzdWU3OTc3Mjg5Mjk= 8 QUESTION: extract full text darribas 417363 open 0     0 2021-01-31T14:50:10Z 2021-01-31T14:50:10Z   NONE  

This may already be solved or an existing feature, but I couldn't figure it out: is it possible to also extract and store the full text of the saved pages? The same way that Pocket parses the text, it'd be amazing to be able to store (and thus make searchable later) the text.

Thank you very much for the project, it's such an amazing idea!

pocket-to-sqlite 213286752 issue    
703246031 MDU6SXNzdWU3MDMyNDYwMzE= 51 github-to-sqlite should handle rate limits better simonw 9599 open 0     2 2020-09-17T04:01:50Z 2021-01-30T03:47:24Z   MEMBER  

From #50 - right now it will crash with an error if it hits the rate limit. Since the rate limit information (including reset time) is available in the headers it could automatically sleep and try again instead.
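
A sketch of that sleep-and-retry behaviour, based on GitHub's documented X-RateLimit-* headers (not the current github-to-sqlite code, and the function name is made up):

import time
import requests

def get_with_rate_limit(url, headers=None):
    response = requests.get(url, headers=headers)
    if response.status_code == 403 and response.headers.get("X-RateLimit-Remaining") == "0":
        # Sleep until the reported reset time, then retry once
        reset = int(response.headers.get("X-RateLimit-Reset", 0))
        time.sleep(max(reset - time.time(), 0) + 1)
        response = requests.get(url, headers=headers)
    return response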

github-to-sqlite 207052882 issue    
797108702 MDExOlB1bGxSZXF1ZXN0NTY0MTcyMTQw 61 fixing typo in get cli help text daniel-butler 22578954 open 0     0 2021-01-29T18:57:04Z 2021-01-29T18:57:04Z   FIRST_TIME_CONTRIBUTOR dogsheep/github-to-sqlite/pulls/61
github-to-sqlite 207052882 pull    
793881756 MDU6SXNzdWU3OTM4ODE3NTY= 1207 Document the Datasette(..., pdb=True) testing pattern simonw 9599 closed 0     1 2021-01-26T02:48:10Z 2021-01-29T02:37:19Z 2021-01-29T02:12:34Z OWNER  

If you're writing tests for a Datasette plugin and you get a 500 error from inside Datasette, you can cause Datasette to open a PDB session within the application server code by doing this:

ds = Datasette([db_path], pdb=True)
response = await ds.client.get("/")

You'll need to run pytest -s to interact with the debugger, otherwise you'll get an error.

datasette 107914493 issue    
795367402 MDU6SXNzdWU3OTUzNjc0MDI= 1209 v0.54 500 error from sql query in custom template; code worked in v0.53; found a workaround jrdmb 11788561 open 0     1 2021-01-27T19:08:13Z 2021-01-28T23:00:27Z   NONE  

v0.54 500 error in sql query template; code worked in v0.53; found a workaround

schema:
CREATE TABLE "talks" ("talk" TEXT,"series" INTEGER, "talkdate" TEXT)
CREATE TABLE "series" ("id" INTEGER PRIMARY KEY, "series" TEXT, talks_list TEXT default '', website TEXT default '');

Live example of correctly rendered template in v.053: https://cosmotalks-cy6xkkbezq-uw.a.run.app/cosmotalks/talks/1

Description of problem: I needed 'sql select' code in a custom row-mydatabase-mytable.html template to look up the series name for a foreign key integer value in the talks table. So metadata.json specifies the datasette-template-sql plugin.

The code below worked perfectly in v0.53 (just the relevant sql statement part is shown; full code is here):

{# custom addition #}  
{% for row in display_rows %}  
    ...  
    {% set sname = sql("select series from series where id = ?", [row.series]) %}  
    <strong>Series name: {{ sname[0].series }}  
    ...  
{% endfor %}  
{# End of custom addition #}  

In v0.54, that code resulted in a 500 error with a 'no such table series' message. A second query in that template also did not work but the above is fully illustrative of the problem.

All templates were up-to-date along with datasette v0.54.

Workaround: After trying different things, what worked was the "Querying a different database" syntax from the datasette-template-sql GitHub repo, adding the database name to the sql statement:

{% set sname = sql("select series from series where id = ?", [row.series], database="cosmotalks") %}

Though this was found to work, it should not be necessary to add database="cosmotalks" since per the datasette-template-sql README, it's only needed when querying a different database, but here it's a table within the same database.

datasette 107914493 issue    
793027837 MDU6SXNzdWU3OTMwMjc4Mzc= 1205 Rename /:memory: to /_memory simonw 9599 closed 0   Datasette 1.0 3268330 3 2021-01-25T05:04:56Z 2021-01-28T22:55:02Z 2021-01-28T22:51:42Z OWNER  

For consistency with /_internal - and because then we don't need to escape the : characters.

This change would need to be in before Datasette 1.0. I could land it earlier and set up redirects from the old URLs though.

datasette 107914493 issue    
779088071 MDU6SXNzdWU3NzkwODgwNzE= 54 Archive import appears to be broken on recent exports jacobian 21148 open 0     3 2021-01-05T14:18:01Z 2021-01-26T23:07:41Z   NONE  

I requested a Twitter export yesterday, and unfortunately they seem to have changed it such that twitter-to-sqlite import can't handle it anymore 😢

So far I've ran into two issues. The first was easy to work around, but the second will take more investigation. If I can find the time I'll keep working on it and update this issue accordingly.

The issues (so far):

1. Data seems to have moved to a data/ subdirectory

Running twitter-to-sqlite import on the raw zip file reports a bunch of "not yet implemented" errors, and then exits without actually importing anything:

❯ twitter-to-sqlite import tarchive.db twitter.zip
...
data/manifest: not yet implemented
data/account-creation-ip: not yet implemented
data/account-suspension: not yet implemented
... (dozens of more lines like this, including critical stuff like data/tweets) ...

(tarchive.db now exists, but is empty)

Workaround: unpack the zip file, and run twitter-to-sqlite import tarchive.db path/to/archive/data

That gets further, but:

2. Some schema(s?) have changed

At least, the blocks schema seems different now:

❯ twitter-to-sqlite import tarchive.db archive/data
direct-messages-group: not yet implemented
branch-links: not yet implemented
periscope-expired-broadcasts: not yet implemented
direct-messages: not yet implemented
mute: not yet implemented
Traceback (most recent call last):
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/bin/twitter-to-sqlite", line 8, in <module>
    sys.exit(cli())
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/cli.py", line 772, in import_
    archive.import_from_file(db, filepath.name, open(filepath, "rb").read())
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/archive.py", line 215, in import_from_file
    to_insert = transformer(data)
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/archive.py", line 115, in lists_member
    return {"lists-member": _list_from_common(data)}
  File "/Users/jacob/Library/Caches/pypoetry/virtualenvs/jacobian-dogsheep-4AXaN4tu-py3.8/lib/python3.8/site-packages/twitter_to_sqlite/archive.py", line 200, in _list_from_common
    for url in block["userListInfo"]["urls"]:
KeyError: 'urls'

That's as far as I got before I needed to work on something else. I'll report back if I get further!

twitter-to-sqlite 206156866 issue    
770448622 MDU6SXNzdWU3NzA0NDg2MjI= 1151 Database class mechanism for cross-connection in-memory databases simonw 9599 closed 0   Datasette 0.54 6346396 11 2020-12-17T23:25:43Z 2021-01-26T19:07:44Z 2020-12-18T01:01:26Z OWNER  

Next challenge: figure out how to use the Database class from https://github.com/simonw/datasette/blob/0.53/datasette/database.py for an in-memory database which persists data for the duration of the lifetime of the server, and allows access to that in-memory database from multiple threads in a way that lets them see each other's changes.

_Originally posted by @simonw in https://github.com/simonw/datasette/issues/1150#issuecomment-747768112_
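
For reference, the underlying SQLite mechanism that could make this work is a named in-memory database opened with a shared cache: every connection that opens the same URI sees the same data, for as long as at least one connection stays open (a standalone sketch, not Datasette's implementation):

import sqlite3

uri = "file:shared_demo?mode=memory&cache=shared"
conn1 = sqlite3.connect(uri, uri=True)
conn2 = sqlite3.connect(uri, uri=True)  # e.g. opened from a different thread
conn1.execute("CREATE TABLE demo (id INTEGER)")
conn1.execute("INSERT INTO demo VALUES (1)")
conn1.commit()
print(conn2.execute("SELECT id FROM demo").fetchall())  # [(1,)]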

datasette 107914493 issue    
714377268 MDU6SXNzdWU3MTQzNzcyNjg= 991 Redesign application homepage simonw 9599 open 0     7 2020-10-04T18:48:45Z 2021-01-26T19:06:36Z   OWNER  

Most Datasette instances only host a single database, but the current homepage design assumes that it should leave plenty of space for multiple databases:

https://user-images.githubusercontent.com/9599/95024344-5b51fd80-0637-11eb-8a11-40bad16f6907.png

Reconsider this design - should the default show more information?

The Covid-19 Datasette homepage looks particularly sparse I think: https://covid-19.datasettes.com/

https://user-images.githubusercontent.com/9599/95024391-876d7e80-0637-11eb-8f19-ef38e4c87d2a.png

datasette 107914493 issue    
793907673 MDExOlB1bGxSZXF1ZXN0NTYxNTEyNTAz 15 added try / except to write_records ryancheley 9857779 open 0     0 2021-01-26T03:56:21Z 2021-01-26T03:56:21Z   FIRST_TIME_CONTRIBUTOR dogsheep/healthkit-to-sqlite/pulls/15

to keep the data write from failing if it comes across an error during processing. In particular, when trying to convert my HealthKit zip file (and my wife's) it would consistently error out with the following:

db.py 1709 insert_chunk
result = self.db.execute(query, params)

db.py 226 execute
return self.conn.execute(sql, parameters)

sqlite3.OperationalError:
too many SQL variables

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
db.py 1709 insert_chunk
result = self.db.execute(query, params)

db.py 226 execute
return self.conn.execute(sql, parameters)

sqlite3.OperationalError:
too many SQL variables

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
db.py 1709 insert_chunk
result = self.db.execute(query, params)

db.py 226 execute
return self.conn.execute(sql, parameters)

sqlite3.OperationalError:
table rBodyMass has no column named metadata_HKWasUserEntered

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
healthkit-to-sqlite 8 <module>
sys.exit(cli())

core.py 829 __call__
return self.main(*args, **kwargs)

core.py 782 main
rv = self.invoke(ctx)

core.py 1066 invoke
return ctx.invoke(self.callback, **ctx.params)

core.py 610 invoke
return callback(*args, **kwargs)

cli.py 57 cli
convert_xml_to_sqlite(fp, db, progress_callback=bar.update, zipfile=zf)

utils.py 42 convert_xml_to_sqlite
write_records(records, db)

utils.py 143 write_records
db[table].insert_all(

db.py 1899 insert_all
self.insert_chunk(

db.py 1720 insert_chunk
self.insert_chunk(

db.py 1720 insert_chunk
self.insert_chunk(

db.py 1714 insert_chunk
result = self.db.execute(query, params)

db.py 226 execute
return self.conn.execute(sql, parameters)

sqlite3.OperationalError:
table rBodyMass has no column named metadata_HKWasUserEntered

Adding the try / except in the write_records seems to fix that issue.
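
Roughly, the shape of the change (illustrative only - the function name and variable names here are assumptions, not the exact patch):

import sqlite3

def insert_batch(db, table, batch):
    # a guarded version of the insert_all() call in write_records();
    # skip a batch instead of aborting the whole import on a SQL error
    try:
        db[table].insert_all(batch, alter=True)
    except sqlite3.OperationalError as error:
        print(f"Skipping a batch for table {table!r}: {error}")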

healthkit-to-sqlite 197882382 pull    
792904595 MDU6SXNzdWU3OTI5MDQ1OTU= 1201 Release notes for Datasette 0.54 simonw 9599 closed 0   Datasette 0.54 6346396 5 2021-01-24T21:22:28Z 2021-01-25T17:42:21Z 2021-01-25T17:42:21Z OWNER  

These will incorporate the release notes from the alpha, much expanded: https://github.com/simonw/datasette/releases/tag/0.54a0

datasette 107914493 issue    
793086333 MDExOlB1bGxSZXF1ZXN0NTYwODMxNjM4 1206 Release 0.54 simonw 9599 closed 0     3 2021-01-25T06:45:47Z 2021-01-25T17:33:30Z 2021-01-25T17:33:29Z OWNER simonw/datasette/pulls/1206

Refs #1201

datasette 107914493 pull    


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [pull_request] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
, [active_lock_reason] TEXT, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issues_repo]
                ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
                ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
                ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
                ON [issues] ([user]);