rowid,body,issue_comments_fts,rank 338523957,"I also need to solve for weird primary keys. If it’s a single integer or a single char field that’s easy. But what if it is a compound key with more than one chat field? What delimiter can I use that will definitely be safe? Let’s say I use hyphen. Now I need to find a durable encoding for any hyphens that might exist in the key fields themselves. How about I use URLencoding for every non-alpha-numeric character? That will turn hyphens into (I think) %2D. It should also solve for unicode characters, but it means the vast majority of keys (integers) will display neatly, including a compound key of eg 5678-345 ",17608, 338524454,Table rendering logic needs to detect the primary key field and turn it into a hyperlink. If there is a compound primary key it should add an extra column at the start of the table which displays the compound key as a link,17608, 338524857,"https://stackoverflow.com/a/14468878/6083 Looks like I should order by compound primary key and implement cursor-based pagination.",17608, 338526148,https://github.com/ahupp/python-magic/blob/master/README.md,17608, 338530389,"This means I need a good solution for these compile time options while running in development mode ",17608, 338530480," How about when the service starts up it checks for a compile.json file and, if it is missing, creates it using the same code we run at compile time normally ",17608, 338530704,Needed by https://github.com/simonw/stateless-datasets/issues/4#issuecomment-338530389,17608, 338531827,"Many of the applications I want to implement with this would benefit from having permanent real URLs. So let’s have both. The sha1 urls will serve far future cache headers (and an etag derived from their path). The non sha1 URLs will serve 302 uncached redirects to the sha1 locations. We will have a setting that lets people opt out of this behavior.",17608, 338697223,"Now returning this: { ""error"": ""attempt to write a readonly database"", ""ok"": false } ",17608, 338768860,I could use the table-reflow mechanism demonstrated here: http://demos.jquerymobile.com/1.4.3/table-reflow/,17608, 338769538,"Maybe this should be handled by views instead? https://stateless-datasets-wreplxalgu.now.sh/ lists some views https://stateless-datasets-wreplxalgu.now.sh/?sql=select%20*%20from%20%22Order%20Subtotals%22 is an example showing the content of a view. What would the URL to views be? I don't think a view can share a name with a table, so the same URL scheme could work for both.",17608, 338789734,"URL design: /database/table.json - redirects to /database-6753f4a/table.json So we always redirect to the version with the truncated hash in the URL. ",17608, 338797522,"https://stackoverflow.com/a/18134919/6083 is a good answer about how many characters of the hash are needed to be unique. I say we default to 7 characters, like git does - but allow extras to be configured.",17608, 338799438,Can I take advantage of HTTP/2 so even if you get redirected I start serving you the correct resource straight away?,17608, 338804173,"Looks like the easiest way to implement HTTP/2 server push today is to run behind Cloudflare and use this: Link: ; rel=preload; as=script https://blog.cloudflare.com/announcing-support-for-http-2-server-push-2/ Here's the W3C draft: https://w3c.github.io/preload/ From https://w3c.github.io/preload/#as-attribute it looks like I should use `as=fetch` if the content is intended for consumption by fetch() or XMLHTTPRequest. 
Unclear if I should throw `as=fetch crossorigin` in there. Need to experiment on that. ",17608, 338806718,"Here's what the homepage of cloudflare.com does (with newlines added within the link header for clarity): $ curl -i 'https://www.cloudflare.com/' HTTP/1.1 200 OK Date: Mon, 23 Oct 2017 21:45:58 GMT Content-Type: text/html; charset=utf-8 Transfer-Encoding: chunked Connection: keep-alive link: ; rel=preload; as=style, ; rel=preload; as=style, ; rel=preload, ; rel=preload, ; rel=preload; as=video, ; rel=preload; as=video, ; rel=preload; as=video, ; rel=preload; as=video, ; rel=preload; as=video, ; rel=preload; as=image The original header looked like this: link: ; rel=preload; as=style, ; rel=preload; as=style, ; rel=preload, ; rel=preload, ; rel=preload; as=video, ; rel=preload; as=video, ; rel=preload; as=video, ; rel=preload; as=video, ; rel=preload; as=video, ; rel=preload; as=image ",17608, 338834213,"If I can’t setect a primary key, I won’t provide a URL for those records",17608, 338852971,I'm not going to bother with this.,17608, 338853083,Fixed in 9d219140694551453bfa528e0624919eb065f9d6,17608, 338854988," /database-name/table-name?name__contains=simon&sort=id+desc Note that if there's a column called ""sort"" you can still do sort__exact=blah ",17608, 338857568,"I can find the primary keys using: PRAGMA table_info(myTable) ",17608, 338859620,I’m going to implement everything in https://docs.djangoproject.com/en/1.11/ref/models/querysets/#field-lookups with the exception of range and the various date ones.,17608, 338859709,"I’m going to need to write unit tests for this, is this depends on #9",17608, 338861511,"Some tables won't have primary keys, in which case I won't generate pages for individual records.",17608, 338863155,I’m going to use py.test and start with all tests in a single tests.py module,17608, 338872286,"I'm going to use `,` as the separator between elements of a compound primary key. If those elements themselves include a comma I will use `%2C` in its place.",17608, 338882110,"Well, I've started it at least.",17608, 338882207,Next step: generate links to these.,17608, 339003850,As of b46e370ee6126aa2fa85cf789a31da38aed98496 this is done.,17608, 339019873,"Here's what I've got now: ",17608, 339027711,I have code to detect primary keys on tables... but what should I do for tables that lack primary keys? How should I even sort them?,17608, 339028979,"Looks like I can use the SQLite specific “rowid” in that case. It isn’t guaranteed to stay consistent across a VACUUM but that’s ok because we are immutable anyway. https://www.sqlite.org/lang_createtable.html#rowid",17608, 339138809,May as well support most of https://sqlite.org/lang_expr.html,17608, 339186887,"Still to do: - [x] `gt`, `gte`, `lt`, `lte` - [x] `like` - [x] `glob` ",17608, 339210353,I'm going to call this one done for the moment. The date filters can go in a stretch goal.,17608, 339366612,"I had to manually set the content disposition header: return await response.file_stream( filepath, headers={ 'Content-Disposition': 'attachment; filename=""{}""'.format(ilepath) } ) In the next release of Sanic I can just use the filename= argument instead: https://github.com/channelcat/sanic/commit/07e95dba4f5983afc1e673df14bdd278817288aa",17608, 339382054,Could this be as simple as using the iterative JSON encoder and adding a yield statement in between each chunk?,17608, 339388215,"First experiment: hook up an iterative CSV dump (just because that’s a tiny bit easier to get started with than iterative a JSON). 
Have it execute a big select statement and then iterate through the result set 100 rows at a time using sqite fetchmany() - also have it async sleep for a second in between each batch of 100. Can this work without needing python threads? ",17608, 339388771,"If this does work, I need to figure it what to do about the HTML view. ASsuming I can iteratively produce JSON and CSV, what to do about HTML? One option: render the first 500 rows as HTML, then hand off to an infinite scroll experience that iteratively loads more rows as JSON.",17608, 339389105,The gold standard here is to be able to serve up increasingly large datasets without blocking the event loop and while using a sustainable amount of RAM,17608, 339389328,Ideally we can get some serious gains from the fact that our database file is opened with the immutable option.,17608, 339395551,"Simplest implementation will be to create a temporary directory somewhere, copy in a Dockerfile and the databases and run “now” in it. Ideally I can use symlinks rather than copying potentially large database files around.",17608, 339406634,It certainly looks like some of the stuff in https://sqlite.org/pragma.html could be used to screw around with things. Example: `PRAGMA case_sensitive_like = 1` - would that affect future queries?,17608, 339413825,Could I use https://sqlparse.readthedocs.io/en/latest/ to parse incoming statements and ensure they are pure SELECTs? Would that prevent people from using a compound SELECT statement to trigger an evil PRAGMA of some sort?,17608, 339420462,"https://sitesforprofit.com/responsive-table-plugins-and-patterns has some useful links. I really like the pattern from https://css-tricks.com/responsive-data-tables/ /* Max width before this PARTICULAR table gets nasty This query will take effect for any screen smaller than 760px and also iPads specifically. */ @media only screen and (max-width: 760px), (min-device-width: 768px) and (max-device-width: 1024px) { /* Force table to not be like tables anymore */ table, thead, tbody, th, td, tr { display: block; } /* Hide table headers (but not display: none;, for accessibility) */ thead tr { position: absolute; top: -9999px; left: -9999px; } tr { border: 1px solid #ccc; } td { /* Behave like a ""row"" */ border: none; border-bottom: 1px solid #eee; position: relative; padding-left: 50%; } td:before { /* Now like a table header */ position: absolute; /* Top/left values mimic padding */ top: 6px; left: 6px; width: 45%; padding-right: 10px; white-space: nowrap; } /* Label the data */ td:nth-of-type(1):before { content: ""First Name""; } td:nth-of-type(2):before { content: ""Last Name""; } td:nth-of-type(3):before { content: ""Job Title""; } td:nth-of-type(4):before { content: ""Favorite Color""; } td:nth-of-type(5):before { content: ""Wars of Trek?""; } td:nth-of-type(6):before { content: ""Porn Name""; } td:nth-of-type(7):before { content: ""Date of Birth""; } td:nth-of-type(8):before { content: ""Dream Vacation City""; } td:nth-of-type(9):before { content: ""GPA""; } td:nth-of-type(10):before { content: ""Arbitrary Data""; } }",17608, 339510770,It looks like I should double quote my columns and ensure they are correctly escaped https://blog.christosoft.de/2012/10/sqlite-escaping-table-acolumn-names/ - hopefully using ? placeholders for column names will work. I should use ? for tables too.,17608, 339514819,"I’m going to have a single command-line app that does everything. 
Name to be decided - options include dataset, stateless, datasite (I quite like that - it reflects SQLite and the fact that you create a website)",17608, 339515822,"datasite . - starts web app in current directory, serving all DB files datasite . -p 8001 - serves on custom port datasite blah.db blah2.db - serves specified files You can’t specify more than one directory. You can specify as many files as you like. If you specify two files with different oaths but the same name then they must be accessed by hash. datasite publish . - publishes current directory to the internet! Uses now by default, if it detects it on your path. Other publishers will be eventually added as plugins. datasite publish http://path-to-db.db - publishes a DB available at a URL. Works by constructing the Dockerfile with wget calls in it. datasite blah.db -m metadata.json If you specify a directory it looks for metadata.json in that directory. Otherwise you can pass an explicit metadata file oath with -m or —metadata",17608, 339516032,Another potential name: datapi ,17608, 339517846,"I’m going to use Click for this http://nvie.com/posts/writing-a-cli-in-python-in-under-60-seconds/ https://kushaldas.in/posts/building-command-line-tools-in-python-with-click.html",17608, 339724700,"Here’s how to make the “serve” subcommand the default if it is called with no arguments: @click.group(invoke_without_command=True) def serve(): # ...",17608, 339866724," ",17608, 339891755,"Deploys to Now aren't working at the moment - they aren't showing the uploaded databases, because I've broken the path handling somehow. I need to do a bit more work here.",17608, 340561577,http://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/quickstart.html describes how to package this for PyPI,17608, 340787868,"Here’s how I can (I think) provide safe execution of arbitrary SQL while blocking PRAGMA calls: let people use names parameters in their SQL and apply strict filtering to the SQL query but not to the parameter values. cur.execute( ""select * from people where name_last=:who and age=:age"", { ""who"": who, ""age"": age }) In URL form: ?sql=select...&who=Terry&age=34 Now we can apply strict, dumb validation rules to the SQL part while allowing anything in the named queries - so people can execute a search for PRAGMA without being able to execute a PRAGMA statement.",17608, 341938424,Done: https://github.com/simonw/stateless-datasets/commit/edaa10587e60946e0c1935333f6b79553db33798,17608, 341945420,"To simplify things a bit, I'm going to require that every database is explicitly listed in the command line. I won't support ""serve everything in this directory"" for the moment.",17608, 342030075,"... I tried that, I don't like it. I'm going to bring back ""directory serving"" by allowing you to pass a directory as an argument to `datasite` (including `datasite .`). 
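As a rough illustration, the argument handling could look something like this with Click (a sketch only - option names and defaults here are assumptions, not the shipped code):

```
import os
import click

@click.command()
@click.argument('files', nargs=-1, type=click.Path(exists=True))
@click.option('-p', '--port', default=8001, help='Port to listen on')
def serve(files, port):
    # Illustrative sketch: accept explicit DB files, or expand a
    # directory argument into every .db file it contains.
    paths = []
    for item in files:
        if os.path.isdir(item):
            paths.extend(
                os.path.join(item, name)
                for name in sorted(os.listdir(item))
                if name.endswith('.db')
            )
        else:
            paths.append(item)
    click.echo('Would serve {} on port {}'.format(paths, port))

if __name__ == '__main__':
    serve()
```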
I may even make `.` the default if you don't provide anything at all.",17608, 342032943,"Default look with Bootstrap 4 looks like this: ",17608, 342484889,I’m going to call this feature “count values”,17608, 342521344,GDS Registries could be fun too: https://registers.cloudapps.digital/,17608, 343164111,Implemented in 31b21f5c5e15fc3acab7fabb170c1da71dc3c98c,17608, 343168796,Won't fix: ujson is not compatible with the custom JSON encoder I'm using here: https://github.com/simonw/immutabase/blob/b2dee11fcd989d9e2a7bf4de1e23dbc320c05013/immutabase/app.py#L401-L416,17608, 343237982,"More terms: * publish * share * docker * host * stateless I want to capture the idea of publishing an immutable database in a stateless container.",17608, 343238262,The name should ideally be available on PyPI and should make sense as both a command line application and a library.,17608, 343239062,This looks promising: https://github.com/esnme/ultrajson/issues/124#issuecomment-323882878,17608, 343266326,http://sanic.readthedocs.io/en/latest/sanic/testing.html,17608, 343281876,How about datasette?,17608, 343551356,I'm going with datasette.,17608, 343557070,"https://file.io/ looks like it could be good for this. It's been around since 2015, and lets you upload a temporary file which can be downloaded once. $ curl -s -F ""file=@database.db"" ""https://file.io/?expires=1d"" {""success"":true,""key"":""ySrl1j"",""link"":""https://file.io/ySrl1j"",""expiry"":""1 day""} Downloading from that URL serves up the data with a `Content-disposition` header containing the filename: simonw$ curl -vv https://file.io/ySrl1j | more % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 34.232.1.167... * Connected to file.io (34.232.1.167) port 443 (#0) * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 * Server certificate: file.io * Server certificate: Amazon * Server certificate: Amazon Root CA 1 * Server certificate: Starfield Services Root Certificate Authority - G2 > GET /ySrl1j HTTP/1.1 > Host: file.io > User-Agent: curl/7.43.0 > Accept: */* > < HTTP/1.1 200 OK < Date: Fri, 10 Nov 2017 18:14:38 GMT < Content-Type: undefined < Transfer-Encoding: chunked < Connection: keep-alive < X-Powered-By: Express < X-RateLimit-Limit: 5 < X-RateLimit-Remaining: 4 < Access-Control-Allow-Origin: * < Access-Control-Allow-Headers: Cache-Control,X-reqed-With,x-requested-with < Content-disposition: attachment; filename=database.db ... ",17608, 343581130,"I'm going to handle this a different way. I'm going to support a local history of your own queries stored in localStorage, but if you want to share a query you have to do it with a URL. If people really want canned query support, they can do that using custom templates - see #12 - or by adding views to their database before they publish it.",17608, 343581332,I'm not going to use Sanic's mechanism for this. I'll use arguments passed to my cli instead.,17608, 343643332,"Here's what a table looks like now at a smaller screen size: ",17608, 343644891,"I can detect something is a view like this: SELECT name from sqlite_master WHERE type ='view'; ",17608, 343644976,"Simplest version of this: 1. Create a temporary directory 2. Write a Dockerfile into it that pulls an image and pip installs datasette 3. Add symlinks to the DBs they listed (so we don't have to copy them) 4. Shell out to ""now"" 5. Done! 
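A minimal sketch of those five steps, assuming a hypothetical `publish_to_now()` helper that shells out to the `now` CLI (illustrative only, not the real implementation):

```
import os
import subprocess
import tempfile

def publish_to_now(db_paths):
    # Hypothetical helper sketching the five steps above; names and
    # Dockerfile contents are assumptions, not the actual implementation.
    tmp = tempfile.TemporaryDirectory()              # 1. temporary directory
    filenames = [os.path.basename(p) for p in db_paths]
    dockerfile = '\n'.join([
        'FROM python:3',
        'COPY . /app',
        'WORKDIR /app',
        'RUN pip install datasette',
        'EXPOSE 8001',
        'CMD datasette serve {} -h 0.0.0.0 -p 8001'.format(' '.join(filenames)),
    ])
    with open(os.path.join(tmp.name, 'Dockerfile'), 'w') as fp:
        fp.write(dockerfile)                         # 2. Dockerfile that pip installs datasette
    for path, name in zip(db_paths, filenames):
        os.link(path, os.path.join(tmp.name, name))  # 3. link the DBs in rather than copying them
    subprocess.check_call(['now'], cwd=tmp.name)     # 4. shell out to "now"
    tmp.cleanup()                                    # 5. done - removes the links, not the originals
```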
",17608, 343645249,"Doing this works: import os os.link('/tmp/databases/northwind.db', '/tmp/tmp-blah/northwind.db') That creates a link in tmp-blah - and then when I delete that entire directory like so: import shutil shutil.rmtree('/tmp/tmp-blah') The original database is not deleted, just the link.",17608, 343645327,"I can create the temporary directory like so: import tempfile t = tempfile.TemporaryDirectory() t t.name '/var/folders/w9/0xm39tk94ng9h52g06z4b54c0000gp/T/tmpkym70wlp' And then to delete it all: t.cleanup() ",17608, 343646740,I'm happy with this now that I've implemented the publish command in #26 ,17608, 343647102,"http://2016.padjo.org/tutorials/data-primer-census-acs1-demographics/ has a sqlite database: http://2016.padjo.org/files/data/starterpack/census-acs-1year/acs-1-year-2015.sqlite I tested this by deploying it here: https://datasette-fewuggrvwr.now.sh/",17608, 343647300,"Still needed: - [ ] A link to the homepage from some kind of navigation bar in the header - [ ] link to github.com/simonw/datasette in the footer - [ ] Slightly better titles (maybe ditch the visited link colours for titles only? should keep those for primary key links) - [ ] Links to the .json and .jsono versions of every view",17608, 343675165,The plugin system can also allow alternative providers for the `publish` command - e.g. maybe hook up hyper.sh as an option for publishing containers.,17608, 343676574,See also #14,17608, 343683566,"I’m going to solve this by making it an optional argument you can pass to the serve command. Then the Dockerfile can still build and use it but it won’t interfere with tests or dev. If argument is not passed, we will calculate hashes on startup and calculate table row counts on demand. ",17608, 343690060," ""parlgov-development.db"": { ""url"": ""http://www.parlgov.org/"" }, ""nhsadmin.sqlite"": { ""url"": ""https://github.com/psychemedia/openHealthDataDoodles"" }",17608, 343691342,"Closing this, opening a fresh ticket for the navigation stuff.",17608, 343697291,"I'm going to bundle sql and sql_params together into a query nested object like this: { ""query"": { ""sql"": ""select ..."", ""params"": { ""p0"": ""blah"" } } }",17608, 343698214,"I'm closing #50 - more tests will be added in the future, but the framework is neatly in place for them now. ",17608, 343699115,This needs to incorporate a sensible way of presenting custom SQL query results too. 
And let's get a textarea in there for executing SQL while we're at it.,17608, 343705966,https://github.com/fivethirtyeight/data has a ton of CSVs,17608, 343707624,Split the SQL thing out into #65 ,17608, 343707676,"Here's the new design: Also lists views at the bottom (refs #54): ",17608, 343708447,I ditched the metadata file concept.,17608, 343709217," ",17608, 343715915," con = sqlite3.connect('existing_db.db') with open('dump.sql', 'w') as f: for line in con.iterdump(): f.write('%s\n' % line) ",17608, 343752404,"Re-opening this - I've decided to bring back this concept, see #68 ",17608, 343752579,"By default I'll allow LIMIT and OFFSET up to a maximum of X (where X is let's say 50,000 to start with, but can be custom configured to a larger number or set to None for no limit).",17608, 343752683,"Maybe SQL views should have their own Sanic view class (`ViewView` is kinda funny), subclassed from `TableView`?",17608, 343753999,"For initial launch, I could just support this as some optional command line arguments you pass to the publish command: datasette publish data.db --title=""Title"" --source=""url""",17608, 343754058,I’m going to store this stuff in a file called metadata.json and move the existing automatically generated metadata to a file called build.json,17608, 343769692,I have created a Docker Hub public repository for this: https://hub.docker.com/r/simonwillison/datasette/,17608, 343780039,"I think the only safe way to do this is using SQLite `.fetchmany(1000)` - I can't guarantee that the user has not entered SQL that will outfox a limit in some way. So instead of attempting to edit their SQL, I'll always return 1001 records and let them know if they went over 1000 or not.",17608, 343780141,I've registered datasettes.com as a domain name for doing this. 
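Going back to the `fetchmany(1000)` idea a couple of comments up, the truncation check could be as small as this sketch (the function name and signature are illustrative, not the actual API):

```
def execute_with_truncation(conn, sql, params, max_returned_rows=1000):
    # Fetch one more row than the limit; if we get it, the result was truncated.
    cursor = conn.execute(sql, params)
    rows = cursor.fetchmany(max_returned_rows + 1)
    truncated = len(rows) > max_returned_rows
    return rows[:max_returned_rows], truncated
```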
Now setting it up so Cloudflare and Now can serve content from it.,17608, 343780539,"https://zeit.co/docs/features/dns is docs now domain add -e datasettes.com I had to set up a custom TXT record on `_now.datasettes.com` to get this to work.",17608, 343780671,- [x] Redirect https://datasettes.com/ and https://www.datasettes.com/ to https://github.com/simonw/datasette,17608, 343780814,"Achieved those redirects using Cloudflare ""page rules"": https://www.cloudflare.com/a/page-rules/datasettes.com",17608, 343781030,"- [x] Have `now domain add -e datasettes.com` run without errors (hopefully just a matter of waiting for the DNS to update) - [x] Alias an example dataset hosted on Now on a datasettes.com subdomain - [x] Confirm that HTTP caching and HTTP/2 redirect pushing works as expected - this may require another page rule",17608, 343788581,"I had to add a rule like this to get letsencrypt certificates on now.sh working: https://github.com/zeit/now-cli/issues/188#issuecomment-270105052 I also have to flip this switch off every time I want to add a new alias: ",17608, 343788780,"Added another page rule in order to get Cloudflare to always obey cache headers sent by the server: ",17608, 343788817,https://fivethirtyeight.datasettes.com/ is now up and running.,17608, 343789162,"``` $ curl -i 'https://fivethirtyeight.datasettes.com/fivethirtyeight-75d605c/obama-commutations%2Fobama_commutations.csv.jsono' HTTP/1.1 200 OK Date: Mon, 13 Nov 2017 01:50:57 GMT Content-Type: application/json Transfer-Encoding: chunked Connection: keep-alive Set-Cookie: __cfduid=de836090f3e12a60579cc7a1696cf0d9e1510537857; expires=Tue, 13-Nov-18 01:50:57 GMT; path=/; domain=.datasettes.com; HttpOnly; Secure Access-Control-Allow-Origin: * Cache-Control: public, max-age=31536000 X-Now-Region: now-sfo CF-Cache-Status: HIT Expires: Tue, 13 Nov 2018 01:50:57 GMT Server: cloudflare-nginx CF-RAY: 3bce154a6d9293b4-SJC {""database"": ""fivethirtyeight"", ""table"": ""obama-commutations/obama_commutations.csv""...```",17608, 343790984,"HTTP/2 push totally worked on the redirect! fetch('https://fivethirtyeight.datasettes.com/fivethirtyeight/riddler-pick-lowest%2Flow_numbers.csv.jsono').then(r => r.json()).then(console.log) Meanwhile, in the network pane... ",17608, 343791348,I should use this on https://fivethirtyeight.datasettes.com/,17608, 343801392,"While I’m at it, let’s allow people to opt out of HTTP/2 push with a ?_nopush=1 argument too - in case they decide they don’t want to receive large 302 responses.",17608, 343951751,"For first version, I'm just supporting title, source and license information at the database level.",17608, 343961784,"`datasette package ...` - same arguments as `datasette publish`. Creates Docker container in your local repo, optionally tagged with `--tag`",17608, 343967020,http://odewahn.github.io/docker-jumpstart/example.html is helpful,17608, 344000982,"This is necessary because one of the fun things to do with this tool is run it locally, e.g.: datasette ~/Library/Application\ Support/Google/Chrome/Default/History -p 8003 BUT... if we enable CORS by default, an evil site could try sniffing for localhost:8003 and attempt to steal data. 
So we'll enable the CORS headers only if `--cors` is provided to the command, and then use that command in the default Dockerfile.",17608, 344017088,Implemented in https://github.com/simonw/datasette/commit/e838bd743d31358b362875854a0ac5e78047727f,17608, 344018680,Turns out it does this already: https://github.com/simonw/datasette/blob/6b3b05b6db0d2a7b7cec8b8dbb4ddc5e12a376b2/datasette/app.py#L96-L107,17608, 344019631,I'm going with a page size of 100 and a max limit of 1000,17608, 344048656," ",17608, 344060070,"I'm going to add some extra metadata to setup.py and then tag this as version 0.8: git tag 0.8 git push --tags Then to ship to PyPI: python setup.py bdist_wheel twine register dist/datasette-0.8-py3-none-any.whl twine upload dist/datasette-0.8-py3-none-any.whl ",17608, 344061762,And we're live! https://pypi.python.org/pypi/datasette,17608, 344074443,"The fivethirtyeight dataset: datasette publish now --name fivethirtyeight --metadata metadata.json fivethirtyeight.db now alias https://fivethirtyeight-jyqfudvjli.now.sh fivethirtyeight.datasettes.com And parlgov: datasette publish now parlgov.db --name=parlgov --metadata=parlgov.json now alias https://parlgov-hqvxuhmbyh.now.sh parlgov.datasettes.com ",17608, 344075696,"Parlgov was throwing errors on one of the views, which takes longer than 1000ms to execute - so I added the ability to customize the time limit in https://github.com/simonw/datasette/commit/1e698787a4dd6df0432021a6814c446c8b69bba2 datasette publish now parlgov.db --metadata parlgov.json --name parlgov --extra-options=""--sql_time_limit_ms=3500"" now alias https://parlgov-nvkcowlixq.now.sh parlgov.datasettes.com https://parlgov.datasettes.com/parlgov-25f9855/view_cabinet now returns in just over 2.5s ",17608, 344076554,"Hah, I haven't even announced this yet :) Travis is upset because I'm using SQL in the tests which isn't compatible with their version of Python 3.",17608, 344081876,The `datasette package` command introduced in 4143e3b45c16cbae5e3e3419ef479a71810e7df3 is relevant here.,17608, 344118849,Did this: https://simonwillison.net/2017/Nov/13/datasette/,17608, 344125441,"Oops, if I jumped the gun. I saw the project in my github activity feed and saw some low hanging fruit :) ",17608, 344132481,I ended up shipping with https://fivethirtyeight.datasettes.com/ and https://parlgov.datasettes.com/,17608, 344141199,"I managed to do this manually: datasette package ~/parlgov-db/parlgov.db --metadata=parlgov.json # Output 8758ec31dda3 as the new image ID docker save 8758ec31dda3 > /tmp/my-image # I could have just piped this straight to hyper cat /tmp/my-image | hyper load # Now start the container running in hyper hyper run -d -p 80:8001 --name parlgov 8758ec31dda3 # We need to assign an IP address so we can see it hyper fip allocate 1 # Outputs 199.245.58.78 hyper fip attach 199.245.58.78 parlgov At this point, visiting the IP address in a browser showed the parlgov UI. To clean up... hyper hyper fip detach parlgov hyper fip release 199.245.58.78 hyper stop parlgov hyper rm parlgov ",17608, 344141515,This is probably a bit too much for the README - I should get readthedocs working.,17608, 344145265,"I'm happy to contribute this. Just let me know if you want a Dockerfile for development or production purposes, or both. If it's prod then we can just pip install the source from pypi, otherwise for dev we'll need a `requirements.txt` to speed up rebuilds.",17608, 344147583,"Let me know if you'd like a PR. 
The image is usable as `docker run --rm -t -i -p 9000:8001 -v $(pwd)/db:/db datasette datasette serve /db/chinook.db`",17608, 344149165,"I’m intrigued by this pattern: https://github.com/macropin/datasette/blob/147195c2fdfa2b984d8f9fc1c6cab6634970a056/Dockerfile#L8 What’s the benefit of doing that? Does it result in a smaller image size?",17608, 344151223,"The pattern is called ""multi-stage builds"". And the result is a svelte 226MB image (201MB for 3.6-slim) vs 700MB+ for the full image. It's possible to get it even smaller, but that takes a lot more work.",17608, 344161226,Spatial extensions would be really useful too. https://www.gaia-gis.it/spatialite-2.1/SpatiaLite-manual.html,17608, 344161371,http://charlesleifer.com/blog/going-fast-with-sqlite-and-python/ is useful here too.,17608, 344161430,Also requested on Twitter: https://twitter.com/DenubisX/status/930322813864439808,17608, 344179878,https://github.com/frappe/charts perhaps ,17608, 344180866,"This isn’t necessary - restarting the server is fast and easy, and I’ve not found myself needing this at all during development.",17608, 344185817,Thanks for the explanation! Please do start a pull request. ,17608, 344352573,This is a dupe of #85 ,17608, 344409906,"Even without bundling in the database file itself, I'd love to have a standalone binary version of the core `datasette` CLI utility. I think Sanic may have some complex dependencies, but I've never tried pyinstaller so I don't know how easy or hard it would be to get this working.",17608, 344415756,Looks like we'd need to use this recipe: https://github.com/pyinstaller/pyinstaller/wiki/Recipe-Setuptools-Entry-Point,17608, 344424382,"tried quickly, this seems working: ``` ~ pip3 install pyinstaller ~ pyinstaller -F --add-data /usr/local/lib/python3.6/site-packages/datasette/templates:datasette/templates --add-data /usr/local/lib/python3.6/site-packages/datasette/static:datasette/static /usr/local/bin/datasette ~ du -h dist/datasette 6.8M dist/datasette ~ file dist/datasette dist/datasette: Mach-O 64-bit executable x86_64 ```",17608, 344426887,"That didn't quite work for me. It built me a `dist/datasette` executable but when I try to run it I get an error: $ pwd /Users/simonw/Dropbox/Development/datasette $ source venv/bin/activate $ pyinstaller -F --add-data datasette/templates:datasette/templates --add-data datasette/static:datasette/static /Users/simonw/Dropbox/Development/datasette/venv/bin/datasette $ dist/datasette --help Traceback (most recent call last): File ""datasette"", line 11, in File ""site-packages/pkg_resources/__init__.py"", line 572, in load_entry_point File ""site-packages/pkg_resources/__init__.py"", line 564, in get_distribution File ""site-packages/pkg_resources/__init__.py"", line 436, in get_provider File ""site-packages/pkg_resources/__init__.py"", line 984, in require File ""site-packages/pkg_resources/__init__.py"", line 870, in resolve pkg_resources.DistributionNotFound: The 'datasette' distribution was not found and is required by the application [99117] Failed to execute script datasette ",17608, 344427448,Hooray! 
First dataset that wasn't deployed by me :) https://github.com/simonw/datasette/wiki/Datasettes,17608, 344427560,I'm getting an internal server error on http://run.plnkr.co/preview/cj9zlf1qc0003414y90ajkwpk/ at the moment,17608, 344430299,"i will look better tomorrow, it's late i surely made some mistake https://asciinema.org/a/ZyAWbetrlriDadwWyVPUWB94H",17608, 344430689,"> I'm getting an internal server error on http://run.plnkr.co/preview/cj9zlf1qc0003414y90ajkwpk/ at the moment Sorry about that - here's a working version on Netlify: https://nhs-england-map.netlify.com",17608, 344438724,"Plugins should be able to interact with the build step. This would give plugins an opportunity to modify the SQL databases and help prepare them for serving - for example, a full-text search plugin might create additional FTS tables, or a mapping plugin might pre-calculate a bunch of geohashes for tables that have latitude/longitude values. Plugins could really take advantage of the immutable nature of the dataset here.",17608, 344440377,"It worked! $ pyinstaller -F \ --add-data /usr/local/lib/python3.5/site-packages/datasette/templates:datasette/templates \ --add-data /usr/local/lib/python3.5/site-packages/datasette/static:datasette/static \ /usr/local/bin/datasette $ file dist/datasette dist/datasette: Mach-O 64-bit executable x86_64 $ dist/datasette --help Usage: datasette [OPTIONS] COMMAND [ARGS]... Datasette! Options: --help Show this message and exit. Commands: serve* Serve up specified SQLite database files with... build package Package specified SQLite files into a new... publish Publish specified SQLite database files to... ",17608, 344440658,It's a shame pyinstaller can't act as a cross-compiler - so I don't think I can get Travis CI to build packages. But it's fantastic that it's possible to turn the tool into a standalone executable!,17608, 344452063,"This can work in reverse too. If you view the row page for something that has foreign keys against it, we can show you “53 items in TABLE link to this” and provide a link to view them all. That count worry could be prohibitively expensive. To counter that, we could run the count query via Ajax and set a strict time limit on it. See #95",17608, 344452326,This will work well in conjunction with https://github.com/simonw/csvs-to-sqlite/issues/2,17608, 344462277,"This is exactly what I was after, thanks!",17608, 344462608,"Fixed in https://github.com/simonw/datasette/commit/8252daa4c14d73b4b69e3f2db4576bb39d73c070 - thanks, @tomdyson!",17608, 344463436,"This means clients can ask questions but say ""don't bother if it takes longer than X"" - which is really handy when you're working against unknown databases that might be small or might be enormous.",17608, 344472313,"Works for me. I'm going to land this. Just one thing: simonw$ docker run --rm -t -i -p 9001:8001 c408e8cfbe40 datasette publish now The publish command requires ""now"" to be installed and configured Follow the instructions at https://zeit.co/now#whats-now Maybe we should have the Docker container install the ""now"" client? Not sure how much size that would add though. 
I think it's OK without for the moment.",17608, 344487639,"Since you can already download the database directly, I'm not going to bother with this one.",17608, 344516406,actually you can use travis to build for linux/macos and [appveyor](https://www.appveyor.com/) to build for windows.,17608, 344597274,This is a duplicate of https://github.com/simonw/datasette/issues/100,17608, 344657040,"Since detecting foreign keys that point to a specific table is a bit expensive (you have to call a PRAGMA on every other table) I’m going to add this to the build/inspect stage. Idea: if we detect that the foreign key table only has one other column in it (id, name) AND we know that the id is the primary key, we can add an efficient lookup on the table list view and prefetch a dictionary mapping IDs to their value. Then we can feed that dictionary in as extra tenplate context and use it to render labeled hyperlinks in the corresponding column. This means our build step should also cache which columns are indexed, and add a “label_column” property for tables with an obvious lane column.",17608, 344667202,@jacobian points out that a buildpack may be a better fit than a Docker container for implementing this: https://twitter.com/jacobian/status/930849058465255424,17608, 344680385,"Maybe we don’t even need a buildpack... we could create a temporary directory, set up a classic heroku app with the datasette serve command in the Procfile and then git push to deploy.",17608, 344686483,The “datasette build” command would need to run in a bin/post_compile script eg https://github.com/simonw/simonwillisonblog/blob/cloudflare-ips/bin/post_compile,17608, 344687328,"By default the command could use a temporary directory that gets cleaned up after the deploy, but we could allow users to opt in to keeping the generated directory like so: datasette publish heroku mydb.py -d ~/dev/my-heroku-app This would create the my-heroku-app folder so you can later execute further git deploys from there.",17608, 344710204,"A first basic stab at making this work, just to prove the approach. Right now this requires [a Heroku CLI plugin](https://github.com/heroku/heroku-builds), which seems pretty unreasonable. I think this can be replaced with direct API calls, which could clean up a lot of things. But I wanted to prove it worked first, and it does.",17608, 344770170,"It is - but I think this will break on this line since it expects two format string parameters: https://github.com/simonw/datasette/blob/f45ca30f91b92ac68adaba893bf034f13ec61ced/datasette/utils.py#L61 Needs unit tests too, which live here: https://github.com/simonw/datasette/blob/f45ca30f91b92ac68adaba893bf034f13ec61ced/tests/test_utils.py#L49",17608, 344771130,"Aha... 
it looks like this is a Jinja version problem: https://github.com/ansible/ansible/issues/25381#issuecomment-306492389 Datasette depends on sanic-jinja2 - and that doesn't depend on a particular jinja2 version: https://github.com/lixxu/sanic-jinja2/blob/7e9520850d8c6bb66faf43b7f252593d7efe3452/setup.py#L22 So if you have an older version of Jinja installed, stuff breaks.",17608, 344786528," ",17608, 344788435,Demo: https://australian-dogs.now.sh/australian-dogs-3ba9628?sql=select+name%2C+count%28*%29+as+n+from+%28%0D%0A%0D%0Aselect+upper%28%22Animal+name%22%29+as+name+from+%5BAdelaide-City-Council-dog-registrations-2013%5D+where+Breed+like+%3Abreed%0D%0A%0D%0Aunion+all%0D%0A%0D%0Aselect+upper%28Animal_Name%29+as+name+from+%5BAdelaide-City-Council-dog-registrations-2014%5D+where+Breed_Description+like+%3Abreed%0D%0A%0D%0Aunion+all+%0D%0A%0D%0Aselect+upper%28Animal_Name%29+as+name+from+%5BAdelaide-City-Council-dog-registrations-2015%5D+where+Breed_Description+like+%3Abreed%0D%0A%0D%0Aunion+all%0D%0A%0D%0Aselect+upper%28%22AnimalName%22%29+as+name+from+%5BCity-of-Port-Adelaide-Enfield-Dog_Registrations_2016%5D+where+AnimalBreed+like+%3Abreed%0D%0A%0D%0Aunion+all%0D%0A%0D%0Aselect+upper%28%22Animal+Name%22%29+as+name+from+%5BMitcham-dog-registrations-2015%5D+where+Breed+like+%3Abreed%0D%0A%0D%0Aunion+all%0D%0A%0D%0Aselect+upper%28%22DOG_NAME%22%29+as+name+from+%5Bburnside-dog-registrations-2015%5D+where+DOG_BREED+like+%3Abreed%0D%0A%0D%0Aunion+all+%0D%0A%0D%0Aselect+upper%28%22Animal_Name%22%29+as+name+from+%5Bcity-of-playford-2015-dog-registration%5D+where+Breed_Description+like+%3Abreed%0D%0A%0D%0Aunion+all%0D%0A%0D%0Aselect+upper%28%22Animal+Name%22%29+as+name+from+%5Bcity-of-prospect-dog-registration-details-2016%5D+where%22Breed+Description%22+like+%3Abreed%0D%0A%0D%0A%29+group+by+name+order+by+n+desc%3B&breed=chihuahua,17608, 344788763,Another demo - this time it lets you search by name and see the most popular breeds with that name: https://australian-dogs.now.sh/australian-dogs-3ba9628?sql=select+breed%2C+count%28*%29+as+n+from+%28%0D%0A%0D%0Aselect+upper%28%22Breed%22%29+as+breed+from+%5BAdelaide-City-Council-dog-registrations-2013%5D+where+%22Animal+name%22+like+%3Aname%0D%0A%0D%0Aunion+all%0D%0A%0D%0Aselect+upper%28%22Breed_Description%22%29+as+breed+from+%5BAdelaide-City-Council-dog-registrations-2014%5D+where+%22Animal_Name%22+like+%3Aname%0D%0A%0D%0Aunion+all+%0D%0A%0D%0Aselect+upper%28%22Breed_Description%22%29+as+breed+from+%5BAdelaide-City-Council-dog-registrations-2015%5D+where+%22Animal_Name%22+like+%3Aname%0D%0A%0D%0Aunion+all%0D%0A%0D%0Aselect+upper%28%22AnimalBreed%22%29+as+breed+from+%5BCity-of-Port-Adelaide-Enfield-Dog_Registrations_2016%5D+where+%22AnimalName%22+like+%3Aname%0D%0A%0D%0Aunion+all%0D%0A%0D%0Aselect+upper%28%22Breed%22%29+as+breed+from+%5BMitcham-dog-registrations-2015%5D+where+%22Animal+Name%22+like+%3Aname%0D%0A%0D%0Aunion+all%0D%0A%0D%0Aselect+upper%28%22DOG_BREED%22%29+as+breed+from+%5Bburnside-dog-registrations-2015%5D+where+%22DOG_NAME%22+like+%3Aname%0D%0A%0D%0Aunion+all+%0D%0A%0D%0Aselect+upper%28%22Breed_Description%22%29+as+breed+from+%5Bcity-of-playford-2015-dog-registration%5D+where+%22Animal_Name%22+like+%3Aname%0D%0A%0D%0Aunion+all%0D%0A%0D%0Aselect+upper%28%22Breed+Description%22%29+as+breed+from+%5Bcity-of-prospect-dog-registration-details-2016%5D+where+%22Animal+Name%22+like+%3Aname%0D%0A%0D%0A%29+group+by+breed+order+by+n+desc%3B&name=rex,17608, 344810525,"@simonw On the spatialite support, here is some info to make it work and a 
screenshot: I used the following Dockerfile: ``` FROM prolocutor/python3-sqlite-ext:3.5.1-spatialite as build RUN mkdir /code ADD . /code/ RUN pip install /code/ EXPOSE 8001 CMD [""datasette"", ""serve"", ""/code/ne.sqlite"", ""--host"", ""0.0.0.0""] ``` and added this to `prepare_connection`: ``` conn.enable_load_extension(True) conn.execute(""SELECT load_extension('/usr/local/lib/mod_spatialite.so')"") ```",17608, 344811268,"Thanks for the guidance. I added a unit test and made a slight change to utils.py. I didn't realize this, but evidently string.format only complains if you supply less arguments than there are format placeholders, so the original commit worked, but was adding a superfluous named param. I added a conditional that prevents the named param from being created and ensures the correct number of args are passed to sting.format. It has the side effect of hiding the SQL query in /templates/table.html when there are no other where clauses--not sure if that's the desired outcome here.",17608, 344864254,@simonw I see. I upgraded sanic-jinja2 and jinja2: it now works flawlessly. Thank you!,17608, 344975156,"That's fantastic! Thank you very much for that. Do you know if it's possible to view the Dockerfile used by https://hub.docker.com/r/prolocutor/python3-sqlite-ext/ ?",17608, 344976104,Found a relevant Dockerfile on Reddit: https://www.reddit.com/r/Python/comments/5unkb3/install_sqlite3_on_python_3/ddzdz2b/,17608, 344976882,Maybe part of the solution here is to add a `--load-extension` argument to `datasette` - so when you run the command you can specify SQLite extensions that should be loaded. ,17608, 344986423,http://datasette.readthedocs.io/,17608, 344988263,"Here's how I tested this. First I downloaded and started a docker container using https://hub.docker.com/r/prolocutor/python3-sqlite-ext - which includes the compiled spatialite extension. This downloads it, then starts a shell in that container. docker run -it -p 8018:8018 prolocutor/python3-sqlite-ext:3.5.1-spatialite /bin/sh Installed a pre-release build of datasette which includes the new `--load-extension` option. pip install https://static.simonwillison.net/static/2017/datasette-0.13-py3-none-any.whl Now grab a sample database from https://www.gaia-gis.it/spatialite-2.3.1/resources.html - and unzip and rename it (datasette doesn't yet like databases with dots in their filename): wget http://www.gaia-gis.it/spatialite-2.3.1/test-2.3.sqlite.gz gunzip test-2.3.sqlite.gz mv test-2.3.sqlite test23.sqlite Now start datasette on port 8018 (the port I exposed earlier) with the extension loaded: datasette test23.sqlite -p 8018 -h 0.0.0.0 --load-extension /usr/local/lib/mod_spatialite.so Now I can confirm that it worked: http://localhost:8018/test23-c88bc35?sql=select+ST_AsText%28Geometry%29+from+HighWays+limit+1 If I run datasette without `--load-extension` I get this: datasette test23.sqlite -p 8018 -h 0.0.0.0 ",17608, 344988591,"OK, `--load-extension` is now a supported command line option - see #110 which includes my notes on how I manually tested it using the `prolocutor/python3-sqlite-ext` Docker image.",17608, 344989340,The fact that `prolocutor/python3-sqlite-ext` doesn't provide a visible Dockerfile and hasn't been updated in two years makes me hesitant to bake it into datasette itself. 
I'd rather put together a Dockerfile that enables the necessary extensions and can live in the datasette repository itself.,17608, 344995571,The JSON extension would be very worthwhile too: https://www.sqlite.org/json1.html,17608, 345002908,I'll try to find alternatives to the Dockerfile option - I also think we should not use that old one without sources or license.,17608, 345013127,Having this as a global option may not make sense when publishing multiple databases. We can revisit that when we implement per-database and per-table metadata.,17608, 345017256,"To finish up, I committed the image I created in the above so I can run it again in the future: docker commit $(docker ps -lq) datasette-sqlite Now I can run it like this: docker run -it -p 8018:8018 datasette-sqlite datasette /tmp/test23.sqlite -p 8018 -h 0.0.0.0 --load-extension /usr/local/lib/mod_spatialite.so ",17608, 345067498,"For visualizations, Google Maps should be made available as a plugin. The default visualizations can use Leaflet and Open Street Map, but there's no reason to not make Google Maps available as a plugin, especially if the plugin can provide a mechanism for configuring the necessary API key. I'm particularly excited in the Google Maps heatmap visualization https://developers.google.com/maps/documentation/javascript/heatmaplayer as seen on http://mochimachine.org/wasteland/",17608, 345108644,Looks like your tests are failing because of a bug which I fixed in https://github.com/simonw/datasette/commit/9199945a1bcec4852e1cb866eb3642614dd32a48 - if you rebase to master the tests should pass.,17608, 345117690,"Thanks for bearing with me. I was getting a message about my branch diverging when I tried to push after rebasing, so I merged master into isnull, seems like that did the trick. Let me know if I should make any corrections.",17608, 345138134,Fantastic! Thank you very much.,17608, 345138347,We now have a Dockerfile that compiles spatialite! https://github.com/simonw/datasette/pull/114/commits/6c6b63d890529eeefcefb7ab126ea3bd7b2315c1,17608, 345150048,`csvs-to-sqlite` is now capable of generating databases with foreign key lookup tables: https://github.com/simonw/csvs-to-sqlite/releases/tag/0.3,17608, 345242447,"I could support explicit label columns using additional arguments to `datasette serve`: datasette serve mydb.py --label-column mydb:table1:name --label-column mydb:table2:title This would mean ""in mydb, set the label column for table1 to name, and the label column for table2 to title""",17608, 345255655,"I tesed this by first building and running a container using the new Dockerfile from #114: docker build . 
docker run -it -p 8001:8001 6c9ca7e29181 /bin/sh Then I ran this inside the container itself: apt update && apt-get install wget -y \ && wget http://www.gaia-gis.it/spatialite-2.3.1/test-2.3.sqlite.gz \ && gunzip test-2.3.sqlite.gz \ && mv test-2.3.sqlite test23.sqlite \ && datasette -h 0.0.0.0 test23.sqlite I visited this URL to confirm I got an error due to spatialite not being loaded: http://localhost:8001/test23-c88bc35?sql=select+ST_AsText%28Geometry%29+from+HighWays+limit+1 Then I checked that loading it with `--load-extension` worked correctly: datasette -h 0.0.0.0 test23.sqlite \ --load-extension=/usr/lib/x86_64-linux-gnu/mod_spatialite.so Then, finally, I tested it with the new environment variable option: SQLITE_EXTENSIONS=/usr/lib/x86_64-linux-gnu/mod_spatialite.so \ datasette -h 0.0.0.0 test23.sqlite Running it with an invalid environment variable option shows an error: $ SQLITE_EXTENSIONS=/usr/lib/x86_64-linux-gnu/blah.so datasette \ -h 0.0.0.0 test23.sqlite Usage: datasette -h [OPTIONS] [FILES]... Error: Invalid value for ""--load-extension"": Path ""/usr/lib/x86_64-linux-gnu/blah.so"" does not exist. ",17608, 345256576,"This is great - I've been frustrated by how CodeMirror prevents me from hitting tab-enter to activate the ""Run SQL"" button. ",17608, 345259115,"OK, I can confirm that the version in the new docker container supports FTS5, JSON *and* spatialite! Notes on how I built the container and tested the spatialite extension are here: https://github.com/simonw/datasette/issues/112#issuecomment-345255655 To confirm that JSON and FTS5 are working, I ran the following: $ docker run -it -p 8001:8001 6c9ca7e29181 python Python 3.6.3 (default, Nov 4 2017, 14:24:48) [GCC 6.3.0 20170516] on linux Type ""help"", ""copyright"", ""credits"" or ""license"" for more information. >>> import sqlite3 >>> sqlite3.connect(':memory:').execute('CREATE VIRTUAL TABLE email USING fts5(sender, title, body);') >>> list(sqlite3.connect(':memory:').execute('''SELECT json(' { ""this"" : ""is"", ""a"": [ ""test"" ] } ') ''')) [('{""this"":""is"",""a"":[""test""]}',)] If I do the same thing in python3 on my OS X laptop directly, I get this: $ python3 Python 3.5.1 (default, Apr 18 2016, 11:46:32) [GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.29)] on darwin Type ""help"", ""copyright"", ""credits"" or ""license"" for more information. >>> import sqlite3 >>> sqlite3.connect(':memory:').execute('CREATE VIRTUAL TABLE email USING fts5(sender, title, body);') Traceback (most recent call last): File """", line 1, in sqlite3.OperationalError: no such module: fts5 >>> list(sqlite3.connect(':memory:').execute('''SELECT json(' { ""this"" : ""is"", ""a"": [ ""test"" ] } ') ''')) Traceback (most recent call last): File """", line 1, in sqlite3.OperationalError: no such function: json ",17608, 345260784,This was fixed by ed2b3f25beac720f14869350baacc5f62b065194 in #107 - thanks @raynae!,17608, 345262738,"Consider for example https://fivethirtyeight.datasettes.com/fivethirtyeight/inconvenient-sequel%2Fratings The idea here is to be able to support querystring parameters like this: * `?timestamp___date=2017-07-17` - return every item where the timestamp falls on that date * `?timestamp___year=2017` - return every item where the timestamp falls within 2017 * `?timestamp___month=1` - return every item where the month component is January * `?timestamp___day=10` - return every item where the day-of-the-month component is 10 This is similar to #64 but a fair bit more complicated. 
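One way those lookups could compile down to SQL, sketched with hypothetical helper names (an illustration of the idea, not the implementation):

```
# Hypothetical mapping from lookup suffix to a WHERE-clause template.
DATE_LOOKUPS = {
    'date':  'date({column}) = :{param}',
    'year':  "strftime('%Y', {column}) = cast(:{param} as text)",
    'month': "cast(strftime('%m', {column}) as integer) = :{param}",
    'day':   "cast(strftime('%d', {column}) as integer) = :{param}",
}

def date_lookup_clause(column, lookup, value, param='p0'):
    # e.g. date_lookup_clause('timestamp', 'year', '2017') returns
    # ('strftime(\'%Y\', "timestamp") = cast(:p0 as text)', {'p0': '2017'})
    template = DATE_LOOKUPS[lookup]
    return template.format(column='"{}"'.format(column), param=param), {param: value}
```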
SQLite date functions are documented here: https://sqlite.org/lang_datefunc.html ",17608, 345342512,"This should support multiple columns, e.g. `?_group_count=precinct&_group_count=candidate`",17608, 345343079,Should this support sum/avg/etc as well?,17608, 345404257,Thanks!,17608, 345447161,any reason I shouldn't land this?,17608, 345448756,"This may be useful: https://github.com/coleifer/peewee/blob/db85167d93861451a1fe7cde8c4f05748b222634/peewee.py#L162-L185",17608, 345452215,"If a column value is invalid JSON, let's return the invalid JSON as a regular string.",17608, 345452669,"I'd like to do a bit of cleanup, and some error checking in case heroku/heroku-builds isn't installed.",17608, 345493344,Looks like there are a ton of interesting datasets packaged in this way at http://datahub.io/docs/core-data - see also https://github.com/datasets,17608, 345494052,https://github.com/rgieseke/pandas-datapackage-reader,17608, 345494724,"This is working really nicely now: ",17608, 345494775,"Now that we have foreign key support (#85) this is even more important, since foreign key support actively encourages linking to filtered table views.",17608, 345494918,"If the selected relationship is a foreign key reference, we should resolve that foreign key and display it on the page.",17608, 345494971,It would be great if this could support foreign key references and automatically resolve and hyperlink them if they are detected.,17608, 345495046,Maybe I should support `&_count=1` to handle this - that would be easy to Ajax-in in conjenction with the other filters.,17608, 345496540,"OK,I've figured out how to do an initial version of this without JavaScript. I'll provide three form fields labell d ""add filter"": * a select box of all of the columns * a select box of the available operations * a value box Submit those and the site will redirect you to a correctly populated querystring for that filter. If you have filters applied, those will display as prepopulated form field triples. For foreign key reference filters, I will display the resolved value next to the text box containing the numeric ID. In the future this can get a select2 style treatment.",17608, 345497453,I'm going to be a bit classier about this and auto generate a title for the page that describes the currently applied filters.,17608, 345497534,"""Tablename: 3,567 rows where status = 3 (published) and n > 55""",17608, 345497689,"I'll have to refactor the foreign key annotating code to be usable in other contexts - at the moment it only works for annotating displays of rows, but I need to use it to resolve selected filters as well. ",17608, 345503897,"Thanks, I wrote this very simple reader because the default approach as described on the Datahub pages seemed to complicated. I had metadata from the `datapackage.json` attached to the returned DataFrames but removed this due to some attribute handling change in the latest Pandas version. This could also be useful for getting from Data Package to SQL db: https://github.com/frictionlessdata/tableschema-sql-py I maintain a few climate science related dataset at https://github.com/openclimatedata/ The Data Retriever (mainly ecological data) by @ethanwhite et al. 
is also using the Data Package format for metadata and has some tooling for different dbs: https://frictionlessdata.io/articles/the-data-retriever/ https://github.com/weecology/retriever The Open Power System Data project also has a couple of datasets that show nicely how CSV is great for assembling and then already make SQLite files available. It's one of the first data sets I tried with Datasette, perfect for the use case of getting an API for putting power stations on a map ... https://data.open-power-system-data.org/",17608, 345509500,"Specifically docs should make it clearer this file exists https://parlgov.datasettes.com/.json And from that you can build https://parlgov.datasettes.com/parlgov-25f9855.json Then https://parlgov.datasettes.com/parlgov-25f9855/cabinet.json",17608, 345526171,"Relevant SQLite docs: * https://sqlite.org/fts5.html * https://www.sqlite.org/fts3.html",17608, 345526517,"Since SQLite supports column specifications in the MATCH body itself, there's no need to provide a separate mechanism for specifying columns in the query string: https://sqlite.org/fts5.html#fts5_column_filters",17608, 345533274,"Demo: https://sf-trees.now.sh/sf-trees-ebc2ad9/Street_Tree_List?_search=grove+st ",17608, 345537268,Dupe of #127 ,17608, 345537315,This would enable faceted search - moving it to the search milestone.,17608, 345538016,I implemented a basic version of this in f59c840e7db8870afcdeba7a53bdea07bb674334 for custom SQL.,17608, 345552358,"For the overall shape of the rows: `?_shape=lists` (default), `?_shape=objects`, `?_shape=object` (primary key as object keys) For getting back extra keys: `?_extras=schema,query,timing` For expanding columns: `?_expand_all=1` Or `?_expand=qSpecies&_expand=qCaretaker` The template view will only be allowed to work with data it can request using extra options. That leaves one sighted nasty edge-case: the default view will expand all columns, but the `.json` view of it won't? I think that's OK. The default view won't include the extras used by the template to render the page either.",17608, 345552440,"This calls for refactoring the code so the table view, the row view and the custom SQL view share as much logic as possible.",17608, 345552500,"To start with, I could just ditch the .jsono in favour of the new _shape argument.",17608, 345559864,"I need a nicer abstraction around the concept of filters. It needs to be able to: - convert querystring parameters into filters - convert filters into a querystring - iterate through currently applied filters - convert selected filters into a human description (e.g. 
for a title) - expand filters that involve a foreign key - add filters - remove filters - define different types of filters It should replace my current `build_where_clauses` implementation, in particular this bit: https://github.com/simonw/datasette/blob/a5881e105a02830d26f07e98177248d5910893da/datasette/utils.py#L38-L56",17608, 345601103,"Some demos: Single column: https://sf-trees-flat.now.sh/sf-trees-flat-ba738ce/Street_Tree_List?_group_count=qSpecies Multi column: https://sf-trees-flat.now.sh/sf-trees-flat-ba738ce/Street_Tree_List?_group_count=qLegalStatus&_group_count=qSpecies ",17608, 345601870,This may be tackled by the filters work happening in #86,17608, 345652450,"If Data Package metadata gets adopted (#105) the views spec work might also be worth a look: http://frictionlessdata.io/specs/views/ http://datahub.io/docs/features/views ",17608, 345750135,"One possible route: introduce prefixes eg `?a.Trees.age__gt=5&a.Trees._group_count=qSpecies&b.Trees.age__gt=10&b.Trees._group_count=qSpecies` ",17608, 345793887,"Need to hide these from the index summary page as well: ",17608, 345809808,"OK, https://github.com/openclimatedata/global-carbon-budget/blob/master/datapackage.json really does look like it covers all of the bases I need for #138. Closing this ticket in favour of that new one.",17608, 345810031,See also #138,17608, 345893877,http://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins Is pretty good ,17608, 346116745,"@simonw ready for a review and merge if you want. There's still some nasty duplicated code in cli.py and utils.py, which is just going to get worse if/when we start adding any other deploy targets (and I want to do one for cloud.gov, at least). I think there's an opportunity for some refactoring here. I'm happy to do that now as part of this PR, or if you merge this first I'll do it in a different one.",17608, 346124073,"Actually hang on, don't merge - there are some bugs that #141 masked when I tested this out elsewhere.",17608, 346124764,"OK, now this should work.",17608, 346157542,"I think a copy is the right thing to do here - it will be cleaned up when the temp directory is removed. The hard link thing was always intended to save space, but if we can't do a hard link I don't see any harm in a temporary file copy.",17608, 346161985,"Woohoo! I've found one tiny issue: right now, the following doesn't work: datasette publish heroku ../demo-databses/google-trends.db It results in this error in the Heroku logs: 2017-11-21T21:03:29.210511+00:00 app[web.1]: Usage: datasette serve [OPTIONS] [FILES]... 2017-11-21T21:03:29.210524+00:00 app[web.1]: 2017-11-21T21:03:29.210555+00:00 app[web.1]: Error: Invalid value for ""files"": Path ""../demo-databses/google-trends.db"" does not exist. The command works fine if you run it in the same directory as the database file you are publishing.",17608, 346163513,"The reason relative paths work for `publish now` is that the `make_dockerfile()` function is called by passing the file names, not the full file paths: https://github.com/simonw/datasette/blob/e47117ce1d15f11246a3120aa49de70205713d05/datasette/utils.py#L166 Clearly the correct thing to do here is for us to refactor the shared code between heroku/package/now.",17608, 346217739,Might be nice to have a --no-limits option that disables time and maximum row count limits.,17608, 346244871,"I'd also suggest taking a look at [stevedore](https://docs.openstack.org/stevedore/latest/), which has a ton of tools for doing plugin stuff. 
I've had good luck with it in the past.",17608, 346405660,"I have a solution for FTS already, but I'm interested in apsw as a mechanism for allowing custom virtual tables to be written in Python (pysqlite only lets you write custom functions) Not having PyPI support is pretty tough though. I'm planning a plugin/extension system which would be ideal for things like an optional apsw mode, but that's a lot harder if apsw isn't in PyPI.",17608, 346406009,"Oh thanks, that definitely looks like an interesting option.",17608, 346427794,"Thanks. There is a way to use pip to grab apsw, which also let's you configure it (flags to build extensions, use an internal sqlite, etc). Don't know how that works as a dependency for another package, though. On November 22, 2017 11:38:06 AM EST, Simon Willison wrote: >I have a solution for FTS already, but I'm interested in apsw as a >mechanism for allowing custom virtual tables to be written in Python >(pysqlite only lets you write custom functions) > >Not having PyPI support is pretty tough though. I'm planning a >plugin/extension system which would be ideal for things like an >optional apsw mode, but that's a lot harder if apsw isn't in PyPI. > >-- >You are receiving this because you authored the thread. >Reply to this email directly or view it on GitHub: >https://github.com/simonw/datasette/issues/144#issuecomment-346405660 ",17608, 346463342,"On the index page: On the database index page: After clicking that link: ",17608, 346530498,"Here's where I am now. Needs a bit of UI tidy up and it will be good to release: ",17608, 346682905," ",17608, 346691243," ",17608, 346694211,And with ef3eacf622e69723d48ab1ad597645770a7361db I'm ready to call this one done.,17608, 346701751," ",17608, 346705879,"Easiest way to do this will be to move it into the same `
` as the filters. Would be nice to detect `?_search=` and redirect to URL without the `_search` parameter, just for aesthetics.",17608, 346900554," ",17608, 346902583," ",17608, 346903317,"Custom SQL results now look like this: ",17608, 346974336,FWIW I worked around this by setting TMPDIR to ~/tmp before running the command.,17608, 346987395,"Are there performance gains when using immutable as opposed to read-only? From what I see other processes can still modify the DB when immutable, but there are no change notifications.",17608, 347049888,"https://sqlite.org/c3ref/open.html Is the only documentation I've been able to find of the immutable option: > **immutable**: The immutable parameter is a boolean query parameter that indicates that the database file is stored on read-only media. When immutable is set, SQLite assumes that the database file cannot be changed, even by a process with higher privilege, and so the database is opened read-only and all locking and change detection is disabled. Caution: Setting the immutable property on a database file that does in fact change can result in incorrect query results and/or SQLITE_CORRUPT errors. ",17608, 347050235,"I've been thinking about 1. a bit - I actually think it would be fine to have a rule that says ""if the contents of the cell starts with `http://` or `https://` and doesn't contain any whitespace, turn that into a link"". If you need the non-linked version that will always be available in the JSON. For the other two... I think #12 may be the way to go here: if you can easily over-ride the `row.html` and `table.html` templates for specific databases you can easily set pre-formatted text or similar for certain values - maybe even with CSS that targets a specific table column.",17608, 347051331,"One quick fix could be to add a `extra_css_url` key to the `metadata.json` format (which currently hosts `title`, `license_url` etc) - if populated, we can inject a link to that stylesheet on every page. We could add a few classes in strategic places that include the database and table names to give people styling hooks. While we're at it, an `extra_js_url` key would let people go really nuts!",17608, 347123991,"That's the only reference to immutable I saw as well, making me think that there may be no perceivable advantages over simply using mode=ro. Since the database is never or seldom updated the change notifications should not impact performance.",17608, 347236102,I'd really like to get some benchmarks working so I can see the actual impact of this kind of thing.,17608, 347713453,Could you provide the SQL to create a reproducible test case (both CREATE TABLE and INSERT statements)?,17608, 347714314,"``` CREATE TABLE rhs ( id INTEGER PRIMARY KEY, name TEXT ); CREATE TABLE lhs ( symbol INTEGER PRIMARY KEY, FOREIGN KEY (symbol) REFERENCES rhs(id) ); INSERT INTO rhs VALUES (1, ""foo""); INSERT INTO rhs VALUES (2, ""bar""); INSERT INTO lhs VALUES (1); INSERT INTO lhs VALUES (2); ``` It's expected that in lhs's view, foo / bar should be displayed.",17608, 347714471,Thanks!,17608, 347715452,"Interestingly, it almost does the right thing on the individual row page: https://bug-155-dkcqckhgki.now.sh/bug-155-9a7bb68/lhs/1 The symbol has been expanded, but there's a rogue '1' that shouldn't be there at all - I think that's bug #152 The table view itself is definitely doing the wrong thing: https://bug-155-dkcqckhgki.now.sh/bug-155-9a7bb68/lhs ",17608, 347735334,"@ftrain OK I've shipped the first version of this. 
Here's the initial documentation: Create a `metadata.json` file that looks like this: { ""extra_css_urls"": [ ""https://simonwillison.net/static/css/all.bf8cd891642c.css"" ], ""extra_js_urls"": [ ""https://code.jquery.com/jquery-3.2.1.slim.min.js"" ] } Then start datasette like this: datasette mydb.db --metadata=metadata.json The CSS and JavaScript files will be linked in the `` of every page. You can also specify a SRI (subresource integrity hash) for these assets: { ""extra_css_urls"": [ { ""url"": ""https://simonwillison.net/static/css/all.bf8cd891642c.css"", ""sri"": ""sha384-9qIZekWUyjCyDIf2YK1FRoKiPJq4PHt6tp/ulnuuyRBvazd0hG7pWbE99zvwSznI"" } ], ""extra_js_urls"": [ { ""url"": ""https://code.jquery.com/jquery-3.2.1.slim.min.js"", ""sri"": ""sha256-k2WSCIexGzOj3Euiig+TlR8gA0EmPjuc79OEeY5L45g="" } ] } Modern browsers will only execute the stylsheet or JavaScript if the SRI hash matches the content served. You can generate hashes using www.srihash.org This isn't shipped in a release yet, but you can still access these features in `datasette publish` like so: datasette publish now mydb.db --metadata=metadata.json --branch=master The `--branch=master` option will pull the latest master build of Datasette from GitHub.",17608, 347735598,"To style individual columns you'll currently need to use the `nth-of-type` selector, e.g.: td:nth-of-type(5):before { white-space: pre }",17608, 347735724,(This only addresses point 2 in your issue description - points 1 and point 3 are still to come),17608, 347928926,"OK, that's point 1 covered.",17608, 348103270,"Every template now gets CSS classes in the body designed to support custom styling. The index template (the top level page at /) gets this: The database template (/dbname/) gets this: The table template (/dbname/tablename) gets: The row template (/dbname/tablename/rowid) gets: The db-x and table-x classes use the database or table names themselves IF they are valid CSS identifiers. If they aren't, we strip any invalid characters out and append a 6 character md5 digest of the original name, in order to ensure that multiple tables which resolve to the same stripped character version still have different CSS classes. Some examples (extracted from the unit tests): ""simple"" => ""simple"" ""MixedCase"" => ""MixedCase"" ""-no-leading-hyphens"" => ""no-leading-hyphens-65bea6"" ""_no-leading-underscores"" => ""no-leading-underscores-b921bc"" ""no spaces"" => ""no-spaces-7088d7"" ""-"" => ""336d5e"" ""no $ characters"" => ""no--characters-59e024"" ",17608, 348245757,"It is now possible to over-ride templates on a per-database / per-row or per- table basis. When you access e.g. `/mydatabase/mytable` Datasette will look for the following: - table-mydatabase-mytable.html - table.html If you provided a `--template-dir` argument to datasette serve it will look in that directory first. The lookup rules are as follows: Index page (/): index.html Database page (/mydatabase): database-mydatabase.html database.html Table page (/mydatabase/mytable): table-mydatabase-mytable.html table.html Row page (/mydatabase/mytable/id): row-mydatabase-mytable.html row.html If a table name has spaces or other unexpected characters in it, the template filename will follow the same rules as our custom `` CSS classes introduced in 8ab3a16 - for example, a table called ""Food Trucks"" will attempt to load the following templates: table-mydatabase-Food-Trucks-399138.html table.html It is possible to extend the default templates using Jinja template inheritance. 
If you want to customize EVERY row template with some additional content you can do so by creating a `row.html` template like this: {% extends ""default:row.html"" %} {% block content %}

EXTRA HTML AT THE TOP OF THE CONTENT BLOCK
This line renders the original block:
{{ super() }} {% endblock %} ",17608, 348245843,"It is now possible to over-ride templates on a per-database / per-row or per- table basis. When you access e.g. `/mydatabase/mytable` Datasette will look for the following: - table-mydatabase-mytable.html - table.html If you provided a `--template-dir` argument to datasette serve it will look in that directory first. The lookup rules are as follows: Index page (/): index.html Database page (/mydatabase): database-mydatabase.html database.html Table page (/mydatabase/mytable): table-mydatabase-mytable.html table.html Row page (/mydatabase/mytable/id): row-mydatabase-mytable.html row.html If a table name has spaces or other unexpected characters in it, the template filename will follow the same rules as our custom `` CSS classes introduced in 8ab3a16 - for example, a table called ""Food Trucks"" will attempt to load the following templates: table-mydatabase-Food-Trucks-399138.html table.html It is possible to extend the default templates using Jinja template inheritance. If you want to customize EVERY row template with some additional content you can do so by creating a `row.html` template like this: {% extends ""default:row.html"" %} {% block content %}

EXTRA HTML AT THE TOP OF THE CONTENT BLOCK
This line renders the original block:
{{ super() }} {% endblock %} ",17608, 348248406,Remaining work on this now lives in a milestone: https://github.com/simonw/datasette/milestone/6,17608, 348248957,https://simonwillison.net/2017/Nov/25/new-in-datasette/,17608, 348252037,"WOW! -- Paul Ford // (646) 369-7128 // @ftrain On Thu, Nov 30, 2017 at 11:47 AM, Simon Willison wrote: > Remaining work on this now lives in a milestone: > https://github.com/simonw/datasette/milestone/6 > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > , > or mute the thread > > . > ",17608, 348255782,http://datasette.readthedocs.io/en/latest/custom_templates.html,17608, 348255925,Documentation is now live for this: http://datasette.readthedocs.io/en/latest/custom_templates.html,17608, 348404864,"Question is... what should happen to the default static stuff? At the moment that's just https://fivethirtyeight.datasettes.com/-/static/app.css - though I want to improve that to include a content hash, see #154 ",17608, 348404988,If I do add additional static file bundling should that automatically get content hashes as well? #160 - problem with that is then I might have to parse the CSS files and rewrite their internal background-url references etc.,17608, 348420129,"I've found some examples of canned queries I want to support that can't be represented as views, so I'm going to reopen this.",17608, 348420955,"I'll use the existing metadata.json file: { ""databases"": { ""mydb"": { ""queries"": { ""custom_thingy"": {... The query definition can either be just a string of SQL, or it can be an object with a sql key and optional title and description keys. ",17608, 348719680,"This is about more than just CSS and JavaScript - there are plenty of reasons someone might want to bundle HTML as well, e.g. for building something like https://sf-tree-search.now.sh/ So, instead of thinking about this in terms of /static/, I'm going to think about this in terms of allowing people to mount one or more document roots (or docroots). datasette serve mydb.db -d my-doc-root/ This will cause the root of the server to show content from the `my-doc-root/` directory (assuming it has an index.html file in it). A more common option will be to mount specific folders to specific directories, like this: datasette serve mydb.db -d static:my-static/ Now any hits to `/static/foo.css` will serve content from `my-static/foo.css`",17608, 348719752,Not sure which I like better out of `-d/--docroot` or `-s/--static` or `-m/--mount` for this.,17608, 348719827,`-m` is already taken for `--metadata`.,17608, 348793054,"You can now tell Datasette to serve static files from a specific location at a specific mountpoint. 
For example: datasette serve mydb.db --static extra-css:/tmp/static/css Now if you visit this URL: http://localhost:8001/extra-css/blah.css The following file will be served: /tmp/static/css/blah.css ",17608, 348793156,Still TODO: teach `datasette publish` and friends about this.,17608, 348860191,Seems like a reasonable thing for us to support.,17608, 348860623,"While I'm doing this, I could add per-database and per-table metadata too ala #68",17608, 349027974, This is also a good opportunity to re-factor out a separate query.html template - right now the database.html template is doing two jobs.,17608, 349047335,Turns out there's a bug in this: https://timezones-now-hrjgkinozh.now.sh/timezones-0d61a90/ElementaryGeometries should not be showing the search box.,17608, 349359498,"Named canned queries can now be defined in metadata.json like this: { ""databases"": { ""timezones"": { ""queries"": { ""timezone_for_point"": ""select tzid from timezones ..."" } } } } These will be shown in a new ""Queries"" section beneath ""Views"" on the database page. ",17608, 349383276,http://datasette.readthedocs.io/en/latest/sql_queries.html,17608, 349406761,Demo: https://timezones-api.now.sh/timezones-3cb9f64/by_point,17608, 349408214,I think `.json` should continue to return rows as list-of-lists - it's a nice default because it produces a smaller overall JSON file. Encouraging people to specify an alternative shape to get the current `.jsono` format feels appropriate.,17608, 349860851,"I'm testing this like so: datasette ~/Dropbox/Development/timezones-api/timezones.db --reload --load-extension /usr/local/lib/mod_spatialite.dylib ",17608, 349861461,"This query looks like it does the right thing: select * from sqlite_master where rootpage = 0 and ( sql like '%VIRTUAL TABLE%USING FTS%content=""ElementaryGeometries""%' or ( tbl_name = ""ElementaryGeometries"" and sql like '%VIRTUAL TABLE%USING FTS%' ) ) Against a table that should not be shown as FTS: https://timezones-now-hrjgkinozh.now.sh/timezones-0d61a90?sql=++++++++select+*+from+sqlite_master%0D%0A++++++++++++where+rootpage+%3D+0%0D%0A++++++++++++and+%28%0D%0A++++++++++++++++sql+like+%27%25VIRTUAL+TABLE%25USING+FTS%25content%3D%22ElementaryGeometries%22%25%27%0D%0A++++++++++++++++or+%28%0D%0A++++++++++++++++++tbl_name+%3D+%22ElementaryGeometries%22%0D%0A++++++++++++++++++and+sql+like+%27%25VIRTUAL+TABLE%25USING+FTS%25%27%0D%0A++++++++++++++++%29%0D%0A++++++++++++%29+ Against a table that SHOULD match: https://sf-trees.now.sh/sf-trees-ebc2ad9?sql=++++++++select+*+from+sqlite_master%0D%0A++++++++++++where+rootpage+%3D+0%0D%0A++++++++++++and+%28%0D%0A++++++++++++++++sql+like+%27%25VIRTUAL+TABLE%25USING+FTS%25content%3D%22Street_Tree_List_fts%22%25%27%0D%0A++++++++++++++++or+%28%0D%0A++++++++++++++++++tbl_name+%3D+%22Street_Tree_List_fts%22%0D%0A++++++++++++++++++and+sql+like+%27%25VIRTUAL+TABLE%25USING+FTS%25%27%0D%0A++++++++++++++++%29%0D%0A++++++++++++%29+",17608, 349868849,"I'm happy with this - we have extra_head, content, body_class and title blocks which should provide enough hooks for most reasonable customizations.",17608, 349874052,"In #159 I added a mechanism for easily customizing per-column displays, and I've added documentation showing an example of using this mechanism to set certain columns to display as unescaped HTML: http://datasette.readthedocs.io/en/latest/custom_templates.html#custom-templates This fixes item 3, so I'm closing this ticket!",17608, 349874709,"Example usage: datasette skeleton parlgov.db -m parlgov.json Generates a 
`parlgov.json` file containing this: { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null, ""databases"": { ""parlgov"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null, ""queries"": {}, ""tables"": { ""info_data_source"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""external_party_castles_mair"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""external_party_chess"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""external_party_huber_inglehart"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""info_table"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""external_party_euprofiler"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""party_family"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""info_id"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""sqlite_stat1"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""external_party_benoit_laver"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""external_country_iso"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""viewcalc_party_position"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""viewcalc_election_parameter"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""viewcalc_parliament_composition"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""viewcalc_country_year_share"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""election"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""politician_president"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""party_name_change"": { ""title"": null, ""description"": null, ""description_html"": 
null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""external_commissioner_doering"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""external_party_ray"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""party_change"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""cabinet_party"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""external_party_ees"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""party"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""external_party_cmp"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""country"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""cabinet"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""info_variable"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null }, ""election_result"": { ""title"": null, ""description"": null, ""description_html"": null, ""license"": null, ""license_url"": null, ""source"": null, ""source_url"": null } } } } } ",17608, 349874844,This metadata doesn't yet do anything - need to implement #165,17608, 350026183,"Here's an example metadata.json file illustrating custom per-database and per- table metadata: { ""title"": ""Overall datasette title"", ""description_html"": ""This is a description with HTML."", ""databases"": { ""db1"": { ""title"": ""First database"", ""description"": ""This is a string description & has no HTML"", ""license_url"": ""http://example.com/"", ""license"": ""The example license"", ""queries"": { ""canned_query"": ""select * from table1 limit 3;"" }, ""tables"": { ""table1"": { ""title"": ""Custom title for table1"", ""description"": ""Tables can have descriptions too"", ""source"": ""This has a custom source"", ""source_url"": ""http://example.com/"" } } } } }",17608, 350026452,"Needs documentation, see #166 ",17608, 350035741,"http://datasette.readthedocs.io/en/latest/metadata.html ",17608, 350108113,"It's not throwing the validation error anymore, but i still cannot run following with query: ``` WITH RECURSIVE cnt(x) AS (SELECT 1 UNION ALL SELECT x+1 FROM cnt LIMIT 10) SELECT x FROM cnt; ``` I got `near ""WITH"": syntax error`.",17608, 350125953,"My column/row HTML display logic has got way too convoluted. This is a sign I need to add proper unit tests for it and clean it up. 
The complexity comes from: * Displaying a rowid for tables that do not have a primary key * Showing an additional Link column for rows with a primary key * Not displaying that Link column on the individual row pages * Trying to get foreign keys working correctly in all cases, e.g. #152 ",17608, 350158037,That might mean your version of SQLite doesn't support that syntax. Unfortunately the version bundled with Python is a bit old - the one built by the Dockerfile in this repo should handle it though.,17608, 350182904,"You're right..got this resolved after upgrading the sqlite version. Thanks you!",17608, 350292364,"I can emulate this on OS X using a disk image (Disk Utility -> File -> New Image -> Blank Image...) - once mounted, I get the following: >>> os.link('/tmp/hello', '/Volumes/Untitled/hello') Traceback (most recent call last): File """", line 1, in OSError: [Errno 18] Cross-device link: '/tmp/hello' -> '/Volumes/Untitled/hello' I can simulate that in a mock like this: >>> from unittest.mock import patch >>> @patch('os.link') ... def test_link(mock_link): ... mock_link.side_effect = OSError ... mock_link() ... ",17608, 350301248,"This fix should work, please have a go with latest master and let me know if you run into any problems.",17608, 350302417,I think I'll do this as a custom Jinja template filter. That way template authors can re-use it for their own static files if they want.,17608, 350323722,If I do this as a querystring parameter I won't need to worry about URL routing.,17608, 350413422,https://github.com/channelcat/sanic/releases/tag/0.7.0,17608, 350421661,"Input: results from the database, foreign key definitions, primary key definitions, type of page Output: display_columns and display_rows",17608, 350424595,Perhaps the row.html and table.html templates should be passed the same data but should themselves decide if they will display the Link column ,17608, 350496258,"Example usage: datasette package --static css:extra-css/ --static js:extra-js/ \ sf-trees.db --template-dir templates/ --tag sf-trees --branch master This creates a local Docker image that includes copies of the templates/, extra-css/ and extra-js/ directories. You can then run it like this: docker run -p 8001:8001 sf-trees For publishing to Zeit now: datasette publish now --static css:extra-css/ --static js:extra-js/ \ sf-trees.db --template-dir templates/ --name sf-trees --branch master Example: https://sf-trees-wbihszoazc.now.sh/sf-trees-02c8ef1/Street_Tree_List For publishing to Heroku: datasette publish heroku --static css:extra-css/ --static js:extra-js/ \ sf-trees.db --template-dir templates/ --branch master ",17608, 350496277,"Example usage: datasette package --static css:extra-css/ --static js:extra-js/ \ sf-trees.db --template-dir templates/ --tag sf-trees --branch master This creates a local Docker image that includes copies of the templates/, extra-css/ and extra-js/ directories. 
You can then run it like this: docker run -p 8001:8001 sf-trees For publishing to Zeit now: datasette publish now --static css:extra-css/ --static js:extra-js/ \ sf-trees.db --template-dir templates/ --name sf-trees --branch master Example: https://sf-trees-wbihszoazc.now.sh/sf-trees-02c8ef1/Street_Tree_List For publishing to Heroku: datasette publish heroku --static css:extra-css/ --static js:extra-js/ \ sf-trees.db --template-dir templates/ --branch master ",17608, 350506593,Turns out this is already supported: https://github.com/simonw/datasette/blob/6bdfcf60760c27e29ff34692d06e62b36aeecc56/datasette/app.py#L307,17608, 350506751,"My mistake, that's using the database name - there isn't a way of customizing for a specific named query yet.",17608, 350507155," Canned query page (/mydatabase/canned-query): query-mydatabase-canned-query.html query-mydatabase.html query.html",17608, 350508049,"Quoting the new documentation: You can find out which templates were considered for a specific page by viewing source on that page and looking for an HTML comment at the bottom. The comment will look something like this: This example is from the canned query page for a query called ""tz"" in the database called ""mydb"". The asterisk shows which template was selected - so in this case, Datasette found a template file called `query-mydb-tz.html` and used that - but if that template had not been found, it would have tried for `query-mydb.html` or the default `query.html`.",17608, 350515616,This function signature is pretty gross: https://github.com/simonw/datasette/blob/7a7e4b2ed8c76c6d002a9d707dbc840f6a2abf7f/datasette/app.py#L418,17608, 350515985,"A better alternative: ```async def display_columns_and_rows(self, database, table, rows, link_column=False):```",17608, 350516782,I can simplify this all by dropping the nicety where if a table is using a rowid the Link column is titled rowid instead.,17608, 350519711,Done! https://github.com/simonw/datasette/releases/tag/0.14,17608, 350519736,@ftrain Datasette 0.14 is now released with all of the above: https://github.com/simonw/datasette/releases/tag/0.14,17608, 350519821,"Also worth mentioning: as of #160 and #157 the `datasette publish now`, `datasette publish heroku` and `datasette package` commands all know how to bundle up any `--static` or `--template-dir` content and include it in the Docker image / Heroku/Now deployment that gets generated.",17608, 350521619,I think the `datasette skeleton` command from #164 makes this obsolete.,17608, 350521635,I don't think this is necessary.,17608, 350521711,I fixed that last issue in c195ee4d46f2577b1943836a8270d84c8341d138,17608, 350521736,Heroku is now in the README as of 6bdfcf60760c27e29ff34692d06e62b36aeecc56,17608, 350521780,Won't fix - I think the custom templates and static stuff in https://github.com/simonw/datasette/releases/tag/0.14 renders this obsolete.,17608, 350521806,Implemented this in 80bf3afa43e3cb396c7a7c9b168eedbc6fe0fa15 and #165. Didn't use data package though.,17608, 350521853,I'm going to keep this separate in csvs-to-sqlite.,17608, 350527283,This is also really interesting when combined with the spatialite AsGeoJSON function: http://www.gaia-gis.it/gaia-sins/spatialite-sql-4.2.0.html#p3misc,17608, 353424169,Done - thanks for curating these: https://github.com/topics/automatic-api,17608, 355487646,"Ah, glad I found this issue. I have private data that I'd like to share to a few different people. 
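A framework-agnostic sketch of the shared-credentials / Basic Auth idea raised here - just the header check, with all names made up; wiring it into a request middleware is left to whichever server is in use:

```python
import base64
import hmac

def check_basic_auth(authorization_header, username, password):
    # authorization_header is the raw Authorization value,
    # e.g. "Basic dXNlcjpwYXNz" for user:pass.
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    expected = base64.b64encode(
        "{}:{}".format(username, password).encode("utf-8")
    ).decode("ascii")
    # compare_digest keeps the comparison constant-time
    return hmac.compare_digest(authorization_header[len("Basic "):], expected)
```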
Personally, a shared username and password would be sufficient for me, more-or-less Basic Auth. Do you have more complex requirements in mind? I'm not sure if ""plugin"" means ""build a plugin"" or ""find a plugin"" or something else entirely. FWIW, I stumbled upon [sanic-auth](https://github.com/pyx/sanic-auth) which looks like a new project to bring some interfaces around auth to sanic, similar to Flask. Alternatively, it shouldn't be too bad to add in Basic Auth. If we went down that route, that would probably be best built as a separate package for sanic that `datasette` brings in. What are your thoughts around this?",17608, 356115657,"This project probably would not be the place for that. This is a layer for sqllite specifically. It solves a similar problem as graphql, so adding that here wouldn't make sense. Here's an example i found from google that uses micro to run a graphql microservice. you'd just then need to connect your db. https://github.com/timneutkens/micro-graphql",17608, 356161672,"@wulfmann I think I disagree, except I'm not entirely sure what you mean by that first paragraph. The JSON API that Datasette currently exposes is quite different to GraphQL. Furthermore, there's no ""just"" about connecting micro-graphql to a DB; at least, no more ""just"" than adding any other API. You still need to configure the schema, which is exactly the kind of thing that Datasette does for JSON API. This is why I think that GraphQL's a good fit here.",17608, 356175667,"@yozlet Yes I think that I was confused when I posted my original comment. I see your main point now and am in agreement. ",17608, 357542404,"Thanks for catching this, merged!",17608, 359697938,👍 I'd like this too! ,17608, 360535979,"To summarise that thread: - expose full `metadata.json` object to the index page template, eg to allow tables to be referred to by name; - ability to import multiple `metadata.json` files, eg to allow metadata files created for a specific SQLite db to be reused in a datasette referring to several database files; It could also be useful to allow users to import a python file containing custom functions that can that be loaded into scope and made available to custom templates. ",17608, 368625350,great idea!,17608, 370273359,"Are you talking specifically about accessing metadata from HTML templates? That makes a lot of sense, I'll think about how this could work.",17608, 370461231,"Yes. I think the simplest implementation is to change lines like ```python metadata = self.ds.metadata.get('databases', {}).get(name, {}) ``` to ```python metadata = { **self.ds.metadata, **self.ds.metadata.get('databases', {}).get(name, {}), } ``` so that specified inner values overwrite outer values, but only if they exist.",17608, 374810115,"Hah, this is exactly the opposite of datasette's default approach to caching, which is to cache everything for as long as possible. I don't think we'll need to add `Cache-Control: no-cache` headers provided we instead set it up so you can turn off Datasette's caching.",17608, 374811114,"We actually have this already: https://github.com/simonw/datasette/blob/012fc7c5cd3e9160c9a4c19cc964253e97fb054a/datasette/cli.py#L253-L255 You can disable the cache headers using the `datasette --debug` option.",17608, 374872202,--debug is perfect tnk,17608, 376585911,"OK, I have an implementation of this. 
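A minimal sketch, with made-up helper names, of the selective inheritance described here: source/license keys cascade down from the top level, while title and description only apply at the level where they are set:

```python
INHERITED_KEYS = {"source", "source_url", "license", "license_url"}

def resolve_metadata(root, database=None, table=None):
    # Walk root -> database -> table, letting only INHERITED_KEYS cascade.
    levels = [root]
    if database is not None:
        levels.append(root.get("databases", {}).get(database, {}))
    if table is not None:
        levels.append(levels[-1].get("tables", {}).get(table, {}))
    resolved = {}
    deepest = levels[-1]
    for level in levels:
        for key, value in level.items():
            if key in ("databases", "tables"):
                continue
            if key in INHERITED_KEYS or level is deepest:
                resolved[key] = value
    return resolved
```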
I realised that not ALL metadata should be inherited: it makes sense for source/source_url/license/license_url to be inherited, but it doesn't make sense for the title and description to be inherited down to the individual databases and tables.",17608, 376587017,One thing that's missing from this: if you set source/license data at the individual database level they should be inherited by tables within that database.,17608, 376589591,"Also needed: the ability to unset metadata. If the root metadata specifies a license_url it should be possible to set ""license_url"": null on a child database or table. The current implementation will ignore null (or empty string) values and default to the top level value. I think the templates themselves should be able to indicate if they want the inherited values or not. That way we could support arbitrary key/values and avoid the application code having special knowledge of license_url etc.",17608, 376590265,">I think the templates themselves should be able to indicate if they want the inherited values or not. That way we could support arbitrary key/values and avoid the application code having special knowledge of license_url etc. Yes, you could have `metadata` that works like `metadata` does currently and `inherited_metadata` that works with inheritance.",17608, 376592044,"It would be nice to also allow arbitrary keys (maybe under a parent key called params or something to prevent conflicts). For our datasette project, we just have a bunch of dictionaries defined in the base template for things like site URL and column humanized names: https://github.com/baltimore-sun-data/salaries-datasette/blob/master/templates/base.html It would be cleaner if this were in the metadata.json.",17608, 376594727,"One point of complexity: datasette can be used to bundle multiple .db files into a single ""app"". I think that's OK. We could require that the `datasette_files` table is present in the first database file passed on the command-line. Or we could even construct a search path and consult multiple versions of the table spread across multiple files. That said... any configuration that corresponds to a specific table should live in the same database file as that table. Ditto for general metadata: if we have license/source information for a specific table or database that information should be able to live in the same .db file as the data.",17608, 376604558,"I am SO inspired by what you've done with https://salaries.news.baltimoresun.com/ - that's pretty much my ideal use-case for Datasette, and it's by far the most elaborate customization I've seen so far. I'd love to hear other ideas that came up while building that.",17608, 376614973,"@simonw Other than metadata, the biggest item on wishlist for the salaries project was the ability to reorder by column. Of course, that could be done with a custom SQL query, but we didn't want to have to reimplement all the nav/pagination stuff from scratch. @carolinp, feel free to add your thoughts. ",17608, 376981291,"Given how unlikely it is that this will pose a real problem I think I like option 1: enable sort-by-column by default for all tables, then allow power users to instead switch to explicit enabling of the functionality in their `metadata.json` if they know their data is too big.",17608, 376983741,"I think this can work with a `?_sort=xxx` parameter - and `?_sort=-xxx` to sort in the opposite direction. I'd like to support ""sort by X descending, then by Y ascending if there are dupes for X"" as well. 
Two ways that could work: `?_sort=-xxx,yyy` Or... `?_sort=-xxx&_sort=yyy` The second option is probably better in that it makes it easier for columns to have a comma in their name. Is it possible for a SQLite column to start with a `-` character?",17608, 376986668,"Might have to do something special to get sort-by-nulls-last: https://stackoverflow.com/questions/12503120/how-to-do-nulls-last-in-sqlite order by ifnull(column_name, -999999) Would need to figure out a smart way to get the default value - maybe by running a min() or max() against the column first?",17608, 377049625,"This is a better pattern as you don't have to pick a minimum value: ORDER BY CASE WHEN SOMECOL IS NULL THEN 1 ELSE 0 END, SOMECOL",17608, 377050461,"I think there are actually four kinds of sort order we need to support; * ascending * descending * ascending, nulls last * descending, nulls last It looks like [-blah] is a valid SQLite table name, so mark I descending with a hyphen prefix isn't good. Instead, maybe this: ?_sort_asc=col1&_sort_desc_nulls_last=col2 ",17608, 377051018,"I'd like to continue to support _next=token pagination even for custom sort orders. To do that I should include rowid (or general primary key) as the tie breaker on all sorts so I can incorporate that it into the _next= token.",17608, 377052634,"In terms of user interface: the obvious place to put this is as a drop down menu on the column headers. This also means the UI can support combined sort orders. Assuming you are already sorted by county descending and you select the candidate column header, the options could be: * sort all by candidate * sort all by candidate, descending * sort by county descending, then by candidate * sort by county descending, then by candidate descending",17608, 377054358,I'm tempted to put these verbose sorting options inline in the page HTML but have them in the table footer so they don't clog up the top half of the page with uninteresting links - then use JavaScript to hoik them out into a dropdown menu attached to each column header.,17608, 377055663,"There is one other interesting option for auto-enabling/disabling sort: the inspect command could include data about column index presence and whether or not a column has any null values in it. This would allow us to dynamically include a ""nulls last"" option but only for columns that contain at least one null. It's quite a lot of additional engineering for a very minor feature though, so I think I'll punt on that for the moment. We may find that the _group_count feature can benefit from column value statistics later on though.",17608, 377065541,"This is because the SQL we are using here is: select * from compound_primary_key where ""pk1"" > ""d"" and ""pk2"" > ""v"" order by pk1, pk2 limit 101 This is incorrect. The correct SQL syntax (according to the example on https://www.sqlite.org/rowvalue.html#scrolling_window_queries ) is: select * from compound_primary_key where (""pk1"", ""pk2"") > (""d"", ""v"") order by pk1, pk2 limit 101 BUT... 
this uses ""row values"" syntax which was only added to SQLite in version 3.15.0 in October 2016: https://sqlite.org/changes.html#version_3_15_0 The version on https://datasette-issue-190-compound-pks.now.sh/compound-pks-9aafe8f?sql=select+sqlite_version%28%29%3B is 3.8.7.1 from October 2014.",17608, 377066466,"Without row values syntax, the necessary SQL to retrieve the next page after `d, v` gets a bit gnarly: select * from compound_primary_key where pk1 >= ""d"" and not (pk1 = ""d"" and pk2 <= ""v"") order by pk1, pk2 See https://datasette-issue-190-compound-pks.now.sh/compound-pks-9aafe8f?sql=select+*+from+compound_primary_key+where+pk1+%3E%3D+%22d%22+and+not+%28pk1+%3D+%22d%22+and+pk2+%3C%3D+%22v%22%29+order+by+pk1%2C+pk2 This article was useful for figuring this out: https://use-the-index-luke.com/sql/partial-results/fetch-next-page",17608, 377067541,"Here's how I generated the table for testing this with 3 compound primary keys: CREATE_SQL = ''' CREATE TABLE compound_three_primary_keys ( pk1 varchar(30), pk2 varchar(30), pk3 varchar(30), content text, PRIMARY KEY (pk1, pk2, pk3) );''' alphabet = 'abcdefghijklmnopqrstuvwxyz' for a in alphabet: for b in alphabet: for c in alphabet: print(''' INSERT INTO compound_three_primary_keys VALUES ('{}', '{}', '{}', '{}'); '''.strip().format(a, b, c, '{}-{}-{}-{}-{}-{}'.format(a,b,c,a,b,c))) ",17608, 377072022,"Here's the SQL for a next page with three compound primary keys: https://datasette-issue-190-compound-pks.now.sh/compound-pks-8e99805?sql=select+*+from+compound_three_primary_keys%0D%0Awhere%0D%0A++%28pk1+%3E+%3Apk1%29%0D%0A++++or%0D%0A++%28pk1+%3D+%3Apk1+and+pk2+%3E+%3Apk2%29%0D%0A++++or%0D%0A++%28pk1+%3D+%3Apk1+and+pk2+%3D+%3Apk2+and+pk3+%3E+%3Apk3%29%0D%0Aorder+by+pk1%2C+pk2%2C+pk3%3B%0D%0A%0D%0A%0D%0A&pk1=a&pk2=d&pk3=v ``` select * from compound_three_primary_keys where (pk1 > :pk1) or (pk1 = :pk1 and pk2 > :pk2) or (pk1 = :pk1 and pk2 = :pk2 and pk3 > :pk3) order by pk1, pk2, pk3; ```",17608, 377362466,"Alternative idea: by default enable all sorting in the UI. If a table has more than 100,000 rows disable sorting UI except for columns that have an index. Allow this to be overridden in metadata.json ",17608, 377454591,"Re-opening this issue: my fix doesn't play nicely with extra filter arguments. Consider this page: https://datasette-issue-190-compound-pks-not-quite-fixed.now.sh/compound-pks-8e99805/compound_three_primary_keys?content__contains=d The next link is to `?_next=f%2Cz%2Ct&content__contains=z` (that's next of `f,z,t`) but that gives us https://datasette-issue-190-compound-pks-not-quite-fixed.now.sh/compound-pks-8e99805/compound_three_primary_keys?_next=b%2Cx%2Cd&content__contains=d which shows `a,a,d` at the top. Sure enough, the generated SQL looks like this: https://datasette-issue-190-compound-pks-not-quite-fixed.now.sh/compound-pks-8e99805?sql=select+%2A+from+compound_three_primary_keys+where+%22content%22+like+%3Ap0+and+%28%5Bpk1%5D+%3E+%3Ap0%29%0A++or%0A%28%5Bpk1%5D+%3D+%3Ap0+and+%5Bpk2%5D+%3E+%3Ap1%29%0A++or%0A%28%5Bpk1%5D+%3D+%3Ap0+and+%5Bpk2%5D+%3D+%3Ap1+and+%5Bpk3%5D+%3E+%3Ap2%29+order+by+pk1%2C+pk2%2C+pk3+limit+101&p0=%25d%25&p1=b&p2=x&p3=d select * from compound_three_primary_keys where ""content"" like :p0 and ([pk1] > :p0) or ([pk1] = :p0 and [pk2] > :p1) or ([pk1] = :p0 and [pk2] = :p1 and [pk3] > :p2) order by pk1, pk2, pk3 limit 101 The parameters here are confused. 
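A hypothetical sketch of building that compound-key next-page clause without row-values syntax, numbering parameters from an offset so they cannot collide with parameters already used by filter clauses:

```python
def compound_keyset_sql(pk_columns, next_values, param_offset=0):
    # Produces ([pk1] > :pN) or ([pk1] = :pN and [pk2] > :pN+1) or ...
    assert len(pk_columns) == len(next_values)
    or_clauses = []
    params = {}
    for i, column in enumerate(pk_columns):
        and_parts = [
            "[{}] = :p{}".format(pk_columns[j], param_offset + j)
            for j in range(i)
        ]
        and_parts.append("[{}] > :p{}".format(column, param_offset + i))
        or_clauses.append("({})".format(" and ".join(and_parts)))
    for i, value in enumerate(next_values):
        params["p{}".format(param_offset + i)] = value
    return "({})".format(" or ".join(or_clauses)), params

# With param_offset=1 the keyset parameters become :p1, :p2, :p3,
# leaving :p0 free for the content__contains LIKE clause.
```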
The :p0 should be reserved just for the like clause - the other parameters should be p1, p2 and p3 (not p0, p1 and p2).",17608, 377457087,"Interestingly, in deploying a copy of the database to demonstrate this final bug fix I had to use the `--force` argument like so: datasette publish now --branch=master compound-pks.db --force This is because `now` had already deployed a Dockerfile referencing `--branch=master` once already, so it thought nothing had changed and it could re-use that last deployment.",17608, 377457214,"Fixed! https://datasette-issue-190-compound-pks-second-fix.now.sh/compound-pks-8e99805/compound_three_primary_keys?_next=b%2Cx%2Cd&content__contains=d now correctly shows `b,y,d` as the first row on the page.",17608, 377459579,"I'm not entirely sure how to get `_next=` pagination working against sorted collections when a tie-breaker is needed. Consider this data: https://fivethirtyeight.datasettes.com/fivethirtyeight-2628db9?sql=select+rowid%2C+*+from+%5Bnfl-wide-receivers%2Fadvanced-historical%5D%0D%0Aorder+by+case+when+career_ranypa+is+null+then+1+else+0+end%2C+career_ranypa%2C+rowid+limit+11 ![2018-03-29 at 11 46 pm](https://user-images.githubusercontent.com/9599/38127549-790c8bd0-33ab-11e8-8d32-66f5d3847c8a.png) If the page size was set to 9 rather than 11, the page divide would be between those two rows with the same value in the `career_ranypa` column. What would the `?_next=` token look like such that the correct row would be returned? ",17608, 377460127,"The problem is that our `_next=` pagination currently works based on a `>` - but for this case a `>=` for the value is needed combined with a `>` on the tie-breaker (which would be the `rowid` column). So I think this is the right SQL: ``` select rowid, * from [nfl-wide-receivers/advanced-historical] where career_ranypa >= -6.331167749 and rowid > 2736 order by case when career_ranypa is null then 1 else 0 end, career_ranypa, rowid limit 11 ``` https://fivethirtyeight.datasettes.com/fivethirtyeight-2628db9?sql=select+rowid%2C+*+from+%5Bnfl-wide-receivers%2Fadvanced-historical%5D%0D%0Awhere+career_ranypa+%3E%3D+-6.331167749+and+rowid+%3E+2736%0D%0Aorder+by+case+when+career_ranypa+is+null+then+1+else+0+end%2C+career_ranypa%2C+rowid+limit+11 But how do I encode a `_next` token that means "">= X and > Y""?",17608, 377462334,"Maybe the answer here is that anything that's encoded in the next token is treated as >= with the exception of columns known to be primary keys, which are treated as >",17608, 377546510,"Pushed some work-in-progress with failing unit tests here: https://github.com/simonw/datasette/commit/2f8359c6f25768805431c80c74e5ec4213c2b2a6 Here's a demo: https://datasette-column-sort-wip.now.sh/sortable-4bbaa6f/sortable?_sort=sortable - note that the `_sort_desc` and `_sort_nulls_last` options aren't done yet, plus it doesn't correctly paginate (the `_next` tokens do not yet take sorting into account).",17608, 377547265,"I think this is the right incantation for a ""next"" link: https://datasette-column-sort-wip.now.sh/sortable-4bbaa6f?sql=select+*+from+sortable%0D%0Awhere+sortable+%3C%3D+94%0D%0Aand+%28%0D%0A++%28pk1+%3E+%27d%27%29%0D%0A++or%0D%0A++%28pk1+%3D+%27d%27+and+pk2+%3E+%27w%27%29%0D%0A%29%0D%0Aorder+by+sortable+desc%2C+pk1%2C+pk2%0D%0Alimit+7",17608, 378279612,The new documentation for the `_shape=` parameter is now live at http://datasette.readthedocs.io/en/latest/json_api.html,17608, 378281740,"I'm having trouble replicating this bug. 
In particular, I don't understand what you mean by ""these are then rendered in the datasette query box using single quotes"" - since canned queries aren't displayed in a textarea. Do you have an example database / metadata.json I can use to investigate this further?",17608, 378293484,"Here's what this looks like: ![2018-04-03 at 8 32 am](https://user-images.githubusercontent.com/9599/38259345-9e1c75ea-3719-11e8-83c9-2160c6fa079c.png) I need to figure out the right way to handle licensing of bundled software like this - it's MIT licensed which is compatible with Datasette's Apache 2 license, but I feel like bundled licensed software (including codemirror) needs to be recognized in the README or docs somehow.",17608, 378293599,"Let's only show the ""Format SQL"" button if the user has JavaScript enabled. We can do that in this code here: https://github.com/bsmithgall/datasette/blob/4a7151a58d6ab7c8404a91beef7083e8a5807cf8/datasette/templates/_codemirror_foot.html#L14-L21",17608, 378295376,"On the licensing front: it looks like the way Django handles this is to keep the licensing header in the files intact, e.g. https://github.com/django/django/blob/6deaddcca367d0143c815aaa42342021baa3b41e/django/contrib/admin/static/admin/js/vendor/jquery/jquery.js So for this change, adding a comment at the top of `sql-formatter.min.js` which references the MIT license would do the trick.",17608, 378297842,I can work on that -- would you prefer to inline a `display: hidden` and then have the javascript flip the visibility or include it as css?,17608, 379142500,"You could try pulling out a validate query strings method. If it fails validation build the error object from the message. If it passes, you only need to go down a happy path. ",17608, 379555484,I'm going to combine the code for explicit sorting with the existing code for _next= pagination - so even tables without an explicit sort order will run through the same code since they are ordered and paginated by primary key.,17608, 379556637,It would be useful to have a microbenchmark in place to help understand how much of a performance benefit this would actually provide.,17608, 379556774,"A common problem with keyset pagination is that it can distort the ""total number of rows"" logic - every time you navigate to a further page the total rows count can decrease due to the extra arguments in the `where` clause. The `filtered_table_rows` value (see #194) calculated using `count_sql` currently has this problem.",17608, 379556881,`table_rows_count` is always the *total* number of rows in the table. ,17608, 379556981,Maybe `table_rows_filtered_count` would be more aesthetically pleasing than `filtered_table_rows_count`.,17608, 379557743,https://github.com/simonw/datasette/blob/446d47fdb005b3776bc06ad8d1f44b01fc2e938b/datasette/app.py#L93-L102,17608, 379557982,"A note about views: a view cannot be paginated using keyset pagination because records returned from a view don't have a primary key - so there's no way to reliably distinguish between _next= records when the sorted column has duplicates with the same value. Datasette already takes this into account: views are paginated using offset/limit instead. We can continue to do that even for views that have been sorted using a `_sort` parameter. 
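For tables that do use keyset pagination with an explicit sort, a sketch of the tie-breaker condition discussed above (placeholder names, not the actual implementation):

```python
def sorted_keyset_clause(sort_column, pk_column="rowid"):
    # Next-page condition: either a strictly greater sort value, or the
    # same sort value with a greater primary key / rowid as tie-breaker.
    return (
        "([{sort}] > :sort_value or "
        "([{sort}] = :sort_value and [{pk}] > :pk_value))"
    ).format(sort=sort_column, pk=pk_column)
```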
",17608, 379559074,"While I'm at it, doing the same thing for fts_table detection is worth considering: https://github.com/simonw/datasette/blob/446d47fdb005b3776bc06ad8d1f44b01fc2e938b/datasette/app.py#L598-L603",17608, 379559214,The single biggest challenge here is expanding foreign key references. This is the blocker that prevents `_group_count` from being useful at the moment.,17608, 379559319,"From a code point of view, the current mechanism for `_group_count` makes the `TableView` even **more** complicated: https://github.com/simonw/datasette/blob/446d47fdb005b3776bc06ad8d1f44b01fc2e938b/datasette/app.py#L644-L653 Instead, I think if `_group_count` is detected we should generate the SQL and then defer to `self.custom_sql`, like we do for canned queries: https://github.com/simonw/datasette/blob/446d47fdb005b3776bc06ad8d1f44b01fc2e938b/datasette/app.py#L539-L541",17608, 379588602,"Could also identify all views for that database, which would save on these queries: https://github.com/simonw/datasette/blob/b2188f044265c95f7e54860e28107c17d2a6ed2e/datasette/app.py#L543-L545",17608, 379591062,"To break this up into smaller units, the first implementation of this will only support a single `_sort` or `_sort_desc` querystring parameter.",17608, 379592393,"Actually next page SQL when sorting looks more like this: ``` select rowid, * from [alcohol-consumption/drinks] where ""country"" like :p0 and ( beer_servings > 111 or (beer_servings = 111 and rowid > 190) ) order by beer_servings, rowid limit 101 ``` The next page after row 190 with sortable value 111 should show either records that are greater than 111 or records that match 111 but have a greater primary key than the last one seen. https://fivethirtyeight.datasettes.com/fivethirtyeight-2628db9?sql=select+rowid%2C+*+from+%5Balcohol-consumption%2Fdrinks%5D%0D%0Awhere+%22country%22+like+%3Ap0%0D%0Aand+%28%0D%0A++++beer_servings+%3E+111%0D%0A++++or+%28beer_servings+%3D+111+and+rowid+%3E+190%29%0D%0A%29%0D%0Aorder+by+beer_servings%2C+rowid+limit+101&p0=%25a%25",17608, 379594529,"Demo: senator tweets ordered by number of replies: https://datasette-issue-189-demo.now.sh/fivethirtyeight-2628db9/twitter-ratio%2Fsenators?_sort_desc=replies Page 2 (note that since Senators retweet things there are tweets with the same text/number-of-replies but retweeted by different senators that span the page break): https://datasette-issue-189-demo.now.sh/fivethirtyeight-2628db9/twitter-ratio%2Fsenators?_next=8556%2C121799&_sort_desc=replies ",17608, 379595253,@carlmjohnson in case you aren't following along with #189 I've shipped the first working prototype of sort-by-column - you can try it out here: https://datasette-issue-189-demo-2.now.sh/salaries-7859114-7859114/2017+Maryland+state+salaries?_search=university&_sort_desc=annual_salary,17608, 379595274,"Another demo: https://datasette-issue-189-demo-2.now.sh/salaries-7859114-7859114/2017+Maryland+state+salaries?_search=university&_sort_desc=annual_salary https://datasette-issue-189-demo-2.now.sh/salaries-7859114-7859114/2017+Maryland+state+salaries?_search=university&last_name__exact=JOHNSON&_sort_desc=annual_salary",17608, 379602339,"Small bug: ""201 rows where sorted by sortable_with_nulls"" shouldn't have the word ""where"" in it.",17608, 379602690,"I'm going to split the following out into separate tickets: * Ability to sort by multiple columns e.g. `?_sort=name&sort_desc=age&_sort=height` * Ability to specify nulls last e.g. 
`?_sort_desc_nulls_last=age`",17608, 379603156,"Actually I think I always want nulls last when ordering asc, nulls first when ordering desc.",17608, 379608977,"Here's a demo of the new clickable column headers: https://datasette-issue-189-demo-3.now.sh/salaries-7859114-7859114/2017+Maryland+state+salaries?_search=university&_sort_desc=last_name ![2018-04-08 at 7 22 pm](https://user-images.githubusercontent.com/9599/38476370-3e62a60e-3b62-11e8-9d30-8dc6608133dd.png) ",17608, 379624163,"This is harder than I thought, because the `_shape=` logic actually runs AFTER the main block of code which is set up to catch exceptions - this code here: https://github.com/simonw/datasette/blob/0abd3abacb309a2bd5913a7a2df4e9256585b1bb/datasette/app.py#L200-L216",17608, 379634425,I've merged this into master.,17608, 379636068,Do you have steps to reproduce here - ideally a small example SQLite database that exhibits the error?,17608, 379636695,"I'd prefer to have the JavaScript actually manipulate the DOM to add the button - something like this: var button = document.createElement('button'); button.value = 'Format SQL'; button.addEventListener( 'click', format, false ); document.getElementById('run-sql').parentNode.appendChild(button);",17608, 379759875,I've implemented that approach in 86ac746. It does cause the button to pop in only after Codemirror is finished rendering which is a bit awkward.,17608, 379788103,Visit https://salaries.news.baltimoresun.com/salaries/bad-table.,17608, 379791047,Awesome!,17608, 379803864,This is now released in Datasette 0.15 https://github.com/simonw/datasette/releases/tag/0.15,17608, 379830529,Another demo: https://fivethirtyeight.datasettes.com/fivethirtyeight-2628db9/congress-age%2Fcongress-terms,17608, 379833216,I may do this by adding select boxes for _sort and _sort_desc to the filters UI. This would allow sorting in mobile portrait mode but would also ensure that the existing sort order is persisted if the user edits the current filters (right now sort resets when filters are applied).,17608, 379833481,"Since you can't apply `_sort` and `_sort_desc` at the same time, maybe just one select box for picking the column to sort by and a boolean checkbox for ""sort descending"" - which then redirects to the `_sort_desc=` URL variant.",17608, 379936068,"![2018-04-09 at 5 32 pm](https://user-images.githubusercontent.com/9599/38529802-fd2a7e68-3c1b-11e8-974a-bf5438fec701.png) ",17608, 379936832,Demo: https://fivethirtyeight.datasettes.com/fivethirtyeight-2628db9/twitter-ratio%2Fsenators?_sort_desc=replies&text__contains=bipartisan,17608, 380606998,"We should only do this if we're certain the spatialite module has been loaded. I could imagine someone having a `sql_statements_log` table of their own without using spatialite for example. I think the most reliable way to detect spatialite is to run `SELECT AddGeometryColumn(1, 2, 3, 4, 5);` against a `:memory:` database and see if it throws an exception - similar to how we detect FTS. We could add this as a `detect_spatialite()` function in `utils.py` and call it once on startup.",17608, 380608340,"Yuck, nasty - OK I get it, this happens with ANY non-existent table name. Let's fix that - these should clearly return an HTTP 404.",17608, 380608372,"> I think the most reliable way to detect spatialite is to run `SELECT AddGeometryColumn(1, 2, 3, 4, 5);` against a `:memory:` database and see if it throws an exception Or just see if there's a `geometry_columns` table? 
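That table-presence check could look something like this - a sketch, assuming the detect_spatialite() helper name proposed above:

```python
def detect_spatialite(conn):
    # SpatiaLite creates a geometry_columns table when it initializes a
    # database, so its presence suggests spatial metadata is available.
    rows = conn.execute(
        "select 1 from sqlite_master where name = 'geometry_columns'"
    ).fetchall()
    return len(rows) > 0
```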
I think that's quite unlikely to be added by accident (and it's an OGC standard). It also tells you if Spatialite is installed in the database rather than just loaded.",17608, 380619851,I can clean this up further with the mechanism I'm using for #184,17608, 380951474,"Nice, thanks very much.",17608, 380951815,I like this. I'd like to be able to attach a full description to a column as well. We could support these in `metadata.json`,17608, 380951920,This also feeds into the visualization features I want to add - we could use this kind of metadata to automatically apply meaningful labels to graphs.,17608, 380966565,"Looks like [pint](https://pint.readthedocs.io/en/latest/tutorial.html) is pretty good at this. ```python In [1]: import pint In [2]: ureg = pint.UnitRegistry() In [3]: q = 3e6 * ureg('Hz') In [4]: '{:~P}'.format(q.to_compact()) Out[4]: '3.0 MHz' In [5]: q = 0.3 * ureg('m') In [5]: '{:~P}'.format(q.to_compact()) Out[5]: '300.0 mm' In [6]: q = 5 * ureg('') In [7]: '{:~P}'.format(q.to_compact()) Out[7]: '5' ```",17608, 381220441,I'm afraid I've just made this obsolete with 9f28bbe43dc277a3963a12aaae37b5ee3c277207,17608, 381237440,I spotted you'd mentioned that in #184 but only after I'd written the patch!,17608, 381262824,"Demo: https://fivethirtyeight.datasettes.com/fivethirtyeight-2628db9?sql=explain+query+plan+select+*+from+%5Bmost-common-name%2Fsurnames%5D+order+by+rank+desc https://fivethirtyeight.datasettes.com/fivethirtyeight-2628db9?sql=explain+select+*+from+%5Bmost-common-name%2Fsurnames%5D+order+by+rank+desc",17608, 381300336,"This is really cool - I'm very impressed by pint. I'd like to figure out a sensible opt-in way to expose this in the JSON output as well. Maybe with a `&_units=true` parameter? We should definitely expose the units section from the table metadata in the output of https://wtr-api.herokuapp.com/wtr-663ea99/license_frequency.json",17608, 381300386,"In #204 you said ""I'd like to add support for using units when querying but this is PR is pretty usable as-is."" - I'm fascinated to hear more about how this could work.",17608, 381315675,"> I'd like to figure out a sensible opt-in way to expose this in the JSON output as well. Maybe with a &_units=true parameter? From a machine-readable perspective I'm not sure why it would be useful to decorate the values with units. Edit: Should have had some coffee first. It's clearly useful for stuff like map rendering! I agree that the unit metadata should definitely be exposed in the JSON. > In #204 you said ""I'd like to add support for using units when querying but this is PR is pretty usable as-is."" - I'm fascinated to hear more about how this could work. I'm thinking about a couple of approaches here. 
I think the simplest one is: if the column has a unit attached, optionally accept units in query fields: ```python column_units = ureg(""Hz"") # Create a unit object for the column's unit query_variable = ureg(""4 GHz"") # Supplied query variable # Now we can convert the query units into column units before querying supplied_value.to(column_units).magnitude > 4000000000.0 # If the user doesn't supply units, pint just returns the plain # number and we can query as usual assuming it's the base unit query_variable = ureg(""50"") query_variable > 50 isinstance(query_variable, numbers.Number) > True ``` This also lets us do some nice unit conversion on querying: ```python column_units = ureg(""m"") query_variable = ureg(""50 ft"") supplied_value.to(column_units) > ``` The alternative would be to provide a dropdown of units next to the query field (so a ""Hz"" field would give you ""kHz"", ""MHz"", ""GHz""). Although this would be clearer to the user, it isn't so easy - we'd need to know more about the context of the field to give you sensible SI prefixes (I'm not so interested in nanoHertz, for example). You also lose the bonus of being able to convert - although pint will happily show you all the compatible units, it again suffers from a lack of context: ```python ureg(""m"").compatible_units() > frozenset({, , , , , , , , , , , }) ```",17608, 381330075,"Presumably units only work for numeric fields? If that's the case then automatically processing them if the incoming query string argument has a unit suffix makes total sense to me. Here's a pretty crazy idea: what if we exposed unit conversion to SQL as a custom SQLite function? That way it would be possible to optionally use units in actual custom SQL queries. I'd have to think quite carefully about performance implications here - wouldn't want a poorly considered unit calculation over a 500,000 row table to lock up the server. But I think the 1s query time limit might still prevent that.",17608, 381330220,This looks great so far - love the new documentation. Let's throw in a unit test or two for the basic unit filters (mainly as a protection against accidental regressions in the future).,17608, 381332222,I've added some tests and that docs link.,17608, 381334973,I'm going to merge this and then add a unit test.,17608, 381336696,I merged this to master in c857608738d6b6c3e4f3248304a22f8b2648dd3e - thanks @russss!,17608, 381348849,I think I'm going to hold on to the custom sql function idea for the moment and implement it as an example plugin.,17608, 381361734,"FWIW I am now doing this on my WTR app (instead of silently limiting maps to 1000). [Telefonica](https://wtr-api.herokuapp.com/wtr-663ea99/licensee/18325) now has about 4000 markers and good old [BT](https://wtr-api.herokuapp.com/wtr-663ea99/licensee/8412) has 22,000 or so.",17608, 381429213,"I think I found a bug. I tried to sort by middle initial in my salaries set, and many middle initials are null. 
The next_url gets set by Datasette to: http://localhost:8001/salaries-d3a5631/2017+Maryland+state+salaries?_next=None%2C391&_sort=middle_initial But then `None` is interpreted literally and it tries to find a name with the middle initial ""None"" and ends up skipping ahead to O on page 2.",17608, 381441392,"I suspected this would cause some test failures, but I'll wait for opinions before attempting to fix them.",17608, 381442233,"I started a thread on Twitter asking people for good examples of Python projects with a strong plugin ecosystem: https://twitter.com/simonw/status/985377670388105216 The most impressive example that came back was pytest - which now has nearly 400 plugins: https://plugincompat.herokuapp.com/ The pytest plugin infrastructure is available as an independent package called pluggy - which appears to offer everything I need for Datasette. I'm going to give that a go and see how well it works: https://pluggy.readthedocs.io/en/latest/",17608, 381442494,"Datasette 1.0 will be the release of Datasette that attempts to provide a stable plugin API: https://github.com/simonw/datasette/milestone/7 There's a lot of work to be done before then, but as a starting point I'm going to support two very simple extension mechanisms: * Template system plugins - where the hook gets passed the Jinja environment and can freely register new template tags and filters * SQLite connection plugins - where the hook gets passed a new SQLite connection and can register custom SQLite functions The template system hook will go near here: https://github.com/simonw/datasette/blob/efbb4e83374a2c795e436c72fa79f70da72309b8/datasette/app.py#L1225-L1228 The SQLite connection hook will go near here: https://github.com/simonw/datasette/blob/efbb4e83374a2c795e436c72fa79f70da72309b8/datasette/app.py#L1094-L1098 These two feel simple enough that I'm not worried that I might design an API that I later regret.",17608, 381443728,Tox is a good example of a project that uses pluggy in the way I want to use it (function hooks rather than classes): https://github.com/tox-dev/tox/blob/master/tox/hookspecs.py,17608, 381446392,"OK, from that prototype in f2720b0c6b7172ebe8820 it looks like pluggy provides a solid path forward. Next steps: - [x] Build a demo plugin that uses setuptools entrypoints to register with the `datasette` plugin manager via pluggy - [x] Figure out a mechanism for registering plugins without first needing to publish them to PyPI. Can I load plugins from a special `plugins/` directory similar to the `--template-dir=templates/` option already supported by Datasette? #211",17608, 381446511,"Here's a demo of the `convert_units()` SQL function I prototyped in f2720b0c6b7172ebe88 ![2018-04-15 at 4 23 pm](https://user-images.githubusercontent.com/9599/38784633-8c43821e-40c9-11e8-97dd-697755a0f858.png) ",17608, 381446554,I built a prototype of the `convert_units()` custom SQL function as a plugin over in https://github.com/simonw/datasette/issues/14#issuecomment-381446511,17608, 381446906,"Once I've got the plugins mechanism stable and people start releasing plugins it would be useful to have a dedicated Trove classifier on PyPI for Datasette plugins - `Framework :: Datasette` for example. 
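A plugin could then declare it in its `setup.py` - a hedged sketch, since the classifier itself would first need to be approved by PyPI and the package name here is made up:

```python
from setuptools import setup

setup(
    name='datasette-example-plugin',
    version='0.1',
    py_modules=['datasette_example_plugin'],
    # Hypothetical classifier - only meaningful once PyPI adds it
    classifiers=['Framework :: Datasette'],
    entry_points={'datasette': ['example = datasette_example_plugin']},
)
```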
This would help me build a Datasette equivalent of the http://plugincompat.herokuapp.com/ site, which works by scanning PyPI for items with the ``Framework :: Pytest`` classifier: https://github.com/pytest-dev/plugincompat/blob/8bdf1a6fb82807091ece0c68c196103ee8270194/update_index.py#L52-L53 It looks like the mechanism for requesting new PyPI classifiers is to file a ticket against warehouse, like these ones: https://github.com/pypa/warehouse/issues/3570 and https://github.com/pypa/warehouse/issues/2881",17608, 381450394,"I created https://github.com/simonw/datasette-plugin-demos which is now published to PyPI and can be installed with `pip install datasette-plugin-demos` - I've confirmed that if you DO install it my Datasette `plugins` branch picks up the plugins, and `select random_integer(1, 4)` works as it should.",17608, 381450591,"Slight code design problem... when I tried installing my branch in a fresh virtual environment I got this error, because `setup.py` now depends on `pluggy` (from importing `__version__`): ``` File ""/private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/pip-req-build-dftqdezt/setup.py"", line 2, in from datasette import __version__ File ""/private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/pip-req-build-dftqdezt/datasette/__init__.py"", line 2, in from .hookspecs import hookimpl # noqa File ""/private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/pip-req-build-dftqdezt/datasette/hookspecs.py"", line 1, in from pluggy import HookimplMarker ModuleNotFoundError: No module named 'pluggy' ``` Looks like I've run into point 6 on https://packaging.python.org/guides/single-sourcing-package-version/ : ![2018-04-15 at 5 34 pm](https://user-images.githubusercontent.com/9599/38785314-403ce86a-40d3-11e8-8542-ba426eddf4ac.png) ",17608, 381455054,"I think Vega-Lite is the way to go here: https://vega.github.io/vega-lite/ I've been playing around with it and Datasette with some really positive initial results: https://vega.github.io/editor/#/gist/vega-lite/simonw/89100ce80573d062d70f780d10e5e609/decada131575825875c0a076e418c661c2adb014/vice-shootings-gender-race-by-department.vl.json https://vega.github.io/editor/#/gist/vega-lite/simonw/5f69fbe29380b0d5d95f31a385f49ee4/7087b64df03cf9dba44a5258a606f29182cb8619/trees-san-francisco.vl.json",17608, 381456434,"The easiest way to implement this in Python 2 would be `execfile(...)` - but that was removed in Python 3. According to https://stackoverflow.com/a/437857/6083 `2to3` replaces that with this, which ensures the filename is associated with the code for debugging purposes: ``` with open(""somefile.py"") as f: code = compile(f.read(), ""somefile.py"", 'exec') exec(code, global_vars, local_vars) ``` Implementing it this way would force this kind of plugin to be self-contained in a single file. I think that's OK: if you want a more complex plugin you can use the standard pluggy-powered setuptools mechanism to build it.",17608, 381462005,This needs unit tests. 
I also need to manually test the `datasette package` and `datesette publish` commands.,17608, 381478217,"Here's the result of running: datasette publish now fivethirtyeight.db \ --plugins-dir=plugins/ --title=""FiveThirtyEight"" --branch=plugins-dir https://datasette-phjtvzwwzl.now.sh/fivethirtyeight-2628db9?sql=select+convert_units%28100%2C+%27m%27%2C+%27ft%27%29 Where `plugins/pint_plugin.py` contains the following: ``` from datasette import hookimpl import pint ureg = pint.UnitRegistry() @hookimpl def prepare_connection(conn): def convert_units(amount, from_, to_): ""select convert_units(100, 'm', 'ft');"" return (amount * ureg(from_)).to(to_).to_tuple()[0] conn.create_function('convert_units', 3, convert_units) ```",17608, 381478253,"This worked as well: datasette package fivethirtyeight.db \ --plugins-dir=plugins/ --title=""FiveThirtyEight"" --branch=plugins-dir ",17608, 381481990,Added unit tests in 33c6bcadb962457be6b0c7f369826b404e2bcef5,17608, 381482407,"Here's the result of running this: datasette publish heroku fivethirtyeight.db \ --plugins-dir=plugins/ --title=""FiveThirtyEight"" --branch=plugins-dir https://intense-river-24599.herokuapp.com/fivethirtyeight-2628db9?sql=select+convert_units%28100%2C+%27m%27%2C+%27ft%27%29",17608, 381483301,I think this is a good improvement. If you fix the tests I'll merge it.,17608, 381488049,"I think this is pretty hard. @coleifer has done some work in this direction, including https://github.com/coleifer/pysqlite3 which ports the standalone pysqlite module to Python 3. ",17608, 381490361,"Packaging JS and CSS in a pip installable wheel is fiddly but possible. http://peak.telecommunity.com/DevCenter/PythonEggs#accessing-package-resources from pkg_resources import resource_string foo_config = resource_string(__name__, 'foo.conf')",17608, 381491707,This looks like a good example: https://github.com/funkey/nyroglancer/commit/d4438ab42171360b2b8e9020f672846dd70c8d80,17608, 381602005,I don't think it should be too difficult... you can look at what @ghaering did with pysqlite (and similarly what I copied for pysqlite3). You would theoretically take an amalgamation build of Sqlite (all code in a single .c and .h file). The `AmalgamationLibSqliteBuilder` class detects the presence of this amalgamated source file and builds a statically-linked pysqlite.,17608, 381611738,I should check if it's possible to have two template registration function plugins in a single plugin module. If it isn't maybe I should use class plugins instead of module plugins.,17608, 381612585,`resource_stream` returns a file-like object which may be better for serving from Sanic.,17608, 381621338,"Annoyingly, the following only results in the last of the two `prepare_connection` hooks being registered: ``` from datasette import hookimpl import pint import random ureg = pint.UnitRegistry() @hookimpl def prepare_connection(conn): def convert_units(amount, from_, to_): ""select convert_units(100, 'm', 'ft');"" return (amount * ureg(from_)).to(to_).to_tuple()[0] conn.create_function('convert_units', 3, convert_units) @hookimpl def prepare_connection(conn): conn.create_function('random_integer', 2, random.randint) ```",17608, 381622793,"I think that's OK. The two plugins I've implemented so far (`prepare_connection` and `prepare_jinja2_environment`) both make sense if they can only be defined once-per-plugin. For the moment I'll assume I can define future hooks to work well with the same limitation. 
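The immediate workaround for the example above is to perform both registrations inside a single hook implementation - a sketch based on that snippet:

```python
from datasette import hookimpl
import pint
import random

ureg = pint.UnitRegistry()


@hookimpl
def prepare_connection(conn):
    # Register every custom SQL function from the one hook implementation
    def convert_units(amount, from_, to_):
        return (amount * ureg(from_)).to(to_).to_tuple()[0]

    conn.create_function('convert_units', 3, convert_units)
    conn.create_function('random_integer', 2, random.randint)
```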
The syntactic sugar idea in #220 can help here too.",17608, 381643173,"Yikes, definitely a bug.",17608, 381644355,"So there are two tricky problems to solve here: * I need a way of encoding `null` into that `_next=` that is unambiguous from the string `None` or `null`. This means introducing some kind of escaping mechanism in those strings. I already use URL encoding as part of the construction of those components here, maybe that can help here? * I need to figure out what the SQL should be for the ""next"" set of results if the previous value was null. Thankfully we use the primary key as a tie-breaker so this shouldn't be impossible.",17608, 381645274,"Relevant code: https://github.com/simonw/datasette/blob/904f1c75a3c17671d25c53b91e177c249d14ab3b/datasette/app.py#L828-L832",17608, 381645973,"I could use `$null` as a magic value that means None. Since I'm applying `quote_plus()` to actual values, any legit strings that look like this will be encoded as `%24null`: ``` >>> urllib.parse.quote_plus('$null') '%24null' ```",17608, 381648053,"I think the correct SQL is this: https://datasette-issue-189-demo-3.now.sh/salaries-7859114-7859114?sql=select+rowid%2C+*+from+%5B2017+Maryland+state+salaries%5D%0D%0Awhere+%28middle_initial+is+not+null+or+%28middle_initial+is+null+and+rowid+%3E+%3Ap0%29%29%0D%0Aorder+by+middle_initial+limit+101&p0=391 ``` select rowid, * from [2017 Maryland state salaries] where (middle_initial is not null or (middle_initial is null and rowid > :p0)) order by middle_initial limit 101 ``` Though this will also need to be taken into account for #198 ",17608, 381649140,But what would that SQL look like for `_sort_desc`?,17608, 381649437,"Here's where that SQL gets constructed at the moment: https://github.com/simonw/datasette/blob/10a34f995c70daa37a8a2aa02c3135a4b023a24c/datasette/app.py#L761-L771",17608, 381738137,"Tests now fixed, honest. The failing test on Travis looks like an intermittent sqlite failure which should resolve itself on a retry...",17608, 381763651,"Ah, I had no idea you could bind python functions into sqlite! I think the primary purpose of this issue has been served now - I'm going to close this and create a new issue for the only bit of this that hasn't been touched yet, which is (optionally) exposing units in the JSON API.",17608, 381777108,This could also help workaround the current predicament that a single plugin can only define one prepare_connection hook.,17608, 381786522,"Weird... tests are failing in Travis, despite passing on my local machine. https://travis-ci.org/simonw/datasette/builds/367423706",17608, 381788051,Still failing. This is very odd.,17608, 381794744,I'm reverting this out of master until I can figure out why the tests are failing.,17608, 381798786,"Here's the test that's failing: https://github.com/simonw/datasette/blob/59a3aa859c0e782aeda9a515b1b52c358e8458a2/tests/test_api.py#L437-L470 I got Travis to spit out the `fetched` and `expected` variables. `expected` has 201 items in it and is identical to what I get on my local laptop. `fetched` has 250 items in it, so it's clearly different from my local environment. I've managed to replicate the bug in production! I created a test database like this: python tests/fixtures.py sortable.db Then deployed that database like so: datasette publish now sortable.db \ --extra-options=""--page_size=50"" --branch=debug-travis-issue-216 And... 
if you click ""next"" on this page https://datasette-issue-216-pagination.now.sh/sortable-5679797/sortable?_sort_desc=sortable_with_nulls five times you get back 250 results, when you should only get back 201.",17608, 381799267,"The version that I deployed which exhibits the bug is running SQLite `3.8.7.1` - https://datasette-issue-216-pagination.now.sh/sortable-5679797?sql=select+sqlite_version%28%29 The version that I have running locally which does NOT exhibit the bug is running SQLite `3.23.0`",17608, 381799408,"... which is VERY surprising, because `3.23.0` only came out on 2nd April this year: https://www.sqlite.org/changes.html - I have no idea how I came to be running that version on my laptop.",17608, 381801302,"This is the SQL that returns differing results in production and on my laptop: https://datasette-issue-216-pagination.now.sh/sortable-5679797?sql=select+%2A+from+sortable+where+%28sortable_with_nulls+is+null+and+%28%28pk1+%3E+%3Ap0%29%0A++or%0A%28pk1+%3D+%3Ap0+and+pk2+%3E+%3Ap1%29%29%29+order+by+sortable_with_nulls+desc+limit+51&p0=b&p1=t ``` select * from sortable where (sortable_with_nulls is null and ((pk1 > :p0) or (pk1 = :p0 and pk2 > :p1))) order by sortable_with_nulls desc limit 51 ``` I think that `order by sortable_with_nulls desc` bit is at fault - the primary keys should be included in that order by as well. Sure enough, changing the query to this one returns the same results across both environments: ``` select * from sortable where (sortable_with_nulls is null and ((pk1 > :p0) or (pk1 = :p0 and pk2 > :p1))) order by sortable_with_nulls desc, pk1, pk2 limit 51 ```",17608, 381803157,Fixed!,17608, 381809998,I just shipped Datasette 0.19 with where I'm at so far: https://github.com/simonw/datasette/releases/tag/0.19,17608, 381905593,"I've added another commit which puts classes a class on each `` by default with its column name, and I've also made the PK column bold. Unfortunately the tests are still failing on 3.6, which is weird. I can't reproduce locally...",17608, 382038613,"I figured out the recipe for bundling static assets in a plugin: https://github.com/simonw/datasette-plugin-demos/commit/26c5548f4ab7c6cc6d398df17767950be50d0edf (and then `python3 setup.py bdist_wheel`) Having done that, I ran `pip install ../datasette-plugin-demos/dist/datasette_plugin_demos-0.2-py3-none-any.whl` from my Datasette virtual environment and then did the following: ``` >>> import pkg_resources >>> pkg_resources.resource_stream( ... 'datasette_plugin_demos', 'static/plugin.js' ... ).read() b""alert('hello');\n"" >>> pkg_resources.resource_filename( ... 'datasette_plugin_demos', 'static/plugin.js' ... ) '..../venv/lib/python3.6/site-packages/datasette_plugin_demos/static/plugin.js' >>> pkg_resources.resource_string( ... 'datasette_plugin_demos', 'static/plugin.js' ... ) b""alert('hello');\n"" ```",17608, 382048582,"One possible option: let plugins bundle their own `static/` directory and then register themselves with Datasette, then have `/-/static-plugins/name-of-plugin/...` serve files from that directory.",17608, 382069980,"Even if we automatically serve ALL `static/` content from installed plugins, we'll still need them to register which files need to be linked to from `extra_css_urls` and `extra_js_urls`",17608, 382205189,"I managed to get a better error message out of that test. 
The server is returning this (but only on Python 3.6, not on Python 3.5 - and only in Travis, not in my local environment): ```{'error': 'interrupted', 'ok': False, 'status': 400, 'title': 'Invalid SQL'}``` https://travis-ci.org/simonw/datasette/jobs/367929134",17608, 382210976,"OK, aaf59db570ab7688af72c08bb5bc1edc145e3e07 should mean that the tests pass when I merge that.",17608, 382256729,I added a mechanism for plugins to serve static files and define custom CSS and JS URLs in #214 - see new documentation on http://datasette.readthedocs.io/en/latest/plugins.html#static-assets and http://datasette.readthedocs.io/en/latest/plugins.html#extra-css-urls,17608, 382408128,"Demo: datasette publish now sortable.db --install datasette-plugin-demos --branch=master Produced this deployment, with both the `random_integer()` function and the static file from https://github.com/simonw/datasette-plugin-demos/tree/0.2 https://datasette-issue-223.now.sh/-/static-plugins/datasette_plugin_demos/plugin.js https://datasette-issue-223.now.sh/sortable-4bbaa6f?sql=select+random_integer%280%2C+10%29 ",17608, 382409989,"Tested on Heroku as well. datasette publish heroku sortable.db --install datasette-plugin-demos --branch=master https://morning-tor-45944.herokuapp.com/-/static-plugins/datasette_plugin_demos/plugin.js https://morning-tor-45944.herokuapp.com/sortable-4bbaa6f?sql=select+random_integer%280%2C+10%29",17608, 382413121,"And tested `datasette package` - this time exercising the ability to pass more than one `--install` option: ``` $ datasette package sortable.db --branch=master --install requests --install datasette-plugin-demos Sending build context to Docker daemon 125.4kB Step 1/7 : FROM python:3 ---> 79e1dc9af1c1 Step 2/7 : COPY . /app ---> 6e8e40bce378 Step 3/7 : WORKDIR /app Removing intermediate container 7cdc9ab20d09 ---> f42258c2211f Step 4/7 : RUN pip install https://github.com/simonw/datasette/archive/master.zip requests datasette-plugin-demos ---> Running in a0f17cec08a4 Collecting ... Removing intermediate container a0f17cec08a4 ---> beea84e73271 Step 5/7 : RUN datasette inspect sortable.db --inspect-file inspect-data.json ---> Running in 4daa28792348 Removing intermediate container 4daa28792348 ---> c60312d21b99 Step 6/7 : EXPOSE 8001 ---> Running in fa728468482d Removing intermediate container fa728468482d ---> 8f219a61fddc Step 7/7 : CMD [""datasette"", ""serve"", ""--host"", ""0.0.0.0"", ""sortable.db"", ""--cors"", ""--port"", ""8001"", ""--inspect-file"", ""inspect-data.json""] ---> Running in cd4eaeb2ce9e Removing intermediate container cd4eaeb2ce9e ---> 066e257c7c44 Successfully built 066e257c7c44 (venv) datasette $ docker run -p 8081:8001 066e257c7c44 Serve! files=('sortable.db',) on port 8001 [2018-04-18 14:40:18 +0000] [1] [INFO] Goin' Fast @ http://0.0.0.0:8001 [2018-04-18 14:40:18 +0000] [1] [INFO] Starting worker [1] [2018-04-18 14:46:01 +0000] - (sanic.access)[INFO][1:7]: GET http://localhost:8081/-/static-plugins/datasette_plugin_demos/plugin.js 200 16 ``` ",17608, 382616527,"No need to use `PackageLoader` after all, we can use the same mechanism we used for the static path: https://github.com/simonw/datasette/blob/b55809a1e20986bb2e638b698815a77902e8708d/datasette/utils.py#L694-L695",17608, 382808266,"Maybe this should have a second argument indicating which codepath was being handled. 
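Something like this - entirely hypothetical, since neither the hook name nor its signature has been decided yet:

```python
from datasette import hookimpl


@hookimpl
def prepare_context(context, view_name):
    # view_name is the hypothetical second argument identifying the codepath
    if view_name == 'row':
        context['extra_variable'] = 'only added on the row page'
```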
That way plugins could say ""only inject this extra context variable on the row page"".",17608, 382924910,"Hiding tables with the `idx_` prefix should be good enough here, since false positives aren't very harmful.",17608, 382958693,"A better way to do this would be with many different plugin hooks, one for each view.",17608, 382959857,"Plus a generic prepare_context() hook called in the common render method. prepare_context_table(), prepare_context_row() etc Arguments are context, request, self (hence can access self.ds) ",17608, 382964794,"What if the context needs to make await calls? One possible option: plugins can either manipulate the context in place OR they can return an awaitable. If they do that, the caller will await it.",17608, 382966604,Should this differentiate between preparing the data to be sent back as JSON and preparing the context for the template?,17608, 382967238,Maybe prepare_table_data() vs prepare_table_context(),17608, 383109984,Refs #229,17608, 383139889,"I released everything we have so far in [Datasette 0.20](https://github.com/simonw/datasette/releases/tag/0.20) and built and released an example plugin, [datasette-cluster-map](https://pypi.org/project/datasette-cluster-map/). Here's my blog entry about it: https://simonwillison.net/2018/Apr/20/datasette-plugins/",17608, 383140111,Here's a link demonstrating my new plugin: https://datasette-cluster-map-demo.now.sh/polar-bears-455fe3a/USGS_WC_eartags_output_files_2009-2011-Status,17608, 383252624,Thanks!,17608, 383315348,"I could also have an `""autodetect"": false` option for that plugin to turn off autodetecting entirely. Would be useful if the plugin didn't append its JavaScript in pages that it wasn't used for - that might require making the `extra_js_urls()` hook optionally aware of the columns and table and metadata.",17608, 383398182,"```{ ""databases"": { ""database1"": { ""tables"": { ""example_table"": { ""label_column"": ""name"" } } } } } ```",17608, 383399762,Docs here: http://datasette.readthedocs.io/en/latest/metadata.html#specifying-the-label-column-for-a-table,17608, 383410146,"I built this wrong: my implementation is looking for the `label_column` on the table-being-displayed, but it should be looking for it on the table-the-foreign-key-links-to.",17608, 383727973,"There might also be something clever we can do here with PRAGMA statements: https://stackoverflow.com/questions/14146881/limit-the-maximum-amount-of-memory-sqlite3-uses And https://www.sqlite.org/pragma.html",17608, 383764533,The `resource` module in he standard library has the ability to set limits on memory usage for the current process: https://pymotw.com/2/resource/,17608, 384362028,"On further thought: this is actually only an issue for immutable deployments to platforms like Zeit Now and Heroku. As such, adding it to `datasette serve` feels clumsy. 
Maybe `datasette publish` should instead gain the ability to optionally install an extra mechanism that periodically pulls a fresh copy of `metadata.json` from a URL.",17608, 384500327,"``` { ""databases"": { ""database1"": { ""tables"": { ""example_table"": { ""hidden"": true } } } } } ```",17608, 384503873,Documentation: http://datasette.readthedocs.io/en/latest/metadata.html#hiding-tables,17608, 384512192,Documentation: http://datasette.readthedocs.io/en/latest/json_api.html#special-table-arguments,17608, 384675792,"Docs now live at http://datasette.readthedocs.io/ I still need to document a few more parts of the API before closing this.",17608, 384676488,Remaining work for this is tracked in #150,17608, 384678319,"I shipped this last week as the first plugin: https://simonwillison.net/2018/Apr/20/datasette-plugins/ Demo: https://datasette-cluster-map-demo.datasettes.com/polar-bears-455fe3a/USGS_WC_eartags_output_files_2009-2011-Status Plugin: https://github.com/simonw/datasette-cluster-map",17608, 386309928,Demo: https://datasette-versions-and-shape-demo.now.sh/-/versions,17608, 386310149,"Demos: * https://datasette-versions-and-shape-demo.now.sh/sf-trees-02c8ef1/qSpecies.json?_shape=array * https://datasette-versions-and-shape-demo.now.sh/sf-trees-02c8ef1/qSpecies.json?_shape=object * https://datasette-versions-and-shape-demo.now.sh/sf-trees-02c8ef1/qSpecies.json?_shape=arrays * https://datasette-versions-and-shape-demo.now.sh/sf-trees-02c8ef1/qSpecies.json?_shape=objects",17608, 386357645,"Even better: use `plugin_manager.list_plugin_distinfo()` from pluggy to get back a list of tuples, the second item in each tuple is a `pkg_resources.DistInfoDistribution` with a `.version` attribute.",17608, 386692333,Demo: https://datasette-plugins-and-max-size-demo.now.sh/-/plugins,17608, 386692534,Demo: https://datasette-plugins-and-max-size-demo.now.sh/sf-trees/Street_Tree_List.json?_size=max,17608, 386840307,Documented here: http://datasette.readthedocs.io/en/latest/json_api.html#special-table-arguments,17608, 386840806,"Demo: datasette publish now ../datasettes/san-francisco/sf-film-locations.db --branch=master --name datasette-column-search-demo https://datasette-column-search-demo.now.sh/sf-film-locations/Film_Locations_in_San_Francisco?_search_Locations=justin",17608, 386879509,"We can solve this using the `sqlite_timelimit(conn, 20)` helper, which can tell SQLite to give up after 20ms. We can wrap that around the following SQL: select distinct COLUMN from TABLE limit 21; Then we look at the number of rows returned. If it's 21 or more we know that this table had more than 21 distinct values, so we'll treat it as ""unlimited"". 
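A rough sketch of that inspection step (assuming the existing `sqlite_timelimit()` helper from `datasette.utils`, and that the table and column names have already been validated):

```python
import sqlite3

from datasette.utils import sqlite_timelimit


def distinct_values(conn, table, column, limit=20):
    # Give SQLite at most 20ms so inspection stays cheap on huge tables
    with sqlite_timelimit(conn, 20):
        try:
            sql = 'select distinct [{}] from [{}] limit {}'.format(
                column, table, limit + 1
            )
            rows = conn.execute(sql).fetchall()
        except sqlite3.OperationalError:
            # Interrupted by the time limit - skip this column entirely
            return None
    if len(rows) > limit:
        # More than `limit` distinct values: treat the column as unlimited
        return None
    return [row[0] for row in rows]
```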
Likewise, if the SQL times out before 20ms is up we will skip this introspection.",17608, 386879840,"Here's a quick demo of that exploration: https://datasette-distinct-column-values.now.sh/-/inspect Example output: ``` { ""antiquities-act/actions_under_antiquities_act"": { ""columns"": [ ""current_name"", ""states"", ""original_name"", ""current_agency"", ""action"", ""date"", ""year"", ""pres_or_congress"", ""acres_affected"" ], ""count"": 344, ""distinct_values_by_column"": { ""acres_affected"": null, ""action"": null, ""current_agency"": [ ""NPS"", ""State of Montana"", ""BLM"", ""State of Arizona"", ""USFS"", ""State of North Dakota"", ""NPS, BLM"", ""State of South Carolina"", ""State of New York"", ""FWS"", ""FWS, NOAA"", ""NPS, FWS"", ""NOAA"", ""BLM, USFS"", ""NOAA, FWS"" ], ""current_name"": null, ""date"": null, ""original_name"": null, ""pres_or_congress"": null, ""states"": null, ""year"": null }, ""foreign_keys"": { ""incoming"": [], ""outgoing"": [] }, ""fts_table"": null, ""hidden"": false, ""label_column"": null, ""name"": ""antiquities-act/actions_under_antiquities_act"", ""primary_keys"": [] } } ```",17608, 386879878,If I'm going to expand column introspection in this way it would be useful to also capture column type information.,17608, 388360255,"Do you have an example I can look at? I think I have a possible route for fixing this, but it's pretty tricky (it involves adding a full SQL statement parser, but that's needed for some other potential improvements as well). In the meantime, is this causing actual errors for you or is it more of an inconvenience (form fields being displayed that don't actually do anything)? Another potential solution here could be to allow canned queries to optionally declare their parameters in metadata.json",17608, 388367027,"An example deployment @ https://datasette-zkcvlwdrhl.now.sh/simplestreams-270f20c/cloudimage?content_id__exact=com.ubuntu.cloud%3Areleased%3Adownload It is not causing errors, more of an inconvenience. I have worked around it using a `like` query instead. ",17608, 388497467,"Got it, this seems to trigger the problem: https://datasette-zkcvlwdrhl.now.sh/simplestreams-270f20c?sql=select+*+from+cloudimage+where+%22content_id%22+%3D+%22com.ubuntu.cloud%3Areleased%3Adownload%22+order+by+id+limit+10",17608, 388525357,Facet counts will be generated by extra SQL queries with their own aggressive time limit.,17608, 388550742,http://datasette.readthedocs.io/en/latest/full_text_search.html,17608, 388587855,Adding some TODOs to the original description (so they show up as a todo progress bar),17608, 388588011,Initial documentation: http://datasette.readthedocs.io/en/latest/facets.html,17608, 388588998,"A few demos: * https://datasette-facets-demo.now.sh/fivethirtyeight-2628db9/college-majors%2Fall-ages?_facet=Major_category * https://datasette-facets-demo.now.sh/fivethirtyeight-2628db9/congress-age%2Fcongress-terms?_facet=chamber&_facet=state&_facet=party&_facet=incumbent * https://datasette-facets-demo.now.sh/fivethirtyeight-2628db9/bechdel%2Fmovies?_facet=binary&_facet=test",17608, 388589072,"I need to decide how to display these. 
They currently look like this: https://datasette-facets-demo.now.sh/fivethirtyeight-2628db9/congress-age%2Fcongress-terms?_facet=chamber&_facet=state&_facet=party&_facet=incumbent&state=MO ![2018-05-12 at 7 58 pm](https://user-images.githubusercontent.com/9599/39962230-e7bf9e10-561e-11e8-80a7-0941b8991318.png) ",17608, 388625703,"I'm still seeing intermittent Python 3.5 failures due to dictionary ordering differences. https://travis-ci.org/simonw/datasette/jobs/378356802 ``` > assert expected_facet_results == facet_results E AssertionError: assert {'city': [{'c...alue': 'MI'}]} == {'city': [{'co...alue': 'MI'}]} E Omitting 1 identical items, use -vv to show E Differing items: E {'city': [{'count': 4, 'toggle_url': '_facet=state&_facet=city&state=MI&city=Detroit', 'value': 'Detroit'}]} != {'city': [{'count': 4, 'toggle_url': 'state=MI&_facet=state&_facet=city&city=Detroit', 'value': 'Detroit'}]} E Use -v to get the full diff ``` To solve these cleanly I need to be able to run Python 3.5 on my local laptop rather than relying on Travis every time.",17608, 388626721,"I managed to get Python 3.5.0 running on my laptop using [pyenv](https://github.com/pyenv/pyenv). Here's the incantation I used: ``` # Install pyenv using homebrew (turns out I already had it) brew install pyenv # Check which versions of Python I have installed pyenv versions # Install Python 3.5.0 pyenv install 3.5.0 # Figure out where pyenv has been installing things pyenv root # Check I can run my newly installed Python 3.5.0 /Users/simonw/.pyenv/versions/3.5.0/bin/python # Use it to create a new virtualenv /Users/simonw/.pyenv/versions/3.5.0/bin/python -mvenv venv35 source venv35/bin/activate # Install datasette into that virtualenv python setup.py install ```",17608, 388626804,"Unfortunately, running `python setup.py test` on my laptop using Python 3.5.0 in that virtualenv results in a flow of weird Sanic-related errors: ``` File ""/Users/simonw/Dropbox/Development/datasette/venv35/lib/python3.5/site-packages/sanic-0.7.0-py3.5.egg/sanic/testing.py"", line 16, in _local_request import aiohttp File ""/Users/simonw/Dropbox/Development/datasette/.eggs/aiohttp-2.3.2-py3.5-macosx-10.13-x86_64.egg/aiohttp/__init__.py"", line 6, in from .client import * # noqa File ""/Users/simonw/Dropbox/Development/datasette/.eggs/aiohttp-2.3.2-py3.5-macosx-10.13-x86_64.egg/aiohttp/client.py"", line 13, in from yarl import URL File ""/Users/simonw/Dropbox/Development/datasette/.eggs/yarl-1.2.4-py3.5-macosx-10.13-x86_64.egg/yarl/__init__.py"", line 11, in from .quoting import _Quoter, _Unquoter File ""/Users/simonw/Dropbox/Development/datasette/.eggs/yarl-1.2.4-py3.5-macosx-10.13-x86_64.egg/yarl/quoting.py"", line 3, in from typing import Optional, TYPE_CHECKING, cast ImportError: cannot import name 'TYPE_CHECKING' ```",17608, 388627281,"https://github.com/rtfd/readthedocs.org/issues/3812#issuecomment-373780860 suggests Python 3.5.2 may have the fix. 
Yup, that worked: ``` pyenv install 3.5.2 rm -rf venv35 /Users/simonw/.pyenv/versions/3.5.2/bin/python -mvenv venv35 source venv35/bin/activate # Not sure why I need this in my local environment but I do: pip install datasette_plugin_demos python setup.py test ``` This is now giving me the same test failure locally that I am seeing in Travis.",17608, 388628966,"Running specific tests: ``` venv35/bin/pip install pytest beautifulsoup4 aiohttp venv35/bin/pytest tests/test_utils.py ```",17608, 388645828,I may be able to run the SQL for all of the facet counts in one go using a WITH CTE query - will have to microbenchmark this to make sure it is worthwhile: https://datasette-facets-demo.now.sh/fivethirtyeight-2628db9?sql=with+blah+as+%28select+*+from+%5Bcollege-majors%2Fall-ages%5D%29%0D%0Aselect+*+from+%28select+%22Major_category%22%2C+Major_category%2C+count%28*%29+as+n+from%0D%0Ablah+group+by+Major_category+order+by+n+desc+limit+10%29%0D%0Aunion+all%0D%0Aselect+*+from+%28select+%22Major_category2%22%2C+Major_category%2C+count%28*%29+as+n+from%0D%0Ablah+group+by+Major_category+order+by+n+desc+limit+10%29,17608, 388684356,"I just landed pull request #257 - I haven't refactored the tests, I may do that later if it looks worthwhile.",17608, 388686463,It would be neat if there was a mechanism for calculating aggregates per facet - e.g. calculating the sum() of specific columns against each facet result on https://datasette-facets-demo.now.sh/fivethirtyeight-2628db9/nba-elo%2Fnbaallelo?_facet=lg_id&_facet=fran_id&lg_id=ABA&_facet=team_id,17608, 388784063,"Can I get facets working across many2many relationships? This would be fiendishly useful, but the querystring and `metadata.json` syntax is non-obvious.",17608, 388784787,"To decide which facets to suggest: for each column, is the unique value count less than the number of rows matching the current query or is it less than 20 (if we are showing more than 20 rows)? Maybe only do this if there are less than ten non-float columns. Or always try for foreign keys and booleans, then if there are none of those try indexed text and integer fields, then finally try non-indexed text and integer fields but only if there are less than ten.",17608, 388797919,"For M2M to work we will need a mechanism for applying IN queries to the table view, so you can select multiple M2M filters. Maybe this would work: ?_m2m_category=123&_m2m_category=865",17608, 388987044,This work is now happening in the facets branch. Closing this in favor of #255.,17608, 389145872,Activity has now moved to this branch: https://github.com/simonw/datasette/commits/suggested-facets,17608, 389147608,"New demo (published with `datasette publish now --branch=suggested-facets fivethirtyeight.db sf-trees.db --name=datastte-suggested-facets-demo`): https://datasette-suggested-facets-demo.now.sh/fivethirtyeight-2628db9/comic-characters%2Fmarvel-wikia-data After turning on a couple of suggested facets... https://datasette-suggested-facets-demo.now.sh/fivethirtyeight-2628db9/comic-characters%2Fmarvel-wikia-data?_facet=SEX&_facet=ID ![2018-05-15 at 7 24 am](https://user-images.githubusercontent.com/9599/40056411-fa265d16-5810-11e8-89ec-e38fe29ffb2c.png) ",17608, 389386142,"The URL does persist across deployments already, in that you can use the URL without the hash and it will redirect to the current location. Here's an example of that: https://san-francisco.datasettes.com/sf-trees/Street_Tree_List.json This also works if you attempt to hit the incorrect hash, e.g. 
if you have deployed a new version of the database with an updated hash. The old hash will redirect, e.g. https://san-francisco.datasettes.com/sf-trees-c4b972c/Street_Tree_List.json If you serve Datasette from a HTTP/2 proxy (I've been using Cloudflare for this) you won't even have to pay the cost of the redirect - Datasette sends a `Link: ; rel=preload` header with those redirects, which causes Cloudflare to push out the redirected source as part of that HTTP/2 request. You can fire up the Chrome DevTools to watch this happen. https://github.com/simonw/datasette/blob/2b79f2bdeb1efa86e0756e741292d625f91cb93d/datasette/views/base.py#L91 All of that said... I'm not at all opposed to this feature. For consistency with other Datasette options (e.g. `--cors`) I'd prefer to do this as an optional argument to the `datasette serve` command - something like this: datasette serve mydb.db --no-url-hash",17608, 389386919,"I updated that demo to demonstrate the new foreign key label expansions: https://datasette-suggested-facets-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List?_facet=qLegalStatus ![2018-05-15 at 8 58 pm](https://user-images.githubusercontent.com/9599/40095806-b645026a-5882-11e8-8100-76136df50212.png) ",17608, 389397457,Maybe `suggested_facets` should only be calculated for the HTML view.,17608, 389536870,"The principle benefit provided by the hash URLs is that Datasette can set a far-future cache expiry header on every response. This is particularly useful for JavaScript API work as it makes fantastic use of the browser's cache. It also means that if you are serving your API from behind a caching proxy like Cloudflare you get a fantastic cache hit rate. An option to serve without persistent hashes would also need to turn off the cache headers. Maybe the option should support both? If you hit a page with the hash in the URL you still get the cache headers, but hits to the URL without the hash serve uncashed content directly.",17608, 389546040,"Latest demo - now with multiple columns: https://datasette-suggested-facets-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List?_facet=qCaretaker&_facet=qCareAssistant&_facet=qLegalStatus ![2018-05-16 at 7 47 am](https://user-images.githubusercontent.com/9599/40124418-63e680ba-58dd-11e8-8063-9686826abb8e.png) ",17608, 389562708,"This is now landed in master, ready for the next release.",17608, 389563719,The underlying mechanics for the `_extras` mechanism described in #262 may help with this.,17608, 389566147,"An official demo instance of Datasette dedicated to this use-case would be useful, especially if it was automatically deployed by Travis for every commit to master that passes the tests. Maybe there should be a permanent version of it deployed for each released version too?",17608, 389570841,"At the most basic level, this will work based on an extension. Most places you currently put a `.json` extension should also allow a `.csv` extension. By default this will return the exact results you see on the current page (default max will remain 1000). ## Streaming all records Where things get interested is *streaming mode*. This will be an option which returns ALL matching records as a streaming CSV file, even if that ends up being millions of records. I think the best way to build this will be on top of the existing mechanism used to efficiently implement keyset pagination via `_next=` tokens. 
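A rough sketch of the streaming loop (hedged: `fetch_page()` stands in for whatever internal method ends up returning one page of results plus its next token):

```python
import csv
import io


async def stream_csv(fetch_page):
    # fetch_page(token) is assumed to return (columns, rows, next_token)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    token = None
    first = True
    while True:
        columns, rows, token = await fetch_page(token)
        if first:
            writer.writerow(columns)
            first = False
        writer.writerows(rows)
        yield buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        if not token:
            break
```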
## Expanding foreign keys For tables with foreign key references it would be useful if the CSV format could expand those references to include the labels from `label_column` - maybe via an additional `?_expand=1` option. When expanding each foreign key column will be shown twice: rowid,city_id,city_id_label,state",17608, 389572201,"This will likely be implemented in the `BaseView` class, which needs to know how to spot the `.csv` extension, call the underlying JSON generating function and then return the `columns` and `rows` as correctly formatted CSV. https://github.com/simonw/datasette/blob/9959a9e4deec8e3e178f919e8b494214d5faa7fd/datasette/views/base.py#L201-L207 This means it will take ALL arguments that are available to the `.json` view. It may ignore some (e.g. `_facet=` makes no sense since CSV tables don't have space to show the facet results). In streaming mode, things will behave a little bit differently - in particular, if `_stream=1` then `_next=` will be forbidden. It can't include a length header because we don't know how many bytes it will be CSV output will throw an error if the endpoint doesn't have rows and columns keys eg `/-/inspect.json` So the implementation... - looks for the `.csv` extension - internally fetches the `.json` data instead - If no `_stream` it just transposes that JSON to CSV with the correct content type header - If `_stream=1` - checks for `_next=` and throws an error if it was provided - Otherwise... fetch first page and emit CSV header and first set of rows - Then start async looping, emitting more CSV rows and following the `_next=` internal reference until done I like that this takes advantage of efficient pagination. It may not work so well for views which use offset/limit though. It won't work at all for custom SQL because custom SQL doesn't support _next= pagination. That's fine. For views... easiest fix is to cut off after first X000 records. That seems OK. View JSON would need to include a property that the mechanism can identify.",17608, 389579363,I started a thread on Twitter discussing various CSV output dialects: https://twitter.com/simonw/status/996783395504979968 - I want to pick defaults which will work as well as possible for whatever tools people might be using to consume the data.,17608, 389579762,"> I basically want someone to tell me which arguments I can pass to Python's csv.writer() function that will result in the least complaints from people who try to parse the results :) https://twitter.com/simonw/status/996786815938977792",17608, 389592566,Let's provide a CSV Dialect definition too: https://frictionlessdata.io/specs/csv-dialect/ - via https://twitter.com/drewdaraabrams/status/996794915680997382,17608, 389608473,"There are some code examples in this issue which should help with the streaming part: https://github.com/channelcat/sanic/issues/1067 Also https://github.com/channelcat/sanic/blob/master/docs/sanic/streaming.md#response-streaming",17608, 389626715,"> I’d recommend using the Windows-1252 encoding for maximum compatibility, unless you have any characters not in that set, in which case use UTF8 with a byte order mark. Bit of a pain, but some progams (eg various versions of Excel) don’t read UTF8. **frankieroberto** https://twitter.com/frankieroberto/status/996823071947460616 > There is software that consumes CSV and doesn't speak UTF8!? Huh. 
Well I can't just use Windows-1252 because I need to support the full UTF8 range of potential data - maybe I should support an optional ?_encoding=windows-1252 argument **simonw** https://twitter.com/simonw/status/996824677245857793",17608, 389702480,Idea: `?_extra=sqllog` could output a lot of every individual SQL statement that was executed in order to generate the page - useful for seeing how foreign key expansion and faceting actually works.,17608, 389893810,Idea: add a `supports_csv = False` property to `BaseView` and over-ride it to `True` just on the view classes that should support CSV (Table and Row). Slight subtlety: the `DatabaseView` class only supports CSV in the `custom_sql()` path. Maybe that needs to be refactored a bit.,17608, 389894382,"I should definitely sanity check if the `_next=` route really is the most efficient way to build this. It may turn out that iterating over a SQLite cursor with a million rows in it is super-efficient and would provide much more reliable performance (plus solve the problem for retrieving full custom SQL queries where we can't do keyset pagination). Problem here is that we run SQL queries in a thread pool. A query that returns millions of rows would presumably tie up a SQL thread until it has finished, which could block the server. This may be a reason to stick with `_next=` keyset pagination - since it ensures each SQL thread yields back again after each 1,000 rows.",17608, 389989015,"This is a departure from how Datasette has been designed so far, and it may turn out that it's not feasible or it requires too many philosophical changes to be worthwhile. If we CAN do it though it would mean Datasette could stay running pointed at a directory on disk and new SQLite databases could be dropped into that directory by another process and served directly as they become available.",17608, 389989615,"From https://www.sqlite.org/c3ref/open.html > **immutable**: The immutable parameter is a boolean query parameter that indicates that the database file is stored on read-only media. When immutable is set, SQLite assumes that the database file cannot be changed, even by a process with higher privilege, and so the database is opened read-only and all locking and change detection is disabled. Caution: Setting the immutable property on a database file that does in fact change can result in incorrect query results and/or SQLITE_CORRUPT errors. See also: SQLITE_IOCAP_IMMUTABLE. So this would probably have to be a new mode, `datasette serve --detect-db-changes`, which no longer opens in immutable mode. Or maybe current behavior becomes not-the-default and you opt into it with `datasette serve --immutable`",17608, 390105147,I'm going to add a `/-/limits` page that shows the current limits.,17608, 390105943,Docs: http://datasette.readthedocs.io/en/latest/limits.html#default-facet-size,17608, 390250253,"Shouldn't [versioneer](https://github.com/warner/python-versioneer) do that? E.g. 0.21+2.g1076c97 You'd need to install via `pip install git+https://github.com/simow/datasette.git` though, this does a temp git clone.",17608, 390433040,Could also support these as optional environment variables - `DATASETTE_NAMEOFSETTING`,17608, 390496376,http://datasette.readthedocs.io/en/latest/config.html,17608, 390577711,"Excellent, I was not aware of the auto redirect to the new hash. My bad This solves my use case. I do agree that your suggested --no-url-hash approach is much neater. 
I will investigate ",17608, 390689406,"I've changed my mind about the way to support external connectors aside of SQLite and I'm working in a more simple style that respects the original Datasette, i.e. less refactoring. I present you [a version of Datasette wich supports other database connectors](https://github.com/jsancho-gpl/datasette/tree/external-connectors) and [a Datasette connector for HDF5/PyTables files](https://github.com/jsancho-gpl/datasette-pytables).",17608, 390707183,"This is definitely a big improvement. I'd like to refactor the unit tests that cover .inspect() too - currently they are a huge ugly blob at the top of test_api.py",17608, 390707760,"This probably needs to be in a plugin simply because getting Spatialite compiled and installed is a bit of a pain. It's a great opportunity to expand the plugin hooks in useful ways though.",17608, 390795067,"Well, we do have the capability to detect spatialite so my intention certainly wasn't to require it. I can see the advantage of having it as a plugin but it does touch a number of points in the code. I think I'm going to attack this by refactoring the necessary bits and seeing where that leads (which was my plan anyway). I think my main concern is - if I add certain plugin hooks for this, is anything else ever going to use them? I'm not sure I have an answer to that question yet, either way.",17608, 390804333,"We should merge this before refactoring the tests though, because that way we don't couple the new tests to the verification of this change.",17608, 390991640,For SpatiaLite this example may be useful - though it's building 4.3.0 and not 4.4.0: https://github.com/terranodo/spatialite-docker/blob/master/Dockerfile,17608, 390993397,"Useful GitHub code search: https://github.com/search?utf8=✓&q=%22libspatialite-4.4.0%22+%22RC0%22&type=Code ",17608, 390993861,If we can't get `import sqlite3` to load the latest version but we can get `import pysqlite3` to work that's fine too - I can teach Datasette to import the best available version.,17608, 390999055,This shipped in Datasette 0.22. Here's my blog post about it: https://simonwillison.net/2018/May/20/datasette-facets/,17608, 391000659,"Right now the plugin stuff is early enough that I'd like to get as many potential plugin hooks as possible crafted out A much easier to judge if they should be added as actual hooks if we have a working branch prototype of them. Some kind of mechanism for custom column display is already needed - eg there are columns where I want to say ""render this as markdown"" or ""URLify any links in this text"" - or even ""use this date format"" or ""add commas to this integer"". You can do it with a custom template but a lower-level mechanism would be nicer. ",17608, 391003285,"That looks great. I don't think it's possible to derive the current commit version from the .zip downloaded directly from GitHub, so needing to pip install via git+https feels reasonable to me.",17608, 391011268,"I think I can do this almost entirely within my existing BaseView class structure. First, decouple the async data() methods by teaching them to take a querystring object as an argument instead of a Sanic request object. The get() method can then send that new object instead of a request. Next teach the base class how to obey the ASGI protocol. I should be able to get support for both Sanic and uvicorn/daphne working in the same codebase, which will make it easy to compare their performance. 
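For reference, a minimal sketch of the shape an ASGI 'http' handler takes (using the plain single-callable style, not actual Datasette code):

```python
async def app(scope, receive, send):
    # An ASGI server (e.g. uvicorn) calls this once per request
    assert scope['type'] == 'http'
    await send({
        'type': 'http.response.start',
        'status': 200,
        'headers': [(b'content-type', b'text/plain')],
    })
    await send({
        'type': 'http.response.body',
        'body': b'Hello from ASGI',
    })
```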
",17608, 391025841,"The other reason I mention plugins is that I have an idea to outlaw JavaScript entirely from Datasette core and instead encourage ALL JavaScript functionality to move into plugins.right now that just means CodeMirror. I may set up some of those plugins (like CodeMirror) as default dependencies so you get them from ""pip install datasette"". I like the neatness of saying that core Datasette is a very simple JSON + HTML application, then encouraging people to go completely wild with JavaScript in the plugins.",17608, 391030083,See also #278,17608, 391050113,"Yup, I'll have a think about it. My current thoughts are for spatialite we'll need to hook into the following places: * Inspection, so we can detect which columns are geometry columns. (We also currently ignore spatialite tables during inspection, it may be worth moving that to the plugin as well.) * After data load, so we can convert WKB into the correct intermediate format for display. The alternative here is to alter the select SQL itself and get spatialite to do this conversion, but that strikes me as a bit more complex and possibly not as useful. * HTML rendering. * Querying? The rendering and querying hooks could also potentially be used to move the units support into a plugin.",17608, 391055490,"This is fantastic! I think I prefer the aesthetics of just ""0.22"" for the version string if it's a tagged release with no additional changes - does that work? I'd like to continue to provide a tuple that can be imported from the version.py module as well, as seen here: https://github.com/simonw/datasette/blob/558d9d7bfef3dd633eb16389281b67d42c9bdeef/datasette/version.py#L1 Presumably we can generate that from the versioneer string? ",17608, 391059008,"```python >>> import sqlite3 >>> sqlite3.sqlite_version '3.23.1' >>> ``` running the above in the container seems to show 3.23.1 too so maybe we don't need pysqlite3 at all?",17608, 391073009,"> I think I prefer the aesthetics of just ""0.22"" for the version string if it's a tagged release with no additional changes - does that work? Yes! That's the default versioneer behaviour. > I'd like to continue to provide a tuple that can be imported from the version.py module as well, as seen here: Should work now, it can be a two (for a tagged version), three or four items tuple. ``` In [2]: datasette.__version__ Out[2]: '0.12+292.ga70c2a8.dirty' In [3]: datasette.__version_info__ Out[3]: ('0', '12+292', 'ga70c2a8', 'dirty') ```",17608, 391073267,"Sorry, just realised you rely on `version` being a module ...",17608, 391076239,This looks amazing! Can't wait to try this out this evening.,17608, 391076458,Yeah let's try this without pysqlite3 and see if we still get the correct version.,17608, 391077700,"Alright, that should work now -- let me know if you would prefer any different behaviour.",17608, 391141391,"I'm going to clean this up for consistency tomorrow morning so hold off merging until then please On Tue, May 22, 2018 at 6:34 PM, Simon Willison wrote: > Yeah let's try this without pysqlite3 and see if we still get the correct > version. > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub > , or mute > the thread > > . > ",17608, 391190497,"I grabbed just your Dockerfile and built it like this: docker build . 
-t datasette Once it had built, I ran it like this: docker run -p 8001:8001 -v `pwd`:/mnt datasette \ datasette -p 8001 -h 0.0.0.0 /mnt/fixtures.db \ --load-extension=/usr/local/lib/mod_spatialite.so (The fixtures.db file is created by running `python tests/fixtures.py fixtures.db`) Then I visited http://localhost:8001/-/versions and I got this: { ""datasette"": { ""version"": ""0+unknown"" }, ""python"": { ""full"": ""3.6.3 (default, Dec 12 2017, 06:37:05) \n[GCC 6.3.0 20170516]"", ""version"": ""3.6.3"" }, ""sqlite"": { ""extensions"": { ""json1"": null, ""spatialite"": ""4.4.0-RC0"" }, ""fts_versions"": [ ""FTS4"", ""FTS3"" ], ""version"": ""3.23.1"" } } Fantastic! I'm getting SQLite `3.23.1` and SpatiaLite `4.4.0-RC0`",17608, 391290271,"Running: ```bash docker run -p 8001:8001 -v `pwd`:/mnt datasette \ datasette -p 8001 -h 0.0.0.0 /mnt/fixtures.db \ --load-extension=/usr/local/lib/mod_spatialite.so ``` is now returning FTS5 enabled in the versions output: ```json { ""datasette"": { ""version"": ""0.22"" }, ""python"": { ""full"": ""3.6.5 (default, May 5 2018, 03:07:21) \n[GCC 6.3.0 20170516]"", ""version"": ""3.6.5"" }, ""sqlite"": { ""extensions"": { ""json1"": null, ""spatialite"": ""4.4.0-RC0"" }, ""fts_versions"": [ ""FTS5"", ""FTS4"", ""FTS3"" ], ""version"": ""3.23.1"" } } ``` The old query didn't work because specifying `(t TEXT)` caused an error",17608, 391354237,@r4vi any objections to me merging this?,17608, 391355030,"No objections; It's good to go @simonw On Wed, 23 May 2018, 14:51 Simon Willison, wrote: > @r4vi any objections to me merging this? > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > , or mute > the thread > > . > ",17608, 391355099,Confirmed fixed: https://fivethirtyeight-datasette-mipwbeadvr.now.sh/fivethirtyeight-5de27e3/nba-elo%2Fnbaallelo?_facet=lg_id&_next=100 ,17608, 391437199,Thank you very much! This is most excellent.,17608, 391437462,I'm afraid I just merged #280 which means this no longer applies. You're very welcome to see if you can further optimize the new Dockerfile though.,17608, 391504199,"I'm not keen on anything that modifies the SQLite file itself on startup - part of the datasette contract is that it should work with any SQLite file you throw at it without having any side-effects. A neat thing about SQLite is that because everything happens in the same process there's very little additional overhead involved in executing extra SQL queries - even if we ran a query-per-row to transform data in one specific column it shouldn't add more than a few ms to the total page load time (whereas with MySQL all of the extra query overhead would kill us).",17608, 391504757,"That said, it looks like we may be able to use a library like https://github.com/geomet/geomet to run the conversion from WKB entirely in Python space.",17608, 391505930,"> I'm not keen on anything that modifies the SQLite file itself on startup Ah I didn't mean that - I meant altering the SELECT query to fetch the data so that it ran a spatialite function to transform that specific column. I think that's less useful as a general-purpose plugin hook though, and it's not that hard to parse the WKB in Python (my default approach would be to use [shapely](https://github.com/Toblerity/Shapely), which is great, but geomet looks like an interesting pure-python alternative).",17608, 391583528,"The challenge here is which database should be the ""default"" database. 
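For reference, a minimal sketch of the underlying SQLite mechanism using the standard library `sqlite3` module - the file and table names are borrowed from the prototype described in the next few comments, and the database files are assumed to exist locally.
```python
import sqlite3

# Attach several database files to a single connection, then join across
# them using the "alias.tablename" syntax. The first/default database here
# is an in-memory one, so no file on disk gets modified.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE 'fivethirtyeight.db' AS fivethirtyeight")
conn.execute("ATTACH DATABASE 'google-trends.db' AS [google-trends]")

rows = conn.execute("""
    select a.rowid, a.actors, b.city
    from fivethirtyeight.[love-actually/love_actually_adjacencies] a
    join [google-trends].[20150430_UKDebate] b on b.rowid = a.rowid
""").fetchall()
```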
The first database attached to SQLite is treated as the default - if no database is specified in a query, that's the database that queries will be executed against. Currently, each database URL in Datasette (e.g. https://san-francisco.datasettes.com/sf-film-locations-84594a7 v.s. https://san-francisco.datasettes.com/sf-trees-ebc2ad9 ) gets its own independent connection, and all queries within that base URL run against that database. If we're going to attach multiple databases to the same connection, how do we set which database gets to be the default? The easiest thing to do here will be to have a special database (maybe which is turned off by default and can be enabled using `datasette serve --enable-cross-database-joins` or similar) which attaches to ALL the databases. Perhaps it starts as an in-memory database, maybe at `/memory`? ",17608, 391584112,"I built a very rough prototype of this to prove it could work. It's deployed here - and here's an example of a query that joins across two different databases: https://datasette-cross-database-joins-prototype.now.sh/memory?sql=select+fivethirtyeight.%5Blove-actually%2Flove_actually_adjacencies%5D.rowid%2C%0D%0Afivethirtyeight.%5Blove-actually%2Flove_actually_adjacencies%5D.actors%2C%0D%0A%5Bgoogle-trends%5D.%5B20150430_UKDebate%5D.city%0D%0Afrom+fivethirtyeight.%5Blove-actually%2Flove_actually_adjacencies%5D%0D%0Ajoin+%5Bgoogle-trends%5D.%5B20150430_UKDebate%5D%0D%0A++on+%5Bgoogle-trends%5D.%5B20150430_UKDebate%5D.rowid+%3D+fivethirtyeight.%5Blove-actually%2Flove_actually_adjacencies%5D.rowid ``` select fivethirtyeight.[love-actually/love_actually_adjacencies].rowid, fivethirtyeight.[love-actually/love_actually_adjacencies].actors, [google-trends].[20150430_UKDebate].city from fivethirtyeight.[love-actually/love_actually_adjacencies] join [google-trends].[20150430_UKDebate] on [google-trends].[20150430_UKDebate].rowid = fivethirtyeight.[love-actually/love_actually_adjacencies].rowid ``` I deployed it like this: datasette publish now --branch=cross-database-joins fivethirtyeight.db google-trends.db --name=datasette-cross-database-joins-prototype ",17608, 391584366,"I used some pretty ugly hacks, like faking an entire `.inspect()` block for the `:memory:` database just to get past the errors I was seeing. To ship this as a feature it will need quite a bit of code refactoring to make those hacks unnecessary. https://github.com/simonw/datasette/blob/7a3040f5782375373b2b66e5969bc2c49b3a6f0e/datasette/views/database.py#L18-L26",17608, 391584527,Rather than stealing the `/memory` namespace for this it would be nicer if these cross-database joins could be executed at the very top-level URL of the Datasette instance - `https://example.com/?sql=...`,17608, 391752218,Most of the time Datasette is used with just a single database file. So maybe it makes sense for this option to be turned on by default and to ALWAYS be available on the Datasette instance homepage unless the user has explicitly disabled it.,17608, 391752425,"This would make Datasett's SQL features a lot more instantly obvious to people who land on a homepage, which is probably a good thing.",17608, 391752629,"Should this support canned queries too? I think it should, though that raises interesting questions regarding their URL structure.",17608, 391752882,Another option: give this the `/-/all` URL namespace.,17608, 391754506,"Giving it `/all/` would be easier since that way the existing URL routes (including canned queries) would all work... 
but I would have to teach it NOT to expect a database content hash on that URL. Or maybe it should still have a content hash (to enable far-future cache expiry headers on query results) but the hash should be constructed out of all of the other database hashes concatenated together. That way the URLs would be `/all-5de27e3` and `/all-5de27e3/canned-query-name` Only downside: this would make it impossible to have a database file with the name `all.db`. I think that's probably an OK trade-off. You could turn the feature off with a config flag if you really want to use that filename (for whatever reason). How about `/-all-5de27e3/` instead to avoid collisions?",17608, 391755300,On the `/-all-5de27e3` page we can show the regular https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3 interface but instead of the list of tables we can show a list of attached databases plus some help text showing how to construct a cross-database join.,17608, 391756841,"For an example query that pre-populates that textarea... maybe a UNION that pulls the first 10 rows from the first table of each of the first two databases? ``` select * from (select rowid, actors from fivethirtyeight.[love-actually/love_actually_adjacencies] limit 10) union all select * from (select rowid, city from [google-trends].[20150430_UKDebate] limit 10) ``` https://datasette-cross-database-joins-prototype.now.sh/memory?sql=select+*+from+%28select+rowid%2C+actors+from+fivethirtyeight.%5Blove-actually%2Flove_actually_adjacencies%5D+limit+10%29%0D%0A+++union+all%0D%0Aselect+*+from+%28select+rowid%2C+city+from+%5Bgoogle-trends%5D.%5B20150430_UKDebate%5D+limit+10%29",17608, 391765706,I'm not crazy about the `enable_` prefix on these.,17608, 391765973,This will also give us a mechanism for turning on and off the cross-database joins feature from #283,17608, 391766420,"Maybe `allow_sql`, `allow_facet` and `allow_download`",17608, 391768302,I like `/-/all-5de27e3` for this (with `/-/all` redirecting to the correct hash),17608, 391771202,"So the lookup priority order should be: * table level in metadata * database level in metadata * root level in metadata * `--config` options passed to `datasette serve` * `DATASETTE_X` environment variables",17608, 391771658,It feels slightly weird continuing to call it `metadata.json` as it starts to grow support for config options (which already started with the `units` and `facets` keys) but I can live with that.,17608, 391912392,`allow_sql` should only affect the `?sql=` parameter and whether or not the form is displayed. 
You should still be able to use and execute canned queries even if this option is turned off.",17608, 391950691,"Demo: datasette publish now --branch=master fixtures.db \ --source=""#284 Demo"" \ --source_url=""https://github.com/simonw/datasette/issues/284"" \ --extra-options ""--config allow_sql:off --config allow_facet:off --config allow_download:off"" \ --name=datasette-demo-284 now alias https://datasette-demo-284-jogjwngegj.now.sh datasette-demo-284.now.sh https://datasette-demo-284.now.sh/ Note the following: * https://datasette-demo-284.now.sh/fixtures-fda0fea has no SQL input textarea * https://datasette-demo-284.now.sh/fixtures-fda0fea has no database download link * https://datasette-demo-284.now.sh/fixtures-fda0fea.db returns 403 forbidden * https://datasette-demo-284.now.sh/fixtures-fda0fea?sql=select%20*%20from%20sqlite_master throws error 400 * https://datasette-demo-284.now.sh/fixtures-fda0fea/facetable shows no suggested facets * https://datasette-demo-284.now.sh/fixtures-fda0fea/facetable?_facet=city_id throws error 400",17608, 392118755,"Thinking about this further, maybe I should embrace ASGI turtles-all-the-way-down and teach each datasette view class to take a scope to the constructor and act entirely as an ASGI component. Would be a nice way of diving deep into ASGI and I can add utility helpers for things like querystring evaluation as I need them.",17608, 392121500,"A few extra thoughts: * Some users may want to opt out of this. We could have `--config version_in_hash:false` * should this affect the filename for the downloadable copy of the SQLite database? Maybe that should stay as just the hash of the contents, but that's a fair bit more complex * What about users who stick with the same version of datasette but deploy changes to their custom templates - how can we help them cache bust? Maybe with `--config cache_version:2`",17608, 392121743,"This is also a great excuse to finally write up some detailed documentation on Datasette's caching strategy",17608, 392121905,"See also #286",17608, 392212119,"This should detect any table which can be linked to the current table via some other table, based on the other table having a foreign key to them both. These join tables could be arbitrarily complicated. They might have foreign keys to more than two other tables, maybe even multiple foreign keys to the same column. Ideally M2M detection would catch all of these cases. Maybe the resulting inspect data looks something like this: ``` ""artists"": { ... ""m2m"": [{ ""other_table"": ""festivals"", ""through"": ""performances"", ""our_fk"": ""artist_id"", ""other_fk"": ""performance_id"" }] ``` Let's ignore compound primary keys: we'll only detect m2m relationships where the join table has foreign keys to a single primary key on the other two tables.",17608, 392214791,"We may need to derive a usable name for each of these relationships that can be used in eg querystring parameters. The name of the join table is a reasonable choice here. Say the join table is called `event_tags` - the querystring for returning all events that are tagged `badger` could be `/db/events?_m2m_event_tags__tag=badger` perhaps? But what if `event_tags` has more than one foreign key back to `events`? 
Might need to specify the column in `events` that is referred back to by `event_tags` somehow in that case.",17608, 392279508,Related: I started the documentation for using SpatiaLite with Datasette here: https://datasette.readthedocs.io/en/latest/spatialite.html,17608, 392279644,"I've been thinking a bit about modifying the SQL select statement used for the table view recently. I've run into a few examples of SQLite database that slow to a crawl when viewed with datasette because the rows are too big, so there's definitely scope for supporting custom select clauses (avoiding some columns, showing length(colname) for others).",17608, 392288531,"This might also be an opportunity to support an __in= operator - though that's an odd one as it acts equivalent to an OR whereas every other parameter is combined with an AND UPDATE 15th April 2019: I implemented `?column__in=` in a different way, see #433 ",17608, 392288990,An example of a query where you might want to use this option: https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3?sql=select+rowid%2C+*+from+%5Balcohol-consumption%2Fdrinks%5D+order+by+random%28%29+limit+1,17608, 392291605,Documented here https://datasette.readthedocs.io/en/latest/json_api.html#special-table-arguments and here: https://datasette.readthedocs.io/en/latest/config.html#default-cache-ttl,17608, 392291716,Demo: hit refresh on https://fivethirtyeight.datasettes.com/fivethirtyeight-5de27e3?sql=select+rowid%2C+*+from+%5Balcohol-consumption%2Fdrinks%5D+order+by+random%28%29+limit+1&_ttl=0,17608, 392296758,Docs: https://datasette.readthedocs.io/en/latest/json_api.html#different-shapes,17608, 392297392,"I ran a very rough micro-benchmark on the new `num_sql_threads` config option. datasette --config num_sql_threads:1 fivethirtyeight.db Then ab -n 100 -c 10 'http://127.0.0.1:8011/fivethirtyeight-2628db9/twitter-ratio%2Fsenators' | Number of threads | Requests/second | |---|---| | 1 | 4.57 | | 3 | 9.77 | | 10 | 13.53 | | 20 | 15.24 | 50 | 8.21 | ",17608, 392297508,Documentation: http://datasette.readthedocs.io/en/latest/config.html#num-sql-threads,17608, 392302406,"My first attempt at this was to have plugins depend on each other - so there would be a `datasette-leaflet` plugin which adds Leaflet to the page, and the `datasette-cluster-map` and `datasette-leaflet-geojson` plugins would depend on that plugin. I tried this and it didn't work, because it turns out the order in which plugins are loaded isn't predictable. `datasette-cluster-map` ended up adding it's script link before Leaflet had been loaded by `datasette-leaflet`, resulting in JavaScript errors.",17608, 392302416,For the moment then I'm going with a really simple solution: when iterating through `extra_css_urls` and `extra_js_urls` de-dupe by URL and avoid outputting the same link twice.,17608, 392302456,The big gap in this solution is conflicting versions: I don't yet have a story for what happens if two plugins attempt to load different versions of Leaflet. ,17608, 392305776,These plugin config options should be exposed to JavaScript as `datasette.config.plugins`,17608, 392316250,It looks like we can use the `geometry_columns` table to introspect which columns are SpatiaLite geometries. 
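A rough sketch of that introspection using the standard library `sqlite3` module - the database filename is illustrative, the extension path is the one used elsewhere in these comments, and the column list assumes a SpatiaLite 4.x-style `geometry_columns` table.
```python
import sqlite3

conn = sqlite3.connect("spatial.db")  # hypothetical SpatiaLite database file
conn.enable_load_extension(True)
conn.load_extension("/usr/local/lib/mod_spatialite.so")

# In SpatiaLite 4.x the relevant columns are f_table_name, f_geometry_column,
# geometry_type (an integer code), coord_dimension and srid; older databases
# use a human-readable "type" column instead of geometry_type.
for table, column, geometry_type, srid in conn.execute(
    "select f_table_name, f_geometry_column, geometry_type, srid "
    "from geometry_columns"
):
    print(table, column, geometry_type, srid)
```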
It includes a `geometry_type` integer which is documented here: https://www.gaia-gis.it/fossil/libspatialite/wiki?name=switching-to-4.0,17608, 392316306,Relevant to this ticket: I've been playing with a plugin that automatically renders any GeoJSON cells as leaflet maps: https://github.com/simonw/datasette-leaflet-geojson,17608, 392316673,Open question: how should this affect the row page? Just because columns were hidden on the table page doesn't necessarily mean they should be hidden on the row page as well. ,17608, 392316701,I could certainly see people wanting different custom column selects for the row page compared to the table page.,17608, 392338130,"Here's my first sketch at a metadata format for this: * `columns`: optional list of columns to include - if missing, shows all * `column_selects`: dictionary mapping column names to alternative select clauses `column_selects` can also invent new keys and use them to create derived columns. These new keys will be selected at the end of the list of columns UNLESS they are mentioned in `columns`, in which case that sequence will define the order. Can you facet by things that are customized using `column_selects`? Yes, and let's try running suggested facets against those columns as well. ``` { ""databases"": { ""databasename"": { ""tables"": { ""tablename"": { ""columns"": [ ""id"", ""name"", ""size"" ], ""column_selects"": { ""name"": ""upper(name)"", ""geo_json"": ""AsGeoJSON(Geometry)"" } ""row_columns"": [...] ""row_column_selects"": {...} } ``` The `row_columns` and `row_column_selects` properties work the same as the `column*` ones, except they are applied on the row page instead. If omitted, the `column*` ones will be used on the row page as well. If you want the row page to switch back to Datasette's default behaviour you can set `""row_columns"": [], ""row_column_selects"": {}`.",17608, 392342269,"Here's the metadata I tried against that first working prototype: ``` { ""databases"": { ""timezones"": { ""tables"": { ""timezones"": { ""columns"": [""PK_UID""], ""column_selects"": { ""upper_tzid"": ""upper(tzid)"", ""Geometry"": ""AsGeoJSON(Geometry)"" } } } }, ""wtr"": { ""tables"": { ""license_frequency"": { ""columns"": [""id"", ""license"", ""tx_rx"", ""frequency""], ""column_selects"": { ""latitude"": ""Y(Geometry)"", ""longitude"": ""X(Geometry)"" } } } } } } ``` Run using this: datasette timezones.db wtr.db \ --reload --debug --load-extension=/usr/local/lib/mod_spatialite.dylib \ -m column-metadata.json --config sql_time_limit_ms:10000 Usefully, the `--reload` flag detects changes to the `metadata.json` file as well as Datasette's own Python code.",17608, 392342947,I'd still like to be able to over-ride this using querystring arguments.,17608, 392343690,"Turns out it's actually possible to pull data from other tables using the mechanism in the prototype: ``` { ""databases"": { ""wtr"": { ""tables"": { ""license"": { ""column_selects"": { ""count"": ""(select count(*) from license_frequency where license_frequency.license = license.id)"" } } } } } } ``` Performance using this technique is pretty terrible though: ![2018-05-27 at 9 07 am](https://user-images.githubusercontent.com/9599/40588124-8169d7fa-618d-11e8-9880-ccc1904b05d9.png) ",17608, 392343839,"The more efficient way of doing this kind of count would be to provide a mechanism which can also add extra fragments to a `GROUP BY` clause used for the `SELECT`. Or... 
how about a mechanism similar to Django's `prefetch_related` which lets you define extra queries that will be called with a list of primary keys (or values from other columns) and used to populate a new column? A little unconventional but could be extremely useful and efficient. Related to that: since the per-query overhead in SQLite is tiny, could even define an extra query to be run once-per-row before returning results.",17608, 392345062,There needs to be a way to turn this off and return to Datasette default bahviour. Maybe a `?_raw=1` querystring parameter for the table view.,17608, 392350495,"Querystring design: * `?_column=a&_column=b` - equivalent of `""columns"": [""a"", ""b""]` in `metadata.json` * `?_select_nameupper=upper(name)` - equivalent of `""column_selects"": {""nameupper"": ""upper(name)""}`",17608, 392350568,"If any `?_column=` parameters are provided the metadata version is completely ignored. ",17608, 392350980,"Should `?_raw=1` also turn off foreign key expansions? No, we will eventually provide a separate mechanism for that (or leave it to nerds who care to figure out using JSON or CSV export).",17608, 392568047,Closing this as obsolete since we have facets now.,17608, 392574208,"I'm handling this as separate documentation sections instead, e.g. http://datasette.readthedocs.io/en/latest/spatialite.html",17608, 392574358,Closing this as obsolete in favor of other issues [tagged documentation](https://github.com/simonw/datasette/labels/documentation).,17608, 392574415,I implemented this as `?_ttl=0` in #289 ,17608, 392575160,"I've changed my mind about this. ""Select every record on the 3rd day of the month"" doesn't strike me as an actually useful feature. ""Select every record in 2018 / in May 2018 / on 1st May 2018"", if you are using the SQLite-preferred datestring format, are already supported using LIKE queries (or the startswith filter): * https://fivethirtyeight.datasettes.com/fivethirtyeight/inconvenient-sequel%2Fratings?timestamp__startswith=2017 * https://fivethirtyeight.datasettes.com/fivethirtyeight/inconvenient-sequel%2Fratings?timestamp__startswith=2017-08 * https://fivethirtyeight.datasettes.com/fivethirtyeight/inconvenient-sequel%2Fratings?timestamp__startswith=2017-08-29 ",17608, 392575448,"This shouldn't be a comma-separated list, it should be an argument you can pass multiple times to better match #255 and #292 Maybe `?_json=foo&_json=bar` ",17608, 392580715,"Oops, that commit should have referenced #121 ",17608, 392580902,"Implemented in 76d11eb768e2f05f593c4d37a25280c0fcdf8fd6 Documented here: http://datasette.readthedocs.io/en/latest/json_api.html#special-json-arguments",17608, 392600866,"This is an accidental duplicate, work is now taking place in #266",17608, 392601114,I think the way Datasette executes SQL queries in a thread pool introduced in #45 is a good solution for this ticket.,17608, 392601478,I'm going to close this as WONTFIX for the moment. Once Plugins #14 grows the ability to add extra URL paths and views someone who needs this could build it as a plugin instead.,17608, 392602334,"The `/.json` endpoint is more of an implementation detail of the homepage at this point. 
A better, documented ( http://datasette.readthedocs.io/en/stable/introspection.html#inspect ) endpoint for finding all of the databases and tables is https://parlgov.datasettes.com/-/inspect.json",17608, 392602558,I'll have the error message display a link to the documentation.,17608, 392605574,"![2018-05-28 at 2 24 pm](https://user-images.githubusercontent.com/9599/40629887-e991c61c-6282-11e8-9d66-6387f90e87ca.png) ",17608, 392606044,"The other major limitation of APSW is its treatment of unicode: https://rogerbinns.github.io/apsw/types.html - it tells you that it is your responsibility to ensure that TEXT columns in your SQLite database are correctly encoded. Since Datasette is designed to work against ANY SQLite database that someone may have already created, I see that as a show-stopping limitation. Thanks to https://github.com/coleifer/sqlite-vtfunc I now have a working mechanism for virtual tables (I've even built a demo plugin with them - https://github.com/simonw/datasette-sql-scraper ) which was the main thing that interested me about APSW. I'm going to close this as WONTFIX - I think Python's built-in `sqlite3` is good enough, and is now so firmly embedded in the project that making it pluggable would be more trouble than it's worth.",17608, 392606418,"> It could also be useful to allow users to import a python file containing custom functions that can that be loaded into scope and made available to custom templates. That's now covered by the plugins mechanism - you can create plugins that define custom template functions: http://datasette.readthedocs.io/en/stable/plugins.html#prepare-jinja2-environment-env",17608, 392815673,"I'm coming round to the idea that this should be baked into Datasette core - see above referenced issues for some of the explorations I've been doing around this area. Datasette should absolutely work without SpatiaLite, but it's such a huge bonus part of the SQLite ecosystem that I'm happy to ship features that take advantage of it without being relegated to plugins. I'm also becoming aware that there aren't really that many other interesting loadable extensions for SQLite. If SpatiaLite was one of dozens I'd feel that a rule that ""anything dependent on an extension lives in a plugin"" would make sense, but as it stands I think 99% of the time the only loadable extensions people will be using will be SpatiaLite and json1 (and json1 is available in the amalgamation anyway). ",17608, 392822050,"I don't know how it happened, but I've somehow got myself into a state where my local SQLite for Python 3 on OS X is `3.23.1`: ``` ~ $ python3 Python 3.6.5 (default, Mar 30 2018, 06:41:53) [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin Type ""help"", ""copyright"", ""credits"" or ""license"" for more information. >>> import sqlite3 >>> sqlite3.connect(':memory:').execute('select sqlite_version()').fetchall() [('3.23.1',)] >>> ``` Maybe I did something in homebrew that changed this? I'd love to understand what exactly I did to get to this state.",17608, 392825746,"I haven't had time to look further into this, but if doing this as a plugin results in useful hooks then I think we should do it that way. We could always require the plugin as a standard dependency. I think this is going to result in quite a bit of refactoring anyway so it's a good time to add hooks regardless. 
On the other hand, if we have to add lots of specialist hooks for it then maybe it's worth integrating into the core.",17608, 392828475,"Python standard-library SQLite dynamically links against the system sqlite3. So presumably you installed a more up-to-date sqlite3 somewhere on your `LD_LIBRARY_PATH`. To compile a statically-linked pysqlite you need to include an amalgamation in the project root when building the extension. Read the relevant setup.py.",17608, 392831543,"I ran an informal survey on twitter and most people were on 3.21 - https://twitter.com/simonw/status/1001487546289815553 Maybe this is from upgrading to the latest OS X release.",17608, 392840811,"Since #275 will allow configs to be overridden at the table and database level it also makes sense to expose a completely evaluated list of configs at: * `/dbname/-/config` * `/dbname/tablename/-/config` Similar to https://fivethirtyeight.datasettes.com/-/config",17608, 392890045,"Just about to ask for this! Move this page https://github.com/simonw/datasette/wiki/Datasettes into a datasette, with some concept of versioning as well.",17608, 392895733,Do you have an existing example with views?,17608, 392917380,Creating URLs using concatenation as seen in `('https://twitter.com/' || user) as user_url` is likely to have all sorts of useful applications for ad-hoc analysis.,17608, 392918311,Should the `tablename` ones also work for views and canned queries? Probably not.,17608, 392969173,The more time I spend with SpatiaLite the more convinced I am that this should be default behavior. There's nothing useful about the binary Geometry representation - it's not even valid WKB. I'm on board with WKT as the default display in HTML and GeoJSON as the default for `.json`,17608, 393003340,Funny you should mention that... I'm planning on doing that as part of the official Datasette website at some point soon. A Datasette instance that lists other Datasette instances feels pleasingly appropriate.,17608, 393014943,I just realised a problem with GeoJSON is that it assumes that the underlying geometry is WGS 84 latitude/longitude points - but it's very possible for a SpatiaLite geometry to contain geometric data that's nothing to do with geospatial projections.,17608, 393020749,"Challenge: how to deal with tables where the name ends in `.csv`? I actually have one of these in the test suite at the moment: https://github.com/simonw/datasette/blob/d69ebce53385b7c6fafb85fdab3b136dbf3f332c/tests/fixtures.py#L234-L237",17608, 393064224,"https://datasette-registry.now.sh Is now live, powered by https://github.com/simonw/datasette-registry - still needs plenty of work but it's an interesting start.",17608, 393106520,"I don't think it's unreasonable to only support spatialite geometries in a coordinate reference system which is at least transformable to WGS84. It would be nice to support different CRSes in the database so conversion to spatialite from the source data is lossless. 
I think the working CRS for datasette should be WGS84 though (leaflet requires it, for example) - it's just a case of calling `ST_Transform(geom, 4326)` on the column while we're loading the data.",17608, 393534579,"I actually started doing this in 45e502aace6cc1198cc5f9a04d61b4a1860a012b",17608, 393544357,"Demo: https://datasette-publish-spatialite-demo.now.sh/spatialite-test-c88bc35?sql=select+AsText(Geometry)+from+HighWays+limit+1%3B Published using `datasette publish now --spatialite /tmp/spatialite-test.db`",17608, 393547960,"SpatiaLite columns are actually quite a bit more interesting than this - they also have a `geometry_type` (point, polygon, linestring etc), a `coord_dimension` (usually 2 but can be higher) and an `srid`. For example: https://datasette-publish-spatialite-demo.now.sh/spatialite-test-c88bc35/geometry_columns ![2018-05-31 at 7 22 am](https://user-images.githubusercontent.com/9599/40787843-6f9600ee-64a3-11e8-84e5-64d7cc69603a.png) The SRID here is particularly interesting, because it helps hint at the fact that the results from these queries won't be latitude/longitude co-ordinates - which means that `AsGeoJSON()` won't return results that can be easily rendered by Leaflet: https://datasette-publish-spatialite-demo.now.sh/spatialite-test-c88bc35?sql=select+AsGeoJSON(Geometry)+from+HighWays%20limit1 Compare with https://timezones-api.now.sh/timezones-a99b2e3/geometry_columns: ![2018-05-31 at 7 25 am](https://user-images.githubusercontent.com/9599/40787991-d2650756-64a3-11e8-936e-2dcce7dd1515.png) ",17608, 393548602,"Presumably the difference in primary key structure between those two is caused by the fact that the `spatialite-test` database (actually https://www.gaia-gis.it/spatialite-2.3.1/test-2.3.sqlite.gz downloaded from https://www.gaia-gis.it/spatialite-2.3.1/resources.html ) was created by a much older version of SpatiaLite - presumably v2.3.1",17608, 393549215,"Also of note: `spatialite-test` uses readable strings in the `type` column, while `timezones` has a `geometry_type` column with integers in it. Those integers are documented here: https://www.gaia-gis.it/fossil/libspatialite/wiki?name=switching-to-4.0 ![2018-05-31 at 7 29 am](https://user-images.githubusercontent.com/9599/40788210-5d0f0dd4-64a4-11e8-8141-0386b5c7b384.png) ",17608, 393554151,"I fixed this in https://github.com/simonw/datasette/commit/b18e4515855c3f1eeca3dfcccdbb6df05869084a",17608, 393557406,"Our test fixtures currently have a table with a name ending in `.csv`: https://github.com/simonw/datasette/blob/d69ebce53385b7c6fafb85fdab3b136dbf3f332c/tests/fixtures.py#L234-L237",17608, 393557968,"I'm not sure what the best JSON shape for this would be considering the potential complexity of geospatial columns. I do think it's worth exposing these in the inspect JSON though, mainly so Datasette Registry can keep track of all of the openly available geodata out there.",17608, 393599840,"The interesting thing about this is that it requires URL routing to become aware of the names of all of the available tables.",17608, 393600441,"Here's a nasty challenge: what happens if a database has the following two tables: * `blah` * `blah.json` What would the URL be for the JSON endpoint for the `blah` table?",17608, 393610731,"I prototyped this a while ago here https://github.com/simonw/datasette/commit/04476ead53758044a5f272ae8696b63d6703115e before we had the ``--config`` mechanism.",17608, 394037368,"Solution for the above: support an optional `?_format=json/csv` parameter on the regular table view. 
Then if you have tables with the above colliding names you can use `/db/blah.json?_format=json` ",17608, 394400419,"In the interest of getting this shipped, I'm going to ignore the `3.7.10` issue.",17608, 394412217,Docs: http://datasette.readthedocs.io/en/latest/config.html#cache-size-kb,17608, 394412784,I think this is related to #303,17608, 394417567,"When serving streaming responses, I need to check that a large CSV file doesn't completely max out the CPU in a way that is harmful to the rest of the instance. If it does, one option may be to insert an async sleep call in between each chunk that is streamed back. This could be controlled by a `csv_pause_ms` config setting, defaulting to maybe 5 but can be disabled entirely by setting to 0. That's only if testing proves that this is a necessary mechanism.",17608, 394431323,I built this ASGI debugging tool to help with this migration: https://asgi-scope.now.sh/fivethirtyeight-34d6604/most-common-name%2Fsurnames.json?foo=bar&bazoeuto=onetuh&a=.,17608, 394503399,Results of an extremely simple micro-benchmark comparing the two shows that uvicorn is at least as fast as Sanic (benchmarks a little faster with a very simple payload): https://gist.github.com/simonw/418950af178c01c416363cc057420851,17608, 394764713,"https://github.com/encode/uvicorn/blob/572b5fe6c811b63298d5350a06b664839624c860/uvicorn/run.py#L63 is how you start a Uvicorn server from code as opposed to the `uvicorn` CLI from uvicorn.run import UvicornServer UvicornServer().run(app, host=host, port=port) ",17608, 394894500,"Input: - function that says if a name is a valid database - Function that says if a table exists - URL Output: - view class - Arguments - Redirect (if it should redirect)",17608, 394894910,I'm going to use a named tuple for the output. That way I can support either tuple destructing or explicit property access on the returned value.,17608, 394895267,To support a future where Datasette is an ASGI app that can be attached to a URL within a larger application the routing function should have the option to accept a path prefix which will then be automatically attached to any resulting redirects.,17608, 394895750,"A neat trick could be that if the router returns a redirect it could then resolve that redirect to see if it will 404 (or redirect itself) before returning that response. This would need its own counter to guard against infinite redirects. I'm not going to do this though: any view that results in a chain of redirects like this is a bug that should be fixed at the source.",17608, 395463497,"I started sketching this out in a branch, see pull request #307 - but I've decided I don't like it. I'm going to close this ticket and stick with regular expression URL routing for the moment. If I change my mind in the future the code in #307 lives in separate files (`datasette/routes.py` and `tests/test_routes.py`) so bringing it back into the project will be trivial.",17608, 395463598,Closing this pull request for reasons outlined here: https://github.com/simonw/datasette/issues/306#issuecomment-395463497,17608, 396048471,https://github.com/kubernetes/community/blob/master/contributors/devel/help-wanted.md Is worth stealing from too.,17608, 397534196,"The first version of this is now shipped to master. 
I ended up rewriting most of the experimental branch to deal with the nasty issue described in #303 Demo is available on https://fivethirtyeight.datasettes.com/fivethirtyeight-ab24e01/most-common-name%2Fsurnames ![2018-06-15 at 12 11 am](https://user-images.githubusercontent.com/9599/41455090-bd5ece30-7030-11e8-8da4-11fbb1f2ef8b.png) Here's the CSV version of that page: https://fivethirtyeight.datasettes.com/fivethirtyeight-ab24e01/most-common-name%2Fsurnames.csv",17608, 397534404,"Still to add: the streaming version that iterates through all of the pages, as seen in experimental commit https://github.com/simonw/datasette/commit/ced379ea325787b8c3bf0a614daba1fa4856a3bd",17608, 397534498,Also needs documentation.,17608, 397637302,"I'm going with the terminology ""labels"" here. You'll be able to add ``?_labels=1`` and the JSON will look something like this: ``` { ""rowid"": 233, ""TreeID"": 121240, ""qLegalStatus"": { ""value"" 2, ""label"": ""Private"" } ""qSpecies"": { ""value"": 16, ""label"": ""Sycamore"" } ""qAddress"": ""91 Commonwealth Ave"", ... } ``` I need this to help build foreign key expansions for CSV files, see #266 ",17608, 397648080,"I considered including a `""table""` key like this: ``` ""qLegalStatus"": { ""value"" 2, ""label"": ""Private"", ""table"": ""qLegalStatus"" } ``` This would help generate the HTML links using just the JSON data. But... I realized that in a list of 50 rows that value would be duplicated 50 times which is a bit nasty.",17608, 397663968,"Nearly done, but I need the HTML view to ignore the `?_labels=1` param (it throws an error at the moment).",17608, 397668427,Demo: https://datasette-json-labels-demo.now.sh/fixtures-fda0fea/facetable.json?_labels=1&_shape=array,17608, 397729319,I'm also going to add the ability to specify individual columns that you want to expand using `?_label=city_id&_label=state_id`,17608, 397729500,The `.json` and `.csv` links displayed on the table page should default to using `?_labels=1` if Datasette detects that there are foreign key expansions available for the page.,17608, 397729945,"The ""This data as ..."" area of the page is getting a bit untidy, especially if I'm going to add other download options in the future. I think I'll move the HTML to the page footer (less concerns about taking up lots of space there) and then have a bit of JavaScript that turns it into a show/hide menu of some sort in its current location.",17608, 397823913,"Still todo: - [ ] HTML view to obey the ?_labels=1 param (it throws an error at the moment) - [ ] `?_label=one&_label=2` support for only expanding specific labels - [ ] Better docs",17608, 397824991,"I'm going to support `?_labels=` on HTML views, but I'll allow it to be used to turn them off (they are on by default) using `?_labels=off`. 
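A tiny sketch of the kind of on/off flag parsing this implies - a hypothetical helper, which may not match the exact behaviour of the real one referenced in the next comment.
```python
def value_as_boolean(value):
    # Hypothetical: treat common "truthy" flag strings as True, anything else as False
    return value.lower() in ("on", "1", "true", "yes")


assert value_as_boolean("on") is True   # ?_labels=on
assert value_as_boolean("off") is False  # ?_labels=off
```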
Related: 7e0caa1e62607c6579101cc0e62bec8899013715 where I added a new `value_as_boolean` helper extracted from how `--config` works in `cli.py`.",17608, 397825583,This is already covered by #292 ,17608, 397839482,Should facets always have their labels expanded or should they also obey the `_labels` and `_label` querystring arguments?,17608, 397839583,"I'm a bit torn on naming - choices are: * `?_labels=on` and `?_label=col1&_label=col2` * `?_expands=on` (or `?_expand_all=on`) and `?_expand=col1&_expand=col2`",17608, 397840676,For the moment I'm going with `_labels=`.,17608, 397841968,I merged this manually in ed631e690b81e34fcaeaba1f16c9166f1c505990,17608, 397842194,"Some demos: * https://datasette-labels-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List - regular HTML view * https://datasette-labels-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List?_labels=off - no labels * https://datasette-labels-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List.json?_labels=on - JSON with all labels * https://datasette-labels-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List.json?_label=qSpecies&_shape=array - JSON with specific labels in array shape * https://datasette-labels-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List.csv?_labels=on - CSV with all labels * https://datasette-labels-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List.csv?_label=qSpecies - CSV with specific labels",17608, 397842246,"Two demos of the new functionality in #233 as it applies to CSV: * https://datasette-labels-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List.csv?_labels=on - CSV with all foreign key columns expanded * https://datasette-labels-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List.csv?_label=qSpecies - CSV with specific columns expanded",17608, 397842667,"Still todo: - [x] Streaming version - [ ] Tidy up the ""This data as ..."" UI - [x] Default .csv (and .json) links to use `?_labels=on` (only if at least one foreign key detected) ",17608, 397900434,This will require some relatively sophisticated Travis build steps. Useful docs: https://docs.travis-ci.com/user/build-stages/ - useful example: https://docs.travis-ci.com/user/build-stages/deploy-heroku/,17608, 397907987,"This very nearly works... * https://latest.datasette.io/ * https://f0c1722.datasette.io/ But... https://f0c1722.datasette.io/-/versions isn't showing the correct note: ``` { ""datasette"": { ""version"": ""0.22.1"" } ... ``` There should be a `""note""` field there with the full commit hash.",17608, 397908093,"It looks like all of my test deploys ended up going to the same Zeit deployment ID: https://zeit.co/simonw/datasette-latest/rbmtcedvlj This is strange... the Dockerfile should be different for each one (due to the differing version-note). https://github.com/simonw/datasette/commit/db1e6bc182d11f333e6addaa1a6be87625a4e12b#diff-34418c57343344c73271e13b01b7fcd9R255",17608, 397908185,"``` The command ""datasette publish now fixtures.db -m fixtures.json --token=$NOW_TOKEN --branch=$TRAVIS_COMMIT --version-note=$TRAVIS_COMMIT"" exited with 0. 
``` Partial log of the ``datasette publish now`` output: ``` > Step 5/7 : RUN datasette inspect fixtures.db --inspect-file inspect-data.json > ---> Running in d373f330e53e > ---> 09bab386aaa3 > Removing intermediate container d373f330e53e > Step 6/7 : EXPOSE 8001 > ---> Running in e0fe37b3061c > ---> 47798440e214 > Removing intermediate container e0fe37b3061c > Step 7/7 : CMD datasette serve --host 0.0.0.0 fixtures.db --cors --port 8001 --inspect-file inspect-data.json --metadata metadata.json --version-note f0c17229b7a7914d3da02e087dfd0e25d8321448 ``` So it looks like `--version-note` is being correctly set there.",17608, 397908614,"Aha! ```1.03s$ now alias --token=$NOW_TOKEN > Error! Couldn't find a deployment to alias. Please provide one as an argument. The command ""now alias --token=$NOW_TOKEN"" exited with 1. ``` That explains it. I need to set the same alias in my call to `datasette publish`.",17608, 397908947,"That fixed it! https://958b75c.datasette.io/-/versions ``` { ""python"": { ""version"": ""3.6.5"", ""full"": ""3.6.5 (default, Jun 6 2018, 19:19:24) \n[GCC 6.3.0 20170516]"" }, ""datasette"": { ""version"": ""0+unknown"", ""note"": ""958b75c69841ef5913da86e0eb2df634a9b95fda"" }, ""sqlite"": { ""version"": ""3.16.2"", ""fts_versions"": [ ""FTS5"", ""FTS4"", ""FTS3"" ], ""extensions"": { ""json1"": null } } } ```",17608, 397912840,"This worked! https://github.com/simonw/datasette/commit/5a0a82faf9cf9dd109d76181ed00eea19472087c - it spat out a 76MB CSV when I ran it against the sf-trees demo database. It was just a quick hack though - it currently ignores `_labels=` and `_dl=` which need to be supported. I'm going to add a config option for turning full CSV export off just in case any Datasette users are uncomfortable with URLs that churn out that much data in one go. ``` ConfigOption(""allow_csv_stream"", True, """""" Allow .csv?_stream=1 to download all rows (ignoring max_returned_rows) """""".strip()), ```",17608, 397915258,Someone malicious could use a UNION to generate an unpleasantly large CSV response. I'll add another config setting which limits the response size to 100MB but can be turned off by setting it to 0.,17608, 397915403,"Since CSV streaming export doesn't work for custom SQL queries (since they don't support `_next=` pagination) there's no need to provide a option that disables streams just for custom SQL. Related: the UI should not show the option to download everything on custom SQL pages.",17608, 397916091,I was also worried about the performance of pagination over custom `_sort` orders or views which use offset pagination - but Datasette's SQL time limits should prevent those from getting out of hand. This does mean that a streaming CSV file may be truncated with an error - if this happens we should ensure the error is written out as the last line of the CSV so anyone who tried to import it gets a relevant error message informing them that the export did not complete.,17608, 397916321,The export UI could be a GET form controlling various parameters. This would discourage crawlers from hitting the export links and would also allow us to express the full range of export options.,17608, 397918264,"Simpler design: the top of the page will link to basic .json and .csv and ""advanced"" - which will fragment link to an advanced export format the bottom of the page.",17608, 397923253,Ideally the downloadable filenames of exported CSVs would differ across different querystring parameters. 
Maybe `Street_Trees-56cbd54.csv` where `56cbd54` is a hash of the querystring?",17608, 397949002,"Advanced export pane: ![2018-06-17 at 10 52 pm](https://user-images.githubusercontent.com/9599/41520166-3809a45a-7281-11e8-9dfa-2b10f4cb9672.png) ",17608, 397952129,"Advanced export pane demo: https://latest.datasette.io/fixtures-35b6eb6/facetable?_size=4",17608, 398030903,"I should add that I'm using datasette version 0.22, Python 2.7.10 on Mac OS X. Happy to send more info if helpful.",17608, 398098582,"This is now released in Datasette 0.23! http://datasette.readthedocs.io/en/latest/changelog.html#v0-23",17608, 398101670,"Wow, I've gone as high as 7GB but I've never tried it against 600GB. `datasette inspect` is indeed expected to take a long time for large databases. That's why it's available as a separate command: by running `datasette inspect` to generate `inspect-data.json` you can execute it just once against a large database and then have `datasette serve` take advantage of that cached metadata (hence avoiding `datasette serve` hanging on startup). As you spotted, most of the time is spent in those counts. I imagine you don't need those row counts in order for the rest of Datasette to function correctly (they are mainly used for display purposes - on the https://latest.datasette.io/fixtures index page for example). If your database changes infrequently, for the moment I recommend running `datasette inspect` once to generate the `inspect-data.json` file (let me know how long it takes) and then passing that file to `datasette serve mydb.db --inspect-file=inspect-data.json` If your database DOES change frequently then this workaround won't help you much. Let me know and I'll see how much work it would take to have those row counts be optional rather than required.",17608, 398102537,"https://latest.datasette.io/ now always hosts the latest version of the code. I've started linking to it from our documentation.",17608, 398109204,"Hi Simon, Thanks for the response. Ok I'll try running `datasette inspect` up front. In principle the db won't change. However, the site's in development and it's likely I'll need to add views and some auxiliary (smaller) tables as I go along. I will need to be careful with this if it involves an inspect step in each iteration, though. g. ",17608, 398133159,"For #271 I've been contemplating having Datasette work against an on-disk database that gets modified without needing to restart the server. For that to work, I'll have to dramatically change the inspect() mechanism. It may be that inspect becomes an optional optimization in the future.",17608, 398133924,"As seen in #316 inspect is already taking a VERY long time to run against large (600GB) databases. To get this working I may have to make inspect an optional optimization and run introspection for columns and primary keys on demand. The one catch here is the `count(*)` queries - Datasette may need to learn not to return full table counts in circumstances where the count has not been pre-calculated and takes more than Xms to generate.",17608, 398778485,"This would be a great feature to have!",17608, 398825294,"Still a bug in 0.23",17608, 398826108,"This depends on #272 - Datasette ported to ASGI.",17608, 398973176,"This is a little bit fiddly, but it's possible to do it using SQLite string concatenation. 
Here's an example: ``` select * from facetable where neighborhood like ""%"" || :text || ""%""; ``` Try it here: https://latest.datasette.io/fixtures-35b6eb6?sql=select+*+from+facetable+where+neighborhood+like+%22%25%22+%7C%7C+%3Atext+%7C%7C+%22%25%22%3B&text=town ![2018-06-20 at 9 33 pm](https://user-images.githubusercontent.com/9599/41698185-a52143f2-74d1-11e8-8d16-32bfc4542104.png) ",17608, 398973309,"Demo of fix: the `on_earth` facet on https://latest.datasette.io/fixtures-cafd088/facetable?_facet=planet_int&_facet=on_earth&_facet=city_id ![2018-06-20 at 9 35 pm](https://user-images.githubusercontent.com/9599/41698208-ebb6b72a-74d1-11e8-9d85-de7600177f69.png) ",17608, 398976488,"I've added this to the unit tests and the documentation. Docs: http://datasette.readthedocs.io/en/latest/sql_queries.html#canned-queries Canned query demo: https://latest.datasette.io/fixtures/neighborhood_search?text=town New unit test: https://github.com/simonw/datasette/blob/3683a6b626b2e79f4dc9600d45853ca4ae8de11a/tests/test_api.py#L333-L344 https://github.com/simonw/datasette/blob/3683a6b626b2e79f4dc9600d45853ca4ae8de11a/tests/fixtures.py#L145-L153",17608, 399098080,"Perfect, thank you!!",17608, 399106871,"One thing I've noticed with this approach is that the query is executed with no parameters which I do not believe was the case previously. In the case the table contains a lot of data, this adds some time executing the query before the user can enter their input and run it with the parameters they want.",17608, 399126228,"This seems to fix that: ``` select neighborhood, facet_cities.name, state from facetable join facet_cities on facetable.city_id = facet_cities.id where :text != '' and neighborhood like '%' || :text || '%' order by neighborhood; ``` Compare this (with empty string): https://latest.datasette.io/fixtures-cafd088?sql=select+neighborhood%2C+facet_cities.name%2C+state%0D%0Afrom+facetable%0D%0A++++join+facet_cities+on+facetable.city_id+%3D+facet_cities.id%0D%0Awhere+%3Atext+%21%3D+%22%22+and+neighborhood+like+%27%25%27+%7C%7C+%3Atext+%7C%7C+%27%25%27%0D%0Aorder+by+neighborhood%3B To this: https://latest.datasette.io/fixtures-cafd088?sql=select+neighborhood%2C+facet_cities.name%2C+state%0D%0Afrom+facetable%0D%0A++++join+facet_cities+on+facetable.city_id+%3D+facet_cities.id%0D%0Awhere+%3Atext+%21%3D+%22%22+and+neighborhood+like+%27%25%27+%7C%7C+%3Atext+%7C%7C+%27%25%27%0D%0Aorder+by+neighborhood%3B&text=town",17608, 399129220,Those queries look identical. How can this be prevented if the queries are in a metadata.json file?,17608, 399134680,I can use Sanic middleware for this: http://sanic.readthedocs.io/en/latest/sanic/middleware.html#responding-early,17608, 399139462,"Demo of fix: https://latest.datasette.io/fixtures-e14e080/searchable_tags ![2018-06-21 at 8 13 am](https://user-images.githubusercontent.com/9599/41728203-0b571e9a-752b-11e8-9702-9887e3ede5bc.png) ",17608, 399142274,Demo: https://latest.datasette.io/fixtures-e14e080/,17608, 399144688,"From https://docs.travis-ci.com/user/deployment/pypi/ > Note that if your PyPI password contains special characters you need to escape them before encrypting your password. Some people have [reported difficulties](https://github.com/travis-ci/dpl/issues/377) connecting to PyPI with passwords containing anything except alphanumeric characters. ",17608, 399150285,That fixed it! 
https://travis-ci.org/simonw/datasette/jobs/395078407 ran successfully and https://pypi.org/project/datasette/ now hosts Datasette 0.23.1 deployed via Travis.,17608, 399154550,Fixed here too now: https://registry.datasette.io/registry-c10707b/datasette_tags,17608, 399156960,"Demo of fix: https://latest.datasette.io/fixtures-e14e080/simple_view ![2018-06-21 at 9 04 am](https://user-images.githubusercontent.com/9599/41731021-2be526aa-7532-11e8-9c3b-f787f918328e.png) ",17608, 399157944,Thanks to #319 the test suite now includes a m2m table: https://latest.datasette.io/fixtures-e14e080/searchable_tags,17608, 399171239,"I may have misunderstood your problem here. I understood that the problem is that when using the `""%"" || :text || ""%""` construct the first hit to that page (with an empty string for `:text`) results in a `where neighborhood like ""%%""` query which is slow because it matches every row in the database. My fix was to add this to the where clause: where :text != '' and ... Which means that when you first load the page the where fails to match any rows and you get no results (and hopefully instant loading times assuming SQLite is smart enough to optimize this away). That's why you don't see any rows returned on this page: https://latest.datasette.io/fixtures-cafd088?sql=select+neighborhood%2C+facet_cities.name%2C+state%0D%0Afrom+facetable%0D%0A++++join+facet_cities+on+facetable.city_id+%3D+facet_cities.id%0D%0Awhere+%3Atext+%21%3D+%22%22+and+neighborhood+like+%27%25%27+%7C%7C+%3Atext+%7C%7C+%27%25%27%0D%0Aorder+by+neighborhood%3B",17608, 399173916,"Oh I see.. My issue is that the query executes with an empty string prior to the user submitting the parameters. I'll try adding your workaround to some of my queries. Thanks again,",17608, 399721346,"Demo: go to https://vega.github.io/editor/ and paste in the following: ``` { ""data"": { ""url"": ""https://fivethirtyeight.datasettes.com/fivethirtyeight/twitter-ratio%2Fsenators.csv?_size=max&_sort_desc=replies"", ""format"": { ""type"": ""csv"" } }, ""mark"": ""bar"", ""encoding"": { ""x"": { ""field"": ""created_at"", ""type"": ""temporal"" }, ""y"": { ""field"": ""replies"", ""type"": ""quantitative"" }, ""color"": { ""field"": ""user"", ""type"": ""nominal"" } } } ``` ![2018-06-23 at 6 10 pm](https://user-images.githubusercontent.com/9599/41814923-b1613370-7710-11e8-94ac-5b87b0b629ed.png) ",17608, 400166540,This looks VERY relevant: https://github.com/encode/starlette,17608, 400571521,"I’m up for helping with this. Looks like you’d need static files support, which I’m planning on adding a component for. Anything else obviously missing? For a quick overview it looks very doable - the test client ought to me your test cases stay roughly the same. Are you using any middleware or other components for the Sanic ecosystem? Do you use cookies or sessions at all?",17608, 400903687,Need to ship docker image: #57 ,17608, 400903871,"Shipped to Docker Hub: https://hub.docker.com/r/datasetteproject/datasette/ I did this manually the first time. 
I'll set Travis up to do this automatically in #329",17608, 400904514,https://datasette.readthedocs.io/en/latest/installation.html#using-docker,17608, 401003061,I pushed this to Docker Hub https://hub.docker.com/r/datasetteproject/datasette/ and added notes on how to use it to the documentation: http://datasette.readthedocs.io/en/latest/installation.html#using-docker,17608, 401310732,"@russs Different map projections can presumably be handled on the client side using a leaflet plugin to transform the geometry (eg [kartena/Proj4Leaflet](https://kartena.github.io/Proj4Leaflet/)) although the leaflet side would need to detect or be informed of the original projection? Another possibility would be to provide an easy way/guidance for users to create an FK'd table containing the WGS84 projection of a non-WGS84 geometry in the original/principle table? This could then as a proxy for serving GeoJSON to the leaflet map?",17608, 401312981,"> @RusSs Different map projections can presumably be handled on the client side using a leaflet plugin to transform the geometry (eg kartena/Proj4Leaflet) although the leaflet side would need to detect or be informed of the original projection? Well, as @simonw mentioned, GeoJSON only supports WGS84, and GeoJSON (and/or TopoJSON) is the standard we probably want to aim for. On-the-fly reprojection in spatialite is not an issue anyway, and in general I think you want to be serving stuff to web maps in WGS84 or Web Mercator.",17608, 401477622,"https://docs.python.org/3/library/json.html#json.dump > **json.dump**(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)¶ > If `allow_nan` is false (default: True), then it will be a ValueError to serialize out of range float values (nan, inf, -inf) in strict compliance of the JSON specification. If allow_nan is true, their JavaScript equivalents (NaN, Infinity, -Infinity) will be used.",17608, 401478223,"I'm not sure what the correct thing to do here is. I don't want to throw a `ValueError` when trying to render that data as JSON, but I also want to produce JSON that doesn't break when fetched by JavaScript.",17608, 402243153,"I think I'm going to return `null` in the JSON for infinity/nan values by default, but if you send `_nan=1` I will instead return invalid JSON with `Infinity` or `NaN` in it (since you have opted in to getting those and hence should be able to handle them).",17608, 403263890,Fixed: https://v0-23-2.datasette.io/fixtures-e14e080/table%2Fwith%2Fslashes.csv / https://v0-23-2.datasette.io/fixtures-e14e080/table%2Fwith%2Fslashes.csv/3,17608, 403526263,Yup that's definitely a bug.,17608, 403672561,"Tested with `datasette publish heroku fixtures.db --extra-options=""--config sql_time_limit_ms:4000""` https://blooming-anchorage-31561.herokuapp.com/-/config",17608, 403855639,I'm satisified with the improvement we got from the pip wheel cache.,17608, 403855963,This relates to #276 - I'm definitely convinced now that displaying a giant `b'...'` blob on the page is not a useful default.,17608, 403856114,Great idea.,17608, 403858949,"``` $ datasette airports.sqlite Serve! files=('airports.sqlite',) on port 8001 Usage: datasette airports.sqlite [OPTIONS] [FILES]... Error: It looks like you're trying to load a SpatiaLite database without first loading the SpatiaLite module. 
Read more: https://datasette.readthedocs.io/en/latest/spatialite.html ```",17608, 403863927,Here are some useful examples of other Python apps that have been packaged using the recipe described above: https://github.com/Homebrew/homebrew-core/search?utf8=%E2%9C%93&q=virtualenv_install_with_resources&type=,17608, 403865063,"Huh... from https://docs.brew.sh/Acceptable-Formulae > We frown on authors submitting their own work unless it is very popular. Marking this one as ""help wanted"" :)",17608, 403866099,"I can host a custom tap without needing to get anything accepted into homebrew-core: https://docs.brew.sh/How-to-Create-and-Maintain-a-Tap Since my principal goal here is ensuring an easy installation path for people who are familiar with `brew` but don't know how to use pip and Python 3 that could be a good option.",17608, 403868584,"I think this makes sense for the HTML view (not for JSON or CSV). It could be controlled by a new [config option](http://datasette.readthedocs.io/en/latest/config.html), `truncate_cells_html` - which is on by default but can be turned off.",17608, 403906747,"``` datasette publish now timezones.db --spatialite \ --extra-options=""--config truncate_cells_html:200"" \ --name=datasette-issue-330-demo \ --branch=master ``` https://datasette-issue-330-demo-sbelwxttfn.now.sh/timezones-3cb9f64/timezones ![2018-07-10 at 10 39 am](https://user-images.githubusercontent.com/9599/42527428-7eabc6c8-842d-11e8-91ac-5666dbc5872c.png) But https://datasette-issue-330-demo-sbelwxttfn.now.sh/timezones-3cb9f64/timezones/1 displays the full blob.",17608, 403907193,Documentation: http://datasette.readthedocs.io/en/latest/config.html#truncate-cells-html,17608, 403908704,I consider this resolved by #46 ,17608, 403909389,This is done! https://github.com/simonw/datasette-vega,17608, 403909469,This is now a dupe of https://github.com/simonw/datasette-vega/issues/4,17608, 403909671,This was fixed by https://github.com/simonw/datasette/commit/6a32684ebba89dfe882e1147b23aa8778479f5d8#diff-354f30a63fb0907d4ad57269548329e3,17608, 403910318,This would be a nice example plugin to demonstrate plugin configuration options in #231,17608, 403910774,I consider this handled by https://github.com/simonw/datasette-vega,17608, 403939399,Building this using Svelte would also produce a neat example of a plugin that uses Svelte: https://svelte.technology/guide - and if I like it I might port datasette-vega to it.,17608, 403959704,"No cookies or sessions - no POST requests in fact, Datasette just cares about GET (path and querystring) and being able to return custom HTTP headers.",17608, 403996143,Easiest way to do this I think would be to make those help blocks separate files in the docs/ directory (publish-help.txt perhaps) and then include them with a sphinx directive: https://reinout.vanrees.org/weblog/2010/12/08/include-external-in-sphinx.html,17608, 404021589,http://datasette.readthedocs.io/en/latest/publish.html,17608, 404021890,"I decided against the unit tests, instead I have a new script called `./update-docs-help.sh` which I can run any time I want to refresh the included documentation: https://github.com/simonw/datasette/commit/aec3ae53237e43b0c268dbf9b58fa265ef38cfe1#diff-cb15a1e5a244bb82ad4afce67f252543",17608, 404208602,Here's a good example of a homebrew tap: https://github.com/saulpw/homebrew-vd,17608, 404209205,"Oops, opened this in the wrong repo - moved it here: https://github.com/simonw/datasette-vega/issues/13",17608, 404338345,"It sounds like you're running into the Sanic default 
response timeout value of 60 seconds: https://github.com/channelcat/sanic/blob/master/docs/sanic/config.md#builtin-configuration-values For the moment you can over-ride that using an environment variable like this: SANIC_RESPONSE_TIMEOUT=6000 datasette fivethirtyeight.db -p 8008 --config sql_time_limit_ms:600000",17608, 404514973,"Okay. I reckon the latest version should have all the kinds of components you'd need: Recently added ASGI components for Routing and Static Files support, as well as making few tweaks to make sure requests and responses are instantiated efficiently. Don't have any redirect-to-slash / redirect-to-non-slash stuff out of the box yet, which it looks like you might miss.",17608, 404565566,I'm going to turn this into an issue about better supporting the above option.,17608, 404567587,Here's how plotly handled this issue: https://github.com/plotly/plotly.py/pull/203 - see also https://github.com/plotly/plotly.py/blob/213602df6c89b45ce2b811ed2591171c961408e7/plotly/utils.py#L137,17608, 404569003,And here's how django-rest-framework did it: https://github.com/encode/django-rest-framework/pull/4918/files,17608, 404574598,Since my data is all flat lists of values I don't think I need to customize the JSON encoder itself (no need to deal with nested values). I'll fix the data on its way into the encoder instead. This will also help if I decide to move to uJSON for better performance #48,17608, 404576136,Thanks for the quick reply. Looks like that is working well.,17608, 404923318,Relevant: https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-system-for-self-contained-executables/,17608, 404953877,That's a good idea. We already do this for tables - e.g. on https://fivethirtyeight.datasettes.com/fivethirtyeight-ac35616/most-common-name%2Fsurnames - so having it as an option for canned queries definitely makes sense.,17608, 404954202,"https://timezones-api.now.sh/-/metadata currently shows this: ``` { ""databases"": { ""timezones"": { ""license"": ""ODbL"", ""license_url"": ""http://opendatacommons.org/licenses/odbl/"", ""queries"": { ""by_point"": ""select tzid\nfrom\n timezones\nwhere\n within(GeomFromText(\u0027POINT(\u0027 || :longitude || \u0027 \u0027 || :latitude || \u0027)\u0027), timezones.Geometry)\n and rowid in (\n SELECT pkid FROM idx_timezones_Geometry\n where xmin \u003c :longitude\n and xmax \u003e :longitude\n and ymin \u003c :latitude\n and ymax \u003e :latitude\n )"" }, ""source"": ""timezone-boundary-builder"", ""source_url"": ""https://github.com/evansiroky/timezone-boundary-builder"", ""tables"": { ""timezones"": { ""license"": ""ODbL"", ""license_url"": ""http://opendatacommons.org/licenses/odbl/"", ""sortable_columns"": [ ""tzid"" ], ""source"": ""timezone-boundary-builder"", ""source_url"": ""https://github.com/evansiroky/timezone-boundary-builder"" } } } }, ""license"": ""ODbL"", ""license_url"": ""http://opendatacommons.org/licenses/odbl/"", ""source"": ""timezone-boundary-builder"", ""source_url"": ""https://github.com/evansiroky/timezone-boundary-builder"", ""title"": ""OpenStreetMap Time Zone Boundaries"" } ``` We could support the value part of the `""queries""` array optionally being a dictionary with the same set of metadata fields supported for a table, plus a new `""sql""` key to hold the SQL for the query. 
",17608, 404954672,"So it would look like this: ``` { ""databases"": { ""timezones"": { ""license"": ""ODbL"", ""license_url"": ""http://opendatacommons.org/licenses/odbl/"", ""queries"": { ""by_point"": { ""title"": ""Timezones by point"", ""description"": ""Find the timezone for a latitude/longitude point"", ""sql"": ""select tzid\nfrom\n timezones\nwhere\n within(GeomFromText('POINT(' || :longitude || ' ' || :latitude || ')'), timezones.Geometry)\n and rowid in (\n SELECT pkid FROM idx_timezones_Geometry\n where xmin < :longitude\n and xmax > :longitude\n and ymin < :latitude\n and ymax > :latitude\n )"" } } } } } ```",17608, 405022335,"Looks like this was a red herring actually, and heroku had a blip when I was testing it...",17608, 405025731,"Fantastic, we really needed this.",17608, 405026441,This probably depends on #294.,17608, 405026800,"I had a quick look at this in relation to #343 and I feel like it might be worth modelling the inspected table metadata internally as an object rather than a dict. (We'd still have to serialise it back to JSON.) There are a few places where we rely on the structure of this metadata dict for various reasons, including in templates (and potentially also in user templates). It would be nice to have a reasonably well defined API for accessing metadata internally so that it's clearer what we're breaking.",17608, 405138460,"Demos: * https://latest.datasette.io/fixtures/neighborhood_search * https://timezones-api.now.sh/timezones/by_point Documentation: http://datasette.readthedocs.io/en/latest/sql_queries.html#canned-queries",17608, 405968983,Maybe argument should be `?_json_nan=1` since that makes it more explicitly obvious what is going on here.,17608, 405971920,"It looks like there are a few extra options we should support: https://devcenter.heroku.com/articles/heroku-cli-commands ``` -t, --team=team team to use --region=region specify region for the app to run in --space=space the private space to create the app in ``` Since these differ from the options for Zeit Now I think this means splitting up `datasette publish now` and `datasette publish Heroku` into separate subcommands.",17608, 405975025,"A `force_https_api_urls` config option would work here - if set, Datasette will ignore the incoming protocol and always use https. The `datasette deploy now` command could then add that as an option passed to `datasette serve`. This is the pattern which is producing incorrect URLs on Zeit Now, because the Sanic `request.url` property is not being correctly set. 
https://github.com/simonw/datasette/blob/6e37f091edec35e2706197489f54fff5d890c63c/datasette/views/table.py#L653-L655 Suggested help text: > Always use https:// for URLs output as part of Datasette API responses",17608, 405988035,"I'll add a `absolute_url(request, path)` method on the base view class which knows to check the new config option.",17608, 407109113,I still need to modify `datasette publish now` to set this config option on the instances that it deploys.,17608, 407262311,Actually SQLite doesn't handle NaN at all (it treats it as null) so I'm going to change this ticket to just deal with Infinity and -Infinity.,17608, 407262436,I'm going with `_json_infinity=1` as the querystring argument.,17608, 407262561,According to https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg110573.html you can insert Infinity/-Infinity in raw SQL (as used by our fixtures) using 1e999 and -1e999.,17608, 407267707,"Demo: * https://700d83d.datasette.io/fixtures-dcc1dbf/infinity.json - Infinity converted to Null * https://700d83d.datasette.io/fixtures-dcc1dbf/infinity.json?_json_infinity=on - invalid JSON containing `Infinity` and `-Infinity`",17608, 407267762,Documentation: http://datasette.readthedocs.io/en/latest/json_api.html#special-json-arguments,17608, 407267966,Demo: https://700d83d.datasette.io/fixtures-dcc1dbf/facetable.json?_facet=state&_size=5&_labels=on,17608, 407269243,"* No primary key => no ""object"" option: https://latest.datasette.io/fixtures-dcc1dbf/no_primary_key * Has a primary key => show ""object"" option: https://latest.datasette.io/fixtures-dcc1dbf/complex_foreign_keys * Has a next page => has ""stream all rows"" option: https://latest.datasette.io/fixtures-dcc1dbf/no_primary_key * Has foreign key references = show default-checked ""expand labels"" option: https://latest.datasette.io/fixtures-dcc1dbf/complex_foreign_keys * Does not have a next page => do not show ""stream all rows"" option: https://latest.datasette.io/fixtures-dcc1dbf/complex_foreign_keys ",17608, 407274059,Demo: https://latest.datasette.io/fixtures-dcc1dbf?sql=select+%28%27https%3A%2F%2Ftwitter.com%2F%27+%7C%7C+%27simonw%27%29+as+user_url%3B,17608, 407275996,Hopefully this will do the trick: https://github.com/simonw/datasette/commit/2bdab66772dca51b0c729b4e1063610cb2edd890,17608, 407280689,"It almost worked... but I had to fix the `docker login` command: https://github.com/simonw/datasette/commit/3a46d5e3c4278e74c3694f36995ea134bff800bc Hopefully the next release will be published correctly.",17608, 407450815,Actually I do like the idea of a unit test that reminds me if I've forgotten to update the included files.,17608, 407979065,This code now lives in https://github.com/simonw/datasette/blob/master/datasette/publish/heroku.py,17608, 407980050,Documentation: http://datasette.readthedocs.io/en/latest/plugins.html#publish-subcommand-publish,17608, 407980716,"Documentation here: http://datasette.readthedocs.io/en/latest/plugins.html#publish-subcommand-publish The best way to write a new publish plugin is to check out how the Heroku and Now default plugins are implemented: https://github.com/simonw/datasette/tree/master/datasette/publish",17608, 407983375,"Oops, forgot to commit those unit tests.",17608, 408093480,"I'm now hacking around with an initial version of this in the [starlette branch](https://github.com/simonw/datasette/tree/starlette). 
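The Infinity handling settled on a few comments up (SQLite already stores NaN as null, so only Infinity and -Infinity need attention; they become null unless `?_json_infinity=on` is passed) can be sketched roughly like this. The helper name and the flat row shape are assumptions for illustration, not the actual Datasette code:

```python
import json
import math


def clean_row(row, allow_infinity=False):
    """Replace +/-Infinity floats with None so the JSON output stays valid.

    With allow_infinity=True the values pass through, producing
    JavaScript-style (technically invalid) JSON containing Infinity.
    """
    if allow_infinity:
        return row
    return [
        None if isinstance(value, float) and math.isinf(value) else value
        for value in row
    ]


rows = [[1, 1e999], [2, -1e999], [3, 0.5]]  # 1e999 parses as float infinity
print(json.dumps([clean_row(r) for r in rows]))
# [[1, null], [2, null], [3, 0.5]]
print(json.dumps([clean_row(r, allow_infinity=True) for r in rows]))
# [[1, Infinity], [2, -Infinity], [3, 0.5]]
```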
Here's my work in progress, deployed using `datasette publish now fixtures.db -n datasette-starlette-demo --branch=starlette --extra-options=""--asgi""` https://datasette-starlette-demo.now.sh/ Lots more work to do - the CSS isn't being served correctly for example, it's showing this error when I hit `/-/static/app.css`: ``` INFO: 127.0.0.1 - ""GET /-/static/app.css HTTP/1.1"" 200 ERROR: Exception in ASGI application Traceback (most recent call last): File ""/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py"", line 363, in run_asgi result = await asgi(self.receive, self.send) File ""/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/starlette/staticfiles.py"", line 91, in __call__ await response(receive, send) File ""/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/starlette/response.py"", line 180, in __call__ {""type"": ""http.response.body"", ""body"": chunk, ""more_body"": False} File ""/Users/simonw/Dropbox/Development/datasette/venv/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py"", line 483, in send raise RuntimeError(""Response content shorter than Content-Length"") RuntimeError: Response content shorter than Content-Length ```",17608, 408097719,It looks like that's a bug in Starlette - filed here: https://github.com/encode/starlette/issues/32,17608, 408105251,"Tom shipped my fix for that bug already, so https://datasette-starlette-demo.now.sh/ is now serving CSS!",17608, 408478935,"Refs https://github.com/encode/uvicorn/issues/168",17608, 408581551,New documentation is now online here: https://datasette.readthedocs.io/en/latest/pages.html,17608, 409087501,Parent ticket: #354,17608, 409087871,"I started playing with this in the `m2m` branch - work so far: https://github.com/simonw/datasette/compare/295d005ca48747faf046ed30c3c61e7563c61ed2...af4ce463e7518f9d7828b846efd5b528a1905eca Here's a demo: https://datasette-m2m-work-in-progress.now.sh/russian-ads-e8e09e2/ads?_m2m_ad_targets__target_id=ec3ac&_m2m_ad_targets__target_id=e128e",17608, 409088967,"Here's the query I'm playing with for facet counts: https://datasette-m2m-work-in-progress.now.sh/russian-ads-e8e09e2?sql=select+target_id%2C+count%28*%29+as+n+from+ad_targets%0D%0Awhere%0D%0A++target_id+not+in+%28%22ec3ac%22%2C+%22e128e%22%29%0D%0A++and+ad_id+in+%28select+ad_id+from+ad_targets+where+target_id+%3D+%22ec3ac%22%29%0D%0A++and+ad_id+in+%28select+ad_id+from+ad_targets+where+target_id+%3D+%22e128e%22%29%0D%0Agroup+by+target_id+order+by+n+desc%3B ``` select target_id, count(*) as n from ad_targets where target_id not in (""ec3ac"", ""e128e"") and ad_id in (select ad_id from ad_targets where target_id = ""ec3ac"") and ad_id in (select ad_id from ad_targets where target_id = ""e128e"") group by target_id order by n desc; ```",17608, 409715112,The hook is currently only used on the custom SQL results page - it needs to run on table/view pages as well.,17608, 410485995,"First plugin using this hook: https://github.com/simonw/datasette-json-html Hook documentation: http://datasette.readthedocs.io/en/latest/plugins.html#render-cell-value",17608, 410580202,I used `datasette-json-html` to build this: https://russian-ira-facebook-ads-datasette-whmbonekoj.now.sh/russian-ads-919cbfd/display_ads,17608, 410818501,Another potential use-case for this hook: loading metadata via a URL,17608, 412290986,This was fixed in 
https://github.com/simonw/datasette/commit/89d9fbb91bfc0dd9091b34dbf3cf540ab849cc44,17608, 412291327,"Potential problem: the existing `metadata.json` format looks like this: ``` { ""title"": ""Custom title for your index page"", ""description"": ""Some description text can go here"", ""license"": ""ODbL"", ""license_url"": ""https://opendatacommons.org/licenses/odbl/"", ""databases"": { ""database1"": { ""source"": ""Alternative source"", ""source_url"": ""http://example.com/"", ""tables"": { ""example_table"": { ""description_html"": ""Custom table description"", ""license"": ""CC BY 3.0 US"", ""license_url"": ""https://creativecommons.org/licenses/by/3.0/us/"" } } } } } ``` This doesn't make sense for metadata that is bundled with a specific database - there's no point in having the `databases` key, instead the content of `database1` in the above example should be at the top level. This also means that if you rename the `*.db` file you won't have to edit its metadata at the same time. Calling such an embedded file `metadata.json` when the shape is different could be confusing. Maybe call it `database-metadata.json` instead.",17608, 412291395,"I'm going to separate the issue of enabling and disabling plugins from the existence of the `plugins` key. The format will simply be: ``` { ""plugins"": { ""name-of-plugin"": { ... any structures you like go here, defined by the plugin ... } } } ```",17608, 412291437,"On further thought, I'd much rather implement this using some kind of metadata plugin hook - see #357",17608, 412299013,"I've been worrying about how this one relates to #260 - I'd like to validate metadata (to help protect against people e.g. misspelling `license_url` and then being confused when their license isn't displayed properly), but this issue requests the ability to add arbitrary additional keys to the metadata structure. I think the solution is to introduce a metadata key called `extra_metadata_keys` which allows you to specifically list the extra keys that you want to enable. Something like this: ``` { ""title"": ""My title"", ""source"": ""Source"", ""source_url"": ""https://www.example.com/"", ""release_date"": ""2018-04-01"", ""extra_metadata_keys"": [""release_date""] } ``` ",17608, 412356537,"Example table: https://latest-code.datasette.io/code/definitions Here's a query that does facet counting against that column: https://latest-code.datasette.io/code-a26fa3c?sql=select+count%28*%29+as+n%2C+j.value+from+definitions+join+json_each%28params%29+j+group+by+j.value+order+by+n+desc%3B ``` select count(*) as n, j.value from definitions join json_each(params) j group by j.value order by n desc; ```",17608, 412356746,"And here's the query for pulling back every record tagged with a specific tag: https://latest-code.datasette.io/code-a26fa3c?sql=select+*+from+definitions+where+rowid+in+%28%0D%0A++select+definitions.rowid%0D%0A++from+definitions+join+json_each%28params%29+j%0D%0A++where+j.value+%3D+%3Atag%0D%0A%29&tag=filename ``` select * from definitions where rowid in ( select definitions.rowid from definitions join json_each(params) j where j.value = :tag ) ```",17608, 412357691,"Note that there doesn't seem to be a way to use indexes (even [indexes on expressions](https://www.sqlite.org/expridx.html)) to speed these up, so this will only ever be effective on smaller data sets, probably in the 10,000-100,000 range. 
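The `json_each()` facet queries above can be tried directly from Python's built-in `sqlite3` module, assuming your SQLite has the JSON1 functions compiled in. The `definitions` table contents below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table definitions (name text, params text)")
conn.executemany(
    "insert into definitions values (?, ?)",
    [
        ("open", '["filename", "mode"]'),
        ("read", '["filename"]'),
        ("write", '["filename", "data"]'),
    ],
)

# Facet counts across the JSON array column, as in the query above
facets = conn.execute(
    """
    select count(*) as n, j.value
    from definitions join json_each(params) j
    group by j.value order by n desc
    """
).fetchall()
print(facets)  # (3, 'filename') first, then the single-use tags

# Every row tagged with a specific value
rows = conn.execute(
    """
    select * from definitions where rowid in (
        select definitions.rowid
        from definitions join json_each(params) j
        where j.value = :tag
    )
    """,
    {"tag": "mode"},
).fetchall()
print(rows)  # [('open', '["filename", "mode"]')]
```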
Datasette is often used with smaller data sets so this is still worth pursuing.",17608, 412663658,That seems good to me.,17608, 413386332,Relevant: https://github.com/coleifer/pysqlite3/issues/2,17608, 413387424,"I deployed a working demo of this here: https://pysqlite3-datasette.now.sh I used this command to deploy it: datasette publish now \ fixtures.db fivethirtyeight.db \ --branch=pysqlite3 \ --install=https://github.com/karlb/pysqlite3/archive/master.zip \ -n pysqlite3-datasette https://pysqlite3-datasette.now.sh/-/versions confirms version of SQLite is `3.25.0`",17608, 413396812,"Now that this has merged into master the command for deploying it can use `--branch=master` instead: datasette publish now \ fixtures.db fivethirtyeight.db \ --branch=master \ --install=https://github.com/karlb/pysqlite3/archive/master.zip \ -n pysqlite3-datasette ",17608, 414860009,"Looks to me like hashing, redirects and caching were documented as part of https://github.com/simonw/datasette/commit/788a542d3c739da5207db7d1fb91789603cdd336#diff-3021b0e065dce289c34c3b49b3952a07 - so perhaps this can be closed? :tada:",17608, 416659043,Closed in https://github.com/simonw/datasette/commit/0bd41d4cb0a42d7d2baf8b49675418d1482ae39b,17608, 416667565,https://b7257a2.datasette.io/-/plugins is now correctly returning an empty list.,17608, 416727898,"Are you talking about these filters here? ![2018-08-28 at 9 22 pm](https://user-images.githubusercontent.com/9599/44748784-8688cb00-ab08-11e8-8baf-ace2e04e181f.png) I haven't thought much about how those could be made more usable - right now they basically expose all available options, but customizing them for particular use-cases is certainly an interesting potential space. Could you sketch out a bit more about how your ideal interface here would work?",17608, 417684877,"It looks like the check passed, not sure why it's showing as running in GH.",17608, 418106781,Now that I've split the heroku command out into a separate default plugin this is a much easier thing to add: https://github.com/simonw/datasette/blob/master/datasette/publish/heroku.py,17608, 418695115,"Some notes: * Starlette just got a bump to 0.3.0 - there's some renamings in there. It's got enough functionality now that you can treat it either as a framework or as a toolkit. Either way the component design is all just *here's an ASGI app* all the way through. * Uvicorn got a bump to 0.3.3 - Removed some cyclical references that were causing garbage collection to impact performance. Ought to be a decent speed bump. * Wrt. passing config - Either use a single envvar that points to a config, or use multiple envvars for the config. Uvicorn could get a flag to read a `.env` file, but I don't see ASGI itself having a specific interface there.",17608, 420295524,I close this PR because it's better to use the new one #364 ,17608, 422821483,"I'm using the docker image (0.23.2) and notice some differences/bugs between the docs and the published version with canned queries. (submitted a tiny doc fix also) I was able to build the docker container locally using `master` and I'm using that for now. Would it be possible to manually push 0.24 to DockerHub until the TravisCI stuff is fixed? I would like to run this in our Kubernetes cluster but don't want to publish a version in our internal registry if I don't have to. Thanks!",17608, 422885014,Thanks!,17608, 422903031,"The new 0.25 release has been successfully pushed to Docker Hub! 
https://hub.docker.com/r/datasetteproject/datasette/tags/ One catch: it looks like it didn't update the ""latest"" tag to point at it. Looking into that now.",17608, 422908130,"I fixed that by running the following on my laptop: $ docker pull datasetteproject/datasette:0.25 $ docker tag datasetteproject/datasette:0.25 datasetteproject/datasette:latest $ docker push datasetteproject/datasette The `latest` tag now points to the most recent release.",17608, 422915450,"That works for me. Was able to pull the public image and no errors on my canned query. (~although a small rendering bug. I'll create an issue and if I have time today, a PR to fix~ this turned out to be my error.) Thanks for the quick response!",17608, 423543060,"I keep on finding new reasons that I want this. The latest is that I'm playing with the more advanced features of FTS5 - in particular the highlight() function and the ability to sort by rank. The problem is... in order to do this, I need to properly join against the `_fts` table. Here's an example query: select highlight(events_fts, 0, '', ''), events_fts.rank, events.* from events join events_fts on events.rowid = events_fts.rowid where events_fts match :search order by rank Note that this is a different query from the usual FTS one (which does `where rowid in (select rowid from events_fts...)`) because I need the rank column somewhere I can sort against. I'd like to be able to use this on the table view page so I can get faceting etc for free, but this is a completely different query from the default. Maybe I need a way to customize the entire query? That feels weird though - why am I not using a view in that case? Answer: because views can't accept `:search` style parameters. I could use a canned query, but canned queries don't get faceting etc.",17608, 427261369,"``` ~ $ docker pull datasetteproject/datasette ~ $ docker run -p 8001:8001 -v `pwd`:/mnt datasetteproject/datasette datasette -p 8001 -h 0.0.0.0 /mnt/fixtures.db Usage: datasette -p [OPTIONS] [FILES]... Error: Invalid value for ""files"": Path ""/mnt/fixtures.db"" does not exist. ```",17608, 427943710,"I have same error: ``` Collecting uvloop Using cached https://files.pythonhosted.org/packages/5c/37/6daa39aac42b2deda6ee77f408bec0419b600e27b89b374b0d440af32b10/uvloop-0.11.2.tar.gz Complete output from command python setup.py egg_info: Traceback (most recent call last): File """", line 1, in File ""C:\Users\sageev\AppData\Local\Temp\pip-install-bq64l8jy\uvloop\setup.py"", line 15, in raise RuntimeError('uvloop does not support Windows at the moment') RuntimeError: uvloop does not support Windows at the moment ```",17608, 429737929,"Very hacky solution is to write now.json file forcing the usage of v1 of Zeit cloud, see https://github.com/slygent/datasette/commit/3ab824793ec6534b6dd87078aa46b11c4fa78ea3 This does work, at least.",17608, 431867885,"I'd like this as well. It would let me access Datasette-driven projects from GatsbyJS the same way I can access Postgres DBs via Hasura. While I don't see SQLite replacing Postgres for the 50m row datasets I sometimes have to work with, there's a whole class of smaller datasets that are great with Datasette but currently would find another option.",17608, 433680598,I've just started running into this as well. 
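The FTS5 query described above - joining the `_fts` table so that `rank` is available for sorting and `highlight()` for snippets - can be reproduced with just the standard library, assuming your SQLite build includes FTS5. The `events` table and sample rows here are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table events (id integer primary key, description text);
    insert into events (description) values
        ('Launch party for the new datasette release'),
        ('Quarterly planning meeting'),
        ('Datasette community office hours');
    create virtual table events_fts using fts5(
        description, content='events', content_rowid='id'
    );
    -- External content tables must be populated (or kept in sync) manually
    insert into events_fts (rowid, description)
        select id, description from events;
    """
)

rows = conn.execute(
    """
    select highlight(events_fts, 0, '[', ']'), events_fts.rank, events.*
    from events join events_fts on events.id = events_fts.rowid
    where events_fts match :search
    order by rank
    """,
    {"search": "datasette"},
).fetchall()
for row in rows:
    print(row)
```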
Looks like I'll have to anchor to v1 for the moment - I'm hoping the discussion on https://github.com/zeit/now-cli/issues/1523 encourages an increase in this limit policy :/,17608, 435767775,"This would be fantastic - that tutorial looks like many of the details needed for this. Do you know if Digital Ocean have the ability to provision URLs for a droplet without you needing to buy your own domain name? Heroku have https://example.herokuapp.com/ and Zeit have https://blah.now.sh/ - does Digital Ocean have an equivalent? ",17608, 435767827,"This is a good idea. Basically a version of this bug but on the custom SQL query page: ![2018-11-04 at 10 28 pm](https://user-images.githubusercontent.com/9599/47981499-fd9a8c80-e080-11e8-9c59-00e626d3aa4c.png) ",17608, 435768450,"That would be ideal, but you know better than me whether the CSV streaming trick works for custom SQL queries.",17608, 435772031,"This works now! The `0.25.1` release was the first release which successfully pushed to Docker Hub: https://hub.docker.com/r/datasetteproject/datasette/tags/ ![2018-11-04 at 10 53 pm](https://user-images.githubusercontent.com/9599/47982395-70593700-e084-11e8-8870-9100677c2bde.png) Here's the log from the successful Travis release job: https://travis-ci.org/simonw/datasette/jobs/450714602 ",17608, 435862009,I think you need to register a domain name you own separately in order to get a non-IP address address? https://www.digitalocean.com/docs/networking/dns/,17608, 435974786,"I've been thinking a bit about ways of using Jupyter Notebook more effectively with Datasette (thinks like a `publish_dataframes(df1, df2, df3)` function which publishes some Pandas dataframes and returns you a URL to a new hosted Datasette instance) but you're right, Jupyter Lab is potentially a much more interesting fit.",17608, 435976262,"I think there is a useful way forward here though: the image size may be limited to 100MB, but once the instance launches it gets access to a filesystem with a lot more space than that (possibly as much as 15GB given my initial poking around). So... one potential solution here is to teach Datasette to launch from a smaller image and then download a larger SQLite file from a known URL as part of its initial startup. Combined with the ability to get Now to always run at least one copy of an instance this could allow Datasette to host much larger SQLite databases on that platform while playing nicely with the Zeit v2 platform. See also https://github.com/zeit/now-cli/issues/1523",17608, 436037692,"In terms of integration with `pandas`, I was pondering two different ways `datasette`/`csvs_to_sqlite` integration may work: - like [`pandasql`](https://github.com/yhat/pandasql), to provide a SQL query layer either by a direct connection to the sqlite db or via `datasette` API; - as an improvement of `pandas.to_sql()`, which is a bit ropey (e.g. `pandas.to_sql_from_csvs()`, routing the dataframe to sqlite via `csvs_tosqlite` rather than the dodgy mapping that `pandas` supports). The `pandas.publish_*` idea could be quite interesting though... Would it be useful/fruitful to think about `publish_` as a complement to [`pandas.to_`](https://pandas.pydata.org/pandas-docs/stable/api.html#id12)?",17608, 436042445,"Another route would be something like creating a `datasette` IPython magic for notebooks to take a dataframe and easily render it as a `datasette`. You'd need to run the app in the background rather than block execution in the notebook. 
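A rough sketch of the `publish_dataframes()` idea mentioned above, combined with the suggestion to run the server in the background so the notebook is not blocked. The function is hypothetical (it exists in neither Datasette nor sqlite-utils), and the file name and port are arbitrary:

```python
import subprocess

import pandas as pd
import sqlite_utils


def publish_dataframes(path, port=8010, **dataframes):
    """Hypothetical helper: write each DataFrame to a table in a SQLite
    file, then serve the file with Datasette in a background process so
    the notebook cell returns immediately."""
    db = sqlite_utils.Database(path)
    for name, df in dataframes.items():
        db[name].insert_all(df.to_dict(orient="records"))
    subprocess.Popen(["datasette", path, "-p", str(port)])
    return "http://localhost:{}/".format(port)


url = publish_dataframes(
    "notebook.db",
    sales=pd.DataFrame({"region": ["north", "south"], "total": [120, 340]}),
)
print(url)
```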
Related to that, or to publishing a dataframe in notebook cell for use in other cells in a non-blocking way, there may be cribs in something like https://github.com/micahscopes/nbmultitask .",17608, 439194286,I'm diving back into https://salaries.news.baltimoresun.com and what I really want is the ability to inject the request into my context.,17608, 439421164,This would be an awesome feature ❤️ ,17608, 439762759,"It turned out Zeit didn't end up shipping the new 100MB-limit Docker-based Zeit 2.0 after all - they ended up going in a completely different direction, towards lambdas instead (which don't really fit the Datasette model): https://zeit.co/blog/now-2 But... as far as I can tell they have introduced the 100MB image size for all free Zeit accounts ever against their 1.0 platform. So we still need to solve this, or free Zeit users won't be able to use `datasette publish now` even while 1.0 is still available. I made some notes on this here: https://simonwillison.net/2018/Nov/19/smaller-python-docker-images/ I've got it working for the Datasette Publish webapp, but I still need to fix `datasette publish now` to create much smaller patterns. I know how to do this for regular datasette, but I haven't yet figured out an Alpine Linux pattern for spatialite extras: https://github.com/simonw/datasette/blob/5e3a432a0caa23837fa58134f69e2f82e4f632a6/datasette/utils.py#L287-L300",17608, 439763196,This looks like it might be a recipe for spatialite Python on Alpine Linux: https://github.com/bentrm/geopython/blob/8e52062d9545f4b7c1f04a3516354a5a9155e31f/Dockerfile,17608, 439763268,Another example that might be useful: https://github.com/poc-flask/alpine/blob/8e9f48a2351e106347dab36d08cf21dee865993e/Dockerfile,17608, 440128762,"The problem is Sanic. Here's the error I'm getting: ``` (venv) datasette $ pytest -x ============================================================= test session starts ============================================================== platform darwin -- Python 3.7.1, pytest-4.0.0, py-1.7.0, pluggy-0.8.0 rootdir: /Users/simonw/Dropbox/Development/datasette, inifile: collected 258 items tests/test_api.py ...................F =================================================================== FAILURES =================================================================== _______________________________________________________ test_table_with_slashes_in_name ________________________________________________________ app_client = def test_table_with_slashes_in_name(app_client): response = app_client.get('/fixtures/table%2Fwith%2Fslashes.csv?_shape=objects&_format=json') > assert response.status == 200 E AssertionError: assert 404 == 200 ``` That's because something about how Sanic handles escape characters in URLs changed between 0.7.0 and 0.8.3.",17608, 447677798,Thanks for spotting this!,17608, 448437245,"Closing this as Zeit went on a different direction with Now v2, so the 100MB limit is no longer a concern.",17608, 450943172,"Definitely a bug, thanks.",17608, 450943632,"This is the code which is meant to add those options as hidden form fields: https://github.com/simonw/datasette/blob/fe5b6ea95a973534fe8a44907c0ea2449aae7602/datasette/templates/table.html#L150-L155 It's clearly not working. Need to fix this and add a corresponding unit test.",17608, 450944166,"Here's the test that needs updating: https://github.com/simonw/datasette/blob/8b8ae55e7c8b9e1dceef53f55a330b596ca44d41/tests/test_html.py#L427-L435",17608, 450964512,"Thanks, I've fixed this. 
I had to re-alias it against now: ``` ~ $ now alias google-trends-pnwhfwvgqf.now.sh https://google-trends.datasettes.com/ > Assigning alias google-trends.datasettes.com to deployment google-trends-pnwhfwvgqf.now.sh > Certificate for google-trends.datasettes.com (cert_uXaADIuNooHS3tZ) created [18s] > Success! google-trends.datasettes.com now points to google-trends-pnwhfwvgqf.now.sh [20s] ```",17608, 451046123,The fix was released as part of Datasette 0.26 - you can see the fix working here: https://v0-26.datasette.io/fixtures-dd88475/facetable?_facet=planet_int&planet_int=1#export,17608, 451047426,https://fivethirtyeight.datasettes.com/-/versions is now running 0.26 - so your initial bug demo is now fixed: https://fivethirtyeight.datasettes.com/fivethirtyeight-c300360/classic-rock%2Fclassic-rock-song-list?Release+Year__exact=1989#export,17608, 451415063,Awesome - will get myself up and running on 0.26,17608, 451704724,"I found a really nice pattern for writing the unit tests for this (though it would look even nicer with a solution to #395) ```python @pytest.mark.parametrize(""prefix"", [""/prefix/"", ""https://example.com/""]) @pytest.mark.parametrize(""path"", [ ""/"", ""/fixtures"", ""/fixtures/compound_three_primary_keys"", ""/fixtures/compound_three_primary_keys/a,a,a"", ""/fixtures/paginated_view"", ]) def test_url_prefix_config(prefix, path): for client in make_app_client(config={ ""url_prefix"": prefix, }): response = client.get(path) soup = Soup(response.body, ""html.parser"") for a in soup.findAll(""a""): href = a[""href""] if href not in { ""https://github.com/simonw/datasette"", ""https://github.com/simonw/datasette/blob/master/LICENSE"", ""https://github.com/simonw/datasette/blob/master/tests/fixtures.py"", }: assert href.startswith(prefix), (href, a.parent) ```",17608, 453251589,"What version of SQLite are you seeing in Datasette? You can tell by hitting http://localhost:8001/-/versions - e.g. here: https://latest.datasette.io/-/versions My best guess is that your Python SQLite module is running an older version that doesn't support window functions. One way you can fix that is with the `pysqlite3` module - try running this in your virtual environment: pip install git+git://github.com/karlb/pysqlite3 That's using a fork of the official module that embeds a full recent SQLite. See this issue thread for more details: https://github.com/coleifer/pysqlite3/issues/2",17608, 453252024,"Oh I just saw you're using the official Datasette docker package - yeah, that's not bundled with a recent SQLite at the moment. 
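Alongside checking `/-/versions`, you can ask the Python process directly which SQLite it is linked against and whether window functions (added in SQLite 3.25.0) are available - a small self-contained check:

```python
import sqlite3

print("SQLite version:", sqlite3.sqlite_version)

conn = sqlite3.connect(":memory:")
try:
    # Any window function will do; this fails with a syntax error on < 3.25
    conn.execute("select count(*) over () from (select 1)")
    print("Window functions are supported")
except sqlite3.OperationalError as exc:
    print("No window function support:", exc)
```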
We should update that: https://github.com/simonw/datasette/blob/5b026115126bedbb66457767e169139146d1c9fd/Dockerfile#L9-L11",17608, 453262703,It turns out this was much easier to support than I expected: https://github.com/simonw/datasette/commit/eac08f0dfc61a99e8887442fc247656d419c76f8,17608, 453324601,Demo: https://latest.datasette.io/-/versions,17608, 453330680,"If you pull [the latest image](https://hub.docker.com/r/datasetteproject/datasette) you should get the right SQLite version now: docker pull datasetteproject/datasette docker run -p 8001:8001 \ datasetteproject/datasette \ datasette -p 8001 -h 0.0.0.0 http://0.0.0.0:8001/-/versions now gives me: ``` ""version"": ""3.26.0"" ```",17608, 453795040,I'm really excited about this - it looks like it could be a great plugin.,17608, 453874429,"It looks like there are two reasons for this: - The `.git` directory was listed in `.dockerignore` so it wasn't being copied into the build process - The docker build stage wasn't installing the `git` executable, so it couldn't read the current version ",17608, 453876023,"``` docker pull datasetteproject/datasette docker run -p 8001:8001 datasetteproject/datasette datasette -p 8001 -h 0.0.0.0 ``` http://0.0.0.0:8001/-/versions now returns: ``` { ""datasette"": { ""version"": ""0.26.2+0.ga418c8b.dirty"" }, ``` I'm not sure why it's showing `.dirty` there. ",17608, 455223551,"It's new in SQLite 3.26.0 so I will need to figure out how to only apply it in that version or higher. https://sqlite.org/releaselog/3_26_0.html",17608, 455224327,https://sqlite.org/security.html has other recommmendations for apps that accept SQLite files from untrusted sources that we should apply.,17608, 455230501,"Datasette-cluster-map doesn't use the new plugin configuration mechanism yet - it really should! The best example of how to use this mechanism right now is embedded in the Datasette unit tests: https://github.com/simonw/datasette/blob/b7257a21bf3dfa7353980f343c83a616da44daa7/tests/fixtures.py#L266-L270 https://github.com/simonw/datasette/blob/b7257a21bf3dfa7353980f343c83a616da44daa7/tests/test_plugins.py#L139-L145",17608, 455231411,Unfortunately it looks like there isn't currently a mechanism in the Python sqlite3 library for setting configuration flags like SQLITE_DBCONFIG_DEFENSIVE,17608, 455445069,I've released a new version of the datasette-cluster-map plugin to illustrate how plugin configuration can work: https://github.com/simonw/datasette-cluster-map/commit/fcc86c450e3df3e6b81c41f31df458923181527a,17608, 455445392,"I talk about that a bit here: https://simonwillison.net/2018/Oct/4/datasette-ideas/#Bundling_the_data_with_the_code One of the key ideas behind Datasette is that if your data is read-only you can package it up with the rest of your code - so the normal limitations that apply with hosting services like now.sh no longer prevent you from including a database. The SQLite database is just another static binary file that gets packaged up as part of your deployment.",17608, 455520561,"Thanks. I'll take a look at your changes. I must admit I was struggling to see how to pass info from the python code in __init__.py into the javascript document.addEventListener function.",17608, 455752238,Ah. That makes much more sense. 
Interesting approach.,17608, 457975075,Implemented in https://github.com/simonw/datasette/commit/b5dd83981a7dbff571284d4d90a950c740245b05,17608, 457975857,"Demo: https://latest.datasette.io/fixtures-dd88475/facetable.json?_shape=array&_nl=on Also https://b5dd839.datasette.io/fixtures-dd88475/facetable.json?_shape=array&_nl=on",17608, 457976864,"This failed in Python 3.5: ``` File ""/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/jinja2/environment.py"", line 1020, in render_async raise NotImplementedError('This feature is not available for this ' NotImplementedError: This feature is not available for this version of Python ``` It looks like this is caused by this feature detection code: https://github.com/pallets/jinja/blob/a7ba0b637805c53d442e975e3864d3ea38d8743f/jinja2/utils.py#L633-L638",17608, 457978729,Will need to solve #7 for this to become truly efficient.,17608, 457980966,"Remember to remove this TODO (and turn the `[]` into `()` on this line) as part of this task: https://github.com/simonw/sqlite-utils/blob/5309c5c7755818323a0f5353bad0de98ecc866be/sqlite_utils/cli.py#L78-L80",17608, 458011885,Re-opening for the second bit involving the cli tool.,17608, 458011906,"I tested this with a script called `churn_em_out.py` ``` i = 0 while True: i += 1 print( '{""id"": I, ""another"": ""row"", ""number"": J}'.replace(""I"", str(i)).replace( ""J"", str(i + 1) ) ) ``` Then I ran this: ``` python churn_em_out.py | \ sqlite-utils insert /tmp/getbig.db stats - \ --nl --batch-size=10000 ``` And used `watch 'ls -lah /tmp/getbig.db'` to watch the file growing as it had 10,000 lines of junk committed in batches. The memory used by the process never grew about around 50MB.",17608, 459915995,"Do you have any simple working examples of how to use `--static`? Inspection of default served files suggests locations such as `http://example.com/-/static/app.css?0e06ee`. If `datasette` is being proxied to `http://example.com/foo/datasette`, what form should arguments to `--static` take so that static files are correctly referenced? Use case is here: https://github.com/psychemedia/jupyterserverproxy-datasette-demo Trying to do a really simple `datasette` demo in MyBinder using jupyter-server-proxy.",17608, 460897973,This helped my figure out what to do: https://github.com/heroku/heroku-builds/issues/36,17608, 460901857,"I'd really like to use the content-length header here, but Sanic hasn't yet fixed the bug I filed about it: https://github.com/huge-success/sanic/issues/1194",17608, 460902824,"Demo: https://latest.datasette.io/fixtures-dd88475 ",17608, 463917744,is this supported or not? you can comment if it is not supported so that people like me can stop trying.,17608, 464341721,We also get an error if a column name contains a `.`,17608, 466325528,"I ran into the same issue when trying to install datasette on windows after successfully using it on linux. Unfortunately, there has not been any progress in implementing uvloop for windows - so I recommend not to use it on win. 
You can read about this issue here: [https://github.com/MagicStack/uvloop/issues/14](url)",17608, 466695500,"Fixed in https://github.com/simonw/sqlite-utils/commit/228d595f7d10994f34e948888093c2cd290267c4 ",17608, 466695672,"Rough sketch: ``` +try: + import numpy +except ImportError: + numpy = None + Column = namedtuple( ""Column"", (""cid"", ""name"", ""type"", ""notnull"", ""default_value"", ""is_pk"") ) @@ -70,6 +79,22 @@ class Database: datetime.time: ""TEXT"", None.__class__: ""TEXT"", } + # If numpy is available, add more types + if numpy: + col_type_mapping.update({ + numpy.int8: ""INTEGER"", + numpy.int16: ""INTEGER"", + numpy.int32: ""INTEGER"", + numpy.int64: ""INTEGER"", + numpy.uint8: ""INTEGER"", + numpy.uint16: ""INTEGER"", + numpy.uint32: ""INTEGER"", + numpy.uint64: ""INTEGER"", + numpy.float16: ""FLOAT"", + numpy.float32: ""FLOAT"", + numpy.float64: ""FLOAT"", + numpy.float128: ""FLOAT"", + }) ```",17608, 466695695,Need to test this both with and without `numpy` installed.,17608, 466732039,"Example: http://api.nobelprize.org/v1/laureate.json This includes affiliations which look like this: ""affiliations"": [ { ""name"": ""Sorbonne University"", ""city"": ""Paris"", ""country"": ""France"" } ]",17608, 466794069,"This was fixed by https://github.com/simonw/sqlite-utils/commit/228d595f7d10994f34e948888093c2cd290267c4 - see also #8 ``` >>> db = sqlite_utils.Database("":memory:"") >>> dfX=pd.DataFrame({'order':range(3),'col2':range(3)}) >>> db[""test""].upsert_all(dfX.to_dict(orient='records')) ```",17608, 466794369,"https://www.sqlite.org/lang_createindex.html ![image](https://user-images.githubusercontent.com/9599/53302378-72512c80-3812-11e9-8828-46a03d893879.png) May as well support ``--if-not-exists`` as well.",17608, 466800090,"The `WHERE` clause can be used to create partial indexes: https://www.sqlite.org/partialindex.html I'm going to ignore it for the moment.",17608, 466800210,Likewise I'm going to ignore indexes on expressions (as opposed to just columns): https://www.sqlite.org/expridx.html,17608, 466807308,"Python API: db[""articles""].add_foreign_key(""author_id"", ""authors"", ""id"") CLI: $ sqlite-utils add-foreign-key articles author_id authors id ",17608, 466820167,"It looks like the type information isn't actually used for anything at all, so this: https://github.com/simonw/sqlite-utils/blob/f8d3b7cfe5c1950b0749d40eb2640df50b52f651/tests/test_create.py#L97-L103 Could actually be written like this: ``` fresh_db[""m2m""].insert( {""one_id"": 1, ""two_id"": 1}, foreign_keys=( (""one_id"", ""one"", ""id""), (""two_id"", ""two"", ""id""), ), ) ``` ",17608, 466820188,Sanity checking those foreign keys would be worthwhile.,17608, 466821200,"This involves a breaking API change. 
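For reference, the new `(column, other_table, other_column)` form of `foreign_keys=` that this breaking change introduces, together with the `add_foreign_key()` method shown a few comments up - a minimal sketch using invented table names:

```python
import sqlite_utils

db = sqlite_utils.Database("books.db")  # creates the file if it is missing
db["authors"].insert({"id": 1, "name": "Octavia"}, pk="id")

# New three-tuple form: (column, other_table, other_column)
db["articles"].insert(
    {"id": 1, "title": "Parable", "author_id": 1},
    pk="id",
    foreign_keys=[("author_id", "authors", "id")],
)

# Alternatively, add the constraint to an existing table afterwards:
# db["articles"].add_foreign_key("author_id", "authors", "id")

print(db["articles"].schema)
```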
I need to call that out in the README and also fix my two other projects which use the old four-tuple version of `foreign_keys=`: https://github.com/simonw/db-to-sqlite/blob/c2f8e93bc6bbdfd135de3656ea0f497859ae49ff/db_to_sqlite/cli.py#L30-L42 And https://github.com/simonw/russian-ira-facebook-ads-datasette/blob/e7106710abdd7bdcae035bedd8bdaba75ae56a12/fetch_and_build_russian_ads.py#L71-L74 I'll also need to set a minimum version for `sqlite-utils` in the `db-to-sqlite` setup.py: https://github.com/simonw/db-to-sqlite/blob/c2f8e93bc6bbdfd135de3656ea0f497859ae49ff/setup.py#L25",17608, 466823422,Re-opening this until I've fixed the other two projects.,17608, 466827533,Need to put out a new release of `sqlite-utils` so `db-to-sqlite` can depend on it.,17608, 466828503,Released: https://sqlite-utils.readthedocs.io/en/latest/changelog.html#v0-14,17608, 466830869,Both projects have been upgraded.,17608, 467264937,I'm working on a port of Datasette to Starlette which I think would fix this issue: https://github.com/encode/starlette,17608, 472844001,It seems this affects the Datasette Publish -site as well: https://github.com/simonw/datasette-publish-support/issues/3,17608, 472875713,also linking this zeit issue in case it is helpful: https://github.com/zeit/now-examples/issues/163#issuecomment-440125769,17608, 473154643,"Deployed a demo: https://datasette-optional-hash-demo.now.sh/ datasette publish now \ ../demo-databses/russian-ads.db \ ../demo-databses/polar-bears.db \ --branch=optional-hash \ -n datasette-optional-hash \ --alias datasette-optional-hash-demo \ --install=datasette-cluster-map \ --install=datasette-json-html ",17608, 473156513,"Still TODO: need to figure out what to do about cache TTL. Defaulting to 365 days no longer makes sense without the hash_urls setting. Maybe drop that setting default to 0? Here's the setting: https://github.com/simonw/datasette/blob/9743e1d91b5f0a2b3c1c0bd6ffce8739341f43c4/datasette/app.py#L84-L86 And here's where it takes affect: https://github.com/simonw/datasette/blob/4462a5ab2817ac0d9ffe20dafbbf27c5c5b81466/datasette/views/base.py#L491-L501",17608, 473156774,"This has been bothering me as well, especially when I try to install `datasette` and `sqlite-utils` at the same time.",17608, 473156905,"Have you tried this? MakePoint(:Long || "", "" || :Lat) ",17608, 473157770,"Interesting idea. I can see how this would make sense if you are dealing with really long SQL queries. 
My own example of a long query that might benefit from this: https://russian-ads-demo.herokuapp.com/russian-ads-a42c4e8?sql=select%0D%0A++++target_id%2C%0D%0A++++targets.name%2C%0D%0A++++count(*)+as+n%2C%0D%0A++++json_object(%0D%0A++++++++%22href%22%2C+%22%2Frussian-ads%2Ffaceted-targets%3Ftargets%3D%22+||+%0D%0A++++++++++++json_insert(%3Atargets%2C+%27%24[%27+||+json_array_length(%3Atargets)+||+%27]%27%2C+target_id)%0D%0A++++++++%2C%0D%0A++++++++%22label%22%2C+json_insert(%3Atargets%2C+%27%24[%27+||+json_array_length(%3Atargets)+||+%27]%27%2C+target_id)%0D%0A++++)+as+apply_this_facet%2C%0D%0A++++json_object(%0D%0A++++++++%22href%22%2C+%22%2Frussian-ads%2Fdisplay_ads%3F_targets_json%3D%22+||+%0D%0A++++++++++++json_insert(%3Atargets%2C+%27%24[%27+||+json_array_length(%3Atargets)+||+%27]%27%2C+target_id)%0D%0A++++++++%2C%0D%0A++++++++%22label%22%2C+%22See+%22+||+count(*)+||+%22+ads+matching+%22+||+json_insert(%3Atargets%2C+%27%24[%27+||+json_array_length(%3Atargets)+||+%27]%27%2C+target_id)%0D%0A++++)+as+browse_these_ads%0D%0Afrom+ad_targets%0D%0Ajoin+targets+on+ad_targets.target_id+%3D+targets.id%0D%0Awhere%0D%0A++++json_array_length(%3Atargets)+%3D%3D+0+or%0D%0A++++ad_id+in+(%0D%0A++++++++select+ad_id%0D%0A++++++++from+%22ad_targets%22%0D%0A++++++++where+%22ad_targets%22.target_id+in+(select+value+from+json_each(%3Atargets))%0D%0A++++++++group+by+%22ad_targets%22.ad_id%0D%0A++++++++having+count(distinct+%22ad_targets%22.target_id)+%3D+json_array_length(%3Atargets)%0D%0A++++)%0D%0A++++and+target_id+not+in+(select+value+from+json_each(%3Atargets))%0D%0Agroup+by%0D%0A++++target_id+order+by+n+desc%0D%0A&targets=[%22e6200%22] Having a `show/hide` link would be an easy way to support this in the UI, and those could add/remove a `_hide_sql=1` parameter.",17608, 473158506,"I've been thinking about how Datasette instances could query each other for a while - it's a really interesting direction. There are some tricky problems to solve to get this to work. There's a SQLite mechanism called ""virtual table functions"" which can implement things like this, but it's not supported by Python's `sqlite3` module out of the box. https://github.com/coleifer/sqlite-vtfunc is a library that enables this feature. I experimented with using that to implement a function that scrapes HTML content (with an eye to accessing data from other APIs and Datasette instances) a while ago: https://github.com/coleifer/sqlite-vtfunc/issues/6 The bigger challenge is how to get this kind of thing to behave well within a Python 3 async environment. I have some ideas here but they're going to require some very crafty engineering.",17608, 473159679,"Also: if the option is False and the user visits a URL with a hash in it, should we redirect them? I'm inclined to say no: furthermore, I'd be OK continuing to serve a far-future cache header for that case.",17608, 473160476,Thanks!,17608, 473160702,This also needs extensive tests to ensure that with the option turned on all of the redirects behave as they should.,17608, 473164038,"Demo: https://latest.datasette.io/fixtures-dd88475?sql=select+%2A+from+sortable+order+by+pk1%2C+pk2+limit+101 v.s. https://latest.datasette.io/fixtures-dd88475?sql=select+%2A+from+sortable+order+by+pk1%2C+pk2+limit+101&_hide_sql=1 ",17608, 473217334,"Awesome, thanks! 😁 ",17608, 473308631,"This would allow Datasette to be easily used as a ""data library"" (like a data warehouse but less expectation of big data querying technology such as Presto). 
One of the things I learned at the NICAR CAR 2019 conference in Newport Beach is that there is a very real need for some kind of easily accessible data library at most newsrooms.",17608, 473310026,See #418 ,17608, 473312514,"A neat ability of Datasette Library would be if it can work against other files that have been dropped into the folder. In particular: if a user drops a CSV file into the folder, how about automatically converting that CSV file to SQLite using [sqlite-utils](https://github.com/simonw/sqlite-utils)?",17608, 473313975,"I'm reopening this one as part of #417. Further experience with Python's CSV standard library module has convinced me that pandas is not a required dependency for this. My [sqlite-utils](https://github.com/simonw/sqlite-utils) package can do most of the work here with very few dependencies.",17608, 473323329,"How would Datasette accepting URLs work? I want to support not just SQLite files and CSVs but other extensible formats (geojson, Atom, shapefiles etc) as well. So `datasette serve` needs to be able to take filepaths or URLs to a variety of different content types. If it's a URL, we can use the first 200 downloaded bytes to decide which type of file it is. This is likely more reliable than hoping the web server provided the correct content-type. Also: let's have a threshold for downloading to disk. We will start downloading to a temp file (location controlled by an environment variable) if either the content length header is above that threshold OR we hit that much data cached in memory already and don't know how much more is still to come. There needs to be a command line option for saying ""grab from this URL but force treat it as CSV"" - same thing for files on disk. datasette mydb.db --type=db http://blah/blah --type=csv If you provide less `--type` options thatn you did URLs then the default behavior is used for all of the subsequent URLs. Auto detection could be tricky. Probably do this with a plugin hook. https://github.com/h2non/filetype.py is interesting but deals with images video etc so not right for this purpose. I think we need our own simple content sniffing code via a plugin hook. What if two plugin type hooks can both potentially handle a sniffed file? The CLI can quit and return an error saying content is ambiguous and you need to specify a `--type`, picking from the following list. ",17608, 473708724,"Thinking about this further: I think I may have made a mistake establishing ""immutable"" as the default mode for databases opened by Datasette. What would it look like if files were NOT opened in immutable mode by default? Maybe the command to start Datasette looks like this: datasette mutable1.db mutable2.db --immutable=this_is_immutable.db --immutable=this_is_immutable2.db So regular file arguments are treated as mutable (and opened in `?mode=ro`) while file arguments passed using the new `--immutable` option are opened in immutable mode. The `-i` shortcut has not yet been taken, so this could be abbreviated to: datasette mutable1.db mutable2.db -i this_is_immutable.db -i this_is_immutable2.db",17608, 473708941,"Some problems to solve: * Right now Datasette assumes it can always show the count of rows in a table, because this has been pre-calculated. If a database is mutable the pre-calculation trick no longer works, and for giant tables a `select count(*) from X` query can be expensive to run. Maybe we set a time limit on these? If time limit expires show ""many rows""? 
* Maintaining a content hash of the table no longer makes sense if it is changing (though interestingly there's a `.sha3sum` built-in SQLite CLI command which takes a hash of the content and stays the same even through vacuum runs). Without that we need a different mechanism for calculating table colours. It also means that we can't do the special dbname-hash URL trick (see #418) at all if the database is opened as mutable.",17608, 473709815,"In #419 I'm now proposing that Datasette default to opening files in ""mutable"" mode, in which case it would not make sense to support hash URLs for those files at all. So actually this feature will only be available for files that are explicitly opened in immutable mode.",17608, 473709883,"Could I persist the last calculated count for a table and somehow detect if that table has been changed in any way by another process, hence invalidating the cached count (and potentially scheduling a new count)? https://www.sqlite.org/c3ref/update_hook.html says that `sqlite3_update_hook()` can be used to register a handler invoked on almost all update/insert/delete operations to a specific table... except that it misses out on deletes triggered by `ON CONFLICT REPLACE` and only works for `ROWID` tables. Also this hook is not exposed in the Python `sqlite3` library - though it may be available using some terrifying `ctypes` hacks: https://stackoverflow.com/a/16920926 So on further research, I think the answer is *no*: I should assume that it won't be possible to cache counts and magically invalidate the cache when the underlying file is changed by another process. Instead I need to assume that counts will be an expensive operation. As such, I can introduce a time limit on counts and use that anywhere a count is displayed. If the time limit is exceeded by the `count(*)` query I can show ""many"" instead. That said... running `count(*)` against a table with 200,000 rows in only takes about 3ms, so even a timeout of 20ms is likely to work fine for tables of around a million rows. It would be really neat if I could generate a lower bound count in a limited amount of time. If I counted up to 4m rows before the timeout I could show ""more than 4m rows"". No idea if that would be possible though. Relevant: https://stackoverflow.com/questions/8988915/sqlite-count-slow-on-big-tables - reports of very slow counts on 6GB database file. Consensus seems to be ""yeah, that's just how SQLite is built"" - though there was a suggestion that you can use `select max(ROWID) from table` provided you are certain there have been no deletions. Also relevant: http://sqlite.1065341.n5.nabble.com/sqlite3-performance-on-select-count-very-slow-for-16-GB-file-td80176.html",17608, 473712820,"So the differences here are: * For immutable databases we calculate content hash and table counts; mutable databases we do not * Immutable databasse open with `file:{}?immutable=1`, mutable databases open with `file:{}?mode=ro` * Anywhere that shows a table count now needs to call a new method which knows to run `count(*)` with a timeout for mutable databases, read from the precalculated counts for immutable databases * The url-hash option should no longer be available at all for mutable databases * New command-line tool syntax: `datasette mutable.db` v.s. `datasette -i immutable.db`",17608,
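One way to sketch the mutable/immutable split and the "count with a time limit" idea described above, using only the standard library. The progress-handler timeout shown here is just one possible mechanism, not necessarily how Datasette will implement it, and `fixtures.db`/`facetable` are the example database and table used elsewhere in these comments:

```python
import sqlite3
import time


def open_database(path, immutable=False):
    # Immutable files skip locking and change detection entirely;
    # read-only mode still tolerates other processes writing to the file.
    if immutable:
        uri = "file:{}?immutable=1".format(path)
    else:
        uri = "file:{}?mode=ro".format(path)
    return sqlite3.connect(uri, uri=True)


def count_with_limit(conn, table, limit_ms=20):
    """Return the row count, or None ("many rows") if it takes too long."""
    deadline = time.monotonic() + limit_ms / 1000
    # A progress handler returning a truthy value aborts the running query
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 1000)
    try:
        return conn.execute(
            "select count(*) from [{}]".format(table)
        ).fetchone()[0]
    except sqlite3.OperationalError:
        return None  # interrupted: display "many rows" instead
    finally:
        conn.set_progress_handler(None, 1000)


conn = open_database("fixtures.db", immutable=True)
print(count_with_limit(conn, "facetable"))
```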