html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/datasette/issues/1160#issuecomment-753568428,https://api.github.com/repos/simonw/datasette/issues/1160,753568428,MDEyOklzc3VlQ29tbWVudDc1MzU2ODQyOA==,9599,2021-01-03T05:02:32Z,2021-01-03T05:02:32Z,OWNER,"Should this command include a `--fts` option for configuring full-text search on one-or-more columns? I thought about doing that for `sqlite-utils insert` in https://github.com/simonw/sqlite-utils/issues/202 and decided not to because of the need to include extra options covering the FTS version, porter stemming options and whether or not to create triggers. But maybe I can set sensible defaults for that with `datasette insert ... -f title -f body`? Worth thinking about a bit more.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752275611,https://api.github.com/repos/simonw/datasette/issues/1160,752275611,MDEyOklzc3VlQ29tbWVudDc1MjI3NTYxMQ==,9599,2020-12-29T23:32:04Z,2020-12-29T23:32:04Z,OWNER,"If I can get this working for CSV, TSV, JSON and JSON-NL that should be enough to exercise the API design pretty well across both streaming and non-streaming formats.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752274509,https://api.github.com/repos/simonw/datasette/issues/1160,752274509,MDEyOklzc3VlQ29tbWVudDc1MjI3NDUwOQ==,9599,2020-12-29T23:26:02Z,2020-12-29T23:26:02Z,OWNER,"The documentation for this plugin hook is going to be pretty detailed, since it involves writing custom classes.
I'll stick it all on the existing hooks page for the moment, but I should think about breaking up the plugin hook documentation into a page-per-hook in the future.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752274078,https://api.github.com/repos/simonw/datasette/issues/1160,752274078,MDEyOklzc3VlQ29tbWVudDc1MjI3NDA3OA==,9599,2020-12-29T23:23:39Z,2020-12-29T23:23:39Z,OWNER,"If I design this right I can ship a full version of the command-line `datasette insert` command in a release without doing any work at all on the Web UI version of it - that UI can then come later, without needing any changes to be made to the plugin hook.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752273873,https://api.github.com/repos/simonw/datasette/issues/1160,752273873,MDEyOklzc3VlQ29tbWVudDc1MjI3Mzg3Mw==,9599,2020-12-29T23:22:30Z,2020-12-29T23:22:30Z,OWNER,"How much of this should I get done in a branch before merging into `main`?
The challenge here is the plugin hook design: ideally I don't want an incomplete plugin hook design in `main` since that could be a blocker for a release.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752273400,https://api.github.com/repos/simonw/datasette/issues/1160,752273400,MDEyOklzc3VlQ29tbWVudDc1MjI3MzQwMA==,9599,2020-12-29T23:19:46Z,2020-12-29T23:19:46Z,OWNER,I'm going to break out some separate tickets.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752273306,https://api.github.com/repos/simonw/datasette/issues/1160,752273306,MDEyOklzc3VlQ29tbWVudDc1MjI3MzMwNg==,9599,2020-12-29T23:19:15Z,2020-12-29T23:19:15Z,OWNER,It would be nice if this abstraction could support progress bars as well. These won't necessarily work for every format - or they might work for things loaded from files but not things loaded over URLs (if the `content-length` HTTP header is missing) - but if they ARE possible it would be good to provide them - both for the CLI interface and the web insert UI.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752267905,https://api.github.com/repos/simonw/datasette/issues/1160,752267905,MDEyOklzc3VlQ29tbWVudDc1MjI2NzkwNQ==,9599,2020-12-29T22:52:09Z,2020-12-29T22:52:09Z,OWNER,"What's the simplest thing that could possibly work?
I think it's `datasette insert blah.db data.csv` - no URL handling, no other formats.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752266076,https://api.github.com/repos/simonw/datasette/issues/1160,752266076,MDEyOklzc3VlQ29tbWVudDc1MjI2NjA3Ng==,9599,2020-12-29T22:42:23Z,2020-12-29T22:42:59Z,OWNER,"Aside: maybe `datasette insert` works against simple files, but a later mechanism called `datasette import` allows plugins to register sub-commands, like `datasette import github ...` or `datasette import jira ...` or whatever. This would be useful for import mechanisms that are likely to need their own custom set of command-line options unique to that source.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752265600,https://api.github.com/repos/simonw/datasette/issues/1160,752265600,MDEyOklzc3VlQ29tbWVudDc1MjI2NTYwMA==,9599,2020-12-29T22:39:56Z,2020-12-29T22:39:56Z,OWNER,"Does it definitely make sense to break this operation up into the code that turns the incoming format into an iterator of dictionaries, then the code that inserts those into the database using `sqlite-utils`? That seems right for simple imports, where the incoming file represents a sequence of records in a single table. But what about more complex formats?
What if a format needs to be represented as multiple tables?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752259345,https://api.github.com/repos/simonw/datasette/issues/1160,752259345,MDEyOklzc3VlQ29tbWVudDc1MjI1OTM0NQ==,9599,2020-12-29T22:11:54Z,2020-12-29T22:11:54Z,OWNER,"Important detail from https://docs.python.org/3/library/csv.html#csv.reader

> If *csvfile* is a file object, it should be opened with `newline=''`. [1]
>
> [...]
>
> If `newline=''` is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use `\r\n` linendings on write an extra `\r` will be added. It should always be safe to specify `newline=''`, since the csv module does its own ([universal](https://docs.python.org/3/glossary.html#term-universal-newlines)) newline handling.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752257666,https://api.github.com/repos/simonw/datasette/issues/1160,752257666,MDEyOklzc3VlQ29tbWVudDc1MjI1NzY2Ng==,9599,2020-12-29T22:09:18Z,2020-12-29T22:09:18Z,OWNER,"### Figuring out the API design

I want to be able to support different formats, and be able to parse them into tables either streaming or in one go depending on if the format supports that. Ideally I want to be able to pull the first 1,024 bytes for the purpose of detecting the format, then replay those bytes again later. I'm considering this a stretch goal though.
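
A minimal sketch of the peek-then-replay idea, assuming a binary file-like object as input. The names `sniff_then_replay` and `guess_format` are hypothetical and not part of Datasette or sqlite-utils, and the detection rule is only a placeholder heuristic:

```python
import io
import itertools


def sniff_then_replay(stream, size=1024):
    # Hypothetical helper: read up to `size` bytes for format detection,
    # then return them with an iterator that replays the whole content,
    # starting again from those same first bytes.
    head = stream.read(size)
    rest = iter(lambda: stream.read(8192), b'')
    return head, itertools.chain([head] if head else [], rest)


def guess_format(head):
    # Placeholder heuristic (an assumption, not the real detection logic):
    # treat content that starts with { or [ as JSON, anything else as CSV.
    return 'json' if head.lstrip()[:1] in (b'{', b'[') else 'csv'


data = b'a,b\n1,2\n' * 500
head, replay = sniff_then_replay(io.BytesIO(data))
detected = guess_format(head)
restored = b''.join(replay)
```

Chaining the sniffed bytes back in front of the remaining chunks means the parser never needs the underlying stream to be seekable, which matters for data arriving over HTTP or stdin.
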
CSV is easy to parse as a stream - here’s [how sqlite-utils does it](https://github.com/simonw/sqlite-utils/blob/f1277f638f3a54a821db6e03cb980adad2f2fa35/sqlite_utils/cli.py#L630):

    dialect = ""excel-tab"" if tsv else ""excel""
    with file_progress(json_file, silent=silent) as json_file:
        reader = csv_std.reader(json_file, dialect=dialect)
        headers = next(reader)
        docs = (dict(zip(headers, row)) for row in reader)

Problem: using `db.insert_all()` could block for a long time on a big set of rows. Probably easiest to batch the records before calling `insert_all()` and then run a batch at a time using a `db.execute_write_fn()` call.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752236520,https://api.github.com/repos/simonw/datasette/issues/1160,752236520,MDEyOklzc3VlQ29tbWVudDc1MjIzNjUyMA==,9599,2020-12-29T20:48:51Z,2020-12-29T20:48:51Z,OWNER,It would be neat if `datasette insert` could accept a `--plugins-dir` option which allowed one-off format plugins to be registered.
Bit tricky to implement since the `--format` Click option will already be populated by that plugin hook call.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-751925934,https://api.github.com/repos/simonw/datasette/issues/1160,751925934,MDEyOklzc3VlQ29tbWVudDc1MTkyNTkzNA==,9599,2020-12-29T02:40:13Z,2020-12-29T20:25:57Z,OWNER,"Basic command design:

    datasette insert data.db blah.csv

The options can include:

- `--format` to specify the exact format - without this it will be guessed based on the filename
- `--table` to specify the table (otherwise the filename is used)
- `--pk` to specify one or more primary key columns
- `--replace` to specify that existing rows with a matching primary key should be replaced
- `--upsert` to specify that existing matching rows should be upserted
- `--ignore` to ignore matching rows
- `--alter` to alter the table to add missing columns
- `--type column type` to specify the type of a column - useful when working with CSV or TSV files","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752208036,https://api.github.com/repos/simonw/datasette/issues/1160,752208036,MDEyOklzc3VlQ29tbWVudDc1MjIwODAzNg==,9599,2020-12-29T19:06:35Z,2020-12-29T19:06:35Z,OWNER,"If I'm going to execute 1000s of writes in an `async def` operation it may make sense to break that up into smaller chunks, so as not to block the event loop for too long.
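
A minimal sketch of that chunking, assuming the writes can be grouped into batches and that control is handed back to the event loop between batches with `await asyncio.sleep(0)`. The `write_batch` callable is hypothetical and stands in for whatever actually performs the database write:

```python
import asyncio
import itertools


def batches(iterable, size):
    # Yield successive lists of up to `size` items from any iterable.
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, size))
        if not batch:
            return
        yield batch


async def insert_in_batches(rows, write_batch, batch_size=100):
    # `write_batch` is a hypothetical callable, not a Datasette API.
    total = 0
    for batch in batches(rows, batch_size):
        write_batch(batch)
        total += len(batch)
        # Give other tasks on the event loop a chance to run.
        await asyncio.sleep(0)
    return total


written = []
total = asyncio.run(
    insert_in_batches(({'id': i} for i in range(250)), written.append)
)
```

The batch size trades throughput against responsiveness: larger batches mean fewer yields but longer stretches where nothing else on the loop can run.
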
https://stackoverflow.com/a/36648102 and https://github.com/python/asyncio/issues/284 confirm that `await asyncio.sleep(0)` is the recommended way of doing this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-752203909,https://api.github.com/repos/simonw/datasette/issues/1160,752203909,MDEyOklzc3VlQ29tbWVudDc1MjIwMzkwOQ==,9599,2020-12-29T18:54:19Z,2020-12-29T18:54:19Z,OWNER,More thoughts on this: the key mechanism that populates the tables needs to be an `async def` method of some sort so that it can run as part of the async loop in core Datasette - for importing from web uploads.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-751947991,https://api.github.com/repos/simonw/datasette/issues/1160,751947991,MDEyOklzc3VlQ29tbWVudDc1MTk0Nzk5MQ==,9599,2020-12-29T05:06:50Z,2020-12-29T05:07:03Z,OWNER,"Given the URL option could it be possible for plugins to ""subscribe"" to URLs that keep on streaming?

    datasette insert db.db https://example.con/streaming-api \
        --format api-stream","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-751946262,https://api.github.com/repos/simonw/datasette/issues/1160,751946262,MDEyOklzc3VlQ29tbWVudDc1MTk0NjI2Mg==,9599,2020-12-29T04:56:12Z,2020-12-29T04:56:32Z,OWNER,"Potential design for this: a `datasette memory` command which takes most of the same arguments as `datasette serve` but starts an in-memory database and treats the command arguments as things that should be inserted into that in-memory database.
tail -f access.log | datasette memory - \
    --format clf -p 8002 -o","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-751945094,https://api.github.com/repos/simonw/datasette/issues/1160,751945094,MDEyOklzc3VlQ29tbWVudDc1MTk0NTA5NA==,9599,2020-12-29T04:48:11Z,2020-12-29T04:48:11Z,OWNER,"It would be pretty cool if you could launch Datasette directly against an insert-compatible file or URL without first having to load it into a SQLite database file. Or imagine being able to tail a log file and pipe that directly into a new Datasette process, which then runs a web server with the UI while simultaneously continuing to load new entries from that log into the in-memory SQLite database that it is serving... Not quite sure what that CLI interface would look like. Maybe treat that as a future stretch goal for the moment.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-751943837,https://api.github.com/repos/simonw/datasette/issues/1160,751943837,MDEyOklzc3VlQ29tbWVudDc1MTk0MzgzNw==,9599,2020-12-29T04:40:30Z,2020-12-29T04:40:30Z,OWNER,"The `insert` command should also accept URLs - anything starting with `http://` or `https://`. It should accept more than one file name at a time for bulk inserts. If using a URL, that URL will be passed to the method that decides if a plugin implementation can handle the import or not.
This will allow plugins to register themselves for specific websites.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-751926437,https://api.github.com/repos/simonw/datasette/issues/1160,751926437,MDEyOklzc3VlQ29tbWVudDc1MTkyNjQzNw==,9599,2020-12-29T02:43:21Z,2020-12-29T02:43:37Z,OWNER,"Default formats to support:

- CSV
- TSV
- JSON and newline-delimited JSON
- YAML

Each of these will be implemented as a default plugin.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-751926218,https://api.github.com/repos/simonw/datasette/issues/1160,751926218,MDEyOklzc3VlQ29tbWVudDc1MTkyNjIxOA==,9599,2020-12-29T02:41:57Z,2020-12-29T02:41:57Z,OWNER,"Other names I considered:

- `datasette load`
- `datasette import` - I decided to keep this name available for any future work that might involve plugins that help import data from APIs as opposed to inserting it from files","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,
https://github.com/simonw/datasette/issues/1160#issuecomment-751926095,https://api.github.com/repos/simonw/datasette/issues/1160,751926095,MDEyOklzc3VlQ29tbWVudDc1MTkyNjA5NQ==,9599,2020-12-29T02:41:15Z,2020-12-29T02:41:15Z,OWNER,"The UI can live at `/-/insert` and be available by default to the `root` user only. It can offer the following:

- Upload a file and have the import type detected (equivalent to `datasette insert data.db thatfile.csv`)
- Copy and paste the data to be inserted into a textarea
- API equivalents of these","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",775666296,