id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,pull_request,body,repo,type,active_lock_reason,performed_via_github_app,reactions,draft,state_reason
504720731,MDU6SXNzdWU1MDQ3MjA3MzE=,1,Add more details on how to request data from google takeout correctly.,1055831,open,0,,,0,2019-10-09T15:17:34Z,2019-10-09T15:17:34Z,,NONE,,"The default is to download everything.  This can result in an enormous amount of data when you only really need 2 types of data for now:

- My Activity
- Location History

In addition, unless you specify that ""My Activity"" is downloaded in JSON format, the default is HTML. This then causes the 

`google-takeout-to-sqlite my-activity takeout.db takeout.zip`

command to fail, as the archive then contains only HTML files, not JSON files.
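
A quick way to check an archive before importing is to count the file types under the My Activity folder. This is only a sketch: `my_activity_formats` is a hypothetical helper, not part of `google-takeout-to-sqlite`, and it assumes the activity files sit under a `My Activity/` directory inside the zip.

```python
import zipfile


def my_activity_formats(zip_file):
    # Count .json and .html entries under any 'My Activity/' folder in the archive.
    # The folder name is an assumption about how the export is laid out.
    counts = {'json': 0, 'html': 0}
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            if 'My Activity/' not in name:
                continue
            extension = name.rsplit('.', 1)[-1].lower()
            if extension in counts:
                counts[extension] += 1
    return counts
```

If the `json` count is zero, the export was most likely made with the default HTML setting.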

Thanks",206649770,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/1/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1353411865,I_kwDODEpn8M5Qq20Z,1,Problem with my user,2467,open,0,,,0,2022-08-28T16:59:37Z,2022-08-28T16:59:37Z,,NONE,,"If I call the program with:
    inaturalist-to-sqlite inaturalist.db ftricas
the program exits with an error:
 `Importing 36 observations
Traceback (most recent call last):
  File ""/home/ftricas/.pyenv/versions/3.10.6/bin/inaturalist-to-sqlite"", line 8, in <module>
    sys.exit(cli())
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py"", line 1130, in __call__
    return self.main(*args, **kwargs)
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py"", line 1055, in main
    rv = self.invoke(ctx)
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py"", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py"", line 760, in invoke
    return __callback(*args, **kwargs)
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/inaturalist_to_sqlite/cli.py"", line 51, in cli
    save_observation(observation, db)
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/inaturalist_to_sqlite/utils.py"", line 34, in save_observation
    db[""observations""]
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlite_utils/db.py"", line 2965, in insert
    return self.insert_all(
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlite_utils/db.py"", line 3068, in insert_all
    self.create(
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlite_utils/db.py"", line 1564, in create
    self.db.create_table(
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlite_utils/db.py"", line 951, in create_table
    sql = self.create_table_sql(
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlite_utils/db.py"", line 765, in create_table_sql
    foreign_keys = self.resolve_foreign_keys(name, foreign_keys or [])
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlite_utils/db.py"", line 702, in resolve_foreign_keys
    other_table = table.guess_foreign_table(column)
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sqlite_utils/db.py"", line 2061, in guess_foreign_table
    raise NoObviousTable(
sqlite_utils.db.NoObviousTable: No obvious foreign key table for column 'taxon' - tried ['taxon', 'taxons']
`
If I call the program with your user, everything seems to go well, and afterwards I can call the program with my own user without problems. Moreover, I can call the program again with my own user and everything still goes well.

Additional info, the command:
    
    sqlite-utils tables inaturalist.db

shows that the correct name can be 'taxons'.

There is another small problem with a warning:
     
   warnings.warn(""urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported ""

",206202864,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/inaturalist-to-sqlite/issues/1/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
541274681,MDU6SXNzdWU1NDEyNzQ2ODE=,2,Add linkedin-to-sqlite,881925,open,0,,,0,2019-12-21T03:13:40Z,2019-12-21T03:13:40Z,,NONE,,"There is an API available. https://developer.linkedin.com/docs/rest-api#

At the minimum, I would think contact list and messages would be of interest.",214746582,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep.github.io/issues/2/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
664793260,MDU6SXNzdWU2NjQ3OTMyNjA=,2,Yak shave,145425,open,0,,,0,2020-07-23T22:04:18Z,2020-07-23T22:04:18Z,,NONE,,"Just a quick note... The 23andme data is not exactly your genome, but a SNP chip of your genome. It's ""some of your genotypes."" Or about 0.1% of your genome. Nice work in any case! It deserves to be liberated!!!!!",209590345,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/genome-to-sqlite/issues/2/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1485017981,I_kwDODEpn8M5Yg5N9,2,table identifications has no column named previous_observation_taxon,520541,open,0,,,0,2022-12-08T16:47:17Z,2022-12-08T16:47:17Z,,NONE,,"Installed successfully with pip and ran `inaturalist-to-sqlite inaturalist.db simonw` and got the error:

```
sqlite3.OperationalError: table identifications has no column named previous_observation_taxon
```",206202864,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/inaturalist-to-sqlite/issues/2/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
769397742,MDU6SXNzdWU3NjkzOTc3NDI=,3,sqlite-utils error on takeout import,231498,open,0,,,0,2020-12-17T01:18:48Z,2020-12-17T01:19:04Z,,NONE,,"```
$ google-takeout-to-sqlite my-activity takeout.db /path/to/zip
...
sqlite3.OperationalError: no such table: main.my_activity
```

there is no table creation in `utils.py`, unlike in other importers such as github-to-sqlite

additionally, this package and hackernews-to-sqlite have conflicting `sqlite-utils` dep with datasette and dogsheep-beta",206649770,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/3/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1205867842,I_kwDODtX3eM5H4BVC,4,Retrieve the top-level story for a comment,1755789,open,0,,,0,2022-04-15T20:25:39Z,2022-04-15T20:25:39Z,,NONE,,"I think that each comment inserted into the database should include a column `onstory` that contains the ID of the story on which the comment was made. This is exactly equivalent to the link after ""on:"" at the top of an HN comment page ([example](https://news.ycombinator.com/item?id=18358028)). We could do this either by directly retrieving the HTML page and using Beautiful Soup to find that link, or alternatively recurse up the tree in the Firebase API using the `parent` field (probably using `functools.lru_cache` in case a person has commented a bunch of times on the same story).",248903544,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/4/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
723499985,MDExOlB1bGxSZXF1ZXN0NTA1MDc2NDE4,5,Add fitbit-to-sqlite,4632208,open,0,,,0,2020-10-16T20:04:05Z,2020-10-16T20:04:05Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/dogsheep.github.io/pulls/5,,214746582,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep.github.io/issues/5/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1353418822,PR_kwDODtX3eM497MOV,5,The program fails when the user has no submissions,2467,open,0,,,0,2022-08-28T17:25:45Z,2022-08-28T17:25:45Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/hacker-news-to-sqlite/pulls/5,"Tested with:
    
     hacker-news-to-sqlite user hacker-news.db fernand0

Result:
`
Traceback (most recent call last):
  File ""/home/ftricas/.pyenv/versions/3.10.6/bin/hacker-news-to-sqlite"", line 8, in <module>
    sys.exit(cli())
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py"", line 1130, in __call__
    return self.main(*args, **kwargs)
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py"", line 1055, in main
    rv = self.invoke(ctx)
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py"", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py"", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py"", line 760, in invoke
    return __callback(*args, **kwargs)
  File ""/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/hacker_news_to_sqlite/cli.py"", line 27, in user
    submitted = user.pop(""submitted"", None) or []
AttributeError: 'NoneType' object has no attribute 'pop'
`
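
The failure above comes from calling `.pop` on `None`. A minimal sketch of a guard, assuming the user lookup can return `None` (as the traceback suggests); `submitted_ids` is a hypothetical helper and not necessarily the patch in this PR:

```python
def submitted_ids(user):
    # user is the decoded user record (a dict), or None when the lookup returned nothing
    if not user:
        return []
    # A missing or empty 'submitted' key also yields an empty list
    return user.get('submitted') or []
```

With a guard like this, the `or []` still matters for records that exist but have no `submitted` key.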

There is a problem of style with the patch (but not sure what to do) because with the new initialization (`submitted = []`) the part 

     or []

is not needed. Maybe there is a more adequate way of doing this.",248903544,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/5/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1616440856,I_kwDOJHON9s5gWO4Y,5,Configure full text search,9599,open,0,,,0,2023-03-09T05:20:46Z,2023-03-09T05:20:46Z,,MEMBER,,"FTS would be useful.

Maybe even extract the plain text from the notes to make that index easier to create, rather than creating it against the HTML. Can use the `plaintext` property for that.",611552758,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/apple-notes-to-sqlite/issues/5/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
689848827,MDU6SXNzdWU2ODk4NDg4Mjc=,6,ISO timestamps,9599,open,0,,,0,2020-09-01T06:16:42Z,2020-09-01T06:16:42Z,,MEMBER,,"The `time_added`, `time_updated` and `time_read` columns currently store data like this:

    September 19, 2019 - 00:30:30 UTC

Should use ISO instead, e.g. `2020-07-26T01:05:24+00:00`",213286752,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/6/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
689850810,MDU6SXNzdWU2ODk4NTA4MTA=,6,Set up a demo instance,9599,open,0,,,0,2020-09-01T06:20:24Z,2020-09-01T06:20:24Z,,MEMBER,,"Once I've got the Datasette plugin to a state where it's worth building a demo: #3

I can use data from my public https://github-to-sqlite.dogsheep.net/ demo plus the Pocket data subset I use for the demo in https://github.com/dogsheep/pocket-to-sqlite/issues/5 - I could pull in the https://dogsheep-photos.dogsheep.net/ photos data too.",197431109,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-beta/issues/6/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
925384329,MDExOlB1bGxSZXF1ZXN0NjczODcyOTc0,7,Add instagram-to-sqlite,36654812,open,0,,,0,2021-06-19T12:26:16Z,2021-07-28T07:58:59Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/dogsheep.github.io/pulls/7,"The tool covers only chat imports at the time of opening this PR but I'm planning to import everything else that I feel inquisitive about

ref: https://github.com/gavindsouza/instagram-to-sqlite",214746582,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep.github.io/issues/7/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
930946817,MDU6SXNzdWU5MzA5NDY4MTc=,7,KeyError: 'accuracy' when processing Location History,403152,open,0,,,0,2021-06-27T14:39:43Z,2021-06-27T14:39:43Z,,NONE,,"I'm new to both the dogsheep tools and datasette but have been experimenting a bit the last few days and these are really cool tools!

I encountered a problem running my Google location history through this tool running the latest release in a docker container:

```
Traceback (most recent call last):
  File ""/usr/local/bin/google-takeout-to-sqlite"", line 8, in <module>
    sys.exit(cli())
  File ""/usr/local/lib/python3.9/site-packages/click/core.py"", line 829, in __call__
    return self.main(*args, **kwargs)
  File ""/usr/local/lib/python3.9/site-packages/click/core.py"", line 782, in main
    rv = self.invoke(ctx)
  File ""/usr/local/lib/python3.9/site-packages/click/core.py"", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File ""/usr/local/lib/python3.9/site-packages/click/core.py"", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File ""/usr/local/lib/python3.9/site-packages/click/core.py"", line 610, in invoke
    return callback(*args, **kwargs)
  File ""/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/cli.py"", line 49, in my_activity
    utils.save_location_history(db, zf)
  File ""/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/utils.py"", line 27, in save_location_history
    db[""location_history""].upsert_all(
  File ""/usr/local/lib/python3.9/site-packages/sqlite_utils/db.py"", line 1105, in upsert_all
    return self.insert_all(
  File ""/usr/local/lib/python3.9/site-packages/sqlite_utils/db.py"", line 990, in insert_all
    chunk = list(chunk)
  File ""/usr/local/lib/python3.9/site-packages/google_takeout_to_sqlite/utils.py"", line 33, in <genexpr>
    ""accuracy"": row[""accuracy""],
KeyError: 'accuracy'
```

It looks like the tool assumes the `accuracy` key will be in every location history entry. 

My first attempt at a local patch to get myself going was to switch the `accuracy` lookup to `.get`, hoping to make the column nullable, though I wasn't sure what `sqlite_utils` would do there. That worked, in that the import completed, so I was going to propose a patch with that change. But while updating the existing test to include an entry with no accuracy value, I noticed that the expected type of the field appeared to change to a string in the test (and, from a quick scan through the sqlite_utils code, probably to TEXT in the database). Given this change in column type, opening an issue before proposing a fix seemed warranted. It seems the schema would need to be specified explicitly if you wanted a nullable integer column.

Now that I've done a successful import run using my initial fix of calling `.get` on the row dict, I can see with datasette that I only have 7 data points (out of ~250k) that have a null accuracy column. They are all from 2011-2012 in an import that includes points spanning ~2010-2016 so perhaps another approach might be to filter those entries out during import if it really is that infrequent?
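
To illustrate the nullable integer column idea, here is a sketch that uses only the standard library `sqlite3` module rather than `sqlite_utils`, so it says nothing about how the tool itself should declare the schema. `save_locations` and the reduced column set are assumptions made for the example.

```python
import sqlite3


def save_locations(conn, rows):
    # Declare accuracy as INTEGER up front so missing values are stored as NULL
    conn.execute(
        'CREATE TABLE IF NOT EXISTS location_history '
        '(id TEXT PRIMARY KEY, latitude REAL, longitude REAL, accuracy INTEGER)'
    )
    conn.executemany(
        'INSERT OR REPLACE INTO location_history VALUES (?, ?, ?, ?)',
        (
            (
                row['id'],
                row['latitudeE7'] / 1e7,
                row['longitudeE7'] / 1e7,
                row.get('accuracy'),  # None when the key is absent
            )
            for row in rows
        ),
    )
    conn.commit()
```

The other direction, filtering during import, would amount to skipping rows where `'accuracy' not in row` before they reach the insert.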

I'm happy to provide a PR for a fix but figured I'd ask about which direction is preferred first.",206649770,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/7/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
797728929,MDU6SXNzdWU3OTc3Mjg5Mjk=,8,QUESTION: extract full text,417363,open,0,,,0,2021-01-31T14:50:10Z,2021-01-31T14:50:10Z,,NONE,,"This may be solved or a feature already, but I couldn't figure it out, is it possible to extract and store also full text from the saved pages? The same way that Pocket parses the text, it'd be amazing to be able to store (and thus make searchable later) the text.

Thank you very much for the project, it's such an amazing idea! ",213286752,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/pocket-to-sqlite/issues/8/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
927385540,MDU6SXNzdWU5MjczODU1NDA=,8,any guidance / experience on imessage-to-sqlite ?,2675621,open,0,,,0,2021-06-22T15:46:16Z,2021-06-22T15:46:16Z,,NONE,,,214746582,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep.github.io/issues/8/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
673602857,MDU6SXNzdWU2NzM2MDI4NTc=,9,Define a view that displays photos correctly,9599,open,0,,,0,2020-08-05T14:53:39Z,2020-08-05T14:53:39Z,,MEMBER,,"The `photos` table stores data like this:

id | createdAt | source | prefix | suffix | width | height | visibility | created ▲ | user
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
5e12c9708506bc000840262a | January 06, 2020 - 05:45:20 UTC | Swarm for iOS 1 | https://fastly.4sqi.net/img/general/ | /15889193_AXxGk4I1nbzUZuyYqObgbXdJNyEHiwj6AUDq0tPZWtw.jpg | 1920 | 1440 | public | 2020-01-06T05:45:20 | 15889193

The photo URL can be derived from those pieces - define a SQL view which does that (using `datasette-json-html` to display the pictures)",205429375,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/9/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1046887492,PR_kwDODFE5qs4uMsMJ,9,Removed space from filename My Activity.json,91880982,open,0,,,0,2021-11-08T00:04:31Z,2021-11-08T00:04:31Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/google-takeout-to-sqlite/pulls/9,"File name from google takeout has no space. The code only runs without error if filename is ""MyActivity.json"" and not ""My Activity.json"". Is it a new change by Google?",206649770,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/9/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1617938730,I_kwDOJHON9s5gb8kq,9,"Default to just storing plaintext, store HTML if `--html` is passed",9599,open,0,,,0,2023-03-09T20:19:06Z,2023-03-09T20:19:06Z,,MEMBER,,"The full `body` version of the notes can get HUGE, due to embedded images. It turns out for my own purposes I'm usually happy with just the `plaintext` version.

I'm tempted to say you don't get HTML unless you pass a `--html` option.",611552758,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/apple-notes-to-sqlite/issues/9/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1250287607,PR_kwDODFE5qs44jvRV,11,Update README.md,11887,open,0,,,0,2022-05-27T03:13:59Z,2022-05-27T03:13:59Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/google-takeout-to-sqlite/pulls/11,Fix typo,206649770,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/11/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
692202408,MDU6SXNzdWU2OTIyMDI0MDg=,12,Idea: maps and GeoJSON support,9599,open,0,,,0,2020-09-03T18:47:10Z,2020-09-04T01:45:03Z,,MEMBER,,"It would be cool if the `display_sql` could return a column populated with GeoJSON which would then automatically be displayed on a map in the results (or maybe default JS would look for a `class=""geojson""` element output by the `display` template) - a la https://github.com/simonw/datasette-leaflet-geojson

Then I could render workout routes on a map, or Swarm checkin points.",197431109,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-beta/issues/12/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
892383270,MDExOlB1bGxSZXF1ZXN0NjQ1MTAwODQ4,12,Recovering of malformed ENEX file,8431437,open,0,,,0,2021-05-15T07:49:31Z,2021-05-15T19:57:50Z,,FIRST_TIMER,dogsheep/evernote-to-sqlite/pulls/12,"Hey .. Awesome work developing this project, that I found very useful to me and saved me some work.. Thanks.. :)

Some background to this PR... 
I've been searching for a tool that would let me transform my personal collection of Evernote notes into a format that is easier to search and potentially easier to import into future services. 

I ran into a problem processing my large (~5GB) export with the existing code: Python's built-in XML parser could not get through it without raising an exception that broke the process. 

In my first attempt I tried to adapt the code to the more robust lxml package, which supports huge data and has a ""recover"" option. It worked better, but it still failed to process the whole file. Even the memory-efficient etree.iterparse() unfortunately ran into trouble.

Having had no luck finding any other library that could parse this enormous file, I instead chose to build a ""hugexmlparser"" module. It parses the file using yield (at a byte-by-byte level) and lets you set a maximum size for <note>, to cater for potentially malformed or undesirably large attachments, which should cover the potential exceptions. In some cases the parser discovers malformed XML within <content>; in those cases it tries to save as much as possible by escaping it (to be dealt with at a later stage, which is better than nothing). If an end </note> is missing before a new (malformed?) note starts, it adds one when it encounters the new start tag.
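
The chunked scanning idea can be sketched roughly as follows. This is not the `hugexmlparser` code from this PR: `iter_notes` and its parameters are hypothetical, it assumes notes are delimited by plain `<note>` and `</note>` tags, and it simply skips notes that exceed the size limit instead of escaping or repairing them.

```python
def iter_notes(stream, max_note_bytes=1_000_000, chunk_size=64 * 1024):
    # Yield each complete <note>...</note> span as bytes, reading the stream in chunks
    open_tag, close_tag = b'<note>', b'</note>'
    buffer = b''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        while True:
            start = buffer.find(open_tag)
            if start == -1:
                # No note starts here; keep a short tail in case a tag straddles chunks
                buffer = buffer[-(len(open_tag) - 1):]
                break
            end = buffer.find(close_tag, start)
            if end == -1:
                buffer = buffer[start:]
                if len(buffer) > max_note_bytes:
                    # Oversized or unterminated note: drop its opening tag and rescan
                    buffer = buffer[len(open_tag):]
                    continue
                break  # wait for more data
            end += len(close_tag)
            if end - start <= max_note_bytes:
                yield buffer[start:end]
            buffer = buffer[end:]
```

Because it is a generator, only one note (plus at most one chunk) is held in memory at a time, which is the property that matters for a multi-gigabyte export.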

The code for the recovery process is a bit rough and certainly has room for refactoring, but at the moment it seems to achieve what I wanted.

With the above in place, the parsed notes are passed to save_note_recovery(), a slightly changed version of the existing save code, to make sure the existing behaviour still works.
This is also added to click as a new recover-enex command, keeping the original options. 
A couple of new tests were added as well to check this command.

This currently works for me, but I thought I would share a PR in case you find it useful yourself or it is useful to others who find this repository.

As a second step .. when time allows, it would be nice to also be able to easily export from SQLite to formatted HTML/MD with attachments saved... but that might be better as a separate project ... or if you or someone else has something that could be shared to save some trouble, I would be interested ;-) ",303218369,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/12/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1557599877,I_kwDODFE5qs5c1xaF,12,location history changes,14809320,open,0,,,0,2023-01-26T03:57:25Z,2023-01-26T03:57:25Z,,NONE,,"not sure if each download is unique, but I had to change some things to work with the takeout zip I made 2023-01-25

filename changed from ""Location History.json"" to ""Records.json""

`""timestampMs""` is not present, `""timestamp""` is roughly iso timestamp

```py
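def get_timestamp_ms(raw_timestamp):
    # ""timestamp"" is UTC (trailing Z), with or without fractional seconds
    try:
        parsed = datetime.datetime.strptime(raw_timestamp, ""%Y-%m-%dT%H:%M:%SZ"")
    except ValueError:
        parsed = datetime.datetime.strptime(raw_timestamp, ""%Y-%m-%dT%H:%M:%S.%fZ"")
    # Mark the naive datetime as UTC, then convert seconds to integer milliseconds
    return round(parsed.replace(tzinfo=datetime.timezone.utc).timestamp() * 1000)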

def save_location_history(db, zf):
    location_history = json.load(
        zf.open(""Takeout/Location History/Records.json"")
    )
    db[""location_history""].upsert_all(
        (
            {
                ""id"": id_for_location_history(row),
                ""latitude"": row[""latitudeE7""] / 1e7,
                ""longitude"": row[""longitudeE7""] / 1e7,
                ""accuracy"": row[""accuracy""],
                ""timestampMs"": get_timestamp_ms(row[""timestamp""]),
                ""when"": row[""timestamp""],
            }
            for row in location_history[""locations""]
        ),
        pk=""id"",
    )


def id_for_location_history(row):
    # We want an ID that is unique but can be sorted by in
    # date order - so we use the isoformat date + the first
    # 6 characters of a hash of the JSON
    first_six = hashlib.sha1(
        json.dumps(row, separators=("","", "":""), sort_keys=True).encode(""utf8"")
    ).hexdigest()[:6]
    return ""{}-{}"".format(
        row['timestamp'],
        first_six,
    )
```

example locations from mine

```json
{
    ""latitudeE7"": 427220206,
    ""longitudeE7"": -923423972,
    ""accuracy"": 10,
    ""deviceTag"": -1312429967,
    ""deviceDesignation"": ""PRIMARY"",
    ""timestamp"": ""2019-01-08T23:31:50.867Z""
  }
```

```json
{
    ""latitudeE7"": 427011317,
    ""longitudeE7"": -923448300,
    ""accuracy"": 5,
    ""deviceTag"": -1312429967,
    ""deviceDesignation"": ""PRIMARY"",
    ""timestamp"": ""2019-01-08T23:33:53Z""
  }, 
```",206649770,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/12/reactions"", ""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 2}",,
1650981564,I_kwDOJHON9s5iZ_q8,12,Error running pytest,14314871,open,0,,,0,2023-04-02T15:02:36Z,2023-04-02T15:07:10Z,,NONE,,"`______________________________________________________ ERROR collecting tests/test_apple_notes_to_sqlite.py _______________________________________________________
ImportError while importing test module '/Users/lol/development/apple-notes-to-sqlite/tests/test_apple_notes_to_sqlite.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.9/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_apple_notes_to_sqlite.py:2: in <module>
    from apple_notes_to_sqlite.cli import cli, COUNT_SCRIPT, FOLDERS_SCRIPT
E   ModuleNotFoundError: No module named 'apple_notes_to_sqlite'`

Solution:
This is likely a PYTHONPATH issue due to having pytest installed both globally and in the venv. We can guarantee the tests run by adding the current directory to sys.path automatically using

`python -m pytest`

The alternative is to activate the venv, install pytest, deactivate, then activate the venv again (https://stackoverflow.com/questions/35045038/how-do-i-use-pytest-with-virtualenv)",611552758,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/apple-notes-to-sqlite/issues/12/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1650984552,PR_kwDOJHON9s5NbyYN,13,use universal command,14314871,open,0,,,0,2023-04-02T15:10:54Z,2023-04-02T15:37:34Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/apple-notes-to-sqlite/pulls/13,,611552758,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/apple-notes-to-sqlite/issues/13/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1884499674,PR_kwDODFE5qs5ZtYMc,13,"use poetry for packages, asdf for versioning, and gh actions for ci",150855,open,0,,,0,2023-09-06T17:59:16Z,2023-09-06T17:59:16Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/google-takeout-to-sqlite/pulls/13,"- build: use poetry for package management, asdf for python version
- build: cleanup poetry config, add keywords, ignore dist
- ci: migrate circleci to gh actions
- fix: dup method definition
",206649770,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/13/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1393330070,PR_kwDODD6af84__DNJ,14,Photo links,6782721,open,0,,,0,2022-10-01T09:44:15Z,2022-11-18T17:10:49Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/swarm-to-sqlite/pulls/14,"* add to `checkin_details` view new column for a calculated photo links
* supported multiple links split by newline
* create `events` table if there's no events in the history to avoid SQL errors

Fixes #9.",205429375,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/swarm-to-sqlite/issues/14/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1880968405,PR_kwDOJHON9s5ZhYny,14,fix: fix the problem of Chinese character garbling,2698003,open,0,,,0,2023-09-04T23:48:28Z,2023-09-04T23:48:28Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/apple-notes-to-sqlite/pulls/14,"1. The code uses two different ways of writing encoding formats, `mac_roman` and `macroman`. It is uncertain whether there are any typo errors.
2. When there are Chinese characters in the content, exporting it results in garbled code. Changing it to `utf8` can fix the issue.",611552758,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/apple-notes-to-sqlite/issues/14/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
793907673,MDExOlB1bGxSZXF1ZXN0NTYxNTEyNTAz,15,added try / except to write_records ,9857779,open,0,,,0,2021-01-26T03:56:21Z,2021-01-26T03:56:21Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/healthkit-to-sqlite/pulls/15,"to keep the data write from failing if it came across an error during processing. In particular when trying to convert my HealthKit zip file (and that of my wife's) it would consistently error out with the following:

```
db.py 1709 insert_chunk
result = self.db.execute(query, params)

db.py 226 execute
return self.conn.execute(sql, parameters)

sqlite3.OperationalError:
too many SQL variables

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
db.py 1709 insert_chunk
result = self.db.execute(query, params)

db.py 226 execute
return self.conn.execute(sql, parameters)

sqlite3.OperationalError:
too many SQL variables

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
db.py 1709 insert_chunk
result = self.db.execute(query, params)

db.py 226 execute
return self.conn.execute(sql, parameters)

sqlite3.OperationalError:
table rBodyMass has no column named metadata_HKWasUserEntered

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
healthkit-to-sqlite 8 <module>
sys.exit(cli())

core.py 829 __call__
return self.main(*args, **kwargs)

core.py 782 main
rv = self.invoke(ctx)

core.py 1066 invoke
return ctx.invoke(self.callback, **ctx.params)

core.py 610 invoke
return callback(*args, **kwargs)

cli.py 57 cli
convert_xml_to_sqlite(fp, db, progress_callback=bar.update, zipfile=zf)

utils.py 42 convert_xml_to_sqlite
write_records(records, db)

utils.py 143 write_records
db[table].insert_all(

db.py 1899 insert_all
self.insert_chunk(

db.py 1720 insert_chunk
self.insert_chunk(

db.py 1720 insert_chunk
self.insert_chunk(

db.py 1714 insert_chunk
result = self.db.execute(query, params)

db.py 226 execute
return self.conn.execute(sql, parameters)

sqlite3.OperationalError:
table rBodyMass has no column named metadata_HKWasUserEntered
```
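
The last error above can be reproduced with Python's built-in `sqlite3` module alone. The following is only a sketch of the try / except idea, not the actual patch: this `write_records` signature, the row-at-a-time insert, and the sample rows are all hypothetical.

```python
import sqlite3


def write_records(conn, table, records):
    # Hypothetical helper: insert rows one at a time and collect the
    # ones SQLite rejects instead of letting the whole import abort.
    failed = []
    for record in records:
        columns = ', '.join('[%s]' % name for name in record)
        placeholders = ', '.join('?' for _ in record)
        sql = 'insert into [%s] (%s) values (%s)' % (table, columns, placeholders)
        try:
            conn.execute(sql, list(record.values()))
        except sqlite3.OperationalError as error:
            failed.append((record, str(error)))
    return failed


conn = sqlite3.connect(':memory:')
conn.execute('create table rBodyMass (value text)')
failed = write_records(conn, 'rBodyMass', [
    {'value': '70'},
    # This row has a column the table does not know about.
    {'value': '71', 'metadata_HKWasUserEntered': '1'},
])
stored = conn.execute('select count(*) from rBodyMass').fetchone()[0]
print(stored, failed)
```

Note that a guard like this skips the failing rows rather than storing them, so the returned list is worth logging.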

Adding the try / except in `write_records` seems to fix that issue.",197882382,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/15/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1042759769,PR_kwDOEhK-wc4uAJb9,15,include note tags in the export,436138,open,0,,,0,2021-11-02T20:04:31Z,2021-11-02T20:04:31Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/evernote-to-sqlite/pulls/15,"When parsing the Evernote `<note>` elements, the script will now also parse any nested `<tag>` elements, writing them out into a separate sqlite table.

Here is an example of how to query the data after the script has run:
```
select notes.*,
	(select group_concat(tag) from notes_tags where notes_tags.note_id=notes.id) as tags
from notes;
```
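
To try this query without an Evernote export, here is a small sketch using Python's built-in `sqlite3` module. The column layout (`notes.id`, `notes.title`, `notes_tags.note_id`, `notes_tags.tag`) and the sample rows are assumptions made for illustration; only the names used in the query above come from the script.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Assumed minimal schema, just enough to run the query.
conn.executescript('''
create table notes (id text primary key, title text);
create table notes_tags (note_id text, tag text);
insert into notes values ('n1', 'First note'), ('n2', 'Untagged note');
insert into notes_tags values ('n1', 'recipes'), ('n1', 'notebook_Cooking');
''')

rows = conn.execute('''
select notes.*,
    (select group_concat(tag) from notes_tags where notes_tags.note_id = notes.id) as tags
from notes
order by notes.id
''').fetchall()

for row in rows:
    print(row)
```

Notes without any tags come back with a NULL `tags` value rather than an empty string.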

My .enex source file is 3+ years old so I am assuming the structure hasn't changed.  Interestingly, my _notebook names_ show up in the _tags_ list where the tag name is prefixed with `notebook_`, so this could maybe help work around the first limitation mentioned in the [evernote-to-sqlite blog post](https://simonwillison.net/2020/Oct/16/building-evernote-sqlite-exporter/).
",303218369,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/15/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
505673645,MDU6SXNzdWU1MDU2NzM2NDU=,16,Do a better job with archived direct message threads,9599,open,0,,,0,2019-10-11T06:55:21Z,2019-10-11T06:55:27Z,,MEMBER,,https://github.com/dogsheep/twitter-to-sqlite/blob/fb2698086d766e0333a55bb73435e7283feeb438/twitter_to_sqlite/archive.py#L98-L99,206156866,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/16/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
830901133,MDExOlB1bGxSZXF1ZXN0NTkyMzY0MjU1,16,"Add a fallback ID, print if no ID found",1234956,open,0,,,0,2021-03-13T13:38:29Z,2021-03-13T14:44:04Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/healthkit-to-sqlite/pulls/16,"Fixes https://github.com/dogsheep/healthkit-to-sqlite/issues/14
",197882382,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/16/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1943259395,I_kwDOEhK-wc5z08kD,16, time data '2014-11-21T11:44:12.000Z' does not match format '%Y%m%dT%H%M%SZ',3746270,open,0,,,0,2023-10-14T13:24:39Z,2023-10-14T13:24:39Z,,NONE,,"
```
evernote-to-sqlite enex evernote.db ./我的笔记.enex
Importing from ENEX  [#####-------------------------------]   14%
Traceback (most recent call last):
  File ""/usr/local/bin/evernote-to-sqlite"", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File ""/usr/local/lib/python3.11/site-packages/click/core.py"", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ""/usr/local/lib/python3.11/site-packages/click/core.py"", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File ""/usr/local/lib/python3.11/site-packages/click/core.py"", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ""/usr/local/lib/python3.11/site-packages/click/core.py"", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ""/usr/local/lib/python3.11/site-packages/click/core.py"", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ""/usr/local/lib/python3.11/site-packages/evernote_to_sqlite/cli.py"", line 31, in enex
    save_note(db, note)
  File ""/usr/local/lib/python3.11/site-packages/evernote_to_sqlite/utils.py"", line 46, in save_note
    ""created"": convert_datetime(created),
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ""/usr/local/lib/python3.11/site-packages/evernote_to_sqlite/utils.py"", line 111, in convert_datetime
    return datetime.datetime.strptime(s, ""%Y%m%dT%H%M%SZ"").isoformat()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ""/usr/local/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/_strptime.py"", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ""/usr/local/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/_strptime.py"", line 349, in _strptime
    raise ValueError(""time data %r does not match format %r"" %
ValueError: time data '2014-11-21T11:44:12.000Z' does not match format '%Y%m%dT%H%M%SZ'
```
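
The value in the error uses dashes, colons and milliseconds, which the format string in `convert_datetime` does not allow for. A possible fix, sketched here under the assumption that these are the only two formats an ENEX export can contain, is to try each format in turn. The `FORMATS` tuple is hypothetical and not part of evernote-to-sqlite.

```python
import datetime

# First entry is the format from the traceback; the second is an
# assumed addition that matches '2014-11-21T11:44:12.000Z'.
FORMATS = ('%Y%m%dT%H%M%SZ', '%Y-%m-%dT%H:%M:%S.%fZ')


def convert_datetime(s):
    # Try each known format and return the first one that parses.
    for fmt in FORMATS:
        try:
            return datetime.datetime.strptime(s, fmt).isoformat()
        except ValueError:
            continue
    raise ValueError('time data %r does not match any known format' % s)


print(convert_datetime('20141121T114412Z'))
print(convert_datetime('2014-11-21T11:44:12.000Z'))
```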

The ENEX file was exported by the Evernote Mac client.",303218369,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/16/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
836063389,MDU6SXNzdWU4MzYwNjMzODk=,17,Datetime columns are not properly formatted to be recognized as datetime,1234956,open,0,,,0,2021-03-19T14:33:04Z,2021-03-19T14:33:04Z,,NONE,,"

Currently, the datetimes are formatted in a way that is not recognized by datasette-vega for plotting with a `Date/time` type for the axis. 

For example, if you have datasette running locally with `datasette-vega` installed and have a database that includes resting heart rate:

```
http://localhost:8001/healthkit/rRestingHeartRate#g.mark=line&g.x_column=startDate&g.x_type=temporal&g.y_column=value&g.y_type=quantitative
```

The plot is blank unless you choose `Label` as the type for the date data.

The `startDate` (and `creationDate` and `endDate`) columns appear like: `2019-11-14 18:22:18 -0700`

If instead the format for this column is changed slightly: `2019-11-14T18:22:18-07:00` they are recognized as proper dates and the charting works as expected.
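
The conversion itself is small. As a sketch (the helper name `to_iso` is made up for illustration, and this is not necessarily how the PR does it), Python's `strptime` with `%z` can read the current format and `isoformat()` writes the one that works:

```python
import datetime


def to_iso(value):
    # Parse '2019-11-14 18:22:18 -0700' and re-emit it as ISO 8601.
    parsed = datetime.datetime.strptime(value, '%Y-%m-%d %H:%M:%S %z')
    return parsed.isoformat()


print(to_iso('2019-11-14 18:22:18 -0700'))
```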

I have a PR that addresses this issue, will submit shortly.",197882382,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/17/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
836064851,MDExOlB1bGxSZXF1ZXN0NTk2NjI3Nzgw,18,Add datetime parsing,1234956,open,0,,,0,2021-03-19T14:34:22Z,2021-03-19T14:34:22Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/healthkit-to-sqlite/pulls/18,"Parses the datetime columns so they are subsequently properly recognized as
datetime.

Fixes https://github.com/dogsheep/healthkit-to-sqlite/issues/17
",197882382,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/18/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
697162939,MDU6SXNzdWU2OTcxNjI5Mzk=,20,Add more tags so people can find your project.,7902810,open,0,,,0,2020-09-09T21:14:09Z,2020-09-09T21:14:09Z,,NONE,,"quantified-self habit-tracking google-fit time-tracking wearables quantifiedself 
for example",197431109,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-beta/issues/20/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 1, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1515717718,PR_kwDOC8tyDs5Gc-VH,23,Include workout statistics,2129,open,0,,,0,2023-01-01T17:29:57Z,2023-01-01T17:29:57Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/healthkit-to-sqlite/pulls/23,"Not sure when this changed (iOS 16 maybe?), but the `WorkoutStatistics` now has a whole bunch of information about workouts, e.g. for runs it contains the distance (as a `<WorkoutStatistics type=""HKQuantityTypeIdentifierDistanceWalkingRunning ...>` element).

Adding it as another column at least allows me to pull these out (using SQLite's JSON support).
I'm running with this patch on my own data now.",197882382,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/23/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
599776345,MDU6SXNzdWU1OTk3NzYzNDU=,24,Feature idea: github-to-sqlite everything ...,9599,open,0,,,0,2020-04-14T18:34:00Z,2020-04-14T18:34:00Z,,MEMBER,,"At the moment if you want to pull all your repos, issues, issue comments etc. you have to do it with a sequence of separate commands.

Consider adding an `everything` or `all` command which fetches everything that the tool knows how to fetch, and is designed to be run on a cron in a way that fetches just new stuff each time.",207052882,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/github-to-sqlite/issues/24/reactions"", ""total_count"": 7, ""+1"": 7, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
621486115,MDU6SXNzdWU2MjE0ODYxMTU=,27,photos_with_apple_metadata view should include labels,9599,open,0,,,0,2020-05-20T06:06:17Z,2020-05-20T06:06:17Z,,MEMBER,,"https://dogsheep-photos.dogsheep.net/public/photos_with_apple_metadata?place_city=New+Orleans&_facet=place_city&_facet_array=albums&_facet_array=persons

Here's one way to add that:
```sql
        select
          rowid,
          photo,
          (
            select
              json_group_array(
                json_object(
                  'label',
                  normalized_string,
                  'href',
                  '/photos/labelled?_hide_sql=1&label=' || normalized_string
                )
              )
            from
              labels
            where
              labels.uuid = photos_with_apple_metadata.uuid
          ) as labels,
          date,
```",256834907,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-photos/issues/27/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
709789634,MDU6SXNzdWU3MDk3ODk2MzQ=,27,Sort order is not persisted by facet filter links,9599,open,0,,,0,2020-09-27T18:22:07Z,2020-09-27T18:22:07Z,,MEMBER,,A link to `/-/beta?category=1&timestamp__date=2018-08-01&q=swedish` should be to `/-/beta?category=1&timestamp__date=2018-08-01&q=swedish&sort=newest`,197431109,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-beta/issues/27/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
655974395,MDExOlB1bGxSZXF1ZXN0NDQ4MzU1Njgw,30,Handle empty bucket on first upload. Allow specifying the endpoint_url for services other than S3 (like b2 and digitalocean spaces),110038,open,0,,,0,2020-07-13T16:15:26Z,2020-07-13T16:15:26Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/dogsheep-photos/pulls/30,"Finally got around to trying dogsheep-photos but I want to use backblaze's b2 service instead of AWS S3.
Had to add a way to optionally specify the endpoint_url to connect to. Then with the bucket being empty the initial key retrieval would fail. Probably a better way to see that the bucket is empty than doing a test inside the paginator loop.

Also probably a better way to specify the endpoint_url as we get and test for it twice using the same code in two different places but did not want to spend too much time worrying about it.",256834907,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-photos/issues/30/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
836923194,MDU6SXNzdWU4MzY5MjMxOTQ=,32,JSON API for search results,9599,open,0,,,0,2021-03-20T22:21:36Z,2021-03-20T22:21:36Z,,MEMBER,,Refs https://github.com/simonw/datasette/issues/878,197431109,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-beta/issues/32/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
268110769,MDU6SXNzdWUyNjgxMTA3Njk=,33,Use locust for benchmarking and load tests,9599,open,0,,,0,2017-10-24T17:00:09Z,2017-12-10T03:12:16Z,,OWNER,,"https://github.com/locustio/locust

Needed for #32 ",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/33/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
830283447,MDU6SXNzdWU4MzAyODM0NDc=,34,bucket name,6213,open,0,,,0,2021-03-12T16:40:57Z,2021-03-12T16:40:57Z,,NONE,,I followed the instructions to set up credentials but I am getting an invalid bucket name error. Can you put a sample auth.json file in the base that shows the correct format for this?  Thanks,256834907,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-photos/issues/34/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
983221851,MDU6SXNzdWU5ODMyMjE4NTE=,34,Data folder as index command parameter,1223625,open,0,,,0,2021-08-30T21:29:33Z,2021-08-30T21:29:33Z,,NONE,,"Hi,

First of all, thank you for this wonderful project :smile:

I started to use dogsheep to make my personal data searchable, and by using the project I noticed an issue with the index command.

It always expects to be run from the folder where the data is located, so I got some errors while trying to make it work on my setup.

I separate all databases inside a `data` folder (I published my setup to be easier to follow: https://github.com/humrochagf/my-dogsheep)

Before, I configured `dogsheep.yml` to add the data folder to its path like this:

```yml
data/twitter.db:
    tweets:
        sql: |-
...
```

And running the index command like this:

```
dogsheep-beta index data/dogsheep.db dogsheep.yml
```

The normal search feature worked with no problem this way, but when I started adding `display_sql` rules the app started to crash, because Datasette's `get_database` was looking for `data/twitter` and it only had a db called `twitter`.

So my workaround to that was to cd into the data folder and run the indexer. You can check the way I'm doing it at this line of the makefile: https://github.com/humrochagf/my-dogsheep/blob/main/makefile#L3

It works but it would be nice to have an option to pass the path where the data is located to the index function.",197431109,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-beta/issues/34/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
987985935,MDExOlB1bGxSZXF1ZXN0NzI2OTkwNjgw,35,Support for Datasette's --base-url setting,2670795,open,0,,,0,2021-09-03T17:47:45Z,2021-09-03T17:47:45Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/dogsheep-beta/pulls/35,This makes it so you can use Dogsheep if you're using Datasette with the `--base-url /some-path/` setting.,197431109,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-beta/issues/35/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1751214236,I_kwDOC8SPRc5oYWic,36,Getting sqlite_master may not be modified when creating dogsheep index,8711912,open,0,,,0,2023-06-11T03:21:53Z,2023-06-11T03:21:53Z,,NONE,,"When creating a `dogsheep` index from a `config.yml` file on pocket.db (created using pocket-to-sqlite), I get this error:

```
Traceback (most recent call last):
  File ""/Users/khushmeeet/.pyenv/versions/3.11.2/bin/dogsheep-beta"", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File ""/Users/khushmeeet/.pyenv/versions/3.11.2/lib/python3.11/site-packages/click/core.py"", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ""/Users/khushmeeet/.pyenv/versions/3.11.2/lib/python3.11/site-packages/click/core.py"", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File ""/Users/khushmeeet/.pyenv/versions/3.11.2/lib/python3.11/site-packages/click/core.py"", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ""/Users/khushmeeet/.pyenv/versions/3.11.2/lib/python3.11/site-packages/click/core.py"", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ""/Users/khushmeeet/.pyenv/versions/3.11.2/lib/python3.11/site-packages/click/core.py"", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ""/Users/khushmeeet/.pyenv/versions/3.11.2/lib/python3.11/site-packages/dogsheep_beta/cli.py"", line 36, in index
    run_indexer(
  File ""/Users/khushmeeet/.pyenv/versions/3.11.2/lib/python3.11/site-packages/dogsheep_beta/utils.py"", line 32, in run_indexer
    ensure_table_and_indexes(db, tokenize)
  File ""/Users/khushmeeet/.pyenv/versions/3.11.2/lib/python3.11/site-packages/dogsheep_beta/utils.py"", line 91, in ensure_table_and_indexes
    table.add_foreign_key(*fk)
  File ""/Users/khushmeeet/.pyenv/versions/3.11.2/lib/python3.11/site-packages/sqlite_utils/db.py"", line 2155, in add_foreign_key
    self.db.add_foreign_keys([(self.name, column, other_table, other_column)])
  File ""/Users/khushmeeet/.pyenv/versions/3.11.2/lib/python3.11/site-packages/sqlite_utils/db.py"", line 1116, in add_foreign_keys
    cursor.execute(
sqlite3.OperationalError: table sqlite_master may not be modified
```

Command I ran to get this error
```
dogsheep-beta index pocket.db config.yml
```

Dogsheep version
```
dogsheep-beta, version 0.10.2
```

Python version 
```
Python 3.11.2
```",197431109,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-beta/issues/36/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1293698966,PR_kwDOD079W84600uh,37,Fix former command name in readme,578773,open,0,,,0,2022-07-05T02:09:13Z,2022-07-05T02:09:13Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/dogsheep-photos/pulls/37,Looks like a previous commit missed a `photo-to-sqlite`→ `dogsheep-photos` replacement.,256834907,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-photos/issues/37/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1888477283,I_kwDOC8SPRc5wj-Bj,38,Run `rebuild_fts` after building the index,9599,open,0,,,0,2023-09-08T23:17:45Z,2023-09-08T23:17:45Z,,MEMBER,,"In:
- https://github.com/simonw/datasette.io/issues/152#issuecomment-1712323347

This turned out to be the fix:

```bash
dogsheep-beta index dogsheep-index.db templates/dogsheep-beta.yml
sqlite-utils rebuild-fts dogsheep-index.db
```",197431109,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-beta/issues/38/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1827436260,PR_kwDOD079W85WtVyk,39,Missing option in datasette instructions,319473,open,0,,,0,2023-07-29T10:34:48Z,2023-07-29T10:34:48Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/dogsheep-photos/pulls/39,Gotta tell it where to look,256834907,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/dogsheep-photos/issues/39/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
703216044,MDU6SXNzdWU3MDMyMTYwNDQ=,49,Feature: gists and starred gists,9599,open,0,,,0,2020-09-17T02:30:52Z,2020-09-17T02:30:52Z,,MEMBER,,https://developer.github.com/v3/gists/#list-starred-gists,207052882,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/github-to-sqlite/issues/49/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
703218448,MDU6SXNzdWU3MDMyMTg0NDg=,51,Documentation for twitter-to-sqlite fetch,9599,open,0,,,0,2020-09-17T02:38:10Z,2020-09-17T02:38:10Z,,MEMBER,,"It's mentioned in passing in the README but it deserves its own section:
```
$ twitter-to-sqlite fetch \
    ""https://api.twitter.com/1.1/account/verify_credentials.json"" \
    | grep '""id""' | head -n 1
```",206156866,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/51/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
797784080,MDU6SXNzdWU3OTc3ODQwODA=,62,Stargazers and workflows commands always require an auth file when using GITHUB_TOKEN ,631242,open,0,,,0,2021-01-31T18:56:05Z,2021-01-31T18:56:05Z,,CONTRIBUTOR,,"Requested fix in https://github.com/dogsheep/github-to-sqlite/pull/59

The stargazers and workflows commands always require an auth file, even when using a `GITHUB_TOKEN`.  Other commands don't require the auth file.

",207052882,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/github-to-sqlite/issues/62/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
897212458,MDU6SXNzdWU4OTcyMTI0NTg=,63,Ability to fetch commits from branches other than the default,9599,open,0,,,0,2021-05-20T17:58:08Z,2021-05-20T17:58:08Z,,MEMBER,,This tool is currently almost entirely ignorant of the concept of branches. One example: you can't retrieve commits from any branch other than the default (usually main).,207052882,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/github-to-sqlite/issues/63/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1091850530,I_kwDODEm0Qs5BFFEi,63,Import archive error 'withheld_in_countries',521097,open,0,,,0,2022-01-01T16:58:59Z,2022-01-01T16:58:59Z,,NONE,,"Importing the twitter archive  I received this error:
```bash
$ twitter-to-sqlite import archive.db twitter-2021-12-31-<hash>.zip 
birdwatch-note-rating: not yet implemented
birdwatch-note: not yet implemented
branch-links: not yet implemented
community-tweet: not yet implemented
contact: not yet implemented
device-token: not yet implemented
direct-message-mute: not yet implemented
mute: not yet implemented
periscope-account-information: not yet implemented
periscope-ban-information: not yet implemented
periscope-broadcast-metadata: not yet implemented
periscope-comments-made-by-user: not yet implemented
periscope-expired-broadcasts: not yet implemented
periscope-followers: not yet implemented
periscope-profile-description: not yet implemented
professional-data: not yet implemented
protected-history: not yet implemented
reply-prompt: not yet implemented
screen-name-change: not yet implemented
smartblock: not yet implemented
spaces-metadata: not yet implemented
sso: not yet implemented
Traceback (most recent call last):
  File ""/home/paulox/.virtualenvs/dogsheep/bin/twitter-to-sqlite"", line 8, in <module>
    sys.exit(cli())
  File ""/home/paulox/.virtualenvs/dogsheep/lib/python3.9/site-packages/click/core.py"", line 1128, in __call__
    return self.main(*args, **kwargs)
  File ""/home/paulox/.virtualenvs/dogsheep/lib/python3.9/site-packages/click/core.py"", line 1053, in main
    rv = self.invoke(ctx)
  File ""/home/paulox/.virtualenvs/dogsheep/lib/python3.9/site-packages/click/core.py"", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File ""/home/paulox/.virtualenvs/dogsheep/lib/python3.9/site-packages/click/core.py"", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File ""/home/paulox/.virtualenvs/dogsheep/lib/python3.9/site-packages/click/core.py"", line 754, in invoke
    return __callback(*args, **kwargs)
  File ""/home/paulox/.virtualenvs/dogsheep/lib/python3.9/site-packages/twitter_to_sqlite/cli.py"", line 759, in import_
    archive.import_from_file(db, filename, content)
  File ""/home/paulox/.virtualenvs/dogsheep/lib/python3.9/site-packages/twitter_to_sqlite/archive.py"", line 246, in import_from_file
    db[table_name].insert_all(rows, pk=pk, replace=True)
  File ""/home/paulox/.virtualenvs/dogsheep/lib/python3.9/site-packages/sqlite_utils/db.py"", line 2625, in insert_all
    self.insert_chunk(
  File ""/home/paulox/.virtualenvs/dogsheep/lib/python3.9/site-packages/sqlite_utils/db.py"", line 2406, in insert_chunk
    result = self.db.execute(query, params)
  File ""/home/paulox/.virtualenvs/dogsheep/lib/python3.9/site-packages/sqlite_utils/db.py"", line 422, in execute
    return self.conn.execute(sql, parameters)
sqlite3.OperationalError: table archive_tweet has no column named withheld_in_countries
```

I found only a single tweet with the key `withheld_in_countries` in `tweet.js`, which seems to be the problem:
```JSON
[
{
    ""tweet"" : {
      ""retweeted"" : false,
      ""source"" : ""<a href=\""http://twitter.com/download/android\"" rel=\""nofollow\"">Twitter for Android</a>"",
      ""entities"" : {
        ""hashtags"" : [
          {
            ""text"" : ""NowOnAndroid"",
            ""indices"" : [
              ""64"",
              ""77""
            ]
          }
        ],
        ""symbols"" : [ ],
        ""user_mentions"" : [
          {
            ""name"" : ""Periscope"",
            ""screen_name"" : ""PeriscopeCo"",
            ""indices"" : [
              ""3"",
              ""15""
            ],
            ""id_str"" : ""1111111111"",
            ""id"" : ""222222222""
          }
        ],
        ""urls"" : [
          {
            ""url"" : ""https://t.co/xxxxxxxxx"",
            ""expanded_url"" : ""https://vine.co/v/xxxxxxxxx"",
            ""display_url"" : ""vine.co/v/xxxxxxxxxx"",
            ""indices"" : [
              ""78"",
              ""101""
            ]
          }
        ]
      },
      ""display_text_range"" : [
        ""0"",
        ""101""
      ],
      ""favorite_count"" : ""0"",
      ""id_str"" : ""1111111111111111111111"",
      ""truncated"" : false,
      ""retweet_count"" : ""0"",
      ""withheld_in_countries"" : [
        ""TR""
      ],
      ""id"" : ""000000000000000000"",
      ""possibly_sensitive"" : false,
      ""created_at"" : ""Fri Aug 14 06:04:03 +0000 2015"",
      ""favorited"" : false,
      ""full_text"" : ""RT @periscopeco: Travel the world. LIVE. The Global Map is here #NowOnAndroid https://t.co/NZXdsPWROk"",
      ""lang"" : ""en""
    }
  }
  ]
```
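
For anyone hitting the same error, here is a rough sketch of stripping the key before import. It assumes the tweets have already been parsed into a list of dicts shaped like the JSON above; `strip_key` is a hypothetical helper, not part of twitter-to-sqlite.

```python
def strip_key(tweets, key='withheld_in_countries'):
    # Return a new list with `key` removed from every tweet dict,
    # leaving the input list untouched.
    return [
        {'tweet': {k: v for k, v in item['tweet'].items() if k != key}}
        for item in tweets
    ]


sample = [
    {'tweet': {'id': '000000000000000000', 'lang': 'en', 'withheld_in_countries': ['TR']}},
    {'tweet': {'id': '111111111111111111', 'lang': 'en'}},
]
cleaned = strip_key(sample)
print(cleaned)
```

Dropping the key loses that piece of data, so this is a workaround rather than a fix.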

I worked around the error by removing the key from `tweet.js`, but I'm reporting it so the project can be improved.",206156866,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/63/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1097332098,I_kwDODEm0Qs5BZ_WC,64,Include all entities for tweets,111631,open,0,,,0,2022-01-09T23:35:28Z,2022-01-09T23:35:28Z,,NONE,,"Per our conversation [on Twitter](https://twitter.com/mschoening/status/1480312477246054401):

It would be neat if all entities (including URLs) were captured. This way you can ensure that URLs are parsed out exactly the same way Twitter parses URLs – we all know parsing URLs with a regex ain't fun.

Right now, I believe the tool filters out all entities that are not of type `media`.",206156866,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/64/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1160327106,PR_kwDODEm0Qs4z_V3w,65,"Update Twitter dev link, clarify apps vs projects",2657547,open,0,,,0,2022-03-05T11:56:08Z,2022-03-05T11:56:08Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/twitter-to-sqlite/pulls/65,"Twitter pushes you heavily towards v2 projects instead of v1 apps – I know the README mentions v1 API compatibility at the top, but I still nearly got turned around here.",206156866,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/65/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1244082183,PR_kwDODEm0Qs44PPLy,66,Ageinfo workaround,11887,open,0,,,0,2022-05-21T21:08:29Z,2022-05-21T21:09:16Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/twitter-to-sqlite/pulls/66,"I'm not sure if this is due to a new format or just because my ageinfo file is blank, but trying to import an archive would crash when it got to that file.  This PR adds a guard clause in the `ageinfo` transformer and sets a default value that doesn't throw an exception.  Seems likely to be the same issue mentioned by danp in https://github.com/dogsheep/twitter-to-sqlite/issues/54, my ageinfo file looks the same.  Added that same ageinfo file to the test archive as well to help confirm my workaround didn't break anything.

Let me know if you want any changes!",206156866,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/66/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
981690086,MDExOlB1bGxSZXF1ZXN0NzIxNjg2NzIx,67,Replacing step ID key with step_id,16374374,open,0,,,0,2021-08-28T01:26:41Z,2021-08-28T01:27:00Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/github-to-sqlite/pulls/67,"Workflows that have an `id` in any step result in the following error when running `workflows`:

e.g. `github-to-sqlite workflows github.db nixos/nixpkgs`

```
Traceback (most recent call last):
  File ""/usr/local/bin/github-to-sqlite"", line 8, in <module>
    sys.exit(cli())
  File ""/usr/local/lib/python3.8/dist-packages/click/core.py"", line 1137, in __call__
    return self.main(*args, **kwargs)
  File ""/usr/local/lib/python3.8/dist-packages/click/core.py"", line 1062, in main
    rv = self.invoke(ctx)
  File ""/usr/local/lib/python3.8/dist-packages/click/core.py"", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File ""/usr/local/lib/python3.8/dist-packages/click/core.py"", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File ""/usr/local/lib/python3.8/dist-packages/click/core.py"", line 763, in invoke
    return __callback(*args, **kwargs)
  File ""/usr/local/lib/python3.8/dist-packages/github_to_sqlite/cli.py"", line 601, in workflows
    utils.save_workflow(db, repo_id, filename, content)
  File ""/usr/local/lib/python3.8/dist-packages/github_to_sqlite/utils.py"", line 865, in save_workflow
    db[""steps""].insert_all(
  File ""/usr/local/lib/python3.8/dist-packages/sqlite_utils/db.py"", line 2596, in insert_all
    self.insert_chunk(
  File ""/usr/local/lib/python3.8/dist-packages/sqlite_utils/db.py"", line 2378, in insert_chunk
    result = self.db.execute(query, params)
  File ""/usr/local/lib/python3.8/dist-packages/sqlite_utils/db.py"", line 419, in execute
    return self.conn.execute(sql, parameters)
sqlite3.IntegrityError: datatype mismatch
```

 - [Information about the ID key in a step for GHA](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idstepsid)
 - [An example workflow from a public repo](https://github.com/NixOS/nixpkgs/blob/b4cc66827745e525ce7bb54659845ac89788a597/.github/workflows/direct-push.yml#L16)

# Changes
I'm proposing that the key for `id` in step is replaced with `step_id` so that it no longer interferes with the table `id` for tracking the record.
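
A minimal sketch of the renaming described above, assuming each step is a plain dict parsed from the workflow YAML. `rename_step_ids` is a hypothetical helper used for illustration, not necessarily the code in this PR:

```python
def rename_step_ids(steps):
    # Return copies of the step dicts with any 'id' key moved to 'step_id',
    # so it cannot collide with the table's own 'id' column.
    renamed = []
    for step in steps:
        step = dict(step)  # shallow copy; the caller's data is left untouched
        if 'id' in step:
            step['step_id'] = step.pop('id')
        renamed.append(step)
    return renamed


# Illustrative steps only; the second one carries a workflow-level id.
steps = [
    {'name': 'Check out', 'uses': 'actions/checkout@v2'},
    {'id': 'build', 'name': 'Build', 'run': 'make'},
]
print(rename_step_ids(steps))
```

Steps without an `id` pass through unchanged, so existing rows keep the same shape.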

Special thanks to @sarcasticadmin @egiffen and @ruebenramirez for helping a bit on this 😄 ",207052882,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/github-to-sqlite/issues/67/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 1, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1513237712,PR_kwDODEm0Qs5GUoG_,67,Add support for app-only bearer tokens,26161409,open,0,,,0,2022-12-28T23:31:20Z,2022-12-28T23:31:20Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/twitter-to-sqlite/pulls/67,"Previously, twitter-to-sqlite only supported OAuth1 authentication, and the token must be on behalf of a user.  However, Twitter also supports application-only bearer tokens, documented here:
https://developer.twitter.com/en/docs/authentication/oauth-2-0/bearer-tokens This PR adds support to twitter-to-sqlite for using application-only bearer tokens.  To use, the auth.json file just needs to contain a ""bearer_token"" key instead of ""api_key"", ""api_secret_key"", etc.",206156866,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/67/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1013506559,PR_kwDODFdgUs4skaNS,68,Add support for retrieving teams / members,68329,open,0,,,0,2021-10-01T15:55:02Z,2021-10-01T15:59:53Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/github-to-sqlite/pulls/68,Adds a method for retrieving all the teams within an organisation and all the members in those teams. The latter is stored as a join table `team_members` between `teams` and `users`.,207052882,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/github-to-sqlite/issues/68/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1513237982,PR_kwDODEm0Qs5GUoKL,68,Archive: Import mute table,26161409,open,0,,,0,2022-12-28T23:32:06Z,2022-12-28T23:32:06Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/twitter-to-sqlite/pulls/68,,206156866,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/68/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1513238152,PR_kwDODEm0Qs5GUoMM,69,Archive: Import new tweets table name,26161409,open,0,,,0,2022-12-28T23:32:44Z,2022-12-28T23:32:44Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/twitter-to-sqlite/pulls/69,"Given the code here, it seems like in the past this file was named ""tweet.js"".  In recent exports, it's named ""tweets.js"".  The archive importer needs to be modified to take this into account.  Existing logic is reused for importing this table.  (However, the resulting table name will be different, matching the different file name -- archive_tweets, rather than archive_tweet).",206156866,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/69/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1149402080,PR_kwDODFdgUs4zaUta,70,scrape-dependents: enable paging through package menu option if present,36061055,open,0,,,0,2022-02-24T15:07:25Z,2022-02-24T15:07:25Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/github-to-sqlite/pulls/70,Some repos organize network dependents by a Package toggle. This PR adds the ability to page through those options and scrape underlying dependents.,207052882,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/github-to-sqlite/issues/70/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1513238314,PR_kwDODEm0Qs5GUoN6,70,Archive: Import Twitter Circle data,26161409,open,0,,,0,2022-12-28T23:33:09Z,2022-12-28T23:33:09Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/twitter-to-sqlite/pulls/70,,206156866,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/70/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1513238455,PR_kwDODEm0Qs5GUoPm,71,"Archive: Fix ""ni devices"" typo in importer",26161409,open,0,,,0,2022-12-28T23:33:31Z,2022-12-28T23:33:31Z,,FIRST_TIME_CONTRIBUTOR,dogsheep/twitter-to-sqlite/pulls/71,,206156866,pull,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/71/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1524431805,I_kwDODEm0Qs5a3Pu9,72,"Import thread, including self- and others' replies",601708,open,0,,,0,2023-01-08T09:51:06Z,2023-01-08T09:51:06Z,,NONE,,"statuses-lookup, home-timeline, mentions (only for auth'ed user) don't cover this.

`twitter-to-sqlite fetch-thread tw-group1.db 1234123412341234`

twitter-to-sqlite focuses on archiving users, but does not easily support archiving conversations or community activity.

For reference, this is [implemented in twarc](https://sourcegraph.com/github.com/DocNow/twarc/-/blob/twarc/client.py?L708-766&subtree=true), using a search, optionally recursively.

Other research suggests that this formerly, or currently, requires a [search query](https://stackoverflow.com/a/30480103/1020467), use of the [undocumented `related_results` API](https://stackoverflow.com/a/9419346/1020467), or requesting the [newer conversation_id](https://stackoverflow.com/a/68115718/1020467) and then running a subsequent query.

",206156866,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/72/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1816830546,I_kwDODEm0Qs5sSqJS,73,Twitter v1 API shutdown,6341745,open,0,,,0,2023-07-22T16:57:41Z,2023-07-22T16:57:41Z,,NONE,,"I've been using this project reliably over the past two years to periodically download my liked tweets, but unfortunately since 19th July I get:

```
[2023-07-19 21:00:04.937536]   File ""/home/pi/code/liked-tweets/lib/python3.7/site-packages/twitter_to_sqlite/utils.py"", line 202, in fetch_timeline
[2023-07-19 21:00:04.937606]     raise Exception(str(tweets[""errors""]))
[2023-07-19 21:00:04.937678] Exception: [{'message': 'You currently have access to a subset of Twitter API v2 endpoints and limited v1.1 endpoints (e.g. media post, oauth) only. If you need access to this endpoint, you may need a different access level. You can learn more here: https://developer.twitter.com/en/portal/product', 'code': 453}]
```

It appears like Twitter has now shut down their v1 endpoints, which is rather gracious of them, considering they [announced they'd be deprecated on 29th April](https://twittercommunity.com/t/reminder-to-migrate-to-the-new-free-basic-or-enterprise-plans-of-the-twitter-api/189737).

Unfortunately [retrieving likes using the v2 API](https://developer.twitter.com/en/docs/twitter-api/tweets/likes/introduction) is not part of their [free plan](https://developer.twitter.com/en/portal/products). In fact, with the free plan one can only post and delete tweets and retrieve information about oneself.

So I'm afraid this is the end of this very nice project. It was very useful, thank you!
",206156866,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/twitter-to-sqlite/issues/73/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,
1363244199,I_kwDODFdgUs5RQXSn,75,Fetch repos doesn't support organisations,2757699,open,0,,,0,2022-09-06T12:55:06Z,2022-09-06T12:55:06Z,,NONE,,"Say I want to get all my Github Org's repos info, for data analysis. Not just the public repos, but also the private/internal repos.

The endpoints are different for organisations, and this tool doesn't take that into account:
https://github.com/dogsheep/github-to-sqlite/blob/ace13ec3d98090d99bd71871c286a4a612c96a50/github_to_sqlite/utils.py#L453
https://github.com/dogsheep/github-to-sqlite/blob/ace13ec3d98090d99bd71871c286a4a612c96a50/github_to_sqlite/utils.py#L455

The endpoint for organisation repos is instead ([source](https://docs.github.com/en/rest/repos/repos#list-organization-repositories)):
`url = ""https://api.github.com/orgs/{}/repos"".format(username)`
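
As a rough sketch of what that support could look like, the helper below picks the endpoint based on a flag. `repos_url` and its `is_org` parameter are hypothetical names used for illustration; they are not part of github-to-sqlite:

```python
def repos_url(name, is_org=False):
    # Hypothetical helper: choose the REST endpoint for listing repositories.
    # Organisations use /orgs/{org}/repos; users use /users/{username}/repos.
    if is_org:
        return 'https://api.github.com/orgs/{}/repos'.format(name)
    return 'https://api.github.com/users/{}/repos'.format(name)


print(repos_url('dogsheep', is_org=True))
print(repos_url('simonw'))
```

Private and internal repos would still need an authenticated request with a token that has access to the organisation.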

Let's add support for organisations repo scraping.",207052882,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/github-to-sqlite/issues/75/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1410548368,I_kwDODFdgUs5UE0KQ,77,Feature: Support GitHub discussions,631242,open,0,,,0,2022-10-16T16:53:38Z,2022-10-16T16:53:38Z,,CONTRIBUTOR,,"Hi @simonw I've been a happy user of this tool.  Thank you for writing it and sharing it.

I wanted to suggest a feature request to support  Discussions.  For example the VisiData project has discussions https://github.com/saulpw/visidata/discussions , and it would be useful if there was a way to pull that data into the database.

However, I'm not offering a pull request.",207052882,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/github-to-sqlite/issues/77/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1505411725,I_kwDODFdgUs5ZusKN,78,self-hosted or corp github enterprise,549431,open,0,,,0,2022-12-20T22:51:45Z,2022-12-20T22:51:45Z,,NONE,,"We use github enterprise at work and I would like to use this tool to pull info from that site rather than the public github.com instance.  Is there an option for this?  If not, can one be added for a custom repo URL?",207052882,issue,,,"{""url"": ""https://api.github.com/repos/dogsheep/github-to-sqlite/issues/78/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
581795570,MDU6SXNzdWU1ODE3OTU1NzA=,93,Support more string values for types in .add_column(),9599,open,0,,,0,2020-03-15T19:32:49Z,2020-09-24T20:36:46Z,,OWNER,,"https://sqlite-utils.readthedocs.io/en/2.4.2/python-api.html#adding-columns says:
> SQLite types you can specify are ""TEXT"", ""INTEGER"", ""FLOAT"" or ""BLOB"".

As discovered in #92 this isn't the right list of values. I should expand this to match https://www.sqlite.org/datatype3.html",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/93/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
275159710,MDU6SXNzdWUyNzUxNTk3MTA=,128,"Every visualization should have an ""embed"" button",9599,open,0,,,0,2017-11-19T13:38:13Z,2019-05-13T18:33:51Z,,OWNER,,"At least for the first round of visualizations, any time you construct one using the UI the result should include an ""embed this"" button that returns source code to copy and paste

These examples should use unpkg.com (or similar) URLs with SRI hashes, e.g. https://www.srihash.org - and should load data from the datasette JSON API.",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/128/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
688352145,MDU6SXNzdWU2ODgzNTIxNDU=,141,insert-files support for compressed values,9599,open,0,,,0,2020-08-28T20:59:46Z,2020-09-24T20:36:08Z,,OWNER,,"The `sqlar` format supports this, it would be useful if `insert-files` could support this too.

https://www.sqlite.org/sqlar.html",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/141/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
312395790,MDU6SXNzdWUzMTIzOTU3OTA=,197,Ability to sort by more than one column,9599,open,0,,,0,2018-04-09T05:13:30Z,2018-07-10T17:45:37Z,,OWNER,,"Split off from #189.

I'd like to support ""sort by X descending, then by Y ascending if there are dupes for X"" as well. Suggested syntax for that:

    ?_sort_desc=X&_sort=Y
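
One way such a query string could be turned into an `order by` clause is sketched below. It assumes the order of the arguments sets the sort priority; `order_by_from_query` is a hypothetical helper, not Datasette's implementation:

```python
from urllib.parse import parse_qsl


def order_by_from_query(query_string):
    # parse_qsl keeps the arguments in the order they were sent,
    # which is what gives each column its sort priority.
    clauses = []
    for key, column in parse_qsl(query_string.lstrip('?')):
        if key == '_sort':
            clauses.append('[{}]'.format(column))
        elif key == '_sort_desc':
            clauses.append('[{}] desc'.format(column))
    return ', '.join(clauses)


print(order_by_from_query('?_sort_desc=X&_sort=Y'))
```

A real implementation would also need to check each column name against the table's actual columns before building any SQL from it.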

We currently only allow one argument to be sent. We should allow as many arguments as there are columns, for example:

    ?_sort=department&_sort_desc=precinct&_sort=age&_sort_desc=size",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/197/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
312396095,MDU6SXNzdWUzMTIzOTYwOTU=,198,Ability to sort with nulls last,9599,open,0,,,0,2018-04-09T05:15:40Z,2018-07-10T17:45:37Z,,OWNER,,"Split off from #189

Here's how to do that in SQL: https://fivethirtyeight.datasettes.com/fivethirtyeight-2628db9?sql=select+rowid%2C+*+from+%5Bnfl-wide-receivers%2Fadvanced-historical%5D%0D%0Aorder+by+case+when+career_ranypa+is+null+then+1+else+0+end%2C+career_ranypa%2C+rowid

    order by case when career_ranypa is null then 1 else 0 end, career_ranypa",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/198/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
314771615,MDU6SXNzdWUzMTQ3NzE2MTU=,218,"Support custom unit display in order to handle ""$10,000""",9599,open,0,,,0,2018-04-16T18:39:31Z,2018-07-10T17:45:38Z,,OWNER,,"I tried to get Datasette to display `$10,000` using the new units support but we currently only display units as a suffix:

https://github.com/simonw/datasette/blob/10a34f995c70daa37a8a2aa02c3135a4b023a24c/datasette/app.py#L563-L572

It would be neat if there was a mechanism for specifying a custom unit display - maybe something like this:

```
{
    ""custom_units"": {
        ""us_dollar"": {
            ""unit"": ""us_dollar = [] = $"",
            ""format"": ""${:,}""
        }
    }
}
```",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/218/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
314834783,MDU6SXNzdWUzMTQ4MzQ3ODM=,219,Expose units in the JSON API?,45057,open,0,,,0,2018-04-16T22:04:25Z,2018-04-16T22:04:25Z,,CONTRIBUTOR,,"From #203: it would be nice for the JSON API to (optionally) return columns rendered with units in them - if, for example, you're consuming the JSON to render the rows on a map.

I'm not entirely sure how useful this will be though - at the moment my map queries are custom SQL queries (a few have joins in, the rest might be fetching large amounts of data so it makes sense to limit columns fetched). Perhaps the SQL function is a better approach in general.",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/219/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
318490133,MDU6SXNzdWUzMTg0OTAxMzM=,241,Default datasette logging format should be JSON,9599,open,0,,,0,2018-04-27T17:32:48Z,2018-07-10T17:45:40Z,,OWNER,,"Structured logs are better. Datasette should default to outputting its HTTP access log lines as newline-delimited JSON instead of the Sanic default format it uses at the moment.

For improved greppability these logs should have keys ordered in a consistent way. Python's JSON module can do this with ordered dictionaries.",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/241/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
816601354,MDExOlB1bGxSZXF1ZXN0NTgwMjM1NDI3,241,Extract expand - work in progress,9599,open,0,,,0,2021-02-25T16:36:38Z,2021-02-25T16:36:38Z,,OWNER,simonw/sqlite-utils/pulls/241,Refs #239. Still needs documentation and CLI implementation.,140912432,pull,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/241/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1,
818684978,MDU6SXNzdWU4MTg2ODQ5Nzg=,243,How can i use this utils to deal with fts on column meta of tables ?,27874014,open,0,,,0,2021-03-01T09:45:05Z,2021-03-01T09:45:05Z,,NONE,,"Thank you for releasing this great project.
When I use this project on a multi-table db, I want to implement
convenient search on column names across different tables.
I want to build a meta table that stores the metadata of the columns
of the different tables, and search this meta table to get rows from the
data table (which the meta table describes).
Does this project provide a simple function for this?

You can think of this as a knowledge graph about the tables in the db,
which I save into the db with FTS enabled.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/243/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
320132682,MDU6SXNzdWUzMjAxMzI2ODI=,250,Setup some issue templates,9599,open,0,,,0,2018-05-04T01:49:07Z,2018-05-04T01:49:07Z,,OWNER,,"https://twitter.com/left_pad/status/99216385740464537

I like the idea of using these to help people understand some of the ways I want to use issues.",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/250/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
326778161,MDU6SXNzdWUzMjY3NzgxNjE=,290,Consider increasing the default for num_sql_threads (currently 3),9599,open,0,,,0,2018-05-27T00:52:41Z,2018-05-27T00:52:41Z,,OWNER,,"I ran a very rough micro-benchmark on the new `num_sql_threads` config option (added in #285)

    datasette --config num_sql_threads:1 fivethirtyeight.db

Then

    ab -n 100 -c 10 'http://127.0.0.1:8011/fivethirtyeight-2628db9/twitter-ratio%2Fsenators'

| Number of threads | Requests/second |
|---|---|
| 1 | 4.57 |
| 3 | 9.77 |
| 10 | 13.53 |
| 20 | 15.24 |
| 50 | 8.21 |

This was on my early 2018 OS X laptop. Need to benchmark in other common environments before making a decision on changing the default. That said, the default of 3 was a number I plucked out of thin air.",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/290/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
344654623,MDU6SXNzdWUzNDQ2NTQ2MjM=,347,"Rename ""datasette package"" to ""datasette publish docker""",9599,open,0,,,0,2018-07-26T00:42:46Z,2018-07-26T00:42:46Z,,OWNER,,,107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/347/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
346026869,MDU6SXNzdWUzNDYwMjY4Njk=,354,Handle many-to-many relationships,9599,open,0,,,0,2018-07-31T04:03:13Z,2020-11-24T19:51:18Z,,OWNER,,This is a master tracking ticket for various many-2-many features.,107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/354/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1090798237,I_kwDOCGYnMM5BBEKd,359,Use RETURNING if available to populate last_pk,9599,open,0,,,0,2021-12-29T23:43:23Z,2021-12-29T23:43:23Z,,OWNER,,"Inspired by this: https://news.ycombinator.com/item?id=29729283

> Because SQLite is effectively serializing all the writes for us, we have zero locking in our code. We used to have to lock when inserting new items (to get the LastInsertRowId), but the newer version of SQLite supports the RETURNING keyword, so we don't even have to lock on inserts now.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/359/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
359075028,MDExOlB1bGxSZXF1ZXN0MjE0NjUzNjQx,364,Support for other types of databases using external connectors,11912854,open,0,,,0,2018-09-11T14:31:47Z,2018-09-11T14:31:47Z,,FIRST_TIME_CONTRIBUTOR,simonw/datasette/pulls/364,"This PR is related to #293, but now all commits have been merged.

The purpose is to support other file formats that aren't SQLite, like files with PyTables format. I've tried to accomplish that using external connectors published with entry points.

The modifications in the original datasette code are minimal and many are in a separated file.",107914493,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/364/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
377166793,MDU6SXNzdWUzNzcxNjY3OTM=,372,Docker build tools,82988,open,0,,,0,2018-11-04T16:02:35Z,2018-11-04T16:02:35Z,,CONTRIBUTOR,,"In terms of small pieces lightly joined, I note that there are several tools starting to appear for generating Dockerfiles and building Docker containers from simpler components such as `requirements.txt` files.

If plugin/extension builders want to include additional packages, then things like incremental or composable builds that add additional items to a base `datasette` container may be required.

Examples of Dockerfile generators / container builders:

- [openshift/source-to-image (s2i)](https://github.com/openshift/source-to-image)
- [jupyter/repo2docker](https://github.com/jupyter/repo2docker)
- [stencila/dockter](https://github.com/stencila/dockter)

Discussions / threads  (via Binderhub gitter) on:
- [why `repo2docker` not `s2i`](http://words.yuvi.in/post/why-not-s2i/)
- [why `dockter` not `repo2docker`](https://twitter.com/choldgraf/status/1058499607309647872)
- [composability in `s2i`](https://trello.com/c/AexIVZNf/1008-8-composable-builds-builds-evg)

Relates to things like:

- https://github.com/simonw/datasette/pull/280",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/372/reactions"", ""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 2, ""rocket"": 0, ""eyes"": 0}",,
426722204,MDU6SXNzdWU0MjY3MjIyMDQ=,423,?_search_col=X not reflected correctly in the UI,9599,open,0,,,0,2019-03-28T21:48:19Z,2020-11-03T19:01:59Z,,OWNER,,"e.g. https://latest.datasette.io/fixtures/searchable?_search_text1=barry

![2019-03-28 at 2 47 PM](https://user-images.githubusercontent.com/9599/55195035-84ebb800-5168-11e9-910b-fc9868bcd93e.png)
",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/423/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
440325850,MDExOlB1bGxSZXF1ZXN0Mjc1OTIzMDY2,452,SQL builder utility classes,45057,open,0,,,0,2019-05-04T13:57:47Z,2019-05-04T14:03:04Z,,CONTRIBUTOR,simonw/datasette/pulls/452,"This adds a straightforward set of classes to aid in the construction of
SQL queries.

My plan for this was to allow plugins to manipulate the
Datasette-generated SQL in a more structured way. I'm not sure that's
going to work, but I feel like this is still a step forward - it
reduces the number of intermediate variables in `TableView.data` which
aids readability, and also factors out a lot of the boring string
concatenation.

There are a fair number of minor structure changes in here too as I've
tried to make the ordering of `TableView.data` a bit more logical. As
far as I can tell, I haven't broken anything...",107914493,pull,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/452/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",0,
1324659241,I_kwDOCGYnMM5O9LIp,459,Single quoted transform recipes on Windows do not work as expected ,19921,open,0,,,0,2022-08-01T16:14:54Z,2022-08-01T16:14:54Z,,CONTRIBUTOR,,"Trying to follow the tutorial for sqlite-utils and datasette https://datasette.io/tutorials/clean-data on Windows 11 OS `Microsoft Windows [Version 10.0.22622.440]`, with sqlite-utils and datasette installed using pipx.

```
pipx list
package datasette 0.61.1, installed using Python 3.10.4
    - datasette.exe
package sqlite-utils 3.28, installed using Python 3.10.4
    - sqlite-utils.exe
```  

In the step to transform dates into ISO dates the quoted value `'r.parsedatetime(value)'` is copied verbatim into the columns instead of applying the output of the Python recipe.

```
sqlite-utils convert manatees.db locations \
  REPDATE created_date last_edited_date \
  'r.parsedatetime(value)' --dry-run

1975/01/31 00:00:00+00
 --- becomes:
r.parsedatetime(value)

Would affect 13568 rows
```

However, if I change the code from single quotes to double quotes, it works as expected.

```
sqlite-utils convert manatees.db locations \
  REPDATE created_date last_edited_date \
  ""r.parsedatetime(value)"" --dry-run

1975/01/31 00:00:00+00
 --- becomes:
1975-01-31T00:00:00+00:00

Would affect 13568 rows
```
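
A likely explanation, assuming the command was run from `cmd.exe`: that shell does not treat single quotes as quoting characters, so they reach the program as part of the argument and the recipe becomes a Python string literal. The sketch below uses a simplified stand-in, `run_recipe`, which is hypothetical and not the sqlite-utils implementation:

```python
def run_recipe(code, value):
    # Simplified stand-in: evaluate a one-line recipe as a Python
    # expression with `value` in scope.
    return eval(code, {}, {'value': value})


# A POSIX shell strips the quotes, so the recipe is a real expression.
print(run_recipe('value.upper()', 'abc'))

# If the quotes are passed through, the recipe is just a string literal
# and every value is replaced by that literal text.
print(run_recipe('\'value.upper()\'', 'abc'))
```

This matches the dry-run output above, where each value becomes the recipe text itself.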

Specifying the transform code recipe should work with single quotes on Windows.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/459/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1355193529,I_kwDOCGYnMM5Qxpy5,479,OperationalError: cannot VACUUM from within a transaction,7908073,open,0,,,0,2022-08-30T05:34:24Z,2022-08-30T05:34:24Z,,CONTRIBUTOR,,"Maybe when calling `.vacuum()` and other DB-level write-lock operations `sqlite_utils` could guard against this error message by automatically committing first?

```
     46 db[""media""].optimize()  # type: ignore
---> 47 db.vacuum()

File ~/.local/lib/python3.10/site-packages/sqlite_utils/db.py:1047, in Database.vacuum(self)
   1045 def vacuum(self):
   1046     ""Run a SQLite ``VACUUM`` against the database.""
-> 1047     self.execute(""VACUUM;"")

File ~/.local/lib/python3.10/site-packages/sqlite_utils/db.py:470, in Database.execute(self, sql, parameters)
    468     return self.conn.execute(sql, parameters)
    469 else:
--> 470     return self.conn.execute(sql)

OperationalError: cannot VACUUM from within a transaction
```
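
A minimal sketch of the suggested guard, written against the standard library `sqlite3` module rather than `sqlite_utils` itself. `vacuum_safely` is a hypothetical helper name:

```python
import sqlite3


def vacuum_safely(conn):
    # VACUUM cannot run inside a transaction, so commit any open one first.
    if conn.in_transaction:
        conn.commit()
    conn.execute('VACUUM')


conn = sqlite3.connect(':memory:')
conn.execute('create table media (id integer primary key, path text)')
conn.execute('insert into media (path) values (?)', ('a.mp4',))
# The insert above opened an implicit transaction that is still pending.
vacuum_safely(conn)
print(conn.execute('select count(*) from media').fetchone()[0])
```

Committing automatically is a design choice with a cost: callers can no longer roll back pending changes once the helper has run.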

It might also be nice to add a sentence or two about how transactions are committed on the [docs page](https://sqlite-utils.datasette.io/en/latest/python-api.html#detect-fts). When I was swapping out my sqlite3 code for this library it was nice that everything was pretty much drop-in, but I was/am unsure what to do about the places where I explicitly call `.commit()` in my code.

Related to https://github.com/simonw/sqlite-utils/issues/121",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/479/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1359604075,I_kwDOCGYnMM5RCelr,481,"Idea: `sqlite-utils create-table tablename --sql ""select ...""`",9599,open,0,,,0,2022-09-02T01:41:24Z,2022-09-02T01:42:08Z,,OWNER,,"Could offer syntactic sugar for:

```sql
create table foo as select * from bar
```

```
sqlite-utils create-table data.db foo --sql ""select * from bar""
```
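
For reference, the underlying SQL can be tried with the standard library `sqlite3` module. This is only a sketch of the statement the proposed option would wrap; the table contents are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table bar (id integer primary key, name text)')
conn.executemany('insert into bar (name) values (?)', [('one',), ('two',)])

# The statement the proposed --sql option would run on the user's behalf.
conn.execute('create table foo as select * from bar')

rows = conn.execute('select id, name from foo order by id').fetchall()
print(rows)
```

Note that a table created with `create table ... as select` copies the rows but not the primary key or other constraints of the source table.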
https://sqlite-utils.datasette.io/en/stable/cli-reference.html#create-table",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/481/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
449445715,MDU6SXNzdWU0NDk0NDU3MTU=,491,Figure out how to use Firebase with cloudrun to enable vanity URLs and CDN caching,9599,open,0,,,0,2019-05-28T19:48:06Z,2019-05-28T19:48:35Z,,OWNER,,"It looks like Firebase can solve a couple of problems with the existing `datasette publish cloudrun` hosting mechanism:

* The URLs it produces aren't pretty enough. Firebase offers more control over vanity URLs.
* CDN caching (as seen in `datasette publish now`) is great for improving performance and saving money on Cloud Run execution time.

https://firebase.google.com/docs/hosting/cloud-run looks like it can help with both of these.

Lots of interesting questions:

* Should this be a new `datasette publish firebase` command or should it instead be implemented as additional custom options to `datasette publish cloudrun`?
* How much harder does it become to do account setup?
* How much will this option cost users?",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/491/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1453134846,I_kwDOCGYnMM5WnRP-,513,Add or document streamlined workflow for importing Datasette csv / json exports,19328961,open,0,,,0,2022-11-17T10:54:47Z,2022-11-17T10:54:47Z,,NONE,,"I'm working on some small front-end enhancements to the laion-aesthetic-datasette project, and I wanted to partially populate a database directly using exports from the existing Datasette instance instead of downloading the parquet files and creating my own multi-GB database.

There have been a number of small issues that are certainly related to my relative lack of familiarity with the toolkit, but that are still surprising. 

For example: a CSV export of the images table (http://laion-aesthetic.datasette.io/laion-aesthetic-6pls.csv?sql=select+rowid%2C+url%2C+text%2C+domain_id%2C+width%2C+height%2C+similarity%2C+punsafe%2C+pwatermark%2C+aesthetic%2C+hash%2C+__index_level_0__+from+images+order+by+random%28%29+limit+100) has nested single quotes, double quotes, and commas that aren't handled by rows_from_file. Similarly, the json output has to be manually transformed to add the column names and remove extraneous information before sqlite_utils can import it.

I was able to work through these issues, but as an enhancement it would be really helpful to create or document a clear workflow that avoids the friction of this data transformation.",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/513/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
459469278,MDU6SXNzdWU0NTk0NjkyNzg=,515,Try shrinking official image with docker-slim,9599,open,0,,,0,2019-06-22T12:25:37Z,2019-06-22T12:25:37Z,,OWNER,,"This looks really promising: https://github.com/docker-slim/docker-slim

If it can shave substantial size from our official container reliably we could add it to the automated build process.",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/515/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1550536442,I_kwDOCGYnMM5ca076,521,Custom JSON encoder,31504,open,0,,,0,2023-01-20T09:19:40Z,2023-01-20T09:19:40Z,,NONE,,"It would be nice if we could specify a custom encoder (and decoder) for types that will need extra deserialisation – e.g., sets, enums or sparse matrices – or even project-specific types",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/521/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
460095928,MDU6SXNzdWU0NjAwOTU5Mjg=,528,Establish a pattern for Datasette plugins built on top of Pandas,9599,open,0,,,0,2019-06-24T21:05:52Z,2019-06-24T21:05:52Z,,OWNER,,"The Pandas ecosystem is huge, varied and full of tools that are really good at doing interesting analysis on top of tabular data.

Pandas should not be a dependency of Datasette core, but I think there is a lot of potential in having plugins which use Pandas to apply interesting analysis to data sucked out of Datasette's SQLite tables.

One example ([thanks, Tony](https://twitter.com/psychemedia/status/1143259809715752962)): https://github.com/ResidentMario/missingno could form the basis of a fantastic plugin for getting a high-level overview of how complete each column in a table is.

Some thought is needed here about what shape these kinds of plugins might take, and what plugin hooks they would use.",107914493,issue,,,"{""url"": ""https://api.github.com/repos/simonw/datasette/issues/528/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1754174496,I_kwDOCGYnMM5ojpQg,558,Ability to define unique columns when creating a table,1910303,open,0,,,0,2023-06-13T06:56:19Z,2023-08-18T01:06:03Z,,NONE,,"When creating a new table, it would be good to have an option to set unique columns similar to how not_null is set.

```python
from sqlite_utils import Database

columns = {""mRID"": str, ""name"": str}
db = Database(""example.db"")
db[""ExampleTable""].create(columns, pk=""mRID"", not_null=[""mRID""], if_not_exists=True)
db[""ExampleTable""].create_index([""mRID""], unique=True, if_not_exists=True)
```
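
For reference, a minimal sketch of what a unique index enforces at the SQLite level, written against the stdlib `sqlite3` module rather than sqlite-utils. The index name and the choice of `name` as the unique column are illustrative assumptions, picked because a primary key column is already unique on its own.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE ExampleTable (mRID TEXT PRIMARY KEY NOT NULL, name TEXT)'
)
# A unique index has the same effect on inserts as a column-level UNIQUE constraint.
conn.execute(
    'CREATE UNIQUE INDEX IF NOT EXISTS idx_ExampleTable_name ON ExampleTable (name)'
)
conn.execute('INSERT INTO ExampleTable VALUES (?, ?)', ('a1', 'first'))

try:
    # Same name, different primary key: rejected by the unique index.
    conn.execute('INSERT INTO ExampleTable VALUES (?, ?)', ('a2', 'first'))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute('SELECT count(*) FROM ExampleTable').fetchone()[0]
```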

So something like this would add the UNIQUE flag to the table definition. 

```python
db[""ExampleTable""].create(columns, pk=""mRID"", not_null=[""mRID""], unique=[""mRID""], if_not_exists=True)
```

```sql
CREATE TABLE ExampleTable (
    mRID TEXT PRIMARY KEY
              NOT NULL
              UNIQUE,
    name TEXT
);
```",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/558/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1821108702,I_kwDOCGYnMM5si-ne,579,Special handling for SQLite column of type `JSON`,15178711,open,0,,,0,2023-07-25T20:37:23Z,2023-07-25T20:37:23Z,,CONTRIBUTOR,,"`sqlite-utils` should detect and have special handling for columns with a declared `JSON` type. For example:

```sql
CREATE TABLE ""dogs"" (
  id INTEGER PRIMARY KEY,
  name TEXT,
  friends JSON 
);
```

## Automatic Nesting

According to [""Nested JSON Values""](https://sqlite-utils.datasette.io/en/stable/cli.html#nested-json-values), sqlite-utils will only expand JSON if the `--json-cols` flag is passed. It looks like it'll try to `json.load` every text column to test if it's JSON, which can get expensive on non-JSON columns.

Instead, `sqlite-utils` should by default (i.e. without the `--json-cols` flag) do the `maybe_json()` operation on columns with a declared `JSON` type. So the above table would expand the `""friends""` column as expected, without the `--json-cols` flag:

```bash
sqlite-utils dogs.db ""select * from dogs"" | python -mjson.tool
```

```
[
    {
        ""id"": 1,
        ""name"": ""Cleo"",
        ""friends"": [
            {
                ""name"": ""Pancakes""
            },
            {
                ""name"": ""Bailey""
            }
        ]
    }
]
```
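
A rough sketch of the detection step described above, assuming the declared type can be read from `PRAGMA table_info` and that only those columns get decoded. It uses the stdlib `sqlite3` module, and the function names `json_columns` and `select_all` are hypothetical, not part of sqlite-utils.

```python
import json
import sqlite3


def json_columns(conn, table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute('PRAGMA table_info([{}])'.format(table)).fetchall()
    return {row[1] for row in info if row[2].upper() == 'JSON'}


def select_all(conn, table):
    # Decode only the columns declared as JSON; plain TEXT columns are never parsed.
    to_decode = json_columns(conn, table)
    cursor = conn.execute('SELECT * FROM [{}]'.format(table))
    names = [d[0] for d in cursor.description]
    results = []
    for values in cursor:
        row = dict(zip(names, values))
        for name in to_decode:
            if isinstance(row[name], str):
                try:
                    row[name] = json.loads(row[name])
                except ValueError:
                    pass  # leave invalid JSON as the original text
        results.append(row)
    return results


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT, friends JSON)')
conn.execute(
    'INSERT INTO dogs VALUES (?, ?, ?)',
    (1, 'Cleo', json.dumps([{'name': 'Pancakes'}, {'name': 'Bailey'}])),
)
rows = select_all(conn, 'dogs')
```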

---

I'm sure there are other ways `sqlite-utils` can specially handle JSON columns, so I'm keeping this open while I think of more",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/579/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,
1822918995,I_kwDOCGYnMM5sp4lT,580,Add way to export to a csv file using the Python library,44324811,open,0,,,0,2023-07-26T18:09:26Z,2023-07-26T18:09:26Z,,NONE,,"According to the documentation, we can make a csv output using the CLI tool, but not the Python library. Could we have the latter?",140912432,issue,,,"{""url"": ""https://api.github.com/repos/simonw/sqlite-utils/issues/580/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,