IAP 21: Agenda and notes

Samuel Klein; Agnes Cameron

Structuring Collective Knowledge, v2 : Interpreting Layered Knowledge
Collaborative notes — Edit these notes

Monday ₍_{intro slides}₎

Background examples:

Perseus Project (1996, 2018) [aligning annotated classics]
Wikidata (2005, 2007, 2016) [generalized collective data]
Complete genomes (1990, 2015) [annotated gene surfing]
Human Cell Atlas (2019) [aligned multi-omic taxo; origin & data needs]
1632 (essay, authors’ manual) [collectively authored sci-fi]
Life - Slime molds; Íslendingabók; Addgene
World - Geograph (ideas); Google Sky; iNaturalist; Zooniverse
Research - WikiData + data underlays; PubPub; CERN + Zenodo
Synthesis - Roam + Codex + mind maps;
…add yours here!

Introductions

[[flancian]] took notes here, will transcribe later:
anagora.org/node/iap21 | agora for this agenda

[[metasj]]
[[agnes cameron]] works with [[metasj]], part of knowledge collaboration innovation data. In [[uk]].
[[samir ghosh]] runs a [[maker space]]. Does [[vr]].
[[sarah s]] [[neuroscience]] and [[machine learning]]. [[crowdsourcing]] to probe [[human perception]].
[[amr kayid]] [[munich]] [[research engineer]] at [[open mind]]. [[reinforcement learning]].
[[thomas renkert]] [[religious studies]] [[knowledge representation]].
[[kevin]] (pending)
[[favour kelvin]]
Jose Labra

Add your name + current projects / tidbits below

Project ideas + lightning talks

Codex Editor : paper
Go-links : can this be made global? A global authority file
Coronavirus : collecting data on those who have passed away
- In Austria — the lack of such initiatives is itself a topic of discussion
- US: the NYT list of names; historical parallel: the Yale AIDS memorial
ImageNet : human annotations of human images; linked to WordNet
- WordNet: grouped words into synsets (concepts)
- Images associated to 20k synsets, via URLs (imgs not owned), tagged
- ~> used to train ResNet, SENet, BigGAN —> inception

Thomas: want to see tools that can be used to represent text in context.
Suggestions: Quotebacks / ObjectNet / OpenMined / Dall-E / Are.na

Shared proposals

Visual concept vocabulary (like a “map” of GAN latent space) — SS, SG
How to structure data from collective labelling —> Toward a Topology of Visual Experience
Annotated Hub of Science — Amr, SJ, SG, EI
(approaches to hypertexts + research)
Connector Library (electrical + physical - UCK ) — AC, SG
(visual, geometric, parametric descriptors; HTMAE)
-> some kind of nodal graph? (connect X to Y with Z properties)
Recipes + Food sustainability — EI, SJ
(important + good model, clear requirements)
Movebank + eBird? — FK, AC?
(mix of researchers + crowd contributions)
Meta - wikilinks + query language (start at root, apply constraints)
(what do we need to streamline this collab)

Tuesday

SG: Specs for network graphs or … hierarchical datasets? When you click on something: be able to render arbitrary doc-set or html snippet? I’ve been thinking about latent space for a while; in the uni, people talk about getting meaningful info out of a neural net / that’s network data of its own, that could use complex viz.

AC: that’s really interesting — kg problems are often sth that sits b/t a standard network graph and a more structured thing.

Choosing viz and hierarchy models?

Keyword branching out into other ways.
How to specify a choice of graph viz along w/ what you render on mouseover/clicking on nodes?
Let users shove away information / bring relevant info to the front.

Neo4j : can we improve on both the defaults and how we interact with them?

New proposal: capturing graph-viz approaches + styles?

are.na : structuring a set of X (viz types?)
web apps can sit on top of a. channels. [YT chan -> a.chan -> player]
scoby (a.chan) —> website
code : a list of awesome graph viz? [awesome lists as meta-layer]
HUDs and GUIs: annotated reviews of designs in fiction (film)

Links

“Let’s do better than force-directed graphs”
Julius | d-on-s
FoodOn
Encyclopedia Britannica Online (1994)
Arena search
The Blit
Perpendicular Institute (and its performance hacks)
[Splatoon 2 getting shut down vs Ring Fit + proprietary hardware…]
Science of progress

Wednesday: Inception

Working w/ public knowledge graphs, building new visualizations + overlays

Room: Andra, Blake, FK, ET, SG, AC, SS, SK

Example: Food Sustainability

You don’t always know what you mean to ask until you start seeing some answers. Example of that — Google Translate (showing potential results, letting you refine the result based on possible alternatives for each part of a paragraph)

Postive questions to answer:
What Q’s / queries do you want to answering?
What datasets are you using?
What audiences might use this, or contribute to it?
What results in the world (here: regulations, food guidance) might need it?
Are there natural focused subsets to develop along the way?

Think about a single Overlay, a few Underlay collections, and an Interlay spec that could link them / handle any interchange + interfacing needed.

Obstacles:
What does it take for data currently under license / not being shared to become usable in this way?

Maintenance:
What curation and updating are happening? are needed?

Iteration, persistence:
What steps can be iterated? Where there are obstacles, is there a related step that is easier to start with (at another level of abstraction or access, w/ a similar source or topic)? Clearly describing an obstacle is a first step towards working around it.

Network: How does this fit into other popular / existing / needed sources?
Are there ideals that you can approach w/ enough clarity? (ideal pound cake)

Comments:

To what degree can we use half-closed data to make tools for the user?
You can always get metadata or analyze closed data and share that.

Example: Blake Elias, COVID modelling

Countering the zero-sum game narrative: Using a model to derive the optimal health outcomes is actually very similar to a model that optimises economic outcomes: you have an optimisation question where you can minimise the summation of costs due to both cases and restrictions

source — Oxford tool: http://epidemicforecasting.com allows you to modulate different aspects that affect the reproduction rate

More difficult to find data on what different COVID measures will cost. Can also account for vaccination models, including staging etc.

Where is the data?

‘Towards an optimal policy dashboard’ — want an idea of costs and cases caused by different policies.

Data sources: Cases, vaccinations, outcomes
Data interlay: Parameterized models: what parameters matter? What sources + outcomes are tied to each?
Data overlays: Something like CovidTracking for state-level data. But: these show raw data, not projections + decisional data: hat the map should show you “here are potential outputs, here are what data would inform these outputs.at you should do today”

Example: bret victor, ladder of abstraction. A nice example of representing complex models.

SJ: Q: How much do your choice of data sources, derivatives (these visuals), and audiences affect how effective this work is? [how effective it feels; empirically the feedback you get, if appropriate]

Favour Kelvin: How do we get decisional data for the dashboard?

Having a policy doesn’t mean yours is correct. Your policy may be wrong.
Most important parameters in this case: R₀. Underestimated (lowest in the literature). Cost: a ratio mattered most; cost of lockdown / cost per case. Took a ballpark: [lost ~10% of GDP over a year] — orders of magnitude.
How would better data affect those estimates? Unclear what data would make this more convincing. Now I have leads re: what data this model is sensitive to. So what’s the best way to organize that feedback loop.
- “looks like model N changes when this parameter goes about X.”
- Large scale: too many models that predict A, but aren’t clear about how to influence that.

SG: Thoughts from CA.
CA details: Lockdown decisions are based strictly on case data.
Some counties are oscillating in that threshold (purple/red/other tiers)
You can see how those tiers affect the progression of cases. (time offset makes modeling essential).

Can you combine the healthcare impact of death/hospitalization ($50k hospital cost) and the human impact? ($10M in actuarial tables?) What about the impact of lockdown?
: Ex: every county has different economies. LA has lots of labor, directly affected - not just sprawl; restaurant + tourism.
: Santa Clara / SF has lots of tech, already reducing % of in-person economy

CA.gov: Safer-Economy map (‘under control’?), Data + Tools
UK: Covid19Local, stats + analysis

BE: 3 options: let it go; keep it to some low level; get to zero.
The more valuable the local economy is, the more important it is.

AC: there are interesting outcomes, not just a specific recommendation.
Answers to individual questions under given models + parameters. Even if not a tool for specific policy recommendation, an interesting rhetorical tool

BE: 1) Did you really account for reality? Do we know what #s we need, do we have good estimates of them, what’s the sensitivity of a given model on them?
2) What’s the balance b/t being realistic enough, not over-promising

SG: really useful for me - imaging gov decisions as a dynamical thing; see its effect as it changes. Iterative decisions. (Blake: we made a little game for this)