Poster for Riverview Amusement Park, c.1930s, Chicago.
One of my fave signs on Montrose #Chicago #typevstime
A 6th grader’s advice to next year’s 6th graders.
Data is a buzzword nowadays. Whether it’s sifting Big Data to influence business, or the promise of Open Data to transform government, or Data Analytics winning elections, data is constantly in the news. But one thing that gets glossed over in all the buzz is that data is hard. Really, really hard. One of the hardest parts is cleaning, standardizing, and formatting data in a way that journalists and others can start to work with. These are real challenges faced by newsrooms and we’re hoping to make some of that a little easier with two new Code Sprints we’re happy to announce today.
One of the biggest problems with data sets is figuring out whether information in one set of data is the same as information in another. When you have a small set of data, the work is pretty straightforward. But as your rows increase, the work becomes daunting. Derek Eder and Forest Gregg at Chicago’s DataMade have been working on an automated process for deduplication of data, and we’re happy to help get it to a state where running it on huge datasets is as simple as a few calls from the command line.
A clear early use for the tool is in deduplicating campaign finance records, which can often be a slog. We’ve recruited the help of Derek Willis and others from the New York Times. The DataMade team has done a great deal of heavy lifting already (“we’ve solved most of the major engineering challenges of scaling up on large datasets,” DataMade’s Eder says), but lowering the barrier to entry on the tool is time and money well spent. If you can program in Python, you can fork and start running Dedupe today. If you want to wait for the simplified version, we’re expecting development to wrap up early this summer.
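To make the matching problem concrete, here is a minimal sketch of pairwise fuzzy matching using only Python's standard library. It is not how Dedupe works internally and does not use Dedupe's API; the function names, the sample donor names, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity of two names after normalization, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records, threshold=0.85):
    """Return index pairs whose similarity meets the (assumed) threshold."""
    pairs = []
    # Every record is compared with every other record.
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


# Invented sample names, not real campaign finance records.
donors = ["Smith, John A.", "smith john a", "Jane Doe", "Acme Widgets LLC"]
print(find_duplicates(donors))
```

Because every record is compared with every other record, the number of comparisons grows with the square of the row count, which is one reason this approach stops being practical as datasets get large.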
The US Treasury releases a statement of, essentially, the Federal Government’s checkbook every day at 4pm EST. Unhelpfully, they release it as a straight-up text file or a PDF. Newsroom developers and info-hackers Cezary Podkul, Burton DeWilde, Thomas Levine, Jake Bialer, Brian Abelson, and Michael Keller started work on scraping and parsing that daily statement at the Bicoastal Datafest earlier this year.
The team got far enough along at the Datafest that they approached us about helping to turn it into an open API that any newsroom developer can access. With our Code Sprint grant, the team will take this once nearly inaccessible dataset and transform it into an easily accessible API that returns machine-readable JSON. In this time of cutbacks and budget wrangling, the FMS Parser should offer developers and journalists a new way to dive deeply into governmental spending.
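As a rough illustration of the text-to-JSON step, here is a minimal sketch in Python. It is not the FMS Parser's code. The column layout, the field names, and the sample figures are all invented for the example, and the real statement is laid out differently.

```python
import json
import re

# Hypothetical sample: invented layout and numbers, NOT real Treasury data.
SAMPLE = """\
Item                          Today     This month
Example Program A             1,234         5,678
Example Program B                56           789
"""

# A label, a gap of two or more spaces, then two comma-grouped integers.
ROW = re.compile(
    r"^(?P<item>.+?)\s{2,}(?P<today>[\d,]+)\s+(?P<month>[\d,]+)\s*$"
)


def parse_statement(text):
    """Turn the assumed fixed-width rows into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip the header and anything that is not a data row
        rows.append({
            "item": match["item"].strip(),
            "today": int(match["today"].replace(",", "")),
            "this_month": int(match["month"].replace(",", "")),
        })
    return rows


def to_json(text):
    """Serialize the parsed rows as machine-readable JSON."""
    return json.dumps(parse_statement(text), indent=2)


print(to_json(SAMPLE))
```

Storing the amounts as integers rather than strings means whoever consumes the JSON can sum or compare them without re-parsing.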
The tool should see some immediate use too, as the team of developers working on it includes newsroom developers at Reuters, the Daily Beast, and the Huffington Post (along with our Knight-Mozilla Fellow at the New York Times). While it’s still being developed, you can fork and follow at the FMS Parser GitHub repo.
A month ago I announced a reimagined Code Sprint application process, and we’re excited to help tools like this get the funding and attention they need through it. We’re always looking for developers and newsrooms with great ideas they want to build (along with newsrooms that want to beta test them), so please drop us a line. Let’s do this!
Onward
The Chicago Colleens batting at Wrigley, All American Girls Professional Baseball League, 1948, Chicago.
[video]
[video]
(Source: quietsarcasm, via dailybungalow)
[video]
Sun Chaser by Andrew Holder.
(via colepierce)
Could prove useful
(via sunfoundation)
Back at the Hacks/Hackers Media Party in Buenos Aires, I announced the creation of Code Sprints: funding opportunities to build open-source tools for journalism. We used Code Sprints to fund a collaboration between WNYC in New York and KPCC in Southern California to build a parser for election night XML data that ended up being used on well over 100 sites. It was a great collaboration to kick off the Code Sprint concept.
Originally, Code Sprints were designed to work like the XML parser project: Driven in concept and execution by newsrooms. While that proved great for working with WNYC, we heard from a lot of independent developers working on great tools that fit the intent of Code Sprints, but not the wording of the contract. And we heard from a lot of newsrooms that wanted to use code, but not drive development, so we rethought how Code Sprints work. Today we’re excited to announce refactored Code Sprints for 2013.
Instead of a single way to execute a Code Sprint, there are now three ways to make Code Sprints happen.
Each of these options means we can work with amazing code, news organizations, and developers, and collaborate to create lots of great open-source tools for journalism.
I always think real-world examples are better than theoreticals, so today I’m also excited to announce the first grant of our revamped Code Sprints will go to Jessica Lord to develop her great Sheetsee.js library for the newsroom. Sheetsee has been on the OpenNews radar for a while—we profiled the project in Source a number of months back, and we’re thrilled to help fund its continued development.
Sheetsee was originally designed for use in the Macon, Georgia government as part of Jessica’s Code for America fellowship, but the intent of the project—simple data visualizations using a spreadsheet for the backend—has always had implications far beyond the OpenGov space. We’re excited today to pair Jessica with Chicago Public Media (WBEZ) to collaborate on turning Sheetsee into a kick-ass and dead-simple data journalism tool.
For WBEZ’s Matt Green, Sheetsee fit the bill for a lightweight tool that could help get the reporters “around the often steep learning curve with data publishing tools.” Helping to guide Jessica’s development to meet those needs ensures that Sheetsee becomes a tool that works at WBEZ and at other news organizations as well.
We’re excited to fund Sheetsee, to work with a developer as talented as Jessica, to collaborate with a news organization like WBEZ, and to relaunch Code Sprints for 2013. Onward!
Pyramids.
(via kenyatta)
[video]
Cabin on the Long Trail, Vermont.
Submitted by David Szlasa.