Just When I Thought I Was Out…

Reader, it turns out dissertations have a great big gravity well, with the pull of 1,000 black holes. Light can’t escape. Thoughts can’t escape. I can’t escape.

Just a few weeks ago (weeks. ago.), I waxed poetical on focusing on the now.

Look, I tried.

My intention until, oh, about two days ago, was to do a sort of genealogical study of new Latinx musical forms, specifically Reggaetón, from the insular and continental Caribbean. I was going to use 21st-century computational literary studies methods (networks) to analyze a (mostly) 21st-century musical form. It was going to be about now.

What happened? I discovered two important obstacles, both having to do with technical expertise (or lack thereof). It turns out that it is very, very difficult to use 21st-century methods…when you don’t have 21st-century technical skills. Shocker, right?

To work on music, I wanted to scrape data from Spotify, which is really easy–if you know Python and know how to use Spotify’s API (application programming interface). I had neither skill. But I was intent on trying (I’m, like, really smart). I found one video that seemed particularly helpful; the video goes through the process of getting an artist’s album art from Spotify by scraping the web page source code (not the API). This video was helpful in introducing me to Visual Studio Code, which is allegedly used by everyone (I say allegedly because I truly don’t know–I’m not throwing shade). I’ve played around with Visual Studio Code, and it does feel like an easy-to-use interface, even for a non-coder like me. I also learned some Python commands.

But I couldn’t really figure out how to get what I needed, and honestly not only scraping web source code but parsing it was quite beyond me, even with borrowed code. So, I did more searches and found that there were great tools that could be used through a combination of Python and Spotify’s API. But I also found that it’s very difficult to follow many articles that purport to show you how to use these tools (it’s really easy, you see). My sense is that the writers of these articles really believed they were being very clear about all of the steps (it also seemed that way when I read the articles as opposed to trying to replicate what they showed), but in my opinion, those articles are very writerly (or writer-focused) texts, not readerly (or reader-focused) texts. And hey, it makes sense: it is well known that once you cross an educational threshold, it can be very difficult to explain the concept or skill to someone who is learning it.

I resorted to taking a LinkedIn Learning course on Python, which is useful for understanding the basics of the language. But the class is several hours long, and I don’t really have the time for that. The parts that I did do, however, were enough to help me decipher some of the implicit instructions in some of these articles (e.g., to install a library or module for Python 3 on my machine, I had to use the command pip3, not pip). But it wasn’t enough to get me anywhere. I started to think about pivoting to another project (wait for it…), but also kept tooling around with Python. As I started to madly look about for a new project (could I look at Latin American science fiction? what about Marvel superhero comics? what about the posthuman in Battlestar Galactica?), I found a website that promised to scrape the data for me! Yay! It was very expensive. Boo! But there was a free trial. Yay! The free trial was very much a trap, like most free trials. The website did scrape the Spotify data through the API, but it only returned 10 rows at a time. And the queries I ran came back with pretty much the same 10 tracks/artists, which really wouldn’t work for the project.

But research, Reader, is a team sport. A good friend who is working more and more with DH tools called me (he got tired, I think, of my frantic, desperate texts). That call coincided with one last-ditch effort to follow the instructions in an article, which I used to assemble a program that would scrape the data.

Image of several lines of Python code on the VS Code editor.
Coding. It’s a thing.

As I answered my friend’s phone call, I literally hit the “run” button on VS Code…and got my dataset.

Image of an Excel CSV spreadsheet with track information scraped from Spotify.
Datasets. They’re a thing.
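
For the curious, here is a minimal sketch of the last step of that kind of program: turning track data into spreadsheet rows. It is not the program I actually ran. It assumes the data has the shape of a Spotify Web API track-search response, it skips the authenticated request entirely, and the sample tracks, the column names, and the helper functions (flatten_tracks, rows_to_csv) are all made up for illustration.

```python
import csv
import io

# Placeholder data shaped like a Spotify Web API track-search response.
# A real response would come from an authenticated API request; these
# tracks, artists, and numbers are invented for illustration only.
SAMPLE_RESPONSE = {
    "tracks": {
        "items": [
            {
                "name": "Example Track A",
                "artists": [{"name": "Artist One"}, {"name": "Artist Two"}],
                "album": {"name": "Example Album", "release_date": "2005-01-01"},
                "popularity": 70,
            },
            {
                "name": "Example Track B",
                "artists": [{"name": "Artist Three"}],
                "album": {"name": "Another Album", "release_date": "2017-06-30"},
                "popularity": 55,
            },
        ]
    }
}

# Column names are my own choice, not anything Spotify prescribes.
FIELDS = ["track", "artists", "album", "release_date", "popularity"]


def flatten_tracks(response):
    """Turn the nested response into a list of flat dicts, one per track."""
    rows = []
    for item in response.get("tracks", {}).get("items", []):
        album = item.get("album", {})
        rows.append({
            "track": item.get("name", ""),
            # A track can have several artists; join them into one cell.
            "artists": "; ".join(a.get("name", "") for a in item.get("artists", [])),
            "album": album.get("name", ""),
            "release_date": album.get("release_date", ""),
            "popularity": item.get("popularity", ""),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(flatten_tracks(SAMPLE_RESPONSE)))
```

Saving that text to a .csv file gives you something Excel will open, one row per track.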

While the dataset was legit, it didn’t have the information I needed, not for what I had set out to do. I meant to do, again, a kind of genealogy by mapping out network relationships between these artists and others outside the genre. That means that I would need to add sampling information to this dataset, which I don’t quite know how to do. My friend suggested TaskRabbit, but that seemed like adding pieces to this project that I just didn’t have the time for.

My friend and I kept going back and forth on discussing the feasibility of this music project and a completely new one. He mentioned Project Gutenberg, which has lots and lots of texts in the public domain. He specifically referenced nineteenth-century novels (he knows that this is a strength of mine) and threw out “Melville.” Which made me think of Hawthorne. Which made me think of the last few pages of the last chapter of my dissertation. Which sort of speculated on a parallel between the dynamics of the custom sketch and the form of the novel (the bulk of my dissertation) and the dynamics between social custom or norms and the law. And, before I knew it, I couldn’t reach the escape velocity to resist the pull of my dissertation. In fact, I sort of dove right in. And, I’m either gonna go through the 1,000 black holes or slingshot my way around them never to visit that galaxy again.

As of now, I have about 30 texts I downloaded from Project Gutenberg that I will be analyzing through a few DH tools (you can see my proposal here). Even though I have a long road ahead of me to see what I can come up with, it’s good to know that I have a corpus–and that there’s a there, though what it is, I can’t tell yet.

Image of an initial corpus analysis using Voyant.
Initial queries using Voyant

As you can see from the image above, there’s lots to be done.

Image of an initial collocation analysis using the AntConc tool.
Collocation analysis using AntConc

This collocation analysis is very suggestive. The term I searched for was “custom,” and here are several words that point to the law (e.g., clerkships, sanctions, authorises, archive). There’s also “anathematise,” which I don’t think means anything except: God bless nineteenth-century writers.
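
To demystify what a collocation analysis is doing under the hood, here is a rough sketch in Python. It simply counts which words fall within a few words of the search term. That is a deliberately simpler technique than what AntConc offers (AntConc can rank collocates with statistical measures, which this does not attempt), and the sample sentence and the function names are my own inventions, not text from my corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def collocates(tokens, node, window=5):
    """Count the words that appear within `window` tokens of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the search term itself
                counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    # Invented sample sentence, for illustration only.
    sample = (
        "The custom of the country sanctions what the law forbids, "
        "and custom outlasts the law."
    )
    for word, count in collocates(tokenize(sample), "custom", window=3).most_common(5):
        print(word, count)
```

On a real corpus you would want to filter out function words like “the” and “of,” which otherwise crowd out the interesting neighbors.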

Image of a concordance analysis using the AntConc tool.
Concordance using AntConc

This concordance, again run on “custom,” also seems promising (and intelligible in ways that statistical output, frankly, isn’t for me).
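
A concordance is simple enough that even I can sketch one: find every occurrence of the keyword and print it with a bit of context on either side, lined up in a column. What follows is a minimal keyword-in-context sketch, assuming plain text as input. The function name kwic, the bracket formatting, and the sample text are hypothetical choices of mine and do not reproduce AntConc’s output.

```python
import re


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines with the keyword aligned in a column."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-justify the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    # Invented sample text, for illustration only.
    sample = "It was the custom of the house. Custom, not law, ruled there."
    for line in kwic(sample, "custom", width=15):
        print(line)
```

Reading down the aligned column is what makes a concordance feel intelligible: you see the word in its sentences rather than as a number.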

I understand that the length of this post is not ideal. But I wanted to show the real vagaries of research, which operate at a resource level (what can I study?), an intellectual/conceptual level (what questions can/should I ask?), and a deeply personal level (I will study the present. I am studying the past). In other words: Research. It’s a journey.
