Howdy! 👋
I'm Deny.
I’m Deny Tran, a data professional with a passion for solving complex problems and continuous learning. My career path has been driven by curiosity and a desire to excel, from mastering Python and SQL to navigating the intricacies of machine learning.
- Built from the SEC's Financial Statements & Notes .tsv datasets.
- Environment: Docker on Ubuntu with SQL Server 2019 (a rough loading sketch follows this list).
- Joins/keys were based on guidance in the FSNDS notes, but the final relationships came largely from trial and error, given the quality of the dataset.
- Configurations were based on general SQL best practices, including bits and pieces of ideas from the following fantastic works:
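To make the load step concrete, here is a minimal sketch of pushing one of the FSNDS .tsv files into the Dockerized SQL Server instance. The connection string, credentials, and chunk size are placeholders, not the repo's actual configuration, and the real project tightens types and keys afterwards in T-SQL.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string for the Dockerized SQL Server 2019 instance;
# host, port, database name, and credentials are assumptions, not the real setup.
engine = create_engine(
    "mssql+pyodbc://sa:YourStrong%40Passw0rd@localhost:1433/fsnds"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# sub.tsv is the FSNDS submissions file; chunked reads keep memory manageable
# for the larger files (num.tsv, txt.tsv). Everything lands as text first, and
# types/keys are tightened later in T-SQL, which is where the trial and error happened.
for chunk in pd.read_csv("sub.tsv", sep="\t", dtype=str, chunksize=100_000):
    chunk.to_sql("sub", engine, if_exists="append", index=False)
```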
- Based on one of the many parsed SEC ABS-EE (Asset-Backed Securities - Electronic Exhibits) XML datasets.
- The purpose of this illustration was, more or less:
  - testing the parsed datasets,
  - practicing SQL,
  - working with ML models/libraries, and
  - working with PySpark.
- Some key takeaways here were:
  - PySpark is very good at working with large amounts of data, but getting it to use the GPU is a different story (see the sketch after this list).
  - It's nice that PySpark has ML options built in (MLlib), but for serious work you'll likely have to reach for PyTorch or TensorFlow.
  - DuckDB is amazing. It handled the 35M+ rows here with ease, and before that it consumed and queried the 550M+ rows from the FSNDS just as easily.
  - That said, DuckDB only works well if you let it handle the indexing itself. Try to add all the FSNDS primary/foreign keys, for example, and it becomes unresponsive during inserts (see the DuckDB sketch after this list).
  - So, a good quick go-to for big datasets, but not so much if you need enforced constraints or the rest of a full RDBMS on top of them.
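A minimal, CPU-only sketch of the kind of PySpark work described above. The file and column names are assumptions about the parsed ABS-EE layout, and getting any of this onto the GPU would mean adding something like NVIDIA's RAPIDS Accelerator plus a fair amount of extra configuration.

```python
from pyspark.sql import SparkSession

# Local, CPU-only session: Spark won't touch the GPU unless an accelerator
# plugin (e.g., the RAPIDS Accelerator) is installed and configured, which is
# the pain point mentioned above.
spark = SparkSession.builder.appName("absee-exploration").getOrCreate()

# File and column names here are illustrative, not the project's actual schema.
loans = spark.read.csv("absee_loans.tsv", sep="\t", header=True, inferSchema=True)

# PySpark is comfortable at this scale: a full-table aggregation over tens of
# millions of rows runs without ceremony.
loans.groupBy("assetTypeNumber").count().orderBy("count", ascending=False).show(10)

spark.stop()
```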
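And the DuckDB counterpart: point it at the raw file, let it manage its own storage, and skip the explicit key constraints. Again, the file and column names are assumptions.

```python
import duckdb

# On-disk database; no PRIMARY/FOREIGN KEY constraints are declared, since
# letting DuckDB handle its own indexing is what kept inserts from hanging.
con = duckdb.connect("absee.duckdb")

# read_csv infers the schema straight from the TSV; the CTAS materializes it.
con.execute("""
    CREATE TABLE IF NOT EXISTS loans AS
    SELECT * FROM read_csv('absee_loans.tsv', delim='\t', header=true)
""")

# Quick sanity checks over the 35M+ rows.
print(con.execute("SELECT COUNT(*) FROM loans").fetchone())
print(con.execute("""
    SELECT assetTypeNumber, COUNT(*) AS n
    FROM loans
    GROUP BY assetTypeNumber
    ORDER BY n DESC
    LIMIT 10
""").fetchall())

con.close()
```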
- This was an extraction of the FSNDS to test the quality of the data after it was exported from SQL Server.
- Similarly, an extraction of the FSNDS, but used here to create training data for a tabular LLM (i.e., TaBERT).
- A financial model I created back in 2017 from scraped data.
- The way the data is stored is somewhat naive, but the project as a whole was nonetheless enlightening.