-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setup and scrape tasks which define a set of scrapers and relevant scraping options #32
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal, wow, a hell of a PR.
I must say I can’t wait to be able to start using what you implemented here 👍
Other than what I wrote inline, I have several general comments:
-
The underscore notation of private fields makes me tick… 😄 . I would rather see functions extracted outside of the class if they are not to be shared publicly. At some point I gave up and stopped adding comments about it, but I would love to see it changed.
-
This was a massive PR, and I really had a hard time reviewing all of it at once. I would prefer seeing smaller incremental PRs, but anyway… Given this comment, I never tend to re-review PRs, but in this case I reserve the right to review again 😄
-
Would be great to see some reflection of this in the README file
-
This PR was created following Scraping multiple accounts together #28.
@esakal you might want to close this issue using one of the relevant keywords in the PR description.
-
Thanks so much for all the effort you put into this!
.eslintrc
Outdated
@@ -3,6 +3,8 @@ | |||
"arrow-body-style": 0, | |||
"no-shadow": 0, | |||
"no-console": 0, | |||
"no-underscore-dangle": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal hmm, I saw your comment about this in the description, and I only partially agree.
The way we could write private code in ES6 classes for now is to extract them into regular functions, outside the body of the class. I know it feels weird, but that's how I think the airbnb folks intended for it to work, and that's how I did it in the scrapers repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove this line and will fix the code accordingly
.eslintrc
Outdated
@@ -3,6 +3,8 @@ | |||
"arrow-body-style": 0, | |||
"no-shadow": 0, | |||
"no-console": 0, | |||
"no-underscore-dangle": 0, | |||
"class-methods-use-this": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal yeah, I agree about this one, I think it's irrelevant.
src/helpers/files.js
Outdated
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal this function should probably throw an exception if the file doesn't exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good question, whether not having a file should throw an exception even if the purpose of this function is to remove that file. I think that a developer wants to delete a file, it doesn't matter for him if the file was not there to begin with as long as this file is not there once this method complete its execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal I disagree.
Most file APIs I know, as well as OS commands, will return an error if the file you're trying to delete doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, i removed the existsAsync
check condition and following your comment I assume that node fs.unlink
function will result with error if the file is missing.
btw, I don't have a machine with microsoft OS to check this out, I assume node handles both OS when dealing with the file system.
src/helpers/tasks-manager.js
Outdated
import { TASKS_FOLDER } from '../definitions'; | ||
import { SCRAPERS } from '../helpers/scrapers'; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal redundant line break
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/helpers/tasks-manager.js
Outdated
|
||
const taskManager = new TaskManager(); | ||
|
||
export default taskManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal we should export the class, not the instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, holding this class as singleton and using the same instance can help us in the future if we will need to share things among classes. since we don't have a proper DI we can either do what I did to mimic singleton or to export the class which means that every consumer will hold its own instance.
BTW I would go with ` export { taskManager, TaskManager } which will allow the user to choose.
I don't mind changing that if you still want me to, let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal I usually try to keep stuff simple and to avoid unnecessary "mizgun" of code.
I avoid Singletons if not needed. I wouldn't hesitate using them if I really need them, but that's not the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what 'mizgun' means but sure, I will change it as requested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal 😄
"mizgun" comes out of the Hebrew saying, "Preparing for Air condition".
The meaning for it here is, setting up stuff in preparation for a future addition, which may or may not ever come.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eshaham It's funny I use this term a lot so I should have figure it out, I usually say הכנה למזגן so it is very close. :)
} | ||
|
||
async run() { | ||
if (!this._taskName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal go back above seems to be returning an empty string, but it doesn't seem like the code handles that use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice Catch! fixed
return Object.keys(SCRAPERS).map((scraperId) => { | ||
const result = { value: scraperId, name: SCRAPERS[scraperId].name }; | ||
const hasCredentials = this._taskData.scrapers | ||
.findIndex(scraper => scraper.id === scraperId) !== -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal you can just use find
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
message: 'How many months do you want to scrape (1-12)?', | ||
default: dateDiffByMonth, | ||
filter: (value) => { | ||
if (Number.isFinite(value) && !Number.isNaN(value)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal does this ever happen? Doesn't input
field type always returns a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the value is saved as a number in the json file, if you don't modify it the default value will be a number, if you type it inquirer returns it as string
filter: (value) => { | ||
if (Number.isFinite(value) && !Number.isNaN(value)) { | ||
return value; | ||
} else if (typeof value === 'string' && value.match(/^[0-9]+$/)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal wouldn't this be simpler?
try {
return parseInt(value, 10);
} catch (e) {
console.log(...);
return null;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseInt
is not reliable for this specific task, see the following, an example taken from the link above:
var g = parseInt("He was 40") + "<br>";
console.log(g); // will print 40
name: 'scraperId', | ||
message: 'Select a scraper', | ||
when: (answers) => { | ||
if (answers.action === 'delete') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal would be nice you could extract the action names into an enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it but since they are relevant only in the scope of function I don't think it will contribute having them as a global constants. each function can declare its own actions and there is not connection between those functions
@eshaham a spoiler alert, this is going to be a bit long comment, I tried to cover everything. I will work on the inline comments soon.
Thanks! it feels very powerful to scrape all the accounts at the same time.
When I created the PR I was really hoping that you and I are on the same side of this debate :) I don't understand how private properties/methods were left out of the specification. It is be very hard for me to remove the
We are on the same page, having smaller incremental PRs are easier to follow. This is the reason I created #29 and #30 to begin with. But I don't think there is an intuitive way to split the rest of the code. This PR deals with managing & scraping tasks and splitting those two tasks is irrelevant. As a side note, I would like to recommend an awesome reviewing tool that I integrated into my company development process which was essential to the success of our project and is called reviewable.io. it is marked as beta but I think it is like that just because they forgot to remove the label. I love this tool and I think it has everything you dream for in reviewing service. Obviously you don't need to use it with the project unless you want to give it a try, but I owe it to the developers of this service to talk about it whenever I can.
Sure, will do
Done
It is the minimum I can do, I really appreciate your work on those libraries and your contribution to the open banking concept in Israel. |
Could you TL;DR about what's so great about reviewable.io? |
I think the following article layout nicely the reasons why reviewable.io transcends github reviewing support. I think it sums up to whether you believe that code reviews are essential part of the development process, if you do then once you will start using it you will not look back. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will push my fixes in a minute
.eslintrc
Outdated
@@ -3,6 +3,8 @@ | |||
"arrow-body-style": 0, | |||
"no-shadow": 0, | |||
"no-console": 0, | |||
"no-underscore-dangle": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove this line and will fix the code accordingly
src/helpers/files.js
Outdated
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good question, whether not having a file should throw an exception even if the purpose of this function is to remove that file. I think that a developer wants to delete a file, it doesn't matter for him if the file was not there to begin with as long as this file is not there once this method complete its execution.
src/helpers/tasks-manager.js
Outdated
import { TASKS_FOLDER } from '../definitions'; | ||
import { SCRAPERS } from '../helpers/scrapers'; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/helpers/tasks-manager.js
Outdated
async getTasksList() { | ||
const files = await getFolderFiles(TASKS_FOLDER, '.json'); | ||
const result = files.map(file => path.basename(file, '.json')); | ||
result.sort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
src/helpers/tasks-manager.js
Outdated
} | ||
|
||
async loadTask(taskName) { | ||
if (taskName && this.isValidTaskName(taskName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
} | ||
|
||
async run() { | ||
if (!this._taskName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice Catch! fixed
return Object.keys(SCRAPERS).map((scraperId) => { | ||
const result = { value: scraperId, name: SCRAPERS[scraperId].name }; | ||
const hasCredentials = this._taskData.scrapers | ||
.findIndex(scraper => scraper.id === scraperId) !== -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
message: 'How many months do you want to scrape (1-12)?', | ||
default: dateDiffByMonth, | ||
filter: (value) => { | ||
if (Number.isFinite(value) && !Number.isNaN(value)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the value is saved as a number in the json file, if you don't modify it the default value will be a number, if you type it inquirer returns it as string
filter: (value) => { | ||
if (Number.isFinite(value) && !Number.isNaN(value)) { | ||
return value; | ||
} else if (typeof value === 'string' && value.match(/^[0-9]+$/)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseInt
is not reliable for this specific task, see the following, an example taken from the link above:
var g = parseInt("He was 40") + "<br>";
console.log(g); // will print 40
name: 'scraperId', | ||
message: 'Select a scraper', | ||
when: (answers) => { | ||
if (answers.action === 'delete') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it but since they are relevant only in the scope of function I don't think it will contribute having them as a global constants. each function can declare its own actions and there is not connection between those functions
@@ -3,6 +3,8 @@ | |||
"arrow-body-style": 0, | |||
"no-shadow": 0, | |||
"no-console": 0, | |||
"no-underscore-dangle": ["error", { "allow": ["_private"] }], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I found a compromise imo.
- for private methods - I moved them outside of the class if they didn't use the state of the instance , for others I removed the prefix and set
@private
flag of jsdoc - for private properties - I used the following strategy to provide true support for private properties. I remember that you prefer not disabling lint by row so I added the exception of
_private
that as you can see is private by design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal ok, better I guess.
I still need to digest this idea, but I guess it's the least bad option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, let me know :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal my way of saying sababa
@eshaham I pushed some improvements/fixes to this pr:
I know this PR is already big so I try to to a few adjustments as possible. Hopefully you are experimenting reviewable.io with this branch and if this is the case so it should be much simpler to follow the adjustments. Have a great weekend |
@eshaham if you still find it difficult to review, I can split this one into two separated branches/prs:
There are some complications with this separation:
I know it is hard to follow big reviews when using github PR dashboard so if you think it worth the effort we can do it and tackle the complications as we go. Please lmk if you want to continue with the PR or with the suggested alternative |
@esakal let's keep this PR, as we've already made some progress with it. |
thanks, I will wait for your feedback. Regarding reviewable.io, I also almost ruled out reviewable. But since my work project rely heavily on reviews we learned over time that once you start working with reviewable you don't feel the need to use github dashboard anymore. This is my personal experience but I understand your concerns about the comments becoming top level comments instead of inline comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal finally got around to review this PR, left some more comments.
@@ -3,6 +3,8 @@ | |||
"arrow-body-style": 0, | |||
"no-shadow": 0, | |||
"no-console": 0, | |||
"no-underscore-dangle": ["error", { "allow": ["_private"] }], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal ok, better I guess.
I still need to digest this idea, but I guess it's the least bad option
src/helpers/files.js
Outdated
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal I disagree.
Most file APIs I know, as well as OS commands, will return an error if the file you're trying to delete doesn't exist.
src/constants.js
Outdated
@@ -1,2 +1,3 @@ | |||
const PASSWORD_FIELD = 'password'; | |||
export default PASSWORD_FIELD; | |||
const DATE_AND_TIME_MOMENT_FORMAT = 'DD-MM-YYYY_HH-mm-ss'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal mind changing this to DATE_TIME_FORMAT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/helpers/tasks-manager.js
Outdated
|
||
const taskManager = new TaskManager(); | ||
|
||
export default taskManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal I usually try to keep stuff simple and to avoid unnecessary "mizgun" of code.
I avoid Singletons if not needed. I wouldn't hesitate using them if I really need them, but that's not the case here.
src/scrape/scrape-individual.js
Outdated
} else { | ||
console.log(`error type: ${result.errorType}`); | ||
console.log('error:', result.errorMessage); | ||
console.log('error:', e.message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal console.error
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed back to console.error
]); | ||
|
||
switch (answers.action) { | ||
case 'summary': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal can we use constants for the different actions?
I saw your comment about not making these global enums, but at least re-use strings under the same function.
This comment also applies to other such functions in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eshaham I can do that. Before I change the code I would like to confirm the convention you want me to use when adding those constants at the beginning of the functions.
Do you prefer using the global constants convention like const SUMMARY_ACTION
or the regular convention of variable like const summaryAction
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal good question. I think uppercase SUMMARY_ACTION
is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/helpers/tasks-manager.js
Outdated
console.log(`- ${colors.bold(key)}: ${value}`); | ||
} | ||
|
||
async printTaskSummary(taskName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think we should consider adding it as one of the options to do on a task in the setup process, what do you think?
@esakal sounds like a good idea, but let's do that on a separate PR 😄
because there might be a feature request for printing a summary of all tasks or specific task as part of the setup process
I would still put it on a different location - either a helper procedural function. Not even sure it belongs to a Task
class, as it it purely presentational.
src/scrape/generate-reports.js
Outdated
return acc; | ||
}, []); | ||
const filePath = `${saveLocation}/${moment().format('DD-MM-YYYY_HH-mm-ss')}.csv`; | ||
const fileFields = ['Institude', 'Account', 'Date', 'Payee', 'Inflow', 'Installment', 'Total']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Bank here instead will not fit since we scrape also credit cards companies, can you recommend other alternatives?
hmm, ok. But I would love to be consistent. I've used "bank" in the past to describe a credit card account as well. I've also used companyId
. Institute is an option I guess, but we will need to fix across both repos.
Your call.
It is needed in single report since you need to associate each transaction to its scraper/account. The separated report doesn't need institute and Account but I don't mind adding them there as well for readability.
const FIELDS = ['Date', 'Payee', 'Inflow', 'Installment', 'Total'];
const EXTRA_FIELDS = ['Institude', 'Account', ...FIELDS];
Wouldn't that be simpler?
src/scrape/generate-reports.js
Outdated
const filePath = `${saveLocation}/${moment().format('DD-MM-YYYY_HH-mm-ss')}.csv`; | ||
const fileFields = ['Institude', 'Account', 'Date', 'Payee', 'Inflow', 'Installment', 'Total']; | ||
const fileContent = json2csv({ data: fileTransactions, fileFields, withBOM: true }); | ||
await writeFile(filePath, fileContent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal I meant something extracting code to a different function in this class. But now I see it may be more complicated and the benefit is too small, so let's drop it.
src/scrape/scrape-task.js
Outdated
const saveLocation = `${saveLocationRootPath}/tasks/${taskName}`; | ||
await generateSingleReport(scrapedAccounts, saveLocation); | ||
} else { | ||
const currentExecutionFolder = moment().format('DD-MM-YYYY_HH-mm-ss'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal yeah, I got mixed emotions about this rule as well. Let's keep it for now though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eshaham I have some followup questions if you can review them.
You don't need to review the code at the moment because I'm delaying the push until all your comments will be answered. So you can ignore comments that describe changes in the code as you will not see them at the moment.
@@ -3,6 +3,8 @@ | |||
"arrow-body-style": 0, | |||
"no-shadow": 0, | |||
"no-console": 0, | |||
"no-underscore-dangle": ["error", { "allow": ["_private"] }], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, let me know :)
src/constants.js
Outdated
@@ -1,2 +1,3 @@ | |||
const PASSWORD_FIELD = 'password'; | |||
export default PASSWORD_FIELD; | |||
const DATE_AND_TIME_MOMENT_FORMAT = 'DD-MM-YYYY_HH-mm-ss'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/helpers/files.js
Outdated
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, i removed the existsAsync
check condition and following your comment I assume that node fs.unlink
function will result with error if the file is missing.
btw, I don't have a machine with microsoft OS to check this out, I assume node handles both OS when dealing with the file system.
src/helpers/tasks-manager.js
Outdated
|
||
const taskManager = new TaskManager(); | ||
|
||
export default taskManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what 'mizgun' means but sure, I will change it as requested
src/scrape/scrape-individual.js
Outdated
} else { | ||
console.log(`error type: ${result.errorType}`); | ||
console.log('error:', result.errorMessage); | ||
console.log('error:', e.message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed back to console.error
]); | ||
|
||
switch (answers.action) { | ||
case 'summary': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eshaham I can do that. Before I change the code I would like to confirm the convention you want me to use when adding those constants at the beginning of the functions.
Do you prefer using the global constants convention like const SUMMARY_ACTION
or the regular convention of variable like const summaryAction
?
@eshaham I handled all the comments we currently have in this PR.
Night! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@esakal looks good 😄 👍 👍
@esakal I don't believe in squashing commits, I like to see the evolution of the PR in the commit list. |
Overview
Motivation
This PR was created following #28.
A task structure
A task defines a set of scrapers and relevant scraping options. A user can define multiple tasks that serve different budgets (for example separated work and home budget).
An example for a real task can be:
Setup a task
A user can manage tasks using the existing setup command
npm run setup
. I was working on the setup process to be as friendly as possible.You can see the task setup menu structure here.
Changes in the repository
New dependencies
Linting adjustments
Splitting individual scrape code
A task is basically a group of individual scrapers with predefined scraping options. So basically we want to run the individual scraper multiple times when executing a task.
To keep DRY principle I needed to split the original scrape-individual.js file into two files. The first file handles the actual scraping and is used both by the individual scraper and the task scraper. The second file is the entry point when the user wants to scrape individual scraper and is used to get the relevant parameters from the user.
Extracting the report generation
A task scraper support two types of output, a single report that combine all the scrapped accounts together and for backward compatible it also allow creating multiple reports, one per each account.
For the same reason as explained above to keep the DRY principle I extracted the report generation into a separated file.
That's it, all the remaining files are there mostly to setup and scrape a task.
@eshaham Please review. Thanks!
Closes #28
This change is